Question: In scATAC-seq, how are the z-scores for transcription factor (TF) motif enrichment calculated?
Answer: The TF z-score is calculated from the TF-Barcode matrix, in which each row is a TF and each column is a barcode. The following slide summarizes the interpretation of an entry in this matrix:
Suppose we are interested in calculating the z-score for TF #2:
- Barcode #3 has three cut sites that match TF #2. Suppose barcode #3 has 60 cut sites in total (the sum of column 3 of the matrix). Then 3/60, or 0.05, is the proportion of cut sites in barcode #3 that match TF #2.
- In the same way, calculate the proportion of cut sites that match TF #2 for every other barcode. If there are 1000 barcodes in the library, this gives 1000 proportions.
- Calculate the median and the median absolute deviation (MAD, https://en.wikipedia.org/wiki/Median_absolute_deviation) of these 1000 proportions.
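- Compute the z-score for each barcode using the median and MAD in place of the mean and standard deviation: subtract the median from the barcode's proportion and divide by the MAD.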
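
The steps above can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions, not the pipeline's actual code: the function name `tf_zscores` and the toy matrix are hypothetical, the matrix is assumed to have TFs as rows and barcodes as columns, every barcode is assumed to have at least one cut site, and no scaling constant is applied to the MAD because the description above does not mention one.

```python
import numpy as np


def tf_zscores(tf_barcode, tf_index):
    """Median/MAD z-scores of one TF across all barcodes (hypothetical helper).

    tf_barcode: 2-D array of cut-site counts, rows = TFs, columns = barcodes.
    tf_index:   0-based row index of the TF of interest.
    """
    counts = np.asarray(tf_barcode, dtype=float)

    # Total cut sites per barcode: the sum of each column.
    totals = counts.sum(axis=0)

    # Proportion of each barcode's cut sites that match the chosen TF.
    proportions = counts[tf_index] / totals

    # Median and median absolute deviation of the proportions.
    median = np.median(proportions)
    mad = np.median(np.abs(proportions - median))
    if mad == 0:
        raise ValueError("MAD is zero; z-scores are undefined for this TF")

    # Same form as an ordinary z-score, with median and MAD
    # standing in for mean and standard deviation.
    return (proportions - median) / mad


# Toy matrix (made-up numbers): 3 TFs x 4 barcodes.
# Barcode #3 (third column) has 3 cut sites for TF #2 and 60 in total.
matrix = np.array([
    [10, 20, 27,  5],   # TF #1
    [ 2,  4,  3,  5],   # TF #2
    [ 8, 16, 30, 10],   # TF #3
])

z = tf_zscores(matrix, tf_index=1)  # TF #2 is row index 1
print(z)
```

In this toy example the proportions for TF #2 are 0.1, 0.1, 0.05 and 0.25, the median is 0.1 and the MAD is 0.025, so the z-scores come out at approximately 0, 0, -2 and 6.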