Question: How are the UMI counts normalized before PCA and differential expression?
Answer: Please note that the UMI counts in the output gene-barcode matrices are not normalized. Loupe Cell Browser also displays raw UMI counts.
However, the UMI counts are normalized before secondary analysis as follows:
PCA
Before PCA, the following is done to normalize UMI counts:
- The UMI counts of each cell-associated barcode are scaled so that the barcode's total matches the grand median of total UMI counts per cell, using the scaling factor median_UMI_counts_per_barcode / UMI_counts_per_barcode.
- The scaled matrix is log-transformed, then each gene is mean-centered and scaled so that its mean is 0 and its standard deviation is 1.
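The two normalization steps can be sketched in NumPy as below. This is an illustrative sketch, not the pipeline's code: the function name normalize_for_pca is hypothetical, and the use of log1p (natural log with a pseudocount of 1) is an assumption, since the answer does not state the log base or pseudocount.

```python
import numpy as np


def normalize_for_pca(counts: np.ndarray) -> np.ndarray:
    """Hypothetical sketch of the pre-PCA normalization.

    counts: cells x genes matrix of raw UMI counts for cell-associated
    barcodes (each barcode is assumed to have at least one UMI).
    """
    counts = np.asarray(counts, dtype=float)
    umi_per_barcode = counts.sum(axis=1)      # total UMIs per cell
    median_umi = np.median(umi_per_barcode)   # grand median across cells
    # Scaling factor: median_UMI_counts_per_barcode / UMI_counts_per_barcode
    scale = median_umi / umi_per_barcode
    scaled = counts * scale[:, np.newaxis]    # every cell now totals median_umi
    # Assumption: natural log with pseudocount 1
    logged = np.log1p(scaled)
    # Per-gene centering and scaling to mean 0, standard deviation 1
    mean = logged.mean(axis=0)
    std = logged.std(axis=0)
    std[std == 0] = 1.0  # assumption: constant genes become 0 instead of NaN
    return (logged - mean) / std


counts = np.array([[1, 3, 0], [4, 4, 2], [0, 5, 5]])
z = normalize_for_pca(counts)
```

In this example the per-cell totals are 4, 10 and 10, so the grand median is 10 and the first cell is scaled up by a factor of 2.5 before the log transform.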
The PCA-transformed values are used for clustering and t-SNE.
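As a rough illustration of what "PCA-transformed values" means here, the sketch below projects a per-gene standardized cells x genes matrix onto its leading principal components via a thin SVD. The function name pca_transform and the default of 10 components are illustrative assumptions, not settings taken from this answer.

```python
import numpy as np


def pca_transform(z: np.ndarray, n_components: int = 10) -> np.ndarray:
    """Hypothetical sketch: PCA scores for a mean-centered cells x genes matrix.

    z is assumed to be centered per gene already (as in the standardization
    step described above); n_components=10 is only an illustrative default.
    """
    k = min(n_components, *z.shape)
    # Thin SVD: z = U @ diag(S) @ Vt; rows of Vt are the principal axes
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    # Cell coordinates in PC space, equivalent to z @ vt[:k].T
    return u[:, :k] * s[:k]


rng = np.random.default_rng(0)
z = rng.standard_normal((50, 20))
z -= z.mean(axis=0)          # center each gene
scores = pca_transform(z, n_components=5)
```

The resulting scores matrix (one row per cell, one column per component) is the kind of low-dimensional input that clustering and t-SNE then operate on.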
Differential Expression
In differential expression analysis, normalization is implicit: rather than normalizing the counts beforehand, the per-cell library-size parameter is incorporated directly into the statistical test.
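To show how a library-size parameter can enter a test without pre-normalizing counts, here is a deliberately simplified sketch. It is NOT the pipeline's actual differential expression method: it assumes a Poisson model in which a cell's count for one gene is Poisson(s_j * mu), with s_j a per-cell size factor (total UMIs divided by the median total). Under the null hypothesis of equal mu in groups A and B, and conditional on the combined count n, the group-A count is Binomial(n, S_A / (S_A + S_B)), where S_A and S_B are the summed size factors. The names size_factors and library_size_aware_test are hypothetical.

```python
import math
from statistics import median


def size_factors(totals):
    """Hypothetical per-cell library-size factors: total UMIs / median total."""
    m = median(totals)
    return [t / m for t in totals]


def library_size_aware_test(counts_a, sizes_a, counts_b, sizes_b):
    """Two-sided exact test for one gene under a simplified Poisson model.

    Library size enters through the binomial probability p, not through
    rescaled counts. Size factors must be positive in both groups.
    """
    k = sum(counts_a)
    n = k + sum(counts_b)
    if n == 0:
        return 1.0
    p = sum(sizes_a) / (sum(sizes_a) + sum(sizes_b))

    def log_pmf(i):
        return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                + i * math.log(p) + (n - i) * math.log1p(-p))

    observed = log_pmf(k)
    # Sum the probabilities of all outcomes no more likely than the observed one
    total = sum(math.exp(log_pmf(i)) for i in range(n + 1)
                if log_pmf(i) <= observed + 1e-9)
    return min(1.0, total)


# Group A has twice the raw counts of group B, but also twice the library size
p_same = library_size_aware_test([10, 10], [2.0, 2.0], [5, 5], [1.0, 1.0])
# Same library sizes, very different counts
p_diff = library_size_aware_test([20, 20], [1.0, 1.0], [1, 1], [1.0, 1.0])
```

In the first call the twofold difference in raw counts is fully explained by library size, so the p-value is close to 1; in the second call the groups have equal size factors, so the large count difference yields a very small p-value. A production test would typically use an overdispersed model (for example, negative binomial) rather than Poisson.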
For more details on the secondary analysis algorithms, please see our Algorithms Overview.