Question: How does cellranger aggr normalize for sequencing depth among multiple gene expression libraries?
Answer: When aggregating data from different libraries, cellranger aggr normalizes for effective sequencing depth by subsampling the reads.
By default, cellranger aggr computes the subsampling rate for each library from the mean number of filtered reads (reads identified as coming from cells) confidently mapped to the transcriptome per cell. Every library except the one with the lowest value is downsampled to match that lowest value. In the example below, all libraries are normalized to the effective depth of Sample4:
Sample | Filtered Reads Confidently Mapped to Transcriptome per Cell | Fraction of Reads Kept (subsampling rate, %) |
Sample1 | 5825 | 79.5 |
Sample2 | 4890 | 94.7 |
Sample3 | 7134 | 64.9 |
Sample4 | 4630 | 100 |
Sample5 | 12106 | 38.2 |
Smallest Value | 4630 | 100 |
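The rates in the table are consistent with dividing the smallest per-cell value by each library's own value. Below is a minimal Python sketch of that arithmetic, assuming this ratio is how the rate is derived. The function name `subsampling_rates` is hypothetical and is not part of Cell Ranger.

```python
# Mean filtered reads confidently mapped to the transcriptome per cell,
# copied from the example table.
reads_per_cell = {
    "Sample1": 5825,
    "Sample2": 4890,
    "Sample3": 7134,
    "Sample4": 4630,
    "Sample5": 12106,
}


def subsampling_rates(depths):
    """Return the percentage of reads kept per library.

    Hypothetical helper: assumes the rate is the smallest per-cell depth
    divided by the library's own per-cell depth.
    """
    target = min(depths.values())  # depth of the shallowest library
    return {
        name: round(100 * target / depth, 1)
        for name, depth in depths.items()
    }


rates = subsampling_rates(reads_per_cell)
for name, pct in rates.items():
    print(f"{name}: keep {pct}% of reads")
```

The shallowest library (Sample4 here) keeps all of its reads, and deeper libraries keep proportionally fewer.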
Reads are subsampled from the pool of all reads that have a valid barcode and a valid UMI.
It is also possible to perform aggregation without normalizing data. For more details, please see the "Depth Normalization" section of this page.