Question: How much sequencing saturation should I aim for?
Answer: Sequencing saturation measures the fraction of library complexity that was captured during sequencing. How much you need depends on the goals of the experiment. If you are aiming to cluster cells into populations for downstream analysis, it is not necessary to detect every unique transcript (UMI count) in each cell, and a lower sequencing saturation may be sufficient. However, if you are trying to recover very lowly expressed transcripts, higher sequencing saturation may be required to detect them.
Primary cells (e.g. PBMCs) generally have lower RNA content than cell lines and may require less sequencing to achieve sequencing saturation rates of >90%.
1 / (1 - sequencing saturation) can be roughly interpreted as the number of additional reads it would take to detect a new transcript. If sequencing saturation is at 50%, it means that there is 1 UMI count (unique transcript in a cell barcode) for every 2 reads (in cell barcodes and confidently mapped to transcriptome). In contrast, 90% sequencing saturation means that there is 1 UMI count for every 10 reads. If the sequencing saturation is high, additional sequencing would not recover much new information for the library.
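The rule of thumb above can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function name `reads_per_new_umi` is hypothetical and is not part of Cell Ranger.

```python
def reads_per_new_umi(saturation: float) -> float:
    """Rough number of additional reads needed to detect one new UMI.

    `saturation` is the sequencing saturation as a fraction (0.5 for 50%).
    Hypothetical helper for illustration only; not a Cell Ranger function.
    """
    if not 0.0 <= saturation < 1.0:
        # At 100% saturation the ratio is undefined (division by zero).
        raise ValueError("saturation must be in the range [0, 1)")
    return 1.0 / (1.0 - saturation)


if __name__ == "__main__":
    # Print the approximate reads-per-new-UMI for a few saturation levels.
    for sat in (0.5, 0.75, 0.9):
        print(f"{sat:.0%} saturation: ~{reads_per_new_umi(sat):.0f} reads per new UMI")
```

At 50% saturation this gives about 2 reads per new UMI, and at 90% about 10, matching the figures above.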
Note: In earlier versions of the Cell Ranger pipeline, the sequencing saturation metric was referred to as the cDNA PCR duplication rate. The previous term may be more intuitive to some people. We see a wide range of cDNA PCR duplication rates. PCR duplicates are reads that map to the same reference gene, have the same cell barcode, AND have the same transcript UMI.
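To make the duplication view concrete, here is a minimal sketch that computes saturation as the fraction of reads that are duplicates of an already seen (cell barcode, UMI, gene) combination. It assumes the input has already been filtered to reads in cell barcodes that are confidently mapped to the transcriptome. The function name and the tuple layout are assumptions for illustration, not Cell Ranger's actual implementation.

```python
from typing import Iterable, Tuple

# Assumed record layout for illustration: (cell_barcode, umi, gene).
Read = Tuple[str, str, str]


def sequencing_saturation(reads: Iterable[Read]) -> float:
    """Fraction of reads that duplicate an already seen (barcode, UMI, gene)."""
    read_list = list(reads)
    if not read_list:
        raise ValueError("at least one read is required")
    n_reads = len(read_list)
    # Reads sharing barcode, UMI, and gene collapse to a single UMI count.
    n_umi_counts = len(set(read_list))
    return 1.0 - n_umi_counts / n_reads


if __name__ == "__main__":
    example_reads = [
        ("AAAC", "UMI1", "GeneA"),
        ("AAAC", "UMI1", "GeneA"),  # duplicate of the first read
        ("AAAC", "UMI2", "GeneA"),
        ("AAAC", "UMI2", "GeneA"),  # duplicate of the third read
    ]
    # 2 UMI counts from 4 reads, i.e. 1 UMI count for every 2 reads.
    print(sequencing_saturation(example_reads))
```

In this toy example, 4 reads collapse to 2 UMI counts, which corresponds to 50% saturation.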
Article last updated January 26, 2023
Products: Single Cell Gene Expression, Single Cell Immune Profiling