Question: How much sequencing saturation should I aim for?
Answer: Sequencing saturation measures the fraction of library complexity that was captured during sequencing. How much you need depends on the goals of the experiment. If you are aiming to cluster cells into populations for downstream analysis, it is not necessary to detect every unique transcript (UMI count) in each cell, and a lower sequencing saturation may be sufficient. However, if you are trying to recover very lowly expressed transcripts, higher sequencing saturation may be required to detect them.
Primary cells (e.g. PBMCs) generally have lower RNA content and may require less sequencing to achieve sequencing saturation rates of >90%.
The quantity 1 / (1 - sequencing saturation) can be roughly interpreted as the number of additional reads it would take to detect one new transcript. At 50% sequencing saturation, every 2 new reads result in 1 new UMI count (unique transcript) detected. In contrast, at 90% sequencing saturation, 10 new reads are necessary to obtain 1 new UMI count. If the sequencing saturation is high, additional sequencing would not recover much new information from the library.
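The rule of thumb above can be checked with a few lines of Python. This is only a minimal sketch of the arithmetic; the function name `reads_per_new_umi` is hypothetical and is not part of Cell Ranger.

```python
def reads_per_new_umi(saturation: float) -> float:
    """Rough number of additional reads needed to observe one new UMI.

    `saturation` is the sequencing saturation as a fraction in [0, 1).
    Implements the rule of thumb 1 / (1 - sequencing saturation).
    """
    if not 0.0 <= saturation < 1.0:
        raise ValueError("saturation must be in the range [0, 1)")
    return 1.0 / (1.0 - saturation)


if __name__ == "__main__":
    # The two saturation levels discussed in the text.
    for s in (0.5, 0.9):
        print(f"saturation {s:.0%}: about {reads_per_new_umi(s):.1f} reads per new UMI")
```

Because the value grows quickly as saturation approaches 100%, the same calculation gives a quick sense of when additional sequencing stops being worthwhile.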
Note: In earlier versions of the Cell Ranger pipeline, the sequencing saturation metric was referred to as the cDNA PCR duplication rate. The previous term may be more intuitive to some people. We see a wide range of cDNA PCR duplication rates. Duplicates are reads that map to the same reference gene, have the same cell barcode, and have the same transcript UMI.
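To illustrate the duplication view of the metric, the sketch below counts reads that share the same cell barcode, UMI, and gene. It assumes that saturation can be approximated as 1 minus the ratio of unique (barcode, UMI, gene) combinations to total reads; that formula, the function name `sequencing_saturation`, and the example reads are assumptions made for illustration, not the pipeline's actual code.

```python
from typing import Iterable, Tuple

# One read summarized as (cell_barcode, umi, gene). Illustrative type only.
Read = Tuple[str, str, str]


def sequencing_saturation(reads: Iterable[Read]) -> float:
    """Fraction of reads that duplicate an already-seen (barcode, UMI, gene).

    Assumed formula: 1 - (unique combinations / total reads).
    """
    reads = list(reads)
    if not reads:
        raise ValueError("at least one read is required")
    unique_combinations = len(set(reads))
    return 1.0 - unique_combinations / len(reads)


if __name__ == "__main__":
    # Made-up example: 4 reads, 2 of which repeat an earlier combination.
    example_reads = [
        ("AAAC", "UMI1", "GENE_A"),
        ("AAAC", "UMI1", "GENE_A"),  # duplicate of the first read
        ("AAAC", "UMI2", "GENE_A"),
        ("AAAC", "UMI2", "GENE_A"),  # duplicate of the third read
    ]
    print(f"saturation: {sequencing_saturation(example_reads):.0%}")
```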
Article last updated January 26, 2023
Products: Single Cell Gene Expression, Single Cell Immune Profiling