Question: What is sequencing saturation?
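Answer: Sequencing saturation is a measure of the fraction of library complexity that was sequenced in a given experiment. The inverse of one minus the sequencing saturation can be interpreted as the number of additional reads it would take to detect a new transcript. For example, at a sequencing saturation of 75%, 1 / (1 - 0.75) = 4, so roughly four additional reads would be needed to detect one new transcript.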
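A minimal sketch of the arithmetic, assuming saturation is computed as one minus the ratio of unique molecules to usable reads (our reading of the duplicate definition in the note below). The function names `sequencing_saturation` and `reads_per_new_transcript` are hypothetical helpers for illustration, not part of any Cell Ranger API.

```python
def sequencing_saturation(n_reads: int, n_unique: int) -> float:
    """Fraction of usable reads that are duplicates of an already-seen molecule.

    n_reads:  usable reads (assumed: mapped, with a valid cell barcode and UMI)
    n_unique: distinct (cell barcode, UMI, gene) combinations among those reads
    """
    if n_reads <= 0:
        raise ValueError("n_reads must be positive")
    if not 0 < n_unique <= n_reads:
        raise ValueError("n_unique must be between 1 and n_reads")
    return 1.0 - n_unique / n_reads


def reads_per_new_transcript(saturation: float) -> float:
    """Interpretation from the text: 1 / (1 - saturation)."""
    if not 0.0 <= saturation < 1.0:
        raise ValueError("saturation must be in [0, 1)")
    return 1.0 / (1.0 - saturation)


s = sequencing_saturation(n_reads=1_000_000, n_unique=250_000)
print(s)                            # 0.75
print(reads_per_new_transcript(s))  # 4.0
```

Note that 1 / (1 - saturation) simplifies to n_reads / n_unique, the average number of reads per unique molecule.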
Sequencing saturation depends on both library complexity and sequencing depth. Different cell types have different amounts of RNA and thus differ in the total number of distinct transcripts in the final library (also known as library complexity). The figure below shows the median number of genes recovered from different cell types. As sequencing depth increases, more genes are detected, but the gain levels off at a depth that depends on the cell type.
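Sequencing depth also affects sequencing saturation: in general, more reads detect more unique transcripts. However, the number of unique transcripts cannot exceed the library complexity, so as depth grows, each additional read is increasingly likely to be a duplicate of a transcript already seen, and saturation rises.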
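To illustrate how depth and complexity interact, here is a toy model, not how Cell Ranger estimates saturation. It assumes reads are drawn uniformly at random, with replacement, from `complexity` equally abundant molecules; real libraries have skewed transcript abundances, so real curves will differ. The function `expected_saturation` is a hypothetical helper.

```python
def expected_saturation(n_reads: int, complexity: int) -> float:
    """Expected saturation under a uniform-abundance sampling model (assumption)."""
    if n_reads <= 0 or complexity <= 0:
        raise ValueError("n_reads and complexity must be positive")
    # Probability that one particular molecule is never drawn in n_reads draws.
    p_missed = (1.0 - 1.0 / complexity) ** n_reads
    # Expected number of distinct molecules observed at this depth.
    expected_unique = complexity * (1.0 - p_missed)
    return 1.0 - expected_unique / n_reads


# Compare a lower-complexity and a higher-complexity library at several depths.
for depth in (10_000, 50_000, 100_000, 200_000):
    low = expected_saturation(depth, complexity=20_000)
    high = expected_saturation(depth, complexity=100_000)
    print(f"{depth:>7} reads: low-complexity {low:.2f}, high-complexity {high:.2f}")
```

In this model, saturation rises with depth for a fixed library, and at any fixed depth the higher-complexity library is less saturated, matching the qualitative behavior described above.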
Figure 1. Median number of genes detected per cell as a function of sequencing depth for Single Cell 3' v2 libraries. Primary cell types such as PBMCs and embryonic mouse neurons have lower RNA content and thus require fewer reads per cell. Cell lines such as HEK293T and 3T3 have higher RNA content, so additional sequencing may continue to detect more genes per cell.
Note: In earlier versions of the Cell Ranger pipeline, the sequencing saturation metric was called the cDNA PCR duplication rate, a term some people may find more intuitive. We see a wide range of cDNA PCR duplication rates. Duplicate reads are reads that map to the same reference gene and have the same cell barcode AND the same UMI.
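The duplicate definition in the note can be expressed as counting distinct (cell barcode, UMI, gene) triples. This is a minimal sketch assuming the reads have already been aligned and filtered to valid barcodes and UMIs; the `Read` record layout, the `duplicate_stats` helper, and the short sequences in the example are hypothetical toy values (real barcodes and UMIs are longer).

```python
from collections.abc import Iterable

# Hypothetical record for one usable read: (cell_barcode, umi, gene).
Read = tuple[str, str, str]


def duplicate_stats(reads: Iterable[Read]) -> tuple[int, int, float]:
    """Return (n_reads, n_unique, saturation).

    A read is a duplicate when an earlier read shares the same cell barcode,
    UMI, AND gene; each distinct triple counts as one unique molecule.
    """
    seen: set[Read] = set()
    n_reads = 0
    for read in reads:
        n_reads += 1
        seen.add(read)
    if n_reads == 0:
        raise ValueError("no reads")
    n_unique = len(seen)
    return n_reads, n_unique, 1.0 - n_unique / n_reads


reads = [
    ("AAACCTGA", "GTTCAGCA", "CD3E"),
    ("AAACCTGA", "GTTCAGCA", "CD3E"),  # duplicate: same barcode, UMI, and gene
    ("AAACCTGA", "TTGACCAG", "CD3E"),  # different UMI: a new molecule
    ("AAACGGGT", "GTTCAGCA", "CD3E"),  # different barcode: a different cell
]
print(duplicate_stats(reads))  # (4, 3, 0.25)
```

A read is only a duplicate when all three fields match; sharing just the UMI or just the gene is not enough.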
Products: Single Cell Gene Expression, Single Cell Immune Profiling