Question: What is sequencing saturation?
Answer: Sequencing saturation is a measure of the fraction of library complexity that was sequenced in a given experiment. One minus the sequencing saturation is roughly the number of new transcripts you expect to find with one new read, and the inverse of that value, 1 / (1 - saturation), is roughly the number of new reads needed to detect one new transcript. If sequencing saturation is at 50%, every 2 new reads will result in 1 new UMI count (unique transcript) detected. In contrast, 90% sequencing saturation means that 10 new reads are necessary to obtain one new UMI count.
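The 50% and 90% examples above follow from a simple relationship: if a fraction s of new reads are duplicates, then roughly 1 / (1 - s) new reads are needed per new UMI count. A minimal Python sketch of that arithmetic is shown here; the function name is hypothetical and is not part of Cell Ranger.

```python
def reads_per_new_umi(saturation: float) -> float:
    """Approximate number of new reads needed to observe one new UMI count.

    `saturation` is the sequencing saturation as a fraction in [0, 1).
    This is a rule-of-thumb estimate, not a Cell Ranger output.
    """
    if not 0.0 <= saturation < 1.0:
        raise ValueError("saturation must be in the range [0, 1)")
    # A fraction (1 - saturation) of new reads yields a new UMI,
    # so the expected number of reads per new UMI is its inverse.
    return 1.0 / (1.0 - saturation)


for s in (0.5, 0.9):
    print(f"saturation {s:.0%}: about {reads_per_new_umi(s):.0f} reads per new UMI")
```

The estimate describes the current sequencing depth only; as more reads are added, saturation rises and each new UMI count becomes more expensive.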
Sequencing saturation is dependent on the library complexity and sequencing depth. Different cell types will have different amounts of RNA and thus will differ in the total number of different transcripts in the final library (also known as library complexity). The figure below illustrates the median number of genes recovered from different cell types. As sequencing depth increases, more genes are detected, but this reaches saturation at different sequencing depths depending on cell type.
Sequencing depth also affects sequencing saturation; generally the more sequencing reads, the more additional unique transcripts you can detect. However, this is limited by the library complexity.
Figure 1. Plot of the median number of genes detected per cell as a function of sequencing depth. Primary cell types such as PBMCs and embryonic mouse neurons have lower RNA content and thus require fewer reads per cell. Cell lines such as HEK293T and 3T3 cells express high levels of RNA, so additional sequencing may detect additional genes per cell.
In earlier versions of the Cell Ranger pipeline, the sequencing saturation metric was referred to as the cDNA PCR duplication rate, a term that may be more intuitive to some people. PCR duplicates are reads that map to the same reference gene, have the same cell barcode, AND have the same transcript UMI. We see a wide range of cDNA PCR duplication rates.
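To make the duplicate definition concrete, the sketch below computes a duplication rate from a toy list of reads. It assumes saturation is calculated as 1 minus the ratio of unique (cell barcode, UMI, gene) combinations to total reads, which matches the description here; the function name and the toy data are hypothetical, and filtering steps such as requiring confidently mapped reads and valid barcodes are left out.

```python
def sequencing_saturation(reads):
    """Fraction of reads that duplicate an already-seen molecule.

    `reads` is an iterable of (cell_barcode, umi, gene) tuples, one per read.
    Assumes saturation = 1 - (unique combinations / total reads).
    """
    reads = list(reads)
    if not reads:
        raise ValueError("at least one read is required")
    n_unique = len(set(reads))  # distinct (barcode, UMI, gene) combinations
    return 1.0 - n_unique / len(reads)


# Toy data: 8 reads, 5 distinct (barcode, UMI, gene) combinations.
toy_reads = [
    ("AAAC", "UMI1", "GAPDH"),
    ("AAAC", "UMI1", "GAPDH"),  # duplicate of the read above
    ("AAAC", "UMI2", "GAPDH"),  # same cell and gene, different UMI
    ("TTTG", "UMI1", "GAPDH"),  # same UMI and gene, different cell
    ("TTTG", "UMI1", "ACTB"),
    ("TTTG", "UMI1", "ACTB"),   # duplicate
    ("TTTG", "UMI1", "ACTB"),   # duplicate
    ("AAAC", "UMI3", "CD3E"),
]

saturation = sequencing_saturation(toy_reads)
print(f"sequencing saturation: {saturation:.1%}")  # 1 - 5/8 = 37.5%
```

A read counts as a duplicate only when all three fields match, so the same UMI in a different cell or on a different gene is still a new molecule.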
Products: Single Cell 3', VDJ