Question: How is "Sequencing Saturation" calculated?
Answer: The web_summary.html output from cellranger count includes a metric called "Sequencing Saturation". This metric quantifies the fraction of reads originating from an already-observed UMI. More specifically, it is the fraction of confidently mapped, valid cell-barcode, valid UMI reads that are non-unique, that is, reads that match an existing (cell-barcode, UMI, gene) combination.
The formula for calculating this metric is as follows:
Sequencing Saturation = 1 - (n_deduped_reads / n_reads)
where
n_deduped_reads = Number of unique (valid cell-barcode, valid UMI, gene) combinations among confidently mapped reads.
n_reads = Total number of confidently mapped, valid cell-barcode, valid UMI reads.
Note that the numerator of the fraction is n_deduped_reads, not the number of non-unique reads mentioned in the definition. The ratio n_deduped_reads / n_reads measures uniqueness, not duplication or saturation, so we take its complement, 1 - (n_deduped_reads / n_reads), to measure saturation.
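To make the formula concrete, here is a minimal Python sketch of the calculation. It is an illustration only, not Cell Ranger's actual implementation: the function name sequencing_saturation and the toy reads are hypothetical, and it assumes the input has already been filtered to confidently mapped reads with a valid cell barcode and a valid UMI, each represented as a (cell-barcode, UMI, gene) tuple.

```python
from typing import Iterable, Tuple

# Hypothetical representation of one read: (cell_barcode, umi, gene).
# Assumption: reads are already filtered to confidently mapped,
# valid cell-barcode, valid UMI reads.
Read = Tuple[str, str, str]


def sequencing_saturation(reads: Iterable[Read]) -> float:
    """Return 1 - (n_deduped_reads / n_reads) for the given reads."""
    reads = list(reads)
    n_reads = len(reads)
    if n_reads == 0:
        raise ValueError("at least one read is required")
    # Unique (cell-barcode, UMI, gene) combinations.
    n_deduped_reads = len(set(reads))
    return 1 - n_deduped_reads / n_reads


# Toy data: 4 reads, 3 unique combinations.
example_reads = [
    ("AAAC", "UMI1", "GeneA"),
    ("AAAC", "UMI1", "GeneA"),  # duplicate of the first read
    ("AAAC", "UMI2", "GeneA"),
    ("TTTG", "UMI1", "GeneB"),
]
saturation = sequencing_saturation(example_reads)
print(saturation)  # 1 - 3/4 = 0.25
```

In this toy example one of the four reads repeats an already-observed combination, so the saturation is 0.25. If every read were unique, the result would be 0.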