Question: How can I calculate TPM or FPKM units instead of counts for my 10x Genomics Gene Expression data?
Answer: In 10x Genomics Gene Expression assays, each transcript is tagged with a sequence serving as a Unique Molecular Identifier (UMI). These UMIs enable accurate quantitation of gene expression levels because we can tell which reads are generated from the same mRNA molecule. Therefore, Cell Ranger and Space Ranger perform UMI counting (not read counting) for measuring gene expression level, and all secondary analysis steps are performed based on UMI counts.
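The difference between UMI counting and read counting can be illustrated with a minimal sketch. This is a simplified illustration, not the Cell Ranger or Space Ranger implementation: the function name `count_umis`, the read tuples, and the barcode, gene, and UMI values are all hypothetical, and real pipelines also handle steps such as sequencing-error correction that are omitted here.

```python
from collections import defaultdict


def count_umis(reads):
    """Collapse reads into UMI counts per (cell barcode, gene).

    `reads` is an iterable of (barcode, gene, umi) tuples (hypothetical format).
    Reads sharing the same barcode, gene, and UMI are treated as copies of
    one original mRNA molecule and are counted once.
    """
    umis = defaultdict(set)
    for barcode, gene, umi in reads:
        umis[(barcode, gene)].add(umi)
    return {key: len(seen) for key, seen in umis.items()}


# Toy input: four reads, but only three distinct molecules.
reads = [
    ("AAAC", "GeneA", "UMI1"),
    ("AAAC", "GeneA", "UMI1"),  # same UMI as above: a duplicate of one molecule
    ("AAAC", "GeneA", "UMI2"),
    ("AAAC", "GeneB", "UMI3"),
]

umi_counts = count_umis(reads)
read_count_gene_a = sum(1 for _, gene, _ in reads if gene == "GeneA")
```

In this toy input, GeneA has three reads but only two distinct UMIs, so its UMI count is 2 while its read count is 3.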
In traditional RNA-seq, complete transcripts are fragmented, followed by cDNA synthesis, end repair, and adapter ligation. In this workflow, the probability of sampling a fragment from a long transcript is higher than from a short one, so it makes sense to normalize read counts by transcript length (e.g., TPM, RPKM, FPKM). In 10x Genomics Gene Expression assays, however, this gene-length bias does not exist: UMI counts reflect the number of tagged mRNA molecules, not the number of fragments each molecule produces. Therefore, we do not recommend normalizing UMI counts by gene length.
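For reference, the length normalization used for fragment-based RNA-seq can be sketched as follows. This is an illustrative example under stated assumptions, not a recommended step for 10x data: the function name `tpm`, the gene names, the read counts, and the transcript lengths are all made up for the example.

```python
def tpm(read_counts, lengths_kb):
    """Compute transcripts per million (TPM) from read counts.

    read_counts: dict mapping gene -> read count (hypothetical values)
    lengths_kb:  dict mapping gene -> transcript length in kilobases
    """
    # Step 1: divide each count by transcript length to remove length bias.
    rates = {gene: read_counts[gene] / lengths_kb[gene] for gene in read_counts}
    # Step 2: rescale so that the values sum to one million.
    total = sum(rates.values())
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


# Two genes assumed to be expressed at the same level; the long transcript
# is four times longer and therefore yields four times as many fragments.
read_counts = {"short_gene": 100, "long_gene": 400}
lengths_kb = {"short_gene": 1.0, "long_gene": 4.0}

tpm_values = tpm(read_counts, lengths_kb)
```

Here the two genes receive equal TPM values once length is accounted for. With UMI counts, equally expressed genes are expected to yield similar counts regardless of length, so dividing by length would distort the result.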