Question: I have done all my previous analyses with the default mode of counting exonic reads only. Why should I include introns for my single cell whole transcriptome Gene Expression data analysis?
Answer: In 10x Genomics Gene Expression data, reads mapped to introns account for 20-40% of the reads. By default, Cell Ranger has traditionally not counted these reads. Recent data indicate that intronic reads are usable: they arise from polyA tracts in the transcripts and are not generated by priming from genomic DNA.
For more details on the potential mechanisms for the presence of intronic reads, please see our TechNote here.
For more details on the effect of including introns in your data, please see our TechNote here.
In summary, Figures 1 and 2 below show gains in “Reads Mapped Confidently to Transcriptome” and “Median Genes per Cell” for 3’ gene expression data across multiple sample types when intronic reads are counted. The data indicate that including introns in the analysis increases usable data and library complexity across multiple sample types and species. Samples with higher intronic fractions, such as PBMCs, will see larger gains than samples with lower intronic fractions.
Figure 1: The figure shows the "Reads Mapped Confidently to Transcriptome" metric when run with and without counting intronic reads. The data indicates an increase in the metric across multiple sample types and species.
Figure 2: The figure shows the "Median Genes per Cell" metric when run with and without counting intronic reads. The data indicates an increase in this metric across multiple sample types and species.
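As a rough illustration of why "Median Genes per Cell" can rise when intronic reads are counted, here is a minimal Python sketch. The matrices are hypothetical toy data (3 cells by 4 genes), not values from the figures, and the helper `genes_per_cell` is an illustrative name, not a Cell Ranger function. It assumes a gene is "detected" in a cell when it has at least one UMI.

```python
from statistics import median


def genes_per_cell(counts):
    """Return, for each cell, the number of genes with at least one UMI.

    `counts` is a list of per-cell rows; each row lists UMI counts per gene.
    """
    return [sum(1 for c in row if c > 0) for row in counts]


# Hypothetical toy data: 3 cells x 4 genes (not taken from the figures).
exonic = [
    [2, 0, 0, 1],
    [0, 0, 3, 0],
    [1, 1, 0, 0],
]
intronic = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 2],
]

# Assumption: counting introns adds intronic UMIs to exonic UMIs, gene by gene.
combined = [
    [e + i for e, i in zip(exon_row, intron_row)]
    for exon_row, intron_row in zip(exonic, intronic)
]

median_exon_only = median(genes_per_cell(exonic))
median_with_introns = median(genes_per_cell(combined))

print("Median genes per cell (exons only):", median_exon_only)
print("Median genes per cell (with introns):", median_with_introns)
```

In this toy case, genes seen only through intronic UMIs become detectable once introns are counted, which is what raises the per-cell gene count.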
Gene expression data generated with the Single Cell 5’ Gene Expression assay showed a lower fraction of intronic UMIs than Single Cell 3’ Gene Expression data, likely due to different mechanisms of antisense and intronic read generation. As a result, the impact on complexity metrics such as mean genes per cell is lower for 5’ than for 3’ gene expression, as shown in Figure 3 below.
Figure 3: This figure shows the fold change in mean genes-per-cell as a function of the fraction of sense UMIs that are intronic. The data indicates a lower increase in mean genes-per-cell for 5' data as compared with 3'. Dotted lines connect paired cell-nuclei experiments performed using the same assay, cell load, and sample type.
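The two quantities plotted in Figure 3 can be sketched in Python as follows. This is only an illustration under stated assumptions: the sample records and all numbers are hypothetical, and the function names `intronic_fraction` and `fold_change` are made up for this example rather than taken from any 10x Genomics tool. The intronic fraction is assumed to be intronic sense UMIs divided by all sense UMIs (exonic plus intronic), and the fold change is assumed to be mean genes per cell with introns divided by mean genes per cell with exons only.

```python
def intronic_fraction(exonic_umis, intronic_umis):
    """Fraction of sense UMIs that are intronic (x-axis of the sketch)."""
    total = exonic_umis + intronic_umis
    if total == 0:
        raise ValueError("sample has no sense UMIs")
    return intronic_umis / total


def fold_change(mean_with_introns, mean_exon_only):
    """Fold change in mean genes per cell when introns are counted (y-axis)."""
    if mean_exon_only <= 0:
        raise ValueError("exon-only mean genes per cell must be positive")
    return mean_with_introns / mean_exon_only


# Hypothetical samples; the numbers are invented for illustration only.
samples = [
    {"assay": "3'", "exonic_umis": 6000, "intronic_umis": 4000,
     "mean_genes_exon_only": 1000.0, "mean_genes_with_introns": 1500.0},
    {"assay": "5'", "exonic_umis": 9000, "intronic_umis": 1000,
     "mean_genes_exon_only": 1000.0, "mean_genes_with_introns": 1100.0},
]

# One (assay, intronic fraction, fold change) point per sample.
points = [
    (
        s["assay"],
        intronic_fraction(s["exonic_umis"], s["intronic_umis"]),
        fold_change(s["mean_genes_with_introns"], s["mean_genes_exon_only"]),
    )
    for s in samples
]

for assay, fraction, fc in points:
    print(f"{assay}: intronic fraction = {fraction:.2f}, fold change = {fc:.2f}")
```

With these made-up inputs, the sample with the smaller intronic fraction also shows the smaller fold change, mirroring the 5' versus 3' trend described above.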
Stay tuned for more details on the downstream effect of including intronic reads in your analysis.
Related Articles:
Should I reanalyze all my older data with intronic reads included?
I do not want to include intronic reads in my analysis. Can I do that in the upcoming Cell Ranger?