In the web summary generated by Cell Ranger v7.1+, I see metrics regarding genomic DNA (gDNA). Is it expected to see a significant background signal from genomic DNA in Chromium Fixed RNA Profiling (FRP) data? Should I worry about it? Is there a way to filter data that potentially come from gDNA?
No, we do not expect a significant background signal from gDNA in FRP data. As long as the standard workflow is followed, probe accessibility to gDNA should be limited, and sequenced ligation events contributed by open gDNA are expected to be negligible (< 1%). Two exceptions are (1) samples that are de-crosslinked during an antigen retrieval step (> 70 °C) and (2) samples that are relatively low in complexity (low RNA content).
In web summaries of FRP data from good-quality samples, Estimated UMIs from Genomic DNA should be negligible, usually less than 1%. Please see this page on how the metrics are calculated. In general, FRP users should not have to worry about the background signal from gDNA.
If you wish to filter background from gDNA (which is usually not necessary) before moving on to downstream analysis, below are some potential approaches.
- Exclude genes with a total UMI count lower than the following product:
the Estimated UMIs from genomic DNA per unspliced probe (shown in the web summary) multiplied by the number of unspliced probes (probes that do not span an exon junction) for the specific gene.
Whether a probe spans an exon junction is indicated in the region column of the probe set reference CSV (v1.0.1).
- Exclude probes that (1) are not spanning exon junctions AND (2) have a total UMI count lower than the Estimated UMIs from genomic DNA per unspliced probe. The UMI count per probe can be found in the molecule_info.h5 file produced by Cell Ranger starting in version 7.1 (see here). After removing the counts from these probes, the UMI counts for some genes will need to be re-calculated to produce the feature barcode matrix.
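The two approaches above can be sketched in a few lines of Python. This is only an illustration and, like the approaches themselves, untested on real data. It assumes that the per-probe total UMI counts have already been tallied from molecule_info.h5 into a dictionary, and that each probe is described by a record with the hypothetical keys probe_id, gene_id, and region, where a region value of "unspliced" marks probes that do not span an exon junction. The function names and example values are made up for this sketch.

```python
from collections import defaultdict


def genes_below_gdna_threshold(probes, umi_counts, gdna_per_probe):
    """Approach 1: return genes whose total UMI count is lower than
    gdna_per_probe multiplied by the gene's number of unspliced probes."""
    total = defaultdict(int)
    n_unspliced = defaultdict(int)
    for p in probes:
        total[p["gene_id"]] += umi_counts.get(p["probe_id"], 0)
        if p["region"] == "unspliced":
            n_unspliced[p["gene_id"]] += 1
    # Genes with no unspliced probes get a threshold of 0 and are never excluded.
    return {g for g in total if total[g] < gdna_per_probe * n_unspliced[g]}


def gene_totals_after_probe_filter(probes, umi_counts, gdna_per_probe):
    """Approach 2: drop unspliced probes whose total UMI count is lower than
    gdna_per_probe, then re-calculate the per-gene UMI totals."""
    totals = defaultdict(int)
    for p in probes:
        count = umi_counts.get(p["probe_id"], 0)
        dropped = p["region"] == "unspliced" and count < gdna_per_probe
        totals[p["gene_id"]] += 0 if dropped else count
    return dict(totals)


# Made-up example values (assumptions, not real data).
probes = [
    {"probe_id": "P1", "gene_id": "GENE_A", "region": "unspliced"},
    {"probe_id": "P2", "gene_id": "GENE_A", "region": "spliced"},
    {"probe_id": "P3", "gene_id": "GENE_B", "region": "unspliced"},
    {"probe_id": "P4", "gene_id": "GENE_B", "region": "unspliced"},
]
umi_counts = {"P1": 3, "P2": 500, "P3": 4, "P4": 12}
gdna_per_probe = 10.0  # "Estimated UMIs from genomic DNA per unspliced probe"

print(genes_below_gdna_threshold(probes, umi_counts, gdna_per_probe))
# {'GENE_B'}  (total 16 is below the threshold 10 * 2 = 20)
print(gene_totals_after_probe_filter(probes, umi_counts, gdna_per_probe))
# {'GENE_A': 500, 'GENE_B': 12}  (P1 and P3 are dropped)
```

Both functions work on totals summed across all barcodes. To rebuild a feature barcode matrix after the probe-level filter, the same per-probe decision would have to be applied to the per-barcode counts.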
Note that the above proposed methods are merely suggestions and have not been tested by 10x Genomics, as this is usually not required for FRP data analysis.