Question: I am studying an exogenous gene with a custom reference genome. I can see hits for reads mapping to this gene in my FASTQ/BAM files, so I know it must be expressed. However, I am not seeing any hits in my filtered gene-barcode matrix. Why do I have zero UMI counts for my marker gene?
Answer: There are three main ways in which you can "lose" expression of your exogenous gene, assuming it was actually expressed biologically in sufficient quantity to be detectable with our assay.
1. The gene may have been expressed, but because of errors in how you constructed your custom reference, or errors in the reference or annotations themselves, it may not have been counted. Please make sure that you created your custom reference following these instructions.
In particular, please double check that the seqname in the GTF is identical to the name in the header of the matching FASTA record, that the feature type is 'exon' (only exon features are counted for transcriptome alignment), and that the coordinates and strand are correct. Also ensure that the orientation of the FASTA sequence is correct and matches the strand specified in the GTF.
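The GTF checks in point 1 can be partly automated. The following is a minimal sketch in plain Python, not part of Cell Ranger: the function names and the example gene ID `GFP` are hypothetical, and it assumes the GTF attribute column contains `gene_id "<your gene>"`. It cannot tell you whether the FASTA sequence itself is in the right orientation; that still needs a manual check.

```python
def fasta_record_names(fasta_lines):
    """Collect FASTA record names: the text after '>' up to the first whitespace."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def check_gtf_against_fasta(gtf_lines, fasta_lines, gene_id):
    """Return a list of problems found in the GTF records of one gene."""
    names = fasta_record_names(fasta_lines)
    problems = []
    exon_seen = False
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        cols = line.rstrip("\n").split("\t")
        # Assumption: the gene is identified by gene_id "<gene_id>" in column 9.
        if len(cols) < 9 or f'gene_id "{gene_id}"' not in cols[8]:
            continue
        seqname, feature, strand = cols[0], cols[2], cols[6]
        start, end = int(cols[3]), int(cols[4])
        if seqname not in names:
            problems.append(f"seqname {seqname!r} has no matching FASTA record")
        if feature == "exon":
            exon_seen = True
        if strand not in ("+", "-"):
            problems.append(f"strand is {strand!r}, expected '+' or '-'")
        if start < 1 or start > end:
            problems.append(f"suspicious coordinates {start}-{end}")
    if not exon_seen:
        problems.append("no 'exon' feature found for this gene")
    return problems


# Usage with in-memory lines; with real files, pass open file handles instead.
fasta = [">GFP exogenous marker\n", "ATGC\n"]
gtf = ['GFP\tcustom\texon\t1\t4\t.\t+\t.\tgene_id "GFP"; transcript_id "GFP";\n']
print(check_gtf_against_fasta(gtf, fasta, "GFP"))
```

An empty list means none of these specific problems were detected; it does not prove the reference is otherwise correct.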
2. Your gene may have been expressed, but the reads coming from this gene might be filtered out for various reasons. For UMI counting, Cell Ranger includes only reads that meet all of the following criteria:
- Read must have a valid cell-barcode.
- Read must have a valid UMI.
- Read must have MAPQ 255.
- Read must map to a single gene: the semicolon-delimited list of gene IDs in the GX tag contains exactly one entry.
- At least 50% of the read must overlap with an exon and the read must be consistent with annotated splice junctions.
Multiple reads that have the same UMI, barcode, and gene will only count once.
To test these hypotheses you can look in the BAM file, which contains all reads. There are custom 10x barcode and alignment tags in addition to the ones added by STAR.
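As a rough illustration of the checks in point 2, the sketch below reads SAM-formatted text lines (for example, the output of `samtools view` on the region of your gene) and reports which criteria each read fails. It is a hedged approximation, not Cell Ranger's own logic: it assumes that a present CB tag means a valid cell barcode and a present UB tag means a valid UMI, the function names and the gene ID `GFP` are hypothetical, and it does not check the exon-overlap or splice-junction criterion.

```python
def parse_sam_line(line):
    """Extract MAPQ and the optional tags from one SAM alignment line."""
    cols = line.rstrip("\n").split("\t")
    tags = {}
    for field in cols[11:]:
        tag, _type, value = field.split(":", 2)
        tags[tag] = value
    return {"mapq": int(cols[4]), "tags": tags}


def failed_criteria(line, gene_id):
    """List the reasons a read would not be counted for gene_id (empty if none)."""
    rec = parse_sam_line(line)
    tags = rec["tags"]
    reasons = []
    if "CB" not in tags:  # assumption: CB present == valid cell barcode
        reasons.append("no CB tag (no valid cell barcode)")
    if "UB" not in tags:  # assumption: UB present == valid UMI
        reasons.append("no UB tag (no valid UMI)")
    if rec["mapq"] != 255:
        reasons.append(f"MAPQ is {rec['mapq']}, not 255")
    genes = [g for g in tags.get("GX", "").split(";") if g]
    if len(genes) != 1:
        reasons.append(f"GX lists {len(genes)} gene IDs, expected exactly 1")
    elif genes[0] != gene_id:
        reasons.append(f"GX is {genes[0]!r}, not {gene_id!r}")
    return reasons


def count_umis(sam_lines, gene_id):
    """Count distinct UMIs per cell barcode among reads passing all checks."""
    umis = {}
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip SAM header and blank lines
        if not failed_criteria(line, gene_id):
            tags = parse_sam_line(line)["tags"]
            umis.setdefault(tags["CB"], set()).add(tags["UB"])
    # Reads sharing barcode, UMI, and gene collapse to a single count.
    return {barcode: len(found) for barcode, found in umis.items()}


# Usage: two duplicate reads (same barcode and UMI) yield one UMI count.
read = "r1\t0\tGFP\t1\t255\t4M\t*\t0\t0\tACGT\tIIII\tCB:Z:AAAC-1\tUB:Z:TTTT\tGX:Z:GFP"
print(count_umis([read, read.replace("r1", "r2")], "GFP"))
```

If this reports zero passing reads, the per-read reasons from `failed_criteria` indicate which criterion is removing them.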
3. The reads mapping to your gene could have been filtered out as background noise during cell calling. To test this hypothesis, compare the filtered and unfiltered gene-barcode matrices. You should have already checked the BAM file as suggested in #2 and found reads that meet the criteria. If so, take the corrected barcodes (CB) of those reads and check whether they appear in the unfiltered gene-barcode matrix but not in the filtered one.
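The comparison in point 3 can be sketched as a set lookup against the barcode lists that accompany each matrix. This is an illustrative sketch only: the function names are hypothetical, the file locations in the comments are assumptions that depend on your Cell Ranger version and output layout, and the barcodes must be in the same format in all three inputs (for example, with or without a `-1` suffix).

```python
import gzip


def load_barcodes(path):
    """Read one barcode per line from a plain or gzip-compressed barcodes file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return {line.strip() for line in handle if line.strip()}


def classify_barcodes(read_barcodes, raw_barcodes, filtered_barcodes):
    """Say where each CB value taken from the BAM file ends up."""
    raw, filtered = set(raw_barcodes), set(filtered_barcodes)
    result = {}
    for barcode in read_barcodes:
        if barcode in filtered:
            result[barcode] = "called as a cell"
        elif barcode in raw:
            result[barcode] = "removed at cell calling"
        else:
            result[barcode] = "absent from the unfiltered matrix"
    return result


# Usage with in-memory sets. With real output you might instead call, e.g.,
# load_barcodes("raw/barcodes.tsv.gz") and load_barcodes("filtered/barcodes.tsv.gz")
# (hypothetical paths; adjust to your own output directory).
raw = {"AAAC-1", "CCCG-1", "GGGT-1"}
filtered = {"AAAC-1"}
print(classify_barcodes(["AAAC-1", "CCCG-1", "TTTA-1"], raw, filtered))
```

Barcodes reported as removed at cell calling carry reads for your gene but were treated as background rather than cells, which would explain zero counts in the filtered matrix.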
If you have questions about this please email firstname.lastname@example.org.