Question: Why is there no representation for markers like GFP, mCherry, tdTomato, Cre or Oxt in the Loupe Browser for my data?
Answer: In general, a few issues can prevent a specific marker from being detected in single cell data: workflow or sample sorting challenges, or incorrect sequences associated with the marker in the reference. The reasons why our pipeline reports 0 UMI counts for some genes are discussed in this Knowledge Base article. The following investigations can help identify the underlying cause.
Step 1:
Examine the filtered feature-barcode matrix file to confirm that the gene(s) of interest (such as GFP, mCherry, tdTomato, etc.) are present and have UMI counts greater than 0. You can use the steps in this KB article to convert the matrix file into a text format and extract the corresponding UMI values for these genes.
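As a quick alternative check, the matrix files can also be inspected directly. The sketch below is not the method from the KB article. It assumes the Cell Ranger 3.0+ output layout, in which the filtered_feature_bc_matrix directory contains features.tsv.gz and matrix.mtx.gz in Matrix Market format. The function name marker_umi_summary is hypothetical.

```shell
# Sketch: report the total UMI count and the number of barcodes with at least
# one UMI for a single gene, read directly from a feature-barcode matrix folder.
# Assumes features.tsv.gz (feature ID, name, type) and matrix.mtx.gz are present.
marker_umi_summary() {
    local matrix_dir="$1"
    local gene="$2"
    local row

    # Rows of the matrix follow the line order of features.tsv.gz (1-based).
    row=$(gzip -dc "$matrix_dir/features.tsv.gz" |
        awk -F'\t' -v g="$gene" '$1 == g || $2 == g { print NR; exit }')

    if [ -z "$row" ]; then
        echo "$gene: not found in features.tsv.gz"
        return 1
    fi

    # Skip '%' comment lines and the dimension line, then sum entries of that row.
    gzip -dc "$matrix_dir/matrix.mtx.gz" |
        awk -v r="$row" -v g="$gene" '
            /^%/ { next }
            !dims_seen { dims_seen = 1; next }
            $1 == r { cells++; umis += $3 }
            END { printf "%s: %d UMIs in %d barcodes\n", g, umis, cells }'
}
```

For example, marker_umi_summary sample/outs/filtered_feature_bc_matrix tdTomato (the path is a placeholder) prints one summary line. A "not found" message suggests the gene was never added to the reference, while a count of 0 means the gene is in the reference but no UMIs were assigned to it.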
Step 2:
Examine the GTF coordinates for all markers, especially if multiple genes were custom-added to the reference.
If the coordinates of a custom gene overlap those of any other gene in the pre-built reference (an example of such overlapping entries is shown below), it is recommended to create a custom reference that eliminates the overlapping coordinates and adds one custom gene at a time.
XXX unknown exon 1 1523 . + . gene_id "XXX"; transcript_id "XXX"; gene_name "XXX"; gene_biotype "protein_coding";
YYY unknown exon 1 1431 . + . gene_id "YYY"; transcript_id "YYY"; gene_name "YYY"; gene_biotype "protein_coding";
ZZZ unknown exon 1 555 . + . gene_id "ZZZ"; transcript_id "ZZZ"; gene_name "ZZZ"; gene_biotype "protein_coding";
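The coordinate check can be partly automated. The sketch below is one possible approach, not a 10x tool. It assumes an uncompressed, tab-separated GTF with standard gene_id attributes, and it treats two genes as overlapping when any of their exons share at least one base on the same contig and strand. The function name gtf_gene_overlaps is hypothetical. It compares coordinates only, so it cannot reveal sequence similarity between entries placed on separate contigs.

```shell
# Sketch: list pairs of genes whose exons overlap on the same contig and strand.
# Output columns: contig, strand, gene A, gene B (each pair reported once).
gtf_gene_overlaps() {
    local gtf="$1"

    # Stage 1: keep exon records as contig, strand, start, end, gene_id.
    awk -F'\t' '$3 == "exon" && match($9, /gene_id "[^"]+"/) {
            gene = substr($9, RSTART + 9, RLENGTH - 10)
            print $1 "\t" $7 "\t" $4 "\t" $5 "\t" gene
        }' "$gtf" |
    # Stage 2: order exons by contig, strand, then start position.
    LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2 -k3,3n |
    # Stage 3: sweep through the sorted exons, keeping the ones still "open".
    awk -F'\t' '{
            key = $1 FS $2
            if (key != group) { group = key; n = 0 }
            m = 0
            for (i = 1; i <= n; i++) {
                if (open_end[i] < $3) continue      # ended before this exon starts
                if (open_gene[i] != $5) {
                    pair = (open_gene[i] < $5) ? open_gene[i] FS $5 : $5 FS open_gene[i]
                    if (!((key, pair) in seen)) { seen[key, pair] = 1; print key FS pair }
                }
                m++; open_end[m] = open_end[i]; open_gene[m] = open_gene[i]
            }
            n = m + 1; open_end[n] = $4; open_gene[n] = $5 ""
        }'
}
```

Running gtf_gene_overlaps genes.gtf (file name is a placeholder) on the reference GTF and searching the output for the custom gene names shows whether any marker shares coordinates with another annotated gene. Empty output means no exon-level overlaps were found under these assumptions.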
Step 3:
Explore the BAM file (possorted_genome_bam.bam if the count pipeline was used, or sample_alignments.bam if the multi pipeline was used). Extracting the reads associated with these markers from the BAM file can provide insight into the reasons behind the low expression of these genes. Below are some suggested commands, with comments, that use the tdTomato gene as an example.
# step-1: extract the reads associated with tdTomato from the BAM file. You can repeat these steps for mCherry as well.
samtools view possorted_genome_bam.bam | grep 'tdTomato' > tdt.sam
# step-2: count the reads that carry a valid (corrected) cell barcode
grep -c 'CB:Z:' tdt.sam
# step-3: of those, keep reads that also carry a corrected UMI (UB:Z:) and drop
# antisense reads (AN tag), which are not used in the final results
grep 'CB:Z:' tdt.sam | grep 'UB:Z:' | grep -v 'AN:Z:' > tdt_filtreads.sam
wc -l < tdt_filtreads.sam
# step-4: extract the barcodes from tdt_filtreads.sam. The CB:Z:<barcode> tag can sit in
# a different column from read to read, so scan the optional fields (column 12 onward)
# instead of relying on a fixed column number.
awk '{ for (i = 12; i <= NF; i++) if ($i ~ /^CB:Z:/) print $i }' tdt_filtreads.sam > tdtbarcodes.txt
# number of distinct barcodes with at least one tdTomato read
sort -u tdtbarcodes.txt | wc -l
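To see how the marker reads are distributed across cells, the filtered SAM records can also be summarized per barcode. The sketch below is an illustration, not part of the original commands. It assumes a SAM file such as tdt_filtreads.sam in which reads carry CB:Z: and UB:Z: tags, and it counts distinct UB values per barcode. The function name umis_per_barcode is hypothetical. The numbers are only an approximation of the pipeline's UMI counts, because Cell Ranger applies further filters when building the matrix.

```shell
# Sketch: count distinct corrected UMIs (UB:Z:) per corrected barcode (CB:Z:)
# in a SAM file, sorted by descending UMI count.
umis_per_barcode() {
    local sam="$1"

    awk -F'\t' '{
            cb = ""; ub = ""
            # Optional SAM tags start at column 12 and may appear in any order.
            for (i = 12; i <= NF; i++) {
                if ($i ~ /^CB:Z:/) cb = substr($i, 6)
                else if ($i ~ /^UB:Z:/) ub = substr($i, 6)
            }
            # Count each barcode/UMI combination once, ignoring PCR duplicates.
            if (cb != "" && ub != "" && !((cb, ub) in seen)) {
                seen[cb, ub] = 1
                n[cb]++
            }
        }
        END { for (cb in n) print cb "\t" n[cb] }' "$sam" |
    sort -k2,2nr -k1,1
}
```

Calling umis_per_barcode tdt_filtreads.sam prints one line per barcode. Many barcodes with a single UMI each point to low per-cell expression, while very few barcodes in total suggest that few marker-positive cells were captured.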
If the results from the above tests indicate a low number of overall reads aligned to a specific marker, this could stem from one of the following scenarios:
(a) Workflow challenges -- The sequencing saturation for this experiment could be low (verify this in the web summary HTML report). Even if the marker is present in only a small number of cells at very low expression levels, it should still have been detected. Increasing the sequencing saturation may help with this issue. Additionally, it is advisable to verify whether the sample was sorted for specific markers; if the sorting did not work well, consider reaching out to Technical Support at support@10xgenomics.com.
(b) Reference used for analysis -- Confirm that the marker sequence used in the reference accurately represents the sequence being expressed in your sample. Inaccurate sequences can lead to low alignment rates. In rare cases, these transcripts may have aligned to unexpected regions of the genome. To investigate this, consider aligning the R2 FASTQ reads to just the specific marker sequence (excluding all other sequences in the reference being used) to get a better understanding of how many reads actually originate from these gene transcripts.
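One way to run the marker-only alignment described in (b) is sketched below. The choice of bwa and samtools is an assumption made for illustration; the article does not prescribe an aligner, and any short-read aligner could be substituted. Both tools must be installed, and the function name count_marker_reads is hypothetical.

```shell
# Sketch: align R2 reads to a FASTA that contains only the marker sequence and
# print the number of primary alignments that mapped.
# Usage: count_marker_reads <marker.fa> <R2.fastq.gz> [more R2 files...]
count_marker_reads() {
    local marker_fa="$1"
    shift

    # Build the bwa index next to the marker FASTA.
    bwa index "$marker_fa" || return 1

    # Stream all R2 files as one single-end input ('-' makes bwa read stdin).
    # -F 0x904 drops unmapped, secondary, and supplementary records, so each
    # mapped read is counted once.
    gzip -dc "$@" |
        bwa mem "$marker_fa" - |
        samtools view -c -F 0x904 -
}
```

For example, count_marker_reads tdTomato.fa sample_S1_L001_R2_001.fastq.gz (both file names are placeholders) prints a single number. Comparing it with the number of marker reads found in the BAM file gives a rough indication of whether reads are being lost to other regions of the full reference.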
Related KB article: Why do I have zero UMI counts for my marker gene
Products: Single Cell Gene Expression
Last Updated: Aug 2023