Question: How do I find multi-mapped reads in the possorted_bam.bam file? Why do some genes that are present in my reference show no expression?
Answer:
Cell Ranger has specific criteria for which reads are considered for UMI counting (described here), and one of them is that reads assigned to multiple genes are not counted. Some datasets have a high level of multi-mapping for particular genes. If a gene of interest shows low UMI counts when you look it up in Loupe Browser, check whether that gene has multi-mapped reads in the 10x BAM file (possorted_bam.bam) generated by the Cell Ranger pipeline.
A multi-mapped read would have a mapping quality score (MAPQ) less than 255 and a bit-wise flag greater than 255.
For example:
samtools view possorted_bam.bam | awk '{ if ($2 > 255 && $5 < 255) print $0 }' > multireads.sam
The 2nd column represents the bit-wise flag and the 5th column represents the MAPQ scores.
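A plain numeric comparison on the flag also matches records where only higher bits are set (for example 1024, which the SAM specification defines as a duplicate). If what you want is specifically the secondary-alignment bit (256), a minimal sketch that tests that bit explicitly could look like the following. The function name `secondary_low_mapq` is hypothetical, and it assumes SAM text arrives on standard input:

```shell
# Hypothetical helper: keep SAM records whose flag has the secondary-alignment
# bit (256) set and whose MAPQ is below 255. Reads SAM text on stdin.
secondary_low_mapq() {
  awk -F'\t' '
    /^@/ { next }                                  # skip header lines, if any
    int($2 / 256) % 2 == 1 && $5 < 255 { print }   # bit 256 set, MAPQ < 255
  '
}

# Usage (assumes samtools is installed and the BAM is in the current directory):
#   samtools view possorted_bam.bam | secondary_low_mapq > multireads.sam
```

Because the function only reads from standard input, it can be tried on a few hand-written SAM lines before running it on a full BAM file.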
Please note that multi-mapping is not exactly the same as "reads that are assigned to multiple genes". The latter can be deduced from the GN tag.
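To look at the second case, here is a rough sketch. It assumes the GN tag holds a semicolon-separated list of gene names (for example `GN:Z:GENE1;GENE2`), so a record whose GN value contains a semicolon is compatible with more than one gene. The function name `multi_gene_reads` is hypothetical, and the tag layout should be checked against your own BAM file:

```shell
# Hypothetical helper: keep SAM records whose GN tag lists more than one gene.
# Assumes GN is formatted as GN:Z:<name>[;<name>...]. Reads SAM text on stdin.
multi_gene_reads() {
  awk -F'\t' '
    /^@/ { next }                         # skip header lines, if any
    {
      for (i = 12; i <= NF; i++)          # optional tags start at column 12
        if ($i ~ /^GN:Z:/ && index($i, ";") > 0) { print; break }
    }
  '
}

# Usage (assumes samtools is installed):
#   samtools view possorted_bam.bam | multi_gene_reads > multigene.sam
```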
Products: Single Cell Gene Expression