Question: How does cellranger count calculate multiplets?
Answer: For an experiment composed only of cells from one organism, Cell Ranger cannot identify whether an individual Gel Bead-in-Emulsion (GEM) contained more than a single cell.
To address this, Cell Ranger supports multi-genome experiments, also known as "barnyard" experiments, where cells from two different organisms are mixed and analyzed together. This allows a subset of multiplets to be detected on the basis that some reads with a given cell barcode will align to one reference genome, and a different set of reads with the same cell barcode will align to the other reference genome. In this way, individual barcodes can be assigned to either or both species. Typically this is done with a 50:50 mixture of mouse and human cells. The further the mixture departs from 50:50, the less accurate the barcode classification becomes.
The algorithm for classifying a barcode is as follows:
- Take the 10th percentile of the mouse UMI counts across all barcodes where mouse UMI counts exceed human UMI counts. That becomes the threshold for calling a barcode a mouse cell.
- Take the 10th percentile of the human UMI counts across all barcodes where human UMI counts exceed mouse UMI counts. That becomes the threshold for calling a barcode a human cell.
- Any barcode where both mouse and human counts exceed their thresholds is classified as a multiplet.
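The steps above can be sketched in a few lines of Python. This is only an illustration of the thresholding idea, not Cell Ranger's actual code: the function names, the "human"/"mouse"/"multiplet" labels, the linear-interpolation percentile, and the handling of ties are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


def percentile(values: List[float], pct: float) -> float:
    """Linear-interpolation percentile (pct in [0, 100]).

    Returns +inf for an empty list so that no barcode can pass the threshold.
    """
    if not values:
        return float("inf")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def classify_barcodes(
    counts: Dict[str, Tuple[int, int]], pct: float = 10.0
) -> Dict[str, str]:
    """Classify barcodes given barcode -> (human_umis, mouse_umis).

    Labels are hypothetical; they are not the values Cell Ranger writes out.
    """
    # Threshold for each species comes from barcodes dominated by that species.
    human_thr = percentile([h for h, m in counts.values() if h > m], pct)
    mouse_thr = percentile([m for h, m in counts.values() if m > h], pct)

    calls = {}
    for barcode, (h, m) in counts.items():
        if h > human_thr and m > mouse_thr:
            calls[barcode] = "multiplet"
        elif h >= m:  # assumption: ties are assigned to human
            calls[barcode] = "human"
        else:
            calls[barcode] = "mouse"
    return calls


example_counts = {
    "BC1": (1000, 10),
    "BC2": (900, 5),
    "BC3": (800, 20),
    "BC4": (12, 1100),
    "BC5": (8, 950),
    "BC6": (15, 700),
    "BC7": (850, 820),  # high counts for both species
}
example_calls = classify_barcodes(example_counts)
```

In this toy input only BC7 has counts above both thresholds, so it is the only barcode labelled as a multiplet.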
Note that for a 50:50 mixture this algorithm underestimates the true multiplet rate by roughly 50%, because GEMs that contained multiple cells from the same organism cannot be detected by this method. That is, we can detect an M:H GEM and an H:M GEM, but not H:H or M:M GEMs. The multiplet rate reported in the web_summary.html file makes a correction for this.
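The arithmetic behind that underestimate can be illustrated with a simplified model. The sketch below assumes every multiplet is a doublet and that the two cells in a GEM are drawn independently from the mixture; under those assumptions a doublet is mixed-species with probability 2p(1-p), where p is the human fraction. This is not the correction Cell Ranger itself applies, and the function names are hypothetical.

```python
def detectable_fraction(human_fraction: float) -> float:
    """Fraction of doublets that contain one human and one mouse cell."""
    return 2.0 * human_fraction * (1.0 - human_fraction)


def estimate_total_multiplet_rate(
    observed_rate: float, human_fraction: float = 0.5
) -> float:
    """Scale the observed (mixed-species) rate up to an estimated total rate.

    Simplified doublet-only model; an assumption, not Cell Ranger's method.
    """
    fraction = detectable_fraction(human_fraction)
    if fraction <= 0.0:
        raise ValueError("a single-species sample has no detectable multiplets")
    return observed_rate / fraction


# 50:50 mixture: half of all doublets are detectable, so the estimate doubles.
balanced = estimate_total_multiplet_rate(0.02, human_fraction=0.5)

# 90:10 mixture: only 18% of doublets are detectable.
skewed = estimate_total_multiplet_rate(0.02, human_fraction=0.9)
```

With a 50:50 mixture the detectable fraction is 0.5, which matches the 50% figure above. As the mixture becomes more skewed the detectable fraction shrinks, so the same observed rate implies a larger and less certain total.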
The actual classifications from the experiment can be found in the gem_classification.csv file.