Question: How are barcodes classified as cell-associated?
Answer: The following algorithm is used to determine the cutoff for calling a barcode cell-associated:
- Rank-sort the total UMI counts across all of the detected barcodes that passed the 16bp barcode whitelist filtering criteria.
- Determine the 99th percentile of the UMI counts among the top N barcodes, where N is the expected number of recovered cells passed to the pipeline (--expect-cells, 3000 by default).
- All barcodes with total UMI counts greater than or equal to 10% of the 99th percentile value are classified as cells.
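The steps above can be sketched in Python as follows. This is a minimal illustration, not the pipeline's actual code: the function name `call_cells`, the dictionary input, and the nearest-rank percentile definition are assumptions made for the example.

```python
def call_cells(umi_counts, expect_cells=3000, percentile=99, fraction=0.10):
    """Return (cutoff, cell_barcodes) for a dict of barcode -> total UMI count.

    Assumes umi_counts already contains only barcodes that passed
    whitelist filtering. All names here are hypothetical.
    """
    # Step 1: rank-sort total UMI counts, highest first, and keep the top N.
    ranked = sorted(umi_counts.values(), reverse=True)
    top_n = ranked[:expect_cells]
    if not top_n:
        return 0.0, set()

    # Step 2: 99th percentile among the top N counts. The nearest-rank
    # method is an assumption; other percentile definitions can give
    # slightly different values.
    ascending = top_n[::-1]
    index = (percentile * len(ascending) + 99) // 100 - 1
    p99 = ascending[index]

    # Step 3: barcodes with counts >= 10% of that value are called cells.
    cutoff = fraction * p99
    cells = {bc for bc, count in umi_counts.items() if count >= cutoff}
    return cutoff, cells


# Toy data: a few high-count barcodes and a few background barcodes.
example = {"A": 1000, "B": 900, "C": 800, "D": 700, "E": 600,
           "F": 150, "G": 95, "H": 10, "I": 1}
cutoff, cells = call_cells(example, expect_cells=5)
print(cutoff, sorted(cells))
```

In this toy input the 99th percentile of the top 5 counts is 1000, so the cutoff is 100 UMIs: barcode F (150 UMIs) is called a cell even though it lies outside the top N, while G (95 UMIs) is treated as background.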
Intuitively, barcodes for cells should have significantly more transcript counts (UMIs) associated with them than the background barcodes.
This can be visualized in a UMI vs Barcode plot. In the example plot below, UMI counts are on the y-axis, ranging from 0 to 10,000 in log scale. Barcodes are on the x-axis, ranked from 0 to 100,000, also in log scale. Cellular barcodes as determined by the above algorithm are in green, while background barcodes are in gray.
A steep slope in the barcode-UMI count rank plot suggests a clear separation of cellular barcodes from background partitions.
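To make the idea of a "steep slope" concrete, the sketch below builds the (rank, UMI count) points of such a plot and finds where the curve drops fastest in log-log space. It is only an illustrative heuristic, not part of the cell-calling algorithm described above; the function names `rank_plot_points` and `steepest_drop` are hypothetical, and ranks start at 1 here because the logarithm of 0 is undefined.

```python
import math


def rank_plot_points(counts):
    """Return (rank, count) pairs sorted by descending count, ranks from 1.

    Barcodes with zero UMIs are dropped because they cannot be shown
    on a log-scaled axis.
    """
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    return list(enumerate(ranked, start=1))


def steepest_drop(points):
    """Return (rank, slope) of the most negative log-log slope
    between consecutive points, or None if there are fewer than two points."""
    best = None
    for (r1, c1), (r2, c2) in zip(points, points[1:]):
        slope = (math.log10(c2) - math.log10(c1)) / (
            math.log10(r2) - math.log10(r1)
        )
        if best is None or slope < best[1]:
            best = (r1, slope)
    return best


# Toy data: three cell-like barcodes followed by three background barcodes.
points = rank_plot_points([1000, 900, 800, 10, 9, 8])
print(steepest_drop(points))
```

With this toy input the sharpest drop falls between ranks 3 and 4, which is the boundary between the high-count and low-count barcodes. A dataset with poor separation would show a much gentler slope at every rank.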
For more information, please see the "Calling Cell Barcodes" section in the algorithms overview page.