Question: How are barcodes classified as cell-associated?
Answer: It depends on which version of Cell Ranger you are using. Cell Ranger v3.x calls cells in a two-step process:
- Step 1: Identify the first mode of high-RNA-content cells.
  - Select cells using a threshold on total UMI counts per barcode. The threshold is calculated as in Cell Ranger v2.x (see below).
- Step 2: Find additional cells based on their RNA profile.
  - Select a set of low-UMI barcodes to represent background GEMs.
  - Generate the ambient RNA profile from the selected barcodes.
  - Take the top 20,000 remaining barcodes, ranked by total UMI count.
  - Discard barcodes with a total UMI count < 500 or < 1% of median(initial_cell_umis).
  - Compare the RNA profile of each selected barcode with the ambient profile.
  - Call barcodes whose RNA profile differs significantly from the ambient profile as cells.
Please see the "Calling Cell Barcodes" section in the algorithms overview page.
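The candidate-selection part of Step 2 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Cell Ranger's own code: the function name `select_candidates`, its parameters, and the dictionary input are hypothetical, and "top" is taken to mean highest total UMI count. The statistical comparison against the ambient profile is not shown.

```python
from statistics import median


def select_candidates(umi_by_barcode, initial_cells,
                      max_candidates=20000, min_umis=500,
                      min_frac_of_median=0.01):
    """Pick barcodes to test against the ambient profile (hypothetical helper).

    umi_by_barcode: dict mapping barcode -> total UMI count
    initial_cells:  set of barcodes already called as cells in Step 1
    """
    # Median UMI count of the cells called in Step 1.
    median_initial = median(umi_by_barcode[bc] for bc in initial_cells)

    # A barcode is discarded if it is below 500 UMIs OR below 1% of the
    # median, so it must clear the larger of the two floors to be kept.
    floor = max(min_umis, median_initial * min_frac_of_median)

    # Barcodes not called in Step 1, highest UMI count first (assumption).
    remaining = sorted(
        ((bc, n) for bc, n in umi_by_barcode.items() if bc not in initial_cells),
        key=lambda item: item[1],
        reverse=True,
    )

    # Keep the top N, then apply the UMI floor.
    return [bc for bc, n in remaining[:max_candidates] if n >= floor]


umis = {"A": 10000, "B": 8000, "C": 600, "D": 400, "E": 50}
candidates = select_candidates(umis, initial_cells={"A", "B"})
print(candidates)
```

In this toy input the median of the initial cells is 9,000, so the 1% floor (90) is lower than the fixed floor of 500 and only barcode C remains a candidate.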
In Cell Ranger v2.x and earlier, the following algorithm is used to determine the cutoff for calling a barcode cell-associated:
- Rank-sort the total UMI counts of all detected barcodes that passed the 16 bp barcode whitelist filter.
- Determine the 99th percentile of the UMI counts among the top N barcodes, where N is the expected number of recovered cells passed to the pipeline (--expect-cells, 3000 by default).
- All barcodes with total UMI counts greater than or equal to 10% of the 99th percentile value are classified as cells.
Intuitively, the idea is that barcodes for cells should have significantly more transcript counts associated with them than the background barcodes.
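As a rough illustration, the v2.x cutoff can be sketched in Python as follows. The function name `call_cells_v2` is hypothetical, and the nearest-rank percentile used here is an assumption, since the exact percentile method is not stated above.

```python
def call_cells_v2(umi_counts, expect_cells=3000):
    """Return (cutoff, flags) for a list of per-barcode total UMI counts.

    flags[i] is True when umi_counts[i] is classified as cell-associated.
    """
    if not umi_counts:
        raise ValueError("umi_counts must not be empty")

    # Rank-sort the counts, highest first, and keep the top N barcodes.
    ranked = sorted(umi_counts, reverse=True)
    top_n = ranked[:expect_cells]

    # Assumption: nearest-rank 99th percentile, i.e. the value about 1% of
    # the way down the descending list of top-N counts.
    idx = min(int(round(len(top_n) * 0.01)), len(top_n) - 1)
    p99 = top_n[idx]

    # Barcodes at or above 10% of that value are called as cells.
    cutoff = p99 * 0.10
    return cutoff, [count >= cutoff for count in umi_counts]


counts = [1000] * 100 + [10] * 1000
cutoff, flags = call_cells_v2(counts, expect_cells=100)
print(cutoff, sum(flags))
```

With 100 barcodes at 1,000 UMIs, 1,000 barcodes at 10 UMIs, and expect_cells=100, the cutoff is 100 UMIs and only the 100 high-count barcodes are called as cells.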
This can be visualized in a plot of UMI counts against barcode rank. In the example plot below, UMI counts are on the y-axis, ranging from 0 to 10,000 in log scale. Barcodes are on the x-axis, ranked from 0 to 100,000, also in log scale. Cellular barcodes, as determined by the above algorithm, are in green, while background barcodes are in gray.
A steep slope in the barcode-UMI count rank plot suggests a clear separation of cellular barcodes from background partitions.
For more information on the 2.x algorithm, please see the "Calling Cell Barcodes" section in the algorithms overview page.