Question: For 3’ GEX with v3 chemistry, 10x Genomics states there are ~3.6 million unique sequences. However, the 3M-february-2018.txt
whitelist in Cell Ranger has ~6.8 million sequences. Why is there a discrepancy?
Answer: Each Gel Bead in the 3’ Single Cell Gene Expression Solution contains two variants of the embedded oligos: one set of ~3.6 million barcodes for capturing gene expression data and a slightly different set of ~3.6 million barcodes for Feature Barcoding technology (including Cell Surface Protein with TotalSeqB and TotalSeqC, CRISPR, and CellPlex). The complete set of unique variants is included in the v3 barcode whitelist. However, because the two sets overlap somewhat, the whitelist contains only ~6.8M barcodes rather than ~7.2M.
The whitelist comes with Cell Ranger (3.0 or higher). For Cell Ranger v4.0 and above, the whitelist file is located at:
cellranger-x.y.z/lib/python/cellranger/barcodes/3M-february-2018.txt.gz
For older releases of Cell Ranger, the whitelist file is located as follows:
cellranger-x.y.z/cellranger-cs/x.y.z/lib/python/cellranger/barcodes/3M-february-2018.txt.gz
(where x.y.z is the Cell Ranger version).
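To check the ~6.8M figure yourself, the whitelist can be read directly. The following is a minimal sketch, assuming the file is gzip-compressed plain text with one barcode per line; the function name load_whitelist is illustrative and not part of Cell Ranger.

```python
import gzip

def load_whitelist(path):
    """Read a gzip-compressed whitelist (assumed: one barcode per line) into a set."""
    barcodes = set()
    with gzip.open(path, "rt") as handle:
        for line in handle:
            barcode = line.strip()
            if barcode:  # skip blank lines
                barcodes.add(barcode)
    return barcodes
```

Calling len(load_whitelist(path)) on the whitelist path for your Cell Ranger version then gives the number of unique sequences in the file.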
For example, a Gel Bead has the following:
- GEX Capture Sequence with: TruSeq + Barcode_Variant1
  - Example of Barcode_Variant1: AAACCCAAGAAACACT
- Feature Barcoding Capture Sequence with: Nextera + Barcode_Variant2
  - Example of Barcode_Variant2: AAACCCATCAAACACT
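The two example barcodes above are nearly identical. A short sketch like the following shows where they differ; the helper name differing_positions is illustrative, and the result describes only this example pair, not every pair in the whitelist.

```python
def differing_positions(barcode_a, barcode_b):
    """Return the 1-based positions at which two equal-length barcodes differ."""
    if len(barcode_a) != len(barcode_b):
        raise ValueError("barcodes must have the same length")
    return [
        position
        for position, (a, b) in enumerate(zip(barcode_a, barcode_b), start=1)
        if a != b
    ]

gex_variant = "AAACCCAAGAAACACT"      # Barcode_Variant1 from the example
feature_variant = "AAACCCATCAAACACT"  # Barcode_Variant2 from the example
print(differing_positions(gex_variant, feature_variant))  # prints [8, 9]
```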
You can map the Feature Barcoding variant to the gene expression variant, and vice versa, using a lookup table that is provided with the Cell Ranger installation. For Cell Ranger versions 4 and above, the table is located here:
cellranger-x.y.z/lib/python/cellranger/barcodes/translation/3M-february-2018.txt.gz
For older releases of Cell Ranger, the table is located here:
cellranger-x.y.z/cellranger-cs/x.y.z/lib/python/cellranger/barcodes/translation/3M-february-2018.txt.gz
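The lookup in both directions can be sketched as follows. This is a minimal example under stated assumptions, not Cell Ranger's own code: it assumes the table is gzip-compressed text with two whitespace-separated columns per line, with the gene expression variant in the first column and the Feature Barcoding variant in the second. Check the column order against your copy of the file before relying on it. The function name load_translation is illustrative.

```python
import gzip

def load_translation(path):
    """Build lookups in both directions from a two-column translation table.

    Assumed layout (verify against your file): column 1 is the gene
    expression variant, column 2 is the Feature Barcoding variant.
    """
    gex_to_feature = {}
    feature_to_gex = {}
    with gzip.open(path, "rt") as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 2 or fields[0].startswith("#"):
                continue  # skip blank, malformed, or comment lines
            gex, feature = fields[0], fields[1]
            gex_to_feature[gex] = feature
            feature_to_gex[feature] = gex
    return gex_to_feature, feature_to_gex
```

If the column assumption holds, feature_to_gex["AAACCCATCAAACACT"] would return the gene expression variant for the example Gel Bead above.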
In the output count matrices, the barcode sequence shown to represent all the data for a Gel Bead is the gene expression barcode variant. Similarly, in the output BAM file, reads from the Feature Barcoding data show the gene expression variant in the “CB” tag, which makes it easier to find all reads from one GEM. The “CR” tag holds the unaltered barcode sequence as it was read.
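To see the two tags side by side, alignment records can be exported as SAM text (for example with samtools view) and parsed as below. This is a sketch only: the record is made up for illustration, the helper name barcode_tags is hypothetical, and the "-1" suffix on the CB value is assumed to be the GEM well suffix that Cell Ranger appends.

```python
def barcode_tags(sam_line):
    """Extract the CR (raw) and CB (corrected) barcode tags from one SAM text record."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:  # optional TAG:TYPE:VALUE fields start at column 12
        tag, _type, value = field.split(":", 2)
        if tag in ("CR", "CB"):
            tags[tag] = value
    return tags

# Made-up Feature Barcoding record: CR carries the Feature Barcoding variant,
# CB carries the translated gene expression variant.
example = (
    "read1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tFFFF"
    "\tCR:Z:AAACCCATCAAACACT\tCB:Z:AAACCCAAGAAACACT-1"
)
tags = barcode_tags(example)
print(tags["CR"], tags["CB"])  # prints AAACCCATCAAACACT AAACCCAAGAAACACT-1
```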
Note: The barcode translation described in this article applies to Cell Surface Protein with TotalSeqB and TotalSeqC, CRISPR, and CellPlex. TotalSeqA (unsupported) does not require barcode translation.