Question: For 3’ GEX with v3 chemistry, 10x Genomics states there are ~3.6 million unique sequences. However, the 3M-february-2018.txt inclusion list in Cell Ranger has ~6.8 million sequences. Why is there a discrepancy?
Answer: Each Gel Bead in the 3’ v3 Single Cell Gene Expression Solution contains two variants of the embedded oligos: one set of ~3.7 million barcodes for capturing gene expression data and a slightly different set of ~3.7 million barcodes for Feature Barcoding technology (including Cell Surface Protein with TotalSeqB and TotalSeqC, CRISPR, and CellPlex). The v3 barcode inclusion list contains the complete set of unique variants. However, because the two sets partially overlap, the list holds only ~6.8M barcodes rather than ~7.4M.
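The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration: it assumes each variant set holds 3,686,400 sequences (the per-list count quoted in the note at the end of this article) and uses the rounded ~6.8M figure for the combined list, so the resulting overlap is approximate. The function name `implied_overlap` is made up for this example.

```python
def implied_overlap(n_gex: int, n_feature: int, n_union: int) -> int:
    """Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|."""
    return n_gex + n_feature - n_union


# Assumption: both variant sets have the same size, 3,686,400 sequences.
PER_SET = 3_686_400

# If the two sets shared nothing, the combined list would hold this many (~7.4M).
naive_total = 2 * PER_SET

# The combined list is only ~6.8M (rounded), so roughly this many sequences
# must appear in both sets. Approximate, because 6.8M is itself rounded.
approx_shared = implied_overlap(PER_SET, PER_SET, 6_800_000)

print(f"no-overlap total: {naive_total:,}")
print(f"approximate shared barcodes: {approx_shared:,}")
```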
The inclusion list ships with Cell Ranger (3.0 or higher). For Cell Ranger v4.0 and above, the inclusion list file is located at:
cellranger-x.y.z/lib/python/cellranger/barcodes/3M-february-2018.txt.gz
For older releases of Cell Ranger, the inclusion list file is located as follows:
cellranger-x.y.z/cellranger-cs/x.y.z/lib/python/cellranger/barcodes/3M-february-2018.txt.gz
(where x.y.z is the Cell Ranger version).
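As a convenience, the two layouts above can be wrapped in a small helper. The sketch below is hypothetical (the function `inclusion_list_path` and its arguments are not part of Cell Ranger) and assumes `install_root` is the directory that contains the `cellranger-x.y.z` folder. It only encodes the two path patterns given above and does not account for the separate inclusion lists introduced in Cell Ranger v9.

```python
from pathlib import Path

LIST_NAME = "3M-february-2018.txt.gz"


def inclusion_list_path(install_root: str, version: str) -> Path:
    """Build the expected inclusion list path for a Cell Ranger version.

    Hypothetical helper: it mirrors the two path patterns described in the
    text and does not check that the file actually exists.
    """
    major = int(version.split(".")[0])
    if major < 3:
        # The text states the list comes with Cell Ranger 3.0 or higher.
        raise ValueError("the v3 inclusion list requires Cell Ranger 3.0 or higher")

    base = Path(install_root) / f"cellranger-{version}"
    if major < 4:
        # Older releases nest the library under cellranger-cs/x.y.z.
        base = base / "cellranger-cs" / version
    return base / "lib" / "python" / "cellranger" / "barcodes" / LIST_NAME


print(inclusion_list_path("/opt", "7.1.0"))
print(inclusion_list_path("/opt", "3.1.0"))
```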
For example, a Gel Bead has the following:
- GEX Capture Sequence with: TruSeq + Barcode_Variant1
  - Example of Barcode_Variant1: AAACCCAAGAAACACT
- Feature Barcoding Capture Sequence with: Nextera + Barcode_Variant2
  - Example of Barcode_Variant2: AAACCCATCAAACACT
You can map the Feature Barcoding variant to the gene expression variant, and vice versa, using a lookup table that is provided with the Cell Ranger installation. For Cell Ranger versions 4 and above, the table is located here:
cellranger-x.y.z/lib/python/cellranger/barcodes/translation/3M-february-2018.txt.gz
For older releases of Cell Ranger, the table is located here:
cellranger-x.y.z/cellranger-cs/x.y.z/lib/python/cellranger/barcodes/translation/3M-february-2018.txt.gz
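A minimal sketch of loading the lookup table into two dictionaries follows. It assumes the file is a gzip-compressed, whitespace-separated text file with one barcode pair per line, and that the gene expression variant is in the first column. Both points are assumptions, so inspect the first few lines of your copy before relying on them. The function `load_translation` and its `gex_column` parameter are illustrative names, not part of Cell Ranger.

```python
import gzip


def load_translation(path, gex_column=0):
    """Return (gex_to_feature, feature_to_gex) dictionaries.

    Assumptions: two whitespace-separated columns per line, and the gene
    expression variant sits in column `gex_column` (0 or 1).
    """
    gex_to_feature = {}
    feature_to_gex = {}
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip comment lines, if any
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            gex = fields[gex_column]
            feature = fields[1 - gex_column]
            gex_to_feature[gex] = feature
            feature_to_gex[feature] = gex
    return gex_to_feature, feature_to_gex
```

With the example pair from this article, `feature_to_gex["AAACCCATCAAACACT"]` would return `"AAACCCAAGAAACACT"`, provided the column order assumption holds for your file.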
In the output count matrices, the barcode sequence used to represent all the data for a Gel Bead is the gene expression barcode variant. Similarly, in the output BAM file, reads from the Feature Barcoding data show the gene expression variant in the “CB” tag, which makes it easier to find all reads from one GEM. The “CR” tag holds the unaltered barcode sequence as it was read.
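To see the difference between the two tags, the sketch below pulls “CB” and “CR” out of a SAM-formatted text line, such as one produced by `samtools view`. The record is made up for illustration and reuses the example barcodes from this article. The `-1` suffix on the “CB” value reflects the GEM well suffix that Cell Ranger appends to cell barcodes. The helper `barcode_tags` is a hypothetical name.

```python
def barcode_tags(sam_line: str) -> dict:
    """Extract the CB and CR optional tags from one SAM text record."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    # The first 11 SAM fields are mandatory; optional TAG:TYPE:VALUE fields follow.
    for field in fields[11:]:
        tag, _value_type, value = field.split(":", 2)
        if tag in ("CB", "CR"):
            tags[tag] = value
    return tags


# Made-up Feature Barcoding read: CR carries the Feature Barcoding variant
# as sequenced, CB carries the translated gene expression variant.
example_record = "\t".join([
    "read1", "4", "*", "0", "0", "*", "*", "0", "0", "ACGT", "FFFF",
    "CR:Z:AAACCCATCAAACACT",
    "CB:Z:AAACCCAAGAAACACT-1",
])

print(barcode_tags(example_record))
```

Grouping reads by the “CB” value therefore collects gene expression and Feature Barcoding reads from the same GEM under one key.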
Note
- The barcode translation described in this article applies to Cell Surface Protein with TotalSeqB and TotalSeqC, CRISPR, and CellPlex. TotalSeqA (unsupported) does not require barcode translation.
- Starting with Cell Ranger v9, there are separate inclusion lists depending on the capture strategy used. Each inclusion list contains 3,686,400 sequences (not 7.4M). See this article for more details.
Products: Universal 3' Gene Expression, Universal 5' Gene Expression
Last Updated: Jan 2025