Question: For 3’ GEX with v3 chemistry, 10x Genomics states there are ~3.7 million unique sequences. However, the ‘3M-february-2018.txt’ whitelist in Cell Ranger v3 has ~6.8 million sequences. Why is there a discrepancy?
Answer: Each Gel Bead in the 3’ Single Cell Gene Expression Solution carries two variants of the embedded oligos: one for capturing gene expression data and another for Feature Barcoding technology. The v3 barcode whitelist includes the complete set of unique sequences from both variants. Two sets of ~3.7M would total ~7.4M, but because the two sets partly overlap, the whitelist contains only ~6.8M barcodes.
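The arithmetic behind those figures can be sketched in a few lines of Python. The counts are the rounded values quoted above, so the implied overlap is only approximate, and the variable names are illustrative.

```python
# Rounded figures quoted in the answer above (approximate, in millions).
per_variant = 3.7      # unique barcodes per oligo variant
whitelist_size = 6.8   # sequences in the v3 whitelist

# With no shared sequences, two variants would give 2 * 3.7 = 7.4 million.
no_overlap_total = 2 * per_variant

# Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|,
# so the overlap is the no-overlap total minus the whitelist size.
implied_overlap = no_overlap_total - whitelist_size

print(f"no-overlap total: ~{no_overlap_total:.1f}M")
print(f"implied overlap:  ~{implied_overlap:.1f}M")
```

Because the inputs are rounded to one decimal place, the overlap estimate should be read as a rough order of magnitude rather than an exact count.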
The whitelist comes with Cell Ranger (3.0 or higher) and is located as shown below (where x.y.z is the Cell Ranger version):
For example, a Gel Bead carries the following:
- GEX Capture Sequence: TruSeq + Barcode_Variant1
  - Example of Barcode_Variant1: AAACCCAAGAAACACT
- Feature Barcoding Capture Sequence: Nextera + Barcode_Variant2
  - Example of Barcode_Variant2: AAACCCATCAAACACT
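To see how the two example barcodes relate, the short Python sketch below compares them position by position. The helper name `differing_positions` is made up for illustration, and the result describes this one example pair only.

```python
gex_barcode = "AAACCCAAGAAACACT"      # Barcode_Variant1 (gene expression)
feature_barcode = "AAACCCATCAAACACT"  # Barcode_Variant2 (Feature Barcoding)

def differing_positions(a, b):
    """Return the 1-based positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]

positions = differing_positions(gex_barcode, feature_barcode)
print(positions)
```

For this pair the two sequences differ only at positions 8 and 9 (AG in the gene expression variant, TC in the Feature Barcoding variant); the other 14 bases are identical.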
You can map the Feature Barcoding variant to the gene expression variant, and vice versa, using a lookup table that is provided with the Cell Ranger installation. The table is located here:
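A minimal sketch of how such a lookup could be used in Python, assuming the table is plain text with two whitespace-separated columns per line (gene expression variant first, Feature Barcoding variant second). That layout, the column order, and the function name `load_translation` are assumptions, so check the actual file before relying on them. A one-line in-memory sample stands in for the real table.

```python
import io

def load_translation(lines):
    """Build both lookup directions from two-column lines.

    Assumed layout: <gene expression barcode> <Feature Barcoding barcode>
    """
    gex_to_fb = {}
    fb_to_gex = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        gex, fb = fields
        gex_to_fb[gex] = fb
        fb_to_gex[fb] = gex
    return gex_to_fb, fb_to_gex

# In-memory stand-in for the lookup table, using the example pair above.
sample = io.StringIO("AAACCCAAGAAACACT\tAAACCCATCAAACACT\n")
gex_to_fb, fb_to_gex = load_translation(sample)

print(fb_to_gex["AAACCCATCAAACACT"])
```

The function accepts any iterable of text lines, so an open file handle can be passed in place of the sample (for a gzip-compressed file, one opened with `gzip.open(path, "rt")`).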
In the output count matrices, the barcode shown for each Gel Bead is the gene expression variant, and it represents all data from that Gel Bead. Similarly, in the output BAM file, reads from Feature Barcoding data carry the gene expression variant in the “CB” tag, which makes it easier to find all reads from one GEM. The “CR” tag holds the unaltered sequence.
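The difference between the two tags can be illustrated with a small Python sketch that reads the optional fields of a SAM-format record. The record is hypothetical and built only for illustration, `read_tags` is an illustrative helper rather than part of any library, and the `-1` suffix on the “CB” value is an assumption about the output format.

```python
def read_tags(sam_line):
    """Return the optional fields of one SAM record as a {tag: value} dict."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:  # optional fields start at column 12
        tag, _type, value = field.split(":", 2)
        tags[tag] = value
    return tags

# Hypothetical Feature Barcoding read, built only for illustration.
record = "\t".join([
    "read1", "4", "*", "0", "0", "*", "*", "0", "0",
    "ACGT", "FFFF",
    "CR:Z:AAACCCATCAAACACT",    # as sequenced (Feature Barcoding variant)
    "CB:Z:AAACCCAAGAAACACT-1",  # reported barcode (gene expression variant)
])

tags = read_tags(record)
cell_barcode = tags["CB"].split("-")[0]  # drop the assumed "-1" suffix

print("CR:", tags["CR"])
print("CB:", cell_barcode)
print("same sequence:", tags["CR"] == cell_barcode)
```

In this made-up record the two tags differ because “CR” keeps the Feature Barcoding variant as sequenced, while “CB” reports the gene expression variant for the same Gel Bead.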