Question: I am confused by the process of demultiplexing by sample index and barcode. Can you walk me through how this works?
Answer: A common source of confusion is the difference between a sample index and a barcode. These terms are sometimes used interchangeably in the genomics world (for example, what Illumina's Sequencing Analysis Viewer refers to as barcodes are what we call sample indices), but in the context of our products they have distinct meanings.
The i7 sample indices are added to the Illumina sequencing primers so that multiple libraries can be multiplexed on the same flow cell or lane of a flow cell. Users can supply their own custom indices, but most often they use the bundled index sets, each of which consists of four oligos. Our mkfastq pipeline, a thin wrapper around Illumina's bcl2fastq, demultiplexes based on these index sets as it converts the raw base call (BCL) files, organized per cycle, to FASTQ files, organized by read. It can also automatically recognize the names of our sample index sets (e.g. SA-GA-A1) and merge the FASTQ files resulting from those four oligos.
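To make the "four oligos, one library" idea concrete, here is a minimal Python sketch of demultiplexing by sample index set. It is not how mkfastq or bcl2fastq is implemented: the set names and oligo sequences are made up for illustration, reads are simplified to (index sequence, read) pairs, and only exact index matches are accepted.

```python
from collections import defaultdict

# Hypothetical index sets: names and oligo sequences are invented for this
# example and are NOT real sample index sequences.
INDEX_SETS = {
    "EXAMPLE-SET-1": ["AAACGGCG", "CCTACCAT", "GGCGTTTC", "TTGTAAGA"],
    "EXAMPLE-SET-2": ["ACGTACGT", "CATGCATG", "GTACGTAC", "TGCATGCA"],
}


def build_lookup(index_sets):
    """Map each individual oligo to the name of the index set it belongs to."""
    lookup = {}
    for set_name, oligos in index_sets.items():
        for oligo in oligos:
            if oligo in lookup:
                raise ValueError(f"oligo {oligo} appears in more than one set")
            lookup[oligo] = set_name
    return lookup


def demultiplex(reads, index_sets):
    """Assign each (index_seq, read) pair to a library by its sample index.

    Reads carrying any of the four oligos of a set end up in the same
    library, which mirrors merging the per-oligo FASTQ files. Reads whose
    index matches no set are collected under "Undetermined".
    """
    lookup = build_lookup(index_sets)
    libraries = defaultdict(list)
    for index_seq, read in reads:
        library = lookup.get(index_seq, "Undetermined")
        libraries[library].append(read)
    return dict(libraries)


example_reads = [
    ("AAACGGCG", "read_1"),
    ("TTGTAAGA", "read_2"),  # different oligo, same set as read_1
    ("ACGTACGT", "read_3"),
    ("NNNNNNNN", "read_4"),  # no matching index
]
demultiplexed = demultiplex(example_reads, INDEX_SETS)
```

In this sketch, read_1 and read_2 carry different oligos but land in the same library because both oligos belong to the same index set.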
The barcode is specific to 10x Genomics and is used to identify individual gel beads in emulsion (GEMs), which correspond to either cells or DNA molecules for the single-cell and genome product lines, respectively. Barcodes are handled differently depending on which pipeline you are using, but always after demultiplexing.
During library preparation, the barcoding steps (GEM creation) are done before the indexing steps (final PCR step). This order is reversed in the bioinformatics pipeline: we first need to demultiplex by sample index with mkfastq to separate reads into their respective libraries before dealing with the library-specific barcodes in subsequent steps.
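The two-stage order can be sketched in Python as follows. This is an illustrative simplification, not the actual pipeline code: the function name, the read layout (sample index, barcode, insert), and the short sequences are all assumptions made for the example.

```python
from collections import defaultdict


def demultiplex_then_group(reads, index_to_library):
    """Group reads first by library (sample index), then by barcode.

    `reads` is an iterable of (sample_index, barcode, insert) tuples, and
    `index_to_library` maps a sample index sequence to a library name.
    Both are hypothetical structures used only for this sketch.
    """
    # Stage 1: demultiplex by sample index, the last tag added in the lab.
    by_library = defaultdict(list)
    for sample_index, barcode, insert in reads:
        library = index_to_library.get(sample_index)
        if library is None:
            continue  # unknown sample index: skipped in this sketch
        by_library[library].append((barcode, insert))

    # Stage 2: within each library, group reads by barcode.
    result = {}
    for library, library_reads in by_library.items():
        by_barcode = defaultdict(list)
        for barcode, insert in library_reads:
            by_barcode[barcode].append(insert)
        result[library] = dict(by_barcode)
    return result


index_to_library = {"AAAA": "library_A", "CCCC": "library_B"}
reads = [
    ("AAAA", "BC1", "x1"),
    ("CCCC", "BC1", "y1"),  # same barcode sequence, different library
    ("AAAA", "BC1", "x2"),
    ("AAAA", "BC2", "x3"),
    ("GGGG", "BC1", "z1"),  # sample index not in the sheet
]
grouped = demultiplex_then_group(reads, index_to_library)
```

The same barcode sequence can appear in two libraries, so grouping by barcode alone would mix reads from different samples; separating by sample index first keeps them apart.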