Question: I have a dual-indexed library, but I sequenced only 8bp of each index read (i7 and i5). How can I demultiplex my data?
Answer: When planning sequencing experiments, it is best to follow the official sequencing read length requirements specified for each product, e.g. 3' Gene Expression Sequencing Requirements. However, sometimes there are circumstances out of our control and sequencing read lengths end up shorter than the recommendations.
It is possible to demultiplex the library when only the first 8bp (instead of the full 10bp) of the indices were sequenced. This applies to Single Cell 3' Gene Expression, Single Cell 5' Immune Profiling, and Visium Spatial Gene Expression.
To demultiplex the data, you will need to specify the first 8 bases of each sample index in the sample sheet. The sample indices can be found on the support site:
- 3' Gene Expression (and Feature Barcode) sample indices
- 5' Immune Profiling (and Feature Barcode) sample indices
- Visium sample indices (Fresh Frozen)
- Visium sample indices (FFPE)
- Fixed RNA Profiling sample indices
Suppose you used the index SI-TT-A1, which consists of the sequences below:
"SI-TT-A1": { "index": "GTAACATGCG", "index2_workflow_a": "AGTGTTACCT", "index2_workflow_b": "AGGTAACACT" }
There are two entries for index 2. The index2_workflow_b entry is the reverse complement of the index2_workflow_a entry. The sequence you use will depend on the sequencer that generated the BCL files.
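The reverse-complement relationship between the two index 2 entries can be checked with a few lines of Python. This is only an illustrative sketch: the helper name `reverse_complement` is an assumption made for this example and is not part of any 10x tool.

```python
# Illustrative check only; the sequences are the SI-TT-A1 values quoted above.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


index2_workflow_a = "AGTGTTACCT"
index2_workflow_b = "AGGTAACACT"

# True when workflow_b is the reverse complement of workflow_a.
is_revcomp = reverse_complement(index2_workflow_a) == index2_workflow_b
print(is_revcomp)  # True
```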
| index2_workflow_a (forward strand workflow) | index2_workflow_b (reverse complement workflow) |
| --- | --- |
| NovaSeq 6000 with v1.0 reagent kits | NovaSeq 6000 with v1.5 reagent kits |
| MiniSeq with rapid reagent kits | MiniSeq with standard kits |
| MiSeq | iSeq 100 |
| HiSeq 2500 | NextSeq |
| HiSeq 2000 | HiSeq X |
| | HiSeq 3000 |
| | HiSeq 4000 |
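The choice between the two index 2 entries can be expressed as a simple lookup. The sketch below is a minimal illustration, assuming short instrument labels (such as "NovaSeq 6000 v1.5") that were made up for this example; the function name `index2_key` is likewise hypothetical and not part of any 10x tool.

```python
# Hypothetical labels for the instruments listed above; rename as needed.
FORWARD_STRAND = {
    "NovaSeq 6000 v1.0", "MiniSeq rapid", "MiSeq", "HiSeq 2500", "HiSeq 2000",
}
REVERSE_COMPLEMENT = {
    "NovaSeq 6000 v1.5", "MiniSeq standard", "iSeq 100", "NextSeq",
    "HiSeq X", "HiSeq 3000", "HiSeq 4000",
}


def index2_key(sequencer: str) -> str:
    """Return which index 2 entry applies to the given sequencer label."""
    if sequencer in FORWARD_STRAND:
        return "index2_workflow_a"
    if sequencer in REVERSE_COMPLEMENT:
        return "index2_workflow_b"
    raise ValueError(f"Unknown sequencer label: {sequencer!r}")


# SI-TT-A1 sequences as quoted in this article.
SI_TT_A1 = {
    "index": "GTAACATGCG",
    "index2_workflow_a": "AGTGTTACCT",
    "index2_workflow_b": "AGGTAACACT",
}

print(index2_key("HiSeq 2500"))            # index2_workflow_a
print(SI_TT_A1[index2_key("HiSeq 2500")])  # AGTGTTACCT
print(index2_key("NextSeq"))               # index2_workflow_b
```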
Running mkfastq:
You can still use the mkfastq pipeline with a simple CSV samplesheet. However, you cannot use the index name (SI-TT-A1) in the samplesheet; the 8bp index sequences must be specified instead.
Example 1: If your library was sequenced on a HiSeq 2500, and you sequenced 8bp of the index reads, your samplesheet could look like this:
Lane,Sample,Index,Index2
*,My_sample,GTAACATG,AGTGTTAC
Example 2: If you had used a NextSeq for the sequencing, Index2 takes the first 8 bases of the index2_workflow_b entry, and your samplesheet could look like this:
Lane,Sample,Index,Index2
*,My_sample,GTAACATG,AGGTAACA
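A samplesheet of this shape can also be generated programmatically. The following is a minimal sketch, assuming you have already picked the full-length index 2 sequence that matches your sequencer; the function name `simple_samplesheet` and its parameters are hypothetical and not part of mkfastq. It reproduces the HiSeq 2500 case from Example 1.

```python
import csv
import io


def simple_samplesheet(sample: str, index: str, index2: str,
                       lane: str = "*", n: int = 8) -> str:
    """Build a simple CSV samplesheet using the first n bases of each index."""
    if len(index) < n or len(index2) < n:
        raise ValueError(f"Both indices must be at least {n} bases long")
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Lane", "Sample", "Index", "Index2"])
    # Keep only the bases that were actually sequenced.
    writer.writerow([lane, sample, index[:n], index2[:n]])
    return buf.getvalue()


# SI-TT-A1 on a HiSeq 2500 (forward strand workflow, index2_workflow_a).
sheet = simple_samplesheet("My_sample", "GTAACATGCG", "AGTGTTACCT")
print(sheet, end="")
# Lane,Sample,Index,Index2
# *,My_sample,GTAACATG,AGTGTTAC
```

Writing the returned string to a file gives a CSV that can be passed to mkfastq in place of a samplesheet that uses index names.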
Running bcl2fastq:
Using bcl2fastq directly for demultiplexing is also an option. Please see the support site for instructions:
- bcl2fastq for 3' Gene Expression (and Feature Barcode) libraries
- bcl2fastq for 5' Immune Profiling (and Feature Barcode) libraries
- bcl2fastq for Visium libraries
Note: The information above does not apply to any ATAC libraries. For single assay ATAC or multiome ATAC sequencing, it is mandatory to have 8bp on the i7 read for demultiplexing, and 16bp on the i5 read for the cellular barcodes.
Products: Single Cell Gene Expression, Single Cell Immune Profiling, Visium Spatial Gene Expression
Related Articles:
I sequenced my dual indexed library as 8bp single index. Can I rescue my data?
How to demultiplex a single indexed library on a dual indexed flow cell?
How to use masking parameter while demultiplexing 10x sequencing data?