Question: How do I solve for the mkfastq "[error] Barcode out of bounds"?
Answer: The error stems either from index reads that were sequenced longer than the library requires (I1:0-14 or I2:0-16) or from an index read that is shorter than the minimum required length (I2:0-16). The range in the error message identifies the case, and the solution differs for (I1:0-14) versus (I2:0-16).
(I1:0-14)
The error occurs when a single index library with I1 (i7 index) length 8 or 10 is sequenced in a dual index configuration. It can also happen in a dual indexed configuration if the I2 (i5 index) is longer than 10 bp, for example when an ATAC library is sequenced on the same flow cell as a Gene Expression library. In both cases, Cell Ranger mistakes the chemistry for Single Cell 3' v1, which requires a 14-base barcode on I1 (i7 index), and triggers the error.
To solve the error, there are a few options:
1. See instructions in the article, How to demultiplex a single indexed library on a dual indexed flow cell.
2. If the I2 (i5 index) was sequenced longer than 10 bp, it is necessary to directly run Illumina's bcl2fastq. See support site documentation page: Using bcl2fastq for the specific platform, i.e. Support > {Platform} > Software > Advanced > Using bcl2fastq.
(I2:0-16)
The data's I2 (i5 index) read length is less than the expected minimum 16 bases. To avoid this error, make sure sequencing meets the minimum read length requirements for a platform chemistry version, and use the correct tool for mkfastq demultiplexing (see Table below). For example, using cellranger-atac mkfastq on single cell dual index gene expression data triggers the error. Switch to using mkfastq from the correct tool, cellranger. Sequencing ATAC libraries in the single cell gene expression configuration also triggers the error. Here, the shortened I2 (i5 index) is detrimental for ATAC data analysis as it contains the 10x barcode. In this case, re-sequence the library with the correct configuration.
Table. Sequencing configurations as of April 2021, matched to analysis tool.
For each platform, sequencing configurations are given at Support > {Platform} > Sequencing > Specifications > Sequencing Requirements.
Some background
Mkfastq expects certain minimum read lengths based on the analysis tool, e.g. cellranger versus cellranger-atac, and derives the BCL data's read lengths from the sequencing configuration given in the RunInfo.xml file. Mkfastq checks whether index read lengths are equal to or greater than those expected for a chemistry before proceeding. In contrast, bcl2fastq has no such checks. When mkfastq finds fewer bases than expected, it errors and the error message indicates the minimum expected bases for the read type in question. The minimum required barcode lengths enable robust single cell algorithmic analyses.
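To see the read lengths that mkfastq will derive before running it, the Reads section of RunInfo.xml can be inspected directly. The following is a minimal sketch, not mkfastq's own logic. It assumes the usual Illumina layout in which each Read element carries Number, NumCycles and IsIndexedRead attributes. The function names, the embedded example XML and the minimums passed to the check are illustrative assumptions; only the I2 minimum of 16 bases comes from the error described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical RunInfo.xml fragment for a dual index gene expression run.
# A real file contains more elements; only the <Read> entries matter here.
EXAMPLE_RUNINFO = """<RunInfo>
  <Run Id="example_run">
    <Reads>
      <Read Number="1" NumCycles="28" IsIndexedRead="N"/>
      <Read Number="2" NumCycles="10" IsIndexedRead="Y"/>
      <Read Number="3" NumCycles="10" IsIndexedRead="Y"/>
      <Read Number="4" NumCycles="90" IsIndexedRead="N"/>
    </Reads>
  </Run>
</RunInfo>"""


def read_lengths(runinfo_xml):
    """Return one dict per <Read> element, ordered by read number."""
    root = ET.fromstring(runinfo_xml)
    reads = [
        {
            "number": int(read.get("Number")),
            "cycles": int(read.get("NumCycles")),
            "is_index": read.get("IsIndexedRead") == "Y",
        }
        for read in root.iter("Read")
    ]
    return sorted(reads, key=lambda r: r["number"])


def index_lengths(reads):
    """Name the index reads I1, I2, ... in sequencing order."""
    cycles = [r["cycles"] for r in reads if r["is_index"]]
    return {f"I{i}": n for i, n in enumerate(cycles, start=1)}


def short_index_reads(lengths, minimums):
    """List index reads that are absent or shorter than the given minimum."""
    return [name for name, need in minimums.items()
            if lengths.get(name, 0) < need]


if __name__ == "__main__":
    lengths = index_lengths(read_lengths(EXAMPLE_RUNINFO))
    print("Index read lengths:", lengths)
    # Assumed minimum: 16 bases on I2, as in the (I2:0-16) error.
    print("Too short for a 16-base I2:", short_index_reads(lengths, {"I2": 16}))
```

For a real run, read the file contents from the RunInfo.xml in the BCL run folder and pass them to read_lengths. A 10-base I2, as in the example, is what a dual index gene expression configuration produces and is the situation in which cellranger-atac mkfastq reports (I2:0-16).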
Other tips
- To demultiplex dual indexed GEX data, be sure to use cellranger-x v4+.
- For libraries sequenced on newer sequencing platforms, be sure bcl2fastq is up to date, e.g. v2.20+. The bcl2fastq version is still relevant for mkfastq, as the module uses bcl2fastq under the hood.