Question: I am running data with Cell Ranger 5.0. It produces the following error message:
"ERROR: We detected a mixture of different R1 lengths ([26-28]), which breaks assumptions in how UMIs are tabulated and corrected. To process these data, you will need to truncate to the shortest observed R1 length by providing 26 to the --r1-length argument".
Can you tell me why it is required in Cell Ranger 5.0.0 that we have to set the R1 length to be the same across all reads?
Answer: Cell Ranger 5.0 introduced a new check that requires all R1 reads to have the same length. The check protects against double-counting UMIs of different lengths as different UMIs. For example, with 3' gene expression data, if you have R1 reads with both 26 bp and 28 bp lengths for some reason, some reads will have UMIs that are 10 bp long and some will have the expected 12 bp UMIs. Consider two reads, Read A and Read B, from the same molecule:
- Read A has an R1 that is 26 bp long and has a UMI that is 10 bp long, for example, AACCGGTTAA
- Read B has an R1 that is 28 bp long and has a UMI that is 12 bp long, for example, AACCGGTTAACC
Because the different R1 lengths make the two UMI sequences differ, Cell Ranger will treat them as separate UMIs, which can lead to double-counting. To prevent such double-counting in this special case, Cell Ranger 5.0.0 requires all UMIs to be the same length.
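The double-counting described above can be illustrated with a minimal Python sketch. This is not Cell Ranger's actual UMI counting, which also performs UMI correction; it only counts distinct (barcode, UMI) pairs. The 16 bp barcode length is inferred from the example (a 26 bp R1 with a 10 bp UMI), and the barcode sequence and the function name count_umis are hypothetical.

```python
# Barcode length inferred from the example: 26 bp R1 minus a 10 bp UMI.
BARCODE_LEN = 16


def count_umis(r1_reads, r1_length=None):
    """Count distinct (barcode, UMI) pairs, optionally truncating R1 first."""
    seen = set()
    for r1 in r1_reads:
        if r1_length is not None:
            r1 = r1[:r1_length]  # hard-trim R1 to a fixed length
        barcode, umi = r1[:BARCODE_LEN], r1[BARCODE_LEN:]
        seen.add((barcode, umi))
    return len(seen)


barcode = "ACGTACGTACGTACGT"       # hypothetical 16 bp cell barcode
read_a = barcode + "AACCGGTTAA"    # 26 bp R1, 10 bp UMI
read_b = barcode + "AACCGGTTAACC"  # 28 bp R1, 12 bp UMI

print(count_umis([read_a, read_b]))                # 2: counted as two molecules
print(count_umis([read_a, read_b], r1_length=26))  # 1: counted once after truncation
```

Truncating every R1 to the shortest observed length makes both reads yield the same 10 bp UMI, so the molecule is counted once.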
In most cases, demultiplexing using the cellranger mkfastq pipeline will not lead to variable read lengths. The following scenarios may produce unequal read lengths:
- If you have trimmed the reads after running the
- If you use Illumina's bcl2fastq software tool directly to generate the FASTQ files and use the adapter trimming option
- If you have sequenced the same library on different flow cells and the sequencing instrument run parameters were set differently for each flow cell
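To see whether a set of FASTQ files is affected by one of these scenarios, you can tally the R1 read lengths yourself. The following is a minimal Python sketch, not part of Cell Ranger. The function names are hypothetical, and it assumes standard four-line FASTQ records, either plain text or gzip-compressed.

```python
import gzip
from collections import Counter
from itertools import islice


def read_length_counts(fastq_lines, max_reads=100_000):
    """Tally sequence lengths from FASTQ lines (assumes 4 lines per record)."""
    # The sequence is the 2nd line of each 4-line record.
    seq_lines = islice(fastq_lines, 1, None, 4)
    return Counter(len(seq.strip()) for seq in islice(seq_lines, max_reads))


def r1_lengths_from_file(path, max_reads=100_000):
    """Open a plain or gzip-compressed FASTQ file and tally read lengths."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return read_length_counts(handle, max_reads)


# Tiny in-memory example in place of a real R1 FASTQ file.
example = [
    "@read1", "ACGTACGTACGTACGTAACCGGTTAA", "+", "I" * 26,
    "@read2", "ACGTACGTACGTACGTAACCGGTTAACC", "+", "I" * 28,
]
counts = read_length_counts(example)
print(dict(counts))  # {26: 1, 28: 1}
print(min(counts))   # 26, the shortest observed R1 length
```

If more than one length is reported, the shortest one is the value that the error message asks you to provide.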
If you encounter this error, please update Cell Ranger to the latest version if possible. In versions 5.0.1 and later, Cell Ranger does not require all R1 reads to be the same length; it only requires all R1 reads to cover the UMI sequences.
If, for some reason, you need to continue using Cell Ranger 5.0.0, you can trim the R1 reads in the Cell Ranger pipeline by using the following options:
- For 3' chemistry, add the --r1-length option to your
- For 5' chemistry, add the r1-length option to your cellranger multi config CSV file