Question: I am running data with Cell Ranger 5.0. It emits an error:
"ERROR: We detected a mixture of different R1 lengths ([26-28]), which breaks assumptions in how UMIs are tabulated and corrected. To process these data, you will need to truncate to the shortest observed R1 length by providing 26 to the --r1-length argument".
Can you tell me why Cell Ranger 5.0.0 requires all R1 reads to have the same length?
Answer: Cell Ranger 5.0.0 introduced a new check that requires all R1 reads to have the same length. The check protects against counting UMIs of different lengths as different UMIs. For example, with 3' gene expression data, if some R1 reads are 26 bp long and others are 28 bp long, some reads will have 10 bp UMIs while others will have the expected 12 bp UMIs. Consider two reads, Read A and Read B, from the same molecule:
- Read A has an R1 that is 26 bp long and has a UMI that is 10 bp long, for example, AACCGGTTAA
- Read B has an R1 that is 28 bp long and has a UMI that is 12 bp long, for example, AACCGGTTAACC
Because the two UMIs differ only as a result of the different R1 lengths, Cell Ranger will treat them as separate UMIs, which can lead to double-counting. To prevent this kind of double-counting, Cell Ranger 5.0.0 requires all UMIs to be the same length.
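The effect can be reproduced with a toy example. The Python sketch below is only an illustration and is not how Cell Ranger tabulates or corrects UMIs internally. The function names and the cell barcode sequence are hypothetical, and the 16 bp barcode length is an assumption inferred from the numbers above (a 26 bp R1 carrying a 10 bp UMI).

```python
BARCODE_LEN = 16  # assumed: 26 bp R1 minus the 10 bp UMI from the example


def extract_umi(r1, r1_length=None):
    """Return the UMI portion of an R1 read, optionally truncating R1 first."""
    if r1_length is not None:
        r1 = r1[:r1_length]
    return r1[BARCODE_LEN:]


def count_umis(r1_reads, r1_length=None):
    """Count distinct UMI strings among reads that share one cell barcode."""
    return len({extract_umi(r1, r1_length) for r1 in r1_reads})


barcode = "ACGTACGTACGTACGT"       # hypothetical 16 bp cell barcode
read_a = barcode + "AACCGGTTAA"    # 26 bp R1, 10 bp UMI
read_b = barcode + "AACCGGTTAACC"  # 28 bp R1, 12 bp UMI

naive = count_umis([read_a, read_b])                    # mixed R1 lengths
truncated = count_umis([read_a, read_b], r1_length=26)  # trim to shortest R1
print(naive, truncated)
```

Without truncation the two reads yield two distinct UMI strings; after truncating R1 to 26 bp they collapse to a single UMI, which is what trimming to the shortest observed R1 length achieves.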
In most cases, demultiplexing with the "cellranger mkfastq" pipeline will not produce variable read lengths. Unequal read lengths can arise if you trimmed the reads after running "cellranger mkfastq"; if you used Illumina's bcl2fastq software directly to generate the FASTQ files with the adapter trimming option enabled; or if you sequenced the same library on more than one flow cell and the sequencing instrument run parameters were set differently for each flow cell.
If you encounter this error, please update Cell Ranger to the latest version if possible. In versions 5.0.1 and later, Cell Ranger does not require all R1 reads to be the same length; it only requires every R1 read to cover the UMI sequence.
If, for some reason, you need to continue using Cell Ranger 5.0.0, you can have the Cell Ranger pipeline trim the R1 reads by using the following options:
- For 3' chemistry, add the "--r1-length" option to your "cellranger count" command
- For 5' chemistry, add the "r1-length" option to the config CSV passed to your "cellranger multi" command
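To decide which value to supply, it can help to check which R1 lengths are actually present in your FASTQ files. The following Python sketch is one way to do that; it is not part of Cell Ranger, and the function name "r1_length_counts" is hypothetical. It assumes standard four-line FASTQ records, either plain text or gzip-compressed.

```python
import gzip
from collections import Counter


def r1_length_counts(fastq_path):
    """Tally read lengths in a FASTQ file (plain text or .gz)."""
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    counts = Counter()
    with opener(fastq_path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of each 4-line record is the sequence
                counts[len(line.rstrip("\r\n"))] += 1
    return counts
```

Calling the function on an R1 file returns a mapping from read length to number of reads. If more than one length appears, the smallest key, min(counts), is the shortest observed R1 length that the error message asks you to provide.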