Question: I have a web summary with an alert "Low Fraction Valid UMIs". Sequencing was fine and showed good quality on the instrument. What happened and how can I fix this?
Answer: Sometimes software settings get turned on during demultiplexing that can affect the output FASTQ files, despite good sequencing quality shown on the instrument. This article discusses reasons why this can happen and provides solutions to fix it.
Reason for Alert:
Adapter trimming settings were turned on in the FASTQ demultiplexing software, affecting the output of the FASTQ reads.
In R1, the first 16 bp are the 10x barcode and the last 12 bp are the UMI, for a total of 28 bp. When adapter trimming is turned on, bases can be trimmed from the end of R1 unnecessarily, and part of the UMI is lost.
User Guide, CG000204_ChromiumNextGEMSingleCell3'v3.1_Rev_D.pdf (p. 16)
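To make the R1 layout concrete, here is a minimal Python sketch that splits an R1 sequence into barcode and UMI using the lengths given above (16 bp and 12 bp). The function name and the example read are made up for illustration; this is not how Cell Ranger itself parses reads.

```python
# Layout of a 28 bp R1 read as described above:
# a 16 bp 10x barcode followed by a 12 bp UMI.
BARCODE_LEN = 16
UMI_LEN = 12


def split_r1(r1_seq):
    """Return (barcode, umi); the UMI is whatever follows the barcode, up to 12 bp."""
    return r1_seq[:BARCODE_LEN], r1_seq[BARCODE_LEN:BARCODE_LEN + UMI_LEN]


# Made-up 28 bp read, for illustration only.
full_r1 = "AAACCCAAGAAACACT" + "GGTTCCAATTGG"
# The same read after two bases were trimmed from its 3' end.
trimmed_r1 = full_r1[:-2]

for label, read in (("full", full_r1), ("trimmed", trimmed_r1)):
    barcode, umi = split_r1(read)
    print(f"{label}: length={len(read)} barcode={barcode} umi={umi} umi_length={len(umi)}")
```

The barcode is unaffected by trimming at the end of the read, but the trimmed read carries only 10 of its 12 UMI bases.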
Solutions:
1. Re-run FASTQ demultiplexing with a revised sample sheet. Do not use the adapter trim settings.
2. If it is not possible to re-run demultiplexing, run cellranger count with --r1-length=26. The side effect of this may be lower UMI counts because of collisions.
Details of the Issue:
Generating FASTQ files for your 10x libraries requires running one of these demultiplexing software tools: cellranger mkfastq, Illumina's bcl2fastq, or Illumina's bcl-convert.
- Is mkfastq really needed to demultiplex, or can we use bcl2fastq?
- Direct Demultiplexing with Illumina Software
The cellranger mkfastq simple sample sheet does not allow for adapter trimming settings. However, Illumina's IEM sample sheet CSV files used for bcl2fastq and bcl-convert do allow for these optional settings to be turned on.
See this article for more information on demultiplexing parameters: What adapters should I use in my IEM sample sheet?
To tell whether adapter settings have been used during FASTQ demultiplexing, run the FastQC diagnostic tool on your FASTQ reads:
Looking at the output html reports, the R1 profile for "Per base sequence content" will look like this, indicating a drop in the percentage of one of the bases in the last base of the read. In this example, the % of A has dropped to 0% in the 28th base:
The R2 profile will look like this, indicating there is a mixture of R2 read lengths spanning from 28 to 90 bp:
If you have an R1 profile with a mixture of read lengths, cellranger will produce this error:
[error] Pipestance failed. Error log at:
CR_sample1_mkfastq/SC_RNA_COUNTER_CS/SC_MULTI_CORE/MULTI_GEM_WELL_PROCESSOR/COUNT_GEM_WELL_PROCESSOR/_BASIC_SC_RNA_COUNTER/_MATRIX_COMPUTER/MAKE_SHARD/fork0/join-ub321ecfb57/_errors
Log message:
ERROR: We detected a mixture of different R1 lengths ([26-28]), which breaks assumptions in how UMIs are tabulated and corrected. To process these data, you will need to truncate to the shortest observed R1 length by providing 26 to the --r1-length argument if are running count/vdj, or via the r1-length parameter in each of the following tables of your multi config CSV if you are running multi: [gene-expression]
How to resolve this error "We detected a mixture of different R1 lengths"?
In the original sample sheet CSV used for demultiplexing, look for adapter settings, either for bcl2fastq:
Or for bcl-convert:
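If you prefer to check a sample sheet programmatically, the following Python sketch scans it for adapter settings. The setting names it looks for (Adapter, AdapterRead1, AdapterRead2) are an assumption about how these tools name their trimming options, so confirm them against the Illumina documentation for your software version. The example sheet and its adapter sequence are made up.

```python
import csv
import io

# Setting names assumed to control adapter trimming (an assumption; check
# the Illumina documentation for your software version).
ADAPTER_KEYS = {"adapter", "adapterread1", "adapterread2"}


def find_adapter_settings(sample_sheet_text):
    """Return (section, key, value) tuples for adapter settings that have a value."""
    hits = []
    section = None
    for row in csv.reader(io.StringIO(sample_sheet_text)):
        cells = [c.strip() for c in row]
        if not cells or not cells[0]:
            continue  # blank line or empty first column
        if cells[0].startswith("[") and cells[0].endswith("]"):
            section = cells[0]  # e.g. [Settings] or [BCLConvert_Settings]
            continue
        if cells[0].lower() in ADAPTER_KEYS and len(cells) > 1 and cells[1]:
            hits.append((section, cells[0], cells[1]))
    return hits


# Made-up sample sheet with a placeholder adapter sequence.
example_sheet = """[Header]
FileFormatVersion,2
[BCLConvert_Settings]
CreateFastqForIndexReads,0
AdapterRead1,ACGTACGTACGT
[BCLConvert_Data]
Lane,Sample_ID,Index,Index2
1,sample1,ATGGAGGGAG,AATGGGTTAT
"""

for section, key, value in find_adapter_settings(example_sheet):
    print(f"Adapter setting found in {section}: {key}={value}")
```

An empty result suggests that no adapter sequences were supplied in the sheet, but it does not rule out trimming configured elsewhere.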
A mixture of R2 read lengths, together with a "Per base sequence content" profile matching the one described above, is diagnostic of this issue. Your R1 may or may not have a mixture of read lengths.
If your R1 reads all have the same length, then cellranger count will run without error. A "Low Fraction Valid UMIs" alert may still be produced because the UMIs in the R1 reads have been affected by adapter trimming.
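Read lengths can also be tallied directly from the FASTQ files, without FastQC. The Python sketch below is a minimal example that assumes the common four-lines-per-record FASTQ layout. The function names are illustrative, and the example records are made up.

```python
import gzip
from collections import Counter


def read_length_counts(fastq_lines):
    """Count sequence lengths in FASTQ text given as an iterable of lines.

    Assumes four lines per record, with the sequence on the second line.
    """
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:
            counts[len(line.rstrip("\r\n"))] += 1
    return counts


def read_length_counts_from_file(path):
    """Same tally for a FASTQ file on disk, gzip-compressed or plain text."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return read_length_counts(handle)


# Tiny made-up example: two 28 bp reads and one 26 bp read.
example = [
    "@read1", "A" * 28, "+", "F" * 28,
    "@read2", "C" * 28, "+", "F" * 28,
    "@read3", "G" * 26, "+", "F" * 26,
]
counts = read_length_counts(example)
print(sorted(counts.items()))
if len(counts) > 1:
    print("Mixed read lengths: trimming was likely applied during demultiplexing.")
```

For real data, call read_length_counts_from_file on an R1 or R2 FASTQ file. A single length for every read is the expected result when no trimming was applied.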
Solutions To Fix This Issue:
Re-running the demultiplexing software with the adapter settings turned off will fix the issue. Here are example sample sheets to use for Illumina's software tools:
For bcl2fastq:
[Data]
Lane,Sample_ID,index,index2
1,sample1,ATGGAGGGAG,AATGGGTTAT
For bcl-convert:
[Header]
FileFormatVersion,2
[BCLConvert_Settings]
CreateFastqForIndexReads,0
[BCLConvert_Data]
Lane,Sample_ID,Index,Index2
1,sample1,ATGGAGGGAG,AATGGGTTAT
This will produce a typical R1 profile that looks like this:
The R2 fastqc profile will show that the sequence lengths for the reads are all uniformly 90 bp:
Re-running cellranger with the new FASTQ reads should resolve the "Low Fraction Valid UMIs" alert.
Or, run cellranger with the --r1-length=26 option, like this:
cellranger count --id=sample1_output \
    --r1-length=26 \
    --transcriptome=/path/to/refdata-gex-GRCh38-2020-A/ \
    --fastqs=/path/to/fastqs/ \
    --sample=sample1
The effect of this option is a 10 bp UMI instead of a 12 bp UMI. Unless the complexity of the library is very high, a 10 bp UMI should have minimal impact compared to a 12 bp UMI. The fraction of reads incorrectly flagged as UMI duplicates due to UMI collision will be slightly higher, which slightly depresses UMI counts.
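To get a rough sense of the size of this effect, the Python sketch below estimates how many distinct UMIs are expected when a given number of molecules from the same gene in the same cell each carry a random UMI. It assumes UMIs are drawn uniformly at random, which is a simplification, and it does not model Cell Ranger's actual UMI correction or counting.

```python
def expected_distinct_umis(n_molecules, umi_length):
    """Expected number of distinct UMIs when each molecule draws one uniformly at random."""
    n_umis = 4 ** umi_length  # number of possible UMI sequences
    return n_umis * (1.0 - (1.0 - 1.0 / n_umis) ** n_molecules)


print(f"possible 10 bp UMIs: {4 ** 10}")
print(f"possible 12 bp UMIs: {4 ** 12}")

# Molecules of one gene in one cell; the values are illustrative only.
for n in (100, 1_000, 10_000):
    for length in (10, 12):
        observed = expected_distinct_umis(n, length)
        lost = 100.0 * (1.0 - observed / n)
        print(f"molecules={n:>6} umi={length} bp expected_count={observed:.1f} lost={lost:.3f}%")
```

Under this simplified model, the loss from collisions grows with the number of molecules per gene per cell, which is consistent with the effect mattering mainly for very high-complexity libraries.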
If you have any questions or concerns, please contact support@10xgenomics.com.
Products: All