Question: Why do I see a FASTQ header mismatch error when running Cell Ranger or Space Ranger pipelines?
Answer:
A FASTQ header mismatch error means that the headers of the reads in the Read 1 and Read 2 FASTQ files do not match. When this happens, the Ranger pipelines fail with an error message similar to the one below:
FASTQ header mismatch detected at line 8 of input files "/mnt/users/sample_S1_L001_R1_001.fastq.gz" and "/mnt/users/sample_S1_L001_R2_001.fastq.gz": file: "/mnt/users/sample_S1_L001_R1_001.fastq.gz", line: 8
You can confirm the FASTQ header mismatch using either of the methods below:
Method 1: Go to the line number reported in the error message and look for inconsistencies between the headers of the corresponding records in the two files.
For example, the command below decompresses the Read 1 FASTQ file and prints the header (the line starting with '@') of the record that contains line 8. Because each FASTQ record spans four lines, the header of the record containing line N is line 4*int((N-1)/4)+1; for line 8, this is line 5. Repeat the command with the Read 2 FASTQ file and compare the two headers. Change the line_num value to the line number reported in the error message.
gzip -dc sample_S1_L001_R1_001.fastq.gz | awk -v line_num=8 'BEGIN { header_num = int((line_num-1)/4)*4 + 1 } NR == header_num { print; exit }'
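The command in Method 1 checks only one record. To scan both files for the first pair of headers that disagree, you could use a minimal bash sketch like the one below. The function name first_header_mismatch is hypothetical and is not part of Cell Ranger or Space Ranger. Its comparison rule is also an assumption: it compares only the read name (the text before the first space), which may differ from the exact rule the Ranger pipelines apply. For example, reads named with /1 and /2 suffixes would be reported as mismatches by this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Cell Ranger or Space Ranger): report the
# first record whose read names differ between a Read 1 and a Read 2 FASTQ.
first_header_mismatch() {
  local r1="$1" r2="$2"
  # Take the header line (line 1 of each 4-line record) from both files and
  # put the two headers side by side, separated by a tab.
  paste <(gzip -dc "$r1" | awk 'NR % 4 == 1') \
        <(gzip -dc "$r2" | awk 'NR % 4 == 1') |
    awk -F'\t' '
      {
        # Assumption: compare only the read name (text before the first space);
        # the rest of the header may legitimately differ, e.g. "1:N:0" vs "2:N:0".
        split($1, a, " "); split($2, b, " ")
        if (a[1] != b[1]) {
          printf "record %d (line %d): R1=%s R2=%s\n", NR, (NR - 1) * 4 + 1, $1, $2
          found = 1
          exit 1  # non-zero status so callers can detect the mismatch
        }
      }
      END { if (!found) print "no header mismatch found" }'
}
```

Usage: first_header_mismatch sample_S1_L001_R1_001.fastq.gz sample_S1_L001_R2_001.fastq.gz. If one file runs out of records before the other, the missing header is printed as empty.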
Method 2: Count the number of records in each file and compare the results.
For example, the command below decompresses the Read 1 FASTQ file, counts its lines, and divides the total by 4, because each FASTQ record consists of 4 lines. Repeat the command with the Read 2 FASTQ file and compare the two record counts. Different counts, or a count that is not a whole number, indicate a problem.
gzip -dc sample_S1_L001_R1_001.fastq.gz | awk 'END {print NR/4}'
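You can also wrap Method 2 in a small bash function that counts both files in one call and flags any line count that is not a multiple of 4, which often points to a truncated file. This is a minimal sketch: compare_record_counts is a hypothetical name, and both files are assumed to be gzip-compressed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Cell Ranger or Space Ranger): compare the
# number of records in a Read 1 and a Read 2 gzipped FASTQ file.
compare_record_counts() {
  local r1="$1" r2="$2" n1 n2
  # Decompress first: awk run directly on a .gz file reads compressed bytes,
  # so its line count would be meaningless.
  n1=$(gzip -dc "$r1" | awk 'END { print NR }')
  n2=$(gzip -dc "$r2" | awk 'END { print NR }')
  # A complete FASTQ file has exactly 4 lines per record.
  if [ $(( n1 % 4 )) -ne 0 ]; then echo "warning: $r1 has $n1 lines, not a multiple of 4"; fi
  if [ $(( n2 % 4 )) -ne 0 ]; then echo "warning: $r2 has $n2 lines, not a multiple of 4"; fi
  echo "R1 records: $(( n1 / 4 )), R2 records: $(( n2 / 4 ))"
  # Compare total line counts, which also catches partial trailing records.
  if [ "$n1" -eq "$n2" ]; then
    echo "record counts match"
  else
    echo "record counts differ"
    return 1
  fi
}
```

Comparing total line counts rather than rounded record counts means a file that ends partway through a record is still reported as different.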
Common causes of this error include:
a. Mismatched Read 1 and Read 2 FASTQ files. Verify that both input files come from the same experiment and that they have not been mixed up.
b. The FASTQ files downloaded from the original source are incomplete. This can be resolved using the suggestions in the knowledge base article, IO error in FASTQ files.
c. FASTQ files from different sequencing instruments were merged, producing mismatched header formats. Check whether the header formats are compatible before merging; custom scripts or third-party tools such as seqtk can help.
d. In rare cases, the sample sheet used for demultiplexing is inaccurate. Check the metadata in the sample sheet for errors.
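For causes b and c, a quick per-file check could look like the minimal bash sketch below. It assumes gzip-compressed FASTQ files. The function check_fastq and its notion of a header "shape" are hypothetical and intended only as a first look; gzip -t is gzip's standard integrity test. Run it on both the Read 1 and Read 2 files. A file that fails the integrity test, or that shows more than one header shape, may be worth investigating further.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Cell Ranger or Space Ranger): two quick
# checks on one gzipped FASTQ file, related to causes b and c.
check_fastq() {
  local fq="$1"
  # Cause b: a truncated or corrupted .gz file normally fails gzip's
  # built-in integrity test (gzip -t).
  if gzip -t "$fq" 2>/dev/null; then
    echo "integrity: OK"
  else
    echo "integrity: FAILED (file may be truncated or corrupted)"
  fi
  # Cause c: tally header "shapes", i.e. the number of colon-separated fields
  # in the read name and the number of space-separated words in the header.
  # More than one shape may mean reads with different header formats were
  # merged into this file.
  gzip -dc "$fq" 2>/dev/null |
    awk 'NR % 4 == 1 {
           shape = split($1, f, ":") " name fields, " NF " words"
           count[shape]++
         }
         END { for (s in count) print s ": " count[s] " headers" }' |
    sort
}
```

Usage: check_fastq sample_S1_L001_R1_001.fastq.gz, then repeat with the Read 2 file.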
Products: All