Question: Why did cellranger count fail in the CHUNK_READS stage?
Answer: Corrupt or incomplete FASTQ files are a common cause of pipeline failure in cellranger count. Cell Ranger typically reports a 'process called error' at the CHUNK_READS stage, with the chunk_reads command returning non-zero exit status 1. The errors and stdout files of the failed job chunk contain messages like the following:
[errors]
...
CalledProcessError: Command '['chunk_reads', '--reads-per-fastq', '5000000', '/pipestance_id/SC_RNA_COUNTER_CS/SC_RNA_COUNTER/_BASIC_SC_RNA_COUNTER/CHUNK_READS/fork0/chnkX-XXXX/files/', 'fastq_chunk', '--martian-args', 'chunk_args.json', '--compress', 'lz4']' returned non-zero exit status 1
[stdout]
...
error: fastq parsing error
caused by: corrupt deflate stream
running chunk reads: [['chunk_reads', '--reads-per-fastq', '5000000', '/pipestance/SC_RNA_COUNTER_CS/SC_RNA_COUNTER/_BASIC_SC_RNA_COUNTER/CHUNK_READS/fork0/chnkX-XXXX/files/', 'fastq_chunk', '--martian-args', 'chunk_args.json', '--compress', 'lz4']]
Some of the causes are explained as follows:
(1) If a corrupt gzip file causes the error, the stderr log of the CHUNK_READS stage shows the following error lines:
error: corrupt gzip stream does not have a matching checksum
caused by: corrupt gzip stream does not have a matching checksum
In such cases, users are advised to verify that the input FASTQ files were downloaded completely from the source, for example by comparing file checksums computed with a tool such as md5sum.
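The checksum comparison can also be scripted. The following is a minimal Python sketch, not part of Cell Ranger: the function names md5_of_file and matches_expected are hypothetical, and it assumes the data provider published an MD5 digest for each FASTQ file.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so that large FASTQ files are never held in memory at once.
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_expected(path, expected_md5):
    """Compare a file's MD5 digest with the digest published by the source."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

A result of False from matches_expected suggests that the local copy differs from the source file and should be downloaded or transferred again.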
(2) If an input file is corrupt in some other way, the stderr log of the CHUNK_READS stage shows the following error lines:
error: corrupt deflate stream
caused by: corrupt deflate stream
In such cases, users are advised to verify that the input FASTQ files follow the proper four-line record format illustrated below:
@<identifier and expected information>
<sequence>
+<identifier and other information OR empty string>
<quality>
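The format check can be approximated with a short script. The following is a minimal Python sketch under stated assumptions, not the parser that Cell Ranger uses: the function name check_fastq is hypothetical, each record is assumed to occupy exactly four lines, and files ending in .gz are assumed to be gzip-compressed. Reading a compressed file to the end also surfaces truncated or corrupt gzip streams.

```python
import gzip
import zlib


def check_fastq(path):
    """Return (records_read, problem); problem is None if no issue was found."""
    opener = gzip.open if path.endswith(".gz") else open
    count = 0
    try:
        with opener(path, "rt") as handle:
            while True:
                header = handle.readline()
                if not header:
                    return count, None  # clean end of file
                seq = handle.readline().rstrip("\r\n")
                plus = handle.readline()
                qual = handle.readline().rstrip("\r\n")
                if not header.startswith("@"):
                    return count, f"record {count + 1}: header does not start with '@'"
                if not plus.startswith("+"):
                    return count, f"record {count + 1}: third line does not start with '+'"
                if len(seq) != len(qual):
                    return count, f"record {count + 1}: sequence and quality lengths differ"
                count += 1
    except (OSError, EOFError, zlib.error, UnicodeDecodeError) as exc:
        # Truncated or corrupt compressed data is reported here.
        return count, f"read error after {count} records: {exc}"
```

Calling check_fastq on each input file returns the number of records read before the first problem, which helps locate the damaged region of a file.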