Question: Why do I see an IO error in FASTQ files when running Cell Ranger or Space Ranger pipelines?
An IO error in a FASTQ file means that the input FASTQ files provided to the pipeline were incomplete or corrupt. In that case, any Cell Ranger or Space Ranger pipestance fails with an error like the following:
Log message: IO error in FASTQ file '"/mnt/users/PBMC5k_S1_L001_R1_001.fastq.gz"', line: 264305748: unexpected end of file
Common causes of this error are:
a. The FASTQs downloaded from the original source were incomplete or corrupt.
b. The FASTQs uploaded to the server were incomplete.
c. The FASTQ generation was not successful.
For causes a and b, you can confirm the problem with one of the methods below:
Method 1: Check the file size and checksum. An unusually small file, or a size or hash value that differs from the original, indicates an incomplete file. For example,
# md5sum command to compute the hash value of the downloaded file, to compare with the hash value of the original file
md5sum sample_S1_L001_R1_001.fastq.gz
# ls command to check the file size of the original and the downloaded files
ls -lh sample_S1_L001_R1_001.fastq.gz
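When both the original and the downloaded copy are reachable from the same machine, the size and checksum comparison can be scripted. The following is a minimal sketch, not an official 10x Genomics tool: the function name compare_fastq_copies is hypothetical, and it assumes md5sum (GNU coreutils) is installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare the byte size and MD5 hash of two copies of a file.
# Assumes md5sum (GNU coreutils) is available on this machine.
compare_fastq_copies() {
    local original="$1" copy="$2"
    local size_a size_b sum_a sum_b
    # wc -c reports the size in bytes; tr strips padding that some platforms add.
    size_a=$(wc -c < "$original" | tr -d ' ')
    size_b=$(wc -c < "$copy" | tr -d ' ')
    # Reading from stdin keeps the file name out of the md5sum output.
    sum_a=$(md5sum < "$original" | awk '{print $1}')
    sum_b=$(md5sum < "$copy" | awk '{print $1}')
    if [ "$size_a" -eq "$size_b" ] && [ "$sum_a" = "$sum_b" ]; then
        echo "MATCH"
    else
        echo "MISMATCH"
        return 1
    fi
}
```

A call such as compare_fastq_copies /path/to/original.fastq.gz /path/to/downloaded.fastq.gz (both paths are placeholders) prints MATCH or MISMATCH. If the original lives on another machine, run md5sum there and compare the two hash strings instead.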
Alternatively, you can use the zcat utility to check the Read 1 and Read 2 files for corruption at the end of the file. For example,
zcat sample_S1_L001_R1_001.fastq.gz | tail
zcat sample_S1_L001_R2_001.fastq.gz | tail
This command prints the last few lines of a compressed FASTQ file so that you can check whether the final record is complete. It can also be used on macOS. If zcat is not available in the macOS terminal, it can be installed using a package manager such as Homebrew; gzip -dc is an equivalent command.
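Because each FASTQ record spans four lines, a quick complement to inspecting the tail by eye is to count the lines and confirm the total is a multiple of four. The sketch below assumes plain four-line records; the function name count_fastq_reads is hypothetical, and gzip -dc is used as an equivalent of zcat. A file truncated exactly on a record boundary would still pass, so treat this as a first check only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of reads in a gzipped FASTQ file,
# or report INCOMPLETE when the line count is not a multiple of four.
count_fastq_reads() {
    local fq="$1" lines
    # gzip -dc decompresses to stdout (same as zcat); errors are silenced
    # here because a truncated file still yields its readable lines.
    lines=$(gzip -dc "$fq" 2>/dev/null | wc -l | tr -d ' ')
    if [ $((lines % 4)) -ne 0 ]; then
        echo "INCOMPLETE $lines lines"
        return 1
    fi
    echo $((lines / 4))
}
```

Running the function on the Read 1 and Read 2 files of the same lane should print the same read count for both; a difference suggests that one of the files is incomplete.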
Method 2: Check the file integrity. If your FASTQ files are compressed with gzip, the command below tests the integrity of the compressed file and reports any errors found. It prints nothing when the file is intact. For example,
gzip -t sample_S1_L001_R2_001.fastq.gz
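To test every FASTQ file in a run at once, the same gzip -t check can be wrapped in a loop. This is a sketch under the assumption that all files sit in one directory and end in .fastq.gz; the function name check_gzip_dir is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run gzip -t on every *.fastq.gz file in a directory
# and print OK or CORRUPT for each one.
check_gzip_dir() {
    local dir="$1" status=0 fq
    for fq in "$dir"/*.fastq.gz; do
        # Skip the unexpanded pattern when the directory has no matches.
        [ -e "$fq" ] || continue
        if gzip -t "$fq" 2>/dev/null; then
            echo "OK $fq"
        else
            echo "CORRUPT $fq"
            status=1
        fi
    done
    # Non-zero exit status if at least one file failed the test.
    return "$status"
}
```

Calling check_gzip_dir with the FASTQ directory as its only argument lists each file with its result, and any file marked CORRUPT should be downloaded or transferred again.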
Method 3: Use FastQC to generate a quality report. This gives you an overview of the quality and completeness of your files and flags potential problems. Please follow this KB article for more details.
For cause c, ask your sequencing core to confirm that demultiplexing was successful. If you performed the demultiplexing yourself, the log files should show that the pipestance completed successfully.
Note: If you encounter this error while running an analysis on 10x Genomics Cloud Analysis, it is also possible that the input files were not uploaded successfully, in addition to the causes listed above. For more information, see the knowledge base article linked here: Why am I having trouble uploading my FASTQ files on 10x Genomics Cloud Analysis?