Question: How do I prepare Sequence Read Archive (SRA) data from NCBI for Cell Ranger?
Answer: One of the benefits of publicly available data in the sequencing age is the ability to reanalyze data generated by other researchers. The primary source of these public data sets in the United States is the Sequence Read Archive (SRA) maintained by NCBI. Using these data in Cell Ranger requires some pre-processing. Before downloading SRA data, identify the platform and the chemistry version used to generate the data. The following procedure has been tested on Chromium v2 and v3 chemistry.
First, use the NCBI fastq-dump utility with the --split-files argument to retrieve the FASTQ files. The command may look like this:
fastq-dump --split-files SRR6334436
The output would be two FASTQ files:
SRR6334436_1.fastq
SRR6334436_2.fastq
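Since the chemistry version matters, it can help to check read lengths before going further. The sketch below is not part of the original procedure: first_read_length is a hypothetical helper that prints the length of the first read in a FASTQ file. As background not stated above, Read 1 carries the 16 bp cell barcode plus the UMI (10 bp in v2, 12 bp in v3), so a file whose reads are about 26 or 28 bp long is most likely Read 1. Submitters sometimes trim or extend reads, so treat the result as a hint rather than proof.

```shell
# Hypothetical helper (not an NCBI or 10x tool): print the length of the
# first read in an uncompressed FASTQ file. In FASTQ, the sequence of the
# first record is on line 2.
first_read_length() {
    awk 'NR == 2 { print length($0); exit }' "$1"
}

# Example calls, assuming the fastq-dump output sits in the current directory:
#   first_read_length SRR6334436_1.fastq
#   first_read_length SRR6334436_2.fastq
```

If the shorter read turns out to be in the _2 file, the Read 1 and Read 2 labels need to be swapped when the files are renamed.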
Cell Ranger requires FASTQ file names to follow the bcl2fastq file naming convention:
[Sample Name]_S1_L00[Lane Number]_[Read Type]_001.fastq.gz
where Read Type is one of:
I1: Sample index read (optional)
R1: Read 1
R2: Read 2
Renaming the files to this convention allows Cell Ranger (version 2.1.1 or later) to accept these data as input.
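The renaming step could be scripted along the following lines. This is a minimal sketch under stated assumptions, not an official 10x Genomics script: rename_sra_fastqs is a hypothetical helper, the _1 file is assumed to be Read 1 and the _2 file Read 2, the lane is arbitrarily set to L001, and the files are gzip-compressed so the names end in .fastq.gz.

```shell
# Hypothetical helper: rename fastq-dump output to bcl2fastq-style names.
# Assumptions: <accession>_1.fastq is Read 1, <accession>_2.fastq is Read 2,
# and the lane number is set to 1 (L001) because SRA does not record it.
rename_sra_fastqs() {
    local acc="$1"        # SRA run accession, e.g. SRR6334436
    local dir="${2:-.}"   # directory holding the fastq-dump output
    local n src dest
    for n in 1 2; do
        src="${dir}/${acc}_${n}.fastq"
        dest="${dir}/${acc}_S1_L001_R${n}_001.fastq"
        if [ ! -f "$src" ]; then
            echo "missing input: $src" >&2
            return 1
        fi
        # Rename first, then compress; gzip appends the .gz suffix.
        mv -- "$src" "$dest" && gzip -- "$dest" || return 1
    done
}

# Example call, run from the directory containing the downloaded files:
#   rename_sra_fastqs SRR6334436
```

With this naming, the sample name that Cell Ranger sees is the run accession (SRR6334436 in the example).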
For more information on FASTQ format requirements, please see Specifying FASTQ files.