Question: How do I prepare Sequence Read Archive (SRA) data from NCBI for Cell Ranger?
Answer: One of the benefits of open data in the sequencing age is the ability to reanalyze data generated by other researchers. The primary source of these publicly available data sets in the United States is the Sequence Read Archive (SRA), maintained by NCBI. Using these data in Cell Ranger requires some pre-processing. Before downloading SRA data, first identify the platform and the chemistry version used to generate the data. The following procedure has only been tested on Chromium v2 chemistry.
Use the NCBI fastq-dump utility with the --split-files argument to retrieve the FASTQ files. The command may look like this, with [accession] standing in for the SRA run accession:
fastq-dump --split-files [accession]
The output would be two FASTQ files, named after the run accession (shown here as [accession]):
[accession]_1.fastq
[accession]_2.fastq
Cell Ranger requires FASTQ file names to follow the bcl2fastq file naming convention:
[Sample Name]_S1_L00[Lane Number]_[Read Type]_001.fastq.gz
Where Read Type is one of:
I1: Sample index read (optional)
R1: Read 1
R2: Read 2
Renaming the files to match this convention allows Cell Ranger (version 2.1.1 or later) to accept these data as input.
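The renaming step can be sketched as a small shell helper. This is a minimal sketch under stated assumptions, not a tested recipe. It assumes that the _1 file holds Read 1 and the _2 file holds Read 2, that a single lane numbered 1 is acceptable, and that the accession doubles as the sample name. The function names bcl2fastq_name and rename_sra_pair are hypothetical, introduced only for illustration. Check the read assignment before renaming: in Chromium v2 libraries Read 1 is normally the short 26-base barcode and UMI read, so comparing read lengths in the two files is a quick sanity check.

```shell
#!/usr/bin/env bash
# Sketch: turn fastq-dump --split-files output into bcl2fastq-style names.
# ASSUMPTION: <accession>_1.fastq is Read 1 and <accession>_2.fastq is Read 2.

# Print the bcl2fastq-style file name for a sample name, lane number, and read type.
bcl2fastq_name() {
    local sample="$1" lane="$2" read_type="$3"
    printf '%s_S1_L00%s_%s_001.fastq.gz\n' "$sample" "$lane" "$read_type"
}

# Compress and rename both reads of one run in the current directory.
# Usage: rename_sra_pair ACCESSION [LANE]   (LANE defaults to 1)
rename_sra_pair() {
    local accession="$1" lane="${2:-1}"
    local n src dst

    # Check both inputs first so a missing file never leaves a half-renamed pair.
    for n in 1 2; do
        src="${accession}_${n}.fastq"
        if [ ! -f "$src" ]; then
            echo "missing input file: $src" >&2
            return 1
        fi
    done

    for n in 1 2; do
        src="${accession}_${n}.fastq"
        dst="$(bcl2fastq_name "$accession" "$lane" "R${n}")"
        # Write the gzip-compressed copy under the new name, then drop the original.
        gzip -c -- "$src" > "$dst" && rm -- "$src"
    done
}
```

After sourcing the script in the directory that holds the downloaded files, calling rename_sra_pair with the run accession leaves two files named [accession]_S1_L001_R1_001.fastq.gz and [accession]_S1_L001_R2_001.fastq.gz. If the submitter also deposited the sample index read, fastq-dump may write more than two files and the mapping assumed here would need to be adjusted by hand.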
For more information on FASTQ format requirements, please see Specifying FASTQ files.