Question:
I want to use a 10x Genomics BAM file with gene expression and Feature Barcoding data to generate FASTQs. When I use the 10x Genomics bamtofastq tool, it outputs multiple folders labeled with numbers. How do I know which FASTQs are associated with which library type?
Answer:
When analyzing gene expression data with 10x Genomics Feature Barcoding technology, Cell Ranger outputs one combined BAM file that contains reads from all libraries generated for a given sample. You can convert this BAM file back into FASTQ files using the 10x Genomics bamtofastq tool. Depending on the experimental design of that run, bamtofastq may produce two or more folders of FASTQ files. The folder names typically follow this format:
[sample_name]_[library_id]_[gem_group]_[flowcell_id]
For example:
sample1_0_1_H7MHGDSXY
sample1_1_1_H7MHGDSXY
If this BAM file was generated from a cell multiplexing experiment with a gene expression assay, you will see two folders of FASTQs. If, on top of that, a cell surface protein or CRISPR screening library was also prepared for this sample, then you may see three folders of FASTQs.
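Because the library_id is embedded in each folder name, it can be pulled out with plain shell string handling. The following is a minimal sketch, not part of bamtofastq: the function name folder_library_id is hypothetical, and it assumes that the gem group and flowcell ID contain no underscores, so the fields are read from the right and sample names with underscores still work.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of bamtofastq): print the library_id field of a
# folder named [sample_name]_[library_id]_[gem_group]_[flowcell_id].
# Assumption: gem_group and flowcell_id never contain underscores.
folder_library_id() {
  local name rest
  name=$(basename "$1")          # strip any leading directories
  rest=${name%_*}                # drop the trailing _[flowcell_id]
  rest=${rest%_*}                # drop the trailing _[gem_group]
  printf '%s\n' "${rest##*_}"    # keep what follows the last remaining underscore
}

folder_library_id sample1_0_1_H7MHGDSXY   # prints 0
folder_library_id sample1_1_1_H7MHGDSXY   # prints 1
```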
To determine which folder of FASTQ files represents data from which library, please follow the steps below.
1. You will need samtools, a copy of which can be found in your Cell Ranger installation. Source the following file so that your shell knows where to find samtools. Be sure to replace /PATH/TO/CellRanger below with the location where Cell Ranger is installed:
source /PATH/TO/CellRanger/sourceme.bash
2. Run samtools view -H on the original BAM file. You will find lines with a @CO tag. For example:
@CO library_info:{"library_id":0,"library_type":"Multiplexing Capture","gem_group":1,"target_set_name":null}
@CO library_info:{"library_id":1,"library_type":"Gene Expression","gem_group":1,"target_set_name":null}
From this example, we see that library_id 0 corresponds to the multiplexing library and library_id 1 corresponds to the gene expression library. Thus, you can tell that sample1_0_1_H7MHGDSXY contains cell multiplexing library FASTQ files and sample1_1_1_H7MHGDSXY contains gene expression library FASTQ files.
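If the header is long, the library_info entries can be filtered out and reduced to an ID and a library type. This is a rough sketch under stated assumptions: the function name list_libraries is hypothetical, the @CO lines are assumed to look like the example above (with library_id appearing before library_type), and your.bam in the comment is a placeholder for your own BAM file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a BAM header on stdin and print
# "library_id: library_type" for every @CO library_info line.
# Assumption: "library_id" precedes "library_type" on each line, as in the example.
list_libraries() {
  sed -nE 's/^@CO[[:space:]]+library_info:.*"library_id":([0-9]+).*"library_type":"([^"]*)".*/\1: \2/p'
}

# With a real file (placeholder name), the header would be piped in like this:
#   samtools view -H your.bam | list_libraries

# Demonstration using the two header lines from the example above:
printf '@CO\tlibrary_info:%s\n' \
  '{"library_id":0,"library_type":"Multiplexing Capture","gem_group":1,"target_set_name":null}' \
  '{"library_id":1,"library_type":"Gene Expression","gem_group":1,"target_set_name":null}' |
  list_libraries
# prints:
# 0: Multiplexing Capture
# 1: Gene Expression
```

The number before the colon is the library_id to look for in the second field from the left of the library-specific part of each folder name.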
Last updated: June 2023