Question: Is there a way to filter the BAM file produced by 10x pipelines, so that it only contains alignments from a list of barcodes?
Answer: There are times when it is desirable to focus on alignments from a small subset of barcodes. For example, one may export a list of barcodes that belong to a cluster of interest from Loupe Browser, or obtain a set of barcodes that express a gene of interest from the feature-barcode matrix.
There are 2 ways to do this:
1. Go to https://github.com/10XGenomics/subset-bam/releases to download a pre-compiled binary of subset-bam for Linux or macOS. The page also documents how to run the tool.
2. Filter the BAM file with samtools and grep, following the steps below.
First, put the desired barcodes in filter.txt, one per line. For cellranger-dna, it is recommended to prefix each barcode with "CB:Z:" so that the filter applies exclusively to that tag in the BAM file. For longranger, please use "BX:Z:" instead.
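As an illustration of building filter.txt, the sketch below adds the tag prefix to a plain list of barcodes. The file name barcodes.txt, the temporary directory, and the two barcode strings are hypothetical placeholders, not values from a real run.

```shell
# Hypothetical input: one barcode per line, e.g. exported from Loupe Browser.
workdir=$(mktemp -d)
printf 'AAACCTGAGAAACCAT-1\nAAACCTGAGAAACCGC-1\n' > "$workdir/barcodes.txt"

# Prefix every barcode with the tag name and type so that grep only
# matches the CB tag. For longranger, use BX:Z: instead of CB:Z:.
sed 's/^/CB:Z:/' "$workdir/barcodes.txt" > "$workdir/filter.txt"

# Show the resulting patterns.
cat "$workdir/filter.txt"
```

If the barcode list was saved on Windows, it may be worth checking that it has no carriage returns at line ends, since grep would treat them as part of each pattern.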
Second, set $BAM_FILE to the name of the BAM file you wish to filter, for example 'possorted_genome_bam.bam'.
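A minimal sketch of this step, assuming a bash-compatible shell and the example file name given above:

```shell
# Point BAM_FILE at the BAM file to be filtered.
export BAM_FILE='possorted_genome_bam.bam'

# Print the value to confirm it is set as intended.
echo "$BAM_FILE"
```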
Next, set up the environment so the shell can find samtools. If your 10x pipeline is installed at $10X_PATH, you should type the following:
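One possible way to do this is sketched here. It assumes the pipeline installation provides an environment script named sourceme.bash at its top level, which is an assumption and should be checked against your installation. It also uses a hypothetical variable, TENX_PATH, because a shell reads $10X_PATH as $1 followed by the literal text 0X_PATH.

```shell
# TENX_PATH is a hypothetical stand-in for the install directory of your 10x pipeline.
TENX_PATH="${TENX_PATH:-$HOME/10x-pipeline}"

# Assumption: the installation ships sourceme.bash; skip it if the file is absent.
if [ -f "$TENX_PATH/sourceme.bash" ]; then
    . "$TENX_PATH/sourceme.bash"
fi

# Confirm that the shell can find samtools before going further.
if command -v samtools >/dev/null 2>&1; then
    samtools_status="found"
else
    samtools_status="missing"
fi
echo "samtools: $samtools_status"
```

If the status is "missing", any other samtools installation on the PATH should work for the steps that follow.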
Then copy and paste the entire code block at once into a bash shell and hit ENTER:
# Save the header lines
samtools view -H "$BAM_FILE" > SAM_header
# Filter alignments using filter.txt. Use LC_ALL=C to set C locale instead of UTF-8
samtools view "$BAM_FILE" | LC_ALL=C grep -F -f filter.txt > filtered_SAM_body
# Combine header and body
cat SAM_header filtered_SAM_body > filtered.sam
# Convert filtered.sam to BAM format
samtools view -b filtered.sam > filtered.bam
The file filtered.bam will contain only the alignments from the list of desired barcodes.
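To check the result, one option is to list the barcodes that survive filtering and compare them with filter.txt. The sketch below runs the same grep call on two toy alignment records, so it works without samtools. The helper name list_barcodes, the read names, and the barcodes are all hypothetical, and the CB tag is assumed (use BX for longranger).

```shell
# Print the distinct CB tags found in SAM records read from stdin.
list_barcodes() {
    grep -o 'CB:Z:[^[:space:]]*' | LC_ALL=C sort -u
}

demo=$(mktemp -d)
printf 'CB:Z:AAACCTGAGAAACCAT-1\n' > "$demo/filter.txt"

# Two toy tab-separated SAM records; only read1 carries the wanted barcode.
printf 'read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tFFFF\tCB:Z:AAACCTGAGAAACCAT-1\n' > "$demo/body.sam"
printf 'read2\t0\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tFFFF\tCB:Z:AAACCTGAGAAACCGC-1\n' >> "$demo/body.sam"

# Same grep call as in the filtering step, applied to the toy records.
LC_ALL=C grep -F -f "$demo/filter.txt" "$demo/body.sam" > "$demo/filtered_body.sam"

# Barcodes present after filtering that are not listed in filter.txt.
# An empty result means every surviving record has a requested barcode.
unexpected=$(list_barcodes < "$demo/filtered_body.sam" | LC_ALL=C grep -v -x -F -f "$demo/filter.txt" || true)
echo "unexpected barcodes: ${unexpected:-none}"
```

On real data, the same helper can be fed from the filtered file with `samtools view filtered.bam | list_barcodes`. Because grep -F matches substrings, this whole-line comparison also flags cases where a pattern matched a longer barcode string than intended.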
Disclaimer: This article and code-snippet are provided for instructional purposes only. 10x Genomics does not support or guarantee the code.