Question: Is there a way to filter the BAM file produced by 10x pipelines so that it only contains alignments from a list of barcodes?
Answer: Sometimes it is useful to focus on the alignments from a small subset of barcodes. For example, one may export the barcodes that belong to a cluster of interest from Loupe Browser, or obtain the set of barcodes that express a gene of interest from the feature-barcode matrix.
There are two ways to do this:
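For the second case, a barcode list can be pulled straight from the matrix files. The sketch below is a minimal illustration, not a 10x tool: it assumes the Cell Ranger 3+ MEX layout (features.tsv.gz with the gene symbol in column 2, barcodes.tsv.gz, and matrix.mtx.gz with features as rows and barcodes as columns), and the function name barcodes_expressing_gene is hypothetical. Cell Ranger 2 matrices use an uncompressed genes.tsv instead, so the paths would need adjusting.

```shell
# Sketch: list the barcodes with a non-zero count for one gene in a Cell Ranger
# MEX-format matrix directory (for example filtered_feature_bc_matrix/).
# Assumes the Cell Ranger 3+ layout (features.tsv.gz, barcodes.tsv.gz,
# matrix.mtx.gz); barcodes_expressing_gene is a hypothetical helper name.
barcodes_expressing_gene() {
    local dir="$1" gene="$2" rows cols
    # 1-based row numbers of every feature whose name (column 2) is the gene
    rows=$(gzip -cd "$dir/features.tsv.gz" |
        awk -F '\t' -v g="$gene" '$2 == g { print NR }' | paste -sd ',' -)
    if [ -z "$rows" ]; then
        echo "gene not found: $gene" >&2
        return 1
    fi
    # Matrix Market body lines are "row column value"; lines starting with %
    # are comments, and the first other line holds the matrix dimensions.
    cols=$(gzip -cd "$dir/matrix.mtx.gz" |
        awk -v rows="$rows" '
            BEGIN { n = split(rows, r, ","); for (i = 1; i <= n; i++) want[r[i]] = 1 }
            /^%/ { next }
            !dims { dims = 1; next }
            ($1 in want) && $3 > 0 { print $2 }
        ' | paste -sd ',' -)
    # Column N of the matrix corresponds to line N of barcodes.tsv.gz
    gzip -cd "$dir/barcodes.tsv.gz" |
        awk -v cols="$cols" '
            BEGIN { n = split(cols, c, ","); for (i = 1; i <= n; i++) keep[c[i]] = 1 }
            NR in keep
        '
}
```

For example, barcodes_expressing_gene filtered_feature_bc_matrix CD3E > barcodes.txt would write one barcode per line, in the order they appear in barcodes.tsv.gz.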
Method 1:
Go to https://github.com/10XGenomics/subset-bam to download a pre-compiled subset-bam binary for Linux or macOS. The page also documents how to run the tool.
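subset-bam takes a plain text file of barcodes, one per line and without a tag prefix. If the barcodes come from a Loupe Browser CSV export, a small sketch like the one below can extract one group. It assumes the export has a header row followed by "Barcode,<group>" rows with no quoted commas; the function name loupe_csv_to_barcodes is hypothetical.

```shell
# Sketch: pull the barcodes of one group out of a CSV exported from Loupe Browser.
# Assumes a header row and "Barcode,<group>" rows with no quoted commas;
# loupe_csv_to_barcodes is a hypothetical helper name.
loupe_csv_to_barcodes() {
    local csv="$1" group="$2"
    # tr strips Windows CR characters; awk skips the header and empty barcodes
    tr -d '\r' < "$csv" |
        awk -F ',' -v g="$group" 'NR > 1 && $1 != "" && $2 == g { print $1 }'
}
```

For example, loupe_csv_to_barcodes clusters.csv "Cluster 1" > barcodes.txt, after which the README on the page above shows an invocation of the form subset-bam --bam possorted_genome_bam.bam --cell-barcodes barcodes.txt --out-bam subset.bam.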
Method 2:
First, put the desired barcodes in filter.txt, one per line. For cellranger, cellranger-atac, and cellranger-dna, it is recommended to prefix each barcode with "CB:Z:" so the filter applies only to that tag in the BAM file. For longranger, use "BX:Z:" instead.
Please note that filter.txt must contain NO empty lines (for example, an extra blank line at the end of the file). grep reads an empty line in the pattern file as an empty pattern, which matches every alignment, so the filter below would have no effect.
CB:Z:GCCAAATTCACATACG-1
CB:Z:GGACATTGTGATGATA-1
CB:Z:TCAGGTACATTAGGCT-1
...
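If the barcodes start out as a plain list (for example, exported from Loupe Browser), a sketch like the following can build filter.txt with the tag prefix and without empty lines. The function name make_filter_file and its argument order are hypothetical; the prefix defaults to CB:Z: and can be set to BX:Z: for longranger.

```shell
# Sketch: turn a plain list of barcodes (one per line) into filter.txt.
# The tag prefix defaults to CB:Z: (cellranger, cellranger-atac, cellranger-dna);
# pass BX:Z: for longranger. make_filter_file is a hypothetical helper name.
make_filter_file() {
    local barcodes="$1" out="$2" tag="${3:-CB:Z:}"
    # Remove CR characters and surrounding whitespace, drop empty lines (an empty
    # pattern would make grep -F match every alignment), then add the tag prefix.
    tr -d '\r' < "$barcodes" |
        awk -v tag="$tag" '{ gsub(/^[ \t]+|[ \t]+$/, "") } $0 != "" { print tag $0 }' > "$out"
}
```

For example, make_filter_file barcodes.txt filter.txt for Cell Ranger output, or make_filter_file barcodes.txt filter.txt BX:Z: for Long Ranger output.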
Second, set $BAM_FILE to the name of the BAM file you wish to filter. For example, if the BAM file is named 'possorted_genome_bam.bam', use:
export BAM_FILE='possorted_genome_bam.bam'
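Next, set up the environment so the shell can find samtools. If the directory where your 10x pipeline is installed is stored in $TENX_PATH (any name works, but shell variable names cannot start with a digit, so $10X_PATH would not expand as intended), type the following:
source "$TENX_PATH/sourceme.bash"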
Then copy and paste the entire code block at once into a bash shell and hit ENTER:
# Save the header lines
samtools view -H "$BAM_FILE" > SAM_header
# Filter alignments using filter.txt. LC_ALL=C makes grep use the C locale instead of UTF-8, which is much faster
samtools view "$BAM_FILE" | LC_ALL=C grep -F -f filter.txt > filtered_SAM_body
# Combine header and body
cat SAM_header filtered_SAM_body > filtered.sam
# Convert filtered.sam to BAM format
samtools view -b filtered.sam > filtered.bam
The file filtered.bam will only contain alignments from the list of desired barcodes.
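grep -F matches each pattern as a substring anywhere in the line, so CB:Z:GCCAAATTCACATACG-1 would also match a tag value such as CB:Z:GCCAAATTCACATACG-10 if the BAM ever contains multi-digit barcode suffixes. As an alternative to the grep step (not part of the method above), the sketch below swaps in an awk lookup that compares the whole tag value. The function name exact_tag_filter is hypothetical; it reads SAM text with header on stdin and keeps the header lines.

```shell
# Sketch: keep SAM header lines plus alignments whose CB tag is exactly one of
# the entries in the filter file (lines of the form CB:Z:<barcode>).
# exact_tag_filter is a hypothetical helper name; it reads SAM text on stdin.
exact_tag_filter() {
    local filter="$1" tag="${2:-CB}"
    # An empty filter file would break the two-file awk idiom below
    [ -s "$filter" ] || return 1
    LC_ALL=C awk -F '\t' -v tag="$tag:Z:" '
        # First file: load the wanted tag strings into a lookup table
        NR == FNR { sub(/\r$/, ""); if ($0 != "") want[$0] = 1; next }
        # Keep SAM header lines unchanged
        /^@/ { print; next }
        # Optional tags start at column 12; print only on an exact match
        {
            for (i = 12; i <= NF; i++)
                if (index($i, tag) == 1) { if ($i in want) print; break }
        }
    ' "$filter" -
}
```

For example, samtools view -h "$BAM_FILE" | exact_tag_filter filter.txt | samtools view -b -o filtered.bam - would replace the four commands above; the -h option keeps the header so samtools can convert the text back to BAM.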
Filtering Feature tags:
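Since Cell Ranger also adds the CB tag to Feature Barcode reads, the CB-filtered BAM file can be split into two separate files: one for Gene Expression and another for Feature Barcode reads, which carry the fb:Z tag. The -h option keeps the header, which samtools needs in order to convert the text back to BAM.
For example, to write the Feature Barcode reads to features.bam and the remaining (Gene Expression) reads to gene_expression.bam:
samtools view -h filtered.bam | LC_ALL=C grep -E '^@|fb:Z:' | samtools view -b -o features.bam -
samtools view -h filtered.bam | LC_ALL=C grep -v -F 'fb:Z:' | samtools view -b -o gene_expression.bam -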
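The two grep passes read the BAM twice. A one-pass alternative is sketched below, under the assumption that a read belongs to the Feature Barcode set exactly when one of its optional fields starts with fb:Z:. The function name split_by_fb is hypothetical; header lines are copied into both outputs so each can be converted back to BAM.

```shell
# Sketch: split SAM text (with header) into two SAM files in one pass:
# reads carrying an fb:Z: tag (Feature Barcode) and all other reads.
# Header lines go to both outputs. split_by_fb is a hypothetical helper name.
split_by_fb() {
    local fb_out="$1" gex_out="$2"
    LC_ALL=C awk -F '\t' -v fb="$fb_out" -v gex="$gex_out" '
        /^@/ { print > fb; print > gex; next }
        {
            # Optional tags start at column 12
            has_fb = 0
            for (i = 12; i <= NF; i++)
                if (index($i, "fb:Z:") == 1) { has_fb = 1; break }
            if (has_fb) print > fb; else print > gex
        }
    '
}
```

For example, samtools view -h filtered.bam | split_by_fb features.sam gene_expression.sam, followed by samtools view -b -o features.bam features.sam (and likewise for the other file).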
Disclaimer: This article and code-snippet are provided for instructional purposes only. 10x Genomics does not support or guarantee the code.