Question: Is there a compact and self-contained representation of the peak-barcode matrix from Cell Ranger ATAC 1.x that I can open in a text editor?
Answer: The MEX matrix format is compact, but it requires users to integrate information from three different files (matrix.mtx, barcodes.tsv and peaks.bed). Luckily, the contents of the peak-barcode matrix folder can be combined into a single, self-contained five-column CSV file.
First, go to the directory containing the peak-barcode matrix data (e.g. ANALYSIS/outs/filtered_peak_bc_matrix), then copy and paste the entire code block at once into a bash shell and hit ENTER.
# Print line number along with contents of barcodes.tsv and peaks.bed
awk -F "\t" 'BEGIN { OFS = "," }; {print NR,$1}' barcodes.tsv | sort -t, -k 1b,1 > numbered_barcodes.csv
awk -F "\t" 'BEGIN { OFS = "," }; {print NR,$1,$2,$3}' peaks.bed | sort -t, -k 1b,1 > numbered_peaks.csv
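# Skip the three header lines of matrix.mtx and write it as CSV (peak index, barcode index, count), sorted by peak index
tail -n +4 matrix.mtx | awk -F " " 'BEGIN { OFS = "," }; {print $1,$2,$3}' | sort -t, -k 1b,1 > peak_sorted_matrix.csv
# Same data sorted by barcode index (this copy is not read by the join below; it is deleted with the other temp files)
tail -n +4 matrix.mtx | awk -F " " 'BEGIN { OFS = "," }; {print $1,$2,$3}' | sort -t, -k 2b,2 > barcode_sorted_matrix.csv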
# Use join to replace the peak and barcode line numbers with the peak coordinates and barcode sequences
join -t, -1 1 -2 1 numbered_peaks.csv peak_sorted_matrix.csv | cut -d, -f 2,3,4,5,6 | sort -t, -k 4b,4 | join -t, -1 1 -2 4 numbered_barcodes.csv - | cut -d, -f 2,3,4,5,6 > final_matrix.csv
# Remove temp files
rm -f barcode_sorted_matrix.csv peak_sorted_matrix.csv numbered_barcodes.csv numbered_peaks.csv
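A quick sanity check after the conversion is to compare the number of rows in final_matrix.csv with the number of non-zero entries declared in matrix.mtx. The sketch below is not part of the original snippet: the function name check_matrix_rows is hypothetical, and it assumes the size line (rows, columns, entries) is line 3 of matrix.mtx, which is consistent with the `tail -n +4` used above.

```shell
# Hypothetical helper (not part of the original snippet): compare the entry
# count declared in matrix.mtx with the number of rows in final_matrix.csv.
check_matrix_rows() {
    local mtx csv declared written
    mtx="${1:-matrix.mtx}"
    csv="${2:-final_matrix.csv}"
    # Assumption: line 3 is the size line "n_peaks n_barcodes n_entries".
    declared=$(awk 'NR == 3 { print $3 }' "$mtx")
    # Count the rows of the CSV (awk avoids the padded output of some wc versions).
    written=$(awk 'END { print NR }' "$csv")
    if [ "$declared" = "$written" ]; then
        echo "OK: $written rows"
    else
        echo "MISMATCH: $mtx declares $declared entries, $csv has $written rows"
        return 1
    fi
}
```

Calling `check_matrix_rows` with no arguments in the matrix directory checks the default file names; a mismatch suggests that a join dropped rows or that the header has a different number of lines than assumed.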
The column definitions of the output final_matrix.csv are as follows:
- 10x cellular barcode
- Peak chromosome
- Peak start position
- Peak end position
- Number of cut sites within the peak for that barcode
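Since final_matrix.csv is written without a header row, some tools will treat the first data row as column names. A minimal sketch for writing a copy with a header follows; the function name add_header, the default output file name and the column names (barcode, chrom, start, end, count) are illustrative assumptions, not names defined by Cell Ranger ATAC.

```shell
# Hypothetical helper: write a copy of the CSV with a header line prepended.
# The column names below are illustrative labels for the five columns listed above.
add_header() {
    local in out
    in="${1:-final_matrix.csv}"
    out="${2:-final_matrix_with_header.csv}"
    # Emit the header first, then the unchanged data rows.
    { echo "barcode,chrom,start,end,count"; cat "$in"; } > "$out"
}
```

Writing to a separate file leaves final_matrix.csv untouched, so any command that expects data-only rows keeps working.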
Here is a sample of what final_matrix.csv looks like:
AAACGAAAGCGCAATG-1,chr10,100009056,100010815,2
AAACGAAAGCGCAATG-1,chr10,100266784,100268265,2
AAACGAAAGCGCAATG-1,chr10,100285547,100287509,2
AAACGAAAGCGCAATG-1,chr10,100346083,100348584,2
AAACGAAAGCGCAATG-1,chr10,100481215,100483827,6
AAACGAAAGCGCAATG-1,chr10,100561809,100563388,2
AAACGAAAGCGCAATG-1,chr10,100912098,100913900,6
AAACGAAAGCGCAATG-1,chr10,100986729,100988099,2
AAACGAAAGCGCAATG-1,chr10,101012542,101015536,2
AAACGAAAGCGCAATG-1,chr10,101059713,101063979,2
...
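Because every row carries its own barcode and peak coordinates, rows like the ones in this sample can be summarized with standard command-line tools and no further joins. As one possible example, the sketch below sums the fifth column per barcode; the function name cut_sites_per_barcode is hypothetical, and it assumes the header-less five-column layout described above.

```shell
# Hypothetical helper: total cut sites in peaks for each barcode.
# Assumes the header-less layout barcode,chrom,start,end,count.
cut_sites_per_barcode() {
    awk -F, 'BEGIN { OFS = "," }
             { total[$1] += $5 }                          # accumulate column 5 per barcode
             END { for (b in total) print b, total[b] }' "${1:-final_matrix.csv}" |
        sort -t, -k 1,1                                   # awk array order is unspecified, so sort by barcode
}
```

The output is one `barcode,total` line per barcode, which can be redirected to a file or piped into further filters.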
Disclaimer: This article and code snippet are provided for instructional purposes only. 10x Genomics does not support or guarantee the code.