Question: Is there a compact and self-contained representation of the feature-barcode matrix from Space Ranger 1.x that I can open in a text editor?
Answer: The MEX matrix format is sparse, but reading it requires combining information from three separate files (matrix.mtx.gz, features.tsv.gz, and barcodes.tsv.gz). However, the contents of the feature-barcode matrix folder, together with the spatial coordinates in tissue_positions_list.csv, can be combined into a self-contained seven-column CSV file.
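To illustrate what combining the three files involves, here is a minimal sketch that resolves a single matrix entry by hand. The function name resolve_entry is hypothetical and not part of Space Ranger; the sketch assumes the standard layout in which each data line of matrix.mtx.gz holds a 1-based feature line number, a 1-based barcode line number, and a count.

```bash
# Hypothetical helper: resolve one matrix.mtx.gz entry to barcode, feature name, count.
# Usage: resolve_entry <matrix_dir> <entry_number>
resolve_entry() {
    local dir="$1" n="$2"
    local entry feature_idx barcode_idx count feature barcode

    # Drop comment lines (starting with %); the first remaining line is the
    # dimension line, so data entry n sits on line n + 1.
    entry=$(gzip -dc "$dir/matrix.mtx.gz" | grep -v '^%' | sed -n "$((n + 1))p")
    feature_idx=$(echo "$entry" | cut -d' ' -f1)
    barcode_idx=$(echo "$entry" | cut -d' ' -f2)
    count=$(echo "$entry" | cut -d' ' -f3)

    # The two indices are line numbers into features.tsv.gz and barcodes.tsv.gz.
    feature=$(gzip -dc "$dir/features.tsv.gz" | sed -n "${feature_idx}p" | cut -f2)
    barcode=$(gzip -dc "$dir/barcodes.tsv.gz" | sed -n "${barcode_idx}p")

    echo "$barcode,$feature,$count"
}

# Example call from inside the matrix directory:
# resolve_entry . 1
```

Doing this lookup for every entry is what the join-based commands in this article automate.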
First, go to the directory containing the feature-barcode matrix data (e.g. ANALYSIS/outs/filtered_feature_bc_matrix), then copy and paste the entire code block into a bash shell at once and hit ENTER.
# Print line number along with contents of barcodes.tsv.gz and features.tsv.gz
zcat barcodes.tsv.gz | awk -F "\t" 'BEGIN { OFS = "," }; {print NR,$1}' | sort -t, -k 1b,1 > numbered_barcodes.csv
zcat features.tsv.gz | awk -F "\t" 'BEGIN { OFS = "," }; {print NR,$1,$2,$3}' | sort -t, -k 1b,1 > numbered_features.csv
# Skip the header lines and sort matrix.mtx.gz
zcat matrix.mtx.gz | tail -n +4 | awk -F " " 'BEGIN { OFS = "," }; {print $1,$2,$3}' | sort -t, -k 1b,1 > feature_sorted_matrix.csv
zcat matrix.mtx.gz | tail -n +4 | awk -F " " 'BEGIN { OFS = "," }; {print $1,$2,$3}' | sort -t, -k 2b,2 > barcode_sorted_matrix.csv
# Use join to replace line numbers with barcodes and features
join -t, -1 1 -2 1 numbered_features.csv feature_sorted_matrix.csv | cut -d, -f 2,3,4,5,6 | sort -t, -k 4b,4 | join -t, -1 1 -2 4 numbered_barcodes.csv - | cut -d, -f 2,3,4,5,6 | sort > intermediate_matrix.csv
# Use join to insert spatial coordinate information
sort ../spatial/tissue_positions_list.csv | join -t, -1 1 -2 1 intermediate_matrix.csv - | awk -F "," 'BEGIN { OFS = "," }; {print $1,$7,$8,$2,$3,$4,$5}' > final_matrix.csv
# Remove temp files
rm -f barcode_sorted_matrix.csv feature_sorted_matrix.csv numbered_barcodes.csv numbered_features.csv intermediate_matrix.csv
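As an optional sanity check, the number of rows in final_matrix.csv can be compared with the entry count declared in the header of matrix.mtx.gz. The sketch below uses a hypothetical helper named check_row_count. It assumes that every barcode in the matrix also appears in tissue_positions_list.csv; if not, join silently drops those rows and the counts will differ.

```bash
# Hypothetical helper: compare the entry count declared in the MEX header
# with the number of rows in the flattened CSV.
# Usage: check_row_count <matrix.mtx.gz> <final_matrix.csv>
check_row_count() {
    local mtx="$1" csv="$2"
    local expected actual

    # The first non-comment line holds: <features> <barcodes> <non-zero entries>
    expected=$(gzip -dc "$mtx" | awk '!/^%/ { print $3; exit }')
    actual=$(wc -l < "$csv" | tr -d ' ')

    if [ "$expected" = "$actual" ]; then
        echo "OK: $actual rows"
    else
        echo "MISMATCH: header declares $expected entries, CSV has $actual rows"
        return 1
    fi
}

# Example call from inside the matrix directory:
# check_row_count matrix.mtx.gz final_matrix.csv
```

A mismatch usually means that rows were lost in one of the join steps, for example because the inputs were sorted under different locale settings.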
The columns of the output file final_matrix.csv are, in order:
- Spatial barcode
- Spatial row
- Spatial column
- Feature ID
- Feature name
- Feature type
- UMI count
Here is a sample of what final_matrix.csv looks like:
AAACAAGTATCTCCCA-1,50,102,ENSMUSG00000000088,Cox5a,Gene Expression,13
AAACAAGTATCTCCCA-1,50,102,ENSMUSG00000000149,Gna12,Gene Expression,1
AAACAAGTATCTCCCA-1,50,102,ENSMUSG00000000168,Dlat,Gene Expression,1
AAACAAGTATCTCCCA-1,50,102,ENSMUSG00000000171,Sdhd,Gene Expression,2
AAACAAGTATCTCCCA-1,50,102,ENSMUSG00000000197,Nalcn,Gene Expression,1
AAACAAGTATCTCCCA-1,50,102,ENSMUSG00000000253,Gmpr,Gene Expression,2
AAACAAGTATCTCCCA-1,50,102,ENSMUSG00000000308,Ckmt1,Gene Expression,2
AAACAAGTATCTCCCA-1,50,102,ENSMUSG00000000326,Comt,Gene Expression,4
...
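Because every row carries its own barcode, coordinates, and feature information, the file can be queried with standard command-line tools. The following is a minimal sketch with two hypothetical helpers, umi_per_spot and rows_for_feature, that rely only on the column order listed above. The UMI count is read as the last field, so a feature name that happens to contain a comma would not shift it.

```bash
# Hypothetical helper: total UMI count per spot, printed as barcode,row,column,total.
# Usage: umi_per_spot <final_matrix.csv>
umi_per_spot() {
    awk -F, 'BEGIN { OFS = "," }
        { key = $1 OFS $2 OFS $3; total[key] += $NF }
        END { for (k in total) print k, total[k] }' "$1" | sort -t, -k 1,1
}

# Hypothetical helper: all rows whose feature name (column 5) matches exactly.
# Usage: rows_for_feature <final_matrix.csv> <feature_name>
rows_for_feature() {
    awk -F, -v name="$2" '$5 == name' "$1"
}

# Example calls:
# umi_per_spot final_matrix.csv > umi_per_spot.csv
# rows_for_feature final_matrix.csv Cox5a
```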
Disclaimer: This article and code snippet are provided for instructional purposes only. 10x Genomics does not support or guarantee the code.