Question: How can I get the corresponding read counts for the observed UMIs?
Answer: There are two Cell Ranger output files with read-level information:
1. BAM: Indexed BAM file containing position-sorted reads aligned to the genome and transcriptome.
2. Molecule Info: HDF5 file containing per-molecule information for all molecules that contain a valid cell-barcode and valid UMI.
Obtaining the read-level information from each file will require some custom coding.
For example, the molecule_info.h5 file gives read-level support for each combination of valid barcode, valid UMI, and gene. If you are comfortable working with H5 files in R or Python, you can aggregate the confidently mapped reads per cell, gene, and UMI.
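As a rough illustration of that aggregation in Python, the sketch below sums reads and counts UMIs per cell or per cell and gene. It assumes the H5 file stores one row per molecule in flat datasets named barcode, gem_group, gene, umi, and reads (the column names shown in the cellrangerRkit output in this article); other Cell Ranger versions may use a different layout, so check the file before relying on it. The helper names load_columns and summarize are hypothetical, and h5py is a third-party package that is only needed for the loading step. The demo values are the first eight rows of the sample output shown in this article.

```python
from collections import defaultdict

COLUMNS = ("barcode", "gem_group", "gene", "umi", "reads")

def load_columns(path):
    """Read the per-molecule columns from a molecule_info.h5 file."""
    import h5py  # third-party; imported lazily so the rest runs without it
    # ASSUMPTION: the datasets are named after the columns listed in COLUMNS.
    with h5py.File(path, "r") as handle:
        return {name: handle[name][:] for name in COLUMNS}

def summarize(cols, by=("barcode", "gem_group", "gene")):
    """Count UMIs (rows) and sum reads for every combination of the `by` columns.

    ASSUMPTION: each row is one distinct molecule, so rows per key = UMIs per key.
    """
    summary = defaultdict(lambda: {"umis": 0, "reads": 0})
    for i in range(len(cols["reads"])):
        key = tuple(int(cols[name][i]) for name in by)
        summary[key]["umis"] += 1
        summary[key]["reads"] += int(cols["reads"][i])
    return dict(summary)

# Demo input: the first eight rows of the sample output in this article.
demo = {
    "barcode": [24674393, 24674393] + [24674396] * 6,
    "gem_group": [1] * 8,
    "gene": [30532, 33654, 4742, 4995, 4996, 5538, 5611, 6477],
    "umi": [44083, 55575, 212865, 910565, 338608, 672212, 467484, 710293],
    "reads": [9, 20, 1, 2, 1, 24, 14, 1],
}

per_cell = summarize(demo, by=("barcode", "gem_group"))
per_cell_gene = summarize(demo)

for key, stats in sorted(per_cell.items()):
    print(key, stats)
```

For a real run, replace demo with load_columns("molecule_info.h5"). The plain Python loop is easy to follow but slow for tens of millions of rows; a data-frame library would be a more practical choice at that scale.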
The latest version of cellrangerRkit also has some support for reading in molecule_info.h5 using the custom function load_molecule_info().
Once you load the molecule info file, you have confidently mapped read count data for each valid barcode, gem group, gene, and valid UMI (assuming a single-genome reference).
> t_3k <- load_cellranger_matrix("data/t_3k")
Searching for genomes in: data/t_3k/outs/filtered_gene_bc_matrices
Using GRCh38 in folder: data/t_3k/outs/filtered_gene_bc_matrices/GRCh38
Loaded matrix information
Loaded gene information
Loaded barcode information
Loaded summary information
> t_3k <- load_molecule_info(t_3k)
Loaded molecule information
In load_molecule_info(t_3k) :
Loading the molecule info is only necessary if you are subsampling reads. This method of normalizing matrices is deprecated.
Please use the `cellranger aggr` pipeline (new in cellranger 1.2.0), which can combine arbitrary gene-barcode matrices
and produce a combined, depth-normalized matrix.
Classes ‘data.table’ and 'data.frame': 17626174 obs. of 5 variables:
$ barcode :integer64 24674393 24674393 24674396 24674396 24674396 24674396 24674396 24674396 ...
$ gem_group:integer64 1 1 1 1 1 1 1 1 ...
$ gene :integer64 30532 33654 4742 4995 4996 5538 5611 6477 ...
$ umi :integer64 44083 55575 212865 910565 338608 672212 467484 710293 ...
$ reads :integer64 9 20 1 2 1 24 14 1 ...
- attr(*, "sorted")= chr "barcode" "gem_group" "gene" "umi"
- attr(*, ".internal.selfref")=<externalptr>
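The BAM file listed at the top of this answer can be tallied in a similar way. The sketch below is one possible approach, not a supported method: it reads SAM-formatted text (for example, the output of samtools view on the BAM) and counts reads per corrected barcode, corrected UMI, and gene. It assumes the records carry the CB, UB, and GX tags, treats MAPQ 255 as the marker for confidently mapped reads, and skips reads annotated with more than one gene. These filters are assumptions and only approximate Cell Ranger's own counting, so the totals may not match molecule_info.h5 exactly. The function name count_reads_per_umi is hypothetical.

```python
from collections import Counter

def count_reads_per_umi(sam_lines, min_mapq=255):
    """Count reads per (CB, UB, GX) from SAM-formatted text lines."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag, mapq = int(fields[1]), int(fields[4])
        # Skip unmapped (0x4) and secondary (0x100) alignments.
        # ASSUMPTION: MAPQ 255 marks confidently mapped reads.
        if flag & 0x4 or flag & 0x100 or mapq < min_mapq:
            continue
        # Optional fields have the form TAG:TYPE:VALUE.
        tags = {}
        for field in fields[11:]:
            tag, _type, value = field.split(":", 2)
            tags[tag] = value
        if not all(tag in tags for tag in ("CB", "UB", "GX")):
            continue  # no valid barcode, UMI, or gene assignment
        if ";" in tags["GX"]:
            continue  # ASSUMPTION: multi-gene reads are not counted
        counts[(tags["CB"], tags["UB"], tags["GX"])] += 1
    return counts

def make_record(name, mapq, *tags):
    """Build a minimal SAM line for the demo (hypothetical test data)."""
    core = [name, "0", "chr1", "100", str(mapq), "4M", "*", "0", "0", "ACGT", "FFFF"]
    return "\t".join(core + list(tags))

demo_lines = [
    "@HD\tVN:1.4\tSO:coordinate",
    make_record("r1", 255, "CB:Z:AAAC-1", "UB:Z:GGTT", "GX:Z:ENSG1"),
    make_record("r2", 255, "CB:Z:AAAC-1", "UB:Z:GGTT", "GX:Z:ENSG1"),
    make_record("r3", 3, "CB:Z:AAAC-1", "UB:Z:GGTT", "GX:Z:ENSG1"),
    make_record("r4", 255, "CB:Z:AAAC-1", "UB:Z:CCAA", "GX:Z:ENSG1"),
    make_record("r5", 255, "UB:Z:CCAA", "GX:Z:ENSG1"),
]

demo_counts = count_reads_per_umi(demo_lines)
for key, n_reads in sorted(demo_counts.items()):
    print(key, n_reads)
```

To use it on real data, pass sys.stdin as sam_lines and pipe the output of samtools view into the script.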
Disclaimer: This article and code-snippet are provided for instructional purposes only. 10x Genomics does not support or guarantee the code.