Question: How can I get the corresponding read counts for the observed UMIs?
Answer: There are two Cell Ranger output files with read-level information:
1. BAM: Indexed BAM file containing position-sorted reads aligned to the genome and transcriptome.
2. Molecule Info: HDF5 file containing per-molecule information for all molecules that contain a valid cell-barcode and valid UMI.
Obtaining the read-level information from each file will require some custom coding.
For example, the molecule_info.h5 file gives read-level support for each combination of valid barcode, valid UMI, and gene. If you are comfortable working with HDF5 files in R or Python, you can aggregate the confidently mapped reads per cell, gene, and UMI.
The latest version of cellrangerRkit also has some support for reading molecule_info.h5 through its load_molecule_info function.
Once the molecule info file is loaded, the reads column holds the confidently mapped read count for each combination of valid barcode, GEM group, gene, and valid UMI (assuming a single-genome reference).
> t_3k <- load_cellranger_matrix("data/t_3k")
Searching for genomes in: data/t_3k/outs/filtered_gene_bc_matrices
Using GRCh38 in folder: data/t_3k/outs/filtered_gene_bc_matrices/GRCh38
Loaded matrix information
Loaded gene information
Loaded barcode information
Loaded summary information
> t_3k <- load_molecule_info(t_3k)
Loaded molecule information
Warning message:
In load_molecule_info(t_3k) :
Loading the molecule info is only necessary if you are subsampling reads. This method of normalizing matrices is deprecated.
Please use the `cellranger aggr` pipeline (new in cellranger 1.2.0), which can combine arbitrary gene-barcode matrices
and produce a combined, depth-normalized matrix.
> str(t_3k@molecule_info)
Classes ‘data.table’ and 'data.frame': 17626174 obs. of 5 variables:
$ barcode :integer64 24674393 24674393 24674396 24674396 24674396 24674396 24674396 24674396 ...
$ gem_group:integer64 1 1 1 1 1 1 1 1 ...
$ gene :integer64 30532 33654 4742 4995 4996 5538 5611 6477 ...
$ umi :integer64 44083 55575 212865 910565 338608 672212 467484 710293 ...
$ reads :integer64 9 20 1 2 1 24 14 1 ...
- attr(*, "sorted")= chr "barcode" "gem_group" "gene" "umi"
- attr(*, ".internal.selfref")=<externalptr>
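As a rough Python counterpart to the R session above, the sketch below sums reads per barcode, GEM group, gene, and UMI, then collapses the result to UMI and read counts per barcode and gene. It assumes the HDF5 datasets have the same names as the columns in the str() output (barcode, gem_group, gene, umi, reads), which may not hold for every Cell Ranger version. The helper names load_columns, reads_per_umi, and summarize_per_gene are hypothetical, and h5py is a third-party package that is not part of Cell Ranger.

```python
from collections import defaultdict

# Assumed dataset names, taken from the columns shown in the str() output.
COLUMNS = ("barcode", "gem_group", "gene", "umi", "reads")


def load_columns(path):
    """Read the assumed per-molecule datasets from a molecule_info.h5 file."""
    import h5py  # third-party; imported here so the rest runs without it

    with h5py.File(path, "r") as handle:
        return {name: handle[name][:] for name in COLUMNS}


def reads_per_umi(barcodes, gem_groups, genes, umis, reads):
    """Sum reads for each (barcode, gem_group, gene, umi) combination."""
    totals = defaultdict(int)
    for bc, gg, gene, umi, n in zip(barcodes, gem_groups, genes, umis, reads):
        totals[(int(bc), int(gg), int(gene), int(umi))] += int(n)
    return dict(totals)


def summarize_per_gene(umi_reads):
    """Collapse to (barcode, gem_group, gene) -> (UMI count, read count)."""
    summary = defaultdict(lambda: [0, 0])
    for (bc, gg, gene, _umi), n in umi_reads.items():
        entry = summary[(bc, gg, gene)]
        entry[0] += 1   # one more distinct UMI for this barcode and gene
        entry[1] += n   # reads supporting that UMI
    return {key: tuple(value) for key, value in summary.items()}


if __name__ == "__main__":
    # First two molecules from the str() output above, typed in by hand.
    per_umi = reads_per_umi(
        barcodes=[24674393, 24674393],
        gem_groups=[1, 1],
        genes=[30532, 33654],
        umis=[44083, 55575],
        reads=[9, 20],
    )
    for key, n_reads in per_umi.items():
        print(key, n_reads)
```

With a real file, the arrays returned by load_columns("molecule_info.h5") can be passed straight into reads_per_umi. A plain Python loop will be slow on the roughly 17.6 million molecules shown above, so a vectorized group-by (for example in pandas) is likely a better fit at full scale.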
Disclaimer: This article and code snippet are provided for instructional purposes only. 10x Genomics does not support or guarantee the code.