Question: How do I calculate median number of peaks per cell for Cell Ranger ATAC?
Answer: This particular metric is not calculated by the pipeline. However, it can be estimated by this product: (median_fragments_per_cell * frac_fragments_overlapping_peaks), using two metrics that are available from summary.csv and web_summary.html.
- frac_fragments_overlapping_peaks
applies to all fragments that pass our mapping filters. It also includes fragments that are in non-cell barcodes because peak-calling happens before cell-calling. - Technically, the estimate really represents "Median Fragments Overlapping Peaks per Cell", which may be a bit different from "Median Number of Peaks per Cell". The problem is that more than one fragment can overlap the same peak, especially if the called peak is large.
In order to calculate the metric accurately, it will be necessary to go through the peak-barcode matrix. First, go to the directory containing the peak-barcode matrix data (e.g. ANALYSIS/outs/filtered_gene_bc_matrices/GRCh38
), then copy and paste the entire code block at once into a bash shell and hit ENTER.
tail -n +4 matrix.mtx | cut -d ' ' -f 2 | sort -n -k1 | uniq -c | sed 's/^\s*//' | cut -d ' ' -f 1 | sort -n -k1 > sorted_peaks_per_cell;
export cells=`wc -l sorted_peaks_per_cell | sed 's/\s.*//'`;
export median_cell=`echo "${cells}/2" | bc`;
clear;
echo "Median number of peaks per cell: `tail -n +${median_cell} sorted_peaks_per_cell | head -n 1`";
The answer should be displayed on the terminal like below:
Median number of peaks per cell: 6367
Disclaimer: This article and code-snippet are provided for instructional purposes only. 10x Genomics does not support or guarantee the code.