Question: How is the "Log2 fold change" value calculated in Cell Ranger and Loupe Cell Browser?
Answer: The "Log2 fold change" value reported in Cell Ranger and in the gene table in Loupe Cell Browser is the ratio of the normalized mean gene UMI counts in each cluster/group relative to all other clusters/groups for comparison.
In Loupe Cell Browser, the comparison is between a cluster/group versus the rest of the dataset (globally distinguishing) or versus all other checked groups in a selected category (locally distinguishing).
There are 5 main steps in calculating the Log2 fold change:
Assume n total cells
* Calculate the total number of UMIs in each cellcounts_per_cell: n values
* Calculate a size factor for each cell by dividing the cell's total UMI count by the median of those n counts_per_cellcounts_per_cell / median(counts_per_cell): n values
* Calculate a size factor for each cluster by summing the size factors of each cell in that cluster.
* Normalize the UMI counts for each gene in each cluster by dividing by the size factor for that cluster
* Calculate fold change per gene by dividing the normalized total UMI counts for that gene in cluster1 by cluster2
For more specifics on how the value is calculated, please see the Cell Ranger source code for the differential expression analysis module:
# Introduce a pseudocount into log2(fold_change) 'log2_fold_change': np.log2((1+gene_sums_a)/(1+size_factor_a)) - np.log2((1+gene_sums_b)/(1+size_factor_b))
If you are comfortable working with R here are is an example command line for computing the fold change:
library(Matrix)
# Load the matrix in R
matrix_dir = "~/Downloads/pbmc_filtered_feature_bc_matrix/"
barcode.path <- paste0(matrix_dir, "barcodes.tsv.gz")
features.path <- paste0(matrix_dir, "features.tsv.gz")
matrix.path <- paste0(matrix_dir, "matrix.mtx.gz")
mat <- readMM(file = matrix.path)
feature.names = read.delim(features.path,
+ header = FALSE,
+ stringsAsFactors = FALSE)
barcode.names = read.delim(barcode.path,
+ header = FALSE,
+ stringsAsFactors = FALSE)
colnames(mat) = barcode.names$V1
rownames(mat) = feature.names$V1
# Calculate total UMI counts per cell and size factor per cell
totalUMIcounts = apply(mat,2,sum)
sizefactor = totalUMIcounts/median(totalUMIcounts)
# Read the csv file with cluster assignments for barcodes downloaded from Loupe Cell Browser
# In this example, I have downloaded Graph-based clusters
graphbased = read.table("~/Downloads/Graph-based.csv",header=TRUE,sep=",")
# Extract subset of UMI counts for gene tcf7l2 in a separate data table
tcf7l2 = mat["ENSG00000148737",]
# Extract barcodes in cluster1 and cluster 5 for comparison
c1 = subset(graphbased,Graph.based=="Cluster 1")
c5 = subset(graphbased,Graph.based=="Cluster 5")
c1bc = c1$Barcode
c5bc = c5$Barcode
# Get the UMI counts for tcf7l2 for barcodes in cluster1 and cluster5
tcf7l2_c1 = tcf7l2[c1bc]
tcf7l2_c5 = tcf7l2[c5bc]
# Sum the UMI counts for gene for cells in cluster1 and cluster5
c1sumtcf7l2 = sum(tcf7l2[c1bc])
c5sumtcf7l2 = sum(tcf7l2[c5bc])
# Calculate size factor for each cluster
c1_sf = sizefactor[c1bc]
c5_sf = sizefactor[c5bc]
# Calculate fold change
log2((1+c1sumtcf7l2)/(1+sum(c1_sf)))-log2((1+c5sumtcf7l2)/(1+sum(c5_sf)))