Question: How can the .cloupe file from the GEX library be used to investigate and better understand the cell number disparity in V(D)J data?
Answer: Analysis of gene expression (GEX) data with Cell Ranger produces a .cloupe file, which can be opened in 10x Genomics' Loupe Browser software for visualization, interpretation, and analysis. The .cloupe file from the GEX library is very useful for troubleshooting and understanding the root cause of some V(D)J library failure modes, as described here. In this article, we describe how to use Loupe Browser to investigate cases where the cell numbers differ between the GEX and V(D)J libraries.
The .cloupe file from GEX data can be used to assess the total number of T/B-cells in a sample. For example, marker genes such as CD79A/B (human) and CD19 (mouse) can be used to identify B-cells, while marker genes such as CD3D/E/G and CD8/ILR can be used for T-cells.
The Loupe Browser screenshot below illustrates CD79A and CD79B expression in a sorted B-cell sample. Most of the cells in this sample are B-cells, but a subset of the cells (the cluster to the right) has higher UMI counts than the rest. In some cases (more details later), the UMI counts or expression levels may be relevant.
Figure 1a,1b: Loupe Browser screenshots showing a wide range of CD79A and CD79B expression in a sorted B-cell sample and CD3D/E for T-cells respectively.
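The same marker-based check can be approximated outside Loupe Browser. The sketch below is only an illustration: it assumes the GEX counts are already available as a plain dictionary (for instance, built from Cell Ranger's filtered feature-barcode matrix), and the function name, the toy barcodes, and the `min_umi` cutoff are all hypothetical.

```python
from typing import Dict, Iterable, Set

# Hypothetical toy input: barcode -> {gene symbol: UMI count}.
Counts = Dict[str, Dict[str, int]]


def marker_positive_barcodes(
    counts: Counts, markers: Iterable[str], min_umi: int = 1
) -> Set[str]:
    """Return barcodes whose summed UMI count over `markers` is >= `min_umi`."""
    marker_set = set(markers)
    return {
        barcode
        for barcode, genes in counts.items()
        if sum(umi for gene, umi in genes.items() if gene in marker_set) >= min_umi
    }


# Made-up counts for illustration only; not from a real dataset.
toy_counts: Counts = {
    "AAAC-1": {"CD79A": 12, "CD79B": 5},
    "AAAG-1": {"CD3D": 7, "CD3E": 3},
    "AACT-1": {"CD79A": 1, "MALAT1": 40},
    "AAGT-1": {"MALAT1": 15},
}

b_cells = marker_positive_barcodes(toy_counts, ["CD79A", "CD79B"])
t_cells = marker_positive_barcodes(toy_counts, ["CD3D", "CD3E", "CD3G"])
print(len(b_cells), len(t_cells))  # prints "2 1"
```

A stricter `min_umi` can help separate confidently marker-positive cells from barcodes with a single stray UMI, which ties in with the UMI-count considerations discussed later in this article.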
A low number of V(D)J cells (under-calling)
The phenomenon where the V(D)J library has fewer than the expected number of cells is referred to as under-calling. For example, if 2,000 B-cells were sorted and processed in a single GEM well, and the GEX library calls 1,500 cells while the V(D)J library calls only 20 B-cells, then it is a case of under-calling.
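To put a number on the example above, the fraction of GEX-called cells that also received a V(D)J call can be computed directly. This is a minimal sketch; the function name is hypothetical, and no specific cutoff for what counts as under-calling is implied.

```python
def vdj_to_gex_ratio(gex_cells: int, vdj_cells: int) -> float:
    """Fraction of GEX-called cells that were also called in the V(D)J library."""
    if gex_cells <= 0:
        raise ValueError("gex_cells must be positive")
    return vdj_cells / gex_cells


# Numbers from the example above: 1,500 GEX cells, 20 V(D)J cells.
ratio = vdj_to_gex_ratio(gex_cells=1500, vdj_cells=20)
print(f"{ratio:.1%}")  # prints "1.3%"
```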
Some reasons for this issue could be related to the sample biology, the proportion of actual B/T-cells present in the given samples, or cell calls. There are three cases to consider:
Case (i) Check whether there are fewer than expected T/B-cells in the GEX data, which may happen in some experiments. For example, in the screenshot below, very few T-cells are observed based on CD3D/E expression. If the V(D)J data also shows very few T-cells, this indicates that the data reflect the true biology of the sample, and there are no issues with the experiment or analysis.
Figure 2: A screenshot from Loupe Browser showing very little expression (red dots) of the CD3E gene (highly expressed in T-cells) in the sample. This suggests that the sample does not contain many T-cells to begin with.
Case (ii) Evaluate whether V(D)J genes exhibit low UMI counts in the potential B/T-cells. You can find the specific V(D)J gene names by looking in the regions.fa file present in the V(D)J reference directory you are using (link here for more details about the reference directory structure). For example, Figure 3 below shows the same sample as Figure 2, which has very low expression of the T-cell receptor variable genes TRAV7 and TRAV2.
Figure 3: Loupe Browser screenshot (of the same sample as in Figure 2) showing that the sample has no expression of the TRAV2 and TRAV7 genes.
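Gene names can also be pulled out of regions.fa programmatically. The sketch below assumes that each FASTA header is pipe-delimited, with the gene name in the third field, the region type in the fourth, and the chain in the sixth; please verify this against your own reference before relying on it. The function name and the toy records (including the record IDs and sequences) are made up for illustration.

```python
from typing import Dict, Iterable, Set


def genes_by_region(fasta_lines: Iterable[str], chain: str) -> Dict[str, Set[str]]:
    """Group gene names by region type for one chain (e.g. "TRA").

    Assumed header layout (an assumption, check your reference):
    >id|display_name record_id|gene_name|region_type|chain_type|chain|isotype|allele
    """
    genes: Dict[str, Set[str]] = {}
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        fields = line[1:].strip().split("|")
        if len(fields) < 6:
            continue  # header does not match the assumed layout
        gene_name, region_type, record_chain = fields[2], fields[3], fields[5]
        if record_chain == chain:
            genes.setdefault(region_type, set()).add(gene_name)
    return genes


# Toy records for illustration; IDs and sequences are placeholders.
toy_fasta = [
    ">1|TRAV2 ID0001|TRAV2|L-REGION+V-REGION|TR|TRA|None|00",
    "ATGGCT",
    ">2|TRAV7 ID0002|TRAV7|L-REGION+V-REGION|TR|TRA|None|00",
    "ATGGAG",
    ">3|TRAC ID0003|TRAC|C-REGION|TR|TRA|None|00",
    "ATATCC",
    ">4|TRBC1 ID0004|TRBC1|C-REGION|TR|TRB|None|00",
    "GAGGAC",
]

tra_genes = genes_by_region(toy_fasta, chain="TRA")
print(sorted(tra_genes["L-REGION+V-REGION"]))  # prints "['TRAV2', 'TRAV7']"
```

With a real reference, the lines would come from opening regions.fa and iterating over the file object instead of the toy list.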
Alternatively, you can assess just the C gene, e.g. TRAC/TRBC1/TRBC2 for T-cells. Low UMIs in GEX could indicate that the VDJ transcripts have low expression. The Cell Ranger VDJ pipeline uses low UMI support as one of the criteria to filter out contigs and cells. If the transcripts have low expression, then they may be filtered out of the final VDJ results, leading to lower cell calls.
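As a rough way to quantify this, the constant-gene UMIs can be summed per cell and summarized. This is a sketch under stated assumptions: the input dictionary, the function name, and the `low_cutoff` value are hypothetical, and the cutoff is not the threshold used by the Cell Ranger V(D)J pipeline.

```python
from statistics import median
from typing import Dict, Iterable

# Hypothetical toy input: barcode -> {gene symbol: UMI count}.
Counts = Dict[str, Dict[str, int]]


def summed_umis(counts: Counts, genes: Iterable[str]) -> Dict[str, int]:
    """Sum the UMI counts of `genes` for every barcode."""
    gene_set = set(genes)
    return {
        barcode: sum(umi for gene, umi in profile.items() if gene in gene_set)
        for barcode, profile in counts.items()
    }


# Made-up counts for putative T-cells; not from a real dataset.
toy_t_cells: Counts = {
    "AAAC-1": {"CD3E": 4, "TRAC": 1},
    "AAAG-1": {"CD3E": 6, "TRBC1": 2, "TRBC2": 1},
    "AACT-1": {"CD3D": 3},
    "AAGT-1": {"CD3E": 5, "TRAC": 9, "TRBC2": 6},
}

c_gene_umis = summed_umis(toy_t_cells, ["TRAC", "TRBC1", "TRBC2"])
low_cutoff = 1  # illustrative value only, not a Cell Ranger threshold
low_cells = sorted(b for b, umis in c_gene_umis.items() if umis <= low_cutoff)

print(median(c_gene_umis.values()))  # prints "2.0"
print(low_cells)  # prints "['AAAC-1', 'AACT-1']"
```

A low median, or a large share of putative T/B-cells at or below the cutoff, would be consistent with the low-expression scenario described above.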
Case (iii) If the expected number of T/B-cells is present in the GEX data and the V(D)J gene UMI counts are reasonable, then consider evaluating other metrics for the V(D)J data. For example, check whether the enrichment metrics (reads mapped to any V(D)J gene) in the web_summary.html file, as described in this KB article here, are reasonable.
At this point also consider evaluating workflow-related questions such as sorting (cell viability after sorting; nozzle size; flow speed), cleanup, enrichment (BioAnalyzer traces), and cell viability. We would also recommend contacting 10x Genomics Technical Support (email@example.com).
A high number of V(D)J cells (over-calling)
The phenomenon where more T/B-cells are called than expected is referred to here as V(D)J cell over-calling. There may be several underlying reasons for over-calling, including compromised samples (bad quality, poor viability, poor cell health, premature lysis) or a population expanded from a single clonotype. With the help of GEX data, Loupe Browser can be used to address the following questions:
- How many cells were called T/B-cells?
- Are there lots of T/B-cells but with a low UMI count?
- Is the BCR library dominated by plasma cells? Plasma cells express high levels of the JCHAIN gene, which is also highly expressed in plasmacytoid dendritic cells. For example, Figure 4 shows the GEX data of a sorted B-cell sample that has high expression of JCHAIN.
Figure 4: Loupe Browser screenshot of a sorted B-cell sample where JCHAIN is very highly expressed, suggesting that the sample likely contains high numbers of plasma B-cells.
- Are there dead and dying cells? The Cell Ranger algorithm tries to filter out barcodes where cells have low UMI counts or contain ambient RNA (for example, due to the leakage of receptor chains). However, Cell Ranger does not filter out cells or barcodes on the basis of high mitochondrial gene expression. Using Loupe Browser, it is possible to investigate whether the sample contains too many cells with high expression of mitochondrial genes and MALAT1, a key property of dead and dying cells. For example, Figure 5 (a and b) shows that the sorted B-cell sample contains a cluster towards the bottom that has high expression of both MALAT1 and a mitochondrial gene (MT-CO1), suggesting that this cluster is mostly composed of dead/dying cells.
Figure 5: Loupe Browser screenshot showing a cluster with high expression of MALAT1 (panel a) and MT-CO1 (panel b).
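Because this filtering is not done on mitochondrial content, it can help to compute the mitochondrial fraction per barcode yourself. The sketch below assumes mitochondrial genes can be recognized by an "MT-" symbol prefix (as in MT-CO1), compared case-insensitively. The function name, the toy counts, and the 20% cutoff are hypothetical and should be adapted to the sample type.

```python
from typing import Dict


def mito_fraction(profile: Dict[str, int], prefix: str = "MT-") -> float:
    """Fraction of a barcode's UMIs that come from genes starting with `prefix`."""
    total = sum(profile.values())
    if total == 0:
        return 0.0
    mito = sum(
        umi for gene, umi in profile.items() if gene.upper().startswith(prefix)
    )
    return mito / total


# Made-up counts for illustration only; not from a real dataset.
toy_counts = {
    "AAAC-1": {"MT-CO1": 60, "MALAT1": 30, "CD79A": 10},
    "AAAG-1": {"MT-CO1": 5, "CD79A": 45, "CD79B": 50},
}

max_mito = 0.2  # illustrative cutoff only; choose one suited to your sample
flagged = sorted(
    barcode
    for barcode, profile in toy_counts.items()
    if mito_fraction(profile) > max_mito
)
print(flagged)  # prints "['AAAC-1']"
```

Barcodes flagged this way can then be compared against the V(D)J cell calls to see whether dead or dying cells contribute to the over-calling.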
Disclaimer: Please note that the Loupe Browser screenshots in this article are not from any real datasets. They have been created for illustration purposes only, and do not reflect true clustering or scenarios from any actual experiment.
Products: Single Cell Immune Profiling