Question:
Cell Ranger detected a lower number of cells than expected in my FFPE samples with the Fixed RNA Profiling (Flex) assay. Can I change cell calling parameters to recover more cells?
Answer:
We have found that cell recovery with FFPE samples may be variable or lower than with other sample inputs. This may be due to low library complexity and how the cell calling algorithm handles it. Before changing cell calling parameters, please make sure that the library and sample quality are good by checking the metrics in the web summary HTML files. Some key metrics to review for FFPE data are discussed in this article: What QC is available before, during, and after running the Chromium Gene Expression Flex assay with FFPE samples? It is also helpful to check the technical note, Interpreting Cell Ranger multi Web Summary Files for Fixed RNA Profiling, for an explanation of the metrics in web summaries.
The gene expression (GEX) barcode rank plot of each sample can help determine whether cell calling parameters need to be changed to recover more cells. In Cell Ranger, there are two steps of cell calling, as discussed here. The second step is based on the EmptyDrops method, and the minimum UMI threshold of EmptyDrops is max(500, 1 + max UMI count observed in the ambient range). This means barcodes with a UMI count lower than the threshold will not be considered for cell calling in the second step. Because of the lower complexity of FFPE samples, potential cell-associated barcodes with fewer than 500 UMIs are often excluded from cell calling.
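The threshold rule above can be sketched in Python as follows. This is only an illustration of the max(500, 1 + max ambient UMI) arithmetic, not Cell Ranger's implementation. The function names, the `floor` argument, and the toy UMI counts are hypothetical, and how Cell Ranger determines the ambient range is not modeled here.

```python
def min_umi_threshold(ambient_umi_counts, floor=500):
    """Return max(floor, 1 + highest UMI count seen in the ambient range)."""
    highest_ambient = max(ambient_umi_counts, default=0)
    return max(floor, 1 + highest_ambient)


def eligible_for_second_step(umi_counts, threshold):
    """Keep barcodes whose UMI count is not lower than the threshold."""
    return {bc: n for bc, n in umi_counts.items() if n >= threshold}


# Toy data (hypothetical): a low-complexity sample where likely cells
# carry only a few hundred UMIs each.
ambient = [12, 40, 95]
counts = {"AAAC": 650, "AAAG": 320, "AACC": 280, "AACG": 30}

threshold = min_umi_threshold(ambient)
kept = eligible_for_second_step(counts, threshold)
print(threshold, sorted(kept))
```

With these toy numbers the ambient maximum is far below 500, so the floor of 500 decides the threshold and the barcodes with 320 and 280 UMIs never reach the second step.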
In data with optimal cell calling, the end of the blue-color gradient (representing the presence of cells) is usually located around the steepest drop-off point in the first cliff of the barcode rank plot (see Figure 1 for an example). This indicates a clear distinction between cell and non-cell barcodes, and that cell calling identifies cell-associated barcodes before the steepest drop-off point.
Figure 1. GEX barcode rank plot of a Lung Cancer FFPE sample after running the Flex assay (Lung Cancer, Octo in this dataset). With default settings, the number of cells detected by Cell Ranger is 10,080, which is close to the targeted number (10,000) for this sample.
In FFPE data with lower complexity, the end of the blue-color gradient can fall before the first cliff because of the 500 UMI threshold in cell calling (see Figure 2 for an example). This may lead to “undercalling” of cell-associated barcodes. With samples like this, users may also see the alert “Low Fraction Confidently Mapped Reads in Cells” in web summary files.
Figure 2. GEX barcode rank plot of a mouse spleen FFPE sample after running the Flex assay (this dataset). With default settings, the number of cells detected by Cell Ranger is 8,268, which is lower than the targeted number (10,000) for this sample.
If the data is generally of good quality (based on other metrics and clustering pattern), it may be worth adjusting cell calling parameters to recover additional cell-associated barcodes. There are two options in Cell Ranger that can be adjusted in this scenario.
- force-cells. This option bypasses the cell calling algorithm and forces Cell Ranger to call exactly the number of cells you specify. For example, if you specify 10000, Cell Ranger will call the top 10000 barcodes as cells.
- emptydrops_minimum_umis. This option can be used to lower the UMI cutoff in the second step of cell calling. Cell Ranger will still go through the cell calling steps, but it will now consider barcodes with UMI counts above the specified threshold. For example, if you specify 100 for this parameter, barcodes with more than 100 UMIs will go through cell calling. A value of 100 is a reasonable starting point, but the appropriate value may vary from sample to sample, so you may need to experiment with different values and evaluate the results. This option is only available in Cell Ranger v7.1.0 and later.
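To get a rough feel for how the two options differ, the Python sketch below works on a per-barcode UMI count table. It is not how Cell Ranger calls cells: it assumes "top" barcodes are ranked by total UMI count, it only counts the barcodes that would become candidates at a given cutoff (the EmptyDrops-based step itself is not modeled), and whether a barcode exactly at the cutoff is included is an assumption. The function names and toy counts are hypothetical.

```python
def top_n_barcodes(umi_counts, n):
    """Return the n barcodes with the highest UMI counts (ties broken by barcode)."""
    ranked = sorted(umi_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [barcode for barcode, _ in ranked[:n]]


def candidates_per_cutoff(umi_counts, cutoffs):
    """For each cutoff, count barcodes with at least that many UMIs."""
    return {c: sum(1 for n in umi_counts.values() if n >= c) for c in cutoffs}


# Toy data (hypothetical barcodes and counts).
counts = {"AAAC": 900, "AAAG": 450, "AACC": 180, "AACG": 120, "AAGT": 40}

print(top_n_barcodes(counts, 2))
print(candidates_per_cutoff(counts, [500, 150, 100]))
```

Comparing the candidate counts at a few cutoffs against the targeted cell number may help narrow down which values are worth testing in an actual Cell Ranger run.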
For multiplex Flex data, you can specify either (not both) of the parameters above for each sample under the [samples] section of the multi config CSV file, for example:
[samples]
sample_id,probe_barcode_ids,emptydrops_minimum_umis
BC001,BC001,100
BC002,BC002,100
BC003,BC003,100
Note that the value can differ between samples, and you can specify it for only some of the samples by leaving the field empty for the others, for example:
[samples]
sample_id,probe_barcode_ids,emptydrops_minimum_umis
BC001,BC001,
BC002,BC002,100
BC003,BC003,150
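When there are many samples, a [samples] section like the one above could be generated rather than typed by hand. The following is a minimal Python sketch using only the standard `csv` module; the helper name `samples_section` and the use of `None` to mean "leave the field empty" are illustrative assumptions, and the output covers only the three columns shown in this article.

```python
import csv
import io


def samples_section(rows):
    """Build the text of a [samples] section from (sample_id, probe_barcode, min_umis) tuples.

    A min_umis of None leaves the emptydrops_minimum_umis field empty for that sample.
    """
    buf = io.StringIO()
    buf.write("[samples]\n")
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["sample_id", "probe_barcode_ids", "emptydrops_minimum_umis"])
    for sample_id, probe_barcode, min_umis in rows:
        writer.writerow([sample_id, probe_barcode, "" if min_umis is None else min_umis])
    return buf.getvalue()


section = samples_section([
    ("BC001", "BC001", None),
    ("BC002", "BC002", 100),
    ("BC003", "BC003", 150),
])
print(section, end="")
```

The printed text can then be pasted into the multi config CSV file alongside the other sections.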
For singleplex Flex data, you can specify either (not both) of the parameters above under the [gene-expression] section for each sample, for example:
[gene-expression]
reference,/path/to/transcriptome
probe-set,/path/to/probe-set.csv
create-bam,false
emptydrops-minimum-umis,100
Note that the option name is spelled with underscores (_) in the [samples] section (`emptydrops_minimum_umis`) and with dashes (-) in the [gene-expression] section (`emptydrops-minimum-umis`). The same rule applies to `force_cells` versus `force-cells`.
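The two rules above (underscore versus dash spelling, and either option but not both) are easy to get wrong when editing config files by hand. The Python sketch below shows one way they could be checked before launching a run. It is a hypothetical helper written for this article, not part of Cell Ranger, and the function names are assumptions.

```python
def spelled_for(section, option):
    """Return the option name as spelled in the given config section."""
    base = option.replace("-", "_")
    if section == "samples":
        return base                    # columns in [samples] use underscores
    if section == "gene-expression":
        return base.replace("_", "-")  # keys in [gene-expression] use dashes
    raise ValueError(f"unsupported section: {section}")


def check_exclusive(settings):
    """Return the non-empty settings; raise if both cell calling options are set."""
    active = {k.replace("-", "_"): v for k, v in settings.items() if v not in ("", None)}
    if "force_cells" in active and "emptydrops_minimum_umis" in active:
        raise ValueError("specify force_cells or emptydrops_minimum_umis, not both")
    return active


print(spelled_for("samples", "emptydrops-minimum-umis"))
print(spelled_for("gene-expression", "force_cells"))
print(check_exclusive({"force_cells": "", "emptydrops_minimum_umis": "100"}))
```

An empty field is treated as "not set", which matches the [samples] example where one sample leaves the value blank.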
Please note: with either approach described above, barcodes containing only ambient RNA may be included in the results. Please inspect the results carefully and filter barcodes in downstream analysis if needed. For guidance, see the analysis guide article Common Considerations for Quality Control Filters for Single Cell RNA-seq Data.
For samples with compromised quality due to workflow issues (e.g. poor mapping/sequencing metrics, lack of a clear clustering pattern), changing cell calling parameters may not be helpful. With poor quality samples, even if additional barcodes could be called as cells, it is likely that those are barcodes with background noise and not reliable data.
Please note that the parameter emptydrops_minimum_umis has only been tested and found beneficial for some FFPE samples analyzed using the Flex assay. This has not been tested with data from other chemistries, and lowering this value for other types of datasets may lead to overcalling of cells and inclusion of non-cell barcodes for downstream analyses.
Products: Single Cell Gene Expression Flex
Last Updated: August 2024