Question: Is there an upper limit on the number of cells that can be used with the cellranger aggr pipeline?
Answer: Cell Ranger does not enforce a maximum number of cells when the cellranger aggr pipeline is run without chemistry batch correction. We have validated up to 2.5 million cells on the minimum compute resources. With more compute resources, you can aggregate larger cell counts, provided chemistry batch correction is not used.
However, if you are using chemistry batch correction, there is a threshold of 800,000 cells. Chemistry batch correction is resource-intensive, so the 800k cell limit was set with the minimum compute resources (64 GB RAM) in mind. If you have sufficiently high compute resources, you can manually change this hardcoded limit. Here are the instructions:
1. First, navigate to the folder where Cell Ranger is installed on your system.
2. Next, navigate to the folder containing the constants.py script by running one of the following:
Cell Ranger 4+: cd lib/python/cellranger/analysis
Cell Ranger 3: cd cellranger-cs/3.x.x/lib/python/cellranger/analysis
3. Open the constants.py script using a text editor such as nano, emacs, or vim.
4. Within the script you will find this section:
# chemistry batch correction
# this upper limit was determined via testing,
# larger numbers of cells will require _at least_ memory reservation changes
# in CORRECT_CHEMISTRY_BATCH.join, if not substantial algorithmic changes
CBC_MAX_NCELLS = 800000
CBC_N_COMPONENTS_DEFAULT = 100
CBC_KNN = 10
CBC_ALPHA = 0.1
CBC_SIGMA = 150
CBC_REALIGN_PANORAMA = False
5. Here, change the value of the CBC_MAX_NCELLS parameter (the first assignment in the section above) from 800000 to the maximum number of cells that you need.
6. Save this change in the script, reload Cell Ranger in your environment, and you will be able to run the cellranger aggr pipeline with a higher number of cells.
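For readers who prefer to script steps 3 to 6 rather than edit the file by hand, the following is a minimal Python sketch. It is not part of Cell Ranger: the function names set_cbc_max_ncells and patch_constants_file are hypothetical, and the sketch assumes constants.py contains exactly one CBC_MAX_NCELLS assignment in the form shown in step 4. It keeps a backup copy (constants.py.bak) so the change can be reverted.

```python
import re
import shutil
from pathlib import Path

# Matches the assignment as it appears in the section shown in step 4.
CBC_LINE = re.compile(r"^(CBC_MAX_NCELLS\s*=\s*)\d+", re.MULTILINE)


def set_cbc_max_ncells(source: str, new_limit: int) -> str:
    """Return the text of constants.py with CBC_MAX_NCELLS set to new_limit."""
    if new_limit <= 0:
        raise ValueError("new_limit must be a positive integer")
    updated, count = CBC_LINE.subn(lambda m: m.group(1) + str(new_limit), source)
    if count != 1:
        # Refuse to guess if the file does not look like the snippet in step 4.
        raise ValueError(f"expected one CBC_MAX_NCELLS assignment, found {count}")
    return updated


def patch_constants_file(path: str, new_limit: int) -> None:
    """Hypothetical helper: back up constants.py, then write the edited text."""
    target = Path(path)
    patched = set_cbc_max_ncells(target.read_text(), new_limit)
    # Keep an untouched copy next to the original before overwriting it.
    shutil.copy2(target, target.with_name(target.name + ".bak"))
    target.write_text(patched)
```

Run from the folder given in step 2, a call such as patch_constants_file("constants.py", 1200000) would raise the limit to 1.2 million cells. The value 1200000 is only an illustration; choose a limit that matches your dataset and available memory.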
Please note that we have not validated the pipeline with chemistry batch correction for cell counts beyond 800k and advise caution.
Also, please note that Loupe Cell Browser can load up to 1.3 million cells at this time. In addition, the performance of Loupe Cell Browser for differential expression analysis has only been validated with up to 100k cells. When performing differential expression analysis with more than 100k cells, Loupe Cell Browser's performance cannot be guaranteed. It may take a long time to run or even crash depending on various factors such as system resources and data size.
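The limits mentioned in this article can be checked against a planned aggregation before running it. The sketch below is illustrative only: the function aggr_warnings and the constant names other than CBC_MAX_NCELLS are hypothetical, and the thresholds are simply the figures quoted in this article (800k, 2.5 million, 1.3 million, and 100k cells).

```python
from typing import List

# Figures quoted in this article; the constant names are illustrative.
CBC_MAX_NCELLS = 800_000                 # default limit with chemistry batch correction
AGGR_VALIDATED_NO_CORRECTION = 2_500_000  # validated without batch correction
LOUPE_MAX_LOAD = 1_300_000               # cells Loupe Cell Browser can load
LOUPE_DE_VALIDATED = 100_000             # validated for differential expression


def aggr_warnings(n_cells: int, batch_correction: bool) -> List[str]:
    """Return a list of notes for every article limit that n_cells exceeds."""
    notes = []
    if batch_correction and n_cells > CBC_MAX_NCELLS:
        notes.append("exceeds the default chemistry batch correction limit")
    if not batch_correction and n_cells > AGGR_VALIDATED_NO_CORRECTION:
        notes.append("exceeds the cell count validated without batch correction")
    if n_cells > LOUPE_MAX_LOAD:
        notes.append("exceeds what Loupe Cell Browser can load")
    if n_cells > LOUPE_DE_VALIDATED:
        notes.append("exceeds the count validated for Loupe differential expression")
    return notes


if __name__ == "__main__":
    for note in aggr_warnings(900_000, batch_correction=True):
        print(note)
```

An empty list means the planned cell count is within every limit quoted here; it does not guarantee that a given machine has enough memory for the run.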
Note: Larger datasets may require additional memory beyond our stated minimum requirement of 64 GB. See the Cell Ranger system requirements page for more details and time trial data.
Disclaimer: This article and code snippet are provided for instructional purposes only. 10x Genomics does not support or guarantee modifications to the Cell Ranger code base.
Last update: Nov 2024