Question: How do I run and troubleshoot 10x pipelines on an HPC cluster with a batch-scheduling system like SGE or LSF?
Answer: Every High Performance Computing (HPC) cluster is unique: both the hardware configurations and the batch-scheduling systems allow for a great deal of flexibility. Although the details vary, there is generally a login node from which users submit jobs to a queue using commands such as qsub (SGE) or bsub (LSF). The cluster then assigns those jobs to compute nodes, which are the 'workhorses' of the cluster.
There are two common ways to run our pipelines on a cluster:
1. Local Mode (on the cluster): Manually submit a default local mode job (--jobmode=local) to a compute node on your cluster via the batch-scheduling system. If you do this, please be sure to use the --localcores option at the end of your command line. Additionally, when you submit the job, you will need to request resources from the cluster that are consistent with --localcores, and make sure that a queue exists with nodes that have sufficient resources, so that the scheduler is actually able to assign the job to a node. The specific mechanisms for doing this vary by cluster.
2. Cluster Mode (Cell Ranger and Long Ranger only): Set up a 'cluster mode' template, which allows the pipeline to submit jobs through a batch-scheduling system and take advantage of the cluster's parallel computing capabilities. When running in cluster mode, a small program runs on the submit node and spawns jobs on your compute nodes through the scheduler as the pipeline progresses. Although LSF and SGE are our officially supported batch-scheduling systems, it is also possible to use other systems such as Slurm or Torque. However, because we do not have access to every kind of cluster, we may be unable to effectively troubleshoot issues that arise. To a certain extent this is also true for LSF and SGE, since individual cluster configurations can vary so widely. For more information on how to set up cluster mode, please see Cluster Mode for Cell Ranger and Cluster mode for Long Ranger.
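For local mode, the key point is that the resources the pipeline is told to use match what the job requests from the scheduler. The Python sketch below builds such a submission command for SGE or LSF. It is an illustration under assumptions, not an official 10x recipe: the function name `build_local_mode_submission` is hypothetical, the SGE parallel environment name `smp` and the per-slot `h_vmem` request are site-specific assumptions, the LSF `-M` unit depends on your cluster's configuration, and it passes `--localmem` alongside `--localcores`, assuming your pipeline version accepts it.

```python
import math


def build_local_mode_submission(scheduler, cores, mem_gb, pipeline_cmd, sge_pe="smp"):
    """Return an argv list that submits one local-mode 10x pipeline job.

    Hypothetical helper: the cores/memory handed to the pipeline are the
    same values requested from the scheduler, so the job fits its node.
    """
    # Local-mode flags go at the end of the pipeline's command line.
    full_cmd = list(pipeline_cmd) + [
        "--jobmode=local",
        f"--localcores={cores}",
        f"--localmem={mem_gb}",  # assumption: your pipeline supports --localmem
    ]
    if scheduler == "sge":
        # ASSUMPTION: "smp" is a placeholder parallel environment name, and
        # h_vmem is counted per slot on many SGE sites, so split memory by cores.
        per_slot_gb = math.ceil(mem_gb / cores)
        return [
            "qsub", "-V", "-cwd",          # export environment, run in cwd
            "-pe", sge_pe, str(cores),     # request `cores` slots
            "-l", f"h_vmem={per_slot_gb}G",
            "-b", "y",                     # treat the command as a binary
        ] + full_cmd
    if scheduler == "lsf":
        # span[hosts=1] keeps every slot on one node, which local mode needs.
        # ASSUMPTION: -M is in MB here; the real unit depends on LSF_UNIT_FOR_LIMITS.
        return [
            "bsub", "-n", str(cores),
            "-R", "span[hosts=1]",
            "-M", str(mem_gb * 1024),
        ] + full_cmd
    raise ValueError(f"unsupported scheduler: {scheduler!r}")
```

For example, `shlex.join(build_local_mode_submission("lsf", 8, 64, ["cellranger", "count", "--id=sample1"]))` produces a shell-quoted command line you could paste into a terminal or job script after adjusting the resource options to your site.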
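In cluster mode, the pipeline fills in a job-script template for each job it submits. One rough way to sanity-check a template before a long run is to render it yourself with dummy values and inspect or test-submit the result. The sketch below assumes Martian-style `__MRO_*__` placeholders such as `__MRO_THREADS__` and `__MRO_MEM_GB__`; the template text, the `threads` parallel environment, and the `render_template` helper are illustrative assumptions, so compare them against the template that ships with your pipeline version and against your site's scheduler options.

```python
import re

# Hypothetical SGE-style template; placeholder names are assumed, not verified
# against any particular pipeline release.
TEMPLATE = """#!/usr/bin/env bash
#$ -N __MRO_JOB_NAME__
#$ -V
#$ -pe threads __MRO_THREADS__
#$ -l mem_free=__MRO_MEM_GB__G
#$ -o __MRO_STDOUT__
#$ -e __MRO_STDERR__

__MRO_CMD__
"""

# Matches __MRO_NAME__ or __MRO_MULTI_WORD_NAME__ and captures the name.
PLACEHOLDER = re.compile(r"__MRO_([A-Z]+(?:_[A-Z]+)*)__")


def render_template(template, values):
    """Substitute __MRO_<NAME>__ placeholders, failing on any without a value."""

    def substitute(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value for placeholder __MRO_{key}__")
        return str(values[key])

    return PLACEHOLDER.sub(substitute, template)
```

A `KeyError` points at a placeholder you have not accounted for; the rendered script, with a harmless command such as `echo hello` in place of `__MRO_CMD__`, can be submitted by hand to confirm the scheduler accepts its resource requests.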
The same flexibility and customization that make HPC clusters so popular and powerful can also make troubleshooting difficult. Sometimes a cluster will kill jobs for reasons that are not obvious and that the 10x pipeline cannot capture. However, useful information is often available in the cluster's own log files. In these cases, we recommend consulting your cluster administrator, who may be able to understand more effectively how the 10x software is interacting with your specific cluster installation.
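When a job disappears without a useful pipeline error, the scheduler's accounting record is a good place to look before contacting your administrator. On SGE, `qacct -j <job_id>` prints one `key value` pair per line; on LSF, `bjobs -l` or `bhist -l` report similar information in a different layout. The sketch below parses qacct-style text and interprets the exit status using the common shell convention that a value above 128 means the process was killed by signal number (value minus 128), so 137 corresponds to SIGKILL, which is often how a scheduler enforces memory or runtime limits. The function names and sample text are hypothetical.

```python
import signal


def parse_accounting(text):
    """Parse qacct-style 'key   value' lines into a dict, skipping other lines."""
    record = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            record[parts[0]] = parts[1].strip()
    return record


def explain_exit_status(code):
    """Interpret an exit status using the shell convention 128 + signal number."""
    if code == 0:
        return "exited normally"
    if code > 128:
        sig = code - 128
        try:
            name = signal.Signals(sig).name  # e.g. SIGKILL for 9 on POSIX
        except ValueError:
            name = f"signal {sig}"
        return f"killed by {name}; check scheduler limits (memory, runtime)"
    return f"exited with error code {code}"


# Hypothetical accounting output for illustration only.
SAMPLE = """==============================================================
qname        all.q
jobname      cellranger_count
exit_status  137
"""

record = parse_accounting(SAMPLE)
print(explain_exit_status(int(record["exit_status"])))
```

A signal-based exit like this usually means the job was stopped from outside the pipeline, which is exactly the kind of case where the cluster's logs and your administrator are the most useful sources of information.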