Question: I was running Cell Ranger in cluster mode with an LSF template when it crashed with this error:
Job failed in stage code
exit status 1
What is causing this error, and how can I fix it?
Answer: We have observed this error when jobs submitted to the cluster run under the system python rather than the miniconda python that comes bundled with Cell Ranger. Here is a real example from cellranger mkfastq. First, notice the version of python recorded in the jobinfo from fork0/chunk0 of MAKE_FASTQS_PREFLIGHT:
"version": "2.7.5 (default, Aug 4 2017, 00:39:18) \n[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)]"
By contrast, here is the corresponding excerpt from the jobinfo of fork0/chunk0 of MAKE_FASTQS_PREFLIGHT_LOCAL:
"version": "2.7.13 |Continuum Analytics, Inc.| (default, Dec 20 2016, 23:09:15) \n[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]"
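To make the same comparison across a whole run, a small helper along these lines may be useful. It is only a sketch: the function name list_python_versions is hypothetical, and it assumes the job metadata is stored in files named _jobinfo under the pipestance directory, which may differ in your version. Other "version" entries in those files may also be printed.

```shell
# Print every "version" line found in _jobinfo files under a pipestance
# directory, prefixed with the file path, so that chunks which ran under
# a different python interpreter stand out.
# Assumption: job metadata files are named _jobinfo (adjust if yours differ).
list_python_versions() {
    local pipestance_dir="$1"
    find "$pipestance_dir" -type f -name '_jobinfo' \
        -exec grep -H '"version"' {} +
}

# Usage (hypothetical path):
#   list_python_versions /path/to/pipestance
```

A stage that reports a version string without "Continuum Analytics" while another stage reports one with it, as in the two excerpts above, suggests that the cluster jobs are picking up the system python.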
Most of the LSF, SGE, and PBS templates contain the "-V" option to propagate the user's environment to the nodes. However, in some cluster configurations these environment variables are lost when the job script's shebang, #!/usr/bin/env bash, starts a new shell.
One solution is to explicitly source the sourceme.bash file within the template file, so that the environment is set up again inside the new shell. Add a line that sources sourceme.bash near the end of your .template file, before the line that launches the job, and try again.
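As a sketch only, the end of a modified template might look like the fragment below. The path /path/to/cellranger is a placeholder for your own install directory, and the __MRO_CMD__ placeholder is assumed to be the line your template already uses to launch the job. Keep your existing scheduler directives above it unchanged.

```shell
# ... existing scheduler directives of your .template file stay above ...

# Re-create the Cell Ranger environment inside the new shell.
# Assumption: replace /path/to/cellranger with your actual install directory.
source /path/to/cellranger/sourceme.bash

# Assumption: existing placeholder that is replaced by the job command.
__MRO_CMD__
```

Placing the source line before the job command matters, because the bundled python must be on the PATH at the moment the job starts.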
If you still encounter problems, please send your mri.tgz files to firstname.lastname@example.org along with the template that you are using and a description of your problem.