Question: I have pooled multiplexed samples in my 5' GEX and TCR/BCR libraries using TotalSeqC or other custom methods. Can I use Cell Ranger to analyze the data and generate outputs for each sample separately?
Answer: If you have 5' cell multiplexed libraries, you may be able to use a workaround to process them with Cell Ranger. Please note that cell multiplexing for 5' Immune Profiling libraries is currently not supported by 10x.
For example, if you multiplexed 2 samples using TotalSeqC antibodies and then generated GEX+VDJ data, the following instructions let you generate separate GEX and TCR/BCR outputs for the cells from each sample.
Note: This unofficial solution is expected to work for both dual-indexed and single-indexed data.
Step 1: Demultiplex, that is, assign cells in the combined data to each sample
For this, you will need to run the cellranger multi pipeline with GEX+TotalSeqC libraries.
- If running Cell Ranger version 7.0 and above, you will not need to run Step 1a. Proceed directly to Step 1b.
a) First, change a parameter in the Cell Ranger code so that it can process 5' multiplexing data. The parameter is in the file below:
Change fiveprime_multiplexing from false to true, as below:
fiveprime_multiplexing = true
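Scripting the Step 1a edit makes it reproducible and easy to undo. The sketch below is a minimal example, assuming the setting appears in the file as a single `fiveprime_multiplexing = false` line. The file path is not given here, so `settings_path` is a placeholder that you must point at the file named above. The function name is hypothetical.

```python
import re
import shutil
from pathlib import Path

# Matches e.g. "fiveprime_multiplexing = false" with flexible spacing.
FLAG_RE = re.compile(r"^([ \t]*fiveprime_multiplexing[ \t]*=[ \t]*)(true|false)\b",
                     re.MULTILINE)


def set_fiveprime_multiplexing(settings_path: str, enabled: bool) -> int:
    """Set fiveprime_multiplexing in the given file; return the number of lines changed.

    A backup copy (<file>.bak) is written before the first modification so the
    original value can be restored after the analysis.
    """
    path = Path(settings_path)
    text = path.read_text()
    value = "true" if enabled else "false"
    new_text, n = FLAG_RE.subn(lambda m: m.group(1) + value, text)
    if n == 0:
        raise ValueError(f"fiveprime_multiplexing not found in {path}")
    backup = path.with_name(path.name + ".bak")
    if not backup.exists():
        shutil.copy2(path, backup)  # keep the untouched original only once
    path.write_text(new_text)
    return n
```

Call `set_fiveprime_multiplexing(path, True)` before Step 1 and `set_fiveprime_multiplexing(path, False)` once your analysis is finished. The `.bak` copy also serves as a fallback.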
b) Create a custom CMO reference CSV file as explained below:
To create this file, see the TotalSeqC section here: https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/feature-bc-analysis#feature-ref
Make sure that in this CMO reference CSV file, the feature_type column is set to Multiplexing Capture.
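The next sketch writes the Step 1b CMO reference. It is a minimal example, assuming the standard feature reference columns (id, name, read, pattern, sequence, feature_type) and the R2 / `5PNNNNNNNNNN(BC)` read layout that 10x's feature reference documentation gives for TotalSeq-C. Verify both against the linked TotalSeqC section for your Cell Ranger version. The barcode sequences are placeholders, not real TotalSeqC hashtag sequences, so substitute the ones from your antibody panel.

```python
import csv
import io

# Column order of a Cell Ranger feature/CMO reference CSV.
FIELDS = ["id", "name", "read", "pattern", "sequence", "feature_type"]

# Assumed TotalSeq-C read layout; confirm against the linked 10x page.
TOTALSEQC_READ = "R2"
TOTALSEQC_PATTERN = "5PNNNNNNNNNN(BC)"


def cmo_reference_csv(hashtags: dict[str, str]) -> str:
    """Return CMO reference CSV text for a {cmo_id: barcode_sequence} mapping."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for cmo_id, seq in hashtags.items():
        writer.writerow({
            "id": cmo_id,          # referenced by cmo_ids in the multi config
            "name": cmo_id,
            "read": TOTALSEQC_READ,
            "pattern": TOTALSEQC_PATTERN,
            "sequence": seq,
            "feature_type": "Multiplexing Capture",
        })
    return buf.getvalue()


# Hypothetical placeholder sequences; replace with your real hashtag barcodes.
example = cmo_reference_csv({"TotalSeqC1": "A" * 15, "TotalSeqC2": "C" * 15})
print(example)
```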
Then run cellranger multi with the GEX+TotalSeqC data.
The input CSV file may look something like this.
[gene-expression]
reference,/home/refdata_cellranger/GRCh38-2020-A
cmo-set,/home/path/to/TotalSeqC_CMO_reference.csv

[libraries]
fastq_id,fastqs,feature_types
Gex,/home/fastqs/Gex,Gene Expression
Mux,/home/fastqs/Mux,Multiplexing Capture

[samples]
sample_id,cmo_ids
Sample1,TotalSeqC1
Sample2,TotalSeqC2

Here, cmo-set points to the CMO reference CSV created in Step 1b, and the cmo_ids values must match the IDs defined in that reference.
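If you process many pools, generating the Step 1 config programmatically avoids copy-paste mistakes. The sketch below is a minimal example that writes the same sections as the example config. It uses the `reference` and `cmo-set` keys in [gene-expression]. All paths, FASTQ IDs and sample names are the illustrative values from that example, and the function name is hypothetical.

```python
def step1_multi_config(reference: str, cmo_set: str,
                       libraries: list[tuple[str, str, str]],
                       samples: dict[str, str]) -> str:
    """Build a cellranger multi config CSV for the GEX + Multiplexing Capture run.

    libraries: (fastq_id, fastqs_dir, feature_type) tuples.
    samples:   {sample_id: cmo_id}; each cmo_id must be an id in the CMO reference.
    """
    lines = ["[gene-expression]",
             f"reference,{reference}",
             f"cmo-set,{cmo_set}",
             "",
             "[libraries]",
             "fastq_id,fastqs,feature_types"]
    lines += [",".join(lib) for lib in libraries]
    lines += ["", "[samples]", "sample_id,cmo_ids"]
    lines += [f"{sid},{cmo}" for sid, cmo in samples.items()]
    return "\n".join(lines) + "\n"


config = step1_multi_config(
    reference="/home/refdata_cellranger/GRCh38-2020-A",
    cmo_set="/home/path/to/TotalSeqC_CMO_reference.csv",
    libraries=[("Gex", "/home/fastqs/Gex", "Gene Expression"),
               ("Mux", "/home/fastqs/Mux", "Multiplexing Capture")],
    samples={"Sample1": "TotalSeqC1", "Sample2": "TotalSeqC2"},
)
print(config)
```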
This step generates per-sample outputs for the GEX data.
However, the TCR/BCR data have not yet been demultiplexed. Extra steps are needed because Cell Ranger currently does not support the analysis of cell multiplexed TCR/BCR data: if TCR/BCR libraries are included alongside Multiplexing Capture libraries in the input CSV, Cell Ranger will error out. Therefore, we first demultiplex the samples using the GEX data only, and then analyze the TCR/BCR data without the Multiplexing Capture (in this case TotalSeqC) libraries. The next steps describe this process.
Step 2: Generate per sample FASTQs for the GEX data
When analyzing cell multiplexing data, Cell Ranger generates several output files that are split per sample. One of these per-sample outputs is a BAM file containing the reads assigned to that sample. For more details on the output files and the file structure, please see here.
For each demultiplexed sample (2 in this example), convert the per-sample BAM file back to FASTQ format using the 10x tool bamtofastq.
- Note that bamtofastq will create FASTQ files in at least two folders: one for the GEX library and one for the multiplexing library. You need the GEX library files in Step 3; to identify which folder is which, follow the instructions in this article.
- In addition, bamtofastq splits FASTQ files into chunks of 50M reads, which means that if your sample has 100M reads in total, bamtofastq will output 3 folders: 2 for the GEX library with 50M reads each and 1 for the multiplexing library. To avoid this complexity, or the extra step of merging GEX FASTQs, set bamtofastq's reads-per-FASTQ chunk size to a number slightly higher than the number of reads in your sample.
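The Step 2 conversion can be prepared with a small script that builds one bamtofastq command per sample, with the chunk size raised about 5% above that sample's read count. This is a minimal sketch under several assumptions. The chunk-size option is assumed to be bamtofastq's `--reads-per-fastq` (confirm with `bamtofastq --help` on your version). The per-sample BAM is assumed to sit at `outs/per_sample_outs/<sample>/count/sample_alignments.bam` inside the Step 1 output (check your own output tree). The read counts are numbers you supply. The function name, run directory and output paths are hypothetical.

```python
import shlex
from pathlib import Path


def bamtofastq_commands(run_dir: str, out_root: str,
                        reads_per_sample: dict[str, int],
                        margin_pct: int = 5) -> list[str]:
    """Return one bamtofastq command line per sample."""
    cmds = []
    for sample, n_reads in reads_per_sample.items():
        # Chunk size slightly above the read count, so GEX reads are not split into 50M chunks.
        chunk = n_reads + max(1, n_reads * margin_pct // 100)
        # Assumed location of the per-sample BAM in the Step 1 multi output.
        bam = (Path(run_dir) / "outs" / "per_sample_outs" / sample
               / "count" / "sample_alignments.bam")
        # Give every sample its own fresh output folder.
        out = Path(out_root) / sample
        cmds.append(shlex.join(["bamtofastq", f"--reads-per-fastq={chunk}",
                                str(bam), str(out)]))
    return cmds


for cmd in bamtofastq_commands("/data/step1_multi", "/data/step2_fastqs",
                               {"Sample1": 100_000_000, "Sample2": 80_000_000}):
    print(cmd)
```

Integer arithmetic keeps the chunk size exact. Printing the commands instead of running them lets you review them before launching long conversions.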
Step 3: Run cellranger multi for GEX+VDJ (and optionally Antibody) data
Run the cellranger multi pipeline for GEX+VDJ data as described here. You will need to run the multi pipeline once for each sample. Points to note for this round of multi runs:
a) In this step, do not use the Multiplexing Capture (TotalSeqC for hashing) library data. (If the same library was used for both hashtags and antibodies, the same FASTQs can be used in Step 1 for hashing and in Step 3 for antibody capture.)
b) Use the per-sample GEX FASTQs generated in Step 2. For VDJ, however, use the FASTQs for the full library.
- If running Cell Ranger version 7.0 and above, you will not need to run Step 3c. Proceed directly to Step 3d.
c) Before proceeding, you may need to make an additional change to Cell Ranger code to prevent an error during runtime.
Locate the _sc_multi_defs.mro file in your installation of Cell Ranger:
In the script, locate line 441 under CHECK_BARCODES_COMPATIBILITY_VDJ, highlighted with a red box in the image below:
Change the parameter from true to false. Save the change and exit.
Remember to change the parameter back to true after this analysis so that your next Cell Ranger runs are not affected.
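Because the Step 3c edit must be undone afterwards, a context manager that patches the file and always restores the original bytes can reduce the risk of forgetting. This is a minimal sketch, assuming the value to flip is the literal `true` on line 441 of `_sc_multi_defs.mro` as described above. The line number may differ between Cell Ranger versions, so as a safety check (an assumption about the file layout) the function refuses to edit unless `CHECK_BARCODES_COMPATIBILITY_VDJ` appears within the preceding 20 lines. The function name and the install path in the usage comment are hypothetical.

```python
import contextlib
from pathlib import Path

STAGE = "CHECK_BARCODES_COMPATIBILITY_VDJ"


@contextlib.contextmanager
def vdj_barcode_check_disabled(mro_path: str, line_no: int = 441, lookback: int = 20):
    """Temporarily change `true` to `false` on one line of _sc_multi_defs.mro.

    The original file is restored byte-for-byte when the block exits, even on error.
    """
    path = Path(mro_path)
    original = path.read_bytes()
    lines = original.decode().splitlines(keepends=True)
    idx = line_no - 1  # 1-based line number -> 0-based index
    if idx >= len(lines):
        raise ValueError(f"{path} has fewer than {line_no} lines")
    target = lines[idx]
    context = "".join(lines[max(0, idx - lookback):idx + 1])
    if STAGE not in context or "true" not in target:
        raise ValueError(f"line {line_no} of {path} does not look like the expected setting")
    lines[idx] = target.replace("true", "false", 1)
    path.write_bytes("".join(lines).encode())
    try:
        yield path
    finally:
        path.write_bytes(original)  # undo the change for subsequent runs and users

# Usage (hypothetical path; run each per-sample pipeline inside the block):
# with vdj_barcode_check_disabled("/path/to/cellranger/.../_sc_multi_defs.mro"):
#     subprocess.run(["cellranger", "multi", "--id=Sample1_vdj",
#                     "--csv=Sample1_multi.csv"], check=True)
```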
d) In the [gene-expression] section of the config CSV, use the force-cells parameter to specify the number of cells obtained for this sample in Step 1. This matters because Cell Ranger's cell calling uses all raw reads, including both per-sample reads and background reads. The per-sample FASTQs contain only a subset of the total reads, so cell calling on them alone would not reproduce the Step 1 result. Specifying the cell count ensures that as many cells are called as in Step 1.
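Points b) and d) can be combined into one config per sample. This is a minimal sketch, assuming the `reference` and `force-cells` keys of the [gene-expression] section, a [vdj] section with its own `reference`, and `VDJ-T` or `VDJ-B` as the VDJ feature type. The GEX FASTQs are the per-sample ones from Step 2, while the VDJ FASTQs come from the full library. The function name, all paths, the FASTQ IDs and the cell count in the usage call are hypothetical. `force_cells` should be this sample's cell count from Step 1.

```python
def step3_multi_config(gex_ref: str, vdj_ref: str, force_cells: int,
                       gex_fastq_id: str, gex_fastqs: str,
                       vdj_fastq_id: str, vdj_fastqs: str,
                       vdj_type: str = "VDJ-T") -> str:
    """Build the per-sample GEX+VDJ config for Step 3 (no Multiplexing Capture library)."""
    if vdj_type not in ("VDJ-T", "VDJ-B"):
        raise ValueError("vdj_type should be VDJ-T or VDJ-B")
    lines = [
        "[gene-expression]",
        f"reference,{gex_ref}",
        # Reuse the cell count called for this sample in Step 1.
        f"force-cells,{force_cells}",
        "",
        "[vdj]",
        f"reference,{vdj_ref}",
        "",
        "[libraries]",
        "fastq_id,fastqs,feature_types",
        # Per-sample GEX FASTQs produced by bamtofastq in Step 2.
        f"{gex_fastq_id},{gex_fastqs},Gene Expression",
        # Full (non-demultiplexed) VDJ library.
        f"{vdj_fastq_id},{vdj_fastqs},{vdj_type}",
    ]
    return "\n".join(lines) + "\n"


print(step3_multi_config(
    gex_ref="/home/refdata_cellranger/GRCh38-2020-A",
    vdj_ref="/path/to/vdj_reference",          # hypothetical
    force_cells=5000,                          # hypothetical Step 1 count for Sample1
    gex_fastq_id="bamtofastq",                 # check the prefix of the Step 2 FASTQ names
    gex_fastqs="/data/step2_fastqs/Sample1/gex_folder",  # hypothetical
    vdj_fastq_id="TCR", vdj_fastqs="/home/fastqs/TCR",   # hypothetical
))
```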
What Step 3 enables is running GEX+VDJ analysis together and generating VDJ clonotypes per sample. When GEX and VDJ data are analyzed together in the multi pipeline, VDJ cell calls are gated on GEX cell calls. Since the GEX input in this step comes from a single sample, only that sample's cells will be called in the VDJ data and used to generate clonotypes. At the end of this step, you will have demultiplexed per-sample VDJ outputs such as clonotypes, the .vloupe file, etc.
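The gating described above can be pictured as a set intersection. Only VDJ contigs whose cell barcode is among the sample's GEX cell calls contribute to that sample's clonotypes. This is a simplified conceptual illustration, not Cell Ranger's actual clonotyping algorithm. It keys a clonotype on a single CDR3 string, whereas real clonotypes use paired chains. The barcodes, CDR3 strings and function name are hypothetical.

```python
from collections import Counter


def gate_vdj_by_gex(gex_cells: set[str], vdj_contigs: list[tuple[str, str]]) -> Counter:
    """Count CDR3 'clonotypes' among VDJ contigs whose barcode is a GEX-called cell.

    vdj_contigs: (barcode, cdr3) pairs from the full, non-demultiplexed VDJ library.
    """
    per_cell = {}
    for barcode, cdr3 in vdj_contigs:
        if barcode in gex_cells:               # gate on this sample's GEX cell calls
            per_cell.setdefault(barcode, cdr3)  # one CDR3 per cell, for simplicity
    return Counter(per_cell.values())


sample1_cells = {"AAAC-1", "AAAG-1"}  # hypothetical Sample1 GEX cell calls
contigs = [("AAAC-1", "CASSLGQ"), ("AAAG-1", "CASSLGQ"),
           ("TTTT-1", "CASRDN")]      # TTTT-1 belongs to another sample
print(gate_vdj_by_gex(sample1_cells, contigs))  # Counter({'CASSLGQ': 2})
```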
If your 5' cell multiplexing data was generated using 10x Single Cell HT kits, please use Cell Ranger version 6.1 or above for data analysis. The chemistry field of the web summary will indicate "HT" if the data was generated with HT kits. If Cell Ranger fails to detect HT for your 5' HT cell multiplexing data, specify SC5PHT for the chemistry option in the multi config CSV file and re-run Cell Ranger.
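The chemistry override can be added to an existing multi config with a few lines of code. This is a minimal sketch, assuming the setting belongs in the [gene-expression] section as a `chemistry,SC5PHT` row (confirm the placement for your Cell Ranger version). The function name is hypothetical. Any existing chemistry row in that section is replaced, so only one value remains.

```python
def set_chemistry(config_text: str, chemistry: str = "SC5PHT") -> str:
    """Return config_text with `chemistry,<value>` set in the [gene-expression] section."""
    out, section, found = [], None, False
    for line in config_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            section = stripped.lower()
            out.append(line)
            if section == "[gene-expression]":
                found = True
                # Put the override directly under the section header.
                out.append(f"chemistry,{chemistry}")
            continue
        # Drop any chemistry row already present in [gene-expression].
        if section == "[gene-expression]" and stripped.lower().startswith("chemistry,"):
            continue
        out.append(line)
    if not found:
        raise ValueError("config has no [gene-expression] section")
    return "\n".join(out) + "\n"
```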
Attention: Running this pipeline involves changing the Cell Ranger code in Steps 1a and 3c. Make sure you change these values back to their originals after you finish your analysis, so that they do not confound subsequent analyses or other users.
Disclaimer: This article and code-snippet are provided for instructional purposes only. 10x Genomics does not support or guarantee this workaround or code modification.