Question: I have pooled multiplexed samples in my 5' GEX and TCR/BCR libraries using TotalSeqC antibodies or other custom methods. Can I use Cell Ranger to analyze the data and generate separate outputs for each sample?
Answer: Cell multiplexing for 5' Immune Profiling libraries is currently not supported by 10x Genomics. However, if you have 5' cell-multiplexed libraries, you may be able to process them with Cell Ranger using the workaround below.
For example, if you multiplexed 2 samples using TotalSeqC antibodies and then generated GEX and VDJ (TCR/BCR) libraries, the following steps let you generate separate GEX and TCR/BCR outputs containing only the cells from each sample.
Step 1: Demultiplex the data, i.e., assign each cell in the combined data to its sample.
To do this, run the "cellranger multi" pipeline with the GEX and TotalSeqC libraries.
a) First, change a parameter in the Cell Ranger code so that it can process 5' multiplexing data. The parameter is set in the following file:
Change fiveprime_multiplexing from false to true, as shown below:
fiveprime_multiplexing = true
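If you prefer to script the edit in step a), the sketch below performs the same false-to-true substitution. It assumes the flag appears in the file as "fiveprime_multiplexing = false" (spacing may vary); the file path is supplied by you, because the exact location depends on your Cell Ranger installation. The script name and the ".bak" backup are illustrative choices, not part of Cell Ranger.

```python
import re
import sys
from pathlib import Path

# Matches "fiveprime_multiplexing = false", tolerating spaces around "=".
FLAG_PATTERN = re.compile(r"(\bfiveprime_multiplexing\s*=\s*)false\b")


def enable_fiveprime_multiplexing(text: str) -> tuple[str, int]:
    """Return (patched_text, n_replacements) with the flag set to lowercase true."""
    return FLAG_PATTERN.subn(r"\g<1>true", text)


def patch_file(path: Path) -> int:
    """Patch the file in place, keeping a .bak copy of the original."""
    original = path.read_text()
    patched, n = enable_fiveprime_multiplexing(original)
    if n == 0:
        raise ValueError(f"'fiveprime_multiplexing = false' not found in {path}")
    path.with_name(path.name + ".bak").write_text(original)
    path.write_text(patched)
    return n


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python enable_5p_mux.py <path to the file above>
    print(patch_file(Path(sys.argv[1])))
```

Keeping the backup makes it easy to restore the unmodified file once the 5' workaround is no longer needed.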
b) Create a custom CMO reference CSV file.
For instructions on creating this file, see the TotalSeqC section here: https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/feature-bc-analysis#feature-ref
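As a minimal sketch, the CMO reference can be generated from a mapping of CMO IDs to hashtag barcode sequences. It assumes the feature reference columns id, name, read, pattern, sequence, feature_type; the TotalSeqC read and pattern values (R2 and 5PNNNNNNNNNN(BC)) as given in the TotalSeqC example on the linked page; and the "Multiplexing Capture" feature type for a cmo-set reference. Verify these against the documentation for your Cell Ranger version. The sequences in the usage example are placeholders, not real TotalSeqC barcodes.

```python
import csv
import io
import re

HEADER = ["id", "name", "read", "pattern", "sequence", "feature_type"]
# Values taken from the TotalSeqC example in the 10x Feature Barcode docs; verify before use.
TOTALSEQC_READ = "R2"
TOTALSEQC_PATTERN = "5PNNNNNNNNNN(BC)"


def cmo_reference_csv(tags: dict[str, str]) -> str:
    """Build CMO reference CSV text from {cmo_id: hashtag_barcode_sequence}."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(HEADER)
    for cmo_id, seq in tags.items():
        if not re.fullmatch(r"[ACGT]+", seq):
            raise ValueError(f"{cmo_id}: sequence must contain only A/C/G/T")
        # id and name are set to the same value; name can be any label.
        writer.writerow([cmo_id, cmo_id, TOTALSEQC_READ, TOTALSEQC_PATTERN, seq,
                         "Multiplexing Capture"])
    return buf.getvalue()


if __name__ == "__main__":
    # PLACEHOLDER sequences: replace with the barcodes of your TotalSeqC hashtag antibodies.
    example = {"TotalSeqC1": "ACGTACGTACGTACG", "TotalSeqC2": "TGCATGCATGCATGC"}
    print(cmo_reference_csv(example), end="")
```

Redirect the output to a file (for example, TotalSeqC_CMO_reference.csv) and reference that path in the cmo-set line of the multi config.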
c) Run cellranger multi with the GEX and TotalSeqC data.
The input CSV file may look something like this:
[gene-expression]
reference,/home/refdata_cellranger/GRCh38-2020-A
cmo-set,/home/path/to/TotalSeqC_CMO_reference.csv

[libraries]
fastq_id,fastqs,feature_types
Gex,/home/fastqs/Gex,Gene Expression
Mux,/home/fastqs/Mux,Multiplexing Capture

[samples]
sample_id,cmo_ids
Sample1,TotalSeqC1
Sample2,TotalSeqC2

Here, cmo-set points to the CMO reference created in Step 1b, and the cmo_ids values must match the IDs defined in that reference.
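Because a mismatch between the cmo_ids in the [samples] section and the id column of the CMO reference is easy to introduce, a quick pre-flight check can catch it before a long run. The sketch below is a hypothetical helper, not part of Cell Ranger: it splits the multi config into its bracketed sections and reports any cmo_ids missing from the reference. It assumes that multiple IDs for one sample are separated by "|".

```python
import csv
import io


def parse_sections(config_text: str) -> dict[str, list[list[str]]]:
    """Split a cellranger multi config CSV into {section_name: rows}."""
    sections: dict[str, list[list[str]]] = {}
    current = None
    for row in csv.reader(io.StringIO(config_text)):
        if not row or not any(cell.strip() for cell in row):
            continue  # skip blank lines between sections
        first = row[0].strip()
        if first.startswith("[") and first.endswith("]"):
            current = first[1:-1].strip().lower()
            sections[current] = []
        elif current is not None:
            sections[current].append([cell.strip() for cell in row])
    return sections


def missing_cmo_ids(config_text: str, cmo_reference_text: str) -> set[str]:
    """Return cmo_ids used in [samples] that are absent from the CMO reference."""
    samples = parse_sections(config_text).get("samples", [])
    if not samples:
        return set()
    header, *rows = samples
    col = header.index("cmo_ids")
    used = {cid.strip() for row in rows for cid in row[col].split("|")}
    defined = {r["id"] for r in csv.DictReader(io.StringIO(cmo_reference_text))}
    return used - defined
```

An empty result means every sample's cmo_ids are defined in the reference.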
This step generates per-sample outputs for the GEX data.
However, the TCR/BCR data have not yet been demultiplexed at this point, and doing so requires additional steps. These extra steps are needed because Cell Ranger does not currently support the analysis of cell-multiplexed TCR/BCR data: if TCR/BCR libraries are included together with Multiplexing Capture libraries in the input CSV, Cell Ranger exits with an error. Therefore, after demultiplexing the samples using only the GEX data, we analyze the TCR/BCR data without the Multiplexing Capture (in this case, TotalSeqC) libraries. The next steps describe this process.
Step 2: Generate per-sample FASTQs for the GEX data
When analyzing cell multiplexing data, Cell Ranger generates several output files that are split by sample. One of these per-sample outputs is a BAM file containing the reads assigned to that sample. For more details on the output files and directory structure, see the Cell Ranger multi output documentation.
For each demultiplexed sample (2 in this example), convert the per-sample BAM file back to FASTQ format using the 10x tool bamtofastq.
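The per-sample conversion can be looped over samples, as in the sketch below. It uses only the basic "bamtofastq <input.bam> <output-dir>" form. The per-sample BAM location (outs/per_sample_outs/<sample>/count/sample_alignments.bam) is an assumption about the multi output layout and should be checked against your Step 1 run; the script name and the "per_sample_fastqs" destination are hypothetical.

```python
import shlex
import subprocess
import sys
from pathlib import Path


def bamtofastq_commands(step1_outs: Path, samples: list[str], dest: Path) -> list[list[str]]:
    """Build one 'bamtofastq <input.bam> <output-dir>' command per sample."""
    cmds = []
    for sample in samples:
        # ASSUMPTION: per-sample BAM path in Cell Ranger multi outputs; verify for your version.
        bam = step1_outs / "per_sample_outs" / sample / "count" / "sample_alignments.bam"
        # Each sample gets its own output directory under dest.
        cmds.append(["bamtofastq", str(bam), str(dest / sample)])
    return cmds


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage (hypothetical): python per_sample_fastqs.py <step1_run>/outs Sample1 Sample2
    for cmd in bamtofastq_commands(Path(sys.argv[1]), sys.argv[2:], Path("per_sample_fastqs")):
        print(shlex.join(cmd))
        subprocess.run(cmd, check=True)  # stop at the first sample that fails
```

bamtofastq may place the FASTQs in subfolders of the output directory, so inspect the result and use the folder that directly contains the GEX FASTQs as the fastqs path in Step 3.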
Step 3: Run "cellranger multi" on the GEX+VDJ data for each sample.
Run the "cellranger multi" pipeline for GEX+VDJ data as described in the Cell Ranger multi documentation. You will need to run the pipeline once for each sample. Points to note for these runs:
a) Do not include the Multiplexing Capture (TotalSeqC hashing) library data in this step.
b) Use the per-sample GEX FASTQs generated in Step 2. For VDJ, however, use the FASTQs for the full (pooled) library.
c) In the [gene-expression] section of the input CSV, use the force-cells parameter to specify the number of cells obtained for that sample in Step 1. This is important because Cell Ranger's cell calling step uses all raw reads, including reads from other samples and background reads. Since the per-sample FASTQs contain only a subset of the total reads, cell calling on them alone would not reproduce the Step 1 results. Specifying the cell count ensures that the same number of cells is called as in Step 1.
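The points above can be combined into a per-sample config, sketched below under several assumptions. The Step 1 cell count is taken as the number of barcodes in the sample's filtered matrix, assumed to be at outs/per_sample_outs/<sample>/count/sample_filtered_feature_bc_matrix/barcodes.tsv.gz. The config keys (reference, force-cells, the [vdj] section) and the "VDJ-T"/"VDJ-B" feature types follow the multi config format as we understand it; check them against your Cell Ranger version. The fastq_id values in the usage example ("bamtofastq", "Tcr") are hypothetical and must match your actual FASTQ file prefixes.

```python
import gzip
from pathlib import Path


def step1_barcodes_path(step1_outs: Path, sample_id: str) -> Path:
    # ASSUMPTION: location of the per-sample filtered barcodes from the Step 1 multi run.
    return (step1_outs / "per_sample_outs" / sample_id / "count"
            / "sample_filtered_feature_bc_matrix" / "barcodes.tsv.gz")


def count_cells(barcodes_tsv_gz: Path) -> int:
    """Count cell barcodes (one per non-empty line) in a barcodes.tsv.gz file."""
    with gzip.open(barcodes_tsv_gz, "rt") as fh:
        return sum(1 for line in fh if line.strip())


def step3_config(gex_ref: str, vdj_ref: str, n_cells: int,
                 gex_library: tuple[str, str],
                 vdj_libraries: list[tuple[str, str, str]]) -> str:
    """Per-sample GEX+VDJ multi config text.

    gex_library: (fastq_id, fastqs) for this sample's GEX FASTQs from Step 2.
    vdj_libraries: (fastq_id, fastqs, feature_type) for the FULL pooled VDJ
    libraries. No Multiplexing Capture library is included.
    """
    lines = [
        "[gene-expression]",
        f"reference,{gex_ref}",
        f"force-cells,{n_cells}",  # this sample's cell count from Step 1
        "",
        "[vdj]",
        f"reference,{vdj_ref}",
        "",
        "[libraries]",
        "fastq_id,fastqs,feature_types",
        f"{gex_library[0]},{gex_library[1]},Gene Expression",
    ]
    lines += [f"{fid},{path},{ftype}" for fid, path, ftype in vdj_libraries]
    return "\n".join(lines) + "\n"
```

For example, step3_config(gex_ref, vdj_ref, count_cells(step1_barcodes_path(outs, "Sample1")), ("bamtofastq", "/path/to/Sample1_fastqs"), [("Tcr", "/home/fastqs/Tcr", "VDJ-T")]) would produce the config for Sample1; repeat with each sample's own FASTQs and cell count.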
Step 3 lets you run GEX and VDJ analysis together and generate VDJ clonotypes per sample. When GEX and VDJ data are analyzed together in the multi pipeline, the VDJ cell calls are gated on the GEX cell calls. Because the GEX input in this step contains only one sample, only cells from that sample are called in the VDJ data and used to generate clonotypes. At the end of this step, you will have demultiplexed per-sample VDJ outputs such as clonotypes, the .vloupe file, etc.
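The gating described above can be modeled as a set intersection: of all barcodes seen in the pooled VDJ library, only those that are GEX cells for the current sample are kept. This is a simplified conceptual sketch, not Cell Ranger's implementation. It also includes a hypothetical sanity check on the Step 3 results, reading the "barcode" column of each sample's filtered_contig_annotations.csv and confirming that no barcode is assigned to more than one sample.

```python
import csv
from itertools import combinations
from pathlib import Path


def gate_vdj_cells(vdj_barcodes: set[str], gex_cell_barcodes: set[str]) -> set[str]:
    """Simplified model: keep only VDJ barcodes that are also GEX cells for this sample."""
    return vdj_barcodes & gex_cell_barcodes


def vdj_cell_barcodes(filtered_contig_annotations: Path) -> set[str]:
    """Unique barcodes listed in a per-sample filtered_contig_annotations.csv."""
    with open(filtered_contig_annotations, newline="") as fh:
        return {row["barcode"] for row in csv.DictReader(fh)}


def shared_barcodes(per_sample: dict[str, set[str]]) -> set[str]:
    """Barcodes that appear in more than one sample; expected to be empty."""
    shared: set[str] = set()
    for (_, a), (_, b) in combinations(per_sample.items(), 2):
        shared |= a & b
    return shared
```

A non-empty result from shared_barcodes would suggest that the per-sample inputs to Step 3 were not cleanly separated.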
Disclaimer: This article and code-snippet are provided for instructional purposes only. 10x Genomics does not support or guarantee this workaround or code modification.