Question: Can I use aggr to combine CellPlexed and non-CellPlexed data? Will there be batch effects?
Answer: The cellranger aggr pipeline supports the aggregation of 3' CellPlex v3.1 data with non-CellPlex Single Cell 3' v3.1 data. Below are some considerations.
Running the pipeline:
Cell Ranger requires identical references (including the feature/CMO reference) when combining data with the aggr pipeline. Therefore, you will need to run the non-CellPlex data with cellranger count or cellranger multi in Feature Barcode mode, using the CellPlex tag (CMO) feature reference. You can find the feature reference for CMO here.
Option 1) If you use cellranger count, you will need to supply the CMO reference with --feature-ref and add the --no-libraries option. An example command is below.
cellranger count --id=pbmc_1k_count \
--transcriptome=/path/to/transcriptome/GRCh38-2020-A \
--fastqs=/path/to/gex/fastqs/pbmc_1k_v3_fastqs/ \
--sample=pbmc_1k_v3 \
--feature-ref=/path/to/cmo-reference/cmo-ref.csv \
--no-libraries
The cellranger count pipeline will generate a molecule_info.h5 file. This file is used as input to the cellranger aggr pipeline.
Option 2) If you use cellranger multi, you will need to specify the CMO feature reference in the [feature] section of the multi config file. An example config file is below.
[gene-expression]
reference,/path/to/transcriptome/GRCh38-2020-A
[feature]
reference,/path/to/cmo-reference/cmo-ref.csv
[libraries]
fastq_id,fastqs,feature_types
pbmc_1k_v3,/path/to/gex/fastqs/pbmc_1k_v3_fastqs/,Gene Expression
The cellranger multi pipeline generates a per-sample sample_molecule_info.h5 file, which is equivalent to the molecule_info.h5 file from a cellranger count run. These files can be used as input to the cellranger aggr pipeline.
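Once each run has produced its molecule info file, the aggregation step might look like the sketch below. The run directories (pbmc_1k_count, cellplex_multi), the sample name sample1, the sample IDs, the output ID cellplex_plus_standard, and all paths are hypothetical placeholders. The sample_id and molecule_h5 column names and the per_sample_outs location are what recent Cell Ranger versions typically use, so check the aggr documentation for your version.

```shell
# Write the aggregation CSV: one row per sample, each pointing at its
# molecule info file. All sample IDs and paths are placeholders.
cat > aggr.csv <<'EOF'
sample_id,molecule_h5
non_cellplex_pbmc,/path/to/pbmc_1k_count/outs/molecule_info.h5
cellplex_sample1,/path/to/cellplex_multi/outs/per_sample_outs/sample1/count/sample_molecule_info.h5
EOF

# Run the aggregation only if cellranger is available on the PATH.
if command -v cellranger >/dev/null 2>&1; then
    cellranger aggr --id=cellplex_plus_standard --csv=aggr.csv
else
    echo "cellranger not found; wrote aggr.csv only"
fi
```

The first row points at the output of a cellranger count run (Option 1) and the second at a per-sample output of a cellranger multi run (Option 2). Both runs must have used the same transcriptome and CMO feature reference.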
Batch effects:
The CellPlex protocol has additional wash steps, which may deplete more ambient mRNA from a CellPlexed sample than from a non-multiplexed sample. This may introduce small batch effects between CellPlex and non-CellPlex data. However, in our internal data the batch effects are fairly small, and batch effect correction is not needed in most cases. If the results for each sample reveal obvious differences between the CellPlex and non-CellPlex data, you can try cellranger aggr with chemistry batch correction, which may improve the mixing of the batches in the t-SNE visualization and clustering results.
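As a rough sketch of how chemistry batch correction is requested: in Cell Ranger versions that support it, the aggregation CSV carries an extra batch column, and samples sharing a value are treated as one batch. The sample IDs, the batch labels (standard, cellplex), the file name aggr_batch.csv, the output ID, and all paths below are hypothetical placeholders. Confirm in the aggr documentation that your Cell Ranger version still offers this feature.

```shell
# Aggregation CSV with a third "batch" column; rows with the same batch
# label are grouped together. IDs, labels, and paths are placeholders.
cat > aggr_batch.csv <<'EOF'
sample_id,molecule_h5,batch
non_cellplex_pbmc,/path/to/pbmc_1k_count/outs/molecule_info.h5,standard
cellplex_sample1,/path/to/cellplex_multi/outs/per_sample_outs/sample1/count/sample_molecule_info.h5,cellplex
EOF

# Run the aggregation only if cellranger is available on the PATH.
if command -v cellranger >/dev/null 2>&1; then
    cellranger aggr --id=cellplex_batch_corrected --csv=aggr_batch.csv
else
    echo "cellranger not found; wrote aggr_batch.csv only"
fi
```

Comparing the t-SNE projection from this run against one made without the batch column is a simple way to judge whether the correction changed how the CellPlex and non-CellPlex cells mix.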
If you are aggregating 3' v3.1 CellPlex samples with other chemistries (3' v2, 5' v2, etc.), you can treat the CellPlex data like standard 3' v3.1 data. For example, we recommend using chemistry batch correction when aggregating 3' v3.1 with 3' v2 chemistry. More information can be found in this knowledge base article: Can I aggregate gene expression data from different chemistries?