Question: We are working with mice that express human V(D)J segments and mouse constant regions. Will the cellranger vdj pipeline work for this data type?
Answer: The 5' Chromium Next GEM Single Cell Immune Profiling Solution does not support data from humanized mice (mice that express mouse constant genes and human V(D)J genes) because we do not have internal data to validate the pipeline.
In theory, the cellranger vdj pipeline should work with data from humanized mouse samples. However, you must create a custom reference that contains mouse constant genes along with human V(D)J genes. All commands and suggestions provided in this article are for instructional purposes only and are not officially supported by 10x Genomics.
The V(D)J reference consists of a FASTA file containing the sequences of all V, D, J, and C genes together with their annotations. For example, the directory structure of the V(D)J reference for mouse looks like this:
$ tree vdj_GRCm38_alts_ensembl-3.1.0
vdj_GRCm38_alts_ensembl-3.1.0
├── fasta
│   ├── regions.fa
│   └── supp_regions.fa
└── reference.json
The key file, fasta/regions.fa, contains the sequences of the V, D, J, and C gene segments. The annotations for these genes are in the FASTA headers, in a 10x Genomics-specific format as described here.
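To get a feel for what a reference contains, it can help to count records per region type. The sketch below is illustrative only: it builds a toy regions.fa with made-up records and assumes the region type (for example C-REGION) is the fourth "|"-separated field of each header. That field position is an assumption here and should be checked against the header format documentation.

```shell
# Toy FASTA with hypothetical records in the assumed header layout
# (IDs, transcript names, and sequences are made up for illustration).
workdir=$(mktemp -d)
cat > "$workdir/regions.fa" <<'EOF'
>1|IGHV1-2 ENST0000000001|IGHV1-2|L-REGION+V-REGION|IG|IGH|None|00
ATGGACTGGACC
>2|IGHJ4 ENST0000000002|IGHJ4|J-REGION|IG|IGH|None|00
ACTACTTTGACT
>3|IGHM ENST0000000003|IGHM|C-REGION|IG|IGH|M|00
GGAGTGCATCCG
>4|IGKC ENST0000000004|IGKC|C-REGION|IG|IGK|None|00
CGAACTGTGGCT
EOF

# Count records per region type, assuming it is the 4th "|"-separated field.
region_counts=$(grep '^>' "$workdir/regions.fa" \
  | cut -d'|' -f4 | sort | uniq -c | awk '{print $2 "=" $1}')
echo "$region_counts"
```

On a real reference, the same pipeline can be pointed at fasta/regions.fa to see how many C-REGION records are available before combining references.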
For the pipeline to recognize mouse C genes and human V, D, and J genes, create a custom reference that adds the mouse C genes to the human V(D)J reference. The steps below provide a framework for combining the references:
# Copy the human reference to a new location and rename it
cp -r refdata-cellranger-vdj-GRCh38-alts-ensembl-3.1.0 humanized-mouse
# Optionally, edit the metadata in humanized-mouse/reference.json so that the new reference name appears in the output files.
For example, the reference.json of the human reference looks like this:
{
"fasta_hash": "10aa54c58952f920c072270fc459d1e304c2ba3c2da20ac80528064f638dc1f1",
"genomes": "vdj_GRCh38_alts_ensembl-3.1.0",
"gtf_hash": "b2b50cc12da4d2bda69207aa7fd51bf648826d0d2f39199e87922bf107d81ed0",
"input_fasta_files": "release-94/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.toplevel.fa",
"input_gtf_files": "release-94/gtf/homo_sapiens/Homo_sapiens.GRCh38.94.chr_patch_hapl_scaff.gtf",
"mkref_version": "",
"type": "V(D)J Reference",
"version": "3.1.0"
}
Change the "genomes" value to the new reference name:
{
"fasta_hash": "10aa54c58952f920c072270fc459d1e304c2ba3c2da20ac80528064f638dc1f1",
"genomes": "HumanizedMouse-3.1.0",
"gtf_hash": "b2b50cc12da4d2bda69207aa7fd51bf648826d0d2f39199e87922bf107d81ed0",
"input_fasta_files": "release-94/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.toplevel.fa",
"input_gtf_files": "release-94/gtf/homo_sapiens/Homo_sapiens.GRCh38.94.chr_patch_hapl_scaff.gtf",
"mkref_version": "",
"type": "V(D)J Reference",
"version": "3.1.0"
}
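The same edit can be scripted rather than made by hand. The following is a minimal sketch on a toy reference.json; the temporary directory, the trimmed-down JSON, and the exact `"genomes": "..."` spacing are assumptions for illustration, so check the pattern against your own file before relying on it.

```shell
# Toy copy of reference.json (trimmed to a few fields for illustration).
workdir=$(mktemp -d)
cat > "$workdir/reference.json" <<'EOF'
{
    "genomes": "vdj_GRCh38_alts_ensembl-3.1.0",
    "type": "V(D)J Reference",
    "version": "3.1.0"
}
EOF

new_name="HumanizedMouse-3.1.0"

# Replace only the value of the "genomes" key; write to a temporary file
# first so the original is not truncated if sed fails.
sed "s/\"genomes\": \"[^\"]*\"/\"genomes\": \"$new_name\"/" \
  "$workdir/reference.json" > "$workdir/reference.json.tmp"
mv "$workdir/reference.json.tmp" "$workdir/reference.json"

genomes_line=$(grep '"genomes"' "$workdir/reference.json")
echo "$genomes_line"
```

For the real reference, the file to edit would be humanized-mouse/reference.json.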
# First, extract the C genes from the mouse reference
grep -A 1 "C-REGION" vdj_GRCm38_alts_ensembl-3.1.0/fasta/regions.fa | grep -v "^-" > mouse_cgenes.fa
# Each gene in the FASTA file has a unique numeric identifier. Because gene IDs may overlap between mouse and human, assign new IDs to the mouse genes.
# This example adds 1000 to each mouse gene ID: the highest ID in the human reference is about 700, so the original mouse ID + 1000 does not collide with any human ID.
perl -ne '$l=$_; if($l=~m/^>([0-9]+)(.+)/){$id=$1+1000;print ">".$id.$2."\n"}else{print}' < mouse_cgenes.fa > mouse_cgenes_newid.fa
# Append the mouse C genes to the FASTA file in the humanized reference folder
cat mouse_cgenes_newid.fa >> humanized-mouse/fasta/regions.fa
# Remove temporary files
rm -f mouse_cgenes.fa
rm -f mouse_cgenes_newid.fa
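Because the ID offset relies on the highest human ID being below 1000, it may be worth confirming that assumption and checking the combined file for duplicate IDs. The sketch below does both on toy files; the file names and records are hypothetical, and the only thing taken from the steps above is that each header starts with ">" followed by a numeric ID.

```shell
# Toy stand-ins for the human reference and the renumbered mouse C genes.
workdir=$(mktemp -d)
cat > "$workdir/human.fa" <<'EOF'
>1|IGHV1-2 ENST0000000001|IGHV1-2|L-REGION+V-REGION|IG|IGH|None|00
ATGGACTGGACC
>2|IGHJ4 ENST0000000002|IGHJ4|J-REGION|IG|IGH|None|00
ACTACTTTGACT
EOF
cat > "$workdir/mouse_c.fa" <<'EOF'
>1001|Ighm ENSMUST0000000001|Ighm|C-REGION|IG|IGH|M|00
GAGAGTCAGTCC
>1002|Igkc ENSMUST0000000002|Igkc|C-REGION|IG|IGK|None|00
CGGGCTGATGCT
EOF

# Highest numeric ID in the human file; the chosen offset must exceed it.
max_human_id=$(grep '^>' "$workdir/human.fa" \
  | sed 's/^>\([0-9]*\).*/\1/' | sort -n | tail -n 1)

# Combine the files, then list any numeric ID that occurs more than once.
cat "$workdir/human.fa" "$workdir/mouse_c.fa" > "$workdir/combined.fa"
duplicate_ids=$(grep '^>' "$workdir/combined.fa" \
  | sed 's/^>\([0-9]*\).*/\1/' | sort -n | uniq -d)

echo "highest human ID: $max_human_id"
if [ -z "$duplicate_ids" ]; then
  echo "no duplicate IDs"
else
  echo "duplicate IDs: $duplicate_ids"
fi
```

On real data, the duplicate check can be run directly on humanized-mouse/fasta/regions.fa after the mouse C genes have been appended; an empty result suggests the offset was large enough.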
Now you can run your data against this modified reference.
Note: If you want to create a custom reference for a species other than human or mouse, please see our step-by-step instructions here.
Products: Single Cell Immune Profiling