Question: Custom probes can be spiked into probe-based 10x Single Cell and Visium Spatial Gene Expression assays. After running a custom probe spiked assay, how do you add these probes to Cell Ranger and Space Ranger?
Answer: You can process probe-based 10x Single Cell and Visium Spatial Gene Expression data without any changes to Cell Ranger and Space Ranger. However, custom probes must be added to Cell Ranger and Space Ranger so that they are used during data processing and included in the pipeline output. This involves adding the custom probes to the probe set CSV file. Additionally, if a custom probe's target gene is not part of the reference genome (e.g. the probe targets a fluorescent reporter or a viral gene), a custom reference genome must be created.
Step 1: Add the Custom Probes to a Predesigned Probe Set
To start, download the predesigned probe set for the relevant assay and species.
Append a comma-separated entry for each custom probe to the downloaded probe set CSV file. The appended entry must match the format of the original downloaded probe set CSV file. For example, the Visium Human Transcriptome Probe Set v2 and the Chromium Human Transcriptome Probe Set v1.0.1 have five columns (gene_id, probe_seq, probe_id, included, region); therefore, any new probe entry must have these five columns. Details on each column are on our support site at the following links:
- Single Cell Gene Expression Probe Set Descriptions
- Visium Spatial Gene Expression Probe Set Descriptions
The custom probe entries can be appended to the probe set CSV file using a text editor or the command line. Below, we demonstrate how to use the command line to append a set of custom probes for a model gene, "geneA", to the Visium Human Transcriptome Probe Set v2.
We start by using a text editor to create a custom probe CSV file called custom_probes.csv that contains the custom probe sequences and accompanying metadata. We recommend designing multiple non-overlapping custom probes against the gene of interest, as illustrated in the example entries below.
gene_id_geneA,GGTGACACCACAACAATGCAACGTATTTTGGATCTTGTCTACTGCATGGC,gene_id_geneA|geneA|8aab555,TRUE,unspliced
gene_id_geneA,TCTGCATCTCTCTGTGGAGTACAATCTTCAAGTTTACAGCAACTCTTAGG,gene_id_geneA|geneA|8aab556,TRUE,unspliced
gene_id_geneA,AAAGCTGTTCTTAATCTCATGTCTGAAAACAAATCCTACGATGGCAGCGA,gene_id_geneA|geneA|8aab557,TRUE,spliced
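Before the entries are appended, a quick check can confirm that each one has the same number of comma-separated columns as the predesigned file. The shell function below is a minimal sketch. Its name, check_probe_columns, is hypothetical, and it assumes the five-column layout described above and a standard awk. It skips lines that start with "#", so it can also be run on the combined file later.

```shell
# Hypothetical helper: report probe CSV lines whose column count is wrong.
# $1: probe CSV file; $2: expected column count (defaults to 5, matching
# gene_id, probe_seq, probe_id, included, region).
check_probe_columns() {
  awk -F',' -v n="${2:-5}" '
    /^#/ { next }   # skip "#key=value" metadata lines
    NF != n { printf "%s:%d: expected %d columns, found %d\n", FILENAME, NR, n, NF; bad = 1 }
    END { exit bad }   # non-zero exit status if any line was reported
  ' "$1"
}
```

For example, `check_probe_columns custom_probes.csv 5` prints nothing and returns 0 when every entry is well formed.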
A command like the following will append the custom probe information to the Visium Human Transcriptome Probe Set v2 file:
$ cat Visium_Human_Transcriptome_Probe_Set_v2.0_GRCh38-2020-A.csv custom_probes.csv > human_v2_custom_probeset.csv
In this example, we assume that the third probe for "geneA" bridges a splice junction, so its region column is set to spliced. In the probe_id column, the third value (e.g. 8aab557) is a hash sequence. It should not match any hash sequence in the original probe set, and it should be unique for each custom probe. You can check whether a hash sequence appears in a probe set from the command line with a command such as:
$ grep -F "8aab557" Visium_Human_Transcriptome_Probe_Set_v2.0_GRCh38-2020-A.csv
If the hash sequence is present, the matching line is printed to the terminal. If it is not present, nothing is displayed.
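With several custom probes, checking each hash by hand gets tedious. The sketch below uses a hypothetical function name, check_probe_hashes. It extracts the hash from the probe_id column of every custom entry, reports any hash that already appears in the predesigned file, and flags hashes that are repeated among the custom probes. It assumes probe_id values follow the gene_id|gene_name|hash pattern shown above.

```shell
# Hypothetical helper: verify custom probe hashes are new and unique.
# $1: custom probe CSV; $2: predesigned probe set CSV.
# The body runs in a subshell so its variables do not leak to the caller.
check_probe_hashes() (
  custom=$1
  predesigned=$2
  status=0
  # Column 3 is probe_id ("gene_id|gene_name|hash"); keep only the hash.
  hashes=$(awk -F',' '!/^#/ { n = split($3, p, "[|]"); if (n == 3) print p[3] }' "$custom")
  for h in $hashes; do
    # Match "|hash," so that only the hash field of a probe_id can match.
    if grep -F -q "|$h," "$predesigned"; then
      echo "hash $h is already used in $predesigned"
      status=1
    fi
  done
  # $hashes is unquoted on purpose: one hash per output line.
  dups=$(printf '%s\n' $hashes | sort | uniq -d)
  if [ -n "$dups" ]; then
    echo "hash repeated among custom probes: $dups"
    status=1
  fi
  exit "$status"
)
```

A typical call would be `check_probe_hashes custom_probes.csv Visium_Human_Transcriptome_Probe_Set_v2.0_GRCh38-2020-A.csv`. It prints nothing and returns 0 when all hashes are new and unique.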
Step 2: Create a Custom Reference
This step can be skipped if the target genes of all custom probes are already in the prebuilt reference.
The process of creating a custom reference is demonstrated using the spaceranger mkref pipeline. It is important to ensure that all the genes included in the probe set are in the genomic reference. The custom reference should be based on the reference used to create the original probe set. In this case, the appropriate reference for the Visium Human Transcriptome Probe Set v2 is the GRCh38 2020-A reference. You can use the FASTA file and GTF file from the appropriate prebuilt reference to make the custom reference, and this section demonstrates that approach. Prebuilt references can be downloaded from the 10x Genomics support site.
In the prebuilt reference directory, the FASTA file (genome.fa) and the GTF file (genes.gtf) are stored in the fasta and genes subdirectories, respectively.
path_to_prebuilt_reference/
├── fasta
│ ├── genome.fa
│ └── genome.fa.fai
├── genes
│ └── genes.gtf
├── pickle
│ └── genes.pickle
├── reference.json
└── star
├── chrLength.txt
├── chrNameLength.txt
├── chrName.txt
├── chrStart.txt
├── exonGeTrInfo.tab
├── exonInfo.tab
├── geneInfo.tab
├── Genome
├── genomeParameters.txt
├── SA
├── SAindex
├── sjdbInfo.txt
├── sjdbList.fromGTF.out.tab
├── sjdbList.out.tab
└── transcriptInfo.tab
Assuming you are in the directory where you want to create the custom reference, you can copy the files with a command like:
$ cp path_to_prebuilt_reference/fasta/genome.fa path_to_prebuilt_reference/genes/genes.gtf .
To incorporate new genes not found in the prebuilt reference, you will need to add a new entry to the GTF file (genes.gtf) and add the corresponding genomic sequence to the FASTA file (genome.fa).
For "geneA" the GTF file entry could look something like the following:
geneA unknown exon 1 755 . + . gene_id "gene_id_geneA"; transcript_id "geneA"; gene_name "geneA"; gene_biotype "protein_coding";
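GTF fields must be separated by tab characters, and some text editors silently replace tabs with spaces. Instead of typing the entry by hand, you can write the example "geneA" entry with printf. This sketch guarantees that the nine fields are tab-separated and then verifies the field count with awk. The output name custom_probe_gene_gtf.gtf matches the file name used later in this step. The coordinates and attribute values are copied from the example entry.

```shell
# Write the example GTF entry with nine tab-separated fields:
# seqname, source, feature, start, end, score, strand, frame, attributes.
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  geneA unknown exon 1 755 . + . \
  'gene_id "gene_id_geneA"; transcript_id "geneA"; gene_name "geneA"; gene_biotype "protein_coding";' \
  > custom_probe_gene_gtf.gtf

# Confirm that every line has exactly nine tab-separated fields.
awk -F'\t' 'NF != 9 { print "line " NR " has " NF " fields"; bad = 1 } END { exit bad }' custom_probe_gene_gtf.gtf
```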
The genomic information added to the FASTA file has the following format:
>"chromosome/contig name"
<genomic sequence>
For illustrative purposes, we will assign "geneA" the sequence for eGFP below:
>geneA
TACACACGAATAAAAGATAACAAAGATGAGTAAAGGAGAAGAACTTTTCACTGGAGTTGTCCCAATTCTT
GTTGAATTAGATGGCGATGTTAATGGGCAAAAATTCTCTGTCAGTGGAGAGGGTGAAGGTGATGCAACAT
ACGGAAAACTTACCCTTAAATTTATTTGCACTACTGGGAAGCTACCTGTTCCATGGCCAACACTTGTCAC
TACTTTCTCTTATGGTGTTCAATGCTTTTCAAGATACCCAGATCATATGAAACAGCATGACTTTTTCAAG
AGTGCCATGCCCGAAGGTTATGTACAGGAAAGAACTATATTTTACAAAGATGACGGGAACTACAAGACAC
GTGCTGAAGTCAAGTTTGAAGGTGATACCCTTGTTAATAGAATCGAGTTAAAAGGTATTGATTTTAAAGA
AGATGGAAACATTCTTGGACACAAAATGGAATACAACTATAACTCACATAATGTATACATCATGGCAGAC
AAACCAAAGAATGGAATCAAAGTTAACTTCAAAATTAGACACAACATTAAAGATGGAAGCGTTCAATTAG
CAGACCATTATCAACAAAATACTCCAATTGGCGATGGCCCTGTCCTTTTACCAGACAACCATTACCTGTC
CACACAATCTGCCCTTTCCAAAGATCCCAACGAAAAGAGAGATCACATGATCCTTCTTGAGTTTGTAACA
GCTGCTGGGATTACACATGGCATGGATGAACTATACAAATAAATGTCCAGACTTCCAATTGACACTAAAG
TGTCCGAACAATTACTAAATTCTCAGGGTTCCTGGTTAAATTCAGGCTGAGACTTTATTTATATATTTAT
AGATTCATTAAAATTTTATGAATAATTTATTGATGTTATTAATAGGGGCTATTTTCTTATTAAATAGGCT
ACTGGAGTGTAT
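The contig name in the FASTA header ("geneA" here) must match the first column of the GTF entry. If the sequence is available as one unbroken string, you can generate a record in the format shown above with the helper below. It is a sketch with a hypothetical name, make_fasta_record, and it wraps the sequence at 70 characters per line using the standard fold utility.

```shell
# Hypothetical helper: print a FASTA record.
# $1: contig name (must match column 1 of the GTF entry); $2: sequence.
make_fasta_record() {
  printf '>%s\n' "$1"
  # fold -w 70 wraps the sequence at 70 characters per line; the final
  # printf ends the last sequence line with a newline.
  printf '%s' "$2" | fold -w 70
  printf '\n'
}
```

For example, `make_fasta_record geneA "$SEQ" > custom_probe_gene_genomic_sequence.fa` writes the record, where SEQ is a hypothetical shell variable that holds the sequence.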
Working from within the directory where the genome.fa and genes.gtf files were copied, the next step is to add the gene annotation data to the GTF file and the genomic sequence data to the FASTA file.
First, create two files: one with the GTF entry for "geneA" and a second with its genomic sequence. For our demonstration, the files are called custom_probe_gene_gtf.gtf and custom_probe_gene_genomic_sequence.fa.
These files are then appended to the preexisting genes.gtf and genome.fa files, respectively. This can be done using the command line as shown below:
$ cat genes.gtf custom_probe_gene_gtf.gtf > genes_custom_probe.gtf
$ cat genome.fa custom_probe_gene_genomic_sequence.fa > genome_custom_probe_genomic_sequence.fa
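Before running mkref, it can help to confirm that the pieces line up. Every gene_id used in the probe set should be defined in the combined GTF, and every contig named in the GTF should have a matching FASTA header. The function below sketches such a check; its name, check_custom_reference, is hypothetical. It uses only standard POSIX tools and reads the FASTA headers once, so it stays reasonably fast on a full genome. It skips "#" metadata lines and the gene_id column-header line, so it can be run on either custom_probes.csv or the full customized probe set.

```shell
# Hypothetical helper: cross-check probe set, GTF, and FASTA.
# $1: probe set CSV; $2: combined GTF; $3: combined FASTA.
check_custom_reference() (
  probes=$1 gtf=$2 fasta=$3
  LC_ALL=C; export LC_ALL   # consistent sort order for comm
  tmp=$(mktemp -d) || exit 1
  status=0

  # gene_id values used by the probes (skip "#" lines and the column header).
  awk -F',' '!/^#/ && $1 != "gene_id" { print $1 }' "$probes" | sort -u > "$tmp/probe_genes"
  # gene_id values defined in the GTF attribute column.
  grep -v '^#' "$gtf" | sed -n 's/.*gene_id "\([^"]*\)".*/\1/p' | sort -u > "$tmp/gtf_genes"
  # Contig names used in GTF column 1 and defined by FASTA headers.
  grep -v '^#' "$gtf" | cut -f1 | sort -u > "$tmp/gtf_contigs"
  grep '^>' "$fasta" | sed 's/^>//; s/[[:space:]].*//' | sort -u > "$tmp/fasta_contigs"

  missing=$(comm -23 "$tmp/probe_genes" "$tmp/gtf_genes")
  if [ -n "$missing" ]; then
    echo "gene_ids in $probes but not in $gtf:"; echo "$missing"; status=1
  fi
  missing=$(comm -23 "$tmp/gtf_contigs" "$tmp/fasta_contigs")
  if [ -n "$missing" ]; then
    echo "contigs in $gtf but not in $fasta:"; echo "$missing"; status=1
  fi
  rm -rf "$tmp"
  exit "$status"
)
```

For example, run `check_custom_reference human_v2_custom_probeset.csv genes_custom_probe.gtf genome_custom_probe_genomic_sequence.fa`. It prints nothing and returns 0 when both checks pass.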
These appended files are then used to run spaceranger mkref or cellranger mkref, depending on the assay. We show the use of spaceranger mkref below:
$ spaceranger mkref --genome=custom --ref-version=custom_v1 --fasta=genome_custom_probe_genomic_sequence.fa --genes=genes_custom_probe.gtf
With the command above, the custom reference is written to a directory called custom.
Step 3: Check that the Correct Transcriptome is Specified in the Customized Probe Set
This step can be skipped if a custom reference is not used.
The custom or prebuilt reference directory contains a file called reference.json, which holds key-value pairs that describe the reference. Opening the JSON file with a text editor or running a command like cat reference.json will show all the key-value pairs.
$ cat reference.json
{
    "fasta_hash": "6f978a9cd593a0e0fb734a92bf23d7213869114c",
    "genomes": [
        "custom"
    ],
    "gtf_hash.gz": "f4844dc73a115465e3aa24b20b845d94d8af6711",
    "input_fasta_files": [
        "genome_custom_probe_genomic_sequence.fa"
    ],
    "input_gtf_files": [
        "genes_custom_probe.gtf"
    ],
    "mem_gb": 16,
    "mkref_version": "spaceranger mkref spaceranger-2.1.0\nCopyright (c) 2021 10x Genomics, Inc. All rights reserved.",
    "threads": 2,
    "version": [
        "custom_v1"
    ]
}
In the example above, the keys genomes and version contain the values custom and custom_v1, respectively.
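You can also read the two values programmatically. The sketch below assumes python3 is available and uses it only to parse the JSON. The function name read_reference_ids is hypothetical. It prints the values in the key=value form used by the probe set header.

```shell
# Hypothetical helper: print the genome and version recorded in reference.json.
# $1: path to reference.json. Requires python3 on PATH.
read_reference_ids() {
  python3 - "$1" <<'EOF'
import json
import sys

def as_list(value):
    # The example reference.json stores both keys as lists; accept a bare string too.
    return value if isinstance(value, list) else [value]

with open(sys.argv[1]) as fh:
    ref = json.load(fh)
print("reference_genome=" + ",".join(as_list(ref["genomes"])))
print("reference_version=" + ",".join(as_list(ref["version"])))
EOF
}
```

For the example reference above, `read_reference_ids custom/reference.json` prints reference_genome=custom and reference_version=custom_v1.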
The final step is to open the customized probe set CSV file created in Step 1 and edit the header. For our example, human_v2_custom_probeset.csv currently starts with the following header lines:
#probe_set_file_format=2.0
#panel_name=Visium Human Transcriptome Probe Set v2.0
#panel_type=predesigned
#reference_genome=GRCh38
#reference_version=2020-A
The lines #reference_genome=GRCh38 and #reference_version=2020-A need to be changed to #reference_genome=custom and #reference_version=custom_v1 so that they match the genomes and version values in reference.json. For record-keeping purposes, the line #panel_name=Visium Human Transcriptome Probe Set v2.0 could be updated to something like #panel_name=My custom panel.
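You can also edit the header from the command line. The sketch below uses sed to rewrite the reference lines and, optionally, the panel name; set_probe_set_header is a hypothetical name. It writes to a temporary file first, so the original is replaced only if sed succeeds. Because the values are inserted into sed replacement expressions, they should not contain "/", "&" or "\".

```shell
# Hypothetical helper: rewrite probe set header lines in place.
# $1: probe set CSV; $2: reference genome; $3: reference version;
# $4 (optional): new panel name.
set_probe_set_header() (
  file=$1 genome=$2 version=$3
  # "&" as the replacement makes sed keep the #panel_name line unchanged.
  if [ -n "${4:-}" ]; then panel="#panel_name=$4"; else panel='&'; fi
  sed -e "s/^#reference_genome=.*/#reference_genome=$genome/" \
      -e "s/^#reference_version=.*/#reference_version=$version/" \
      -e "s/^#panel_name=.*/$panel/" \
      "$file" > "$file.tmp" && mv "$file.tmp" "$file"
)
```

For the example above: `set_probe_set_header human_v2_custom_probeset.csv custom custom_v1 "My custom panel"`.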
For our example, the final file now starts with the following information:
#probe_set_file_format=2.0
#panel_name=My custom panel
#panel_type=predesigned
#reference_genome=custom
#reference_version=custom_v1
Step 4: Run the Pipeline with the Customized Probe Set
Now that the custom probes have been added to the probe set, Cell Ranger and Space Ranger can be run as usual. The only change is that you use the customized probe set created in Step 1 and checked in Step 3. If a custom reference was created, supply it in place of the prebuilt reference when processing the data.
In our example, the command line could look something like the following:
$ spaceranger count \
    --id=test_run \
    --transcriptome=/path_to_custom_reference/custom \
    --probe-set=/path_to_custom_probeset/human_v2_custom_probeset.csv \
    --fastqs=/path_to_fastq_files/ \
    --sample=sample_name \
    --cytaimage=/path_to_CytAssist_image/myimage.tif \
    --image=/path_to_HiRes_image/hires_image.tif \
    --slide=V42A20-353 \
    --area=A1
Starting from Cell Ranger v7.0, the multi pipeline processes Flex (formerly Fixed RNA Profiling, FRP) data. Documentation is available on the 10x Genomics support site. Briefly, to add the custom probe set to the multi pipeline, specify the path to the probe set CSV with the probe-set key under the [gene-expression] section of the multi config CSV.
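The following sketch shows a minimal config of this kind, written with a shell here-document. All paths are placeholders, as are the probe set file name chromium_human_custom_probeset.csv and the fastq_id sample_name. Depending on the Cell Ranger version and experimental design, you may need additional settings (for example, a [samples] section for multiplexed Flex).

```shell
# Sketch: write a minimal multi config CSV that points Cell Ranger at the
# custom reference and the customized probe set. Paths are placeholders.
cat > multi_config.csv <<'EOF'
[gene-expression]
reference,/path_to_custom_reference/custom
probe-set,/path_to_custom_probeset/chromium_human_custom_probeset.csv

[libraries]
fastq_id,fastqs,feature_types
sample_name,/path_to_fastq_files/,Gene Expression
EOF
```

Pass the file to the pipeline with a command such as `cellranger multi --id=test_run --csv=multi_config.csv`.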
For more information on custom probe design, please refer to the following technical note: Custom Probe Design for Visium Spatial Gene Expression and Chromium Single Cell Gene Expression Flex.
Please note that while no impact on assay performance is anticipated, the use of custom probes in these assays is not supported or validated by 10x Genomics. 10x Genomics cannot guarantee that custom probes will yield data comparable to that from the whole transcriptome panel.
Disclaimer: All code snippets are provided as-is for instructional purposes only. 10x Genomics neither supports nor guarantees the code.
Products: Single Cell Gene Expression Flex, Spatial Gene Expression
Last Modified: August 29, 2023