Question: How can we add genes to a reference package for Cell Ranger?
Answer: To add genes to an existing Cell Ranger reference package, such as the ones available on our website, follow these three steps:
First, add the additional FASTA sequence records to the fasta/genome.fa file.
Second, update the GTF file (genes/genes.gtf). The GTF file format is essentially a list of records, one per line, each comprising nine tab-delimited non-empty fields.
Column | Name | Description |
---|---|---|
1 | Chromosome | Must refer to a chromosome/contig in the genome fasta. |
2 | Source | Unused. |
3 | Feature | cellranger count only uses rows where this field is exon. |
4 | Start | Start position on the reference (1-based inclusive). |
5 | End | End position on the reference (1-based inclusive). |
6 | Score | Unused. Suggested value ".". |
7 | Strand | Strandedness of this feature on the reference: + or -. |
8 | Frame | Unused. Suggested value ".". |
9 | Attributes | A semicolon-delimited list of key-value pairs of the form key "value". The attribute keys transcript_id and gene_id are required; gene_name is optional, but if present it will be preferentially displayed in reports. |
Example:
mylocus annotation exon 100 200 . + . gene_id "mygene"; transcript_id "mygene";
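Because the nine fields must be separated by tabs, it can help to generate the line programmatically rather than type it by hand. The following is a minimal Python sketch, not an official tool: the helper name make_exon_record is hypothetical, and the source value "annotation" is simply copied from the example above.

```python
def make_exon_record(chrom, start, end, strand, gene_id, transcript_id, gene_name=None):
    """Return one tab-delimited GTF exon line with nine fields.

    Hypothetical helper for illustration; not part of Cell Ranger.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if not (1 <= start <= end):
        raise ValueError("coordinates are 1-based inclusive, with start <= end")

    # gene_id and transcript_id are required; gene_name is optional.
    attributes = f'gene_id "{gene_id}"; transcript_id "{transcript_id}";'
    if gene_name is not None:
        attributes += f' gene_name "{gene_name}";'

    fields = [
        chrom,         # 1: must match a contig name in the genome FASTA
        "annotation",  # 2: source (unused)
        "exon",        # 3: feature
        str(start),    # 4: start
        str(end),      # 5: end
        ".",           # 6: score (unused)
        strand,        # 7: strand
        ".",           # 8: frame (unused)
        attributes,    # 9: attributes
    ]
    return "\t".join(fields)


record = make_exon_record("mylocus", 100, 200, "+", "mygene", "mygene")
print(record)
```

The printed line matches the example above, with real tab characters between the nine columns.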
Third, after adding the necessary records to your FASTA file and the additional lines to your GTF file, run cellranger mkref as normal.
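The three steps can be sketched end to end in Python. This is an illustrative outline under stated assumptions, not an official workflow: the helper names, the stand-in files, the 200-base placeholder sequence, and the genome name custom_ref are all hypothetical. The --genome, --fasta, and --genes arguments are the usual cellranger mkref inputs; check cellranger mkref --help for your version before running the assembled command.

```python
import os
import tempfile


def _ensure_trailing_newline(path):
    """Add a final newline if the file is non-empty and lacks one."""
    if os.path.getsize(path) == 0:
        return
    with open(path, "rb+") as handle:
        handle.seek(-1, os.SEEK_END)
        if handle.read(1) != b"\n":
            handle.write(b"\n")


def append_fasta_record(fasta_path, name, sequence, width=60):
    """Step 1: append one FASTA record, wrapping the sequence lines."""
    _ensure_trailing_newline(fasta_path)
    with open(fasta_path, "a") as handle:
        handle.write(f">{name}\n")
        for i in range(0, len(sequence), width):
            handle.write(sequence[i:i + width] + "\n")


def append_gtf_line(gtf_path, line):
    """Step 2: append one tab-delimited GTF line."""
    _ensure_trailing_newline(gtf_path)
    with open(gtf_path, "a") as handle:
        handle.write(line.rstrip("\n") + "\n")


def build_mkref_command(genome_name, fasta_path, gtf_path):
    """Step 3: assemble (but do not run) the cellranger mkref command."""
    return [
        "cellranger", "mkref",
        f"--genome={genome_name}",
        f"--fasta={fasta_path}",
        f"--genes={gtf_path}",
    ]


# Demo on small stand-in files; in practice these would be copies of the
# reference package's fasta/genome.fa and genes/genes.gtf.
workdir = tempfile.mkdtemp()
fasta = os.path.join(workdir, "genome.fa")
gtf = os.path.join(workdir, "genes.gtf")
with open(fasta, "w") as handle:
    handle.write(">chr1\nACGT\n")
open(gtf, "w").close()

append_fasta_record(fasta, "mylocus", "ACGT" * 50)  # placeholder sequence
gtf_fields = ["mylocus", "annotation", "exon", "100", "200", ".", "+", ".",
              'gene_id "mygene"; transcript_id "mygene";']
append_gtf_line(gtf, "\t".join(gtf_fields))

command = build_mkref_command("custom_ref", fasta, gtf)
print(" ".join(command))
```

The command is only printed here; once the edited files look right, it could be passed to subprocess.run or pasted into a shell.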
For more information, please see the "Adding one or more genes to your reference" section on the Using Custom References page. The Building custom references tutorial illustrates the addition of new genes.
Related Article: Common mkref errors when building custom reference from NCBI, UCSC, or RefSeq genomes
Product: Single Cell Gene Expression