Question: What genes should I filter using the mkgtf tool when making a custom reference for Cell Ranger?
Answer: To create a custom reference, you start with a GTF file that contains gene annotations. We recommend using the cellranger mkgtf tool to filter the GTF file so that it contains only the gene categories of interest. Which genes to filter depends on your research question. The attributes used for filtering in the pre-built 10x Genomics references include:
- Protein-coding genes (--attribute=gene_biotype:protein_coding)
- Long intergenic noncoding RNAs (--attribute=gene_biotype:lincRNA)
- Antisense (--attribute=gene_biotype:antisense)
- Pseudogenes (--attribute=gene_biotype:pseudogene)
- V(D)J germline genes, for example:
  --attribute=gene_biotype:IG_V_gene
  --attribute=gene_biotype:TR_V_gene
For more information, please see Using Custom References.
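With Cell Ranger itself, the filters above are passed as repeated --attribute options, for example cellranger mkgtf input.gtf output.filtered.gtf --attribute=gene_biotype:protein_coding --attribute=gene_biotype:lincRNA. To illustrate what this kind of attribute-based filtering does, here is a minimal Python sketch. It is not Cell Ranger's implementation: it assumes Ensembl-style attributes (gene_biotype) and assumes that repeated --attribute options are combined with OR logic, so a record is kept if it matches any one of the requested key:value pairs. The helpers parse_attributes and filter_gtf are hypothetical names used only for this example.

```python
import re

# Matches one 'key "value"' pair in the GTF attribute column (column 9),
# e.g. gene_id "G1"; gene_biotype "protein_coding";
ATTR_RE = re.compile(r'([^\s;]+)\s+"([^"]*)"')


def parse_attributes(field):
    """Return a dict mapping attribute keys to values (last value wins for repeated keys)."""
    return dict(ATTR_RE.findall(field))


def filter_gtf(lines, keep):
    """Yield comment lines plus records matching ANY (key, value) pair in `keep`.

    Assumption (hypothetical helper): repeated --attribute options combine with OR logic.
    """
    for line in lines:
        if line.startswith("#"):
            yield line  # pass header/comment lines through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip blank or malformed lines
        attrs = parse_attributes(fields[8])
        if any(attrs.get(key) == value for key, value in keep):
            yield line


# Example: keep protein-coding genes and lincRNAs from a tiny in-memory GTF.
sample = [
    "#!genome-build GRCh38\n",
    'chr1\tsrc\tgene\t100\t200\t.\t+\t.\tgene_id "G1"; gene_biotype "protein_coding";\n',
    'chr1\tsrc\tgene\t300\t400\t.\t-\t.\tgene_id "G2"; gene_biotype "snRNA";\n',
    'chr1\tsrc\tgene\t500\t600\t.\t+\t.\tgene_id "G3"; gene_biotype "lincRNA";\n',
]
wanted = {("gene_biotype", "protein_coding"), ("gene_biotype", "lincRNA")}
kept = list(filter_gtf(sample, wanted))
for line in kept:
    print(line, end="")
```

The snRNA record is dropped while the header, the protein-coding gene and the lincRNA are kept. Note that annotation sources differ in attribute naming (GENCODE GTFs, for instance, use gene_type rather than gene_biotype), so check your GTF before choosing the key.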
Note that Cell Ranger does not use reads that map to multiple genes for expression (UMI) counting, so overlapping gene annotations in your reference can cause reads to be ignored. For this reason, keep the number of overlapping gene annotations in your reference small.
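To get a rough sense of how many overlapping gene annotations a filtered GTF still contains, the Python sketch below counts gene records whose coordinates overlap another gene on the same chromosome. It is a hypothetical helper, not part of Cell Ranger: it compares only gene-level intervals, so it does not reproduce how Cell Ranger actually assigns reads. The same_strand parameter is an assumption that lets you choose whether genes on opposite strands count as overlapping.

```python
from collections import defaultdict


def count_overlapping_genes(gtf_lines, same_strand=True):
    """Count 'gene' records whose interval overlaps at least one other gene record.

    Hypothetical helper. GTF coordinates are 1-based and inclusive,
    so [100, 200] and [200, 300] are treated as overlapping.
    """
    groups = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only compare gene-level records
        key = (fields[0], fields[6]) if same_strand else (fields[0],)
        groups[key].append((int(fields[3]), int(fields[4])))

    total = 0
    for intervals in groups.values():
        intervals.sort()  # sweep in order of start coordinate
        flagged = [False] * len(intervals)
        max_end, max_idx = 0, -1  # furthest-reaching interval seen so far
        for i, (start, end) in enumerate(intervals):
            if start <= max_end:
                # overlaps the earlier interval that reaches furthest
                flagged[i] = flagged[max_idx] = True
            if end > max_end:
                max_end, max_idx = end, i
        total += sum(flagged)
    return total


genes = [
    'chr1\tsrc\tgene\t100\t200\t.\t+\t.\tgene_id "G1";\n',
    'chr1\tsrc\tgene\t150\t250\t.\t+\t.\tgene_id "G2";\n',
    'chr1\tsrc\tgene\t180\t220\t.\t-\t.\tgene_id "G3";\n',
    'chr1\tsrc\tgene\t300\t400\t.\t+\t.\tgene_id "G4";\n',
]
print(count_overlapping_genes(genes))                     # → 2 (G1 and G2 on the + strand)
print(count_overlapping_genes(genes, same_strand=False))  # → 3 (G1, G2 and G3)
```

Sorting by start coordinate and tracking the furthest-reaching interval keeps the check to a single pass per chromosome (or chromosome and strand), rather than comparing every pair of genes.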
Please refer to our step-by-step tutorial on creating a custom reference.
Related Article: Common mkref errors when building custom reference from NCBI, UCSC or RefSeq genomes
Product: Single Cell Gene Expression