Question: What genes should I filter using the mkgtf tool when making a custom reference for Cell Ranger?
Answer: To create a custom reference, you will start with a GTF file that contains gene annotations. We recommend filtering the GTF file with the cellranger mkgtf tool so that it contains only the gene categories of interest. Which gene categories to keep depends on your research question. The attributes used for filtering in the pre-built 10x references include:
- Protein-coding genes (--attribute=gene_biotype:protein_coding)
- Long intergenic noncoding RNAs (--attribute=gene_biotype:lincRNA)
- Antisense (--attribute=gene_biotype:antisense)
- Pseudogenes (--attribute=gene_biotype:pseudogene)
- V(D)J germline genes, for example:
For more information, please see Using Custom References.
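To make the attribute filtering above concrete, here is a minimal Python sketch of the idea: keep a GTF line when its ninth (attribute) column matches any of several key:value pairs, mirroring the --attribute=key:value syntax listed above. This is a simplified illustration, not the mkgtf implementation. The function names and toy records are hypothetical, and the sketch assumes Ensembl-style attributes with double-quoted values. For a real reference, use cellranger mkgtf itself.

```python
import re
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical helpers for illustration only; not part of Cell Ranger.
# Matches key "value" pairs in GTF column 9, e.g. gene_biotype "protein_coding";
_ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def parse_filter(spec: str) -> Tuple[str, str]:
    """Split a 'key:value' spec such as 'gene_biotype:lincRNA' into (key, value)."""
    key, value = spec.split(":", 1)
    return key, value


def parse_attributes(field: str) -> Dict[str, str]:
    """Parse the GTF attribute column into a dict; the first value wins for repeated keys."""
    attrs: Dict[str, str] = {}
    for key, value in _ATTR_RE.findall(field):
        attrs.setdefault(key, value)
    return attrs


def filter_gtf(lines: Iterable[str], specs: List[str]) -> Iterator[str]:
    """Yield comment lines plus feature lines matching ANY of the key:value specs."""
    wanted = {parse_filter(s) for s in specs}
    for line in lines:
        if line.startswith("#"):
            yield line  # keep header/comment lines unchanged
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # skip blank or malformed lines
        attrs = parse_attributes(cols[8])
        if any(attrs.get(k) == v for k, v in wanted):
            yield line


# Toy records with hypothetical gene IDs, used only to show the behaviour.
SAMPLE = [
    "#!genome-build toy\n",
    '1\tsrc\tgene\t100\t900\t.\t+\t.\tgene_id "G1"; gene_biotype "protein_coding";\n',
    '1\tsrc\tgene\t1000\t1500\t.\t-\t.\tgene_id "G2"; gene_biotype "misc_RNA";\n',
    '1\tsrc\tgene\t2000\t2600\t.\t+\t.\tgene_id "G3"; gene_biotype "lincRNA";\n',
]

if __name__ == "__main__":
    specs = ["gene_biotype:protein_coding", "gene_biotype:lincRNA"]
    for kept in filter_gtf(SAMPLE, specs):
        print(kept, end="")
```

With these two specs the header line and the G1 and G3 records are kept, while the misc_RNA record G2 is dropped. Treating multiple specs as "match any" is an assumption made here so that each additional category widens the set of genes kept.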
Note that Cell Ranger does not count reads that map to multiple genes toward gene expression (UMI) counts. Therefore, overlapping gene annotations in your reference will cause reads in the overlapping regions to be discarded. For this reason, your reference should contain as few overlapping gene annotations as possible.
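Before building a reference, it can help to list which gene annotations overlap. The following is a minimal Python sketch, assuming a GTF with "gene" feature lines and a gene_id attribute. It reports gene pairs whose coordinate ranges overlap, which is only a coarse proxy for the multi-gene reads Cell Ranger discards, because two genes can overlap in coordinates without sharing exonic sequence. The function names, the by_strand option and the toy records are hypothetical. By default genes are compared only on the same strand, on the assumption that your library is stranded; set by_strand=False to compare across strands.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical helpers for illustration only; not part of Cell Ranger.
_GENE_ID_RE = re.compile(r'gene_id\s+"([^"]*)"')

Gene = Tuple[int, int, str]  # (start, end, gene_id), 1-based inclusive as in GTF


def load_genes(lines: Iterable[str], by_strand: bool = True) -> Dict[tuple, List[Gene]]:
    """Group 'gene' records by chromosome and, optionally, by strand."""
    groups: Dict[tuple, List[Gene]] = defaultdict(list)
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "gene":
            continue  # only gene-level records are compared
        m = _GENE_ID_RE.search(cols[8])
        gene_id = m.group(1) if m else "?"
        key = (cols[0], cols[6]) if by_strand else (cols[0],)
        groups[key].append((int(cols[3]), int(cols[4]), gene_id))
    return groups


def overlapping_pairs(lines: Iterable[str], by_strand: bool = True) -> List[Tuple[str, str]]:
    """Return pairs of gene IDs whose coordinate ranges overlap."""
    pairs: List[Tuple[str, str]] = []
    for genes in load_genes(lines, by_strand).values():
        genes.sort()  # sweep left to right by start coordinate
        active: List[Gene] = []  # genes that may still overlap later ones
        for start, end, gid in genes:
            # Drop genes that ended before this one starts; they cannot overlap it.
            active = [g for g in active if g[1] >= start]
            pairs.extend((a[2], gid) for a in active)
            active.append((start, end, gid))
    return pairs


# Toy records with hypothetical gene IDs.
SAMPLE = [
    '1\tsrc\tgene\t100\t900\t.\t+\t.\tgene_id "G1";\n',
    '1\tsrc\tgene\t800\t1200\t.\t+\t.\tgene_id "G2";\n',   # overlaps G1 on the + strand
    '1\tsrc\tgene\t850\t950\t.\t-\t.\tgene_id "G3";\n',    # overlaps G1 and G2, but on the - strand
    '1\tsrc\tgene\t2000\t2500\t.\t+\t.\tgene_id "G4";\n',  # no overlap
]

if __name__ == "__main__":
    print(overlapping_pairs(SAMPLE))
    print(overlapping_pairs(SAMPLE, by_strand=False))
```

On the toy records, the same-strand check reports only G1 and G2, while the strand-agnostic check also reports G3's overlaps with G1 and G2. Each reported pair is a candidate for review. Whether to remove either gene still depends on your research question.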
Please refer to our step-by-step tutorial on creating a custom reference.