Question: I successfully created a custom reference library. However, when I run cellranger count or cellranger-arc count, it fails with the following error messages:
In the debug tarball log file
Log message:
Job failed in stage code
signal: segmentation fault (core dumped)
At the ALIGN_and_COUNT stage
[stderr] [E::sam_hrecs_update_hashes] Duplicate entry "contigABC" in sam header
Job failed in stage code
Here, contigABC is a placeholder for the name of the duplicated contig entry.
Answer: This error occurs when contig entries are duplicated in the custom reference FASTA file. If two or more contig entries have the same name but different sequences, rename the contigs so that every name is unique. If the entries have identical contig names and identical sequences, retain only one of them.
Below are two command-line approaches for checking the reference library's contig entry names. These examples assume your custom FASTA file is named custom_genome.fa.
Step 1: Check the FASTA file or FASTA index file for duplicate entries
Checking the FASTA file with grep -
E.g.:
$ grep "^>" custom_genome.fa > contig_list.txt
$ sort contig_list.txt | uniq -d
Checking the FASTA index* file -
E.g.:
$ cut -f1 custom_genome.fa.fai > contig_list.txt
$ sort contig_list.txt | uniq -d
*If the index file (custom_genome.fa.fai) does not exist, you can create it using samtools:
$ samtools faidx custom_genome.fa
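The grep check above compares whole header lines, so two entries that share a name but carry different descriptions after it would not be reported. As a minimal sketch, assuming (as samtools faidx does) that a contig's name is the header text between ">" and the first whitespace, the following helper counts names only. The function name list_duplicate_contigs is hypothetical and not part of any 10x Genomics tool.

```shell
# Hypothetical helper: print each contig name that occurs more than once,
# followed by a tab and the number of occurrences.
# Assumption: the name is the header text between ">" and the first whitespace.
list_duplicate_contigs() {
    awk '/^>/ { name = substr($1, 2); count[name]++ }
         END  { for (name in count) if (count[name] > 1) print name "\t" count[name] }' "$1" | sort
}
```

After defining the function in your shell, run it as: list_duplicate_contigs custom_genome.fa
No output means no duplicated names were found under this assumption.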
Step 2: Manually check contig_list.txt and custom_genome.fa using your preferred method (e.g., less, nano, vi, or cat are all good command-line options) to confirm whether both the contig entry name and sequence are duplicated or only the contig entry name is duplicated.
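For large genomes, comparing sequences by eye can be impractical. The sketch below is one possible way to automate the Step 2 comparison: it reads the FASTA twice, first to find duplicated names and then to compare the sequences stored under each of them. It assumes the contig name is the header text up to the first whitespace, and it compares sequences exactly (case-sensitive, ignoring only line wrapping). The function name compare_duplicate_contigs is hypothetical.

```shell
# Hypothetical helper: for every duplicated contig name, report whether all
# copies carry the same sequence ("identical") or not ("different").
compare_duplicate_contigs() {
    awk '
        # Close the current record and compare it with the first copy seen.
        function record_done() {
            if (name == "" || count[name] < 2) return
            if (!(name in first)) { first[name] = seq; status[name] = "identical" }
            else if (first[name] != seq) status[name] = "different"
        }
        # Pass 1 (first read of the file): count occurrences of each name.
        NR == FNR { if (/^>/) count[substr($1, 2)]++; next }
        # Pass 2: a header line ends the previous record and starts a new one.
        /^>/ { record_done(); name = substr($1, 2); seq = ""; next }
        # Pass 2: keep sequence lines only for names that are duplicated.
        { if (count[name] > 1) seq = seq $0 }
        END { record_done(); for (n in status) print n "\t" status[n] }
    ' "$1" "$1" | sort
}
```

Run it as: compare_duplicate_contigs custom_genome.fa
Names reported as "identical" can be reduced to a single entry, while names reported as "different" need to be renamed.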
Step 3: Once the FASTA file is corrected, remake the reference library using cellranger mkref (or cellranger-arc mkref for a cellranger-arc reference).
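If every duplicated entry turned out to have an identical name and sequence, one possible way to produce the corrected FASTA is to keep only the first record for each contig name. This is a sketch under that assumption only: it would silently discard data if the duplicated names carry different sequences, in which case the contigs should be renamed instead. The function name keep_first_contig and the output file name used below are hypothetical.

```shell
# Hypothetical helper: print the FASTA, keeping only the first record for each
# contig name (name = header text between ">" and the first whitespace).
keep_first_contig() {
    # "keep" is set on every header line: true the first time a name is seen,
    # false for later copies; every line is printed only while "keep" is true.
    awk '/^>/ { keep = !seen[substr($1, 2)]++ } keep' "$1"
}
```

Run it as: keep_first_contig custom_genome.fa > custom_genome.dedup.fa
Then confirm that the new file no longer contains duplicated names before rebuilding the reference.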
Disclaimer: The code snippets provided are as-is and for instructional purposes only. 10x Genomics does not support or guarantee the code.
Products: Single Cell Gene Expression; Single Cell Multiome ATAC + Gene Expression