Question: How can I build a custom V(D)J reference with the fetch-imgt tool for species other than human or mouse?
Answer: A custom reference for V(D)J can be built by using one of the methods described here. When building a custom reference with the fetch-imgt method for a species other than human or mouse, customers commonly encounter errors such as:
None of the C-regions are found in the reference
One reason for this error is a mismatch in version numbers between the two databases on the IMGT website, GENE-DB and LIGM-DB. The fetch-imgt command (i.e. cellranger-x.y.z/lib/bin/fetch-imgt) uses a query version number that fetches from LIGM-DB, which may have no C genes for a particular species even though GENE-DB has them for the same species. For such cases, this article walks through an example of building a custom reference with the fetch-imgt script.
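Before changing anything, it can help to confirm that the reference really has no C-region entries. The sketch below counts FASTA records per region type. It assumes `|`-separated headers with the region type in the fourth field. That layout, the function name `count_region_types`, and the example records are illustrative assumptions, so check them against the FASTA file of your own reference.

```python
from collections import Counter


def count_region_types(fasta_lines):
    """Count FASTA records per region type (for example C-REGION).

    Assumes '|'-separated headers with the region type in the fourth field.
    This layout is an assumption; verify it against your own reference FASTA.
    """
    counts = Counter()
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].strip().split("|")
            region = fields[3] if len(fields) > 3 else "UNKNOWN"
            counts[region] += 1
    return counts


# Hypothetical records for illustration only (not real IMGT sequences).
example = [
    ">1|IGHV1-1*01 IGHV1-1*01|IGHV1-1|L-REGION+V-REGION|IG|IGH|None|01",
    "ATGGACTGG",
    ">2|IGHJ1*01 IGHJ1*01|IGHJ1|J-REGION|IG|IGH|None|01",
    "GCTGAATAC",
]
counts = count_region_types(example)
print("C-REGION records:", counts.get("C-REGION", 0))
```

To run it on a real reference, pass an open file handle instead of the list, for example `with open(path) as handle: count_region_types(handle)`. A count of zero for C-REGION would be consistent with the error shown above.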
Example: Building a V(D)J reference for Rhesus Monkey
To build a reference for Rhesus Monkey (Macaca mulatta), some changes need to be made to the fetch-imgt script, as illustrated below.
The script uses c_query=14.1 for all species except mouse, for which the version is 7.2. Query version 14.1 does not return any C genes for the Rhesus monkey. If query version 7.2 is used in place of 14.1, C genes are returned for the Rhesus monkey, the reference is constructed successfully, and cellranger vdj does not error out. The detailed steps below address the No C genes error when it occurs, for instance, on an IG dataset.
Step 1: Change the following lines in the script using any text editor:
if args.species == "Mus musculus":
    c_query2 = "7.2"
else:
    c_query2 = "14.1"
to:
if args.species == "Mus musculus":
    c_query2 = "7.2"
elif args.species == "Macaca mulatta":
    c_query2 = "7.2"
else:
    c_query2 = "14.1"
Step 2: Run the modified fetch-imgt script to build the reference, as illustrated here.
Step 3: Revert the script to its original state.
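As an alternative to editing the script in place and reverting it afterwards, the steps above can be sketched as a patch applied to a copy, so the original file is never modified. This is a minimal sketch, not a supported tool: the helper name `write_patched_copy` and the regular expression are assumptions based only on the lines quoted in Step 1, and the actual script text may differ between Cell Ranger versions. The sketch replaces the mouse check with an equivalent membership test instead of adding an `elif` branch.

```python
import re
import shutil
from pathlib import Path

# Matches the mouse check that directly precedes the c_query2 assignment,
# as quoted in Step 1. The real script may be laid out differently.
MOUSE_CHECK = re.compile(
    r'if args\.species == "Mus musculus":(?=\s*\n\s*c_query2 = "7\.2")'
)


def write_patched_copy(original, patched, extra_species="Macaca mulatta"):
    """Write a copy of the script in which extra_species also uses query 7.2."""
    source = Path(original).read_text()
    replacement = f'if args.species in ("Mus musculus", "{extra_species}"):'
    new_source, hits = MOUSE_CHECK.subn(lambda match: replacement, source)
    if hits != 1:
        # Refuse to guess if the expected lines are missing or duplicated.
        raise ValueError(f"expected exactly one match, found {hits}")
    shutil.copy2(original, patched)  # keeps the executable permission bits
    Path(patched).write_text(new_source)
    return Path(patched)
```

The function raises an error instead of writing anything when the expected lines are not found exactly once, which guards against silently patching the wrong place. The patched copy can then be run in place of the original script for Step 2.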
Note: The version numbers mentioned above come from IMGT and are based on this page here. The IMGT APIs vary by version and/or species, so reference sequences have to be fetched from different APIs; this is why the version numbers differ between gene groups. Depending on the dataset type, i.e. TR or IG, you can also adjust the version numbers in the function for GENE-DB queries (see the image below) for your species of interest to check whether C-regions are returned for your dataset.
(from fetch-imgt code, Cell Ranger v6.1.2)
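When trying version numbers for more than one species, the if/elif chain can be easier to manage as a lookup table. The sketch below shows that idea in isolation. The names `C_QUERY_OVERRIDES`, `DEFAULT_C_QUERY`, and `c_query_version` are hypothetical and do not appear in the fetch-imgt script; only the version values 7.2 and 14.1 come from this article.

```python
# Species that need a non-default C gene query version, per this article.
# Add further species here only after confirming the version against IMGT.
C_QUERY_OVERRIDES = {
    "Mus musculus": "7.2",
    "Macaca mulatta": "7.2",
}
DEFAULT_C_QUERY = "14.1"


def c_query_version(species):
    """Return the C gene query version to use for the given species name."""
    return C_QUERY_OVERRIDES.get(species, DEFAULT_C_QUERY)


for name in ("Mus musculus", "Macaca mulatta", "Homo sapiens"):
    print(name, c_query_version(name))
```

Any species not listed in the table falls back to the default version, which mirrors the `else` branch in the original lines.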
Disclaimer: Modifications to the Cell Ranger code are not officially supported. This code modification is provided as-is for instructional purposes only. 10x Genomics does not support or guarantee the code.
Products: Single Cell Immune Profiling