Question: I have 10x linked read data for ten samples and I am looking into Long Ranger to obtain variant calls for the species that I work with. However, the reference is one consensus sequence and not in the form of two haplotypes. Is that a big issue? Secondly, can you advise me on the quality of a Long Ranger approach compared to assembling multiple 10x libraries? Is the former generally better in quality?
Answer: Your best strategy here depends on several factors including (1) the actual organism (its phylogenetic position relative to human), (2) other aspects of its genome, such as size, GC content, and repetitive fraction, (3) the quality of the existing reference (both the quality of assembly and accuracy of annotations), and (4) computational and other resources available. It is ultimately difficult to predict which approach will be better, as it varies depending on these factors as well as the biological question being asked.
Our linked-read products were originally designed with human samples in mind. Later, our software, especially our de novo assembly solution (Supernova), grew to accommodate non-model organisms as well, but we usually see better results with mammals and birds than with fishes, insects, and plants. This also varies with the other factors above.
1. Long Ranger. This is likely your best bet if you already have a high-quality (chromosome-level) reference genome and accurate annotations. It is OK that the reference is a consensus and not two haplotypes; Long Ranger will phase what it can. Please see the custom reference support page for details on how to prepare your existing reference: advanced references.
For documentation on how to run Long Ranger and Loupe Browser, please see genome-exome software overview.
2. Supernova. Since you already have your linked read data, it can't hurt to try Supernova for a phased de novo assembly if you have the time and computational resources. This may be an especially attractive option if your existing reference genome is of poor quality. Of course, there is still the problem of annotation, which our solutions don't address directly. If you are interested in using Supernova, please carefully read Achieving Success with De Novo Assembly. Please also review Supernova performance on twenty human and nonhuman datasets. You should expect better results with mammals (http://edwardslab.blogspot.com/2019/01/we-have-new-10x-supernova-assembly-lab.html) than with axolotls or peas.
3. Third party tools. There are also de novo assembly tools available that are unsupported by us. Which ones you use may depend on what other data you have available (e.g., PacBio long reads could enable a hybrid assembly: a hybrid approach for targeted assembly of homologous sequences).
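Since the choice between options 1 and 2 depends partly on how contiguous your existing reference is, a quick look at its contig count and N50 may help. The sketch below is only an illustration and is not part of Long Ranger or Supernova. N50 is a common contiguity metric that the answer above does not mention, the function names are made up for this example, and no particular cutoff is implied.

```python
from typing import Iterable, List


def contig_lengths(fasta_lines: Iterable[str]) -> List[int]:
    """Return the length of each sequence found in FASTA-formatted lines."""
    lengths: List[int] = []
    current = None  # stays None until the first '>' header is seen
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


if __name__ == "__main__":
    # Hypothetical path: replace with your own reference FASTA.
    with open("reference.fa") as handle:
        lens = contig_lengths(handle)
    print(f"contigs: {len(lens)}  total bp: {sum(lens)}  N50: {n50(lens)}")
```

A reference with a small number of very long sequences points toward option 1, while a highly fragmented one may be a reason to try option 2. Treat the numbers as a rough guide only, since contiguity says nothing about annotation accuracy.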
You may get some good ideas by browsing the literature. We have publications listed for our Genome solution: publications with genome solution and for de novo assembly: publications with assembly solution. Some of these publications use third party analysis tools as well.
Please email [email protected] with any follow up questions you may have.