Question: I noticed the 10X GRCh38 reference for Long Ranger includes the hs38d1 decoy. Is it important that this is included?
Answer: The hs38d1 decoy contains DNA viral sequences (e.g. Epstein-Barr virus) as well as human genomic sequence that could not be placed on chromosomes when the reference genome was put together. Much of the decoy consists of repeats that are difficult to assemble. Including the decoy in your analysis improves accuracy: if a read aligns better to the decoy than to the assembly, then wherever it aligns in the assembly when the decoy is absent is the wrong place, and that misplacement can lead to false positives.
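To make the accuracy argument concrete, here is a minimal toy sketch in Python. It is not how Long Ranger or its aligner is implemented; it only illustrates best-hit selection. The contig names ("chr1", "example_decoy"), the scores, and the function best_placement are all hypothetical.

```python
from typing import Dict, Optional


def best_placement(scores: Dict[str, int], min_score: int = 0) -> Optional[str]:
    """Return the contig with the highest alignment score.

    Returns None when no candidate reaches min_score.
    """
    candidates = {contig: s for contig, s in scores.items() if s >= min_score}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# Hypothetical alignment scores for a single read (higher is better).
read_scores = {
    "chr1": 82,            # imperfect match somewhere in the primary assembly
    "example_decoy": 148,  # near-perfect match to a (made-up) decoy contig
}

# With the decoy in the reference, the read goes to its best match.
with_decoy = best_placement(read_scores)

# Without the decoy, the only remaining candidate is the weaker chr1 hit,
# so the read is placed there even though it does not belong.
primary_only = {c: s for c, s in read_scores.items() if c != "example_decoy"}
without_decoy = best_placement(primary_only)

print("with decoy:   ", with_decoy)
print("without decoy:", without_decoy)
```

In this toy case the read is placed on the decoy when the decoy is present, and is forced onto chr1 when it is not. That forced placement is the kind of misalignment that can show up as a false positive.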
Including the decoy sequence also decreases alignment time. Reads that came from regions of the genome represented by the decoy will map quickly to the decoy, which avoids wasting compute power trying to align these reads where they don't belong.
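If you want a rough idea of how many reads the decoy absorbs in your own data, one option is to tally primary alignments per contig from SAM-formatted text. The sketch below uses only the standard SAM columns (FLAG in column 2, RNAME in column 3). The "_decoy" suffix used to recognise decoy contigs is an assumption, so check the contig names in your reference and adjust it. The function names and the example records are hypothetical.

```python
from typing import Iterable


def is_decoy(contig: str, suffix: str = "_decoy") -> bool:
    # Assumption: decoy contigs are identifiable by a name suffix.
    return contig.endswith(suffix)


def decoy_fraction(sam_lines: Iterable[str]) -> float:
    """Fraction of mapped primary alignments whose contig is a decoy."""
    decoy = 0
    total = 0
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        flag, rname = int(fields[1]), fields[2]
        if flag & 0x4 or rname == "*":  # unmapped read
            continue
        if flag & 0x900:  # secondary (0x100) or supplementary (0x800)
            continue
        total += 1
        if is_decoy(rname):
            decoy += 1
    return decoy / total if total else 0.0


# Made-up SAM records for illustration only.
example = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\texample_decoy\t10\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r4\t16\tchr2\t50\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r5\t256\texample_decoy\t20\t0\t4M\t*\t0\t0\tACGT\tIIII",
]
print(f"decoy fraction: {decoy_fraction(example):.3f}")
```

To run this on a real BAM file you would first convert it to SAM text (for example with samtools view) and pass the lines to decoy_fraction.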
For these two reasons, we recommend using the hs38d1 decoy with the 10X GRCh38 reference.
You can find our pre-made Long Ranger references on the Long Ranger downloads page.