Question: Why is the D gene annotation missing for some TRB chains in TCR data or for some heavy chains in BCR data?
Answer: D genes in human and mouse are very short and are not well annotated or represented in the reference. For example, the 10x human VDJ reference contains only 2 annotated D genes for TRB chains and 30 D genes for heavy chains. Their lengths range from 11 to 37 base pairs.
During recombination, non-templated insertions and deletions at the ends of the D gene can produce sequences that do not align well to the germline sequence. This can render the D gene unrecognizable in some junction sequences. In B cells, mutations introduced into the D gene and junction by somatic hypermutation can also make high-confidence assignment of the D gene difficult.
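The effect can be illustrated with a small sketch. This is a toy model only, not how Cell Ranger assigns D genes: the D gene names and sequences, the junction sequences, the helper functions `longest_shared_run` and `call_d`, and the 8-base threshold are all hypothetical and chosen for illustration. The sketch calls a D gene only when a long enough exact stretch of its germline sequence survives in the junction.

```python
# Toy illustration (assumptions throughout): exact-match D gene calling.
# Sequences, names, and the threshold below are made up for this example.

D_GENES = {
    "D_toy1": "GGGACAGGGGGC",       # 12 bases
    "D_toy2": "GGGACTAGCGGGGGGG",   # 16 bases
}


def longest_shared_run(germline: str, junction: str) -> int:
    """Length of the longest exact substring of germline found in junction."""
    best = 0
    n = len(germline)
    for i in range(n):
        # Only try substrings longer than the best run found so far.
        for j in range(i + best + 1, n + 1):
            if germline[i:j] in junction:
                best = j - i
            else:
                break  # a longer substring starting at i cannot match either
    return best


def call_d(junction: str, d_genes=D_GENES, min_match: int = 8):
    """Return the best-matching D gene name, or None if no match is long enough."""
    scores = {name: longest_shared_run(seq, junction) for name, seq in d_genes.items()}
    name = max(scores, key=scores.get)
    return name if scores[name] >= min_match else None


V_END, J_START = "TGTGCCAGCAGC", "AATGAG"  # hypothetical flanking sequence

# 1) D gene left intact between V and J.
intact = V_END + "GGGACAGGGGGC" + J_START
# 2) D gene trimmed to 5 bases, with non-templated bases added on both sides.
trimmed = V_END + "TT" + "ACAGG" + "CCT" + J_START
# 3) Full-length D gene carrying a single point mutation (G -> T) in the middle.
mutated = V_END + "GGGACATGGGGC" + J_START

for label, junction in [("intact", intact), ("trimmed", trimmed), ("mutated", mutated)]:
    print(label, call_d(junction))
```

In this toy setup only the intact junction receives a D gene call. After trimming and non-templated additions, or after a single point mutation in a 12-base gene, the longest surviving germline stretch falls below the threshold, so no D gene is reported.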