Question: Why is the D gene annotation missing in several TRB chains in the TCR data and in some heavy chains in BCR data?
Answer: Human and mouse D genes are short sequences that are often poorly annotated or underrepresented in the reference. For example, the 10x Genomics human V(D)J reference contains only 2 annotated D genes for TRB chains and 30 annotated D genes for heavy chains. These D genes range from 11 to 37 base pairs in length.
During recombination, non-templated insertions and deletions in the D gene can produce sequences that do not align well to the germline sequence, rendering the D gene unrecognizable. In B cells, mutations that arise naturally in the D gene and junction during somatic hypermutation can also make high-confidence assignment of the D gene difficult.
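As a rough illustration of why a trimmed D segment can go unannotated, the sketch below calls a D gene only when the junction shares a long enough exact run with the germline sequence. This is not the algorithm Cell Ranger uses: the sequence `TOY_D`, the flanking bases, the function names, and the `min_match` threshold of 8 are all hypothetical values chosen for the example.

```python
def longest_common_substring(a: str, b: str) -> int:
    """Return the length of the longest contiguous run shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)  # common-suffix lengths for the previous row
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def call_d_gene(junction: str, germline_d: str, min_match: int = 8) -> bool:
    """Toy rule: annotate the D gene only if the shared run is long enough."""
    return longest_common_substring(junction, germline_d) >= min_match


# Hypothetical 12-nt germline D segment and flanking V/J bases (toy data).
TOY_D = "GGGACAGGGGGC"
V_END, J_START = "TGTGCCAGC", "AACACTGAA"

# Junction with the D segment fully intact.
intact = V_END + TOY_D + J_START

# Junction where both ends of D were trimmed (only "ACAGG" remains)
# and non-templated bases ("TT", "CA") were inserted on either side.
trimmed = V_END + "TT" + "ACAGG" + "CA" + J_START

print(call_d_gene(intact, TOY_D))   # True: all 12 germline bases match
print(call_d_gene(trimmed, TOY_D))  # False: only 5 germline bases survive
```

Because the germline D sequence is so short to begin with, losing a few bases at each end leaves a remnant that could match many places by chance, so any reasonable confidence threshold leaves the D gene unassigned.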