Question: My data has a clonotype (say clonotype1) where the TRB chain is paired with a TRA chain and a second clonotype (say clonotype2) that has only a TRB chain whose sequence is the same as the TRB chain of clonotype1. Could you please explain the reason why cellranger vdj assigned the same TCR chain to different clonotypes?
Answer: In Cell Ranger v4 and earlier, two cells were merged into a clonotype only when the CDR3 nucleotide sequences of all productive chains in the two cells matched exactly. For example, if cell1 had productive TRB and TRA chains while cell2 had only a productive TRB chain (no TRA chain), they would be assigned to separate clonotypes even if the CDR3 nucleotide sequences of the TRB chains matched between cell1 and cell2.
TRA is generally expressed at lower levels than TRB. Therefore, sometimes the TRA chain is not detected or not annotated as productive. This could be the cause of the missing or nonproductive TRA in cell2 in your example.
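The exact-match rule described for Cell Ranger v4 and earlier can be illustrated with a small Python sketch. This is not Cell Ranger's code: the function name `group_by_exact_chains`, the input layout, and the short nucleotide strings are all hypothetical placeholders chosen for illustration.

```python
from collections import defaultdict


def group_by_exact_chains(cells):
    """Group barcodes whose sets of productive chains are identical.

    `cells` maps a barcode to a list of (chain_type, cdr3_nt) pairs.
    Hypothetical helper for illustration only, not Cell Ranger's API.
    """
    groups = defaultdict(list)
    for barcode, chains in cells.items():
        # The key is the full set of productive chains, so a cell that
        # lacks a TRA never shares a key with a cell that has one.
        groups[frozenset(chains)].append(barcode)
    return list(groups.values())


# Placeholder CDR3 nucleotide strings, not real sequences.
cells = {
    "cell1": [("TRB", "TGTGCCAGC"), ("TRA", "TGTGCTGTG")],
    "cell2": [("TRB", "TGTGCCAGC")],  # same TRB as cell1, no TRA detected
    "cell3": [("TRA", "TGTGCTGTG"), ("TRB", "TGTGCCAGC")],
}

clonotypes = group_by_exact_chains(cells)
print(clonotypes)
```

Under this toy rule, cell1 and cell3 end up in one group, while cell2 forms its own group even though its TRB matches theirs.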
The algorithm in Cell Ranger v5 and later has been upgraded to merge cells that have only a TRA or only a TRB chain into clonotypes that contain both chain types, thus outputting fewer, more biologically relevant clonotypes.
However, there may be a scenario where the same TRB chain appears both as part of multi-chain clonotypes and as its own single-chain clonotype. The hypothetical clonotype table may look like this:
This happens because in clonotypes 1 and 6, the same TRB chain (CASRPWDRKNILYF) is paired with different TRA chains (CASRTCANSKRTF and CAGRVNSQQYQFVTK). In clonotype 6, the TRA chain is also paired with an additional TRB chain (CASSRGTGNERFF). Because the two clonotypes (1 and 6) have the exact same TRB sequence, when the algorithm encounters a barcode with a lone TRB (not paired with a TRA or another TRB) whose sequence matches the TRB shared by clonotypes 1 and 6, it cannot assign the barcode unambiguously to either existing clonotype. A new clonotype is therefore created for this barcode.
Please note that the image above is a constructed example and not from an actual V(D)J dataset. It may or may not reflect biologically relevant CDR3 sequences.
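The ambiguity between clonotypes 1 and 6 can be sketched in Python as follows. This is a simplified illustration and not Cell Ranger's actual algorithm: the function `assign_lone_chain` is hypothetical, it compares the CDR3 amino acid sequences quoted above rather than nucleotide sequences, and it assumes the two TRA chains belong to clonotypes 1 and 6 in the order they are listed.

```python
def assign_lone_chain(lone_chain, clonotypes):
    """Return the id of the only clonotype containing `lone_chain`.

    Returns None when no clonotype, or more than one, contains the chain.
    Hypothetical helper for illustration only, not Cell Ranger's API.
    """
    matches = [cid for cid, chains in clonotypes.items() if lone_chain in chains]
    return matches[0] if len(matches) == 1 else None


# CDR3 sequences taken from the constructed example in the text.
clonotypes = {
    "clonotype1": {("TRB", "CASRPWDRKNILYF"), ("TRA", "CASRTCANSKRTF")},
    "clonotype6": {
        ("TRB", "CASRPWDRKNILYF"),
        ("TRB", "CASSRGTGNERFF"),
        ("TRA", "CAGRVNSQQYQFVTK"),
    },
}

# The shared TRB matches both clonotypes, so no single target exists.
shared = assign_lone_chain(("TRB", "CASRPWDRKNILYF"), clonotypes)

# The TRB found only in clonotype 6 has exactly one candidate.
unique = assign_lone_chain(("TRB", "CASSRGTGNERFF"), clonotypes)

print(shared, unique)
```

In this sketch a `None` result stands in for the case where the barcode cannot be merged and a separate single-chain clonotype is reported instead.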