Question: How to calculate paired clonotype diversity?
Answer: The paired clonotype diversity metric measures the variety of TCR or BCR clonotypes per sample. The metric is the inverse of the Simpson's Diversity Index value calculated on paired clonotypes, where pairing groups by cell barcode.
Simpson's Diversity Index is a commonly used statistic, and https://en.wikipedia.org/wiki/Diversity_index#Simpson_index provides a mathematical explanation. Immediately below on the same page, https://en.wikipedia.org/wiki/Diversity_index#Inverse_Simpson_index explains the inverse Simpson's index, which is given by the following equation.
$$D = \frac{1}{\sum_{i=1}^{R} p_i^{2}}$$
R represents richness, the total number of types in the dataset, and $p_i$ is the proportional abundance of the i-th type.
The filtered contig annotations result enables recapitulating the diversity score. For each raw_clonotype_id in the filtered contig annotations, count the number of unique barcodes. Divide the counts of unique barcodes by the total number of unique barcodes to derive the proportional abundances. Then square each of the proportional abundances. One over the sum of the squared values gives the inverse Simpson’s index.
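The steps above can be sketched in Python as follows. This is a minimal illustration, not the pipeline's own implementation. It assumes the filtered contig annotations CSV has `barcode` and `raw_clonotype_id` columns, and that contigs without an assigned clonotype carry an empty value or the string "None". The function names and the file path passed to them are hypothetical.

```python
import csv
from collections import defaultdict


def inverse_simpson_from_rows(rows):
    """Inverse Simpson's index from contig annotation rows (dicts)."""
    # Collect the set of unique barcodes observed for each clonotype.
    barcodes_by_clonotype = defaultdict(set)
    for row in rows:
        clonotype = (row.get("raw_clonotype_id") or "").strip()
        # Assumption: unassigned contigs have an empty value or "None".
        if clonotype in ("", "None"):
            continue
        barcodes_by_clonotype[clonotype].add(row["barcode"])

    # Number of unique barcodes per clonotype.
    counts = [len(barcodes) for barcodes in barcodes_by_clonotype.values()]
    total = sum(counts)
    if total == 0:
        return 0.0  # no clonotypes: the metric is set to zero

    # One over the sum of the squared proportional abundances.
    return 1.0 / sum((n / total) ** 2 for n in counts)


def inverse_simpson_from_csv(path):
    """Read a filtered contig annotations CSV and compute the index."""
    with open(path, newline="") as handle:
        return inverse_simpson_from_rows(csv.DictReader(handle))
```

A cell usually contributes several contig rows, so the sketch counts barcodes through a set rather than counting rows.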
The index value ranges from one to the estimated number of cells. A value of one indicates no diversity, and a value equal to the estimated number of cells indicates maximal diversity. If clonotypes are absent, the value is set to zero. The paired clonotype diversity metric is useful to compare samples, e.g. pre and post treatment, and to assess clonal expansion.
Toy example 1
For two clonotypes F and M, we have the following unique barcode counts.
{'F': 10, 'M': 5}
Divide the counts by the sum of the counts, in this case 15, to derive the abundances.
[0.66666667, 0.33333333]
Then square the abundances.
[0.44444444, 0.11111111]
The sum of the squared abundances is 0.5556, and its inverse is 1.8. For this example with two clonotypes, if each clonotype had equal representation, then the diversity would be 2.
Toy example 2
Consider a more skewed distribution of clonotypes, with the following unique barcode counts.
{'M': 32, 'F': 3}
The abundances are
[0.91428571, 0.08571429]
And the squared abundances are
[0.83591837, 0.00734694]
The sum of the squared abundances is 0.8433 and the inverse of this is 1.1859. This data set’s clonotype diversity score is closer to one and the diversity is less than the first example's.
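Both toy examples can be checked with a few lines of Python. This is a small sketch that works directly from per-clonotype unique barcode counts. The function name `inverse_simpson` is chosen here for illustration and is not part of any 10x Genomics software.

```python
def inverse_simpson(counts):
    """Inverse Simpson's index from a {clonotype: unique barcode count} dict."""
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no clonotypes: the metric is set to zero
    # Proportional abundance of each clonotype.
    abundances = [n / total for n in counts.values()]
    # One over the sum of the squared abundances.
    return 1.0 / sum(p * p for p in abundances)


print(round(inverse_simpson({"F": 10, "M": 5}), 4))  # toy example 1: 1.8
print(round(inverse_simpson({"M": 32, "F": 3}), 4))  # toy example 2: 1.1859
print(round(inverse_simpson({"F": 5, "M": 5}), 4))   # equal representation: 2.0
```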
Related Documentation
The main support site describes the paired clonotype diversity at https://support.10xgenomics.com/single-cell-vdj/software/pipelines/latest/output/metrics and clonotype grouping at https://support.10xgenomics.com/single-cell-vdj/software/pipelines/latest/algorithms/clonotyping.
Last updated: February 21, 2022
Products: Single Cell Immune Profiling