Question: I got this alert in my web_summary.html: "High fraction of reads coming from barcodes with very high UMI counts". What does this mean?
During sample processing, proteins can clump together to form large aggregates, as shown in the figure below:
These aggregates can end up in only a few GEMs, where they are then barcoded. As a result, the cells in those GEMs appear to have exceptionally high protein UMI counts.
What happens to these cells?
Since the UMI counts in these cells are coming from aggregates and not real proteins on the cell surface, we exclude these barcodes from the final matrix, t-SNE plot, and .cloupe file.
Cells with potential protein aggregates are identified using two criteria:
1) The cell barcode has more than 10,000 reads.
2) More than 50% of the reads for that barcode are UMI-corrected.
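The two criteria above can be sketched as a simple filter. This is only an illustration and not Cell Ranger's actual implementation: the `BarcodeStats` class, the `is_potential_aggregate` function, the field names, and the example read counts are all hypothetical, and it assumes you already have per-barcode totals of antibody reads and corrected reads.

```python
from dataclasses import dataclass

# Thresholds taken from the two criteria described above.
READ_THRESHOLD = 10_000
CORRECTED_FRACTION_THRESHOLD = 0.5


@dataclass
class BarcodeStats:
    """Hypothetical per-barcode summary (not a Cell Ranger data structure)."""
    barcode: str
    antibody_reads: int   # total antibody reads assigned to the barcode
    corrected_reads: int  # reads whose UMI was corrected


def is_potential_aggregate(stats: BarcodeStats) -> bool:
    """Return True only if the barcode meets both criteria."""
    if stats.antibody_reads <= READ_THRESHOLD:
        return False
    corrected_fraction = stats.corrected_reads / stats.antibody_reads
    return corrected_fraction > CORRECTED_FRACTION_THRESHOLD


# Made-up example values, for illustration only.
example = [
    BarcodeStats("AAAC-1", 2_500, 100),      # too few reads
    BarcodeStats("CCCG-1", 48_000, 30_000),  # many reads, 62.5% corrected
    BarcodeStats("GGGT-1", 15_000, 3_000),   # many reads, only 20% corrected
]
flagged = [s.barcode for s in example if is_potential_aggregate(s)]
print(flagged)
```

Both conditions must hold, so a barcode with many reads but a low correction rate is kept.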
This barcode removal process is described in more detail on the Cell Ranger 'Antibody Algorithms' page.
When barcodes are excluded because protein aggregates have been identified, we output the alert above. These cells might confound interpretation of the antibody data, since their antibody levels are artificially elevated by a technical artifact. You can still find the data for all cells, including the excluded ones, in the unfiltered 'raw_feature_bc_matrix' in the Cell Ranger output directory.
In all cases of antibody aggregates that we have seen, only a small number of cells are excluded (<20), but the number of antibody reads lost along with the aggregates is very high.
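To see how a handful of barcodes can account for a large share of reads, here is a rough sketch that compares the fraction of barcodes flagged with the fraction of antibody reads they carry. The `aggregate_impact` function and the read counts are hypothetical toy values chosen for illustration, not measurements from a real run.

```python
def aggregate_impact(reads_per_barcode, flagged_barcodes):
    """Return (fraction of barcodes flagged, fraction of reads in flagged barcodes)."""
    if not reads_per_barcode:
        raise ValueError("reads_per_barcode must not be empty")
    total_reads = sum(reads_per_barcode.values())
    flagged_reads = sum(reads_per_barcode[b] for b in flagged_barcodes)
    barcode_fraction = len(flagged_barcodes) / len(reads_per_barcode)
    read_fraction = flagged_reads / total_reads
    return barcode_fraction, read_fraction


# Toy data: 998 ordinary barcodes plus 2 barcodes with very high read counts.
reads = {f"cell_{i}": 1_000 for i in range(998)}
reads["agg_1"] = 500_000
reads["agg_2"] = 500_000

barcode_fraction, read_fraction = aggregate_impact(reads, ["agg_1", "agg_2"])
print(f"{barcode_fraction:.1%} of barcodes carry {read_fraction:.1%} of reads")
```

In this toy case, excluding two barcodes out of a thousand removes roughly half of the antibody reads, which is the pattern the alert is meant to flag.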
You can find out the number of barcodes that were filtered out due to potential aggregates from this file:
In this file, the metric "ANTIBODY_number_highly_corrected_GEMs" tells you the number of cells filtered out because of a high UMI correction rate.
What can I do to solve this issue?
See the article "How can I optimize my TotalSeq™ antibody labeling protocol?"