Question: I got the alert in my web_summary.html: "High fraction of reads coming from barcodes with very high UMI counts" What does this mean?
Answer:
During sample processing, proteins can clump together and form large aggregates, as shown in the figure below:
These aggregate molecules can end up distributing into only a few GEMs, where they are then barcoded. This leads to the appearance of exceptionally high protein UMI counts in those cells.
What happens to these cells?
Since the UMI counts in these cells come from aggregates and not from real proteins on the cell surface, barcodes with potential protein aggregates are identified and removed from the filtered feature barcode matrix. They are therefore also absent from the t-SNE plot and the .cloupe file. This barcode removal process is described in more detail on the Cell Ranger 'Antibody Algorithms' page.
When barcodes are excluded because protein aggregates have been identified, we display the alert quoted above. These cells could confound interpretation of the antibody data, because their antibody levels are artificially elevated by a technical artifact. You can still find the data for all barcodes, including the excluded ones, in the unfiltered 'raw_feature_bc_matrix' in the Cell Ranger output directory.
In most cases that we have seen with antibody aggregates, only a small number of cells are excluded (<20), but the number of antibody reads lost with the aggregate molecules is very high.
You can find more details about the barcodes that were filtered out as potential aggregates in the following output file generated by Cell Ranger:
antibody_analysis/aggregate_barcodes.csv
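If you want a quick look at that file, a minimal Python sketch along the following lines may help. It only counts the flagged barcodes and lists whatever columns the file contains. The column names are read from the header rather than assumed, and the `my_run/outs` prefix and the `load_aggregate_barcodes` helper name are hypothetical, so adjust the path to your own run.

```python
import csv
from pathlib import Path


def load_aggregate_barcodes(csv_path):
    """Return (column_names, rows) from an aggregate_barcodes.csv file.

    Column names come from the file's header line; none are assumed here.
    """
    with Path(csv_path).open(newline="") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        columns = list(reader.fieldnames or [])
    return columns, rows


if __name__ == "__main__":
    # Hypothetical run directory; replace with your own Cell Ranger output path.
    path = Path("my_run/outs/antibody_analysis/aggregate_barcodes.csv")
    if path.exists():
        columns, rows = load_aggregate_barcodes(path)
        print(f"{len(rows)} barcodes flagged as potential aggregates")
        print("columns:", ", ".join(columns))
    else:
        print(f"{path} not found")
```

Each row is returned as a dictionary keyed by the header names, so you can inspect individual barcodes once you know which columns your Cell Ranger version writes.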
What can I do to solve this issue?
See the article "How can I optimize my TotalSeq™ antibody labeling protocol?"