Question: Can I perform shallow sequencing on 3’ Cell Multiplexing libraries to assess the quality of my CellPlex data?
Answer: Yes. Shallow sequencing can be performed to obtain an initial assessment of data quality before committing to deeper sequencing. A suggested starting point is ~500 to ~1,000 reads per cell for the CellPlex library and ~500 to ~10,000 reads per cell for the Gene Expression library.
Note that Cell Ranger cannot analyze data for CellPlex libraries alone. Therefore, it is important to sequence both the Gene Expression and CellPlex libraries.
Not all metrics are accurate at low sequencing depths. The sections below explain how to interpret metrics from shallow sequencing.
1) Shallow sequencing of 3’ Gene Expression libraries
When analyzing CellPlex sequence data, Cell Ranger first performs cell calling using the Gene Expression data. A recommended depth for shallow sequencing of Gene Expression libraries is ~500 to ~10,000 reads/cell. For some sample types, ~500 reads/cell will be sufficient for reasonably accurate cell calling. However, for other sample types, cell counts will be underestimated at shallow sequencing depths. For further details on cell calling at different sequencing depths, see: Can I perform shallow sequencing to assess the quality of Single Cell 3' Gene Expression libraries?
2) Shallow sequencing of 3’ CellPlex libraries
A recommended starting point for shallow sequencing of 3’ CellPlex libraries is ~500-1,000 reads/cell. When performing shallow sequencing of CellPlex libraries, we recommend viewing the “Histogram of CMO Count” in the Cell Ranger Multi Web Summary file, as seen in the example below of a dataset that was downsampled to different sequencing depths:
- In a good quality dataset, we expect to see a clear separation between the background peak (left) and the foreground peak(s) (right) in the histogram.
- A shallow depth of ~500-1,000 reads/cell for the CellPlex library is often sufficient to evaluate the shape of this histogram. If the background and foreground peaks are well-separated, it is likely that the CellPlex assay performed well and we would recommend proceeding with deeper sequencing.
- For some sample types, sequencing more than 500-1,000 reads/cell may be necessary to properly assess the shape of the histogram, given that different sample types may have different peak numbers and peak shapes.
- If the “Fraction CMO Reads Usable” is low (e.g., due to high background noise from unbound CMOs, or poor sequencing quality), sequencing more than 500-1,000 reads/cell may be necessary to properly assess the shape of the histogram. The example dataset shown above has a Fraction CMO Reads Usable of ~80%.
- If there is no clear separation between background and foreground peaks, optimization of upstream steps may be required (e.g., sample preparation, CMO labeling, and library preparation protocols). For further guidance on CellPlex assay optimization, see: How can I reduce background from unbound CMOs using the 3’ CellPlex kit for Cell Multiplexing?
Note that other CellPlex metrics may not be accurate at low sequencing depths. For example, the “t-SNE Projection of Cells by CMO” may show unusual clustering phenotypes at shallow depths:
Furthermore, CMO tag assignment may not be accurate at shallow sequencing depths. Cell Ranger will attempt to assign cells to CMO tags at shallow depths; however, these assignments may not be accurate and should not be used for downstream data analysis and interpretation. For accurate CMO tag assignments, we recommend performing deeper sequencing to ~5k reads/cell (or ~1k usable reads/cell*) for the CellPlex library.
*Additional notes on estimating the depth required for accurate CMO tag assignment:
The “Fraction CMO Reads Usable” is defined as the fraction of read pairs that contain a recognized CMO sequence, a valid UMI, and a cell-associated barcode. For accurate CMO tag assignment, we recommend >1,000 usable reads/cell. This can typically be achieved by sequencing to our general recommended depth of 5,000 raw reads/cell. However, because the Fraction CMO Reads Usable varies between datasets, more or less depth may be required to reach 1,000 usable reads/cell in a given dataset. If shallow sequencing has been performed, its results can be used to estimate how much additional sequencing is needed to reach 1,000 usable reads/cell.
To perform these calculations, first obtain the “mean reads per cell-associated barcode” (i.e., the raw reads per cell) and the “Fraction CMO Reads Usable” from the Web Summary, as shown in this example:
Then, calculate the usable reads/cell with the following formula:
- usable reads/cell = raw reads/cell * fraction CMO reads usable
- usable reads/cell = 513 * 77.59%
- usable reads/cell = 398
Thus, this dataset has ~398 usable reads/cell when sequenced to a depth of 513 raw reads/cell. To achieve 1,000 usable reads/cell for accurate CMO tag assignments, the CellPlex library would need to be sequenced at least ~2.5-fold deeper (1,000 / 398 ≈ 2.5), i.e., to ~1,290 raw reads/cell.
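The calculation above can also be scripted for any dataset. The following is a minimal Python sketch, not part of Cell Ranger: the function names are hypothetical, the two input values must be copied by hand from the Web Summary, and it assumes that the fraction of usable CMO reads stays roughly the same at greater sequencing depth, which may not hold for every dataset.

```python
def usable_reads_per_cell(raw_reads_per_cell, fraction_usable):
    """Usable reads/cell = raw reads/cell * fraction CMO reads usable."""
    if not 0 < fraction_usable <= 1:
        raise ValueError("fraction_usable must be in (0, 1], e.g. 0.7759 for 77.59%")
    return raw_reads_per_cell * fraction_usable


def required_raw_reads_per_cell(fraction_usable, target_usable=1000):
    """Raw reads/cell needed to reach the target usable reads/cell.

    Assumes the usable fraction does not change with depth (an assumption,
    not a guarantee).
    """
    if not 0 < fraction_usable <= 1:
        raise ValueError("fraction_usable must be in (0, 1], e.g. 0.7759 for 77.59%")
    return target_usable / fraction_usable


# Values from the example above (read from the Web Summary).
raw = 513          # mean reads per cell-associated barcode
fraction = 0.7759  # fraction CMO reads usable (77.59%)

usable = usable_reads_per_cell(raw, fraction)
required = required_raw_reads_per_cell(fraction)
fold = required / raw  # how many times deeper the library must be sequenced

print(f"Usable reads/cell at current depth: {usable:.0f}")
print(f"Raw reads/cell needed for 1,000 usable: {required:.0f}")
print(f"Fold increase in depth: {fold:.2f}")
```

For the example dataset this reproduces the figures in the text: roughly 398 usable reads/cell at the current depth and about a 2.5-fold increase in depth to reach 1,000 usable reads/cell.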
Products: Single Cell Gene Expression, CellPlex