Question: Can I aggregate data from different samples or staining protocols using spaceranger aggr?
Answer: The short answer is no. The spaceranger aggr pipeline is designed to combine multiple Capture Areas of consecutive sections from the same tissue block. We do not recommend using it to aggregate data from different sample types or staining protocols, because such data is inherently subject to batch effects and, as of this writing, Space Ranger v1.2.1 does not provide a batch effect correction solution.
However, you can use third-party tools to correct batch effects.
As an example, here is a UMAP projection, visualized in Loupe Browser, of expression data from two differently stained brain sections from different mice after running spaceranger aggr. The H&E-stained mouse brain cells are shown in gold, and the IF-stained mouse brain cells are shown in blue.
There is little overlap between the two samples in the UMAP projection. This suggests that the samples are separating because of batch effects rather than because of differences in biological signal.
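The visual impression of poor overlap can also be checked numerically. The following is a minimal sketch, not part of Space Ranger or Loupe Browser, that tabulates how many barcodes from each sample fall into each cluster and summarizes the mix with a normalized entropy score. The function names, the toy labels, and the choice of entropy as the score are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from math import log


def cluster_composition(clusters, samples):
    """Return {cluster: Counter({sample: number of barcodes})}."""
    comp = defaultdict(Counter)
    for cluster, sample in zip(clusters, samples):
        comp[cluster][sample] += 1
    return dict(comp)


def mixing_score(counts, n_samples):
    """Normalized Shannon entropy of one cluster's sample composition.

    0 means the cluster holds barcodes from a single sample;
    1 means all samples are equally represented in the cluster.
    """
    total = sum(counts.values())
    if n_samples < 2 or total == 0:
        return 0.0
    entropy = sum((n / total) * log(total / n) for n in counts.values() if n > 0)
    return entropy / log(n_samples)


# Hypothetical toy labels: one cluster ID and one sample name per barcode.
clusters = [1, 1, 1, 2, 2, 2, 3, 3]
samples = ["HE", "HE", "HE", "IF", "IF", "IF", "HE", "IF"]

comp = cluster_composition(clusters, samples)
n_samples = len(set(samples))
scores = {c: mixing_score(counts, n_samples) for c, counts in comp.items()}

for c in sorted(scores):
    print(c, dict(comp[c]), round(scores[c], 2))
```

Clusters that score near 0 are dominated by one sample, which is the pattern expected when a batch effect drives the clustering. The score assumes the samples contribute similar numbers of barcodes; with very unequal sample sizes, compare each cluster's composition to the overall sample proportions instead.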
To correct the batch effect seen here, we used the Harmony algorithm through its integration with the Seurat R package. The Harmony algorithm is described in:
Korsunsky, I., Millard, N., Fan, J. et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat Methods 16, 1289–1296 (2019). https://doi.org/10.1038/s41592-019-0619-0
There is also a brief tutorial here.
After batch effect correction and re-clustering, we exported the data from R and imported them back into Loupe Browser.
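The export in this example was done from R. As a language-neutral illustration of what such an export might contain, the Python sketch below writes two CSV tables: one with a 2D projection and one with cluster assignments, each keyed by barcode. The column headers (`Barcode`, `X Coordinate`, `Y Coordinate`), the category name, the barcodes, and the numeric barcode suffixes are assumptions for illustration; check the Loupe Browser documentation for the exact import format your version expects.

```python
import csv
import io


def write_table(handle, header, rows):
    """Write a header row followed by data rows as comma-separated text."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


# Hypothetical barcodes; the -1 / -2 suffixes are assumed to identify
# the library each barcode came from in the aggregated data set.
barcodes = ["AAACAAGTATCTCCCA-1", "AAACACCAATAACTGC-2"]
umap = [(1.25, -3.5), (0.75, 2.0)]        # corrected UMAP coordinates
clusters = ["Cluster 1", "Cluster 2"]     # labels after re-clustering

# In-memory buffers stand in for files; use open(path, "w", newline="")
# to write real CSV files instead.
projection = io.StringIO()
write_table(
    projection,
    ["Barcode", "X Coordinate", "Y Coordinate"],
    [(b, x, y) for b, (x, y) in zip(barcodes, umap)],
)

categories = io.StringIO()
write_table(categories, ["Barcode", "Harmony clusters"], zip(barcodes, clusters))

print(projection.getvalue())
print(categories.getvalue())
```

The barcodes in the exported tables need to match the barcodes in the aggregated data set exactly, including any suffix, or the imported values cannot be matched to spots.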
The samples are now well mixed.
The clustering now appears to be consistent between the samples, as we would expect from biologically comparable samples.
A few things to watch out for with batch effect correction:
- First, confirm that there is a batch effect that needs correcting.
- Be careful not to “correct” away the biological signal.
- Batch correction should not be used to try to save failed experiments.
- Different tools may perform better on different data sets, so try a variety of methods.
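One possible way to check that a correction has not removed biological signal is to compare, within a single sample, the cluster labels obtained before and after correction. The sketch below uses the adjusted Rand index (ARI) for that comparison. ARI is our choice for illustration and is not a method described in this article; the function name and toy labels are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same barcodes.

    1.0 means identical groupings (label names may differ);
    values near 0 mean agreement no better than chance.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same barcodes")
    n = len(labels_a)
    if n < 2:
        return 1.0
    # Count barcode pairs that share a cluster in both, in a, and in b.
    pairs_both = sum(comb(v, 2) for v in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(v, 2) for v in Counter(labels_a).values())
    pairs_b = sum(comb(v, 2) for v in Counter(labels_b).values())
    expected = pairs_a * pairs_b / comb(n, 2)
    max_index = (pairs_a + pairs_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (pairs_both - expected) / (max_index - expected)


# Hypothetical labels for six barcodes from one sample.
before = ["a", "a", "b", "b", "c", "c"]    # clustering of the sample on its own
after_ok = [1, 1, 2, 2, 3, 3]              # same structure after correction
after_collapsed = [1, 1, 1, 1, 1, 1]       # structure lost after correction

print(adjusted_rand_index(before, after_ok))
print(adjusted_rand_index(before, after_collapsed))
```

A large drop in ARI within a sample after correction is a hint that the correction may be merging groups that were distinct before, and is worth inspecting against known marker genes or tissue morphology. Running the same comparison for several correction tools gives a rough basis for choosing among them.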
Several other batch effect correction algorithms might also have worked in this example. For a review that benchmarks batch correction algorithms, see:
Tran, H.T.N., Ang, K.S., Chevrier, M. et al. A benchmark of batch-effect correction methods for single-cell RNA sequencing data. Genome Biol 21, 12 (2020). https://doi.org/10.1186/s13059-019-1850-9