Question: How do you select the number of PCs to use for clustering and t-SNE?
Answer: By default, cellranger count currently uses the top 10 principal components from the principal component analysis (PCA) step during secondary analysis.
One of the outputs of cellranger count/reanalyze is a variance.csv file. This output file is described in the analysis outputs support documentation. It contains the proportion of total variance explained by each principal component.
$ head -5 analysis/pca/10_components/variance.csv
PC,Proportion.Variance.Explained
1,0.0056404970744118104
2,0.0038897311237809061
3,0.0028803714818085419
4,0.0020830581822081206
You can run cellranger reanalyze to update the clustering and t-SNE analyses using a different number of top principal components. To choose that number, plot the proportion of variance explained as a function of PC rank: once the curve flattens out, subsequent PCs are unlikely to represent meaningful variation in the data.
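As a rough illustration of reading the flattening point off variance.csv, the sketch below parses the two columns shown in the excerpt and reports the PC rank that lies farthest below the straight line joining the first and last points. This is a generic elbow heuristic chosen for illustration, not something Cell Ranger computes, and the function names read_variance and elbow_rank are hypothetical. The four rows from the excerpt are embedded only to keep the example self-contained; they are too few to give a meaningful answer, so in practice the full file should be read instead.

```python
import csv
import io


def read_variance(handle):
    """Return the proportions of variance explained, ordered by PC rank."""
    rows = sorted(csv.DictReader(handle), key=lambda row: int(row["PC"]))
    return [float(row["Proportion.Variance.Explained"]) for row in rows]


def elbow_rank(props):
    """Return the 1-based PC rank farthest below the chord from first to last point.

    Illustrative heuristic only (an assumption of this sketch); it does not
    replace inspecting the plot by eye.
    """
    n = len(props)
    if n < 3:
        return n
    first, last = props[0], props[-1]
    best_rank, best_gap = 1, 0.0
    for i, value in enumerate(props):
        chord = first + (last - first) * i / (n - 1)  # height of the straight line at this rank
        gap = chord - value                           # how far the curve sags below the line
        if gap > best_gap:
            best_rank, best_gap = i + 1, gap
    return best_rank


# The four rows shown in the excerpt above, used here as stand-in input.
sample = """PC,Proportion.Variance.Explained
1,0.0056404970744118104
2,0.0038897311237809061
3,0.0028803714818085419
4,0.0020830581822081206
"""

props = read_variance(io.StringIO(sample))

cumulative = 0.0
for rank, value in enumerate(props, start=1):
    cumulative += value
    print(f"PC {rank}: explained={value:.6f} cumulative={cumulative:.6f}")

print("candidate elbow at PC rank", elbow_rank(props))
```

To run this against real output, replace the io.StringIO(sample) argument with an open file handle, for example open("analysis/pca/10_components/variance.csv", newline=""). The resulting rank is only a starting point for the number of principal components to pass to cellranger reanalyze; check the reanalyze documentation for how that parameter is specified.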
Outside of Cell Ranger, there is a Seurat tutorial with a useful write-up on this topic under the section "Determine statistically significant principal components." It describes various heuristics for selecting the number of top principal components to use for clustering and t-SNE. The use of the elbow plot in Seurat is consistent with the advice above in the Cell Ranger documentation (plot of variance/standard deviation as a function of the PC rank). However, as noted in the Seurat tutorial, it is also useful to evaluate alternative approaches.