Question: What is the relationship between principal components (from PCA) and genes? Can you help explain this statistical method to me?
Answer: A principal component (PC) is not the same as a gene. Principal Component Analysis (PCA) is a statistical technique that reduces the complexity of high-dimensional data (such as gene expression data) by projecting it onto a smaller number of dimensions while retaining as much of the variation in the data as possible. Each principal component is a weighted combination of the expression levels of many genes, so a single PC generally reflects the joint behavior of multiple genes rather than any one gene.
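To make the PC-versus-gene distinction concrete, here is a minimal sketch in Python using NumPy. It is not Cell Ranger's implementation. The toy matrix, its dimensions, and the variable names (`expr`, `loadings`, `scores`) are assumptions made up for illustration. It computes PCA with a plain singular value decomposition and shows that each PC carries a weight (loading) for every gene.

```python
import numpy as np

# Toy data (hypothetical): 200 cells x 50 genes, driven by two hidden factors
# plus a little noise. Real expression data would be normalized first.
rng = np.random.default_rng(0)
n_cells, n_genes = 200, 50
factors = rng.normal(size=(n_cells, 2))
weights = rng.normal(size=(2, n_genes))
expr = factors @ weights + 0.1 * rng.normal(size=(n_cells, n_genes))

# Center each gene (column) so PCA captures variation around the mean.
centered = expr - expr.mean(axis=0)

# PCA via singular value decomposition of the centered matrix.
U, S, Vt = np.linalg.svd(centered, full_matrices=False)

n_pcs = 10
loadings = Vt[:n_pcs]            # shape (n_pcs, n_genes): one weight per gene per PC
scores = centered @ loadings.T   # shape (n_cells, n_pcs): each cell's PC coordinates

# Fraction of total variance captured by each PC.
explained = S**2 / np.sum(S**2)

# Every gene contributes to PC1; list the genes with the largest absolute weights.
top_genes_pc1 = np.argsort(-np.abs(loadings[0]))[:5]
print("Gene indices with the largest |loading| on PC1:", top_genes_pc1)
print("Variance explained by the first two PCs:", explained[:2].sum())
```

Each row of `loadings` is one PC expressed as weights over all 50 genes, while `scores` gives every cell's coordinates in the reduced space. Downstream steps such as clustering would operate on `scores` rather than on the original gene-level matrix.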
PCA is a computationally tractable way to find patterns in large datasets without prior knowledge of whether the data points come from different treatment groups or differ phenotypically. Like any statistical technique, however, it comes with important caveats. For a more in-depth explanation of PCA as applied to gene expression data, please see Lever et al. (2017). (Please note that this article is not officially endorsed by or affiliated with 10x Genomics.)
By default, the top 10 PCs are used for clustering and t-SNE in cellranger count, but this can be adjusted in cellranger reanalyze. For more information on PCA usage within Cell Ranger, please see the "Dimensionality Reduction" section in the Cell Ranger Algorithms Overview.
Lever, J., Krzywinski, M. & Altman, N. (2017). Points of Significance: Principal component analysis. Nature Methods 14, 641–642. doi:10.1038/nmeth.4346