Principal Component Analysis (PCA)
1. What is PCA?
Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. The number of principal components is less than or equal to the number of original variables.
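In other words:
PCA is a statistical procedure that applies an orthogonal transformation to observations of possibly correlated variables and turns them into values of linearly uncorrelated variables. The transformed variables are called the principal components, and there are at most as many of them as there are original variables.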
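The definition above can be illustrated with a minimal NumPy sketch. It assumes the usual covariance-eigendecomposition route to PCA; the helper name `pca_transform` and the synthetic two-variable data are hypothetical and chosen only for illustration.

```python
import numpy as np

def pca_transform(X):
    """Project the rows of X onto the eigenvectors of its covariance matrix."""
    Xc = X - X.mean(axis=0)                 # center each variable
    cov = np.cov(Xc, rowvar=False)          # covariance of the columns
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # reorder to descending variance
    axes = eigvecs[:, order]
    return Xc @ axes, eigvals[order], axes

# Hypothetical data: two strongly correlated variables.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
X = np.column_stack([x, 2.0 * x + rng.normal(scale=0.5, size=500)])

scores, eigvals, axes = pca_transform(X)

input_corr = np.corrcoef(X, rowvar=False)[0, 1]  # correlation before PCA
score_cov = np.cov(scores, rowvar=False)         # covariance after PCA

print("correlation of original variables:", round(input_corr, 3))
print("off-diagonal covariance of components:", score_cov[0, 1])
```

The original columns are strongly correlated, while the off-diagonal covariance of the transformed columns is zero up to floating-point error, which is what "linearly uncorrelated" means here. The number of components equals the number of input columns, the upper bound stated above.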
This transformation is defined in such a way that the first principal component has the largest possible variance (that is, accounts for as much of the variability in the data as possible), and each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to (i.e., uncorrelated with) the preceding components.
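In other words:
The transformation is further defined to have the following property: the first principal component has the largest possible variance (that is, it captures as much of the variability in the data as possible), and each subsequent component in turn has the largest possible variance subject to the constraint that it is orthogonal to the preceding components.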
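The variance ordering can be checked numerically. The sketch below is only an illustration under assumed synthetic data (the mixing matrix `A` is hypothetical): it sorts the eigenvalues of the covariance matrix and compares the first one with the variance of the data along randomly chosen unit directions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 3-variable data with unequal spread along mixed directions.
A = np.array([[3.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.5, 0.2, 0.3]])
X = rng.normal(size=(1000, 3)) @ A
Xc = X - X.mean(axis=0)

cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)               # ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # descending order

# Share of the total variance carried by each principal component.
explained_ratio = eigvals / eigvals.sum()

# Variance of the data along 200 random unit directions d: d^T cov d.
dirs = rng.normal(size=(200, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
random_vars = np.einsum("ij,jk,ik->i", dirs, cov, dirs)

print("explained variance ratio:", explained_ratio)
print("largest variance over random directions:", random_vars.max())
print("variance of first principal component:", eigvals[0])
```

No random direction exceeds the variance of the first principal component, and the explained-variance ratios decrease from one component to the next.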
The principal components are orthogonal because they are the eigenvectors of the covariance matrix, which is symmetric. PCA is sensitive to the relative scaling of the original variables.
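In other words:
The principal components are orthogonal because they are eigenvectors of the covariance matrix, and the covariance matrix is symmetric. PCA is sensitive to the relative scaling of the original variables, so a variable measured in larger units can dominate the leading components.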
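Both statements can be checked on a small example. This is a sketch under assumptions: the data are synthetic, the helper `principal_axes` is hypothetical, and the standardization step at the end is a common remedy for scale sensitivity that the text above does not itself describe.

```python
import numpy as np

def principal_axes(X):
    """Return eigenvalues (descending) and eigenvectors of the covariance of X."""
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    return eigvals[::-1], eigvecs[:, ::-1]

rng = np.random.default_rng(2)

# Hypothetical data: two correlated variables with similar spread.
a = rng.normal(size=1000)
b = 0.6 * a + 0.8 * rng.normal(size=1000)
X = np.column_stack([a, b])

# Orthogonality: the Gram matrix of the axes is the identity.
_, axes = principal_axes(X)
gram = axes.T @ axes

# Scale sensitivity: express the second variable in units 100 times smaller.
X_scaled = X * np.array([1.0, 100.0])
_, axes_scaled = principal_axes(X_scaled)

first_axis = np.abs(axes[:, 0])                # roughly equal weights
first_axis_scaled = np.abs(axes_scaled[:, 0])  # dominated by the rescaled variable

# Assumed remedy: standardize each variable before running PCA.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
Z_scaled = (X_scaled - X_scaled.mean(axis=0)) / X_scaled.std(axis=0, ddof=1)

print("first axis, original units:", first_axis)
print("first axis, rescaled units:", first_axis_scaled)
print("standardized data identical:", np.allclose(Z, Z_scaled))
```

After rescaling, the first principal axis points almost entirely along the rescaled variable, even though the relationship between the two variables has not changed. Standardizing the columns removes the dependence on units, at the cost of treating every variable as equally important.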
2. What can PCA do?
The Chinese Wikipedia entry on principal component analysis (主成分分析) describes it as follows:
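In multivariate statistical analysis, principal component analysis (PCA) is a technique for analyzing and simplifying data sets. It is often used to reduce the dimensionality of a data set while keeping the features that contribute most to its variance. This is done by retaining the lower-order principal components and discarding the higher-order ones, so that the lower-order components tend to preserve the most important aspects of the data. This is not guaranteed, however, and depends on the specific application. Since principal component analysis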