Principal Component Analysis (PCA)

1.Introduction
Large datasets are increasingly widespread in many disciplines. To interpret such datasets, methods are required that drastically reduce their dimensionality in an interpretable way, so that most of the information in the data is preserved. Principal component analysis (PCA) is a technique for reducing the dimensionality of such datasets, increasing interpretability while minimizing information loss.
2.The Basic Method
(Figure omitted in the original.) The basic idea is that PCA finds a new set of orthogonal axes, the principal components, ordered by the variance of the data along them, and projects the data onto the first few of these axes.
3.Steps Involved in PCA
(1).Standardization
Standardization is needed before performing PCA because PCA is very sensitive to the variances of the features. If there are large differences between the scales of the features, those with larger scales will dominate those with smaller scales. Transforming the features to comparable scales prevents this problem.
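As a minimal sketch of this step (the dataset values below are made up purely for illustration):

```python
import numpy as np

# Toy dataset: 5 samples, 3 features on very different scales (illustrative values).
X = np.array([
    [1.0, 200.0, 0.01],
    [2.0, 180.0, 0.05],
    [3.0, 240.0, 0.03],
    [4.0, 210.0, 0.02],
    [5.0, 260.0, 0.04],
])

# Standardize each feature to zero mean and unit variance.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
```

After this transformation, every column contributes on an equal footing to the covariance matrix computed in the next step.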
(2).Calculate the covariance matrix
The classic approach to PCA is to perform an eigendecomposition of the covariance matrix Σ, a d×d matrix in which each element represents the covariance between two features. Note that d is the number of original dimensions of the dataset.
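A short sketch of computing Σ with NumPy (the random standardized data here stands in for the output of the previous step):

```python
import numpy as np

rng = np.random.default_rng(0)
X_std = rng.standard_normal((100, 3))  # stand-in for standardized data: 100 samples, d = 3

# Covariance matrix: with rowvar=False, np.cov treats columns as features,
# producing the d x d matrix of pairwise feature covariances.
cov = np.cov(X_std, rowvar=False)
```

`np.cov` mean-centers internally and divides by n − 1, so it can also be applied before standardization if only centering is desired.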
(3).Calculate the Eigenvalues and Eigenvectors
The next step is to calculate the eigenvalues and eigenvectors of the covariance matrix. λ is an eigenvalue of a matrix A if it is a solution of the characteristic equation det(A − λI) = 0.
For each eigenvalue λ, a corresponding eigenvector v can be found by solving (A − λI)v = 0.
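In practice the decomposition is done numerically rather than by solving the characteristic equation by hand. A sketch, using an example 2×2 covariance matrix:

```python
import numpy as np

cov = np.array([[2.0, 0.8],
                [0.8, 1.0]])  # example covariance matrix (symmetric)

# np.linalg.eigh is the right choice for symmetric matrices;
# it returns real eigenvalues in ascending order and the
# corresponding eigenvectors as columns of eigvecs.
eigvals, eigvecs = np.linalg.eigh(cov)
```

Each column `eigvecs[:, i]` satisfies the defining relation `cov @ v = eigvals[i] * v`.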
(4).Selecting the Principal Components and Forming a Feature Vector
We sort the eigenvalues in descending order and keep the eigenvectors corresponding to the top k eigenvalues. The idea is that the variance captured by those k directions is enough to describe the dataset, and discarding the remaining variance of the unselected directions costs little accuracy, or we accept the accuracy lost to the neglected variance.
Next we form a feature vector, which is a matrix whose columns are the eigenvectors we chose to keep.
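A sketch of the selection step (the eigenvalues and identity-matrix eigenvectors are placeholders for the output of the previous step):

```python
import numpy as np

eigvals = np.array([0.5, 3.2, 1.1])  # placeholder eigenvalues
eigvecs = np.eye(3)                  # placeholder: columns are eigenvectors

# Indices of eigenvalues sorted in descending order.
order = np.argsort(eigvals)[::-1]

# Keep the top-k eigenvectors as columns of the feature vector (d x k).
k = 2
feature_vector = eigvecs[:, order[:k]]

# Fraction of total variance retained by the kept components.
explained = eigvals[order[:k]].sum() / eigvals.sum()
```

The `explained` ratio is a common way to choose k: pick the smallest k whose retained variance exceeds a threshold such as 0.95.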
(5).Forming Principal Components
We take the transpose of the feature vector and multiply it by the transpose of the scaled version of the original dataset:
NewData = FeatureVector^T x ScaledData^T
where NewData is the matrix consisting of the principal components, FeatureVector is the matrix whose columns are the eigenvectors we chose to keep, and ScaledData is the scaled version of the original dataset.
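The whole pipeline above can be sketched end to end; the random correlated data here is only for illustration, and the final transpose simply returns the result to the usual samples-by-components layout:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 4)) @ rng.standard_normal((4, 4))  # correlated toy data

# (1) Standardize.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# (2) Covariance matrix.
cov = np.cov(X_std, rowvar=False)

# (3) Eigendecomposition (eigh: ascending eigenvalues, symmetric input).
eigvals, eigvecs = np.linalg.eigh(cov)

# (4) Keep the top-k eigenvectors.
order = np.argsort(eigvals)[::-1]
k = 2
feature_vector = eigvecs[:, order[:k]]

# (5) NewData = FeatureVector^T x ScaledData^T, transposed back to (n, k).
new_data = (feature_vector.T @ X_std.T).T
```

Equivalently, `new_data = X_std @ feature_vector` gives the same projection without the double transpose.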
4.Applications of Principal Component Analysis
PCA is predominantly used as a dimensionality-reduction technique in domains such as facial recognition, computer vision, and image compression. It is also used for finding patterns in high-dimensional data in fields such as finance, data mining, bioinformatics, and psychology.
