Principal component analysis
1.Introduction
Large datasets are increasingly widespread in many disciplines. In order to interpret such datasets, methods are required to drastically reduce their dimensionality in an interpretable way, such that most of the information in the data is preserved. Principal component analysis (PCA) is a technique for reducing the dimensionality of such datasets, increasing interpretability while at the same time minimizing information loss.
2.The Basic Method
PCA finds a new set of orthogonal axes, the principal components, ordered so that each successive axis captures as much of the remaining variance in the data as possible; the data are then projected onto the first few of these axes.
3.Steps Involved in PCA
(1).Standardization
Standardization is needed before performing PCA because PCA is very sensitive to the variances of the features. If there are large differences between the scales of the features, the features with larger scales will dominate those with smaller scales. Transforming all features to the same scale prevents this problem.
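The scaling step above can be sketched with NumPy as follows; the array `X` and its values are purely illustrative:

```python
import numpy as np

# Illustrative data: 5 samples, 2 features on very different scales
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0],
              [4.0, 400.0],
              [5.0, 500.0]])

# Standardize each feature (column) to zero mean and unit variance
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_scaled.mean(axis=0))  # approximately [0, 0]
print(X_scaled.std(axis=0))   # [1, 1]
```

After this transformation both features contribute on an equal footing, regardless of their original units.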
(2).Calculate the covariance matrix
The classic approach to PCA is to perform an eigendecomposition of the covariance matrix Σ, which is a d×d matrix where each element represents the covariance between two features. Note, d is the number of original dimensions of the dataset.
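A minimal sketch of this step, assuming standardized data with samples in rows (the random data here is only a stand-in):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                 # 100 samples, d = 3 features
X = (X - X.mean(axis=0)) / X.std(axis=0)      # standardize first

# np.cov treats rows as variables by default; our samples are in rows,
# so pass rowvar=False to get the d x d feature covariance matrix
cov = np.cov(X, rowvar=False)
print(cov.shape)  # (3, 3), i.e. d x d
```

The resulting matrix is symmetric, since cov(x_i, x_j) = cov(x_j, x_i).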
(3).Compute the eigenvalues and eigenvectors
The next step is to calculate the eigenvalues and eigenvectors of the covariance matrix. A scalar λ is an eigenvalue of a matrix A if it is a solution of the characteristic equation det(A − λI) = 0.
For each eigenvalue λ, a corresponding eigenvector v can be found by solving (A − λI)v = 0.
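In practice the decomposition is done numerically rather than by solving the characteristic equation by hand. A sketch using NumPy, with a small made-up covariance matrix:

```python
import numpy as np

# A hypothetical 2x2 covariance matrix (symmetric by construction)
cov = np.array([[2.0, 0.8],
                [0.8, 0.6]])

# eigh is intended for symmetric matrices such as a covariance matrix;
# it returns real eigenvalues in ascending order, eigenvectors as columns
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Verify the defining relation A v = lambda v for each pair
for lam, v in zip(eigenvalues, eigenvectors.T):
    assert np.allclose(cov @ v, lam * v)
```

Note that `eigh` returns the eigenvalues in ascending order, so they must be re-sorted in the next step.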
(4).Selecting the Principal Components and Forming a Feature Vector
We sort the eigenvalues in descending order and choose the top k eigenvectors corresponding to the k largest eigenvalues. The idea is that the variance captured by these k components is enough to describe the dataset, and that discarding the variance carried by the remaining components costs little accuracy, or at least an amount of accuracy we are willing to lose.
Next, we form a feature vector: a matrix whose columns are the eigenvectors we have chosen to keep.
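The sorting and selection above can be sketched as follows; the covariance matrix and k = 2 are illustrative choices:

```python
import numpy as np

cov = np.array([[2.0, 0.8, 0.1],
                [0.8, 0.6, 0.2],
                [0.1, 0.2, 0.4]])
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Sort eigenvalues in descending order and reorder eigenvectors to match
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

k = 2
feature_vector = eigenvectors[:, :k]   # d x k matrix of the top-k eigenvectors

# Fraction of total variance retained by the top k components
explained = eigenvalues[:k].sum() / eigenvalues.sum()
print(explained)
```

The ratio `explained` is a common way to decide k: pick the smallest k whose retained variance exceeds some threshold, e.g. 95%.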
(5).Forming Principal Components
We take the transpose of the feature vector and multiply it by the transpose of the scaled version of the original dataset:
NewData = FeatureVector^T × ScaledData^T
where NewData is the matrix consisting of the principal components,
FeatureVector is the matrix we formed from the eigenvectors we chose to keep, and
ScaledData is the scaled version of the original dataset.
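Putting the steps together, the final projection can be sketched as below; the random input merely stands in for a real standardized dataset:

```python
import numpy as np

rng = np.random.default_rng(1)
scaled_data = rng.normal(size=(50, 3))                            # n x d
scaled_data = (scaled_data - scaled_data.mean(axis=0)) / scaled_data.std(axis=0)

# Covariance, eigendecomposition, and selection of the top k = 2 components
cov = np.cov(scaled_data, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]
feature_vector = eigenvectors[:, order[:2]]                       # d x k

# NewData = FeatureVector^T x ScaledData^T  ->  shape k x n
new_data = feature_vector.T @ scaled_data.T
print(new_data.shape)  # (2, 50)
```

Each column of `new_data` is one original sample expressed in the k-dimensional principal-component space; transposing it back gives the more familiar n × k layout.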
4.Applications of Principal Component Analysis
PCA is predominantly used as a dimensionality reduction technique in domains such as facial recognition, computer vision, and image compression. It is also used to find patterns in high-dimensional data in fields such as finance, data mining, bioinformatics, and psychology.