1 WHY Principal Component Analysis?
PCA is useful to:
- reduce the number of features (may reduce overfitting)
- reduce memory or disk storage required for features
- speed up execution of subsequent modelling step(s)
- visualize high-dimensional features (e.g. in 2 or 3 dimensions)
- find unknown structure in features via subsequent clustering
- detect outliers
- A typical data analysis workflow (see the code sketch after this list):
- Scale the covariates
- Split the data into training and test set
- Fit PCA on the training set only, then transform the training and test sets with the fitted PCA
- Build a model using features generated by PCA on training set
- Assess prediction accuracy using features generated by PCA on test set
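A minimal sketch of this workflow in Python, assuming scikit-learn; the dataset (`load_breast_cancer`), the number of components, and the logistic-regression model are illustrative choices, not part of these notes:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

X, y = load_breast_cancer(return_X_y=True)

# Split first, so scaling and PCA are fitted on training data only
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# scale -> PCA -> model; fit() learns every step on the training set,
# and the test set is only transformed with the learned parameters
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=10)),
    ("model", LogisticRegression(max_iter=1000)),
])
pipe.fit(X_train, y_train)
print("Test accuracy:", pipe.score(X_test, y_test))
```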
2 Choosing the number of principal components
https://blog.csdn.net/MINGRAN_JIA/article/details/122464746
- set a threshold on the cumulative proportion of variance explained, e.g. 80% (see the sketch after this list)
- Cattell’s scree test or Kaiser’s rule (keep components with eigenvalue > 1)
- cross-validation
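A minimal sketch of the first two criteria, assuming scikit-learn; the dataset and the 80% threshold are illustrative:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X, _ = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)  # Kaiser's rule assumes standardized features

# Variance threshold: keep the smallest k with cumulative variance ratio >= 80%
pca = PCA(n_components=0.80, svd_solver="full").fit(X)
print("Components for 80% variance:", pca.n_components_)

# Kaiser's rule: keep components with eigenvalue > 1 (correlation-matrix scale)
full = PCA().fit(X)
print("Components by Kaiser's rule:", np.sum(full.explained_variance_ > 1))

# Scree data for Cattell's test: plot these and look for the elbow
print("Explained variance ratios:", np.round(full.explained_variance_ratio_[:10], 3))
```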
2.1 Concept of PCA
Suppose $X$ is an $n \times p$ matrix, where $p$ is the number of features, and let $\Sigma = \mathrm{Var}(X) = X^{\mathsf{T}}X$ (up to a $1/(n-1)$ factor, assuming the columns of $X$ are centred) be the $p \times p$ covariance matrix of $X$.
For a principal component $\vec{a_i}$, we want to project the data onto the direction $\vec{a_i}$ that maximises the variance: $\max_{\vec{a_i}} \mathrm{Var}(X\vec{a_i}) = \max_{\vec{a_i}} \vec{a_i}^{\mathsf{T}}\Sigma\vec{a_i}$. Since this objective has no upper bound, we impose the constraint $\|\vec{a_i}\|^2 = \vec{a_i}^{\mathsf{T}}\vec{a_i} = 1$ and solve using a Lagrange multiplier together with the spectral decomposition of $\Sigma$.
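Carrying this out (a standard derivation, spelled out here for completeness):

$$\mathcal{L}(\vec{a_i}, \lambda) = \vec{a_i}^{\mathsf{T}}\Sigma\vec{a_i} - \lambda(\vec{a_i}^{\mathsf{T}}\vec{a_i} - 1), \qquad \frac{\partial\mathcal{L}}{\partial\vec{a_i}} = 2\Sigma\vec{a_i} - 2\lambda\vec{a_i} = 0 \;\Longrightarrow\; \Sigma\vec{a_i} = \lambda\vec{a_i}.$$

So each principal component is an eigenvector of $\Sigma$, and the variance along it, $\vec{a_i}^{\mathsf{T}}\Sigma\vec{a_i} = \lambda$, is the corresponding eigenvalue; the first component is the eigenvector with the largest eigenvalue.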
2.2 Useful results of PCA
Glossary:
$\Delta = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$, the diagonal matrix of the eigenvalues of $\Sigma$, ordered $\lambda_1 \ge \cdots \ge \lambda_p$.
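As a sanity check of this notation, a minimal sketch (assuming NumPy and scikit-learn; the random data is illustrative) verifying that the eigendecomposition of $\Sigma$ matches PCA's output:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Xc = X - X.mean(axis=0)  # centre the columns

# Sample covariance matrix: Sigma = Xc^T Xc / (n - 1)
Sigma = Xc.T @ Xc / (Xc.shape[0] - 1)

# Spectral decomposition; eigh returns eigenvalues in ascending order
eigvals, eigvecs = np.linalg.eigh(Sigma)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # sort descending

pca = PCA().fit(X)
# The diagonal of Delta equals PCA's explained variances
print(np.allclose(eigvals, pca.explained_variance_))
# Eigenvectors match PCA components up to sign
print(np.allclose(np.abs(eigvecs.T), np.abs(pca.components_)))
```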