1 WHY Principal Component Analysis?
PCA is useful to:
- reduce the number of features (may reduce overfitting)
- reduce memory or disk storage required for features
- speed up execution of subsequent modelling step(s)
- visualize high-dimensional features (e.g. in 2 or 3 dimensions)
- find unknown structure in features via subsequent clustering
- detect outliers
- A typical data analysis workflow (see the code sketch after this list):
- Scale the covariates
- Split the data into training and test set
- Fit PCA on the training set only, then transform the training and test sets with the fitted PCA
- Build a model using features generated by PCA on training set
- Assess prediction accuracy using features generated by PCA on test set
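A minimal sketch of this workflow in Python, assuming scikit-learn; the dataset (`load_breast_cancer`), the number of components, and the logistic-regression model are illustrative choices, not part of these notes:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

X, y = load_breast_cancer(return_X_y=True)

# Split first, so scaling and PCA are fitted on training data only
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# scale -> PCA -> model; fit() learns every step on the training set,
# and the test set is only transformed with the learned parameters
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=10)),
    ("model", LogisticRegression(max_iter=1000)),
])
pipe.fit(X_train, y_train)
print("Test accuracy:", pipe.score(X_test, y_test))
```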
2 Choosing the number of principal components
https://blog.csdn.net/MINGRAN_JIA/article/details/122464746
- set a threshold on the cumulative proportion of variance explained, e.g. 80% (see the sketch after this list)
- Cattell’s scree test or Kaiser’s rule (keep components with eigenvalue > 1)
- cross-validation
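A minimal sketch of the first two criteria, assuming scikit-learn; the dataset and the 80% threshold are illustrative:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X, _ = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)  # Kaiser's rule assumes standardized features

# Variance threshold: keep the smallest k with cumulative variance ratio >= 80%
pca = PCA(n_components=0.80, svd_solver="full").fit(X)
print("Components for 80% variance:", pca.n_components_)

# Kaiser's rule: keep components with eigenvalue > 1 (correlation-matrix scale)
full = PCA().fit(X)
print("Components by Kaiser's rule:", np.sum(full.explained_variance_ > 1))

# Scree data for Cattell's test: plot these and look for the elbow
print("Explained variance ratios:", np.round(full.explained_variance_ratio_[:10], 3))
```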
2.1 Concept of PCA
Suppose $X$ is an $n \times p$ matrix, where $p$ is the number of features, and let $\Sigma = \mathrm{Var}(X) = X^{\mathsf{T}}X$ (up to a $1/(n-1)$ factor, assuming the columns of $X$ are centred) be the $p \times p$ covariance matrix of $X$.
For a principal component $\vec{a_i}$, we want to project the data onto the direction $\vec{a_i}$ that maximises the variance: $\max_{\vec{a_i}} \mathrm{Var}(X\vec{a_i}) = \max_{\vec{a_i}} \vec{a_i}^{\mathsf{T}}\Sigma\vec{a_i}$. Since this objective has no upper bound, we impose the constraint $\|\vec{a_i}\|^2 = \vec{a_i}^{\mathsf{T}}\vec{a_i} = 1$ and solve using a Lagrange multiplier together with the spectral decomposition of $\Sigma$.
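Carrying this out (a standard derivation, spelled out here for completeness):

$$\mathcal{L}(\vec{a_i}, \lambda) = \vec{a_i}^{\mathsf{T}}\Sigma\vec{a_i} - \lambda(\vec{a_i}^{\mathsf{T}}\vec{a_i} - 1), \qquad \frac{\partial\mathcal{L}}{\partial\vec{a_i}} = 2\Sigma\vec{a_i} - 2\lambda\vec{a_i} = 0 \;\Longrightarrow\; \Sigma\vec{a_i} = \lambda\vec{a_i}.$$

So each principal component is an eigenvector of $\Sigma$, and the variance along it, $\vec{a_i}^{\mathsf{T}}\Sigma\vec{a_i} = \lambda$, is the corresponding eigenvalue; the first component is the eigenvector with the largest eigenvalue.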
2.2 Useful results of PCA
Glossary:
$\Delta = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$, the diagonal matrix of the eigenvalues of $\Sigma$, ordered $\lambda_1 \ge \cdots \ge \lambda_p$.
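As a sanity check of this notation, a minimal sketch (assuming NumPy and scikit-learn; the random data is illustrative) verifying that the eigendecomposition of $\Sigma$ matches PCA's output:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Xc = X - X.mean(axis=0)  # centre the columns

# Sample covariance matrix: Sigma = Xc^T Xc / (n - 1)
Sigma = Xc.T @ Xc / (Xc.shape[0] - 1)

# Spectral decomposition; eigh returns eigenvalues in ascending order
eigvals, eigvecs = np.linalg.eigh(Sigma)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # sort descending

pca = PCA().fit(X)
# The diagonal of Delta equals PCA's explained variances
print(np.allclose(eigvals, pca.explained_variance_))
# Eigenvectors match PCA components up to sign
print(np.allclose(np.abs(eigvecs.T), np.abs(pca.components_)))
```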