1. Motivation I: Data Compression
- Reduce data from 2D to 1D
2. Motivation II: Data Visualization
3. Principal Component Analysis problem formulation
- reduce from 2D to 1D: find a direction onto which to project the data so as to minimize the projection error
- reduce from n-D to k-D: find k vectors onto which to project the data so as to minimize the projection error
- PCA is not linear regression
4. Principal Component Analysis algorithm
- data preprocessing
training set: x^(1), x^(2), ..., x^(m)
preprocessing (feature scaling/mean normalization): compute the mean mu_j of each feature and replace each x_j^(i) with x_j^(i) - mu_j
if different features are on different scales, scale the features (e.g. divide by the standard deviation s_j) so they have a comparable range of values
- PCA algorithm - reduce data from n-D to k-D
compute covariance matrix: Sigma = (1/m) * sum_{i=1}^{m} x^(i) (x^(i))^T
compute eigenvectors of Sigma: [U, S, V] = svd(Sigma), take U_reduce = the first k columns of U, and project z^(i) = U_reduce^T x^(i)
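A minimal NumPy sketch of the preprocessing and PCA steps above (mean normalization, covariance matrix, SVD, projection); the function name pca and its return values are illustrative, not from the lecture:

```python
import numpy as np

def pca(X, k):
    """Reduce X (m examples x n features) to k dimensions."""
    m, n = X.shape
    # preprocessing: mean normalization (add feature scaling if ranges differ)
    mu = X.mean(axis=0)
    X_norm = X - mu
    # covariance matrix: Sigma = (1/m) * sum x^(i) (x^(i))^T, an n x n matrix
    Sigma = (X_norm.T @ X_norm) / m
    # Sigma is symmetric PSD, so svd yields its eigenvectors in U
    U, S, _ = np.linalg.svd(Sigma)
    U_reduce = U[:, :k]      # first k columns: the k principal directions
    Z = X_norm @ U_reduce    # projected data, m x k
    return Z, U_reduce, S, mu
```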
5. Reconstruction from compressed representation
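Reconstruction maps a compressed point z back into the original n-D space via x_approx = U_reduce * z (adding the mean back if mean normalization was applied). A sketch reusing U_reduce and mu from the pca() sketch above:

```python
def reconstruct(Z, U_reduce, mu):
    """Approximate reconstruction: x_approx = U_reduce @ z + mu.

    The result lives in the original n-D space but lies on the
    k-D subspace spanned by the principal directions.
    """
    return Z @ U_reduce.T + mu
```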
6. Choosing the number of principal components
- average squared projection error: (1/m) * sum_{i=1}^{m} ||x^(i) - x_approx^(i)||^2
- total variation in the data: (1/m) * sum_{i=1}^{m} ||x^(i)||^2
- choose k to be the smallest value so that
(average squared projection error) / (total variation) <= 0.01, i.e. 99% of the variance is retained
equivalently, with S from svd(Sigma): (sum_{i=1}^{k} S_ii) / (sum_{i=1}^{n} S_ii) >= 0.99
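A sketch of the usual shortcut for this check: reuse the singular values S returned by svd(Sigma) and pick the smallest k whose cumulative share reaches the retention threshold; the name choose_k and the retain parameter are illustrative:

```python
import numpy as np

def choose_k(S, retain=0.99):
    """Smallest k such that the fraction of variance retained,
    sum(S[:k]) / sum(S), is at least `retain` (0.99 = "99% retained")."""
    retained = np.cumsum(S) / np.sum(S)   # increasing in k
    return int(np.searchsorted(retained, retain) + 1)
```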
7. Advice for applying PCA
- supervised learning speedup: run PCA on the training set only to learn the mapping x -> z
the same mapping can then be applied to the examples in the cross-validation and test sets (see the sketch after this list)
- application of PCA
- compression
- reduce memory/disk needed to store data
speed up learning algorithm
- visualization
- bad use of PCA: trying to prevent overfitting; this might work OK, but it isn't a good way to address overfitting - use regularization instead
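As referenced above, a sketch of the speedup workflow, assuming pca() is the earlier sketch; X_train, X_cv, X_test, and k=100 are placeholder names and values:

```python
# Learn the PCA mapping on the training set only, then reuse
# mu and U_reduce for the cross-validation and test data.
Z_train, U_reduce, S, mu = pca(X_train, k=100)   # fit on training set
Z_cv   = (X_cv   - mu) @ U_reduce                # same mapping, not refit
Z_test = (X_test - mu) @ U_reduce
# train the supervised model on (Z_train, y_train) instead of X_train
```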
Difference between PCA and Linear Regression
- PCA measures the orthogonal distance, whereas linear regression measures the vertical distance between the true value y = g(x) and the estimate f(x) at each point x
- more generally: PCA looks for a surface such that projecting the features {x1, x2, ..., xn} onto it maximizes the variance of the projected points (y is not involved; it finds the surface that best represents the features), whereas Linear Regression is given {x1, x2, ..., xn} and tries to predict y from x, hence the regression
The information-entropy approach belongs to projection pursuit dimensionality reduction, while PCA is a factor-analysis-style method.