Chapter 9 Dimensionality Reduction
1 Motivation
- Data Compression
- Visualization
2 Principal Component Analysis (PCA) Problem Formulation
2.1 Difference Between PCA and Linear Regression

PCA | Linear Regression
---|---
Minimizes the projection error | Minimizes the prediction error
Makes no prediction about the result | Predicts the result
2.2 Description
- Reduce from n-dimensions to k-dimensions: find $k$ vectors $u^{(1)},u^{(2)},\cdots,u^{(k)}$ (through the origin?) onto which to project the data, so as to minimize the projection error.
- Reduces the dimensionality while keeping the loss of information in the data as small as possible.
- Completely parameter-free: the PCA computation needs no manually set parameters and no intervention based on an empirical model; the result depends only on the data and is independent of the user.
2.3 Algorithm
2.3.1 Data Preprocessing
- Training set: $x^{(1)},x^{(2)},\cdots,x^{(m)}$
- Preprocessing (feature scaling / mean normalization):
$$\mu_j=\frac{1}{m}\sum_{i=1}^m x_j^{(i)}$$
Replace each $x_j^{(i)}$ with $x_j^{(i)}-\mu_j$.
If different features are on different scales, scale the features to have comparable ranges of values.
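The preprocessing step above can be sketched in Python/NumPy (the notes use Octave-style `svd` later, so NumPy here is an assumption, not the course's code; the toy matrix `X` is illustrative):

```python
import numpy as np

# Toy training set: m = 4 examples, n = 2 features on very different scales.
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0],
              [4.0, 400.0]])

mu = X.mean(axis=0)         # mu_j = (1/m) * sum_i x_j^(i)
scale = X.std(axis=0)       # per-feature scale (one common choice)
X_norm = (X - mu) / scale   # mean normalization + feature scaling

# After preprocessing, every feature has zero mean and unit scale.
print(X_norm.mean(axis=0))
```

Dividing by the standard deviation is one common scaling choice; dividing by the feature's range (max minus min) also works.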
2.3.2 Compute the Covariance Matrix $\Sigma$
- $$\Sigma=\frac{1}{m}\sum_{i=1}^m\left(x^{(i)}\right)\left(x^{(i)}\right)^T$$
where $x^{(i)}$ is an $n\times1$ vector, so $\Sigma$ is $n\times n$.
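With the examples stacked as the rows of an $m\times n$ matrix, the sum above collapses into a single matrix product. A minimal NumPy sketch (the name `X_norm` and the toy data are assumptions for illustration):

```python
import numpy as np

# Already mean-normalized toy data: m = 4 examples, n = 2 features.
X_norm = np.array([[ 1.0, -1.0],
                   [-1.0,  1.0],
                   [ 2.0, -2.0],
                   [-2.0,  2.0]])
m = X_norm.shape[0]

# Sigma = (1/m) * sum_i x^(i) (x^(i))^T  ==  (1/m) * X^T X   (an n x n matrix)
Sigma = (X_norm.T @ X_norm) / m
print(Sigma.shape)
```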
2.3.3 Compute the Eigenvectors of the Matrix $\Sigma$
[U,S,V]=svd(Sigma);
- $U$ is a matrix whose columns are the direction vectors giving the smallest projection error onto the data:
$$U=\left[\begin{matrix} |&|&&|&&|\\ u^{(1)}&u^{(2)}&\cdots&u^{(k)}&\cdots&u^{(n)}\\ |&|&&|&&| \end{matrix}\right]\in\mathbb{R}^{n\times n}$$
- Take the first $k$ columns of $U$ to obtain the $n\times k$ matrix $U_{reduce}$.
- Obtain the new $k\times1$ feature vectors:
$$z^{(i)}={U_{reduce}}^T x^{(i)}$$
Note: the variance features are not processed.
2.3.4 Reconstruction from the Compressed Representation
- Approximately recover the original features: $x_{approx}^{(i)}=U_{reduce}z^{(i)}\approx x^{(i)}$
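Steps 2.3.3 and 2.3.4 together can be sketched in NumPy (`np.linalg.svd` plays the role of the notes' Octave `[U,S,V]=svd(Sigma)`; the nearly one-dimensional toy data is an assumption so the reconstruction is visibly close):

```python
import numpy as np

# Mean-normalized toy data lying almost along one direction.
X_norm = np.array([[ 1.0,  1.1],
                   [-1.0, -1.1],
                   [ 2.0,  1.9],
                   [-2.0, -1.9]])
m, n = X_norm.shape
Sigma = (X_norm.T @ X_norm) / m

U, s, Vt = np.linalg.svd(Sigma)   # NumPy analogue of [U,S,V]=svd(Sigma)
k = 1
U_reduce = U[:, :k]               # first k columns of U   (n x k)

Z = X_norm @ U_reduce             # rows are z^(i) = U_reduce^T x^(i)
X_approx = Z @ U_reduce.T         # rows are x_approx^(i) = U_reduce z^(i)

# The data is almost 1-D, so the reconstruction error is tiny.
print(np.abs(X_approx - X_norm).max())
```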
2.3.5 Choosing the Number of Principal Components
Average squared projection error: $\frac{1}{m}\sum_{i=1}^m\left\|x^{(i)}-x_{approx}^{(i)}\right\|^2$
Total variation in the data: $\frac{1}{m}\sum_{i=1}^m\left\|x^{(i)}\right\|^2$
- Goal: choose the smallest $k$ such that the ratio of the average squared projection error to the total variation is small:
$$\frac{\frac{1}{m}\sum_{i=1}^m\left\|x^{(i)}-x_{approx}^{(i)}\right\|^2}{\frac{1}{m}\sum_{i=1}^m\left\|x^{(i)}\right\|^2}\le 0.01,$$
which means 99% of the variance is retained.
Way(1)
Try PCA with $k=1,2,\cdots$
Compute $U_{reduce},z^{(1)},z^{(2)},\cdots,z^{(m)},x_{approx}^{(1)},\cdots,x_{approx}^{(m)}$
Check whether
$$\frac{\frac{1}{m}\sum_{i=1}^m\left\|x^{(i)}-x_{approx}^{(i)}\right\|^2}{\frac{1}{m}\sum_{i=1}^m\left\|x^{(i)}\right\|^2}\le 0.01$$
Way(2)
[U,S,V]=svd(Sigma)
- $S$ is an $n\times n$ diagonal matrix:
$$S=\left[\begin{matrix} s_{11}&0&0&\cdots&0\\ 0&s_{22}&0&\cdots&0\\ 0&0&s_{33}&\cdots&0\\ &&&\ddots&\\ 0&0&0&\cdots&s_{nn} \end{matrix}\right]$$
- $$\frac{\frac{1}{m}\sum_{i=1}^m\left\|x^{(i)}-x_{approx}^{(i)}\right\|^2}{\frac{1}{m}\sum_{i=1}^m\left\|x^{(i)}\right\|^2}=1-\frac{\sum_{i=1}^k s_{ii}}{\sum_{i=1}^n s_{ii}}\le 0.01$$
i.e.
$$\frac{\sum_{i=1}^k s_{ii}}{\sum_{i=1}^n s_{ii}}\ge 0.99$$
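Way(2) needs only one `svd` call. A NumPy sketch of picking the smallest $k$ that retains 99% of the variance (the synthetic data, with two strong directions plus tiny noise, is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: 100 examples in 5 dimensions, but only 2 strong directions.
B = rng.standard_normal((5, 2))
X = rng.standard_normal((100, 2)) @ B.T + 0.01 * rng.standard_normal((100, 5))
X = X - X.mean(axis=0)

m = X.shape[0]
Sigma = (X.T @ X) / m
U, s, Vt = np.linalg.svd(Sigma)   # s holds s_11 >= s_22 >= ... >= s_nn

# Smallest k with (sum_{i<=k} s_ii) / (sum_i s_ii) >= 0.99
ratios = np.cumsum(s) / np.sum(s)
k = int(np.searchsorted(ratios, 0.99) + 1)
print(k)
```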
3 Advice for Applying PCA
- Mapping $x^{(i)}\to z^{(i)}$ should be defined by running PCA only on the training set. The same mapping can then be applied to $x_{cv}^{(i)}$ and $x_{test}^{(i)}$ in the cross-validation and test sets.
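The "fit on the training set only" rule can be sketched as below; the helper names `fit_pca` and `apply_pca` are hypothetical, not from the course:

```python
import numpy as np

def fit_pca(X_train, k):
    """Learn mu and U_reduce from the training set only."""
    mu = X_train.mean(axis=0)
    Xc = X_train - mu
    Sigma = (Xc.T @ Xc) / Xc.shape[0]
    U, s, Vt = np.linalg.svd(Sigma)
    return mu, U[:, :k]

def apply_pca(X, mu, U_reduce):
    """Apply the *training-set* mapping to any split (train / cv / test)."""
    return (X - mu) @ U_reduce

rng = np.random.default_rng(1)
X_train = rng.standard_normal((50, 3))
X_test = rng.standard_normal((10, 3))

mu, U_reduce = fit_pca(X_train, k=2)
Z_train = apply_pca(X_train, mu, U_reduce)   # uses training-set mu, U_reduce
Z_test = apply_pca(X_test, mu, U_reduce)     # same mapping, no refitting
```

Refitting PCA on the test set would leak information and make the splits incomparable, which is why `mu` and `U_reduce` are reused as-is.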
- Advantages:
(1) Compression:
① Reduce the memory / disk space needed to store the data
② Speed up the learning algorithm
(2) Visualization
- Do not use PCA to address overfitting; use regularization instead.
- It is best to start with all of the original features, since PCA loses some information when reducing dimensionality; use PCA only when necessary.
4 References
Andrew Ng, Machine Learning (Coursera)
Huang Haiguang (黄海广), Machine Learning Notes