Machine Learning Whiteboard Derivations, P5_3
PCA: Maximum Projection Variance
$$
X=\begin{bmatrix} x_1 & x_2 & \cdots & x_N \end{bmatrix}^T
=\begin{bmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_N^T \end{bmatrix}
=\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
x_{N1} & x_{N2} & \cdots & x_{Np}
\end{bmatrix}_{N\times p}
$$
$$
1_N=\begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}
$$
Mean:
$$
\overline{X}=\frac{1}{N}\sum_{i=1}^N x_i=\frac{1}{N}X^T1_N
$$
Covariance:
$$
S=\frac{1}{N}\sum_{i=1}^N (x_i-\overline{X})(x_i-\overline{X})^T=\frac{1}{N}X^THX
$$
where $H=I_N-\frac{1}{N}1_N1_N^T$ is the centering matrix.
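The matrix forms of the mean and covariance can be checked numerically. The following is a minimal sketch, assuming NumPy and randomly generated data; the function name `mean_and_covariance` and the seed are illustrative choices, not part of the original notes.

```python
import numpy as np

def mean_and_covariance(X):
    """Return (x_bar, S) using the matrix forms X^T 1_N / N and X^T H X / N."""
    N = X.shape[0]
    ones = np.ones((N, 1))                 # the column vector 1_N
    x_bar = X.T @ ones / N                 # sample mean, shape (p, 1)
    H = np.eye(N) - ones @ ones.T / N      # centering matrix H
    S = X.T @ H @ X / N                    # covariance, shape (p, p)
    return x_bar, S

rng = np.random.default_rng(0)             # arbitrary seed, illustrative data
X = rng.normal(size=(50, 3))               # N = 50 samples, p = 3 features
x_bar, S = mean_and_covariance(X)

# Compare against the summation form of the definition.
Xc = X - x_bar.T                           # subtract the mean from every row
S_sum = sum(np.outer(row, row) for row in Xc) / X.shape[0]
print(np.allclose(S, S_sum))
```

Building $H$ explicitly costs $O(N^2)$ memory, so in practice one would usually subtract the mean directly; the explicit form is used here only to mirror the formula.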
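One core idea:
Use an orthogonal transformation to turn a set of possibly linearly correlated variables into a set of linearly uncorrelated variables (the principal components).
Reconstruction of the original feature space: correlated $\rightarrow$ uncorrelated
Two basic points:
Maximum projection variance
Minimum reconstruction distance: the cost of going from the projection back to the original data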
Method
1. Centering: subtract the mean from every sample, $x_i-\overline{X}$.
2. Projection: find the direction of maximum variance.
Suppose we project onto a unit vector $\mu_1$. The projection of a centered sample is
$$
(x_i-\overline{X})^T\mu_1 \quad \text{s.t.}\quad \mu_1^T\mu_1=1
$$
The variance after projection is given below. Because the mean was subtracted before projecting, the projected values have zero mean and can be squared directly:
$$
\begin{aligned}
J&=\frac{1}{N}\sum_{i=1}^N \left( (x_i-\overline{X})^T\mu_1 \right)^2 \\
&=\frac{1}{N}\sum_{i=1}^N \mu_1^T(x_i-\overline{X})(x_i-\overline{X})^T\mu_1 \\
&=\mu_1^T \left( \sum_{i=1}^N \frac{1}{N} (x_i-\overline{X})(x_i-\overline{X})^T \right) \mu_1 \\
&=\mu_1^TS\mu_1
\end{aligned}
$$
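The identity $J=\mu_1^TS\mu_1$ can be checked numerically. This is a minimal sketch, assuming NumPy and random data; the helper names `projected_variance` and `quadratic_form` are hypothetical and chosen only for illustration.

```python
import numpy as np

def projected_variance(X, mu1):
    """Mean squared projection of the centered samples onto the unit vector mu1."""
    mu1 = mu1 / np.linalg.norm(mu1)        # enforce mu1^T mu1 = 1
    Xc = X - X.mean(axis=0)                # centering step
    z = Xc @ mu1                           # (x_i - x_bar)^T mu1 for every i
    return float(np.mean(z ** 2))

def quadratic_form(X, mu1):
    """The same quantity computed as mu1^T S mu1."""
    mu1 = mu1 / np.linalg.norm(mu1)
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / X.shape[0]             # covariance with 1/N normalization
    return float(mu1 @ S @ mu1)

rng = np.random.default_rng(1)             # arbitrary seed, illustrative data
X = rng.normal(size=(100, 4))
mu1 = rng.normal(size=4)                   # any nonzero direction works
print(projected_variance(X, mu1), quadratic_form(X, mu1))
```

The two printed numbers should agree up to floating-point rounding for any direction, since the derivation only rearranges the sum.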
Optimization problem:
$$
\hat{\mu}_1=\underset{\mu_1}{\arg\max}\;\mu_1^TS\mu_1 \quad \text{s.t.}\quad \mu_1^T\mu_1=1
$$
Lagrange multiplier method:
$$
L(\mu_1,\lambda)=\mu_1^TS\mu_1+\lambda(1-\mu_1^T\mu_1)
$$
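Setting the gradient with respect to $\mu_1$ to zero (using the symmetry of $S$):
$$
\frac{\partial L}{\partial \mu_1}=2S\mu_1-2\lambda\mu_1=0
$$
$$
S\mu_1=\lambda\mu_1
$$
So $\mu_1$ is an eigenvector of $S$ with eigenvalue $\lambda$. At such a point $J=\mu_1^TS\mu_1=\lambda\mu_1^T\mu_1=\lambda$, so the projection variance is maximized by the eigenvector belonging to the largest eigenvalue.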
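The eigenvalue equation can be solved numerically. The sketch below assumes NumPy and uses `np.linalg.eigh`, which handles symmetric matrices and returns eigenvalues in ascending order; the function name `first_principal_direction` and the synthetic data are illustrative assumptions.

```python
import numpy as np

def first_principal_direction(X):
    """Return (mu1, lam, S): the top eigenvector of S, its eigenvalue, and S itself."""
    Xc = X - X.mean(axis=0)                      # centering step
    S = Xc.T @ Xc / X.shape[0]                   # covariance with 1/N normalization
    eigvals, eigvecs = np.linalg.eigh(S)         # ascending eigenvalues, unit eigenvectors
    return eigvecs[:, -1], eigvals[-1], S        # last column pairs with the largest eigenvalue

rng = np.random.default_rng(2)                   # arbitrary seed, illustrative data
X = rng.normal(size=(300, 3)) * np.array([3.0, 1.0, 0.3])
mu1, lam, S = first_principal_direction(X)

# Stationarity condition S mu1 = lambda mu1.
print(np.allclose(S @ mu1, lam * mu1))

# No random unit direction should give a larger projected variance than lambda.
candidates = rng.normal(size=(1000, 3))
candidates /= np.linalg.norm(candidates, axis=1, keepdims=True)
J = np.einsum("ij,jk,ik->i", candidates, S, candidates)   # mu^T S mu per row
print(J.max() <= lam + 1e-9)
```

The sign of the returned eigenvector is arbitrary: both $\mu_1$ and $-\mu_1$ satisfy the constraint and give the same variance.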