Continuing from the previous section:
Data

$$x_i\in\mathbb{R}^p,\quad i=1,2,\dots,N$$
$$
\text{Data: } X=(x_1,x_2,\dots,x_N)^T_{N\times p}=
\begin{pmatrix}
x_1^T\\
x_2^T\\
\vdots\\
x_N^T
\end{pmatrix}=
\begin{pmatrix}
x_{11} & x_{12} & \dots & x_{1p}\\
x_{21} & x_{22} & \dots & x_{2p}\\
\vdots & \vdots & \ddots & \vdots\\
x_{N1} & x_{N2} & \dots & x_{Np}
\end{pmatrix}_{N\times p}
$$
Results
Sample mean: $\overline{X}_{p\times1}=\dfrac{1}{N}\sum_{i=1}^{N}x_i=\dfrac{1}{N}X^T1_N$
Sample covariance: $S=\dfrac{1}{N}\sum_{i=1}^{N}(x_i-\overline{X})(x_i-\overline{X})^T=\dfrac{1}{N}X^THX$
where $1_N=\begin{pmatrix}1\\1\\\vdots\\1\end{pmatrix}_{N\times 1}$, $H=I_N-\dfrac{1}{N}1_N1_N^T$, $\overline{X}\in\mathbb{R}^p$, $S\in\mathbb{R}^{p\times p}$.
$H$ is the centering matrix: $HX$ subtracts the sample mean from every dimension (column) of $X$. This centers the data; it does not rescale it. Because $H$ is symmetric and idempotent ($H^T=H$, $H^2=H$), $(HX)^T(HX)=X^THX$, which gives the compact form of $S$ above.
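The matrix forms of the mean and covariance can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function name `mean_and_cov_via_centering` and the small example matrix are made up for illustration and do not come from the original notes.

```python
import numpy as np

def mean_and_cov_via_centering(X):
    """Compute the sample mean and the 1/N sample covariance of X (N x p)
    using the matrix forms (1/N) X^T 1_N and (1/N) X^T H X."""
    N = X.shape[0]
    ones = np.ones((N, 1))                 # the all-ones vector 1_N
    H = np.eye(N) - ones @ ones.T / N      # centering matrix H
    mean_vec = (X.T @ ones / N).ravel()    # (1/N) X^T 1_N, shape (p,)
    S = X.T @ H @ X / N                    # (1/N) X^T H X, shape (p, p)
    return mean_vec, S, H

# Hypothetical data: 3 samples, 2 features (rows are x_i^T).
X = np.array([[1.0, 2.0],
              [3.0, 6.0],
              [5.0, 1.0]])
mean_vec, S, H = mean_and_cov_via_centering(X)

# The matrix forms agree with NumPy's built-ins (bias=True selects 1/N).
print(np.allclose(mean_vec, X.mean(axis=0)))
print(np.allclose(S, np.cov(X, rowvar=False, bias=True)))
# Every column of HX has zero mean.
print(np.allclose((H @ X).mean(axis=0), 0.0))
```

Building $H$ explicitly costs $O(N^2)$ memory, so in practice one would usually subtract the column means directly; the explicit form is used here only to mirror the formulas.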
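Classical PCA
One core idea: reconstruct the original feature space, turning linearly correlated features into linearly uncorrelated ones (the principal components).
Two basic viewpoints:
- Maximum projected variance: the projections of the sample points onto some direction should be as spread out as possible (maximum projected variance); that direction is a principal component.
- Minimum reconstruction distance: the cost of reconstructing the sample points from their projections should be as small as possible. The more spread out the projections are (the larger the projected variance), the smaller the reconstruction cost, so the two viewpoints are equivalent.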
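The equivalence of the two viewpoints follows from the Pythagorean theorem: for centered data and a unit direction $u$, the total sum of squares splits into the projected part plus the reconstruction residual, and the total does not depend on $u$. The sketch below checks this numerically, assuming NumPy; the helper name `split_energy` and the random data are hypothetical.

```python
import numpy as np

def split_energy(X, u):
    """Return (total, projected, residual) sums of squares of the centered
    data for a direction u (normalized internally to unit length)."""
    Xc = X - X.mean(axis=0)            # center the data
    u = u / np.linalg.norm(u)          # enforce the unit-length constraint
    coords = Xc @ u                    # 1-D projection coordinates, shape (N,)
    recon = np.outer(coords, u)        # points reconstructed from the coordinates
    total = float(np.sum(Xc ** 2))
    projected = float(np.sum(coords ** 2))
    residual = float(np.sum((Xc - recon) ** 2))
    return total, projected, residual

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))           # hypothetical data, N=50, p=3
total, projected, residual = split_energy(X, np.array([1.0, 2.0, 3.0]))

# total = projected + residual, so maximizing one minimizes the other.
print(np.isclose(total, projected + residual))
```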
Maximum projected variance:
Centering: $x_i-\overline{X}$, so that the mean becomes 0.
Let one chosen direction be $u_1$, subject to $\left\|u_1\right\|^2=u_1^Tu_1=1$.
The projection is: $\dfrac{(x_i-\overline{X})^Tu_1}{\left\|u_1\right\|}=(x_i-\overline{X})^Tu_1$
Since $x_i-\overline{X}$ has zero mean, the projected variance $J$ (written without the constant factor $\dfrac{1}{N}$, which does not affect the maximizer) is:

$$
\begin{aligned}
J &=\sum_{i=1}^{N}\left((x_i-\overline{X})^Tu_1\right)^2\\
&=\sum_{i=1}^{N}(x_i-\overline{X})^Tu_1\cdot(x_i-\overline{X})^Tu_1\\
\\
&\because\ (x_i-\overline{X})^Tu_1 \text{ is a scalar, so } (x_i-\overline{X})^Tu_1=\left((x_i-\overline{X})^Tu_1\right)^T=u_1^T(x_i-\overline{X})\\
\\
\therefore J&=\sum_{i=1}^{N}u_1^T(x_i-\overline{X})(x_i-\overline{X})^Tu_1\\
&=u_1^T\left[\sum_{i=1}^{N}(x_i-\overline{X})(x_i-\overline{X})^T\right]u_1\\
&=N\cdot u_1^T\left[\dfrac{1}{N}\sum_{i=1}^{N}(x_i-\overline{X})(x_i-\overline{X})^T\right]u_1\\
&=N\cdot u_1^TSu_1\propto u_1^TSu_1
\end{aligned}
$$
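The identity $J=N\cdot u_1^TSu_1$ can be sanity-checked on random data. This is a minimal sketch assuming NumPy; the function names `projection_sum_of_squares` and `quadratic_form` are illustrative only.

```python
import numpy as np

def projection_sum_of_squares(X, u):
    """J as defined above: sum over i of ((x_i - mean)^T u)^2."""
    Xc = X - X.mean(axis=0)
    return float(np.sum((Xc @ u) ** 2))

def quadratic_form(X, u):
    """N * u^T S u, with S the 1/N sample covariance of X."""
    N = X.shape[0]
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / N
    return float(N * (u @ S @ u))

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))           # hypothetical data, N=30, p=4
u = rng.normal(size=4)
u /= np.linalg.norm(u)                 # unit-length direction

# Both expressions give the same value of J.
print(np.isclose(projection_sum_of_squares(X, u), quadratic_form(X, u)))
```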
From the above:

$$
\text{maximizing the projected variance } J\Longleftrightarrow
\left\{
\begin{aligned}
&\hat{u}_1=\arg\max_{u_1}\ u_1^TSu_1\\
&\text{s.t.}\quad u_1^Tu_1=1
\end{aligned}
\right.
$$
Method of Lagrange multipliers:

$$
\begin{aligned}
&\mathcal{L}(u_1,\lambda)=u_1^TSu_1+\lambda(1-u_1^Tu_1)\\
\\
&\dfrac{\partial\mathcal{L}}{\partial u_1}=2Su_1-\lambda\cdot2u_1\\
\\
&\dfrac{\partial\mathcal{L}}{\partial u_1}=0\Longrightarrow Su_1=\lambda u_1
\end{aligned}
$$

Hence $u_1$ is an eigenvector of the covariance matrix $S$, with eigenvalue $\lambda$. Substituting back gives $u_1^TSu_1=\lambda u_1^Tu_1=\lambda$, so the projected variance is maximized by the eigenvector belonging to the largest eigenvalue of $S$.
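In code, the eigenvector condition $Su_1=\lambda u_1$ can be solved with a symmetric eigendecomposition. The sketch below assumes NumPy and uses `np.linalg.eigh`, which returns eigenvalues in ascending order; the function name `first_principal_component` and the toy data are hypothetical, and the sign of the returned vector is arbitrary.

```python
import numpy as np

def first_principal_component(X):
    """Return (lam, u): the largest eigenvalue of the 1/N sample covariance
    of X (N x p) and a corresponding unit eigenvector."""
    N = X.shape[0]
    Xc = X - X.mean(axis=0)                # center the data
    S = Xc.T @ Xc / N                      # sample covariance, p x p
    eigvals, eigvecs = np.linalg.eigh(S)   # S is symmetric; ascending order
    return float(eigvals[-1]), eigvecs[:, -1]

# Toy data spread mostly along the first axis.
X = np.array([[-2.0, 0.0],
              [ 2.0, 0.0],
              [ 0.0, 1.0],
              [ 0.0, -1.0]])
lam, u = first_principal_component(X)

S = np.cov(X, rowvar=False, bias=True)
print(np.allclose(S @ u, lam * u))         # S u = lambda u
print(np.isclose(u @ S @ u, lam))          # projected variance equals lambda
```

For large $p$, or when only a few components are needed, an SVD of the centered data is a common alternative; the eigendecomposition is used here because it matches the derivation directly.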