1 Population Principal Component Analysis
1.1 Basic Idea
(Covered previously and easy to understand, so it is omitted here.)
1.2 Definition and Derivation
$$\bm{x}=(x_1,x_2,\cdots,x_m)^T$$
$$\bm\mu=E(\bm x)=(\mu_1,\mu_2,\cdots,\mu_m)^T$$
$$\Sigma=\mathrm{cov}(\bm x,\bm x)=E[(\bm x-\bm\mu)(\bm x-\bm\mu)^T]$$
$$\alpha_i=(\alpha_{1i},\alpha_{2i},\cdots,\alpha_{mi})^T,\quad i=1,2,\cdots,m$$
$$y_i=\alpha_i^T\bm x=\alpha_{1i}x_1+\alpha_{2i}x_2+\cdots+\alpha_{mi}x_m\qquad (1)$$
Properties
① $E(y_i)=\alpha_i^T\bm\mu,\quad i=1,2,\cdots,m$
② $\mathrm{var}(y_i)=\alpha_i^T\Sigma\alpha_i,\quad i=1,2,\cdots,m$
③ $\mathrm{cov}(y_i,y_j)=\alpha_i^T\Sigma\alpha_j,\quad i,j=1,2,\cdots,m$
Definition (population principal components) Given a linear transformation as in (1), if it satisfies the following conditions:
① $\alpha_i^T\alpha_i=1,\quad i=1,2,\cdots,m$
② $\mathrm{cov}(y_i,y_j)=0\ (i\ne j)$
③ In general, $y_i$ has the largest variance among all linear transformations of $\bm{x}$ that are uncorrelated with $y_1,y_2,\cdots,y_{i-1}$ ($i=1,2,\cdots,m$),

then $y_1,y_2,\cdots,y_m$ are called the first through $m$-th principal components of $\bm{x}$.
1.3 Main Properties
Theorem Let $\bm{x}$ be an $m$-dimensional random vector, let the eigenvalues of $\Sigma$ be $\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_m\ge0$, and let the corresponding unit eigenvectors be $\alpha_1,\alpha_2,\cdots,\alpha_m$. Then the $k$-th principal component of $\bm{x}$ is
$$y_k=\alpha_k^T\bm x=\alpha_{1k}x_1+\alpha_{2k}x_2+\cdots+\alpha_{mk}x_m,\quad k=1,2,\cdots,m$$
and the variance of the $k$-th principal component of $\bm{x}$ is
$$\mathrm{var}(y_k)=\alpha_k^T\Sigma\alpha_k=\lambda_k,\quad k=1,2,\cdots,m$$
(Proof omitted.)
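A minimal numerical sketch of the theorem, assuming a small hypothetical covariance matrix: the eigendecomposition of $\Sigma$ yields the principal component directions, and each component's variance equals its eigenvalue.

```python
import numpy as np

# Hypothetical 3x3 covariance matrix (symmetric positive semi-definite).
Sigma = np.array([[4.0, 2.0, 0.0],
                  [2.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])

# eigh returns eigenvalues in ascending order for symmetric matrices;
# reverse so that lambda_1 >= lambda_2 >= ... >= lambda_m.
lam, A = np.linalg.eigh(Sigma)
lam, A = lam[::-1], A[:, ::-1]

# Column k of A is the unit eigenvector alpha_k; the k-th principal
# component is y_k = alpha_k^T x, with var(y_k) = alpha_k^T Sigma alpha_k = lambda_k.
for k in range(3):
    alpha_k = A[:, k]
    assert np.isclose(alpha_k @ Sigma @ alpha_k, lam[k])
```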
Corollary The components of $\bm{y}=(y_1,y_2,\cdots,y_m)^T$ are, in order, the first through $m$-th principal components of $\bm{x}$ if and only if:
① $\bm{y}=A^T\bm{x}$, where $A$ is an orthogonal matrix
$$A=\begin{bmatrix} \alpha_{11} & \alpha_{12} & \cdots & \alpha_{1m}\\ \alpha_{21} & \alpha_{22} & \cdots & \alpha_{2m}\\ \vdots & \vdots & & \vdots\\ \alpha_{m1} & \alpha_{m2} & \cdots & \alpha_{mm} \end{bmatrix}$$
② $\mathrm{cov}(\bm y)=\mathrm{diag}(\lambda_1,\lambda_2,\cdots,\lambda_m),\ \lambda_1\ge\lambda_2\ge\cdots\ge\lambda_m$
Since $\Sigma\alpha_k=\lambda_k\alpha_k$, stacking the eigenvectors gives $\Sigma A=A\Lambda$, and hence $\Sigma=A\Lambda A^T$ and $\Lambda=A^T\Sigma A$.
Properties of the population principal components
① $\mathrm{cov}(\bm y)=\mathrm{diag}(\lambda_1,\lambda_2,\cdots,\lambda_m)$
② $\sum\limits_{i=1}^m\lambda_i=\sum\limits_{i=1}^m\sigma_{ii}$ (by the properties of the trace)
③ Factor loading: the correlation coefficient $\rho(y_k,x_i)$ between the $k$-th principal component $y_k$ and the variable $x_i$ is called the factor loading, and
$$\rho(y_k,x_i)=\frac{\mathrm{cov}(y_k,x_i)}{\sqrt{\mathrm{var}(y_k)\mathrm{var}(x_i)}}=\frac{\mathrm{cov}(\alpha_k^T\bm{x},e_i^T\bm{x})}{\sqrt{\lambda_k\sigma_{ii}}}=\frac{\alpha_k^T\Sigma e_i}{\sqrt{\lambda_k\sigma_{ii}}}=\frac{e_i^T\Sigma\alpha_k}{\sqrt{\lambda_k\sigma_{ii}}}=\frac{\sqrt{\lambda_k}\,e_i^T\alpha_k}{\sqrt{\sigma_{ii}}}=\frac{\sqrt{\lambda_k}\,\alpha_{ik}}{\sqrt{\sigma_{ii}}}$$
where $e_i$ is the unit vector whose $i$-th component is 1.
④ $\sum\limits_{i=1}^m\sigma_{ii}\rho^2(y_k,x_i)=\sum\limits_{i=1}^m\lambda_k\alpha_{ik}^2=\lambda_k$
⑤ $\sum\limits_{k=1}^m\rho^2(y_k,x_i)=1$
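Properties ②, ④, and ⑤ can be checked numerically. A sketch, assuming the same kind of small hypothetical covariance matrix:

```python
import numpy as np

# Hypothetical covariance matrix; sigma_ii are its diagonal entries.
Sigma = np.array([[4.0, 2.0, 0.0],
                  [2.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
lam, A = np.linalg.eigh(Sigma)
lam, A = lam[::-1], A[:, ::-1]          # descending eigenvalues, matching columns
sig = np.diag(Sigma)

# Factor loadings: rho[k, i] = sqrt(lambda_k) * alpha_{ik} / sqrt(sigma_ii).
rho = np.sqrt(lam)[:, None] * A.T / np.sqrt(sig)[None, :]

# Property ②: the eigenvalues sum to the trace of Sigma.
assert np.isclose(lam.sum(), np.trace(Sigma))
# Property ④: sum_i sigma_ii * rho^2(y_k, x_i) = lambda_k.
assert np.allclose((sig * rho**2).sum(axis=1), lam)
# Property ⑤: sum_k rho^2(y_k, x_i) = 1.
assert np.allclose((rho**2).sum(axis=0), 1.0)
```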
1.4 Number of Principal Components
Keeping the first $q$ principal components of $\bm{x}$ retains as much of the variance information of the original variables as possible; discarding the last $p$ principal components of $\bm{x}$ loses the least variance information.
Definition (variance contribution rate) $\eta_k=\frac{\lambda_k}{\sum\limits_{i=1}^m\lambda_i}$; the cumulative variance contribution rate of the first $k$ principal components is $\sum\limits_{i=1}^k\eta_i=\frac{\sum\limits_{i=1}^k\lambda_i}{\sum\limits_{i=1}^m\lambda_i}$
Definition The contribution rate of $k$ principal components $y_1,y_2,\cdots,y_k$ to the original variable $x_i$ is defined as the squared correlation coefficient between $x_i$ and $(y_1,y_2,\cdots,y_k)$, denoted $v_i$:
$$v_i=\rho^2(x_i,(y_1,y_2,\cdots,y_k))=\sum\limits_{j=1}^k\rho^2(x_i,y_j)$$
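A sketch of choosing $k$ from the cumulative variance contribution rate, assuming hypothetical eigenvalues and an 85% threshold:

```python
import numpy as np

# Hypothetical eigenvalues lambda_1 >= ... >= lambda_m.
lam = np.array([3.0, 1.5, 0.4, 0.1])

eta = lam / lam.sum()     # variance contribution rate of each component
cum = np.cumsum(eta)      # cumulative contribution of the first k components

# Smallest k whose cumulative contribution reaches the preset value, say 85%.
k = int(np.searchsorted(cum, 0.85) + 1)
print(k)  # → 2, since the first two components already explain 0.9 of the variance
```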
1.5 Population Principal Components of Normalized Variables
$x_i^*=\frac{x_i-E(x_i)}{\sqrt{\mathrm{var}(x_i)}}$; the covariance matrix is then replaced by the correlation matrix. The properties are analogous; derive them yourself.
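A sketch of the normalization, assuming a small hypothetical 2×4 sample with variables as rows; the covariance matrix of the standardized variables is exactly the correlation matrix:

```python
import numpy as np

# Hypothetical sample: m = 2 variables (rows), n = 4 observations (columns).
X = np.array([[2.0, 4.0, 6.0, 8.0],
              [1.0, 1.0, 3.0, 3.0]])

# Standardize each variable: subtract its mean, divide by its sample std
# (ddof=1, matching the 1/(n-1) convention used for the sample covariance).
Xs = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, ddof=1, keepdims=True)

# np.cov treats rows as variables; on standardized data it gives R.
R = np.cov(Xs)
assert np.allclose(R, np.corrcoef(X))
assert np.allclose(np.diag(R), 1.0)
```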
2 Sample Principal Component Analysis
2.1 Sample Principal Components
Sample matrix
$$\bm{X}=\begin{bmatrix} x_{11}&x_{12}&\cdots&x_{1n}\\ x_{21}&x_{22}&\cdots&x_{2n}\\ \vdots&\vdots&&\vdots\\ x_{m1}&x_{m2}&\cdots&x_{mn} \end{bmatrix}$$
$$\bar x=\frac{1}{n}\sum\limits_{j=1}^n x_j$$
$$\bar x_i=\frac{1}{n}\sum\limits_{k=1}^n x_{ik},\quad \bar x_j=\frac{1}{n}\sum\limits_{k=1}^n x_{jk}$$
Sample covariance matrix $S=[s_{ij}]_{m\times m}$, where
$$s_{ij}=\frac{1}{n-1}\sum\limits_{k=1}^n(x_{ik}-\bar x_i)(x_{jk}-\bar x_j)$$
Sample correlation matrix
$$R=[r_{ij}]_{m\times m},\quad r_{ij}=\frac{s_{ij}}{\sqrt{s_{ii}s_{jj}}}$$
The sample principal components are defined analogously, with $S$ in place of $\Sigma$.
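The sample moments above can be computed directly. A sketch with a hypothetical 2×4 sample matrix, checked against NumPy's built-ins:

```python
import numpy as np

# Hypothetical m x n sample matrix: m = 2 variables, n = 4 observations.
X = np.array([[2.0, 4.0, 6.0, 8.0],
              [1.0, 1.0, 3.0, 3.0]])
m, n = X.shape

xbar = X.mean(axis=1, keepdims=True)   # per-variable means xbar_i
Xc = X - xbar                          # centered data

# Sample covariance matrix S = [s_ij] with the 1/(n-1) factor.
S = Xc @ Xc.T / (n - 1)

# Sample correlation matrix r_ij = s_ij / sqrt(s_ii * s_jj).
d = np.sqrt(np.diag(S))
R = S / np.outer(d, d)
```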
2.2 Eigenvalue Decomposition of the Correlation Matrix
① Normalize the observed data to obtain the normalized data matrix, still denoted $X$.
② Compute
$$R=[r_{ij}]_{m\times m}=\frac{1}{n-1}XX^T$$
③ $|R-\lambda I|=0\Rightarrow \lambda_1\ge\lambda_2\ge\cdots\ge\lambda_m$
Choose the number of principal components $k$ so that the cumulative variance contribution rate reaches a preset value; the corresponding unit eigenvectors are
$$a_i=(a_{1i},a_{2i},\cdots,a_{mi})^T,\quad i=1,2,\cdots,k$$
④ Compute the $k$ sample principal components $y_i=a_i^T\bm{x}$
⑤ Compute the correlation coefficients $\rho(x_i,y_j)$ between the $k$ principal components $y_j$ and the original variables $x_i$, as well as the contribution rate $v_i$ of the $k$ principal components to each original variable $x_i$.
⑥ Compute the $k$ principal component values of each of the $n$ samples.
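Steps ①–④ can be sketched as follows, assuming hypothetical random data and an 85% cumulative contribution threshold:

```python
import numpy as np

# Hypothetical m x n data matrix: m = 3 variables, n = 50 observations.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 50))

# ① Normalize each variable (row) to zero mean and unit variance.
X = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, ddof=1, keepdims=True)

# ② Correlation matrix R = X X^T / (n - 1).
n = X.shape[1]
R = X @ X.T / (n - 1)

# ③ Eigenvalues in descending order with the corresponding unit eigenvectors.
lam, A = np.linalg.eigh(R)
lam, A = lam[::-1], A[:, ::-1]

# Choose k by the cumulative variance contribution rate (preset value: 85%).
k = int(np.searchsorted(np.cumsum(lam / lam.sum()), 0.85) + 1)

# ④ The k sample principal components for every observation: a k x n matrix.
Y = A[:, :k].T @ X
```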
2.3 Singular Value Decomposition of the Data Matrix
Input: $m\times n$ sample matrix $X$ whose rows each have mean 0
Output: $k\times n$ sample principal component matrix $Y$
Parameter: number of principal components $k$
① Define
$$X^\prime=\frac{1}{\sqrt{n-1}}X^T,\quad S_X=X^{\prime T}X^\prime$$
(the $\frac{1}{\sqrt{n-1}}$ factor makes $S_X$ equal the sample covariance matrix)
② Perform truncated singular value decomposition of $X^\prime$, obtaining $X^\prime=U\Sigma V^T$
③ Compute the $k\times n$ sample principal component matrix $Y=V^TX$
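A sketch of the SVD algorithm on hypothetical zero-mean data, checking that the right singular vectors of $X^\prime$ are eigenvectors of $S_X$ with eigenvalues $s_j^2$:

```python
import numpy as np

# Hypothetical m x n sample matrix with zero-mean rows.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 50))
X = X - X.mean(axis=1, keepdims=True)
n = X.shape[1]
k = 2  # number of principal components

# ① X' = X^T / sqrt(n - 1), so that X'^T X' = S_X, the sample covariance matrix.
Xp = X.T / np.sqrt(n - 1)
S_X = Xp.T @ Xp

# ② Truncated SVD: keep only the top k singular values and vectors.
U, s, Vt = np.linalg.svd(Xp, full_matrices=False)
Vk = Vt[:k].T               # m x k matrix of the top-k right singular vectors

# ③ The k x n sample principal component matrix Y = V^T X.
Y = Vk.T @ X

# The right singular vectors diagonalize S_X with eigenvalues s_j^2.
assert np.allclose(S_X @ Vk, Vk * s[:k]**2)
```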
Summary
The concepts themselves are not hard; actually putting them to use is the hard part!