Singular Value Decomposition (SVD)
Let $A$ be an $m\times n$ matrix. Consider the two matrices $A^TA$ ($n\times n$) and $AA^T$ ($m\times m$). Both are symmetric, since for example $(A^TA)^T=A^TA$, so each of them can be orthogonally diagonalized.
$A^TA=V\Lambda V^T$, where the columns of $V$ are the eigenvectors of $A^TA$ and the diagonal of $\Lambda$ holds the eigenvalues of $A^TA$.
$AA^T=U\Lambda U^T$, where the columns of $U$ are the eigenvectors of $AA^T$ and the diagonal of $\Lambda$ holds the eigenvalues of $AA^T$ (this $\Lambda$ is $m\times m$, whereas the one for $A^TA$ is $n\times n$).
One property of $A^TA$ and $AA^T$ is that they have the same non-zero eigenvalues. A short proof:
$$
\begin{aligned}
&\text{suppose: }\quad (A^TA)v_i=\lambda_i v_i \\
&\text{then }\quad A(A^TA)v_i= \lambda_i A v_i \\
&\text{then }\quad AA^T(Av_i)= \lambda_i (A v_i)
\end{aligned}
$$
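The argument above can be checked on a small concrete case. The sketch below uses a hypothetical $2\times 3$ matrix (not from the original text) whose $A^TA$ eigenvectors were worked out by hand, and plain Python integer arithmetic so that every comparison is exact. The helper names `transpose`, `matmul` and `matvec` are illustrative only.

```python
# Hypothetical 2x3 example matrix (an assumption, not from the original text).
A = [[1, 0, 1],
     [0, 1, 1]]

def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def matvec(M, v):
    """Multiply a matrix by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

AtA = matmul(transpose(A), A)   # 3x3 matrix A^T A
AAt = matmul(A, transpose(A))   # 2x2 matrix A A^T

# (eigenvalue, unnormalized eigenvector) pairs of A^T A, found by hand.
eigenpairs = [(3, [1, 1, 2]), (1, [1, -1, 0]), (0, [1, 1, -1])]

for lam, v in eigenpairs:
    # v is an eigenvector of A^T A with eigenvalue lam.
    assert matvec(AtA, v) == [lam * x for x in v]
    Av = matvec(A, v)
    # A v satisfies the eigenvalue equation of A A^T with the same lam.
    assert matvec(AAt, Av) == [lam * x for x in Av]
    print(lam, Av)
```

For the eigenvalue $0$ the vector $Av$ is the zero vector, so it is not an eigenvector of $AA^T$. This is why the statement only covers the non-zero eigenvalues.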
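So when $\lambda_i \neq 0$ (which guarantees $Av_i \neq 0$, because $A^T(Av_i)=\lambda_i v_i \neq 0$), $Av_i$ is an eigenvector of $AA^T$ with the same eigenvalue, and we can take $u_i=\frac{Av_i}{\|Av_i\|}$ (normalized so that it is a unit vector). The remaining question is what $\|Av_i\|$ equals: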
$$
\begin{aligned}
\|Av_i\|^2&=(Av_i)^T(Av_i)=v_i^TA^TAv_i=v_i^T\lambda_i v_i=\lambda_i \\
\|Av_i\| &= \sqrt{\lambda_i}
\end{aligned}
$$
The last step of the first line uses $v_i^Tv_i=1$, since the columns of $V$ are orthonormal.
Here we define the singular value $\sigma_i=\sqrt{\lambda_i}$.
So $u_i=\frac{Av_i}{\sigma_i}$, that is, $Av_i=\sigma_i u_i$. In matrix form this is $AV=U\Sigma$, and because $V$ is orthogonal ($VV^T=I$) this gives $A=U\Sigma V^T$. This is the singular value decomposition.
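The construction can be followed numerically. The sketch below assumes NumPy is available and uses a hypothetical full-rank $2\times 3$ matrix. It diagonalizes $A^TA$ with `np.linalg.eigh`, sets $\sigma_i=\sqrt{\lambda_i}$ and $u_i=Av_i/\sigma_i$ for the non-zero eigenvalues, and then checks that the pieces rebuild $A$. The cutoff `tol` is an arbitrary choice for deciding which eigenvalues count as zero.

```python
import numpy as np

# Hypothetical full-rank 2x3 example matrix (an assumption, not from the original text).
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])

# Orthogonal diagonalization of the symmetric matrix A^T A.
# eigh returns eigenvalues in ascending order, so reorder them to descending.
eigvals, V = np.linalg.eigh(A.T @ A)
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]

# Keep only the eigenpairs with non-zero eigenvalue; tol is an arbitrary cutoff.
tol = 1e-10
r = int(np.sum(eigvals > tol))
sigma = np.sqrt(eigvals[:r])        # sigma_i = sqrt(lambda_i)
V_r = V[:, :r]
U_r = (A @ V_r) / sigma             # column i is u_i = A v_i / sigma_i

# The columns of U_r are orthonormal, and U_r diag(sigma) V_r^T rebuilds A.
A_rebuilt = U_r @ np.diag(sigma) @ V_r.T
print(np.allclose(U_r.T @ U_r, np.eye(r)))
print(np.allclose(A_rebuilt, A))

# The singular values agree with the ones NumPy's own SVD routine reports.
print(np.allclose(sigma, np.linalg.svd(A, compute_uv=False)))
```

This only builds the $r$ columns of $U$ that correspond to non-zero singular values. When $r<m$, the remaining columns of the full $m\times m$ matrix $U$ have to be filled in with an orthonormal basis of the orthogonal complement, which this sketch does not do.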
Expand $AV=U\Sigma$ column by column and let $r$ be the rank of $A$. Since $V$ is $n\times n$ and $U$ is $m\times m$, $A[v_1,\dots,v_n]=[u_1,\dots,u_m]\Sigma$, which gives:
$$
\begin{aligned}
&Av_1 =\sigma_1u_1 \\
&Av_2 =\sigma_2u_2 \\
&\dots \\
&Av_r =\sigma_ru_r \\
&Av_{r+1} =0 \\
&\dots\\
&Av_n =0
\end{aligned}
$$
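The column-by-column relations can be checked on a rank-deficient matrix. The sketch below assumes NumPy and uses a hypothetical $3\times 4$ matrix of rank 2. It relies on `np.linalg.svd` with its default `full_matrices=True`, which returns $U$ of shape $(m, m)$ and $V^T$ of shape $(n, n)$.

```python
import numpy as np

# Hypothetical 3x4 matrix of rank 2: the third row is the sum of the first two.
A = np.array([[1.0, 2.0, 0.0, 1.0],
              [0.0, 1.0, 1.0, 0.0],
              [1.0, 3.0, 1.0, 1.0]])
m, n = A.shape
r = int(np.linalg.matrix_rank(A))

# s holds min(m, n) singular values in descending order.
U, s, Vt = np.linalg.svd(A)
V = Vt.T

for i in range(n):
    Av = A @ V[:, i]
    if i < r:
        # A v_i = sigma_i u_i for the first r columns of V
        assert np.allclose(Av, s[i] * U[:, i])
    else:
        # A v_i = 0 for the remaining n - r columns of V
        assert np.allclose(Av, 0.0)

print(m, n, r)
```

The last $n-r$ columns of $V$ span the null space of $A$, which is why they are mapped to zero.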
Substituting back gives
$$
\begin{aligned}
A^TA&=(U\Sigma V^T)^T(U\Sigma V^T)=V\Sigma^TU^TU\Sigma V^T = V\Sigma^T\Sigma V^T =V\Lambda V^T\\
AA^T&=(U\Sigma V^T)(U\Sigma V^T)^T=U\Sigma V^TV\Sigma^TU^T = U\Sigma\Sigma^T U^T =U\Lambda U^T
\end{aligned}
$$
For example, suppose $\Sigma = \left( \begin{array}{ccc} 1 & 0 &0 \\ 0&2&0 \end{array} \right)$, so that $\Sigma^T = \left( \begin{array}{cc} 1 & 0 \\ 0&2\\0&0 \end{array} \right)$. Then $\Sigma^T\Sigma= \left( \begin{array}{ccc} 1 & 0 &0\\ 0&4&0\\0&0 &0 \end{array} \right)$ and $\Sigma\Sigma^T= \left( \begin{array}{cc} 1 & 0\\ 0&4 \end{array} \right)$. The two products have different sizes ($n\times n$ and $m\times m$) but the same non-zero diagonal entries, the squared singular values.
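The two products in this example can be reproduced in a few lines. The sketch below uses plain Python lists and integer arithmetic so that the results are exact. The helpers `transpose` and `matmul` are illustrative names, not part of any library.

```python
# The 2x3 matrix Sigma from the example above.
Sigma = [[1, 0, 0],
         [0, 2, 0]]

def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

Sigma_T = transpose(Sigma)
StS = matmul(Sigma_T, Sigma)   # 3x3, plays the role of Lambda for A^T A
SSt = matmul(Sigma, Sigma_T)   # 2x2, plays the role of Lambda for A A^T

print(StS)  # [[1, 0, 0], [0, 4, 0], [0, 0, 0]]
print(SSt)  # [[1, 0], [0, 4]]
```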