Matrix Decompositions
1. Eigenvalue Decomposition (EVD)
1.1 Eigenvalues and Eigenvectors
Let $\boldsymbol{A}$ be an $n \times n$ matrix and $\lambda$ a scalar. If there exists a nonzero $n$-dimensional vector $\boldsymbol{x}$ such that $\boldsymbol{Ax}=\lambda\boldsymbol{x}$, then $\lambda$ is called an eigenvalue of $\boldsymbol{A}$, and $\boldsymbol{x}$ is an eigenvector of $\boldsymbol{A}$ corresponding to $\lambda$.
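As a quick numerical check of this definition, the sketch below tests candidate eigenpairs in plain Python. The matrix `A`, the helpers `matvec` and `is_eigenpair`, and the tolerance are illustrative assumptions, not part of the original text.

```python
def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def is_eigenpair(A, lam, x, tol=1e-12):
    """Return True if x is nonzero and A x = lam * x within tol."""
    if all(abs(x_i) <= tol for x_i in x):
        return False  # the zero vector is excluded by the definition
    Ax = matvec(A, x)
    return all(abs(ax_i - lam * x_i) <= tol for ax_i, x_i in zip(Ax, x))


# Illustrative 2x2 matrix (an assumption for this example).
A = [[2.0, 1.0],
     [1.0, 2.0]]

print(is_eigenpair(A, 3.0, [1.0, 1.0]))   # A [1, 1]^T = [3, 3]^T = 3 [1, 1]^T
print(is_eigenpair(A, 1.0, [1.0, -1.0]))  # A [1, -1]^T = [1, -1]^T
print(is_eigenpair(A, 2.0, [1.0, 0.0]))   # A [1, 0]^T = [2, 1]^T, not a multiple of [1, 0]^T
```

The third call illustrates that an arbitrary vector is generally not an eigenvector: $\boldsymbol{Ax}$ must be a scalar multiple of $\boldsymbol{x}$.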
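1.2 Eigenvalue Decomposition
A matrix can be expressed in terms of its eigenvalues and eigenvectors. Suppose $\boldsymbol{A}$ is an $n \times n$ real symmetric matrix. Full rank alone is not sufficient here: a general square matrix admits $\boldsymbol{A}=\boldsymbol{U}\boldsymbol{\Lambda}\boldsymbol{U}^{-1}$ only if it is diagonalizable, and $\boldsymbol{U}^{-1}=\boldsymbol{U}^{T}$ holds only when the eigenvectors can be chosen orthonormal, which is always possible for a real symmetric matrix. In that case $\boldsymbol{A}$ can be decomposed through its eigenvalues:
$$\boldsymbol{A}=\boldsymbol{U}\boldsymbol{\Lambda}\boldsymbol{U}^{-1}=\boldsymbol{U}\boldsymbol{\Lambda}\boldsymbol{U}^{T}$$
where $\boldsymbol{U}$ is the orthogonal matrix whose columns are orthonormal eigenvectors of $\boldsymbol{A}$, and $\boldsymbol{\Lambda}$ is the diagonal matrix of the corresponding eigenvalues.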
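To make the factorization concrete, here is a minimal sketch for the $2 \times 2$ real symmetric case, where the eigenvalues have a closed form. The function names (`eig_sym_2x2`, `matmul`, `transpose`) and the sample matrix are assumptions chosen for illustration; the closed-form formula is a standard substitute for a general eigenvalue solver, not a method taken from the text.

```python
import math


def eig_sym_2x2(A):
    """Eigen-decomposition of a real symmetric 2x2 matrix [[a, b], [b, d]].

    Returns (eigenvalues, U): eigenvalues in descending order, with the
    matching unit eigenvectors stored as the columns of U.
    """
    (a, b), (_, d) = A
    if b == 0.0:
        # Already diagonal: the standard basis vectors are eigenvectors.
        pairs = [(a, (1.0, 0.0)), (d, (0.0, 1.0))]
    else:
        mean = (a + d) / 2.0
        radius = math.hypot((a - d) / 2.0, b)
        pairs = []
        for lam in (mean + radius, mean - radius):
            # (b, lam - a) solves (A - lam I) v = 0 when b != 0.
            norm = math.hypot(b, lam - a)
            pairs.append((lam, (b / norm, (lam - a) / norm)))
    pairs.sort(key=lambda p: p[0], reverse=True)
    (l1, v1), (l2, v2) = pairs
    return [l1, l2], [[v1[0], v2[0]], [v1[1], v2[1]]]


def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


A = [[2.0, 1.0],
     [1.0, 2.0]]
lams, U = eig_sym_2x2(A)
Lam = [[lams[0], 0.0], [0.0, lams[1]]]

# Rebuild A from its factors: A = U Lam U^T.
A_rebuilt = matmul(matmul(U, Lam), transpose(U))
print(lams)
print(A_rebuilt)
```

Because the columns of `U` are orthonormal, `transpose(U)` plays the role of $\boldsymbol{U}^{-1}$, so no explicit inverse is computed.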
2. Singular Value Decomposition (SVD)
In the eigenvalue decomposition above, $\boldsymbol{A}$ is a square $n \times n$ matrix. When $\boldsymbol{A}$ is a general $m \times n$ matrix, decomposing it calls for the SVD. Although $\boldsymbol{A}$ is only a general $m \times n$ matrix, $\boldsymbol{A}^{T}\boldsymbol{A}$ is an $n \times n$ symmetric matrix, so $\boldsymbol{A}^{T}\boldsymbol{A}$ can be decomposed by EVD.
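By the eigenvalue decomposition:
$$\begin{aligned}\boldsymbol{A}^{T}\boldsymbol{AV}&=\boldsymbol{V\Lambda}\\ \boldsymbol{A}^{T}\boldsymbol{A}&=\boldsymbol{V\Lambda}\boldsymbol{V}^{T}\end{aligned}$$
where $\boldsymbol{V}$ is the orthogonal matrix whose columns $\boldsymbol{v}_{i}$ are orthonormal eigenvectors of $\boldsymbol{A}^{T}\boldsymbol{A}$, and $\boldsymbol{\Lambda}$ is the diagonal matrix of the corresponding eigenvalues $\lambda_{i}$.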
Consider the columns of $\boldsymbol{AV}$. For $i \neq j$:
$$(\boldsymbol{Av}_{i})^{T}\boldsymbol{Av}_{j}=\boldsymbol{v}^{T}_{i}\boldsymbol{A}^{T}\boldsymbol{A}\boldsymbol{v}_{j}=\boldsymbol{v}^{T}_{i}\lambda_{j}\boldsymbol{v}_{j}=0$$
so the vectors $\boldsymbol{Av}_{i}$ are pairwise orthogonal, which is the first condition for an orthogonal matrix. For $i = j$:
$$(\boldsymbol{Av}_{i})^{T}\boldsymbol{Av}_{i}=\boldsymbol{v}^{T}_{i}\boldsymbol{A}^{T}\boldsymbol{A}\boldsymbol{v}_{i}=\boldsymbol{v}^{T}_{i}\lambda_{i}\boldsymbol{v}_{i}=\lambda_{i}$$
that is, $||\boldsymbol{Av}_{i}||^{2}=\lambda_{i}$, so every $\lambda_{i}$ is non-negative. To normalize $\boldsymbol{Av}_{i}$ when $\lambda_{i}>0$, let $\sigma_{i}=\sqrt{\lambda_{i}}$; then
$$\frac{\boldsymbol{Av}_{i}}{||\boldsymbol{Av}_{i}||}=\frac{\boldsymbol{Av}_{i}}{\sigma_{i}}=\boldsymbol{u}_{i}$$
that is, $\boldsymbol{Av}_{i}=\sigma_{i}\boldsymbol{u}_{i}$. Each $\boldsymbol{u}_{i}$ now has unit length, which is the second condition for an orthogonal matrix.
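In summary, collecting $\boldsymbol{Av}_{i}=\sigma_{i}\boldsymbol{u}_{i}$ column by column gives $\boldsymbol{AV}=\boldsymbol{U\Sigma}$, and since $\boldsymbol{V}$ is orthogonal, right-multiplying by $\boldsymbol{V}^{T}$ shows that an $m \times n$ matrix $\boldsymbol{A}$ can be decomposed as:
$$\boldsymbol{A}=\boldsymbol{U}\boldsymbol{\Sigma}\boldsymbol{V}^{T}$$
where the columns of $\boldsymbol{U}$ are eigenvectors of $\boldsymbol{A}\boldsymbol{A}^{T}$, the columns of $\boldsymbol{V}$ are eigenvectors of $\boldsymbol{A}^{T}\boldsymbol{A}$, and $\boldsymbol{\Sigma}$ is the $m \times n$ rectangular diagonal matrix whose diagonal entries are the singular values $\sigma_{i}$.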
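The derivation can be traced numerically for a small case. The sketch below handles only an $m \times 2$ matrix with full column rank, so that $\boldsymbol{A}^{T}\boldsymbol{A}$ is $2 \times 2$ and its eigenvalues have a closed form. The function names and the sample matrix are assumptions for illustration, and the result is the thin form of the SVD ($\boldsymbol{U}$ is $m \times 2$ rather than $m \times m$).

```python
import math


def eig_sym_2x2(G):
    """Eigenvalues (descending) and unit eigenvectors (columns of V)
    of a real symmetric 2x2 matrix G = [[a, b], [b, d]]."""
    (a, b), (_, d) = G
    if b == 0.0:
        pairs = [(a, (1.0, 0.0)), (d, (0.0, 1.0))]
    else:
        mean, radius = (a + d) / 2.0, math.hypot((a - d) / 2.0, b)
        pairs = []
        for lam in (mean + radius, mean - radius):
            norm = math.hypot(b, lam - a)  # (b, lam - a) is an eigenvector
            pairs.append((lam, (b / norm, (lam - a) / norm)))
    pairs.sort(key=lambda p: p[0], reverse=True)
    (l1, v1), (l2, v2) = pairs
    return [l1, l2], [[v1[0], v2[0]], [v1[1], v2[1]]]


def thin_svd_two_columns(A):
    """Thin SVD of an m x 2 matrix with full column rank via EVD of A^T A.

    Returns (U, sigmas, V): U is m x 2 with orthonormal columns,
    sigmas holds the two singular values, and V is 2 x 2 orthogonal.
    """
    m = len(A)
    # G = A^T A is a 2 x 2 symmetric matrix.
    G = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(2)]
         for i in range(2)]
    lams, V = eig_sym_2x2(G)
    sigmas = [math.sqrt(lam) for lam in lams]  # sigma_i = sqrt(lambda_i)
    # Column i of U is u_i = A v_i / sigma_i.
    U = [[(A[k][0] * V[0][i] + A[k][1] * V[1][i]) / sigmas[i]
          for i in range(2)] for k in range(m)]
    return U, sigmas, V


A = [[1.0, 1.0],
     [0.0, 1.0],
     [1.0, 0.0]]
U, sigmas, V = thin_svd_two_columns(A)

# Rebuild A as sum_i sigma_i * u_i * v_i^T, i.e. U Sigma V^T.
A_rebuilt = [[sum(sigmas[i] * U[k][i] * V[j][i] for i in range(2))
              for j in range(2)] for k in range(len(A))]
print(sigmas)
print(A_rebuilt)
```

Forming $\boldsymbol{A}^{T}\boldsymbol{A}$ explicitly mirrors the derivation but squares the condition number of the problem, so this route is best read as a teaching aid rather than a numerically robust algorithm.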
3. QR Decomposition
An $n \times n$ nonsingular matrix $\boldsymbol{A}_{n \times n}$ can be factored into the product of an orthogonal matrix $\boldsymbol{Q}_{n\times n}$ and a nonsingular upper triangular matrix $\boldsymbol{R}_{n \times n}$, that is, $\boldsymbol{A}=\boldsymbol{QR}$. This factorization is called the QR decomposition of $\boldsymbol{A}$.
For an $m \times n$ matrix $\boldsymbol{A}$ with full column rank, $\boldsymbol{A}_{m \times n}=\boldsymbol{Q}_{m \times n}\cdot \boldsymbol{R}_{n \times n}$, where the columns of $\boldsymbol{Q}$ form an orthonormal set and $\boldsymbol{R}$ is a nonsingular upper triangular matrix. This factorization is also called the QR decomposition.
Gram-Schmidt orthogonalization:
Let the column vectors $\boldsymbol{\alpha}_{1},\boldsymbol{\alpha}_{2},\boldsymbol{\alpha}_{3},...,\boldsymbol{\alpha}_{k}$ be linearly independent, and define:
$$\begin{aligned}
\boldsymbol{\beta}_{1}&=\boldsymbol{\alpha}_{1}\\
\boldsymbol{\beta}_{2}&=\boldsymbol{\alpha}_{2}-\frac{(\boldsymbol{\beta}_{1},\boldsymbol{\alpha}_{2})}{(\boldsymbol{\beta}_{1},\boldsymbol{\beta}_{1})}\boldsymbol{\beta}_{1}\\
\boldsymbol{\beta}_{3}&=\boldsymbol{\alpha}_{3}-\frac{(\boldsymbol{\beta}_{1},\boldsymbol{\alpha}_{3})}{(\boldsymbol{\beta}_{1},\boldsymbol{\beta}_{1})}\boldsymbol{\beta}_{1}-\frac{(\boldsymbol{\beta}_{2},\boldsymbol{\alpha}_{3})}{(\boldsymbol{\beta}_{2},\boldsymbol{\beta}_{2})}\boldsymbol{\beta}_{2}\\
&\;\;\vdots\\
\boldsymbol{\beta}_{k}&=\boldsymbol{\alpha}_{k}-\frac{(\boldsymbol{\beta}_{1},\boldsymbol{\alpha}_{k})}{(\boldsymbol{\beta}_{1},\boldsymbol{\beta}_{1})}\boldsymbol{\beta}_{1}-\frac{(\boldsymbol{\beta}_{2},\boldsymbol{\alpha}_{k})}{(\boldsymbol{\beta}_{2},\boldsymbol{\beta}_{2})}\boldsymbol{\beta}_{2}-\cdots-\frac{(\boldsymbol{\beta}_{k-1},\boldsymbol{\alpha}_{k})}{(\boldsymbol{\beta}_{k-1},\boldsymbol{\beta}_{k-1})}\boldsymbol{\beta}_{k-1}
\end{aligned}$$
Then $\boldsymbol{\beta}_{1},\boldsymbol{\beta}_{2},\boldsymbol{\beta}_{3},...,\boldsymbol{\beta}_{k}$ are pairwise orthogonal and equivalent to $\boldsymbol{\alpha}_{1},\boldsymbol{\alpha}_{2},\boldsymbol{\alpha}_{3},...,\boldsymbol{\alpha}_{k}$, meaning that the two sets span the same subspace.
Let:
$$\begin{aligned}
\boldsymbol{\eta}_{1}&=\frac{\boldsymbol{\beta}_{1}}{||\boldsymbol{\beta}_{1}||}\\
\boldsymbol{\eta}_{2}&=\frac{\boldsymbol{\beta}_{2}}{||\boldsymbol{\beta}_{2}||}\\
\boldsymbol{\eta}_{3}&=\frac{\boldsymbol{\beta}_{3}}{||\boldsymbol{\beta}_{3}||}\\
&\;\;\vdots\\
\boldsymbol{\eta}_{k}&=\frac{\boldsymbol{\beta}_{k}}{||\boldsymbol{\beta}_{k}||}
\end{aligned}$$
Then $\boldsymbol{\eta}_{1},\boldsymbol{\eta}_{2},\boldsymbol{\eta}_{3},...,\boldsymbol{\eta}_{k}$ are pairwise orthogonal unit vectors, forming an orthonormal set equivalent to $\boldsymbol{\alpha}_{1},\boldsymbol{\alpha}_{2},\boldsymbol{\alpha}_{3},...,\boldsymbol{\alpha}_{k}$.
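The orthogonalization and normalization formulas can be sketched directly in plain Python. The function names `dot` and `gram_schmidt` and the three sample vectors are assumptions for illustration; the loop implements the classical Gram-Schmidt recurrence exactly as written in the formulas, without the reordering used by numerically more stable variants.

```python
import math


def dot(u, v):
    """Inner product (u, v) of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(alphas):
    """Classical Gram-Schmidt on a list of linearly independent vectors.

    Returns (betas, etas): the pairwise orthogonal vectors beta_j and
    the unit vectors eta_j = beta_j / ||beta_j||.
    """
    betas = []
    for alpha in alphas:
        beta = list(alpha)
        for prev in betas:
            # Subtract the projection (beta_i, alpha_j) / (beta_i, beta_i) * beta_i.
            coeff = dot(prev, alpha) / dot(prev, prev)
            beta = [b - coeff * p for b, p in zip(beta, prev)]
        betas.append(beta)
    etas = [[b / math.sqrt(dot(beta, beta)) for b in beta] for beta in betas]
    return betas, etas


# Three linearly independent vectors in R^3 (illustrative input).
alphas = [[1.0, 1.0, 0.0],
          [1.0, 0.0, 1.0],
          [0.0, 1.0, 1.0]]
betas, etas = gram_schmidt(alphas)

for eta in etas:
    print(eta)
```

For this input, working the formulas by hand gives $\boldsymbol{\beta}_{2}=(\tfrac12,-\tfrac12,1)^{T}$ and $\boldsymbol{\beta}_{3}=(-\tfrac23,\tfrac23,\tfrac23)^{T}$, which can be compared against `betas`.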
Coefficient matrix:
From the relation between $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$:
$$\begin{aligned}
\boldsymbol{\alpha}_{1}&=\boldsymbol{\beta}_{1}=||\boldsymbol{\beta}_{1}||\boldsymbol{\eta}_{1}\\
\boldsymbol{\alpha}_{2}&=\frac{(\boldsymbol{\beta}_{1},\boldsymbol{\alpha}_{2})}{(\boldsymbol{\beta}_{1},\boldsymbol{\beta}_{1})}\boldsymbol{\beta}_{1}+\boldsymbol{\beta}_{2}\\
&=\frac{(\boldsymbol{\beta}_{1},\boldsymbol{\alpha}_{2})}{(\boldsymbol{\beta}_{1},\boldsymbol{\beta}_{1})}||\boldsymbol{\beta}_{1}||\boldsymbol{\eta}_{1}+||\boldsymbol{\beta}_{2}||\boldsymbol{\eta}_{2}\\
\boldsymbol{\alpha}_{3}&=\frac{(\boldsymbol{\beta}_{1},\boldsymbol{\alpha}_{3})}{(\boldsymbol{\beta}_{1},\boldsymbol{\beta}_{1})}\boldsymbol{\beta}_{1}+\frac{(\boldsymbol{\beta}_{2},\boldsymbol{\alpha}_{3})}{(\boldsymbol{\beta}_{2},\boldsymbol{\beta}_{2})}\boldsymbol{\beta}_{2}+\boldsymbol{\beta}_{3}\\
&=\frac{(\boldsymbol{\beta}_{1},\boldsymbol{\alpha}_{3})}{(\boldsymbol{\beta}_{1},\boldsymbol{\beta}_{1})}||\boldsymbol{\beta}_{1}||\boldsymbol{\eta}_{1}+\frac{(\boldsymbol{\beta}_{2},\boldsymbol{\alpha}_{3})}{(\boldsymbol{\beta}_{2},\boldsymbol{\beta}_{2})}||\boldsymbol{\beta}_{2}||\boldsymbol{\eta}_{2}+||\boldsymbol{\beta}_{3}||\boldsymbol{\eta}_{3}\\
&\;\;\vdots\\
\boldsymbol{\alpha}_{k}&=\frac{(\boldsymbol{\beta}_{1},\boldsymbol{\alpha}_{k})}{(\boldsymbol{\beta}_{1},\boldsymbol{\beta}_{1})}\boldsymbol{\beta}_{1}+\frac{(\boldsymbol{\beta}_{2},\boldsymbol{\alpha}_{k})}{(\boldsymbol{\beta}_{2},\boldsymbol{\beta}_{2})}\boldsymbol{\beta}_{2}+\cdots+\frac{(\boldsymbol{\beta}_{k-1},\boldsymbol{\alpha}_{k})}{(\boldsymbol{\beta}_{k-1},\boldsymbol{\beta}_{k-1})}\boldsymbol{\beta}_{k-1}+\boldsymbol{\beta}_{k}\\
&=\frac{(\boldsymbol{\beta}_{1},\boldsymbol{\alpha}_{k})}{(\boldsymbol{\beta}_{1},\boldsymbol{\beta}_{1})}||\boldsymbol{\beta}_{1}||\boldsymbol{\eta}_{1}+\frac{(\boldsymbol{\beta}_{2},\boldsymbol{\alpha}_{k})}{(\boldsymbol{\beta}_{2},\boldsymbol{\beta}_{2})}||\boldsymbol{\beta}_{2}||\boldsymbol{\eta}_{2}+\cdots+\frac{(\boldsymbol{\beta}_{k-1},\boldsymbol{\alpha}_{k})}{(\boldsymbol{\beta}_{k-1},\boldsymbol{\beta}_{k-1})}||\boldsymbol{\beta}_{k-1}||\boldsymbol{\eta}_{k-1}+||\boldsymbol{\beta}_{k}||\boldsymbol{\eta}_{k}
\end{aligned}$$
Therefore, the coefficient vectors $\boldsymbol{r}_{j}$, defined by $\boldsymbol{\alpha}_{j}=\begin{bmatrix}\boldsymbol{\eta}_{1}&\boldsymbol{\eta}_{2}&\cdots&\boldsymbol{\eta}_{k}\end{bmatrix}\boldsymbol{r}_{j}$, are:
$$\begin{aligned}
\boldsymbol{r}_{1}&=\begin{bmatrix}||\boldsymbol{\beta}_{1}||&0&0&\cdots&0\end{bmatrix}^{T}\\
\boldsymbol{r}_{2}&=\begin{bmatrix}\frac{(\boldsymbol{\beta}_{1},\boldsymbol{\alpha}_{2})}{(\boldsymbol{\beta}_{1},\boldsymbol{\beta}_{1})}||\boldsymbol{\beta}_{1}||&||\boldsymbol{\beta}_{2}||&0&\cdots&0\end{bmatrix}^{T}\\
\boldsymbol{r}_{3}&=\begin{bmatrix}\frac{(\boldsymbol{\beta}_{1},\boldsymbol{\alpha}_{3})}{(\boldsymbol{\beta}_{1},\boldsymbol{\beta}_{1})}||\boldsymbol{\beta}_{1}||&\frac{(\boldsymbol{\beta}_{2},\boldsymbol{\alpha}_{3})}{(\boldsymbol{\beta}_{2},\boldsymbol{\beta}_{2})}||\boldsymbol{\beta}_{2}||&||\boldsymbol{\beta}_{3}||&\cdots&0\end{bmatrix}^{T}\\
&\;\;\vdots\\
\boldsymbol{r}_{k}&=\begin{bmatrix}\frac{(\boldsymbol{\beta}_{1},\boldsymbol{\alpha}_{k})}{(\boldsymbol{\beta}_{1},\boldsymbol{\beta}_{1})}||\boldsymbol{\beta}_{1}||&\frac{(\boldsymbol{\beta}_{2},\boldsymbol{\alpha}_{k})}{(\boldsymbol{\beta}_{2},\boldsymbol{\beta}_{2})}||\boldsymbol{\beta}_{2}||&\cdots&\frac{(\boldsymbol{\beta}_{k-1},\boldsymbol{\alpha}_{k})}{(\boldsymbol{\beta}_{k-1},\boldsymbol{\beta}_{k-1})}||\boldsymbol{\beta}_{k-1}||&||\boldsymbol{\beta}_{k}||\end{bmatrix}^{T}
\end{aligned}$$
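These coefficients simplify considerably. As a short worked step (the entry notation $r_{ij}$ is introduced here for convenience and does not appear in the original), use $(\boldsymbol{\beta}_{i},\boldsymbol{\beta}_{i})=||\boldsymbol{\beta}_{i}||^{2}$ and $\boldsymbol{\eta}_{i}=\boldsymbol{\beta}_{i}/||\boldsymbol{\beta}_{i}||$ to obtain, for $i<j$:

$$r_{ij}=\frac{(\boldsymbol{\beta}_{i},\boldsymbol{\alpha}_{j})}{(\boldsymbol{\beta}_{i},\boldsymbol{\beta}_{i})}||\boldsymbol{\beta}_{i}||=\frac{(\boldsymbol{\beta}_{i},\boldsymbol{\alpha}_{j})}{||\boldsymbol{\beta}_{i}||}=(\boldsymbol{\eta}_{i},\boldsymbol{\alpha}_{j}),\qquad r_{jj}=||\boldsymbol{\beta}_{j}||,\qquad r_{ij}=0\ \text{for}\ i>j$$

Every entry above the diagonal is thus the inner product of an orthonormal vector with a column of the original set, each diagonal entry is a positive norm, and the entries below the diagonal vanish. This is why the matrix assembled from $\boldsymbol{r}_{1},...,\boldsymbol{r}_{k}$ is upper triangular and nonsingular.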
QR decomposition:
1. Write out the column vectors $\boldsymbol{\alpha}_{1},\boldsymbol{\alpha}_{2},\boldsymbol{\alpha}_{3},...,\boldsymbol{\alpha}_{k}$ of the matrix $\boldsymbol{A}$.
2. Apply Gram-Schmidt orthogonalization to the columns of $\boldsymbol{A}$ to obtain the orthonormal set $\boldsymbol{\eta}_{1},\boldsymbol{\eta}_{2},\boldsymbol{\eta}_{3},...,\boldsymbol{\eta}_{k}$, whose vectors form the columns of the matrix $\boldsymbol{Q}$.
3. Express the columns of $\boldsymbol{A}$ as linear combinations of $\boldsymbol{\eta}_{1},\boldsymbol{\eta}_{2},\boldsymbol{\eta}_{3},...,\boldsymbol{\eta}_{k}$; the coefficient column vectors $\boldsymbol{r}_{1},\boldsymbol{r}_{2},\boldsymbol{r}_{3},...,\boldsymbol{r}_{k}$ form the coefficient matrix $\boldsymbol{R}$.
4. $\boldsymbol{A}=\boldsymbol{QR}$
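The four steps above can be sketched as a single routine. This is a minimal illustration under stated assumptions: the names `dot` and `qr_gram_schmidt` and the sample matrix are hypothetical, the input is assumed to have full column rank, and the off-diagonal entries of $\boldsymbol{R}$ are computed as $(\boldsymbol{\eta}_{i},\boldsymbol{\alpha}_{j})$, which equals $\frac{(\boldsymbol{\beta}_{i},\boldsymbol{\alpha}_{j})}{(\boldsymbol{\beta}_{i},\boldsymbol{\beta}_{i})}||\boldsymbol{\beta}_{i}||$ because $(\boldsymbol{\beta}_{i},\boldsymbol{\beta}_{i})=||\boldsymbol{\beta}_{i}||^{2}$.

```python
import math


def dot(u, v):
    """Inner product (u, v) of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def qr_gram_schmidt(A):
    """QR factorization of an m x n matrix with full column rank.

    A is a list of rows. Returns (Q, R) as lists of rows, where Q is
    m x n with orthonormal columns and R is n x n upper triangular.
    """
    m, n = len(A), len(A[0])
    # Step 1: extract the columns alpha_1, ..., alpha_n.
    alphas = [[A[i][j] for i in range(m)] for j in range(n)]
    etas = []
    R = [[0.0] * n for _ in range(n)]
    for j, alpha in enumerate(alphas):
        beta = list(alpha)
        for i, eta in enumerate(etas):
            # Step 3: coefficient of eta_i in alpha_j.
            R[i][j] = dot(eta, alpha)
            beta = [b - R[i][j] * e for b, e in zip(beta, eta)]
        R[j][j] = math.sqrt(dot(beta, beta))      # r_jj = ||beta_j||
        etas.append([b / R[j][j] for b in beta])  # Step 2: eta_j
    # The eta_j become the columns of Q.
    Q = [[etas[j][i] for j in range(n)] for i in range(m)]
    return Q, R


A = [[1.0, 1.0, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
Q, R = qr_gram_schmidt(A)

# Step 4: check that Q R reproduces A.
QR = [[sum(Q[i][k] * R[k][j] for k in range(3)) for j in range(3)]
      for i in range(3)]
for row in QR:
    print(row)
```

Classical Gram-Schmidt follows the derivation most closely but can lose orthogonality in floating point for ill-conditioned inputs, so this routine is meant for checking small examples by hand rather than for production use.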
Example: