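Title: Matrix Multiplication Operations, Triangularization, and Eigenvalues of a Matrix Square Root: Mathematical Preliminaries for Deriving the Umeyama Algorithm (III)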
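Preface
This post reviews the following three topics:
1. Matrix multiplication viewed as column and row operations
2. Upper triangularization of a matrix
3. Finding the eigenvalues of a matrix from the eigenvalues of its square
The main goal is to pin down the third topic, which serves as a mathematical basis for deriving the singular-value-based algorithm for point cloud registration (the Umeyama algorithm).
These basic concepts can of course all be found in textbooks [1]; writing them out helps me understand them more thoroughly.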
I. Matrix Multiplication Operations
Column operations of matrix multiplication
Let $\mathbf{A}$ and $\mathbf{P}$ be two $n \times n$ (square) matrices. Write $\mathbf{P}$ in terms of its column vectors as $\mathbf{P}=\begin{bmatrix}\mathbf{p}_1 & \cdots & \mathbf{p}_n\end{bmatrix}$. The product can then be written column by column as
$$
\mathbf{A}\mathbf{P} = \begin{bmatrix}\mathbf{A}\mathbf{p}_1 &\cdots & \mathbf{A}\mathbf{p}_n\end{bmatrix} \tag{I-1}
$$
If $\mathbf{A}$ is the diagonal matrix $\operatorname{diag}(a_1, \ldots, a_n)$, then multiplying $\mathbf{P}$ by it on the right scales the $j$-th column of $\mathbf{P}$ by $a_j$:
$$
\mathbf{P}\begin{bmatrix}a_1 && \\ & \ddots &\\ &&a_n\end{bmatrix} = \begin{bmatrix}a_1 \mathbf{p}_1 &\cdots & a_n \mathbf{p}_n \end{bmatrix}
$$
Row operations of matrix multiplication
Let $\mathbf{A}$ and $\mathbf{P}$ be two $n \times n$ (square) matrices. Write $\mathbf{P}$ in terms of its row vectors as $\mathbf{P}=\begin{bmatrix}\mathbf{p}_1\\ \vdots\\ \mathbf{p}_n\end{bmatrix}$, where $\mathbf{p}_i$ now denotes the $i$-th row of $\mathbf{P}$. The product can then be written row by row as
$$
\mathbf{P} \mathbf{A} = \begin{bmatrix}\mathbf{p}_1 \mathbf{A} \\ \vdots \\ \mathbf{p}_n \mathbf{A}\end{bmatrix} \tag{I-2}
$$
If $\mathbf{A}$ is the diagonal matrix $\operatorname{diag}(a_1, \ldots, a_n)$, then multiplying $\mathbf{P}$ by it on the left scales the $i$-th row of $\mathbf{P}$ by $a_i$:
$$
\begin{bmatrix}a_1 && \\ & \ddots &\\ &&a_n\end{bmatrix} \mathbf{P} = \begin{bmatrix}a_1 \mathbf{p}_1 \\ \vdots \\ a_n \mathbf{p}_n \end{bmatrix}
$$
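The column and row views above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the random test matrices and the variable names (`AP_by_columns`, `col_scaled`, and so on) are illustrative choices, not part of the original derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
P = rng.standard_normal((n, n))
a = rng.standard_normal(n)   # diagonal entries a_1, ..., a_n
D = np.diag(a)

# Column view: the j-th column of A P is A times the j-th column of P.
AP_by_columns = np.column_stack([A @ P[:, j] for j in range(n)])

# Row view: the i-th row of P A is the i-th row of P times A.
PA_by_rows = np.vstack([P[i, :] @ A for i in range(n)])

# Right-multiplying by a diagonal matrix scales columns,
# left-multiplying by it scales rows.
col_scaled = P * a            # broadcasting: column j is multiplied by a[j]
row_scaled = a[:, None] * P   # broadcasting: row i is multiplied by a[i]

print(np.allclose(A @ P, AP_by_columns))
print(np.allclose(P @ A, PA_by_rows))
print(np.allclose(P @ D, col_scaled))
print(np.allclose(D @ P, row_scaled))
```

In practice the broadcasting form is usually preferred over building the diagonal matrix explicitly, since it avoids an $n \times n$ product.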
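II. Upper Triangularization
Any $n \times n$ (square) matrix is similar, over the complex numbers, to an upper triangular matrix.
Proof [1]
The proof is by induction on $n$.
If $n=1$, the matrix is itself upper triangular, so the claim holds.
Assume the claim holds for $n-1$, that is, every $(n-1) \times (n-1)$ matrix is similar to an upper triangular matrix.
We now show that the claim then also holds for $n \times n$ matrices.
Let $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$ be linearly independent column vectors, where $\mathbf{x}_1$ is an eigenvector of $\mathbf{A}$ associated with the eigenvalue $\lambda_1$ (over the complex numbers, every square matrix has at least one eigenvalue). Note that, apart from $\mathbf{x}_1$, the other column vectors are not required to be eigenvectors of $\mathbf{A}$.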
Define
$$
\mathbf{P}_1 \triangleq \begin{bmatrix} \mathbf{x}_1 & \mathbf{x}_2 & \ldots & \mathbf{x}_n \end{bmatrix} \tag{II-1}
$$
Since its columns are linearly independent, $\mathbf{P}_1$ is invertible.
Then, by the column view (I-1) of the matrix product from the previous section,
$$
\mathbf{A}\mathbf{P}_1 = \begin{bmatrix} \mathbf{A} \mathbf{x}_1 & \mathbf{A} \mathbf{x}_2 &\cdots & \mathbf{A} \mathbf{x}_n \end{bmatrix} = \begin{bmatrix} \lambda_1 \mathbf{x}_1 & \mathbf{A} \mathbf{x}_2 &\cdots & \mathbf{A} \mathbf{x}_n \end{bmatrix} \tag{II-2}
$$
Moreover, by the definition of the inverse matrix,
$$
\begin{aligned}
\mathbf{P}_1^{-1} \mathbf{P}_1 & = \mathbf{P}_1^{-1} \begin{bmatrix} \mathbf{x}_1 & \mathbf{x}_2 & \ldots & \mathbf{x}_n \end{bmatrix}\\
& = \begin{bmatrix} \mathbf{P}_1^{-1} \mathbf{x}_1 & \mathbf{P}_1^{-1} \mathbf{x}_2 & \ldots & \mathbf{P}_1^{-1} \mathbf{x}_n \end{bmatrix}\\
&=\begin{bmatrix}1 && \\ &\ddots &\\ &&1\end{bmatrix}
\end{aligned} \tag{II-3}
$$
Comparing the first columns, it follows that
$$
\mathbf{P}_1^{-1} \mathbf{x}_1 = \begin{bmatrix}1\\0\\ \vdots\\ 0\end{bmatrix}
$$
and
$$
\mathbf{P}_1^{-1} \lambda_1 \mathbf{x}_1 = \begin{bmatrix}\lambda_1\\0\\ \vdots\\ 0\end{bmatrix}.
$$
Hence
$$
\begin{aligned}
\mathbf{P}_1^{-1} \mathbf{A} \mathbf{P}_1 &= \mathbf{P}_1^{-1} \begin{bmatrix} \lambda_1 \mathbf{x}_1 & \mathbf{A} \mathbf{x}_2 &\cdots & \mathbf{A} \mathbf{x}_n \end{bmatrix} \\
&= \begin{bmatrix} \mathbf{P}_1^{-1} \lambda_1 \mathbf{x}_1 & \mathbf{P}_1^{-1} \mathbf{A} \mathbf{x}_2 &\cdots & \mathbf{P}_1^{-1} \mathbf{A} \mathbf{x}_n \end{bmatrix} \\
&= \left[\begin{array}{c:c} \lambda_1 & \begin{array}{ccc} b_{12} & \cdots & b_{1n}\end{array}\\ \hdashline \begin{array}{c} 0\\ \vdots\\ 0 \end{array} & \mathbf{A}_1 \end{array}\right]
\end{aligned} \tag{II-4}
$$
where $\mathbf{A}_1$ is an $(n-1) \times (n-1)$ matrix and $b_{12}, \ldots, b_{1n}$ are scalars.
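The block structure in (II-4) can be observed numerically. The following is a minimal sketch, assuming NumPy; completing $\mathbf{x}_1$ to a basis with random columns is an illustrative choice (such columns are linearly independent of $\mathbf{x}_1$ with probability one), and the proof itself allows any completion.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))

# x1: an eigenvector of A for the eigenvalue lambda1 (possibly complex).
eigvals, eigvecs = np.linalg.eig(A)
lambda1, x1 = eigvals[0], eigvecs[:, 0]

# Assumption of this sketch: the remaining columns x2, ..., xn are random,
# so P1 is invertible with probability one. They are NOT eigenvectors.
P1 = np.column_stack([x1, rng.standard_normal((n, n - 1))])

# P1^{-1} A P1, computed with a linear solve instead of an explicit inverse.
M = np.linalg.solve(P1, A @ P1)

print(np.isclose(M[0, 0], lambda1))         # top-left entry is lambda1
print(np.allclose(M[1:, 0], 0, atol=1e-8))  # rest of the first column vanishes

A1 = M[1:, 1:]   # the (n-1) x (n-1) block the induction hypothesis is applied to
```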
By the induction hypothesis for the $n-1$ case, there exists an invertible $(n-1) \times (n-1)$ matrix $\mathbf{Q}$ such that
$$
\mathbf{Q}^{-1} \mathbf{A}_1 \mathbf{Q} = \begin{bmatrix} \lambda_2 &&\ast \\ &\ddots & \\ &&\lambda_n\end{bmatrix} \tag{II-5}
$$
Let
$$
\mathbf{P}_2 \triangleq \left[\begin{array}{c:c} 1 & \begin{array}{ccc} 0 & \cdots & 0\end{array}\\ \hdashline \begin{array}{c} 0\\ \vdots\\ 0 \end{array} & \mathbf{Q} \end{array}\right] \tag{II-6}
$$
Its inverse is then
$$
\mathbf{P}_2^{-1} = \left[\begin{array}{c:c} 1 & \begin{array}{ccc} 0 & \cdots & 0\end{array}\\ \hdashline \begin{array}{c} 0\\ \vdots\\ 0 \end{array} & \mathbf{Q}^{-1} \end{array}\right] \tag{II-7}
$$
Also let
$$
\mathbf{P} \triangleq \mathbf{P}_1 \mathbf{P}_2 \tag{II-8}
$$
Writing $\mathbf{b}^\top \triangleq \begin{bmatrix} b_{12} & \cdots & b_{1n} \end{bmatrix}$ for the top-right block of (II-4), compute
$$
\begin{aligned}
\mathbf{P}^{-1} \mathbf{A} \mathbf{P} & = (\mathbf{P}_1 \mathbf{P}_2)^{-1} \mathbf{A} (\mathbf{P}_1 \mathbf{P}_2) = \mathbf{P}_2^{-1} (\mathbf{P}_1^{-1} \mathbf{A} \mathbf{P}_1) \mathbf{P}_2\\
{\small \text{(II-4)}} \quad & = \mathbf{P}_2^{-1} \left[\begin{array}{c:c} \lambda_1 & \begin{array}{ccc} b_{12} & \cdots & b_{1n}\end{array}\\ \hdashline \begin{array}{c} 0\\ \vdots\\ 0 \end{array} & \mathbf{A}_1 \end{array}\right] \mathbf{P}_2\\
{\small \text{(II-6),(II-7)}} \quad &= \left[\begin{array}{c:c} 1 & \begin{array}{ccc} 0 & \cdots & 0\end{array}\\ \hdashline \begin{array}{c} 0\\ \vdots\\ 0 \end{array} & \mathbf{Q}^{-1} \end{array}\right] \left[\begin{array}{c:c} \lambda_1 & \begin{array}{ccc} b_{12} & \cdots & b_{1n}\end{array}\\ \hdashline \begin{array}{c} 0\\ \vdots\\ 0 \end{array} & \mathbf{A}_1 \end{array}\right] \left[\begin{array}{c:c} 1 & \begin{array}{ccc} 0 & \cdots & 0\end{array}\\ \hdashline \begin{array}{c} 0\\ \vdots\\ 0 \end{array} & \mathbf{Q} \end{array}\right]\\
&= \left[\begin{array}{c:c} \lambda_1 & \mathbf{b}^\top \mathbf{Q}\\ \hdashline \begin{array}{c} 0\\ \vdots\\ 0 \end{array} &\mathbf{Q}^{-1} \mathbf{A}_1 \mathbf{Q} \end{array}\right]\\
{\small \text{(II-5)}}\quad &=\left[ \begin{array}{c:c} \lambda_1 & \ast\\ \hdashline \begin{array}{c} 0\\ \vdots\\ 0 \end{array} &\begin{matrix} \lambda_2 &&\ast \\ &\ddots & \\ &&\lambda_n\end{matrix} \end{array}\right]
\end{aligned} \tag{II-9}
$$
The top-right block $\mathbf{b}^\top \mathbf{Q}$ is in general nonzero, but it lies above the diagonal, so the result is upper triangular.
This shows that the claim also holds for $n \times n$ matrices, which completes the induction step.
This completes the proof.
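The induction above is constructive, so it can be turned into a short recursive routine. The following is a minimal sketch, assuming NumPy; the function name `triangularize` is hypothetical, and completing $\mathbf{x}_1$ to a basis through a full QR factorization is a choice made here for numerical conditioning, not something the proof requires (any linearly independent completion works).

```python
import numpy as np

def triangularize(A):
    """Return (P, T) with T = P^{-1} A P upper triangular, following the induction."""
    A = np.asarray(A, dtype=complex)
    n = A.shape[0]
    if n == 1:                       # base case: a 1x1 matrix is already triangular
        return np.eye(1, dtype=complex), A.copy()

    # x1 is an eigenvector of A; the remaining columns complete it to a basis.
    _, eigvecs = np.linalg.eig(A)
    x1 = eigvecs[:, 0]
    # Assumption of this sketch: the full QR factor of x1 is used as P1, so its
    # first column is a unit multiple of x1 and the other columns are orthonormal.
    P1, _ = np.linalg.qr(x1.reshape(-1, 1), mode="complete")

    M = np.linalg.solve(P1, A @ P1)  # (II-4): first column is [lambda1, 0, ..., 0]
    A1 = M[1:, 1:]

    Q, _ = triangularize(A1)         # induction hypothesis (II-5)
    P2 = np.eye(n, dtype=complex)    # (II-6): block diagonal with blocks 1 and Q
    P2[1:, 1:] = Q

    P = P1 @ P2                      # (II-8)
    T = np.linalg.solve(P, A @ P)    # (II-9)
    return P, T

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))
P, T = triangularize(A)
print(np.allclose(np.tril(T, -1), 0, atol=1e-8))  # strictly lower part vanishes
print(np.allclose(P @ T, A @ P))                  # T = P^{-1} A P
```

In floating point the entries below the diagonal of `T` are tiny rather than exactly zero. With the unitary choice of `P1` made here, the result is essentially a Schur form; for real work a library Schur routine would normally be preferable to this recursion.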
III. Eigenvalues of a Matrix Square Root
Let $\mathbf{A}$ be a complex square matrix and let $\lambda$ be an eigenvalue of $\mathbf{A}^2$. Then it can be shown that $\sqrt{\lambda}$ or $-\sqrt{\lambda}$ is an eigenvalue of $\mathbf{A}$.
Proof [1]
By the theorem proved in the previous section, "any $n \times n$ (square) matrix is similar to an upper triangular matrix", $\mathbf{A}$ is similar to an upper triangular matrix $\mathbf{B}$, that is,
$$
\mathbf{P}^{-1} \mathbf{A} \mathbf{P} = \mathbf{B} \triangleq \begin{bmatrix} b_{11} &&\ast\\ &\ddots &\\ &&b_{nn}\end{bmatrix} \tag{III-1}
$$
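Then, since $\mathbf{P}^{-1} \mathbf{A}^2 \mathbf{P} = (\mathbf{P}^{-1} \mathbf{A} \mathbf{P})(\mathbf{P}^{-1} \mathbf{A} \mathbf{P})$ and the square of an upper triangular matrix is again upper triangular with squared diagonal entries,
$$
\mathbf{P}^{-1} {\mathbf{A}^{2}} \mathbf{P} = \mathbf{B}^2 = \begin{bmatrix} b_{11}^2 &&\ast\\ &\ddots &\\ &&b_{nn}^2\end{bmatrix} \tag{III-2}
$$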
This shows that $\mathbf{A}^2$ and $\mathbf{B}^2$ are also similar.
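To see why squaring an upper triangular matrix squares its diagonal entries, the $2 \times 2$ case can be worked out by hand. The off-diagonal entry $c$ below is a symbol introduced only for this illustration.

```latex
\begin{bmatrix} b_{11} & c \\ 0 & b_{22} \end{bmatrix}^2
=
\begin{bmatrix} b_{11} & c \\ 0 & b_{22} \end{bmatrix}
\begin{bmatrix} b_{11} & c \\ 0 & b_{22} \end{bmatrix}
=
\begin{bmatrix} b_{11}^2 & c\,(b_{11} + b_{22}) \\ 0 & b_{22}^2 \end{bmatrix}
```

The entry above the diagonal changes in a less tidy way, but the zero below the diagonal is preserved and the diagonal entries are simply squared. The same pattern holds for any size.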
The following computation shows that similar matrices have the same characteristic polynomial. Using $\lambda \mathbf{I} = \mathbf{P}^{-1} (\lambda \mathbf{I}) \mathbf{P}$ in the first step,
$$
\begin{aligned}
\left|\mathbf{P}^{-1} {\mathbf{A}^{2}} \mathbf{P} - \lambda \mathbf{I} \right| & = \left|\mathbf{P}^{-1} \left( {\mathbf{A}^{2}} - \lambda \mathbf{I} \right) \mathbf{P} \right|\\
& = \left|\mathbf{P}^{-1}\right| \cdot \left| {\mathbf{A}^{2}} - \lambda \mathbf{I} \right| \cdot\left| \mathbf{P}\right|\\
& = \left|\mathbf{P}^{-1}\right| \cdot\left| \mathbf{P}\right| \cdot \left| {\mathbf{A}^{2}} - \lambda \mathbf{I} \right|\\
& = \left|\mathbf{P}^{-1} \mathbf{P}\right| \cdot \left| {\mathbf{A}^{2}} - \lambda \mathbf{I} \right|\\
& = \left| {\mathbf{A}^{2}} - \lambda \mathbf{I} \right|
\end{aligned} \tag{III-3}
$$
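The roots of the characteristic equation are the eigenvalues. Since $\mathbf{P}^{-1} {\mathbf{A}^{2}} \mathbf{P}$ and $\mathbf{A}^{2}$ have the same characteristic equation, they have the same eigenvalues. In other words, similar matrices share the same eigenvalues.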
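The determinant identity behind this argument can be spot-checked numerically. The following is a minimal sketch, assuming NumPy; the random matrices and the sample values of the polynomial variable (called `t` here to keep it apart from the eigenvalue $\lambda$) are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
A = rng.standard_normal((n, n))
P = rng.standard_normal((n, n))   # assumed invertible (true with probability one)
I = np.eye(n)

A2 = A @ A
S = np.linalg.solve(P, A2 @ P)    # S = P^{-1} A^2 P, similar to A^2

# Evaluate both characteristic polynomials at a few sample points.
samples = [-1.5, 0.0, 2.0 + 1.0j]
lhs = [np.linalg.det(S - t * I) for t in samples]
rhs = [np.linalg.det(A2 - t * I) for t in samples]
print(np.allclose(lhs, rhs))

# The step |P^{-1}| |P| = |P^{-1} P| = 1 used in the derivation.
det_product = np.linalg.det(np.linalg.inv(P)) * np.linalg.det(P)
print(np.isclose(det_product, 1.0))
```

Agreement at a handful of points is only a sanity check, not a proof; the algebraic argument is what establishes equality for every value of the variable.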
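Since $\lambda$ is an eigenvalue of $\mathbf{A}^2$, it is also an eigenvalue of $\mathbf{B}^2$. The eigenvalues of a triangular matrix are its diagonal entries, so $\lambda$ must be one of the diagonal entries of the upper triangular matrix $\mathbf{B}^2$, say
$$
\lambda = b_{kk}^2
$$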
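Then $b_{kk}$, which is an eigenvalue of $\mathbf{B}$ and therefore also of $\mathbf{A}$, satisfies
$$
b_{kk} = \sqrt{\lambda} \quad \text{or} \quad b_{kk} = -\sqrt{\lambda}
$$
where $\sqrt{\lambda}$ denotes a fixed complex square root of $\lambda$.
This completes the proof.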
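The statement just proved can be checked on a concrete matrix. The following is a minimal sketch, assuming NumPy; the helper `has_root_in` and the tolerance `tol` are hypothetical names introduced for this illustration.

```python
import numpy as np

def has_root_in(lam, candidates, tol=1e-8):
    """True if sqrt(lam) or -sqrt(lam) lies within tol of some candidate."""
    root = np.sqrt(complex(lam))   # principal complex square root
    dist = np.minimum(np.abs(candidates - root), np.abs(candidates + root))
    return bool(dist.min() < tol)

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))

eig_A = np.linalg.eigvals(A)
eig_A2 = np.linalg.eigvals(A @ A)

# Every eigenvalue of A^2 has a square root (up to sign) among the eigenvalues of A.
checks = [has_root_in(lam, eig_A) for lam in eig_A2]
print(all(checks))

# The "or" matters: for D = diag(1, 2), D^2 has the eigenvalue 4,
# and 2 is an eigenvalue of D while -2 is not.
D = np.diag([1.0, 2.0])
minus_two_is_eig = bool(np.any(np.isclose(np.linalg.eigvals(D), -2.0)))
print(has_root_in(4.0, np.linalg.eigvals(D)), minus_two_is_eig)
```

The second example shows that the theorem guarantees only one of the two signs, which is why the statement uses "or".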
References
[1] C. Y. Hsiung and G. Y. Mao, Linear Algebra, World Scientific Publishing Company, 1998.