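Rank-one updates of matrices are used constantly in optimization. What exactly is a rank-one update? Much of the material below is copied from Yuan Yaxiang's book 《最优化理论与方法》 (Optimization Theory and Methods).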
Definition: rank-one correction (rank-1 update)
Let $A\in R^{n \times n}$ be a nonsingular matrix and let $u,v\in R^{n\times 1}$ be arbitrary vectors. Then $A+uv^T$ is called a rank-one correction of $A$.
In fact, for nonzero $u$ and $v$, $uv^T$ is an $n\times n$ matrix of rank 1, hence the name "rank-one correction".
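A quick numerical illustration of the definition, as a sketch in Python with NumPy (NumPy and the random test data are assumptions, not part of the original text): the outer product $uv^T$ of two nonzero vectors has rank 1, and adding it to $A$ gives the rank-one correction.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
# Random test data (assumption): shifting by n*I makes A very likely nonsingular.
A = rng.standard_normal((n, n)) + n * np.eye(n)
u = rng.standard_normal(n)
v = rng.standard_normal(n)

outer = np.outer(u, v)      # the n x n matrix u v^T
A_updated = A + outer       # the rank-one correction A + u v^T

print(outer.shape)                    # (5, 5)
print(np.linalg.matrix_rank(outer))   # numerical rank of u v^T
```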
Theorem 1.2.6 (Sherman-Morrison theorem)
(Computes the inverse of the sum of an invertible matrix $A$ and the outer product, $uv^T$, of vectors $u$ and $v$.)
If
$$1+v^TA^{-1}u\neq 0 \qquad(1.2.36)$$
then the rank-one correction $A+uv^T$ of $A$ is also nonsingular, and its inverse can be written as
$$(A+uv^T)^{-1}=A^{-1}-\frac{A^{-1}uv^TA^{-1}}{1+v^TA^{-1}u}\qquad(1.2.37)$$
Proof: Multiply $A+uv^T$ by the right-hand side of (1.2.37), using the fact that $v^TA^{-1}u$ is a scalar:
$$
\begin{aligned}
&(A+uv^T)\left( A^{-1}-\frac{A^{-1}uv^TA^{-1}}{1+v^TA^{-1}u}\right)\\
&=AA^{-1}+uv^TA^{-1} - \frac{uv^TA^{-1}+uv^TA^{-1}uv^TA^{-1}}{1+v^TA^{-1}u}\\
&=I+uv^TA^{-1}-\frac{u(1+v^TA^{-1}u)v^TA^{-1}}{1+v^TA^{-1}u}\\
&=I+uv^TA^{-1}-uv^TA^{-1}=I
\end{aligned}
$$
Since $A+uv^T$ is square, this right inverse is its inverse, so $A+uv^T$ is nonsingular and (1.2.37) holds.
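The proof uses (1.2.36) only to make the division legal. The sketch below (Python/NumPy, with a hand-picked hypothetical example $A=I$, $u=e_1$, $v=-e_1$) shows what goes wrong when (1.2.36) fails: $A+uv^T$ becomes singular. It also checks, on random data, the matrix determinant lemma $\det(A+uv^T)=\det(A)\,(1+v^TA^{-1}u)$, a standard identity not stated in the text, which shows that the condition is in fact also necessary.

```python
import numpy as np

# Hypothetical example: A = I, u = e1, v = -e1, so 1 + v^T A^{-1} u = 0.
n = 4
A = np.eye(n)
u = np.zeros(n)
u[0] = 1.0
v = -u

denom = 1.0 + v @ np.linalg.solve(A, u)   # the quantity in (1.2.36)
M = A + np.outer(u, v)                    # I - e1 e1^T = diag(0, 1, 1, 1)
print(denom)                              # 0.0
print(np.linalg.matrix_rank(M))           # 3 < n, so M is singular

# Matrix determinant lemma on random data (assumption: B is nonsingular).
rng = np.random.default_rng(1)
B = rng.standard_normal((n, n)) + n * np.eye(n)
p, q = rng.standard_normal(n), rng.standard_normal(n)
lhs = np.linalg.det(B + np.outer(p, q))
rhs = np.linalg.det(B) * (1.0 + q @ np.linalg.solve(B, p))
print(np.isclose(lhs, rhs))
```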
What is it good for? (Quoted from https://en.wikipedia.org/wiki/Sherman–Morrison_formula)
If the inverse of $A$ is already known, the formula provides a numerically cheap way to compute the inverse of $A$ corrected by the matrix $uv^T$ (depending on the point of view, the correction may be seen as a perturbation or as a rank-1 update). The computation is relatively cheap because the inverse of $A+uv^T$ does not have to be computed from scratch (which in general is expensive), but can be computed by correcting (or perturbing) $A^{-1}$.
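To make the "numerically cheap" claim concrete, here is a minimal sketch in Python/NumPy of updating a known $A^{-1}$ with (1.2.37). The function name `sherman_morrison_update` and the random test data are assumptions for illustration. Given $A^{-1}$, it needs only matrix-vector products and one outer product, i.e. $O(n^2)$ work, instead of the $O(n^3)$ cost of a fresh inversion.

```python
import numpy as np

def sherman_morrison_update(A_inv, u, v):
    """Hypothetical helper: (A + u v^T)^{-1} from a known A^{-1}, via (1.2.37).

    Only matrix-vector products and one outer product are used: O(n^2) work.
    """
    Ainv_u = A_inv @ u                 # A^{-1} u
    vT_Ainv = v @ A_inv                # v^T A^{-1}
    denom = 1.0 + v @ Ainv_u           # 1 + v^T A^{-1} u, see (1.2.36)
    if np.isclose(denom, 0.0):
        raise np.linalg.LinAlgError("condition (1.2.36) fails: 1 + v^T A^{-1} u is ~0")
    return A_inv - np.outer(Ainv_u, vT_Ainv) / denom

rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n)) + n * np.eye(n)   # random test matrix (assumption)
u, v = rng.standard_normal(n), rng.standard_normal(n)

A_inv = np.linalg.inv(A)                           # the "already known" inverse
updated = sherman_morrison_update(A_inv, u, v)
direct = np.linalg.inv(A + np.outer(u, v))         # O(n^3) recomputation, for comparison
print(np.allclose(updated, direct))
```

In practice one often keeps a factorization of $A$ rather than an explicit inverse; the explicit form is used here only to mirror (1.2.37).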
The theorem above generalizes as follows:
Theorem 1.2.7 (Sherman-Morrison-Woodbury theorem)
Let $A\in R^{n \times n}$ be a nonsingular matrix and let $U,V$ be $n\times m$ matrices, where $V^*$ denotes the conjugate transpose of $V$ ($V^T$ for real matrices). If the $m\times m$ matrix $I+V^*A^{-1}U$ is nonsingular, then $A+UV^*$ is invertible and
$$(A+UV^*)^{-1}=A^{-1}-A^{-1}U(I+V^*A^{-1}U)^{-1}V^*A^{-1}$$
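Theorem 1.2.7 can be sketched the same way. The helper `woodbury_inverse` below is hypothetical; it assumes real or complex NumPy arrays, forms $V^*$ with `.conj().T`, and solves an $m\times m$ system instead of inverting the $n\times n$ matrix $A+UV^*$. The second check confirms that $m=1$ reduces to (1.2.37).

```python
import numpy as np

def woodbury_inverse(A_inv, U, V):
    """Hypothetical helper: (A + U V^*)^{-1} from a known A^{-1} (Theorem 1.2.7).

    U and V are n x m; only the m x m matrix I + V^* A^{-1} U is solved against.
    """
    m = U.shape[1]
    Vh = V.conj().T                        # V^*; equals V^T for real V
    Ainv_U = A_inv @ U                     # n x m
    K = np.eye(m) + Vh @ Ainv_U            # I + V^* A^{-1} U, assumed nonsingular
    return A_inv - Ainv_U @ np.linalg.solve(K, Vh @ A_inv)

rng = np.random.default_rng(0)
n, m = 8, 2
A = rng.standard_normal((n, n)) + n * np.eye(n)   # random test data (assumption)
U = rng.standard_normal((n, m))
V = rng.standard_normal((n, m))
A_inv = np.linalg.inv(A)

W = woodbury_inverse(A_inv, U, V)
print(np.allclose(W, np.linalg.inv(A + U @ V.T)))

# With m = 1 the formula reduces to Sherman-Morrison (1.2.37).
u, v = U[:, :1], V[:, :1]                          # n x 1 column vectors
sm = A_inv - (A_inv @ u @ v.T @ A_inv) / (1.0 + (v.T @ A_inv @ u).item())
print(np.allclose(woodbury_inverse(A_inv, u, v), sm))
```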
Its uses (quoted from https://en.wikipedia.org/wiki/Woodbury_matrix_identity):
This identity is useful in certain numerical computations where $A^{-1}$ has already been computed and it is desired to compute $(A + UCV)^{-1}$. With the inverse of A available, it is only necessary to find the inverse of $C^{-1} + VA^{-1}U$ in order to obtain the result using the right-hand side of the identity. If $C$ has a much smaller dimension than A, this is more efficient than inverting $A + UCV$ directly.
(Wikipedia writes the update as $A+UCV$, with $(A+UCV)^{-1}=A^{-1}-A^{-1}U(C^{-1}+VA^{-1}U)^{-1}VA^{-1}$; Theorem 1.2.7 is the case $C=I$ with $V$ replaced by $V^*$.)
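The Wikipedia form with a middle matrix $C$ can be sketched as below (Python/NumPy; `woodbury_ucv`, the sizes, and the diagonal $A$ are illustrative assumptions). A diagonal $A$ is chosen because its inverse is known for free, which is exactly the situation the quote describes: only the $k\times k$ matrix $C^{-1}+VA^{-1}U$ has to be handled, with $k=3$ much smaller than $n=200$.

```python
import numpy as np

def woodbury_ucv(A_inv, U, C, V):
    """Hypothetical helper: (A + U C V)^{-1} from a known A^{-1}.

    Uses (A + UCV)^{-1} = A^{-1} - A^{-1} U (C^{-1} + V A^{-1} U)^{-1} V A^{-1},
    where U is n x k, C is k x k (invertible) and V is k x n.
    """
    Ainv_U = A_inv @ U                          # n x k
    small = np.linalg.inv(C) + V @ Ainv_U       # the k x k matrix C^{-1} + V A^{-1} U
    return A_inv - Ainv_U @ np.linalg.solve(small, V @ A_inv)

rng = np.random.default_rng(1)
n, k = 200, 3
d = rng.uniform(1.0, 2.0, n)
A_inv = np.diag(1.0 / d)                        # inverse of the diagonal A = diag(d)
U = 0.1 * rng.standard_normal((n, k))
C = np.eye(k) + 0.1 * rng.standard_normal((k, k))
V = 0.1 * rng.standard_normal((k, n))

X = woodbury_ucv(A_inv, U, C, V)
print(np.allclose(X, np.linalg.inv(np.diag(d) + U @ C @ V)))
```

For brevity `A_inv` is stored as a dense matrix; with a truly diagonal $A$ one would scale rows by `1.0 / d` instead of multiplying by a dense diagonal matrix.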
For the determinant of a rank-one correction of the identity, we have
$$\det(I+uv^T)=1+u^Tv\qquad(1.2.39)$$
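Indeed, we may assume $u\neq 0$ (if $u=0$, both sides equal 1). Note that each eigenvector of $I+uv^T$ is either orthogonal to $v$ or parallel to $u$. If it is orthogonal to $v$, the eigenvalue is 1; if it is parallel to $u$, the eigenvalue is $1+u^Tv$. Since the determinant is the product of the eigenvalues ($n-1$ of which equal 1), this gives (1.2.39).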
Proof: Suppose $\xi$ is an eigenvector of $I+uv^T$ with eigenvalue $\lambda$. Then
$$
\begin{aligned}
(I+uv^T)\xi &= \lambda \xi \\
\xi + uv^T \xi &= \lambda \xi\\
uv^T \xi &= (\lambda-1) \xi
\end{aligned}
$$
Since $uv^T$ has rank 1, its null space $\{\xi : v^T\xi=0\}$ has dimension $n-1$, and when $v^Tu\neq 0$ it is similar to the diagonal matrix
$$uv^T \sim \left[\begin{array}{c c c c}\lambda_0 &0 &\cdots &0 \\ 0 & \lambda_0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_1 \end{array}\right]$$
Hence it has only two eigenvalues: $\lambda_0 = 0$ (with multiplicity $n-1$) and $\lambda_1$ (a single one). (If $v^Tu=0$, then $uv^T$ is nilpotent and all of its eigenvalues are 0.)
(1) If $\lambda = \lambda_0 + 1 = 1$, then $uv^T \xi = 0 \Rightarrow v^T \xi = 0$ (because $u\neq 0$). That is, these eigenvectors of $I+uv^T$ are orthogonal to $v$, with eigenvalue 1;
(2) If $\lambda = \lambda_1 + 1$ with $\lambda_1\neq 0$, then
$$uv^T\xi = (v^T\xi)\,u = \lambda_1 \xi,$$
so $\xi = \frac{v^T\xi}{\lambda_1}u$ is parallel to $u$. Taking $\xi = u$ gives $uv^Tu = (v^Tu)\,u$, hence
$$\lambda_1 = u^Tv \quad\Rightarrow\quad \lambda = u^Tv+1.$$
That is, this eigenvector of $I+uv^T$ is parallel to $u$, with eigenvalue $u^Tv+1$.
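A numerical check of (1.2.39) and of the eigenstructure found in the proof, as a Python/NumPy sketch on random vectors (the test data is an assumption): $\det(I+uv^T)=1+u^Tv$, $u$ is an eigenvector with eigenvalue $1+u^Tv$, any vector orthogonal to $v$ is an eigenvector with eigenvalue 1, and $n-1$ eigenvalues equal 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
u, v = rng.standard_normal(n), rng.standard_normal(n)
M = np.eye(n) + np.outer(u, v)          # I + u v^T

# (1.2.39): det(I + u v^T) = 1 + u^T v
print(np.isclose(np.linalg.det(M), 1 + u @ v))

# Case (2): u is an eigenvector with eigenvalue 1 + u^T v
print(np.allclose(M @ u, (1 + u @ v) * u))

# Case (1): any w orthogonal to v is an eigenvector with eigenvalue 1
w = rng.standard_normal(n)
w -= (v @ w) / (v @ v) * v              # remove the component of w along v
print(np.allclose(M @ w, w))

# n - 1 of the (possibly complex-typed) eigenvalues are numerically 1
eig = np.linalg.eigvals(M)
ones = int(np.sum(np.isclose(eig, 1.0)))
print(ones)
```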
Furthermore, for a rank-two correction we have
$$\det(I+u_1u_2^T+u_3u_4^T) =(1+u_1^Tu_2)(1+u_3^Tu_4)-(u_1^Tu_4)(u_2^Tu_3)\qquad(1.2.40)$$
Indeed, assuming $1+u_1^Tu_2\neq 0$ (the general case follows by continuity), it suffices to factor out $I+u_1u_2^T$, then apply (1.2.39) to the second factor and (1.2.37) to $(I+u_1u_2^T)^{-1}$:
$$
\begin{aligned}
&I+u_1u_2^T+u_3u_4^T=(I+u_1u_2^T)\left[I+(I+u_1u_2^T)^{-1}u_3u_4^T\right]\\
\Rightarrow\ &\det(I+u_1u_2^T+u_3u_4^T)=(1+u_1^Tu_2)\left[1+u_4^T\left(I-\frac{u_1u_2^T}{1+u_1^Tu_2}\right)u_3\right]\\
&=(1+u_1^Tu_2)(1+u_3^Tu_4)-(u_1^Tu_4)(u_2^Tu_3)
\end{aligned}
$$
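A numerical check of (1.2.40), sketched in Python/NumPy with random vectors (an assumption). `rank_two_det` is a hypothetical helper that evaluates the right-hand side. The second part forces $1+u_1^Tu_2=0$, the case the derivation sets aside, to illustrate that the formula still holds there.

```python
import numpy as np

def rank_two_det(u1, u2, u3, u4):
    """Right-hand side of (1.2.40)."""
    return (1 + u1 @ u2) * (1 + u3 @ u4) - (u1 @ u4) * (u2 @ u3)

rng = np.random.default_rng(0)
n = 6
u1, u2, u3, u4 = rng.standard_normal((4, n))
M = np.eye(n) + np.outer(u1, u2) + np.outer(u3, u4)
print(np.isclose(np.linalg.det(M), rank_two_det(u1, u2, u3, u4)))

# Degenerate case excluded in the derivation: force 1 + u1^T u2 = 0.
u2b = u2 - (1 + u1 @ u2) / (u1 @ u1) * u1
Mb = np.eye(n) + np.outer(u1, u2b) + np.outer(u3, u4)
print(np.isclose(1 + u1 @ u2b, 0.0))
print(np.isclose(np.linalg.det(Mb), rank_two_det(u1, u2b, u3, u4)))
```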
Note that $\Vert A \Vert_F^2=\mathrm{tr}(A^TA)$. Expanding $\mathrm{tr}\big((A+xy^T)^T(A+xy^T)\big)$ and using $\mathrm{tr}(A^Txy^T)=\mathrm{tr}(yx^TA)=y^TA^Tx$ and $\mathrm{tr}(yx^Txy^T)=\Vert x \Vert_F^2\Vert y \Vert_F^2$, the Frobenius norm of the rank-one correction $A+xy^T$ satisfies
$$\Vert A+xy^T\Vert_F^2=\Vert A \Vert_F^2+2y^TA^Tx+\Vert x \Vert_F^2\Vert y \Vert_F^2 \qquad(1.2.41)$$
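A quick check of (1.2.41) on random data (Python/NumPy sketch; the sizes are arbitrary assumptions). The identity holds for rectangular $A$ as well, as long as $x$ and $y$ match its row and column dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))    # (1.2.41) does not need A to be square
x = rng.standard_normal(4)         # length = number of rows of A
y = rng.standard_normal(3)         # length = number of columns of A

lhs = np.linalg.norm(A + np.outer(x, y), "fro") ** 2
rhs = np.linalg.norm(A, "fro") ** 2 + 2 * y @ A.T @ x + (x @ x) * (y @ y)
print(np.isclose(lhs, rhs))
```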
Now let $P\in R^{n\times n}$,
$$P=I-\frac{xy^T}{\Vert x \Vert \Vert y \Vert}\qquad(1.2.42)$$
Clearly, $P$ has $n-1$ eigenvalues equal to 1 (every vector orthogonal to $y$ is an eigenvector with eigenvalue 1). Using (1.2.40) and considering the largest eigenvalue of $P^TP$, it follows that
$$\Vert P \Vert_2=\frac{y^Tx}{\Vert x \Vert \Vert y \Vert}\qquad(1.2.43)$$
Proof: Consider $\Vert P\Vert_2=(\lambda_{P^TP})^{1/2}$, the spectral norm of $P$, where $\lambda_{P^TP}$ denotes the largest eigenvalue of $P^TP$.
$$
\begin{aligned}
P^TP&=\left(I-\frac{xy^T}{\Vert x \Vert \Vert y \Vert}\right)^T\left(I-\frac{xy^T}{\Vert x \Vert \Vert y \Vert}\right)\\
&= \left(I-\frac{yx^T}{\Vert x \Vert \Vert y \Vert}\right)\left(I-\frac{xy^T}{\Vert x \Vert \Vert y \Vert}\right)\\
&=I-\frac{yx^T}{\Vert x \Vert \Vert y \Vert} -\frac{xy^T}{\Vert x \Vert \Vert y \Vert}+\frac{yx^Txy^T}{\Vert x \Vert^2 \Vert y \Vert^2}\\
&= I-\frac{yx^T}{\Vert x \Vert \Vert y \Vert}-\frac{xy^T}{\Vert x \Vert \Vert y \Vert}+\frac{yy^T}{\Vert y \Vert^2}\\
&= I-\frac{yx^T}{\Vert x \Vert \Vert y \Vert}+\left(\frac{y}{\Vert y \Vert} -\frac{x}{\Vert x \Vert}\right)\frac{y^T}{\Vert y \Vert}
\end{aligned}
$$
This is a rank-two correction of $I$, namely $P^TP=I+u_1u_2^T+u_3u_4^T$ with
$$u_1=\frac{y}{\Vert y \Vert},\quad u_2=-\frac{x}{\Vert x \Vert},\quad u_3=\frac{y}{\Vert y \Vert}-\frac{x}{\Vert x \Vert},\quad u_4=\frac{y}{\Vert y \Vert}$$
Since $y^Ty=\Vert y \Vert^2$ and $x^Tx=\Vert x \Vert^2$, substituting into (1.2.40) gives:
$$
\begin{aligned}
\det(P^TP)&=(1+u_1^Tu_2)(1+u_3^Tu_4)-(u_1^Tu_4)(u_2^Tu_3)\\
&=\left(1-\frac{y^Tx}{\Vert y \Vert \Vert x \Vert}\right)\left(1+\frac{y^Ty}{\Vert y\Vert^2}-\frac{x^Ty}{\Vert x\Vert \Vert y\Vert} \right)+\frac{y^T}{\Vert y \Vert} \frac{y}{\Vert y \Vert}\frac{x^T}{\Vert x \Vert}\left(\frac{y}{\Vert y \Vert}-\frac{x}{\Vert x \Vert}\right)\\
&= \left(1-\frac{y^Tx}{\Vert y \Vert \Vert x\Vert} \right)\left(2-\frac{x^Ty}{\Vert y \Vert \Vert x\Vert} \right)+\frac{x^Ty}{\Vert x \Vert\Vert y \Vert}-1\\
&= 2- 2\frac{y^Tx}{\Vert y \Vert \Vert x\Vert}-\frac{x^Ty}{\Vert y \Vert \Vert x\Vert}+\frac{y^Tx}{\Vert y \Vert \Vert x\Vert}\frac{x^Ty}{\Vert y \Vert \Vert x\Vert}+\frac{x^Ty}{\Vert x \Vert\Vert y \Vert}-1\\
&=1-2\frac{y^Tx}{\Vert y \Vert \Vert x\Vert}+\frac{y^Tx}{\Vert y \Vert \Vert x\Vert}\frac{x^Ty}{\Vert y \Vert \Vert x\Vert}=\left(1-\frac{y^Tx}{\Vert y \Vert \Vert x\Vert} \right)^2
\end{aligned}
$$
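The product of a matrix's eigenvalues equals its determinant, which led me to expect the spectral norm to be $1-\frac{y^Tx}{\Vert y \Vert \Vert x\Vert}$. That inference does not hold, however: $\det(P^TP)$ fixes only the product of all eigenvalues of $P^TP$, not its largest one. So why does (1.2.43) give $\frac{y^Tx}{\Vert y \Vert \Vert x\Vert}$? Note that (1.2.43) as written cannot hold for $P$ in (1.2.42): if $x\perp y$, its right-hand side is 0, yet $P=I-\frac{xy^T}{\Vert x\Vert\Vert y\Vert}\neq 0$ (for $n\ge 2$).
I could not prove (1.2.43) here; I would be grateful if interested readers could take a look. Many thanks!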
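To see numerically what is going on, the sketch below (Python/NumPy, random $x,y$ as assumed test data) computes $\Vert P\Vert_2$ for $P$ in (1.2.42) and compares it with $c=\frac{y^Tx}{\Vert x\Vert\Vert y\Vert}$ and $1-c$. It also tests one possible resolution, which is our own derivation and not taken from the book: when $x$ and $y$ are linearly independent, $P^TP$ has $n-2$ eigenvalues equal to 1 (on the orthogonal complement of $\mathrm{span}\{x,y\}$), and the remaining two have sum $\mathrm{tr}(P^TP)-(n-2)=3-2c$ and product $\det(P^TP)=(1-c)^2$, so $\lambda_{P^TP}=\frac{3-2c+\sqrt{5-4c}}{2}$. Finally it checks the oblique projector $Q=I-\frac{xy^T}{y^Tx}$ (only a guess at what may have been intended, not checked against the book), for which $\Vert Q\Vert_2=\frac{\Vert x\Vert\Vert y\Vert}{|y^Tx|}$, the reciprocal of the absolute value of the right-hand side of (1.2.43).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
x, y = rng.standard_normal(n), rng.standard_normal(n)
nx, ny = np.linalg.norm(x), np.linalg.norm(y)
c = (y @ x) / (nx * ny)                     # cosine of the angle between x and y

P = np.eye(n) - np.outer(x, y) / (nx * ny)  # P as in (1.2.42)
norm_P = np.linalg.norm(P, 2)               # spectral norm (largest singular value)

# The two candidate values discussed in the text, next to the actual norm
print(norm_P, c, 1 - c)
print(np.isclose(np.linalg.det(P.T @ P), (1 - c) ** 2))   # determinant derived above

# Largest eigenvalue of P^T P from sum 3 - 2c and product (1 - c)^2 (our derivation)
lam_max = (3 - 2 * c + np.sqrt(5 - 4 * c)) / 2
print(np.isclose(norm_P, np.sqrt(lam_max)))

# Oblique projector variant (an assumption, not taken from the text)
Q = np.eye(n) - np.outer(x, y) / (y @ x)
print(np.isclose(np.linalg.norm(Q, 2), nx * ny / abs(y @ x)))
```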
The interlacing theorem for the eigenvalues of a rank-one correction can be stated as follows.
Theorem 1.2.8 (interlacing eigenvalue theorem):
Let $A$ be an $n\times n$ symmetric matrix with eigenvalues $\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n$, and let $\overline A = A+\sigma uu^T$ (with $u\in R^n$) have eigenvalues $\overline \lambda_1\ge \overline\lambda_2\ge\cdots\ge\overline\lambda_n$. Then:
(1) if $\sigma\gt 0$, then
$$\overline \lambda_1\ge \lambda_1\ge\overline\lambda_2\ge\lambda_2\ge\cdots\ge\overline\lambda_n\ge \lambda_n$$
(2) if $\sigma\lt 0$, then
$$\lambda_1\ge \overline \lambda_1\ge \lambda_2 \ge \overline \lambda_2\ge\cdots\ge \lambda_n\ge \overline\lambda_n$$
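Theorem 1.2.8 is easy to probe numerically. The sketch below (Python/NumPy; `interlaces` is a hypothetical helper and the random symmetric test matrices are assumptions) sorts eigenvalues in descending order, as in the theorem, and checks both chains up to a small tolerance. It is a sanity check, not a proof.

```python
import numpy as np

def interlaces(sigma, n=6, seed=0, tol=1e-10):
    """Check Theorem 1.2.8 on a random symmetric A and a random u."""
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((n, n))
    A = (B + B.T) / 2                                  # symmetric A
    u = rng.standard_normal(n)
    lam = np.linalg.eigvalsh(A)[::-1]                  # eigvalsh is ascending; flip to descending
    lam_bar = np.linalg.eigvalsh(A + sigma * np.outer(u, u))[::-1]
    if sigma > 0:
        # bar_lam_i >= lam_i  and  lam_i >= bar_lam_{i+1}
        ok = np.all(lam_bar >= lam - tol) and np.all(lam[:-1] >= lam_bar[1:] - tol)
    else:
        # lam_i >= bar_lam_i  and  bar_lam_i >= lam_{i+1}  (sigma = 0 holds trivially)
        ok = np.all(lam >= lam_bar - tol) and np.all(lam_bar[:-1] >= lam[1:] - tol)
    return bool(ok)

print(interlaces(2.0), interlaces(-2.0))
```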