Book Notes: Optimization Theory and Methods (4): Mathematical Foundations (Rank-One Update)

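Rank-one updates of matrices come up frequently in optimization. So what is a rank-one update? Most of the material below is copied from Yuan Ya-xiang's 《最优化理论与方法》 (Optimization Theory and Methods).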


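Definition: rank-one update (rank-1 update)
Let $A\in R^{n \times n}$ be a nonsingular matrix and let $u,v\in R^{n\times 1}$ be arbitrary vectors. Then $A+uv^T$ is called a rank-one update of $A$.
In fact, for nonzero $u$ and $v$, $uv^T$ is simply an $n\times n$ matrix of rank 1, which is where the name "rank-one update" comes from.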
Theorem 1.2.6 (Sherman-Morrison theorem)
(Computes the inverse of the sum of an invertible matrix $A$ and the outer product, $uv^T$, of vectors $u$ and $v$.)

Let $A\in R^{n \times n}$ be nonsingular and let $u,v\in R^{n}$ be arbitrary vectors. If
$$1+v^TA^{-1}u\neq 0 \qquad(1.2.36)$$
then the rank-one update $A+uv^T$ of $A$ is also nonsingular, and its inverse can be written as
$$(A+uv^T)^{-1}=A^{-1}-\frac{A^{-1}uv^TA^{-1}}{1+v^TA^{-1}u}\qquad(1.2.37)$$
Proof: multiply $A+uv^T$ by the claimed inverse and check that the product is the identity.
$$
\begin{aligned}
&(A+uv^T)\left(A^{-1}-\frac{A^{-1}uv^TA^{-1}}{1+v^TA^{-1}u}\right)\\
&=AA^{-1}+uv^TA^{-1}-\frac{uv^TA^{-1}+uv^TA^{-1}uv^TA^{-1}}{1+v^TA^{-1}u}\\
&=I+uv^TA^{-1}-\frac{u(1+v^TA^{-1}u)v^TA^{-1}}{1+v^TA^{-1}u}\\
&=I+uv^TA^{-1}-uv^TA^{-1}=I
\end{aligned}
$$


What is it useful for? (Quoted from https://en.wikipedia.org/wiki/Sherman–Morrison_formula)
If the inverse of $A$ is already known, the formula provides a numerically cheap way to compute the inverse of $A$ corrected by the matrix $uv^{T}$ (depending on the point of view, the correction may be seen as a perturbation or as a rank-1 update). The computation is relatively cheap because the inverse of $A+uv^{T}$ does not have to be computed from scratch (which in general is expensive), but can be computed by correcting (or perturbing) $A^{-1}$.
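As a quick sanity check of formula (1.2.37), here is a minimal NumPy sketch. The function name `sherman_morrison_inverse` and the small test matrices are my own choices for illustration, not something from the book; the update itself costs only $O(n^2)$ operations once $A^{-1}$ is available.

```python
import numpy as np


def sherman_morrison_inverse(A_inv, u, v):
    """Return (A + u v^T)^{-1} from a known A^{-1}, following (1.2.37)."""
    Au = A_inv @ u            # A^{-1} u
    vA = v @ A_inv            # v^T A^{-1}
    denom = 1.0 + v @ Au      # 1 + v^T A^{-1} u, condition (1.2.36)
    if abs(denom) < 1e-12:
        raise ValueError("1 + v^T A^{-1} u is numerically zero")
    return A_inv - np.outer(Au, vA) / denom


# Hypothetical example data (A is nonsingular, det(A) = 18).
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
u = np.array([1.0, 0.0, 2.0])
v = np.array([0.0, 1.0, 1.0])

A_inv = np.linalg.inv(A)
updated_inv = sherman_morrison_inverse(A_inv, u, v)

# Compare against inverting A + u v^T from scratch.
direct_inv = np.linalg.inv(A + np.outer(u, v))
max_error = np.max(np.abs(updated_inv - direct_inv))
print(max_error)
```

The printed difference should be at the level of floating-point rounding.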


The theorem above generalizes as follows:
Theorem 1.2.7 (Sherman-Morrison-Woodbury theorem)
Let $A\in R^{n \times n}$ be a nonsingular matrix and let $U,V$ be $n\times m$ matrices. If $I+V^*A^{-1}U$ is invertible, then $A+UV^*$ is invertible, and:
$$(A+UV^*)^{-1}=A^{-1}-A^{-1}U(I+V^*A^{-1}U)^{-1}V^*A^{-1}$$


Its application: (quoted from https://en.wikipedia.org/wiki/Woodbury_matrix_identity)
This identity is useful in certain numerical computations where $A^{-1}$ has already been computed and it is desired to compute $(A + UCV)^{-1}$. With the inverse of $A$ available, it is only necessary to find the inverse of $C^{-1} + VA^{-1}U$ in order to obtain the result using the right-hand side of the identity. If $C$ has a much smaller dimension than $A$, this is more efficient than inverting $A + UCV$ directly.
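The same idea can be sketched in NumPy for the form stated in Theorem 1.2.7, restricted to real matrices so that $V^*=V^T$. The helper name `woodbury_inverse` and the example data are assumptions made for this illustration. Only an $m\times m$ linear system is solved, which is the point when $m\ll n$.

```python
import numpy as np


def woodbury_inverse(A_inv, U, V):
    """Return (A + U V^T)^{-1} from a known A^{-1} (real case of Theorem 1.2.7)."""
    m = U.shape[1]
    AU = A_inv @ U                      # A^{-1} U, shape n x m
    VA = V.T @ A_inv                    # V^T A^{-1}, shape m x n
    small = np.eye(m) + V.T @ AU        # I + V^T A^{-1} U, shape m x m
    # Solve the small system instead of forming its inverse explicitly.
    return A_inv - AU @ np.linalg.solve(small, VA)


# Hypothetical example: diagonal A, so A^{-1} is known in closed form.
A = np.diag([2.0, 3.0, 4.0, 5.0])
A_inv = np.diag(1.0 / np.diag(A))
U = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 2.0]])
V = np.array([[1.0, 1.0], [0.0, 1.0], [2.0, 0.0], [1.0, 0.0]])

low_rank_inv = woodbury_inverse(A_inv, U, V)
direct_inv = np.linalg.inv(A + U @ V.T)
max_error = np.max(np.abs(low_rank_inv - direct_inv))
print(max_error)
```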


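For the determinant of a rank-one update we have:
$$\det(I+uv^T)=1+u^Tv\qquad(1.2.39)$$
In fact, we may assume $u\neq 0$. Note that every eigenvector of $I+uv^T$ is either orthogonal to $v$ or parallel to $u$. If it is orthogonal to $v$, the corresponding eigenvalue is 1; if it is parallel to $u$, the corresponding eigenvalue is $1+u^Tv$. Since the determinant is the product of the eigenvalues, this gives (1.2.39).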


Proof: suppose $\xi$ is an eigenvector of $I+uv^T$ with eigenvalue $\lambda$. Then:
$$
\begin{aligned}
(I+uv^T)\xi &= \lambda \xi \\
\xi + uv^T \xi &= \lambda \xi\\
uv^T \xi &= (\lambda-1) \xi
\end{aligned}
$$
Since $uv^T$ is a matrix of rank 1, its null space has dimension $n-1$, and (when $v^Tu\neq 0$, so that it is diagonalizable) a similarity transformation brings it to the form:
$$uv^T \to \left[\begin{array}{c c c c}\lambda_0 &0 &\cdots &0 \\ 0 & \lambda_0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_1 \end{array}\right]$$
So it has only two distinct eigenvalues: $\lambda_0 = 0$ (with multiplicity $n-1$) and $\lambda_1$ (with multiplicity 1).
(1) If $\lambda = \lambda_0 + 1=1$, then $uv^T \xi =0$, and since $u\neq 0$ this gives $v^T \xi =0$. That is, the eigenvector of $I+uv^T$ is orthogonal to $v$ and the eigenvalue is 1.
(2) If $\lambda = \lambda_1 + 1$ with $\lambda_1\neq 0$, then:
$$uv^T \xi = \lambda_1 \xi \;\Rightarrow\; \xi=\frac{v^T\xi}{\lambda_1}\,u$$
so $\xi$ is a multiple of $u$. Normalizing so that $\Vert\xi\Vert_2=1$, i.e. $\xi=u/\Vert u\Vert_2$, and multiplying on the left by $\xi^T$:
$$\lambda_1=\lambda_1 \xi^T\xi=\xi^Tuv^T \xi=\frac{(u^Tu)(v^Tu)}{\Vert u\Vert_2^2}=u^Tv \;\Rightarrow\; \lambda=u^Tv+1$$
That is, the eigenvector of $I+uv^T$ is parallel to $u$ and the eigenvalue is $u^Tv+1$.
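The eigenvalue picture behind (1.2.39) is easy to check numerically. The sketch below uses arbitrary example vectors of my own choosing (with $u^Tv=0.5$), and simply compares NumPy's determinant and eigenvalues of $I+uv^T$ with the predicted values.

```python
import numpy as np

# Hypothetical example vectors with u^T v = 0.5.
u = np.array([1.0, 2.0, 0.0, -1.0])
v = np.array([0.5, 1.0, 3.0, 2.0])
M = np.eye(4) + np.outer(u, v)

det_M = np.linalg.det(M)
predicted = 1.0 + u @ v          # right-hand side of (1.2.39)

# u itself is an eigenvector with eigenvalue 1 + u^T v.
Mu = M @ u

# Any vector orthogonal to v is an eigenvector with eigenvalue 1.
w = np.array([2.0, -1.0, 0.0, 0.0])   # v @ w == 0
Mw = M @ w

eigvals = np.sort(np.real(np.linalg.eigvals(M)))
print(det_M, predicted, eigvals)
```

The sorted eigenvalues should be three values close to 1 followed by one value close to 1.5.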


Furthermore, for a rank-two update we have:
$$\det(I+u_1u_2^T+u_3u_4^T)=(1+u_1^Tu_2)(1+u_3^Tu_4)-(u_1^Tu_4)(u_2^Tu_3)\qquad(1.2.40)$$
In fact, assuming $1+u_1^Tu_2\neq 0$ so that $I+u_1u_2^T$ is invertible, it suffices to note that:
$$
\begin{aligned}
I+u_1u_2^T+u_3u_4^T&=(I+u_1u_2^T)\left[I+(I+u_1u_2^T)^{-1}u_3u_4^T\right]\\
\Rightarrow \det(I+u_1u_2^T+u_3u_4^T)&=(1+u_1^Tu_2)\left[1+u_4^T\left(I-\frac{u_1u_2^T}{1+u_1^Tu_2}\right)u_3\right]\\
&=(1+u_1^Tu_2)(1+u_3^Tu_4)-(u_1^Tu_4)(u_2^Tu_3)
\end{aligned}
$$
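Formula (1.2.40) can be checked on a small example. The vectors below are arbitrary values picked for illustration; for them both sides work out to 5.

```python
import numpy as np

# Hypothetical example vectors.
u1 = np.array([1.0, 0.0, 2.0])
u2 = np.array([0.0, 1.0, 1.0])
u3 = np.array([1.0, 1.0, 0.0])
u4 = np.array([2.0, 0.0, 1.0])

# Left-hand side: determinant of the rank-two update of the identity.
lhs = np.linalg.det(np.eye(3) + np.outer(u1, u2) + np.outer(u3, u4))

# Right-hand side of (1.2.40): only inner products are needed.
rhs = (1 + u1 @ u2) * (1 + u3 @ u4) - (u1 @ u4) * (u2 @ u3)

print(lhs, rhs)   # both equal 5 up to rounding
```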
Noting that $\Vert A \Vert_F^2=tr(A^TA)$, the Frobenius norm of the rank-one update $A+xy^T$ satisfies:
$$\Vert A+xy^T\Vert_F^2=\Vert A \Vert_F^2+2y^TA^Tx+\Vert x \Vert_F^2\Vert y \Vert_F^2 \qquad(1.2.41)$$
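Identity (1.2.41) follows from expanding $tr\big((A+xy^T)^T(A+xy^T)\big)$, and it can also be confirmed numerically. The matrix and vectors below are example values I made up; for them both sides equal 74.5.

```python
import numpy as np

# Hypothetical example data.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0],
              [4.0, 0.0, 1.0]])
x = np.array([1.0, -1.0, 2.0])
y = np.array([0.5, 2.0, 1.0])

# Left-hand side: squared Frobenius norm of the updated matrix.
lhs = np.linalg.norm(A + np.outer(x, y), "fro") ** 2

# Right-hand side of (1.2.41); for vectors the Frobenius norm is the 2-norm.
rhs = np.linalg.norm(A, "fro") ** 2 + 2 * (y @ A.T @ x) + (x @ x) * (y @ y)

print(lhs, rhs)   # both equal 74.5 up to rounding
```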
Also let $P\in R^{n\times n}$,
$$P=I-\frac{xy^T}{\Vert x \Vert \Vert y \Vert}\qquad(1.2.42)$$
Clearly, $P$ has $n-1$ eigenvalues equal to 1. Using (1.2.40) and considering the largest eigenvalue of $P^TP$, one gets:
$$\Vert P \Vert_2=\frac{y^Tx}{\Vert x \Vert \Vert y \Vert}\qquad(1.2.43)$$


Proof:
Recall that the spectral norm of $P$ is $\Vert P\Vert_2=(\lambda_{P^TP})^{1/2}$, where $\lambda_{P^TP}$ denotes the largest eigenvalue of $P^TP$.
$$
\begin{aligned}
P^TP&=\left(I-\frac{xy^T}{\Vert x \Vert \Vert y \Vert}\right)^T\left(I-\frac{xy^T}{\Vert x \Vert \Vert y \Vert}\right)\\
&= \left(I-\frac{yx^T}{\Vert x \Vert \Vert y \Vert}\right)\left(I-\frac{xy^T}{\Vert x \Vert \Vert y \Vert}\right)\\
&=I-\frac{yx^T}{\Vert x \Vert \Vert y \Vert} -\frac{xy^T}{\Vert x \Vert \Vert y \Vert}+\frac{yx^Txy^T}{\Vert x \Vert^2 \Vert y \Vert^2}\\
&= I-\frac{yx^T}{\Vert x \Vert \Vert y \Vert}-\frac{xy^T}{\Vert x \Vert \Vert y \Vert}+\frac{yy^T}{\Vert y \Vert^2}\\
&= I-\frac{yx^T}{\Vert x \Vert \Vert y \Vert}+\left(\frac{y}{\Vert y \Vert} -\frac{x}{\Vert x \Vert}\right)\frac{y^T}{\Vert y \Vert}
\end{aligned}
$$
This is a rank-two update of the identity. Let
$$u_1=\frac{y}{\Vert y \Vert},\quad u_2=-\frac{x}{\Vert x \Vert},\quad u_3=\frac{y}{\Vert y \Vert}-\frac{x}{\Vert x \Vert},\quad u_4=\frac{y}{\Vert y \Vert}$$
Since $y^Ty=\Vert y \Vert^2$ and $x^Tx=\Vert x \Vert^2$, substituting into (1.2.40) gives:
$$
\begin{aligned}
\det(P^TP)&=(1+u_1^Tu_2)(1+u_3^Tu_4)-(u_1^Tu_4)(u_2^Tu_3)\\
&=\left(1-\frac{y^Tx}{\Vert y \Vert \Vert x \Vert}\right)\left(1+\frac{y^Ty}{\Vert y\Vert^2}-\frac{x^Ty}{\Vert x\Vert \Vert y\Vert} \right)+\frac{y^T}{\Vert y \Vert} \frac{y}{\Vert y \Vert}\frac{x^T}{\Vert x \Vert}\left(\frac{y}{\Vert y \Vert}-\frac{x}{\Vert x \Vert}\right) \\
&= \left(1-\frac{y^Tx}{\Vert y \Vert \Vert x\Vert} \right)\left(2-\frac{x^Ty}{\Vert y \Vert \Vert x\Vert} \right)+\frac{x^Ty}{\Vert x \Vert\Vert y \Vert}-1\\
&= 2- 2\frac{y^Tx}{\Vert y \Vert \Vert x\Vert}-\frac{x^Ty}{\Vert y \Vert \Vert x\Vert}+\frac{y^Tx}{\Vert y \Vert \Vert x\Vert}\frac{x^Ty}{\Vert y \Vert \Vert x\Vert}+\frac{x^Ty}{\Vert x \Vert\Vert y \Vert}-1\\
&=1-2\frac{y^Tx}{\Vert y \Vert \Vert x\Vert}+\frac{y^Tx}{\Vert y \Vert \Vert x\Vert}\frac{x^Ty}{\Vert y \Vert \Vert x\Vert}=\left(1-\frac{y^Tx}{\Vert y \Vert \Vert x\Vert} \right)^2
\end{aligned}
$$
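The product of a matrix's eigenvalues equals its determinant, so it seems the spectral norm we obtain should be $1-\frac{y^Tx}{\Vert y \Vert \Vert x\Vert}$. Why, then, does (1.2.43) give $\frac{y^Tx}{\Vert y \Vert \Vert x\Vert}$?
I could not prove this step. If any reader can see what is going on, I would be very grateful for the help! One caveat about my own argument: the determinant is the product of all $n$ eigenvalues of $P^TP$, not its largest eigenvalue, so on its own it does not determine $\Vert P\Vert_2$. Formula (1.2.43) as written here also looks suspect: when $x\perp y$ its right-hand side is 0 although $P\neq 0$, so there may be a copying error in (1.2.42) or (1.2.43).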
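A numerical experiment may help with the question above. The sketch below builds $P$ as in (1.2.42) for example vectors of my own choosing and compares $\det(P^TP)$, the spectral norm, and $c=\frac{y^Tx}{\Vert x\Vert\Vert y\Vert}$. The closed form in the last line is not from the book: it is my own derivation, assuming $x$ and $y$ are linearly independent, from the fact that $P^TP$ has $n-2$ eigenvalues equal to 1 and two more whose sum is $3-2c$ (from the trace $n+1-2c$) and whose product is $(1-c)^2$.

```python
import numpy as np

# Hypothetical example vectors.
x = np.array([1.0, 2.0, 0.0])
y = np.array([2.0, 1.0, 2.0])
nx, ny = np.linalg.norm(x), np.linalg.norm(y)
c = (y @ x) / (nx * ny)                       # cosine of the angle between x and y

P = np.eye(3) - np.outer(x, y) / (nx * ny)    # P as in (1.2.42)
G = P.T @ P

det_G = np.linalg.det(G)                      # should equal (1 - c)^2
eigs = np.linalg.eigvalsh(G)                  # eigenvalues of P^T P, ascending
spectral_norm = np.linalg.norm(P, 2)          # largest singular value of P

# Assumed closed form (my derivation, see the lead-in): largest root of
# mu^2 - (3 - 2c) mu + (1 - c)^2 = 0, then take the square root.
closed_form = np.sqrt((3 - 2 * c + np.sqrt(5 - 4 * c)) / 2)

print(c, 1 - c, spectral_norm, closed_form)
```

For these vectors the spectral norm comes out larger than 1, so it matches neither $c$ nor $1-c$, while the determinant identity $\det(P^TP)=(1-c)^2$ does hold.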


The interlacing theorem for the eigenvalues of a rank-one update can be stated as follows:
Theorem 1.2.8 (interlacing eigenvalue theorem):
Let $A$ be an $n\times n$ symmetric matrix with eigenvalues $\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n$, and let $\overline A = A+\sigma uu^T$ with eigenvalues $\overline \lambda_1\ge \overline\lambda_2\ge\cdots\ge\overline\lambda_n$. Then:
(1) if $\sigma\gt 0$, then
$$\overline \lambda_1\ge \lambda_1\ge\overline\lambda_2\ge\lambda_2\ge\cdots\ge\overline\lambda_n\ge \lambda_n$$
(2) if $\sigma\lt 0$, then
$$\lambda_1\ge \overline \lambda_1\ge \lambda_2 \ge \overline \lambda_2\ge\cdots\ge \lambda_n\ge \overline\lambda_n$$
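The interlacing pattern can be observed numerically. In the sketch below, the helper `interlaces`, the symmetric matrix, the vector, and the choice $\sigma=\pm 0.5$ are all assumptions made for illustration; `np.linalg.eigvalsh` returns eigenvalues in ascending order, so they are reversed to match the theorem's ordering.

```python
import numpy as np


def interlaces(upper, lower, tol=1e-10):
    """Check upper[0] >= lower[0] >= upper[1] >= lower[1] >= ... for descending arrays."""
    pairwise = np.all(upper >= lower - tol)           # upper[i] >= lower[i]
    shifted = np.all(lower[:-1] >= upper[1:] - tol)   # lower[i] >= upper[i+1]
    return bool(pairwise and shifted)


# Hypothetical symmetric matrix and update vector.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
u = np.array([1.0, -1.0, 2.0])

lam = np.linalg.eigvalsh(A)[::-1]                               # eigenvalues of A, descending
lam_plus = np.linalg.eigvalsh(A + 0.5 * np.outer(u, u))[::-1]   # sigma = 0.5 > 0
lam_minus = np.linalg.eigvalsh(A - 0.5 * np.outer(u, u))[::-1]  # sigma = -0.5 < 0

case_positive = interlaces(lam_plus, lam)    # case (1): updated eigenvalues sit above
case_negative = interlaces(lam, lam_minus)   # case (2): updated eigenvalues sit below
print(case_positive, case_negative)
```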
