Matrix Analysis And Application

Preface

A few words up front

These notes were adapted, with additions and deletions, from another set of notes. If this infringes on your rights, please contact me! Corrections are welcome too!


Background

Vector

- Linear dependence
  If there exist scalars $a_1, a_2, \ldots, a_k$, not all zero, such that $a_1 \vec x_1 + a_2 \vec x_2 + \cdots + a_k \vec x_k = \vec 0$, then the vectors $\vec x_1, \vec x_2, \ldots, \vec x_k$ are said to be linearly dependent.

- Linear independence
  If $a_1 \vec x_1 + a_2 \vec x_2 + \cdots + a_k \vec x_k = \vec 0$ holds only when $a_1, a_2, \ldots, a_k$ are all zero, then the vectors $\vec x_1, \vec x_2, \ldots, \vec x_k$ are said to be linearly independent.

- Maximal linearly independent set
  If $\vec x_1, \vec x_2, \ldots, \vec x_k$ is a linearly independent subset of a vector set, and every vector in the set can be expressed as a linear combination of $\vec x_1, \vec x_2, \ldots, \vec x_k$, then $\vec x_1, \vec x_2, \ldots, \vec x_k$ is called a maximal linearly independent set of the vector set.

- Vector operations
  Inner product: $\vec a \cdot \vec b = \sum_i a_i b_i$
  Cross product: $\vec a \times \vec b = \det \begin{bmatrix} \vec i & \vec j & \vec k \\ a_x & a_y & a_z \\ b_x & b_y & b_z \end{bmatrix}$

- Vector norms

  - $\left\| x \right\|_1 = \sum_i |x_i|$
  - $\left\| x \right\|_2 = \sqrt{\sum_i x_i^2}$
  - $\left\| x \right\|_\infty = \max_i \{|x_i|\}$
  - $\left\| x \right\|_p = \left(\sum_i |x_i|^p\right)^{1/p}$
  - Properties of a vector norm:
    1. $\left\| x \right\| \ge 0$
    2. $\left\| kx \right\| = |k| \left\| x \right\|$
    3. $\left\| x + y \right\| \le \left\| x \right\| + \left\| y \right\|$
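As a quick numerical check of the norms above, a NumPy sketch (the example vector is arbitrary; `numpy.linalg.norm` takes the order via its `ord` argument):

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0])

norm1 = np.linalg.norm(x, 1)          # sum of |x_i|
norm2 = np.linalg.norm(x, 2)          # Euclidean norm
norm_inf = np.linalg.norm(x, np.inf)  # max |x_i|
norm3 = np.sum(np.abs(x) ** 3) ** (1 / 3)  # general p-norm with p = 3
```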

Matrix

- Transpose

$A^T = \begin{bmatrix} a_{11} & a_{21} & \cdots & a_{n1} \\ a_{12} & a_{22} & \cdots & a_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{nn} \end{bmatrix}$

$(AB)^T = B^T A^T$

$(A+B)^T = A^T + B^T$

$(kA)^T = k A^T$

- Conjugate transpose
  $A^H = (\bar A)^T$
  e.g. $\begin{pmatrix} 1 & 2+i \\ 1-i & 2 \end{pmatrix}^H = \begin{pmatrix} 1 & 1+i \\ 2-i & 2 \end{pmatrix}$
  Unitary matrix (the complex analogue of an orthogonal matrix): its columns satisfy $u_i^H u_j = \left\{\begin{matrix} 0, & i \neq j \\ 1, & i = j \end{matrix}\right.$
  Hermitian matrix: $A^H = A$; for example, $U_1 U_1^H = (U_1 U_1^H)^H$ is Hermitian.

- Adjugate matrix
  $A^* = \begin{bmatrix} A_{11} & A_{21} & \cdots & A_{n1} \\ A_{12} & A_{22} & \cdots & A_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ A_{1n} & A_{2n} & \cdots & A_{nn} \end{bmatrix}$
  where $A_{ij} = (-1)^{i+j} M_{ij}$ is the cofactor of $a_{ij}$ ($M_{ij}$ the corresponding minor).

- Trace

$tr(A) = \sum_i^n a_{ii} = \sum_i^n \lambda_i$

$a = tr(a)$ for a scalar $a$

$tr(AB) = tr(BA)$

$tr(A+B) = tr(A) + tr(B)$

$tr(A) = tr(A^T)$

$tr(A^T B) = \sum_{i,j} A_{ij} B_{ij}$

$tr(A^T (B \odot C)) = tr((A \odot B)^T C)$, where $\odot$ is the element-wise (Hadamard) product

- Matrix norms
  $\left\| A \right\|_F = \sqrt{\sum_{i,j} a_{ij}^2} = \sqrt{tr(AA^T)}$
  $\left\| A \right\|_2 = \sqrt{\lambda_{\max}(A^T A)} = \sigma_{\max}(A)$
  $\left\| A \right\|_1 = \max_j \sum_{i=1}^n |a_{ij}|$ (maximum absolute column sum)
  $\left\| A \right\|_\infty = \max_i \sum_{j=1}^n |a_{ij}|$ (maximum absolute row sum)
  $\left\| A \right\|_* = \sum_i \sigma_i(A)$ (nuclear norm: the sum of singular values)
  $\left\| A \right\|_p = \max_{\left\| x \right\|_p = 1} \left\| Ax \right\|_p$
  A matrix norm satisfies:

  1. $\left\| A \right\| \ge 0$
  2. $\left\| kA \right\| = |k| \left\| A \right\|$
  3. $\left\| A + B \right\| \le \left\| A \right\| + \left\| B \right\|$
  4. $\left\| AB \right\| \le \left\| A \right\| \cdot \left\| B \right\|$
- Spectral radius
  $\rho(A) = \max |\lambda(A)|$

  1. The spectral radius is not a norm.
  2. If $A$ is Hermitian, then $\rho(A) = \left\| A \right\|_2$.
  3. $\rho(A) = \inf_{\left\| \cdot \right\|} \left\| A \right\|$, the infimum taken over induced matrix norms.
  4. $\sum_k A^k$ converges $\Rightarrow A^k \to 0$ and $\rho(A) < 1$.
  5. $\left\| A^k \right\|^{1/k} \to \rho(A)$ (Gelfand's formula).
- Orthogonal (orthonormal) matrix $U = [\alpha_1, \alpha_2, \cdots, \alpha_n]$
  $\alpha_i^T \alpha_j = \left\{\begin{matrix} 1, & i = j \\ 0, & i \neq j \end{matrix}\right.$
  It satisfies:

  1. $U^{-1} = U^T$
  2. $rank(U) = n$
  3. $U^T U = U U^T = E$
  4. $\left\| U \cdot A \right\| = \left\| A \right\|$ for the $2$-norm and the Frobenius norm (orthogonal invariance)
- Matrix with partially orthonormal columns, $U = [U_1, U_2]$
  $U_1 = [u_1, u_2, \cdots, u_r] \in R^{n \times r}$
  $u_i^T u_j = \left\{\begin{matrix} 0, & i \neq j \\ 1, & i = j \end{matrix}\right.$
  $U_1^T U_1 = E_{r \times r}$
  $U U^T = [U_1, U_2] \begin{pmatrix} U_1^T \\ U_2^T \end{pmatrix} = U_1 U_1^T + U_2 U_2^T = E$
  The columns of $U_2$ span the orthogonal complement of $span\{U_1\}$.

- Orthogonalization
  Given vectors $a_1, a_2, \cdots, a_n$, find $q_1, q_2, \cdots, q_n$ such that
  $span\{a_1, a_2, \cdots, a_n\} = span\{q_1, q_2, \cdots, q_n\}$
  and $Q = [q_1, q_2, \cdots, q_n]$ has orthonormal columns.
  Gram-Schmidt:

  1. $span\{q_1\} = span\{a_1\}$, so $q_1 = {a_1 \over \left\| a_1 \right\|}$.
  2. Suppose $span\{a_1, a_2, \cdots, a_k\} = span\{q_1, q_2, \cdots, q_k\}$ with $q_i \perp q_j$ for $i \neq j$ and $\left\| q_i \right\| = 1$.
  3. Construct $q_{k+1}$ satisfying: $\left\{ \begin{aligned} & span\{q_1, q_2, \cdots, q_k\} \oplus span\{q_{k+1}\} = span\{a_1, a_2, \cdots, a_{k+1}\} \\ & q_{k+1} \perp q_i, \; i = 1, 2, \cdots, k \\ & \left\| q_{k+1} \right\| = 1 \end{aligned} \right.$
     Write $a_{k+1} = \sum_{i=1}^{k+1} r_{i,k+1} q_i$; taking inner products gives $q_i^T a_{k+1} = r_{i,k+1} q_i^T q_i = r_{i,k+1}$.

     Hence $q_{k+1} = \dfrac{a_{k+1} - \sum_{i=1}^{k} r_{i,k+1} q_i}{\left\| a_{k+1} - \sum_{i=1}^{k} r_{i,k+1} q_i \right\|}$, where $r_{i,k+1} = q_i^T a_{k+1}$, $i = 1, 2, \cdots, k$.

  QR decomposition via Gram-Schmidt:
  $[a_1, a_2, \cdots, a_n] = [q_1, q_2, \cdots, q_n] \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ 0 & r_{22} & \cdots & r_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & r_{nn} \end{bmatrix}$
  $\Rightarrow \left\{ \begin{aligned} & a_1 = r_{11} q_1 \\ & a_2 = r_{12} q_1 + r_{22} q_2 \\ & \vdots \end{aligned} \right. \Rightarrow \left\{ \begin{aligned} & q_1 = {a_1 \over \left\| a_1 \right\|} \\ & q_2 = {a_2 - r_{12} q_1 \over \left\| a_2 - r_{12} q_1 \right\|} \\ & \vdots \end{aligned} \right.$
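The Gram-Schmidt recurrence above translates directly into code. A minimal sketch (classical, not the numerically preferred modified variant; assumes $A$ has full column rank):

```python
import numpy as np

def gram_schmidt_qr(A):
    """QR of an m x n full-column-rank matrix via classical Gram-Schmidt."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        v = A[:, k].copy()
        for i in range(k):
            R[i, k] = Q[:, i] @ A[:, k]   # r_{i,k+1} = q_i^T a_{k+1}
            v -= R[i, k] * Q[:, i]        # subtract projections
        R[k, k] = np.linalg.norm(v)       # normalization coefficient
        Q[:, k] = v / R[k, k]
    return Q, R

A = np.array([[1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
Q, R = gram_schmidt_qr(A)
```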
  Arnoldi decomposition: Gram-Schmidt applied to the Krylov subspace $K_k(A, r_0) = span\{r_0, A r_0, \cdots, A^{k-1} r_0\}$:

  1. $v_1 = {r_0 \over \left\| r_0 \right\|}$

  2. Suppose $\{v_1, v_2, \cdots, v_k\}$ has been constructed with $v_i \in K_k(A, r_0)$, $v_i \perp v_j$ for $i \neq j$, and $\left\| v_i \right\| = 1$.

  3. Construct $v_{k+1}$ satisfying: $\left\{ \begin{aligned} & v_{k+1} \in K_{k+1}(A, r_0) \\ & v_{k+1} \perp v_i, \; i = 1, 2, \cdots, k \\ & \left\| v_{k+1} \right\| = 1 \end{aligned} \right.$
     $v_{k+1} \in K_{k+1}(A, r_0) = K_k(A, r_0) \oplus span\{A v_k\} = span\{v_1, v_2, \cdots, v_{k+1}\}$
     $A v_k = \sum_{i=1}^{k+1} h_{i,k} v_i \Rightarrow v_i^T A v_k = h_{i,k} v_i^T v_i$
     $v_{k+1} = \dfrac{A v_k - \sum_{i=1}^{k} h_{i,k} v_i}{\left\| A v_k - \sum_{i=1}^{k} h_{i,k} v_i \right\|}$, where $h_{i,k} = v_i^T A v_k$, $i = 1, 2, 3, \cdots, k$.
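The Arnoldi process above can be sketched as follows (a minimal version that assumes no breakdown, i.e. the norm in the denominator never vanishes; the coefficients $h_{i,k}$ fill an upper Hessenberg matrix $H$ satisfying $A V_k = V_{k+1} H$):

```python
import numpy as np

def arnoldi(A, r0, k):
    """k Arnoldi steps: V has orthonormal columns spanning K_{k+1}(A, r0),
    H is (k+1) x k upper Hessenberg with A @ V[:, :k] == V @ H."""
    n = len(r0)
    V = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    V[:, 0] = r0 / np.linalg.norm(r0)
    for j in range(k):
        w = A @ V[:, j]
        for i in range(j + 1):
            H[i, j] = V[:, i] @ w        # h_{i,j} = v_i^T A v_j
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)  # assumes no breakdown
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
r0 = rng.standard_normal(6)
V, H = arnoldi(A, r0, 3)
```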

- A-orthogonality
  $\left\| X \right\|_A = \sqrt{X^T A X}$ (for $A$ symmetric positive definite)
  Given vectors $a_1, a_2, \cdots, a_n$, find $q_1, q_2, \cdots, q_n$ such that
  $span\{a_1, a_2, \cdots, a_n\} = span\{q_1, q_2, \cdots, q_n\}$
  and $q_1, q_2, \cdots, q_n$ are A-orthogonal, i.e. $q_i^T A q_j = \left\{\begin{matrix} \neq 0, & i = j \\ 0, & i \neq j \end{matrix}\right.$
  The construction mirrors Gram-Schmidt above and is omitted here.

- Generalized (Moore-Penrose) inverse
  Given the reduced SVD $A = U_1 \Sigma_1 V_1^H \in C^{m \times n}$, define $A^+ = V_1 \Sigma_1^{-1} U_1^H$. It satisfies:

  1. $A^+ A A^+ = A^+$
  2. $A A^+ A = A$
  3. $A A^+$ and $A^+ A$ are both Hermitian
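The definition $A^+ = V_1 \Sigma_1^{-1} U_1^H$ can be checked numerically; a sketch that builds the pseudoinverse from the reduced SVD and verifies the conditions above (the tolerance for the numerical rank is an arbitrary choice):

```python
import numpy as np

def pinv_svd(A, tol=1e-12):
    """Moore-Penrose pseudoinverse via the reduced SVD: A+ = V1 S1^{-1} U1^H."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    r = int(np.sum(s > tol))  # numerical rank
    return Vh[:r].conj().T @ np.diag(1.0 / s[:r]) @ U[:, :r].conj().T

A = np.array([[1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
Ap = pinv_svd(A)
```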
- Orthogonal projection
  Let $p = U_1 U_1^H$ be the projection onto the subspace $W = span\{U_1\}$, where $U_1$ has orthonormal columns. Then:

  1. $p^H = p$
  2. $p^2 = p$
  3. $\forall x$, $px \in W$ and $(px)^H (x - px) = 0$
- Subspaces

  - Distance between subspaces

    - Distance from a point, or from $span\{x\}$, to a subspace: see orthogonal projection above.
    - Distance between two planes, or between subspaces of the same dimension, with orthonormal bases $Z$ and $Y$:
      $dist(\mathfrak{X}, \mathfrak{Y}) = \sqrt{1 - \sigma_{\min}^2(Z^H Y)}$
  - Krylov subspace
    $K_k(A, r_0) = span\{r_0, A r_0, \cdots, A^{k-1} r_0\}$

Matrix Decomposition

Throughout, $A \in C^{n \times n}$, $B \in C^{m \times n}$, and $U, V$ are unitary.

QR Decomposition

$A = [a_1, a_2, \cdots, a_n] = Q \cdot R = [q_1, q_2, \cdots, q_n] \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ 0 & r_{22} & \cdots & r_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & r_{nn} \end{bmatrix}$

LU Decomposition

$A = L \cdot U$
where $L$ is lower triangular and $U$ is upper triangular.
$Ax = b \Rightarrow (LU)x = b \Rightarrow \left\{\begin{matrix} Ly = b \\ Ux = y \end{matrix}\right.$
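The two-triangular-solve pattern can be sketched as follows (a minimal Doolittle factorization without pivoting, so it assumes all pivots are nonzero; production code would pivot):

```python
import numpy as np

def lu_nopivot(A):
    """Doolittle LU factorization without pivoting (assumes nonzero pivots)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L, U = np.eye(n), A.copy()
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]
            U[i, k:] -= L[i, k] * U[k, k:]
    return L, np.triu(U)

def lu_solve(L, U, b):
    """Solve Ax = b via the two triangular systems Ly = b, then Ux = y."""
    n = len(b)
    y = np.zeros(n)
    for i in range(n):                    # forward substitution
        y[i] = b[i] - L[i, :i] @ y[:i]
    x = np.zeros(n)
    for i in reversed(range(n)):          # back substitution
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

A = np.array([[4.0, 3.0], [6.0, 3.0]])
b = np.array([10.0, 12.0])
L, U = lu_nopivot(A)
x = lu_solve(L, U, b)
```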

Schur Decomposition

$A = U R U^H$
where $U$ is unitary and $R$ is upper triangular.
If $A$ is Hermitian, then $R$ is diagonal, i.e. $A = U \Lambda U^H$.

Singular Value Decomposition

$B = U \Sigma V^H$
If $rank(B) = r$, then
$B = (u_1, u_2, \cdots, u_m) \begin{pmatrix} \sigma_1 & & & & & \\ & \ddots & & & & \\ & & \sigma_r & & & \\ & & & 0 & & \\ & & & & \ddots & \\ & & & & & 0 \end{pmatrix}_{m \times n} (v_1, v_2, \cdots, v_n)^H = U_1 \Sigma_1 V_1^H$
where $U_1 = (u_1, u_2, \cdots, u_r)$, $V_1 = (v_1, v_2, \cdots, v_r)$.

From $B = U \Sigma V^H$ we obtain
$B v_i = U \Sigma \begin{pmatrix} v_1^H \\ \vdots \\ v_n^H \end{pmatrix} v_i = \sigma_i u_i$
$u_i^H B = u_i^H (u_1, u_2, \cdots, u_m) \Sigma V^H = \sigma_i v_i^H$

Question to ponder: under what conditions do the singular values coincide with the eigenvalues?

$B^H B = V \Sigma^H U^H U \Sigma V^H = V \Sigma^H \Sigma V^H$
$B B^H = U \Sigma V^H V \Sigma^H U^H = U \Sigma \Sigma^H U^H$
So the eigenvalues and eigenvectors of these two matrices yield the singular values and singular vectors of the original matrix.
Worked example: $W = \begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}$
$W^H W = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$
Eigenvalues: $3, 1$
Eigenvectors: $v_1 = \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}$, $v_2 = \begin{bmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \end{bmatrix}$
$W W^H = \begin{bmatrix} 2 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}$
Eigenvalues: $3, 1, 0$
Eigenvectors: $u_1 = \begin{bmatrix} 2/\sqrt{6} \\ 1/\sqrt{6} \\ 1/\sqrt{6} \end{bmatrix}$, $u_2 = \begin{bmatrix} 0 \\ -1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}$, $u_3 = \begin{bmatrix} -1/\sqrt{3} \\ 1/\sqrt{3} \\ 1/\sqrt{3} \end{bmatrix}$
Therefore $W = (u_1, u_2, u_3) \begin{bmatrix} \sqrt{3} & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} (v_1, v_2)^H$
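The worked example can be verified numerically: the singular values of $W$ should be $\sqrt{3}$ and $1$, the square roots of the eigenvalues of $W^H W$:

```python
import numpy as np

W = np.array([[1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
U, s, Vh = np.linalg.svd(W, full_matrices=True)
eigvals = np.linalg.eigvalsh(W.T @ W)  # ascending order: 1, 3
```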

Householder Transformation

Goal: find $H$ with $Hx = y$, where $\left\| x \right\| = \left\| y \right\|$, subject to: 1) $H^H = H$, 2) $H$ is unitary.
(Figure: householder.png, the reflection geometry.)
$y = x - 2 \cdot \dfrac{x - y}{2}$
Since $\left\| x \right\| = \left\| y \right\|$, the vectors $x, y, x - y$ form an isosceles triangle.
So by orthogonal projection, $\dfrac{x - y}{2} = px$, where $p = u u^H$ is the projection onto $span\{x - y\}$ and $u = \dfrac{x - y}{\left\| x - y \right\|}$.
$\therefore H = I - 2 u u^H$, which clearly satisfies both requirements.
Householder transformations can be used to introduce zeros into a matrix:
$\begin{bmatrix} * & * & * & * \\ * & * & * & * \\ 0 & * & * & * \\ 0 & 0 & * & * \end{bmatrix} \xRightarrow{H} \begin{bmatrix} * & * & * & * \\ 0 & * & * & * \\ 0 & 0 & * & * \\ 0 & 0 & 0 & * \end{bmatrix}$
Note that the transformation may act on a chosen subset of rows and columns:
$\begin{bmatrix} * & 0 & 0 \\ * & * & 0 \\ * & * & * \\ * & * & * \\ * & * & * \\ * & * & * \end{bmatrix} \xRightarrow{H} \begin{bmatrix} * & 0 & 0 \\ * & * & 0 \\ * & * & * \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$
The Householder transformation can also be used to compute the QR decomposition.
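A sketch of Householder QR in the real case (each reflector $H = I - 2uu^T$ zeroes one column below the diagonal; the sign choice avoids cancellation):

```python
import numpy as np

def householder_qr(A):
    """QR via Householder reflectors H = I - 2 u u^T (real case sketch)."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    Q, R = np.eye(m), A.copy()
    for k in range(min(m - 1, n)):
        x = R[k:, k]
        v = x.copy()
        # reflect x onto +-||x|| e1; sign chosen to avoid cancellation
        v[0] += np.sign(x[0] if x[0] != 0 else 1.0) * np.linalg.norm(x)
        u = v / np.linalg.norm(v)
        H = np.eye(m)
        H[k:, k:] -= 2.0 * np.outer(u, u)
        R = H @ R
        Q = Q @ H  # H is symmetric and orthogonal, so Q accumulates H's
    return Q, np.triu(R)

A = np.array([[2.0, 1.0], [1.0, 3.0], [0.0, 1.0]])
Q, R = householder_qr(A)
```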

Givens Rotation Transformation

$\begin{bmatrix} C & S \\ -S & C \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} \sqrt{a^2 + b^2} \\ 0 \end{bmatrix}$, where $C = \dfrac{a}{\sqrt{a^2 + b^2}}$, $S = \dfrac{b}{\sqrt{a^2 + b^2}}$.

Givens rotations can likewise be used to compute the QR decomposition.
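A minimal sketch of the rotation above (the guard for $a = b = 0$ is an arbitrary but common convention):

```python
import numpy as np

def givens(a, b):
    """Rotation [[c, s], [-s, c]] sending (a, b) to (sqrt(a^2 + b^2), 0)."""
    r = np.hypot(a, b)
    return (1.0, 0.0) if r == 0 else (a / r, b / r)

a, b = 3.0, 4.0
c, s = givens(a, b)
G = np.array([[c, s], [-s, c]])
out = G @ np.array([a, b])
```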

Application

- Orthonormal bases

  - $R(A) = \{Ax \mid x \in R^n\}$
    $\because Ax = U_1 \Sigma_1 V_1^H x = U_1 z$
    $\therefore R(A) = span\{U_1\}$, and $R(A^H) = span\{V_1\}$
  - $N(A) = \{x \mid Ax = 0\}$
    $\because Ax = U_1 \Sigma_1 V_1^H x = 0 \Rightarrow V_1^H x = 0$
    $\therefore N(A) = span\{V_2\}$, and $N(A^H) = span\{U_2\}$
- Low-rank approximation
  Given $rank(A) = r$ and $d < r$, solve $\min_{rank(X) = d} \left\| A - X \right\|_2$; by the Eckart-Young theorem the minimizer is the truncated SVD $X = \sum_{i=1}^{d} \sigma_i u_i v_i^H$, with optimal value $\sigma_{d+1}$.

- Least squares
  $\min_x \left\| Ax - b \right\|_2 \Rightarrow \min_{y \in R(A)} \left\| b - y \right\|_2$
  The general solution of $Ax = A A^+ b$ is a particular solution plus the homogeneous solution: the particular solution is $A^+ b$, and the homogeneous solutions are $\sigma = (I - A^+ A) z \in N(A)$.
  Therefore $x = A^+ b + (I - A^+ A) z$.
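The particular solution $A^+ b$ can be computed directly; a sketch comparing it with NumPy's least-squares routine (the example system is arbitrary and happens to be consistent):

```python
import numpy as np

# Minimum-norm least-squares solution x = A+ b for an overdetermined system.
A = np.array([[1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
b = np.array([2.0, 1.0, 1.0])

x_pinv = np.linalg.pinv(A) @ b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
```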

Matrix Differential

Derivatives and differentials

- Derivative of a scalar with respect to a vector
  For a scalar $f$ and $\vec x_{(n \times 1)}$:
  $\frac{\partial f}{\partial \vec x} = \left[\frac{\partial f}{\partial x_i}\right]$

- Derivative of a scalar with respect to a matrix
  For a scalar $f$ and $X_{(m \times n)}$:
  $\frac{\partial f}{\partial X} = \left[\frac{\partial f}{\partial x_{ij}}\right]$
  For scalar-to-scalar functions, the gradient and the differential are related by:
  $df = f'(x) dx$
  $df = \sum_i \frac{\partial f}{\partial x_i} dx_i = \frac{\partial f}{\partial \vec x}^T d\vec x$
  The analogous identity for matrices is:
  $df = \sum_{ij} \frac{\partial f}{\partial x_{ij}} dx_{ij} = tr\left(\frac{\partial f}{\partial X}^T dX\right)$
  Example 1: given $f = |X|$, find $df$.
  We know
  $|X| = \sum_i x_{ij} A_{ij}$ (expansion along a column; $A_{ij}$ is the cofactor)
  Substituting into the definition:
  $\frac{\partial f}{\partial X} = \left[\frac{\partial \sum_k x_{kj} A_{kj}}{\partial x_{ij}}\right] = \left[A_{ij}\right] = (X^*)^T$
  Therefore
  $df = tr\left(\frac{\partial f}{\partial X}^T dX\right) = tr(X^* dX) = |X| \, tr(X^{-1} dX)$
  Example 2: find $dX^{-1}$.
  We know
  $X X^{-1} = E$
  Differentiating both sides:
  $d(X X^{-1}) = dE = 0 \Rightarrow dX \cdot X^{-1} + X \, dX^{-1} = 0 \Rightarrow X \, dX^{-1} = -dX \cdot X^{-1}$
  Therefore
  $dX^{-1} = -X^{-1} dX X^{-1}$
  Example 3: $f = \vec a^T X \vec b$, find $\frac{\partial f}{\partial X}$.
  $df = \vec a^T dX \vec b = tr(\vec a^T dX \vec b) = tr(\vec b \vec a^T dX) = tr\left(\frac{\partial f}{\partial X}^T dX\right)$
  Therefore $\frac{\partial f}{\partial X} = \vec a \vec b^T$.

- Composition rule
  Given $f = g(Y)$ and $Y = h(X)$, with $h$ applied element-wise, how do we find $\frac{\partial f}{\partial X}$?
  $df = tr\left(\frac{\partial f}{\partial Y}^T dY\right) = tr\left(\frac{\partial f}{\partial Y}^T (h'(X) \odot dX)\right) = tr\left(\left(\frac{\partial f}{\partial Y} \odot h'(X)\right)^T dX\right)$
  Example 4: $loss = -\vec y^T \log \, softmax(W \vec x)$, find $\frac{\partial \, loss}{\partial W}$; here $\vec y$ is a one-hot vector.
  $softmax(\vec x) = \frac{e^{\vec x}}{\vec 1^T e^{\vec x}}$
  $loss = -\vec y^T W \vec x + (\vec y^T \vec 1) \log(\vec 1^T e^{W \vec x}) \tag{*}$
  $d \, loss = -\vec y^T dW \vec x + \frac{\vec 1^T (e^{W \vec x} \odot dW \vec x)}{\vec 1^T e^{W \vec x}} \tag{**}$
  $d \, loss = -\vec y^T dW \vec x + \frac{(e^{W \vec x})^T dW \vec x}{\vec 1^T e^{W \vec x}}$
  $d \, loss = tr\left(-\vec y^T dW \vec x + \frac{(e^{W \vec x})^T dW \vec x}{\vec 1^T e^{W \vec x}}\right)$
  $d \, loss = tr\left(\vec x \, (softmax(W \vec x) - \vec y)^T dW\right)$
  $\frac{\partial \, loss}{\partial W} = (softmax(W \vec x) - \vec y) \, \vec x^T$
  Notes:
  In (*): $\log\left(\frac{\vec b}{c}\right) = \log \vec b - \vec 1 \log c$, and $\vec y^T \vec 1 = 1$.
  In (**): $\log(\vec 1^T e^{W \vec x})$ is a scalar and $e^{W \vec x}$ is element-wise, so $d \log(\vec 1^T e^{W \vec x}) = \frac{1}{\vec 1^T e^{W \vec x}} \cdot \vec 1^T (e^{W \vec x} \odot dW \vec x)$.
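The closed-form gradient of Example 4 can be validated against central finite differences (a sketch; the random sizes and seed are arbitrary):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def loss(W, x, y):
    return -y @ np.log(softmax(W @ x))

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))
x = rng.standard_normal(4)
y = np.array([0.0, 1.0, 0.0])  # one-hot label

# analytic gradient: (softmax(Wx) - y) x^T
grad_analytic = np.outer(softmax(W @ x) - y, x)

# central finite differences
grad_numeric = np.zeros_like(W)
eps = 1e-6
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        grad_numeric[i, j] = (loss(Wp, x, y) - loss(Wm, x, y)) / (2 * eps)
```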

- Derivative of a vector with respect to a vector
  For $\vec f_{(m \times 1)}$ and $\vec x_{(n \times 1)}$:
  $\frac{\partial \vec f}{\partial \vec x} = \left[\frac{\partial f_i}{\partial x_j}\right]_{(n \times m)}$
  $d\vec f = \left[\frac{\partial f_i}{\partial \vec x}^T\right] d\vec x = \frac{\partial \vec f}{\partial \vec x}^T d\vec x$

- Derivative of a matrix with respect to a matrix
  For matrices $F_{(m \times n)}$ and $X_{(p \times q)}$:
  $\frac{\partial F}{\partial X} = \left[\frac{\partial F_{ij}}{\partial x_{kl}}\right]_{(pq \times mn)}$
  Vectorization:
  For $X_{(p \times q)}$, $vec(X)_{(pq \times 1)} = \left[X_1^T, X_2^T, \ldots, X_q^T\right]^T$, where $X_i$ is the $i$-th column of $X$.
  $vec(A + B) = vec(A) + vec(B)$
  $vec(\vec a \vec b^T) = \vec b \otimes \vec a$
  $X = \sum_i X_i e_i^T$
  $(AB) \otimes (CD) = (A \otimes C)(B \otimes D)$
  $vec(AXB) = vec\left(\sum_i A X_i e_i^T B\right) = \sum_i vec\left((A X_i)(B^T e_i)^T\right) = \sum_i (B^T e_i) \otimes (A X_i) = (B^T \otimes A) \, vec(X)$
  Therefore:
  $\frac{\partial F}{\partial X} = \frac{\partial vec(F)}{\partial vec(X)}_{(pq \times mn)}$
  $vec(dF) = \frac{\partial F}{\partial X}^T vec(dX)$
  Vectorizing before differentiating partly destroys the matrix structure, which can make the results look complicated; the benefit is that the multivariate-calculus results on gradients and Hessians carry over directly once matrices are vectorized.

Applications

1. Hold on to the two conversion identities, $df = tr\left(\frac{\partial f}{\partial X}^T dX\right)$ and $d(trace(f(X))) = trace(df(X))$, together with the definitions; with those, nearly every derivative can be computed.

2. Taylor expansion

3. Optimization problems

   - Least squares:

     - $\min_x \left\| Ax - b \right\|_2^2$

     - $\min_x \left\| Ax - b \right\|_2^2 + \lambda \left\| x \right\|_2^2$

   - Constrained problems:

     - $\min_x \left\| Ax \right\|_2^2$, s.t. $e^T x = 1$, or s.t. $\left\| x \right\|_2 = 1$
     - $\min_U tr(U^T A U)$, s.t. $U^H U = I$, with $A$ positive semidefinite
     - $\min_X \left[tr(X^T X) - 2 tr(X)\right]$, s.t. $XA = \mathbf{0}$
   - Locally Linear Embedding
     Given data points $x_i \in R^n$ and their neighbors $x_{i1}, x_{i2}, \cdots, x_{ik}$, reduce each $x_i$ to $y_i \in R^d$.
     LLE assumes the data is locally linear, $x_i = \sum_j w_{ij} x_{ij}$, and that both this local linearity and the combination weights are preserved under the dimensionality reduction.
     Objective: $\arg \min_Y \sum_{i=1}^N \left\| y_i - \sum_{j=1}^k w_{ij} y_{ij} \right\|_2^2$, s.t. $Y Y^T = N I$

4. Total least squares
   $(A + \Delta A)x = b + \Delta b \Rightarrow \left([A, b] + [\Delta A, \Delta b]\right) \begin{bmatrix} x \\ -1 \end{bmatrix} = 0$
   Let $B = [A, b]$, $D = [\Delta A, \Delta b]$, $Z = \begin{bmatrix} x \\ -1 \end{bmatrix}$.
   The approximate solution $x$ is then found by solving: $\min_{D,x} \left\| D \right\|_F^2$, s.t. $(B + D)Z = 0$

   - If $B$ is not of full column rank, there exists $Z \neq 0$ with $BZ = 0$, so $\left\| D \right\|_F^2$ is minimized by $D = 0$.
   - If $B$ has full column rank, then for $(B + D)Z = 0$ to have a nonzero solution we need $r(B + D) \le n$.
     - If the singular values of $B$ satisfy $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge \sigma_{n+1}$ (the smallest being simple):
       $\left\| D \right\|_F^2$ is minimized by $D = -\sigma_{n+1} u_{n+1} v_{n+1}^T$,
       $N(B + D) = span\{v_{n+1}\}$,
       $v_{n+1} = (v_{n+1,1}, v_{n+1,2}, \cdots, v_{n+1,n+1})^T$,
       and therefore $\begin{pmatrix} x \\ -1 \end{pmatrix} = \begin{pmatrix} -\frac{v_{n+1,1}}{v_{n+1,n+1}} \\ -\frac{v_{n+1,2}}{v_{n+1,n+1}} \\ \vdots \\ -1 \end{pmatrix}$
     - If the singular values of $B$ satisfy $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_p \ge \sigma_{p+1} = \cdots = \sigma_{n+1}$:
       $N(B + D) = span\{v_{p+1}, \cdots, v_{n+1}\}$
       $V_1 = (v_{p+1}, \cdots, v_{n+1})$
       Apply a Householder transformation:
       $H V_1 = \begin{pmatrix} \hat v_{n+1} \\ 0 \\ \vdots \\ 0 \end{pmatrix}$
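The simple-smallest-singular-value case translates into a few lines of code; a sketch that recovers the TLS solution from the last right singular vector of $B = [A, b]$ (noise-free data is used so the recovery is exact; it assumes the last component of that vector is nonzero):

```python
import numpy as np

def tls(A, b):
    """Total least squares via the SVD of B = [A, b]:
    x_i = -v_{n+1,i} / v_{n+1,n+1} (assumes that last component is nonzero)."""
    B = np.column_stack([A, b])
    _, _, Vh = np.linalg.svd(B)
    v = Vh[-1]  # right singular vector of the smallest singular value
    return -v[:-1] / v[-1]

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 3))
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true  # noise-free, so TLS should recover x_true exactly
x = tls(A, b)
```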

Matrix Equation

Splitting iterations

Equation: $Ax = b$
$A = M - N \Rightarrow Mx = Nx + b$
Iteration: $M x_k = N x_{k-1} + b$
$x_k = M^{-1}(N x_{k-1} + b)$
Convergence requires: $\rho(M^{-1} N) < 1$
If $A$ is Hermitian positive definite, then $\rho(M^{-1} N) < 1$ if and only if $M + N^H$ is positive definite.

- Jacobi iteration
  $A = D - L - U$ ($D$ diagonal, $-L$/$-U$ the strictly lower/upper triangular parts)
  $M = D$, $N = L + U = D - A$
  $x_k = D^{-1}((L + U)x_{k-1} + b) = (I - D^{-1}A)x_{k-1} + D^{-1}b$
  Convergence requires: $\rho(I - D^{-1}A) < 1$
  Suitable when $A$ is diagonally dominant.

- Gauss-Seidel iteration
  $A = D - L - U$
  $M = D - L$, $N = U$
  $x_k = (D - L)^{-1}(U x_{k-1} + b)$
  Convergence requires: $\rho((D - L)^{-1}U) < 1$
  Suitable when the entries are concentrated in the lower triangle.

- SOR (successive over-relaxation)
  $M = \frac{1}{\omega}D - L$, $N = \left(\frac{1}{\omega} - 1\right)D + U$
  $(D - \omega L)x_k = ((1 - \omega)D + \omega U)x_{k-1} + \omega b$
  Convergence requires: $\rho\left((D - \omega L)^{-1}((1 - \omega)D + \omega U)\right) < 1$, with $0 < \omega < 2$
  Suitable when the lower triangle dominates.

- SSOR (symmetric SOR)
  Alternate between the two sweeps:

  1. $M = \frac{1}{\omega}D - L$, $N = \left(\frac{1}{\omega} - 1\right)D + U$
  2. $M = \frac{1}{\omega}D - U$, $N = \left(\frac{1}{\omega} - 1\right)D + L$
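All of the splittings above fit one generic loop $x_k = M^{-1}(N x_{k-1} + b)$ with $N = M - A$; a sketch instantiating Jacobi ($M = D$) and Gauss-Seidel ($M = D - L$) on a diagonally dominant system (the iteration count is an arbitrary choice, large enough to converge here):

```python
import numpy as np

def splitting_solve(A, b, M, n_iter=200):
    """Generic splitting iteration x_k = M^{-1}(N x_{k-1} + b), N = M - A."""
    N = M - A
    x = np.zeros_like(b)
    for _ in range(n_iter):
        x = np.linalg.solve(M, N @ x + b)
    return x

# diagonally dominant system, so both iterations converge
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 5.0, 2.0],
              [0.0, 2.0, 6.0]])
b = np.array([1.0, 2.0, 3.0])

x_jacobi = splitting_solve(A, b, M=np.diag(np.diag(A)))  # Jacobi: M = D
x_gs = splitting_solve(A, b, M=np.tril(A))               # Gauss-Seidel: M = D - L
```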

最速下降

A x = b    ⟺    min ⁡ x ψ ( x ) = 1 2 ( x , x ) A − ( b , x ) Ax=b \iff \min_x \psi(x)={1\over 2}(x,x)_A-(b,x) Ax=bxminψ(x)=21(x,x)A(b,x)
A对称正定
迭代式: x k + 1 = x k + α k r k , r k = b − A x k = − ∇ ψ ( x k ) , α k = arg ⁡ min ⁡ α ∥ x k + α r k − x ∗ ∥ x_{k+1}=x_k + \alpha_k r_k, r_k=b-Ax_k=-\nabla \psi(x_k), \alpha_k=\arg \min_\alpha \left\| x_k + \alpha r_k -x^{*} \right\| xk+1=xk+αkrk,rk=bAxk=ψ(xk),αk=argminαxk+αrkx
α \alpha α只需满足下式:
d ψ ( x k + α r k ) d α = 0 \frac{d\psi (x_k + \alpha r_k)}{d \alpha}=0 dαdψ(xk+αrk)=0
ψ ( x k + α r k ) = ψ ( x k ) + α ( − r k , r k ) + α 2 2 ( A r k , r k ) \psi (x_k + \alpha r_k)=\psi(x_k)+\alpha(-r_k,r_k)+\frac{\alpha^2}{2}(Ar_k,r_k) ψ(xk+αrk)=ψ(xk)+α(rk,rk)+2α2(Ark,rk)
( − r k , r k ) + α ( A r k , r k ) = 0 ⇒ α = ∥ r k ∥ 2 2 ∥ r k ∥ A 2 (-r_k,r_k)+\alpha(Ar_k,r_k)=0 \Rightarrow \alpha = \frac{\left\|r_k\right\|_2^2}{\left\|r_k\right\|_A^2} (rk,rk)+α(Ark,rk)=0α=rkA2rk22
Convergence analysis:
$\|x_{k+1}-x^*\|_A = \min_\alpha \|x_k + \alpha r_k - x^*\|_A = \min_\alpha \|x_k + \alpha A(x^*-x_k) - x^*\|_A$
$= \min_\alpha \|(I-\alpha A)(x_k - x^*)\|_A$
$\le \min_\alpha \rho(I-\alpha A)\,\|x_k - x^*\|_A$
$\le \frac{\lambda_n-\lambda_1}{\lambda_n+\lambda_1}\|x_k - x^*\|_A$ (taking $\alpha = \frac{2}{\lambda_1+\lambda_n}$)
$\le \Big(\frac{\lambda_n-\lambda_1}{\lambda_n+\lambda_1}\Big)^{k+1}\|x_0 - x^*\|_A$
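The iteration above can be sketched as follows (the small SPD test system is an illustrative assumption):

```python
import numpy as np

def steepest_descent(A, b, x0, max_iter=5000, tol=1e-10):
    """x_{k+1} = x_k + alpha_k r_k with alpha_k = ||r_k||_2^2 / ||r_k||_A^2."""
    x = x0.astype(float).copy()
    for _ in range(max_iter):
        r = b - A @ x                       # r_k = -grad psi(x_k)
        rr = r @ r
        if np.sqrt(rr) < tol:
            break
        x = x + (rr / (r @ (A @ r))) * r    # exact line search along r_k
    return x

A = np.array([[3., 1.], [1., 2.]])          # symmetric positive definite
b = np.array([1., 1.])
x = steepest_descent(A, b, np.zeros(2))
```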

Subspace Iteration

$A \in \mathbb{C}^{n\times n}$
$Ax=b \iff x = A^{-1}b \iff A(x^*-x_0) = b - Ax_0 = r_0 \iff x^* = x_0 + A^{-1}r_0$
By the Cayley–Hamilton theorem, $A^{-1}$ is a polynomial in $A$, so $A^{-1}r_0 = \sum_{i=0}^{m} c_i A^i r_0$.
Hence $x^* \in x_0 + \mathcal{K}_{m+1}(A, r_0)$, where $\mathcal{K}_{m+1}(A, r_0) = \mathrm{span}\{r_0, Ar_0, \cdots, A^m r_0\}$ is a Krylov subspace.
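A quick numerical sanity check that $A^{-1}r_0$ really lies in the Krylov subspace: expand it in the basis $\{r_0, Ar_0, \cdots, A^{n-1}r_0\}$ by least squares (the random test matrix is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n)) + n * np.eye(n)   # generic nonsingular matrix
r0 = rng.standard_normal(n)

# Krylov basis [r0, A r0, ..., A^{n-1} r0] as columns
K = np.column_stack([np.linalg.matrix_power(A, i) @ r0 for i in range(n)])

y = np.linalg.solve(A, r0)                   # A^{-1} r0, computed directly
c, *_ = np.linalg.lstsq(K, y, rcond=None)    # expand it in the Krylov basis
err = np.linalg.norm(K @ c - y)              # ~0: y is in the span
```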

Conjugate Gradient (CG)

$x^* \in x_0 + \mathcal{K}_{m+1}(A, r_0)$
Let $g_0, g_1, \cdots, g_m$ be an $A$-orthonormal basis of $\mathcal{K}_{m+1}(A, r_0)$; then
$x^* = x_0 + \sum_{j=0}^{m}\alpha_j g_j$
Writing $x_k = x_0 + \sum_{j=0}^{k-1}\alpha_j g_j$ gives $x_{k+1} = x_k + \alpha_k g_k$.
How do we determine the coefficients $\alpha_k$ and the orthogonal basis vectors $g_k$?

  1. $\alpha_k$
    Clearly $x_{m+1} = x^*$, so $r_{m+1} = b - Ax_{m+1} = b - Ax^* = 0$.
    $\begin{aligned} r_{m+1} &= b - Ax_{m+1} \\ &= b - A(x_m + \alpha_m g_m) \\ &= r_m - \alpha_m A g_m \\ &= b - A(x_{m-1} + \alpha_{m-1} g_{m-1}) - \alpha_m A g_m \\ &= r_{m-1} - \alpha_{m-1} A g_{m-1} - \alpha_m A g_m \\ &= \cdots \\ &= r_k - \alpha_k A g_k - \alpha_{k+1} A g_{k+1} - \cdots - \alpha_m A g_m \\ &= 0 \end{aligned}$
    $g_k^T r_k = \alpha_k g_k^T A g_k + \alpha_{k+1} g_k^T A g_{k+1} + \cdots + \alpha_m g_k^T A g_m = \alpha_k g_k^T A g_k$ (the cross terms vanish by $A$-orthogonality)
    $\therefore \alpha_k = \frac{g_k^T r_k}{g_k^T A g_k}$
  2. $g_k$
    Gram–Schmidt:
    1. $g_0 = r_0$
    2. Suppose $g_0, g_1, \cdots, g_{k-1}$ are known, with $\mathrm{span}\{g_0, g_1, \cdots, g_{k-1}\} = \mathrm{span}\{r_0, Ar_0, \cdots, A^{k-1}r_0\}$.
    3. For $r_k$:
      $\begin{aligned} r_k &= b - Ax_k \\ &= b - A(x_{k-1} + \alpha_{k-1}g_{k-1}) \\ &= r_{k-1} - \alpha_{k-1}Ag_{k-1} \\ &= \cdots \\ &= r_0 - \sum_{j=0}^{k-1}\alpha_j A g_j \in \mathcal{K}_{k+1}(A, r_0) \end{aligned}$
      Also $g_k \in \mathcal{K}_{k+1}(A, r_0)$, so $r_k = \beta_0 g_0 + \beta_1 g_1 + \cdots + \beta_k g_k$.
      $\therefore g_i^T A r_k = \beta_i g_i^T A g_i$, $i = 0, 1, \cdots, k$
      $\therefore g_k$ can be computed.

CG converges faster than steepest descent.
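The derivation above leads to the standard CG recurrence; a minimal sketch (the SPD test system is an illustrative assumption):

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10):
    """CG for SPD A: search directions g_k are pairwise A-orthogonal."""
    n = len(b)
    x = np.zeros(n)
    r = b - A @ x
    g = r.copy()                     # g_0 = r_0
    rr = r @ r
    for _ in range(n):               # exact in at most n steps (in exact arithmetic)
        if np.sqrt(rr) < tol:
            break
        Ag = A @ g
        alpha = rr / (g @ Ag)        # alpha_k = g_k^T r_k / g_k^T A g_k (g_k^T r_k = r_k^T r_k)
        x += alpha * g
        r -= alpha * Ag
        rr_new = r @ r
        g = r + (rr_new / rr) * g    # next direction, A-orthogonal to g_k
        rr = rr_new
    return x

A = np.array([[4., 1., 0.], [1., 3., 1.], [0., 1., 2.]])  # SPD
b = np.array([1., 2., 3.])
x = conjugate_gradient(A, b)
```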

Nonsymmetric $A$

  • Naive approach
    Convert $Ax=b$ into the normal equations $A^TAx = A^Tb$; $A^TA$ is symmetric positive definite when $A$ is nonsingular.
    Convergence is far slower than when $A$ itself is positive definite, since the condition number is squared: $\kappa(A^TA) = \kappa(A)^2$.
  • Generalized Minimal Residual method (GMRES)
    The Arnoldi process above gives an orthonormal basis $v_1, v_2, \cdots, v_k$ of $\mathcal{K}_k(A, r_0)$.
    Let $V_k = (v_1, v_2, \cdots, v_k)$; then $AV_k = V_kH_k + h_{k+1,k}v_{k+1}e_k^T = V_{k+1}\hat{H}_k$.
    $b - Ax_k = b - A(x_0 + V_ky) = r_0 - V_{k+1}\hat{H}_ky = V_{k+1}(\|r_0\|e_1 - \hat{H}_ky)$
    $\therefore \|b - Ax_k\|_2 = \big\|\|r_0\|e_1 - \hat{H}_ky\big\|_2$
    $y = \arg\min_y \big\|\|r_0\|e_1 - \hat{H}_ky\big\|_2$
    This least-squares problem can be solved by:
    1. ordinary least squares (normal equations), or
    2. Givens rotations, i.e. a QR factorization of $\hat{H}_k$.
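A compact sketch of the steps above, using `lstsq` in place of a Givens-based QR for the small least-squares problem (the random nonsymmetric test system is an illustrative assumption):

```python
import numpy as np

def gmres(A, b, x0, k):
    """GMRES(k): Arnoldi on K_k(A, r0), then min ||  ||r0|| e1 - Hhat_k y ||_2."""
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    n = len(b)
    V = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))                  # Hhat_k, upper Hessenberg
    V[:, 0] = r0 / beta
    for j in range(k):
        w = A @ V[:, j]
        for i in range(j + 1):                # modified Gram-Schmidt step
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:               # "happy breakdown": exact solve
            break
        V[:, j + 1] = w / H[j + 1, j]
    e1 = np.zeros(k + 1)
    e1[0] = beta                              # ||r0|| e1
    y, *_ = np.linalg.lstsq(H, e1, rcond=None)
    return x0 + V[:, :k] @ y

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n)) + n * np.eye(n)   # nonsymmetric test matrix
b = rng.standard_normal(n)
x = gmres(A, b, np.zeros(n), k=n)                 # k = n gives the exact solution
```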

Convex Optimization

Subgradient

$g$ is a subgradient of $f$ at $x$ if:

  1. vector case: $f(y) \ge f(x) + g^T(y-x)$
  2. matrix case: $f(Y) \ge f(X) + \mathrm{trace}\big(g^T(Y-X)\big)$

Example: the subgradient of the nuclear norm, $\partial\|X\|_*$, where $X = U_1\Sigma_1V_1^T$ is the compact SVD of $X$.
$f(X) \ge f(0) + \mathrm{trace}(g^TX) \Rightarrow \sum\sigma_i \ge \mathrm{trace}(g^TX)$
$\mathrm{tr}(\Sigma_1) \ge \mathrm{tr}(g^TU_1\Sigma_1V_1^T) \Rightarrow \mathrm{tr}(V_1\Sigma_1V_1^T) \ge \mathrm{tr}(g^TU_1\Sigma_1V_1^T)$
$\therefore \partial\|X\|_* = \{U_1V_1^T + Y \mid U_1^TY = 0,\; YV_1 = 0,\; \|Y\|_2 \le 1\}$
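A quick numerical sanity check (the random test matrices are illustrative assumptions) that $G = U_1V_1^T$ satisfies the matrix subgradient inequality for the nuclear norm; here $X$ has full rank almost surely, so the $Y$ term can be taken as zero:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 3))                    # full rank almost surely
U, s, Vt = np.linalg.svd(X, full_matrices=False)
G = U @ Vt                                         # candidate subgradient U1 V1^T

def nuclear(M):
    """Nuclear norm: sum of singular values."""
    return np.linalg.svd(M, compute_uv=False).sum()

# Check f(Z) >= f(X) + trace(G^T (Z - X)) at random test points Z
ok = all(
    nuclear(Z) >= nuclear(X) + np.trace(G.T @ (Z - X)) - 1e-9
    for Z in (rng.standard_normal((4, 3)) for _ in range(200))
)
```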

Solving Eigenvalue Problems

