Matrix Theory

Outline

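  • Matrix calculus: a specialized notation for multivariable calculus, used especially when working over spaces of matrices
  • Inverse matrix
  • Matrix decompositions: eigendecomposition (also called spectral decomposition); LU decomposition; singular value decomposition (SVD); QR decomposition; Cholesky decomposition
  • Determinant: in Euclidean space, the determinant describes the effect of a linear transformation on "volume"
  • Eigenvector: $Av = \lambda v$, where $\lambda$ is an eigenvalue of $A$ and $v$ is a corresponding eigenvector. The set of all eigenvalues of $A$ is called the spectrum of $A$, written $\lambda(A)$
  • Trace: $\operatorname{tr}(\mathbf{A}) = \mathbf{A}_{1,1} + \cdots + \mathbf{A}_{n,n}$; the trace of a matrix equals the sum of its eigenvalues
  • Singular matrix: a square matrix whose determinant is zero. If $A$ is singular, at least one of its singular values is zero, because the absolute value of the determinant equals the product of the singular values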
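  • Orthogonal matrix: a square matrix whose row vectors and column vectors are orthonormal, so that its transpose is its inverse: $QQ^T = I$
  • Positive definite and positive semi-definite matrices: an $n \times n$ real symmetric matrix $M$ is positive definite if and only if $\mathbf{z}^T M \mathbf{z} > 0$ for every nonzero real vector $\mathbf{z}$, where $\mathbf{z}^T$ denotes the transpose of $\mathbf{z}$; it is positive semi-definite if $\mathbf{z}^T M \mathbf{z} \ge 0$ for every real vector $\mathbf{z}$
  • Adjugate matrix: if a matrix is invertible, its inverse and its adjugate differ only by a scalar factor
  • Hermitian matrix: a complex square matrix that equals its own conjugate transpose, $A = A^*$ (the conjugate transpose is obtained by transposing the matrix and then taking the complex conjugate of every entry, i.e. flipping the sign of the imaginary part)
  • Conjugate transpose (Hermitian transpose): $A^* = (\overline{A})^T = \overline{A^T}$, where $\overline{A}$ denotes the entrywise complex conjugate of $A$
  • Unitary matrix: a complex square matrix whose conjugate transpose is its inverse, $U^*U = UU^* = I_n$
  • Real symmetric matrix: a symmetric matrix whose entries are all real
  • Diagonal matrix: a matrix whose entries off the main diagonal are all 0, often written diag(a1, a2, …, an)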
  • Jacobian matrix: $\mathbf{J} = \begin{bmatrix} \dfrac{\partial \mathbf{f}}{\partial x_1} & \cdots & \dfrac{\partial \mathbf{f}}{\partial x_n} \end{bmatrix} = \begin{bmatrix} \dfrac{\partial f_1}{\partial x_1} & \cdots & \dfrac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial f_m}{\partial x_1} & \cdots & \dfrac{\partial f_m}{\partial x_n} \end{bmatrix}$
  • Hessian matrix: the square matrix of all second-order partial derivatives of a real-valued function of several variables, $\mathbf{H}_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}$
  • Matrix norm

I. Matrix Calculus

The derivative of a vector with respect to a vector is called the Jacobian matrix:
$$J = \frac{\partial y_{(n)}}{\partial x_{(m)}} = \begin{pmatrix} \frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_m} \\ \vdots & \ddots & \vdots \\ \frac{\partial y_n}{\partial x_1} & \cdots & \frac{\partial y_n}{\partial x_m} \end{pmatrix}_{n \times m}$$
The derivative of a scalar with respect to a vector and the derivative of a vector with respect to a scalar are the special cases in which the corresponding vector is one-dimensional.
The convention used here is called numerator layout. The other convention, which writes every matrix (vector) derivative as the transpose of the form shown here, is called denominator layout. In denominator layout the rules below do not take such an easily remembered form.
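The two layouts can be checked numerically. The sketch below is only an illustration: the helper `numerical_jacobian` and the test function `f` are made up for this example, and the Jacobian is approximated with central differences. It builds the $n \times m$ numerator-layout Jacobian and obtains the denominator-layout version by transposing it.

```python
import math

def numerical_jacobian(f, x, h=1e-6):
    """Central-difference Jacobian of f at x in numerator layout.

    f maps a list of m floats to a list of n floats; the result is an
    n x m nested list with entry [i][j] = d y_i / d x_j.
    """
    n = len(f(x))
    m = len(x)
    J = [[0.0] * m for _ in range(n)]
    for j in range(m):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += h
        x_minus[j] -= h
        y_plus = f(x_plus)
        y_minus = f(x_minus)
        for i in range(n):
            J[i][j] = (y_plus[i] - y_minus[i]) / (2 * h)
    return J

def f(x):
    # Hypothetical test function from R^2 to R^3.
    return [x[0] * x[1], math.sin(x[0]), x[1] ** 2]

J = numerical_jacobian(f, [1.0, 2.0])            # 3 x 2, numerator layout
J_denominator = [list(col) for col in zip(*J)]   # 2 x 3, denominator layout
```

For this `f` the exact Jacobian at $(1, 2)$ has rows $(2, 1)$, $(\cos 1, 0)$ and $(0, 4)$, which the finite-difference result should match closely.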

Compared with scalar calculus:

  • The sum rule is unchanged: $\frac{\partial (y + z)}{\partial x} = \frac{\partial y}{\partial x} + \frac{\partial z}{\partial x}$

  • The chain rule is unchanged: $\frac{\partial z}{\partial x} = \frac{\partial z}{\partial y} \cdot \frac{\partial y}{\partial x}$

  • The product rule keeps the same form: $\frac{\partial (y \otimes z)}{\partial x} = y \otimes \frac{\partial z}{\partial x} + z \otimes \frac{\partial y}{\partial x}$, where $\otimes$ stands for one of the products below

    • Inner product of vectors: $\frac{\partial (y^T z)}{\partial x} = y^T \cdot \frac{\partial z}{\partial x} + z^T \cdot \frac{\partial y}{\partial x}$
    • Matrix product ($A$ independent of $x$): $\frac{\partial (Ay)}{\partial x} = A \cdot \frac{\partial y}{\partial x}$
    • Scalar multiple of a vector ($y$ or $z$ is a scalar): $\frac{\partial (yz)}{\partial x} = y \cdot \frac{\partial z}{\partial x} + z \cdot \frac{\partial y}{\partial x}$
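As a small worked instance of the inner-product rule (numerator layout), take $y = z = x$, so that $\frac{\partial x}{\partial x} = I$:

```latex
\frac{\partial (x^T x)}{\partial x}
  = x^T \frac{\partial x}{\partial x} + x^T \frac{\partial x}{\partial x}
  = x^T I + x^T I
  = 2 x^T
```

The result is a row vector, as expected for the derivative of a scalar with respect to a vector in numerator layout.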

$$\sum_{i=1}^n i^2 = \frac{n(n+1)(2n+1)}{6}$$

1. Notation

  • $\mathbf{A}, \mathbf{X}, \mathbf{Y}$, etc.: bold uppercase letters denote matrices;
  • $\mathbf a, \mathbf x, \mathbf y$, etc.: bold lowercase letters denote vectors;
  • $a, x, y$, etc.: italic lowercase letters denote scalars;
  • $\mathbf X^T$: the transpose of the matrix $\mathbf X$;
  • $\mathbf X^H$: the conjugate transpose of the matrix $\mathbf X$;
  • $| \mathbf X |$: the determinant of the square matrix $\mathbf X$;
  • $|| \mathbf x ||$: the norm of the vector $\mathbf x$;
  • $\mathbf I$: the identity matrix.

2. Vector derivatives

2.1 Vector by scalar

The derivative of a column-vector function $\mathbf y = \begin{bmatrix} y_1 & y_2 & \cdots & y_m \end{bmatrix}^T$ with respect to a scalar $x$ is called the tangent vector of $\mathbf y$. In numerator layout it is written $$\frac{\partial \mathbf y}{\partial x} = \begin{bmatrix} \frac{\partial y_1}{\partial x} \\ \frac{\partial y_2}{\partial x} \\ \vdots \\ \frac{\partial y_m}{\partial x} \end{bmatrix}_{m \times 1}$$

In denominator layout it is written $$\frac{\partial \mathbf y}{\partial x} = \begin{bmatrix} \frac{\partial y_1}{\partial x} & \frac{\partial y_2}{\partial x} & \cdots & \frac{\partial y_m}{\partial x} \end{bmatrix}_{1 \times m}$$

2.2 Scalar by vector

The derivative of a scalar function $y$ with respect to a column vector $\mathbf x = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}^T$ is written in numerator layout as $$\frac{\partial y}{\partial \mathbf x} = \begin{bmatrix} \frac{\partial y}{\partial x_1} & \frac{\partial y}{\partial x_2} & \cdots & \frac{\partial y}{\partial x_n} \end{bmatrix}_{1 \times n}$$

In denominator layout it is written $$\frac{\partial y}{\partial \mathbf x} = \begin{bmatrix} \frac{\partial y}{\partial x_1} \\ \frac{\partial y}{\partial x_2} \\ \vdots \\ \frac{\partial y}{\partial x_n} \end{bmatrix}_{n \times 1}$$

2.3 Vector by vector

The derivative of a column-vector function $\mathbf y = \begin{bmatrix} y_1 & y_2 & \cdots & y_m \end{bmatrix}^T$ with respect to a column vector $\mathbf x = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}^T$ is written in numerator layout as
$$\frac{\partial \mathbf y}{\partial \mathbf x} = \begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n} \\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_n} \end{bmatrix}_{m \times n}$$

In denominator layout it is the transpose, whose row $i$ holds the derivatives with respect to $x_i$:
$$\frac{\partial \mathbf y}{\partial \mathbf x} = \begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_2}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_1} \\ \frac{\partial y_1}{\partial x_2} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_2} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_1}{\partial x_n} & \frac{\partial y_2}{\partial x_n} & \cdots & \frac{\partial y_m}{\partial x_n} \end{bmatrix}_{n \times m}$$

3. Matrix derivatives

3.1 Matrix by scalar

The derivative of an $m \times n$ matrix function $\mathbf Y$ with respect to a scalar $x$ is called the tangent matrix of $\mathbf Y$. In numerator layout it is written
$$\frac{\partial \mathbf Y}{\partial x} = \begin{bmatrix} \frac{\partial y_{11}}{\partial x} & \frac{\partial y_{12}}{\partial x} & \cdots & \frac{\partial y_{1n}}{\partial x} \\ \frac{\partial y_{21}}{\partial x} & \frac{\partial y_{22}}{\partial x} & \cdots & \frac{\partial y_{2n}}{\partial x} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_{m1}}{\partial x} & \frac{\partial y_{m2}}{\partial x} & \cdots & \frac{\partial y_{mn}}{\partial x} \end{bmatrix}_{m \times n}$$

3.2 Scalar by matrix

The derivative of a scalar function $y$ with respect to a $p \times q$ matrix $\mathbf X$ is written in numerator layout as

$$\frac{\partial y}{\partial \mathbf X} = \begin{bmatrix} \frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{21}} & \cdots & \frac{\partial y}{\partial x_{p1}} \\ \frac{\partial y}{\partial x_{12}} & \frac{\partial y}{\partial x_{22}} & \cdots & \frac{\partial y}{\partial x_{p2}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y}{\partial x_{1q}} & \frac{\partial y}{\partial x_{2q}} & \cdots & \frac{\partial y}{\partial x_{pq}} \end{bmatrix}_{q \times p}$$
In denominator layout it is written
$$\frac{\partial y}{\partial \mathbf X} = \begin{bmatrix} \frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{12}} & \cdots & \frac{\partial y}{\partial x_{1q}} \\ \frac{\partial y}{\partial x_{21}} & \frac{\partial y}{\partial x_{22}} & \cdots & \frac{\partial y}{\partial x_{2q}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y}{\partial x_{p1}} & \frac{\partial y}{\partial x_{p2}} & \cdots & \frac{\partial y}{\partial x_{pq}} \end{bmatrix}_{p \times q}$$

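4. Identities

In the identities below, unless noted otherwise, the factors of the expression being differentiated are assumed not to be functions of the variable of differentiation.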

4.1. Vector by vector

| Expression | Numerator layout | Denominator layout | Notes |
|---|---|---|---|
| $\frac{\partial \mathbf a}{\partial \mathbf x}$ | $\mathbf 0$ | $\mathbf 0$ | |
| $\frac{\partial \mathbf x}{\partial \mathbf x}$ | $\mathbf I$ | $\mathbf I$ | |
| $\frac{\partial \mathbf A \mathbf x}{\partial \mathbf x}$ | $\mathbf A$ | $\mathbf A^T$ | |
| $\frac{\partial \mathbf x^T \mathbf A}{\partial \mathbf x}$ | $\mathbf A^T$ | $\mathbf A$ | |
| $\frac{\partial a \mathbf u}{\partial \mathbf x}$ | $a \frac{\partial \mathbf u}{\partial \mathbf x}$ | $a \frac{\partial \mathbf u}{\partial \mathbf x}$ | $\mathbf u = \mathbf u(\mathbf x)$ |
| $\frac{\partial v \mathbf u}{\partial \mathbf x}$ | $v \frac{\partial \mathbf u}{\partial \mathbf x} + \mathbf u \frac{\partial v}{\partial \mathbf x}$ | $v \frac{\partial \mathbf u}{\partial \mathbf x} + \frac{\partial v}{\partial \mathbf x} \mathbf u^T$ | $v = v(\mathbf x)$, $\mathbf u = \mathbf u(\mathbf x)$ |
| $\frac{\partial \mathbf A \mathbf u}{\partial \mathbf x}$ | $\mathbf A \frac{\partial \mathbf u}{\partial \mathbf x}$ | $\frac{\partial \mathbf u}{\partial \mathbf x} \mathbf A^T$ | $\mathbf u = \mathbf u(\mathbf x)$ |
| $\frac{\partial (\mathbf u + \mathbf v)}{\partial \mathbf x}$ | $\frac{\partial \mathbf u}{\partial \mathbf x} + \frac{\partial \mathbf v}{\partial \mathbf x}$ | $\frac{\partial \mathbf u}{\partial \mathbf x} + \frac{\partial \mathbf v}{\partial \mathbf x}$ | $\mathbf u = \mathbf u(\mathbf x)$, $\mathbf v = \mathbf v(\mathbf x)$ |
| $\frac{\partial \mathbf f(\mathbf g(\mathbf u))}{\partial \mathbf x}$ | $\frac{\partial \mathbf f(\mathbf g)}{\partial \mathbf g} \frac{\partial \mathbf g(\mathbf u)}{\partial \mathbf u} \frac{\partial \mathbf u}{\partial \mathbf x}$ | $\frac{\partial \mathbf u}{\partial \mathbf x} \frac{\partial \mathbf g(\mathbf u)}{\partial \mathbf u} \frac{\partial \mathbf f(\mathbf g)}{\partial \mathbf g}$ | $\mathbf u = \mathbf u(\mathbf x)$ |

4.2. Scalar by vector

| Expression | Numerator layout | Denominator layout | Notes |
|---|---|---|---|
| $\frac{\partial a}{\partial \mathbf x}$ | $\mathbf 0^T$ | $\mathbf 0$ | |
| $\frac{\partial a u}{\partial \mathbf x}$ | $a \frac{\partial u}{\partial \mathbf x}$ | $a \frac{\partial u}{\partial \mathbf x}$ | $u = u(\mathbf x)$ |
| $\frac{\partial (u + v)}{\partial \mathbf x}$ | $\frac{\partial u}{\partial \mathbf x} + \frac{\partial v}{\partial \mathbf x}$ | $\frac{\partial u}{\partial \mathbf x} + \frac{\partial v}{\partial \mathbf x}$ | $u = u(\mathbf x)$, $v = v(\mathbf x)$ |
| $\frac{\partial u v}{\partial \mathbf x}$ | $u \frac{\partial v}{\partial \mathbf x} + v \frac{\partial u}{\partial \mathbf x}$ | $u \frac{\partial v}{\partial \mathbf x} + v \frac{\partial u}{\partial \mathbf x}$ | $u = u(\mathbf x)$, $v = v(\mathbf x)$ |
| $\frac{\partial f(g(u))}{\partial \mathbf x}$ | $\frac{\partial f(g)}{\partial g} \frac{\partial g(u)}{\partial u} \frac{\partial u}{\partial \mathbf x}$ | $\frac{\partial f(g)}{\partial g} \frac{\partial g(u)}{\partial u} \frac{\partial u}{\partial \mathbf x}$ | $u = u(\mathbf x)$ |
| $\frac{\partial (\mathbf u \cdot \mathbf v)}{\partial \mathbf x} = \frac{\partial \mathbf u^T \mathbf v}{\partial \mathbf x}$ | $\mathbf u^T \frac{\partial \mathbf v}{\partial \mathbf x} + \mathbf v^T \frac{\partial \mathbf u}{\partial \mathbf x}$ | $\frac{\partial \mathbf v}{\partial \mathbf x} \mathbf u + \frac{\partial \mathbf u}{\partial \mathbf x} \mathbf v$ | $\mathbf u = \mathbf u(\mathbf x)$, $\mathbf v = \mathbf v(\mathbf x)$ |
| $\frac{\partial (\mathbf u \cdot \mathbf A \mathbf v)}{\partial \mathbf x} = \frac{\partial \mathbf u^T \mathbf A \mathbf v}{\partial \mathbf x}$ | $\mathbf u^T \mathbf A \frac{\partial \mathbf v}{\partial \mathbf x} + \mathbf v^T \mathbf A^T \frac{\partial \mathbf u}{\partial \mathbf x}$ | $\frac{\partial \mathbf v}{\partial \mathbf x} \mathbf A^T \mathbf u + \frac{\partial \mathbf u}{\partial \mathbf x} \mathbf A \mathbf v$ | $\mathbf u = \mathbf u(\mathbf x)$, $\mathbf v = \mathbf v(\mathbf x)$ |
| $\frac{\partial (\mathbf a \cdot \mathbf u)}{\partial \mathbf x} = \frac{\partial \mathbf a^T \mathbf u}{\partial \mathbf x}$ | $\mathbf a^T \frac{\partial \mathbf u}{\partial \mathbf x}$ | $\frac{\partial \mathbf u}{\partial \mathbf x} \mathbf a$ | $\mathbf u = \mathbf u(\mathbf x)$ |
| $\frac{\partial \mathbf b^T \mathbf A \mathbf x}{\partial \mathbf x}$ | $\mathbf b^T \mathbf A$ | $\mathbf A^T \mathbf b$ | |
| $\frac{\partial \mathbf x^T \mathbf A \mathbf x}{\partial \mathbf x}$ | $\mathbf x^T (\mathbf A + \mathbf A^T)$ | $(\mathbf A + \mathbf A^T) \mathbf x$ | |
| $\frac{\partial^2 \mathbf x^T \mathbf A \mathbf x}{\partial \mathbf x \partial \mathbf x^T}$ | $\mathbf A + \mathbf A^T$ | $\mathbf A + \mathbf A^T$ | |
| $\frac{\partial \mathbf a^T \mathbf x \mathbf x^T \mathbf b}{\partial \mathbf x}$ | $\mathbf x^T (\mathbf a \mathbf b^T + \mathbf b \mathbf a^T)$ | $(\mathbf a \mathbf b^T + \mathbf b \mathbf a^T) \mathbf x$ | |
| $\frac{\partial (\mathbf A \mathbf x + \mathbf b)^T \mathbf C (\mathbf D \mathbf x + \mathbf e)}{\partial \mathbf x}$ | $(\mathbf A \mathbf x + \mathbf b)^T \mathbf C \mathbf D + (\mathbf D \mathbf x + \mathbf e)^T \mathbf C^T \mathbf A$ | $\mathbf D^T \mathbf C^T (\mathbf A \mathbf x + \mathbf b) + \mathbf A^T \mathbf C (\mathbf D \mathbf x + \mathbf e)$ | |
| $\frac{\partial \lVert \mathbf x \rVert^2}{\partial \mathbf x} = \frac{\partial (\mathbf x \cdot \mathbf x)}{\partial \mathbf x}$ | $2 \mathbf x^T$ | $2 \mathbf x$ | |
| $\frac{\partial \lVert \mathbf x - \mathbf a \rVert}{\partial \mathbf x}$ | $\frac{(\mathbf x - \mathbf a)^T}{\lVert \mathbf x - \mathbf a \rVert}$ | $\frac{\mathbf x - \mathbf a}{\lVert \mathbf x - \mathbf a \rVert}$ | |
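
Identities like these are easy to sanity-check with finite differences. The following sketch assumes NumPy is available; the helper `numerical_gradient` is a made-up name for this illustration. It compares the quadratic-form identity, derivative of $\mathbf x^T \mathbf A \mathbf x$ equal to $(\mathbf A + \mathbf A^T)\mathbf x$ in denominator layout, against a central-difference gradient for a matrix that is deliberately not symmetric.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-6):
    """Central-difference gradient of a scalar function f at x (1-D array)."""
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        step = np.zeros_like(x, dtype=float)
        step[i] = h
        grad[i] = (f(x + step) - f(x - step)) / (2 * h)
    return grad

rng = np.random.default_rng(0)     # fixed seed so the run is repeatable
A = rng.standard_normal((4, 4))    # deliberately not symmetric
x = rng.standard_normal(4)

numeric = numerical_gradient(lambda v: v @ A @ v, x)
analytic = (A + A.T) @ x           # denominator layout: a column of partials

# As a 1-D array, the numerator-layout row x^T (A + A^T) has the same entries.
max_error = np.max(np.abs(numeric - analytic))
```

If $\mathbf A$ happens to be symmetric, the result reduces to the familiar $2\mathbf A \mathbf x$.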

4.3. Vector by scalar

| Expression | Numerator layout | Denominator layout | Notes |
|---|---|---|---|
| $\frac{\partial \mathbf a}{\partial x}$ | $\mathbf 0$ | $\mathbf 0$ | |
| $\frac{\partial a \mathbf u}{\partial x}$ | $a \frac{\partial \mathbf u}{\partial x}$ | $a \frac{\partial \mathbf u}{\partial x}$ | $\mathbf u = \mathbf u(x)$ |
| $\frac{\partial \mathbf A \mathbf u}{\partial x}$ | $\mathbf A \frac{\partial \mathbf u}{\partial x}$ | $\frac{\partial \mathbf u}{\partial x} \mathbf A^T$ | $\mathbf u = \mathbf u(x)$ |
| $\frac{\partial \mathbf u^T}{\partial x}$ | $\left( \frac{\partial \mathbf u}{\partial x} \right)^T$ | $\left( \frac{\partial \mathbf u}{\partial x} \right)^T$ | $\mathbf u = \mathbf u(x)$ |
| $\frac{\partial (\mathbf u + \mathbf v)}{\partial x}$ | $\frac{\partial \mathbf u}{\partial x} + \frac{\partial \mathbf v}{\partial x}$ | $\frac{\partial \mathbf u}{\partial x} + \frac{\partial \mathbf v}{\partial x}$ | $\mathbf u = \mathbf u(x)$, $\mathbf v = \mathbf v(x)$ |
| $\frac{\partial (\mathbf u^T \times \mathbf v)}{\partial x}$ | $\left( \frac{\partial \mathbf u}{\partial x} \right)^T \times \mathbf v + \mathbf u^T \times \frac{\partial \mathbf v}{\partial x}$ | $\frac{\partial \mathbf u}{\partial x} \times \mathbf v + \mathbf u^T \times \left( \frac{\partial \mathbf v}{\partial x} \right)^T$ | $\mathbf u = \mathbf u(x)$, $\mathbf v = \mathbf v(x)$ |
| $\frac{\partial \mathbf f(\mathbf g(\mathbf u))}{\partial x}$ | $\frac{\partial \mathbf f(\mathbf g)}{\partial \mathbf g} \frac{\partial \mathbf g(\mathbf u)}{\partial \mathbf u} \frac{\partial \mathbf u}{\partial x}$ | $\frac{\partial \mathbf u}{\partial x} \frac{\partial \mathbf g(\mathbf u)}{\partial \mathbf u} \frac{\partial \mathbf f(\mathbf g)}{\partial \mathbf g}$ | $\mathbf u = \mathbf u(x)$ |
| $\frac{\partial (\mathbf U \times \mathbf v)}{\partial x}$ | $\frac{\partial \mathbf U}{\partial x} \times \mathbf v + \mathbf U \times \frac{\partial \mathbf v}{\partial x}$ | $\mathbf v^T \times \frac{\partial \mathbf U}{\partial x} + \frac{\partial \mathbf v}{\partial x} \times \mathbf U^T$ | $\mathbf U = \mathbf U(x)$, $\mathbf v = \mathbf v(x)$ |

4.4. Scalar by matrix

| Expression | Numerator layout | Denominator layout | Notes |
|---|---|---|---|
| $\frac{\partial a}{\partial \mathbf X}$ | $\mathbf 0^T$ | $\mathbf 0$ | |
| $\frac{\partial a u}{\partial \mathbf X}$ | $a \frac{\partial u}{\partial \mathbf X}$ | $a \frac{\partial u}{\partial \mathbf X}$ | $u = u(\mathbf X)$ |
| $\frac{\partial (u + v)}{\partial \mathbf X}$ | $\frac{\partial u}{\partial \mathbf X} + \frac{\partial v}{\partial \mathbf X}$ | $\frac{\partial u}{\partial \mathbf X} + \frac{\partial v}{\partial \mathbf X}$ | $u = u(\mathbf X)$, $v = v(\mathbf X)$ |
| $\frac{\partial u v}{\partial \mathbf X}$ | $u \frac{\partial v}{\partial \mathbf X} + v \frac{\partial u}{\partial \mathbf X}$ | $u \frac{\partial v}{\partial \mathbf X} + v \frac{\partial u}{\partial \mathbf X}$ | $u = u(\mathbf X)$, $v = v(\mathbf X)$ |
| $\frac{\partial f(g(u))}{\partial \mathbf X}$ | $\frac{\partial f(g)}{\partial g} \frac{\partial g(u)}{\partial u} \frac{\partial u}{\partial \mathbf X}$ | $\frac{\partial f(g)}{\partial g} \frac{\partial g(u)}{\partial u} \frac{\partial u}{\partial \mathbf X}$ | $u = u(\mathbf X)$ |
| $\frac{\partial \mathbf a^T \mathbf X \mathbf b}{\partial \mathbf X}$ | $\mathbf b \mathbf a^T$ | $\mathbf a \mathbf b^T$ | |
| $\frac{\partial \mathbf a^T \mathbf X^T \mathbf b}{\partial \mathbf X}$ | $\mathbf a \mathbf b^T$ | $\mathbf b \mathbf a^T$ | |
| $\frac{\partial (\mathbf X \mathbf a + \mathbf b)^T \mathbf C (\mathbf X \mathbf a + \mathbf b)}{\partial \mathbf X}$ | $[ (\mathbf C + \mathbf C^T) (\mathbf X \mathbf a + \mathbf b) \mathbf a^T ]^T$ | $(\mathbf C + \mathbf C^T) (\mathbf X \mathbf a + \mathbf b) \mathbf a^T$ | |
| $\frac{\partial (\mathbf X \mathbf a)^T \mathbf C (\mathbf X \mathbf b)}{\partial \mathbf X}$ | $( \mathbf C \mathbf X \mathbf b \mathbf a^T + \mathbf C^T \mathbf X \mathbf a \mathbf b^T )^T$ | $\mathbf C \mathbf X \mathbf b \mathbf a^T + \mathbf C^T \mathbf X \mathbf a \mathbf b^T$ | |
| $\frac{\partial \lvert \mathbf X \rvert}{\partial \mathbf X}$ | $\lvert \mathbf X \rvert \mathbf X^{-1}$ | $\lvert \mathbf X \rvert (\mathbf X^{-1})^T$ | |
| $\frac{\partial \ln \lvert a \mathbf X \rvert}{\partial \mathbf X}$ | $\mathbf X^{-1}$ | $(\mathbf X^{-1})^T$ | |
| $\frac{\partial \lvert \mathbf A \mathbf X \mathbf B \rvert}{\partial \mathbf X}$ | $\lvert \mathbf A \mathbf X \mathbf B \rvert \mathbf X^{-1}$ | $\lvert \mathbf A \mathbf X \mathbf B \rvert (\mathbf X^{-1})^T$ | |
| $\frac{\partial \lvert \mathbf X^n \rvert}{\partial \mathbf X}$ | $n \lvert \mathbf X^n \rvert \mathbf X^{-1}$ | $n \lvert \mathbf X^n \rvert (\mathbf X^{-1})^T$ | |
| $\frac{\partial \ln \lvert \mathbf X^T \mathbf X \rvert}{\partial \mathbf X}$ | $2 \mathbf X^+$ | $2 (\mathbf X^+)^T$ | $\mathbf X^+$ is the generalized inverse of $\mathbf X$ |
| $\frac{\partial \ln \lvert \mathbf X^T \mathbf X \rvert}{\partial \mathbf X^+}$ | $-2 \mathbf X$ | $-2 \mathbf X^T$ | $\mathbf X^+$ is the generalized inverse of $\mathbf X$ |
| $\frac{\partial \lvert \mathbf X^T \mathbf A \mathbf X \rvert}{\partial \mathbf X}$ | $2 \lvert \mathbf X^T \mathbf A \mathbf X \rvert \mathbf X^{-1} = 2 \lvert \mathbf X^T \rvert \lvert \mathbf A \rvert \lvert \mathbf X \rvert \mathbf X^{-1}$ | $2 \lvert \mathbf X^T \mathbf A \mathbf X \rvert (\mathbf X^{-1})^T$ | $\mathbf X$ is square and invertible |
| $\frac{\partial \lvert \mathbf X^T \mathbf A \mathbf X \rvert}{\partial \mathbf X}$ | $2 \lvert \mathbf X^T \mathbf A \mathbf X \rvert ( \mathbf X^T \mathbf A^T \mathbf X )^{-1} \mathbf X^T \mathbf A^T$ | $2 \lvert \mathbf X^T \mathbf A \mathbf X \rvert \mathbf A \mathbf X ( \mathbf X^T \mathbf A \mathbf X )^{-1}$ | $\mathbf A$ is symmetric |
| $\frac{\partial \lvert \mathbf X^T \mathbf A \mathbf X \rvert}{\partial \mathbf X}$ | $\lvert \mathbf X^T \mathbf A \mathbf X \rvert [ ( \mathbf X^T \mathbf A \mathbf X )^{-1} \mathbf X^T \mathbf A + ( \mathbf X^T \mathbf A^T \mathbf X )^{-1} \mathbf X^T \mathbf A^T ]$ | $\lvert \mathbf X^T \mathbf A \mathbf X \rvert [ \mathbf A \mathbf X ( \mathbf X^T \mathbf A \mathbf X )^{-1} + \mathbf A^T \mathbf X ( \mathbf X^T \mathbf A^T \mathbf X )^{-1} ]$ | |
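
To see how the two layouts differ for a scalar-by-matrix derivative, here is a small numerical sketch (NumPy assumed; the matrix `X` is an arbitrary example chosen so that its determinant is positive). It checks the log-determinant row with $a = 1$: storing $\partial \ln\lvert\mathbf X\rvert / \partial x_{ij}$ at position $(i, j)$ gives the denominator-layout result $(\mathbf X^{-1})^T$, and its transpose is the numerator-layout result $\mathbf X^{-1}$.

```python
import numpy as np

X = np.array([[4.0, 1.0, 0.0],
              [2.0, 5.0, 1.0],
              [0.0, 1.0, 3.0]])    # det(X) = 50 > 0, so ln|X| is defined

def log_det(M):
    return np.log(np.linalg.det(M))

h = 1e-6
numeric = np.zeros_like(X)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        E = np.zeros_like(X)
        E[i, j] = h
        # Entry (i, j) holds d ln|X| / d x_ij, i.e. denominator layout.
        numeric[i, j] = (log_det(X + E) - log_det(X - E)) / (2 * h)

denominator_layout = np.linalg.inv(X).T   # (X^{-1})^T
numerator_layout = np.linalg.inv(X)       # X^{-1}, the transpose of the above
```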

4.5. Matrix by scalar

| Expression | Numerator layout | Notes |
|---|---|---|
| $\frac{\partial a \mathbf U}{\partial x}$ | $a \frac{\partial \mathbf U}{\partial x}$ | $\mathbf U = \mathbf U(x)$ |
| $\frac{\partial \mathbf A \mathbf U \mathbf B}{\partial x}$ | $\mathbf A \frac{\partial \mathbf U}{\partial x} \mathbf B$ | $\mathbf U = \mathbf U(x)$ |
| $\frac{\partial (\mathbf U + \mathbf V)}{\partial x}$ | $\frac{\partial \mathbf U}{\partial x} + \frac{\partial \mathbf V}{\partial x}$ | $\mathbf U = \mathbf U(x)$, $\mathbf V = \mathbf V(x)$ |
| $\frac{\partial (\mathbf U \mathbf V)}{\partial x}$ | $\mathbf U \frac{\partial \mathbf V}{\partial x} + \frac{\partial \mathbf U}{\partial x} \mathbf V$ | $\mathbf U = \mathbf U(x)$, $\mathbf V = \mathbf V(x)$ |
| $\frac{\partial (\mathbf U \otimes \mathbf V)}{\partial x}$ | $\mathbf U \otimes \frac{\partial \mathbf V}{\partial x} + \frac{\partial \mathbf U}{\partial x} \otimes \mathbf V$ | $\mathbf U = \mathbf U(x)$, $\mathbf V = \mathbf V(x)$; $\otimes$ denotes the Kronecker product |
| $\frac{\partial (\mathbf U \circ \mathbf V)}{\partial x}$ | $\mathbf U \circ \frac{\partial \mathbf V}{\partial x} + \frac{\partial \mathbf U}{\partial x} \circ \mathbf V$ | $\mathbf U = \mathbf U(x)$, $\mathbf V = \mathbf V(x)$; $\circ$ denotes the Hadamard product |
| $\frac{\partial \mathbf U^{-1}}{\partial x}$ | $-\mathbf U^{-1} \frac{\partial \mathbf U}{\partial x} \mathbf U^{-1}$ | $\mathbf U = \mathbf U(x)$ |
| $\frac{\partial^2 \mathbf U^{-1}}{\partial x \partial y}$ | $\mathbf U^{-1} \left( \frac{\partial \mathbf U}{\partial x} \mathbf U^{-1} \frac{\partial \mathbf U}{\partial y} - \frac{\partial^2 \mathbf U}{\partial x \partial y} + \frac{\partial \mathbf U}{\partial y} \mathbf U^{-1} \frac{\partial \mathbf U}{\partial x} \right) \mathbf U^{-1}$ | $\mathbf U = \mathbf U(x, y)$ |
| $\frac{\partial g(x \mathbf A)}{\partial x}$ | $\mathbf A g'(x \mathbf A) = g'(x \mathbf A) \mathbf A$ | With the ordinary matrix product this holds when $g$ is a matrix function defined by a power series, such as the matrix exponential in the next row; if $g(\cdot)$ is instead applied elementwise, the products must be read as Hadamard products |
| $\frac{\partial e^{x \mathbf A}}{\partial x}$ | $\mathbf A e^{x \mathbf A} = e^{x \mathbf A} \mathbf A$ | |
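
The inverse rule $\frac{\partial \mathbf U^{-1}}{\partial x} = -\mathbf U^{-1} \frac{\partial \mathbf U}{\partial x} \mathbf U^{-1}$ is a common source of sign and ordering mistakes, so a quick finite-difference check can help. The functions `U` and `dU_dx` below are hypothetical examples written for this sketch, and NumPy is assumed.

```python
import numpy as np

def U(x):
    # Hypothetical matrix-valued function of a scalar x.
    return np.array([[2.0 + x, x ** 2],
                     [np.sin(x), 3.0]])

def dU_dx(x):
    # Entrywise derivative of U with respect to x.
    return np.array([[1.0, 2.0 * x],
                     [np.cos(x), 0.0]])

x0, h = 0.5, 1e-6
U_inv = np.linalg.inv(U(x0))
analytic = -U_inv @ dU_dx(x0) @ U_inv
numeric = (np.linalg.inv(U(x0 + h)) - np.linalg.inv(U(x0 - h))) / (2 * h)
```

The order of the three factors matters because matrix products do not commute in general.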

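4.6. Matrix by matrix

Matrix-by-matrix derivatives are usually handled by vectorizing the matrices first, i.e. stacking their entries into vectors so that the vector-by-vector rules apply.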
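One standard identity that makes vectorization concrete is $\operatorname{vec}(\mathbf A \mathbf X \mathbf B) = (\mathbf B^T \otimes \mathbf A) \operatorname{vec}(\mathbf X)$, where $\operatorname{vec}$ stacks columns. It is not stated in the text above and is included here only as an illustration. It implies that, in numerator layout, the derivative of $\operatorname{vec}(\mathbf A \mathbf X \mathbf B)$ with respect to $\operatorname{vec}(\mathbf X)$ is the constant matrix $\mathbf B^T \otimes \mathbf A$. A minimal NumPy check, with a made-up helper `vec`:

```python
import numpy as np

def vec(M):
    """Stack the columns of M into one long vector (column-major order)."""
    return M.reshape(-1, order="F")

A = np.arange(6.0).reshape(2, 3)
X = np.arange(12.0).reshape(3, 4)
B = np.arange(8.0).reshape(4, 2)

lhs = vec(A @ X @ B)
jacobian = np.kron(B.T, A)     # B^T (Kronecker product) A, shape 4 x 12
rhs = jacobian @ vec(X)
```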


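II. Matrix Decompositions

  • QR decomposition: $M = QR$, with $Q$ orthogonal and $R$ upper triangular.
  • Singular value decomposition (SVD): $M = U \Sigma V^T$, with $U$ and $V$ orthogonal and $\Sigma$ diagonal with non-negative entries.
  • Eigendecomposition, also called spectral decomposition: $S = Q \Lambda Q^{-1}$, with $S$ symmetric, $Q$ orthogonal and $\Lambda$ diagonal.
  • Polar decomposition: $M = QS$, with $Q$ orthogonal and $S$ symmetric positive semi-definite.
  • Cholesky decomposition: $\mathbf{A} = \mathbf{L}\mathbf{L}^{*}$, where $\mathbf{L}$ is a lower triangular matrix whose diagonal entries are all positive real numbers and $\mathbf{L}^{*}$ is the conjugate transpose of $\mathbf{L}$. Every positive-definite Hermitian matrix has a unique Cholesky decomposition.
  • LU decomposition: $A = LU$, with $L$ lower triangular and $U$ upper triangular.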

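1. Cholesky decomposition

The Cholesky decomposition is mainly used to solve linear systems $\mathbf{Ax} = \mathbf{b}$. If $\mathbf A$ is symmetric positive definite, we first compute $\mathbf{A} = \mathbf{LL}^T$, then solve $\mathbf{Ly} = \mathbf{b}$ for $\mathbf y$ by forward substitution (since $\mathbf L$ is lower triangular), and finally solve $\mathbf{L}^T \mathbf{x} = \mathbf{y}$ for $\mathbf x$ by back substitution to obtain the solution.
Another approach, which avoids the square roots needed when computing $\mathbf{LL}^T$, is to compute $\mathbf{A} = \mathbf{LDL}^T$, then solve $\mathbf{Ly} = \mathbf{b}$ for $\mathbf y$, and finally solve $\mathbf{DL}^T \mathbf{x} = \mathbf{y}$.
For linear systems that can be rewritten with a symmetric matrix, the Cholesky decomposition and its LDL variant are efficient and numerically stable solvers. By comparison, Cholesky is roughly twice as efficient as LU decomposition. Every symmetric positive-definite matrix has a unique Cholesky decomposition.
The Cholesky decomposition is also used when sampling from a multivariate normal distribution. A Gaussian variable $x \sim \mathcal N(\mu, \Sigma)$ can be written as $x = \mu + Lz$ with $z \sim \mathcal N(0, I)$, where $\Sigma = LL^T$, so $L$ acts as a square root of the covariance matrix. It is therefore enough to sample from $\mathcal N(0, I)$.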
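The solve procedure described above (factor, forward substitution, back substitution) can be sketched in plain Python. This is a teaching sketch with made-up helper names, written for small dense matrices with no pivoting or error handling. A production code would call a library routine instead.

```python
import math

def cholesky(A):
    """Return lower-triangular L with A = L L^T for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def forward_substitution(L, b):
    """Solve L y = b for lower-triangular L, working from the first row down."""
    y = [0.0] * len(b)
    for i in range(len(b)):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    return y

def back_substitution_transposed(L, y):
    """Solve L^T x = y, working from the last row up (L^T is upper triangular)."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        # Entry (i, k) of L^T is L[k][i].
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

A = [[4.0, 12.0, -16.0],
     [12.0, 37.0, -43.0],
     [-16.0, -43.0, 98.0]]
b = [1.0, 2.0, 3.0]

L = cholesky(A)                                   # rows (2,0,0), (6,1,0), (-8,5,3)
x = back_substitution_transposed(L, forward_substitution(L, b))
```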
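A minimal sampling sketch along these lines, assuming NumPy. The mean `mu`, covariance `Sigma`, seed and sample count are arbitrary values chosen for the illustration.

```python
import numpy as np

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])            # symmetric positive definite

L = np.linalg.cholesky(Sigma)            # lower triangular, Sigma = L @ L.T

rng = np.random.default_rng(0)           # fixed seed for repeatability
z = rng.standard_normal((200_000, 2))    # rows are draws from N(0, I)
samples = mu + z @ L.T                   # each row is mu + L z

sample_mean = samples.mean(axis=0)
sample_cov = np.cov(samples, rowvar=False)
```

With this many draws, `sample_mean` and `sample_cov` should be close to `mu` and `Sigma`, up to sampling noise.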

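2. LU decomposition

Prerequisite: $A$ is an invertible square matrix.
LU decomposition factors a matrix into the product of a lower triangular matrix and an upper triangular matrix, sometimes together with a permutation matrix. It can be viewed as the matrix form of Gaussian elimination. In numerical computing, LU decomposition is widely used to solve linear systems, and it is a key step in computing inverses and determinants.
numpy.linalg.solve() uses an LU decomposition, which avoids the loss of precision incurred by computing the inverse explicitly.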
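A short sketch of the difference in usage, assuming NumPy. The system below is an arbitrary example whose exact solution is $(1, 1, 2)$.

```python
import numpy as np

A = np.array([[2.0, 1.0, 1.0],
              [4.0, -6.0, 0.0],
              [-2.0, 7.0, 2.0]])
b = np.array([5.0, -2.0, 9.0])

x_solve = np.linalg.solve(A, b)      # LU-based solve, no explicit inverse
x_inv = np.linalg.inv(A) @ b         # forms the inverse first; shown only for comparison

residual_solve = np.linalg.norm(A @ x_solve - b)
```

On a small, well-conditioned system both routes agree. The advantage of `solve` shows up on larger or ill-conditioned systems.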

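3. Eigendecomposition

Eigendecomposition, also called spectral decomposition, factors a matrix into a product of matrices built from its eigenvalues and eigenvectors. Note that only diagonalizable matrices admit an eigendecomposition.
For a square matrix $S$, $S = Q \Lambda Q^{-1}$, where $Q$ is the square matrix whose columns are the eigenvectors of $S$ and $\Lambda$ is the diagonal matrix of eigenvalues. The lengths of the vectors in $Q$ are cancelled by $Q^{-1}$, so the scaling of the eigenvectors does not matter.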
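The cancellation of eigenvector lengths can be observed directly. The sketch below assumes NumPy and uses a small symmetric example matrix. It reconstructs $S$ from its eigendecomposition, then rescales the columns of $Q$ by arbitrary nonzero factors and reconstructs again.

```python
import numpy as np

S = np.array([[2.0, 1.0],
              [1.0, 2.0]])                # symmetric, eigenvalues 1 and 3

eigenvalues, Q = np.linalg.eigh(S)        # eigh is meant for symmetric/Hermitian input
Lambda = np.diag(eigenvalues)

reconstructed = Q @ Lambda @ np.linalg.inv(Q)

# Rescale each eigenvector (column of Q) by an arbitrary nonzero factor.
Q_scaled = Q * np.array([5.0, -0.25])
reconstructed_scaled = Q_scaled @ Lambda @ np.linalg.inv(Q_scaled)
```

Because `S` is symmetric, `Q` returned by `eigh` is orthogonal, so `np.linalg.inv(Q)` could be replaced by `Q.T` in the first reconstruction (but not in the rescaled one).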

4. Singular value decomposition (SVD)

Every matrix has an SVD, although the decomposition is not unique. Because it always exists, the SVD is more robust than methods such as eigendecomposition. np.linalg.pinv() uses the SVD to compute the pseudo-inverse.
Definition: the singular value decomposition of a nonzero $m \times n$ real matrix $A \in \mathbb{R}^{m \times n}$ expresses $A$ as a product of three real matrices,
$$A = U \Sigma V^T$$
where $U$ is an orthogonal matrix of order $m$, $V$ is an orthogonal matrix of order $n$, and $\Sigma$ is an $m \times n$ rectangular diagonal matrix whose diagonal entries are non-negative and sorted in descending order. They satisfy
$$\begin{aligned} & UU^T = I \\ & VV^T = I \\ & \Sigma = \operatorname{diag}(\sigma_1, \sigma_2, \dots, \sigma_p) \\ & p = \min(m, n) \\ & \sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_p \ge 0 \end{aligned}$$
$U \Sigma V^T$ is called the singular value decomposition of $A$, the $\sigma_i$ are the singular values of $A$, the columns of $U$ are the left singular vectors, and the columns of $V$ are the right singular vectors.
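The shapes in this definition can be confusing in code, because NumPy returns the singular values as a 1-D array rather than as the rectangular matrix $\Sigma$. A small sketch, assuming NumPy and an arbitrary $2 \times 3$ example whose singular values are 5 and 3:

```python
import numpy as np

A = np.array([[3.0, 2.0, 2.0],
              [2.0, 3.0, -2.0]])          # m = 2, n = 3

U, s, Vt = np.linalg.svd(A, full_matrices=True)   # s holds sigma_1 >= sigma_2 >= ...

# Build the m x n rectangular diagonal matrix Sigma from the singular values.
Sigma = np.zeros(A.shape)
p = min(A.shape)
Sigma[:p, :p] = np.diag(s)

reconstructed = U @ Sigma @ Vt            # Vt is V^T, so no extra transpose is needed
```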
Geometric interpretation: [figure not reproduced here]

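  • The SVD can be used to compute the generalized inverse (pseudo-inverse) of a matrix.

If the singular value decomposition of $M$ is $M = U \Sigma V^T$, then the pseudo-inverse of $M$ is
$$M^+ = V \Sigma^+ U^T$$
where $\Sigma^+$ is the pseudo-inverse of $\Sigma$, obtained by taking the reciprocal of every nonzero entry on the main diagonal of $\Sigma$ and then transposing. The pseudo-inverse is commonly used to solve least-squares problems.

  • Another common application of the SVD is dimensionality reduction (in principal component analysis, PCA).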
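The recipe above can be written out step by step and compared with `np.linalg.pinv`. This is a sketch assuming NumPy. The cutoff `tol` used to decide which singular values count as nonzero is an assumption of this example, and the data are three made-up points $(0, 6)$, $(1, 0)$, $(2, 0)$ fitted by a line.

```python
import numpy as np

M = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])                # 3 x 2, full column rank
y = np.array([6.0, 0.0, 0.0])

U, s, Vt = np.linalg.svd(M, full_matrices=True)

# Sigma^+ : reciprocal of every nonzero singular value, laid out as n x m.
Sigma_plus = np.zeros((M.shape[1], M.shape[0]))
tol = max(M.shape) * np.finfo(float).eps * s.max()   # assumed cutoff for "nonzero"
for i, sigma in enumerate(s):
    if sigma > tol:
        Sigma_plus[i, i] = 1.0 / sigma

M_plus = Vt.T @ Sigma_plus @ U.T          # V Sigma^+ U^T
beta = M_plus @ y                         # least-squares solution of M beta ~ y
```

For these points the least-squares line has intercept 5 and slope -3, which is what `beta` should contain.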

https://muyi110.github.io/2019/%E6%B5%85%E8%B0%88%E5%A5%87%E5%BC%82%E5%80%BC%E5%88%86%E8%A7%A3-SVD/


III. Types of Matrices

1. Positive definite and positive semi-definite matrices

Example: the covariance matrix of a multivariate normal distribution is required to be positive semi-definite.

[Definition 1] Given an $n \times n$ real symmetric matrix $A$, if $\boldsymbol{x}^T A \boldsymbol{x} > 0$ holds for every nonzero vector $\boldsymbol{x}$ of length $n$, then $A$ is a positive definite matrix.


[Definition 2] Given an $n \times n$ real symmetric matrix $A$, if $\boldsymbol{x}^T A \boldsymbol{x} \geq 0$ holds for every vector $\boldsymbol{x}$ of length $n$, then $A$ is a positive semi-definite matrix.

Intuition:
Given any positive definite matrix $A \in \mathbb{R}^{n \times n}$ and a nonzero vector $\boldsymbol{x} \in \mathbb{R}^{n}$, the angle between the product $\boldsymbol{y} = A\boldsymbol{x} \in \mathbb{R}^{n}$ and $\boldsymbol{x}$ is always less than $\frac{\pi}{2}$ (equivalent to $\boldsymbol{x}^T A \boldsymbol{x} > 0$).
Given any positive semi-definite matrix $A \in \mathbb{R}^{n \times n}$ and a vector $\boldsymbol{x} \in \mathbb{R}^{n}$, the angle between the product $\boldsymbol{y} = A\boldsymbol{x} \in \mathbb{R}^{n}$ and $\boldsymbol{x}$ is always less than or equal to $\frac{\pi}{2}$ (equivalent to $\boldsymbol{x}^T A \boldsymbol{x} \geq 0$).
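In practice, definiteness of a symmetric matrix is often tested through its eigenvalues. The helpers below are a sketch assuming NumPy. Their names and the rounding tolerance `tol` are choices made for this illustration. The last lines check the angle interpretation for one example vector.

```python
import numpy as np

def is_positive_definite(A):
    """True if the symmetric matrix A has only strictly positive eigenvalues."""
    return bool(np.all(np.linalg.eigvalsh(A) > 0))

def is_positive_semidefinite(A, tol=1e-12):
    """True if all eigenvalues are >= 0, up to an assumed rounding tolerance."""
    return bool(np.all(np.linalg.eigvalsh(A) >= -tol))

A_pd = np.array([[2.0, -1.0],
                 [-1.0, 2.0]])      # eigenvalues 1 and 3
A_psd = np.array([[1.0, 1.0],
                  [1.0, 1.0]])      # eigenvalues 0 and 2
A_indef = np.array([[1.0, 2.0],
                    [2.0, 1.0]])    # eigenvalues -1 and 3

x = np.array([1.0, 3.0])
y = A_pd @ x
# Cosine of the angle between x and A x; positive means the angle is below pi/2.
cos_angle = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
```

A strict test on a matrix with an exact zero eigenvalue, such as `A_psd`, is sensitive to rounding, which is why the semi-definite check uses a tolerance.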

1.1 Why the covariance matrix is positive semi-definite

For any multivariate random variable $\boldsymbol{t}$, the covariance matrix is
$$C = \mathbb{E}\left[(\boldsymbol{t}-\bar{\boldsymbol{t}})(\boldsymbol{t}-\bar{\boldsymbol{t}})^T\right]$$

Now take an arbitrary vector $\boldsymbol{x}$. Then $$\boldsymbol{x}^T C \boldsymbol{x} = \boldsymbol{x}^T \mathbb{E}\left[(\boldsymbol{t}-\bar{\boldsymbol{t}})(\boldsymbol{t}-\bar{\boldsymbol{t}})^T\right] \boldsymbol{x} = \mathbb{E}\left[\boldsymbol{x}^T (\boldsymbol{t}-\bar{\boldsymbol{t}})(\boldsymbol{t}-\bar{\boldsymbol{t}})^T \boldsymbol{x}\right] = \mathbb{E}(s^2) = \sigma_s^2$$
where $s = \boldsymbol{x}^T(\boldsymbol{t}-\bar{\boldsymbol{t}}) = (\boldsymbol{t}-\bar{\boldsymbol{t}})^T \boldsymbol{x}$ is a scalar random variable with zero mean, so $\mathbb{E}(s^2)$ is its variance $\sigma_s^2$. Since $\sigma_s^2 \geq 0$, we have $\boldsymbol{x}^T C \boldsymbol{x} \geq 0$, and the covariance matrix $C$ is positive semi-definite.

2. Inverse matrix

Inverse identity for a block matrix:
$$\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} = \begin{pmatrix} M & -MBD^{-1} \\ -D^{-1}CM & D^{-1} + D^{-1}CMBD^{-1} \end{pmatrix}$$
where $M = (A - BD^{-1}C)^{-1}$.

If $A$ and $C$ are invertible square matrices, then $(A + BCD)^{-1} = A^{-1} - A^{-1}B(DA^{-1}B + C^{-1})^{-1}DA^{-1}$.
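The block identity can be checked numerically. The sketch below assumes NumPy and uses small example blocks chosen so that both $D$ and $A - BD^{-1}C$ are invertible.

```python
import numpy as np

A = np.array([[4.0, 1.0], [2.0, 3.0]])
B = np.array([[1.0, 0.0], [0.0, 1.0]])
C = np.array([[0.0, 1.0], [1.0, 0.0]])
D = np.array([[2.0, 0.0], [0.0, 2.0]])

D_inv = np.linalg.inv(D)
M = np.linalg.inv(A - B @ D_inv @ C)      # inverse of the Schur complement of D

block_inverse = np.block([
    [M,              -M @ B @ D_inv],
    [-D_inv @ C @ M, D_inv + D_inv @ C @ M @ B @ D_inv],
])
direct = np.linalg.inv(np.block([[A, B], [C, D]]))
```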
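This identity pays off when $A^{-1}$ is cheap and $C$ is much smaller than $A$, because only a small matrix has to be inverted. A numerical sketch, assuming NumPy. Setting $D = B^T$ with positive-definite $A$ and $C$ is a choice made here only so that every inverse in the example is guaranteed to exist.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 5, 2
A = np.diag(np.arange(1.0, n + 1))     # invertible n x n (diagonal entries 1..5)
C = np.array([[2.0, 0.5],
              [0.5, 1.0]])             # invertible k x k (positive definite)
B = rng.standard_normal((n, k))
D = B.T                                # assumed choice: keeps A + B C D positive definite

A_inv = np.linalg.inv(A)
small = np.linalg.inv(D @ A_inv @ B + np.linalg.inv(C))   # only a k x k inverse
woodbury = A_inv - A_inv @ B @ small @ D @ A_inv
direct = np.linalg.inv(A + B @ C @ D)
```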


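IV. How to Symmetrize a Matrix

  1. Symmetrize a matrix $A$ by taking the average of $A$ and its transpose $A^T$, i.e. $(A + A^T)/2$.
  2. A real symmetric matrix $A$ can be diagonalized by an orthogonal similarity transformation: there exists an orthogonal matrix $Q$ such that $Q^{-1}AQ = D$, where $D$ is a diagonal matrix. Concretely: compute the eigenvalues and eigenvectors of $A$, use the eigenvectors as columns to form the orthogonal matrix $Q$, obtain the diagonal matrix $D$ from $Q^{-1}AQ = D$, and finally use $QDQ^{-1}$ in place of the original matrix $A$. In exact arithmetic $QDQ^{-1}$ equals $A$ itself, so this applies to matrices that are already symmetric.
  3. A matrix can also be symmetrized through the square root of a symmetric positive-definite matrix. Concretely: for a real symmetric positive-definite matrix $A$, compute its eigenvalues and eigenvectors, then form the square root $A^{\frac{1}{2}} = QLQ^{-1}$, where $Q$ is the orthogonal matrix of eigenvectors and $L$ is the diagonal matrix whose diagonal entries are the square roots of the eigenvalues of $A$. Finally use $A^{\frac{1}{2}} A A^{\frac{1}{2}}$ in place of the original matrix $A$. Note that, because $A^{\frac{1}{2}}$ commutes with $A$, this product equals $A^2$.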
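The first method is the one used most often in practice. A short sketch, assuming NumPy and an arbitrary non-symmetric example matrix, shows that $(A + A^T)/2$ is symmetric and leaves every quadratic form $x^T A x$ unchanged, since the skew-symmetric remainder contributes nothing to it.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])

A_sym = (A + A.T) / 2        # symmetric part of A
A_skew = (A - A.T) / 2       # skew-symmetric remainder, A = A_sym + A_skew

x = np.array([1.0, -2.0, 0.5])
quad_original = x @ A @ x
quad_symmetrized = x @ A_sym @ x
```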

Tool websites

References

Matrix Calculus (矩阵微积分) | Here4U
