Matrix Calculus: Study Notes

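These notes collect the rules for differentiating vectors and matrices. Unless stated otherwise, the denominator layout is used.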

I. Basic Rules

1.1 Derivatives with respect to a scalar

  1. Row vector

Let $\mathbf{y}^T=[y_1 \dots y_n]$ be an $n$-dimensional row vector and $x$ a scalar. Then

$$
\frac{\partial \mathbf{y}^{T}}{\partial x}=\left[\begin{array}{lll} \frac{\partial y_{1}}{\partial x} & \cdots & \frac{\partial y_{n}}{\partial x} \end{array}\right]
$$

  2. Column vector

Let $\mathbf{y}=\left[\begin{array}{c} y_{1} \\ \vdots \\ y_{m} \end{array}\right]$ be an $m$-dimensional column vector and $x$ a scalar. Then

$$
\frac{\partial \mathbf{y}}{\partial x}=\left[\begin{array}{c} \frac{\partial y_{1}}{\partial x} \\ \vdots \\ \frac{\partial y_{m}}{\partial x} \end{array}\right]
$$

  3. Matrix

Let $Y=\left[\begin{array}{ccc} y_{11} & \cdots & y_{1 n} \\ \vdots & & \vdots \\ y_{m 1} & \cdots & y_{m n} \end{array}\right]$ be an $m\times n$ matrix and $x$ a scalar. Then

$$
\frac{\partial Y}{\partial x}=\left[\begin{array}{ccc} \frac{\partial y_{11}}{\partial x} & \cdots & \frac{\partial y_{1 n}}{\partial x} \\ \vdots & & \vdots \\ \frac{\partial y_{m 1}}{\partial x} & \cdots & \frac{\partial y_{m n}}{\partial x} \end{array}\right]
$$

1.2 Derivatives with respect to a row vector

  1. Scalar
    Let $y$ be a scalar and $\mathbf{x}^{T}=\left[\begin{array}{lll} x_{1} & \cdots & x_{q} \end{array}\right]$ a $q$-dimensional row vector. Then

$$
\frac{\partial y}{\partial \mathbf{x}^{T}}=\left[\begin{array}{lll} \frac{\partial y}{\partial x_{1}} & \cdots & \frac{\partial y}{\partial x_{q}} \end{array}\right]
$$

  2. Column vector

Let $\mathbf{y}=\left[\begin{array}{c} y_{1} \\ \vdots \\ y_{m} \end{array}\right]$ be an $m$-dimensional column vector and $\mathbf{x}^{T}=\left[\begin{array}{lll} x_{1} & \cdots & x_{q} \end{array}\right]$ a $q$-dimensional row vector. Then

$$
\frac{\partial \mathbf{y}}{\partial \mathbf{x}^{T}}=\left[\begin{array}{ccc} \frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{1}}{\partial x_{q}} \\ \vdots & & \vdots \\ \frac{\partial y_{m}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{q}} \end{array}\right]
$$

  3. Row vector

Let $\mathbf{y}^T=[y_1 \dots y_n]$ be an $n$-dimensional row vector and $\mathbf{x}^{T}=\left[\begin{array}{lll} x_{1} & \cdots & x_{q} \end{array}\right]$ a $q$-dimensional row vector. Then

$$
\frac{\partial \mathbf{y}^{T}}{\partial \mathbf{x}^{T}}=\left[\begin{array}{lll} \frac{\partial \mathbf{y}^{T}}{\partial x_{1}} & \cdots & \frac{\partial \mathbf{y}^{T}}{\partial x_{q}} \end{array}\right]
$$

  4. Matrix

Let $Y=\left[\begin{array}{ccc} y_{11} & \cdots & y_{1 n} \\ \vdots & & \vdots \\ y_{m 1} & \cdots & y_{m n} \end{array}\right]$ be an $m\times n$ matrix and $\mathbf{x}^{T}=\left[\begin{array}{lll} x_{1} & \cdots & x_{q} \end{array}\right]$ a $q$-dimensional row vector. Then

$$
\frac{\partial Y}{\partial \mathbf{x}^{T}}=\left[\begin{array}{lll} \frac{\partial Y}{\partial x_{1}} & \cdots & \frac{\partial Y}{\partial x_{q}} \end{array}\right]
$$

1.3 Derivatives with respect to a column vector

  1. Scalar

Let $y$ be a scalar and $\mathbf{x}=\left[\begin{array}{c} x_{1} \\ \vdots \\ x_{p} \end{array}\right]$ a $p$-dimensional column vector. Then

$$
\frac{\partial y}{\partial \mathbf{x}}=\left[\begin{array}{c} \frac{\partial y}{\partial x_{1}} \\ \vdots \\ \frac{\partial y}{\partial x_{p}} \end{array}\right]
$$

  2. Row vector

Let $\mathbf{y}^T=[y_1 \dots y_n]$ be an $n$-dimensional row vector and $\mathbf{x}=\left[\begin{array}{c} x_{1} \\ \vdots \\ x_{p} \end{array}\right]$ a $p$-dimensional column vector. Then

$$
\frac{\partial \mathbf{y}^{T}}{\partial \mathbf{x}}=\left[\begin{array}{ccc} \frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{n}}{\partial x_{1}} \\ \vdots & & \vdots \\ \frac{\partial y_{1}}{\partial x_{p}} & \cdots & \frac{\partial y_{n}}{\partial x_{p}} \end{array}\right]
$$

  3. Column vector

Let $\mathbf{y}=\left[\begin{array}{c} y_{1} \\ \vdots \\ y_{m} \end{array}\right]$ be an $m$-dimensional column vector and $\mathbf{x}=\left[\begin{array}{c} x_{1} \\ \vdots \\ x_{p} \end{array}\right]$ a $p$-dimensional column vector. Then

$$
\frac{\partial \mathbf{y}}{\partial \mathbf{x}}=\left[\begin{array}{c} \frac{\partial y_{1}}{\partial \mathbf{x}} \\ \vdots \\ \frac{\partial y_{m}}{\partial \mathbf{x}} \end{array}\right]
$$

  4. Matrix

Let $Y=\left[\begin{array}{ccc} y_{11} & \cdots & y_{1 n} \\ \vdots & & \vdots \\ y_{m 1} & \cdots & y_{m n} \end{array}\right]$ be an $m\times n$ matrix and $\mathbf{x}=\left[\begin{array}{c} x_{1} \\ \vdots \\ x_{p} \end{array}\right]$ a $p$-dimensional column vector. Then

$$
\frac{\partial Y}{\partial \mathbf{x}}=\left[\begin{array}{ccc} \frac{\partial y_{11}}{\partial \mathbf{x}} & \cdots & \frac{\partial y_{1 n}}{\partial \mathbf{x}} \\ \vdots & & \vdots \\ \frac{\partial y_{m 1}}{\partial \mathbf{x}} & \cdots & \frac{\partial y_{m n}}{\partial \mathbf{x}} \end{array}\right]
$$
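The shape conventions of 1.2 (column vector by row vector) and 1.3 (row vector by column vector) can be checked numerically. The sketch below is only an illustration: the map `y_func`, the helper `jacobian`, and the step size `h` are assumptions introduced here and do not come from the original notes. It estimates $\partial \mathbf{y} / \partial \mathbf{x}^{T}$ by central differences and then transposes the result to obtain $\partial \mathbf{y}^{T} / \partial \mathbf{x}$.

```python
import numpy as np

def y_func(x):
    # Hypothetical map R^3 -> R^2, chosen only for illustration.
    return np.array([x[0] * x[1], x[1] + x[2] ** 2])

def jacobian(func, x, h=1e-6):
    """Central-difference estimate of dy/dx^T, shape (m, q)."""
    x = np.asarray(x, dtype=float)
    cols = []
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = h
        # Column j holds the derivatives of every y_i with respect to x_j.
        cols.append((func(x + step) - func(x - step)) / (2 * h))
    return np.stack(cols, axis=1)

x0 = np.array([1.0, 2.0, 3.0])

# dy/dx^T: rows index the components of y, columns index the components of x.
J = jacobian(y_func, x0)

# Hand-computed partial derivatives of y_func at x0, for comparison.
J_exact = np.array([[x0[1], x0[0], 0.0],
                    [0.0, 1.0, 2 * x0[2]]])

# dy^T/dx: rows index the components of x, columns index the components of y.
dyT_dx = J.T

print(J.shape, dyT_dx.shape)
print(np.allclose(J, J_exact, atol=1e-6))
```

With $m=2$ and $q=3$, the first array has shape $2\times 3$ and the second $3\times 2$, matching the two definitions.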

1.4 Derivatives with respect to a matrix

  1. Scalar

Let $y$ be a scalar and $X=\left[\begin{array}{ccc} x_{11} & \cdots & x_{1 q} \\ \vdots & & \vdots \\ x_{p 1} & \cdots & x_{p q} \end{array}\right]$ a $p\times q$ matrix. Then

$$
\frac{\partial y}{\partial X}=\left[\begin{array}{ccc} \frac{\partial y}{\partial x_{11}} & \cdots & \frac{\partial y}{\partial x_{1 q}} \\ \vdots & & \vdots \\ \frac{\partial y}{\partial x_{p 1}} & \cdots & \frac{\partial y}{\partial x_{p q}} \end{array}\right]
$$

  2. Row vector

Let $\mathbf{y}^T=[y_1 \dots y_n]$ be an $n$-dimensional row vector and $X=\left[\begin{array}{ccc} x_{11} & \cdots & x_{1 q} \\ \vdots & & \vdots \\ x_{p 1} & \cdots & x_{p q} \end{array}\right]$ a $p\times q$ matrix. Then

$$
\frac{\partial \mathbf{y}^{T}}{\partial X}=\left[\begin{array}{ccc} \frac{\partial \mathbf{y}^{T}}{\partial x_{11}} & \cdots & \frac{\partial \mathbf{y}^{T}}{\partial x_{1 q}} \\ \vdots & & \vdots \\ \frac{\partial \mathbf{y}^{T}}{\partial x_{p 1}} & \cdots & \frac{\partial \mathbf{y}^{T}}{\partial x_{p q}} \end{array}\right]
$$
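For the scalar-by-matrix case in item 1 of 1.4, the derivative has the same shape as the matrix itself. The sketch below illustrates this under assumptions of my own: the scalar function $y = \mathbf{a}^{T} X \mathbf{b}$, the vectors `a` and `b`, and the helper `grad_wrt_matrix` are hypothetical choices made for the example. For this function, $\partial y / \partial x_{ij} = a_i b_j$, so the expected result is the outer product $\mathbf{a}\mathbf{b}^{T}$.

```python
import numpy as np

a = np.array([1.0, 2.0])            # length p = 2
b = np.array([3.0, 4.0, 5.0])       # length q = 3
X = np.arange(6, dtype=float).reshape(2, 3)   # p x q matrix variable

def y(X):
    # Hypothetical scalar function of a matrix: y = a^T X b.
    return a @ X @ b

def grad_wrt_matrix(func, X, h=1e-6):
    """Central-difference estimate of dy/dX, one entry x_ij at a time."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = h
            G[i, j] = (func(X + E) - func(X - E)) / (2 * h)
    return G

G = grad_wrt_matrix(y, X)
G_exact = np.outer(a, b)            # entry (i, j) is a_i * b_j

print(G.shape == X.shape)
print(np.allclose(G, G_exact, atol=1e-6))
```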

II. Examples

Most of the examples in this part come from a video by the Bilibili uploader 杂谈博士, which is worth watching if you have time.

Example 1. Let $\boldsymbol{x}=\left(\xi_{1}, \xi_{2}, \cdots, \xi_{n}\right)^{T}$ and let $f(\boldsymbol{x})$ be a function of $n$ variables. Find $\frac{\mathrm{d} f}{\mathrm{d} \boldsymbol{x}^{T}}$, $\frac{\mathrm{d} f}{\mathrm{d} \boldsymbol{x}}$, and $\frac{\mathrm{d}^{2} f}{\mathrm{d} \boldsymbol{x}^{2}}$.

Solution: by definition,

$$
\frac{\mathrm{d} f}{\mathrm{d} \boldsymbol{x}^{T}}=\left(\frac{\partial f}{\partial \xi_{1}}, \frac{\partial f}{\partial \xi_{2}}, \cdots, \frac{\partial f}{\partial \xi_{n}}\right)
$$

The gradient is

$$
\nabla f(\boldsymbol{x})=\frac{\mathrm{d} f}{\mathrm{d} \boldsymbol{x}}=\left(\begin{array}{c} \frac{\partial f}{\partial \xi_{1}} \\ \vdots \\ \frac{\partial f}{\partial \xi_{n}} \end{array}\right)
$$

The Hessian matrix is

$$
\boldsymbol{H}(\boldsymbol{x})=\nabla^{2} f(\boldsymbol{x})=\frac{\mathrm{d}^{2} f}{\mathrm{d} \boldsymbol{x}^{2}}=\left(\begin{array}{cccc} \frac{\partial^{2} f}{\partial \xi_{1}^{2}} & \frac{\partial^{2} f}{\partial \xi_{1} \partial \xi_{2}} & \cdots & \frac{\partial^{2} f}{\partial \xi_{1} \partial \xi_{n}} \\ \frac{\partial^{2} f}{\partial \xi_{2} \partial \xi_{1}} & \frac{\partial^{2} f}{\partial \xi_{2}^{2}} & \cdots & \frac{\partial^{2} f}{\partial \xi_{2} \partial \xi_{n}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^{2} f}{\partial \xi_{n} \partial \xi_{1}} & \frac{\partial^{2} f}{\partial \xi_{n} \partial \xi_{2}} & \cdots & \frac{\partial^{2} f}{\partial \xi_{n}^{2}} \end{array}\right)
$$
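To make the gradient and Hessian of Example 1 concrete, here is a small finite-difference sketch. Everything specific in it is an assumption for illustration: the test function $f(\boldsymbol{x}) = \xi_1^2 \xi_2 + \xi_2 \xi_3$, the helper names `gradient` and `hessian`, and the step sizes. The numerical results are compared against partial derivatives worked out by hand.

```python
import numpy as np

def f(x):
    # Hypothetical test function: f(x) = x0^2 * x1 + x1 * x2
    return x[0] ** 2 * x[1] + x[1] * x[2]

def gradient(func, x, h=1e-6):
    """Central-difference gradient df/dx as a length-n vector."""
    n = x.size
    g = np.zeros(n)
    for i in range(n):
        e = np.zeros(n)
        e[i] = h
        g[i] = (func(x + e) - func(x - e)) / (2 * h)
    return g

def hessian(func, x, h=1e-4):
    """Central-difference Hessian d^2 f / dx^2, shape (n, n)."""
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        ei = np.zeros(n)
        ei[i] = h
        for j in range(n):
            ej = np.zeros(n)
            ej[j] = h
            # Second-order difference quotient for the (i, j) partial derivative.
            H[i, j] = (func(x + ei + ej) - func(x + ei - ej)
                       - func(x - ei + ej) + func(x - ei - ej)) / (4 * h * h)
    return H

x0 = np.array([1.0, 2.0, 3.0])
grad_num = gradient(f, x0)
hess_num = hessian(f, x0)

# Hand-computed values at x0 = (1, 2, 3).
grad_exact = np.array([2 * x0[0] * x0[1], x0[0] ** 2 + x0[2], x0[1]])
hess_exact = np.array([[2 * x0[1], 2 * x0[0], 0.0],
                       [2 * x0[0], 0.0, 1.0],
                       [0.0, 1.0, 0.0]])

print(np.allclose(grad_num, grad_exact, atol=1e-6))
print(np.allclose(hess_num, hess_exact, atol=1e-5))
```

The numerical Hessian also comes out symmetric, as expected for a function with continuous second partial derivatives.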



Example 2. Let $\boldsymbol{A}=\left(a_{i j}\right)_{m \times n}$ be a constant matrix and $\boldsymbol{X}=\left(x_{i j}\right)_{n \times m}$ a matrix variable, with $f(\boldsymbol{X})=\operatorname{tr}(\boldsymbol{A} \boldsymbol{X})$. Find $\frac{\partial f}{\partial \boldsymbol{X}}$.

Analysis: write the product $\boldsymbol{C}=\boldsymbol{A}\boldsymbol{X}$ out in full:

$$
\left(\begin{array}{ccc} c_{11} & \cdots & c_{1 m} \\ \vdots & \ddots & \vdots \\ c_{m 1} & \cdots & c_{m m} \end{array}\right)=\left(\begin{array}{ccc} a_{11} & \cdots & a_{1 n} \\ \vdots & \ddots & \vdots \\ a_{m 1} & \cdots & a_{m n} \end{array}\right)\left(\begin{array}{ccc} x_{11} & \cdots & x_{1 m} \\ \vdots & \ddots & \vdots \\ x_{n 1} & \cdots & x_{n m} \end{array}\right)
$$

Solution:
Since $\boldsymbol{A} \boldsymbol{X}=\left(\sum_{k=1}^{n} a_{i k} x_{k j}\right)_{m \times m}$,

we have

$$
f(\boldsymbol{X})=\operatorname{tr}(\boldsymbol{A} \boldsymbol{X})=\sum_{s=1}^{m}\left(\sum_{k=1}^{n} a_{s k} x_{k s}\right)
$$

The only term containing $x_{ij}$ is $a_{ji} x_{ij}$ (take $s=j$, $k=i$), so

$$
\left(\frac{\partial f}{\partial x_{i j}}\right)_{n \times m}=\left(a_{j i}\right)_{n \times m} \quad(i=1,2, \cdots, n; \quad j=1,2, \cdots, m)
$$

$$
\frac{\partial f}{\partial \boldsymbol{X}}=\left(\frac{\partial f}{\partial x_{i j}}\right)_{n \times m}=\left(a_{j i}\right)_{n \times m}=\boldsymbol{A}^{T}
$$
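The result $\partial \operatorname{tr}(\boldsymbol{A}\boldsymbol{X}) / \partial \boldsymbol{X} = \boldsymbol{A}^{T}$ is easy to confirm numerically. The sketch below uses randomly generated matrices with arbitrarily chosen sizes and seed (my assumptions, not values from the notes). Because $f$ is linear in $\boldsymbol{X}$, adding 1 to a single entry $x_{ij}$ changes $f$ by exactly $\partial f / \partial x_{ij}$, so no small step size is needed.

```python
import numpy as np

rng = np.random.default_rng(0)      # arbitrary seed, for reproducibility only
m, n = 2, 3
A = rng.standard_normal((m, n))     # constant matrix, m x n
X = rng.standard_normal((n, m))     # matrix variable, n x m

def f(X):
    return np.trace(A @ X)

# Build df/dX entry by entry from unit perturbations of X.
grad = np.zeros((n, m))
for i in range(n):
    for j in range(m):
        E = np.zeros((n, m))
        E[i, j] = 1.0
        grad[i, j] = f(X + E) - f(X)

print(grad.shape)                   # same shape as X, i.e. n x m
print(np.allclose(grad, A.T))
```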



Example 3. Let $\boldsymbol{x}=\left(\xi_{1}, \xi_{2}, \cdots, \xi_{n}\right)^{T}$, let $\boldsymbol{A}=\left(a_{i j}\right)_{n \times n}$ be a constant matrix, and let $f(\boldsymbol{x})=\boldsymbol{x}^{T} \boldsymbol{A} \boldsymbol{x}$ be a function of $n$ variables. Find $\frac{\mathrm{d} f}{\mathrm{d} \boldsymbol{x}}$.

Solution: since

$$
f(\boldsymbol{x})=\xi_{1} \sum_{j=1}^{n} a_{1 j} \xi_{j}+\cdots+\xi_{k} \sum_{j=1}^{n} a_{k j} \xi_{j}+\cdots+\xi_{n} \sum_{j=1}^{n} a_{n j} \xi_{j}
$$

we have

$$
\begin{aligned} \frac{\partial f(\boldsymbol{x})}{\partial \xi_{k}} &=\xi_{1} a_{1 k}+\ldots+\xi_{k-1} a_{k-1, k}+\left(\sum_{j=1}^{n} a_{k j} \xi_{j}+\xi_{k} a_{k k}\right)+\xi_{k+1} a_{k+1, k}+\ldots+\xi_{n} a_{n k} \\ &=\sum_{i=1}^{n} a_{i k} \xi_{i}+\sum_{j=1}^{n} a_{k j} \xi_{j}, \quad k=1,2, \cdots, n \end{aligned}
$$

$$
\frac{\mathrm{d} f}{\mathrm{d} \boldsymbol{x}}=\left(\begin{array}{c} \frac{\partial f}{\partial \xi_{1}} \\ \frac{\partial f}{\partial \xi_{2}} \\ \vdots \\ \frac{\partial f}{\partial \xi_{n}} \end{array}\right)=\left(\begin{array}{c} \sum_{j=1}^{n} a_{1 j} \xi_{j} \\ \sum_{j=1}^{n} a_{2 j} \xi_{j} \\ \vdots \\ \sum_{j=1}^{n} a_{n j} \xi_{j} \end{array}\right)+\left(\begin{array}{c} \sum_{i=1}^{n} a_{i 1} \xi_{i} \\ \sum_{i=1}^{n} a_{i 2} \xi_{i} \\ \vdots \\ \sum_{i=1}^{n} a_{i n} \xi_{i} \end{array}\right)=\boldsymbol{A} \boldsymbol{x}+\boldsymbol{A}^{T} \boldsymbol{x}=\left(\boldsymbol{A}+\boldsymbol{A}^{T}\right) \boldsymbol{x}
$$

In particular, when $\boldsymbol{A}$ is symmetric, $\frac{\mathrm{d} f}{\mathrm{d} \boldsymbol{x}}=2 \boldsymbol{A} \boldsymbol{x}$.
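As a sanity check on Example 3, the following sketch compares a central-difference gradient of $\boldsymbol{x}^{T}\boldsymbol{A}\boldsymbol{x}$ with the formula $(\boldsymbol{A}+\boldsymbol{A}^{T})\boldsymbol{x}$. The random matrix, its size, the seed, and the helper `num_grad` are assumptions made for the illustration. It also checks the symmetric special case by replacing $\boldsymbol{A}$ with its symmetric part $\boldsymbol{S}=\tfrac{1}{2}(\boldsymbol{A}+\boldsymbol{A}^{T})$, which defines the same quadratic form.

```python
import numpy as np

rng = np.random.default_rng(1)      # arbitrary seed, for reproducibility only
n = 4
A = rng.standard_normal((n, n))     # generally not symmetric
x = rng.standard_normal(n)

def f(x):
    return x @ A @ x                # quadratic form x^T A x

def num_grad(func, x, h=1e-6):
    """Central-difference gradient of a scalar function of a vector."""
    g = np.zeros(x.size)
    for k in range(x.size):
        e = np.zeros(x.size)
        e[k] = h
        g[k] = (func(x + e) - func(x - e)) / (2 * h)
    return g

g_num = num_grad(f, x)
g_formula = (A + A.T) @ x

# Symmetric part of A: x^T S x equals x^T A x, and its gradient is 2 S x.
S = 0.5 * (A + A.T)
g_sym = 2 * S @ x

print(np.allclose(g_num, g_formula, atol=1e-6))
print(np.allclose(g_sym, g_formula))
```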



Example 4. Let $\boldsymbol{A} \in \mathbb{R}^{m \times n}$, $\boldsymbol{b} \in \mathbb{R}^{m}$, $\boldsymbol{x} \in \mathbb{R}^{n}$, and $f(\boldsymbol{x})=\|\boldsymbol{A} \boldsymbol{x}-\boldsymbol{b}\|_{2}^{2}$. Find $\frac{\mathrm{d} f}{\mathrm{d} \boldsymbol{x}}$.

Solution: since

$$
\begin{aligned} f(\boldsymbol{x}) & =\|\boldsymbol{A} \boldsymbol{x}-\boldsymbol{b}\|_{2}^{2}=(\boldsymbol{A} \boldsymbol{x}-\boldsymbol{b}, \boldsymbol{A} \boldsymbol{x}-\boldsymbol{b})=(\boldsymbol{A} \boldsymbol{x}-\boldsymbol{b})^{T}(\boldsymbol{A} \boldsymbol{x}-\boldsymbol{b}) \\ & =\left(\boldsymbol{x}^{T} \boldsymbol{A}^{T}-\boldsymbol{b}^{T}\right)(\boldsymbol{A} \boldsymbol{x}-\boldsymbol{b}) \\ & =\boldsymbol{x}^{T} \boldsymbol{A}^{T} \boldsymbol{A} \boldsymbol{x}-\boldsymbol{b}^{T} \boldsymbol{A} \boldsymbol{x}-\boldsymbol{x}^{T} \boldsymbol{A}^{T} \boldsymbol{b}+\boldsymbol{b}^{T} \boldsymbol{b} \\ & =x^{T}\left(A^{T} A\right) x-2\left(A^{T} b\right)^{T} x+b^{T} b \end{aligned}
$$

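I did not bother typing \boldsymbol in the last line; please bear with me (doge). The two cross terms combine because $\boldsymbol{b}^{T} \boldsymbol{A} \boldsymbol{x}$ is a scalar and therefore equals its transpose $\boldsymbol{x}^{T} \boldsymbol{A}^{T} \boldsymbol{b}$. In addition, the derivative of the quadratic term can be written as $2 \boldsymbol{A}^{T} \boldsymbol{A} \boldsymbol{x}$ because $\boldsymbol{A}^{T} \boldsymbol{A}$ is a symmetric matrix (see Example 3).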

Therefore

$$
\frac{\mathrm{d} f}{\mathrm{d} \boldsymbol{x}}=2 \boldsymbol{A}^{T} \boldsymbol{A} \boldsymbol{x}-2 \boldsymbol{A}^{T} \boldsymbol{b}
$$

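Setting this derivative to zero gives $\boldsymbol{A}^{T} \boldsymbol{A} \boldsymbol{x}=\boldsymbol{A}^{T} \boldsymbol{b}$. This system always has a solution, because $\operatorname{rank}\left(\boldsymbol{A}^{T} \boldsymbol{A}\right)=\operatorname{rank}\left(\left[\boldsymbol{A}^{T} \boldsymbol{A} \;\; \boldsymbol{A}^{T} \boldsymbol{b}\right]\right)$: the vector $\boldsymbol{A}^{T} \boldsymbol{b}$ lies in the column space of $\boldsymbol{A}^{T}$, which coincides with the column space of $\boldsymbol{A}^{T} \boldsymbol{A}$.

I have not yet studied the detailed proof.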
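The gradient formula and the solvability claim of Example 4 can both be probed numerically. The sketch below is an illustration under my own assumptions: a random $6 \times 3$ matrix (which has full column rank with probability one, so `np.linalg.solve` applies), an arbitrary seed, and hypothetical helper names `f` and `grad_f`. It checks the gradient by central differences, solves $\boldsymbol{A}^{T}\boldsymbol{A}\boldsymbol{x}=\boldsymbol{A}^{T}\boldsymbol{b}$, compares the result with NumPy's least-squares solver, and compares the ranks of the coefficient and augmented matrices.

```python
import numpy as np

rng = np.random.default_rng(2)      # arbitrary seed, for reproducibility only
m, n = 6, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

def f(x):
    r = A @ x - b
    return r @ r                    # squared 2-norm of the residual

def grad_f(x):
    return 2 * A.T @ A @ x - 2 * A.T @ b

# Central-difference check of the gradient at a random point.
x = rng.standard_normal(n)
h = 1e-6
g_num = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(n)])

# Solve A^T A x = A^T b directly (assumes A has full column rank).
x_star = np.linalg.solve(A.T @ A, A.T @ b)

# Reference solution from NumPy's least-squares routine.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

# Consistency of the linear system: coefficient and augmented ranks agree.
rank_AtA = np.linalg.matrix_rank(A.T @ A)
rank_aug = np.linalg.matrix_rank(np.column_stack([A.T @ A, A.T @ b]))

print(np.allclose(g_num, grad_f(x), atol=1e-5))
print(np.allclose(x_star, x_lstsq))
print(rank_AtA == rank_aug)
```

When $\boldsymbol{A}$ is rank deficient, $\boldsymbol{A}^{T}\boldsymbol{A}$ is singular and `np.linalg.solve` is no longer appropriate; `np.linalg.lstsq` still returns a minimizer in that case.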
