Matrix Derivatives, Part 3: Definitions (Part II)

4 Derivative of a Vector with Respect to a Vector

4.1 Definition

4.1.1 Derivative of a row vector with respect to a column vector

This is also called the denominator layout and is written $\frac{\partial \boldsymbol{y}^T}{\partial \boldsymbol{x}}$.
Differentiating the $m$-dimensional row vector $\boldsymbol{y}^T=\left[ y_1,y_2,\cdots ,y_m \right]$ with respect to the $n$-dimensional column vector $\boldsymbol{x}=\left[ x_1,x_2,\cdots ,x_n \right]^T$ gives an $n\times m$ matrix:
$$
\frac{\partial \boldsymbol{y}^T}{\partial \boldsymbol{x}}
=\left[ \begin{array}{c}
\frac{\partial \boldsymbol{y}^T}{\partial x_1}\\
\frac{\partial \boldsymbol{y}^T}{\partial x_2}\\
\vdots\\
\frac{\partial \boldsymbol{y}^T}{\partial x_n}\\
\end{array} \right]
=\left[ \begin{matrix}
\frac{\partial y_1}{\partial x_1}& \frac{\partial y_2}{\partial x_1}& \cdots& \frac{\partial y_m}{\partial x_1}\\
\frac{\partial y_1}{\partial x_2}& \frac{\partial y_2}{\partial x_2}& \cdots& \frac{\partial y_m}{\partial x_2}\\
\vdots& \vdots& \ddots& \vdots\\
\frac{\partial y_1}{\partial x_n}& \frac{\partial y_2}{\partial x_n}& \cdots& \frac{\partial y_m}{\partial x_n}\\
\end{matrix} \right]
$$
In mathematics this matrix is called the gradient matrix.
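As a small worked instance of this definition, consider a function chosen purely for illustration (it does not come from the original text): take $n=2$, $m=3$ and $\boldsymbol{y}^T=\left[ x_1x_2,\; x_1+x_2,\; x_2^2 \right]$. Row $j$ collects the derivatives of every component with respect to $x_j$:

$$
\frac{\partial \boldsymbol{y}^T}{\partial \boldsymbol{x}}
=\left[ \begin{matrix}
\frac{\partial (x_1x_2)}{\partial x_1}& \frac{\partial (x_1+x_2)}{\partial x_1}& \frac{\partial (x_2^2)}{\partial x_1}\\
\frac{\partial (x_1x_2)}{\partial x_2}& \frac{\partial (x_1+x_2)}{\partial x_2}& \frac{\partial (x_2^2)}{\partial x_2}\\
\end{matrix} \right]
=\left[ \begin{matrix}
x_2& 1& 0\\
x_1& 1& 2x_2\\
\end{matrix} \right]
$$

The result has shape $2\times 3$, that is $n\times m$, as the definition requires.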

4.1.2 Derivative of a column vector with respect to a row vector

This is also called the numerator layout and is written $\frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}^T}$.
Differentiating the $m$-dimensional column vector $\boldsymbol{y}=\left[ y_1,y_2,\cdots ,y_m \right]^T$ with respect to the $n$-dimensional row vector $\boldsymbol{x}^T=\left[ x_1,x_2,\cdots ,x_n \right]$ gives an $m\times n$ matrix:
$$
\frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}^T}
=\left[ \begin{array}{c}
\frac{\partial y_1}{\partial \boldsymbol{x}^T}\\
\frac{\partial y_2}{\partial \boldsymbol{x}^T}\\
\vdots\\
\frac{\partial y_m}{\partial \boldsymbol{x}^T}\\
\end{array} \right]
=\left[ \begin{matrix}
\frac{\partial y_1}{\partial x_1}& \frac{\partial y_1}{\partial x_2}& \cdots& \frac{\partial y_1}{\partial x_n}\\
\frac{\partial y_2}{\partial x_1}& \frac{\partial y_2}{\partial x_2}& \cdots& \frac{\partial y_2}{\partial x_n}\\
\vdots& \vdots& \ddots& \vdots\\
\frac{\partial y_m}{\partial x_1}& \frac{\partial y_m}{\partial x_2}& \cdots& \frac{\partial y_m}{\partial x_n}\\
\end{matrix} \right]
$$
In mathematics this matrix is called the Jacobian matrix.

From the definitions it follows that, in general, the two layouts differ and are transposes of each other:
$$
\frac{\partial \boldsymbol{y}^T}{\partial \boldsymbol{x}}\ne \frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}^T},\qquad
\frac{\partial \boldsymbol{y}^T}{\partial \boldsymbol{x}}=\left( \frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}^T} \right) ^T
$$
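The relation between the two layouts can be checked numerically. The sketch below is only an illustration under stated assumptions: the function `y`, the step size `h`, and the helper names `partial`, `gradient_matrix`, and `jacobian` are all hypothetical and not part of the original text. Both matrices are built directly from their definitions with central differences, then compared entry by entry.

```python
def y(x):
    # Hypothetical example function from R^2 to R^3 (not from the original text).
    x1, x2 = x
    return [x1 * x2, x1 + x2, x2 ** 2]

def partial(f, x, j, h=1e-6):
    """Central-difference estimate of d f(x) / d x_j, one entry per component of f."""
    xp, xm = list(x), list(x)
    xp[j] += h
    xm[j] -= h
    return [(p - q) / (2 * h) for p, q in zip(f(xp), f(xm))]

def gradient_matrix(f, x):
    """Denominator layout d y^T / d x: row j is d y^T / d x_j, shape n x m."""
    return [partial(f, x, j) for j in range(len(x))]

def jacobian(f, x):
    """Numerator layout d y / d x^T: row i is d y_i / d x^T, shape m x n."""
    m = len(f(x))
    return [[partial(f, x, j)[i] for j in range(len(x))] for i in range(m)]

x0 = [2.0, 3.0]
G = gradient_matrix(y, x0)   # approximately [[3, 1, 0], [2, 1, 6]]
J = jacobian(y, x0)          # approximately [[3, 2], [1, 1], [0, 6]]
is_transpose = all(G[j][i] == J[i][j] for j in range(2) for i in range(3))
print(len(G), len(G[0]), len(J), len(J[0]), is_transpose)
```

Finite differences are used here only to keep the example dependency-free; an automatic differentiation library would serve the same purpose.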

4.2 Rules of Calculation

Let $\boldsymbol{a}\left( \boldsymbol{x} \right)$ and $\boldsymbol{b}\left( \boldsymbol{x} \right)$ be $m$-dimensional column-vector functions, $\lambda \left( \boldsymbol{x} \right)$ a scalar function, and $\boldsymbol{x}$ an $n$-dimensional column vector. Then the following three rules hold:

4.2.1 Addition rule

$$
\frac{\partial \left( \boldsymbol{a}^T\left( \boldsymbol{x} \right) \pm \boldsymbol{b}^T\left( \boldsymbol{x} \right) \right)}{\partial \boldsymbol{x}}=\frac{\partial \boldsymbol{a}^T\left( \boldsymbol{x} \right)}{\partial \boldsymbol{x}}\pm \frac{\partial \boldsymbol{b}^T\left( \boldsymbol{x} \right)}{\partial \boldsymbol{x}}
$$

4.2.2 Scalar-multiplication rule

$$
\frac{\partial \left( \lambda \left( \boldsymbol{x} \right) \boldsymbol{a}^T\left( \boldsymbol{x} \right) \right)}{\partial \boldsymbol{x}}=\frac{\partial \lambda \left( \boldsymbol{x} \right)}{\partial \boldsymbol{x}}\cdot \boldsymbol{a}^T\left( \boldsymbol{x} \right) +\lambda \left( \boldsymbol{x} \right) \cdot \frac{\partial \boldsymbol{a}^T\left( \boldsymbol{x} \right)}{\partial \boldsymbol{x}}
$$
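In the scalar-multiplication rule the first term is an outer product: the $n\times 1$ column $\partial \lambda /\partial \boldsymbol{x}$ times the $1\times m$ row $\boldsymbol{a}^T$, which yields an $n\times m$ matrix like the second term. The following sketch checks the rule numerically; the functions `lam` and `a`, the evaluation point, and the helper `grad_matrix` are hypothetical choices made for illustration only.

```python
def lam(x):
    # Hypothetical scalar function of x (not from the original text).
    return x[0] ** 2 + x[1]

def a(x):
    # Hypothetical 3-dimensional vector function of x.
    return [x[0] * x[1], x[1], x[0] + 2.0 * x[1]]

def grad_matrix(f, x, h=1e-6):
    """Denominator layout: row j approximates d f(x)^T / d x_j (central differences)."""
    rows = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        rows.append([(p - q) / (2 * h) for p, q in zip(f(xp), f(xm))])
    return rows

x0 = [1.5, -0.5]
n, m = len(x0), len(a(x0))

# Left-hand side: differentiate the product lam(x) * a(x)^T directly.
lhs = grad_matrix(lambda x: [lam(x) * ai for ai in a(x)], x0)

# Right-hand side: (d lam / d x) a^T + lam * (d a^T / d x).
dlam = [row[0] for row in grad_matrix(lambda x: [lam(x)], x0)]  # n entries
da = grad_matrix(a, x0)                                         # n x m
a0, lam0 = a(x0), lam(x0)
rhs = [[dlam[j] * a0[i] + lam0 * da[j][i] for i in range(m)] for j in range(n)]

max_err = max(abs(lhs[j][i] - rhs[j][i]) for j in range(n) for i in range(m))
print(max_err < 1e-6)
```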

4.2.3 Product rule

$$
\frac{\partial \left[ \boldsymbol{a}^T\left( \boldsymbol{x} \right) \cdot \boldsymbol{b}\left( \boldsymbol{x} \right) \right]}{\partial \boldsymbol{x}}=\frac{\partial \boldsymbol{a}^T\left( \boldsymbol{x} \right)}{\partial \boldsymbol{x}}\cdot \boldsymbol{b}\left( \boldsymbol{x} \right) +\frac{\partial \boldsymbol{b}^T\left( \boldsymbol{x} \right)}{\partial \boldsymbol{x}}\cdot \boldsymbol{a}\left( \boldsymbol{x} \right)
$$
In addition, with $\boldsymbol{E}$ denoting the $n\times n$ identity matrix:
$$
\frac{\partial \boldsymbol{x}}{\partial \boldsymbol{x}^T}=\frac{\partial \boldsymbol{x}^T}{\partial \boldsymbol{x}}=\boldsymbol{E}
$$

4.3 Examples

[Example 4.1] Prove that:
$$
\frac{\partial \left[ \boldsymbol{a}^T\left( \boldsymbol{x} \right) \cdot \boldsymbol{b}\left( \boldsymbol{x} \right) \right]}{\partial \boldsymbol{x}}=\frac{\partial \boldsymbol{a}^T\left( \boldsymbol{x} \right)}{\partial \boldsymbol{x}}\cdot \boldsymbol{b}\left( \boldsymbol{x} \right) +\frac{\partial \boldsymbol{b}^T\left( \boldsymbol{x} \right)}{\partial \boldsymbol{x}}\cdot \boldsymbol{a}\left( \boldsymbol{x} \right)
$$
[Proof]
$$
\begin{aligned}
\frac{\partial \left[ \boldsymbol{a}^T\left( \boldsymbol{x} \right) \cdot \boldsymbol{b}\left( \boldsymbol{x} \right) \right]}{\partial \boldsymbol{x}}
&=\left[ \begin{array}{c}
\frac{\partial \boldsymbol{a}^T\boldsymbol{b}}{\partial x_1}\\
\vdots\\
\frac{\partial \boldsymbol{a}^T\boldsymbol{b}}{\partial x_i}\\
\vdots\\
\frac{\partial \boldsymbol{a}^T\boldsymbol{b}}{\partial x_n}\\
\end{array} \right]
=\left[ \begin{array}{c}
\frac{\partial \boldsymbol{a}^T}{\partial x_1}\cdot \boldsymbol{b}+\boldsymbol{a}^T\cdot \frac{\partial \boldsymbol{b}}{\partial x_1}\\
\vdots\\
\frac{\partial \boldsymbol{a}^T}{\partial x_i}\cdot \boldsymbol{b}+\boldsymbol{a}^T\cdot \frac{\partial \boldsymbol{b}}{\partial x_i}\\
\vdots\\
\frac{\partial \boldsymbol{a}^T}{\partial x_n}\cdot \boldsymbol{b}+\boldsymbol{a}^T\cdot \frac{\partial \boldsymbol{b}}{\partial x_n}\\
\end{array} \right] \\
&=\left[ \begin{array}{c}
\frac{\partial \boldsymbol{a}^T}{\partial x_1}\cdot \boldsymbol{b}+\frac{\partial \boldsymbol{b}^T}{\partial x_1}\cdot \boldsymbol{a}\\
\vdots\\
\frac{\partial \boldsymbol{a}^T}{\partial x_i}\cdot \boldsymbol{b}+\frac{\partial \boldsymbol{b}^T}{\partial x_i}\cdot \boldsymbol{a}\\
\vdots\\
\frac{\partial \boldsymbol{a}^T}{\partial x_n}\cdot \boldsymbol{b}+\frac{\partial \boldsymbol{b}^T}{\partial x_n}\cdot \boldsymbol{a}\\
\end{array} \right] \\
&=\frac{\partial \boldsymbol{a}^T\left( \boldsymbol{x} \right)}{\partial \boldsymbol{x}}\cdot \boldsymbol{b}\left( \boldsymbol{x} \right) +\frac{\partial \boldsymbol{b}^T\left( \boldsymbol{x} \right)}{\partial \boldsymbol{x}}\cdot \boldsymbol{a}\left( \boldsymbol{x} \right)
\end{aligned}
$$
The second line follows because each $\boldsymbol{a}^T\cdot \frac{\partial \boldsymbol{b}}{\partial x_i}$ is a scalar and therefore equals its own transpose $\frac{\partial \boldsymbol{b}^T}{\partial x_i}\cdot \boldsymbol{a}$.
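The identity proved in Example 4.1 can also be sanity-checked numerically. This is a minimal sketch, assuming two hypothetical vector functions `a` and `b` of a 3-dimensional `x` (neither appears in the original text); the gradient of the scalar $\boldsymbol{a}^T\boldsymbol{b}$ is compared with the right-hand side assembled from the two gradient matrices.

```python
def a(x):
    # Hypothetical vector function (illustration only).
    return [x[0] * x[1], x[1] ** 2, x[0] + x[2]]

def b(x):
    # Hypothetical vector function (illustration only).
    return [x[2], x[0] - x[1], x[1] * x[2]]

def grad_matrix(f, x, h=1e-6):
    """Denominator layout: row j approximates d f(x)^T / d x_j (central differences)."""
    rows = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        rows.append([(p - q) / (2 * h) for p, q in zip(f(xp), f(xm))])
    return rows

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

x0 = [1.0, 2.0, 3.0]

# Left-hand side: gradient of the scalar a(x)^T b(x), an n-dimensional column.
lhs = [row[0] for row in grad_matrix(lambda x: [dot(a(x), b(x))], x0)]

# Right-hand side: (d a^T / d x) b + (d b^T / d x) a.
da, db = grad_matrix(a, x0), grad_matrix(b, x0)
a0, b0 = a(x0), b(x0)
rhs = [dot(da[j], b0) + dot(db[j], a0) for j in range(len(x0))]

print([round(v, 6) for v in lhs], [round(v, 6) for v in rhs])
```

For this choice of functions the analytic gradient at `x0` is $[16, 7, 16]^T$, which both sides reproduce up to rounding.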

[Example 4.2] Find
$\frac{\partial \boldsymbol{x}}{\partial \boldsymbol{x}^T}$ and $\frac{\partial \boldsymbol{x}^T}{\partial \boldsymbol{x}}$,
where $\boldsymbol{x}$ is an $n$-dimensional column vector.
[Solution]
$$
\frac{\partial \boldsymbol{x}}{\partial \boldsymbol{x}^T}=\left[ \begin{matrix}
\frac{\partial x_1}{\partial x_1}& \frac{\partial x_1}{\partial x_2}& \cdots& \frac{\partial x_1}{\partial x_n}\\
\frac{\partial x_2}{\partial x_1}& \frac{\partial x_2}{\partial x_2}& \cdots& \frac{\partial x_2}{\partial x_n}\\
\vdots& \vdots& \ddots& \vdots\\
\frac{\partial x_n}{\partial x_1}& \frac{\partial x_n}{\partial x_2}& \cdots& \frac{\partial x_n}{\partial x_n}\\
\end{matrix} \right]
=\left[ \begin{matrix}
1& 0& \cdots& 0\\
0& 1& \cdots& 0\\
\vdots& \vdots& \ddots& \vdots\\
0& 0& \cdots& 1\\
\end{matrix} \right] =\boldsymbol{E}
$$
$$
\frac{\partial \boldsymbol{x}^T}{\partial \boldsymbol{x}}=\left[ \begin{matrix}
\frac{\partial x_1}{\partial x_1}& \frac{\partial x_2}{\partial x_1}& \cdots& \frac{\partial x_n}{\partial x_1}\\
\frac{\partial x_1}{\partial x_2}& \frac{\partial x_2}{\partial x_2}& \cdots& \frac{\partial x_n}{\partial x_2}\\
\vdots& \vdots& \ddots& \vdots\\
\frac{\partial x_1}{\partial x_n}& \frac{\partial x_2}{\partial x_n}& \cdots& \frac{\partial x_n}{\partial x_n}\\
\end{matrix} \right]
=\left[ \begin{matrix}
1& 0& \cdots& 0\\
0& 1& \cdots& 0\\
\vdots& \vdots& \ddots& \vdots\\
0& 0& \cdots& 1\\
\end{matrix} \right] =\boldsymbol{E}
$$
[Example 4.3] Find
$\frac{\partial \left( \boldsymbol{x}^T\boldsymbol{A} \right)}{\partial \boldsymbol{x}}$,
where $\boldsymbol{x}$ is an $n$-dimensional column vector and $\boldsymbol{A}$ is an $n\times m$ constant matrix.
[Solution] Write $\boldsymbol{A}$ in terms of its columns, $\boldsymbol{A}=\left[ \begin{matrix} \boldsymbol{\alpha }_1& \boldsymbol{\alpha }_2& \cdots& \boldsymbol{\alpha }_m\\ \end{matrix} \right]$,
where each $\boldsymbol{\alpha }_i=\left[ \begin{matrix} \alpha _{i1}& \alpha _{i2}& \cdots& \alpha _{in}\\ \end{matrix} \right] ^T$
is an $n$-dimensional column vector. Then:
$$
\boldsymbol{x}^T\boldsymbol{A}=\left[ \begin{matrix} \boldsymbol{x}^T\boldsymbol{\alpha }_1& \boldsymbol{x}^T\boldsymbol{\alpha }_2& \cdots& \boldsymbol{x}^T\boldsymbol{\alpha }_m\\ \end{matrix} \right]
$$
By definition:
$$
\frac{\partial \left( \boldsymbol{x}^T\boldsymbol{A} \right)}{\partial \boldsymbol{x}}=\left[ \begin{matrix} \frac{\partial \left( \boldsymbol{x}^T\boldsymbol{\alpha }_1 \right)}{\partial \boldsymbol{x}}& \frac{\partial \left( \boldsymbol{x}^T\boldsymbol{\alpha }_2 \right)}{\partial \boldsymbol{x}}& \cdots& \frac{\partial \left( \boldsymbol{x}^T\boldsymbol{\alpha }_m \right)}{\partial \boldsymbol{x}}\\ \end{matrix} \right]
$$
Each column vector follows from the product rule, where the second term vanishes because $\boldsymbol{\alpha }_i$ is constant:
$$
\frac{\partial \left( \boldsymbol{x}^T\boldsymbol{\alpha }_i \right)}{\partial \boldsymbol{x}}=\frac{\partial \boldsymbol{x}^T}{\partial \boldsymbol{x}}\cdot \boldsymbol{\alpha }_i+\frac{\partial {\boldsymbol{\alpha }_i}^T}{\partial \boldsymbol{x}}\cdot \boldsymbol{x}=\boldsymbol{\alpha }_i
$$
Therefore:
$$
\frac{\partial \left( \boldsymbol{x}^T\boldsymbol{A} \right)}{\partial \boldsymbol{x}}=\left[ \begin{matrix} \boldsymbol{\alpha }_1& \boldsymbol{\alpha }_2& \cdots& \boldsymbol{\alpha }_m\\ \end{matrix} \right] =\boldsymbol{A}
$$
[Corollary]
If $\boldsymbol{A}$ is an $n\times n$ square matrix, then:
$$
\frac{\partial \left( \boldsymbol{x}^T\boldsymbol{A}^T \right)}{\partial \boldsymbol{x}}=\boldsymbol{A}^T
$$
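To make the result of Example 4.3 concrete, here is a small instance with numbers chosen only for illustration (they are not from the original text). Take $n=2$, $m=3$ and

$$
\boldsymbol{A}=\left[ \begin{matrix} 1& 2& 3\\ 4& 5& 6\\ \end{matrix} \right] ,\qquad
\boldsymbol{x}^T\boldsymbol{A}=\left[ \begin{matrix} x_1+4x_2& 2x_1+5x_2& 3x_1+6x_2\\ \end{matrix} \right]
$$

Differentiating each component with respect to $x_1$ (first row) and $x_2$ (second row) gives

$$
\frac{\partial \left( \boldsymbol{x}^T\boldsymbol{A} \right)}{\partial \boldsymbol{x}}
=\left[ \begin{matrix} 1& 2& 3\\ 4& 5& 6\\ \end{matrix} \right] =\boldsymbol{A}
$$

which agrees with the general formula.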
[Example 4.4] Find $\frac{\partial \left( \boldsymbol{Bx} \right)}{\partial \boldsymbol{x}^T}$,
where $\boldsymbol{x}$ is an $n$-dimensional column vector and $\boldsymbol{B}$ is an $m\times n$ matrix.
[Solution]
Let $\boldsymbol{\beta }_i$ be $n$-dimensional column vectors such that the rows of $\boldsymbol{B}$ are ${\boldsymbol{\beta }_i}^T$:
$$
\boldsymbol{B}=\left[ \begin{array}{c} {\boldsymbol{\beta }_1}^T\\ {\boldsymbol{\beta }_2}^T\\ \vdots\\ {\boldsymbol{\beta }_m}^T\\ \end{array} \right]
$$
Then:
$$
\boldsymbol{Bx}=\left[ \begin{array}{c} {\boldsymbol{\beta }_1}^T\boldsymbol{x}\\ {\boldsymbol{\beta }_2}^T\boldsymbol{x}\\ \vdots\\ {\boldsymbol{\beta }_m}^T\boldsymbol{x}\\ \end{array} \right] ,\qquad
\frac{\partial \left( \boldsymbol{Bx} \right)}{\partial \boldsymbol{x}^T}=\left[ \begin{array}{c} \frac{\partial \left( {\boldsymbol{\beta }_1}^T\boldsymbol{x} \right)}{\partial \boldsymbol{x}^T}\\ \frac{\partial \left( {\boldsymbol{\beta }_2}^T\boldsymbol{x} \right)}{\partial \boldsymbol{x}^T}\\ \vdots\\ \frac{\partial \left( {\boldsymbol{\beta }_m}^T\boldsymbol{x} \right)}{\partial \boldsymbol{x}^T}\\ \end{array} \right]
$$
Each row vector is:
$$
\begin{aligned}
\frac{\partial \left( {\boldsymbol{\beta }_i}^T\boldsymbol{x} \right)}{\partial \boldsymbol{x}^T}
&=\left[ \frac{\partial \left( {\boldsymbol{\beta }_i}^T\boldsymbol{x} \right) ^T}{\partial \boldsymbol{x}} \right] ^T=\left[ \frac{\partial \left( \boldsymbol{x}^T\boldsymbol{\beta }_i \right)}{\partial \boldsymbol{x}} \right] ^T \\
&=\left[ \frac{\partial \boldsymbol{x}^T}{\partial \boldsymbol{x}}\cdot \boldsymbol{\beta }_i+\frac{\partial {\boldsymbol{\beta }_i}^T}{\partial \boldsymbol{x}}\cdot \boldsymbol{x} \right] ^T \\
&={\boldsymbol{\beta }_i}^T
\end{aligned}
$$
Therefore:
$$
\frac{\partial \left( \boldsymbol{Bx} \right)}{\partial \boldsymbol{x}^T}=\left[ \begin{array}{c} {\boldsymbol{\beta }_1}^T\\ {\boldsymbol{\beta }_2}^T\\ \vdots\\ {\boldsymbol{\beta }_m}^T\\ \end{array} \right] =\boldsymbol{B}
$$
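Since $\boldsymbol{Bx}$ is linear in $\boldsymbol{x}$, the result of Example 4.4 is easy to confirm numerically. The sketch below assumes a hypothetical $2\times 3$ matrix `B` and an arbitrary evaluation point (both made up for illustration) and builds the numerator-layout Jacobian with central differences.

```python
# Hypothetical 2 x 3 constant matrix (illustration only).
B = [[1.0, -2.0, 0.5],
     [3.0, 0.0, 4.0]]

def Bx(x):
    """Matrix-vector product B x, returned as a list of m entries."""
    return [sum(bij * xj for bij, xj in zip(row, x)) for row in B]

def jacobian(f, x, h=1e-6):
    """Numerator layout d f / d x^T: entry [i][j] approximates d f_i / d x_j."""
    cols = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        cols.append([(p - q) / (2 * h) for p, q in zip(f(xp), f(xm))])
    # cols[j][i] holds d f_i / d x_j; transpose so that rows follow f.
    return [list(r) for r in zip(*cols)]

J = jacobian(Bx, [0.3, -1.2, 2.0])
max_err = max(abs(J[i][j] - B[i][j]) for i in range(2) for j in range(3))
print(len(J), len(J[0]), max_err < 1e-6)
```

The Jacobian does not depend on the evaluation point, as expected for a linear map.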
[Example 4.5] Find the derivative of the quadratic form $\boldsymbol{x}^T\boldsymbol{Ax}$ with respect to $\boldsymbol{x}$, where $\boldsymbol{A}$ is a symmetric matrix.
[Solution] From the product rule
$$
\frac{\partial \left[ \boldsymbol{a}^T\left( \boldsymbol{x} \right) \cdot \boldsymbol{b}\left( \boldsymbol{x} \right) \right]}{\partial \boldsymbol{x}}=\frac{\partial \boldsymbol{a}^T\left( \boldsymbol{x} \right)}{\partial \boldsymbol{x}}\cdot \boldsymbol{b}\left( \boldsymbol{x} \right) +\frac{\partial \boldsymbol{b}^T\left( \boldsymbol{x} \right)}{\partial \boldsymbol{x}}\cdot \boldsymbol{a}\left( \boldsymbol{x} \right)
$$
we obtain, using $\boldsymbol{A}^T=\boldsymbol{A}$ in the last step:
$$
\begin{aligned}
\frac{\partial \left[ \boldsymbol{x}^T\boldsymbol{Ax} \right]}{\partial \boldsymbol{x}}
&=\frac{\partial \boldsymbol{x}^T}{\partial \boldsymbol{x}}\cdot \left( \boldsymbol{Ax} \right) +\frac{\partial \left( \boldsymbol{Ax} \right) ^T}{\partial \boldsymbol{x}}\cdot \boldsymbol{x} \\
&=\boldsymbol{Ax}+\frac{\partial \left( \boldsymbol{x}^T\boldsymbol{A}^T \right)}{\partial \boldsymbol{x}}\cdot \boldsymbol{x} \\
&=\boldsymbol{Ax}+\boldsymbol{A}^T\boldsymbol{x}=\left( \boldsymbol{A}+\boldsymbol{A}^T \right) \boldsymbol{x} \\
&=2\boldsymbol{Ax}
\end{aligned}
$$
That is:
$$
\frac{\partial \left[ \boldsymbol{x}^T\boldsymbol{Ax} \right]}{\partial \boldsymbol{x}}=2\boldsymbol{Ax}
$$
Moreover, since
$$
\frac{\partial \boldsymbol{\alpha }^T\left( \boldsymbol{x} \right)}{\partial \boldsymbol{x}}=\left[ \frac{\partial \boldsymbol{\alpha }\left( \boldsymbol{x} \right)}{\partial \boldsymbol{x}^T} \right] ^T,\qquad \frac{\partial \boldsymbol{\alpha }\left( \boldsymbol{x} \right)}{\partial \boldsymbol{x}^T}=\left[ \frac{\partial \boldsymbol{\alpha }^T\left( \boldsymbol{x} \right)}{\partial \boldsymbol{x}} \right] ^T
$$
it follows that:
$$
\frac{\partial \left[ \boldsymbol{x}^T\boldsymbol{Ax} \right]}{\partial \boldsymbol{x}^T}=\left[ \frac{\partial \left[ \boldsymbol{x}^T\boldsymbol{Ax} \right] ^T}{\partial \boldsymbol{x}} \right] ^T=\left[ \frac{\partial \left[ \boldsymbol{x}^T\boldsymbol{Ax} \right]}{\partial \boldsymbol{x}} \right] ^T=2\left( \boldsymbol{Ax} \right) ^T=2\boldsymbol{x}^T\boldsymbol{A}
$$
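The derivation in Example 4.5 passes through the general form $\left( \boldsymbol{A}+\boldsymbol{A}^T \right) \boldsymbol{x}$ before symmetry reduces it to $2\boldsymbol{Ax}$. The following sketch illustrates both cases numerically; the matrices `A_sym` and `A_gen`, the point `x0`, and the helper names are hypothetical and chosen only for this illustration.

```python
def quad(A, x):
    """Scalar quadratic form x^T A x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

def num_grad(f, x, h=1e-6):
    """Central-difference gradient of a scalar function f at x."""
    g = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

def matvec(A, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

A_sym = [[2.0, 1.0], [1.0, 3.0]]    # symmetric (hypothetical values)
A_gen = [[2.0, 5.0], [-1.0, 3.0]]   # not symmetric (hypothetical values)
x0 = [1.0, -2.0]

# Symmetric case: the gradient should match 2 A x.
g_sym = num_grad(lambda x: quad(A_sym, x), x0)
two_Ax = [2.0 * v for v in matvec(A_sym, x0)]

# General case: the gradient should match (A + A^T) x instead.
g_gen = num_grad(lambda x: quad(A_gen, x), x0)
At = [list(r) for r in zip(*A_gen)]
sum_form = [u + v for u, v in zip(matvec(A_gen, x0), matvec(At, x0))]

print([round(v, 6) for v in g_sym], two_Ax)
print([round(v, 6) for v in g_gen], sum_form)
```

For the non-symmetric matrix, $2\boldsymbol{Ax}$ would give a different vector, which is why the symmetry assumption in the example matters.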
[Example 4.6] Find the derivative of the function $\boldsymbol{\lambda }^T\boldsymbol{Ax}$ with respect to $\boldsymbol{x}$, where $\boldsymbol{\lambda }^T$ is a $1\times n$ row vector, $\boldsymbol{A}$ is an $n\times n$ constant matrix, and $\boldsymbol{x}$ is an $n$-dimensional column vector.
[Solution] Because $\boldsymbol{\lambda }^T\boldsymbol{Ax}$ is a scalar, it equals its own transpose:
$$
\boldsymbol{\lambda }^T\boldsymbol{Ax}=\left( \boldsymbol{\lambda }^T\boldsymbol{Ax} \right) ^T=\boldsymbol{x}^T\boldsymbol{A}^T\boldsymbol{\lambda }
$$
Hence:
$$
\frac{\partial \left( \boldsymbol{\lambda }^T\boldsymbol{Ax} \right)}{\partial \boldsymbol{x}}=\frac{\partial \left( \boldsymbol{x}^T\boldsymbol{A}^T\boldsymbol{\lambda } \right)}{\partial \boldsymbol{x}}=\boldsymbol{A}^T\boldsymbol{\lambda }
$$
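The same result can be cross-checked component by component. Writing $a_{ij}$ for the entries of $\boldsymbol{A}$ and $\lambda _i$ for the entries of $\boldsymbol{\lambda }$ (index notation introduced here only for this check, and treating $\boldsymbol{\lambda }$ as independent of $\boldsymbol{x}$), the scalar expands as a double sum:

$$
\boldsymbol{\lambda }^T\boldsymbol{Ax}=\sum_{i=1}^n{\sum_{j=1}^n{\lambda _i a_{ij} x_j}}
\quad \Longrightarrow \quad
\frac{\partial \left( \boldsymbol{\lambda }^T\boldsymbol{Ax} \right)}{\partial x_k}=\sum_{i=1}^n{\lambda _i a_{ik}}=\sum_{i=1}^n{\left( \boldsymbol{A}^T \right) _{ki}\lambda _i}=\left( \boldsymbol{A}^T\boldsymbol{\lambda } \right) _k
$$

Stacking these entries for $k=1,\cdots ,n$ gives the column vector $\boldsymbol{A}^T\boldsymbol{\lambda }$.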

5 Derivative of a Vector with Respect to a Matrix

5.1 Definition

Let the matrix
$$
\boldsymbol{X}_{m\times n}=\left[ \begin{matrix}
x_{11}& x_{12}& \cdots& x_{1n}\\
x_{21}& x_{22}& \cdots& x_{2n}\\
\vdots& \vdots& \ddots& \vdots\\
x_{m1}& x_{m2}& \cdots& x_{mn}\\
\end{matrix} \right]
$$
be the independent variable of an $n$-dimensional column-vector function:
$$
\boldsymbol{z}\left( \boldsymbol{X} \right) =\left[ \begin{matrix} z_1\left( \boldsymbol{X} \right)& z_2\left( \boldsymbol{X} \right)& \cdots& z_n\left( \boldsymbol{X} \right)\\ \end{matrix} \right] ^T
$$
Under the numerator layout:
$$
\frac{\partial \boldsymbol{z}\left( \boldsymbol{X} \right)}{\partial \boldsymbol{X}}=\left[ \begin{matrix} \frac{\partial z_1\left( \boldsymbol{X} \right)}{\partial \boldsymbol{X}}& \frac{\partial z_2\left( \boldsymbol{X} \right)}{\partial \boldsymbol{X}}& \cdots& \frac{\partial z_n\left( \boldsymbol{X} \right)}{\partial \boldsymbol{X}}\\ \end{matrix} \right] ^T
$$
where:
$$
\frac{\partial z_i\left( \boldsymbol{X} \right)}{\partial \boldsymbol{X}}=\left[ \begin{matrix}
\frac{\partial z_i\left( \boldsymbol{X} \right)}{\partial x_{11}}& \frac{\partial z_i\left( \boldsymbol{X} \right)}{\partial x_{12}}& \cdots& \frac{\partial z_i\left( \boldsymbol{X} \right)}{\partial x_{1n}}\\
\frac{\partial z_i\left( \boldsymbol{X} \right)}{\partial x_{21}}& \frac{\partial z_i\left( \boldsymbol{X} \right)}{\partial x_{22}}& \cdots& \frac{\partial z_i\left( \boldsymbol{X} \right)}{\partial x_{2n}}\\
\vdots& \vdots& \ddots& \vdots\\
\frac{\partial z_i\left( \boldsymbol{X} \right)}{\partial x_{m1}}& \frac{\partial z_i\left( \boldsymbol{X} \right)}{\partial x_{m2}}& \cdots& \frac{\partial z_i\left( \boldsymbol{X} \right)}{\partial x_{mn}}\\
\end{matrix} \right]
$$
Under the denominator layout:
$$
\frac{\partial \boldsymbol{z}^T\left( \boldsymbol{X} \right)}{\partial \boldsymbol{X}}=\left[ \begin{matrix} \frac{\partial z_1\left( \boldsymbol{X} \right)}{\partial \boldsymbol{X}}& \frac{\partial z_2\left( \boldsymbol{X} \right)}{\partial \boldsymbol{X}}& \cdots& \frac{\partial z_n\left( \boldsymbol{X} \right)}{\partial \boldsymbol{X}}\\ \end{matrix} \right]
$$
where:
$$
\frac{\partial z_i\left( \boldsymbol{X} \right)}{\partial \boldsymbol{X}}=\left[ \begin{matrix}
\frac{\partial z_i\left( \boldsymbol{X} \right)}{\partial x_{11}}& \frac{\partial z_i\left( \boldsymbol{X} \right)}{\partial x_{12}}& \cdots& \frac{\partial z_i\left( \boldsymbol{X} \right)}{\partial x_{1n}}\\
\frac{\partial z_i\left( \boldsymbol{X} \right)}{\partial x_{21}}& \frac{\partial z_i\left( \boldsymbol{X} \right)}{\partial x_{22}}& \cdots& \frac{\partial z_i\left( \boldsymbol{X} \right)}{\partial x_{2n}}\\
\vdots& \vdots& \ddots& \vdots\\
\frac{\partial z_i\left( \boldsymbol{X} \right)}{\partial x_{m1}}& \frac{\partial z_i\left( \boldsymbol{X} \right)}{\partial x_{m2}}& \cdots& \frac{\partial z_i\left( \boldsymbol{X} \right)}{\partial x_{mn}}\\
\end{matrix} \right]
$$

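5.2 Shape Rule

Differentiating a vector $\boldsymbol{y}$ with respect to a matrix $\boldsymbol{X}$ takes two steps:
Step 1: Each element of $\boldsymbol{y}$ is a scalar, so first differentiate every element of $\boldsymbol{y}$ with respect to $\boldsymbol{X}$, following the rule for the derivative of a scalar with respect to a matrix.
Step 2: Arrange the results of Step 1 according to the shape of $\boldsymbol{y}$.
For details, see reference [1].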
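The two steps of the shape rule can be sketched in a few lines of code. Everything below is an assumption made for illustration: the vector function `z` (here $\boldsymbol{z}=\boldsymbol{X}^T\boldsymbol{u}$ with a fixed hypothetical vector `u`), the matrix `X0`, and the helper names. Step 1 differentiates one scalar component with respect to every entry of the matrix; Step 2 keeps one such block per component, in the order of the vector.

```python
def z(X):
    """Hypothetical vector function of a 2 x 3 matrix: z = X^T u with u = [1, 2]."""
    u = [1.0, 2.0]
    return [sum(u[k] * X[k][i] for k in range(len(X))) for i in range(len(X[0]))]

def scalar_by_matrix(g, X, h=1e-6):
    """Step 1: d g / d X for a scalar g; entry [k][l] approximates d g / d x_kl."""
    m, n = len(X), len(X[0])
    D = [[0.0] * n for _ in range(m)]
    for k in range(m):
        for l in range(n):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[k][l] += h
            Xm[k][l] -= h
            D[k][l] = (g(Xp) - g(Xm)) / (2 * h)
    return D

def vector_by_matrix(z, X):
    """Step 2: one m x n block per component of z, kept in the order of z."""
    return [scalar_by_matrix(lambda M, i=i: z(M)[i], X) for i in range(len(z(X)))]

X0 = [[0.5, -1.0, 2.0],
      [1.5, 0.0, -0.5]]
blocks = vector_by_matrix(z, X0)
print(len(blocks), len(blocks[0]), len(blocks[0][0]))
```

For a column vector the blocks are read as stacked vertically, and for a row vector as placed side by side, matching the two layouts in Section 5.1. In this example block $i$ has the vector `u` in column $i$ and zeros elsewhere.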

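References

[1] Derivative of a vector with respect to a matrix (向量对矩阵求导)
[2] Derivatives of vectors and scalars with respect to a vector (向量,标量对向量求导数)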
