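1. Basic concepts of matrix derivatives
In foundational courses such as calculus and linear algebra, we have already learned how to differentiate scalar functions, including partial derivatives of multivariate functions, and how to perform basic operations on matrices and similar objects.
What is a derivative? For a real-valued function of a real variable, it is the limit of the difference quotient, and the function is differentiable when that limit exists. The same idea carries over to matrices: a matrix is just a collection of numbers, so we differentiate entry by entry, and the chain rule still applies. The numerator and the denominator can each be a scalar, a vector, or a matrix.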
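To make "differentiate entry by entry" concrete, here is a minimal sketch in plain Python. The matrix function `M` and the helper `elementwise_derivative` are names made up for this illustration, and the derivative is approximated with a central difference instead of being computed symbolically.

```python
import math

def M(t):
    """A hypothetical 2x2 matrix whose entries depend on a scalar t."""
    return [[t, t * t],
            [math.sin(t), 1.0]]

def elementwise_derivative(F, t, h=1e-6):
    """Approximate dF/dt entry by entry with a central difference."""
    plus, minus = F(t + h), F(t - h)
    return [[(p - m) / (2 * h) for p, m in zip(row_p, row_m)]
            for row_p, row_m in zip(plus, minus)]

dM = elementwise_derivative(M, 0.5)

# Differentiating each entry by hand gives dM/dt = [[1, 2t], [cos t, 0]].
expected = [[1.0, 2 * 0.5],
            [math.cos(0.5), 0.0]]
```

Each entry of `dM` should agree with the corresponding entry of `expected` up to finite-difference error.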
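2. Layout conventions for matrix derivatives
Matrix derivatives are usually written in one of two layouts: the numerator layout or the denominator layout. For background, see the article 矩阵求导术.
First, the numerator layout, which gives the Jacobian matrix: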
$$
\frac{\partial \boldsymbol{f}}{\partial \boldsymbol{x}}=\left[\begin{array}{cccc} \frac{\partial f_{1}}{\partial x_{1}} & \frac{\partial f_{1}}{\partial x_{2}} & \cdots & \frac{\partial f_{1}}{\partial x_{m}} \\ \frac{\partial f_{2}}{\partial x_{1}} & \frac{\partial f_{2}}{\partial x_{2}} & \cdots & \frac{\partial f_{2}}{\partial x_{m}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_{p}}{\partial x_{1}} & \frac{\partial f_{p}}{\partial x_{2}} & \cdots & \frac{\partial f_{p}}{\partial x_{m}} \end{array}\right]
$$
Here $\boldsymbol{f}$ has $p$ components and $\boldsymbol{x}$ has $m$ components, so the result is a $p \times m$ matrix.
Second, the denominator layout, which gives the transpose of the Jacobian, sometimes called the gradient matrix:
$$
\frac{\partial \boldsymbol{f}}{\partial \boldsymbol{x}}=\left[\begin{array}{cccc} \frac{\partial f_{1}}{\partial x_{1}} & \frac{\partial f_{2}}{\partial x_{1}} & \cdots & \frac{\partial f_{p}}{\partial x_{1}} \\ \frac{\partial f_{1}}{\partial x_{2}} & \frac{\partial f_{2}}{\partial x_{2}} & \cdots & \frac{\partial f_{p}}{\partial x_{2}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_{1}}{\partial x_{m}} & \frac{\partial f_{2}}{\partial x_{m}} & \cdots & \frac{\partial f_{p}}{\partial x_{m}} \end{array}\right]
$$
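As you can see, in the numerator layout the rows follow the components of the numerator and the columns follow the components of the denominator, while in the denominator layout it is the other way around, so the two results are transposes of each other. I personally prefer the denominator layout, which treats the numerator as a row vector and the denominator as a column vector.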
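The relationship between the two layouts can be checked numerically. The sketch below is a minimal illustration in plain Python, assuming a made-up function `f` from R^3 to R^2 (so m = 3 and p = 2) and central finite differences. `numerator_layout` and `denominator_layout` are hypothetical helper names, not part of any library.

```python
def f(x):
    """Hypothetical example: f maps R^3 (m = 3) to R^2 (p = 2)."""
    x1, x2, x3 = x
    return [x1 * x2, x2 + x3 * x3]

def numerator_layout(f, x, h=1e-6):
    """Jacobian: entry (i, j) approximates d f_i / d x_j (shape p x m)."""
    p = len(f(x))
    J = [[0.0] * len(x) for _ in range(p)]
    for j in range(len(x)):
        xp = list(x)
        xp[j] += h
        xm = list(x)
        xm[j] -= h
        fp, fm = f(xp), f(xm)
        for i in range(p):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

def denominator_layout(f, x, h=1e-6):
    """Transposed Jacobian: entry (j, i) approximates d f_i / d x_j (shape m x p)."""
    J = numerator_layout(f, x, h)
    return [list(col) for col in zip(*J)]

x0 = [1.0, 2.0, 3.0]
J_num = numerator_layout(f, x0)    # 2 rows, 3 columns
J_den = denominator_layout(f, x0)  # 3 rows, 2 columns
```

Computed by hand, the Jacobian of this `f` is [[x2, x1, 0], [0, 1, 2*x3]], so `J_num` should be close to [[2, 1, 0], [0, 1, 6]] at `x0`, and `J_den` is simply its transpose.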
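3. Deriving matrix derivative formulas
Since there are already many articles online that introduce matrix derivatives, this article points to the worked examples in 矩阵求导详例 and gives only a brief description here. Readers who want to verify the results can expand them entry by entry using the definitions above.
First:
- The derivative of a scalar with respect to a column vector is a column vector, each entry of which is the derivative of the scalar with respect to the corresponding entry of the original vector. (The same holds if the column vector is replaced by a row vector or a matrix.)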
$$
\mathbf{x}=\left[x_1, x_2, x_3\right]^T
$$
$$
f(\mathbf{x})=a_1x_1+a_2x_2^2+a_3x_2x_3
$$
we have
$$
\begin{aligned} \frac{\partial f(\mathbf{x})}{\partial \mathbf{x}} &=\left[ \begin{matrix} \frac{\partial (a_1x_1+a_2x_2^2+a_3x_2x_3)}{\partial x_1} \\ \frac{\partial (a_1x_1+a_2x_2^2+a_3x_2x_3)}{\partial x_2} \\ \frac{\partial (a_1x_1+a_2x_2^2+a_3x_2x_3)}{\partial x_3} \end{matrix} \right] \\ &=\left[ \begin{matrix} \frac{\partial (a_1x_1)}{\partial x_1} \\ \frac{\partial (a_2x_2^2+a_3x_2x_3)}{\partial x_2} \\ \frac{\partial (a_3x_2x_3)}{\partial x_3} \end{matrix} \right] = \left[ \begin{matrix} a_1 \\ 2a_2x_2+a_3x_3 \\ a_3x_2 \end{matrix} \right] \end{aligned}
$$
- The derivative of an m-dimensional column (row) vector with respect to an n-dimensional column (row) vector is a matrix of size m×n or n×m, depending on the convention you choose. It is expanded as described in Section 2, using one of the two layouts.
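- The derivative of a matrix with respect to a vector, or of a matrix with respect to a matrix, is a higher-dimensional "tensor".
矩阵,矩阵的迹,求导速查 (a quick reference for derivatives of matrices and matrix traces): a very concise article, recommended!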
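The scalar-by-vector example f(x) = a1*x1 + a2*x2^2 + a3*x2*x3 can be checked by comparing the hand-derived gradient [a1, 2*a2*x2 + a3*x3, a3*x2] with a finite-difference approximation. This is a minimal sketch in plain Python. The coefficient values, the evaluation point, and the function names are arbitrary choices made for the illustration.

```python
def f(x, a):
    """f(x) = a1*x1 + a2*x2^2 + a3*x2*x3."""
    x1, x2, x3 = x
    a1, a2, a3 = a
    return a1 * x1 + a2 * x2 ** 2 + a3 * x2 * x3

def analytic_gradient(x, a):
    """Gradient obtained by differentiating f with respect to each x_i."""
    x1, x2, x3 = x
    a1, a2, a3 = a
    return [a1, 2 * a2 * x2 + a3 * x3, a3 * x2]

def numeric_gradient(f, x, a, h=1e-6):
    """Central-difference approximation of the same gradient."""
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xp[i] += h
        xm = list(x)
        xm[i] -= h
        grad.append((f(xp, a) - f(xm, a)) / (2 * h))
    return grad

a = [2.0, 3.0, 4.0]  # arbitrary coefficients a1, a2, a3
x = [1.0, 2.0, 3.0]  # arbitrary evaluation point

g_exact = analytic_gradient(x, a)
g_approx = numeric_gradient(f, x, a)
```

With these values the hand-derived gradient is [2, 24, 8], and `g_approx` should match it up to finite-difference error.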
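4. Commonly used matrix derivative formulas
For a comprehensive lookup table, see 查表大全.
Bold uppercase letters denote matrices, bold lowercase letters denote vectors, and plain lowercase letters denote scalars.
$v, \mathbf{v}, u, \mathbf{u}$ are functions of $\mathbf{x}$.
All formulas below use the denominator layout, i.e. the convention in which the denominator is written as a column vector.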
Common examples:
- Vector-by-vector
$$\frac{\partial \mathbf{A}\mathbf{x}}{\partial \mathbf{x}} = \mathbf{A}^T$$
$$\frac{\partial \mathbf{x}^T\mathbf{A}}{\partial \mathbf{x}} = \mathbf{A}$$
$$\frac{\partial a\mathbf{u}}{\partial \mathbf{x}} = a\frac{\partial \mathbf{u}}{\partial \mathbf{x}}$$
$$\frac{\partial \mathbf{A}\mathbf{u}}{\partial \mathbf{x}} = \frac{\partial \mathbf{u}}{\partial \mathbf{x}}\mathbf{A}^T$$
$$\frac{\partial \mathbf{g}(\mathbf{u})}{\partial \mathbf{x}} = \frac{\partial \mathbf{u}}{\partial \mathbf{x}}\frac{\partial \mathbf{g}(\mathbf{u})}{\partial \mathbf{u}}$$
- Scalar-by-vector
$$\frac{\partial g(u)}{\partial \mathbf{x}} = \frac{\partial g(u)}{\partial u}\frac{\partial u}{\partial \mathbf{x}}$$
$$\frac{\partial \mathbf{x}^T\mathbf{A}\mathbf{x}}{\partial \mathbf{x}} = (\mathbf{A}+\mathbf{A}^T)\mathbf{x}$$
$$\frac{\partial^2 \mathbf{x}^T\mathbf{A}\mathbf{x}}{\partial \mathbf{x}\,\partial \mathbf{x}^T} = \mathbf{A}+\mathbf{A}^T$$
$$\frac{\partial \mathbf{b}^T\mathbf{A}\mathbf{x}}{\partial \mathbf{x}} = \mathbf{A}^T\mathbf{b}$$
$$\frac{\partial \mathbf{a}^T\mathbf{x}\mathbf{x}^T\mathbf{b}}{\partial \mathbf{x}} = (\mathbf{a}\mathbf{b}^T+\mathbf{b}\mathbf{a}^T)\mathbf{x}$$
- Vector-by-scalar
$$\frac{\partial \mathbf{A}\mathbf{u}}{\partial x} = \frac{\partial \mathbf{u}}{\partial x}\mathbf{A}^T$$
- Matrix trace-by-matrix (the right-hand side below is the derivative of the trace of the product; the derivative of the matrix product itself with respect to $\mathbf{X}$ would be a fourth-order tensor)
$$\frac{\partial\, \mathrm{tr}(\mathbf{A}\mathbf{X}\mathbf{B}\mathbf{X}^T\mathbf{C})}{\partial \mathbf{X}} = \mathbf{C}\mathbf{A}\mathbf{X}\mathbf{B} + \mathbf{A}^T\mathbf{C}^T\mathbf{X}\mathbf{B}^T$$
- Scalar-by-matrix
$$\frac{\partial a}{\partial \mathbf{X}} = \mathbf{0}$$
$$\frac{\partial uv}{\partial \mathbf{X}} = u\frac{\partial v}{\partial \mathbf{X}} + v\frac{\partial u}{\partial \mathbf{X}}$$
$$\frac{\partial \mathbf{a}^T\mathbf{X}\mathbf{b}}{\partial \mathbf{X}} = \mathbf{a}\mathbf{b}^T$$
$$\frac{\partial \mathbf{a}^T\mathbf{X}^T\mathbf{b}}{\partial \mathbf{X}} = \mathbf{b}\mathbf{a}^T$$
$$\frac{\partial (\mathbf{X}\mathbf{a}+\mathbf{b})^T\mathbf{C}(\mathbf{X}\mathbf{a}+\mathbf{b})}{\partial \mathbf{X}} = (\mathbf{C}+\mathbf{C}^T)(\mathbf{X}\mathbf{a}+\mathbf{b})\mathbf{a}^T$$
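Identities like these are easy to get wrong by a transpose, so it helps to spot-check a few against finite differences. The sketch below is a minimal illustration in plain Python, with vectors stored as n×1 nested lists. It checks three denominator-layout results: the gradient of x^T A x, the gradient of a^T X b, and the gradient of tr(A X B X^T C). All helper names and numeric values are made up for the example.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(*mats):
    """Multiply a chain of matrices from left to right."""
    result = mats[0]
    for M in mats[1:]:
        cols = list(zip(*M))
        result = [[sum(p * q for p, q in zip(row, col)) for col in cols]
                  for row in result]
    return result

def add(P, Q):
    return [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def numeric_grad(phi, X, h=1e-6):
    """Entry (i, j) approximates d phi / d X[i][j] (same shape as X)."""
    G = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            Xp = [row[:] for row in X]
            Xp[i][j] += h
            Xm = [row[:] for row in X]
            Xm[i][j] -= h
            G[i][j] = (phi(Xp) - phi(Xm)) / (2 * h)
    return G

def max_abs_diff(P, Q):
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

# Arbitrary test data; vectors are stored as column matrices.
A = [[1.0, 2.0], [3.0, 4.0]]
B = [[0.0, 1.0], [2.0, -1.0]]
C = [[2.0, 0.0], [1.0, 3.0]]
X = [[1.0, -1.0], [0.5, 2.0]]
x = [[1.0], [2.0]]
a = [[1.0], [-2.0]]
b = [[3.0], [0.5]]

# d(x^T A x)/dx = (A + A^T) x
err_quadratic = max_abs_diff(
    numeric_grad(lambda v: matmul(transpose(v), A, v)[0][0], x),
    matmul(add(A, transpose(A)), x))

# d(a^T X b)/dX = a b^T
err_bilinear = max_abs_diff(
    numeric_grad(lambda M: matmul(transpose(a), M, b)[0][0], X),
    matmul(a, transpose(b)))

# d tr(A X B X^T C)/dX = C A X B + A^T C^T X B^T
err_trace = max_abs_diff(
    numeric_grad(lambda M: trace(matmul(A, M, B, transpose(M), C)), X),
    add(matmul(C, A, X, B), matmul(transpose(A), transpose(C), X, transpose(B))))
```

All three error values should be close to zero. If one is not, the usual culprits are a missing transpose or a mix of the two layouts.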