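Now that the definition of matrix derivatives is in place, we can use it to derive several formulas that come up often in machine learning. Throughout, $\boldsymbol{x}$ denotes an $n$-dimensional column vector:

$$\boldsymbol{x}=\left[ \begin{matrix} x_1& x_2& \cdots& x_n\\ \end{matrix} \right] ^T$$

All results below use the denominator layout. The derivative of a scalar with respect to $\boldsymbol{x}$ is a column vector with the same shape as $\boldsymbol{x}$, and the derivative with respect to a matrix has the same shape as that matrix.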
Conclusion 1

$$\frac{\partial a}{\partial \boldsymbol{x}}=\boldsymbol{0}$$

where $a$ is a constant scalar.
【Proof】Since $a$ does not depend on any $x_i$, every partial derivative vanishes:

$$\frac{\partial a}{\partial \boldsymbol{x}}=\left[ \begin{matrix} \frac{\partial a}{\partial x_1}& \frac{\partial a}{\partial x_2}& \cdots& \frac{\partial a}{\partial x_n}\\ \end{matrix} \right] ^T=\left[ \begin{matrix} 0& 0& \cdots& 0\\ \end{matrix} \right] ^T$$
Conclusion 2

$$\frac{\partial \left( \boldsymbol{x}^T\cdot \boldsymbol{A} \right)}{\partial \boldsymbol{x}}=\frac{\partial \left( \boldsymbol{A}^T\cdot \boldsymbol{x} \right)}{\partial \boldsymbol{x}}=\boldsymbol{A}$$

where $\boldsymbol{A}$ is a constant $n\times n$ matrix.
【Proof】Write $\boldsymbol{A}$ in terms of its rows:

$$\boldsymbol{A}=\left[ \begin{array}{c} {\boldsymbol{\alpha }_1}^T\\ {\boldsymbol{\alpha }_2}^T\\ \vdots\\ {\boldsymbol{\alpha }_n}^T\\ \end{array} \right]$$

where

$$\boldsymbol{\alpha }_i=\left[ \begin{matrix} a_{i1}& a_{i2}& \cdots& a_{in}\\ \end{matrix} \right] ^T$$
Then:

$$\begin{aligned} \frac{\partial \left( \boldsymbol{x}^T\cdot \boldsymbol{A} \right)}{\partial \boldsymbol{x}}&=\frac{\partial \left( \boldsymbol{A}^T\cdot \boldsymbol{x} \right)}{\partial \boldsymbol{x}} \\ &=\frac{\partial \left( x_1\cdot {\boldsymbol{\alpha }_1}^T+x_2\cdot {\boldsymbol{\alpha }_2}^T+\cdots +x_n\cdot {\boldsymbol{\alpha }_n}^T \right)}{\partial \boldsymbol{x}} \\ &=\left[ \begin{array}{c} \frac{\partial \left( x_1\cdot {\boldsymbol{\alpha }_1}^T+x_2\cdot {\boldsymbol{\alpha }_2}^T+\cdots +x_n\cdot {\boldsymbol{\alpha }_n}^T \right)}{\partial x_1}\\ \\ \frac{\partial \left( x_1\cdot {\boldsymbol{\alpha }_1}^T+x_2\cdot {\boldsymbol{\alpha }_2}^T+\cdots +x_n\cdot {\boldsymbol{\alpha }_n}^T \right)}{\partial x_2}\\ \\ \vdots\\ \\ \frac{\partial \left( x_1\cdot {\boldsymbol{\alpha }_1}^T+x_2\cdot {\boldsymbol{\alpha }_2}^T+\cdots +x_n\cdot {\boldsymbol{\alpha }_n}^T \right)}{\partial x_n}\\ \end{array} \right] \\ &=\left[ \begin{array}{c} {\boldsymbol{\alpha }_1}^T\\ \\ {\boldsymbol{\alpha }_2}^T\\ \\ \vdots\\ \\ {\boldsymbol{\alpha }_n}^T\\ \end{array} \right] =\boldsymbol{A} \end{aligned}$$
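As a quick numerical sanity check of Conclusion 2 (not part of the original derivation), the sketch below approximates the derivative of $\boldsymbol{A}^T\boldsymbol{x}$ by central differences and arranges it in the denominator layout, where entry $(i,j)$ is $\partial y_j/\partial x_i$. It uses plain Python lists; the matrix `A`, the point `x`, and the helpers `mat_t_vec` and `numerical_derivative` are hypothetical choices for illustration.

```python
def mat_t_vec(A, x):
    """Return y = A^T x, i.e. y_j = sum_i A[i][j] * x[i]."""
    n, m = len(A), len(A[0])
    return [sum(A[i][j] * x[i] for i in range(n)) for j in range(m)]


def numerical_derivative(g, x, h=1e-6):
    """Central-difference derivative of a vector function g at x.

    Denominator layout: D[i][j] approximates d g_j / d x_i,
    so row i comes from perturbing x_i.
    """
    D = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        D.append([(p - q) / (2 * h) for p, q in zip(g(xp), g(xm))])
    return D


# Hypothetical constant matrix (deliberately non-symmetric) and evaluation point.
A = [[1.0, 2.0, 0.0],
     [0.0, 3.0, 4.0],
     [5.0, 0.0, 6.0]]
x = [1.0, -2.0, 0.5]

D = numerical_derivative(lambda v: mat_t_vec(A, v), x)
max_err = max(abs(D[i][j] - A[i][j]) for i in range(3) for j in range(3))
print(max_err < 1e-6)  # prints True: the derivative matches A itself
```

Using a non-symmetric `A` matters here: with a symmetric matrix the check could not tell $\boldsymbol{A}$ apart from $\boldsymbol{A}^T$.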
Conclusion 3

$$\frac{\partial \boldsymbol{x}^T\boldsymbol{x}}{\partial \boldsymbol{x}}=2\boldsymbol{x}$$
【Proof】With $\boldsymbol{x}=\left[ x_1,x_2,\cdots ,x_n \right] ^T$, let

$$f\left( \boldsymbol{x} \right) =\boldsymbol{x}^T\boldsymbol{x}={x_1}^2+{x_2}^2+\cdots +{x_n}^2$$
$$\frac{\partial f}{\partial \boldsymbol{x}}=\left[ \begin{array}{c} \frac{\partial f}{\partial x_1}\\ \\ \frac{\partial f}{\partial x_2}\\ \\ \vdots\\ \\ \frac{\partial f}{\partial x_n}\\ \end{array} \right] =\left[ \begin{array}{c} 2x_1\\ \\ 2x_2\\ \\ \vdots\\ \\ 2x_n\\ \end{array} \right] =2\boldsymbol{x}$$
That is,

$$\frac{\partial \boldsymbol{x}^T\boldsymbol{x}}{\partial \boldsymbol{x}}=2\boldsymbol{x}$$
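A minimal finite-difference check of Conclusion 3, assuming plain Python floats; the helper `numerical_grad` and the test point are hypothetical. Because $f$ is quadratic, the central difference has no truncation error, so any mismatch comes only from floating-point rounding.

```python
def numerical_grad(f, x, h=1e-6):
    """Central-difference gradient of a scalar function f at x (column-vector layout)."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad


def sq_norm(x):
    """f(x) = x^T x = x_1^2 + ... + x_n^2."""
    return sum(v * v for v in x)


x = [3.0, -1.5, 0.25, 2.0]      # hypothetical test point
approx = numerical_grad(sq_norm, x)
exact = [2 * v for v in x]       # Conclusion 3: the gradient is 2x
print(all(abs(a - e) < 1e-6 for a, e in zip(approx, exact)))  # prints True
```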
Conclusion 4

$$\frac{\partial \left( \boldsymbol{x}^T\boldsymbol{Ax} \right)}{\partial \boldsymbol{x}}=\boldsymbol{Ax}+\boldsymbol{A}^T\boldsymbol{x}$$

where $\boldsymbol{A}$ is a constant $n\times n$ matrix.
【Proof】Expanding the product entry by entry:

$$\begin{aligned} \boldsymbol{x}^T\boldsymbol{Ax}&=\left[ \begin{matrix} x_1& x_2& \cdots& x_n\\ \end{matrix} \right] \cdot \left[ \begin{matrix} a_{11}& a_{12}& \cdots& a_{1n}\\ a_{21}& a_{22}& \cdots& a_{2n}\\ \vdots& \vdots& \ddots& \vdots\\ a_{n1}& a_{n2}& \cdots& a_{nn}\\ \end{matrix} \right] \cdot \left[ \begin{array}{c} x_1\\ x_2\\ \vdots\\ x_n\\ \end{array} \right] \\ &=\left[ \begin{matrix} x_1a_{11}+x_2a_{21}+\cdots +x_na_{n1}& x_1a_{12}+x_2a_{22}+\cdots +x_na_{n2}& \cdots& x_1a_{1n}+x_2a_{2n}+\cdots +x_na_{nn}\\ \end{matrix} \right] \cdot \left[ \begin{array}{c} x_1\\ x_2\\ \vdots\\ x_n\\ \end{array} \right] \\ &=x_1\left( x_1a_{11}+x_2a_{21}+\cdots +x_na_{n1} \right) +x_2\left( x_1a_{12}+x_2a_{22}+\cdots +x_na_{n2} \right) +\cdots +x_n\left( x_1a_{1n}+x_2a_{2n}+\cdots +x_na_{nn} \right) \end{aligned}$$
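Let $f\left( \boldsymbol{x} \right) =\boldsymbol{x}^T\boldsymbol{Ax}$. The variable $x_1$ appears both as the outer factor of the first term and inside every parenthesized sum (as $x_1a_{1j}$), so by the product rule: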
$$\frac{\partial f\left( \boldsymbol{x} \right)}{\partial x_1}=\left( x_1a_{11}+x_2a_{21}+\cdots +x_na_{n1} \right) +\left( x_1a_{11}+x_2a_{12}+\cdots +x_na_{1n} \right)$$
$$\begin{aligned} \frac{\partial f\left( \boldsymbol{x} \right)}{\partial \boldsymbol{x}}&=\left[ \begin{array}{c} \frac{\partial f\left( \boldsymbol{x} \right)}{\partial x_1}\\ \\ \frac{\partial f\left( \boldsymbol{x} \right)}{\partial x_2}\\ \\ \vdots\\ \\ \frac{\partial f\left( \boldsymbol{x} \right)}{\partial x_n}\\ \end{array} \right] =\left[ \begin{array}{c} \left( x_1a_{11}+x_2a_{21}+\cdots +x_na_{n1} \right) +\left( x_1a_{11}+x_2a_{12}+\cdots +x_na_{1n} \right)\\ \\ \left( x_1a_{12}+x_2a_{22}+\cdots +x_na_{n2} \right) +\left( x_1a_{21}+x_2a_{22}+\cdots +x_na_{2n} \right)\\ \\ \vdots\\ \\ \left( x_1a_{1n}+x_2a_{2n}+\cdots +x_na_{nn} \right) +\left( x_1a_{n1}+x_2a_{n2}+\cdots +x_na_{nn} \right)\\ \end{array} \right] \\ \\ &=\left[ \begin{array}{c} x_1a_{11}+x_2a_{21}+\cdots +x_na_{n1}\\ \\ x_1a_{12}+x_2a_{22}+\cdots +x_na_{n2}\\ \\ \vdots\\ \\ x_1a_{1n}+x_2a_{2n}+\cdots +x_na_{nn}\\ \end{array} \right] +\left[ \begin{array}{c} x_1a_{11}+x_2a_{12}+\cdots +x_na_{1n}\\ \\ x_1a_{21}+x_2a_{22}+\cdots +x_na_{2n}\\ \\ \vdots\\ \\ x_1a_{n1}+x_2a_{n2}+\cdots +x_na_{nn}\\ \end{array} \right] \\ \\ &=\left[ \begin{matrix} a_{11}& a_{21}& \cdots& a_{n1}\\ a_{12}& a_{22}& \cdots& a_{n2}\\ \vdots& \vdots& \ddots& \vdots\\ a_{1n}& a_{2n}& \cdots& a_{nn}\\ \end{matrix} \right] \cdot \left[ \begin{array}{c} x_1\\ x_2\\ \vdots\\ x_n\\ \end{array} \right] +\left[ \begin{matrix} a_{11}& a_{12}& \cdots& a_{1n}\\ a_{21}& a_{22}& \cdots& a_{2n}\\ \vdots& \vdots& \ddots& \vdots\\ a_{n1}& a_{n2}& \cdots& a_{nn}\\ \end{matrix} \right] \cdot \left[ \begin{array}{c} x_1\\ x_2\\ \vdots\\ x_n\\ \end{array} \right] \\ \\ &=\boldsymbol{A}^T\boldsymbol{x}+\boldsymbol{Ax} \end{aligned}$$
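The sketch below checks Conclusion 4 numerically with a deliberately non-symmetric $\boldsymbol{A}$ (hypothetical values, plain Python lists). It also shows why both terms are needed: the shortcut $2\boldsymbol{Ax}$ is only correct when $\boldsymbol{A}$ is symmetric, because only then does $\boldsymbol{A}^T\boldsymbol{x}=\boldsymbol{Ax}$.

```python
def mat_vec(A, x):
    """Return A x."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def transpose(A):
    """Return A^T as a list of rows."""
    return [list(col) for col in zip(*A)]


def quad_form(A, x):
    """f(x) = x^T A x."""
    return sum(xi * yi for xi, yi in zip(x, mat_vec(A, x)))


def numerical_grad(f, x, h=1e-6):
    """Central-difference gradient of a scalar function f at x."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad


A = [[1.0, 2.0, 0.0],
     [0.0, 3.0, 4.0],
     [5.0, 0.0, 6.0]]   # hypothetical, non-symmetric
x = [1.0, -2.0, 0.5]

approx = numerical_grad(lambda v: quad_form(A, v), x)
exact = [p + q for p, q in zip(mat_vec(A, x), mat_vec(transpose(A), x))]  # Ax + A^T x
naive = [2 * p for p in mat_vec(A, x)]                                      # 2Ax

print(all(abs(g - e) < 1e-6 for g, e in zip(approx, exact)))  # prints True
print(all(abs(g - e) < 1e-6 for g, e in zip(approx, naive)))  # prints False: A is not symmetric
```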
Conclusion 5

$$\frac{\partial \left( \boldsymbol{x}^T\boldsymbol{a} \right)}{\partial \boldsymbol{x}}=\frac{\partial \left( \boldsymbol{a}^T\boldsymbol{x} \right)}{\partial \boldsymbol{x}}=\boldsymbol{a}$$

where $\boldsymbol{a}$ is a constant vector:

$$\boldsymbol{a}=\left[ \begin{matrix} a_1& a_2& \cdots& a_n\\ \end{matrix} \right] ^T$$
【Proof】

$$\begin{aligned} \frac{\partial \left( \boldsymbol{x}^T\boldsymbol{a} \right)}{\partial \boldsymbol{x}}&=\frac{\partial \left( \boldsymbol{a}^T\boldsymbol{x} \right)}{\partial \boldsymbol{x}} \\ \\ &=\frac{\partial \left( x_1a_1+x_2a_2+\cdots +x_na_n \right)}{\partial \boldsymbol{x}} \\ \\ &=\left[ \begin{array}{c} \frac{\partial \left( x_1a_1+x_2a_2+\cdots +x_na_n \right)}{\partial x_1}\\ \\ \frac{\partial \left( x_1a_1+x_2a_2+\cdots +x_na_n \right)}{\partial x_2}\\ \\ \vdots\\ \\ \frac{\partial \left( x_1a_1+x_2a_2+\cdots +x_na_n \right)}{\partial x_n}\\ \end{array} \right] \\ &=\left[ \begin{array}{c} a_1\\ \\ a_2\\ \\ \vdots\\ \\ a_n\\ \end{array} \right] \\ &=\boldsymbol{a} \end{aligned}$$
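A small numerical check of Conclusion 5, assuming plain Python lists; `a`, `x`, and the helpers are hypothetical. Since $\boldsymbol{x}^T\boldsymbol{a}$ is linear in $\boldsymbol{x}$, each central difference recovers the corresponding $a_i$ up to rounding.

```python
def dot(u, v):
    """u^T v for two equal-length lists."""
    return sum(p * q for p, q in zip(u, v))


def numerical_grad(f, x, h=1e-6):
    """Central-difference gradient of a scalar function f at x."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad


a = [0.5, -1.0, 2.0, 4.0]   # hypothetical constant vector
x = [1.0, 2.0, -3.0, 0.1]   # hypothetical evaluation point

# x^T a and a^T x are the same scalar, so one gradient serves both.
assert dot(x, a) == dot(a, x)
approx = numerical_grad(lambda v: dot(v, a), x)
print(all(abs(g - ai) < 1e-6 for g, ai in zip(approx, a)))  # prints True
```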
Conclusion 6

$$\frac{\partial \left( \boldsymbol{a}^T\boldsymbol{xx}^T\boldsymbol{b} \right)}{\partial \boldsymbol{x}}=\boldsymbol{ab}^T\boldsymbol{x}+\boldsymbol{ba}^T\boldsymbol{x}$$

where $\boldsymbol{a}$ and $\boldsymbol{b}$ are constant vectors:

$$\boldsymbol{a}=\left[ \begin{matrix} a_1& a_2& \cdots& a_n\\ \end{matrix} \right] ^T, \quad \boldsymbol{b}=\left[ \begin{matrix} b_1& b_2& \cdots& b_n\\ \end{matrix} \right] ^T$$
【Proof】Since $\boldsymbol{a}^T\boldsymbol{x}$ and $\boldsymbol{x}^T\boldsymbol{b}$ are scalars, $\boldsymbol{a}^T\boldsymbol{x}=\boldsymbol{x}^T\boldsymbol{a}$ and $\boldsymbol{x}^T\boldsymbol{b}=\boldsymbol{b}^T\boldsymbol{x}$, so

$$\frac{\partial \left( \boldsymbol{a}^T\boldsymbol{xx}^T\boldsymbol{b} \right)}{\partial \boldsymbol{x}}=\frac{\partial \left( \boldsymbol{x}^T\boldsymbol{ab}^T\boldsymbol{x} \right)}{\partial \boldsymbol{x}}$$
Moreover, $\boldsymbol{ab}^T$ is a constant $n\times n$ matrix with $\left( \boldsymbol{ab}^T \right) ^T=\boldsymbol{ba}^T$, so by Conclusion 4:

$$\frac{\partial \left( \boldsymbol{a}^T\boldsymbol{xx}^T\boldsymbol{b} \right)}{\partial \boldsymbol{x}}=\frac{\partial \left( \boldsymbol{x}^T\boldsymbol{ab}^T\boldsymbol{x} \right)}{\partial \boldsymbol{x}}=\boldsymbol{ab}^T\boldsymbol{x}+\boldsymbol{ba}^T\boldsymbol{x}$$
Conclusion 7

$$\frac{\partial \left( \boldsymbol{a}^T\boldsymbol{Xb} \right)}{\partial \boldsymbol{X}}=\boldsymbol{ab}^T$$

where $\boldsymbol{X}$ is an $m\times n$ matrix and $\boldsymbol{a}$, $\boldsymbol{b}$ are constant vectors:

$$\boldsymbol{a}=\left[ \begin{matrix} a_1& a_2& \cdots& a_m\\ \end{matrix} \right] ^T, \quad \boldsymbol{b}=\left[ \begin{matrix} b_1& b_2& \cdots& b_n\\ \end{matrix} \right] ^T$$
【Proof】

$$\begin{aligned} \boldsymbol{a}^T\boldsymbol{Xb}&=\left[ \begin{matrix} a_1& a_2& \cdots& a_m\\ \end{matrix} \right] \cdot \left[ \begin{matrix} x_{11}& x_{12}& \cdots& x_{1n}\\ x_{21}& x_{22}& \cdots& x_{2n}\\ \vdots& \vdots& \ddots& \vdots\\ x_{m1}& x_{m2}& \cdots& x_{mn}\\ \end{matrix} \right] \cdot \left[ \begin{array}{c} b_1\\ b_2\\ \vdots\\ b_n\\ \end{array} \right] \\ &=\left[ \begin{matrix} a_1x_{11}+a_2x_{21}+\cdots +a_mx_{m1}& a_1x_{12}+a_2x_{22}+\cdots +a_mx_{m2}& \cdots& a_1x_{1n}+a_2x_{2n}+\cdots +a_mx_{mn}\\ \end{matrix} \right] \cdot \left[ \begin{array}{c} b_1\\ b_2\\ \vdots\\ b_n\\ \end{array} \right] \\ &=b_1\left( a_1x_{11}+a_2x_{21}+\cdots +a_mx_{m1} \right) +b_2\left( a_1x_{12}+a_2x_{22}+\cdots +a_mx_{m2} \right) +\cdots +b_n\left( a_1x_{1n}+a_2x_{2n}+\cdots +a_mx_{mn} \right) \end{aligned}$$
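Let $f\left( \boldsymbol{X} \right) =\boldsymbol{a}^T\boldsymbol{Xb}$. Each entry $x_{ij}$ appears exactly once in this sum, inside the term $b_j\left( \cdots +a_ix_{ij}+\cdots \right)$, so $\partial f/\partial x_{ij}=a_ib_j$. Therefore: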
$$\begin{aligned} \frac{\partial \left( \boldsymbol{a}^T\boldsymbol{Xb} \right)}{\partial \boldsymbol{X}}&=\frac{\partial f\left( \boldsymbol{X} \right)}{\partial \boldsymbol{X}} \\ \\ &=\left[ \begin{matrix} \frac{\partial f}{\partial x_{11}}& \frac{\partial f}{\partial x_{12}}& \cdots& \frac{\partial f}{\partial x_{1n}}\\ \\ \frac{\partial f}{\partial x_{21}}& \frac{\partial f}{\partial x_{22}}& \cdots& \frac{\partial f}{\partial x_{2n}}\\ \\ \vdots& \vdots& \ddots& \vdots\\ \\ \frac{\partial f}{\partial x_{m1}}& \frac{\partial f}{\partial x_{m2}}& \cdots& \frac{\partial f}{\partial x_{mn}}\\ \end{matrix} \right] _{m\times n} \\ \\ &=\left[ \begin{matrix} a_1b_1& a_1b_2& \cdots& a_1b_n\\ \\ a_2b_1& a_2b_2& \cdots& a_2b_n\\ \\ \vdots& \vdots& \ddots& \vdots\\ \\ a_mb_1& a_mb_2& \cdots& a_mb_n\\ \end{matrix} \right] _{m\times n} \\ \\ &=\left[ \begin{array}{c} a_1\\ \\ a_2\\ \\ \vdots\\ \\ a_m\\ \end{array} \right] \cdot \left[ \begin{matrix} b_1& b_2& \cdots& b_n\\ \end{matrix} \right] \\ \\ &=\boldsymbol{ab}^T \end{aligned}$$
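Finally, a hypothetical numerical check of Conclusion 7 for a $2\times 3$ matrix $\boldsymbol{X}$: each entry $x_{ij}$ is perturbed in turn, and the resulting $m\times n$ matrix of central differences is compared with the outer product $\boldsymbol{ab}^T$. All values and helper names are made up for illustration.

```python
def bilinear(a, X, b):
    """f(X) = a^T X b = sum over i, j of a_i * x_ij * b_j."""
    return sum(a[i] * X[i][j] * b[j]
               for i in range(len(a)) for j in range(len(b)))


def numerical_grad_matrix(f, X, h=1e-6):
    """Central differences; G[i][j] approximates d f / d x_ij (same shape as X)."""
    m, n = len(X), len(X[0])
    G = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            G[i][j] = (f(Xp) - f(Xm)) / (2 * h)
    return G


a = [2.0, -1.0]              # hypothetical, length m = 2
b = [1.0, 0.5, -3.0]         # hypothetical, length n = 3
X = [[0.3, -1.2, 2.0],
     [4.0, 0.7, -0.5]]       # hypothetical m x n evaluation point

G = numerical_grad_matrix(lambda M: bilinear(a, M, b), X)
abT = [[ai * bj for bj in b] for ai in a]   # outer product a b^T
print(all(abs(G[i][j] - abT[i][j]) < 1e-6
          for i in range(2) for j in range(3)))  # prints True
```

Since $f$ is linear in $\boldsymbol{X}$, the gradient does not depend on the evaluation point, which is why any `X` gives the same result.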