Conclusion 8
$$
\frac{\partial \left( \boldsymbol{a}^T\boldsymbol{X}^T\boldsymbol{b} \right)}{\partial \boldsymbol{X}}=\boldsymbol{ba}^T
$$
where $\boldsymbol{a}$ and $\boldsymbol{b}$ are constant vectors:
$$
\begin{gathered}
\boldsymbol{a}=\left[ \begin{matrix} a_1& a_2& \cdots& a_n\\ \end{matrix} \right] ^T \\
\boldsymbol{b}=\left[ \begin{matrix} b_1& b_2& \cdots& b_m\\ \end{matrix} \right] ^T
\end{gathered}
$$
[Proof]
Since the transpose of a scalar equals the scalar itself, we have:
$$
\frac{\partial \left( \boldsymbol{a}^T\boldsymbol{X}^T\boldsymbol{b} \right)}{\partial \boldsymbol{X}}=\frac{\partial \left( \boldsymbol{a}^T\boldsymbol{X}^T\boldsymbol{b} \right) ^T}{\partial \boldsymbol{X}}=\frac{\partial \left( \boldsymbol{b}^T\boldsymbol{Xa} \right)}{\partial \boldsymbol{X}}
$$
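By Conclusion 7, applied with the roles of $\boldsymbol{a}$ and $\boldsymbol{b}$ exchanged, it follows that: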
$$
\frac{\partial \left( \boldsymbol{a}^T\boldsymbol{X}^T\boldsymbol{b} \right)}{\partial \boldsymbol{X}}=\boldsymbol{ba}^T
$$
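As a quick sanity check on Conclusion 8, the following minimal sketch compares the closed form $\boldsymbol{ba}^T$ with a central-difference estimate of the gradient. It assumes $\boldsymbol{X}$ is $m\times n$ with $m=3$, $n=4$ and random entries; these sizes, the nested-list representation of matrices, and the helper names `f` and `numerical_gradient` are illustrative choices, not part of the original derivation.

```python
import random

def f(X, a, b):
    """Scalar a^T X^T b = sum over i, j of b_i * x_ij * a_j (X is m x n)."""
    return sum(b[i] * X[i][j] * a[j]
               for i in range(len(b)) for j in range(len(a)))

def numerical_gradient(func, X, eps=1e-6):
    """Central-difference estimate of d func / d X, one entry at a time."""
    grad = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            old = X[i][j]
            X[i][j] = old + eps
            f_plus = func(X)
            X[i][j] = old - eps
            f_minus = func(X)
            X[i][j] = old  # restore the entry before moving on
            grad[i][j] = (f_plus - f_minus) / (2 * eps)
    return grad

random.seed(0)
m, n = 3, 4  # assumed sizes, chosen only for illustration
X = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(m)]
a = [random.uniform(-1, 1) for _ in range(n)]  # a has n entries
b = [random.uniform(-1, 1) for _ in range(m)]  # b has m entries

analytic = [[b[i] * a[j] for j in range(n)] for i in range(m)]  # b a^T
numeric = numerical_gradient(lambda M: f(M, a, b), X)
max_error = max(abs(analytic[i][j] - numeric[i][j])
                for i in range(m) for j in range(n))
print("max abs error:", max_error)
```

Because $f$ is linear in $\boldsymbol{X}$, the central difference should agree with $\boldsymbol{ba}^T$ up to floating-point rounding.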
Conclusion 9
$$
\frac{\partial \left( \boldsymbol{a}^T\boldsymbol{XX}^T\boldsymbol{b} \right)}{\partial \boldsymbol{X}}=\boldsymbol{ab}^T\boldsymbol{X}+\boldsymbol{ba}^T\boldsymbol{X}
$$
where $\boldsymbol{a}$ and $\boldsymbol{b}$ are constant vectors:
$$
\begin{gathered}
\boldsymbol{a}=\left[ \begin{matrix} a_1& a_2& \cdots& a_m\\ \end{matrix} \right] ^T \\
\boldsymbol{b}=\left[ \begin{matrix} b_1& b_2& \cdots& b_m\\ \end{matrix} \right] ^T
\end{gathered}
$$
[Proof]
Let
$$
f\left( \boldsymbol{X} \right) = \boldsymbol{a}^T\boldsymbol{XX}^T\boldsymbol{b} =\left( \begin{array}{c}
[(a_1b_1)(x_{11}x_{11}+x_{12}x_{12}+\cdots +x_{1n}x_{1n})]+\\
[(a_1b_2)(x_{11}x_{21}+x_{12}x_{22}+\cdots +x_{1n}x_{2n})]+\\
\cdots +\\
[(a_1b_m)(x_{11}x_{m1}+x_{12}x_{m2}+\cdots +x_{1n}x_{mn})]+\\
[(a_2b_1)(x_{21}x_{11}+x_{22}x_{12}+\cdots +x_{2n}x_{1n})]+\\
[(a_2b_2)(x_{21}x_{21}+x_{22}x_{22}+\cdots +x_{2n}x_{2n})]+\\
\cdots +\\
[(a_2b_m)(x_{21}x_{m1}+x_{22}x_{m2}+\cdots +x_{2n}x_{mn})]+\\
\cdots +\\
[(a_mb_1)(x_{m1}x_{11}+x_{m2}x_{12}+\cdots +x_{mn}x_{1n})]+\\
[(a_mb_2)(x_{m1}x_{21}+x_{m2}x_{22}+\cdots +x_{mn}x_{2n})]+\\
\cdots +\\
[(a_mb_m)(x_{m1}x_{m1}+x_{m2}x_{m2}+\cdots +x_{mn}x_{mn})]\\
\end{array} \right)
$$
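The same expansion can be written compactly in index notation, which may make the entry-by-entry differentiation easier to follow. This is only a restatement: the summation indices $i$, $j$, $k$ and the fixed indices $p$, $q$ are notation introduced here for illustration.

$$
f\left( \boldsymbol{X} \right) =\sum_{i=1}^m{\sum_{j=1}^m{a_ib_j\sum_{k=1}^n{x_{ik}x_{jk}}}},
\qquad
\frac{\partial f}{\partial x_{pq}}=a_p\sum_{j=1}^m{b_jx_{jq}}+b_p\sum_{i=1}^m{a_ix_{iq}}
$$

The first sum collects the terms in which $x_{pq}$ appears as the factor $x_{ik}$, and the second those in which it appears as $x_{jk}$; they are the $(p,q)$ entries of $\boldsymbol{ab}^T\boldsymbol{X}$ and $\boldsymbol{ba}^T\boldsymbol{X}$ respectively.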
Then:
$$
\begin{aligned}
\frac{\partial \left( \boldsymbol{a}^T\boldsymbol{XX}^T\boldsymbol{b} \right)}{\partial \boldsymbol{X}}
&=\left[ \begin{matrix}
\frac{\partial f}{\partial x_{11}}& \frac{\partial f}{\partial x_{12}}& \cdots& \frac{\partial f}{\partial x_{1n}}\\
\frac{\partial f}{\partial x_{21}}& \frac{\partial f}{\partial x_{22}}& \cdots& \frac{\partial f}{\partial x_{2n}}\\
\vdots& \vdots& \ddots& \vdots\\
\frac{\partial f}{\partial x_{m1}}& \frac{\partial f}{\partial x_{m2}}& \cdots& \frac{\partial f}{\partial x_{mn}}\\
\end{matrix} \right] _{m\times n} \\
&=\left[ \begin{matrix}
(a_1b_1x_{11}+a_1b_2x_{21}+\cdots +a_1b_mx_{m1})+(b_1a_1x_{11}+b_1a_2x_{21}+\cdots +b_1a_mx_{m1})& (a_1b_1x_{12}+a_1b_2x_{22}+\cdots +a_1b_mx_{m2})+(b_1a_1x_{12}+b_1a_2x_{22}+\cdots +b_1a_mx_{m2})& \cdots& (a_1b_1x_{1n}+a_1b_2x_{2n}+\cdots +a_1b_mx_{mn})+(b_1a_1x_{1n}+b_1a_2x_{2n}+\cdots +b_1a_mx_{mn})\\
(a_2b_1x_{11}+a_2b_2x_{21}+\cdots +a_2b_mx_{m1})+(b_2a_1x_{11}+b_2a_2x_{21}+\cdots +b_2a_mx_{m1})& (a_2b_1x_{12}+a_2b_2x_{22}+\cdots +a_2b_mx_{m2})+(b_2a_1x_{12}+b_2a_2x_{22}+\cdots +b_2a_mx_{m2})& \cdots& (a_2b_1x_{1n}+a_2b_2x_{2n}+\cdots +a_2b_mx_{mn})+(b_2a_1x_{1n}+b_2a_2x_{2n}+\cdots +b_2a_mx_{mn})\\
\vdots& \vdots& \ddots& \vdots\\
(a_mb_1x_{11}+a_mb_2x_{21}+\cdots +a_mb_mx_{m1})+(b_ma_1x_{11}+b_ma_2x_{21}+\cdots +b_ma_mx_{m1})& (a_mb_1x_{12}+a_mb_2x_{22}+\cdots +a_mb_mx_{m2})+(b_ma_1x_{12}+b_ma_2x_{22}+\cdots +b_ma_mx_{m2})& \cdots& (a_mb_1x_{1n}+a_mb_2x_{2n}+\cdots +a_mb_mx_{mn})+(b_ma_1x_{1n}+b_ma_2x_{2n}+\cdots +b_ma_mx_{mn})\\
\end{matrix} \right] \\
&=\left[ \begin{matrix}
a_1b_1x_{11}+a_1b_2x_{21}+\cdots +a_1b_mx_{m1}& a_1b_1x_{12}+a_1b_2x_{22}+\cdots +a_1b_mx_{m2}& \cdots& a_1b_1x_{1n}+a_1b_2x_{2n}+\cdots +a_1b_mx_{mn}\\
a_2b_1x_{11}+a_2b_2x_{21}+\cdots +a_2b_mx_{m1}& a_2b_1x_{12}+a_2b_2x_{22}+\cdots +a_2b_mx_{m2}& \cdots& a_2b_1x_{1n}+a_2b_2x_{2n}+\cdots +a_2b_mx_{mn}\\
\vdots& \vdots& \ddots& \vdots\\
a_mb_1x_{11}+a_mb_2x_{21}+\cdots +a_mb_mx_{m1}& a_mb_1x_{12}+a_mb_2x_{22}+\cdots +a_mb_mx_{m2}& \cdots& a_mb_1x_{1n}+a_mb_2x_{2n}+\cdots +a_mb_mx_{mn}\\
\end{matrix} \right] \\
&\quad +\left[ \begin{matrix}
b_1a_1x_{11}+b_1a_2x_{21}+\cdots +b_1a_mx_{m1}& b_1a_1x_{12}+b_1a_2x_{22}+\cdots +b_1a_mx_{m2}& \cdots& b_1a_1x_{1n}+b_1a_2x_{2n}+\cdots +b_1a_mx_{mn}\\
b_2a_1x_{11}+b_2a_2x_{21}+\cdots +b_2a_mx_{m1}& b_2a_1x_{12}+b_2a_2x_{22}+\cdots +b_2a_mx_{m2}& \cdots& b_2a_1x_{1n}+b_2a_2x_{2n}+\cdots +b_2a_mx_{mn}\\
\vdots& \vdots& \ddots& \vdots\\
b_ma_1x_{11}+b_ma_2x_{21}+\cdots +b_ma_mx_{m1}& b_ma_1x_{12}+b_ma_2x_{22}+\cdots +b_ma_mx_{m2}& \cdots& b_ma_1x_{1n}+b_ma_2x_{2n}+\cdots +b_ma_mx_{mn}\\
\end{matrix} \right] \\
&=\left[ \begin{matrix}
a_1b_1& a_1b_2& \cdots& a_1b_m\\
a_2b_1& a_2b_2& \cdots& a_2b_m\\
\vdots& \vdots& \ddots& \vdots\\
a_mb_1& a_mb_2& \cdots& a_mb_m\\
\end{matrix} \right] \left[ \begin{matrix}
x_{11}& x_{12}& \cdots& x_{1n}\\
x_{21}& x_{22}& \cdots& x_{2n}\\
\vdots& \vdots& \ddots& \vdots\\
x_{m1}& x_{m2}& \cdots& x_{mn}\\
\end{matrix} \right] +\left[ \begin{matrix}
b_1a_1& b_1a_2& \cdots& b_1a_m\\
b_2a_1& b_2a_2& \cdots& b_2a_m\\
\vdots& \vdots& \ddots& \vdots\\
b_ma_1& b_ma_2& \cdots& b_ma_m\\
\end{matrix} \right] \left[ \begin{matrix}
x_{11}& x_{12}& \cdots& x_{1n}\\
x_{21}& x_{22}& \cdots& x_{2n}\\
\vdots& \vdots& \ddots& \vdots\\
x_{m1}& x_{m2}& \cdots& x_{mn}\\
\end{matrix} \right] \\
&=\left[ \begin{array}{c} a_1\\ a_2\\ \vdots\\ a_m\\ \end{array} \right] [b_1,b_2,\cdots ,b_m]\left[ \begin{matrix}
x_{11}& x_{12}& \cdots& x_{1n}\\
x_{21}& x_{22}& \cdots& x_{2n}\\
\vdots& \vdots& \ddots& \vdots\\
x_{m1}& x_{m2}& \cdots& x_{mn}\\
\end{matrix} \right] +\left[ \begin{array}{c} b_1\\ b_2\\ \vdots\\ b_m\\ \end{array} \right] [a_1,a_2,\cdots ,a_m]\left[ \begin{matrix}
x_{11}& x_{12}& \cdots& x_{1n}\\
x_{21}& x_{22}& \cdots& x_{2n}\\
\vdots& \vdots& \ddots& \vdots\\
x_{m1}& x_{m2}& \cdots& x_{mn}\\
\end{matrix} \right] \\
&=\boldsymbol{ab}^T\boldsymbol{X}+\boldsymbol{ba}^T\boldsymbol{X}
\end{aligned}
$$
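The result of Conclusion 9 can also be checked numerically. The sketch below compares $\boldsymbol{ab}^T\boldsymbol{X}+\boldsymbol{ba}^T\boldsymbol{X}$ with a central-difference gradient of $f$. The sizes $m=3$, $n=4$, the random test data, and the helper names `mat_T_vec`, `f`, and `numerical_gradient` are assumptions made for illustration only.

```python
import random

def mat_T_vec(X, v):
    """Return X^T v for an m x n nested-list matrix X and a length-m vector v."""
    return [sum(X[i][k] * v[i] for i in range(len(X)))
            for k in range(len(X[0]))]

def f(X, a, b):
    """Scalar a^T X X^T b, computed as the dot product (X^T a) . (X^T b)."""
    u, w = mat_T_vec(X, a), mat_T_vec(X, b)
    return sum(uk * wk for uk, wk in zip(u, w))

def numerical_gradient(func, X, eps=1e-5):
    """Central-difference estimate of d func / d X, one entry at a time."""
    grad = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            old = X[i][j]
            X[i][j] = old + eps
            f_plus = func(X)
            X[i][j] = old - eps
            f_minus = func(X)
            X[i][j] = old  # restore the entry before moving on
            grad[i][j] = (f_plus - f_minus) / (2 * eps)
    return grad

random.seed(1)
m, n = 3, 4  # assumed sizes, chosen only for illustration
X = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(m)]
a = [random.uniform(-1, 1) for _ in range(m)]  # both vectors have m entries
b = [random.uniform(-1, 1) for _ in range(m)]

# Entry (p, q) of a b^T X is a_p (X^T b)_q; entry (p, q) of b a^T X is b_p (X^T a)_q.
XTa, XTb = mat_T_vec(X, a), mat_T_vec(X, b)
analytic = [[a[p] * XTb[q] + b[p] * XTa[q] for q in range(n)] for p in range(m)]
numeric = numerical_gradient(lambda M: f(M, a, b), X)
max_error = max(abs(analytic[p][q] - numeric[p][q])
                for p in range(m) for q in range(n))
print("max abs error:", max_error)
```

Since $f$ is quadratic in $\boldsymbol{X}$, the central difference has no truncation error here, so any remaining discrepancy should come from rounding alone.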
Conclusion 10
$$
\frac{\partial \left( \boldsymbol{a}^T\boldsymbol{X}^T\boldsymbol{Xb} \right)}{\partial \boldsymbol{X}}=\boldsymbol{Xba}^T+\boldsymbol{Xab}^T
$$
where $\boldsymbol{a}$ and $\boldsymbol{b}$ are constant vectors:
$$
\begin{gathered}
\boldsymbol{a}=\left[ \begin{matrix} a_1& a_2& \cdots& a_n\\ \end{matrix} \right] ^T \\
\boldsymbol{b}=\left[ \begin{matrix} b_1& b_2& \cdots& b_n\\ \end{matrix} \right] ^T
\end{gathered}
$$
[Proof]
By Conclusion 9, we have:
$$
\begin{aligned}
\left[ \frac{\partial \left( \boldsymbol{a}^T\boldsymbol{XX}^T\boldsymbol{b} \right)}{\partial \boldsymbol{X}} \right] ^T&=\frac{\partial \left( \boldsymbol{a}^T\boldsymbol{XX}^T\boldsymbol{b} \right)}{\partial \boldsymbol{X}^T} \\
&=\left( \boldsymbol{ab}^T\boldsymbol{X}+\boldsymbol{ba}^T\boldsymbol{X} \right) ^T \\
&=\boldsymbol{X}^T\boldsymbol{ba}^T+\boldsymbol{X}^T\boldsymbol{ab}^T
\end{aligned}
$$
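Therefore, writing $\boldsymbol{X}=\left( \boldsymbol{X}^T \right) ^T$ and applying the identity above with $\boldsymbol{X}^T$ in place of $\boldsymbol{X}$: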
$$
\begin{aligned}
\frac{\partial \left( \boldsymbol{a}^T\boldsymbol{X}^T\boldsymbol{Xb} \right)}{\partial \boldsymbol{X}}&=\frac{\partial \left( \boldsymbol{a}^T\boldsymbol{X}^T\left( \boldsymbol{X}^T \right) ^T\boldsymbol{b} \right)}{\partial \left( \boldsymbol{X}^T \right) ^T} \\
&=\left( \boldsymbol{X}^T \right) ^T\boldsymbol{ba}^T+\left( \boldsymbol{X}^T \right) ^T\boldsymbol{ab}^T \\
&=\boldsymbol{Xba}^T+\boldsymbol{Xab}^T
\end{aligned}
$$
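Conclusion 10 can be checked in the same way. The minimal sketch below compares $\boldsymbol{Xba}^T+\boldsymbol{Xab}^T$ with a central-difference gradient of $\boldsymbol{a}^T\boldsymbol{X}^T\boldsymbol{Xb}$. The sizes $m=3$, $n=4$, the random test data, and the helper names `mat_vec`, `f`, and `numerical_gradient` are assumptions made for illustration only.

```python
import random

def mat_vec(X, v):
    """Return X v for an m x n nested-list matrix X and a length-n vector v."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in X]

def f(X, a, b):
    """Scalar a^T X^T X b, computed as the dot product (X a) . (X b)."""
    u, w = mat_vec(X, a), mat_vec(X, b)
    return sum(ui * wi for ui, wi in zip(u, w))

def numerical_gradient(func, X, eps=1e-5):
    """Central-difference estimate of d func / d X, one entry at a time."""
    grad = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            old = X[i][j]
            X[i][j] = old + eps
            f_plus = func(X)
            X[i][j] = old - eps
            f_minus = func(X)
            X[i][j] = old  # restore the entry before moving on
            grad[i][j] = (f_plus - f_minus) / (2 * eps)
    return grad

random.seed(2)
m, n = 3, 4  # assumed sizes, chosen only for illustration
X = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(m)]
a = [random.uniform(-1, 1) for _ in range(n)]  # both vectors have n entries
b = [random.uniform(-1, 1) for _ in range(n)]

# Entry (p, q) of X b a^T is (X b)_p a_q; entry (p, q) of X a b^T is (X a)_p b_q.
Xa, Xb = mat_vec(X, a), mat_vec(X, b)
analytic = [[Xb[p] * a[q] + Xa[p] * b[q] for q in range(n)] for p in range(m)]
numeric = numerical_gradient(lambda M: f(M, a, b), X)
max_error = max(abs(analytic[p][q] - numeric[p][q])
                for p in range(m) for q in range(n))
print("max abs error:", max_error)
```

Note that here $\boldsymbol{a}$ and $\boldsymbol{b}$ have $n$ entries, whereas in Conclusion 9 they have $m$; the gradient is $m\times n$ in both cases.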