4 Vector-by-vector derivatives
4.1 Definitions
4.1.1 Derivative of a row vector with respect to a column vector
Also called the denominator layout, written $\frac{\partial \boldsymbol{y}^T}{\partial \boldsymbol{x}}$. Differentiating the $m$-dimensional row vector $\boldsymbol{y}^T=\left[ y_1,y_2,\cdots ,y_m \right]$ with respect to the $n$-dimensional column vector $\boldsymbol{x}=\left[ x_1,x_2,\cdots ,x_n \right] ^T$ yields an $n\times m$ matrix:
$$
\frac{\partial \boldsymbol{y}^T}{\partial \boldsymbol{x}}=\left[ \begin{array}{c} \frac{\partial \boldsymbol{y}^T}{\partial x_1}\\ \frac{\partial \boldsymbol{y}^T}{\partial x_2}\\ \vdots\\ \frac{\partial \boldsymbol{y}^T}{\partial x_n}\\ \end{array} \right] =\left[ \begin{matrix} \frac{\partial y_1}{\partial x_1}& \frac{\partial y_2}{\partial x_1}& \cdots& \frac{\partial y_m}{\partial x_1}\\ \frac{\partial y_1}{\partial x_2}& \frac{\partial y_2}{\partial x_2}& \cdots& \frac{\partial y_m}{\partial x_2}\\ \vdots& \vdots& \ddots& \vdots\\ \frac{\partial y_1}{\partial x_n}& \frac{\partial y_2}{\partial x_n}& \cdots& \frac{\partial y_m}{\partial x_n}\\ \end{matrix} \right]
$$
This matrix is known as the gradient matrix.
4.1.2 Derivative of a column vector with respect to a row vector
Also called the numerator layout, written $\frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}^T}$. Differentiating the $m$-dimensional column vector $\boldsymbol{y}=\left[ y_1,y_2,\cdots ,y_m \right] ^T$ with respect to the $n$-dimensional row vector $\boldsymbol{x}^T=\left[ x_1,x_2,\cdots ,x_n \right]$ yields an $m\times n$ matrix:
$$
\frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}^T}=\left[ \begin{array}{c} \frac{\partial y_1}{\partial \boldsymbol{x}^T}\\ \frac{\partial y_2}{\partial \boldsymbol{x}^T}\\ \vdots\\ \frac{\partial y_m}{\partial \boldsymbol{x}^T}\\ \end{array} \right] =\left[ \begin{matrix} \frac{\partial y_1}{\partial x_1}& \frac{\partial y_1}{\partial x_2}& \cdots& \frac{\partial y_1}{\partial x_n}\\ \frac{\partial y_2}{\partial x_1}& \frac{\partial y_2}{\partial x_2}& \cdots& \frac{\partial y_2}{\partial x_n}\\ \vdots& \vdots& \ddots& \vdots\\ \frac{\partial y_m}{\partial x_1}& \frac{\partial y_m}{\partial x_2}& \cdots& \frac{\partial y_m}{\partial x_n}\\ \end{matrix} \right]
$$
This matrix is known as the Jacobian matrix.
It follows from the definitions that the two layouts are in general not equal, but each is the transpose of the other:
$$
\frac{\partial \boldsymbol{y}^T}{\partial \boldsymbol{x}}\ne \frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}^T}\,\,, \qquad \frac{\partial \boldsymbol{y}^T}{\partial \boldsymbol{x}}=\left( \frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}^T} \right) ^T
$$
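The relationship between the two layouts can be checked numerically. The following is a minimal sketch in plain Python and is not part of the original text: the helper names `jacobian` and `transpose`, the test function `f`, and the step size `h` are all assumptions chosen for illustration. It approximates the numerator-layout (Jacobian) matrix with central differences and obtains the denominator-layout (gradient) matrix as its transpose.

```python
def jacobian(f, x, h=1e-6):
    """Numerator layout: entry (i, j) approximates dy_i/dx_j, giving an m x n matrix."""
    m, n = len(f(x)), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(m):
            # Central difference approximation of dy_i/dx_j
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J


def transpose(M):
    return [list(col) for col in zip(*M)]


# Hypothetical example with n = 3 inputs and m = 2 outputs:
# y1 = x1 * x2, y2 = x2 + x3**2
def f(x):
    return [x[0] * x[1], x[1] + x[2] ** 2]


x0 = [1.0, 2.0, 3.0]
J = jacobian(f, x0)   # m x n = 2 x 3, numerator layout dy/dx^T
G = transpose(J)      # n x m = 3 x 2, denominator layout dy^T/dx

print(len(J), len(J[0]))  # 2 3
print(len(G), len(G[0]))  # 3 2
```

For this choice of `f`, the analytic Jacobian at `x0` is `[[2, 1, 0], [0, 1, 6]]`, so the entries of `J` should agree with it up to finite-difference error.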
4.2 Rules of differentiation
If $\boldsymbol{a}\left( \boldsymbol{x} \right)$ and $\boldsymbol{b}\left( \boldsymbol{x} \right)$ are $m$-dimensional column-vector functions, $\lambda \left( \boldsymbol{x} \right)$ is a scalar function, and $\boldsymbol{x}$ is an $n$-dimensional column vector, then the following three rules hold:
4.2.1 Sum and difference rule
$$
\frac{\partial \left( \boldsymbol{a}^T\left( \boldsymbol{x} \right) \pm \boldsymbol{b}^T\left( \boldsymbol{x} \right) \right)}{\partial \boldsymbol{x}}=\frac{\partial \boldsymbol{a}^T\left( \boldsymbol{x} \right)}{\partial \boldsymbol{x}}\pm \frac{\partial \boldsymbol{b}^T\left( \boldsymbol{x} \right)}{\partial \boldsymbol{x}}
$$
4.2.2 Scalar multiplication rule
$$
\frac{\partial \left( \lambda \left( \boldsymbol{x} \right) \boldsymbol{a}^T\left( \boldsymbol{x} \right) \right)}{\partial \boldsymbol{x}}=\frac{\partial \lambda \left( \boldsymbol{x} \right)}{\partial \boldsymbol{x}}\cdot \boldsymbol{a}^T\left( \boldsymbol{x} \right) +\lambda \left( \boldsymbol{x} \right) \cdot \frac{\partial \boldsymbol{a}^T\left( \boldsymbol{x} \right)}{\partial \boldsymbol{x}}
$$
4.2.3 Product rule
$$
\frac{\partial \left[ \boldsymbol{a}^T\left( \boldsymbol{x} \right) \cdot \boldsymbol{b}\left( \boldsymbol{x} \right) \right]}{\partial \boldsymbol{x}}=\frac{\partial \boldsymbol{a}^T\left( \boldsymbol{x} \right)}{\partial \boldsymbol{x}}\cdot \boldsymbol{b}\left( \boldsymbol{x} \right) +\frac{\partial \boldsymbol{b}^T\left( \boldsymbol{x} \right)}{\partial \boldsymbol{x}}\cdot \boldsymbol{a}\left( \boldsymbol{x} \right)
$$
In addition, differentiating $\boldsymbol{x}$ with respect to itself gives the identity matrix $\boldsymbol{E}$ in either layout:
$$
\frac{\partial \boldsymbol{x}}{\partial \boldsymbol{x}^T}=\frac{\partial \boldsymbol{x}^T}{\partial \boldsymbol{x}}=\boldsymbol{E}
$$
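As a quick sanity check of the product rule, consider a small case that is not in the original text. Assume $n=m=2$ with the illustrative choices $\boldsymbol{a}\left( \boldsymbol{x} \right) =\left[ x_1^2,x_2 \right] ^T$ and $\boldsymbol{b}\left( \boldsymbol{x} \right) =\left[ x_2,x_1 \right] ^T$, so that $\boldsymbol{a}^T\boldsymbol{b}=x_1^2x_2+x_1x_2$. Evaluating the right-hand side of the rule gives:
$$
\begin{aligned} \frac{\partial \boldsymbol{a}^T}{\partial \boldsymbol{x}}\cdot \boldsymbol{b}+\frac{\partial \boldsymbol{b}^T}{\partial \boldsymbol{x}}\cdot \boldsymbol{a}&=\left[ \begin{matrix} 2x_1& 0\\ 0& 1\\ \end{matrix} \right] \left[ \begin{array}{c} x_2\\ x_1\\ \end{array} \right] +\left[ \begin{matrix} 0& 1\\ 1& 0\\ \end{matrix} \right] \left[ \begin{array}{c} x_1^2\\ x_2\\ \end{array} \right] \\ &=\left[ \begin{array}{c} 2x_1x_2\\ x_1\\ \end{array} \right] +\left[ \begin{array}{c} x_2\\ x_1^2\\ \end{array} \right] =\left[ \begin{array}{c} 2x_1x_2+x_2\\ x_1^2+x_1\\ \end{array} \right] \end{aligned}
$$
This matches the gradient obtained by differentiating $x_1^2x_2+x_1x_2$ directly with respect to $x_1$ and $x_2$.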
4.3 Examples
[Example 4.1] Prove that:
$$
\frac{\partial \left[ \boldsymbol{a}^T\left( \boldsymbol{x} \right) \cdot \boldsymbol{b}\left( \boldsymbol{x} \right) \right]}{\partial \boldsymbol{x}}=\frac{\partial \boldsymbol{a}^T\left( \boldsymbol{x} \right)}{\partial \boldsymbol{x}}\cdot \boldsymbol{b}\left( \boldsymbol{x} \right) +\frac{\partial \boldsymbol{b}^T\left( \boldsymbol{x} \right)}{\partial \boldsymbol{x}}\cdot \boldsymbol{a}\left( \boldsymbol{x} \right)
$$
[Proof]
$$
\begin{aligned} \frac{\partial \left[ \boldsymbol{a}^T\left( \boldsymbol{x} \right) \cdot \boldsymbol{b}\left( \boldsymbol{x} \right) \right]}{\partial \boldsymbol{x}}&=\left[ \begin{array}{c} \frac{\partial \boldsymbol{a}^T\boldsymbol{b}}{\partial x_1}\\ \vdots\\ \frac{\partial \boldsymbol{a}^T\boldsymbol{b}}{\partial x_i}\\ \vdots\\ \frac{\partial \boldsymbol{a}^T\boldsymbol{b}}{\partial x_n}\\ \end{array} \right] =\left[ \begin{array}{c} \frac{\partial \boldsymbol{a}^T}{\partial x_1}\cdot \boldsymbol{b}+\boldsymbol{a}^T\cdot \frac{\partial \boldsymbol{b}}{\partial x_1}\\ \vdots\\ \frac{\partial \boldsymbol{a}^T}{\partial x_i}\cdot \boldsymbol{b}+\boldsymbol{a}^T\cdot \frac{\partial \boldsymbol{b}}{\partial x_i}\\ \vdots\\ \frac{\partial \boldsymbol{a}^T}{\partial x_n}\cdot \boldsymbol{b}+\boldsymbol{a}^T\cdot \frac{\partial \boldsymbol{b}}{\partial x_n}\\ \end{array} \right] \\ &=\left[ \begin{array}{c} \frac{\partial \boldsymbol{a}^T}{\partial x_1}\cdot \boldsymbol{b}+\frac{\partial \boldsymbol{b}^T}{\partial x_1}\cdot \boldsymbol{a}\\ \vdots\\ \frac{\partial \boldsymbol{a}^T}{\partial x_i}\cdot \boldsymbol{b}+\frac{\partial \boldsymbol{b}^T}{\partial x_i}\cdot \boldsymbol{a}\\ \vdots\\ \frac{\partial \boldsymbol{a}^T}{\partial x_n}\cdot \boldsymbol{b}+\frac{\partial \boldsymbol{b}^T}{\partial x_n}\cdot \boldsymbol{a}\\ \end{array} \right] \\ &=\frac{\partial \boldsymbol{a}^T\left( \boldsymbol{x} \right)}{\partial \boldsymbol{x}}\cdot \boldsymbol{b}\left( \boldsymbol{x} \right) +\frac{\partial \boldsymbol{b}^T\left( \boldsymbol{x} \right)}{\partial \boldsymbol{x}}\cdot \boldsymbol{a}\left( \boldsymbol{x} \right) \end{aligned}
$$
The second equality applies the ordinary product rule to each component. The third uses the fact that $\boldsymbol{a}^T\cdot \frac{\partial \boldsymbol{b}}{\partial x_i}$ is a scalar and therefore equals its own transpose $\frac{\partial \boldsymbol{b}^T}{\partial x_i}\cdot \boldsymbol{a}$.
[Example 4.2] Find $\frac{\partial \boldsymbol{x}}{\partial \boldsymbol{x}^T}$ and $\frac{\partial \boldsymbol{x}^T}{\partial \boldsymbol{x}}$, where $\boldsymbol{x}$ is an $n$-dimensional column vector.
[Solution]
$$
\frac{\partial \boldsymbol{x}}{\partial \boldsymbol{x}^T}=\left[ \begin{matrix} \frac{\partial x_1}{\partial x_1}& \frac{\partial x_1}{\partial x_2}& \cdots& \frac{\partial x_1}{\partial x_n}\\ \frac{\partial x_2}{\partial x_1}& \frac{\partial x_2}{\partial x_2}& \cdots& \frac{\partial x_2}{\partial x_n}\\ \vdots& \vdots& \ddots& \vdots\\ \frac{\partial x_n}{\partial x_1}& \frac{\partial x_n}{\partial x_2}& \cdots& \frac{\partial x_n}{\partial x_n}\\ \end{matrix} \right] =\left[ \begin{matrix} 1& 0& \cdots& 0\\ 0& 1& \cdots& 0\\ \vdots& \vdots& \ddots& \vdots\\ 0& 0& \cdots& 1\\ \end{matrix} \right] =\boldsymbol{E}
$$
$$
\frac{\partial \boldsymbol{x}^T}{\partial \boldsymbol{x}}=\left[ \begin{matrix} \frac{\partial x_1}{\partial x_1}& \frac{\partial x_2}{\partial x_1}& \cdots& \frac{\partial x_n}{\partial x_1}\\ \frac{\partial x_1}{\partial x_2}& \frac{\partial x_2}{\partial x_2}& \cdots& \frac{\partial x_n}{\partial x_2}\\ \vdots& \vdots& \ddots& \vdots\\ \frac{\partial x_1}{\partial x_n}& \frac{\partial x_2}{\partial x_n}& \cdots& \frac{\partial x_n}{\partial x_n}\\ \end{matrix} \right] =\left[ \begin{matrix} 1& 0& \cdots& 0\\ 0& 1& \cdots& 0\\ \vdots& \vdots& \ddots& \vdots\\ 0& 0& \cdots& 1\\ \end{matrix} \right] =\boldsymbol{E}
$$
[Example 4.3] Find $\frac{\partial \left( \boldsymbol{x}^T\boldsymbol{A} \right)}{\partial \boldsymbol{x}}$, where $\boldsymbol{x}$ is an $n$-dimensional column vector and $\boldsymbol{A}$ is an $n\times m$ constant matrix.
[Solution] Write $\boldsymbol{A}$ in terms of its columns:
$$
\boldsymbol{A}=\left[ \begin{matrix} \boldsymbol{\alpha }_1& \boldsymbol{\alpha }_2& \cdots& \boldsymbol{\alpha }_m\\ \end{matrix} \right]
$$
where each
$$
\boldsymbol{\alpha }_i=\left[ \begin{matrix} \alpha _{i1}& \alpha _{i2}& \cdots& \alpha _{in}\\ \end{matrix} \right] ^T
$$
is an $n$-dimensional column vector. Then:
$$
\boldsymbol{x}^T\boldsymbol{A}=\left[ \begin{matrix} \boldsymbol{x}^T\boldsymbol{\alpha }_1& \boldsymbol{x}^T\boldsymbol{\alpha }_2& \cdots& \boldsymbol{x}^T\boldsymbol{\alpha }_m\\ \end{matrix} \right]
$$
By definition:
$$
\frac{\partial \left( \boldsymbol{x}^T\boldsymbol{A} \right)}{\partial \boldsymbol{x}}=\left[ \begin{matrix} \frac{\partial \left( \boldsymbol{x}^T\boldsymbol{\alpha }_1 \right)}{\partial \boldsymbol{x}}& \frac{\partial \left( \boldsymbol{x}^T\boldsymbol{\alpha }_2 \right)}{\partial \boldsymbol{x}}& \cdots& \frac{\partial \left( \boldsymbol{x}^T\boldsymbol{\alpha }_m \right)}{\partial \boldsymbol{x}}\\ \end{matrix} \right]
$$
Each column is, by the product rule:
$$
\frac{\partial \left( \boldsymbol{x}^T\boldsymbol{\alpha }_i \right)}{\partial \boldsymbol{x}}=\frac{\partial \boldsymbol{x}^T}{\partial \boldsymbol{x}}\cdot \boldsymbol{\alpha }_i+\frac{\partial {\boldsymbol{\alpha }_i}^T}{\partial \boldsymbol{x}}\cdot \boldsymbol{x}=\boldsymbol{\alpha }_i
$$
since $\frac{\partial \boldsymbol{x}^T}{\partial \boldsymbol{x}}=\boldsymbol{E}$ and $\boldsymbol{\alpha }_i$ is constant, so the second term vanishes. Therefore:
$$
\frac{\partial \left( \boldsymbol{x}^T\boldsymbol{A} \right)}{\partial \boldsymbol{x}}=\left[ \begin{matrix} \boldsymbol{\alpha }_1& \boldsymbol{\alpha }_2& \cdots& \boldsymbol{\alpha }_m\\ \end{matrix} \right] =\boldsymbol{A}
$$
[Corollary] If $\boldsymbol{A}$ is an $n\times n$ square matrix, then:
$$
\frac{\partial \boldsymbol{x}^T\boldsymbol{A}^T}{\partial \boldsymbol{x}}=\boldsymbol{A}^T
$$
[Example 4.4] Find $\frac{\partial \left( \boldsymbol{Bx} \right)}{\partial \boldsymbol{x}^T}$, where $\boldsymbol{x}$ is an $n$-dimensional column vector and $\boldsymbol{B}$ is an $m\times n$ constant matrix.
[Solution]
Let $\boldsymbol{\beta }_i$ be $n$-dimensional column vectors such that the rows of $\boldsymbol{B}$ are ${\boldsymbol{\beta }_i}^T$:
$$
\boldsymbol{B}=\left[ \begin{matrix} \boldsymbol{\beta }_1& \boldsymbol{\beta }_2& \cdots& \boldsymbol{\beta }_m\\ \end{matrix} \right] ^T
$$
Then:
$$
\boldsymbol{Bx}=\left[ \begin{matrix} {\boldsymbol{\beta }_1}^T\boldsymbol{x}& {\boldsymbol{\beta }_2}^T\boldsymbol{x}& \cdots& {\boldsymbol{\beta }_m}^T\boldsymbol{x}\\ \end{matrix} \right] ^T
$$
$$
\frac{\partial \left( \boldsymbol{Bx} \right)}{\partial \boldsymbol{x}^T}=\left[ \begin{array}{c} \frac{\partial \left( {\boldsymbol{\beta }_1}^T\boldsymbol{x} \right)}{\partial \boldsymbol{x}^T}\\ \frac{\partial \left( {\boldsymbol{\beta }_2}^T\boldsymbol{x} \right)}{\partial \boldsymbol{x}^T}\\ \vdots\\ \frac{\partial \left( {\boldsymbol{\beta }_m}^T\boldsymbol{x} \right)}{\partial \boldsymbol{x}^T}\\ \end{array} \right]
$$
where each row is:
$$
\begin{aligned} \frac{\partial \left( {\boldsymbol{\beta }_i}^T\boldsymbol{x} \right)}{\partial \boldsymbol{x}^T}&=\left[ \frac{\partial \left( {\boldsymbol{\beta }_i}^T\boldsymbol{x} \right) ^T}{\partial \boldsymbol{x}} \right] ^T=\left[ \frac{\partial \left( \boldsymbol{x}^T\boldsymbol{\beta }_i \right)}{\partial \boldsymbol{x}} \right] ^T \\ &=\left[ \frac{\partial \boldsymbol{x}^T}{\partial \boldsymbol{x}}\cdot \boldsymbol{\beta }_i+\frac{\partial {\boldsymbol{\beta }_i}^T}{\partial \boldsymbol{x}}\cdot \boldsymbol{x} \right] ^T \\ &={\boldsymbol{\beta }_i}^T \end{aligned}
$$
Therefore:
$$
\frac{\partial \left( \boldsymbol{Bx} \right)}{\partial \boldsymbol{x}^T}=\left[ \begin{array}{c} {\boldsymbol{\beta }_1}^T\\ {\boldsymbol{\beta }_2}^T\\ \vdots\\ {\boldsymbol{\beta }_m}^T\\ \end{array} \right] =\boldsymbol{B}
$$
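The results of Examples 4.3 and 4.4 can be confirmed with finite differences. The sketch below is an illustration only, not taken from the original text: the helpers `jacobian`, `transpose`, `matvec`, `vecmat`, and `max_abs_diff`, as well as the concrete values of `A`, `B`, and `x0`, are assumptions. The numerator-layout derivative of $\boldsymbol{Bx}$ should reproduce $\boldsymbol{B}$, and the denominator-layout derivative of $\boldsymbol{x}^T\boldsymbol{A}$ (the transposed Jacobian) should reproduce $\boldsymbol{A}$.

```python
def jacobian(f, x, h=1e-6):
    """Numerator layout: entry (i, j) approximates df_i/dx_j."""
    m, n = len(f(x)), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J


def transpose(M):
    return [list(col) for col in zip(*M)]


def matvec(M, v):
    """Column vector M v, returned as a flat list."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def vecmat(v, M):
    """Row vector v^T M, returned as a flat list."""
    return matvec(transpose(M), v)


def max_abs_diff(P, Q):
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))


# Hypothetical constant matrices and evaluation point
B = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]    # m x n = 2 x 3
A = [[1.0, 0.0],
     [2.0, -1.0],
     [0.0, 3.0]]         # n x m = 3 x 2
x0 = [0.5, -1.0, 2.0]

# Example 4.4: d(Bx)/dx^T in numerator layout
dBx = jacobian(lambda x: matvec(B, x), x0)

# Example 4.3: d(x^T A)/dx in denominator layout is the transposed Jacobian
dxTA = transpose(jacobian(lambda x: vecmat(x, A), x0))

# Both differences should be close to zero (finite-difference rounding only)
print(max_abs_diff(dBx, B), max_abs_diff(dxTA, A))
```

Because both functions are linear in `x`, the central difference is exact up to floating-point rounding, and the result does not depend on the choice of `x0`.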
[Example 4.5] Find the derivative of the quadratic form $\boldsymbol{x}^T\boldsymbol{Ax}$ with respect to $\boldsymbol{x}$, where $\boldsymbol{A}$ is a symmetric matrix.
[Solution] By the product rule
$$
\frac{\partial \left[ \boldsymbol{a}^T\left( \boldsymbol{x} \right) \cdot \boldsymbol{b}\left( \boldsymbol{x} \right) \right]}{\partial \boldsymbol{x}}=\frac{\partial \boldsymbol{a}^T\left( \boldsymbol{x} \right)}{\partial \boldsymbol{x}}\cdot \boldsymbol{b}\left( \boldsymbol{x} \right) +\frac{\partial \boldsymbol{b}^T\left( \boldsymbol{x} \right)}{\partial \boldsymbol{x}}\cdot \boldsymbol{a}\left( \boldsymbol{x} \right)
$$
with $\boldsymbol{a}\left( \boldsymbol{x} \right) =\boldsymbol{x}$ and $\boldsymbol{b}\left( \boldsymbol{x} \right) =\boldsymbol{Ax}$, we have:
$$
\begin{aligned} \frac{\partial \left[ \boldsymbol{x}^T\boldsymbol{Ax} \right]}{\partial \boldsymbol{x}}&=\frac{\partial \boldsymbol{x}^T}{\partial \boldsymbol{x}}\cdot \left( \boldsymbol{Ax} \right) +\frac{\partial \left( \boldsymbol{Ax} \right) ^T}{\partial \boldsymbol{x}}\cdot \boldsymbol{x} \\ &=\boldsymbol{Ax}+\frac{\partial \left( \boldsymbol{x}^T\boldsymbol{A}^T \right)}{\partial \boldsymbol{x}}\cdot \boldsymbol{x} \\ &=\boldsymbol{Ax}+\boldsymbol{A}^T\boldsymbol{x}=\left( \boldsymbol{A}+\boldsymbol{A}^T \right) \boldsymbol{x} \\ &=2\boldsymbol{Ax} \end{aligned}
$$
where the last step uses the symmetry $\boldsymbol{A}^T=\boldsymbol{A}$. That is:
$$
\frac{\partial \left[ \boldsymbol{x}^T\boldsymbol{Ax} \right]}{\partial \boldsymbol{x}}=2\boldsymbol{Ax}
$$
Moreover, for any column-vector function $\boldsymbol{\alpha }\left( \boldsymbol{x} \right)$:
$$
\frac{\partial \boldsymbol{\alpha }^T\left( \boldsymbol{x} \right)}{\partial \boldsymbol{x}}=\left[ \frac{\partial \boldsymbol{\alpha }\left( \boldsymbol{x} \right)}{\partial \boldsymbol{x}^T} \right] ^T,\qquad \frac{\partial \boldsymbol{\alpha }\left( \boldsymbol{x} \right)}{\partial \boldsymbol{x}^T}=\left[ \frac{\partial \boldsymbol{\alpha }^T\left( \boldsymbol{x} \right)}{\partial \boldsymbol{x}} \right] ^T
$$
Hence, again for symmetric $\boldsymbol{A}$:
$$
\frac{\partial \left[ \boldsymbol{x}^T\boldsymbol{Ax} \right]}{\partial \boldsymbol{x}^T}=\left[ \frac{\partial \left[ \boldsymbol{x}^T\boldsymbol{Ax} \right] ^T}{\partial \boldsymbol{x}} \right] ^T=\left[ \frac{\partial \left[ \boldsymbol{x}^T\boldsymbol{Ax} \right]}{\partial \boldsymbol{x}} \right] ^T=2\left( \boldsymbol{Ax} \right) ^T=2\boldsymbol{x}^T\boldsymbol{A}
$$
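The role of the symmetry assumption in Example 4.5 can be seen numerically. The following sketch is illustrative only: the helpers `grad`, `matvec`, and `quad` and the matrices `A_sym` and `A_gen` are assumptions, not part of the original text. It compares a central-difference gradient of $\boldsymbol{x}^T\boldsymbol{Ax}$ with $2\boldsymbol{Ax}$ for a symmetric matrix and with $\left( \boldsymbol{A}+\boldsymbol{A}^T \right) \boldsymbol{x}$ for a non-symmetric one.

```python
def grad(f, x, h=1e-6):
    """Denominator layout: entry i approximates df/dx_i for a scalar function f."""
    g = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        g.append((f(x_plus) - f(x_minus)) / (2 * h))
    return g


def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def quad(A):
    """Return the quadratic form x -> x^T A x."""
    return lambda x: sum(xi * yi for xi, yi in zip(x, matvec(A, x)))


x0 = [1.0, -2.0]
A_sym = [[2.0, 1.0], [1.0, 3.0]]   # symmetric, hypothetical values
A_gen = [[2.0, 4.0], [0.0, 3.0]]   # not symmetric, hypothetical values

g_sym = grad(quad(A_sym), x0)  # close to 2 A x = [0, -10]
g_gen = grad(quad(A_gen), x0)  # close to (A + A^T) x = [-4, -8]

two_Ax_gen = [2 * v for v in matvec(A_gen, x0)]  # [-12, -12], differs from g_gen
print(g_sym, g_gen, two_Ax_gen)
```

For the non-symmetric matrix, $2\boldsymbol{Ax}$ no longer matches the gradient, which is why the final step of the derivation needs $\boldsymbol{A}^T=\boldsymbol{A}$.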
[Example 4.6] Find the derivative of the function $\boldsymbol{\lambda }^T\boldsymbol{Ax}$ with respect to $\boldsymbol{x}$, where $\boldsymbol{\lambda }^T$ is a $1\times n$ row vector, $\boldsymbol{A}$ is an $n\times n$ constant matrix, and $\boldsymbol{x}$ is an $n$-dimensional column vector.
[Solution] Since $\boldsymbol{\lambda }^T\boldsymbol{Ax}$ is a scalar, it equals its own transpose:
$$
\boldsymbol{\lambda }^T\boldsymbol{Ax}=\left( \boldsymbol{\lambda }^T\boldsymbol{Ax} \right) ^T=\boldsymbol{x}^T\boldsymbol{A}^T\boldsymbol{\lambda }
$$
Applying the result of Example 4.3 to the constant vector $\boldsymbol{A}^T\boldsymbol{\lambda }$ then gives:
$$
\frac{\partial \left( \boldsymbol{\lambda }^T\boldsymbol{Ax} \right)}{\partial \boldsymbol{x}}=\frac{\partial \left( \boldsymbol{x}^T\boldsymbol{A}^T\boldsymbol{\lambda } \right)}{\partial \boldsymbol{x}}=\boldsymbol{A}^T\boldsymbol{\lambda }
$$
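A small numerical instance, with values assumed purely for illustration, makes the transpose in this result concrete. Take $n=2$, $\boldsymbol{\lambda }=\left[ 1,2 \right] ^T$ and $\boldsymbol{A}=\left[ \begin{matrix} 1& 2\\ 3& 4\\ \end{matrix} \right]$. Then:
$$
\boldsymbol{\lambda }^T\boldsymbol{Ax}=\left[ \begin{matrix} 7& 10\\ \end{matrix} \right] \left[ \begin{array}{c} x_1\\ x_2\\ \end{array} \right] =7x_1+10x_2,\qquad \frac{\partial \left( \boldsymbol{\lambda }^T\boldsymbol{Ax} \right)}{\partial \boldsymbol{x}}=\left[ \begin{array}{c} 7\\ 10\\ \end{array} \right] =\left[ \begin{matrix} 1& 3\\ 2& 4\\ \end{matrix} \right] \left[ \begin{array}{c} 1\\ 2\\ \end{array} \right] =\boldsymbol{A}^T\boldsymbol{\lambda }
$$
By contrast, $\boldsymbol{A\lambda }=\left[ 5,11 \right] ^T$, which is not the gradient.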
5 Vector-by-matrix derivatives
5.1 Definitions
Let the matrix
$$
\boldsymbol{X}_{m\times n}=\left[ \begin{matrix} x_{11}& x_{12}& \cdots& x_{1n}\\ x_{21}& x_{22}& \cdots& x_{2n}\\ \vdots& \vdots& \ddots& \vdots\\ x_{m1}& x_{m2}& \cdots& x_{mn}\\ \end{matrix} \right]
$$
be the independent variable of an $n$-dimensional column-vector function:
$$
\boldsymbol{z}\left( \boldsymbol{X} \right) =\left[ \begin{matrix} z_1\left( \boldsymbol{X} \right)& z_2\left( \boldsymbol{X} \right)& \cdots& z_n\left( \boldsymbol{X} \right)\\ \end{matrix} \right] ^T
$$
In the numerator layout:
$$
\frac{\partial \boldsymbol{z}\left( \boldsymbol{X} \right)}{\partial \boldsymbol{X}}=\left[ \begin{matrix} \frac{\partial z_1\left( \boldsymbol{X} \right)}{\partial \boldsymbol{X}}& \frac{\partial z_2\left( \boldsymbol{X} \right)}{\partial \boldsymbol{X}}& \cdots& \frac{\partial z_n\left( \boldsymbol{X} \right)}{\partial \boldsymbol{X}}\\ \end{matrix} \right] ^T
$$
where:
$$
\frac{\partial z_i\left( \boldsymbol{X} \right)}{\partial \boldsymbol{X}}=\left[ \begin{matrix} \frac{\partial z_i\left( \boldsymbol{X} \right)}{\partial x_{11}}& \frac{\partial z_i\left( \boldsymbol{X} \right)}{\partial x_{12}}& \cdots& \frac{\partial z_i\left( \boldsymbol{X} \right)}{\partial x_{1n}}\\ \frac{\partial z_i\left( \boldsymbol{X} \right)}{\partial x_{21}}& \frac{\partial z_i\left( \boldsymbol{X} \right)}{\partial x_{22}}& \cdots& \frac{\partial z_i\left( \boldsymbol{X} \right)}{\partial x_{2n}}\\ \vdots& \vdots& \ddots& \vdots\\ \frac{\partial z_i\left( \boldsymbol{X} \right)}{\partial x_{m1}}& \frac{\partial z_i\left( \boldsymbol{X} \right)}{\partial x_{m2}}& \cdots& \frac{\partial z_i\left( \boldsymbol{X} \right)}{\partial x_{mn}}\\ \end{matrix} \right]
$$
In the denominator layout:
$$
\frac{\partial \boldsymbol{z}^T\left( \boldsymbol{X} \right)}{\partial \boldsymbol{X}}=\left[ \begin{matrix} \frac{\partial z_1\left( \boldsymbol{X} \right)}{\partial \boldsymbol{X}}& \frac{\partial z_2\left( \boldsymbol{X} \right)}{\partial \boldsymbol{X}}& \cdots& \frac{\partial z_n\left( \boldsymbol{X} \right)}{\partial \boldsymbol{X}}\\ \end{matrix} \right]
$$
where each block is the same $m\times n$ matrix as in the numerator layout:
$$
\frac{\partial z_i\left( \boldsymbol{X} \right)}{\partial \boldsymbol{X}}=\left[ \begin{matrix} \frac{\partial z_i\left( \boldsymbol{X} \right)}{\partial x_{11}}& \frac{\partial z_i\left( \boldsymbol{X} \right)}{\partial x_{12}}& \cdots& \frac{\partial z_i\left( \boldsymbol{X} \right)}{\partial x_{1n}}\\ \frac{\partial z_i\left( \boldsymbol{X} \right)}{\partial x_{21}}& \frac{\partial z_i\left( \boldsymbol{X} \right)}{\partial x_{22}}& \cdots& \frac{\partial z_i\left( \boldsymbol{X} \right)}{\partial x_{2n}}\\ \vdots& \vdots& \ddots& \vdots\\ \frac{\partial z_i\left( \boldsymbol{X} \right)}{\partial x_{m1}}& \frac{\partial z_i\left( \boldsymbol{X} \right)}{\partial x_{m2}}& \cdots& \frac{\partial z_i\left( \boldsymbol{X} \right)}{\partial x_{mn}}\\ \end{matrix} \right]
$$
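5.2 Shape rule
Differentiating a vector $\boldsymbol{y}$ with respect to a matrix $\boldsymbol{X}$ takes two steps:
Step 1: Each element of $\boldsymbol{y}$ is a scalar, so first differentiate each element of $\boldsymbol{y}$ with respect to the matrix $\boldsymbol{X}$, following the rules for scalar-by-matrix derivatives.
Step 2: Once Step 1 is complete, arrange the results according to the shape of $\boldsymbol{y}$.
See reference [1] for details.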
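The two steps can be sketched in plain Python. This is an illustration under stated assumptions rather than a method from the original text: the helper `dz_dX`, the constant vector `a`, and the test function $\boldsymbol{z}\left( \boldsymbol{X} \right) =\boldsymbol{X}^T\boldsymbol{a}$ are hypothetical. Step 1 computes one $m\times n$ scalar-by-matrix derivative per component $z_i$ by central differences; Step 2 keeps those blocks in a list ordered like the components of $\boldsymbol{z}$.

```python
def dz_dX(z, X, h=1e-6):
    """Return one m x n block per component z_i; block i, entry (k, l) approximates dz_i/dx_kl."""
    m, n = len(X), len(X[0])
    n_out = len(z(X))
    blocks = [[[0.0] * n for _ in range(m)] for _ in range(n_out)]
    for k in range(m):
        for l in range(n):
            X_plus = [row[:] for row in X]
            X_minus = [row[:] for row in X]
            X_plus[k][l] += h
            X_minus[k][l] -= h
            z_plus, z_minus = z(X_plus), z(X_minus)
            for i in range(n_out):
                # Step 1: scalar-by-matrix derivative of z_i, entry by entry
                blocks[i][k][l] = (z_plus[i] - z_minus[i]) / (2 * h)
    # Step 2: the blocks are already ordered like the components of z
    return blocks


a = [1.0, 2.0]  # hypothetical constant vector of length m = 2


def z(X):
    """z(X) = X^T a, an n-dimensional vector with z_i = sum_k a_k * x_ki."""
    return [sum(a[k] * X[k][i] for k in range(len(X))) for i in range(len(X[0]))]


X0 = [[0.5, -1.0, 2.0],
      [1.5, 0.0, -0.5]]   # m x n = 2 x 3

D = dz_dX(z, X0)
print(len(D), len(D[0]), len(D[0][0]))  # 3 2 3
```

For this choice of `z`, block `i` should have the vector `a` in column `i` and zeros elsewhere. Whether the blocks are then laid out as a block column (numerator layout) or a block row (denominator layout) is only a matter of arrangement.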
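References
[1] Derivative of a vector with respect to a matrix (向量对矩阵求导)
[2] Derivatives of vectors and scalars with respect to a vector (向量,标量对向量求导数)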