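The dimensions of a derivative
The essence of differentiation: $\frac{dA}{dB}$ means differentiating every element of $A$ with respect to every element of $B$.
Consider the following cases: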
1. $A$ and $B$ are both $1 \times 1$ matrices: the derivative is a $1 \times 1$ matrix.
2. $A$ is a $1 \times p$ matrix and $B$ is a $1 \times n$ matrix: differentiating each element of $A$ with respect to each element of $B$ gives, over all combinations, $p \times n$ elements.
3. $A$ is a $q \times p$ matrix and $B$ is an $m \times n$ matrix: differentiating each element of $A$ with respect to each element of $B$ gives, over all combinations, $q \times p \times m \times n$ elements.
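As a quick sanity check of the counts above, the following Python sketch enumerates every (element of $A$, element of $B$) index pair. The function name `count_partials` and the example shapes are illustrative assumptions, not part of the original text.

```python
from itertools import product


def count_partials(shape_a, shape_b):
    """Count the partial derivatives dA[i][j] / dB[k][l] by listing every index combination."""
    q, p = shape_a
    m, n = shape_b
    index_pairs = list(product(range(q), range(p), range(m), range(n)))
    return len(index_pairs)


# Case 1: both 1 x 1. Case 2: 1 x p and 1 x n. Case 3: q x p and m x n.
count_scalar = count_partials((1, 1), (1, 1))
count_rows = count_partials((1, 3), (1, 4))      # p = 3, n = 4
count_general = count_partials((2, 3), (4, 5))   # q = 2, p = 3, m = 4, n = 5
print(count_scalar, count_rows, count_general)
```

The counts are simply the products of the dimensions; the tricks in the next section are only about how to lay those elements out.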
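The Y/X stretching trick for vector derivatives
The core of the trick:
1. Scalars stay unchanged; vectors are stretched.
2. The front one is stretched horizontally and the back one vertically (for $\frac{dy}{dx}$, $Y$ is stretched horizontally and $X$ vertically).
The previous section showed that a derivative can contain a great many elements, so we need a systematic way to write them out.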
Example 1:
$f(x)$ is a scalar function and $x$ is an $n \times 1$ column vector $(x_1, x_2, \dots, x_n)$:
$$\frac{d f(x)}{d x}$$
A scalar stays unchanged, so $f(x)$ needs no stretching.
$x$ is stretched vertically, which turns the derivative into a vector:
$$\frac{d f(x)}{d x}=\left[\begin{array}{c} \frac{\partial f(x)}{\partial x_{1}} \\ \frac{\partial f(x)}{\partial x_{2}} \\ \vdots \\ \frac{\partial f(x)}{\partial x_{n}} \end{array}\right]$$
Stretching the vector vertically amounts to arranging the derivatives with respect to each component into a column vector.
The result is an $n \times 1$ column vector.
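The layout in Example 1 can be checked numerically. The sketch below approximates each partial derivative by a central difference and stacks the results into an $n \times 1$ column. The test function `sample_scalar_f` and the helper name `gradient_column` are hypothetical choices made only for illustration.

```python
def gradient_column(f, x, h=1e-6):
    """Central-difference gradient of a scalar function f, stored as an n x 1 column."""
    column = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h
        backward[i] -= h
        # One row per component x_i: this is the "vertical stretch" of x.
        column.append([(f(forward) - f(backward)) / (2 * h)])
    return column


def sample_scalar_f(x):
    # Hypothetical example: f(x) = x1^2 + 3*x2 + x1*x3
    return x[0] ** 2 + 3 * x[1] + x[0] * x[2]


grad = gradient_column(sample_scalar_f, [1.0, 2.0, 3.0])
# Analytic gradient at (1, 2, 3): (2*x1 + x3, 3, x1) = (5, 3, 1)
print(grad)
```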
Example 2:
$f(x)$ is an $n \times 1$ column-vector function and $x$ is a scalar.
By the same reasoning, the front $Y$ is stretched horizontally, and $x$, being a scalar, is not stretched:
$$\frac{d f(x)}{d x}=\left[\begin{array}{cccc} \frac{\partial f_{1}(x)}{\partial x} & \frac{\partial f_{2}(x)}{\partial x} & \cdots & \frac{\partial f_{n}(x)}{\partial x} \end{array}\right]$$
The result is a horizontal $1 \times n$ row vector.
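A small numerical sketch of Example 2: differentiating a vector-valued function of one scalar and laying the result out as a single $1 \times n$ row. The function `sample_vector_f` and the helper name `derivative_row` are assumptions introduced for this illustration.

```python
def derivative_row(f, x, h=1e-6):
    """Central-difference derivative of a vector-valued f at scalar x, as one 1 x n row."""
    plus = f(x + h)
    minus = f(x - h)
    # One column per component f_j: this is the "horizontal stretch" of Y.
    return [[(a - b) / (2 * h) for a, b in zip(plus, minus)]]


def sample_vector_f(x):
    # Hypothetical example: f(x) = (x, x^2, x^3)
    return [x, x ** 2, x ** 3]


row = derivative_row(sample_vector_f, 2.0)
# Analytic derivative at x = 2: (1, 2x, 3x^2) = (1, 4, 12)
print(row)
```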
Example 3:
Both are vectors:
$$f(x)=\left[\begin{array}{c} f_{1}(x) \\ f_{2}(x) \\ \vdots \\ f_{n}(x) \end{array}\right] \quad x=\left[\begin{array}{c} x_{1} \\ x_{2} \\ \vdots \\ x_{n} \end{array}\right]$$
First stretch $x$ vertically into a column, so that each entry is $f(x)$ differentiated with respect to a scalar; then stretch each entry horizontally.
Step 1:
$$\frac{d f(x)}{d x}=\left[\begin{array}{c} \frac{\partial f(x)}{\partial x_{1}} \\ \frac{\partial f(x)}{\partial x_{2}} \\ \vdots \\ \frac{\partial f(x)}{\partial x_{n}} \end{array}\right]$$
Step 2 gives:
$$\left[\begin{array}{cccc} \frac{\partial f_{1}(x)}{\partial x_{1}} & \frac{\partial f_{2}(x)}{\partial x_{1}} & \cdots & \frac{\partial f_{n}(x)}{\partial x_{1}} \\ \frac{\partial f_{1}(x)}{\partial x_{2}} & \frac{\partial f_{2}(x)}{\partial x_{2}} & \cdots & \frac{\partial f_{n}(x)}{\partial x_{2}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_{1}(x)}{\partial x_{n}} & \frac{\partial f_{2}(x)}{\partial x_{n}} & \cdots & \frac{\partial f_{n}(x)}{\partial x_{n}} \end{array}\right]$$
This gives $n \times n$ elements, consistent with the principle in the first section.
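The two-step stretch of Example 3 can be sketched in code: the outer loop over $x_i$ builds the rows (vertical stretch), and each row holds the derivatives of all $f_j$ (horizontal stretch). Note that this layout, with rows indexed by $x_i$, is the transpose of the Jacobian convention $J_{ij} = \partial f_i / \partial x_j$. The names `stretched_jacobian` and `sample_f` are hypothetical.

```python
def stretched_jacobian(f, x, h=1e-6):
    """Matrix whose entry (i, j) is d f_j / d x_i, approximated by central differences."""
    rows = []
    for i in range(len(x)):              # step 1: one row per x_i
        forward = list(x)
        backward = list(x)
        forward[i] += h
        backward[i] -= h
        f_plus = f(forward)
        f_minus = f(backward)
        # step 2: within the row, one column per f_j
        rows.append([(a - b) / (2 * h) for a, b in zip(f_plus, f_minus)])
    return rows


def sample_f(x):
    # Hypothetical example: f(x) = (x1 * x2, x1 + x2)
    return [x[0] * x[1], x[0] + x[1]]


jac = stretched_jacobian(sample_f, [2.0, 3.0])
# Row for x1: (d f1/d x1, d f2/d x1) = (x2, 1) = (3, 1)
# Row for x2: (d f1/d x2, d f2/d x2) = (x1, 1) = (2, 1)
print(jac)
```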
Deriving matrix derivative formulas
The simplest $f(x)$:
$$f(x)=A^{T} x, \quad A=\left[\begin{array}{c} a_{1} \\ a_{2} \\ \vdots \\ a_{n} \end{array}\right]_{n \times 1} \quad x=\left[\begin{array}{c} x_{1} \\ x_{2} \\ \vdots \\ x_{n} \end{array}\right]$$
Here $f(x)$ is a scalar ($\sum_{i=1}^{n} a_{i} x_{i}$), so by the second section:
$$\frac{d f(x)}{d x}=\left[\begin{array}{c} \frac{\partial f(x)}{\partial x_{1}} \\ \frac{\partial f(x)}{\partial x_{2}} \\ \vdots \\ \frac{\partial f(x)}{\partial x_{n}} \end{array}\right]=\left[\begin{array}{c} a_{1} \\ a_{2} \\ \vdots \\ a_{n} \end{array}\right]=A$$
Since $f(x)=A^{\top} x=x^{\top} A$, this gives the frequently used identity $\frac{d (A^{\top} x)}{d x}=\frac{d (x^{\top} A)}{d x}=A$.
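The identity $\frac{d(A^{\top} x)}{dx} = A$ can be confirmed numerically. This is only a sketch: the values of `a` and `x0` are arbitrary, and the helper `gradient` is an assumed finite-difference routine, not something from the original post.

```python
def gradient(f, x, h=1e-6):
    """Central-difference gradient of a scalar function f at x, as a flat list."""
    result = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h
        backward[i] -= h
        result.append((f(forward) - f(backward)) / (2 * h))
    return result


def dot(a, x):
    """Inner product a^T x, i.e. the sum of a_i * x_i."""
    return sum(ai * xi for ai, xi in zip(a, x))


a = [2.0, -1.0, 0.5]     # arbitrary coefficients playing the role of A
x0 = [1.0, 2.0, 3.0]     # arbitrary evaluation point
grad_linear = gradient(lambda x: dot(a, x), x0)
print(grad_linear)       # each entry should be close to the matching a_i
```

Because $f$ is linear, the gradient does not depend on `x0`; any other point gives the same result.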
Example 2:
$$x=\left[\begin{array}{c} x_{1} \\ x_{2} \\ \vdots \\ x_{n} \end{array}\right] \quad A=\left[\begin{array}{cccc} a_{11} & a_{12} & \cdots & a_{1 n} \\ a_{21} & a_{22} & \cdots & a_{2 n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n 1} & a_{n 2} & \cdots & a_{n n} \end{array}\right]$$
$f(x)=x^{T} A x$; find $\frac{d f(x)}{d x}$.
First, stretch:
$$\frac{d f(x)}{d x}=\left[\begin{array}{c} \frac{\partial f(x)}{\partial x_{1}} \\ \frac{\partial f(x)}{\partial x_{2}} \\ \vdots \\ \frac{\partial f(x)}{\partial x_{n}} \end{array}\right]$$
Expanding the original expression:
$$f(x)=\sum_{i=1}^{n} \sum_{j=1}^{n} a_{i j} x_{i} x_{j}$$
Linear algebra looks complicated, but it is not that hard. The variable $x_{k}$ appears in the terms with $i=k$ and in the terms with $j=k$, so $\frac{\partial f(x)}{\partial x_{k}}=\sum_{j=1}^{n} a_{k j} x_{j}+\sum_{i=1}^{n} a_{i k} x_{i}$. Substituting this into each row gives:
$$\left[\begin{array}{c} \sum_{j=1}^{n} a_{1 j} x_{j}+\sum_{i=1}^{n} a_{i 1} x_{i} \\ \vdots \\ \sum_{j=1}^{n} a_{n j} x_{j}+\sum_{i=1}^{n} a_{i n} x_{i} \end{array}\right]$$
That is:
$$\left[\begin{array}{cccc} a_{11} & a_{12} & \cdots & a_{1 n} \\ a_{21} & a_{22} & \cdots & a_{2 n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n 1} & a_{n 2} & \cdots & a_{n n} \end{array}\right]\left[\begin{array}{c} x_{1} \\ x_{2} \\ \vdots \\ x_{n} \end{array}\right]+\left[\begin{array}{cccc} a_{11} & a_{21} & \cdots & a_{n 1} \\ a_{12} & a_{22} & \cdots & a_{n 2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1 n} & a_{2 n} & \cdots & a_{n n} \end{array}\right]\left[\begin{array}{c} x_{1} \\ x_{2} \\ \vdots \\ x_{n} \end{array}\right]=A x+A^{T} x$$
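To check the result $\frac{d(x^{T} A x)}{dx} = Ax + A^{T}x$ numerically, the sketch below compares a finite-difference gradient with the closed form. The matrix `A` is deliberately non-symmetric so that $Ax + A^{T}x$ differs from $2Ax$. All names and values here are illustrative assumptions.

```python
def quadratic_form(A, x):
    """f(x) = x^T A x, written as the double sum of a_ij * x_i * x_j."""
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def closed_form_gradient(A, x):
    """Entry k of A x + A^T x: sum_j a_kj x_j + sum_i a_ik x_i."""
    n = len(x)
    return [
        sum(A[k][j] * x[j] for j in range(n)) + sum(A[i][k] * x[i] for i in range(n))
        for k in range(n)
    ]


def gradient(f, x, h=1e-6):
    """Central-difference gradient of a scalar function f at x."""
    result = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h
        backward[i] -= h
        result.append((f(forward) - f(backward)) / (2 * h))
    return result


A = [[1.0, 2.0], [3.0, 4.0]]   # non-symmetric on purpose
x0 = [1.0, -1.0]
numeric = gradient(lambda x: quadratic_form(A, x), x0)
analytic = closed_form_gradient(A, x0)
print(numeric, analytic)
```

When $A$ is symmetric the formula reduces to $2Ax$, which is the form that usually appears in least-squares derivations.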
Application: differentiating least squares
(Hand derivation by the author.)
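The author's handwritten derivation is not reproduced in the text. As a stand-in, here is a sketch of the standard derivation using only the two formulas obtained above. The symbols $X$ (data matrix), $w$ (parameter vector), $y$ (target vector) and $L$ (loss) are assumptions chosen for this sketch and may differ from the author's notation.

```latex
\begin{aligned}
L(w) &= (Xw - y)^{T}(Xw - y) = w^{T} X^{T} X w - 2\,(X^{T} y)^{T} w + y^{T} y \\
\frac{d L}{d w} &= \left(X^{T} X + (X^{T} X)^{T}\right) w - 2 X^{T} y = 2 X^{T} X w - 2 X^{T} y \\
\frac{d L}{d w} = 0 \;&\Longrightarrow\; w = (X^{T} X)^{-1} X^{T} y \quad \text{(assuming } X^{T} X \text{ is invertible)}
\end{aligned}
```

The quadratic term uses $\frac{d(x^{T} A x)}{dx} = Ax + A^{T}x$ with $A = X^{T}X$, which is symmetric, and the linear term uses $\frac{d(A^{\top} x)}{dx} = A$ with $A = X^{T}y$.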
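Application 2: derivatives involving a matrix
Correction: in the figure above, the (1,1) entry of the last vector should be 2xq.
Correction: in the figure above, the (1,1,1) entry of the last matrix should be 2xq.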
Appendix
Common formulas
$$\frac{d (V^{\top} U)}{d x}=\frac{\partial U}{\partial x} V+\frac{\partial V}{\partial x} U$$
$$\frac{d(U+V)}{d x}=\frac{d U}{d x}+\frac{d V}{d x}$$
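The product rule above can be sanity-checked with finite differences, using the layout from the stretching section, where entry $(i, j)$ of $\frac{\partial U}{\partial x}$ is $\partial U_j / \partial x_i$. The functions `U` and `V` below are hypothetical examples, and the helper names are assumptions made for this sketch.

```python
def stretched_jacobian(f, x, h=1e-6):
    """Matrix whose entry (i, j) is d f_j / d x_i, approximated by central differences."""
    rows = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h
        backward[i] -= h
        rows.append([(a - b) / (2 * h) for a, b in zip(f(forward), f(backward))])
    return rows


def mat_vec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]


def U(x):
    # Hypothetical example: U(x) = (x1 * x2, x2)
    return [x[0] * x[1], x[1]]


def V(x):
    # Hypothetical example: V(x) = (x1, x1 + x2)
    return [x[0], x[0] + x[1]]


def inner_product(x):
    # V^T U wrapped in a one-element list so stretched_jacobian can be reused.
    return [sum(v * u for v, u in zip(V(x), U(x)))]


x0 = [1.0, 2.0]
# Left-hand side: gradient of the scalar V^T U, one entry per x_i.
lhs = [row[0] for row in stretched_jacobian(inner_product, x0)]
# Right-hand side: (dU/dx) V + (dV/dx) U.
rhs = [
    a + b
    for a, b in zip(
        mat_vec(stretched_jacobian(U, x0), V(x0)),
        mat_vec(stretched_jacobian(V, x0), U(x0)),
    )
]
print(lhs, rhs)
```

For this choice of `U` and `V`, $V^{\top}U = x_1^2 x_2 + x_1 x_2 + x_2^2$, so both sides should be close to $(6, 6)$ at $x = (1, 2)$.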