0. Reference links
1. Scalar functions and vector functions
$$
\begin{cases} y_1=w_1x_{11}+w_2x_{12}+\cdots+w_nx_{1n} \\ \vdots \\ y_m=w_1x_{m1}+w_2x_{m2}+\cdots+w_nx_{mn} \end{cases}
$$
1.1 Scalar functions:
1. A function whose output is a scalar is a scalar function:
Input and output are both scalars:
$$
f(x)=x^2,\quad \mathbb{R}_x \rightarrow \mathbb{R}_{x^2}
$$
Input is not a scalar, output is a scalar:
$$
f(x)=x_1^2+x_2^2,\quad \mathbb{R}^2_{(x_1,x_2)^T} \rightarrow \mathbb{R}_{x_1^2+x_2^2}
$$
2. A function whose output is a vector is a vector function:
Input is a scalar, output is a vector:
$$
f(x)=\left[ \begin{matrix}f_1(x)=x \\ f_2(x)=x^2\end{matrix} \right],\quad \mathbb{R}_x \rightarrow \mathbb{R}^2_{(x,x^2)^T}
$$
The output can also be arranged as a matrix, with either a scalar or a vector as input:
$$
f(x)=\left[ \begin{matrix}f_{11}(x)=x & f_{12}(x)=x^2\\f_{21}(x)=x^3 & f_{22}(x)=x^4\end{matrix} \right],\quad \mathbb{R}_x \rightarrow \mathbb{R}^{2\times 2}
$$
$$
f(x)=\left[ \begin{matrix}f_{11}(x)=x_1+x_2 & f_{12}(x)=x_1^2+x_2^2\\f_{21}(x)=x_1^3+x_2^3 & f_{22}(x)=x_1^4+x_2^4\end{matrix} \right],\quad \mathbb{R}^2_{(x_1,x_2)^T} \rightarrow \mathbb{R}^{2\times 2}
$$
1.2 Summary
$$
\begin{aligned}
x &\rightarrow \text{scalar, vector, matrix}\\
f(x) &\rightarrow \text{scalar, vector, matrix}
\end{aligned}
$$
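Considering only scalars and vectors, $\frac{df(x)}{dx}$ has four possible cases: scalar by scalar, scalar by vector, vector by scalar, and vector by vector.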
The essence of matrix differentiation: $\frac{dA}{dB}$ is the derivative of every element of matrix $A$ with respect to every element of matrix $B$.
1.3 Counting the elements of the derivative
The subscripts on $A$ and $B$ are their shapes; the subscript on the derivative is its total number of elements.
$$
A_{1\times 1},\ B_{1\times 1} \rightarrow \left(\frac{dA}{dB}\right)_{1\cdot 1},
$$
$$
A_{1\times p},\ B_{1\times n} \rightarrow \left(\frac{dA}{dB}\right)_{p\cdot n},
$$
$$
A_{q\times p},\ B_{m\times n} \rightarrow \left(\frac{dA}{dB}\right)_{p\cdot q\cdot m\cdot n}
$$
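The counting rule above can be sketched in a few lines of Python. The function name `derivative_element_count` is a hypothetical helper introduced only for this illustration; it simply multiplies the number of elements of $A$ by the number of elements of $B$.

```python
from math import prod


def derivative_element_count(shape_a, shape_b):
    """Number of scalar partial derivatives in dA/dB.

    Every element of A is differentiated with respect to every element
    of B, so the count is (elements of A) * (elements of B).
    """
    return prod(shape_a) * prod(shape_b)


print(derivative_element_count((1, 1), (1, 1)))  # 1
print(derivative_element_count((1, 3), (1, 4)))  # 12, i.e. p*n with p=3, n=4
print(derivative_element_count((2, 3), (4, 5)))  # 120, i.e. q*p*m*n
```

Only the count is fixed by this rule; how those elements are arranged into rows and columns is what the stretching method in the next section decides.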
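2. Differentiation method (the YX stretching method) (key point)
$$
\begin{cases} \text{1. Scalars stay unchanged; vectors are stretched.} \\ \text{2. The front one (Y) is stretched horizontally; the back one (X) is stretched vertically.} \end{cases}
$$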
Here are three examples:
Example 1: $\frac{df(x)}{dx}$, where $f(x)=f(x_1,\cdots,x_n)$ is a scalar function and $x=[x_1,x_2,\cdots,x_n]^T$ is a vector.
$$
\frac{df(x)}{dx}=\left[ \begin{matrix}\frac{\partial f(x)}{\partial x_1} \\ \frac{\partial f(x)}{\partial x_2}\\ \vdots\\\frac{\partial f(x)}{\partial x_n}\end{matrix} \right]
$$
The result shows that $f(x)$, being a scalar, stays unchanged, while $x$, being a vector, is stretched vertically. In effect, the partial derivatives of a multivariate function are written into one column vector.
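Example 1 can be checked numerically. The sketch below is an illustration, not part of the original derivation: `numerical_gradient` is a hypothetical helper that estimates each partial derivative with a central difference, and the test function is $f(x)=x_1^2+x_2^2$ from section 1.1. The returned list should be read as the column vector of partial derivatives.

```python
def numerical_gradient(f, x, h=1e-6):
    """Central-difference estimate of df/dx for a scalar function f.

    Entry i approximates the partial derivative of f with respect to x_i.
    """
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


def f(x):
    # Scalar function of a vector: f(x) = x_1^2 + x_2^2
    return x[0] ** 2 + x[1] ** 2


grad = numerical_gradient(f, [1.0, 2.0])
print(grad)  # approximately [2.0, 4.0], i.e. [2*x_1, 2*x_2]
```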
Example 2: $\frac{df(x)}{dx}$, where $f(x)=\left[ \begin{matrix}f_1(x)\\ f_2(x)\\\vdots\\f_n(x)\end{matrix} \right]$ is a vector function and $x$ is a scalar.
$$
\frac{df(x)}{dx}=\left[ \begin{matrix}\frac{\partial f_1(x)}{\partial x} &\frac{\partial f_2(x)}{\partial x}& \cdots&\frac{\partial f_n(x)}{\partial x}\end{matrix} \right]
$$
This matches the rule that the front one is stretched horizontally.
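As a concrete instance of Example 2, take the scalar-to-vector function from section 1.1, $f(x)=[x,\ x^2]^T$. Differentiating each component with respect to the scalar $x$ and laying the results out horizontally gives:

$$
\frac{df(x)}{dx}=\left[ \begin{matrix}\frac{\partial x}{\partial x} & \frac{\partial x^2}{\partial x}\end{matrix} \right]=\left[ \begin{matrix}1 & 2x\end{matrix} \right]
$$

At $x=3$, for instance, this is the row vector $[1,\ 6]$.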
Example 3: $\frac{df(x)}{dx}$, where $f(x)=\left[ \begin{matrix}f_1(x)\\ f_2(x)\\\vdots\\f_n(x)\end{matrix} \right]$ is a vector function and $x=[x_1,x_2,\cdots,x_n]^T$ is a vector.
$$
\frac{df(x)}{dx} =\left[ \begin{matrix} \frac{\partial f(x)}{\partial x_1} \\ \frac{\partial f(x)}{\partial x_2}\\ \vdots\\ \frac{\partial f(x)}{\partial x_n} \end{matrix} \right]
$$
Here $\partial f(x)$ is a vector, while $x_1$ and the other components are scalars, so
$$
\frac{df(x)}{dx} =\left[\begin{matrix} \frac{\partial f(x)}{\partial x_1} \\ \frac{\partial f(x)}{\partial x_2}\\ \vdots\\ \frac{\partial f(x)}{\partial x_n} \end{matrix} \right] =\left[ \begin{matrix} \frac{\partial f_1(x)}{\partial x_1} & \frac{\partial f_2(x)}{\partial x_1} & \cdots &\frac{\partial f_n(x)}{\partial x_1} \\ \frac{\partial f_1(x)}{\partial x_2}& \frac{\partial f_2(x)}{\partial x_2}&\cdots & \frac{\partial f_n(x)}{\partial x_2}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial f_1(x)}{\partial x_n} & \frac{\partial f_2(x)}{\partial x_n} & \cdots & \frac{\partial f_n(x)}{\partial x_n}\end{matrix} \right]
$$
This matches stretching vertically first (along $x$) and then horizontally (along $f$).
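The arrangement in Example 3 can be sketched numerically. In the code below, `denominator_layout_derivative` is a hypothetical helper written for this illustration; it places the derivative of $f_j$ with respect to $x_i$ in row $i$, column $j$, as in the matrix above. The test function reuses two of the entries from section 1.1, $f_1=x_1+x_2$ and $f_2=x_1^2+x_2^2$.

```python
def denominator_layout_derivative(f, x, h=1e-6):
    """Estimate df/dx for a vector-valued f with central differences.

    The result is a list of rows; rows[i][j] approximates d f_j / d x_i,
    so x runs down the rows and f runs across the columns.
    """
    rows = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        f_plus = f(x_plus)
        f_minus = f(x_minus)
        rows.append([(p - m) / (2 * h) for p, m in zip(f_plus, f_minus)])
    return rows


def f(x):
    # f_1 = x_1 + x_2, f_2 = x_1^2 + x_2^2
    return [x[0] + x[1], x[0] ** 2 + x[1] ** 2]


D = denominator_layout_derivative(f, [1.0, 2.0])
for row in D:
    print(row)
# Row 1 (derivatives w.r.t. x_1): approximately [1, 2]
# Row 2 (derivatives w.r.t. x_2): approximately [1, 4]
```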
3. Deriving common matrix derivative formulas
Example 1: $f(x)=A^TX$, $A=\left[\begin{matrix}a_1\\a_2\\\vdots\\a_n\end{matrix} \right]_{n\times 1}$, $X=\left[\begin{matrix}x_1\\x_2\\\vdots\\x_n\end{matrix} \right]_{n\times 1}$; find $\frac{df(x)}{dX}$.
Solution: $f(x)$ is a scalar function and $X$ is a vector, with $f(x)=A^TX=\sum_{i=1}^na_ix_i$.
$$
\frac{df(x)}{dX}=\left[ \begin{matrix}\frac{\partial f(x)}{\partial x_1} \\ \frac{\partial f(x)}{\partial x_2}\\ \vdots\\\frac{\partial f(x)}{\partial x_n}\end{matrix} \right]=\left[ \begin{matrix}a_1 \\ a_2\\ \vdots \\ a_n\end{matrix} \right]=A
$$
Note that $f(x)=A^TX=X^TA$ (the transpose of a scalar is the scalar itself), so $\frac{dA^TX}{dX}=\frac{dX^TA}{dX}=A$.
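The result $\frac{dA^TX}{dX}=A$ is easy to sanity-check numerically. The sketch below is illustrative only: the values of `A` and `x0` are arbitrary assumptions, and `numerical_gradient` is a hypothetical central-difference helper, not a library function.

```python
def numerical_gradient(f, x, h=1e-6):
    """Central-difference estimate of the gradient of a scalar function f."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


A = [1.0, -2.0, 3.0]  # arbitrary coefficients chosen for the check


def f(x):
    # f(x) = A^T X = sum_i a_i * x_i
    return sum(a * xi for a, xi in zip(A, x))


x0 = [0.5, 1.5, -1.0]  # arbitrary evaluation point
grad = numerical_gradient(f, x0)
print(grad)  # approximately [1.0, -2.0, 3.0], i.e. A itself
```

Because $f$ is linear, the gradient does not depend on the evaluation point, which is why any `x0` gives back `A`.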
Example 2: $f(x)=X^TAX$, $X=\left[\begin{matrix}x_1\\x_2\\\vdots\\x_n\end{matrix} \right]_{n\times 1}$, $A=\left[\begin{matrix}a_{11}& a_{12}&\cdots&a_{1n}\\a_{21}& a_{22}&\cdots&a_{2n}\\\vdots&\vdots&\ddots&\vdots\\a_{n1}& a_{n2}&\cdots&a_{nn}\end{matrix} \right]_{n\times n}$; find $\frac{df(x)}{dX}$.
Solution: $f(x)=X^TAX$ is a scalar function.
$$
f(x)= \left[\begin{matrix} x_1&x_2&\cdots&x_n \end{matrix} \right] \cdot \left[\begin{matrix} a_{11}& a_{12}&\cdots&a_{1n}\\ a_{21}& a_{22}&\cdots&a_{2n}\\ \vdots&\vdots&\ddots&\vdots\\ a_{n1}& a_{n2}&\cdots&a_{nn} \end{matrix} \right] \cdot \left[\begin{matrix} x_1\\ x_2\\ \vdots\\ x_n \end{matrix} \right]
$$
$$
f(x)=\sum_{i=1}^n\sum_{j=1}^na_{ij}x_ix_j
$$
The variable $x_k$ appears once as the row index ($i=k$) and once as the column index ($j=k$), so each partial derivative has two sums:
$$
\begin{aligned}\frac{df(x)}{dX}& =\left[ \begin{matrix} \frac{\partial f(x)}{\partial x_1} \\ \frac{\partial f(x)}{\partial x_2}\\ \vdots\\ \frac{\partial f(x)}{\partial x_n} \end{matrix} \right]\\& =\left[ \begin{matrix} \sum_{j=1}^na_{1j}x_j+\sum_{i=1}^na_{i1}x_i \\ \sum_{j=1}^na_{2j}x_j+\sum_{i=1}^na_{i2}x_i\\ \vdots \\ \sum_{j=1}^na_{nj}x_j+\sum_{i=1}^na_{in}x_i \end{matrix} \right]\\& =\left[ \begin{matrix} \sum_{j=1}^na_{1j}x_j\\ \sum_{j=1}^na_{2j}x_j\\ \vdots \\ \sum_{j=1}^na_{nj}x_j \end{matrix} \right] + \left[ \begin{matrix} \sum_{i=1}^na_{i1}x_i \\ \sum_{i=1}^na_{i2}x_i\\ \vdots \\ \sum_{i=1}^na_{in}x_i \end{matrix} \right]\\& =\left[\begin{matrix} a_{11}& a_{12}&\cdots&a_{1n}\\ a_{21}& a_{22}&\cdots&a_{2n}\\ \vdots&\vdots&\ddots&\vdots\\ a_{n1}& a_{n2}&\cdots&a_{nn} \end{matrix} \right] \cdot \left[\begin{matrix} x_1\\ x_2\\ \vdots\\ x_n \end{matrix} \right] + \left[\begin{matrix} a_{11}& a_{12}&\cdots&a_{1n}\\ a_{21}& a_{22}&\cdots&a_{2n}\\ \vdots&\vdots&\ddots&\vdots\\ a_{n1}& a_{n2}&\cdots&a_{nn} \end{matrix} \right]^T \cdot \left[\begin{matrix} x_1\\ x_2\\ \vdots\\ x_n \end{matrix} \right]\\& =AX+A^TX=(A+A^T)X \end{aligned}
$$
Therefore $\frac{dX^TAX}{dX}=(A+A^T)X$. In particular, since $(X^TAX)^T=X^TA^TX$ is the same scalar, $\frac{dX^TA^TX}{dX}=(A+A^T)X$.
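The formula $\frac{dX^TAX}{dX}=(A+A^T)X$ can be compared against a finite-difference estimate. This is a minimal sketch under stated assumptions: the matrix `A` and the point `x0` are arbitrary illustrative values (with `A` deliberately not symmetric), and all three function names are hypothetical helpers.

```python
def quadratic_form(A, x):
    """f(x) = x^T A x, written as the double sum of a_ij * x_i * x_j."""
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def closed_form_gradient(A, x):
    """(A + A^T) x, the formula derived above."""
    n = len(x)
    return [sum((A[k][j] + A[j][k]) * x[j] for j in range(n)) for k in range(n)]


def numerical_gradient(f, x, h=1e-6):
    """Central-difference estimate of the gradient of a scalar function f."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


A = [[1.0, 2.0], [3.0, 4.0]]  # not symmetric, so A + A^T differs from 2A
x0 = [1.0, 2.0]

closed = closed_form_gradient(A, x0)
numeric = numerical_gradient(lambda x: quadratic_form(A, x), x0)
print(closed)   # [12.0, 21.0]
print(numeric)  # approximately the same values
```

Using a non-symmetric `A` matters here: with a symmetric matrix the result collapses to $2AX$, and the check could not distinguish $(A+A^T)X$ from $2AX$.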
4. Matrix derivative layouts
$$
\begin{cases} \text{Denominator layout}\rightarrow \text{YX stretching}\\ \text{Numerator layout} \rightarrow \text{XY stretching} \end{cases}
$$
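Difference: the overall rule does not change: the front one is stretched horizontally and the back one vertically.
YX stretching (denominator layout): Y, that is $f(x)$, is stretched horizontally and X is stretched vertically.
XY stretching (numerator layout): the reverse, X is stretched horizontally and Y vertically.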
Usually, $(\text{denominator layout})^T=(\text{numerator layout})$.
Example: $f(x)=X^TX$, $X=[x_1,x_2,\cdots,x_n]^T$
Denominator layout:
$$
\frac{df(x)}{dX} =\left[ \begin{matrix} \frac{\partial f(x)}{\partial x_1} \\ \frac{\partial f(x)}{\partial x_2}\\ \vdots\\ \frac{\partial f(x)}{\partial x_n} \end{matrix} \right] =\left[ \begin{matrix} 2 x_1 \\ 2 x_2 \\ \vdots\\ 2 x_n \end{matrix} \right] =2X
$$
Numerator layout:
$$
\frac{df(x)}{dX} =\left[ \begin{matrix} \frac{\partial f(x)}{\partial x_1} & \frac{\partial f(x)}{\partial x_2}& \cdots & \frac{\partial f(x)}{\partial x_n} \end{matrix} \right] =\left[ \begin{matrix} 2x_1 & 2x_2 & \cdots & 2x_n \end{matrix} \right] =2X^T
$$
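The transpose relation between the two layouts is also visible for a vector function. As an illustrative sketch (the function is an assumption chosen for this example, built from two entries in section 1.1), let $f(x)=[x_1+x_2,\ x_1^2+x_2^2]^T$ and $X=[x_1,x_2]^T$:

$$
\text{Denominator layout: }\frac{df(x)}{dX}=\left[ \begin{matrix} 1 & 2x_1 \\ 1 & 2x_2 \end{matrix} \right],\qquad
\text{Numerator layout: }\frac{df(x)}{dX}=\left[ \begin{matrix} 1 & 1 \\ 2x_1 & 2x_2 \end{matrix} \right]
$$

In the denominator layout each row corresponds to one component of $X$; in the numerator layout each row corresponds to one component of $f$, which is the arrangement usually called the Jacobian matrix. Each result is the transpose of the other.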
5. Product and sum rules for matrix derivatives
$$
U=\left[ \begin{matrix} u_1(x)\\ u_2(x) \\ \vdots\\ u_n(x) \end{matrix} \right]_{n\times 1},\quad V=\left[ \begin{matrix} v_1(x)\\ v_2(x) \\ \vdots\\ v_n(x) \end{matrix} \right]_{n\times 1},\quad X=\left[ \begin{matrix} x_1\\ x_2 \\ \vdots\\ x_n \end{matrix} \right]_{n\times 1}
$$
Note that $U^TV$ is a scalar.
$$
\frac{dU^TV}{dX} =\frac{\partial U} {\partial X}V +\frac{\partial V} {\partial X}U
$$
$$
\frac{d(U+V)}{dX}=\frac{dU}{dX}+\frac{dV}{dX}
$$
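The product rule can be checked numerically, provided $\frac{\partial U}{\partial X}$ and $\frac{\partial V}{\partial X}$ are taken in the denominator layout (row $i$, column $j$ holds the derivative of the $j$-th component with respect to $x_i$). The sketch below rests on assumptions: `U` and `V` are arbitrary example functions, and the helper names are hypothetical.

```python
def denominator_layout_derivative(f, x, h=1e-6):
    """rows[i][j] approximates d f_j / d x_i (central differences)."""
    rows = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        rows.append([(p - m) / (2 * h) for p, m in zip(f(x_plus), f(x_minus))])
    return rows


def mat_vec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]


def U(x):
    # Arbitrary example: u_1 = x_1 * x_2, u_2 = x_1^2
    return [x[0] * x[1], x[0] ** 2]


def V(x):
    # Arbitrary example: v_1 = x_2, v_2 = x_1 + x_2
    return [x[1], x[0] + x[1]]


def inner(x):
    # The scalar U^T V, wrapped in a list so the same helper can be reused
    return [sum(u * v for u, v in zip(U(x), V(x)))]


x0 = [1.0, 2.0]

# Left-hand side: gradient of the scalar U^T V as a column vector
lhs = [row[0] for row in denominator_layout_derivative(inner, x0)]

# Right-hand side: (dU/dX) V + (dV/dX) U
dU = denominator_layout_derivative(U, x0)
dV = denominator_layout_derivative(V, x0)
rhs = [a + b for a, b in zip(mat_vec(dU, V(x0)), mat_vec(dV, U(x0)))]

print(lhs)  # approximately [11.0, 5.0]
print(rhs)  # approximately [11.0, 5.0]
```

For these example functions $U^TV=x_1x_2^2+x_1^3+x_1^2x_2$, whose gradient at $(1, 2)$ is $[11,\ 5]^T$, matching both sides.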