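I. Why ML needs matrix derivatives
1. Reasons
- Vectorization: ML computations are expressed over whole vectors and matrices rather than individual scalars
- Derivatives are used widely in optimization algorithms
2. Advantages
- Concise notation
- Faster computation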
II. A first look at vector functions and matrix functions
1. Scalar functions
- Scalar function: a function whose output is a scalar
- Examples:
  - $f(x)=x^2$, $\mathbf{R} \rightarrow \mathbf{R}: x \mapsto x^2$
  - $f(x)=x_1^2+x_2^2$, $\mathbf{R}^2 \rightarrow \mathbf{R}: \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \mapsto x_1^2+x_2^2$
2. Vector functions
- Vector function: a function whose output is a vector (or a matrix)
- Examples:
  - $f(x)=\begin{bmatrix} f_1(x)=x \\ f_2(x)=x^2\end{bmatrix}$, $\mathbf{R} \rightarrow \mathbf{R}^2: x \mapsto \begin{bmatrix} x \\ x^2\end{bmatrix}$
  - $f(x)=\begin{bmatrix} f_1(x)=x & f_2(x)=x^2\\ f_3(x)=x^3 & f_4(x)=x^4\end{bmatrix}$, $\mathbf{R} \rightarrow \mathbf{R}^{2\times2}: x \mapsto \begin{bmatrix} x & x^2\\ x^3 & x^4\end{bmatrix}$
  - $f(x)=\begin{bmatrix} f_1(x)=x_1+x_2 & f_2(x)=x_1^2+x_2^2\\ f_3(x)=x_1^3+x_2^3 & f_4(x)=x_1^4+x_2^4\end{bmatrix}$, $\mathbf{R}^2 \rightarrow \mathbf{R}^{2\times2}: \begin{bmatrix} x_1 \\ x_2\end{bmatrix} \mapsto \begin{bmatrix} x_1+x_2 & x_1^2+x_2^2\\ x_1^3+x_2^3 & x_1^4+x_2^4\end{bmatrix}$
- Summary: the input $x$ can be a scalar, a vector, or a matrix; the output $f(x)$ can likewise be a scalar, a vector, or a matrix
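3. The essence of matrix differentiation
- $\frac{\mathrm{d}\mathbf{A}}{\mathrm{d}\mathbf{B}}$: differentiate every element of matrix $\mathbf{A}$ with respect to every element of matrix $\mathbf{B}$
- The result therefore contains as many elements as the product of the element counts of $\mathbf{A}$ and $\mathbf{B}$; for example, if $\mathbf{A}$ is $2\times2$ and $\mathbf{B}$ is $2\times1$, there are $4\times2=8$ partial derivatives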
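The element-count rule can be checked numerically. The sketch below is only an illustration, not part of the original notes: it takes the $\mathbf{R}^2 \rightarrow \mathbf{R}^{2\times2}$ example from section II, flattens the output row by row, and approximates every partial derivative with central differences. The names `numerical_partials` and `matrix_function`, and the step size `eps`, are hypothetical choices made for this example.

```python
def numerical_partials(f, b, eps=1e-6):
    """Approximate d f(b)[i] / d b[j] for every pair (i, j).

    f maps a flat list of floats to a flat list of floats.
    Returns a dict keyed by (i, j). Hypothetical helper for illustration.
    """
    n_out = len(f(b))
    partials = {}
    for j in range(len(b)):
        plus = list(b)
        plus[j] += eps
        minus = list(b)
        minus[j] -= eps
        f_plus, f_minus = f(plus), f(minus)
        for i in range(n_out):
            # Central difference for one (output element, input element) pair
            partials[(i, j)] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return partials


def matrix_function(b):
    """The 2x2 matrix-valued example from section II, flattened row by row."""
    x1, x2 = b
    return [x1 + x2, x1**2 + x2**2, x1**3 + x2**3, x1**4 + x2**4]


partials = numerical_partials(matrix_function, [1.0, 2.0])
num_partials = len(partials)  # 4 output elements times 2 input elements
print(num_partials)
```

Each key `(i, j)` pairs one element of the output with one element of the input, so the dictionary size is the product of the two element counts.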
III. A differentiation trick: YX stretching
- Mnemonic: 1) scalars stay unchanged, vectors are stretched; 2) the former ($Y$, the function being differentiated) is stretched horizontally, the latter ($X$, the variable) is stretched vertically
- Examples:
  - $f(x)=f(x_1,x_2,\dots,x_n)$, where $x=(x_1,x_2,\dots,x_n)^{\mathrm{T}}$ is a vector and is stretched vertically; $f(x)$, the $Y$, is a scalar and stays unchanged:
    $$\frac{\mathrm{d}f}{\mathrm{d}x}=\left(\frac{\partial f}{\partial x_1},\frac{\partial f}{\partial x_2},\dots,\frac{\partial f}{\partial x_n}\right)^{\mathrm{T}}$$
  - $f(x)=(f_1(x),f_2(x),\dots,f_n(x))^{\mathrm{T}}$ is a vector and is stretched horizontally; $x$ is a scalar and stays unchanged:
    $$\frac{\mathrm{d}f}{\mathrm{d}x}=\left(\frac{\mathrm{d}f_1(x)}{\mathrm{d}x},\frac{\mathrm{d}f_2(x)}{\mathrm{d}x},\dots,\frac{\mathrm{d}f_n(x)}{\mathrm{d}x}\right)$$
  - $f(x)=(f_1(x),f_2(x),\dots,f_n(x))^{\mathrm{T}}$ and $x=(x_1,x_2,\dots,x_n)^{\mathrm{T}}$ are both vectors. Stretch $x$ vertically first; each entry $\frac{\partial f}{\partial x_i}$ is then the derivative of a vector with respect to a scalar, so it is stretched horizontally into the row $\left(\frac{\partial f_1(x)}{\partial x_i},\dots,\frac{\partial f_n(x)}{\partial x_i}\right)$:
    $$\frac{\mathrm{d}f}{\mathrm{d}x}\xlongequal{\text{stretch } x \text{ first}}\left(\frac{\partial f}{\partial x_1},\frac{\partial f}{\partial x_2},\dots,\frac{\partial f}{\partial x_n}\right)^{\mathrm{T}}\xlongequal{\text{then stretch } Y}\begin{bmatrix}\frac{\partial f_1(x)}{\partial x_1}&\frac{\partial f_2(x)}{\partial x_1}&\cdots&\frac{\partial f_n(x)}{\partial x_1}\\ \frac{\partial f_1(x)}{\partial x_2}&\frac{\partial f_2(x)}{\partial x_2}&\cdots&\frac{\partial f_n(x)}{\partial x_2}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial f_1(x)}{\partial x_n}&\frac{\partial f_2(x)}{\partial x_n}&\cdots&\frac{\partial f_n(x)}{\partial x_n}\end{bmatrix}$$
    Rows are indexed by the components of $x$ and columns by the components of $f$, so this matrix is the transpose of the usual Jacobian (the denominator-layout convention).
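To make the layout concrete, here is a minimal numerical sketch of the YX stretching rule, assuming central differences are accurate enough for illustration. The helper `yx_stretch_derivative` and the test function `f` are hypothetical and do not come from the original notes. The example deliberately uses 2 inputs and 3 outputs so that the row and column roles are easy to tell apart.

```python
def yx_stretch_derivative(f, x, eps=1e-6):
    """Return D with D[i][j] approximating d f_j / d x_i.

    x is stretched vertically (one row per x_i) and f horizontally
    (one column per f_j). Hypothetical helper for illustration.
    """
    n = len(x)
    m = len(f(x))
    D = [[0.0] * m for _ in range(n)]
    for i in range(n):
        plus = list(x)
        plus[i] += eps
        minus = list(x)
        minus[i] -= eps
        f_plus, f_minus = f(plus), f(minus)
        for j in range(m):
            # Central difference of output j with respect to input i
            D[i][j] = (f_plus[j] - f_minus[j]) / (2 * eps)
    return D


def f(x):
    """Example vector function with 2 inputs and 3 outputs."""
    x1, x2 = x
    return [x1 * x2, x1 + x2, x1**2]


D = yx_stretch_derivative(f, [2.0, 3.0])
for row in D:
    print([round(v, 4) for v in row])
```

Analytically, the row for $x_1$ is $(x_2, 1, 2x_1)$ and the row for $x_2$ is $(x_1, 1, 0)$, so at $x=(2,3)^{\mathrm{T}}$ the result should be close to a $2\times3$ matrix with rows $(3, 1, 4)$ and $(2, 1, 0)$.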
IV. Common matrix derivatives
- Results: