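References
- Gitee (码云) links cannot be viewed by unregistered users, so all source code that appears below has also been backed up to a Gist (linked from the original post).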
Preliminaries
As with the previous post, following this material requires the background below:
Gradient
- For a function of one variable $f(x)$, the derivative at a point $x_0$ is defined as:

$$f'(x_0)=\lim \limits_{\Delta x \rightarrow 0} \frac{f(x_0+\Delta x)-f(x_0)}{\Delta x}$$

- For a function of several variables $f(x_1,...,x_n)$, the derivative with respect to $x_i$ is written as the partial derivative $\frac{\partial f}{\partial x_i}$. In particular, the gradient $\triangledown f(x)$, or $\triangledown f$ for short, is the column vector of the partial derivatives with respect to each $x_i$:

$$\triangledown f(x)=\left(\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2},..., \frac{\partial f}{\partial x_n}\right)^T$$
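The gradient definition above can be checked numerically. Below is a minimal sketch that approximates each partial derivative with a central difference; the names `numerical_gradient` and `example_f`, the step size `h`, and the test function itself are assumptions made for illustration and do not come from the original post.

```python
def numerical_gradient(f, x, h=1e-6):
    """Approximate the gradient of f: R^n -> R at x with central differences."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h   # perturb only the i-th coordinate
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


def example_f(x):
    # Hypothetical test function: f(x1, x2) = x1^2 + 3*x1*x2
    return x[0] ** 2 + 3 * x[0] * x[1]


grad = numerical_gradient(example_f, [1.0, 2.0])
# The analytic gradient is (2*x1 + 3*x2, 3*x1), i.e. (8, 3) at the point (1, 2).
print(grad)
```

Each entry of the returned list corresponds to one component $\frac{\partial f}{\partial x_i}$ of the column vector $\triangledown f(x)$, so the printed values should be close to 8 and 3.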
Hessian matrix
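- For $f:\R^n \rightarrow \R$, i.e. a mapping from a multidimensional input $x\in\R^n$ to a one-dimensional output, if all of its second-order partial derivatives exist, its Hessian matrix is defined as: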
$$H = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1\partial x_2} & \cdots &\frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots &\frac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots& \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n\partial x_2} & \cdots &\frac{\partial^2 f}{ \partial x_n^2} \end{bmatrix}$$
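$H$ has size $n\times n$, and $H^T = H$ whenever the second-order partial derivatives are continuous, because the order of differentiation can then be interchanged.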
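To make the size and symmetry of the Hessian concrete, here is a small finite-difference sketch. The helper `numerical_hessian`, the step size `h`, and the test function `example_g` are hypothetical names chosen for this example, not code from the original post.

```python
def numerical_hessian(f, x, h=1e-4):
    """Approximate the n x n Hessian of f: R^n -> R at x with central differences."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(di, dj):
                # Evaluate f with coordinate i moved by di and coordinate j by dj.
                # When i == j both shifts act on the same coordinate.
                y = list(x)
                y[i] += di
                y[j] += dj
                return f(y)

            H[i][j] = (
                shifted(h, h) - shifted(h, -h) - shifted(-h, h) + shifted(-h, -h)
            ) / (4 * h * h)
    return H


def example_g(x):
    # Hypothetical test function: g(x1, x2) = x1^2 * x2 + x2^3
    return x[0] ** 2 * x[1] + x[1] ** 3


H = numerical_hessian(example_g, [1.0, 2.0])
# Analytically: d2g/dx1^2 = 2*x2, d2g/dx1dx2 = 2*x1, d2g/dx2^2 = 6*x2,
# so at (1, 2) the Hessian is [[4, 2], [2, 12]].
for row in H:
    print(row)
```

The off-diagonal entries `H[0][1]` and `H[1][0]` agree here because the test function is smooth, which matches the symmetry property $H^T = H$.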
Jacobian matrix
For $f:\R^n \rightarrow \R^m$, i.e. a mapping from a multidimensional input $x\in\R^n$ to a multidimensional output $f(x)\in\R^m$, the Jacobian matrix of $f$ is:
$$\begin{aligned} J &= \begin{bmatrix} \frac{\partial f}{\partial x_1} & \frac{\partial f}{\partial x_2} & \cdots & \frac{\partial f}{\partial x_n} \end{bmatrix} \\ &= \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n} \end{bmatrix} \end{aligned}$$
The Jacobian matrix has size $m \times n$ (one row per output component, one column per input variable), with $J_{ij} = \frac{\partial f_i}{\partial x_j}$.
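The shape of the Jacobian is easy to get backwards, so a short numerical sketch may help. It builds $J$ column by column with central differences; `numerical_jacobian`, `example_map`, and the step size `h` are assumed names and values for illustration only.

```python
def numerical_jacobian(f, x, h=1e-6):
    """Approximate the m x n Jacobian of f: R^n -> R^m at x with central differences."""
    m = len(f(x))
    n = len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += h   # perturb only the j-th input
        x_minus[j] -= h
        f_plus = f(x_plus)
        f_minus = f(x_minus)
        for i in range(m):
            # Column j holds the derivative of every output with respect to x_j.
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J


def example_map(x):
    # Hypothetical map from R^2 to R^3: (x1*x2, x1 + x2, x1^2)
    return [x[0] * x[1], x[0] + x[1], x[0] ** 2]


J = numerical_jacobian(example_map, [1.0, 2.0])
# Analytic Jacobian at (1, 2): [[x2, x1], [1, 1], [2*x1, 0]] = [[2, 1], [1, 1], [2, 0]]
for row in J:
    print(row)
```

With $n = 2$ inputs and $m = 3$ outputs the result has 3 rows and 2 columns, and entry `J[i][j]` approximates $\frac{\partial f_i}{\partial x_j}$ (with zero-based indices in the code).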