1. Derivative of a scalar with respect to a vector:
The result is a vector.
This is in fact the gradient: for a scalar function f(x), where x = (x1, ..., xn), the derivative is:
![\small \frac{\partial f}{\partial x}=\left ( \frac{\partial f}{\partial x_{1}} ,\cdots \frac{\partial f}{\partial x_{n}}\right )](https://private.codecogs.com/gif.latex?%5Csmall%20%5Cfrac%7B%5Cpartial%20f%7D%7B%5Cpartial%20x%7D%3D%5Cleft%20%28%20%5Cfrac%7B%5Cpartial%20f%7D%7B%5Cpartial%20x_%7B1%7D%7D%20%2C%5Ccdots%20%5Cfrac%7B%5Cpartial%20f%7D%7B%5Cpartial%20x_%7Bn%7D%7D%5Cright%20%29)
also written as: ![\small \triangledown f](https://private.codecogs.com/gif.latex?%5Csmall%20%5Ctriangledown%20f)
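As a sanity check, the gradient can be approximated numerically with central differences. This is a minimal sketch using NumPy; the test function and evaluation point are illustrative choices, not from the text above.

```python
import numpy as np

def gradient(f, x, h=1e-6):
    """Approximate the gradient of a scalar function f at x
    by central differences: g[i] ≈ (f(x + h*e_i) - f(x - h*e_i)) / (2h)."""
    g = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        e = np.zeros_like(x, dtype=float)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

# Example: f(x) = x1^2 + 3*x2, whose gradient is (2*x1, 3).
f = lambda x: x[0]**2 + 3 * x[1]
g = gradient(f, np.array([1.0, 2.0]))  # ≈ [2., 3.]
```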
2. Derivative of a vector with respect to a vector:
The result is a matrix.
First derivative:
This is again a gradient, or more precisely a matrix gradient: for a vector-valued function f(x), where x = (x1, ..., xn) and f = (f1, ..., fm), the derivative is:
![\small \frac{\partial f}{\partial x}=\frac{\partial f^{T}}{\partial x}= \left [ \frac{\partial f_{1}}{\partial x},\cdots \frac{\partial f_{m}}{\partial x} \right ]= \begin{bmatrix} \frac{\partial f_{1}}{\partial x_{1}} & \cdots &\frac{\partial f_{m}}{\partial x_{1}} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_{1}}{\partial x_{n}} & \cdots & \frac{\partial f_{m}}{\partial x_{n}} \end{bmatrix}](https://private.codecogs.com/gif.latex?%5Csmall%20%5Cfrac%7B%5Cpartial%20f%7D%7B%5Cpartial%20x%7D%3D%5Cfrac%7B%5Cpartial%20f%5E%7BT%7D%7D%7B%5Cpartial%20x%7D%3D%20%5Cleft%20%5B%20%5Cfrac%7B%5Cpartial%20f_%7B1%7D%7D%7B%5Cpartial%20x%7D%2C%5Ccdots%20%5Cfrac%7B%5Cpartial%20f_%7Bm%7D%7D%7B%5Cpartial%20x%7D%20%5Cright%20%5D%3D%20%5Cbegin%7Bbmatrix%7D%20%5Cfrac%7B%5Cpartial%20f_%7B1%7D%7D%7B%5Cpartial%20x_%7B1%7D%7D%20%26%20%5Ccdots%20%26%5Cfrac%7B%5Cpartial%20f_%7Bm%7D%7D%7B%5Cpartial%20x_%7B1%7D%7D%20%5C%5C%20%5Cvdots%20%26%20%5Cddots%20%26%20%5Cvdots%20%5C%5C%20%5Cfrac%7B%5Cpartial%20f_%7B1%7D%7D%7B%5Cpartial%20x_%7Bn%7D%7D%20%26%20%5Ccdots%20%26%20%5Cfrac%7B%5Cpartial%20f_%7Bm%7D%7D%7B%5Cpartial%20x_%7Bn%7D%7D%20%5Cend%7Bbmatrix%7D)
Up to transposition, this is the Jacobian matrix: the conventional (numerator-layout) Jacobian has entry (i, j) = ∂f_i/∂x_j, so the n×m matrix above, written in denominator layout, is its transpose.
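The matrix above can likewise be approximated column by column with central differences. The sketch below follows the denominator layout used in the formula, so entry (i, j) holds ∂f_j/∂x_i; the example function is an illustrative assumption.

```python
import numpy as np

def jacobian_denominator_layout(f, x, h=1e-6):
    """n x m matrix with entry (i, j) = ∂f_j/∂x_i, matching ∂fᵀ/∂x above.
    Row i is the central-difference derivative of all outputs w.r.t. x_i."""
    fx = np.asarray(f(x), dtype=float)
    J = np.zeros((x.size, fx.size))
    for i in range(x.size):
        e = np.zeros_like(x, dtype=float)
        e[i] = h
        J[i] = (np.asarray(f(x + e)) - np.asarray(f(x - e))) / (2 * h)
    return J

# Example: f(x) = (x1*x2, x1 + x2). At x = (2, 3) the rows are
# ∂f/∂x1 = (x2, 1) = (3, 1) and ∂f/∂x2 = (x1, 1) = (2, 1).
f = lambda x: np.array([x[0] * x[1], x[0] + x[1]])
J = jacobian_denominator_layout(f, np.array([2.0, 3.0]))
```

Transposing `J` recovers the conventional numerator-layout Jacobian.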
Second derivative:
For a scalar function f, the second derivative with respect to x is the Hessian matrix, which has the form:
![\small H\left ( f \right )=\begin{bmatrix} \frac{\partial ^{2}f}{\partial x_{1}^{2}} & \frac{\partial ^{2}f}{\partial x_{1}\partial x_{2}} & \cdots & \frac{\partial ^{2}f}{\partial x_{1}\partial x_{n}} \\ \frac{\partial ^{2}f}{\partial x_{2}\partial x_{1}} & \frac{\partial ^{2}f}{\partial x_{2}^{2}} &\cdots &\frac{\partial ^{2}f}{\partial x_{2}\partial x_{n}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial ^{2}f}{\partial x_{n}\partial x_{1}} & \frac{\partial ^{2}f}{\partial x_{n}\partial x_{2}} & \cdots & \frac{\partial ^{2}f}{\partial x_{n}^{2}} \end{bmatrix}](https://private.codecogs.com/gif.latex?%5Csmall%20H%5Cleft%20%28%20f%20%5Cright%20%29%3D%5Cbegin%7Bbmatrix%7D%20%5Cfrac%7B%5Cpartial%20%5E%7B2%7Df%7D%7B%5Cpartial%20x_%7B1%7D%5E%7B2%7D%7D%20%26%20%5Cfrac%7B%5Cpartial%20%5E%7B2%7Df%7D%7B%5Cpartial%20x_%7B1%7D%5Cpartial%20x_%7B2%7D%7D%20%26%20%5Ccdots%20%26%20%5Cfrac%7B%5Cpartial%20%5E%7B2%7Df%7D%7B%5Cpartial%20x_%7B1%7D%5Cpartial%20x_%7Bn%7D%7D%20%5C%5C%20%5Cfrac%7B%5Cpartial%20%5E%7B2%7Df%7D%7B%5Cpartial%20x_%7B2%7D%5Cpartial%20x_%7B1%7D%7D%20%26%20%5Cfrac%7B%5Cpartial%20%5E%7B2%7Df%7D%7B%5Cpartial%20x_%7B2%7D%5E%7B2%7D%7D%20%26%5Ccdots%20%26%5Cfrac%7B%5Cpartial%20%5E%7B2%7Df%7D%7B%5Cpartial%20x_%7B2%7D%5Cpartial%20x_%7Bn%7D%7D%20%5C%5C%20%5Cvdots%20%26%20%5Cvdots%20%26%20%5Cddots%20%26%20%5Cvdots%20%5C%5C%20%5Cfrac%7B%5Cpartial%20%5E%7B2%7Df%7D%7B%5Cpartial%20x_%7Bn%7D%5Cpartial%20x_%7B1%7D%7D%20%26%20%5Cfrac%7B%5Cpartial%20%5E%7B2%7Df%7D%7B%5Cpartial%20x_%7Bn%7D%5Cpartial%20x_%7B2%7D%7D%20%26%20%5Ccdots%20%26%20%5Cfrac%7B%5Cpartial%20%5E%7B2%7Df%7D%7B%5Cpartial%20x_%7Bn%7D%5E%7B2%7D%7D%20%5Cend%7Bbmatrix%7D)
or, more abstractly, entry by entry:
![\small H_{ij}= \frac{\partial ^{2}f}{\partial x_{i}\partial x_{j} }](https://private.codecogs.com/gif.latex?%5Csmall%20H_%7Bij%7D%3D%20%5Cfrac%7B%5Cpartial%20%5E%7B2%7Df%7D%7B%5Cpartial%20x_%7Bi%7D%5Cpartial%20x_%7Bj%7D%20%7D)
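That entrywise definition translates directly into a numerical approximation: each entry H[i, j] is a second-order central difference of the scalar function f. A minimal sketch, with an illustrative example function:

```python
import numpy as np

def hessian(f, x, h=1e-4):
    """n x n matrix with H[i, j] ≈ ∂²f/∂x_i∂x_j, via the central
    difference (f(x+hi+hj) - f(x+hi-hj) - f(x-hi+hj) + f(x-hi-hj)) / (4h²)."""
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H

# Example: f(x) = x1^2 * x2, whose Hessian is [[2*x2, 2*x1], [2*x1, 0]].
# At x = (1, 2) this is [[4, 2], [2, 0]].
f = lambda x: x[0]**2 * x[1]
H = hessian(f, np.array([1.0, 2.0]))
```

For twice continuously differentiable f the mixed partials are equal, so the computed matrix should come out (numerically) symmetric.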