# Matrix Calculus Required for Deep Learning

The Matrix Calculus You Need For Deep Learning

## Introduction

$$
\frac{1}{N}\sum_{\mathbf{x}}\left(target(\mathbf{x}) - activation(\mathbf{x})\right)^2
=\frac{1}{N}\sum_{\mathbf{x}}\left(target(\mathbf{x}) - \max\left(0, \sum_{i=1}^{|x|}w_ix_i+b\right)\right)^2
$$

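Here $|x|$ is the number of elements in the vector $\mathbf{x}$. Note that this is the loss for a single neuron only; a neural network has to train all neurons in all layers at the same time. Because there are multiple inputs and multiple network outputs, we need rules for differentiating vectors with respect to vectors. That is the purpose of this article.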
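
As a minimal sketch of the loss above, the following Python computes the mean squared error of one ReLU neuron over a few inputs. The names `activation` and `mse_loss`, the sample weights, and the toy data are illustrative assumptions and do not come from the original article.

```python
def activation(x, w, b):
    """ReLU neuron: max(0, w . x + b) for one input vector x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(0.0, z)


def mse_loss(inputs, targets, w, b):
    """Mean squared error of the neuron over all N inputs."""
    n = len(inputs)
    return sum((t - activation(x, w, b)) ** 2 for x, t in zip(inputs, targets)) / n


# Hypothetical toy data: two 2-element inputs with their targets.
inputs = [[1.0, 2.0], [3.0, -1.0]]
targets = [1.0, 0.0]
w, b = [0.5, -0.25], 0.1

loss = mse_loss(inputs, targets, w, b)
print(loss)
```

Training would adjust `w` and `b` to reduce `loss`, which is why the derivatives of this expression with respect to `w` and `b` are needed.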

## Review: Scalar Derivative Rules

| Rule | $f(x)$ | Derivative with respect to $x$ | Example |
|---|---|---|---|
| Power rule | $x^n$ | $nx^{n-1}$ | $\frac{d}{dx}x^3=3x^2$ |
| Sum rule | $f+g$ | $\frac{df}{dx}+\frac{dg}{dx}$ | $\frac{d}{dx}(x^2+3x)=2x+3$ |
| Product rule | $fg$ | $f\frac{dg}{dx}+g\frac{df}{dx}$ | $\frac{d}{dx}x^2x=x^2+x2x=3x^2$ |
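
The three example derivatives in the table can be checked numerically. This is a rough sketch that uses a central finite difference; the helper name `derivative`, the step size `h`, and the evaluation point `x = 2.0` are assumptions chosen for illustration.

```python
def derivative(f, x, h=1e-6):
    """Central finite-difference approximation of df/dx at x."""
    return (f(x + h) - f(x - h)) / (2 * h)


x = 2.0

# Power rule: d/dx x^3 = 3x^2
power_numeric = derivative(lambda t: t ** 3, x)
power_exact = 3 * x ** 2

# Sum rule: d/dx (x^2 + 3x) = 2x + 3
sum_numeric = derivative(lambda t: t ** 2 + 3 * t, x)
sum_exact = 2 * x + 3

# Product rule: d/dx (x^2 * x) = x^2 * 1 + x * 2x = 3x^2
product_numeric = derivative(lambda t: t ** 2 * t, x)
product_exact = x ** 2 + x * 2 * x

print(power_numeric, power_exact)
print(sum_numeric, sum_exact)
print(product_numeric, product_exact)
```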

## Vector Calculus and Partial Derivatives

$$
\nabla f(x, y) = \left[\frac{\partial f(x, y)}{\partial x}, \frac{\partial f(x, y)}{\partial y}\right] = \left[6yx, 3x^2\right]
$$

## Matrix Calculus

$$
J =\begin{bmatrix}\nabla f(x, y)\\ \nabla g(x, y)\end{bmatrix}
= \begin{bmatrix}
\frac{\partial f(x, y)}{\partial x}&\frac{\partial f(x, y)}{\partial y}\\
\frac{\partial g(x, y)}{\partial x}&\frac{\partial g(x, y)}{\partial y}
\end{bmatrix}
= \begin{bmatrix} 6yx & 3x^2\\ 2 & 8y^7 \end{bmatrix}
$$
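
The sketch below compares the analytic Jacobian with a finite-difference estimate. It assumes $f(x, y) = 3x^2y$ and $g(x, y) = 2x + y^8$, which are the functions implied by the entries of $J$; the helper names and the evaluation point are hypothetical.

```python
def f(x, y):
    return 3 * x ** 2 * y


def g(x, y):
    return 2 * x + y ** 8


def jacobian_exact(x, y):
    """Rows are the gradients of f and g, as in the matrix J."""
    return [[6 * y * x, 3 * x ** 2],
            [2.0, 8 * y ** 7]]


def jacobian_numeric(x, y, h=1e-6):
    """Approximate each partial derivative with a central difference."""
    rows = []
    for func in (f, g):
        d_dx = (func(x + h, y) - func(x - h, y)) / (2 * h)
        d_dy = (func(x, y + h) - func(x, y - h)) / (2 * h)
        rows.append([d_dx, d_dy])
    return rows


J_exact = jacobian_exact(1.0, 2.0)
J_numeric = jacobian_numeric(1.0, 2.0)
print(J_exact)
print(J_numeric)
```

Each row of the result is one gradient, so stacking gradients row by row is all the Jacobian does.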

## Generalizing the Jacobian Matrix

$$
\mathbf{x}= \begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{bmatrix}
$$

$$
\begin{aligned}
y_1 &= f_1(\mathbf{x})\\
y_2 &= f_2(\mathbf{x})\\
&\vdots\\
y_m &= f_m(\mathbf{x})
\end{aligned}
$$

$$
\begin{aligned}
y_1 &= f_1(\mathbf{x}) = 3x_1^2x_2\\
y_2 &= f_2(\mathbf{x}) = 2x_1 + x_2^8
\end{aligned}
$$

$$
\frac{\partial\mathbf{y}}{\partial\mathbf{x}}=
\begin{bmatrix}
\nabla f_1(\mathbf{x})\\ \nabla f_2(\mathbf{x})\\ \vdots \\ \nabla f_m(\mathbf{x})
\end{bmatrix}=
\begin{bmatrix}
\frac{\partial}{\partial\mathbf{x}}f_1(\mathbf{x})\\
\frac{\partial}{\partial\mathbf{x}}f_2(\mathbf{x})\\
\vdots\\
\frac{\partial}{\partial\mathbf{x}}f_m(\mathbf{x})
\end{bmatrix}=
\begin{bmatrix}
\frac{\partial}{\partial x_1}f_1(\mathbf{x}) & \frac{\partial}{\partial x_2}f_1(\mathbf{x}) &\cdots &\frac{\partial}{\partial x_n}f_1(\mathbf{x})\\
\frac{\partial}{\partial x_1}f_2(\mathbf{x}) & \frac{\partial}{\partial x_2}f_2(\mathbf{x}) &\cdots & \frac{\partial}{\partial x_n} f_2(\mathbf{x})\\
\vdots &\vdots&&\vdots\\
\frac{\partial}{\partial x_1}f_m(\mathbf{x}) &\frac{\partial}{\partial x_2}f_m(\mathbf{x}) &\cdots &\frac{\partial}{\partial x_n}f_m(\mathbf{x})
\end{bmatrix}
$$

For the identity function $\mathbf{y} = \mathbf{f}(\mathbf{x}) = \mathbf{x}$, where $f_i(\mathbf{x}) = x_i$ and therefore $m = n$, the Jacobian is the identity matrix:

$$
\begin{aligned}
\frac{\partial\mathbf{y}}{\partial\mathbf{x}}=
\begin{bmatrix}
\frac{\partial}{\partial\mathbf{x}}f_1(\mathbf{x})\\
\frac{\partial}{\partial\mathbf{x}}f_2(\mathbf{x})\\
\vdots\\
\frac{\partial}{\partial\mathbf{x}}f_m(\mathbf{x})
\end{bmatrix}
&=
\begin{bmatrix}
\frac{\partial}{\partial x_1}f_1(\mathbf{x}) & \frac{\partial}{\partial x_2}f_1(\mathbf{x}) &\cdots &\frac{\partial}{\partial x_n}f_1(\mathbf{x})\\
\frac{\partial}{\partial x_1}f_2(\mathbf{x}) & \frac{\partial}{\partial x_2}f_2(\mathbf{x}) &\cdots & \frac{\partial}{\partial x_n} f_2(\mathbf{x})\\
\vdots &\vdots&&\vdots\\
\frac{\partial}{\partial x_1}f_m(\mathbf{x}) &\frac{\partial}{\partial x_2}f_m(\mathbf{x}) &\cdots &\frac{\partial}{\partial x_n}f_m(\mathbf{x})
\end{bmatrix}\\
&=
\begin{bmatrix}
\frac{\partial}{\partial x_1}x_1 & \frac{\partial}{\partial x_2}x_1 &\cdots &\frac{\partial}{\partial x_n}x_1\\
\frac{\partial}{\partial x_1}x_2 & \frac{\partial}{\partial x_2}x_2 &\cdots & \frac{\partial}{\partial x_n} x_2 \\
\vdots &\vdots&&\vdots\\
\frac{\partial}{\partial x_1}x_n &\frac{\partial}{\partial x_2}x_n &\cdots &\frac{\partial}{\partial x_n}x_n
\end{bmatrix}\\
&=I
\end{aligned}
$$
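
To make the $m \times n$ layout concrete, here is a small sketch of a numerical Jacobian for any function from $\mathbb{R}^n$ to $\mathbb{R}^m$, applied to the identity function. The helper `jacobian` is an assumption written for this note (plain Python, central differences), not something defined in the original article.

```python
def jacobian(f, x, h=1e-6):
    """Numerical m x n Jacobian of f: R^n -> R^m, one row per output f_i."""
    m = len(f(x))
    n = len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(m):
            # Column j holds the derivatives with respect to x_j.
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J


def identity(x):
    """f_i(x) = x_i, so m = n."""
    return list(x)


J = jacobian(identity, [1.0, 2.0, 3.0])
J_rounded = [[round(v, 6) for v in row] for row in J]
print(J_rounded)
```

Off-diagonal entries vanish because $x_i$ does not depend on $x_j$ for $i \neq j$, which is the same reason the element-wise Jacobians in the next section are diagonal.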

## Derivatives of Element-wise Binary Operators on Vectors

$$
\begin{bmatrix} y_1\\ y_2\\ \vdots\\ y_n \end{bmatrix}=
\begin{bmatrix}
f_1(\mathbf{w})\bigcirc g_1(\mathbf{x})\\
f_2(\mathbf{w})\bigcirc g_2(\mathbf{x})\\
\vdots\\
f_n(\mathbf{w})\bigcirc g_n(\mathbf{x})
\end{bmatrix}
$$

$$
J_\mathbf{w}=\frac{\partial\mathbf{y}}{\partial\mathbf{w}}=
\begin{bmatrix}
\frac{\partial}{\partial w_1}\left(f_1(\mathbf{w})\bigcirc g_1(\mathbf{x})\right) &\frac{\partial}{\partial w_2}\left(f_1(\mathbf{w})\bigcirc g_1(\mathbf{x})\right) &\cdots &\frac{\partial}{\partial w_n}\left(f_1(\mathbf{w})\bigcirc g_1(\mathbf{x})\right)\\
\frac{\partial}{\partial w_1}\left(f_2(\mathbf{w})\bigcirc g_2(\mathbf{x})\right) &\frac{\partial}{\partial w_2}\left(f_2(\mathbf{w})\bigcirc g_2(\mathbf{x})\right) &\cdots &\frac{\partial}{\partial w_n}\left(f_2(\mathbf{w})\bigcirc g_2(\mathbf{x})\right)\\
\vdots & \vdots &&\vdots\\
\frac{\partial}{\partial w_1}\left(f_n(\mathbf{w})\bigcirc g_n(\mathbf{x})\right) &\frac{\partial}{\partial w_2}\left(f_n(\mathbf{w})\bigcirc g_n(\mathbf{x})\right) &\cdots &\frac{\partial}{\partial w_n}\left(f_n(\mathbf{w})\bigcirc g_n(\mathbf{x})\right)
\end{bmatrix}
$$

$$
\frac{\partial \mathbf{y}}{\partial\mathbf{w}}=diag\left(\frac{\partial}{\partial w_1}(f_1(w_1)\bigcirc g_1(x_1)), \frac{\partial}{\partial w_2}(f_2(w_2)\bigcirc g_2(x_2)), \cdots, \frac{\partial}{\partial w_n}(f_n(w_n)\bigcirc g_n(x_n))\right)
$$

| Op | Partial with respect to $\mathbf{w}$ |
|---|---|
| $+$ | $\frac{\partial(\mathbf{w} + \mathbf{x})}{\partial\mathbf{w}}=diag\left(\cdots\frac{\partial(w_i + x_i)}{\partial w_i}\cdots\right)=I$ |
| $-$ | $\frac{\partial(\mathbf{w} - \mathbf{x})}{\partial\mathbf{w}}=diag\left(\cdots\frac{\partial(w_i - x_i)}{\partial w_i}\cdots\right)=I$ |
| $\otimes$ | $\frac{\partial(\mathbf{w}\otimes\mathbf{x})}{\partial\mathbf{w}}=diag\left(\cdots\frac{\partial(w_i\times x_i)}{\partial w_i}\cdots\right)=diag(\mathbf{x})$ |
| $\oslash$ | $\frac{\partial(\mathbf{w}\oslash\mathbf{x})}{\partial\mathbf{w}}=diag\left(\cdots\frac{\partial(w_i/ x_i)}{\partial w_i}\cdots\right)=diag\left(\cdots\frac{1}{x_i}\cdots\right)$ |

Partial derivatives with respect to $\mathbf{x}$:

| Op | Partial with respect to $\mathbf{x}$ |
|---|---|
| $+$ | $\frac{\partial(\mathbf{w}+\mathbf{x})}{\partial\mathbf{x}}=I$ |
| $-$ | $\frac{\partial(\mathbf{w}-\mathbf{x})}{\partial\mathbf{x}}=-I$ |
| $\otimes$ | $\frac{\partial(\mathbf{w}\otimes\mathbf{x})}{\partial\mathbf{x}}=diag(\mathbf{w})$ |
| $\oslash$ | $\frac{\partial(\mathbf{w}\oslash\mathbf{x})}{\partial\mathbf{x}}=diag\left(\cdots\frac{-w_i}{x_i^2}\cdots\right)$ |
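
A few of the table entries can be spot-checked numerically. The sketch below is only illustrative: `jacobian`, `diag`, and `max_abs_diff` are hypothetical helpers written in plain Python, and the vectors `w` and `x` are arbitrary sample values.

```python
def jacobian(f, v, h=1e-6):
    """Numerical Jacobian of a vector function f at v (central differences)."""
    m, n = len(f(v)), len(v)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        v_plus, v_minus = list(v), list(v)
        v_plus[j] += h
        v_minus[j] -= h
        f_plus, f_minus = f(v_plus), f(v_minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J


def diag(values):
    """Square matrix with `values` on the diagonal and zeros elsewhere."""
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


def max_abs_diff(A, B):
    """Largest absolute entry-wise difference between two matrices."""
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))


w = [1.0, 2.0, 3.0]
x = [4.0, 5.0, 8.0]

# d(w * x)/dw = diag(x)
J_mul_w = jacobian(lambda w_: [wi * xi for wi, xi in zip(w_, x)], w)
err_mul_w = max_abs_diff(J_mul_w, diag(x))

# d(w - x)/dx = -I
J_sub_x = jacobian(lambda x_: [wi - xi for wi, xi in zip(w, x_)], x)
err_sub_x = max_abs_diff(J_sub_x, diag([-1.0] * len(x)))

# d(w / x)/dx = diag(-w_i / x_i^2)
J_div_x = jacobian(lambda x_: [wi / xi for wi, xi in zip(w, x_)], x)
err_div_x = max_abs_diff(J_div_x, diag([-wi / xi ** 2 for wi, xi in zip(w, x)]))

print(err_mul_w, err_sub_x, err_div_x)
```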

## Derivatives Involving Scalar Operations

$$
\frac{\partial\mathbf{y}}{\partial\mathbf{x}}=diag\left(\cdots \frac{\partial}{\partial x_i}(f_i(x_i)\bigcirc g_i(z))\cdots\right)
$$

$$
\begin{aligned}
\frac{\partial}{\partial\mathbf{x}}(\mathbf{x} + z) &= diag(\vec{1})= I\\
\frac{\partial}{\partial z}(\mathbf{x} + z) &= \vec{1}
\end{aligned}
$$

$$
\begin{aligned}
\frac{\partial}{\partial\mathbf{x}}(\mathbf{x}z) &= diag(\vec{1}z) = Iz\\
\frac{\partial}{\partial z}(\mathbf{x}z) &= \mathbf{x}
\end{aligned}
$$

$$
\frac{\partial}{\partial z}(f_i(x_i)\otimes g_i(z) ) = x_i\frac{\partial z}{\partial z} + z\frac{\partial x_i}{\partial z} = x_i + 0 = x_i
$$
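
The derivatives with respect to the scalar $z$ are vectors rather than diagonal matrices, which is easy to get wrong. A minimal numerical check, assuming a hypothetical helper `d_dz` and arbitrary sample values for `x` and `z`:

```python
def d_dz(f, z, h=1e-6):
    """Derivative of a vector-valued f(z) with respect to the scalar z.

    The result has one entry per output element, i.e. it is a vector.
    """
    f_plus, f_minus = f(z + h), f(z - h)
    return [(p - m) / (2 * h) for p, m in zip(f_plus, f_minus)]


x = [1.0, 2.0, 3.0]
z = 4.0

# d(x + z)/dz is a vector of ones.
add_dz = d_dz(lambda s: [xi + s for xi in x], z)

# d(x * z)/dz is the vector x itself.
mul_dz = d_dz(lambda s: [xi * s for xi in x], z)

print(add_dz)
print(mul_dz)
```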

## Vector Sum Reduction

Let $y=sum(\mathbf{f}(\mathbf{x})) = \sum_{i=1}^nf_i(\mathbf{x})$. Note that the argument of every function $f_i$ is the whole vector $\mathbf{x}$. The corresponding Jacobian is a $1\times n$ row vector:

$$
\begin{aligned}
\frac{\partial y}{\partial\mathbf{x}}
&= \begin{bmatrix} \frac{\partial y}{\partial x_1}, \frac{\partial y}{\partial x_2}, \cdots,\frac{\partial y}{\partial x_n} \end{bmatrix}\\
&= \begin{bmatrix} \frac{\partial}{\partial x_1}\sum_i f_i(\mathbf{x}), \frac{\partial}{\partial x_2}\sum_if_i(\mathbf{x}),\cdots,\frac{\partial}{\partial x_n}\sum_if_i(\mathbf{x}) \end{bmatrix}\\
&= \begin{bmatrix} \sum_i\frac{\partial f_i(\mathbf{x})}{\partial x_1}, \sum_i\frac{\partial f_i(\mathbf{x})}{\partial x_2},\cdots,\sum_i\frac{\partial f_i(\mathbf{x})}{\partial x_n} \end{bmatrix}
\quad(\text{move the derivative inside } \textstyle\sum)
\end{aligned}
$$

For the simple case $y = sum(\mathbf{x})$, where $f_i(\mathbf{x}) = x_i$, the gradient is:

$$
\nabla y = \begin{bmatrix} \sum_i\frac{\partial x_i}{\partial x_1},\sum_i\frac{\partial x_i}{\partial x_2},\cdots,\sum_i\frac{\partial x_i}{\partial x_n} \end{bmatrix} = [1, 1,\cdots,1] = \vec{1}^T
$$

For $y = sum(\mathbf{x}z)$, where each term is $x_iz$, the derivative with respect to $\mathbf{x}$ is:

$$
\begin{aligned}
\frac{\partial y}{\partial \mathbf{x}}
&= \begin{bmatrix} \sum_i\frac{\partial}{\partial x_1}x_iz,\sum_i\frac{\partial}{\partial x_2}x_iz, \cdots, \sum_i\frac{\partial}{\partial x_n}x_iz \end{bmatrix}\\
&= \begin{bmatrix} z, z,\cdots, z \end{bmatrix}
\end{aligned}
$$

and the derivative with respect to the scalar $z$ is:

$$
\begin{aligned}
\frac{\partial y}{\partial z}
&= \frac{\partial}{\partial z}\sum_{i=1}^n x_iz\\
&= \sum_i\frac{\partial}{\partial z}x_i z\\
&= \sum_ix_i\\
&= sum(\mathbf{x})
\end{aligned}
$$
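
The sum-reduction results can be verified with a small finite-difference sketch. The helper `gradient` and the sample values of `x` and `z` are assumptions made for this illustration.

```python
def gradient(f, x, h=1e-6):
    """Numerical gradient (a 1 x n row) of a scalar-valued f at the vector x."""
    grad = []
    for j in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += h
        x_minus[j] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


x = [1.0, 2.0, 3.0]
z = 5.0

# y = sum(x): the gradient is a row of ones.
grad_sum = gradient(lambda v: sum(v), x)

# y = sum(x * z): the gradient with respect to x is [z, z, ..., z].
grad_x = gradient(lambda v: sum(vi * z for vi in v), x)

# The derivative of the same y with respect to the scalar z is sum(x).
h = 1e-6
dy_dz = (sum(xi * (z + h) for xi in x) - sum(xi * (z - h) for xi in x)) / (2 * h)

print(grad_sum, grad_x, dy_dz)
```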

## Chain Rules

### Single-variable chain rule

$$
\frac{dy}{dx} = \frac{dy}{du}\frac{du}{dx}
$$

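1. Introduce an intermediate variable so that differentiating a complicated function becomes differentiating two simpler functions.
2. Compute the derivatives of the two simpler functions separately.
3. Multiply the two derivatives together.
4. Substitute the intermediate variable back.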

When the single-variable chain rule applies: there must be only one data-flow path from $x$ to $y$, so that a change in $x$ can influence $y$ along a single path only. If the expression is instead $y(x) = x + x^2$, written as $y(x, u) = x + u$ with $u = x^2$, the data-flow graph of $y(x, u)$ has several paths from $x$ to $y$, and the single-variable total-derivative chain rule should be used. First consider the nested expression $y = f(x)=\ln(\sin(x^3)^2)$, which has a single path. The procedure is as follows:

1. Introduce intermediate variables:

   $$
   \begin{aligned}
   u_1 &= f_1(x) = x^3\\
   u_2 &= f_2(u_1)= \sin(u_1)\\
   u_3 &= f_3(u_2) = u_2^2\\
   u_4 &= f_4(u_3) = \ln(u_3)\quad(y = u_4)
   \end{aligned}
   $$

2. Compute the derivatives:

   $$
   \begin{aligned}
   \frac{d}{dx}u_1 &= 3 x^2\\
   \frac{d}{du_1}u_2&= \cos(u_1)\\
   \frac{d}{du_2}u_3 &= 2u_2\\
   \frac{d}{du_3}u_4 &= \frac{1}{u_3}
   \end{aligned}
   $$

3. Combine the four intermediate derivatives:

   $$
   \frac{dy}{dx} = \frac{du_4}{dx} = \frac{1}{u_3}2u_2\cos(u_1)3x^2 = \frac{6u_2x^2\cos(u_1)}{u_3}
   $$

4. Substitute the intermediate variables:

   $$
   \frac{dy}{dx} = \frac{6\sin(u_1)x^2\cos(x^3)}{u_2^2} = \frac{6\sin(x^3)x^2\cos(x^3)}{\sin(x^3)^2} = \frac{6x^2\cos(x^3)}{\sin(x^3)}
   $$
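
The four steps above can be sketched in Python and checked against both the simplified closed form and a finite-difference estimate. The function names and the evaluation point `x = 1.1` are assumptions chosen for illustration.

```python
import math


def f(x):
    """y = ln(sin(x^3)^2)."""
    return math.log(math.sin(x ** 3) ** 2)


def df_chain(x):
    """Multiply the four intermediate derivatives, as in steps 1 to 3."""
    u1 = x ** 3
    u2 = math.sin(u1)
    u3 = u2 ** 2
    du1_dx = 3 * x ** 2
    du2_du1 = math.cos(u1)
    du3_du2 = 2 * u2
    du4_du3 = 1 / u3
    return du4_du3 * du3_du2 * du2_du1 * du1_dx


def df_closed_form(x):
    """Simplified result from step 4."""
    return 6 * x ** 2 * math.cos(x ** 3) / math.sin(x ** 3)


x = 1.1
h = 1e-6
numeric = (f(x + h) - f(x - h)) / (2 * h)

print(df_chain(x), df_closed_form(x), numeric)
```

Keeping the intermediate values `u1`, `u2`, `u3` around is essentially what automatic differentiation does: each local derivative is cheap, and the product gives the full derivative.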

### Single-variable total-derivative chain rule

$$
\begin{aligned}
u_1(x) &= x^2\\
u_2(x, u_1) &= x + u_1\quad(y=f(x)=u_2(x, u_1))
\end{aligned}
$$

$$
\begin{aligned}
\frac{\partial u_1(x)}{\partial x} &= 2x\\
\frac{\partial u_2(x,u_1)}{\partial u_1} &= \frac{\partial}{\partial u_1}(x + u_1) = 0 + 1= 1\\
\frac{\partial u_2(x, u_1)}{\partial x}&\neq \frac{\partial}{\partial x}(x + u_1) = 1 + 0 = 1
\end{aligned}
$$

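The computation of $\frac{\partial u_2(x, u_1)}{\partial x}$ goes wrong because $u_1$ itself depends on $x$, so $u_1$ cannot be treated as a constant when taking the partial derivative. The computation graph of the expression makes this visible.

A change in $x$ affects $y$ through both the addition and the squaring operation. The following expression shows how a change $\Delta x$ propagates to $y$:

$$
\hat{y} = (x +\Delta x) + (x +\Delta x)^2
$$

with $\Delta y = \hat{y} - y$. This calls for the total derivative, which assumes that every intermediate variable may depend on $x$ and may therefore change as $x$ changes. The formula is: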

$$
\frac{dy}{dx}=\frac{\partial f(x)}{\partial x} = \frac{\partial u_2(x, u_1)}{\partial x} = \frac{\partial u_2}{\partial x}\frac{\partial x}{\partial x} + \frac{\partial u_2}{\partial u_1}\frac{\partial u_1}{\partial x} = \frac{\partial u_2}{\partial x} + \frac{\partial u_2}{\partial u_1}\frac{\partial u_1}{\partial x}
$$

$$
\frac{dy}{dx} = \frac{\partial u_2}{\partial x} + \frac{\partial u_2}{\partial u_1}\frac{\partial u_1}{\partial x} = 1 + 1\times2x = 1 + 2x
$$

$$
\frac{\partial f(x, u_1,\cdots,u_n)}{\partial x}=\frac{\partial f}{\partial x} + \sum_{i=1}^n\frac{\partial f}{\partial u_i}\frac{\partial u_i}{\partial x}
$$

$$
\begin{aligned}
u_1(x) &= x^2\\
u_2(x, u_1) &= x + u_1\\
u_3(u_2) &= \sin(u_2)
\end{aligned}
$$

$$
\begin{aligned}
\frac{\partial u_1}{\partial x} &= 2x\\
\frac{\partial u_2}{\partial x} &=\frac{\partial x}{\partial x} + \frac{\partial u_2}{\partial u_1}\frac{\partial u_1}{\partial x}= 1 + 2x\\
\frac{\partial f(x)}{\partial x} &= \frac{\partial u_3}{\partial x} +\frac{\partial u_3}{\partial u_2}\frac{\partial u_2}{\partial x} = 0 + \cos(u_2)\frac{\partial u_2}{\partial x} = \cos(x + x^2)(1+2x)
\end{aligned}
$$

$$
\begin{aligned}
u_1(x) &= x^2\\
u_2(x, u_1) &= xu_1\\
\frac{\partial u_1}{\partial x} &= 2x\\
\frac{\partial u_2}{\partial x} &= u_1 + \frac{\partial u_2}{\partial u_1}\frac{\partial u_1}{\partial x} = x^2 + x\times 2x = 3x^2
\end{aligned}
$$

Introducing the alias $u_{n+1} = x$ folds the direct term into the sum:

$$
\frac{\partial f(u_1,\cdots, u_{n + 1})}{\partial x} = \sum_{i=1}^{n + 1}\frac{\partial f}{\partial u_i}\frac{\partial u_i}{\partial x}
$$
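
The two worked examples of the total-derivative chain rule, $\sin(x + x^2)$ and $x \cdot x^2$, can be sketched as follows. The function names and the evaluation point are hypothetical; the code only mirrors the derivations above and compares one of them with a finite-difference estimate.

```python
import math


def total_derivative_sin(x):
    """d/dx sin(x + x^2) via the total-derivative chain rule."""
    u1 = x ** 2
    u2 = x + u1
    du1_dx = 2 * x
    # u2 depends on x directly (term 1) and through u1 (term 1 * du1/dx).
    du2_dx = 1 + 1 * du1_dx
    # u3 = sin(u2) has no direct dependence on x, so the direct term is 0.
    du3_dx = 0 + math.cos(u2) * du2_dx
    return du3_dx


def total_derivative_product(x):
    """d/dx (x * u1) with u1 = x^2, which is d/dx x^3."""
    u1 = x ** 2
    du1_dx = 2 * x
    # Direct term (du2/dx = u1) plus the path through u1 (du2/du1 = x).
    return u1 + x * du1_dx


x = 0.7
h = 1e-6
numeric_sin = (math.sin((x + h) + (x + h) ** 2)
               - math.sin((x - h) + (x - h) ** 2)) / (2 * h)

print(total_derivative_sin(x), numeric_sin)
print(total_derivative_product(x), 3 * x ** 2)
```

Dropping the path through `u1` in either function reproduces the mistake discussed earlier, where $u_1$ is wrongly treated as a constant.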

### Vector chain rule

$$
\begin{bmatrix} y_1(x)\\ y_2(x) \end{bmatrix}=
\begin{bmatrix} f_1(x)\\ f_2(x) \end{bmatrix}=
\begin{bmatrix} \ln(x^2)\\ \sin(3x) \end{bmatrix}
$$

$$
\begin{aligned}
\begin{bmatrix} g_1(x)\\ g_2(x) \end{bmatrix} &= \begin{bmatrix} x^2\\ 3x \end{bmatrix}\\
\begin{bmatrix} f_1(\mathbf{g})\\ f_2(\mathbf{g}) \end{bmatrix} &= \begin{bmatrix} \ln(g_1)\\ \sin(g_2) \end{bmatrix}
\end{aligned}
$$

$$
\frac{\partial\mathbf{y}}{\partial x} =
\begin{bmatrix}
\frac{\partial f_1(\mathbf{g})}{\partial x}\\
\frac{\partial f_2(\mathbf{g})}{\partial x}
\end{bmatrix}=
\begin{bmatrix}
\frac{\partial f_1}{\partial g_1}\frac{\partial g_1}{\partial x} + \frac{\partial f_1}{\partial g_2}\frac{\partial g_2}{\partial x}\\
\frac{\partial f_2}{\partial g_1}\frac{\partial g_1}{\partial x} + \frac{\partial f_2}{\partial g_2}\frac{\partial g_2}{\partial x}
\end{bmatrix} =
\begin{bmatrix}
\frac{1}{g_1}2x+0\\
0 + \cos(g_2)3
\end{bmatrix}=
\begin{bmatrix}
\frac{2}{x}\\
3\cos(3x)
\end{bmatrix}
$$

$$
\frac{\partial}{\partial x}\mathbf{f}(\mathbf{g}(x))=
\begin{bmatrix}
\frac{\partial f_1}{\partial g_1}\frac{\partial g_1}{\partial x} + \frac{\partial f_1}{\partial g_2}\frac{\partial g_2}{\partial x}\\
\frac{\partial f_2}{\partial g_1}\frac{\partial g_1}{\partial x} + \frac{\partial f_2}{\partial g_2}\frac{\partial g_2}{\partial x}
\end{bmatrix} =
\begin{bmatrix}
\frac{\partial f_1}{\partial g_1} & \frac{\partial f_1}{\partial g_2}\\
\frac{\partial f_2}{\partial g_1} & \frac{\partial f_2}{\partial g_2}
\end{bmatrix}
\begin{bmatrix}
\frac{\partial g_1}{\partial x}\\
\frac{\partial g_2}{\partial x}
\end{bmatrix}=
\frac{\partial \mathbf{f}}{\partial \mathbf{g}} \frac{\partial \mathbf{g}}{\partial x}
$$

$$
\frac{\partial}{\partial \mathbf{x}}\mathbf{f}(\mathbf{g}(\mathbf{x})) = \frac{\partial \mathbf{f}}{\partial \mathbf{g}}\frac{\partial \mathbf{g}}{\partial\mathbf{x}} =
\begin{bmatrix}
\frac{\partial f_1}{\partial g_1} &\frac{\partial f_1}{\partial g_2} &\cdots &\frac{\partial f_1}{\partial g_k}\\
\frac{\partial f_2}{\partial g_1} &\frac{\partial f_2}{\partial g_2} &\cdots &\frac{\partial f_2}{\partial g_k}\\
\vdots &\vdots &&\vdots\\
\frac{\partial f_m}{\partial g_1}&\frac{\partial f_m}{\partial g_2} &\cdots&\frac{\partial f_m}{\partial g_k}
\end{bmatrix}
\begin{bmatrix}
\frac{\partial g_1}{\partial x_1} &\frac{\partial g_1}{\partial x_2}&\cdots&\frac{\partial g_1}{\partial x_n}\\
\frac{\partial g_2}{\partial x_1} &\frac{\partial g_2}{\partial x_2}&\cdots&\frac{\partial g_2}{\partial x_n}\\
\vdots&\vdots&&\vdots\\
\frac{\partial g_k}{\partial x_1}& \frac{\partial g_k}{\partial x_2} &\cdots &\frac{\partial g_k}{\partial x_n}
\end{bmatrix}
$$
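
As a rough sketch of the vector chain rule, the code below multiplies the two Jacobians for the example $\mathbf{g}(x) = [x^2, 3x]$ and $\mathbf{f}(\mathbf{g}) = [\ln(g_1), \sin(g_2)]$ and compares the product with the closed form $[2/x, 3\cos(3x)]$. The helper `matmul` and the evaluation point `x = 2.0` are assumptions for illustration.

```python
import math


def matmul(A, B):
    """Plain-Python matrix product of A (m x k) and B (k x n)."""
    return [[sum(A[i][p] * B[p][j] for p in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def chain_rule_jacobian(x):
    """Jacobian of f(g(x)) with respect to the scalar x, as df/dg times dg/dx."""
    g1, g2 = x ** 2, 3 * x
    # df/dg: 2 x 2 Jacobian of f = [ln(g1), sin(g2)] with respect to g.
    df_dg = [[1 / g1, 0.0],
             [0.0, math.cos(g2)]]
    # dg/dx: 2 x 1 Jacobian of g = [x^2, 3x] with respect to x.
    dg_dx = [[2 * x],
             [3.0]]
    return matmul(df_dg, dg_dx)


x = 2.0
J = chain_rule_jacobian(x)
expected = [[2 / x], [3 * math.cos(3 * x)]]

print(J)
print(expected)
```

The order of the factors matters: `df_dg` is $m \times k$ and `dg_dx` is $k \times n$, so only `matmul(df_dg, dg_dx)` has compatible shapes in general.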