Forward-Mode and Reverse-Mode Automatic Differentiation of Composite Functions
About
- First published: 2024-09-13
- References:
- https://rufflewind.com/2016-12-30/reverse-mode-automatic-differentiation
- Calculus Early Transcendentals 9e - James Stewart (2020)
- https://en.wikipedia.org/wiki/Automatic_differentiation
- My knowledge is limited; if you spot any mistakes, please point them out.
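Forward and Reverse Automatic Differentiation: The Math
First, a quick review of the differentiation rules from calculus.
Review of Differentiation Rules
Product rule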
$$f(x) = u(x) \times v(x)$$
$$
\begin{aligned}
\frac{dy}{dx} &= \frac{du}{dx} \times v + \frac{dv}{dx} \times u \\
f'(x) &= u'v + v'u
\end{aligned}
$$
$$
\begin{aligned}
f(x) &= (3x-5) \times (4x+7) \\
u &= 3x-5 \quad v = 4x+7 \\
u' &= 3 \quad v' = 4 \\
f'(x) &= 3(4x+7) + 4(3x-5) \\
&= 12x + 21 + 12x - 20 \\
&= 24x + 1
\end{aligned}
$$
Quotient rule
$$f(x) = \frac{u(x)}{v(x)}$$
$$
\begin{aligned}
f'(x) &= \frac{u'v - v'u}{v^2} \\
\frac{dy}{dx} &= \frac{\frac{du}{dx}v - \frac{dv}{dx}u}{v^2}
\end{aligned}
$$
$$
\begin{aligned}
f(x) &= \frac{3x-5}{4x+7} \\
u &= 3x-5 \quad v = 4x+7 \\
u' &= 3 \quad v' = 4 \\
f'(x) &= \frac{3(4x+7) - 4(3x-5)}{(4x+7)^2} \\
&= \frac{12x + 21 - 12x + 20}{(4x+7)^2} \\
&= \frac{41}{(4x+7)^2}
\end{aligned}
$$
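The two worked results, $24x + 1$ for the product and $41/(4x+7)^2$ for the quotient, are easy to double-check numerically. The sketch below compares them with a central finite difference. The helper name `central_diff`, the step size `h`, and the sample point `x = 1.5` are arbitrary choices made for illustration.

```python
def central_diff(func, x, h=1e-6):
    """Approximate func'(x) with a symmetric finite difference."""
    return (func(x + h) - func(x - h)) / (2 * h)


def product(x):
    return (3 * x - 5) * (4 * x + 7)


def quotient(x):
    return (3 * x - 5) / (4 * x + 7)


def product_derivative(x):
    # Hand-derived with the product rule.
    return 24 * x + 1


def quotient_derivative(x):
    # Hand-derived with the quotient rule.
    return 41 / (4 * x + 7) ** 2


x = 1.5
product_error = abs(central_diff(product, x) - product_derivative(x))
quotient_error = abs(central_diff(quotient, x) - quotient_derivative(x))
print(product_error < 1e-6, quotient_error < 1e-6)
```

A finite difference only approximates the derivative, so the comparison uses a tolerance rather than exact equality.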
Derivatives of sin and cos
$$
\begin{aligned}
y &= \sin(x) \\
\frac{dy}{dx} &= \cos(x)
\end{aligned}
$$
$$
\begin{aligned}
y &= \cos(x) \\
\frac{dy}{dx} &= -\sin(x)
\end{aligned}
$$
Chain rule (single-variable composite functions)
$$y = f(u) \quad u = g(x)$$
$$\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}$$
$$
\begin{aligned}
y &= (2x+4)^3 \\
y &= u^3 \text{ and } u = 2x+4 \\
\frac{dy}{du} &= 3u^2 \quad \frac{du}{dx} = 2 \\
\frac{dy}{dx} &= 3u^2 \times 2 = 2 \times 3(2x+4)^2 \\
&= 6(2x+4)^2
\end{aligned}
$$
Multivariable chain rule (Case 1)
$$
\begin{aligned}
z &= f(x,y) \\
x &= g(t) \\
y &= h(t)
\end{aligned}
$$
$$\frac{dz}{dt} = \frac{\partial f}{\partial x} \frac{dx}{dt} + \frac{\partial f}{\partial y} \frac{dy}{dt}$$
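As a small worked instance of Case 1, take $z = x^2 y$ with $x = \cos t$ and $y = \sin t$. These functions are chosen here only for illustration and do not come from the references.

$$
\begin{aligned}
\frac{\partial f}{\partial x} &= 2xy \qquad \frac{\partial f}{\partial y} = x^2 \qquad \frac{dx}{dt} = -\sin t \qquad \frac{dy}{dt} = \cos t \\
\frac{dz}{dt} &= 2xy \cdot (-\sin t) + x^2 \cdot \cos t \\
&= -2\cos t \sin^2 t + \cos^3 t
\end{aligned}
$$

Substituting first gives $z = \cos^2 t \sin t$, and differentiating that directly with the product rule yields the same expression.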
Multivariable chain rule (Case 2)
$$
\begin{aligned}
z &= f(x,y) \\
x &= g(s,t) \\
y &= h(s,t)
\end{aligned}
$$
$$\frac{\partial z}{\partial s} = \frac{\partial z}{\partial x} \frac{\partial x}{\partial s} + \frac{\partial z}{\partial y} \frac{\partial y}{\partial s} \quad \frac{\partial z}{\partial t} = \frac{\partial z}{\partial x} \frac{\partial x}{\partial t} + \frac{\partial z}{\partial y} \frac{\partial y}{\partial t}$$
When computing $\frac{\partial z}{\partial s}$, we hold $t$ fixed and compute the ordinary derivative of $z$ with respect to $s$, which is exactly the multivariable chain rule (Case 1). The same applies to $\frac{\partial z}{\partial t}$, with $s$ held fixed.
Multivariable chain rule (general version)
$$
\begin{aligned}
u &= f(x_1, x_2, \ldots, x_n) \\
x_k &= g_k(t_1, t_2, \ldots, t_m) \qquad \text{for } 1 \leq k \leq n
\end{aligned}
$$
$$\frac{\partial u}{\partial t_i} = \frac{\partial u}{\partial x_1} \frac{\partial x_1}{\partial t_i} + \frac{\partial u}{\partial x_2} \frac{\partial x_2}{\partial t_i} + \cdots + \frac{\partial u}{\partial x_n} \frac{\partial x_n}{\partial t_i} \qquad \text{for } 1 \leq i \leq m$$
Composite functions, partial derivatives, the chain rule, and forward and reverse automatic differentiation
Order of computation in forward and reverse mode
For the composite function:
$$
\begin{aligned}
y &= f(g(h(x))) = f(g(h(w_0))) = f(g(w_1)) = f(w_2) = w_3 \\
w_0 &= x \\
w_1 &= h(w_0) \\
w_2 &= g(w_1) \\
w_3 &= f(w_2) = y
\end{aligned}
$$
the chain rule gives:
$$\frac{\partial y}{\partial x} = \frac{\partial y}{\partial w_2} \frac{\partial w_2}{\partial w_1} \frac{\partial w_1}{\partial x} = \frac{\partial f(w_2)}{\partial w_2} \frac{\partial g(w_1)}{\partial w_1} \frac{\partial h(w_0)}{\partial x}$$
Order of computation:
- Forward mode first computes $\partial w_1 / \partial x$, then $\partial w_2 / \partial w_1$, and finally $\partial y / \partial w_2$.
- Reverse mode first computes $\partial y / \partial w_2$, then $\partial w_2 / \partial w_1$, and finally $\partial w_1 / \partial x$.
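The two orderings can be made concrete with a short sketch. The functions `h`, `g`, `f` below (a square, a sine, an exponential) are hypothetical stand-ins picked only so the chain has something to differentiate. Both passes multiply the same three local derivatives and differ only in which end they start from.

```python
import math

# Hypothetical stand-ins for h, g, f in y = f(g(h(x))).
def h(w0): return w0 ** 2
def g(w1): return math.sin(w1)
def f(w2): return math.exp(w2)

# Their local derivatives.
def dh(w0): return 2 * w0
def dg(w1): return math.cos(w1)
def df(w2): return math.exp(w2)


def forward_mode(x):
    """Carry dw_i/dx along with each value, starting from the input."""
    w0, dw0 = x, 1.0                  # seed: dw0/dx = 1
    w1, dw1 = h(w0), dh(w0) * dw0     # dw1/dx
    w2, dw2 = g(w1), dg(w1) * dw1     # dw2/dx
    w3, dw3 = f(w2), df(w2) * dw2     # dy/dx
    return w3, dw3


def reverse_mode(x):
    """Evaluate all values first, then carry dy/dw_i back from the output."""
    w0 = x
    w1 = h(w0)
    w2 = g(w1)
    w3 = f(w2)
    bar_w3 = 1.0                  # seed: dy/dw3 = 1
    bar_w2 = bar_w3 * df(w2)      # dy/dw2
    bar_w1 = bar_w2 * dg(w1)      # dy/dw1
    bar_w0 = bar_w1 * dh(w0)      # dy/dx
    return w3, bar_w0


y_fwd, dy_fwd = forward_mode(0.7)
y_rev, dy_rev = reverse_mode(0.7)
print(dy_fwd, dy_rev)
```

Note that the reverse pass needs the intermediate values `w0`, `w1`, `w2` from the evaluation, which is why it runs after the function has been evaluated.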
Forward differentiation
For the composite function:
$$
\begin{aligned}
r &= ? \\
s &= ? \\
t &= ? \\
x &= g(r,s,t) \\
y &= h(r,s,t) \\
z &= i(r,s,t) \\
u &= f(x,y,z)
\end{aligned}
$$
Forward-mode computation, with respect to a variable $v$ that is yet to be chosen:
$$
\begin{aligned}
\frac{\partial r}{\partial v} &= ? \\
\frac{\partial s}{\partial v} &= ? \\
\frac{\partial t}{\partial v} &= ? \\
\\
\frac{\partial x}{\partial v} &= \frac{\partial x}{\partial r}\frac{\partial r}{\partial v} + \frac{\partial x}{\partial s}\frac{\partial s}{\partial v} + \frac{\partial x}{\partial t}\frac{\partial t}{\partial v} \\
\frac{\partial y}{\partial v} &= \frac{\partial y}{\partial r}\frac{\partial r}{\partial v} + \frac{\partial y}{\partial s}\frac{\partial s}{\partial v} + \frac{\partial y}{\partial t}\frac{\partial t}{\partial v} \\
\frac{\partial z}{\partial v} &= \frac{\partial z}{\partial r}\frac{\partial r}{\partial v} + \frac{\partial z}{\partial s}\frac{\partial s}{\partial v} + \frac{\partial z}{\partial t}\frac{\partial t}{\partial v} \\
\\
\frac{\partial u}{\partial v} &= \frac{\partial u}{\partial x}\frac{\partial x}{\partial v} + \frac{\partial u}{\partial y}\frac{\partial y}{\partial v} + \frac{\partial u}{\partial z}\frac{\partial z}{\partial v}
\end{aligned}
$$
When $v = r$, i.e. $r$ is taken as the independent variable while $s$ and $t$ are held fixed, we get
$$
\begin{aligned}
\frac{\partial r}{\partial v} &= 1 \\
\frac{\partial s}{\partial v} &= 0 \\
\frac{\partial t}{\partial v} &= 0 \\
\frac{\partial u}{\partial r} &= \frac{\partial u}{\partial x}\frac{\partial x}{\partial r} + \frac{\partial u}{\partial y}\frac{\partial y}{\partial r} + \frac{\partial u}{\partial z}\frac{\partial z}{\partial r}
\end{aligned}
$$
When $v = s$, i.e. $s$ is taken as the independent variable while $r$ and $t$ are held fixed, we get
$$
\begin{aligned}
\frac{\partial r}{\partial v} &= 0 \\
\frac{\partial s}{\partial v} &= 1 \\
\frac{\partial t}{\partial v} &= 0 \\
\frac{\partial u}{\partial s} &= \frac{\partial u}{\partial x}\frac{\partial x}{\partial s} + \frac{\partial u}{\partial y}\frac{\partial y}{\partial s} + \frac{\partial u}{\partial z}\frac{\partial z}{\partial s}
\end{aligned}
$$
When $v = t$, i.e. $t$ is taken as the independent variable while $s$ and $r$ are held fixed, we get
$$
\begin{aligned}
\frac{\partial r}{\partial v} &= 0 \\
\frac{\partial s}{\partial v} &= 0 \\
\frac{\partial t}{\partial v} &= 1 \\
\frac{\partial u}{\partial t} &= \frac{\partial u}{\partial x}\frac{\partial x}{\partial t} + \frac{\partial u}{\partial y}\frac{\partial y}{\partial t} + \frac{\partial u}{\partial z}\frac{\partial z}{\partial t}
\end{aligned}
$$
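One common way to mechanize this seeding is to pair every value with its derivative with respect to the chosen $v$ (often called a dual number) and let each operation apply its own differentiation rule. The following is a minimal sketch under stated assumptions: the class name `Dual`, the helper `sin`, and the concrete choices for $g$, $h$, $i$, $f$ inside `u_of` are all hypothetical and only serve to make the seeds $(1,0,0)$, $(0,1,0)$, $(0,0,1)$ tangible.

```python
import math
from dataclasses import dataclass


@dataclass
class Dual:
    """A value paired with its derivative with respect to the chosen v."""
    value: float
    deriv: float

    def __add__(self, other):
        # Sum rule.
        return Dual(self.value + other.value, self.deriv + other.deriv)

    def __mul__(self, other):
        # Product rule.
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)


def sin(a):
    # Chain rule through sin.
    return Dual(math.sin(a.value), math.cos(a.value) * a.deriv)


def u_of(r, s, t):
    """Hypothetical stand-ins for g, h, i and f."""
    x = r * s           # x = g(r, s, t)
    y = s + t           # y = h(r, s, t)
    z = sin(r * t)      # z = i(r, s, t)
    return x * y + z    # u = f(x, y, z)


def partials(r, s, t):
    """One forward pass per input, each with a different seed."""
    du_dr = u_of(Dual(r, 1.0), Dual(s, 0.0), Dual(t, 0.0)).deriv  # v = r
    du_ds = u_of(Dual(r, 0.0), Dual(s, 1.0), Dual(t, 0.0)).deriv  # v = s
    du_dt = u_of(Dual(r, 0.0), Dual(s, 0.0), Dual(t, 1.0)).deriv  # v = t
    return du_dr, du_ds, du_dt


du_dr, du_ds, du_dt = partials(1.0, 2.0, 3.0)
print(du_dr, du_ds, du_dt)
```

For these particular stand-ins, $u = rs(s+t) + \sin(rt)$, so the three results can be compared by hand against $s(s+t) + t\cos(rt)$, $r(s+t) + rs$, and $rs + r\cos(rt)$.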
Reverse differentiation
For the composite function:
$$
\begin{aligned}
u_1 &= r(x_1, x_2) \\
u_2 &= s(x_1, x_2) \\
y_1 &= f(u_1, u_2) \\
y_2 &= g(u_1, u_2) \\
y_3 &= h(u_1, u_2)
\end{aligned}
$$
Reverse-mode computation:
$$
\begin{aligned}
\frac{\partial s}{\partial y_1} &= ? \\
\frac{\partial s}{\partial y_2} &= ? \\
\frac{\partial s}{\partial y_3} &= ? \\
\\
\frac{\partial s}{\partial u_1} &= \frac{\partial s}{\partial y_1}\frac{\partial y_1}{\partial u_1} + \frac{\partial s}{\partial y_2}\frac{\partial y_2}{\partial u_1} + \frac{\partial s}{\partial y_3}\frac{\partial y_3}{\partial u_1} \\
\frac{\partial s}{\partial u_2} &= \frac{\partial s}{\partial y_1}\frac{\partial y_1}{\partial u_2} + \frac{\partial s}{\partial y_2}\frac{\partial y_2}{\partial u_2} + \frac{\partial s}{\partial y_3}\frac{\partial y_3}{\partial u_2} \\
\\
\frac{\partial s}{\partial x_1} &= \frac{\partial s}{\partial u_1}\frac{\partial u_1}{\partial x_1} + \frac{\partial s}{\partial u_2}\frac{\partial u_2}{\partial x_1} \\
\frac{\partial s}{\partial x_2} &= \frac{\partial s}{\partial u_1}\frac{\partial u_1}{\partial x_2} + \frac{\partial s}{\partial u_2}\frac{\partial u_2}{\partial x_2}
\end{aligned}
$$
Here $s$ can be thought of as some function of the outputs, $s = \mathrm{function}(y_1, y_2, y_3)$. It is a quantity yet to be chosen and is unrelated to the function $s(x_1, x_2)$ that defines $u_2$.
When $s = y_1$, i.e. $y_1$ is taken as the independent variable while $y_2$ and $y_3$ are held fixed, we get
$$
\begin{aligned}
\frac{\partial s}{\partial y_1} &= 1 \\
\frac{\partial s}{\partial y_2} &= 0 \\
\frac{\partial s}{\partial y_3} &= 0 \\
\\
\frac{\partial s}{\partial u_1} &= \frac{\partial s}{\partial y_1}\frac{\partial y_1}{\partial u_1} \\
\frac{\partial s}{\partial u_2} &= \frac{\partial s}{\partial y_1}\frac{\partial y_1}{\partial u_2} \\
\\
\frac{\partial s}{\partial x_1} &= \frac{\partial s}{\partial u_1}\frac{\partial u_1}{\partial x_1} + \frac{\partial s}{\partial u_2}\frac{\partial u_2}{\partial x_1} \\
\frac{\partial s}{\partial x_2} &= \frac{\partial s}{\partial u_1}\frac{\partial u_1}{\partial x_2} + \frac{\partial s}{\partial u_2}\frac{\partial u_2}{\partial x_2}
\end{aligned}
$$
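The same bookkeeping can be sketched in code by letting every node remember which later nodes depend on it, then asking each input for $\partial s / \partial(\text{node})$ after the outputs have been seeded. This is only a minimal illustration: the class `Var`, the helper `gradient_of_output`, and the concrete two-input, three-output functions built from `+` and `*` are hypothetical choices, not something taken from the text above.

```python
class Var:
    """A node that remembers which later nodes depend on it."""

    def __init__(self, value):
        self.value = value
        self.children = []       # pairs (d child / d self, child)
        self.grad_value = None   # ds/d(self), filled in lazily

    def grad(self):
        if self.grad_value is None:
            # ds/d(self) = sum over children of (d child / d self) * (ds / d child)
            self.grad_value = sum(w * child.grad() for w, child in self.children)
        return self.grad_value

    def __add__(self, other):
        out = Var(self.value + other.value)
        self.children.append((1.0, out))
        other.children.append((1.0, out))
        return out

    def __mul__(self, other):
        out = Var(self.value * other.value)
        self.children.append((other.value, out))
        other.children.append((self.value, out))
        return out


def gradient_of_output(k, x1_value, x2_value):
    """Return (ds/dx1, ds/dx2) when s is the k-th output (k = 0, 1, 2)."""
    x1, x2 = Var(x1_value), Var(x2_value)
    u1 = x1 * x2                       # hypothetical u1
    u2 = x1 + x2                       # hypothetical u2
    ys = [u1 * u2, u1 + u2, u1 * u1]   # hypothetical y1, y2, y3
    for index, y in enumerate(ys):
        y.grad_value = 1.0 if index == k else 0.0   # seed ds/dy_i
    return x1.grad(), x2.grad()


rows = [gradient_of_output(k, 2.0, 3.0) for k in range(3)]
print(rows)
```

Each call performs one backward sweep and yields the derivatives of one output with respect to every input, so three outputs take three sweeps regardless of how many inputs there are.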
Automatic differentiation by example
Example
Suppose there are 2 input variables ($x_1$, $x_2$) and 2 output variables ($y_1$, $y_2$):
$$
\begin{aligned}
m_1 &= x_1 \cdot x_2 + \sin(x_1) \\
m_2 &= 4x_1 + 2x_2 + \cos(x_2) \\
y_1 &= m_1 + m_2 \\
y_2 &= m_1 \cdot m_2
\end{aligned} \tag{1}
$$
That is:
$$
\begin{aligned}
y_1 &= x_1 \cdot x_2 + \sin(x_1) + 4x_1 + 2x_2 + \cos(x_2) \\
y_2 &= (x_1 \cdot x_2 + \sin(x_1)) \cdot (4x_1 + 2x_2 + \cos(x_2))
\end{aligned}
$$
The partial derivatives are:
$$
\begin{aligned}
\frac{\partial y_1}{\partial x_1} &= x_2 + \cos(x_1) + 4 \\
\frac{\partial y_1}{\partial x_2} &= x_1 + 2 - \sin(x_2) \\
\frac{\partial y_2}{\partial x_1} &= (x_2 + \cos(x_1)) \cdot m_2 + m_1 \cdot 4 \\
\frac{\partial y_2}{\partial x_2} &= x_1 \cdot m_2 + m_1 \cdot (2 - \sin(x_2))
\end{aligned}
$$
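Before automating anything, it helps to have reference numbers to compare against. The sketch below evaluates formula (1), computes the partial derivatives of $y_1$ and $y_2$ by hand-applied sum and product rules, and checks them with central finite differences. The function names, the step `h`, and the evaluation point $(1, 2)$ are arbitrary choices for illustration.

```python
import math


def outputs(x1, x2):
    """Evaluate formula (1) and return (y1, y2)."""
    m1 = x1 * x2 + math.sin(x1)
    m2 = 4 * x1 + 2 * x2 + math.cos(x2)
    return m1 + m2, m1 * m2


def analytic_jacobian(x1, x2):
    """Rows are outputs (y1, y2), columns are inputs (x1, x2)."""
    m1 = x1 * x2 + math.sin(x1)
    m2 = 4 * x1 + 2 * x2 + math.cos(x2)
    dm1 = (x2 + math.cos(x1), x1)        # dm1/dx1, dm1/dx2
    dm2 = (4.0, 2.0 - math.sin(x2))      # dm2/dx1, dm2/dx2
    dy1 = (dm1[0] + dm2[0], dm1[1] + dm2[1])                       # sum rule
    dy2 = (dm1[0] * m2 + m1 * dm2[0], dm1[1] * m2 + m1 * dm2[1])   # product rule
    return [dy1, dy2]


def numeric_jacobian(x1, x2, h=1e-6):
    """Central finite differences, one input perturbed at a time."""
    plus1, minus1 = outputs(x1 + h, x2), outputs(x1 - h, x2)
    plus2, minus2 = outputs(x1, x2 + h), outputs(x1, x2 - h)
    return [((plus1[k] - minus1[k]) / (2 * h),
             (plus2[k] - minus2[k]) / (2 * h)) for k in range(2)]


exact = analytic_jacobian(1.0, 2.0)
approx = numeric_jacobian(1.0, 2.0)
max_error = max(abs(e - a)
                for row_e, row_a in zip(exact, approx)
                for e, a in zip(row_e, row_a))
print(max_error < 1e-6)
```

Finite differences are approximate and need two function evaluations per input, which is part of the motivation for automatic differentiation.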
Next, we use this example to show how forward-mode and reverse-mode automatic differentiation are carried out.
Forward-mode automatic differentiation
We will use the following form of the chain rule:
$$
\begin{aligned}
\frac{\partial w}{\partial t} &= \sum_i \left(\frac{\partial w}{\partial u_i} \cdot \frac{\partial u_i}{\partial t}\right) \\
&= \frac{\partial w}{\partial u_1} \cdot \frac{\partial u_1}{\partial t} + \frac{\partial w}{\partial u_2} \cdot \frac{\partial u_2}{\partial t} + \cdots
\end{aligned}
$$
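where:
- $w$ denotes an output variable
  - in the example, $y_1$ or $y_2$
- $u_i$ denotes the input variables that directly affect $w$
  - in the example, $m_1$ and $m_2$ when $w$ is $y_1$ or $y_2$ (and $a$ and $b$ when the rule is applied to the intermediate $m_1$)
- $t$ denotes the input variable that is yet to be chosen
  - in the example, either $x_1$ or $x_2$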
Before computing anything, we first break formula (1) down into elementary operations:
$$
\begin{aligned}
x_1 &= ? \\
x_2 &= ? \\
\\
a &= x_1 \cdot x_2 \\
b &= \sin(x_1) \\
\\
c &= 4x_1 + 2x_2 \\
d &= \cos(x_2) \\
\\
m_1 &= a + b \\
m_2 &= c + d \\
\\
y_1 &= m_1 + m_2 \\
y_2 &= m_1 \cdot m_2
\end{aligned} \tag{2}
$$
Now we differentiate each line with respect to the yet-to-be-chosen variable $t$:
$$
\begin{aligned}
\frac{\partial x_1}{\partial t} &= ? \\
\frac{\partial x_2}{\partial t} &= ? \\
\\
\frac{\partial a}{\partial t} &= x_2\frac{\partial x_1}{\partial t} + x_1\frac{\partial x_2}{\partial t} \\
\frac{\partial b}{\partial t} &= \cos(x_1)\frac{\partial x_1}{\partial t} \\
\\
\frac{\partial c}{\partial t} &= 4\frac{\partial x_1}{\partial t} + 2\frac{\partial x_2}{\partial t} \\
\frac{\partial d}{\partial t} &= -\sin(x_2)\frac{\partial x_2}{\partial t} \\
\\
\frac{\partial m_1}{\partial t} &= \frac{\partial a}{\partial t} + \frac{\partial b}{\partial t} \\
\frac{\partial m_2}{\partial t} &= \frac{\partial c}{\partial t} + \frac{\partial d}{\partial t} \\
\\
\frac{\partial y_1}{\partial t} &= \frac{\partial m_1}{\partial t} + \frac{\partial m_2}{\partial t} \\
\frac{\partial y_2}{\partial t} &= \frac{\partial m_1}{\partial t} \cdot m_2 + \frac{\partial m_2}{\partial t} \cdot m_1
\end{aligned}
$$
We said earlier that $t$ was yet to be chosen; now is the time to choose it:
- Substituting $t = x_1$ into the formulas above gives $\frac{\partial x_1}{\partial t} = 1$ and $\frac{\partial x_2}{\partial t} = 0$, from which we can compute $\frac{\partial y_1}{\partial x_1}$ and $\frac{\partial y_2}{\partial x_1}$:
$$
\begin{aligned}
\frac{\partial x_1}{\partial t} &= 1 \\
\frac{\partial x_2}{\partial t} &= 0 \\
\\
\frac{\partial a}{\partial t} &= x_2\frac{\partial x_1}{\partial t} + x_1\frac{\partial x_2}{\partial t} = x_2 \\
\frac{\partial b}{\partial t} &= \cos(x_1)\frac{\partial x_1}{\partial t} = \cos(x_1) \\
\\
\frac{\partial c}{\partial t} &= 4\frac{\partial x_1}{\partial t} + 2\frac{\partial x_2}{\partial t} = 4 \\
\frac{\partial d}{\partial t} &= -\sin(x_2)\frac{\partial x_2}{\partial t} = 0 \\
\\
\frac{\partial m_1}{\partial t} &= \frac{\partial a}{\partial t} + \frac{\partial b}{\partial t} = x_2 + \cos(x_1) \\
\frac{\partial m_2}{\partial t} &= \frac{\partial c}{\partial t} + \frac{\partial d}{\partial t} = 4 \\
\\
\frac{\partial y_1}{\partial t} &= \frac{\partial m_1}{\partial t} + \frac{\partial m_2}{\partial t} = x_2 + \cos(x_1) + 4 \\
\frac{\partial y_2}{\partial t} &= \frac{\partial m_1}{\partial t} \cdot m_2 + \frac{\partial m_2}{\partial t} \cdot m_1 = (x_2 + \cos(x_1)) \cdot m_2 + 4 \cdot m_1
\end{aligned}
$$
- Substituting $t = x_2$ into the formulas above gives $\frac{\partial x_1}{\partial t} = 0$ and $\frac{\partial x_2}{\partial t} = 1$, from which we can compute $\frac{\partial y_1}{\partial x_2}$ and $\frac{\partial y_2}{\partial x_2}$.
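The forward-mode derivation can be transcribed almost line for line into code. In the sketch below, each elementary operation of (2) is computed together with its derivative with respect to $t$, and the seeds `dx1`, `dx2` play the role of $\partial x_1 / \partial t$ and $\partial x_2 / \partial t$. The function name `forward` and the evaluation point $(1, 2)$ are illustrative assumptions.

```python
import math


def forward(x1, x2, dx1, dx2):
    """Evaluate (2) and its derivatives with respect to t in one pass.

    dx1 and dx2 are the seeds dx1/dt and dx2/dt.
    """
    a, da = x1 * x2, x2 * dx1 + x1 * dx2
    b, db = math.sin(x1), math.cos(x1) * dx1
    c, dc = 4 * x1 + 2 * x2, 4 * dx1 + 2 * dx2
    d, dd = math.cos(x2), -math.sin(x2) * dx2
    m1, dm1 = a + b, da + db
    m2, dm2 = c + d, dc + dd
    y1, dy1 = m1 + m2, dm1 + dm2
    y2, dy2 = m1 * m2, dm1 * m2 + dm2 * m1
    return (y1, y2), (dy1, dy2)


x1, x2 = 1.0, 2.0
_, column_x1 = forward(x1, x2, 1.0, 0.0)   # t = x1: (dy1/dx1, dy2/dx1)
_, column_x2 = forward(x1, x2, 0.0, 1.0)   # t = x2: (dy1/dx2, dy2/dx2)
print(column_x1, column_x2)
```

Each call returns the derivatives of both outputs with respect to one input, so the full set of partial derivatives takes one call per input.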
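We can conclude:
- With $n$ input variables (2 in this example), the formulas above have to be evaluated $n$ times.
- Suppose the input of a neural network is a 1280 x 720 image and the output is 51 floating-point numbers. Treating each pixel as one input, forward mode then needs 1280 × 720 = 921,600 evaluations.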
Reverse-mode automatic differentiation
We will use the following form of the chain rule:
$$
\begin{aligned}
\frac{\partial s}{\partial u} &= \sum_i \left(\frac{\partial w_i}{\partial u} \cdot \frac{\partial s}{\partial w_i}\right) \\
&= \frac{\partial w_1}{\partial u} \cdot \frac{\partial s}{\partial w_1} + \frac{\partial w_2}{\partial u} \cdot \frac{\partial s}{\partial w_2} + \cdots
\end{aligned}
$$
where:
- $u$ denotes an input variable
- $w_i$ denotes the output variables that depend on $u$
- $s$ denotes the variable that is yet to be chosen
Recall the decomposition into elementary operations (2):
$$
\begin{aligned}
x_1 &= ? \\
x_2 &= ? \\
\\
a &= x_1 \cdot x_2 \\
b &= \sin(x_1) \\
\\
c &= 4x_1 + 2x_2 \\
d &= \cos(x_2) \\
\\
m_1 &= a + b \\
m_2 &= c + d \\
\\
y_1 &= m_1 + m_2 \\
y_2 &= m_1 \cdot m_2
\end{aligned} \tag{2}
$$
Now compute the reverse-mode derivatives:
$$
\begin{aligned}
\frac{\partial s}{\partial y_1} &= ? \\
\frac{\partial s}{\partial y_2} &= ? \\
\\
\frac{\partial s}{\partial m_1} &= \frac{\partial s}{\partial y_1}\frac{\partial y_1}{\partial m_1} + \frac{\partial s}{\partial y_2}\frac{\partial y_2}{\partial m_1} \\
\frac{\partial s}{\partial m_2} &= \frac{\partial s}{\partial y_1}\frac{\partial y_1}{\partial m_2} + \frac{\partial s}{\partial y_2}\frac{\partial y_2}{\partial m_2} \\
\\
\frac{\partial s}{\partial a} &= \frac{\partial s}{\partial m_1}\frac{\partial m_1}{\partial a} \\
\frac{\partial s}{\partial b} &= \frac{\partial s}{\partial m_1}\frac{\partial m_1}{\partial b} \\
\frac{\partial s}{\partial c} &= \frac{\partial s}{\partial m_2}\frac{\partial m_2}{\partial c} \\
\frac{\partial s}{\partial d} &= \frac{\partial s}{\partial m_2}\frac{\partial m_2}{\partial d} \\
\\
\frac{\partial s}{\partial x_1} &= \frac{\partial s}{\partial a}\frac{\partial a}{\partial x_1} + \frac{\partial s}{\partial b}\frac{\partial b}{\partial x_1} + \frac{\partial s}{\partial c}\frac{\partial c}{\partial x_1} \\
\frac{\partial s}{\partial x_2} &= \frac{\partial s}{\partial a}\frac{\partial a}{\partial x_2} + \frac{\partial s}{\partial c}\frac{\partial c}{\partial x_2} + \frac{\partial s}{\partial d}\frac{\partial d}{\partial x_2}
\end{aligned}
$$
When $s = y_1$:
$$
\begin{aligned}
\frac{\partial s}{\partial y_1} &= 1 \\
\frac{\partial s}{\partial y_2} &= 0 \\
\\
\frac{\partial s}{\partial m_1} &= \frac{\partial s}{\partial y_1}\frac{\partial y_1}{\partial m_1} + \frac{\partial s}{\partial y_2}\frac{\partial y_2}{\partial m_1} = 1 \\
\frac{\partial s}{\partial m_2} &= \frac{\partial s}{\partial y_1}\frac{\partial y_1}{\partial m_2} + \frac{\partial s}{\partial y_2}\frac{\partial y_2}{\partial m_2} = 1 \\
\\
\frac{\partial s}{\partial a} &= \frac{\partial s}{\partial m_1}\frac{\partial m_1}{\partial a} = 1 \\
\frac{\partial s}{\partial b} &= \frac{\partial s}{\partial m_1}\frac{\partial m_1}{\partial b} = 1 \\
\frac{\partial s}{\partial c} &= \frac{\partial s}{\partial m_2}\frac{\partial m_2}{\partial c} = 1 \\
\frac{\partial s}{\partial d} &= \frac{\partial s}{\partial m_2}\frac{\partial m_2}{\partial d} = 1 \\
\\
\frac{\partial s}{\partial x_1} &= \frac{\partial s}{\partial a}\frac{\partial a}{\partial x_1} + \frac{\partial s}{\partial b}\frac{\partial b}{\partial x_1} + \frac{\partial s}{\partial c}\frac{\partial c}{\partial x_1} = 1 \cdot x_2 + 1 \cdot \cos(x_1) + 1 \cdot 4 = x_2 + \cos(x_1) + 4 \\
\frac{\partial s}{\partial x_2} &= \frac{\partial s}{\partial a}\frac{\partial a}{\partial x_2} + \frac{\partial s}{\partial c}\frac{\partial c}{\partial x_2} + \frac{\partial s}{\partial d}\frac{\partial d}{\partial x_2} = 1 \cdot x_1 + 1 \cdot 2 + 1 \cdot (-\sin(x_2)) = x_1 + 2 - \sin(x_2)
\end{aligned}
$$
The case $s = y_2$ is computed in the same way.
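The reverse-mode derivation can likewise be written out directly. The sketch below first evaluates the intermediate values it will need, then sweeps from the outputs back to the inputs; the seeds `s_y1`, `s_y2` stand for $\partial s / \partial y_1$ and $\partial s / \partial y_2$. The function name `reverse` and the evaluation point $(1, 2)$ are illustrative assumptions.

```python
import math


def reverse(x1, x2, s_y1, s_y2):
    """Return (ds/dx1, ds/dx2); s_y1 and s_y2 are the seeds ds/dy1, ds/dy2."""
    # Forward sweep: keep the values the backward sweep will need.
    m1 = x1 * x2 + math.sin(x1)
    m2 = 4 * x1 + 2 * x2 + math.cos(x2)
    # Backward sweep, from outputs to inputs.
    s_m1 = s_y1 * 1.0 + s_y2 * m2    # dy1/dm1 = 1, dy2/dm1 = m2
    s_m2 = s_y1 * 1.0 + s_y2 * m1    # dy1/dm2 = 1, dy2/dm2 = m1
    s_a = s_m1                       # dm1/da = 1
    s_b = s_m1                       # dm1/db = 1
    s_c = s_m2                       # dm2/dc = 1
    s_d = s_m2                       # dm2/dd = 1
    s_x1 = s_a * x2 + s_b * math.cos(x1) + s_c * 4.0
    s_x2 = s_a * x1 + s_c * 2.0 + s_d * (-math.sin(x2))
    return s_x1, s_x2


row_y1 = reverse(1.0, 2.0, 1.0, 0.0)   # s = y1: (dy1/dx1, dy1/dx2)
row_y2 = reverse(1.0, 2.0, 0.0, 1.0)   # s = y2: (dy2/dx1, dy2/dx2)
print(row_y1, row_y2)
```

Each call returns the derivatives of one output with respect to both inputs, so the number of calls grows with the number of outputs rather than the number of inputs.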
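We can conclude:
- With $n$ output variables (2 in this example), the formulas above have to be evaluated $n$ times.
- Suppose the input of a neural network is a 1280 x 720 image and the output is 51 floating-point numbers. Reverse mode then needs only 51 evaluations.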