Forward and Reverse Automatic Differentiation of Composite Functions

About

  • First published: 2024-09-13
  • References:
    • https://rufflewind.com/2016-12-30/reverse-mode-automatic-differentiation
    • James Stewart, Calculus: Early Transcendentals, 9th ed. (2020)
    • https://en.wikipedia.org/wiki/Automatic_differentiation
  • My knowledge is limited; if you spot any mistakes, please point them out.

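Forward and Reverse Automatic Differentiation: The Math

We begin with a review of the differentiation rules from calculus.

Review of Differentiation Rules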

Product Rule

$$
y = f(x) = u(x) \times v(x)
$$

$$
\begin{aligned}
\frac{dy}{dx} &= \frac{du}{dx} \times v + \frac{dv}{dx} \times u \\
f'(x) &= u'v + v'u
\end{aligned}
$$

For example:

$$
\begin{aligned}
f(x) &= (3x - 5) \times (4x + 7) \\
u &= 3x - 5 \quad v = 4x + 7 \\
u' &= 3 \quad v' = 4 \\
f'(x) &= 3(4x + 7) + 4(3x - 5) \\
&= 12x + 21 + 12x - 20 \\
&= 24x + 1
\end{aligned}
$$

Quotient Rule

$$
y = f(x) = \frac{u(x)}{v(x)}
$$

$$
\begin{aligned}
f'(x) &= \frac{u'v - v'u}{v^2} \\
\frac{dy}{dx} &= \frac{\frac{du}{dx}v - \frac{dv}{dx}u}{v^2}
\end{aligned}
$$

For example:

$$
\begin{aligned}
f(x) &= \frac{3x - 5}{4x + 7} \\
u &= 3x - 5 \quad v = 4x + 7 \\
u' &= 3 \quad v' = 4 \\
f'(x) &= \frac{3(4x + 7) - 4(3x - 5)}{(4x + 7)^2} \\
&= \frac{12x + 21 - 12x + 20}{(4x + 7)^2} \\
&= \frac{41}{(4x + 7)^2}
\end{aligned}
$$
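The two worked examples for the product and quotient rules can be sanity-checked numerically. The sketch below compares the hand-derived derivatives against a central finite difference; the helper name `central_difference` and the step size `h` are illustrative choices, not part of the original derivation.

```python
def central_difference(f, x, h=1e-6):
    """Approximate f'(x) with a symmetric finite difference."""
    return (f(x + h) - f(x - h)) / (2 * h)


def product(x):
    return (3 * x - 5) * (4 * x + 7)


def product_derivative(x):
    # Result of the product rule: f'(x) = 24x + 1
    return 24 * x + 1


def quotient(x):
    return (3 * x - 5) / (4 * x + 7)


def quotient_derivative(x):
    # Result of the quotient rule: f'(x) = 41 / (4x + 7)^2
    return 41 / (4 * x + 7) ** 2


# Compare analytic and numeric derivatives at a few sample points
# (x = -1.75 is avoided because the quotient is undefined there).
for x in (0.0, 1.5, -3.0):
    print(x, product_derivative(x), central_difference(product, x))
    print(x, quotient_derivative(x), central_difference(quotient, x))
```

The two columns printed for each function should agree to several decimal places; a finite difference is only an approximation, so exact equality is not expected.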

Derivatives of sin and cos

$$
\begin{aligned}
y &= \sin(x) \\
\frac{dy}{dx} &= \cos(x)
\end{aligned}
$$

$$
\begin{aligned}
y &= \cos(x) \\
\frac{dy}{dx} &= -\sin(x)
\end{aligned}
$$

Chain Rule (Single-Variable Composite Functions)

$$
y = f(u) \quad u = g(x)
$$

$$
\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}
$$

For example:

$$
\begin{aligned}
y &= (2x + 4)^3 \\
y &= u^3 \text{ and } u = 2x + 4 \\
\frac{dy}{du} &= 3u^2 \quad \frac{du}{dx} = 2 \\
\frac{dy}{dx} &= 3u^2 \times 2 = 2 \times 3(2x + 4)^2 \\
&= 6(2x + 4)^2
\end{aligned}
$$
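The chain-rule example can be written as a few lines of code that multiply the two local derivatives. This is only a sketch of the worked example; the function name `dy_dx` is chosen here for illustration.

```python
def dy_dx(x):
    """Differentiate y = (2x + 4)**3 by multiplying the two local derivatives."""
    u = 2 * x + 4        # inner function u(x)
    dy_du = 3 * u ** 2   # derivative of the outer function y = u**3
    du_dx = 2            # derivative of the inner function u = 2x + 4
    return dy_du * du_dx


print(dy_dx(1))  # 216, the same as 6 * (2 * 1 + 4) ** 2
```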

Multivariable Chain Rule (Case 1)

$$
\begin{aligned}
z &= f(x, y) \\
x &= g(t) \\
y &= h(t)
\end{aligned}
$$

$$
\frac{dz}{dt} = \frac{\partial f}{\partial x} \frac{dx}{dt} + \frac{\partial f}{\partial y} \frac{dy}{dt}
$$

Multivariable Chain Rule (Case 2)

$$
\begin{aligned}
z &= f(x, y) \\
x &= g(s, t) \\
y &= h(s, t)
\end{aligned}
$$

$$
\frac{\partial z}{\partial s} = \frac{\partial z}{\partial x} \frac{\partial x}{\partial s} + \frac{\partial z}{\partial y} \frac{\partial y}{\partial s} \quad
\frac{\partial z}{\partial t} = \frac{\partial z}{\partial x} \frac{\partial x}{\partial t} + \frac{\partial z}{\partial y} \frac{\partial y}{\partial t}
$$

When computing $\frac{\partial z}{\partial s}$, we hold $t$ fixed and take the ordinary derivative of $z$ with respect to $s$, which is exactly an application of the multivariable chain rule (Case 1). Computing $\frac{\partial z}{\partial t}$ works the same way, with $s$ held fixed.

Multivariable Chain Rule (General Version)

$$
\begin{aligned}
u &= f(x_1, x_2, \ldots, x_n) \\
x_k &= g_k(t_1, t_2, \ldots, t_m) \qquad \text{for } 1 \leq k \leq n
\end{aligned}
$$

$$
\frac{\partial u}{\partial t_i} = \frac{\partial u}{\partial x_1} \frac{\partial x_1}{\partial t_i} + \frac{\partial u}{\partial x_2} \frac{\partial x_2}{\partial t_i} + \cdots + \frac{\partial u}{\partial x_n} \frac{\partial x_n}{\partial t_i} \qquad \text{for } 1 \leq i \leq m
$$
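The general chain rule is a sum over the intermediate variables $x_k$, which translates directly into a short loop. The sketch below assumes the local derivatives are already known and passed in as plain lists; the function name `chain_rule` and the sample functions ($u = x_1 x_2$, $x_1 = t_1 + t_2$, $x_2 = t_1 t_2$) are hypothetical and used only for illustration.

```python
def chain_rule(du_dx, dx_dt):
    """Combine local derivatives with the general multivariable chain rule.

    du_dx: list of n values, du_dx[k] = du/dx_k
    dx_dt: n rows of m values, dx_dt[k][i] = dx_k/dt_i
    Returns a list of m values, the i-th being du/dt_i.
    """
    n = len(du_dx)
    m = len(dx_dt[0])
    return [sum(du_dx[k] * dx_dt[k][i] for k in range(n)) for i in range(m)]


# Hypothetical example: u = x1 * x2 with x1 = t1 + t2 and x2 = t1 * t2.
t1, t2 = 2.0, 3.0
x1, x2 = t1 + t2, t1 * t2
du_dx = [x2, x1]                  # du/dx1 = x2, du/dx2 = x1
dx_dt = [[1.0, 1.0], [t2, t1]]    # rows: x1, x2; columns: t1, t2
du_dt = chain_rule(du_dx, dx_dt)
print(du_dt)  # [21.0, 16.0]
```

Expanding by hand, $u = t_1^2 t_2 + t_1 t_2^2$, so $\partial u / \partial t_1 = 2 t_1 t_2 + t_2^2 = 21$ and $\partial u / \partial t_2 = t_1^2 + 2 t_1 t_2 = 16$ at $(2, 3)$, matching the output.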

Composite Functions, Partial Derivatives, the Chain Rule, and Forward and Reverse Automatic Differentiation

Evaluation Order in Forward and Reverse Mode

For the composite function:

$$
\begin{aligned}
y &= f(g(h(x))) = f(g(h(w_0))) = f(g(w_1)) = f(w_2) = w_3 \\
w_0 &= x \\
w_1 &= h(w_0) \\
w_2 &= g(w_1) \\
w_3 &= f(w_2) = y
\end{aligned}
$$

the chain rule gives:

$$
\frac{\partial y}{\partial x} = \frac{\partial y}{\partial w_2} \frac{\partial w_2}{\partial w_1} \frac{\partial w_1}{\partial x} = \frac{\partial f(w_2)}{\partial w_2} \frac{\partial g(w_1)}{\partial w_1} \frac{\partial h(w_0)}{\partial x}
$$

Evaluation order:

  • Forward differentiation first computes $\partial w_1 / \partial x$, then $\partial w_2 / \partial w_1$, and finally $\partial y / \partial w_2$.
  • Reverse differentiation first computes $\partial y / \partial w_2$, then $\partial w_2 / \partial w_1$, and finally $\partial w_1 / \partial x$.
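
The two evaluation orders can be compared on a small chain. The sketch below assumes, purely for illustration, $h(x) = x^2$, $g(w) = \sin(w)$ and $f(w) = e^w$; these functions and the name `forward_and_reverse` are not from the original text. Both orders multiply the same three local derivatives, only starting from opposite ends.

```python
import math


def forward_and_reverse(x):
    """Differentiate y = exp(sin(x**2)) in both accumulation orders."""
    # Evaluate the chain w0 -> w1 -> w2 -> w3.
    w0 = x
    w1 = w0 * w0
    w2 = math.sin(w1)
    w3 = math.exp(w2)

    # Local derivatives of each step.
    dw1_dw0 = 2 * w0
    dw2_dw1 = math.cos(w1)
    dw3_dw2 = math.exp(w2)

    # Forward order: start at the input and move toward the output.
    forward = 1.0                  # dw0/dx
    forward = dw1_dw0 * forward    # dw1/dx
    forward = dw2_dw1 * forward    # dw2/dx
    forward = dw3_dw2 * forward    # dy/dx

    # Reverse order: start at the output and move toward the input.
    reverse = 1.0                  # dy/dw3
    reverse = reverse * dw3_dw2    # dy/dw2
    reverse = reverse * dw2_dw1    # dy/dw1
    reverse = reverse * dw1_dw0    # dy/dx

    return w3, forward, reverse


y, forward, reverse = forward_and_reverse(0.7)
print(y, forward, reverse)
```

For a single input and a single output the two orders do the same amount of work; the difference only matters once there are several inputs or several outputs.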

Forward Differentiation

For the composite function below, where $r$, $s$ and $t$ are inputs whose values are supplied later (written as $?$):

$$
\begin{aligned}
r &= ? \\
s &= ? \\
t &= ? \\
x &= g(r, s, t) \\
y &= h(r, s, t) \\
z &= i(r, s, t) \\
u &= f(x, y, z)
\end{aligned}
$$

forward differentiation computes, for a yet-to-be-chosen variable $v$:

$$
\begin{aligned}
\frac{\partial r}{\partial v} &= ? \\
\frac{\partial s}{\partial v} &= ? \\
\frac{\partial t}{\partial v} &= ? \\
\\
\frac{\partial x}{\partial v} &= \frac{\partial x}{\partial r}\frac{\partial r}{\partial v} + \frac{\partial x}{\partial s}\frac{\partial s}{\partial v} + \frac{\partial x}{\partial t}\frac{\partial t}{\partial v} \\
\frac{\partial y}{\partial v} &= \frac{\partial y}{\partial r}\frac{\partial r}{\partial v} + \frac{\partial y}{\partial s}\frac{\partial s}{\partial v} + \frac{\partial y}{\partial t}\frac{\partial t}{\partial v} \\
\frac{\partial z}{\partial v} &= \frac{\partial z}{\partial r}\frac{\partial r}{\partial v} + \frac{\partial z}{\partial s}\frac{\partial s}{\partial v} + \frac{\partial z}{\partial t}\frac{\partial t}{\partial v} \\
\\
\frac{\partial u}{\partial v} &= \frac{\partial u}{\partial x}\frac{\partial x}{\partial v} + \frac{\partial u}{\partial y}\frac{\partial y}{\partial v} + \frac{\partial u}{\partial z}\frac{\partial z}{\partial v}
\end{aligned}
$$

When $v = r$, that is, when $r$ is treated as the independent variable and $s$ and $t$ are held fixed, we get:

$$
\begin{aligned}
\frac{\partial r}{\partial v} &= 1 \\
\frac{\partial s}{\partial v} &= 0 \\
\frac{\partial t}{\partial v} &= 0 \\
\frac{\partial u}{\partial r} &= \frac{\partial u}{\partial x}\frac{\partial x}{\partial r} + \frac{\partial u}{\partial y}\frac{\partial y}{\partial r} + \frac{\partial u}{\partial z}\frac{\partial z}{\partial r}
\end{aligned}
$$

When $v = s$, that is, when $s$ is treated as the independent variable and $r$ and $t$ are held fixed, we get:

$$
\begin{aligned}
\frac{\partial r}{\partial v} &= 0 \\
\frac{\partial s}{\partial v} &= 1 \\
\frac{\partial t}{\partial v} &= 0 \\
\frac{\partial u}{\partial s} &= \frac{\partial u}{\partial x}\frac{\partial x}{\partial s} + \frac{\partial u}{\partial y}\frac{\partial y}{\partial s} + \frac{\partial u}{\partial z}\frac{\partial z}{\partial s}
\end{aligned}
$$

When $v = t$, that is, when $t$ is treated as the independent variable and $s$ and $r$ are held fixed, we get:

$$
\begin{aligned}
\frac{\partial r}{\partial v} &= 0 \\
\frac{\partial s}{\partial v} &= 0 \\
\frac{\partial t}{\partial v} &= 1 \\
\frac{\partial u}{\partial t} &= \frac{\partial u}{\partial x}\frac{\partial x}{\partial t} + \frac{\partial u}{\partial y}\frac{\partial y}{\partial t} + \frac{\partial u}{\partial z}\frac{\partial z}{\partial t}
\end{aligned}
$$
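
As a concrete instance of the $v = r$ case, take $g = rs$, $h = s + t$, $i = rt$ and $f = xy + z$. These particular functions are chosen here only for illustration and do not come from the derivation above.

$$
\begin{aligned}
x &= rs \quad y = s + t \quad z = rt \quad u = xy + z \\
\frac{\partial x}{\partial r} &= s \quad \frac{\partial y}{\partial r} = 0 \quad \frac{\partial z}{\partial r} = t \\
\frac{\partial u}{\partial x} &= y \quad \frac{\partial u}{\partial y} = x \quad \frac{\partial u}{\partial z} = 1 \\
\frac{\partial u}{\partial r} &= y \cdot s + x \cdot 0 + 1 \cdot t = (s + t)s + t = s^2 + st + t
\end{aligned}
$$

Expanding directly gives $u = rs^2 + rst + rt$, whose derivative with respect to $r$ is also $s^2 + st + t$, so the seeded forward pass agrees with differentiating the expanded expression.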

Reverse Differentiation

For the composite function:

$$
\begin{aligned}
u_1 &= p(x_1, x_2) \\
u_2 &= q(x_1, x_2) \\
y_1 &= f(u_1, u_2) \\
y_2 &= g(u_1, u_2) \\
y_3 &= h(u_1, u_2)
\end{aligned}
$$

reverse differentiation computes, for a yet-to-be-chosen variable $s$:

$$
\begin{aligned}
\frac{\partial s}{\partial y_1} &= ? \\
\frac{\partial s}{\partial y_2} &= ? \\
\frac{\partial s}{\partial y_3} &= ? \\
\\
\frac{\partial s}{\partial u_1} &= \frac{\partial s}{\partial y_1}\frac{\partial y_1}{\partial u_1} + \frac{\partial s}{\partial y_2}\frac{\partial y_2}{\partial u_1} + \frac{\partial s}{\partial y_3}\frac{\partial y_3}{\partial u_1} \\
\frac{\partial s}{\partial u_2} &= \frac{\partial s}{\partial y_1}\frac{\partial y_1}{\partial u_2} + \frac{\partial s}{\partial y_2}\frac{\partial y_2}{\partial u_2} + \frac{\partial s}{\partial y_3}\frac{\partial y_3}{\partial u_2} \\
\\
\frac{\partial s}{\partial x_1} &= \frac{\partial s}{\partial u_1}\frac{\partial u_1}{\partial x_1} + \frac{\partial s}{\partial u_2}\frac{\partial u_2}{\partial x_1} \\
\frac{\partial s}{\partial x_2} &= \frac{\partial s}{\partial u_1}\frac{\partial u_1}{\partial x_2} + \frac{\partial s}{\partial u_2}\frac{\partial u_2}{\partial x_2}
\end{aligned}
$$

One can imagine a function $s = \mathrm{function}(y_1, y_2, y_3)$ of the outputs.

When $s = y_1$, that is, when $y_1$ is treated as the independent variable and $y_2$ and $y_3$ are held fixed, we get:

$$
\begin{aligned}
\frac{\partial s}{\partial y_1} &= 1 \\
\frac{\partial s}{\partial y_2} &= 0 \\
\frac{\partial s}{\partial y_3} &= 0 \\
\\
\frac{\partial s}{\partial u_1} &= \frac{\partial s}{\partial y_1}\frac{\partial y_1}{\partial u_1} \\
\frac{\partial s}{\partial u_2} &= \frac{\partial s}{\partial y_1}\frac{\partial y_1}{\partial u_2} \\
\\
\frac{\partial s}{\partial x_1} &= \frac{\partial s}{\partial u_1}\frac{\partial u_1}{\partial x_1} + \frac{\partial s}{\partial u_2}\frac{\partial u_2}{\partial x_1} \\
\frac{\partial s}{\partial x_2} &= \frac{\partial s}{\partial u_1}\frac{\partial u_1}{\partial x_2} + \frac{\partial s}{\partial u_2}\frac{\partial u_2}{\partial x_2}
\end{aligned}
$$
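
As a concrete instance of the $s = y_1$ case, take $u_1 = x_1 x_2$, $u_2 = x_1 + x_2$ and $y_1 = u_1 u_2$. These particular functions are chosen here only for illustration; $y_2$ and $y_3$ drop out because their seeds are zero.

$$
\begin{aligned}
\frac{\partial s}{\partial u_1} &= 1 \cdot \frac{\partial y_1}{\partial u_1} = u_2 \\
\frac{\partial s}{\partial u_2} &= 1 \cdot \frac{\partial y_1}{\partial u_2} = u_1 \\
\frac{\partial s}{\partial x_1} &= u_2 \cdot x_2 + u_1 \cdot 1 = (x_1 + x_2)x_2 + x_1 x_2 = 2 x_1 x_2 + x_2^2 \\
\frac{\partial s}{\partial x_2} &= u_2 \cdot x_1 + u_1 \cdot 1 = (x_1 + x_2)x_1 + x_1 x_2 = x_1^2 + 2 x_1 x_2
\end{aligned}
$$

Expanding directly gives $y_1 = x_1^2 x_2 + x_1 x_2^2$, whose partial derivatives are the same two expressions. Note that a single reverse pass produced the derivatives with respect to both inputs at once.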

Automatic Differentiation by Example

Example

Suppose there are 2 input variables ($x_1$, $x_2$) and 2 output variables ($y_1$, $y_2$):

$$
\begin{aligned}
m_1 &= x_1 \cdot x_2 + \sin(x_1) \\
m_2 &= 4x_1 + 2x_2 + \cos(x_2) \\
y_1 &= m_1 + m_2 \\
y_2 &= m_1 \cdot m_2
\end{aligned} \tag{1}
$$

That is:

$$
\begin{aligned}
y_1 &= x_1 \cdot x_2 + \sin(x_1) + 4x_1 + 2x_2 + \cos(x_2) \\
y_2 &= (x_1 \cdot x_2 + \sin(x_1)) \cdot (4x_1 + 2x_2 + \cos(x_2))
\end{aligned}
$$

The partial derivatives are:

$$
\begin{aligned}
\frac{\partial y_1}{\partial x_1} &= x_2 + \cos(x_1) + 4 \\
\frac{\partial y_1}{\partial x_2} &= x_1 + 2 - \sin(x_2) \\
\frac{\partial y_2}{\partial x_1} &= (x_2 + \cos(x_1)) \cdot m_2 + m_1 \cdot 4 \\
\frac{\partial y_2}{\partial x_2} &= x_1 \cdot m_2 + m_1 \cdot (2 - \sin(x_2))
\end{aligned}
$$
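
Before automating anything, the hand-derived partial derivatives of this example can be checked against finite differences. The sketch below is only a sanity check; the function names and the step size `h` are illustrative choices. The entry $\partial y_2 / \partial x_2$ is obtained with the same product rule as $\partial y_2 / \partial x_1$.

```python
import math


def outputs(x1, x2):
    """Evaluate (y1, y2) from equation (1)."""
    m1 = x1 * x2 + math.sin(x1)
    m2 = 4 * x1 + 2 * x2 + math.cos(x2)
    return m1 + m2, m1 * m2


def closed_form_jacobian(x1, x2):
    """Return [[dy1/dx1, dy1/dx2], [dy2/dx1, dy2/dx2]] from the hand-derived formulas."""
    m1 = x1 * x2 + math.sin(x1)
    m2 = 4 * x1 + 2 * x2 + math.cos(x2)
    dm1_dx1, dm1_dx2 = x2 + math.cos(x1), x1
    dm2_dx1, dm2_dx2 = 4, 2 - math.sin(x2)
    return [
        [dm1_dx1 + dm2_dx1, dm1_dx2 + dm2_dx2],
        [dm1_dx1 * m2 + m1 * dm2_dx1, dm1_dx2 * m2 + m1 * dm2_dx2],
    ]


def finite_difference_jacobian(x1, x2, h=1e-6):
    """Approximate the same Jacobian with central differences."""
    plus1, minus1 = outputs(x1 + h, x2), outputs(x1 - h, x2)
    plus2, minus2 = outputs(x1, x2 + h), outputs(x1, x2 - h)
    return [
        [(plus1[i] - minus1[i]) / (2 * h), (plus2[i] - minus2[i]) / (2 * h)]
        for i in range(2)
    ]


print(closed_form_jacobian(1.0, 2.0))
print(finite_difference_jacobian(1.0, 2.0))
```

The two printed matrices should agree to several decimal places.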

Next, we use this example to show how forward and reverse automatic differentiation are carried out.

Forward Automatic Differentiation

We will use the chain rule in the following form:

$$
\begin{aligned}
\frac{\partial w}{\partial t} &= \sum_i \left(\frac{\partial w}{\partial u_i} \cdot \frac{\partial u_i}{\partial t}\right) \\
&= \frac{\partial w}{\partial u_1} \cdot \frac{\partial u_1}{\partial t} + \frac{\partial w}{\partial u_2} \cdot \frac{\partial u_2}{\partial t} + \cdots
\end{aligned}
$$

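where:

  • $w$ denotes the output variable
    • in the example, ultimately $y_1$ or $y_2$; the same rule is applied to each intermediate variable along the way
  • $u_i$ denotes the input variables that directly affect $w$
    • in the example, $a$ and $b$ (the direct inputs of $m_1$)
  • $t$ denotes the yet-to-be-given input variable
    • in the example, one of $x_1$ or $x_2$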

Before computing anything, we first decompose equation (1) into elementary operations:

$$
\begin{aligned}
x_1 &= ? \\
x_2 &= ? \\
\\
a &= x_1 \cdot x_2 \\
b &= \sin(x_1) \\
\\
c &= 4x_1 + 2x_2 \\
d &= \cos(x_2) \\
\\
m_1 &= a + b \\
m_2 &= c + d \\
\\
y_1 &= m_1 + m_2 \\
y_2 &= m_1 \cdot m_2
\end{aligned} \tag{2}
$$

Now we differentiate each line with respect to the yet-to-be-given variable $t$:

$$
\begin{aligned}
\frac{\partial x_1}{\partial t} &= ? \\
\frac{\partial x_2}{\partial t} &= ? \\
\\
\frac{\partial a}{\partial t} &= x_2 \frac{\partial x_1}{\partial t} + x_1 \frac{\partial x_2}{\partial t} \\
\frac{\partial b}{\partial t} &= \cos(x_1) \frac{\partial x_1}{\partial t} \\
\\
\frac{\partial c}{\partial t} &= 4 \frac{\partial x_1}{\partial t} + 2 \frac{\partial x_2}{\partial t} \\
\frac{\partial d}{\partial t} &= -\sin(x_2) \frac{\partial x_2}{\partial t} \\
\\
\frac{\partial m_1}{\partial t} &= \frac{\partial a}{\partial t} + \frac{\partial b}{\partial t} \\
\frac{\partial m_2}{\partial t} &= \frac{\partial c}{\partial t} + \frac{\partial d}{\partial t} \\
\\
\frac{\partial y_1}{\partial t} &= \frac{\partial m_1}{\partial t} + \frac{\partial m_2}{\partial t} \\
\frac{\partial y_2}{\partial t} &= \frac{\partial m_1}{\partial t} \cdot m_2 + \frac{\partial m_2}{\partial t} \cdot m_1
\end{aligned}
$$

Earlier we said that $t$ was yet to be given; now is the time to give it:

  • Substituting $t = x_1$ into the formulas above gives $\frac{\partial x_1}{\partial t} = 1$ and $\frac{\partial x_2}{\partial t} = 0$, from which we can compute $\frac{\partial y_1}{\partial x_1}$ and $\frac{\partial y_2}{\partial x_1}$:

$$
\begin{aligned}
\frac{\partial x_1}{\partial t} &= 1 \\
\frac{\partial x_2}{\partial t} &= 0 \\
\\
\frac{\partial a}{\partial t} &= x_2 \frac{\partial x_1}{\partial t} + x_1 \frac{\partial x_2}{\partial t} = x_2 \\
\frac{\partial b}{\partial t} &= \cos(x_1) \frac{\partial x_1}{\partial t} = \cos(x_1) \\
\\
\frac{\partial c}{\partial t} &= 4 \frac{\partial x_1}{\partial t} + 2 \frac{\partial x_2}{\partial t} = 4 \\
\frac{\partial d}{\partial t} &= -\sin(x_2) \frac{\partial x_2}{\partial t} = 0 \\
\\
\frac{\partial m_1}{\partial t} &= \frac{\partial a}{\partial t} + \frac{\partial b}{\partial t} = x_2 + \cos(x_1) \\
\frac{\partial m_2}{\partial t} &= \frac{\partial c}{\partial t} + \frac{\partial d}{\partial t} = 4 \\
\\
\frac{\partial y_1}{\partial t} &= \frac{\partial m_1}{\partial t} + \frac{\partial m_2}{\partial t} = x_2 + \cos(x_1) + 4 \\
\frac{\partial y_2}{\partial t} &= \frac{\partial m_1}{\partial t} \cdot m_2 + \frac{\partial m_2}{\partial t} \cdot m_1 = (x_2 + \cos(x_1)) \cdot m_2 + 4 \cdot m_1
\end{aligned}
$$

  • Substituting $t = x_2$ into the formulas above gives $\frac{\partial x_1}{\partial t} = 0$ and $\frac{\partial x_2}{\partial t} = 1$, from which we can compute $\frac{\partial y_1}{\partial x_2}$ and $\frac{\partial y_2}{\partial x_2}$ in the same way.
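
The forward pass above maps line by line onto code: each elementary operation in (2) carries its value together with its derivative with respect to $t$. The following is a minimal sketch of that idea; the function name `forward_pass` and the seed parameters `dx1` and `dx2` (standing for $\partial x_1 / \partial t$ and $\partial x_2 / \partial t$) are names chosen here for illustration.

```python
import math


def forward_pass(x1, x2, dx1, dx2):
    """One forward-mode sweep over decomposition (2).

    dx1 and dx2 are the seeds dx1/dt and dx2/dt.
    Returns ((y1, y2), (dy1/dt, dy2/dt)).
    """
    a = x1 * x2
    da = x2 * dx1 + x1 * dx2
    b = math.sin(x1)
    db = math.cos(x1) * dx1

    c = 4 * x1 + 2 * x2
    dc = 4 * dx1 + 2 * dx2
    d = math.cos(x2)
    dd = -math.sin(x2) * dx2

    m1 = a + b
    dm1 = da + db
    m2 = c + d
    dm2 = dc + dd

    y1 = m1 + m2
    dy1 = dm1 + dm2
    y2 = m1 * m2
    dy2 = dm1 * m2 + dm2 * m1
    return (y1, y2), (dy1, dy2)


# Seed t = x1: yields dy1/dx1 and dy2/dx1.
print(forward_pass(1.0, 2.0, 1.0, 0.0)[1])
# Seed t = x2: yields dy1/dx2 and dy2/dx2.
print(forward_pass(1.0, 2.0, 0.0, 1.0)[1])
```

Each call yields the derivatives of all outputs with respect to one input, so covering every input takes one call per input.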

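We can infer:

  • With $n$ input variables (2 in this example), the formulas above must be evaluated $n$ times.
  • Suppose the input to a neural network is a 1280 x 720 image and the output is 51 floating-point numbers. Forward differentiation then needs 1280 × 720 = 921,600 evaluations, counting one input per pixel.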

Reverse Automatic Differentiation

We will use the chain rule in the following form:

$$
\begin{aligned}
\frac{\partial s}{\partial u} &= \sum_i \left(\frac{\partial w_i}{\partial u} \cdot \frac{\partial s}{\partial w_i}\right) \\
&= \frac{\partial w_1}{\partial u} \cdot \frac{\partial s}{\partial w_1} + \frac{\partial w_2}{\partial u} \cdot \frac{\partial s}{\partial w_2} + \cdots
\end{aligned}
$$

where:

  • $u$ denotes an input variable
  • $w_i$ denotes the output variables that depend on $u$
  • $s$ denotes the yet-to-be-given variable

Recall the decomposition into elementary operations, (2):

$$
\begin{aligned}
x_1 &= ? \\
x_2 &= ? \\
\\
a &= x_1 \cdot x_2 \\
b &= \sin(x_1) \\
\\
c &= 4x_1 + 2x_2 \\
d &= \cos(x_2) \\
\\
m_1 &= a + b \\
m_2 &= c + d \\
\\
y_1 &= m_1 + m_2 \\
y_2 &= m_1 \cdot m_2
\end{aligned} \tag{2}
$$

Now we compute the reverse pass, working from the outputs back to the inputs:

$$
\begin{aligned}
\frac{\partial s}{\partial y_1} &= ? \\
\frac{\partial s}{\partial y_2} &= ? \\
\\
\frac{\partial s}{\partial m_1} &= \frac{\partial s}{\partial y_1}\frac{\partial y_1}{\partial m_1} + \frac{\partial s}{\partial y_2}\frac{\partial y_2}{\partial m_1} \\
\frac{\partial s}{\partial m_2} &= \frac{\partial s}{\partial y_1}\frac{\partial y_1}{\partial m_2} + \frac{\partial s}{\partial y_2}\frac{\partial y_2}{\partial m_2} \\
\\
\frac{\partial s}{\partial a} &= \frac{\partial s}{\partial m_1}\frac{\partial m_1}{\partial a} \\
\frac{\partial s}{\partial b} &= \frac{\partial s}{\partial m_1}\frac{\partial m_1}{\partial b} \\
\frac{\partial s}{\partial c} &= \frac{\partial s}{\partial m_2}\frac{\partial m_2}{\partial c} \\
\frac{\partial s}{\partial d} &= \frac{\partial s}{\partial m_2}\frac{\partial m_2}{\partial d} \\
\\
\frac{\partial s}{\partial x_1} &= \frac{\partial s}{\partial a}\frac{\partial a}{\partial x_1} + \frac{\partial s}{\partial b}\frac{\partial b}{\partial x_1} + \frac{\partial s}{\partial c}\frac{\partial c}{\partial x_1} \\
\frac{\partial s}{\partial x_2} &= \frac{\partial s}{\partial a}\frac{\partial a}{\partial x_2} + \frac{\partial s}{\partial c}\frac{\partial c}{\partial x_2} + \frac{\partial s}{\partial d}\frac{\partial d}{\partial x_2}
\end{aligned}
$$

When $s = y_1$:

$$
\begin{aligned}
\frac{\partial s}{\partial y_1} &= 1 \\
\frac{\partial s}{\partial y_2} &= 0 \\
\\
\frac{\partial s}{\partial m_1} &= \frac{\partial s}{\partial y_1}\frac{\partial y_1}{\partial m_1} + \frac{\partial s}{\partial y_2}\frac{\partial y_2}{\partial m_1} = 1 \\
\frac{\partial s}{\partial m_2} &= \frac{\partial s}{\partial y_1}\frac{\partial y_1}{\partial m_2} + \frac{\partial s}{\partial y_2}\frac{\partial y_2}{\partial m_2} = 1 \\
\\
\frac{\partial s}{\partial a} &= \frac{\partial s}{\partial m_1}\frac{\partial m_1}{\partial a} = 1 \\
\frac{\partial s}{\partial b} &= \frac{\partial s}{\partial m_1}\frac{\partial m_1}{\partial b} = 1 \\
\frac{\partial s}{\partial c} &= \frac{\partial s}{\partial m_2}\frac{\partial m_2}{\partial c} = 1 \\
\frac{\partial s}{\partial d} &= \frac{\partial s}{\partial m_2}\frac{\partial m_2}{\partial d} = 1 \\
\\
\frac{\partial s}{\partial x_1} &= \frac{\partial s}{\partial a}\frac{\partial a}{\partial x_1} + \frac{\partial s}{\partial b}\frac{\partial b}{\partial x_1} + \frac{\partial s}{\partial c}\frac{\partial c}{\partial x_1} \\
&= 1 \cdot x_2 + 1 \cdot \cos(x_1) + 1 \cdot 4 = x_2 + \cos(x_1) + 4 \\
\frac{\partial s}{\partial x_2} &= \frac{\partial s}{\partial a}\frac{\partial a}{\partial x_2} + \frac{\partial s}{\partial c}\frac{\partial c}{\partial x_2} + \frac{\partial s}{\partial d}\frac{\partial d}{\partial x_2} \\
&= 1 \cdot x_1 + 1 \cdot 2 + 1 \cdot (-\sin(x_2)) = x_1 + 2 - \sin(x_2)
\end{aligned}
$$

The case $s = y_2$ is computed in the same way.
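
The reverse pass can be sketched in code as well: first evaluate decomposition (2) to obtain the intermediate values, then propagate $\partial s / \partial(\cdot)$ from the outputs back to the inputs. The function name `reverse_pass` and the seed parameters `ds_dy1` and `ds_dy2` are names chosen here for illustration.

```python
import math


def reverse_pass(x1, x2, ds_dy1, ds_dy2):
    """One reverse-mode sweep over decomposition (2).

    ds_dy1 and ds_dy2 are the seeds ds/dy1 and ds/dy2.
    Returns (ds/dx1, ds/dx2).
    """
    # Forward sweep: the reverse sweep needs the values of m1 and m2.
    a = x1 * x2
    b = math.sin(x1)
    c = 4 * x1 + 2 * x2
    d = math.cos(x2)
    m1 = a + b
    m2 = c + d

    # Reverse sweep: y1 = m1 + m2 and y2 = m1 * m2.
    ds_dm1 = ds_dy1 * 1 + ds_dy2 * m2
    ds_dm2 = ds_dy1 * 1 + ds_dy2 * m1

    # m1 = a + b and m2 = c + d, so each local derivative is 1.
    ds_da = ds_dm1
    ds_db = ds_dm1
    ds_dc = ds_dm2
    ds_dd = ds_dm2

    # a = x1 * x2, b = sin(x1), c = 4*x1 + 2*x2, d = cos(x2).
    ds_dx1 = ds_da * x2 + ds_db * math.cos(x1) + ds_dc * 4
    ds_dx2 = ds_da * x1 + ds_dc * 2 + ds_dd * (-math.sin(x2))
    return ds_dx1, ds_dx2


# Seed s = y1: yields dy1/dx1 and dy1/dx2.
print(reverse_pass(1.0, 2.0, 1.0, 0.0))
# Seed s = y2: yields dy2/dx1 and dy2/dx2.
print(reverse_pass(1.0, 2.0, 0.0, 1.0))
```

Each call yields the derivatives of one output with respect to all inputs, so covering every output takes one call per output.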

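We can infer:

  • With $n$ output variables (2 in this example), the formulas above must be evaluated $n$ times.
  • Suppose the input to a neural network is a 1280 x 720 image and the output is 51 floating-point numbers. Reverse differentiation then needs only 51 evaluations.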