[On Automatic Differentiation in Machine Learning]

This article introduces automatic differentiation (AD), which exploits the elementary arithmetic operations and elementary functions executed by a computer program and applies the chain rule to compute partial derivatives automatically and to high precision. It contrasts automatic differentiation with symbolic and numerical differentiation, emphasizing its efficiency and accuracy, describes the forward-accumulation and reverse-accumulation modes for computing partial derivatives, and notes that reverse accumulation (backpropagation) is widely used in machine learning to compute the gradients of neural networks.

Automatic Differentiation

Quoted from Wikipedia:

https://en.wikipedia.org/wiki/Automatic_differentiation

Principle

Automatic differentiation exploits the fact that every computer calculation, no matter how complicated, executes a sequence of elementary arithmetic operations (addition, subtraction, multiplication, division, etc.) and elementary functions (exp, log, sin, cos, etc.). By applying the chain rule repeatedly to these operations, partial derivatives of arbitrary order can be computed automatically, accurately to working precision, and using at most a small constant factor of more arithmetic operations than the original program.

Automatic differentiation is distinct from symbolic differentiation and numerical differentiation. Symbolic differentiation faces the difficulty of converting a computer program into a single mathematical expression and can lead to inefficient code. Numerical differentiation (the method of finite differences) can introduce round-off errors in the discretization process and cancellation. Both of these classical methods have problems with calculating higher derivatives, where complexity and errors increase. Finally, both of these classical methods are slow at computing partial derivatives of a function with respect to many inputs, as is needed for gradient-based optimization algorithms. Automatic differentiation solves all of these problems.
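To make the contrast with numerical differentiation concrete, here is a minimal Python sketch (my own illustration, not from the article): a central finite difference for the derivative of sin at a point is compared against the exact value. The function and step sizes are arbitrary choices; the point is that the finite-difference error first shrinks with the step size and then grows again as floating-point cancellation takes over, whereas AD would return the exact derivative to working precision.

```python
import math

def f(x):
    return math.sin(x)

x0 = 1.0
exact = math.cos(x0)  # analytic derivative of sin at x0

# Central finite difference: truncation error shrinks as h shrinks,
# but round-off/cancellation error grows, so very small h is worse.
for h in (1e-2, 1e-5, 1e-8, 1e-11):
    fd = (f(x0 + h) - f(x0 - h)) / (2 * h)
    print(f"h={h:.0e}  finite_diff={fd:.15f}  error={abs(fd - exact):.2e}")
```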

What makes AD (automatic differentiation) distinctive is that it computes derivative values exactly.
For example, consider

$$y = f(x_1, x_2) = x_1 x_2 + \sin x_1 = \omega_1\omega_2 + \sin\omega_1 = \omega_3 + \omega_4 = \omega_5$$
The forward computation graph is shown below:

[Figure: forward-mode computational graph for $y = x_1 x_2 + \sin x_1$]

As the graph shows, to compute the partial derivative of $y$ with respect to $x_2$ at a particular input, say $x_1 = 2, x_2 = 3$, the seed is set to $(0, 1)$ and the graph is evaluated bottom-up until the value at the root node is obtained. To compute the partial derivative with respect to $x_1$ at the same point $x_1 = 2, x_2 = 3$, the seed is set to $(1, 0)$ and the graph is evaluated once more. Note that this way of computing partial derivatives is neither a symbolic manipulation of mathematical formulas nor a difference approximation as in numerical differentiation; it is therefore efficient and free of the approximation error that finite differences introduce.
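A minimal sketch of this forward sweep (my own illustration, not code from the article): each node $\omega_1 \ldots \omega_5$ of the example carries its value together with its derivative with respect to the chosen input, and the seed selects which input that is.

```python
import math

def forward_mode(x1, x2, seed):
    """One forward sweep over the graph of y = x1*x2 + sin(x1).

    `seed` is (dx1, dx2): the derivative of each input with respect to
    the chosen independent variable. Every node carries (value, derivative).
    """
    dx1, dx2 = seed
    w1, dw1 = x1, dx1                       # w1 = x1
    w2, dw2 = x2, dx2                       # w2 = x2
    w3, dw3 = w1 * w2, dw1 * w2 + w1 * dw2  # product rule
    w4, dw4 = math.sin(w1), math.cos(w1) * dw1
    w5, dw5 = w3 + w4, dw3 + dw4            # y = w5
    return w5, dw5

y, dy_dx1 = forward_mode(2.0, 3.0, seed=(1.0, 0.0))  # dy/dx1 = x2 + cos(x1)
_, dy_dx2 = forward_mode(2.0, 3.0, seed=(0.0, 1.0))  # dy/dx2 = x1
print(y, dy_dx1, dy_dx2)  # 6.909..., 2.583..., 2.0
```

One sweep is needed per input variable, which is why forward mode suits functions with few inputs and many outputs.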

The reverse computation graph is shown below:

[Figure: reverse-mode computational graph for $y = x_1 x_2 + \sin x_1$]
Reverse accumulation (backpropagation) is very efficient when the output dimension is much smaller than the input dimension, which is exactly the common situation in machine learning, so backpropagation is widely used there.
Note that the computation graph describing a neural network uses only simple operations such as addition, subtraction, multiplication and division, and common elementary functions such as sin, cos, exp and log, whose derivatives are easy to write down. Once the network structure is given, the computation graph for the partial derivatives is therefore easy to construct.
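To illustrate why this is mechanical (a sketch of my own; the registry name and keys are hypothetical): the local derivative of every elementary operation can be tabulated once, and the derivative graph then follows directly from the forward graph.

```python
import math

# Hypothetical registry: each elementary op maps to a rule returning its
# local partial derivatives with respect to its inputs.
LOCAL_DERIVATIVES = {
    "add": lambda a, b: (1.0, 1.0),      # d(a+b)/da, d(a+b)/db
    "mul": lambda a, b: (b, a),          # d(a*b)/da, d(a*b)/db
    "sin": lambda a: (math.cos(a),),     # d(sin a)/da
    "exp": lambda a: (math.exp(a),),     # d(exp a)/da
    "log": lambda a: (1.0 / a,),         # d(log a)/da
}

# Example: the local partials of w3 = w1 * w2 at w1=2, w2=3 are (3, 2).
print(LOCAL_DERIVATIVES["mul"](2.0, 3.0))
```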

Its general principle is as follows:
For the simple composition:
$$y = f(g(h(x))) = f(g(h(\omega_0))) = f(g(\omega_1)) = f(\omega_2) = \omega_3$$
$$\omega_0 = x$$
$$\omega_1 = h(\omega_0)$$
$$\omega_2 = g(\omega_1)$$
$$\omega_3 = f(\omega_2) = y$$
the chain rule gives
$$\frac{\partial y}{\partial x}=\frac{\partial y}{\partial \omega_2}\frac{\partial \omega_2}{\partial \omega_1}\frac{\partial \omega_1}{\partial x}=\frac{\partial f(\omega_2)}{\partial \omega_2}\frac{\partial g(\omega_1)}{\partial \omega_1}\frac{\partial h(\omega_0)}{\partial x}$$
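For a concrete check (my own example, not from the article), take $h(x) = x^2$, $g(u) = \sin u$, $f(v) = e^v$; the chain rule above gives $\mathrm{d}y/\mathrm{d}x = e^{\sin(x^2)}\cos(x^2)\cdot 2x$, which a quick numerical comparison confirms:

```python
import math

def h(x): return x * x           # w1 = h(w0)
def g(u): return math.sin(u)     # w2 = g(w1)
def f(v): return math.exp(v)     # w3 = f(w2) = y

x0 = 1.3
w1 = h(x0)
w2 = g(w1)

# Chain rule: dy/dx = f'(w2) * g'(w1) * h'(w0)
chain = math.exp(w2) * math.cos(w1) * (2 * x0)

# Crude finite-difference check of the same derivative
eps = 1e-6
fd = (f(g(h(x0 + eps))) - f(g(h(x0 - eps)))) / (2 * eps)
print(chain, fd)  # the two values agree to roughly six decimal places
```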

Two types of automatic differentiation
Usually, two distinct modes of automatic differentiation are presented.

(1) forward accumulation (also called bottom-up, forward mode, or tangent mode)
(2) reverse accumulation (also called top-down, reverse mode, or adjoint mode)

(1) Forward accumulation

[Figure: forward accumulation computational graph]
Algorithm: first fix the independent variable with respect to which differentiation is performed, then recursively compute the derivative of each sub-expression. Done by hand on paper, this amounts to repeatedly substituting the derivative of the inner function in the chain rule, as shown below:
$$\frac{\partial y}{\partial x}=\frac{\partial y}{\partial \omega_{n-1}}\frac{\partial \omega_{n-1}}{\partial x}=\frac{\partial y}{\partial \omega_{n-1}}\left(\frac{\partial \omega_{n-1}}{\partial \omega_{n-2}}\frac{\partial \omega_{n-2}}{\partial x}\right)=\frac{\partial y}{\partial \omega_{n-1}}\left(\frac{\partial \omega_{n-1}}{\partial \omega_{n-2}}\left(\frac{\partial \omega_{n-2}}{\partial \omega_{n-3}}\frac{\partial \omega_{n-3}}{\partial x}\right)\right)=\cdots$$
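Forward accumulation is often implemented with dual numbers: every value carries its derivative with respect to one chosen input, and every elementary operation updates both. A minimal sketch (my own, simplified; operator overloading only for the operations used here):

```python
import math

class Dual:
    """A value paired with its derivative w.r.t. one chosen input."""
    def __init__(self, value, deriv=0.0):
        self.value, self.deriv = value, deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

def sin(d):
    # Elementary function with its derivative rule attached.
    return Dual(math.sin(d.value), math.cos(d.value) * d.deriv)

# One forward pass per input variable: seed the variable of interest with 1.
x1 = Dual(2.0, 1.0)   # differentiate with respect to x1
x2 = Dual(3.0, 0.0)
y = x1 * x2 + sin(x1)
print(y.value, y.deriv)  # 6.909..., 2.583...
```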

(2) Reverse accumulation

[Figure: reverse accumulation computational graph]

Algorithm: fix the dependent variable (the output) and recursively compute its derivative with respect to each sub-expression, working from the output back towards the inputs:
$$\frac{\partial y}{\partial x}=\frac{\partial y}{\partial \omega_{1}}\frac{\partial \omega_{1}}{\partial x}=\left(\frac{\partial y}{\partial \omega_{2}}\frac{\partial \omega_{2}}{\partial \omega_{1}}\right)\frac{\partial \omega_{1}}{\partial x}=\left(\left(\frac{\partial y}{\partial \omega_{3}}\frac{\partial \omega_{3}}{\partial \omega_{2}}\right)\frac{\partial \omega_{2}}{\partial \omega_{1}}\right)\frac{\partial \omega_{1}}{\partial x}=\cdots$$
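Reverse accumulation records the forward pass and then sweeps backwards, propagating adjoints; a single backward pass yields the partial derivatives with respect to all inputs at once. A minimal sketch (my own, hand-unrolled for the same example $y = x_1 x_2 + \sin x_1$ rather than a general tape):

```python
import math

def reverse_mode(x1, x2):
    """Reverse accumulation for y = x1*x2 + sin(x1)."""
    # Forward pass: record the intermediate values.
    w1, w2 = x1, x2
    w3 = w1 * w2
    w4 = math.sin(w1)
    w5 = w3 + w4          # y

    # Backward pass: the adjoint of a node is dy/d(node); start with dy/dy = 1.
    w5_bar = 1.0
    w3_bar = w5_bar * 1.0                           # d(w3 + w4)/dw3
    w4_bar = w5_bar * 1.0                           # d(w3 + w4)/dw4
    w1_bar = w3_bar * w2 + w4_bar * math.cos(w1)    # contributions via w3 and w4
    w2_bar = w3_bar * w1
    return w5, (w1_bar, w2_bar)

y, grad = reverse_mode(2.0, 3.0)
print(y, grad)  # 6.909..., (2.583..., 2.0)
```

Both partial derivatives come out of one backward sweep, matching the two forward sweeps needed by the forward-mode sketch above; this is the efficiency argument behind backpropagation.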
