[On Automatic Differentiation in Machine Learning]

This article introduces automatic differentiation (AD), which exploits the elementary arithmetic operations and elementary functions executed by a computer program and applies the chain rule to compute partial derivatives automatically and to high precision. It contrasts automatic differentiation with symbolic and numerical differentiation, emphasizing its efficiency and accuracy, describes the forward-accumulation and reverse-accumulation modes for computing partial derivatives, and notes that reverse accumulation (backpropagation) is widely used in machine learning to compute the gradients of neural networks.

Automatic Differentiation

Quoted from Wikipedia:

https://en.wikipedia.org/wiki/Automatic_differentiation

Principle

Automatic differentiation exploits the fact that every computer calculation, no matter how complicated, executes a sequence of elementary arithmetic operations (addition, subtraction, multiplication, division, etc.) and elementary functions (exp, log, sin, cos, etc.). By applying the chain rule repeatedly to these operations, partial derivatives of arbitrary order can be computed automatically, accurately to working precision, and using at most a small constant factor of more arithmetic operations than the original program.

Automatic differentiation is distinct from symbolic differentiation and numerical differentiation. Symbolic differentiation faces the difficulty of converting a computer program into a single mathematical expression and can lead to inefficient code. Numerical differentiation (the method of finite differences) can introduce round-off errors in the discretization process and cancellation. Both of these classical methods have problems with calculating higher derivatives, where complexity and errors increase. Finally, both of these classical methods are slow at computing partial derivatives of a function with respect to many inputs, as is needed for gradient-based optimization algorithms. Automatic differentiation solves all of these problems.
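To make the contrast with numerical differentiation concrete, here is a minimal Python sketch (my own illustration, not from the article): a central finite difference for the derivative of sin at a point is compared against the exact value. The function and step sizes are arbitrary choices; the point is that the finite-difference error first shrinks with the step size and then grows again as floating-point cancellation takes over, whereas AD would return the exact derivative to working precision.

```python
import math

def f(x):
    return math.sin(x)

x0 = 1.0
exact = math.cos(x0)  # analytic derivative of sin at x0

# Central finite difference: truncation error shrinks as h shrinks,
# but round-off/cancellation error grows, so very small h is worse.
for h in (1e-2, 1e-5, 1e-8, 1e-11):
    fd = (f(x0 + h) - f(x0 - h)) / (2 * h)
    print(f"h={h:.0e}  finite_diff={fd:.15f}  error={abs(fd - exact):.2e}")
```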

What makes AD (automatic differentiation) distinctive is that it computes derivative values exactly.
For example, consider

$$y = f(x_1, x_2) = x_1 x_2 + \sin x_1 = \omega_1\omega_2 + \sin\omega_1 = \omega_3 + \omega_4 = \omega_5$$
The forward computation graph is shown below:

[Figure: forward-mode computational graph for $y = x_1 x_2 + \sin x_1$]

As the graph shows, to compute the partial derivative of $y$ with respect to $x_2$ at a particular input, say $x_1 = 2, x_2 = 3$, the seed is set to $(0, 1)$ and the graph is evaluated bottom-up until the value at the root node is obtained. To compute the partial derivative with respect to $x_1$ at the same point $x_1 = 2, x_2 = 3$, the seed is set to $(1, 0)$ and the graph is evaluated once more. Note that this way of computing partial derivatives is neither a symbolic manipulation of mathematical formulas nor a difference approximation as in numerical differentiation; it is therefore efficient and free of the approximation error that finite differences introduce.
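A minimal sketch of this forward sweep (my own illustration, not code from the article): each node $\omega_1 \ldots \omega_5$ of the example carries its value together with its derivative with respect to the chosen input, and the seed selects which input that is.

```python
import math

def forward_mode(x1, x2, seed):
    """One forward sweep over the graph of y = x1*x2 + sin(x1).

    `seed` is (dx1, dx2): the derivative of each input with respect to
    the chosen independent variable. Every node carries (value, derivative).
    """
    dx1, dx2 = seed
    w1, dw1 = x1, dx1                       # w1 = x1
    w2, dw2 = x2, dx2                       # w2 = x2
    w3, dw3 = w1 * w2, dw1 * w2 + w1 * dw2  # product rule
    w4, dw4 = math.sin(w1), math.cos(w1) * dw1
    w5, dw5 = w3 + w4, dw3 + dw4            # y = w5
    return w5, dw5

y, dy_dx1 = forward_mode(2.0, 3.0, seed=(1.0, 0.0))  # dy/dx1 = x2 + cos(x1)
_, dy_dx2 = forward_mode(2.0, 3.0, seed=(0.0, 1.0))  # dy/dx2 = x1
print(y, dy_dx1, dy_dx2)  # 6.909..., 2.583..., 2.0
```

One sweep is needed per input variable, which is why forward mode suits functions with few inputs and many outputs.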

The reverse computation graph is shown below:

[Figure: reverse-mode computational graph for $y = x_1 x_2 + \sin x_1$]
Reverse accumulation (backpropagation) is very efficient when the output dimension is much smaller than the input dimension, which is exactly the common situation in machine learning, so backpropagation is widely used there.
Note that the computation graph describing a neural network uses only simple operations such as addition, subtraction, multiplication and division, and common elementary functions such as sin, cos, exp and log, whose derivatives are easy to write down. Once the network structure is given, the computation graph for the partial derivatives is therefore easy to construct.
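To illustrate why this is mechanical (a sketch of my own; the registry name and keys are hypothetical): the local derivative of every elementary operation can be tabulated once, and the derivative graph then follows directly from the forward graph.

```python
import math

# Hypothetical registry: each elementary op maps to a rule returning its
# local partial derivatives with respect to its inputs.
LOCAL_DERIVATIVES = {
    "add": lambda a, b: (1.0, 1.0),      # d(a+b)/da, d(a+b)/db
    "mul": lambda a, b: (b, a),          # d(a*b)/da, d(a*b)/db
    "sin": lambda a: (math.cos(a),),     # d(sin a)/da
    "exp": lambda a: (math.exp(a),),     # d(exp a)/da
    "log": lambda a: (1.0 / a,),         # d(log a)/da
}

# Example: the local partials of w3 = w1 * w2 at w1=2, w2=3 are (3, 2).
print(LOCAL_DERIVATIVES["mul"](2.0, 3.0))
```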

Its general principle is as follows:
For the simple composition:
$$y = f(g(h(x))) = f(g(h(\omega_0))) = f(g(\omega_1)) = f(\omega_2) = \omega_3$$
$$\omega_0 = x$$
$$\omega_1 = h(\omega_0)$$
$$\omega_2 = g(\omega_1)$$
$$\omega_3 = f(\omega_2) = y$$
the chain rule gives
$$\frac{\partial y}{\partial x}=\frac{\partial y}{\partial \omega_2}\frac{\partial \omega_2}{\partial \omega_1}\frac{\partial \omega_1}{\partial x}=\frac{\partial f(\omega_2)}{\partial \omega_2}\frac{\partial g(\omega_1)}{\partial \omega_1}\frac{\partial h(\omega_0)}{\partial x}$$
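For a concrete check (my own example, not from the article), take $h(x) = x^2$, $g(u) = \sin u$, $f(v) = e^v$; the chain rule above gives $\mathrm{d}y/\mathrm{d}x = e^{\sin(x^2)}\cos(x^2)\cdot 2x$, which a quick numerical comparison confirms:

```python
import math

def h(x): return x * x           # w1 = h(w0)
def g(u): return math.sin(u)     # w2 = g(w1)
def f(v): return math.exp(v)     # w3 = f(w2) = y

x0 = 1.3
w1 = h(x0)
w2 = g(w1)

# Chain rule: dy/dx = f'(w2) * g'(w1) * h'(w0)
chain = math.exp(w2) * math.cos(w1) * (2 * x0)

# Crude finite-difference check of the same derivative
eps = 1e-6
fd = (f(g(h(x0 + eps))) - f(g(h(x0 - eps)))) / (2 * eps)
print(chain, fd)  # the two values agree to roughly six decimal places
```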

Two types of automatic differentiation
Usually, two distinct modes of automatic differentiation are presented.

(1) forward accumulation (also called bottom-up, forward mode, or tangent mode)
(2) reverse accumulation (also called top-down, reverse mode, or adjoint mode)

(1) Forward accumulation

[Figure: forward accumulation computational graph]
Algorithm: first fix the independent variable with respect to which differentiation is performed, then recursively compute the derivative of each sub-expression. Done by hand on paper, this amounts to repeatedly substituting the derivative of the inner function in the chain rule, as shown below:
$$\frac{\partial y}{\partial x}=\frac{\partial y}{\partial \omega_{n-1}}\frac{\partial \omega_{n-1}}{\partial x}=\frac{\partial y}{\partial \omega_{n-1}}\left(\frac{\partial \omega_{n-1}}{\partial \omega_{n-2}}\frac{\partial \omega_{n-2}}{\partial x}\right)=\frac{\partial y}{\partial \omega_{n-1}}\left(\frac{\partial \omega_{n-1}}{\partial \omega_{n-2}}\left(\frac{\partial \omega_{n-2}}{\partial \omega_{n-3}}\frac{\partial \omega_{n-3}}{\partial x}\right)\right)=\cdots$$
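Forward accumulation is often implemented with dual numbers: every value carries its derivative with respect to one chosen input, and every elementary operation updates both. A minimal sketch (my own, simplified; operator overloading only for the operations used here):

```python
import math

class Dual:
    """A value paired with its derivative w.r.t. one chosen input."""
    def __init__(self, value, deriv=0.0):
        self.value, self.deriv = value, deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

def sin(d):
    # Elementary function with its derivative rule attached.
    return Dual(math.sin(d.value), math.cos(d.value) * d.deriv)

# One forward pass per input variable: seed the variable of interest with 1.
x1 = Dual(2.0, 1.0)   # differentiate with respect to x1
x2 = Dual(3.0, 0.0)
y = x1 * x2 + sin(x1)
print(y.value, y.deriv)  # 6.909..., 2.583...
```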

(2) Reverse accumulation

[Figure: reverse accumulation computational graph]

Algorithm: fix the dependent variable (the output) and recursively compute its derivative with respect to each sub-expression, working from the output back towards the inputs:
$$\frac{\partial y}{\partial x}=\frac{\partial y}{\partial \omega_{1}}\frac{\partial \omega_{1}}{\partial x}=\left(\frac{\partial y}{\partial \omega_{2}}\frac{\partial \omega_{2}}{\partial \omega_{1}}\right)\frac{\partial \omega_{1}}{\partial x}=\left(\left(\frac{\partial y}{\partial \omega_{3}}\frac{\partial \omega_{3}}{\partial \omega_{2}}\right)\frac{\partial \omega_{2}}{\partial \omega_{1}}\right)\frac{\partial \omega_{1}}{\partial x}=\cdots$$
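Reverse accumulation records the forward pass and then sweeps backwards, propagating adjoints; a single backward pass yields the partial derivatives with respect to all inputs at once. A minimal sketch (my own, hand-unrolled for the same example $y = x_1 x_2 + \sin x_1$ rather than a general tape):

```python
import math

def reverse_mode(x1, x2):
    """Reverse accumulation for y = x1*x2 + sin(x1)."""
    # Forward pass: record the intermediate values.
    w1, w2 = x1, x2
    w3 = w1 * w2
    w4 = math.sin(w1)
    w5 = w3 + w4          # y

    # Backward pass: the adjoint of a node is dy/d(node); start with dy/dy = 1.
    w5_bar = 1.0
    w3_bar = w5_bar * 1.0                           # d(w3 + w4)/dw3
    w4_bar = w5_bar * 1.0                           # d(w3 + w4)/dw4
    w1_bar = w3_bar * w2 + w4_bar * math.cos(w1)    # contributions via w3 and w4
    w2_bar = w3_bar * w1
    return w5, (w1_bar, w2_bar)

y, grad = reverse_mode(2.0, 3.0)
print(y, grad)  # 6.909..., (2.583..., 2.0)
```

Both partial derivatives come out of one backward sweep, matching the two forward sweeps needed by the forward-mode sketch above; this is the efficiency argument behind backpropagation.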
