Calculus on Computational Graphs: The Backpropagation Algorithm (BP)

傅晨明

Original article: http://colah.github.io/posts/2015-08-Backprop/

Calculus on Computational Graphs: Backpropagation

Posted on August 31, 2015

Introduction

Backpropagation is the key algorithm that makes training deep models computationally tractable. For modern neural networks, it can make training with gradient descent as much as ten million times faster, relative to a naive implementation. That’s the difference between a model taking a week to train and taking 200,000 years.

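backpropagation (BP): the algorithm that computes gradients in a neural network by working backward from the output; a network trained this way is sometimes called a BP network
algorithm: a step-by-step computational procedure
tractable: easy to handle; here, feasible to compute
neural: relating to nerves; here, relating to neural networks
gradient descent: an optimization method that repeatedly steps against the gradient
as much as: up to
relative to: compared with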

Beyond its use in deep learning, backpropagation is a powerful computational tool in many other areas, ranging from weather forecasting to analyzing numerical stability – it just goes by different names. In fact, the algorithm has been reinvented at least dozens of times in different fields (see Griewank (2010)). The general, application independent, name is “reverse-mode differentiation.”

beyond: in addition to; outside of
ranging from ... to ...: covering everything between two extremes
weather forecasting: predicting the weather
numerical stability: how strongly rounding and input errors affect a computed result
reinvented: invented again, independently of earlier work
dozens of: several multiples of twelve; many
application independent: not tied to any particular use

Fundamentally, it’s a technique for calculating derivatives quickly. And it’s an essential trick to have in your bag, not only in deep learning, but in a wide variety of numerical computing situations.

fundamentally: at the most basic level
derivative: the rate of change of a function with respect to a variable
trick: here, a useful technique
to have in your bag: to have available as one of your tools
a wide variety of: many different kinds of
numerical: expressed in or computed with numbers

Computational Graphs

Computational graphs are a nice way to think about mathematical expressions. For example, consider the expression e=(a+b)∗(b+1). There are three operations: two additions and one multiplication. To help us talk about this, let’s introduce two intermediary variables, c and d so that every function’s output has a variable. We now have:

mathematical expression: a formula built from numbers, variables, and operations
addition: the operation of adding
multiplication: the operation of multiplying
intermediary: in between; here, a variable that holds an intermediate result

c=a+b
d=b+1
e=c∗d

To create a computational graph, we make each of these operations, along with the input variables, into nodes. When one node’s value is the input to another node, an arrow goes from one to another.

computational graph: a directed graph whose nodes are variables or operations
operation: here, a mathematical operation such as addition
input variable: a variable whose value is supplied from outside
node: a vertex of a graph
arrow: here, a directed edge from one node to another

[Figure: the computational graph for e = (a+b)∗(b+1). The inputs a and b feed c = a+b, b also feeds d = b+1, and c and d feed e = c∗d.]

These sorts of graphs come up all the time in computer science, especially in talking about functional programs. They are very closely related to the notions of dependency graphs and call graphs. They’re also the core abstraction behind the popular deep learning framework Theano.

We can evaluate the expression by setting the input variables to certain values and computing nodes up through the graph. For example, let’s set a=2 and b=1:

come up: appear; arise
closely related: strongly connected
notion: concept
dependency graph: a graph recording which items depend on which others
call graph: a graph recording which functions call which
abstraction: a simplified, general concept
evaluate: compute the value of

[Figure: the same graph evaluated at a = 2, b = 1, giving c = 3, d = 2, and e = 6.]

The expression evaluates to 6.
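As a rough illustration of evaluating the graph, the Python sketch below stores each operation node with its parents and computes values in dependency order. The dictionary layout and the names `graph`, `evaluate`, and `one` (a constant node standing for the literal 1) are assumptions made for this sketch; they do not come from the original post.

```python
import operator

# Each operation node maps to (function, parent node names).
# "one" is a hypothetical constant node standing for the literal 1.
graph = {
    "c": (operator.add, ("a", "b")),    # c = a + b
    "d": (operator.add, ("b", "one")),  # d = b + 1
    "e": (operator.mul, ("c", "d")),    # e = c * d
}

def evaluate(graph, inputs):
    """Compute every node's value, resolving parents before children."""
    values = dict(inputs)

    def value_of(name):
        if name not in values:
            op, parents = graph[name]
            values[name] = op(*(value_of(p) for p in parents))
        return values[name]

    for name in graph:
        value_of(name)
    return values

values = evaluate(graph, {"a": 2, "b": 1, "one": 1})
print(values["c"], values["d"], values["e"])  # 3 2 6
```

Setting a = 2 and b = 1 reproduces the values in the text: c = 3, d = 2, e = 6.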

Derivatives on Computational Graphs

If one wants to understand derivatives in a computational graph, the key is to understand derivatives on the edges. If a directly affects c, then we want to know how it affects c. If a changes a little bit, how does c change? We call this the partial derivative of c with respect to a.

edge: a connection between two nodes of a graph
affect: influence
a little bit: by a small amount
partial derivative: the derivative with respect to one variable while the others are held fixed
with respect to: in relation to

To evaluate the partial derivatives in this graph, we need the sum rule and the product rule:

sum rule: the derivative of a sum is the sum of the derivatives
product rule: the derivative of uv is u times the derivative of v plus v times the derivative of u

[Figure: the two rules as equations. Sum rule: ∂(a+b)/∂a = ∂a/∂a + ∂b/∂a = 1. Product rule: ∂(uv)/∂u = u ∂v/∂u + v ∂u/∂u = v.]

Below, the graph has the derivative on each edge labeled.

[Figure: the graph at a = 2, b = 1 with each edge labeled by its derivative: ∂c/∂a = 1, ∂c/∂b = 1, ∂d/∂b = 1, ∂e/∂c = d = 2, ∂e/∂d = c = 3.]

What if we want to understand how nodes that aren’t directly connected affect each other? Let’s consider how e is affected by a. If we change a at a speed of 1, c also changes at a speed of 1. In turn, c changing at a speed of 1 causes e to change at a speed of 2. So e changes at a rate of 1∗2 with respect to a.

what if: suppose that
at a speed of / at a rate of: changing by that amount per unit change of the input
in turn: as a result, one step after another

The general rule is to sum over all possible paths from one node to the other, multiplying the derivatives on each edge of the path together. For example, to get the derivative of e with respect to b we get:

general rule: a rule that applies in all cases
sum over: add up across
path: a sequence of edges leading from one node to another

[Figure: equation] ∂e/∂b = 1∗2 + 1∗3

This accounts for how b affects e through c and also how it affects it through d.

This general “sum over paths” rule is just a different way of thinking about the multivariate chain rule.

account for: explain; include
multivariate: involving several variables
chain rule: the rule for differentiating a composition of functions
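The "sum over paths" rule can be sketched in a few lines of Python. The edge derivatives below are the ones for e = (a+b)∗(b+1) at a = 2, b = 1; the `edges` layout and the function name `sum_over_paths` are assumptions made for illustration, not part of the original post.

```python
# edges[u][v] holds the partial derivative of v with respect to u,
# evaluated at a = 2, b = 1 (so c = 3 and d = 2).
edges = {
    "a": {"c": 1.0},
    "b": {"c": 1.0, "d": 1.0},
    "c": {"e": 2.0},  # de/dc = d = 2
    "d": {"e": 3.0},  # de/dd = c = 3
    "e": {},
}

def sum_over_paths(edges, start, end):
    """Sum, over every path from start to end, the product of edge derivatives."""
    if start == end:
        return 1.0  # the empty path contributes a factor of 1
    return sum(weight * sum_over_paths(edges, nxt, end)
               for nxt, weight in edges[start].items())

print(sum_over_paths(edges, "b", "e"))  # 5.0  (1*2 through c, plus 1*3 through d)
print(sum_over_paths(edges, "a", "e"))  # 2.0  (only one path, through c)
```

This recursion visits every path separately, which is exactly the naive strategy; it is fine for five nodes but scales poorly on larger graphs.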

Factoring Paths

The problem with just “summing over the paths” is that it’s very easy to get a combinatorial explosion in the number of possible paths.
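
combinatorial explosion: rapid growth in the number of combinations as a problem gets larger

[Figure: three nodes X, Y, and Z, with three edges from X to Y (labeled α, β, γ) and three edges from Y to Z (labeled δ, ε, ζ).]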

In the above diagram, there are three paths from X to Y, and a further three paths from Y to Z. If we want to get the derivative ∂Z/∂X by summing over all paths, we need to sum over 3∗3=9 paths:

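[Figure: equation] ∂Z/∂X = αδ + αε + αζ + βδ + βε + βζ + γδ + γε + γζ, where α, β, γ are the derivatives on the three edges from X to Y and δ, ε, ζ are the derivatives on the three edges from Y to Z.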

The above only has nine paths, but it would be easy for the number of paths to grow exponentially as the graph becomes more complicated.

Instead of just naively summing over the paths, it would be much better to factor them:

exponentially: at a rate that multiplies with every step
complicated: having many interconnected parts
naively: in the most direct way, without any optimization
factor (verb): rewrite a sum as a product of simpler terms

[Figure: equation] ∂Z/∂X = (α + β + γ)(δ + ε + ζ), where α, β, γ are the derivatives on the edges from X to Y and δ, ε, ζ are the derivatives on the edges from Y to Z.
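A quick numerical check shows that the naive nine-term sum and the factored form agree. The edge derivative values below are hypothetical numbers picked only for illustration, and the names `x_to_y` and `y_to_z` are assumptions of this sketch.

```python
from itertools import product

# Hypothetical edge derivatives, chosen only for illustration.
x_to_y = [0.5, 2.0, 3.0]   # the three edges from X to Y
y_to_z = [1.0, 4.0, 0.25]  # the three edges from Y to Z

# Naive: one product per path, 3 * 3 = 9 terms.
path_terms = [first * second for first, second in product(x_to_y, y_to_z)]
naive = sum(path_terms)

# Factored: merge the paths at Y, then multiply once.
factored = sum(x_to_y) * sum(y_to_z)

print(len(path_terms), naive, factored)  # 9 28.875 28.875

# With k such layers in a row there are 3**k paths but only 3*k edges.
for k in (2, 5, 10):
    print(k, 3 ** k, 3 * k)
```

The factored form needs work proportional to the number of edges, while the naive sum needs work proportional to the number of paths.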

This is where “forward-mode differentiation” and “reverse-mode differentiation” come in. They’re algorithms for efficiently computing the sum by factoring the paths. Instead of summing over all of the paths explicitly, they compute the same sum more efficiently by merging paths back together at every node. In fact, both algorithms touch each edge exactly once!

come in: become relevant
differentiation: the operation of computing derivatives
efficiently: with little wasted work
explicitly: directly, one item at a time
merge: combine into one

Forward-mode differentiation starts at an input to the graph and moves towards the end. At every node, it sums all the paths feeding in. Each of those paths represents one way in which the input affects that node. By adding them up, we get the total way in which the node is affected by the input, its derivative.

[Figure: forward-mode differentiation on the X, Y, Z example, sweeping from X toward Z and applying ∂/∂X at each node.]
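A minimal sketch of forward-mode differentiation on a graph, assuming the edge derivatives are already known and the nodes are listed in topological order. The function name `forward_mode` and the `edges`/`order` layout are choices made for this sketch, not the original author's code. It is applied here to e = (a+b)∗(b+1) at a = 2, b = 1.

```python
def forward_mode(edges, order, source):
    """Return d(node)/d(source) for every node.

    edges[u][v] is the derivative of v with respect to u, and order lists
    the nodes so that every node appears after all of its parents.
    """
    grad = {node: 0.0 for node in order}
    grad[source] = 1.0
    for u in order:
        for v, weight in edges.get(u, {}).items():
            # Push u's accumulated derivative along the edge u -> v.
            grad[v] += grad[u] * weight
    return grad

edges = {
    "a": {"c": 1.0},
    "b": {"c": 1.0, "d": 1.0},
    "c": {"e": 2.0},
    "d": {"e": 3.0},
}
order = ["a", "b", "c", "d", "e"]

grads = forward_mode(edges, order, "b")
print(grads)  # {'a': 0.0, 'b': 1.0, 'c': 1.0, 'd': 1.0, 'e': 5.0}
```

Each edge is visited exactly once, and by the time a node is processed all of its incoming paths have already been summed.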

Though you probably didn’t think of it in terms of graphs, forward-mode differentiation is very similar to what you implicitly learned to do if you took an introduction to calculus class.

Reverse-mode differentiation, on the other hand, starts at an output of the graph and moves towards the beginning. At each node, it merges all paths which originated at that node.

in terms of: expressed using
implicitly: without it being stated outright
calculus: the mathematics of derivatives and integrals
on the other hand: by contrast
originate: begin

[Figure: reverse-mode differentiation on the same X, Y, Z example, sweeping from Z back toward X and applying ∂Z/∂ at each node.]

Forward-mode differentiation tracks how one input affects every node. Reverse-mode differentiation tracks how every node affects one output. That is, forward-mode differentiation applies the operator ∂/∂X to every node, while reverse-mode differentiation applies the operator ∂Z/∂ to every node.
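Reverse-mode differentiation can be sketched the same way, walking the nodes in reverse topological order. As before, `reverse_mode` and the `edges`/`order` layout are assumptions made for illustration, applied to e = (a+b)∗(b+1) at a = 2, b = 1.

```python
def reverse_mode(edges, order, sink):
    """Return d(sink)/d(node) for every node.

    edges[u][v] is the derivative of v with respect to u, and order lists
    the nodes so that every node appears after all of its parents.
    """
    grad = {node: 0.0 for node in order}
    grad[sink] = 1.0
    for u in reversed(order):
        for v, weight in edges.get(u, {}).items():
            # Pull the sink's derivative with respect to child v back to u.
            grad[u] += weight * grad[v]
    return grad

edges = {
    "a": {"c": 1.0},
    "b": {"c": 1.0, "d": 1.0},
    "c": {"e": 2.0},
    "d": {"e": 3.0},
}
order = ["a", "b", "c", "d", "e"]

grads = reverse_mode(edges, order, "e")
print(grads)  # {'a': 2.0, 'b': 5.0, 'c': 2.0, 'd': 3.0, 'e': 1.0}
```

One backward sweep yields the derivative of e with respect to every node, including both inputs a and b.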

Computational Victories

At this point, you might wonder why anyone would care about reverse-mode differentiation. It looks like a strange way of doing the same thing as the forward-mode. Is there some advantage?

Let’s consider our original example again:

at this point: now; at this stage
wonder: ask oneself
care about: consider important
advantage: benefit

[Figure: the original example graph for e = (a+b)∗(b+1) at a = 2, b = 1, with the derivative labeled on each edge.]

We can use forward-mode differentiation from b up. This gives us the derivative of every node with respect to b.

[Figure: forward-mode differentiation from b upward, giving ∂a/∂b = 0, ∂b/∂b = 1, ∂c/∂b = 1, ∂d/∂b = 1, and ∂e/∂b = 5.]

We’ve computed ∂e/∂b, the derivative of our output with respect to one of our inputs.

What if we do reverse-mode differentiation from e down? This gives us the derivative of e with respect to every node:

[Figure: reverse-mode differentiation from e downward, giving ∂e/∂e = 1, ∂e/∂c = 2, ∂e/∂d = 3, ∂e/∂a = 2, and ∂e/∂b = 5.]

When I say that reverse-mode differentiation gives us the derivative of e with respect to every node, I really do mean every node. We get both ∂e/∂a and ∂e/∂b, the derivatives of e with respect to both inputs. Forward-mode differentiation gave us the derivative of our output with respect to a single input, but reverse-mode differentiation gives us all of them.

For this graph, that’s only a factor of two speed up, but imagine a function with a million inputs and one output. Forward-mode differentiation would require us to go through the graph a million times to get the derivatives. Reverse-mode differentiation can get them all in one fell swoop! A speed up of a factor of a million is pretty nice!

a factor of two: two times
speed up: an increase in speed
go through: traverse from one end to the other
in one fell swoop: all at once, in a single action
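The difference in the number of sweeps can be made concrete with a toy function of many inputs and one output. Everything here is an assumption made for illustration: the function out = 1∗x0 + 2∗x1 + ..., the small input count standing in for "a million", and the helper names `forward_sweep` and `reverse_sweep`.

```python
n = 200  # small stand-in for "a million inputs", so the sketch runs quickly
inputs = [f"x{i}" for i in range(n)]
order = inputs + ["out"]
# Hypothetical function: out = 1*x0 + 2*x1 + ..., so d(out)/d(x_i) = i + 1.
edges = {name: {"out": float(i + 1)} for i, name in enumerate(inputs)}

def forward_sweep(source):
    """One forward-mode pass: derivative of every node w.r.t. one input."""
    grad = {node: 0.0 for node in order}
    grad[source] = 1.0
    for u in order:
        for v, weight in edges.get(u, {}).items():
            grad[v] += grad[u] * weight
    return grad

def reverse_sweep(sink):
    """One reverse-mode pass: derivative of one output w.r.t. every node."""
    grad = {node: 0.0 for node in order}
    grad[sink] = 1.0
    for u in reversed(order):
        for v, weight in edges.get(u, {}).items():
            grad[u] += weight * grad[v]
    return grad

# Forward mode: one sweep per input.
forward_grads = {name: forward_sweep(name)["out"] for name in inputs}
forward_sweeps = len(inputs)

# Reverse mode: a single sweep yields every input's derivative.
reverse_all = reverse_sweep("out")
reverse_grads = {name: reverse_all[name] for name in inputs}
reverse_sweeps = 1

print(forward_grads == reverse_grads, forward_sweeps, reverse_sweeps)  # True 200 1
```

Both approaches produce the same derivatives; only the number of passes over the graph differs.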

When training neural networks, we think of the cost (a value describing how badly a neural network performs) as a function of the parameters (numbers describing how the network behaves). We want to calculate the derivatives of the cost with respect to all the parameters, for use in gradient descent. Now, there are often millions, or even tens of millions, of parameters in a neural network. So, reverse-mode differentiation, called backpropagation in the context of neural networks, gives us a massive speed up!

cost: a number measuring how badly a model performs; also called loss
parameter: an adjustable number inside a model, such as a weight
tens of millions: 10,000,000 or more
in the context of: in the setting of
(Are there any cases where forward-mode differentiation makes more sense? Yes, there are! Where the reverse-mode gives the derivatives of one output with respect to all inputs, the forward-mode gives us the derivatives of all outputs with respect to one input. If one has a function with lots of outputs, forward-mode differentiation can be much, much, much faster.)

Isn’t This Trivial?

When I first understood what backpropagation was, my reaction was: “Oh, that’s just the chain rule! How did it take us so long to figure out?” I’m not the only one who’s had that reaction. It’s true that if you ask “is there a smart way to calculate derivatives in feedforward neural networks?” the answer isn’t that difficult.

trivial: obvious; requiring little thought
so long: here, such a long time
figure out: work out; come to understand
feedforward: with information flowing only from inputs toward outputs, without cycles

But I think it was much more difficult than it might seem. You see, at the time backpropagation was invented, people weren’t very focused on the feedforward neural networks that we study. It also wasn’t obvious that derivatives were the right way to train them. Those are only obvious once you realize you can quickly calculate derivatives. There was a circular dependency.

at the time: back then
circular dependency: a situation in which each of several things depends on the others

Worse, it would be very easy to write off any piece of the circular dependency as impossible on casual thought. Training neural networks with derivatives? Surely you’d just get stuck in local minima. And obviously it would be expensive to compute all those derivatives. It’s only because we know this approach works that we don’t immediately start listing reasons it’s likely not to.

That’s the benefit of hindsight. Once you’ve framed the question, the hardest work is already done.

write off: dismiss as worthless or impossible
on casual thought: without thinking carefully
get stuck in: here, become trapped in
local minima: points where a function is lower than at all nearby points, though not necessarily lowest overall
hindsight: understanding gained only after the event
frame the question: state the question in a precise form
approach: method

Conclusion

Derivatives are cheaper than you think. That’s the main lesson to take away from this post. In fact, they’re unintuitively cheap, and us silly humans have had to repeatedly rediscover this fact. That’s an important thing to understand in deep learning. It’s also a really useful thing to know in other fields, and only more so if it isn’t common knowledge.

Are there other lessons? I think there are.

cheap: here, requiring little computation
take away from: here, learn from
unintuitively: contrary to what intuition suggests
rediscover: discover again
more so: to an even greater degree
common knowledge: something most people already know

Backpropagation is also a useful lens for understanding how derivatives flow through a model. This can be extremely helpful in reasoning about why some models are difficult to optimize. The classic example of this is the problem of vanishing gradients in recurrent neural networks.

lens: here, a way of looking at something
flow through: pass through from one end to the other
optimize: adjust so as to get the best result
vanishing gradients: gradients that shrink toward zero as they propagate backward through many layers or time steps
recurrent neural network: a neural network with cyclic connections, used to process sequences
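The vanishing gradient effect can be seen in a toy example where the derivative along a long path is a product of many factors smaller than 1. The scalar recurrence h_t = tanh(w∗h_{t-1}), the weight value, and the function name below are assumptions made for this sketch and do not come from the original post.

```python
import math

def gradient_through_time(w, h0, steps):
    """Return d h_T / d h_0 for the scalar recurrence h_t = tanh(w * h_{t-1})."""
    h, grad = h0, 1.0
    for _ in range(steps):
        h = math.tanh(w * h)
        # Chain rule: multiply by the local derivative d h_t / d h_{t-1}.
        grad *= w * (1.0 - h * h)
    return grad

for steps in (1, 10, 50):
    print(steps, gradient_through_time(w=0.5, h0=0.8, steps=steps))
```

With w = 0.5 every factor is at most 0.5, so after 50 steps the derivative is below 0.5 to the power 50, which is about 1e-15: the early state has almost no measurable influence on the final one.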

Finally, I claim there is a broad algorithmic lesson to take away from these techniques. Backpropagation and forward-mode differentiation use a powerful pair of tricks (linearization and dynamic programming) to compute derivatives more efficiently than one might think possible. If you really understand these techniques, you can use them to efficiently calculate several other interesting expressions involving derivatives. We’ll explore this in a later blog post.

algorithmic: relating to algorithms
a pair of: two
linearization: approximating a function near a point by a linear one
dynamic programming: solving a problem by storing and reusing the solutions to overlapping subproblems

This post gives a very abstract treatment of backpropagation. I strongly recommend reading Michael Nielsen’s chapter on it for an excellent discussion, more concretely focused on neural networks.

abstract: general; not tied to specific cases
treatment: here, the way a subject is presented
concretely: in specific, practical terms

Acknowledgments

Thank you to Greg Corrado, Jon Shlens, Samy Bengio and Anelia Angelova for taking the time to proofread this post.

Thanks also to Dario Amodei, Michael Nielsen and Yoshua Bengio for discussion of approaches to explaining backpropagation. Also thanks to all those who tolerated me practicing explaining backpropagation in talks and seminar series!

Footnote: This might feel a bit like dynamic programming. That’s because it is!

Reference: https://zhuanlan.zhihu.com/p/25081671?refer=xiaoleimlnote
