Sigmoid neurons
Learning algorithms sound terrific. But how can we devise such algorithms for a neural network? Suppose we have a network of perceptrons that we’d like to use to learn to solve some problem. For example, the inputs to the network might be the raw pixel data from a scanned, handwritten image of a digit. And we’d like the network to learn weights and biases so that the output from the network correctly classifies the digit. To see how learning might work, suppose we make a small change in some weight (or bias) in the network. What we’d like is for this small change in weight to cause only a small corresponding change in the output from the network. As we’ll see in a moment, this property will make learning possible. Schematically, here’s what we want (obviously this network is too simple to do handwriting recognition!):
If it were true that a small change in a weight (or bias) causes only a small change in output, then we could use this fact to modify the weights and biases to get our network to behave more in the manner we want. For example, suppose the network was mistakenly classifying an image as an “8” when it should be a “9”. We could figure out how to make a small change in the weights and biases so the network gets a little closer to classifying the image as a “9”. And then we’d repeat this, changing the weights and biases over and over to produce better and better output. The network would be learning.
The problem is that this isn't what happens when our network contains perceptrons. In fact, a small change in the weights or bias of any single perceptron in the network can sometimes cause the output of that perceptron to completely flip, say from 0 to 1. That flip may then cause the behaviour of the rest of the network to change completely, in some very complicated way.
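The following minimal sketch illustrates this fragility for a single perceptron; the decision rule follows the usual perceptron definition, and the example weights, inputs, and bias are my own values chosen to sit near the decision boundary, not from the text:

```python
import numpy as np

def perceptron(w, x, b):
    """Classic perceptron rule: output 1 if w.x + b > 0, else 0."""
    return 1 if np.dot(w, x) + b > 0 else 0

# Hypothetical neuron sitting very close to its decision boundary.
w = np.array([0.5, -0.5])
x = np.array([1.0, 1.0])
b = 0.001                            # w.x + b = 0.001, just above zero

print(perceptron(w, x, b))           # 1
print(perceptron(w, x, b - 0.002))   # 0: a tiny change in the bias flips the output
```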
We can overcome this problem by introducing a new type of artificial neuron called a sigmoid neuron. Sigmoid neurons are similar to perceptrons, but modified so that small changes in their weights and bias cause only a small change in their output. That’s the crucial fact which will allow a network of sigmoid neurons to learn.
Okay, let me describe the sigmoid neuron. We’ll depict sigmoid neurons in the same way we depicted perceptrons:
Just like a perceptron, the sigmoid neuron has inputs, x1,x2,…. But instead of being just 0 or 1, these inputs can also take on any value between 0 and 1. Also just like a perceptron, the sigmoid neuron has weights for each input, w1,w2,…, and an overall bias, b. But the output is not 0 or 1. Instead, it is σ(w⋅x+b), where σ is called the sigmoid function and is defined by σ(z)≡1/(1+e−z).
To put it all a little more explicitly, the output of a sigmoid neuron with inputs x1,x2,…, weights w1,w2,…, and bias b is

1 / (1 + exp(−∑j wj xj − b)).
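To make this concrete, here is a minimal Python sketch of a single sigmoid neuron; the function names and example values are assumptions for illustration, not from the text:

```python
import numpy as np

def sigmoid(z):
    """The sigmoid function: sigma(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_neuron_output(w, x, b):
    """Output of a sigmoid neuron: sigma(sum_j w_j * x_j + b)."""
    return sigmoid(np.dot(w, x) + b)

w = np.array([0.7, -0.2, 0.5])   # example weights
x = np.array([0.3, 0.8, 0.1])    # inputs may take any values between 0 and 1
b = -0.4                         # bias
print(sigmoid_neuron_output(w, x, b))  # a real number strictly between 0 and 1
```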
At first sight, sigmoid neurons appear very different to perceptrons. The algebraic form of the sigmoid function may seem opaque and forbidding if you’re not already familiar with it. In fact, there are many similarities between perceptrons and sigmoid neurons, and the algebraic form of the sigmoid function turns out to be more of a technical detail than a true barrier to understanding.
To understand the similarity to the perceptron model, suppose z≡w⋅x+b is a large positive number. Then e−z≈0 and so σ(z)≈1. In other words, when z=w⋅x+b is large and positive, the output from the sigmoid neuron is approximately 1, just as it would have been for a perceptron. Suppose on the other hand that z=w⋅x+b is very negative. Then e−z→∞ and so σ(z)≈0, and the behaviour of the sigmoid neuron again closely approximates a perceptron. It's only when w⋅x+b is of modest size that there's much deviation from the perceptron model.
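A quick numerical spot-check of this saturating behaviour (a sketch of my own; the sample values of z are arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for z in [-20.0, -5.0, 0.0, 5.0, 20.0]:
    # For large positive z, sigma(z) is almost 1; for very negative z, almost 0.
    print(f"z = {z:6.1f}   sigma(z) = {sigmoid(z):.6f}")
```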
What about the algebraic form of σ? How can we understand that? In fact, the exact form of σ isn't so important; what really matters is the shape of the function when plotted: an S-shaped curve that rises smoothly from 0 to 1.
This shape is a smoothed-out version of a step function.
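Since the plot itself isn't reproduced here, the following sketch (my own, assuming NumPy and matplotlib are available) draws the sigmoid alongside a step function so the smoothed-out relationship is visible:

```python
import numpy as np
import matplotlib.pyplot as plt

z = np.linspace(-8, 8, 400)
sigmoid = 1.0 / (1.0 + np.exp(-z))
step = (z > 0).astype(float)     # step function: 0 for z <= 0, 1 for z > 0

plt.plot(z, sigmoid, label="sigmoid")
plt.plot(z, step, label="step", linestyle="--")
plt.xlabel("z")
plt.ylabel("output")
plt.legend()
plt.show()
```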
If σ had in fact been a step function, then the sigmoid neuron would be a perceptron, since the output would be 1 or 0 depending on whether w⋅x+b was positive or negative. By using the actual σ function we instead get a smoothed-out perceptron. Indeed, the smoothness of σ is the crucial fact: it means that small changes Δwj in the weights and Δb in the bias produce a small change Δoutput in the output of the neuron. In fact, calculus tells us that Δoutput is well approximated by

Δoutput ≈ ∑j (∂output/∂wj) Δwj + (∂output/∂b) Δb,
where the sum is over all the weights, wj, and ∂output/∂wj and ∂output/∂b denote partial derivatives of the output with respect to wj and b, respectively. Don't panic if you're not comfortable with partial derivatives! While the expression above looks complicated, with all the partial derivatives, it's actually saying something very simple (and which is very good news): Δoutput is a linear function of the changes Δwj and Δb in the weights and bias, which makes it easy to choose small changes in the weights and biases to achieve any desired small change in the output.
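Here is a minimal sketch that checks this linear approximation numerically for a single sigmoid neuron; the weights, inputs, and perturbations below are arbitrary example values of my own:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def output(w, x, b):
    return sigmoid(np.dot(w, x) + b)

w = np.array([0.4, -0.6])
x = np.array([0.9, 0.2])
b = 0.1

# Exact partial derivatives for a single sigmoid neuron:
# d(output)/dw_j = sigma'(z) * x_j and d(output)/db = sigma'(z),
# where sigma'(z) = sigma(z) * (1 - sigma(z)).
z = np.dot(w, x) + b
sp = sigmoid(z) * (1 - sigmoid(z))
grad_w = sp * x
grad_b = sp

dw = np.array([0.001, -0.002])   # small changes in the weights
db = 0.0005                      # small change in the bias

actual = output(w + dw, x, b + db) - output(w, x, b)
predicted = np.dot(grad_w, dw) + grad_b * db
print(actual, predicted)         # the two values agree closely
```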
If it's the shape of σ which really matters, and not its exact form, then why use the particular algebraic form chosen for σ above? In fact, other smooth activation functions can be used in place of σ; the main reason σ is so commonly used is that the exponential has convenient properties when differentiated, which simplifies the algebra for the partial derivatives above.
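One such convenient property is the identity σ′(z) = σ(z)(1 − σ(z)); the following spot-check (my own sketch) compares it against a finite-difference derivative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for z in [-3.0, -0.5, 0.0, 1.2, 4.0]:
    h = 1e-6
    numerical = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)   # finite-difference derivative
    analytic = sigmoid(z) * (1 - sigmoid(z))                  # sigma'(z) = sigma(z)(1 - sigma(z))
    print(f"z = {z:5.1f}   numerical = {numerical:.6f}   analytic = {analytic:.6f}")
```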
How should we interpret the output from a sigmoid neuron? Obviously, one big difference between perceptrons and sigmoid neurons is that sigmoid neurons don't just output 0 or 1: they can output any real number between 0 and 1.
Exercises
Sigmoid neurons simulating perceptrons, part I
Suppose we take all the weights and biases in a network of perceptrons, and multiply them by a positive constant, c>0. Show that the behaviour of the network doesn't change.
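As a numerical spot-check of this claim for a single perceptron (a sketch of my own; a full layered network follows by the same argument, since each perceptron's output depends only on the sign of w⋅x+b):

```python
import numpy as np

def perceptron(w, x, b):
    return 1 if np.dot(w, x) + b > 0 else 0

rng = np.random.default_rng(0)
c = 3.7                               # any positive constant
for _ in range(1000):
    w = rng.normal(size=4)
    b = rng.normal()
    x = rng.integers(0, 2, size=4)    # binary inputs
    # Scaling w and b by c > 0 does not change the sign of w.x + b,
    # so the perceptron's output is unchanged.
    assert perceptron(w, x, b) == perceptron(c * w, x, c * b)
print("outputs identical for all sampled cases")
```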
Sigmoid neurons simulating perceptrons, part II
Suppose we have the same setup as the last problem - a network of perceptrons. Suppose also that the overall input to the network of perceptrons has been chosen. We won't need the actual input value, we just need the input to have been fixed. Suppose the weights and biases are such that w⋅x+b≠0 for the input x to any particular perceptron in the network. Now replace all the perceptrons in the network by sigmoid neurons, and multiply the weights and biases by a positive constant c>0. Show that in the limit as c→∞ the behaviour of this network of sigmoid neurons is exactly the same as the network of perceptrons. How can this fail when w⋅x+b=0 for one of the perceptrons?
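The following sketch (my own, with arbitrary example values) illustrates the limit for a single neuron: with w⋅x+b≠0 fixed, multiplying the weights and bias by c simply scales z=w⋅x+b by c, and σ(cz) approaches the perceptron's 0/1 output as c grows:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.3, -0.8])
x = np.array([1.0, 0.25])
b = 0.05                      # chosen so that w.x + b != 0
z = np.dot(w, x) + b
perceptron_output = 1 if z > 0 else 0

for c in [1, 10, 100, 1000]:
    # Multiplying all weights and the bias by c scales z to c*z.
    print(f"c = {c:5d}   sigmoid(c*z) = {sigmoid(c * z):.6f}   perceptron = {perceptron_output}")
```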