神经网络与深度学习（第一章）（五）_a simple network to classify handwritten digits-CSDN博客

A simple network to classify handwritten digits 分类手写数字的简单网络

Having defined neural networks, let’s return to handwriting recognition. We can split the problem of recognizing handwritten digits into two sub-problems. First, we’d like a way of breaking an image containing many digits into a sequence of separate images, each containing a single digit. For example, we’d like to break the image
定义了神经网络，我们回到笔迹识别。我们可以将识别手写数字的问题切分为两个子问题。首先，我们需要找到一个方法将有许多数字的图片切分为一系列只有一个数字的图片。例如，我们要将下面的图片

into six separate images,
切分为6张图片

We humans solve this segmentation problem with ease, but it’s challenging for a computer program to correctly break up the image. Once the image has been segmented, the program then needs to classify each individual digit. So, for instance, we’d like our program to recognize that the first digit above,
人来解决这个切分问题是非常简单的，但是对于计算机程序来说要正确切分图片是十分具有挑战性的。一旦图片被切分后，程序需要将每个数字正确分类。因此，举例来说，我们需要我们的程序识别上面的第一个数字，

is a 5.
是5.

We’ll focus on writing a program to solve the second problem, that is, classifying individual digits. We do this because it turns out that the segmentation problem is not so difficult to solve, once you have a good way of classifying individual digits. There are many approaches to solving the segmentation problem. One approach is to trial many different ways of segmenting the image, using the individual digit classifier to score each trial segmentation. A trial segmentation gets a high score if the individual digit classifier is confident of its classification in all segments, and a low score if the classifier is having a lot of trouble in one or more segments. The idea is that if the classifier is having trouble somewhere, then it’s probably having trouble because the segmentation has been chosen incorrectly. This idea and other variations can be used to solve the segmentation problem quite well. So instead of worrying about segmentation we’ll concentrate on developing a neural network which can solve the more interesting and difficult problem, namely, recognizing individual handwritten digits.
我们将聚焦于编写程序来解决第二个问题，也就是识别单独数字的问题。我们这么做是因为一旦你找到很好的方法来识别单独数字，切分问题解决起来不是那么困难。这里有许多方法来解决切分问题。一个途径是尝试多种切分方法，然后用单个数字分类器来给每个切分方法的结果打分。如果分类器对切分的每个数字都能很好识别，那么这个切分将得到高分，反之如果分类器不能很好识别切分后的数字，那么这个切分将得到低分。这个思路是如果分类器在哪里出了问题，那么也许就是那里切分出来问题。这个方法和其他的变形可以解决切分问题解决的很好。所以与其担心切分，我们还不如集中解决开发一个可以解决更有趣更困难的问题，识别单个手写数字，的神经网络。

To recognize individual digits we will use a three-layer neural network:
为了识别单个手写数字，我们将使用3层神经网络：

The input layer of the network contains neurons encoding the values of the input pixels. As discussed in the next section, our training data for the network will consist of many 28 by 28 pixel images of scanned handwritten digits, and so the input layer contains 784=28×28 neurons. For simplicity I’ve omitted most of the 784 input neurons in the diagram above. The input pixels are greyscale, with a value of 0.0 representing white, a value of 1.0 representing black, and in between values representing gradually darkening shades of grey.
网络输入层的神经元对输入像素值进行编码。正如我们下一节将讨论的，我们的训练数据由许多 28 乘 28 像素的手写数字扫描图片组成，因此输入层包含 784=28×28 个神经元。简化起见我在上图中省略了 784 个输入神经元中的大部分。输入像素是灰阶的， 0.0 表示白色， 1.0 表示黑色，之间的值表示灰度。

The second layer of the network is a hidden layer. We denote the number of neurons in this hidden layer by n , and we’ll experiment with different values for n . The example shown illustrates a small hidden layer, containing just n=15 neurons.
网络的第二层是隐含层。我们将隐含层的神经元个数记作 n ，我们将尝试不同的 n 。这个例子使用了小规模的隐含层，包含了 n=15 个神经元。

The output layer of the network contains 10 neurons. If the first neuron fires, i.e., has an output ≈1 , then that will indicate that the network thinks the digit is a 0 . If the second neuron fires then that will indicate that the network thinks the digit is a 1 . And so on. A little more precisely, we number the output neurons from 0 through 9 , and figure out which neuron has the highest activation value. If that neuron is, say, neuron number 6 , then our network will guess that the input digit was a 6 . And so on for the other output neurons.
网络的输出层包含10个神经元。如果第一个神经元被激活，也就是输出 ≈1 ，这表示网络认为这个数字是 0 。如果第二个神经元激活表示网络认为这个数字是 1 ，等等。更准确的说，我们量化 0 到 9 的输出神经元，然后看看哪个神经元有最高的激活值。假设 6 的神经元最高，我们的网络将猜测输入数字是 6 。其他的神经元以此类推。

You might wonder why we use 10 output neurons. After all, the goal of the network is to tell us which digit ( 0,1,2,…,9 ) corresponds to the input image. A seemingly natural way of doing that is to use just 4 output neurons, treating each neuron as taking on a binary value, depending on whether the neuron’s output is closer to 0 or to 1 . Four neurons are enough to encode the answer, since 24=16 is more than the 10 possible values for the input digit. Why should our network use 10 neurons instead? Isn’t that inefficient? The ultimate justification is empirical: we can try out both network designs, and it turns out that, for this particular problem, the network with 10 output neurons learns to recognize digits better than the network with 4 output neurons. But that leaves us wondering why using 10 output neurons works better. Is there some heuristic that would tell us in advance that we should use the 10 -output encoding instead of the 4 -output encoding?
你也许会奇怪我们为什么用 10 个输出神经元。毕竟这个网络的目标是告诉我们哪个数字( 0,1,2,…,9 )对应于输入的图片。还有种很自然的想法是使用 4 个输出神经元，每个神经元看看输出更靠近与 0 或 1 选取一个二进制值。四个神经元足够对结果进行编码了，因为 24=16 超过了输入数字的 10 个可能结果。为什么我们反而要用 10 个神经元呢？这样做不是效率更低吗？最终的理由是经验主义的：我们可以尝试设计这两种网络，对于这个特定的问题， 10 个输出神经元的网络识别数字的效果比 4 个输出神经元的网络好。不过这使得我们好奇为什么使用 10 个输出神经元会得到更好的效果。有什么经验预先告诉我们应该使用 10 来对输出编码而不是 4 ？

To understand why we do this, it helps to think about what the neural network is doing from first principles. Consider first the case where we use 10 output neurons. Let’s concentrate on the first output neuron, the one that’s trying to decide whether or not the digit is a 0 . It does this by weighing up evidence from the hidden layer of neurons. What are those hidden neurons doing? Well, just suppose for the sake of argument that the first neuron in the hidden layer detects whether or not an image like the following is present:
从基本原理上了解神经网络做了些什么可以帮助理解为什么我们这么做。考虑第一种情况我们使用10个输出神经元。让我们着重注意第一个输出神经元，它就是尝试判决数字是否是0。它将隐含层的神经元的输出加权求和。隐含层的神经元做了什么？设想隐含层的第一个神经元是来发现一个图片是否和下面的图案相似：

It can do this by heavily weighting input pixels which overlap with the image, and only lightly weighting the other inputs. In a similar way, let’s suppose for the sake of argument that the second, third, and fourth neurons in the hidden layer detect whether or not the following images are present:
它可以通过提高那些和这个图片重叠的输入像素的权重，降低其他输入的权重来做到这个。以此类推，让我们设想第二，第三和第四个隐含层的神经元是来发现一个图片是否和下面的图案相似：

As you may have guessed, these four images together make up the 0 image that we saw in the line of digits shown earlier:
也许你已经猜到了，这四个图像组合成立数字 0 的图像，正如我们之前看到的：

So if all four of these hidden neurons are firing then we can conclude that the digit is a 0 . Of course, that’s not the only sort of evidence we can use to conclude that the image was a 0 - we could legitimately get a 0 in many other ways (say, through translations of the above images, or slight distortions). But it seems safe to say that at least in this case we’d conclude that the input was a 0 .
因此如果所有这四个隐含层神经元被激活，那么我们可以得出这个数字是 0 。当然，这不是仅有的证据我们能用来判决图片是 0 。我们可以通过多种其他的方法合理的推断它是 0 （也就是说，通过对上面的图片变换，或者轻微的变形）。但是可以说最少在这个例子中我们可以判决输入是 0 。

Supposing the neural network functions in this way, we can give a plausible explanation for why it’s better to have 10 outputs from the network, rather than 4 . If we had 4 outputs, then the first output neuron would be trying to decide what the most significant bit of the digit was. And there’s no easy way to relate that most significant bit to simple shapes like those shown above. It’s hard to imagine that there’s any good historical reason the component shapes of the digit will be closely related to (say) the most significant bit in the output.
这样设想神经网络函数，我们可以给出一个似乎合理的理由来解释为什么10输出的网络比4输出的网络要好。如果我们有4个输出，那么第一个输出神经元将尝试判决这个数字最显著的比特是什么。将最显著的比特和上面简单的形状联系起来可不是件容易的事。很难想象有其他好的经验理由来帮助我们将最显著的比特和数字各局部的形状联系在一起。

Now, with all that said, this is all just a heuristic. Nothing says that the three-layer neural network has to operate in the way I described, with the hidden neurons detecting simple component shapes. Maybe a clever learning algorithm will find some assignment of weights that lets us use only 4 output neurons. But as a heuristic the way of thinking I’ve described works pretty well, and can save you a lot of time in designing good neural network architectures.
现在，综上所述，这都是经验型的。没有什么证明3层神经网络是按照我描述的隐含层神经元检测局部简单形状这样工作的。也许有更聪明的算法可以找到权重分配来让我们仅仅使用4个输出神经元。但是按照我描述的作为经验方法去想还是可以较好理解的，并且可以帮你在设计神经网络结构的时候节省不少时间。

Exercise

There is a way of determining the bitwise representation of a digit by adding an extra layer to the three-layer network above. The extra layer converts the output from the previous layer into a binary representation, as illustrated in the figure below. Find a set of weights and biases for the new output layer. Assume that the first 3 layers of neurons are such that the correct output in the third layer (i.e., the old output layer) has activation at least 0.99 , and incorrect outputs have activation less than 0.01 .
通过在三层网络上增加额外的一层可以识别数字的二进制表示。这额外的一层将前面一层的输出转换为二进制表示，如下图所示。为新的输出层找到一组权重和偏置。假设前三层神经元输出的不是大于 0.99 就是小于 0.01 。