How backpropagation works, and how you can use Python to build a neural network

by Samay Shamdasani

Neural networks can be intimidating, especially for people new to machine learning. However, this tutorial will break down how exactly a neural network works and you will have a working flexible neural network by the end. Let’s get started!

Understanding the process

With approximately 100 billion neurons, the human brain processes data at speeds as fast as 268 mph! In essence, a neural network is a collection of neurons connected by synapses.

This collection is organized into three main layers: the input layer, the hidden layer, and the output layer.

You can have many hidden layers, which is where the term deep learning comes into play. In an artificial neural network, there are several inputs, which are called features, which produce at least one output — which is called a label.

In the drawing above, the circles represent neurons while the lines represent synapses.

The role of a synapse is to take the inputs and multiply them by the weights.

You can think of weights as the “strength” of the connection between neurons. Weights primarily define the output of a neural network. However, they are highly flexible. Afterward, an activation function is applied to return an output.

Here’s a brief overview of how a simple feedforward neural network works:

  1. Take inputs as a matrix (2D array of numbers)

  2. Multiply the inputs by a set of weights (this is done by matrix multiplication, aka taking the ‘dot product’)

  3. Apply an activation function

  4. Return an output

  5. Calculate the error by taking the difference between the desired output from the model and the predicted output. This error then drives gradient descent, the process we use to alter the weights.

  6. The weights are then adjusted, according to the error found in step 5.

  7. To train, this process is repeated 1,000+ times. The more data the network is trained on, the more accurate our outputs will be.

At their core, neural networks are simple.

They just perform matrix multiplication with the input and weights, and apply an activation function.

When weights are adjusted via the gradient of the loss function, the network adapts to the changes to produce more accurate outputs.

Our neural network will model a single hidden layer of three neurons, with two inputs and one output. In the network, we will be predicting the score of our exam based on the inputs of how many hours we studied and how many hours we slept the day before. The output is the ‘test score’.

Here’s our sample data of what we’ll be training our Neural Network on:
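
Assuming the figures used in the worked examples later in this tutorial (the first row, for instance, reappears in the forward-propagation walkthrough), the data looks like this:

| Hours Studied | Hours Slept | Test Score |
| --- | --- | --- |
| 2 | 9 | 92 |
| 1 | 5 | 86 |
| 3 | 6 | 89 |
| 4 | 8 | ? |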

As you may have noticed, the ? in this case represents what we want our neural network to predict. In this case, we are predicting the test score of someone who studied for four hours and slept for eight hours based on their prior performance.

Forward Propagation

Let’s start coding this bad boy! Open up a new python file. You’ll want to import numpy as it will help us with certain calculations.

First, let’s import our data as numpy arrays using np.array. We'll also want to normalize our units as our inputs are in hours, but our output is a test score from 0-100. Therefore, we need to scale our data by dividing by the maximum value for each variable.
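
A minimal sketch of that setup, assuming the sample data shown above:

```python
import numpy as np

# X = (hours studying, hours sleeping), y = score on test
X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)

# scale units: divide each input column by its maximum, and scores by the max score of 100
X = X / np.amax(X, axis=0)
y = y / 100
```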

Next, let’s define a python class and write an init function where we'll specify our parameters such as the input, hidden, and output layers.
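
A sketch of that class, assuming the two-input, three-hidden-neuron, one-output architecture described above:

```python
class Neural_Network(object):
    def __init__(self):
        # network architecture parameters
        self.inputSize = 2
        self.hiddenSize = 3
        self.outputSize = 1
```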

It is time for our first calculation. Remember that our synapses perform a dot product, or matrix multiplication, of the input and weights. Note that the weights are generated randomly, typically as small values (np.random.randn, used below, draws them from a standard normal distribution).

The calculations behind our network

In the data set, our input data, X, is a 3x2 matrix. Our output data, y, is a 3x1 matrix. Each element in matrix X needs to be multiplied by a corresponding weight and then added together with all the other results for each neuron in the hidden layer. Here's how the first input data element (2 hours studying and 9 hours sleeping) would calculate an output in the network:

This image breaks down what our neural network actually does to produce an output. First, the products of the randomly generated weights (.2, .6, .1, .8, .3, .7) on each synapse and the corresponding inputs are summed to arrive at the first values of the hidden layer. These sums are in a smaller font as they are not the final values for the hidden layer.
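
To make that arithmetic concrete, here is a sketch of the hidden-layer sums for the first input. How the six weights pair up with the two inputs is an assumption, since the original figure isn't reproduced here:

```python
import numpy as np

x = np.array([2, 9], dtype=float)   # hours studied, hours slept (unscaled, for illustration)

# assumed pairing: row 0 holds the weights leaving the "studied" input,
# row 1 holds the weights leaving the "slept" input
W1 = np.array([[0.2, 0.6, 0.1],
               [0.8, 0.3, 0.7]])

hidden_sums = x.dot(W1)  # [2*0.2 + 9*0.8, 2*0.6 + 9*0.3, 2*0.1 + 9*0.7]
print(hidden_sums)       # [7.6  3.9  6.5]
```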

To get the final value for the hidden layer, we need to apply the activation function.

The role of an activation function is to introduce nonlinearity. An advantage of this is that the output is mapped to a range between 0 and 1, making it easier to alter the weights in the future.

There are many activation functions out there, for many different use cases. In this example, we’ll stick to one of the more popular ones — the sigmoid function.
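
For reference, the sigmoid squashes any real-valued input into the range (0, 1): S(z) = 1 / (1 + e^(−z)).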

Now, we need to use matrix multiplication again, with another set of random weights, to calculate our output layer value.

Lastly, to normalize the output, we just apply the activation function again.

And, there you go! Theoretically, with those weights, our neural network will calculate .85 as our test score! However, our target was .92. Our result wasn't poor, it just isn't the best it can be. We just got a little lucky when I chose the random weights for this example.

How do we train our model to learn? Well, we’ll find out very soon. For now, let’s continue coding our network.

If you are still confused, I highly recommend you check out this informative video which explains the structure of a neural network with the same example.

Implementing the calculations

Now, let’s generate our weights randomly using np.random.randn(). Remember, we'll need two sets of weights. One to go from the input to the hidden layer, and the other to go from the hidden to output layer.
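
Continuing the __init__ method from before, a sketch of that weight setup:

```python
# inside Neural_Network.__init__
self.W1 = np.random.randn(self.inputSize, self.hiddenSize)   # (2x3) weight matrix from input to hidden layer
self.W2 = np.random.randn(self.hiddenSize, self.outputSize)  # (3x1) weight matrix from hidden to output layer
```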

Once we have all the variables set up, we are ready to write our forward propagation function. Let's pass in our input, X, and in this example, we can use the variable z to simulate the activity between the input and output layers.

As explained, we need to take a dot product of the inputs and weights, apply an activation function, take another dot product of the hidden layer and second set of weights, and lastly apply a final activation function to receive our output:
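
A sketch of that forward method, using z, z2, and z3 for the intermediate activities (the sigmoid method is defined next):

```python
def forward(self, X):
    # forward propagation through the network
    self.z = np.dot(X, self.W1)         # dot product of input X and first set of weights
    self.z2 = self.sigmoid(self.z)      # activation function -> final hidden layer values
    self.z3 = np.dot(self.z2, self.W2)  # dot product of hidden layer and second set of weights
    o = self.sigmoid(self.z3)           # final activation function -> predicted output
    return o
```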

Lastly, we need to define our sigmoid function:
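
A minimal version, pairing with the forward sketch above:

```python
def sigmoid(self, s):
    # activation function: maps any real value into the range (0, 1)
    return 1 / (1 + np.exp(-s))
```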

And, there we have it! An (untrained) neural network capable of producing an output.

As you may have noticed, we need to train our network to calculate more accurate results.

Backpropagation — the “learning” of our network

Since we have a random set of weights, we need to alter them to make our inputs equal to the corresponding outputs from our data set. This is done through a method called backpropagation.

Backpropagation works by using a loss function to calculate how far the network was from the target output.

Calculating error

One way of representing the loss function is by using the mean sum squared loss function:
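
One common way to write it, summing over the training examples, is: Loss = Σ ½ (o − y)².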

In this function, o is our predicted output, and y is our actual output. Now that we have the loss function, our goal is to get it as close as we can to 0. That means we will need to have close to no loss at all. As we are training our network, all we are doing is minimizing the loss.

To figure out which direction to alter the weights, we need to find the rate of change of our loss with respect to our weights. In other words, we need to use the derivative of the loss function to understand how the weights affect the input.

In this case, we will be using a partial derivative to allow us to take into account another variable.

This method is known as gradient descent. By knowing which way to alter our weights, our outputs can only get more accurate.

Here’s how we will calculate the incremental change to our weights:

  1. Find the margin of error of the output layer (o) by taking the difference between the predicted output and the actual output (y).

  2. Apply the derivative of our sigmoid activation function to the output layer error. We call this result the delta output sum.

  3. Use the delta output sum of the output layer error to figure out how much our z² (hidden) layer contributed to the output error, by performing a dot product with our second weight matrix. We can call this the z² error.

  4. Calculate the delta output sum for the z² layer by applying the derivative of our sigmoid activation function (just like step 2).

  5. Adjust the weights for the first layer by performing a dot product of the input layer with the hidden (z²) delta output sum. For the second set of weights, perform a dot product of the hidden (z²) layer and the output (o) delta output sum.

Calculating the delta output sum and then applying the derivative of the sigmoid function are very important to backpropagation. The derivative of the sigmoid, also known as sigmoid prime, will give us the rate of change, or slope, of the activation function at output sum.

Let’s continue to code our Neural_Network class by adding a sigmoidPrime (derivative of sigmoid) function:
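
A sketch, relying on the identity that if s is already a sigmoid output, its derivative is s(1 − s):

```python
def sigmoidPrime(self, s):
    # derivative of the sigmoid, assuming s has already been passed through sigmoid
    return s * (1 - s)
```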

Then, we’ll want to create our backward propagation function that does everything specified in the five steps above:
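
A sketch of that backward method, following the steps listed above:

```python
def backward(self, X, y, o):
    # backward propagation through the network
    self.o_error = y - o                                        # step 1: margin of error of the output
    self.o_delta = self.o_error * self.sigmoidPrime(o)          # step 2: delta output sum
    self.z2_error = self.o_delta.dot(self.W2.T)                 # step 3: hidden layer's contribution to the error
    self.z2_delta = self.z2_error * self.sigmoidPrime(self.z2)  # step 4: apply sigmoid prime to the z2 error
    self.W1 += X.T.dot(self.z2_delta)                           # step 5: adjust input -> hidden weights
    self.W2 += self.z2.T.dot(self.o_delta)                      # step 5: adjust hidden -> output weights
```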

We can now define our output through initiating forward propagation, and initiate the backward function by calling it in the train function:
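
A sketch of that train method:

```python
def train(self, X, y):
    o = self.forward(X)
    self.backward(X, y, o)
```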

To run the network, all we have to do is to run the train function. Of course, we'll want to do this multiple, or maybe thousands, of times. So, we'll use a for loop.
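
For example:

```python
NN = Neural_Network()
for i in range(1000):  # train the network 1,000 times
    print("Loss: " + str(np.mean(np.square(y - NN.forward(X)))))  # mean squared loss, for monitoring
    NN.train(X, y)
```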

Here’s the full 60 lines of awesomeness:
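
A runnable sketch assembling all the pieces above (the sample data is the assumed set from earlier, and random initialization means results will vary from run to run):

```python
import numpy as np

# X = (hours studying, hours sleeping), y = score on test
X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)

# scale units
X = X / np.amax(X, axis=0)  # divide each input column by its maximum
y = y / 100                 # max test score is 100

class Neural_Network(object):
    def __init__(self):
        # parameters
        self.inputSize = 2
        self.outputSize = 1
        self.hiddenSize = 3

        # weights
        self.W1 = np.random.randn(self.inputSize, self.hiddenSize)   # (2x3) input -> hidden
        self.W2 = np.random.randn(self.hiddenSize, self.outputSize)  # (3x1) hidden -> output

    def forward(self, X):
        # forward propagation through the network
        self.z = np.dot(X, self.W1)         # dot product of input and first set of weights
        self.z2 = self.sigmoid(self.z)      # activation -> hidden layer values
        self.z3 = np.dot(self.z2, self.W2)  # dot product of hidden layer and second set of weights
        o = self.sigmoid(self.z3)           # final activation -> predicted output
        return o

    def sigmoid(self, s):
        # activation function
        return 1 / (1 + np.exp(-s))

    def sigmoidPrime(self, s):
        # derivative of sigmoid (s is assumed to be a sigmoid output already)
        return s * (1 - s)

    def backward(self, X, y, o):
        # backward propagation through the network
        self.o_error = y - o                                        # error in output
        self.o_delta = self.o_error * self.sigmoidPrime(o)          # delta output sum
        self.z2_error = self.o_delta.dot(self.W2.T)                 # hidden layer's contribution to the error
        self.z2_delta = self.z2_error * self.sigmoidPrime(self.z2)  # delta output sum for the hidden layer
        self.W1 += X.T.dot(self.z2_delta)                           # adjust input -> hidden weights
        self.W2 += self.z2.T.dot(self.o_delta)                      # adjust hidden -> output weights

    def train(self, X, y):
        o = self.forward(X)
        self.backward(X, y, o)

NN = Neural_Network()
for i in range(1000):  # train the network 1,000 times
    print("Loss: " + str(np.mean(np.square(y - NN.forward(X)))))
    NN.train(X, y)
```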

There you have it! A full-fledged neural network that can learn from inputs and outputs.

While we thought of our inputs as hours studying and sleeping, and our outputs as test scores, feel free to change these to whatever you like and observe how the network adapts!

After all, all the network sees is the numbers. The calculations we made, as complex as they seemed to be, all played a big role in our learning model.

If you’d like to predict an output based on our trained data, such as predicting the test score if you studied for four hours and slept for eight, check out the full tutorial here.

Demo & Source

References

Steven Miller

Welch Labs

Kabir Shah

This tutorial was originally posted on Enlight, a website that hosts a variety of tutorials and projects to learn by building! Check it out for more projects like these :)

Translated from: https://www.freecodecamp.org/news/build-a-flexible-neural-network-with-backpropagation-in-python-acffeb7846d0/
