An Introduction to Neural Networks: Part 2

In a previous post, I described how to do backpropagation with a 2-layer neural network. I’ve written this post assuming some familiarity with the previous post.

When first created, 2-layer neural networks brought about quite a bit of excitement, but this excitement quickly dissipated when researchers realized that 2-layer neural networks could only solve a limited set of problems.

Researchers knew that adding an extra layer to the neural networks enabled neural networks to solve much more complex problems, but they didn’t know how to train these more complex networks.

In the previous post, I described “backpropagation,” but this wasn’t the portion of backpropagation that really changed the history of neural networks. What really changed neural networks is backpropagation with an extra layer. This extra layer enabled researchers to train more complex networks. The extra layer(s) are called the hidden layer(s). In this post, I will describe backpropagation with a hidden layer.

To describe backpropagation with a hidden layer, I will demonstrate how neural networks can solve the XOR problem.

In this example of the XOR problem there are four items. Each item is defined by two values. If these two values are the same, then the item belongs to one group (blue here). If the two values are different, then the item belongs to another group (red here).

Below, I have depicted the XOR problem. The goal is to find a model that can distinguish between the blue and red groups based on an item’s values.

This code is also available as a Jupyter notebook on my GitHub.
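Here is a minimal sketch of the kind of code that produces this plot (the Items and Colors names are my own shorthand; np and plt are the aliases used by the rest of the code in this post):

import numpy as np
import matplotlib.pyplot as plt

Items = np.array([[0.0,0.0],[0.0,1.0],[1.0,0.0],[1.0,1.0]]) #the four items
Colors = ['blue','red','red','blue'] #same values -> blue group, different values -> red group

plt.scatter(Items[:,0],Items[:,1],c=Colors,s=100)
plt.xlabel('Value 1')
plt.ylabel('Value 2');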


Again, each item has two values. An item’s first value is represented on the x-axis. An item’s second value is represented on the y-axis. The red items belong to one category and the blue items belong to another.

This is a non-linear problem because no linear function can segregate the groups. For instance, a horizontal line could segregate the upper and lower items and a vertical line could segregate the left and right items, but no single linear function can segregate the red and blue items.

We need a non-linear function to separate the groups, and neural networks can emulate a non-linear function that segregates them.

While this problem may seem relatively simple, it gave the initial neural networks quite a hard time. In fact, this is the problem that depleted much of the original enthusiasm for neural networks.

Neural networks can easily solve this problem, but they require an extra layer. Below I depict a network with an extra layer (a 3-layer network). To depict the network, I use a repository available on my GitHub.
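Here is a sketch of how that drawing code could look (the module name in the import is an assumption about the repository; the add_layer/draw calls mirror how the class is used with weights later in this post):

from visualise_neural_network import NeuralNetwork #drawing class from my GitHub repository (module name assumed)

network = NeuralNetwork()
network.add_layer(2,['Input 1','Input 2']) #input layer with two units
network.add_layer(2,['Hidden 1','Hidden 2']) #hidden layer with two units
network.add_layer(1,['Output']) #output layer with one unit
network.draw()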


Notice that this network now has 5 total neurons. The two units at the bottom are the input layer. The activity of the input units is simply the value of the inputs (same as the inputs in my previous post). The two units in the middle are the hidden layer. The activity of the hidden units is calculated in the same manner as the output unit's activity from my previous post. The unit at the top is the output layer. Its activity is found in the same manner as in my previous post, but with the activity of the hidden units replacing the input values.

Thus, when the neural network makes its guess, the only difference is we have to compute an extra layer’s activity.

The goal of this network is for the output unit to have an activity of 0 when presented with an item from the blue group (inputs are same) and to have an activity of 1 when presented with an item from the red group (inputs are different).

One additional aspect of neural networks that I haven’t discussed is that each non-input unit can have a bias. You can think about bias as a propensity for the unit to become active or not to become active. For instance, a unit with a positive bias is more likely to be active than a unit with no bias.

I will implement bias as an extra line feeding into each unit. The weight of this line is the bias, and the bias line is always active, meaning this bias is always present.
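Putting these pieces together, the full forward pass of the 3-layer network can be written (with a logistic activation function $\sigma$) as

$$h_j = \sigma\big(b_{h_j} \cdot 1 + \sum_i Weight_{Input_i \to h_j} \cdot Input_i\big) \qquad Output = \sigma\big(b_{Output} \cdot 1 + \sum_j Weight_{h_j \to Output} \cdot h_j\big)$$

where each bias $b$ acts as a weight on an always-active input of 1.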

Below, I seed this 3-layer neural network with a random set of weights.
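Here is a sketch of that cell; it reuses the weight-seeding lines from the training code later in this post and the drawing calls from the later figures (the third column of each weight matrix holds the bias weights):

np.random.seed(seed=10) #seed random number generator for reproducibility

Weights_2 = np.random.rand(1,3)-0.5*2 #connections between hidden and output (last entry is the output bias)
Weights_1 = np.random.rand(2,3)-0.5*2 #connections between input and hidden (last column is the hidden biases)

Weight_Dict = {'Weights_1':Weights_1,'Weights_2':Weights_2}

network = NeuralNetwork()
network.add_layer(2,['Input 1','Input 2'],
                  [[round(x,2) for x in Weight_Dict['Weights_1'][0][:2]],
                   [round(x,2) for x in Weight_Dict['Weights_1'][1][:2]]])
network.add_layer(2,[round(Weight_Dict['Weights_1'][0][2],2),round(Weight_Dict['Weights_1'][1][2],2)],
                  [round(x,2) for x in Weight_Dict['Weights_2'][0][:2]])
network.add_layer(1,[round(Weight_Dict['Weights_2'][0][2],2)])
network.draw()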


Above we have our network. The depiction of the crossing connections is a little confusing: the -0.8 belongs to the line between Input 1 and the second hidden unit, and the -0.5 belongs to the line between Input 2 and the first hidden unit.

Let's go through one example of our network receiving an input and making a guess. Let's say the input is [0 1]. This means $Input_1 = 0$ and $Input_2 = 1$. The correct answer in this case is 1.

First, we have to calculate $h_1$'s input. Remember we can write a unit's input as

$$net_{input} = \sum_i Weight_i \cdot Input_i$$

with a bias we can rewrite it as

$$net_{input} = Bias \cdot 1 + \sum_i Weight_i \cdot Input_i$$

Specifically for $h_1$

$$net_{h_1} = b_{h_1} \cdot 1 + Weight_{Input_1 \to h_1} \cdot Input_1 + Weight_{Input_2 \to h_1} \cdot Input_2 = -0.78 \cdot 1 + -0.25 \cdot 0 + -0.50 \cdot 1 \approx -1.28$$

Remember the first term in the equation above is the bias term. Let's see what this looks like in code.
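Here is a minimal sketch of that computation (assuming Weights_1 holds the 2x3 input-to-hidden weight matrix seeded above, with the bias weights in the last column):

Input = np.array([0.0,1.0]) #the input pattern [0 1]
Input_with_bias = np.append(Input,1.0) #append a 1 so the bias weight is always active
net_Hidden = np.dot(Input_with_bias,Weights_1.T) #net input to both hidden units at once
print(net_Hidden)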

[-1.27669634 -1.07035845]

Note that by using np.dot, I can calculate both hidden units’ inputs in a single line of code.

Next, we have to find the activity of units in the hidden layer.

I will translate input into activity with a logistic function, as I did in the previous post.
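As a reminder, the logistic function squashes any net input onto an activity between 0 and 1:

$$activity = \frac{1}{1 + e^{-net_{input}}}$$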

Let's see what this looks like in code.

def logistic(x): #each neuron has a logistic activation function
    return 1.0/(1+np.exp(-x))

Hidden_Units = logistic(net_Hidden)
print Hidden_Units

So far so good, the logistic function has transformed the negative inputs into values near 0.

Now we have to compute the output unit’s activity.

$$net_{Output} = b_{Output} \cdot 1 + Weight_{h_1 \to Output} \cdot h_1 + Weight_{h_2 \to Output} \cdot h_2$$

plugging in the numbers

$$net_{Output} = -0.37 \cdot 1 + -0.23 \cdot 0.22 + -0.98 \cdot 0.26 \approx -0.67$$

Now the code for computing $net_{Output}$ and the Output unit’s activity.
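Here is a minimal sketch of that computation (it follows the same dot-product-plus-bias pattern as the hidden layer):

Hidden_with_bias = np.append(Hidden_Units,1.0) #append a 1 for the output unit's bias line
net_Output = np.dot(Hidden_with_bias,Weights_2.T) #the output unit's net input
Output = logistic(net_Output) #the output unit's activity
print(net_Output)
print(Output)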

net_Output
[-0.66626595]
Output
[ 0.33933346]

Okay, that’s the network’s guess for one input… nowhere near the correct answer (1). Let’s look at what the network predicts for the other input patterns. Below I create a feedforward, 3-layer neural network and plot the neural net’s guesses for the four input patterns.

def layer_InputOutput(Inputs,Weights): #find a layers input and activity
    Inputs_with_bias = np.append(Inputs,1.0) #input 1 for each unit's bias
    return logistic(np.dot(Inputs_with_bias,Weights.T))

def neural_net(Input,Weights_1,Weights_2,Training=False): #this function creates and runs the neural net    

    target = 1 #set target value
    if np.array(Input[0])==np.array([Input[1]]): target = 0 #change target value if needed

    #forward pass
    Hidden_Units = layer_InputOutput(Input,Weights_1) #find hidden unit activity
    Output = layer_InputOutput(Hidden_Units,Weights_2) #find Output layer activity

    return {'output':Output,'target':target,'input':Input} #record trial output

Train_Set = [[1.0,1.0],[0.0,1.0],[1.0,0.0],[0.0,0.0]] #the four input patterns
tempdict = {'output':[],'target':[],'input':[]} #data dictionary
temp = [neural_net(Input,Weights_1,Weights_2) for Input in Train_Set] #get the data
[tempdict[key].append([temp[x][key] for x in range(len(temp))]) for key in tempdict] #combine all the output dictionaries

plotter = np.ones((2,2))*np.reshape(np.array(tempdict['output']),(2,2))
plt.pcolor(plotter,vmin=0,vmax=1,cmap=plt.cm.bwr)
plt.colorbar(ticks=[0,0.25,0.5,0.75,1]);
plt.xlabel('Input 1')
plt.ylabel('Input 2')
plt.xticks([0.5,1.5], ['0','1'])
plt.yticks([0.5,1.5], ['0','1']);

In the plot above, I have Input 1 on the x-axis and Input 2 on the y-axis. So if the Input is [0,0], the network produces the activity depicted in the lower left square. If the Input is [1,0], the network produces the activity depicted in the lower right square. If the network produces an output of 0, then the square will be blue. If the network produces an output of 1, then the square will be red. As you can see, the network produces all output between 0.25 and 0.5… nowhere near the correct answers.

So how do we update the weights in order to reduce the error between our guess and the correct answer?

First, we will do backpropagation between the output and hidden layers. This is exactly the same as backpropagation in the previous post.

In the previous post I described how our goal was to decrease error by changing the weights between units. This is the equation we used to describe changes in error with changes in the weights. The equation below expresses changes in error with changes to the weight between $h_1$ and the Output unit.

$$\frac{\partial Error}{\partial Weight_{h_1 \to Output}} = \frac{\partial Error}{\partial Output} \cdot \frac{\partial Output}{\partial net_{Output}} \cdot \frac{\partial net_{Output}}{\partial Weight_{h_1 \to Output}} = -(target - Output) \cdot Output(1 - Output) \cdot h_1$$

Now multiply this weight adjustment by the learning rate.

$$\Delta Weight_{h_1 \to Output} = \alpha \cdot (target - Output) \cdot Output(1 - Output) \cdot h_1$$

Finally, we apply the weight adjustment to $Weight_{h_1 \to Output}$.

$$Weight_{h_1 \to Output} \leftarrow Weight_{h_1 \to Output} + \Delta Weight_{h_1 \to Output}$$

Now let's do the same thing, but for all the weights feeding the Output unit (including its bias), in code.

alpha = 0.5 #learning rate
target = 1 #target output

error = target - Output #amount of error
delta_out = np.atleast_2d(error*(Output*(1-Output))) #first two terms of error by weight derivative

Hidden_Units = np.append(Hidden_Units,1.0) #add an input of 1 for the bias
print Weights_2 + alpha*np.outer(delta_out,Hidden_Units) #apply weight change

The hidden layer changes things when we do backpropagation. Above, we computed the new weights using the output unit’s error. Now, we want to find how adjusting a weight changes the error, but this weight connects an input to the hidden layer rather than connecting to the output layer. This means we have to propagate the error backwards to the hidden layer.

We will describe backpropagation for the line connecting $Input_2$ and $h_1$ as

$$\frac{\partial Error}{\partial Weight_{Input_2 \to h_1}} = \frac{\partial Error}{\partial h_1} \cdot \frac{\partial h_1}{\partial net_{h_1}} \cdot \frac{\partial net_{h_1}}{\partial Weight_{Input_2 \to h_1}}$$

Pretty similar. We just replaced Output with $h_1$. The interpretation (starting with the final term and moving left) is that changing the weight changes $h_1$'s input. Changing $h_1$'s input changes $h_1$'s activity. Changing $h_1$'s activity changes the error. This last assertion (the first term) is where things get complicated. Let's take a closer look at this first term.

$$\frac{\partial Error}{\partial h_1} = \frac{\partial Error}{\partial net_{Output}} \cdot \frac{\partial net_{Output}}{\partial h_1}$$

Changing $h_1$'s activity changes the input to the Output unit. Changing the output unit's input changes the error. Hmmm, still not quite there yet. Let's look at how changes to the output unit's input change the error.

$$\frac{\partial Error}{\partial net_{Output}} = \frac{\partial Error}{\partial Output} \cdot \frac{\partial Output}{\partial net_{Output}}$$

You can probably see where this is going. Changing the output unit's input changes the output unit's activity. Changing the output unit's activity changes the error. There we go.

Okay, this got a bit heavy, but here comes some good news. Compare the two terms of the equation above to the first two terms of our original backpropagation equation. They're the same! Now let's look at $\frac{\partial net_{Output}}{\partial h_1}$ (the second term from the first equation after our new backpropagation equation).

$$\frac{\partial net_{Output}}{\partial h_1} = Weight_{h_1 \to Output}$$

Again, I am glossing over how to derive these partial derivatives. For a more complete explanation, I recommend Chapter 8 of Rumelhart and McClelland's PDP book. Nonetheless, this means we can take the delta_out value we computed above, multiply it by $Weight_{h_1 \to Output}$, and we have the first term of our backpropagation equation! We want $Weight_{h_1 \to Output}$ to be the weight used in the forward pass, not the updated weight.

The second two terms from our backpropagation equation are the same as in our original backpropagation equation.

$$\frac{\partial h_1}{\partial net_{h_1}} = h_1(1 - h_1)$$ – this is specific to logistic activation functions.

and

$$\frac{\partial net_{h_1}}{\partial Weight_{Input_2 \to h_1}} = Input_2$$

Let's try and write this out.

$$\Delta Weight_{Input_2 \to h_1} = \alpha \cdot (target - Output) \cdot Output(1 - Output) \cdot Weight_{h_1 \to Output} \cdot h_1(1 - h_1) \cdot Input_2$$

It's not short, but it's doable. Let's plug in the numbers.

$$\Delta Weight_{Input_2 \to h_1} = 0.5 \cdot (1 - 0.34) \cdot 0.34 \cdot (1 - 0.34) \cdot -0.23 \cdot 0.22 \cdot (1 - 0.22) \cdot 1 \approx -0.003$$

Not too bad. Now let's see the code.
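Here is a minimal sketch of that update for the input-to-hidden weights; it mirrors the update step used inside the full neural_net function below (the exact numbers printed depend on the values of Input, Hidden_Units, delta_out, and Weights_2 still sitting in the notebook at this point):

Weights_2 = np.atleast_2d(Weights_2) #make sure the hidden-to-output weights are 2d
#Hidden_Units already has the bias 1.0 appended from the cell above; Weights_2 is still the forward-pass value
delta_hidden = delta_out.dot(Weights_2)*(Hidden_Units*(1-Hidden_Units)) #propagate delta_out back through Weights_2
delta_hidden = np.delete(delta_hidden,2) #remove the element belonging to the bias
print(Weights_1 + alpha*np.outer(delta_hidden,np.append(Input,1.0))) #apply the weight change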

[[-0.25119612 -0.50149299 -0.77809147]
 [-0.80193714 -0.23946929 -0.84467792]]

Alright! Let's implement all of this into a single model and train the model on the XOR problem. Below I create a neural network that includes both a forward pass and an optional backpropagation pass.

def neural_net(Input,Weights_1,Weights_2,Training=False): #this function creates and runs the neural net    

    target = 1 #set target value
    if np.array(Input[0])==np.array([Input[1]]): target = 0 #change target value if needed

    #forward pass
    Hidden_Units = layer_InputOutput(Input,Weights_1) #find hidden unit activity
    Output = layer_InputOutput(Hidden_Units,Weights_2) #find Output layer activity

    if Training == True:
        alpha = 0.5 #learning rate

        Weights_2 = np.atleast_2d(Weights_2) #make sure this weight vector is 2d.

        error = target - Output #error
        delta_out = np.atleast_2d(error*(Output*(1-Output))) #delta between output and hidden

        Hidden_Units = np.append(Hidden_Units,1.0) #append an input for the bias
        delta_hidden = delta_out.dot(np.atleast_2d(Weights_2))*(Hidden_Units*(1-Hidden_Units)) #delta between hidden and input

        Weights_2 += alpha*np.outer(delta_out,Hidden_Units) #update weights

        delta_hidden = np.delete(delta_hidden,2) #remove bias activity
        Weights_1 += alpha*np.outer(delta_hidden,np.append(Input,1.0))  #update weights

    if Training == False:
        return {'output':Output,'target':target,'input':Input} #record trial output
    elif Training == True:
        return {'Weights_1':Weights_1,'Weights_2':Weights_2,'target':target,'output':Output,'error':error}

Okay, that's the network. Below, I train the network until its answers are very close to the correct answer.

from random import choice
np.random.seed(seed=10) #seed random number generator for reproducibility

Weights_2 = np.random.rand(1,3)-0.5*2 #connections between hidden and output
Weights_1 = np.random.rand(2,3)-0.5*2 #connections between input and hidden

Weight_Dict = {'Weights_1':Weights_1,'Weights_2':Weights_2}

Train_Set = [[1.0,1.0],[0.0,0.0],[0.0,1.0],[1.0,0.0]] #train set

Error = []
while True: #train the neural net
    Train_Dict = neural_net(choice(Train_Set),Weight_Dict['Weights_1'],Weight_Dict['Weights_2'],Training=True)

    Error.append(abs(Train_Dict['error']))
    if len(Error) > 6 and np.mean(Error[-10:]) < 0.025: break #tell the code to stop iterating when recent mean error is small

Let's see how the error changed across training.

Error_vec = np.array(Error)[:,0]
plt.plot(Error_vec)
plt.ylabel('Error')
plt.xlabel('Iteration #');

Really cool. The network starts with volatile error – sometimes being nearly correct and sometimes being completely incorrect. Then, after about 5000 iterations, the network starts down the slow path of perfecting an answer scheme. Below, I create a plot depicting the network's activity for the different input patterns.

Weights_1 = Weight_Dict['Weights_1']
Weights_2 = Weight_Dict['Weights_2']

Train_Set = [[1.0,1.0],[0.0,1.0],[1.0,0.0],[0.0,0.0]] #train set

tempdict = {'output':[],'target':[],'input':[]} #data dictionary
temp = [neural_net(Input,Weights_1,Weights_2) for Input in Train_Set] #get the data
[tempdict[key].append([temp[x][key] for x in range(len(temp))]) for key in tempdict] #combine all the output dictionaries

plotter = np.ones((2,2))*np.reshape(np.array(tempdict['output']),(2,2))
plt.pcolor(plotter,vmin=0,vmax=1,cmap=plt.cm.bwr)
plt.colorbar(ticks=[0,0.25,0.5,0.75,1]);
plt.xlabel('Input 1')
plt.ylabel('Input 2')
plt.xticks([0.5,1.5], ['0','1'])
plt.yticks([0.5,1.5], ['0','1']);

Again, the Input 1 value is on the x-axis and the Input 2 value is on the y-axis. As you can see, the network guesses 1 when the inputs are different and it guesses 0 when the inputs are the same. Perfect! Below I depict the network with these correct weights.

Weight_Dict = {'Weights_1':Weights_1,'Weights_2':Weights_2}

network = NeuralNetwork()
network.add_layer(2,['Input 1','Input 2'],
                  [[round(x,2) for x in Weight_Dict['Weights_1'][0][:2]],
                   [round(x,2) for x in Weight_Dict['Weights_1'][1][:2]]])
network.add_layer(2,[round(Weight_Dict['Weights_1'][0][2],2),round(Weight_Dict['Weights_1'][1][2],2)],
                  [round(x,2) for x in Weight_Dict['Weights_2'][0][:2]])
network.add_layer(1,[round(Weight_Dict['Weights_2'][0][2],2)])
network.draw()

The network finds a pretty cool solution. Both hidden units are relatively active, but one hidden unit sends a strong positive signal and the other sends a strong negative signal. The output unit has a negative bias, so if neither input is on, it will have an activity around 0. If both input units are on, then the hidden unit that sends a positive signal will be inhibited, and the output unit will have activity near 0. Otherwise, the hidden unit with a positive signal gives the output unit an activity near 1.

This is all well and good, but if you try to train this network from random weights you might find that it sometimes produces an incorrect set of weights. This is because the network runs into a local minimum. A local minimum is a point where any small change in the weights increases the error, so the network is left stuck with a sub-optimal set of weights.

Below I hand-pick a set of weights that produces a local optimum.

Weights_2 = np.array([-4.5,5.3,-0.8]) #connections between hidden and output
Weights_1 = np.array([[-2.0,9.2,2.0],
                     [4.3,8.8,-0.1]])#connections between input and hidden

Weight_Dict = {'Weights_1':Weights_1,'Weights_2':Weights_2}

network = NeuralNetwork()
network.add_layer(2,['Input 1','Input 2'],
                  [[round(x,2) for x in Weight_Dict['Weights_1'][0][:2]],
                   [round(x,2) for x in Weight_Dict['Weights_1'][1][:2]]])
network.add_layer(2,[round(Weight_Dict['Weights_1'][0][2],2),round(Weight_Dict['Weights_1'][1][2],2)],
                  [round(x,2) for x in Weight_Dict['Weights_2'][:2]])
network.add_layer(1,[round(Weight_Dict['Weights_2'][2],2)])
network.draw()

Using these weights as the starting point, let's see what the network does with training.

Train_Set = [[1.0,1.0],[0.0,0.0],[0.0,1.0],[1.0,0.0]] #train set

Error = []
while True:
    Train_Dict = neural_net(choice(Train_Set),Weight_Dict['Weights_1'],Weight_Dict['Weights_2'],Training=True)

    Error.append(abs(Train_Dict['error']))
    if len(Error) > 6 and np.mean(Error[-10:]) < 0.025: break

Error_vec = np.array(Error)[:]
plt.plot(Error_vec)
plt.ylabel('Error')
plt.xlabel('Iteration #');

As you can see, the network never reduces the error. Let's see how the network answers the different input patterns.

Weights_1 = Weight_Dict['Weights_1']
Weights_2 = Weight_Dict['Weights_2']

Train_Set = [[1.0,1.0],[0.0,1.0],[1.0,0.0],[0.0,0.0]] #train set

tempdict = {'output':[],'target':[],'input':[]} #data dictionary
temp = [neural_net(Input,Weights_1,Weights_2) for Input in Train_Set] #get the data
[tempdict[key].append([temp[x][key] for x in range(len(temp))]) for key in tempdict] #combine all the output dictionaries

plotter = np.ones((2,2))*np.reshape(np.array(tempdict['output']),(2,2))
plt.pcolor(plotter,vmin=0,vmax=1,cmap=plt.cm.bwr)
plt.colorbar(ticks=[0,0.25,0.5,0.75,1]);
plt.xlabel('Input 1')
plt.ylabel('Input 2')
plt.xticks([0.5,1.5], ['0','1'])
plt.yticks([0.5,1.5], ['0','1']);

Looks like the network produces the correct answer in some cases but not others. The network is particularly confused when Input 2 is 0. Below I depict the weights after “training.” As you can see, they have not changed too much from where they started before training.

Weights_1 = Weight_Dict['Weights_1']
Weights_2 = Weight_Dict['Weights_2']

Weight_Dict = {'Weights_1':Weights_1,'Weights_2':Weights_2}

network = NeuralNetwork()
network.add_layer(2,['Input 1','Input 2'],
                  [[round(x,2) for x in Weight_Dict['Weights_1'][0][:2]],
                   [round(x,2) for x in Weight_Dict['Weights_1'][1][:2]]])
network.add_layer(2,[round(Weight_Dict['Weights_1'][0][2],2),round(Weight_Dict['Weights_1'][1][2],2)],
                  [round(x,2) for x in Weight_Dict['Weights_2'][:2]])
network.add_layer(1,[round(Weight_Dict['Weights_2'][2],2)])
network.draw()

This network was unable to push itself out of the local optimum. While local optima are a problem, there are a couple of things we can do to avoid them. First, we should always train a network multiple times with different random starting weights in order to test for local optima. If the network continually finds local optima, then we can increase the learning rate. By increasing the learning rate, the network can escape local optima in some cases. This should be done with care though, as too big of a learning rate can also prevent finding the global minimum.
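As a rough sketch of the restart idea, here is a hypothetical helper that wraps the training loop above (the seeds and the iteration cap are arbitrary):

def train_once(seed,max_iter=100000): #hypothetical helper: train from a fresh random start
    np.random.seed(seed)
    W_2 = np.random.rand(1,3)-0.5*2 #new random hidden-to-output weights
    W_1 = np.random.rand(2,3)-0.5*2 #new random input-to-hidden weights
    errors = []
    for _ in range(max_iter): #cap the iterations so a run stuck in a local optimum still returns
        trial = neural_net(choice(Train_Set),W_1,W_2,Training=True)
        errors.append(abs(trial['error']))
        if len(errors) > 10 and np.mean(errors[-10:]) < 0.025: break
    return np.mean(errors[-10:])

print([train_once(seed) for seed in range(5)]) #restarts whose final error stays well above 0.025 likely ended in a local optimum

To experiment with the learning rate, alpha inside neural_net would need to be increased or exposed as a parameter.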

Translated from: https://www.pybloggers.com/2016/05/an-introduction-to-neural-networks-part-2/
