Backpropagation Summary


Following the code makes it fairly intuitive:

        # These are the weights between the input layer and the hidden layer.
        self.weights_0_1 = np.zeros((self.input_nodes,self.hidden_nodes))
    
        # These are the weights between the hidden layer and the output layer.
        self.weights_1_2 = np.random.normal(0.0, self.output_nodes**-0.5, 
                                                (self.hidden_nodes, self.output_nodes))
        
        # The input layer, a two-dimensional matrix with shape 1 x input_nodes
        self.layer_0 = np.zeros((1,input_nodes))

There are three layers in total, connected by two weight matrices: weights_0_1 with shape [input_nodes, hidden_nodes] and weights_1_2 with shape [hidden_nodes, output_nodes].
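The snippets below also call self.sigmoid and self.sigmoid_output_2_derivative, which are not shown in this excerpt. Assuming the usual logistic function, they would look roughly like this (written here as standalone functions so the sketch runs on its own; in the class they would be methods taking self):

    import numpy as np

    def sigmoid(x):
        # Standard logistic function; squashes any value into (0, 1).
        return 1 / (1 + np.exp(-x))

    def sigmoid_output_2_derivative(output):
        # Derivative of the sigmoid expressed via its output:
        # if s = sigmoid(x), then ds/dx = s * (1 - s).
        return output * (1 - output)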

            
            #### Implement the forward pass here ####
            ### Forward pass ###

            # Input Layer
            self.update_input_layer(review)

            # Hidden layer
            layer_1 = self.layer_0.dot(self.weights_0_1)

            # Output layer
            layer_2 = self.sigmoid(layer_1.dot(self.weights_1_2))

Dot product of the input with the weights: each column of a weight matrix holds the weights between one node of the next layer and all nodes of the previous layer.
[1, input_nodes] · [input_nodes, hidden_nodes] = [1, hidden_nodes]
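A quick standalone shape check of the forward pass (the sizes 10, 4, 1 below are made up for illustration; the initialization mirrors the snippet above):

    import numpy as np

    input_nodes, hidden_nodes, output_nodes = 10, 4, 1   # illustrative sizes
    layer_0 = np.zeros((1, input_nodes))                 # [1, input_nodes]
    weights_0_1 = np.zeros((input_nodes, hidden_nodes))  # [input_nodes, hidden_nodes]
    weights_1_2 = np.random.normal(0.0, output_nodes ** -0.5,
                                   (hidden_nodes, output_nodes))

    layer_1 = layer_0.dot(weights_0_1)                   # -> [1, hidden_nodes]
    layer_2 = 1 / (1 + np.exp(-layer_1.dot(weights_1_2)))  # -> [1, output_nodes]
    print(layer_1.shape, layer_2.shape)                  # (1, 4) (1, 1)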

            #### Implement the backward pass here ####
            ### Backward pass ###

            # Output error
            layer_2_error = layer_2 - self.get_target_for_label(label) # Output layer error is the difference between the actual output and the desired target.
            layer_2_delta = layer_2_error * self.sigmoid_output_2_derivative(layer_2)

            # Backpropagated error
            layer_1_error = layer_2_delta.dot(self.weights_1_2.T) # errors propagated to the hidden layer
            layer_1_delta = layer_1_error # hidden layer gradients - no nonlinearity so it's the same as the error

Computing the errors backwards:
We start with the output-layer error; the key question is how to go from the error at layer i to the error at layer i-1.
Passing back through the activation function, the error is scaled as if by an amplifier: layer_2_error * self.sigmoid_output_2_derivative(layer_2).
For the error at layer i-1, consider a single node: its error is the weighted sum of the errors of all nodes at layer i.
For example, with weights_1_2 of shape [hidden_nodes, output_nodes], the error of the first hidden node uses the first row, weights_1_2[0, :] of shape [1, output_nodes]; so the whole computation is layer_1_error = layer_2_delta.dot(self.weights_1_2.T).
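A small check of that claim, with made-up sizes: the j-th entry of layer_1_error is exactly the j-th row of weights_1_2 dotted with the output deltas.

    import numpy as np

    hidden_nodes, output_nodes = 4, 3                    # illustrative sizes
    weights_1_2 = np.random.normal(0.0, 1.0, (hidden_nodes, output_nodes))
    layer_2_delta = np.random.normal(0.0, 1.0, (1, output_nodes))

    # Vectorised form used in the code above:
    layer_1_error = layer_2_delta.dot(weights_1_2.T)     # [1, hidden_nodes]

    # Per-node form: weighted sum of the output deltas using row j of weights_1_2.
    j = 0
    per_node = np.sum(layer_2_delta[0] * weights_1_2[j, :])
    print(np.isclose(layer_1_error[0, j], per_node))     # True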

            # Update the weights
            self.weights_1_2 -= layer_1.T.dot(layer_2_delta) * self.learning_rate # update hidden-to-output weights with gradient descent step
            self.weights_0_1 -= self.layer_0.T.dot(layer_1_delta) * self.learning_rate # update input-to-hidden weights with gradient descent step

Once we have the error delta for every layer, the next question is how to update the weights.
That is, how do we obtain the gradient $\partial E / \partial W$ from the quantities we already have?
$V^{k}$ denotes the values of layer $k$, i.e. the node values of layer $k$ before the activation function is applied.
The delta factor has already been computed, layer by layer, during the backward pass; the gradient for a weight matrix is then the product of the previous layer's values and the next layer's delta:
[hidden_nodes, 1] · [1, output_nodes] = [hidden_nodes, output_nodes]

    layer_2_delta = layer_2_error * self.sigmoid_output_2_derivative(layer_2)
    self.weights_1_2 -= layer_1.T.dot(layer_2_delta) * self.learning_rate
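In other words, layer_1.T.dot(layer_2_delta) is just the outer product of the hidden activations and the output deltas, which lands exactly on the [hidden_nodes, output_nodes] shape of weights_1_2. A quick check with made-up sizes:

    import numpy as np

    hidden_nodes, output_nodes = 4, 1                    # illustrative sizes
    layer_1 = np.random.normal(0.0, 1.0, (1, hidden_nodes))
    layer_2_delta = np.random.normal(0.0, 1.0, (1, output_nodes))

    grad = layer_1.T.dot(layer_2_delta)                  # [hidden_nodes, output_nodes]
    print(grad.shape)                                    # (4, 1)
    print(np.allclose(grad, np.outer(layer_1, layer_2_delta)))  # True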

Thinking about it in vector/matrix form is also quite intuitive:

$\partial E_R / \partial W^{(l)} = \delta^{(l+1)} \, (a^{(l)})^{T}$
$\delta^{(l)} = \bigl((W^{(l)})^{T} \delta^{(l+1)}\bigr) \circ f'(a^{(l)})$ (this second formula is written incorrectly in many places online).
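As a sanity check of these two formulas (and of the code's row-vector layout, which is their transpose), here is a minimal sketch with made-up sizes and random inputs that compares the analytic gradient of the first weight matrix against a finite-difference estimate. All names here are illustrative, not part of the original class; the hidden layer is kept linear to match the code above, so its $f'$ is 1.

    import numpy as np

    def sigmoid(x):
        return 1 / (1 + np.exp(-x))

    def loss(x, y, W1, W2):
        # Forward pass of the network above: linear hidden layer, sigmoid output,
        # squared-error loss E = 0.5 * sum((prediction - target)^2).
        layer_1 = x.dot(W1)
        layer_2 = sigmoid(layer_1.dot(W2))
        return 0.5 * np.sum((layer_2 - y) ** 2)

    rng = np.random.default_rng(0)
    x = rng.normal(size=(1, 5))          # illustrative sizes
    y = np.array([[1.0]])
    W1 = rng.normal(size=(5, 4))
    W2 = rng.normal(size=(4, 1))

    # Analytic gradients from delta^(l) = (W^(l)^T delta^(l+1)) * f'(a^(l)) (elementwise)
    # and dE/dW^(l) = delta^(l+1) (a^(l))^T, written in the code's row-vector form.
    layer_1 = x.dot(W1)
    layer_2 = sigmoid(layer_1.dot(W2))
    layer_2_delta = (layer_2 - y) * layer_2 * (1 - layer_2)
    layer_1_delta = layer_2_delta.dot(W2.T)       # hidden layer is linear, f' = 1
    grad_W2 = layer_1.T.dot(layer_2_delta)
    grad_W1 = x.T.dot(layer_1_delta)

    # Numerical gradient of W1 by central differences.
    eps = 1e-6
    num_grad_W1 = np.zeros_like(W1)
    for i in range(W1.shape[0]):
        for j in range(W1.shape[1]):
            W1[i, j] += eps
            plus = loss(x, y, W1, W2)
            W1[i, j] -= 2 * eps
            minus = loss(x, y, W1, W2)
            W1[i, j] += eps
            num_grad_W1[i, j] = (plus - minus) / (2 * eps)

    print(np.allclose(grad_W1, num_grad_W1, atol=1e-6))  # True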
