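Neural Networks and Deep Learning, Chapter 2: The Backpropagation Algorithm (two assumptions, the four fundamental equations and their proofs, annotated code)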

Latest recommended article published on 2024-07-28 03:59:40

土豆拍死马铃薯

Category: Neural Networks and Deep Learning. Tags: neural networks, deep learning, backpropagation, equations, proofs

Article link: https://blog.csdn.net/csj941227/article/details/77150000

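2.1 Warm-up: a fast matrix-based method for computing a neural network's output

2.2 The two assumptions about the cost function

2.3 The Hadamard product s⊙t

2.4 The four fundamental equations of backpropagation

2.5 Proof of the four fundamental equations

2.6 The backpropagation algorithm

2.7 Code

2.8 In what sense is backpropagation a fast algorithm?

2.9 Backpropagation: the big picture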

2.7 Code:
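
The notes in the code refer to array shapes such as 30x1 and 30x784. As a minimal sketch of where those shapes come from, assume a network with layer sizes [784, 30, 10] (the `sizes` list and the random initialisation here are illustrative assumptions, not taken from the listing):

```python
import numpy as np

# Assumed layer sizes: 784 inputs, 30 hidden neurons, 10 outputs.
sizes = [784, 30, 10]
rng = np.random.default_rng(0)

# One bias column vector per non-input layer, one weight matrix per pair of layers.
biases = [rng.standard_normal((n, 1)) for n in sizes[1:]]
weights = [rng.standard_normal((n, m)) for m, n in zip(sizes[:-1], sizes[1:])]

# Gradient accumulators start as zero arrays with matching shapes.
nabla_b = [np.zeros(b.shape) for b in biases]
nabla_w = [np.zeros(w.shape) for w in weights]

bias_shapes = [nb.shape for nb in nabla_b]
weight_shapes = [nw.shape for nw in nabla_w]
print(bias_shapes)    # [(30, 1), (10, 1)]
print(weight_shapes)  # [(30, 784), (10, 30)]
```

Each weight matrix has one row per neuron in the receiving layer and one column per neuron in the sending layer, which is why `np.dot(w, activation)` yields a column vector of the right size.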

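    # The methods below belong to the Network class and assume
    # `import numpy as np` at module level.
    def update_mini_batch(self, mini_batch, eta):
        """
        Update the network's weights and biases by applying gradient descent,
        with gradients computed by backpropagation, to a single mini-batch.
        :param mini_batch: list of (x, y) training pairs
        :param eta: learning rate
        :return: None
        """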
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        # Running
        #     for b in self.biases:
        #         print("b.shape=", b.shape)
        # prints (30, 1) and (10, 1).
        # np.zeros((a, b)) creates an array with a rows and b columns, all zeros,
        # so nabla_b holds a 30x1 zero array (nabla_b[0])
        # and a 10x1 zero array (nabla_b[1]).

        nabla_w = [np.zeros(w.shape) for w in self.weights]
        # Likewise, nabla_w[0] is a 30x784 zero array
        # and nabla_w[1] is a 10x30 zero array.
        for x, y in mini_batch:
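            # For each training input x in the mini-batch and its correct label y
            delta_nabla_b, delta_nabla_w = self.backprop(x, y)
            # backprop is a fast way to compute the gradient of the cost
            # function; here it returns the gradient for this single example.
            # delta_nabla_b[0] is a 30x1 column vector, like biases[0] and nabla_b[0]
            # delta_nabla_b[1] is a 10x1 column vector, like biases[1] and nabla_b[1]
            # delta_nabla_w[0] is a 30x784 matrix, like weights[0] and nabla_w[0]
            # delta_nabla_w[1] is a 10x30 matrix, like weights[1] and nabla_w[1]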
            nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
            # nabla_b accumulates ∂C_x/∂b, summed over the examples seen so far
            nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
            # nabla_w accumulates ∂C_x/∂w, summed over the examples seen so far
        self.weights = [w-(eta/len(mini_batch))*nw
                        for w, nw in zip(self.weights, nabla_w)]
        # Update the weights: step by eta times the mini-batch average gradient
        self.biases = [b-(eta/len(mini_batch))*nb
                       for b, nb in zip(self.biases, nabla_b)]
        # Update the biases in the same way
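
The update rule above can be tried in isolation. The following is a minimal sketch, assuming the per-example gradients are already available as lists of arrays; the function name `mini_batch_step` and the toy numbers are hypothetical and only serve the illustration:

```python
import numpy as np

def mini_batch_step(params, per_example_grads, eta):
    """Return params after one gradient-descent step averaged over a mini-batch.

    params: list of arrays, one per layer (hypothetical stand-in for weights or biases)
    per_example_grads: one list of arrays per training example, same shapes as params
    eta: learning rate
    """
    m = len(per_example_grads)
    # Sum the per-example gradients, as the loop over mini_batch does.
    total = [np.zeros(p.shape) for p in params]
    for grads in per_example_grads:
        total = [t + g for t, g in zip(total, grads)]
    # Scaling by eta/m turns the sum into a step along the average gradient.
    return [p - (eta / m) * t for p, t in zip(params, total)]

params = [np.array([[1.0], [2.0]]), np.array([[0.5]])]
grads_a = [np.array([[0.2], [0.4]]), np.array([[1.0]])]
grads_b = [np.array([[0.6], [0.0]]), np.array([[3.0]])]

new_params = mini_batch_step(params, [grads_a, grads_b], eta=0.5)
print(new_params[0].ravel())  # [0.8 1.9]
print(new_params[1].ravel())  # [-0.5]
```

With two examples and eta = 0.5, each parameter moves by 0.25 times the summed gradient, which is the same as 0.5 times the mean gradient.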


    def backprop(self, x, y):
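        # Backpropagation: a fast way to compute the gradient of the cost
        # function. Returns (nabla_b, nabla_w) for the single example (x, y).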
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        # feedforward
        activation = x  # activation of the current layer, starting with the input
        activations = [x] # list to store all the activations, layer by layer
        zs = [] # list to store all the z vectors, layer by layer
        for b, w in zip(self.biases, self.weights):
            # For each layer's biases and weights,
            # compute the weighted input z = w·a + b
            z = np.dot(w, activation)+b
            zs.append(z)
            # and the activation a = σ(z)
            activation = sigmoid(z)
            activations.append(activation)
        # activations[-1] now holds the network's output
        # backward pass
        delta = self.cost_derivative(activations[-1], y) * \
            sigmoid_prime(zs[-1])
        # Output error: δ^L_j = ∂C/∂a^L_j · σ'(z^L_j)
        nabla_b[-1] = delta
        # ∂C/∂b^L_j = δ^L_j
        nabla_w[-1] = np.dot(delta, activations[-2].transpose())
        # ∂C/∂w^L_jk = a^(L-1)_k · δ^L_j
        # transpose() returns the transposed array
        # Note that the variable l in the loop below is used a little
        # differently to the notation in Chapter 2 of the book.  Here,
        # l = 1 means the last layer of neurons, l = 2 is the
        # second-last layer, and so on.  It's a renumbering of the
        # scheme in the book, used here to take advantage of the fact
        # that Python can use negative indices in lists.
        for l in range(2, self.num_layers):
            z = zs[-l]
            sp = sigmoid_prime(z)
            # sp = σ'(z^l)
            delta = np.dot(self.weights[-l+1].transpose(), delta) * sp
            # δ^l = ((w^(l+1))^T δ^(l+1)) ⊙ σ'(z^l)
            nabla_b[-l] = delta
            # ∂C/∂b^l_j = δ^l_j
            nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())
            # ∂C/∂w^l_jk = a^(l-1)_k · δ^l_j
        return (nabla_b, nabla_w)
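
One way to gain confidence in a backpropagation routine is a numerical gradient check. The sketch below is a standalone rewrite under stated assumptions: `backprop` takes the weights and biases as arguments instead of reading them from `self`, the layer sizes [3, 4, 2] and random data are arbitrary, and `sigmoid` / `sigmoid_prime` are defined the usual way since the listing calls them without showing them.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    return sigmoid(z) * (1 - sigmoid(z))

def cost(weights, biases, x, y):
    """Quadratic cost for a single example."""
    a = x
    for w, b in zip(weights, biases):
        a = sigmoid(np.dot(w, a) + b)
    return 0.5 * np.sum((a - y) ** 2)

def backprop(weights, biases, x, y):
    """Same steps as the method above, written as a free function."""
    activation, activations, zs = x, [x], []
    for w, b in zip(weights, biases):
        z = np.dot(w, activation) + b
        zs.append(z)
        activation = sigmoid(z)
        activations.append(activation)
    nabla_b = [None] * len(biases)
    nabla_w = [None] * len(weights)
    delta = (activations[-1] - y) * sigmoid_prime(zs[-1])
    nabla_b[-1] = delta
    nabla_w[-1] = np.dot(delta, activations[-2].T)
    for l in range(2, len(weights) + 1):
        delta = np.dot(weights[-l + 1].T, delta) * sigmoid_prime(zs[-l])
        nabla_b[-l] = delta
        nabla_w[-l] = np.dot(delta, activations[-l - 1].T)
    return nabla_b, nabla_w

def numerical_grad_w(weights, biases, x, y, layer, eps=1e-6):
    """Central-difference estimate of dC/dw for one layer's weight matrix."""
    grad = np.zeros_like(weights[layer])
    for idx in np.ndindex(*weights[layer].shape):
        original = weights[layer][idx]
        weights[layer][idx] = original + eps
        c_plus = cost(weights, biases, x, y)
        weights[layer][idx] = original - eps
        c_minus = cost(weights, biases, x, y)
        weights[layer][idx] = original  # restore the weight
        grad[idx] = (c_plus - c_minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
sizes = [3, 4, 2]  # assumed toy architecture
weights = [rng.standard_normal((n, m)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal((n, 1)) for n in sizes[1:]]
x = rng.standard_normal((3, 1))
y = np.array([[1.0], [0.0]])

nabla_b, nabla_w = backprop(weights, biases, x, y)
max_diff = max(
    np.max(np.abs(nw - numerical_grad_w(weights, biases, x, y, layer)))
    for layer, nw in enumerate(nabla_w)
)
print("largest |backprop - numerical| difference:", max_diff)
```

The finite-difference version needs two forward passes per weight, while backprop gets every partial derivative from one forward and one backward pass. That gap is the sense in which backpropagation is fast.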


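    def cost_derivative(self, output_activations, y):
        """The quadratic cost is
        C = 1/2 * ∑_j (y_j - a_j)^2,
        so ∂C/∂a_j = a_j - y_j.
        That is, the partial derivative of the quadratic cost with respect to
        the output activations is the output activations minus the target values.
        """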
        return (output_activations-y)
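
The derivative stated in the docstring can be checked numerically. This is a small sketch with made-up activation and target vectors; `quadratic_cost` is a helper written for the illustration and does not appear in the listing:

```python
import numpy as np

def quadratic_cost(a, y):
    """C = 1/2 * sum_j (y_j - a_j)^2 for a single example."""
    return 0.5 * np.sum((y - a) ** 2)

def cost_derivative(a, y):
    """Analytic gradient of the quadratic cost with respect to a."""
    return a - y

# Illustrative output activations and one-hot target (assumed values).
a = np.array([[0.8], [0.1], [0.3]])
y = np.array([[1.0], [0.0], [0.0]])

analytic = cost_derivative(a, y)

# Central differences, one output component at a time.
eps = 1e-6
numeric = np.zeros_like(a)
for j in range(a.shape[0]):
    step = np.zeros_like(a)
    step[j] = eps
    numeric[j] = (quadratic_cost(a + step, y) - quadratic_cost(a - step, y)) / (2 * eps)

print(analytic.ravel())
print(numeric.ravel())
```

Both vectors should agree to within floating-point error, with a negative entry where the output is below its target and positive entries where it is above.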