Fully Connected Neural Network (NumPy Implementation and Formula Derivation) on the MNIST Dataset

1. Principles and Implementation

1.1 The feedforward function

For an $l$-layer neural network, forward propagation is computed as follows:

$$
\begin{aligned}
& a^{0} = x = z^{0} \\
& z^{1} = w^{1}a^{0} + b^{1} \\
& a^{1} = \mathrm{sigmoid}(z^{1}) \\
& z^{2} = w^{2}a^{1} + b^{2} \\
& a^{2} = \mathrm{sigmoid}(z^{2}) \\
& \dots \\
& z^{l} = w^{l}a^{l-1} + b^{l} \\
& a^{l} = \mathrm{sigmoid}(z^{l})
\end{aligned}
$$

Because the $a^{i}$ are needed later when computing the gradients, the activations from the forward pass are stored.

    def feedforward(self, a):
        """Return the output of the network if ``a`` is input."""
        a_i = a
        self.a_stock = [a_i]
        for w_i, b_i in zip(self.weights, self.biases):
            a_i = sigmoid(np.dot(w_i, a_i) + b_i)
            self.a_stock.append(a_i)
        return a_i
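
As a quick sanity check of the loop above, the same forward pass can be run on a tiny hand-built network. The layer sizes and random values below are illustrative only and independent of the Network class:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    # Toy [3, 4, 2] network: weights[i] has shape (n_out, n_in); biases are column vectors.
    weights = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
    biases = [rng.standard_normal((4, 1)), rng.standard_normal((2, 1))]

    a = rng.standard_normal((3, 1))  # an input column vector, playing the role of x
    a_stock = [a]                    # activations stored layer by layer, as in feedforward
    for w, b in zip(weights, biases):
        a = sigmoid(np.dot(w, a) + b)
        a_stock.append(a)
    print(a.shape)  # (2, 1): the network output a^l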

1.2 The backprop function

Backpropagation computes the gradients.
The loss is the quadratic (mean squared error) cost:

$$
\begin{aligned}
\text{cost function} &= \frac{1}{2}\|y - a^{l}\|^{2} \\
&= \frac{1}{2}\sum_{i}^{n}(y_{i} - a_{i}^{l})^{2}
\end{aligned}
$$
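Componentwise, differentiating this cost with respect to the output activations and applying the chain rule through the sigmoid gives the factor used in the next step:

$$
\frac{\partial C}{\partial a_{i}^{l}} = -(y_{i}-a_{i}^{l}), \qquad
\frac{\partial C}{\partial z_{i}^{l}} = \frac{\partial C}{\partial a_{i}^{l}}\,\mathrm{sigmoid}'(z_{i}^{l}) = (y_{i}-a_{i}^{l})\big(-\mathrm{sigmoid}'(z_{i}^{l})\big)
$$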
The gradients are then computed recursively:
$$
\begin{aligned}
\nabla_{z^{l}}C &= \begin{pmatrix}
(y_{1}-a_{1}^{l})\big(-\mathrm{sigmoid}'(z_{1}^{l})\big) \\
(y_{2}-a_{2}^{l})\big(-\mathrm{sigmoid}'(z_{2}^{l})\big) \\
\vdots \\
(y_{n}-a_{n}^{l})\big(-\mathrm{sigmoid}'(z_{n}^{l})\big)
\end{pmatrix} \\
&= \begin{pmatrix}
\mathrm{sigmoid}'(z_{1}^{l}) & 0 & \dots & 0 \\
0 & \mathrm{sigmoid}'(z_{2}^{l}) & \dots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \dots & \mathrm{sigmoid}'(z_{n}^{l})
\end{pmatrix}(a^{l}-y) \\
&= D^{l}(a^{l}-y)
\end{aligned}
$$
Compute $\nabla_{z^{l-1}}C$ from the definition:
Let $C = f(z^{l}) = f(w^{l}\sigma(z^{l-1}) + b^{l})$. Then

$$
\begin{aligned}
f\big(w^{l}\sigma(z^{l-1}+h)+b^{l}\big) - f\big(w^{l}\sigma(z^{l-1})+b^{l}\big)
&= \langle \nabla_{z^{l}}C,\; w^{l}\big(\sigma(z^{l-1}+h)-\sigma(z^{l-1})\big) \rangle + O\big(\|\sigma(z^{l-1}+h)-\sigma(z^{l-1})\|\big) \\
&= \langle \nabla_{z^{l}}C,\; w^{l}D^{l-1}h \rangle \\
&= \mathrm{tr}\big((\nabla_{z^{l}}C)^{T} w^{l} D^{l-1} h\big) \\
&= \langle \big((\nabla_{z^{l}}C)^{T} w^{l} D^{l-1}\big)^{T},\; h \rangle
\end{aligned}
$$

and therefore

$$
\nabla_{z^{l-1}}C = (D^{l-1})^{T}(w^{l})^{T}\nabla_{z^{l}}C
$$
Similarly:
$$
\begin{aligned}
& \nabla_{w^{l}}C = \nabla_{z^{l}}C\,(a^{l-1})^{T} \\
& \nabla_{b^{l}}C = \nabla_{z^{l}}C
\end{aligned}
$$
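Concretely, applying the same perturbation argument to $w^{l}$ (with $a^{l-1}$ held fixed, so that $z^{l}=w^{l}a^{l-1}+b^{l}$) gives

$$
\begin{aligned}
f\big((w^{l}+H)a^{l-1}+b^{l}\big) - f\big(w^{l}a^{l-1}+b^{l}\big)
&\approx \langle \nabla_{z^{l}}C,\; H a^{l-1} \rangle \\
&= \mathrm{tr}\big((\nabla_{z^{l}}C)^{T} H a^{l-1}\big) \\
&= \mathrm{tr}\big(a^{l-1}(\nabla_{z^{l}}C)^{T} H\big) \\
&= \langle \nabla_{z^{l}}C\,(a^{l-1})^{T},\; H \rangle,
\end{aligned}
$$

which identifies $\nabla_{w^{l}}C = \nabla_{z^{l}}C\,(a^{l-1})^{T}$; perturbing $b^{l}$ by $h$ instead gives $\langle \nabla_{z^{l}}C, h\rangle$, so $\nabla_{b^{l}}C = \nabla_{z^{l}}C$.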
So the gradients of all parameters can be obtained recursively:
$$
\begin{aligned}
& \nabla_{z^{l}}C = D^{l}(a^{l}-y) \\
& \nabla_{w^{l}}C = \nabla_{z^{l}}C\,(a^{l-1})^{T} \\
& \nabla_{b^{l}}C = \nabla_{z^{l}}C \\
& \nabla_{z^{l-1}}C = (D^{l-1})^{T}(w^{l})^{T}\nabla_{z^{l}}C \\
& \nabla_{w^{l-1}}C = \nabla_{z^{l-1}}C\,(a^{l-2})^{T} \\
& \nabla_{b^{l-1}}C = \nabla_{z^{l-1}}C \\
& \dots \\
& \nabla_{z^{1}}C = (D^{1})^{T}(w^{2})^{T}\nabla_{z^{2}}C \\
& \nabla_{w^{1}}C = \nabla_{z^{1}}C\,(a^{0})^{T} \\
& \nabla_{b^{1}}C = \nabla_{z^{1}}C
\end{aligned}
$$
The code implementation is as follows:

    def cost_derivative(self, output_activations, y):
        """Return the vector of partial derivatives nabla_{z^l} C for the output layer.
        Assume the quadratic loss 1/2 || output_activations - y ||^2;
        ``output_activations`` and ``y`` are (n, 1) column vectors.
        """
        # nabla_{z^l} C = D^l (a^l - y), where D^l is the diagonal matrix of sigmoid derivatives.
        cos_deri = np.dot(np.diag(sigmoid_prime(output_activations)[:, 0]), output_activations - y)
        return np.reshape(cos_deri, (cos_deri.shape[0], 1))  # keep the (n, 1) column shape
    
    def backprop(self, x, y):
        """Return a tuple ``(nabla_b, nabla_w)`` representing the
        gradient for the cost function C_x.  ``nabla_b`` and
        ``nabla_w`` are layer-by-layer lists of numpy arrays, similar
        to ``self.biases`` and ``self.weights``."""
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        output_activations = self.feedforward(x)
        nabla_zi = self.cost_derivative(output_activations, y)
        assert nabla_zi.shape == nabla_b[-1].shape, "shape mismatch: {} vs {}".format(nabla_b[-1].shape, nabla_zi.shape)
        nabla_b[-1] = nabla_zi
        assert nabla_w[-1].shape == np.dot(nabla_zi, self.a_stock[-2].T).shape, \
            "shape mismatch: {} vs {}".format(nabla_w[-1].shape, np.dot(nabla_zi, self.a_stock[-2].T).shape)
        nabla_w[-1] = nabla_zi @ self.a_stock[-2].T
        for i in range(len(nabla_b) - 2, -1, -1):
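            # Recursion step: nabla_z for this layer = D (w_next)^T nabla_z_next,
            # with D built from the sigmoid derivative of this layer's stored activation.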
            nabla_zi = np.diag(sigmoid_prime(self.a_stock[i + 1])[:, 0]) @ self.weights[i + 1].T @ nabla_zi
            assert nabla_zi.shape == nabla_b[i].shape, "shape mismatch: {} vs {}".format(nabla_b[i].shape, nabla_zi.shape)
            nabla_b[i] = nabla_zi
            assert nabla_w[i].shape == np.dot(nabla_zi, self.a_stock[i].T).shape, \
                "shape mismatch: {} vs {}".format(nabla_w[i].shape, np.dot(nabla_zi, self.a_stock[i].T).shape)
            nabla_w[i] = np.dot(nabla_zi, self.a_stock[i].T)
        return (nabla_b, nabla_w)
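
The code above relies on sigmoid and sigmoid_prime helpers that are not shown in this section. For the calls above to be consistent with the derivation, sigmoid_prime must map a stored activation $a = \mathrm{sigmoid}(z)$ to $a(1-a)$, since the code passes activations rather than $z$ values. A minimal sketch under that assumption:

    import numpy as np

    def sigmoid(z):
        """Elementwise logistic function used in feedforward."""
        return 1.0 / (1.0 + np.exp(-z))

    def sigmoid_prime(a):
        """Sigmoid derivative written in terms of the activation a = sigmoid(z):
        sigmoid'(z) = a * (1 - a). (Assumption: this is the convention required
        by the backprop code above, which passes stored activations.)"""
        return a * (1.0 - a)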

1.3 The update_mini_batch function

Parameters are updated with mini-batch stochastic gradient descent.
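For a mini-batch of size $m$ and learning rate $\eta$, the update implemented below averages the per-example gradients:

$$
w \leftarrow w - \frac{\eta}{m}\sum_{(x,y)\in\text{mini-batch}} \nabla_{w}C_{x}, \qquad
b \leftarrow b - \frac{\eta}{m}\sum_{(x,y)\in\text{mini-batch}} \nabla_{b}C_{x}
$$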

    def update_mini_batch(self, mini_batch, eta):
        """Update the network's weights and biases by applying
        gradient descent using backpropagation to a single mini batch.
        The ``mini_batch`` is a list of tuples ``(x, y)``, and ``eta``
        is the learning rate."""
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        for x, y in mini_batch:
            delta_nabla_b, delta_nabla_w = self.backprop(x, y)
            nabla_b = [nb + delta_b for nb, delta_b in
                       zip(nabla_b, delta_nabla_b)]  # gradient computation of b in the mini_batch
            nabla_w = [nw + delta_w for nw, delta_w in
                       zip(nabla_w, delta_nabla_w)]  # gradient computation of w in the mini_batch

        self.weights = [w - eta * nw / len(mini_batch) for w, nw in
                        zip(self.weights, nabla_w)]  # sgd step update weights w
        self.biases = [b - eta * nb / len(mini_batch) for b, nb in
                       zip(self.biases, nabla_b)]  # sgd step update biases b
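
The outer training loop that shuffles the data and slices it into mini-batches is not shown in this section. A minimal sketch of how update_mini_batch might be driven (the function name SGD_sketch and its arguments are illustrative assumptions, not the post's actual code):

    import random

    def SGD_sketch(net, training_data, epochs, mini_batch_size, eta):
        """Illustrative driver: each epoch, shuffle the (x, y) pairs, split them
        into mini-batches, and apply one update_mini_batch step per batch."""
        training_data = list(training_data)
        for _ in range(epochs):
            random.shuffle(training_data)
            mini_batches = [training_data[k:k + mini_batch_size]
                            for k in range(0, len(training_data), mini_batch_size)]
            for mini_batch in mini_batches:
                net.update_mini_batch(mini_batch, eta)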

2. Testing

2.1 Network configuration

The network layer sizes are:
[784, 50, 30, 10]
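
With this layer list, the shapes that feedforward and backprop index into are weights[i] of shape (sizes[i+1], sizes[i]) and biases[i] of shape (sizes[i+1], 1). A minimal initialization sketch with these shapes (the post's actual constructor is not reproduced here, so the Gaussian initialization is an assumption):

    import numpy as np

    sizes = [784, 50, 30, 10]
    # One weight matrix per layer transition and one column bias vector per non-input layer.
    biases = [np.random.randn(n, 1) for n in sizes[1:]]
    weights = [np.random.randn(n, m) for m, n in zip(sizes[:-1], sizes[1:])]
    print([w.shape for w in weights])  # [(50, 784), (30, 50), (10, 30)]
    print([b.shape for b in biases])   # [(50, 1), (30, 1), (10, 1)]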

2.2 Learning rate

[Figure: effect of the learning rate]

2.3 Comparison with a TensorFlow implementation

TF version: 1.15.0
Fully connected network configuration: [784, 50, 30, 10]
Loss: mean_squared_error
Optimizer: GradientDescentOptimizer
Code implementation: tf_mnist.py
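
tf_mnist.py is not included above; a minimal sketch of an equivalent TF 1.x graph under the settings listed (layer and variable names are illustrative assumptions, not the post's actual code):

    import tensorflow as tf  # TF 1.15

    x = tf.placeholder(tf.float32, [None, 784])
    y = tf.placeholder(tf.float32, [None, 10])

    # [784, 50, 30, 10] fully connected stack with sigmoid activations.
    h1 = tf.layers.dense(x, 50, activation=tf.nn.sigmoid)
    h2 = tf.layers.dense(h1, 30, activation=tf.nn.sigmoid)
    pred = tf.layers.dense(h2, 10, activation=tf.nn.sigmoid)

    loss = tf.losses.mean_squared_error(labels=y, predictions=pred)
    train_op = tf.train.GradientDescentOptimizer(learning_rate=1.0).minimize(loss)
    # Training would feed mini-batches and run train_op inside a tf.Session.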
Results comparison:
epoch = 10; minibatch = 15; learning rate = 1

[Figure: training results of the NumPy and TensorFlow implementations]

Average running time comparison:
NumPy implementation: 532.818 s
TensorFlow implementation: 419.38 s

3. Summary

The NumPy-based implementation performs worse than the TensorFlow-based one; presumably TensorFlow applies additional strategies in its SGD implementation that lead to better convergence.

4. Appendix

Full code:
