Machine Learning and Deep Learning Basics Notes (2): Gradient Descent, Implementing a Handwritten Digit Recognition Algorithm

明夏小斯

Category: tensorflow. Tags: deep learning, machine learning, algorithms, handwritten digit recognition, gradient descent

Article link: https://blog.csdn.net/qq_17105473/article/details/72416821

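This series collects the notes I took while working through a course series on machine learning and deep learning. They are rough and to the point, and are for reference only.
The code below comes from https://github.com/mnielsen/neural-networks-and-deep-learning
To stress it once more: I did not write this code myself. It was downloaded from the link above.

Implementing a handwritten digit recognition algorithm

1. Training data

The MNIST dataset:
Training set (train): 50,000 images, used for training
Validation set (validation): 10,000 images, used for self-checks during training
Test set (test): 10,000 images, used for testing

2. Initializing the neural network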

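import numpy as np

class Network(object):

    def __init__(self, sizes):
        # sizes lists the number of neurons in each layer, e.g. [2, 3, 1]
        self.num_layers = len(sizes)
        self.sizes = sizes
        # One (y, 1) bias column vector per layer after the input layer
        self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
        # One (y, x) weight matrix per pair of adjacent layers
        self.weights = [np.random.randn(y, x)
                        for x, y in zip(sizes[:-1], sizes[1:])]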

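Explanation:
__init__: runs whenever the class is instantiated. It is Python's constructor, similar to a class constructor in Java.
self: refers to the current instance, similar to this in Java.
sizes: the number of layers and the number of neurons in each layer, e.g. net = Network([2, 3, 1])  # 2 neurons in the first layer, 3 in the second, 1 in the third
num_layers = len(sizes): the number of layers in the network.
biases: the initial biases, drawn at random from a standard normal distribution (mean 0, standard deviation 1), so they are not limited to the range 0 to 1 and can be negative. Every neuron outside the input layer has one bias.
weights: the initial weights, drawn from the same standard normal distribution. Each arrow (connection) between two neurons has one weight.
np.random: NumPy's random-number module. randn draws samples from the standard normal distribution.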
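
To check what `np.random.randn` actually returns, here is a minimal sketch. The seed and the sample size are arbitrary choices made for this illustration and are not part of the original code.

```python
import numpy as np

np.random.seed(0)  # fixed seed, chosen only to make this illustration repeatable

# Same call the constructor uses, just with many more rows.
samples = np.random.randn(10000, 1)

# Mean close to 0 and standard deviation close to 1: a standard normal distribution.
print("mean:", samples.mean())
print("std: ", samples.std())

# The values are not confined to [0, 1]; some are negative and some exceed 1.
print("min: ", samples.min())
print("max: ", samples.max())
```

With this many samples the mean lands near 0 and the standard deviation near 1, which is why the initial biases and weights printed in these notes include negative values.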

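To make this easier to follow, run the following code on its own:

import numpy as np

sizes = [2, 3, 1]
bias = [np.random.randn(y, 1) for y in sizes[1:]]
print(bias)

Result: a list of two NumPy arrays, one 3×1 and one 1×1. np.random.randn(y, 1) returns an array with y rows and 1 column whose values are drawn at random from a Gaussian (standard normal) distribution.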

[array([[-0.2310922 ],
       [-0.33350782],
       [ 0.88558646]]), array([[ 1.51042319]])]

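sizes[1:]: every element of sizes except the first.
sizes[:-1]: every element of sizes except the last.
for x, y in zip(sizes[:-1], sizes[1:]): x and y take the paired values produced by zip, i.e. the sizes of two adjacent layers. For sizes = [2, 3, 1] the pairs are (2, 3) and (3, 1).
net.weights[1]: stores the weights connecting the second and third layers.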
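
The slicing and zip behaviour described above can be checked directly. This is a small sketch for illustration; the variable names `pairs`, `weight_shapes`, and `bias_shapes` are introduced here and do not appear in the original code.

```python
import numpy as np

sizes = [2, 3, 1]

# Pair each layer size with the size of the layer that follows it.
pairs = list(zip(sizes[:-1], sizes[1:]))
print(pairs)  # [(2, 3), (3, 1)]

# One weight matrix per pair, shaped (neurons in next layer, neurons in this layer).
weights = [np.random.randn(y, x) for x, y in pairs]
weight_shapes = [w.shape for w in weights]
print(weight_shapes)  # [(3, 2), (1, 3)]

# One bias column vector per layer after the input layer.
biases = [np.random.randn(y, 1) for y in sizes[1:]]
bias_shapes = [b.shape for b in biases]
print(bias_shapes)  # [(3, 1), (1, 1)]
```

The (y, x) ordering is what lets np.dot(w, a) map an x×1 activation vector to a y×1 vector for the next layer.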

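Taking the network in the figure below as an example, initialize it by running the code from the start of this section:

[Figure: the example network, with layer sizes [2, 3, 1]]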

import numpy as np

class Network(object):

    def __init__(self, sizes):
        self.num_layers = len(sizes)
        self.sizes = sizes
        self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
        self.weights = [np.random.randn(y, x)
                        for x, y in zip(sizes[:-1], sizes[1:])]


net = Network([2, 3, 1])
print(net.num_layers)
print(net.sizes)
print("Biases:")
print(net.biases)
print("Weights:")
print(net.weights)

Result (the values are random, so they differ on every run):

3
[2, 3, 1]
Biases:
[array([[ 0.72072723],
       [ 1.02129651],
       [ 0.0451003 ]]), array([[ 0.89568534]])]
Weights:
[array([[ 0.35048635,  1.582825  ],
       [-0.6184383 ,  1.03039687],
       [-1.22620262, -0.48511089]]), array([[ 1.51702976, -0.59924277,  0.07869854]])]

3. Forward propagation

$a^{'}=\sigma (wa+b)$

Note: $\sigma(\cdot)$ is the σ (sigmoid) function from the previous note.

Define a forward-propagation method called feedforward:

def feedforward(self, a):
    """Return the output of the network if ``a`` is input."""
    for b, w in zip(self.biases, self.weights):
        a = sigmoid(np.dot(w, a)+b)
    return a

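Explanation:
np.dot(w, a): the matrix product of the weight matrix w and the activation column vector a. It is not an elementwise product.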
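
The method above relies on a sigmoid function that is not shown in this note. The sketch below is a standalone version for illustration: it assumes the standard definition sigmoid(z) = 1 / (1 + e^(-z)), takes biases and weights as arguments instead of reading them from self, and uses a made-up input vector.

```python
import numpy as np

def sigmoid(z):
    """Elementwise sigmoid: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(biases, weights, a):
    """Standalone variant of the method: apply a' = sigmoid(w a + b) layer by layer."""
    for b, w in zip(biases, weights):
        a = sigmoid(np.dot(w, a) + b)
    return a

sizes = [2, 3, 1]
biases = [np.random.randn(y, 1) for y in sizes[1:]]
weights = [np.random.randn(y, x) for x, y in zip(sizes[:-1], sizes[1:])]

x = np.array([[0.5], [-1.0]])  # made-up input: a 2x1 column vector
output = feedforward(biases, weights, x)

print(output.shape)  # (1, 1)
print(output)        # a single value strictly between 0 and 1
```

The activation goes from 2×1 to 3×1 to 1×1 as it passes through the two weight matrices, matching the layer sizes [2, 3, 1].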

4. Stochastic gradient descent

import random  # goes at the top of the module; needed for random.shuffle

# Method of the Network class
def SGD(self, training_data, epochs, mini_batch_size, eta,
        test_data=None):
    """Train the neural network using mini-batch stochastic
    gradient descent.  The ``training_data`` is a list of tuples
    ``(x, y)`` representing the training inputs and the desired
    outputs.  The other non-optional parameters are
    self-explanatory.  If ``test_data`` is provided then the
    network will be evaluated against the test data after each
    epoch, and partial progress printed out.  This is useful for
    tracking progress, but slows things down substantially."""
    if test_data: n_test = len(test_data)
    n = len(training_data)
    for j in range(epochs):  # range replaces xrange from the Python 2 original
        random.shuffle(training_data)
        mini_batches = [
            training_data[k:k + mini_batch_size]
            for k in range(0, n, mini_batch_size)]
        for mini_batch in mini_batches:
            self.update_mini_batch(mini_batch, eta)
        if test_data:
            print("Epoch {0}: {1} / {2}".format(
                j, self.evaluate(test_data), n_test))
        else:
            print("Epoch {0} complete".format(j))

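Explanation:
training_data: a list of tuples, one per training example (x, y), where x is the input and y is the desired output. For handwritten digit images, x is a 784-dimensional vector and y is a 10-dimensional vector.
epochs: the number of training epochs, chosen based on prior experience, the network, and the data.
mini_batch_size: the number of examples in each mini-batch.
eta: the learning rate.
test_data=None: the test set, empty by default.
n_test: the size of the test set, i.e. how many images it contains.
n: the size of the training set.
j: the index of the current epoch.
range(epochs) (xrange in the original Python 2 code): the integers from 0 to epochs - 1.
shuffle: shuffles the list into a random order, in place.
for k in range(0, n, mini_batch_size) (xrange in Python 2): k runs from 0 up to n in steps of mini_batch_size. E.g. with mini_batch_size = 100, the slices [k:k + mini_batch_size] are [0:100], [100:200], [200:300], ...
In "Epoch {0}: {1} / {2}", the placeholders 0, 1, and 2 are filled by j, self.evaluate(test_data), and n_test respectively.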
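
The data format and the mini-batch slicing described above can be tried on a toy dataset. This is a sketch under stated assumptions: `one_hot` is a hypothetical helper written for this example, the inputs are random numbers rather than real MNIST images, and the sizes (10 examples, mini-batches of 4) are arbitrary.

```python
import random
import numpy as np

def one_hot(digit):
    """Hypothetical helper: a 10x1 vector with 1.0 at the index of the digit."""
    y = np.zeros((10, 1))
    y[digit] = 1.0
    return y

# Toy stand-in for MNIST: 10 examples, each a (784x1 input, 10x1 target) tuple.
training_data = [(np.random.rand(784, 1), one_hot(i % 10)) for i in range(10)]

n = len(training_data)
mini_batch_size = 4

random.shuffle(training_data)  # reorder the examples in place
mini_batches = [training_data[k:k + mini_batch_size]
                for k in range(0, n, mini_batch_size)]

batch_sizes = [len(batch) for batch in mini_batches]
print(batch_sizes)  # [4, 4, 2]
```

When n is not a multiple of mini_batch_size, the last mini-batch is simply shorter, because slicing past the end of a list does not raise an error.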

5. Updating the weights and biases

$w_{k}\rightarrow w_{k}^{'}=w_{k}-\frac{\eta}{m}\sum_{j} \frac{\partial C_{X_{j}}}{\partial w_{k}}$

$b_{l}\rightarrow b_{l}^{'}=b_{l}-\frac{\eta}{m}\sum_{j} \frac{\partial C_{X_{j}}}{\partial b_{l}}$

Here $\eta$ is the learning rate, $m$ is the number of examples in the mini-batch, and the sum runs over the training examples $X_{j}$ in that mini-batch.

def update_mini_batch(self, mini_batch, eta):
    """Update the network's weights and biases by applying
    gradient descent using backpropagation to a single mini batch.
    The ``mini_batch`` is a list of tuples ``(x, y)``, and ``eta``
    is the learning rate."""
    nabla_b = [np.zeros(b.shape) for b in self.biases]
    nabla_w = [np.zeros(w.shape) for w in self.weights]
    for x, y in mini_batch:
        delta_nabla_b, delta_nabla_w = self.backprop(x, y)
        nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
        nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
    self.weights = [w-(eta/len(mini_batch))*nw
                    for w, nw in zip(self.weights, nabla_w)]
    self.biases = [b-(eta/len(mini_batch))*nb
                   for b, nb in zip(self.biases, nabla_b)]

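Explanation:
nabla_b, nabla_w: two new lists of zero-filled arrays with exactly the same shapes as biases and weights.
backprop(): a method that computes the partial derivatives quickly.
delta_nabla_b, delta_nabla_w = self.backprop(x, y): for a single example (x, y), returns the partial derivatives of the cost with respect to every bias and every weight.
nabla_b, nabla_w: accumulate the delta_nabla_b and delta_nabla_w obtained from every pair (x, y) in the mini-batch.
eta: the learning rate.
The last two statements update the weights and biases.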
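
The accumulate-then-step pattern can be followed with concrete numbers. This is a minimal sketch: the weight matrix, the learning rate, and the per-example gradients are all made up for illustration and stand in for what backprop() would return.

```python
import numpy as np

eta = 3.0                   # learning rate, arbitrary value for this illustration
w = np.array([[1.0, 2.0]])  # made-up 1x2 weight matrix

# Made-up per-example gradients, standing in for the output of backprop().
per_example_grads = [np.array([[0.1, 0.2]]), np.array([[0.3, 0.4]])]
m = len(per_example_grads)  # mini-batch size

# Accumulate the gradients over the mini-batch, as nabla_w does.
nabla_w = np.zeros(w.shape)
for grad in per_example_grads:
    nabla_w = nabla_w + grad

# One gradient descent step: w' = w - (eta / m) * sum of gradients.
new_w = w - (eta / m) * nabla_w
print(new_w)  # approximately [[0.4 1.1]]
```

The summed gradient is [0.4, 0.6], the step size eta / m is 1.5, so the weights move from [1.0, 2.0] to about [0.4, 1.1]. The biases are updated in the same way.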

backprop() will be covered in detail in a later note.
