【计算机视觉】Mnist手写体训练

最新推荐文章于 2021-11-22 21:55:34 发布

来杯金桔柠檬

最新推荐文章于 2021-11-22 21:55:34 发布

阅读量352

点赞数 1

本文链接：https://blog.csdn.net/weixin_42648848/article/details/91356023

版权

本文介绍了MNIST手写数字数据集，包括数据来源及结构，并详细讲解了经典的LeNet-5网络模型，探讨其在手写体识别中的应用。通过实际代码展示训练过程，最终实现高达99%的识别准确率。

摘要由CSDN通过智能技术生成

一、原理

1. Mnist数据集简介

MNIST 数据集可在 http://yann.lecun.com/exdb/mnist/ 获取, 它包含了四个部分:

Training set images: train-images-idx3-ubyte.gz (9.9 MB, 解压后 47 MB, 包含 60,000 个样本)
Training set labels: train-labels-idx1-ubyte.gz (29 KB, 解压后 60 KB, 包含 60,000 个标签)
Test set images: t10k-images-idx3-ubyte.gz (1.6 MB, 解压后 7.8 MB, 包含 10,000 个样本)
Test set labels: t10k-labels-idx1-ubyte.gz (5KB, 解压后 10 KB, 包含 10,000 个标签)

MNIST 数据集来自美国国家标准与技术研究所, National Institute of Standards and Technology (NIST). 训练集 (training set) 由来自 250 个不同人手写的数字构成, 其中 50% 是高中学生, 50% 来自人口普查局 (the Census Bureau) 的工作人员. 测试集(test set) 也是同样比例的手写数字数据.

2.LeNet

在这里插入图片描述
LeNet-5 这个网络虽然很小，但是它包含了深度学习的基本模块：卷积层，池化层，全连接层。是其他深度学习模型的基础，这里我们对LeNet-5进行深入分析。同时，通过实例分析，加深对与卷积层和池化层的理解。

在这里插入图片描述
LeNet-5共有7层，不包含输入，每层都包含可训练参数；每个层有多个Feature Map，每个FeatureMap通过一种卷积滤波器提取输入的一种特征，然后每个FeatureMap有多个神经元。
详情请参考：https://www.cnblogs.com/duanhx/articles/9655228.html

二、代码

trainMnistFromImages.py

import os 
import cv2 
import numpy as np
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

sess = tf.InteractiveSession()


def getTrain():
    train=[[],[]] # 指定训练集的格式，一维为输入数据，一维为其标签
    # 读取所有训练图像，作为训练集
    train_root="mnist_train" 
    labels = os.listdir(train_root)
    for label in labels:
        imgpaths = os.listdir(os.path.join(train_root,label))
        for imgname in imgpaths:
            img = cv2.imread(os.path.join(train_root,label,imgname),0)
            array = np.array(img).flatten() # 将二维图像平铺为一维图像
            array=MaxMinNormalization(array)
            train[0].append(array)
            label_ = [0,0,0,0,0,0,0,0,0,0]
            label_[int(label)] = 1
            train[1].append(label_)
    train = shuff(train)
    return train

def getTest():
    test=[[],[]] # 指定训练集的格式，一维为输入数据，一维为其标签
    # 读取所有训练图像，作为训练集
    test_root="mnist_test" 
    labels = os.listdir(test_root)
    for label in labels:
        imgpaths = os.listdir(os.path.join(test_root,label))
        for imgname in imgpaths:
            img = cv2.imread(os.path.join(test_root,label,imgname),0)
            array = np.array(img).flatten() # 将二维图像平铺为一维图像
            array=MaxMinNormalization(array)
            test[0].append(array)
            label_ = [0,0,0,0,0,0,0,0,0,0]
            label_[int(label)] = 1
            test[1].append(label_)
    test = shuff(test)
    return test[0],test[1]

def shuff(data):
    temp=[]
    for i in range(len(data[0])):
        temp.append([data[0][i],data[1][i]])
    import random
    random.shuffle(temp)
    data=[[],[]]
    for tt in temp:
        data[0].append(tt[0])
        data[1].append(tt[1])
    return data

count = 0
def getBatchNum(batch_size,maxNum):
    global count
    if count ==0:
        count=count+batch_size
        return 0,min(batch_size,maxNum)
    else:
        temp = count
        count=count+batch_size
        if min(count,maxNum)==maxNum:
            count=0
            return getBatchNum(batch_size,maxNum)
        return temp,min(count,maxNum)
    
def MaxMinNormalization(x):
    x = (x - np.min(x)) / (np.max(x) - np.min(x))
    return x


# 1、权重初始化,偏置初始化
# 为了创建这个模型，我们需要创建大量的权重和偏置项
# 为了不在建立模型的时候反复操作，定义两个函数用于初始化
def weight_variable(shape):
    initial = tf.truncated_normal(shape,stddev=0.1)#正太分布的标准差设为0.1
    return tf.Variable(initial)
def bias_variable(shape):
    initial = tf.constant(0.1,sh