MXNet Official Documentation Tutorial (2): Handwritten Digit Recognition with Convolutional Neural Networks

Catalyst_ZX


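I had originally planned to start translating the section on computation graphs, but right after I published the previous post, MXNet updated its tutorial documentation (I can't keep up!) and added a detailed tutorial for the handwritten digit recognition example from that post. So this post keeps up with the times and translates the newly updated tutorial. Since images currently cannot be uploaded to the blog, the figures can be viewed on the original site: Handwritten Digit Recognition.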

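This tutorial walks you through a computer vision classification example: recognizing handwritten digits with an artificial neural network.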

Loading Data

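We first need to obtain the MNIST data, a dataset commonly used for handwritten digit recognition. Each image in the dataset is a 28×28-pixel grayscale image (gray values between 0 and 255). The following code downloads the images and their corresponding labels and loads them into numpy arrays.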

import numpy as np
import os
import urllib.request
import gzip
import struct

def download_data(url, force_download=True):
    # Save the file under its original name; only skip the download
    # when force_download is False and the file already exists.
    fname = url.split("/")[-1]
    if force_download or not os.path.exists(fname):
        urllib.request.urlretrieve(url, fname)
    return fname

def read_data(label_url, image_url):
    # Label file: 8-byte big-endian header (magic number, item count), then one byte per label.
    with gzip.open(download_data(label_url)) as flbl:
        magic, num = struct.unpack(">II", flbl.read(8))
        label = np.frombuffer(flbl.read(), dtype=np.int8)
    # Image file: 16-byte header (magic, count, rows, cols), then rows*cols bytes per image.
    with gzip.open(download_data(image_url), 'rb') as fimg:
        magic, num, rows, cols = struct.unpack(">IIII", fimg.read(16))
        image = np.frombuffer(fimg.read(), dtype=np.uint8).reshape(len(label), rows, cols)
    return (label, image)

path = 'http://yann.lecun.com/exdb/mnist/'
(train_lbl, train_img) = read_data(
    path+'train-labels-idx1-ubyte.gz', path+'train-images-idx3-ubyte.gz')
(val_lbl, val_img) = read_data(
    path+'t10k-labels-idx1-ubyte.gz', path+'t10k-images-idx3-ubyte.gz')
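To make the header handling in read_data concrete, here is a small self-contained sketch that builds a fake gzip-compressed IDX label file in memory and parses it the same way. The magic number 2049 is the value used by MNIST label files; the helper names make_fake_label_file and parse_label_file are hypothetical, and the label values are made up.

```python
import gzip
import io
import struct

import numpy as np

def make_fake_label_file(labels):
    """Build an in-memory gzip'd IDX1 label file (hypothetical test data)."""
    header = struct.pack(">II", 2049, len(labels))  # big-endian magic + item count
    return gzip.compress(header + bytes(labels))

def parse_label_file(blob):
    """Parse a gzip'd IDX1 label file the same way read_data does."""
    with gzip.open(io.BytesIO(blob), "rb") as f:
        magic, num = struct.unpack(">II", f.read(8))  # 8-byte header
        labels = np.frombuffer(f.read(), dtype=np.uint8)  # one byte per label
    assert len(labels) == num, "header count does not match payload"
    return magic, labels

magic, labels = parse_label_file(make_fake_label_file([5, 0, 4, 1, 9]))
print(magic, labels)  # 2049 [5 0 4 1 9]
```

The image file follows the same pattern with a 16-byte header (magic, count, rows, cols), which is why read_data unpacks ">IIII" there.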

We plot the first 10 images and print their labels:

%matplotlib inline
import matplotlib.pyplot as plt

for i in range(10):
    plt.subplot(1, 10, i+1)
    plt.imshow(train_img[i], cmap='Greys_r')
    plt.axis('off')
plt.show()
print('label: %s' % (train_lbl[0:10],))

label: [5 0 4 1 9 2 1 3 1 4]

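Next we create data iterators for MXNet. A data iterator works like a Python iterator: each call to next() returns a batch of data, i.e. several images together with their labels. The images are stored in a 4-D array of shape (batch_size, num_channels, height, width). For MNIST each image has a single (grayscale) channel and is 28 pixels high and 28 pixels wide, and the to4d helper also scales the pixel values to [0, 1]. In addition, we usually shuffle the training images, which tends to speed up training.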

import mxnet as mx

def to4d(img):
    # (N, 28, 28) uint8 -> (N, 1, 28, 28) float32, scaled to [0, 1]
    return img.reshape(img.shape[0], 1, 28, 28).astype(np.float32)/255

batch_size = 100
train_iter = mx.io.NDArrayIter(to4d(train_img), train_lbl, batch_size, shuffle=True)
val_iter = mx.io.NDArrayIter(to4d(val_img), val_lbl, batch_size)
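The batching idea can be illustrated without MXNet. The NumPy sketch below is only a rough stand-in for what a shuffling batch iterator does (iterate_batches is a hypothetical helper, not MXNet code), run on random fake images. One difference to keep in mind: this sketch simply drops an incomplete final batch, whereas NDArrayIter handles leftover samples according to its last_batch_handle option. For MNIST with batch_size=100 the data divides evenly, so this does not matter there.

```python
import numpy as np

def to4d(img):
    # (N, 28, 28) uint8 -> (N, 1, 28, 28) float32 in [0, 1]
    return img.reshape(img.shape[0], 1, 28, 28).astype(np.float32) / 255

def iterate_batches(data, labels, batch_size, shuffle=False, seed=0):
    """Yield (data, label) mini-batches; drops an incomplete last batch."""
    idx = np.arange(len(data))
    if shuffle:
        np.random.default_rng(seed).shuffle(idx)  # new sample order each epoch
    for start in range(0, len(data) - batch_size + 1, batch_size):
        batch = idx[start:start + batch_size]
        yield data[batch], labels[batch]

rng = np.random.default_rng(1)
fake_img = rng.integers(0, 256, size=(250, 28, 28)).astype(np.uint8)  # made-up images
fake_lbl = np.arange(250) % 10                                        # made-up labels
batches = list(iterate_batches(to4d(fake_img), fake_lbl, batch_size=100, shuffle=True))
print(len(batches), batches[0][0].shape)  # 2 (100, 1, 28, 28)
```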

Multilayer Perceptron

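A multilayer perceptron consists of several fully-connected layers. A fully-connected layer takes an n×m input matrix X and produces an n×k output matrix Y, where k is often called the hidden size. The layer has two parameters: an m×k weight matrix W and a 1×k bias vector b, which is added to every row. The output is computed as: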

$$Y = XW + b$$
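A quick NumPy check of the shapes in this formula, using random values (the sizes 784 and 128 are borrowed from the first layer of the network below; the data is made up). As far as I know, MXNet's own FullyConnected stores the weight transposed, as (num_hidden, input_dim), but the math is the same.

```python
import numpy as np

n, m, k = 4, 784, 128            # batch size, input size, hidden size
rng = np.random.default_rng(0)
X = rng.standard_normal((n, m))  # n x m input
W = rng.standard_normal((m, k))  # m x k weight matrix
b = np.zeros(k)                  # bias, broadcast across the n rows
Y = X @ W + b                    # n x k output
print(Y.shape)  # (4, 128)
```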

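The output of a fully-connected layer is usually fed into an activation layer, which applies an element-wise operation. A well-known choice is the sigmoid function $f(x) = \frac{1}{1+e^{-x}}$. Nowadays people also use a simpler function called relu: $f(x) = \max(0, x)$.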
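Both activations are one-liners in NumPy. This sketch applies them element-wise to a small made-up vector.

```python
import numpy as np

def sigmoid(x):
    # Squashes every element into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Keeps positive elements, replaces negative ones with 0
    return np.maximum(0, x)

x = np.array([-2.0, 0.0, 3.0])
print(sigmoid(x))  # approx [0.119 0.5 0.953]
print(relu(x))     # [0. 0. 3.]
```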

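The last fully-connected layer usually has a hidden size equal to the number of classes in the dataset. Finally, we stack a softmax layer on top, which maps its input to probability scores. Again let the input X be n×m and let $x_i$ denote its i-th row. The i-th row of the output is then:

$$y_i = \left[\frac{e^{x_{i1}}}{\sum_{j=1}^{m} e^{x_{ij}}}, \ldots, \frac{e^{x_{im}}}{\sum_{j=1}^{m} e^{x_{ij}}}\right]$$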
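A row-wise softmax matching this formula can be sketched in NumPy as follows. Subtracting each row's maximum before exponentiating is a standard numerical-stability trick that leaves the result unchanged; the input values are made up.

```python
import numpy as np

def softmax(X):
    """Row-wise softmax of an n x m matrix."""
    Z = X - X.max(axis=1, keepdims=True)   # same result, avoids exp overflow
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

logits = np.array([[1.0, 2.0, 3.0],
                   [1000.0, 1000.0, 1000.0]])  # second row would overflow without the shift
P = softmax(logits)
print(P)
```

Each row of P is non-negative and sums to 1, so it can be read as a probability distribution over the m classes.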

Defining a multilayer perceptron in MXNet is straightforward, as shown below.

# Create a placeholder variable for the input data
data = mx.sym.Variable('data')
# Flatten the data from 4-D shape (batch_size, num_channel, height, width)
# into 2-D (batch_size, num_channel*height*width)
data = mx.sym.Flatten(data=data)

# The first fully-connected layer
fc1  = mx.sym.FullyConnected(data=data, name='fc1', num_hidden=128)
# Apply relu to the output of the first fully-connected layer
act1 = mx.sym.Activation(data=fc1, name='relu1', act_type="relu")

# The second fully-connected layer and the corresponding activation function
fc2  = mx.sym.FullyConnected(data=act1, name='fc2', num_hidden=64)
act2 = mx.sym.Activation(data=fc2, name='relu2', act_type="relu")

# The third fully-connected layer; note that the hidden size should be 10, which is the number of unique digits
fc3  = mx.sym.FullyConnected(data=act2, name='fc3', num_hidden=10)
# The softmax and loss layer
mlp  = mx.sym.SoftmaxOutput(data=fc3, name='softmax')

# We visualize the network structure with output size (the batch_size is ignored.)
shape = {"data" : (batch_size, 1, 28, 28)}
mx.viz.plot_network(symbol=mlp, shape=shape)
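To see what this symbol computes, here is a NumPy sketch of the same 784-128-64-10 forward pass (Flatten, FullyConnected + relu twice, FullyConnected, softmax). The small random weights are hypothetical stand-ins for whatever initializer MXNet uses, and the input batch is random; the sketch also counts the trainable parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [784, 128, 64, 10]  # Flatten output, then num_hidden of fc1, fc2, fc3
# Hypothetical random initialization: one (W, b) pair per fully-connected layer
params = [(rng.normal(0, 0.01, (a, b)), np.zeros(b))
          for a, b in zip(sizes[:-1], sizes[1:])]

def mlp_forward(batch):
    h = batch.reshape(batch.shape[0], -1)         # Flatten: (N,1,28,28) -> (N,784)
    for i, (W, b) in enumerate(params):
        h = h @ W + b                             # FullyConnected
        if i < len(params) - 1:
            h = np.maximum(0, h)                  # relu after fc1 and fc2 only
    e = np.exp(h - h.max(axis=1, keepdims=True))  # softmax over the 10 classes
    return e / e.sum(axis=1, keepdims=True)

n_params = sum(W.size + b.size for W, b in params)
probs = mlp_forward(rng.random((5, 1, 28, 28)))
print(n_params, probs.shape)  # 109386 (5, 10)
```

Most of the roughly 109k parameters sit in fc1 (784 × 128 weights plus 128 biases).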

Now that both the network definition and the data iterators are ready, we can start training:

import logging
logging.getLogger().setLevel(logging.DEBUG)

model = mx.model.FeedForward(
    symbol = mlp,        # network structure
    num_epoch = 10,      # number of data passes for training
    learning_rate = 0.1  # learning rate of SGD
)
model.fit(
    X = train_iter,      # training data
    eval_data = val_iter,  # validation data
    batch_end_callback = mx.callback.Speedometer(batch_size, 200)  # output progress for each 200 data batches
)

INFO:root:Start training with [cpu(0)]

INFO:root:Epoch[0] Batch [200]  Speed: 26279.17 samples/sec Train-accuracy=0.111550

INFO:root:Epoch[0] Batch [400]  Speed: 27424.98 samples/sec Train-accuracy=0.111000

INFO:root:Epoch[0] Batch [600]  Speed: 27094.87 samples/sec Train-accuracy=0.133200

INFO:root:Epoch[0] Resetting Data Iterator

INFO:root:Epoch[0] Time cost=2.320

INFO:root:Epoch[0] Validation-accuracy=0.276800

INFO:root:Epoch[1] Batch [200]  Speed: 17739.48 samples/sec Train-accuracy=0.412650

INFO:root:Epoch[1] Batch [400]  Speed: 18869.69 samples/sec Train-accuracy=0.753500

INFO:root:Epoch[1] Batch [600]  Speed: 25618.04 samples/sec Train-accuracy=0.828750

INFO:root:Epoch[1] Resetting Data Iterator

INFO:root:Epoch[1] Time cost=2.988

INFO:root:Epoch[1] Validation-accuracy=0.854400

INFO:root:Epoch[2] Batch [200]  Speed: 21532.09 samples/sec Train-accuracy=0.859750

INFO:root:Epoch[2] Batch [400]  Speed: 27919.08 samples/sec Train-accuracy=0.888700

INFO:root:Epoch[2] Batch [600]  Speed: 26810.95 samples/sec Train-accuracy=0.905550

INFO:root:Epoch[2] Resetting Data Iterator

INFO:root:Epoch[2] Time cost=2.408

INFO:root:Epoch[2] Validation-accuracy=0.916300

INFO:root:Epoch[3] Batch [200]  Speed: 28097.98 samples/sec Train-accuracy=0.917300

INFO:root:Epoch[3] Batch [400]  Speed: 27490.20 samples/sec Train-accuracy=0.925850

INFO:root:Epoch[3] Batch [600]  Speed: 27937.45 samples/sec Train-accuracy=0.934900

INFO:root:Epoch[3] Resetting Data Iterator

INFO:root:Epoch[3] Time cost=2.167

INFO:root:Epoch[3] Validation-accuracy=0.938400

INFO:root:Epoch[4] Batch [200]  Speed: 26948.04 samples/sec Train-accuracy=0.942450

INFO:root:Epoch[4] Batch [400]  Speed: 24250.66 samples/sec Train-accuracy=0.943200

INFO:root:Epoch[4] Batch [600]  Speed: 22772.67 samples/sec Train-accuracy=0.951550

INFO:root:Epoch[4] Resetting Data Iterator

INFO:root:Epoch[4] Time cost=2.456

INFO:root:Epoch[4] Validation-accuracy=0.951500

INFO:root:Epoch[5] Batch [200]  Speed: 27313.59 samples/sec Train-accuracy=0.955500

INFO:root:Epoch[5] Batch [400]  Speed: 28061.48 samples/sec Train-accuracy=0.955100

INFO:root:Epoch[5] Batch [600]  Speed: 26730.32 samples/sec Train-accuracy=0.960500

INFO:root:Epoch[5] Resetting Data Iterator

INFO:root:Epoch[5] Time cost=2.206

INFO:root:Epoch[5] Validation-accuracy=0.956300

INFO:root:Epoch[6] Batch [200]  Speed: 28440.23 samples/sec Train-accuracy=0.962700

INFO:root:Epoch[6] Batch [400]  Speed: 28832.82 samples/sec Train-accuracy=0.962700

INFO:root:Epoch[6] Batch [600]  Speed: 27814.78 samples/sec Train-accuracy=0.967150

INFO:root:Epoch[6] Resetting Data Iterator

INFO:root:Epoch[6] Time cost=2.131

INFO:root:Epoch[6] Validation-accuracy=0.960300

INFO:root:Epoch[7] Batch [200]  Speed: 20942.23 samples/sec Train-accuracy=0.967550

INFO:root:Epoch[7] Batch [400]  Speed: 22264.85 samples/sec Train-accuracy=0.967750

INFO:root:Epoch[7] Batch [600]  Speed: 21294.69 samples/sec Train-accuracy=0.971500

INFO:root:Epoch[7] Resetting Data Iterator

INFO:root:Epoch[7] Time cost=2.805

INFO:root:Epoch[7] Validation-accuracy=0.961400

INFO:root:Epoch[8] Batch [200]  Speed: 17870.55 samples/sec Train-accuracy=0.972550

INFO:root:Epoch[8] Batch [400]  Speed: 11526.75 samples/sec Train-accuracy=0.971600

INFO:root:Epoch[8] Batch [600]  Speed: 15082.47 samples/sec Train-accuracy=0.974500

INFO:root:Epoch[8] Resetting Data Iterator

INFO:root:Epoch[8] Time cost=4.197

INFO:root:Epoch[8] Validation-accuracy=0.963000

INFO:root:Epoch[9] Batch [200]  Speed: 10139.52 samples/sec Train-accuracy=0.976000

INFO:root:Epoch[9] Batch [400]  Speed: 10321.69 samples/sec Train-accuracy=0.975550

INFO:root:Epoch[9] Batch [600]  Speed: 10820.23 samples/sec Train-accuracy=0.977750

INFO:root:Epoch[9] Resetting Data Iterator

INFO:root:Epoch[9] Time cost=5.777

INFO:root:Epoch[9] Validation-accuracy=0.964100
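The learning_rate=0.1 passed to FeedForward is the step size of each SGD update. As a rough illustration only (this is not MXNet's optimizer code), the NumPy sketch below takes one gradient-descent step for a tiny softmax classifier on made-up data, using the standard result that the gradient of softmax cross-entropy with respect to the logits is the predicted probabilities minus the one-hot labels.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((32, 3))           # 32 made-up samples with 3 features in [0, 1)
y = rng.integers(0, 4, size=32)   # made-up labels for 4 classes
W, b = np.zeros((3, 4)), np.zeros(4)
learning_rate = 0.1

def loss_and_grads(W, b):
    z = X @ W + b
    z -= z.max(axis=1, keepdims=True)                  # stable softmax
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    loss = -np.log(p[np.arange(len(y)), y]).mean()     # mean cross-entropy
    d = p.copy()
    d[np.arange(len(y)), y] -= 1                       # dLoss/dlogits = p - onehot
    d /= len(y)
    return loss, X.T @ d, d.sum(axis=0)

loss0, gW, gb = loss_and_grads(W, b)
W, b = W - learning_rate * gW, b - learning_rate * gb  # one SGD step
loss1, _, _ = loss_and_grads(W, b)
print(loss0, loss1)
```

With all-zero weights the first loss is log(4), and a small enough step lowers it; a step that is too large can overshoot, which is why the learning rate is a key hyperparameter.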

After training, we test the model on a single image.

plt.imshow(val_img[0], cmap='Greys_r')
plt.axis('off')
plt.show()
prob = model.predict(val_img[0:1].astype(np.float32)/255)[0]
print('Classified as %d with probability %f' % (prob.argmax(), max(prob)))

Classified as 7 with probability 0.999781

We can also compute the accuracy over a whole dataset by passing in a data iterator.

print('Validation accuracy: %f%%' % (model.score(val_iter)*100,))

Validation accuracy: 96.410000%
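Assuming score uses the default accuracy metric (the same "accuracy" reported in the training log), the number it returns is simply the fraction of samples whose highest-probability class matches the label. A minimal NumPy sketch with made-up predictions:

```python
import numpy as np

def accuracy(probs, labels):
    """Fraction of rows whose highest-probability class equals the label."""
    return float((probs.argmax(axis=1) == labels).mean())

probs = np.array([[0.1, 0.7, 0.2],    # predicts 1
                  [0.8, 0.1, 0.1],    # predicts 0
                  [0.3, 0.3, 0.4],    # predicts 2
                  [0.5, 0.4, 0.1]])   # predicts 0
labels = np.array([1, 0, 2, 1])
print(accuracy(probs, labels))  # 0.75
```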

We can even recognize a digit that you draw in the drawing box created by the code below.

from IPython.display import HTML
import base64
import cv2
import numpy as np
from mnist_demo import html, script

def classify(img):
    # img is a PNG data URL from the drawing box: strip the prefix and decode the base64 payload
    img = base64.b64decode(img[len('data:image/png;base64,'):])
    img = cv2.imdecode(np.frombuffer(img, np.uint8), -1)  # -1: load unchanged, keeping the alpha channel
    img = cv2.resize(img[:,:,3], (28,28))                 # use the alpha channel (index 3) as the grayscale image
    img = img.astype(np.float32).reshape((1,1,28,28))/255.0
    return model.predict(img)[0].argmax()

'''
To see the model in action, run the demo notebook at
https://github.com/dmlc/mxnet-notebooks/blob/master/python/tutorials/mnist.ipynb.
'''
HTML(html + script)

Convolutional Neural Networks

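Note that the fully-connected layers above simply reshape each image into a vector during training, ignoring the spatial information that pixels are correlated along both the horizontal and vertical dimensions. The convolutional layer addresses this drawback by using a more structured weight W: instead of a plain matrix multiplication, it uses a 2-D convolution to obtain the output.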

We can also use multiple feature maps, each with its own weight matrix, to capture different features.
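A naive NumPy sketch of a 2-D convolution producing several feature maps, one per kernel. Like most deep learning libraries (I believe MXNet's Convolution included), it computes a cross-correlation, i.e. the kernel is not flipped. It is simplified under stated assumptions: a single input channel, no bias, stride 1, and no padding, whereas a real convolutional layer also sums over input channels and adds a bias. The image and kernels are made up.

```python
import numpy as np

def conv2d_valid(image, kernels):
    """Valid 2-D cross-correlation of one (H, W) image with (F, kh, kw) kernels.

    Returns F feature maps of shape (H - kh + 1, W - kw + 1).
    """
    F, kh, kw = kernels.shape
    H, W = image.shape
    out = np.zeros((F, H - kh + 1, W - kw + 1))
    for f in range(F):                      # one feature map per kernel
        for i in range(H - kh + 1):
            for j in range(W - kw + 1):
                out[f, i, j] = np.sum(image[i:i + kh, j:j + kw] * kernels[f])
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernels = np.stack([np.ones((2, 2)),                       # sums each 2x2 patch
                    np.array([[1.0, -1.0], [0.0, 0.0]])])  # horizontal difference
maps = conv2d_valid(image, kernels)
print(maps.shape)  # (2, 3, 3)
```

Each kernel is slid over every position of the image, so the same small set of weights is reused everywhere; this weight sharing is what lets the layer exploit spatial structure.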

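Besides convolutional layers, the other major change in a convolutional neural network is the addition of pooling layers. A pooling layer reduces each n×m image patch (n×m is usually called the kernel size) to a single value, which makes the network less sensitive to spatial location. (Translator's note: this helps avoid overfitting.)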
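Max pooling with a 2×2 kernel and stride 2, the setting used by pool1 and pool2 below, can be sketched in NumPy with a reshape. The sketch assumes a single 2-D feature map with even height and width; the input values are made up.

```python
import numpy as np

def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2 on an (H, W) map; H and W assumed even."""
    H, W = fmap.shape
    # Split into non-overlapping 2x2 blocks, then take the max inside each block
    return fmap.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 1],
                 [0, 1, 5, 6],
                 [2, 2, 7, 8]])
print(max_pool_2x2(fmap))
# [[4 2]
#  [2 8]]
```

Shifting a feature by one pixel inside a 2×2 block leaves the pooled output unchanged, which is the reduced sensitivity to location described above.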

data = mx.symbol.Variable('data')
# first conv layer
conv1 = mx.sym.Convolution(data=data, kernel=(5,5), num_filter=20)
tanh1 = mx.sym.Activation(data=conv1, act_type="tanh")
pool1 = mx.sym.Pooling(data=tanh1, pool_type="max", kernel=(2,2), stride=(2,2))
# second conv layer
conv2 = mx.sym.Convolution(data=pool1, kernel=(5,5), num_filter=50)
tanh2 = mx.sym.Activation(data=conv2, act_type="tanh")
pool2 = mx.sym.Pooling(data=tanh2, pool_type="max", kernel=(2,2), stride=(2,2))
# first fullc layer
flatten = mx.sym.Flatten(data=pool2)
fc1 = mx.symbol.FullyConnected(data=flatten, num_hidden=500)
tanh3 = mx.sym.Activation(data=fc1, act_type="tanh")
# second fullc
fc2 = mx.sym.FullyConnected(data=tanh3, num_hidden=10)
# softmax loss
lenet = mx.sym.SoftmaxOutput(data=fc2, name='softmax')
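It can help to track the spatial size of the feature maps through this network. The sketch below uses the usual output-size formula for a window of a given kernel, stride, and padding, assuming the defaults of stride 1 and no padding for the convolutions (none are set above) and floor rounding for pooling.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * pad - kernel) // stride + 1

size = 28
size = conv_out(size, 5)            # conv1, 5x5            -> 24
size = conv_out(size, 2, stride=2)  # pool1, 2x2, stride 2  -> 12
size = conv_out(size, 5)            # conv2, 5x5            -> 8
size = conv_out(size, 2, stride=2)  # pool2, 2x2, stride 2  -> 4
flatten_dim = 50 * size * size      # num_filter=50 maps of 4x4 -> 800
print(size, flatten_dim)  # 4 800
```

So the Flatten layer hands an 800-dimensional vector to the 500-unit fully-connected layer.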

Note that the LeNet model above is more complex than the multilayer perceptron, so we train it on a GPU instead of a CPU.

model = mx.model.FeedForward(
    ctx = mx.gpu(0),     # use GPU 0 for training, others are same as before
    symbol = lenet,
    num_epoch = 10,
    learning_rate = 0.1)
model.fit(
    X = train_iter,
    eval_data = val_iter,
    batch_end_callback = mx.callback.Speedometer(batch_size, 200)
)

INFO:root:Start training with [gpu(0)]

INFO:root:Epoch[0] Batch [200]  Speed: 14804.86 samples/sec Train-accuracy=0.111500

INFO:root:Epoch[0] Batch [400]  Speed: 14294.26 samples/sec Train-accuracy=0.111000

INFO:root:Epoch[0] Batch [600]  Speed: 14273.05 samples/sec Train-accuracy=0.113600

INFO:root:Epoch[0] Resetting Data Iterator

INFO:root:Epoch[0] Time cost=4.446

INFO:root:Epoch[0] Validation-accuracy=0.113500

INFO:root:Epoch[1] Batch [200]  Speed: 14332.64 samples/sec Train-accuracy=0.141350

INFO:root:Epoch[1] Batch [400]  Speed: 14785.42 samples/sec Train-accuracy=0.777650

INFO:root:Epoch[1] Batch [600]  Speed: 14796.36 samples/sec Train-accuracy=0.914550

INFO:root:Epoch[1] Resetting Data Iterator

INFO:root:Epoch[1] Time cost=4.105

INFO:root:Epoch[1] Validation-accuracy=0.937700

INFO:root:Epoch[2] Batch [200]  Speed: 14877.08 samples/sec Train-accuracy=0.941850

INFO:root:Epoch[2] Batch [400]  Speed: 14806.53 samples/sec Train-accuracy=0.955900

INFO:root:Epoch[2] Batch [600]  Speed: 14844.79 samples/sec Train-accuracy=0.965200

INFO:root:Epoch[2] Resetting Data Iterator

INFO:root:Epoch[2] Time cost=4.048

INFO:root:Epoch[2] Validation-accuracy=0.971200

INFO:root:Epoch[3] Batch [200]  Speed: 14873.95 samples/sec Train-accuracy=0.971150

INFO:root:Epoch[3] Batch [400]  Speed: 14793.99 samples/sec Train-accuracy=0.972400

INFO:root:Epoch[3] Batch [600]  Speed: 14806.52 samples/sec Train-accuracy=0.976600

INFO:root:Epoch[3] Resetting Data Iterator

INFO:root:Epoch[3] Time cost=4.052

INFO:root:Epoch[3] Validation-accuracy=0.980600

INFO:root:Epoch[4] Batch [200]  Speed: 14428.12 samples/sec Train-accuracy=0.979100

INFO:root:Epoch[4] Batch [400]  Speed: 14298.85 samples/sec Train-accuracy=0.979550

INFO:root:Epoch[4] Batch [600]  Speed: 14618.55 samples/sec Train-accuracy=0.982400

INFO:root:Epoch[4] Resetting Data Iterator

INFO:root:Epoch[4] Time cost=4.158

INFO:root:Epoch[4] Validation-accuracy=0.983300

INFO:root:Epoch[5] Batch [200]  Speed: 14919.47 samples/sec Train-accuracy=0.983700

INFO:root:Epoch[5] Batch [400]  Speed: 14809.71 samples/sec Train-accuracy=0.984050

INFO:root:Epoch[5] Batch [600]  Speed: 14550.25 samples/sec Train-accuracy=0.986250

INFO:root:Epoch[5] Resetting Data Iterator

INFO:root:Epoch[5] Time cost=4.071

INFO:root:Epoch[5] Validation-accuracy=0.985100

INFO:root:Epoch[6] Batch [200]  Speed: 14363.59 samples/sec Train-accuracy=0.986500

INFO:root:Epoch[6] Batch [400]  Speed: 14629.87 samples/sec Train-accuracy=0.986950

INFO:root:Epoch[6] Batch [600]  Speed: 14842.83 samples/sec Train-accuracy=0.988700

INFO:root:Epoch[6] Resetting Data Iterator

INFO:root:Epoch[6] Time cost=4.113

INFO:root:Epoch[6] Validation-accuracy=0.985300

INFO:root:Epoch[7] Batch [200]  Speed: 14863.48 samples/sec Train-accuracy=0.988950

INFO:root:Epoch[7] Batch [400]  Speed: 14824.65 samples/sec Train-accuracy=0.988800

INFO:root:Epoch[7] Batch [600]  Speed: 14278.57 samples/sec Train-accuracy=0.990350

INFO:root:Epoch[7] Resetting Data Iterator

INFO:root:Epoch[7] Time cost=4.102

INFO:root:Epoch[7] Validation-accuracy=0.986400

INFO:root:Epoch[8] Batch [200]  Speed: 14875.69 samples/sec Train-accuracy=0.990300

INFO:root:Epoch[8] Batch [400]  Speed: 14833.44 samples/sec Train-accuracy=0.990750

INFO:root:Epoch[8] Batch [600]  Speed: 14804.53 samples/sec Train-accuracy=0.992250

INFO:root:Epoch[8] Resetting Data Iterator

INFO:root:Epoch[8] Time cost=4.049

INFO:root:Epoch[8] Validation-accuracy=0.987200

INFO:root:Epoch[9] Batch [200]  Speed: 14864.23 samples/sec Train-accuracy=0.992000

INFO:root:Epoch[9] Batch [400]  Speed: 14699.46 samples/sec Train-accuracy=0.991650

INFO:root:Epoch[9] Batch [600]  Speed: 14853.07 samples/sec Train-accuracy=0.992800

INFO:root:Epoch[9] Resetting Data Iterator

INFO:root:Epoch[9] Time cost=4.058

INFO:root:Epoch[9] Validation-accuracy=0.987800

Note that with the same hyperparameters, LeNet reaches a validation accuracy of 98.78%, higher than the 96.41% of the multilayer perceptron.
