Recognizing MNIST Handwritten Digits

Download MNIST and Load the Data

MNIST can be downloaded from http://yann.lecun.com/exdb/mnist/.

After downloading the dataset, decompress each file, e.g. gunzip ‘****.gz’ (each download is a single gzip-compressed file, not a tar archive).
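If you would rather decompress the files from Python, the standard-library gzip module is enough. This is a minimal sketch, assuming each download is a single gzip-compressed file named `<name>.gz`; the helper name `gunzip_file` is hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_file(gz_path):
    """Decompress `<name>.gz` into `<name>` in the same directory and return the new path.

    Hypothetical helper for illustration; it assumes a plain gzip file, not a tar archive.
    """
    gz_path = Path(gz_path)
    out_path = gz_path.with_suffix("")  # drop only the trailing ".gz"
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream the data instead of reading it all at once
    return out_path
```

For example, calling `gunzip_file(p)` for every `p` in `Path('.').glob('*.gz')` would decompress all four downloads in the current directory.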

This gives two datasets (training and test) made up of the four files used in the following steps:

t10k-images.idx3-ubyte, t10k-labels.idx1-ubyte

train-images.idx3-ubyte, train-labels.idx1-ubyte

Two implementations are provided to extract the data.
The first is written in MATLAB, the second in Python.

function images = loadMNISTImages(filename)
%loadMNISTImages returns a 784x[number of MNIST images] matrix (one 28x28
%image per column) containing the raw MNIST images, rescaled to [0,1]

fp = fopen(filename, 'rb');
assert(fp ~= -1, ['Could not open ', filename, '']);

magic = fread(fp, 1, 'int32', 0, 'ieee-be');
assert(magic == 2051, ['Bad magic number in ', filename, '']);

numImages = fread(fp, 1, 'int32', 0, 'ieee-be');
numRows = fread(fp, 1, 'int32', 0, 'ieee-be');
numCols = fread(fp, 1, 'int32', 0, 'ieee-be');

images = fread(fp, inf, 'unsigned char');
images = reshape(images, numCols, numRows, numImages);
images = permute(images,[2 1 3]);

fclose(fp);

% Reshape to #pixels x #examples
images = reshape(images, size(images, 1) * size(images, 2), size(images, 3));
% Convert to double and rescale to [0,1]
images = double(images) / 255;

end

function labels = loadMNISTLabels(filename)
%loadMNISTLabels returns a [number of MNIST images]x1 matrix containing
%the labels for the MNIST images

fp = fopen(filename, 'rb');
assert(fp ~= -1, ['Could not open ', filename, '']);

magic = fread(fp, 1, 'int32', 0, 'ieee-be');
assert(magic == 2049, ['Bad magic number in ', filename, '']);

numLabels = fread(fp, 1, 'int32', 0, 'ieee-be');

labels = fread(fp, inf, 'unsigned char');

assert(size(labels,1) == numLabels, 'Mismatch in label count');

fclose(fp);

end

The MATLAB code above is provided by Prof. Andrew Ng.
The following Python code extracts the same data.
First, import the libraries we need:

# import libs we need
import numpy as np
import struct
import matplotlib.pyplot as plt

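LeCun's page describes in detail how the images and labels are stored: each file starts with a header of big-endian 32-bit integers (a magic number, then the size of each dimension), followed by the pixel or label values as unsigned bytes.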

(Figure: description of the MNIST data files)
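To make the format concrete, here is a minimal sketch that parses such a header with struct and reads one pixel directly from the raw bytes. It assumes the IDX layout described on LeCun's page: the third byte of the magic number is the data type (0x08 = unsigned byte) and the fourth is the number of dimensions, so 2051 marks a 3-D image file and 2049 a 1-D label file. The helper names `parse_idx_header` and `pixel` are hypothetical, and the test data is synthetic rather than real MNIST.

```python
import struct


def parse_idx_header(data):
    """Return (dims, header_size) for an IDX byte string of unsigned bytes."""
    (magic,) = struct.unpack_from(">I", data, 0)
    dtype_code = (magic >> 8) & 0xFF  # 0x08 means unsigned byte
    ndim = magic & 0xFF               # 3 for image files, 1 for label files
    if dtype_code != 0x08:
        raise ValueError("expected unsigned-byte data, got type code 0x%02x" % dtype_code)
    # One big-endian 32-bit size per dimension follows the magic number.
    dims = struct.unpack_from(">%dI" % ndim, data, 4)
    return dims, 4 + 4 * ndim


def pixel(data, image_idx, row, col):
    """Read one pixel (0..255) from an image-file byte string.

    Pixels are stored row by row, one byte each, right after the header.
    """
    (_, rows, cols), header = parse_idx_header(data)
    return data[header + image_idx * rows * cols + row * cols + col]


# Synthetic example: 2 images of 2x3 pixels with values 0..11.
images = struct.pack(">IIII", 2051, 2, 2, 3) + bytes(range(12))
print(parse_idx_header(images))  # ((2, 2, 3), 16)
print(pixel(images, 1, 1, 2))    # 11
```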

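# load data

def loadimg(imgfilename):
    """Read an IDX image file and save it as '<prefix>-image-py.npy' (pixels scaled to [0, 1])."""
    with open(imgfilename, 'rb') as imgfile:
        datastr = imgfile.read()

    index = 0
    # header: magic number, number of images, rows, columns (big-endian 32-bit ints)
    mgc_num, img_num, row_num, col_num = struct.unpack_from('>IIII', datastr, index)
    assert mgc_num == 2051, 'Bad magic number in ' + imgfilename
    index += struct.calcsize('>IIII')

    fmt = '>{}B'.format(row_num * col_num)   # 28*28 = 784 unsigned bytes per image
    image_array = np.zeros((img_num, row_num, col_num))
    for img_idx in range(img_num):
        img = struct.unpack_from(fmt, datastr, index)
        index += struct.calcsize(fmt)
        image_array[img_idx, :, :] = np.reshape(np.array(img), (row_num, col_num))
    image_array = image_array / 255.0
    # 'train-images.idx3-ubyte' -> 'train-image-py.npy', 't10k-images.idx3-ubyte' -> 't10k-image-py.npy'
    np.save(imgfilename.split('-')[0] + '-image-py', image_array)
    return None

def loadlabel(labelfilename):
    """Read an IDX label file and save it as '<prefix>-label-py.npy'."""
    with open(labelfilename, 'rb') as labelfile:
        datastr = labelfile.read()

    index = 0
    # header: magic number, number of labels (big-endian 32-bit ints)
    mgc_num, label_num = struct.unpack_from('>II', datastr, index)
    assert mgc_num == 2049, 'Bad magic number in ' + labelfilename
    index += struct.calcsize('>II')

    label = struct.unpack_from('>{}B'.format(label_num), datastr, index)
    label_array = np.array(label)

    # 'train-labels.idx1-ubyte' -> 'train-label-py.npy', 't10k-labels.idx1-ubyte' -> 't10k-label-py.npy'
    np.save(labelfilename.split('-')[0] + '-label-py', label_array)
    return None

The two functions above read the raw files and save the arrays in NumPy's .npy format. Run them once on the four files:

loadimg('train-images.idx3-ubyte')
loadimg('t10k-images.idx3-ubyte')
loadlabel('train-labels.idx1-ubyte')
loadlabel('t10k-labels.idx1-ubyte')

After that, the data can be loaded directly with np.load:

train_image = np.load('train-image-py.npy')
train_label = np.load('train-label-py.npy')
test_image = np.load('t10k-image-py.npy')
test_label = np.load('t10k-label-py.npy')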

What are the dimensions of our arrays?

print(train_image.shape)
print(train_label.shape)
print(test_image.shape)
print(test_label.shape)
(60000, 28, 28)
(60000,)
(10000, 28, 28)
(10000,)

To check that the data was saved correctly, load one of the images and display it with matplotlib; it should show a handwritten digit that matches its label.

# check data
%matplotlib inline
im = train_image[9,:,:]
im = 255*im
plt.imshow(im, cmap='gray')
plt.show()

(Figure: training image 9, a handwritten 4)

print(train_label[9])
4
im = test_image[17,:,:]
im = 255*im
plt.imshow(im, cmap='gray')
plt.show()
print(test_label[17])

(Figure: test image 17, a handwritten 7)

7

OK. Have we finished the data-preparation stage?

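# This post uses the TensorFlow 1.x graph API (tf.placeholder, tf.Session).
# On TensorFlow 2.x, use: import tensorflow.compat.v1 as tf; tf.disable_v2_behavior()
import tensorflow as tf
from functools import reduce   # used in model() to flatten the last conv output
image_size = 28
num_labels = 10
num_channels = 1 # gray scale

def reformat(data, labels):
    # images: (N, 28, 28) -> (N, 28, 28, 1) float32, i.e. (batch, height, width, channels)
    dataset = data.reshape((-1, image_size, image_size, num_channels)).astype(np.float32)
    # labels: (N,) integers -> (N, 10) One-Hot float32 rows, via broadcasting
    labels = (np.arange(num_labels) == labels[:, None]).astype(np.float32)
    return dataset, labels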

Not yet.
To train the network, the images must be reshaped into a 4-D array of shape (N, 28, 28, 1), the (batch, height, width, channels) layout that tf.nn.conv2d expects, and each label must be converted to a One-Hot vector of length 10. The reformat helper defined above does both; it is written for this post, not a function provided by TensorFlow.
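In reformat, np.arange(num_labels) has shape (10,) and labels[:, None] has shape (N, 1), so the comparison broadcasts to an (N, 10) boolean matrix that is True exactly where the column index equals the label. A pure-Python sketch of the same mapping, for illustration only (the name one_hot is hypothetical):

```python
def one_hot(labels, num_labels=10):
    """Pure-Python equivalent of (np.arange(num_labels) == labels[:, None]).astype(np.float32).

    Row i contains 1.0 in column labels[i] and 0.0 everywhere else.
    """
    return [[1.0 if k == label else 0.0 for k in range(num_labels)] for label in labels]


print(one_hot([4, 7])[0])  # [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```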

If you are unfamiliar with convolutional networks or One-Hot encoding, you can find detailed explanations in my earlier posts: Convolution Networks, One-Hot Encoding.

train_dataset, train_labels = reformat(train_image, train_label)
test_dataset, test_labels = reformat(test_image, test_label)
print('train_dataset size: ', train_dataset.shape)
print('train_labels size: ', train_labels.shape)
print('test_dataset size: ', test_dataset.shape)
print('test_labels size: ', test_labels.shape)
('train_dataset size: ', (60000, 28, 28, 1))
('train_labels size: ', (60000, 10))
('test_dataset size: ', (10000, 28, 28, 1))
('test_labels size: ', (10000, 10))

At this point the preparation is finished: train_dataset, train_labels, test_dataset and test_labels are all in the right format.

accuracy = lambda pred, labels: (100.0 * np.sum(np.argmax(pred,1) == np.argmax(labels,1))/pred.shape[0] )

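The accuracy function returns the percentage of samples whose predicted class (the argmax of each row of pred) equals the true class (the argmax of the corresponding One-Hot label).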
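The same computation can be spelled out without NumPy. This sketch (the name accuracy_py is hypothetical) takes the first index of the maximum in each row, which matches np.argmax's tie-breaking, and counts how often prediction and label agree:

```python
def accuracy_py(pred, labels):
    """Percentage of rows whose argmax in pred equals the argmax in the One-Hot labels."""
    def argmax(row):
        # max() with a key returns the first index holding the largest value
        return max(range(len(row)), key=row.__getitem__)

    hits = sum(argmax(p) == argmax(l) for p, l in zip(pred, labels))
    return 100.0 * hits / len(pred)


pred = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1], [0.2, 0.2, 0.6], [0.3, 0.4, 0.3]]
labels = [[0, 1, 0], [0, 0, 1], [0, 0, 1], [0, 1, 0]]
print(accuracy_py(pred, labels))  # 75.0 (3 of 4 rows correct)
```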

Training and Testing

The architecture of our convolutional network is based on LeNet-5, shown in the following figure.
(Figure: network architecture)

We define the architecture in the function model.

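Because our images are already 28x28, we do not need LeNet-5's first 5x5 convolution, which shrinks the original 32x32 input to 28x28. Instead, the first layer is a 1x1 convolution with SAME padding that keeps the 28x28 size and only maps the single gray-scale channel to 6 feature maps.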
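The feature-map sizes in the model can be checked by hand. A minimal sketch, assuming TensorFlow's output-size rules for conv/pool ops (SAME: ceil(n / stride); VALID: ceil((n - k + 1) / stride)); the helper names and the LAYERS table are hypothetical and simply mirror the layers used in model:

```python
import math


def out_size(n, k, stride, padding):
    """Spatial output size along one axis for a conv/pool op with kernel k."""
    if padding == "SAME":
        return math.ceil(n / stride)
    if padding == "VALID":
        return math.ceil((n - k + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


# (layer, kernel size, stride, padding) in the order used by model()
LAYERS = [
    ("conv1 1x1", 1, 1, "SAME"),
    ("avg_pool 2x2", 2, 2, "SAME"),
    ("conv2 5x5", 5, 1, "VALID"),
    ("avg_pool 2x2", 2, 2, "SAME"),
    ("conv3 5x5", 5, 1, "VALID"),
]


def trace_shapes(n=28):
    """Height (= width) of the feature maps after each layer, starting from the input."""
    sizes = [n]
    for _, k, stride, padding in LAYERS:
        n = out_size(n, k, stride, padding)
        sizes.append(n)
    return sizes


print(trace_shapes(28))              # [28, 28, 14, 10, 5, 1]
print(out_size(32, 5, 1, "VALID"))   # 28: LeNet-5's original first convolution
```

The last size of 1 is why the output of filter3 can be flattened straight into a 120-unit vector.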

Training uses the Adam optimizer, an adaptive variant of gradient descent, with a learning rate of 0.001:

tf.train.AdamOptimizer(learning_rate=0.001).minimize(loss)
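To show what Adam does on each step, here is a sketch of the textbook Adam update on a single scalar parameter. The defaults beta1=0.9, beta2=0.999 and eps=1e-8 match tf.train.AdamOptimizer's defaults; TensorFlow folds the bias correction into the learning rate, which differs only in where eps enters. The function adam_minimize and the toy objective are hypothetical, and the toy uses a larger learning rate (0.1) than the post so it converges in a few thousand steps.

```python
import math


def adam_minimize(grad, x0, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Minimize a 1-D function with the Adam update rule, given its gradient function."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction for the zero init
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Toy objective f(x) = (x - 3)^2 with gradient 2 * (x - 3); the result ends up close to 3.
print(adam_minimize(lambda x: 2 * (x - 3), x0=0.0, lr=0.1, steps=2000))
```

Because the step is scaled by m_hat / sqrt(v_hat), the first step has a size of roughly lr regardless of the gradient's magnitude, which is one reason Adam is less sensitive to the scale of the loss than plain gradient descent.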

batch_size = 128

num_steps = 4501
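The training loop picks each mini-batch with offset = (step * batch_size) % (train_labels.shape[0] - batch_size). This sketch reproduces that formula in plain Python (the helper batch_offsets is hypothetical) to show that every slice stays inside the 60000 training images and that 4501 steps of 128 images cover the training set about 9.6 times:

```python
def batch_offsets(num_steps, batch_size, num_train):
    """Start index of each mini-batch, using the same formula as the training loop."""
    return [(step * batch_size) % (num_train - batch_size) for step in range(num_steps)]


offsets = batch_offsets(num_steps=4501, batch_size=128, num_train=60000)
print(max(offsets) + 128 <= 60000)  # True: no batch runs past the end of the data
print(4501 * 128 / 60000)           # about 9.6 passes over the training set
```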

graph = tf.Graph()
with graph.as_default():
    # Input data (num_channels = 1: gray scale).
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_size, image_size, num_channels))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_test_dataset = tf.constant(test_dataset)

    # Variables.
    filter1 = tf.Variable(tf.truncated_normal([1, 1, 1, 6], stddev=0.1))
    biases1 = tf.Variable(tf.zeros([6]))

    filter2 = tf.Variable(tf.truncated_normal([5, 5, 6, 16], stddev=0.1))
    biases2 = tf.Variable(tf.constant(1.0, shape=[16]))

    filter3 = tf.Variable(tf.truncated_normal([5, 5, 16, 120], stddev=0.1))
    biases3 = tf.Variable(tf.constant(1.0, shape=[120]))

    weights1 = tf.Variable(tf.truncated_normal([120, 84], stddev=0.1))
    w_biases1 = tf.Variable(tf.zeros([84]))
    weights2 = tf.Variable(tf.truncated_normal([84, 10], stddev=0.1))
    w_biases2 = tf.Variable(tf.zeros([10]))

    def model(data, train=False):
        # data: (batch, 28, 28, 1); filter1: (1, 1, 1, 6)
        # Internally the filter is reshaped to (1*1*1, 6); SAME padding keeps 28x28.
        conv = tf.nn.conv2d(data, filter1, [1, 1, 1, 1], padding='SAME')
        conv = tf.nn.tanh(conv + biases1)
        # conv: (batch, 28, 28, 6)
        # Sub-sampling: 2x2 average pooling with stride 2.
        conv = tf.nn.avg_pool(conv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
        # conv: (batch, 14, 14, 6); filter2: (5, 5, 6, 16)
        # Each 5x5x6 patch becomes a row of length 5*5*6; filter2 is reshaped to (5*5*6, 16).
        conv = tf.nn.conv2d(conv, filter2, [1, 1, 1, 1], padding='VALID')
        conv = tf.nn.tanh(conv + biases2)
        # conv: (batch, 10, 10, 16)
        conv = tf.nn.avg_pool(conv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
        # conv: (batch, 5, 5, 16); filter3: (5, 5, 16, 120)
        # Each 5x5x16 patch becomes a row of length 5*5*16; filter3 is reshaped to (5*5*16, 120).
        conv = tf.nn.conv2d(conv, filter3, [1, 1, 1, 1], padding='VALID')
        conv = tf.nn.tanh(conv + biases3)
        # conv: (batch, 1, 1, 120), flattened to (batch, 120)
        shape = conv.get_shape().as_list()
        reshape = tf.reshape(conv, (shape[0], reduce(lambda a, b: a * b, shape[1:])))
        hidden = tf.nn.relu(tf.matmul(reshape, weights1) + w_biases1)
        if train:
            # Dropout (keep probability 0.8) only during training, not for predictions.
            hidden = tf.nn.dropout(hidden, 0.8)
        logits = tf.matmul(hidden, weights2) + w_biases2
        return logits

    # Training computation.
    logits = model(tf_train_dataset, train=True)
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits))

    # Optimizer.
    optimizer = tf.train.AdamOptimizer(learning_rate=0.001).minimize(loss)

    # Predictions for the training and test data.
    train_prediction = tf.nn.softmax(logits)
    test_prediction = tf.nn.softmax(model(tf_test_dataset))
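The size of this network follows directly from the shapes of its variables. The sketch below counts the trainable parameters from the shapes used in the graph (the dictionary and the helper count_params are hypothetical); roughly 78% of the total sits in filter3 and biases3:

```python
# Shapes of the trainable variables defined in the graph.
PARAM_SHAPES = {
    "filter1": [1, 1, 1, 6], "biases1": [6],
    "filter2": [5, 5, 6, 16], "biases2": [16],
    "filter3": [5, 5, 16, 120], "biases3": [120],
    "weights1": [120, 84], "w_biases1": [84],
    "weights2": [84, 10], "w_biases2": [10],
}


def count_params(shapes):
    """Total number of scalar parameters: the product of each shape, summed."""
    total = 0
    for shape in shapes.values():
        n = 1
        for dim in shape:
            n *= dim
        total += n
    return total


print(count_params(PARAM_SHAPES))  # 61562
```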



with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()   # initialize all variables
    print('Initialized')
    for step in range(num_steps):
        # Cycle through the training set in mini-batches of batch_size images.
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        feed_dict = {tf_train_dataset: batch_data, tf_train_labels: batch_labels}
        _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
        if step % 500 == 0:
            print('Minibatch loss at step %d: %f' % (step, l))
            print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
    print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))
Initialized
Minibatch loss at step 0: 2.441036
Minibatch accuracy: 10.9%
Minibatch loss at step 500: 0.214182
Minibatch accuracy: 92.2%
Minibatch loss at step 1000: 0.086537
Minibatch accuracy: 97.7%
Minibatch loss at step 1500: 0.107810
Minibatch accuracy: 96.9%
Minibatch loss at step 2000: 0.088142
Minibatch accuracy: 97.7%
Minibatch loss at step 2500: 0.150886
Minibatch accuracy: 96.1%
Minibatch loss at step 3000: 0.088806
Minibatch accuracy: 98.4%
Minibatch loss at step 3500: 0.039191
Minibatch accuracy: 97.7%
Minibatch loss at step 4000: 0.018480
Minibatch accuracy: 99.2%
Minibatch loss at step 4500: 0.010719
Minibatch accuracy: 100.0%
Test accuracy: 98.2%

The test accuracy is about 98%, which still leaves room for improvement. Several aspects of the model could be improved:
1. subsampling type: max pooling, average pooling, etc.
2. activation function: ReLU, tanh, etc.
3. initial values of the weights and biases.
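On the first point: the model currently sub-samples with tf.nn.avg_pool, and tf.nn.max_pool accepts the same ksize, strides and padding arguments, so it is easy to swap in. As a small illustration of how the two differ, this pure-Python sketch applies 2x2, stride-2 pooling to a toy 4x4 feature map (the helpers pool2x2 and mean and the input values are hypothetical):

```python
def pool2x2(image, reduce_fn):
    """2x2 pooling with stride 2 over a list-of-lists image with even height and width."""
    return [
        [reduce_fn([image[r][c], image[r][c + 1], image[r + 1][c], image[r + 1][c + 1]])
         for c in range(0, len(image[0]), 2)]
        for r in range(0, len(image), 2)
    ]


def mean(values):
    return sum(values) / len(values)


feature_map = [[1, 2, 5, 6],
               [3, 4, 7, 8],
               [0, 0, 1, 1],
               [0, 4, 1, 3]]

print(pool2x2(feature_map, max))   # [[4, 8], [4, 3]]
print(pool2x2(feature_map, mean))  # [[2.5, 6.5], [1.0, 1.5]]
```

Max pooling keeps the strongest response in each window (the isolated 4 in the bottom-left block survives), while average pooling dilutes it to 1.0; which works better for this network is something to test empirically.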
