Recognizing Handwritten Numbers with a CNN (Machine Learning) - CSDN Blog

Article link: https://blog.csdn.net/wzwwangziwen/article/details/81983337

What Is a CNN (Convolutional Neural Network)?

A convolutional neural network (CNN) is a class of deep, feed-forward artificial neural networks, most commonly applied to analyzing visual imagery.
Convolutional networks were inspired by biological processes[4] in that the connectivity pattern between neurons resembles the organization of the animal visual cortex. Individual cortical neurons respond to stimuli only in a restricted region of the visual field known as the receptive field. The receptive fields of different neurons partially overlap such that they cover the entire visual field.

Design

We’ll walk through the process of building a convolutional neural network that recognizes handwritten numbers.

Dataset

The dataset we use for training and testing is the open MNIST database of handwritten digits (yann.lecun.com/exdb/mnist/), which contains 60,000 training images and 10,000 test images. Every image is a 28x28 grayscale picture of a single digit (0-9).

Download dataset

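TensorFlow 1.x already ships a helper package for loading MNIST to train machine learning models (tensorflow.examples.tutorials.mnist; it is not available in TensorFlow 2.x). All we need to do is import it and call read_data_sets, which downloads the files into the MNIST_data directory if they are not already there.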

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

Here one_hot should be True. One-hot encoding represents a categorical value as a vector with a single 1 at the position of its category and 0 everywhere else. For example, in a model that predicts house prices, a categorical feature such as the neighborhood of a house can be one-hot encoded. For digit recognition, one-hot vectors of length 10 express the ten digits:
(1,0,0,0,0,0,0,0,0,0) stands for the digit 0, (0,1,0,0,0,0,0,0,0,0) stands for the digit 1, and so on.
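A minimal pure-Python sketch of one-hot encoding for the ten digit classes. The helper names `one_hot` and `from_one_hot` are hypothetical and not part of TensorFlow; with `one_hot=True`, the MNIST loader already returns labels in this form.

```python
def one_hot(label, num_classes=10):
    """Encode an integer class label as a one-hot list (hypothetical helper)."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    vec = [0] * num_classes
    vec[label] = 1  # the position of the single 1 is the class index
    return vec


def from_one_hot(vec):
    """Recover the class index from a one-hot list."""
    return vec.index(1)


print(one_hot(0))                # [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(one_hot(7))                # [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
print(from_one_hot(one_hot(7)))  # 7
```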

Visualization

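After downloading the dataset, it is worth taking a quick look at the images. The short script below prints the size of each split (by default read_data_sets holds out 5,000 of the 60,000 training images as a validation set, leaving 55,000 for training) and plots four samples.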

from tensorflow.examples.tutorials.mnist import input_data
import matplotlib.pyplot as plt

mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

print('Training data size: ', mnist.train.num_examples)
print('Validation data size: ', mnist.validation.num_examples)
print('Test data size: ', mnist.test.num_examples)

img0 = mnist.validation.images[0].reshape(28,28)
img1 = mnist.test.images[100].reshape(28,28)
img2 = mnist.train.images[2].reshape(28,28)
img3 = mnist.train.images[3].reshape(28,28)

fig = plt.figure(figsize=(10,10))
ax0 = fig.add_subplot(221)
ax1 = fig.add_subplot(222)
ax2 = fig.add_subplot(223)
ax3 = fig.add_subplot(224)

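ax0.imshow(img0, cmap='gray')  # grayscale colormap instead of the default viridis
ax1.imshow(img1, cmap='gray')
ax2.imshow(img2, cmap='gray')
ax3.imshow(img3, cmap='gray')
plt.show()  # fig.show() may return immediately when run as a script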
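Each MNIST image from this loader is stored as a flat vector of 784 (= 28 x 28) pixel intensities scaled to [0, 1], which is why the script calls `reshape(28,28)` before plotting. The pure-Python sketch below, using hypothetical helpers `to_rows` and `ascii_render` and a synthetic image rather than real MNIST data, shows the same row-major reshape and prints a quick text preview, which can be handy when no display is available.

```python
def to_rows(flat, width=28):
    """Split a flat, row-major pixel list into rows (like reshape(28, 28))."""
    if len(flat) % width != 0:
        raise ValueError("length is not a multiple of width")
    return [flat[i:i + width] for i in range(0, len(flat), width)]


def ascii_render(flat, width=28, threshold=0.5):
    """Draw '#' for pixels brighter than threshold and '.' otherwise."""
    return "\n".join(
        "".join("#" if p > threshold else "." for p in row)
        for row in to_rows(flat, width)
    )


# Synthetic 28x28 "image": a vertical bar in columns 12-15 (not real MNIST data)
fake = [1.0 if 12 <= (i % 28) <= 15 else 0.0 for i in range(784)]
print(len(to_rows(fake)), len(to_rows(fake)[0]))  # 28 28
print(ascii_render(fake))
```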

Convolution

Convolution is a mathematical operation on two functions (f and g) to produce a third function that expresses how the shape of one is modified by the other.
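Why use convolution in our network? A color picture has three channels (RGB), while MNIST digits have a single grayscale channel, and in either case the raw pixels contain noise that can affect the model. A convolution slides small learned filters (kernels) over the image, so each output value depends only on a local patch of pixels. This lets the network extract local features such as edges and strokes, and because the same filter weights are reused at every position, it needs far fewer parameters than connecting every pixel to every neuron. Smoothing-like filters can also suppress some of the noise.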
For more information, see this blog: https://blog.csdn.net/chaipp0607/article/details/72236892?locationNum=9&fps=1
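To make the sliding-window idea concrete, here is a minimal pure-Python sketch of a 2D convolution with stride 1 and no padding (the function name `conv2d_valid` is hypothetical). Like `tf.nn.conv2d`, it does not flip the kernel, so strictly it computes a cross-correlation, which is the convention deep-learning libraries use. The 1x2 kernel `[-1, 1]` responds only where brightness increases from left to right, i.e. at a vertical edge.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (lists of lists) with stride 1 and no padding."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            # Weighted sum of the image patch currently under the kernel
            s = 0
            for i in range(kh):
                for j in range(kw):
                    s += image[r + i][c + j] * kernel[i][j]
            row.append(s)
        out.append(row)
    return out


# Left half dark (0), right half bright (1): a vertical edge between columns 1 and 2
image = [[0, 0, 1, 1] for _ in range(4)]
edge_kernel = [[-1, 1]]
print(conv2d_valid(image, edge_kernel))  # [[0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0]]
```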

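To avoid repeating code, we define two helper functions that initialize the weights and the biases. The weights start as small random values so that neurons do not all learn the same thing, and the biases start slightly positive so that ReLU units are active at the beginning of training.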

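import tensorflow as tf  # TensorFlow 1.x API

def weight_variable(shape):
  # Small random values (truncated normal, stddev 0.1) break symmetry between neurons
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  # A slightly positive bias helps keep ReLU units active at the start
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)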
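`tf.truncated_normal` draws from a normal distribution but discards and re-draws any value more than two standard deviations from the mean. The sketch below reproduces that rule with Python's `random` module; the function name is hypothetical, and this is a plain rejection sampler rather than TensorFlow's own implementation. With `stddev=0.1`, every initial weight lands in [-0.2, 0.2].

```python
import random


def truncated_normal(n, mean=0.0, stddev=0.1, seed=None):
    """Rejection-sample n values from N(mean, stddev^2), keeping |v - mean| <= 2*stddev."""
    rng = random.Random(seed)
    values = []
    while len(values) < n:
        v = rng.gauss(mean, stddev)
        if abs(v - mean) <= 2 * stddev:  # re-draw anything in the tails
            values.append(v)
    return values


weights = truncated_normal(5 * 5 * 32, seed=0)  # same count as a [5, 5, 1, 32] kernel
print(len(weights), min(weights) >= -0.2, max(weights) <= 0.2)  # 800 True True
```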

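The first convolutional layer uses 32 kernels of size 5x5. Its weight tensor has shape [5, 5, 1, 32]: kernel height, kernel width, number of input channels (1, because MNIST images are grayscale) and number of output feature maps. Before convolving, the flat 784-pixel input x is reshaped into a 4D tensor [batch, height, width, channels]; the -1 lets TensorFlow infer the batch size.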

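W_conv1 = weight_variable([5, 5, 1, 32])  # [height, width, in_channels, out_channels]
b_conv1 = bias_variable([32])              # one bias per output feature map

x = tf.placeholder(tf.float32, [None, 784])  # flattened 28x28 input images
x_image = tf.reshape(x, [-1,28,28,1])        # -> [batch, 28, 28, 1]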
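The shapes can be checked by hand. With `padding='SAME'` and stride 1 (as in the `conv2d` helper of the full code), a convolution keeps the 28x28 spatial size, so the first layer outputs 28x28x32. The sketch below computes output sizes using TensorFlow's documented rules for SAME padding, ceil(in / stride), and VALID padding, ceil((in - kernel + 1) / stride), plus the number of trainable parameters in a convolution layer; the helper names are hypothetical.

```python
import math


def conv_output_size(in_size, kernel, stride, padding):
    """Spatial output size of a conv/pool layer under TensorFlow's padding rules."""
    if padding == "SAME":
        return math.ceil(in_size / stride)
    if padding == "VALID":
        return math.ceil((in_size - kernel + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


def conv_params(kh, kw, in_ch, out_ch):
    """Weights (kh * kw * in_ch * out_ch) plus one bias per output channel."""
    return kh * kw * in_ch * out_ch + out_ch


print(conv_output_size(28, 5, 1, "SAME"))   # 28
print(conv_output_size(28, 5, 1, "VALID"))  # 24
print(conv_params(5, 5, 1, 32))             # 832
print(conv_params(5, 5, 32, 64))            # 51264
```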

Activation

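After the linear step (convolution plus bias), each neuron applies a non-linear activation function. Here we use ReLU (rectified linear unit), f(x) = max(0, x). conv2d is the helper defined in the full code below: a stride-1 convolution with SAME padding, so the output stays 28x28 with 32 channels.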

h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
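ReLU is simply max(0, x) applied element-wise: negative responses in a feature map become 0 and positive ones pass through unchanged. A pure-Python sketch (hypothetical helper names) of what `tf.nn.relu` does to a small feature map:

```python
def relu(x):
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)


def relu_map(feature_map):
    """Apply ReLU to every value of a 2D feature map (list of lists)."""
    return [[relu(v) for v in row] for row in feature_map]


print(relu_map([[-1.5, 0.0], [2.0, -0.1]]))  # [[0.0, 0.0], [2.0, 0.0]]
```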

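Max Pooling

Max pooling keeps only the strongest response in each local region. With a 2x2 window and stride 2 (the max_pool_2x2 helper in the full code), every 2x2 block of a feature map is replaced by its largest value. This halves the height and width, reduces later computation, and makes the features slightly more tolerant of small shifts. In this network the first pooling layer turns the 28x28 maps into 14x14, and the second, after a convolution with 64 kernels, turns them into 7x7.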
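A minimal pure-Python sketch of 2x2 max pooling with stride 2 (the name `max_pool_2x2_lists` is hypothetical, chosen to avoid confusion with the TensorFlow helper). For odd-sized maps it clips the border windows, which gives the same output size, ceil(n / 2), as TensorFlow's SAME padding.

```python
def max_pool_2x2_lists(feature_map):
    """2x2 max pooling with stride 2 on a list-of-lists feature map."""
    h, w = len(feature_map), len(feature_map[0])
    out = []
    for r in range(0, h, 2):
        row = []
        for c in range(0, w, 2):
            # Border windows of odd-sized maps are clipped to the available pixels
            window = [feature_map[i][j]
                      for i in range(r, min(r + 2, h))
                      for j in range(c, min(c + 2, w))]
            row.append(max(window))  # keep only the strongest response
        out.append(row)
    return out


fm = [[1, 3, 2, 0],
      [4, 2, 1, 5],
      [0, 1, 7, 6],
      [2, 3, 4, 8]]
print(max_pool_2x2_lists(fm))  # [[4, 5], [3, 8]]
```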

Fully-connected Layer

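After the two convolution-and-pooling layers, a fully connected layer follows. The second pooling layer outputs 64 feature maps of size 7x7, so we flatten them into a vector of 7*7*64 = 3136 values and connect every value to each of 1024 neurons.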

W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])

h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
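`tf.reshape(h_pool2, [-1, 7*7*64])` flattens each 7x7x64 block in row-major order (height, then width, then channel), and the layer then computes ReLU(xW + b). A small pure-Python sketch of both steps, with tiny made-up sizes and hypothetical helper names, followed by the parameter count of the real layer (3136 x 1024 weights plus 1024 biases):

```python
def flatten_hwc(feature_maps):
    """Flatten a [height][width][channels] nested list in row-major (h, w, c) order."""
    return [v for row in feature_maps for pixel in row for v in pixel]


def dense_relu(x, W, b):
    """Fully connected layer: out[j] = max(0, sum_i x[i] * W[i][j] + b[j])."""
    return [max(0.0, sum(x[i] * W[i][j] for i in range(len(x))) + b[j])
            for j in range(len(b))]


# Tiny example: a 1x2 map with 2 channels -> 4 inputs, 2 output neurons
maps = [[[1.0, 2.0], [3.0, 4.0]]]
x = flatten_hwc(maps)  # [1.0, 2.0, 3.0, 4.0]
W = [[1.0, -1.0], [0.0, -1.0], [0.0, -1.0], [0.0, -1.0]]
b = [0.5, 0.0]
print(dense_relu(x, W, b))  # [1.5, 0.0]

print(7 * 7 * 64, 7 * 7 * 64 * 1024 + 1024)  # 3136 3212288
```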

Accuracy

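With enough training (thousands of steps rather than a few hundred), a two-layer CNN like this one reaches roughly 99% test accuracy; the author reports 99.8%, which is more than enough for recognizing handwritten numbers.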
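Accuracy here is the fraction of images whose highest-scoring class matches the label; in the TensorFlow code this is `tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))` averaged over the batch. The same computation in pure Python, on made-up scores and with hypothetical helper names:

```python
def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(predictions, one_hot_labels):
    """Fraction of rows where argmax(prediction) == argmax(label)."""
    correct = sum(argmax(p) == argmax(y) for p, y in zip(predictions, one_hot_labels))
    return correct / len(one_hot_labels)


preds = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1], [0.3, 0.3, 0.4]]  # made-up class scores
labels = [[0, 1, 0], [0, 0, 1], [0, 0, 1]]
print(accuracy(preds, labels))  # 0.6666666666666666
```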

Code

import os 
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

import tensorflow as tf
sess = tf.InteractiveSession()

def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)

def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')


# Create the model
# placeholders: flattened 28x28 images and one-hot labels
x = tf.placeholder("float", [None, 784])
y_ = tf.placeholder("float", [None, 10])

# first convolutional layer: 32 5x5 kernels, then 2x2 max pooling (28x28 -> 14x14)
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])


x_image = tf.reshape(x, [-1,28,28,1])

h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

# second convolutional layer: 64 5x5 kernels, then 2x2 max pooling (14x14 -> 7x7)
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])

h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

# fully connected layer: flatten the 7x7x64 feature maps into 1024 ReLU units

W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])

h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)



# dropout: keep_prob is 0.5 during training and 1.0 for evaluation

keep_prob = tf.placeholder("float")
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)


# output layer (softmax over the 10 digit classes)

W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])

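# Raw scores (logits); softmax is applied inside the loss for numerical stability
y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2

# tf.log(softmax) can hit log(0) and produce NaN, so compute cross-entropy from logits
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
# argmax of the logits equals argmax of the softmax probabilities
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
# tf.initialize_all_variables() is deprecated
sess.run(tf.global_variables_initializer())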
for i in range(20000):  # 20,000 steps of batch size 50; lower for a quick run (accuracy drops)
  batch = mnist.train.next_batch(50)
  if i%100 == 0:
    # evaluate on the current batch with dropout disabled
    train_accuracy = accuracy.eval(feed_dict={
        x:batch[0], y_: batch[1], keep_prob: 1.0})
    print ("step %d, training accuracy %f"%(i, train_accuracy))
  train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

print ("test accuracy %f"%accuracy.eval(feed_dict={
    x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))
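The loss in this listing is softmax followed by cross-entropy. Taking the log of softmax probabilities can produce log(0) = -inf (and then NaN gradients) once a probability underflows to zero, which is why the loss is usually computed from the raw logits with the log-sum-exp trick; TensorFlow's `tf.nn.softmax_cross_entropy_with_logits` expects unscaled logits for this kind of reason. A pure-Python sketch of the stable computation (function names hypothetical):

```python
import math


def log_softmax(logits):
    """log(softmax(z)) computed stably: z_i - max(z) - log(sum_j exp(z_j - max(z)))."""
    m = max(logits)
    log_sum = math.log(sum(math.exp(z - m) for z in logits))
    return [z - m - log_sum for z in logits]


def softmax(logits):
    """Probabilities that sum to 1, derived from the stable log_softmax."""
    return [math.exp(v) for v in log_softmax(logits)]


def cross_entropy(one_hot_label, logits):
    """-sum_i y_i * log(p_i), using log_softmax instead of log(softmax)."""
    return -sum(y * lp for y, lp in zip(one_hot_label, log_softmax(logits)))


logits = [2.0, 1.0, 0.1]
print(sum(softmax(logits)))              # approximately 1.0
print(cross_entropy([1, 0, 0], logits))  # about 0.417

# Extreme logits: the naive form would need log(0.0) for the second class
print(cross_entropy([0, 1], [1000.0, 0.0]))  # 1000.0
```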