A popular demonstration of the capability of deep learning techniques is object recognition in image data. The hello world of object recognition for machine learning and deep learning is the MNIST dataset for handwritten digit recognition. In this project you will discover how to develop a deep learning model to achieve near state-of-the-art performance on the MNIST handwritten digit recognition task in Python using the Keras deep learning library. After completing this step-by-step tutorial, you will know:
- How to load the MNIST dataset in Keras and develop a baseline neural network model for the problem.
- How to implement and evaluate a simple Convolutional Neural Network for MNIST.
- How to implement a close to state-of-the-art deep learning model for MNIST.
Note: You may want to speed up the computation for this tutorial by using GPU rather than CPU hardware, such as the process described in Chapter 5. This is a suggestion, not a requirement. The tutorial will work just fine on the CPU.
1.1 Handwritten Digit Recognition Dataset
MNIST is a dataset developed by Yann LeCun, Corinna Cortes and Christopher Burges for evaluating machine learning models on the handwritten digit classification problem. The dataset was constructed from a number of scanned document datasets available from the National Institute of Standards and Technology (NIST). This is where the name for the dataset comes from: the Modified NIST or MNIST dataset.
Images of digits were taken from a variety of scanned documents, normalized in size and centered. This makes it an excellent dataset for evaluating models, allowing the developer to focus on the machine learning with very little data cleaning or preparation required. Each image is a 28 × 28 pixel square (784 pixels total). A standard split of the dataset is used to evaluate and compare models, where 60,000 images are used to train a model and a separate set of 10,000 images are used to test it.
It is a digit recognition task. As such there are 10 digits (0 to 9), or 10 classes, to predict. Results are reported using prediction error, which is simply 100% minus the classification accuracy. Excellent results achieve a prediction error of less than 1%. State-of-the-art prediction error of approximately 0.2% can be achieved with large Convolutional Neural Networks. There is a listing of the state-of-the-art results, with links to the relevant papers, for MNIST and other datasets on Rodrigo Benenson's webpage.
1.2 Loading the MNIST dataset in Keras
The Keras deep learning library provides a convenience function for loading the MNIST dataset. The dataset is downloaded automatically the first time this function is called and is stored in your home directory under ~/.keras/datasets/ (named mnist.npz in recent versions of Keras, a download of roughly 11 megabytes). This is very handy for developing and testing deep learning models. To demonstrate how easy it is to load the MNIST dataset, we will first write a little script to download and visualize the first 4 images in the training dataset.
# Plot ad hoc mnist instances
from keras.datasets import mnist
import matplotlib.pyplot as plt
# load (download if needed) the MNIST dataset
(X_train, y_train),(X_test, y_test) = mnist.load_data()
# plot 4 images as gray scale
plt.subplot(221)
plt.imshow(X_train[0], cmap=plt.get_cmap('gray'))
plt.subplot(222)
plt.imshow(X_train[1], cmap=plt.get_cmap('gray'))
plt.subplot(223)
plt.imshow(X_train[2], cmap=plt.get_cmap('gray'))
plt.subplot(224)
plt.imshow(X_train[3], cmap=plt.get_cmap('gray'))
#show the plot
plt.show()

You can see that downloading and loading the MNIST dataset is as easy as calling the mnist.load_data() function. Running the above example, you should see a plot of the first four digits rendered in grayscale.
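It can also be useful to confirm the shape of the arrays returned by the loader. A quick check (the shapes reflect the standard 60,000/10,000 train/test split of 28 × 28 images):

# Summarize the shape of the loaded arrays
print(X_train.shape) # (60000, 28, 28)
print(y_train.shape) # (60000,)
print(X_test.shape) # (10000, 28, 28)
print(y_test.shape) # (10000,)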

1.3 Baseline Model with Multilayer Perceptrons
Do we really need a complex model like a convolutional neural network to get the best results with MNIST? You can get good results using a very simple neural network model with a single hidden layer. In this section we will create a simple Multilayer Perceptron model that achieves an error rate of about 1.92%. We will use this as a baseline for comparison to more complex convolutional neural network models. Let's start off by importing the classes and functions we will need.
# Import Classes and Functions
import numpy as np
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.utils import np_utils

It is always a good idea to initialize the random number generator to a constant to ensure that the results of your script are reproducible.
# Initialize The Random Number Generator
# fix random seed for reproducibility
seed = 7
np.random.seed(seed)

Now we can load the MNIST dataset using the Keras helper function.
# Load the MNIST Dataset
# load data
(X_train,y_train),(X_test, y_test) = mnist.load_data()

The training dataset is structured as a 3-dimensional array of instance, image width and image height. For a Multilayer Perceptron model we must reduce the images down into a vector of pixels. In this case the 28 × 28 sized images will be 784 pixel input vectors. We can do this transform easily using the reshape() function on the NumPy array. The pixel values are integers, so we cast them to floating point values so that we can normalize them easily in the next step.
# Prepare MNIST Dataset For Modeling
# flatten 28*28 images to a 784 vector for each image
num_pixels = X_train.shape[1] * X_train.shape[2]
X_train = X_train.reshape(X_train.shape[0], num_pixels).astype('float32')
X_test = X_test.reshape(X_test.shape[0], num_pixels).astype('float32')

The pixel values are gray scale between 0 and 255. It is almost always a good idea to perform some scaling of input values when using neural network models. Because the scale is well known and well behaved, we can very quickly normalize the pixel values to the range 0 to 1 by dividing each value by the maximum of 255.
# normalize inputs from 0-255 to 0-1
X_train = X_train / 255
X_test = X_test / 255

Finally, the output variable is an integer from 0 to 9. This is a multiclass classification problem. As such, it is good practice to use a one hot encoding of the class values, transforming the vector of class integers into a binary matrix. We can easily do this using the built-in np_utils.to_categorical() helper function in Keras.
# One hot Encode The Output Variable
# one hot encode outputs
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
num_classes = y_test.shape[1]
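For example, a class label of 5 becomes a vector of 10 elements with a 1 at index 5 and 0s elsewhere. You can confirm this with a quick check (the first training label in MNIST happens to be a 5):

# Inspect the one hot encoding of the first training label
print(y_train[0]) # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]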

We are now ready to create our simple neural network model. We will define our model in a function. This is handy if you want to extend the example later and try to get a better score.
# Define and Compile the Baseline Model
# define baseline model
def baseline_model():
    # create model
    model = Sequential()
    model.add(Dense(num_pixels, input_dim=num_pixels, kernel_initializer='normal', activation='relu'))
    model.add(Dense(num_classes, kernel_initializer='normal', activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

The model is a simple neural network with one hidden layer with the same number of neurons as there are inputs (784). A rectifier activation function is used for the neurons in the hidden layer. A softmax activation function is used on the output layer to turn the outputs into probability-like values and to allow one of the 10 classes to be selected as the model's output prediction. Logarithmic loss is used as the loss function (called categorical_crossentropy in Keras) and the efficient Adam gradient descent algorithm is used to learn the weights.
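If you want to inspect the structure yourself, you can build an instance of the model and call Keras's built-in model.summary() method, which prints each layer with its output shape and parameter count. With 784 inputs, the hidden layer alone accounts for 784 × 784 + 784 = 615,440 weights:

# Summarize the Baseline Model
model = baseline_model()
model.summary()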

We can now fit and evaluate the model. The model is fit over 10 epochs with updates every 200 images. The test data is used as the validation dataset, allowing you to see the skill of the model as it trains. A verbose value of 2 is used to reduce the output to one line for each training epoch. Finally, the test dataset is used to evaluate the model and a classification error rate is printed.
# Evaluate the Baseline Model
# build the model
model = baseline_model()
# Fit the model
model.fit(X_train,y_train,validation_data=(X_test, y_test), epochs=10, batch_size=200,verbose=2)
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("Baseline Error: %.2f%%" % (100-scores[1]*100))

The full code listing is provided below for completeness.
# Baseline MLP for MNIST dataset
import numpy as np
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.utils import np_utils
# fix random seed for reproducibility
seed = 7
np.random.seed(seed)
# load data
(X_train, y_train),(X_test, y_test) = mnist.load_data()
# flatten 28*28 images to a 784 vector for each image
num_pixels = X_train.shape[1] * X_train.shape[2]
X_train = X_train.reshape(X_train.shape[0],num_pixels).astype('float32')
X_test = X_test.reshape(X_test.shape[0],num_pixels).astype('float32')
# normalize inputs from 0-255 to 0-1
X_train = X_train / 255
X_test = X_test / 255
# one hot encode outputs
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
num_classes = y_test.shape[1]
# define baseline model
def baseline_model():
    # create model
    model = Sequential()
    model.add(Dense(num_pixels, input_dim=num_pixels, kernel_initializer='normal', activation='relu'))
    model.add(Dense(num_classes, kernel_initializer='normal', activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
# build the model
model = baseline_model()
# Fit the model
model.fit(X_train,y_train,validation_data=(X_test,y_test),epochs=10,batch_size=200,verbose=2)
# Final evaluation of the model
scores = model.evaluate(X_test,y_test, verbose=0)
print("Baseline Error: %.2f%%" % (100-scores[1]*100))

Running the example might take a few minutes on a CPU. You should see the output below. This simple network, defined in very few lines of code, achieves a respectable error rate of 1.92%.
Epoch 1/10
300/300 - 2s - loss: 0.2769 - accuracy: 0.9209 - val_loss: 0.1445 - val_accuracy: 0.9579 - 2s/epoch - 8ms/step
Epoch 2/10
300/300 - 2s - loss: 0.1110 - accuracy: 0.9681 - val_loss: 0.1003 - val_accuracy: 0.9698 - 2s/epoch - 6ms/step
Epoch 3/10
300/300 - 2s - loss: 0.0702 - accuracy: 0.9797 - val_loss: 0.0768 - val_accuracy: 0.9772 - 2s/epoch - 6ms/step
Epoch 4/10
300/300 - 2s - loss: 0.0493 - accuracy: 0.9857 - val_loss: 0.0760 - val_accuracy: 0.9761 - 2s/epoch - 6ms/step
Epoch 5/10
300/300 - 2s - loss: 0.0353 - accuracy: 0.9903 - val_loss: 0.0671 - val_accuracy: 0.9791 - 2s/epoch - 7ms/step
Epoch 6/10
300/300 - 2s - loss: 0.0276 - accuracy: 0.9926 - val_loss: 0.0658 - val_accuracy: 0.9809 - 2s/epoch - 6ms/step
Epoch 7/10
300/300 - 2s - loss: 0.0194 - accuracy: 0.9950 - val_loss: 0.0614 - val_accuracy: 0.9807 - 2s/epoch - 6ms/step
Epoch 8/10
300/300 - 2s - loss: 0.0142 - accuracy: 0.9968 - val_loss: 0.0605 - val_accuracy: 0.9817 - 2s/epoch - 6ms/step
Epoch 9/10
300/300 - 2s - loss: 0.0108 - accuracy: 0.9976 - val_loss: 0.0630 - val_accuracy: 0.9812 - 2s/epoch - 8ms/step
Epoch 10/10
300/300 - 2s - loss: 0.0077 - accuracy: 0.9985 - val_loss: 0.0638 - val_accuracy: 0.9808 - 2s/epoch - 6ms/step
Baseline Error: 1.92%
1.4 Simple Convolutional Neural Network for MNIST
Now that we have seen how to load the MNIST dataset and train a simple Multilayer Perceptron model on it, it is time to develop a more sophisticated convolutional neural network or CNN model. Keras provides a lot of capability for creating convolutional neural networks. In this section we will create a simple CNN for MNIST that demonstrates how to use all of the aspects of a modern CNN implementation, including convolutional layers, pooling layers and dropout layers. The first step is to import the classes and functions needed.
# Import classes and functions
import numpy as np
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.utils import np_utils

Again, we always initialize the random number generator to a constant seed value for reproducibility of results.
# Seed Random Number Generator
# fix random seed for reproducibility
seed = 7
np.random.seed(seed)
Next we need to load the MNIST dataset and reshape it so that it is suitable for training a CNN. In Keras, the layers used for two-dimensional convolutions expect pixel values with the dimensions [channels][width][height] when the channels-first data format is used. In the case of RGB, the first dimension, channels, would be 3 for the red, green and blue components; it would be like having 3 image inputs for every color image. In the case of MNIST, where the images are gray scale, the channels dimension is set to 1.
# Load Dataset and Separate Into Train and Test Sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# reshape to be [samples] [channels][width][height]
X_train = X_train.reshape(X_train.shape[0], 1, 28, 28).astype('float32')
X_test = X_test.reshape(X_test.shape[0], 1, 28, 28).astype('float32')
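Note that recent versions of Keras default to a channels-last ordering for image data. Because we reshape the arrays to [samples][channels][width][height], the backend should be told to expect channels first; the full listing below does this with the same call:

# Configure Keras for channels-first image data
from keras import backend as K
K.set_image_data_format('channels_first')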
As before, it is a good idea to normalize the pixel values to the range 0 to 1 and to one hot encode the output variable.
# Normalize and One Hot Encode Data
X_train = X_train / 255
X_test = X_test / 255
# one hot encode outputs
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
num_classes = y_test.shape[1]
Next we define our neural network model. Convolutional neural networks are more complex than standard Multilayer Perceptrons, so we will begin with a simple structure that nevertheless uses all of the key elements of a modern implementation. Below summarizes the network architecture.
1. The first hidden layer is a convolutional layer called Conv2D. The layer has 32 feature maps, each with a size of 5 × 5, and a rectifier activation function. This is the input layer, expecting images with the structure outlined above.
2. Next we define a max pooling layer called MaxPooling2D, which takes the maximum value over each pooling window. It is configured with a pool size of 2 × 2.
3. The next layer is a regularization layer using dropout called Dropout. It is configured to randomly exclude 20% of neurons in the layer in order to reduce overfitting.
4. Next is a layer called Flatten that converts the 2D matrix data to a vector. It allows the output to be processed by standard fully connected layers.
5. Next a fully connected layer with 128 neurons and rectifier activation function is used.
6. Finally, the output layer has 10 neurons for the 10 classes and a softmax activation function to output probability-like predictions for each class.
As before, the model is trained using logarithmic loss and the Adam gradient descent algorithm. The full code listing is provided below.

# Simple CNN for the MNIST Dataset
import numpy as np
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.utils import np_utils
from keras import backend as K
K.set_image_data_format('channels_first')
# fix random seed for reproducibility
seed = 7
np.random.seed(seed)
# load data
(X_train, y_train),(X_test, y_test) = mnist.load_data()
# reshape to be [samples][channels][width][height]
X_train = X_train.reshape(X_train.shape[0], 1, 28, 28).astype('float32')
X_test = X_test.reshape(X_test.shape[0], 1, 28, 28).astype('float32')
# normalize inputs from 0-255 to 0-1
X_train = X_train / 255
X_test = X_test / 255
# one hot encode outputs
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
num_classes = y_test.shape[1]
# define a simple CNN model
def baseline_model():
    # create model
    model = Sequential()
    model.add(Conv2D(32, (5, 5), input_shape=(1, 28, 28), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.2))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dense(num_classes, activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
# build the model
model = baseline_model()
# Fit the model
model.fit(X_train, y_train, validation_data=(X_test, y_test),epochs=10, batch_size=200,verbose=2)
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("CNN Error: %.2f%%" % (100-scores[1]*100))
1.5 Summary
In this lesson you discovered the MNIST handwritten digit recognition problem and deep learning models developed in Python using the Keras library that are capable of achieving excellent results. Working through this tutorial you learned:
- How to load the MNIST dataset in Keras and generate plots of the dataset.
- How to reshape the MNIST dataset and develop a simple but well performing Multilayer Perceptron model for the problem.
- How to use Keras to create convolutional neural network models for MNIST.