Handwritten Digit Recognition with a Convolutional Neural Network

Introduction

Dataset: the MNIST handwritten digit set
Training set: 42,000 images of the handwritten digits 0-9
Test set: 28,000 unlabeled samples
Each image is 28×28 = 784 pixels
Goal: use a convolutional neural network to recognize which digit each image shows

Import the required packages

# Python's built-in garbage collection. Used to delete variables and reclaim RAM.
import gc 
# Random number generation.
import random as rd 
# Timing the running time.
import time 
# Pi, used to convert rotation angles from degrees to radians in the augmentation step.
from math import pi 
# Keras builds our CNN model; it uses TensorFlow as the backend.
import keras 
# Plotting the handwritten digit images.
import matplotlib.pyplot as plt 
# Matrix operations.
import numpy as np 
# Data handling, e.g. loading and saving the data.
import pandas as pd
# TensorFlow ops for the data augmentation step.
import tensorflow as tf
# Callbacks for learning-rate decay and early stopping.
from keras.callbacks import ReduceLROnPlateau, EarlyStopping
# The basic building blocks for the CNN.
from keras.layers import (BatchNormalization, Conv2D, Dense, Dropout, Flatten,
                          MaxPool2D, ReLU)
# Image display.
from PIL import Image
# Splitting the data into training and validation parts.
from sklearn.model_selection import train_test_split
%matplotlib inline
Using TensorFlow backend.

Data Processing

Load the data

print("Loading...")
path = "E:/机器学习/Tensorflow学习/data/"
data_train = pd.read_csv(path + "train.csv", engine="python")
data_test = pd.read_csv(path + "test.csv", engine="python")
print("Done!")
Loading...
Done!

Check the dataset sizes

print("Training data: {} rows, {} columns.".format(data_train.shape[0], data_train.shape[1]))
print("Test data: {} rows, {} columns.".format(data_test.shape[0], data_test.shape[1]))
Training data: 42000 rows, 785 columns.
Test data: 28000 rows, 784 columns.

The training set has 42,000 rows and 785 columns: 784 pixel values plus one label indicating which digit the image shows.

The test data has 28,000 rows and no labels.

Split the dataset into x (image data) and y (labels)

x_train = data_train.values[:, 1:]
y_train = data_train.values[:, 0]
def convert_2d(x):
    """x: 1-D (n,) or 2-D (m*n) numpy array of flattened image data.
       Return a 4-D image array of shape m * height * width * channel."""
    if len(x.shape) == 1:
        m = 1
        height = width = int(np.sqrt(x.shape[0]))
    else:
        m = x.shape[0]
        height = width = int(np.sqrt(x.shape[1]))

    x_2d = np.reshape(x, (m, height, width, 1))
    
    return x_2d
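
As a quick sanity check (an illustrative snippet, not part of the original notebook), convert_2d handles both a single flattened image and a full matrix of them:

# a single 784-pixel row becomes a batch of one 28x28 image
print(convert_2d(x_train[0]).shape)   # (1, 28, 28, 1)
# the whole training matrix becomes a stack of images
print(convert_2d(x_train).shape)      # (42000, 28, 28, 1)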

View an image

x_display = convert_2d(data_train.values[0, 1:])
plt.imshow(x_display.squeeze())
<matplotlib.image.AxesImage at 0x22b013e7780>

[Figure: the first training image, displayed with plt.imshow]

Data Augmentation

Here we move straight into data augmentation.
Data augmentation is a very useful technique when you don't have enough data, or when you want to expand the dataset to improve performance.
In this competition it essentially means cropping, rotating, and scaling the images without harming their recognizability.
I use zooming, translation, white noise, and rotation here.
With the augmented data you can expect an accuracy improvement of roughly 1-2%.
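
For reference, Keras ships a built-in ImageDataGenerator that covers most of these transforms; the sections below implement them by hand with TensorFlow ops instead, which keeps every step explicit. A roughly equivalent generator-based setup might look like this (a sketch only; it does not reproduce the exact transforms below, and white noise is not built in):

from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rotation_range=10,       # rotate up to +/-10 degrees
                             zoom_range=0.1,          # zoom by up to 10%
                             width_shift_range=0.1,   # horizontal shift up to 10%
                             height_shift_range=0.1)  # vertical shift up to 10%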

Zooming

The crop_image function crops a region around the center of each image, resizes it back to the original size, and saves the result as augmented data.

def crop_image(x, y, min_scale):
    """x: 2d(m*n) numpy array. 1-dimension image data;
       y: 1d numpy array. The ground truth label;
       min_scale: float. The minimum scale for cropping.
       return zoomed images.
       This function crops each image, zooms the cropped part back up,
       and returns it as augmented data."""
    # convert the data to 2-D images: an m*h*w*c numpy array
    images = convert_2d(x)
    # m is the number of images. These are 0-255 grayscale images, so there is a single channel.
    m, height, width, channel = images.shape
    
    # tf tensor for the original images
    img_tensor = tf.placeholder(tf.int32, [1, height, width, channel])
    # tf tensor for 4 coordinates for corners of the cropped image
    box_tensor = tf.placeholder(tf.float32, [1, 4])
    box_idx = [0]
    crop_size = np.array([height, width])
    # crop and resize the image tensor
    cropped_img_tensor = tf.image.crop_and_resize(img_tensor, box_tensor, box_idx, crop_size)
    # numpy array for the cropped images
    cropped_img = np.zeros((m, height, width, 1))

    with tf.Session() as sess:

        for i in range(m):
            
            # randomly select a scale between [min_scale, min(min_scale + 0.05, 1)]
            rand_scale = np.random.randint(min_scale * 100, np.minimum(min_scale * 100 + 5, 100)) / 100
            # calculate the 4 coordinates
            x1 = y1 = 0.5 - 0.5 * rand_scale
            x2 = y2 = 0.5 + 0.5 * rand_scale
            # lay down the cropping area
            box = np.reshape(np.array([y1, x1, y2, x2]), (1, 4))
            # save the cropped image
            cropped_img[i:i + 1, :, :, :] = sess.run(cropped_img_tensor, feed_dict={img_tensor: images[i:i + 1], box_tensor: box})
    
    # flatten the 2-D images back to one row per image
    cropped_img = np.reshape(cropped_img, (m, -1))
    cropped_img = np.concatenate((y.reshape((-1, 1)), cropped_img), axis=1).astype(int)

    return cropped_img
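
For example, with min_scale = 0.9 and a sampled rand_scale of exactly 0.9, the normalized crop box works out as follows:

x1 = y1 = 0.5 - 0.5 * 0.9   # 0.05
x2 = y2 = 0.5 + 0.5 * 0.9   # 0.95
# the box [0.05, 0.05, 0.95, 0.95] keeps the central 90% of the image,
# which crop_and_resize then scales back up to 28x28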

Translation

def translate(x, y, dist):
    """x: 2d(m*n) numpy array. 1-dimension image data;
       y: 1d numpy array. The ground truth label;
       dist: float. Percentage of height/width to shift.
       return translated images.
       This function shifts the images in 4 different directions:
       crop part of the image, move it, and fill the vacated pixels with 0."""
    # convert the 1-D image data to an m*h*w*c array
    images = convert_2d(x)
    m, height, width, channel = images.shape
    
    # set 4 groups of anchors. The first 4 ints in each group define the area we crop;
    # the last 4 define the area it is moved to. E.g.,
    # new_img[new_top:new_bottom, new_left:new_right] = img[top:bottom, left:right]
    anchors = []
    anchors.append((0, height, int(dist * width), width, 0, height, 0, width - int(dist * width)))
    anchors.append((0, height, 0, width - int(dist * width), 0, height, int(dist * width), width))
    anchors.append((int(dist * height), height, 0, width, 0, height - int(dist * height), 0, width))
    anchors.append((0, height - int(dist * height), 0, width, int(dist * height), height, 0, width))
    
    # new_images: d*m*h*w*c array. The first dimension is the 4 directions.
    new_images = np.zeros((4, m, height, width, channel))
    for i in range(4):
        # shift the image
        top, bottom, left, right, new_top, new_bottom, new_left, new_right = anchors[i]
        new_images[i, :, new_top:new_bottom, new_left:new_right, :] = images[:, top:bottom, left:right, :]
    
    new_images = np.reshape(new_images, (4 * m, -1))
    y = np.tile(y, (4, 1)).reshape((-1, 1))
    new_images = np.concatenate((y, new_images), axis=1).astype(int)

    return new_images
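
Concretely, with dist = 0.1 on a 28x28 image the shift is int(0.1 * 28) = 2 pixels. The first anchor, for instance, moves the content 2 pixels to the left:

# first anchor: (0, 28, 2, 28, 0, 28, 0, 26), so
# new_img[0:28, 0:26] = img[0:28, 2:28]  -> the digit shifts 2 px left
# and the two right-most columns stay zero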

Add white noise

Now we add some white noise to the images: randomly pick some pixels and replace them with uniformly distributed noise.

def add_noise(x, y, noise_lvl):
    """x: 2d(m*n) numpy array. 1-dimension image data;
       y: 1d numpy array. The ground truth label;
       noise_lvl: float. Percentage of pixels to add noise in.
       return images with white noise.
       This function randomly picks some pixels and replace them with noise."""
    m, n = x.shape
    # work on a copy so the noise does not overwrite the caller's original images
    x = x.copy()
    # calculate the number of pixels to add noise to
    noise_num = int(noise_lvl * n)

    for i in range(m):
        # generate n random numbers, argsort them and keep the first noise_num indices,
        # which is equivalent to sampling indices without replacement
        noise_idx = np.random.randint(0, n, n).argsort()[:noise_num]
        # replace the chosen pixels with noise from 0 to 255
        x[i, noise_idx] = np.random.randint(0, 255, noise_num)

    noisy_data = np.concatenate((y.reshape((-1, 1)), x), axis=1).astype("int")

    return noisy_data
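
The argsort trick above amounts to sampling indices without replacement; np.random.choice expresses the same idea more directly (an equivalent alternative, not what this notebook uses):

noise_idx = np.random.choice(n, noise_num, replace=False)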

Rotation

def rotate_image(x, y, max_angle):
    """x: 2d(m*n) numpy array. 1-dimension image data;
       y: 1d numpy array. The ground truth label;
       max_angle: int. The maximum degree for rotation.
       return rotated images.
       This function rotates each image by a random angle between 0.5*max_angle and max_angle degrees."""
    images = convert_2d(x)
    m, height, width, channel = images.shape
    
    img_tensor = tf.placeholder(tf.float32, [m, height, width, channel])
    
    # half of the images are rotated clockwise. The other half counter-clockwise
    # positive angle: [max/2, max]
    # negative angle: [360-max/2, 360-max]
    rand_angle_pos = np.random.randint(max_angle / 2, max_angle, int(m / 2))
    rand_angle_neg = np.random.randint(-max_angle, -max_angle / 2, m - int(m / 2)) + 360
    rand_angle = np.transpose(np.hstack((rand_angle_pos, rand_angle_neg)))
    np.random.shuffle(rand_angle)
    # convert the degree to radian
    rand_angle = rand_angle / 180 * pi
    
    # rotate the images
    rotated_img_tensor = tf.contrib.image.rotate(img_tensor, rand_angle)

    with tf.Session() as sess:
        rotated_imgs = sess.run(rotated_img_tensor, feed_dict={img_tensor: images})
    
    rotated_imgs = np.reshape(rotated_imgs, (m, -1))
    rotated_imgs = np.concatenate((y.reshape((-1, 1)), rotated_imgs), axis=1)
    
    return rotated_imgs
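
Note that tf.contrib was removed in TensorFlow 2.x. If you port this code, the equivalent op lives in the separate tensorflow_addons package (a sketch, assuming that package is installed):

import tensorflow_addons as tfa
# same semantics as tf.contrib.image.rotate: angles are given in radians
rotated_img_tensor = tfa.image.rotate(img_tensor, rand_angle)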

Combine the augmented data

start = time.perf_counter()
print("Augment the data...")
cropped_imgs = crop_image(x_train, y_train, 0.9)
translated_imgs = translate(x_train, y_train, 0.1)
noisy_imgs = add_noise(x_train, y_train, 0.1)
rotated_imgs = rotate_image(x_train, y_train, 10)

data_train = np.vstack((data_train, cropped_imgs, translated_imgs, noisy_imgs, rotated_imgs))
np.random.shuffle(data_train)
print("Done!")
time_used = int(time.perf_counter() - start)
print("Time used: {}s.".format(time_used))

Augment the data...

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.

Done!
Time used: 26s.



Data Preparation

Check the data

x_train = data_train[:, 1:]
y_train = data_train[:, 0]
x_test = data_test.values
print("Augmented training data: {} rows, {} columns.".format(data_train.shape[0], data_train.shape[1]))
Augmented training data: 336000 rows, 785 columns.

After augmentation the training data has 336,000 rows in total, 8 times the original size.

Reshape the vectors into matrices

Because a CNN takes 2-D images as input, we need to reshape each flattened vector into a matrix.
Format: m (number of images) × h (image height) × w (image width) × c (number of channels)

x_train = convert_2d(x_train)
x_test = convert_2d(x_test)

One-hot encode the class labels

num_classes = 10
y_train = keras.utils.to_categorical(y_train, num_classes)
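
For example, to_categorical maps the label 3 to a one-hot row of length 10:

print(keras.utils.to_categorical([3], num_classes))
# [[0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]]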

To speed up the CNN's optimization, scale the pixel values down to the [0, 1] range.

x_train = x_train / 255
x_test = x_test / 255

Split into training and validation sets

To evaluate how good the model is, use sklearn's train_test_split to split the data 9:1 — 90% for training and 10% for validation.

# generate a random seed for train-test-split
seed = np.random.randint(1, 100)
x_train, x_dev, y_train, y_dev = train_test_split(x_train, y_train, test_size=0.1, random_state=seed)
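
This leaves 302,400 training examples and 33,600 validation examples (90% and 10% of 336,000), which matches the fit log further below:

print(x_train.shape[0], x_dev.shape[0])   # 302400 33600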

Free up memory

del data_train
del data_test
gc.collect()
69

Build the CNN Model

A typical CNN contains three kinds of layers: convolutional layers, pooling layers, and fully connected layers.
I also add batch-normalization layers and dropout layers to the model.

  • 5×5 convolution kernels are used here instead of 3×3. A 5×5 kernel has a larger receptive field and performed better in my experiments.

  • Batch normalization is placed before the ReLU activation here; placing it after the activation also works.

  • Dropout uses a drop probability of 0.2, meaning that 20% of the units feeding into each dropout layer are reset to 0.

# number of channels for each convolutional layer
filters = (32, 32, 64, 64)
# each conv layer uses a 5x5 kernel
kernel = (5, 5)
# 20% of the units feeding into each dropout layer are reset to 0
drop_prob = 0.2

model = keras.models.Sequential()

model.add(Conv2D(filters[0], kernel, padding="same", input_shape=(28, 28, 1),
                 kernel_initializer=keras.initializers.he_normal()))
model.add(BatchNormalization())
model.add(ReLU())
model.add(Conv2D(filters[0], kernel, padding="same",
                 kernel_initializer=keras.initializers.he_normal()))
model.add(BatchNormalization())
model.add(ReLU())
model.add(MaxPool2D())
model.add(Dropout(drop_prob))

model.add(Conv2D(filters[1], kernel, padding="same",
                 kernel_initializer=keras.initializers.he_normal()))
model.add(BatchNormalization())
model.add(ReLU())
model.add(MaxPool2D())
model.add(Dropout(drop_prob))

model.add(Conv2D(filters[2], kernel, padding="same",
                 kernel_initializer=keras.initializers.he_normal()))
model.add(BatchNormalization())
model.add(ReLU())
model.add(MaxPool2D())
model.add(Dropout(drop_prob))

model.add(Conv2D(filters[3], kernel, padding="same",
                 kernel_initializer=keras.initializers.he_normal()))
model.add(BatchNormalization())
model.add(ReLU())
model.add(MaxPool2D())
model.add(Dropout(drop_prob))

# several fully-connected layers after the conv layers
model.add(Flatten())
model.add(Dropout(drop_prob))
model.add(Dense(128, activation="relu"))
model.add(Dropout(drop_prob))
model.add(Dense(num_classes, activation="softmax"))
# use the Adam optimizer to accelerate convergence
model.compile(keras.optimizers.Adam(), "categorical_crossentropy", metrics=["accuracy"])

查看模型架构

model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_1 (Conv2D)            (None, 28, 28, 32)        832       
_________________________________________________________________
batch_normalization_1 (Batch (None, 28, 28, 32)        128       
_________________________________________________________________
re_lu_1 (ReLU)               (None, 28, 28, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 28, 28, 32)        25632     
_________________________________________________________________
batch_normalization_2 (Batch (None, 28, 28, 32)        128       
_________________________________________________________________
re_lu_2 (ReLU)               (None, 28, 28, 32)        0         
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 14, 14, 32)        0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 14, 14, 32)        0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 14, 14, 32)        25632     
_________________________________________________________________
batch_normalization_3 (Batch (None, 14, 14, 32)        128       
_________________________________________________________________
re_lu_3 (ReLU)               (None, 14, 14, 32)        0         
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 7, 7, 32)          0         
_________________________________________________________________
dropout_2 (Dropout)          (None, 7, 7, 32)          0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 7, 7, 64)          51264     
_________________________________________________________________
batch_normalization_4 (Batch (None, 7, 7, 64)          256       
_________________________________________________________________
re_lu_4 (ReLU)               (None, 7, 7, 64)          0         
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 3, 3, 64)          0         
_________________________________________________________________
dropout_3 (Dropout)          (None, 3, 3, 64)          0         
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 3, 3, 64)          102464    
_________________________________________________________________
batch_normalization_5 (Batch (None, 3, 3, 64)          256       
_________________________________________________________________
re_lu_5 (ReLU)               (None, 3, 3, 64)          0         
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 1, 1, 64)          0         
_________________________________________________________________
dropout_4 (Dropout)          (None, 1, 1, 64)          0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 64)                0         
_________________________________________________________________
dropout_5 (Dropout)          (None, 64)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 128)               8320      
_________________________________________________________________
dropout_6 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 10)                1290      
=================================================================
Total params: 216,330
Trainable params: 215,882
Non-trainable params: 448
_________________________________________________________________

The summary above shows the structure of my CNN model. It goes:

  • (Conv-BatchNormalization-ReLU)-MaxPooling-Dropout blocks, 4 pooling stages in total (the first block stacks two conv layers);

  • a fully-connected head with 1 flatten, 2 dropout and 2 dense layers: Flatten-Dropout-Dense(128)-Dropout-Dense(with softmax activation).

  • In CNNs people often use 3x3 or 5x5 kernels. I found that with a 5x5 kernel the model's accuracy improved by about 0.125%, which is quite a lot once you pass the 99% threshold.

  • Convolutional layers and max-pooling layers can extract high-level traits from the pixels. The ReLU units and max pooling also add non-linearity to the network;

  • Batch normalization helps the network converge faster since it keeps the input of every layer at the same scale;

  • Dropout layers help prevent overfitting by randomly dropping some of the input units. With dropout, our model won't overfit to specific extreme data points or noisy pixels;

  • The Adam optimizer also accelerates the optimization. When the dataset is large, we usually use mini-batch gradient descent (MBGD) or stochastic gradient descent (SGD) to save training time. The randomness in MBGD and SGD makes the steps towards the optimum zig-zag rather than head straight for it. Adam, or Adaptive Moment Estimation, applies exponential moving averages to the gradients and to the second moment of the gradients, which straightens the steps and in turn accelerates the optimization (see the sketch below).
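
As a rough sketch of the per-parameter update Adam performs (a minimal NumPy illustration of the idea, not Keras's internal implementation; beta1, beta2 and eps are Adam's usual defaults):

import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a parameter array w, given its gradient grad at step t."""
    m = beta1 * m + (1 - beta1) * grad         # moving average of the gradients
    v = beta2 * v + (1 - beta2) * grad ** 2    # moving average of the squared gradients
    m_hat = m / (1 - beta1 ** t)               # bias corrections: m and v start at zero
    v_hat = v / (1 - beta2 ** t)
    # the averaged direction smooths out the zig-zag of raw SGD steps
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v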

Train the CNN

# number of epochs to run
iters = 100
# batch size: the number of images we train on before taking one step in MBGD
batch_size = 1024

As we approach the optimum, we need to reduce the learning rate to keep from stepping past it: a learning rate that stays high can carry us away from the optimum. So I set up learning-rate decay that lowers the rate whenever the accuracy on the validation data stops improving.

# monitor: the quantity to watch; when it stops improving we reduce the learning rate
# factor: new learning rate = old learning rate * factor
# patience: number of epochs to wait before reducing the learning rate
# verbose: whether to print a message
# min_lr: the minimum learning rate

lr_decay = ReduceLROnPlateau(monitor="val_acc", factor=0.5, patience=3, verbose=1, min_lr=1e-5)
# If the model stops improving on the validation data, early stopping ends training,
# which prevents overfitting and saves some time.
early_stopping = EarlyStopping(monitor="val_acc", patience=7, verbose=1)
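
With these settings and Adam's default initial rate of 0.001, the learning rate is halved each time val_acc fails to improve for 3 consecutive epochs (0.001 → 0.0005 → 0.00025 → …) and never drops below 1e-5; early stopping then ends training altogether after 7 stagnant epochs.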

Fit the model

print("Training model...")
fit_params = {
    "batch_size": batch_size,
    "epochs": iters,
    "verbose": 1,
    "callbacks": [lr_decay, early_stopping],
    "validation_data": (x_dev, y_dev)     # data for monitoring the model accuracy
}
model.fit(x_train, y_train, **fit_params)
print("Done!")
Training model...
Train on 302400 samples, validate on 33600 samples
Epoch 1/100
  3072/302400 [..............................] - ETA: 32:43 - loss: 2.6548 - acc: 0.1156

Model Evaluation

model.evaluate(x_dev, y_dev)
33600/33600 [==============================] - 3s 75us/step

[0.0018058670724439621, 0.9994047619047619]

The evaluate method returns two values: the current loss and the model's accuracy. We can see that the model reaches 99.94% accuracy on the validation set!
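
Equivalently, the accuracy can be recomputed by hand from the predicted classes (a quick cross-check, not part of the original notebook):

pred = np.argmax(model.predict(x_dev), axis=1)
true = np.argmax(y_dev, axis=1)
print((pred == true).mean())   # ~0.9994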

Output the predictions

y_pred = model.predict(x_test, batch_size=batch_size)
y_pred = np.argmax(y_pred, axis=1).reshape((-1, 1))
idx = np.reshape(np.arange(1, len(y_pred) + 1), (len(y_pred), -1))
y_pred = np.hstack((idx, y_pred))
y_pred = pd.DataFrame(y_pred, columns=['ImageId', 'Label'])
y_pred.to_csv('y_pred.csv', index=False)
