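4.3 CNN Recognizers in Practice
In real-world applications, object recognition is usually implemented with a convolutional neural network (CNN). This section uses two concrete examples to show how to build object recognizers with TensorFlow.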
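4.3.1 Creating a CNN Object Recognition Model
See the example file main.py below, which builds a CNN object recognition model on the CIFAR-10 dataset that can classify images as airplane, automobile, bird, cat, deer, dog, frog, horse, ship, or truck. CIFAR-10 contains 60,000 color images of 32×32 pixels in 10 classes, with 6,000 images per class. Of these, 50,000 are used for training and are organized as five training batches of 10,000 images each; the remaining 10,000 are used for testing and form a single test batch. The test batch contains exactly 1,000 randomly selected images from each of the 10 classes. The images left over are placed in random order to form the training batches. Note that an individual training batch does not necessarily contain the same number of images from each class, but taken together the training batches contain exactly 5,000 images per class. Figure 4-9 lists the 10 classes and shows 10 random images from each.
Figure 4-9 Images in the CIFAR-10 dataset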
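The split described above can be illustrated with a small simulation. This is only a sketch that tracks class labels, not real images: the function name simulate_cifar10_split and the fixed seed are assumptions made for illustration and are not part of main.py or of the dataset tooling.

```python
import random
from collections import Counter

NUM_CLASSES = 10
IMAGES_PER_CLASS = 6000
TEST_PER_CLASS = 1000
NUM_TRAIN_BATCHES = 5


def simulate_cifar10_split(seed=0):
    """Return (train_batches, test_batch) as lists of class labels (hypothetical helper)."""
    rng = random.Random(seed)
    test_batch = []
    remaining = []
    for label in range(NUM_CLASSES):
        # 1,000 images of every class go to the test batch ...
        test_batch.extend([label] * TEST_PER_CLASS)
        # ... and the other 5,000 stay in the training pool.
        remaining.extend([label] * (IMAGES_PER_CLASS - TEST_PER_CLASS))
    # The training pool is shuffled and then cut into five equal batches.
    rng.shuffle(remaining)
    batch_size = len(remaining) // NUM_TRAIN_BATCHES
    train_batches = [remaining[i * batch_size:(i + 1) * batch_size]
                     for i in range(NUM_TRAIN_BATCHES)]
    return train_batches, test_batch


train_batches, test_batch = simulate_cifar10_split()
for number, batch in enumerate(train_batches, start=1):
    counts = Counter(batch)
    print(f"training batch {number}:", [counts[c] for c in range(NUM_CLASSES)])

total_counts = Counter(label for batch in train_batches for label in batch)
print("all training batches:", [total_counts[c] for c in range(NUM_CLASSES)])
```

Because the pool is shuffled before it is cut, the per-class counts inside a single batch typically differ slightly, while the totals over all five batches are always 5,000 per class.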
The complete code of the example file main.py is shown below.
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()
# Normalize pixel values to the range 0 to 1
train_images, test_images = train_images / 255.0, test_images / 255.0
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']
plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i], cmap=plt.cm.binary)
    # The CIFAR labels are arrays, which is why the extra index [0] is needed
    plt.xlabel(class_names[train_labels[i][0]])
plt.show()
# Build the convolutional base
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
# Add Dense layers on top
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10))
model.summary()
# Compile and train the model
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
history = model.fit(train_images, train_labels, epochs=10,
                    validation_data=(test_images, test_labels))
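# Evaluate the model: plot training and validation accuracy per epoch
plt.plot(history.history['accuracy'], label='accuracy')
plt.plot(history.history['val_accuracy'], label = 'val_accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.ylim([0.5, 1])
plt.legend(loc='lower right')
# Display the figure (needed when the file is run as a script)
plt.show()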
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
print(test_acc)
Running the file prints the model summary and the training progress, as follows:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 30, 30, 32) 896
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 15, 15, 32) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 13, 13, 64) 18496
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 6, 6, 64) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 4, 4, 64) 36928
_________________________________________________________________
flatten (Flatten) (None, 1024) 0
_________________________________________________________________
dense (Dense) (None, 64) 65600
_________________________________________________________________
dense_1 (Dense) (None, 10) 650
=================================================================
Total params: 122,570
Trainable params: 122,570
Non-trainable params: 0
_________________________________________________________________
Epoch 1/10
1563/1563 [==============================] - 249s 160ms/step - loss: 1.5445 - accuracy: 0.4354 - val_loss: 1.2372 - val_accuracy: 0.5517
Epoch 2/10
1563/1563 [==============================] - 257s 165ms/step - loss: 1.1574 - accuracy: 0.5893 - val_loss: 1.1760 - val_accuracy: 0.5857
Epoch 3/10
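The Output Shape and Param # columns of the summary above can be reproduced by hand. The following sketch assumes the Keras defaults used in main.py (3×3 kernels with no padding and stride 1, and 2×2 pooling with stride 2); the helper functions conv2d, max_pool, flatten and dense are hypothetical names written for this illustration and are not Keras APIs.

```python
def conv2d(shape, filters, kernel=3):
    """Unpadded convolution with stride 1: each side shrinks by kernel - 1."""
    height, width, channels = shape
    # One kernel*kernel*channels weight block plus one bias per filter.
    params = (kernel * kernel * channels + 1) * filters
    return (height - kernel + 1, width - kernel + 1, filters), params


def max_pool(shape, pool=2):
    """Pooling halves each side (rounding down) and has no parameters."""
    height, width, channels = shape
    return (height // pool, width // pool, channels), 0


def flatten(shape):
    """Flattening multiplies the three dimensions together."""
    height, width, channels = shape
    return (height * width * channels,), 0


def dense(shape, units):
    """A dense layer has one weight per input plus one bias, per unit."""
    (inputs,) = shape
    return (units,), (inputs + 1) * units


layers = [
    ("conv2d", conv2d, {"filters": 32}),
    ("max_pooling2d", max_pool, {}),
    ("conv2d_1", conv2d, {"filters": 64}),
    ("max_pooling2d_1", max_pool, {}),
    ("conv2d_2", conv2d, {"filters": 64}),
    ("flatten", flatten, {}),
    ("dense", dense, {"units": 64}),
    ("dense_1", dense, {"units": 10}),
]

shape = (32, 32, 3)  # one CIFAR-10 image: 32x32 pixels, 3 color channels
rows = []
for name, layer, kwargs in layers:
    shape, params = layer(shape, **kwargs)
    rows.append((name, shape, params))
    print(f"{name:<16} {str(shape):<14} {params}")

total_params = sum(params for _, _, params in rows)
print("Total params:", total_params)
```

For example, the first convolution has (3×3×3 + 1) × 32 = 896 parameters, and the first dense layer has (1024 + 1) × 64 = 65,600, which accounts for more than half of the 122,570 parameters in the model.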
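4.3.2 CNN Clothing Recognizer
See the example file clothing.py below, which builds a clothing recognizer on the Fashion-MNIST dataset that can classify images as T-shirt/top, trouser, pullover, dress, coat, sandal, shirt, sneaker, bag, or ankle boot. Note that the model defined in this file is built from Flatten and Dense layers only, so it is a fully connected network and contains no convolutional layers. The implementation of clothing.py proceeds as follows.
(1) Load the dataset and train the model. The code is as follows: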
import tensorflow as tf
# Helper libraries
import numpy as np
import matplotlib.pyplot as plt
print(tf.__version__)
# Load the Fashion-MNIST dataset
fashion_mnist = tf.keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
# Class labels
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
# Print the format of the training set (60,000 images of 28x28 pixels each)
print(train_images.shape)
print(len(train_labels))
print(train_labels)  # Each label is an integer from 0 to 9 (an index into class_names)
# Print the format of the test set (10,000 images of 28x28 pixels each)
print(test_images.shape)
print(len(test_labels))
# Inspect the first training image before preprocessing
plt.figure()
plt.imshow(train_images[0])
plt.colorbar()
plt.grid(False)
plt.show()
# Scale pixel values to the range 0 to 1
train_images = train_images / 255.0
test_images = test_images / 255.0
# Display the first 25 images of the training set with their labels
plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i], cmap=plt.cm.binary)
    plt.xlabel(class_names[train_labels[i]])
plt.show()
# Build the network layers
model = tf.keras.Sequential([
    # Layer 1 reshapes each 2D 28x28 array into a 1D array of 784 values (it "unstacks" the rows and lines them up)
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    # Layer 2 (a dense layer) has 128 nodes (neurons)
    tf.keras.layers.Dense(128, activation='relu'),
    # Layer 3 (a dense layer) returns an array of 10 logits (linear outputs), one score per class
    tf.keras.layers.Dense(10)
])
# Compile the model
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
# Train the model by fitting the training images to their labels
model.fit(train_images, train_labels, epochs=10)
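# Run the model on the test set and evaluate its performance
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
print('\nTest accuracy:', test_acc)
# Attach a softmax layer to convert logits to probabilities, then use the model to predict
probability_model = tf.keras.Sequential([model,
                                         tf.keras.layers.Softmax()])
predictions = probability_model.predict(test_images)
# Print the first prediction (each number is the model's confidence that the image belongs to the corresponding class)
print(predictions[0])
print(np.argmax(predictions[0]))  # Most likely class
print(test_labels[0])  # Actual class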
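Running the code prints the following training log and displays the first 25 images of the training set with their labels, as shown in Figure 4-10.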
Epoch 1/10
1875/1875 [==============================] - 15s 8ms/step - loss: 0.4982 - accuracy: 0.8244
Epoch 2/10
1875/1875 [==============================] - 16s 9ms/step - loss: 0.3750 - accuracy: 0.8661
Epoch 3/10
1875/1875 [==============================] - 16s 8ms/step - loss: 0.3355 - accuracy: 0.8779
Epoch 4/10
1875/1875 [==============================] - 16s 8ms/step - loss: 0.3142 - accuracy: 0.8844
Epoch 5/10
1875/1875 [==============================] - 17s 9ms/step - loss: 0.2955 - accuracy: 0.8908
Epoch 6/10
1875/1875 [==============================] - 16s 9ms/step - loss: 0.2799 - accuracy: 0.8968
Epoch 7/10
1875/1875 [==============================] - 13s 7ms/step - loss: 0.2681 - accuracy: 0.9010
Epoch 8/10
1875/1875 [==============================] - 10s 5ms/step - loss: 0.2551 - accuracy: 0.9045
Epoch 9/10
1875/1875 [==============================] - 14s 7ms/step - loss: 0.2468 - accuracy: 0.9080
Epoch 10/10
1875/1875 [==============================] - 13s 7ms/step - loss: 0.2379 - accuracy: 0.9119
313/313 - 2s - loss: 0.3420 - accuracy: 0.8783
Test accuracy: 0.8783000111579895
Figure 4-10 The first 25 images
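The step counts in the training logs (1875/1875 and 313/313 here, 1563/1563 in Section 4.3.1) follow from the dataset sizes. The sketch below assumes the Keras default batch size of 32, which applies because neither example passes batch_size to fit() or evaluate(); the helper name steps_per_epoch is hypothetical.

```python
import math

# Keras uses a batch size of 32 when batch_size is not passed explicitly.
DEFAULT_BATCH_SIZE = 32


def steps_per_epoch(num_samples, batch_size=DEFAULT_BATCH_SIZE):
    """Number of batches needed to see every sample once (the last batch may be smaller)."""
    return math.ceil(num_samples / batch_size)


fashion_train_steps = steps_per_epoch(60000)  # Fashion-MNIST training set
fashion_test_steps = steps_per_epoch(10000)   # Fashion-MNIST test set
cifar_train_steps = steps_per_epoch(50000)    # CIFAR-10 training set

print("Fashion-MNIST training steps:", fashion_train_steps)
print("Fashion-MNIST evaluation steps:", fashion_test_steps)
print("CIFAR-10 training steps:", cifar_train_steps)
```

When the sample count is not a multiple of 32, as with 50,000 and 10,000, the final batch is simply smaller than the others.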
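(2) Write the plotting functions plot_image() and plot_value_array(), and draw visualizations of the prediction results, in which blue indicates a correct prediction and red an incorrect one. The code is as follows: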
# Plot an image together with its predicted and true labels
def plot_image(i, predictions_array, true_label, img):
    true_label, img = true_label[i], img[i]
    plt.grid(False)
    plt.xticks([])
    plt.yticks([])
    plt.imshow(img, cmap=plt.cm.binary)
    predicted_label = np.argmax(predictions_array)
    if predicted_label == true_label:
        color = 'blue'
    else:
        color = 'red'
    plt.xlabel("{} {:2.0f}% ({})".format(class_names[predicted_label],
                                         100*np.max(predictions_array),
                                         class_names[true_label]),
               color=color)
# Plot the predicted probability of each class as a bar chart
def plot_value_array(i, predictions_array, true_label):
    true_label = true_label[i]
    plt.grid(False)
    plt.xticks(range(10))
    plt.yticks([])
    thisplot = plt.bar(range(10), predictions_array, color="#777777")
    plt.ylim([0, 1])
    predicted_label = np.argmax(predictions_array)
    thisplot[predicted_label].set_color('red')
    thisplot[true_label].set_color('blue')
# Verify predictions (blue = correct, red = incorrect)
i = 0
plt.figure(figsize=(6,3))
plt.subplot(1,2,1)
plot_image(i, predictions[i], test_labels, test_images)
plt.subplot(1,2,2)
plot_value_array(i, predictions[i], test_labels)
plt.show()
i = 12
plt.figure(figsize=(6,3))
plt.subplot(1,2,1)
plot_image(i, predictions[i], test_labels, test_images)
plt.subplot(1,2,2)
plot_value_array(i, predictions[i], test_labels)
plt.show()
# Plot the first X test images with their predicted and true labels.
# Correct predictions are shown in blue, incorrect predictions in red.
num_rows = 5
num_cols = 3
num_images = num_rows*num_cols
plt.figure(figsize=(2*2*num_cols, 2*num_rows))
for i in range(num_images):
    plt.subplot(num_rows, 2*num_cols, 2*i+1)
    plot_image(i, predictions[i], test_labels, test_images)
    plt.subplot(num_rows, 2*num_cols, 2*i+2)
    plot_value_array(i, predictions[i], test_labels)
plt.tight_layout()
plt.show()
# Take an image from the test set
img = test_images[1]
print(img.shape)
# Add the image to a batch in which it is the only member
img = (np.expand_dims(img,0))
print(img.shape)
# Predict the class of the single image
predictions_single = probability_model.predict(img)
print(predictions_single)
print(np.argmax(predictions_single[0]))
print(test_labels[1])
plot_value_array(1, predictions_single[0], test_labels)
_ = plt.xticks(range(10), class_names, rotation=45)
plt.show()
The prediction results are shown in Figure 4-11.
Figure 4-11 Prediction results
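The percentages and bar heights in these plots come from applying softmax to the 10 logits returned by the last Dense layer, and the training loss is the cross-entropy computed from the same logits. The following pure-Python sketch shows both calculations. The logits are made-up values chosen for illustration, not output from the trained model, and the function names are hypothetical stand-ins for what tf.keras.layers.Softmax and SparseCategoricalCrossentropy(from_logits=True) compute for a single sample.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(logits)
    # Subtracting the largest logit avoids overflow and does not change the result.
    exps = [math.exp(value - largest) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def sparse_cross_entropy_from_logits(logits, true_label):
    """Loss for one sample: negative log of the probability given to the true class."""
    return -math.log(softmax(logits)[true_label])


# Hypothetical logits for one image, one score per class (index 9 = 'Ankle boot').
logits = [-2.0, -1.0, 0.5, 0.0, -3.0, 1.0, -0.5, 2.0, -1.5, 4.0]

probabilities = softmax(logits)
predicted_label = max(range(len(probabilities)), key=probabilities.__getitem__)

# The loss is small when the true class has the highest score and large otherwise.
loss_if_label_is_9 = sparse_cross_entropy_from_logits(logits, 9)
loss_if_label_is_0 = sparse_cross_entropy_from_logits(logits, 0)

print("probabilities:", [round(p, 3) for p in probabilities])
print("predicted class index:", predicted_label)
print("loss if the true label is 9:", loss_if_label_is_9)
print("loss if the true label is 0:", loss_if_label_is_0)
```

Taking the index of the largest probability plays the same role as np.argmax(predictions[i]) in the plotting code, and softmax never changes which class ranks highest; it only rescales the scores so they can be read as confidences.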