A note up front: I wrote this code myself from scratch, so some parts are not particularly polished; please go easy on the criticism.
Part 1: The VGG16 Model
First, the VGG16 architecture diagram. Anyone familiar with the model has probably seen this picture to death (and yes, I grabbed it off the internet too).
Put simply, VGG16 just keeps alternating convolution and max pooling: five rounds of that, then two fully connected layers, and finally a 1000-way classification output.
Part 2: The CIFAR-10 Dataset
CIFAR-10 is a public dataset. The download contains six files, five training batches and one test batch, 60,000 images in total across 10 classes, each file stored as a Python dictionary. The image data lives under the key b'data' and the labels under b'labels'; the b'batch_label' key tells you which of the six CIFAR-10 files you are looking at.
Also, the image data in each file has shape (10000, 3072): 10,000 images per batch, where 3072 = 32x32x3 is the flattened resolution, so every CIFAR-10 image is 32x32.
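In other words, each flat row of b'data' stores the 1024 red values, then 1024 green, then 1024 blue. A minimal sketch of turning one row back into a displayable image (row here is just a stand-in for a single record pulled from b'data'):

row = batch[b'data'][0]                # one record, shape (3072,)
img_chw = row.reshape(3, 32, 32)       # channels-first: (3, 32, 32)
img_hwc = img_chw.transpose(1, 2, 0)   # channels-last: (32, 32, 3), ready for plt.imshow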
CIFAR dataset download link
import numpy as np
import tensorflow as tf
from PIL import Image
import matplotlib.pyplot as plt
from tensorflow.keras.utils import to_categorical
'''
unpickle returns a dictionary with four keys: dict_keys([b'batch_label', b'labels', b'data', b'filenames'])
b'batch_label' : which of the CIFAR-10 files this batch is
b'labels'      : the sample labels, 10000 per file, shape (10000,)
b'data'        : the sample data, 10000 per file, shape (10000, 3072), where 3072 = 32*32*3 per image
b'filenames'   : the sample file names
Each file holds 10,000 images across 10 classes; the five training files give 50,000 images in total,
plus another 10,000 test images over the same 10 classes.
'''
# Read one CIFAR-10 batch file from the given path
def unpickle(file):
    import pickle
    with open(file, 'rb') as fo:
        dict = pickle.load(fo, encoding='bytes')
    return dict
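# Quick sanity check of what one batch file contains (my own example, commented out; adjust the path first):
# batch = unpickle("E:/deeplearning/alldata/CIFAR-10/cifar-10-batches-py/data_batch_1")
# print(batch.keys())              # dict_keys([b'batch_label', b'labels', b'data', b'filenames'])
# print(batch[b'data'].shape)      # (10000, 3072)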
def cifar10():
    file1 = "E:/deeplearning/alldata/CIFAR-10/cifar-10-batches-py/data_batch_1"
    file2 = "E:/deeplearning/alldata/CIFAR-10/cifar-10-batches-py/data_batch_2"
    file3 = "E:/deeplearning/alldata/CIFAR-10/cifar-10-batches-py/data_batch_3"
    file4 = "E:/deeplearning/alldata/CIFAR-10/cifar-10-batches-py/data_batch_4"
    file5 = "E:/deeplearning/alldata/CIFAR-10/cifar-10-batches-py/data_batch_5"
    file6 = "E:/deeplearning/alldata/CIFAR-10/cifar-10-batches-py/test_batch"
    train1 = unpickle(file1)
    train2 = unpickle(file2)
    train3 = unpickle(file3)
    train4 = unpickle(file4)
    train5 = unpickle(file5)
    test = unpickle(file6)
    return train1, train2, train3, train4, train5, test
# Visualize one sample: data is the array of images, i is the index of the image to show
def keshihua(data, i):
    data = data[i]
    image = data.reshape(3, 32, 32)
    r = Image.fromarray(image[0])
    g = Image.fromarray(image[1])
    b = Image.fromarray(image[2])
    pil_img = Image.merge('RGB', (r, g, b))
    plt.imshow(pil_img)
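# Example use of keshihua (my own addition, commented out so it does not run on import):
# train1, *_ = cifar10()
# keshihua(train1[b'data'], 5)     # show the 6th image of the first training batch
# plt.show()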
def con_data():
    train1, train2, train3, train4, train5, test = cifar10()
    # Pull the image data out of the 5 training batches and the test batch
    data_train1 = train1[b'data']
    data_train2 = train2[b'data']
    data_train3 = train3[b'data']
    data_train4 = train4[b'data']
    data_train5 = train5[b'data']
    data_test = test[b'data']
    # Reshape each (10000, 3072) array into (10000, 3, 32, 32)
    data_train1 = np.resize(data_train1, (10000, 3, 32, 32))
    data_train2 = np.resize(data_train2, (10000, 3, 32, 32))
    data_train3 = np.resize(data_train3, (10000, 3, 32, 32))
    data_train4 = np.resize(data_train4, (10000, 3, 32, 32))
    data_train5 = np.resize(data_train5, (10000, 3, 32, 32))
    data_test = np.resize(data_test, (10000, 3, 32, 32))
    # Move the channel axis to the end so the shape becomes (10000, 32, 32, 3)
    data_train1 = tf.transpose(data_train1, (0, 2, 3, 1))
    data_train2 = tf.transpose(data_train2, (0, 2, 3, 1))
    data_train3 = tf.transpose(data_train3, (0, 2, 3, 1))
    data_train4 = tf.transpose(data_train4, (0, 2, 3, 1))
    data_train5 = tf.transpose(data_train5, (0, 2, 3, 1))
    data_test = np.transpose(data_test, (0, 2, 3, 1))
    data_test = np.array(data_test) / 255.0
    # Pull the labels out of the 5 training batches and the test batch
    label_train1 = train1[b'labels']
    label_train2 = train2[b'labels']
    label_train3 = train3[b'labels']
    label_train4 = train4[b'labels']
    label_train5 = train5[b'labels']
    label_test = test[b'labels']
    label_test = np.array(label_test)
    # Concatenate the 5 training batches into one (50000, 32, 32, 3) array and scale to [0, 1]
    data_train1 = np.array(np.concatenate((data_train1, data_train2, data_train3, data_train4, data_train5), axis=0)) / 255.0
    # Concatenate the 5 training label lists
    label_train1.extend(label_train2)
    label_train1.extend(label_train3)
    label_train1.extend(label_train4)
    label_train1.extend(label_train5)
    label_train1 = np.array(label_train1)
    return data_train1, label_train1, data_test, label_test
def pre_data():
    data_train1, label_train1, data_test, label_test = con_data()
    return data_train1, label_train1, data_test, label_test

if __name__ == '__main__':
    # Guarded so that "from cifar10 import pre_data" does not trigger a full data load on import
    pre_data()
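To make sure the loader produces what the training script expects, a quick sanity check like the following can be run (my own addition; it assumes the paths above point at your download):

from cifar10 import pre_data

x_train, y_train, x_test, y_test = pre_data()
print(x_train.shape, y_train.shape)   # expected: (50000, 32, 32, 3) (50000,)
print(x_test.shape, y_test.shape)     # expected: (10000, 32, 32, 3) (10000,)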
A quick note on what needs changing: file1 through file6 inside the cifar10 function are the paths to the downloaded CIFAR-10 batch files, so adjust them to match wherever you saved the dataset.
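As an aside, if you would rather not manage the files and paths yourself, Keras ships its own CIFAR-10 loader that downloads the data and returns arrays in the same layout as pre_data (a minimal sketch, my own suggestion rather than what this post uses):

import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0        # same 0-1 scaling as con_data
y_train, y_test = y_train.flatten(), y_test.flatten()    # (50000,) and (10000,) integer labels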
Part 3: Building and Training VGG16
As the diagram above shows, VGG16 is one straight path from input to output, so I simply wrote the code following the structure top to bottom.
But!!!
VGG16's native input size is 224x224, while CIFAR-10 images are only 32x32. If you feed 32x32 straight in, the feature map shrinks to 1x1 by the time it reaches the fully connected layers. So I modified VGG16 a little: I removed three of the max-pooling layers in the middle, which leaves an 8x8 feature map going into the fully connected layers. Feel free to experiment with changing other parts of the structure.
The other way to make the input fit is to change the image size itself, i.e. resize 32x32 up to 224x224. I tried that: it does run, but the data becomes so large that training is basically impractical (see the sketch below for one way around this).
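If you do want to try the 224x224 route, one way to avoid materializing a huge resized array (my own suggestion, not something the code in this post does) is to upsample each batch on the fly inside the model:

import tensorflow as tf
from tensorflow.keras.layers import Input, Lambda

image_input = Input(shape=(32, 32, 3))
# Resize batch by batch instead of building a (50000, 224, 224, 3) array up front
x = Lambda(lambda t: tf.image.resize(t, (224, 224)))(image_input)
# ... the standard VGG16 stack (with all five poolings) would then follow on x ...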
import numpy as np
from tensorflow.keras.layers import Input
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.imagenet_utils import preprocess_input
from tensorflow.keras.applications.imagenet_utils import decode_predictions
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, BatchNormalization
from tensorflow.keras import Model
from openpyxl import load_workbook
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from cifar10 import pre_data
# weights_path = 'E:/deeplearning/code/vgg16_weights_tf_dim_ordering_tf_kernels.h5'
def vgg16():
    image_input = Input(shape=(32, 32, 3))
    # Block 1: two 64-filter 3*3 convolutions (the max pooling is removed here)
    x = Conv2D(64, (3, 3), activation='relu', padding='same', name='conv1')(image_input)
    x = BatchNormalization(axis=3)(x)
    x = Conv2D(64, (3, 3), activation='relu', padding='same', name='conv2')(x)
    x = BatchNormalization(axis=3)(x)
    # x = MaxPooling2D((2, 2), strides=(2, 2), name='pool1')(x)
    # Output here: 32*32*64 (it would be 112*112*64 in the original 224-input VGG16)
    # Block 2: two 128-filter 3*3 convolutions and one max pooling
    x = Conv2D(128, (3, 3), activation='relu', padding='same', name='conv3')(x)
    x = BatchNormalization(axis=3)(x)
    x = Conv2D(128, (3, 3), activation='relu', padding='same', name='conv4')(x)
    x = BatchNormalization(axis=3)(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='pool2')(x)
    # Output here: 16*16*128
    # Block 3: three 256-filter 3*3 convolutions (the max pooling is removed here)
    x = Conv2D(256, (3, 3), activation='relu', padding='same', name='conv5')(x)
    x = BatchNormalization(axis=3)(x)
    x = Conv2D(256, (3, 3), activation='relu', padding='same', name='conv6')(x)
    x = BatchNormalization(axis=3)(x)
    x = Conv2D(256, (3, 3), activation='relu', padding='same', name='conv7')(x)
    x = BatchNormalization(axis=3)(x)
    # x = MaxPooling2D((2, 2), strides=(2, 2), name='pool3')(x)
    # Output here: 16*16*256
    # Block 4: three 512-filter 3*3 convolutions and one max pooling
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='conv8')(x)
    x = BatchNormalization(axis=3)(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='conv9')(x)
    x = BatchNormalization(axis=3)(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='conv10')(x)
    x = BatchNormalization(axis=3)(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='pool4')(x)
    # Output here: 8*8*512
    # Block 5: three 512-filter 3*3 convolutions (the max pooling is removed here)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='conv11')(x)
    x = BatchNormalization(axis=3)(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='conv12')(x)
    x = BatchNormalization(axis=3)(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='conv13')(x)
    x = BatchNormalization(axis=3)(x)
    # x = MaxPooling2D((2, 2), strides=(2, 2), name='pool5')(x)
    # Output here: 8*8*512
    # Two 4096-unit fully connected layers, then the 10-class softmax
    x = Flatten(name='flatten')(x)
    x = Dense(4096, activation='relu', name='fc1')(x)
    x = Dense(4096, activation='relu', name='fc2')(x)
    x = Dense(10, activation='softmax', name='fc3')(x)
    # Build and summarize the model
    model = Model(inputs=image_input, outputs=x)
    model.summary()
    # model.load_weights(weights_path)
    return model
if __name__ == '__main__':
    train_images, train_labels, test_images, test_labels = pre_data()
    model = vgg16()
    # img_path = 'C:/Users/00/Desktop/dog.jpg'
    # img = image.load_img(img_path, target_size=(224, 224, 3))
    # x = image.img_to_array(img)
    # x = np.expand_dims(img, axis=0)
    # x = preprocess_input(x)
    # print('Input image shape:', x.shape)
    # preds = model.predict(x)
    # print('Predicted:', decode_predictions(preds))
    # train_datagen = ImageDataGenerator(rescale=1. / 255, rotation_range=40, width_shift_range=0.2,
    #                                    height_shift_range=0.2,
    #                                    shear_range=0.2, zoom_range=0.2, horizontal_flip=True, fill_mode='nearest')
    # validation_datagen = ImageDataGenerator(rescale=1. / 255)
    # train = train_datagen.flow(train_images, train_labels, batch_size=32, shuffle=True)
    # test = validation_datagen.flow(test_images, test_labels, batch_size=32)
    # Use 'accuracy' (rather than 'acc') so it matches the history keys read below
    model.compile(optimizer='adam', loss="sparse_categorical_crossentropy", metrics=['accuracy'])
    history = model.fit(train_images,
                        train_labels,
                        batch_size=32,
                        epochs=100,
                        validation_data=(test_images, test_labels),
                        validation_batch_size=32,
                        verbose=1)
    model.save_weights('E:/deeplearning/code/vgg16.h5')
    acc = history.history['accuracy']
    val_acc = history.history['val_accuracy']
    loss = history.history['loss']
    val_loss = history.history['val_loss']
    # Write the four training curves into an existing spreadsheet, one column each
    wb = load_workbook('C:/Users/00/Desktop/deeplearning.xlsx')
    sheets = wb.worksheets  # all sheets in the workbook
    sheet1 = sheets[0]      # the first sheet
    for index, item in enumerate(acc):
        sheet1.cell(row=index + 1, column=2).value = item
    for index, item in enumerate(val_acc):
        sheet1.cell(row=index + 1, column=3).value = item
    for index, item in enumerate(loss):
        sheet1.cell(row=index + 1, column=4).value = item
    for index, item in enumerate(val_loss):
        sheet1.cell(row=index + 1, column=5).value = item
    wb.save('C:/Users/00/Desktop/deeplearning.xlsx')
A brief walkthrough of the code.
The vgg16 function is the main body of the VGG16 model.
I changed the model input to 32x32 and the final softmax output to the 10 classes of CIFAR-10.
The model uses batch normalization; this version does not use dropout, so feel free to add it yourself, as shown in the sketch below.
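As a sketch of one common option (not something the code above does), dropout is usually placed after the two 4096-unit fully connected layers:

from tensorflow.keras.layers import Dropout

x = Dense(4096, activation='relu', name='fc1')(x)
x = Dropout(0.5)(x)    # randomly drop half the activations during training
x = Dense(4096, activation='relu', name='fc2')(x)
x = Dropout(0.5)(x)
x = Dense(10, activation='softmax', name='fc3')(x)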
As for the big grayed-out block of commented code, part of it is the data-augmentation setup. I left it disabled because with augmentation turned on the model performed much worse.
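If you do want to experiment with the commented-out augmentation, one thing worth checking first (my own guess, not something verified in this post): con_data already divides the images by 255, so the rescale=1. / 255 in those generators would normalize the data a second time, which by itself could hurt the results. A sketch without the extra rescale:

train_datagen = ImageDataGenerator(rotation_range=15, width_shift_range=0.1,
                                   height_shift_range=0.1, horizontal_flip=True)
train_gen = train_datagen.flow(train_images, train_labels, batch_size=32, shuffle=True)
history = model.fit(train_gen,
                    epochs=100,
                    validation_data=(test_images, test_labels),
                    verbose=1)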
The last part of the code, from the load_workbook call onwards, saves the training results to an Excel file.
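If you do not have a spreadsheet prepared, a simpler alternative (my own suggestion) is to dump the whole history to a CSV with pandas; the path here is just an example:

import pandas as pd

pd.DataFrame(history.history).to_csv('E:/deeplearning/code/vgg16_history.csv', index=False)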
Part 4: Results
After 100 epochs, training accuracy is close to 100% and test accuracy is a bit above 80%. Since no regularization was used, the model overfits.
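Besides dropout, a cheap mitigation to try is early stopping, so training halts once the validation loss stops improving (a sketch of one option, not what was actually run here):

from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
history = model.fit(train_images, train_labels, batch_size=32, epochs=100,
                    validation_data=(test_images, test_labels),
                    callbacks=[early_stop], verbose=1)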
The results are shown below (the chart is a bit ugly; it was made with a WPS spreadsheet):