CNN Object Classification and Recognition (with MATLAB)
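1. Dataset: STL-10
1.1 Dataset overview
This project uses the STL-10 dataset. STL-10 is modeled on the CIFAR-10 dataset with some modifications: compared with CIFAR-10, each class has fewer training images, and the images have a higher resolution (96×96).
The dataset has 10 classes: airplane, bird, car, cat, deer, dog, horse, monkey, ship, and truck. The training set contains 500 images per class, and the test set contains 800 images per class.
Official STL-10 dataset link: STL-10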
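As a quick check of the figures above, the sketch below works out how many labeled images each split holds and how large the raw pixel data is. It assumes one unsigned byte per channel value and no file header; the function name `split_summary` is only illustrative.

```python
# Dataset figures quoted in the text above.
NUM_CLASSES = 10
TRAIN_PER_CLASS = 500
TEST_PER_CLASS = 800
HEIGHT, WIDTH, CHANNELS = 96, 96, 3


def split_summary(images_per_class):
    """Return (image count, raw size in bytes) for one labeled split."""
    n_images = NUM_CLASSES * images_per_class
    # Assumption: one uint8 per channel value, no header.
    bytes_per_image = HEIGHT * WIDTH * CHANNELS
    return n_images, n_images * bytes_per_image


train_images, train_bytes = split_summary(TRAIN_PER_CLASS)
test_images, test_bytes = split_summary(TEST_PER_CLASS)
print("train:", train_images, "images,", train_bytes, "bytes")
print("test: ", test_images, "images,", test_bytes, "bytes")
```

If the assumptions hold, these byte counts should match the sizes of the downloaded image files, which is an easy way to detect a truncated download.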
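1.2 Dataset processing
The downloaded dataset is stored as binary .bin files and has to be converted into image files.
In PyCharm, create a directory named binConvert and copy the downloaded stl10_binary folder into it.
Create a new file, convert.py.
Because the dataset was already downloaded, the download part of the code is commented out. If you have not downloaded it yet, uncomment that part to download the dataset.
Generating the training-set images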
convert.py
from __future__ import print_function

import errno
import os
import sys
import tarfile

import numpy as np
import matplotlib.pyplot as plt

if sys.version_info >= (3, 0, 0):
    import urllib.request as urllib  # ugly but works
else:
    import urllib

try:
    from imageio import imsave
except ImportError:
    # fall back to the older SciPy helper if imageio is not installed
    from scipy.misc import imsave

print(sys.version_info)

# image shape
HEIGHT = 96
WIDTH = 96
DEPTH = 3
# size of a single image in bytes
SIZE = HEIGHT * WIDTH * DEPTH
# path to the directory with the data
DATA_DIR = './stl10_binary'
# url of the binary data
DATA_URL = 'http://ai.stanford.edu/~acoates/stl10/stl10_binary.tar.gz'
# path to the binary train file with image data
DATA_PATH = './stl10_binary/train_X.bin'
# path to the binary train file with labels
LABEL_PATH = './stl10_binary/train_y.bin'


def read_labels(path_to_labels):
    """
    :param path_to_labels: path to the binary file containing labels from the STL-10 dataset
    :return: an array containing the labels
    """
    with open(path_to_labels, 'rb') as f:
        labels = np.fromfile(f, dtype=np.uint8)
        return labels


def read_all_images(path_to_data):
    """
    :param path_to_data: the file containing the binary images from the STL-10 dataset
    :return: an array containing all the images
    """
    with open(path_to_data, 'rb') as f:
        # read whole file in uint8 chunks
        everything = np.fromfile(f, dtype=np.uint8)

        # We force the data into 3x96x96 chunks, since the
        # images are stored in "column-major order", meaning
        # that "the first 96*96 values are the red channel,
        # the next 96*96 are green, and the last are blue."
        # The -1 is since the size of the pictures depends
        # on the input file, and this way numpy determines
        # the size on its own.
        images = np.reshape(everything, (-1, 3, 96, 96))

        # Now transpose the images into a standard image format
        # readable by, for example, matplotlib.imshow
        # You might want to comment this line or reverse the shuffle
        # if you will use a learning algorithm like CNN, since they like
        # their channels separated.
        images = np.transpose(images, (0, 3, 2, 1))
        return images


def read_single_image(image_file):
    """
    CAREFUL! - this method uses a file as input instead of the path - so the
    position of the reader will be remembered outside of context of this method.
    :param image_file: the open file containing the images
    :return: a single image
    """
    # read a single image, count determines the number of uint8's to read
    image = np.fromfile(image_file, dtype=np.uint8, count=SIZE)
    # force into image matrix
    image = np.reshape(image, (3, 96, 96))
    # transpose to standard format
    # You might want to comment this line or reverse the shuffle
    # if you will use a learning algorithm like CNN, since they like
    # their channels separated.
    image = np.transpose(image, (2, 1, 0))
    return image


def plot_image(image):
    """
    :param image: the image to be plotted in a 3-D matrix format
    :return: None
    """
    plt.imshow(image)
    plt.show()


def save_image(image, name):
    imsave("%s.png" % name, image, format="png")


# def download_and_extract():
#     """
#     Download and extract the STL-10 dataset
#     :return: None
#     """
#     dest_directory = DATA_DIR
#     if not os.path.exists(dest_directory):
#         os.makedirs(dest_directory)
#     filename = DATA_URL.split('/')[-1]
#     filepath = os.path.join(dest_directory, filename)
#     if not os.path.exists(filepath):
#         def _progress(count, block_size, total_size):
#             sys.stdout.write('\rDownloading %s %.2f%%' % (filename,
#                 float(count * block_size) / float(total_size) * 100.0))
#             sys.stdout.flush()
#
#         filepath, _ = urllib.urlretrieve(DATA_URL, filepath, reporthook=_progress)
#         print('Downloaded', filename)
#         tarfile.open(filepath, 'r:gz').extractall(dest_directory)


def save_images(images, labels):
    print("Saving images to disk")
    for i, image in enumerate(images):
        label = labels[i]
        # one sub-directory per numeric label, e.g. ./img/1/
        directory = './img/' + str(label) + '/'
        # exist_ok=True already tolerates a directory that exists
        os.makedirs(directory, exist_ok=True)
        filename = directory + str(i)
        print(filename)
        save_image(image, filename)


if __name__ == "__main__":
    # download data if needed
    # download_and_extract()

    # test to check if the image is read correctly
    # (binary mode, because the file holds raw uint8 data)
    with open(DATA_PATH, 'rb') as f:
        image = read_single_image(f)
        plot_image(image)

    # test to check if the whole dataset is read correctly
    images = read_all_images(DATA_PATH)
    print(images.shape)

    labels = read_labels(LABEL_PATH)
    print(labels.shape)

    # save images to disk
    save_images(images, labels)
Running the code creates an img folder containing 10 subfolders, one per class, each holding 500 training images.
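The reshape-then-transpose step in read_all_images is easy to get wrong, so here is a small self-contained check on synthetic data. It is only a sketch: the helper names `encode_column_major` and `decode` are hypothetical, and the tiny 4×3 image size stands in for 96×96 so that a mix-up between height and width would be caught.

```python
import numpy as np

H, W, C = 4, 3, 3  # tiny stand-in for 96 x 96 x 3


def encode_column_major(image):
    """Flatten an H x W x C image channel by channel, each channel column by column."""
    # (H, W, C) -> (C, W, H); a C-order ravel then walks down each column first
    return np.transpose(image, (2, 1, 0)).ravel()


def decode(buffer, height, width):
    """Same reshape + transpose as read_all_images, with the size as a parameter."""
    images = np.reshape(buffer, (-1, 3, width, height))
    return np.transpose(images, (0, 3, 2, 1))


rng = np.random.default_rng(0)
originals = rng.integers(0, 256, size=(2, H, W, C), dtype=np.uint8)
buffer = np.concatenate([encode_column_major(img) for img in originals])
decoded = decode(buffer, H, W)
print(decoded.shape)
print(np.array_equal(decoded, originals))
```

With square 96×96 images the two middle axes have the same length, so a wrong transpose would not raise an error; it would just produce rotated or mirrored pictures. Viewing one decoded image, as the script does with plot_image, remains a useful sanity check.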
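Generating the test-set images
Save the generated training images elsewhere, then delete the img folder, because the test-set images are written to the same path.
Change the DATA_PATH and LABEL_PATH definitions in the code so that they point to the test files:
# path to the binary test file with image data
DATA_PATH = './stl10_binary/test_X.bin'
# path to the binary test file with labels
LABEL_PATH = './stl10_binary/test_y.bin'
Run the script again. It creates an img folder containing 10 subfolders, one per class, each holding 800 test images.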
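The script names each subfolder after its numeric label. Manually rename the subfolders of both the training set and the test set to the corresponding class names.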
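The manual renaming can also be scripted. The sketch below assumes that labels run from 1 to 10 and follow the order of the class list given in section 1.1; check this against the class_names.txt file in the dataset folder (if present) before relying on it. The function name `rename_label_folders` is hypothetical.

```python
import os

# Assumed order: label k (1-based) maps to the k-th name in this list.
CLASS_NAMES = ["airplane", "bird", "car", "cat", "deer",
               "dog", "horse", "monkey", "ship", "truck"]


def rename_label_folders(root):
    """Rename root/1 ... root/10 to the matching class names; return the renamed pairs."""
    renamed = []
    for label, name in enumerate(CLASS_NAMES, start=1):
        src = os.path.join(root, str(label))
        dst = os.path.join(root, name)
        # Skip folders that are missing or were already renamed.
        if os.path.isdir(src) and not os.path.exists(dst):
            os.rename(src, dst)
            renamed.append((str(label), name))
    return renamed


if __name__ == "__main__":
    for old, new in rename_label_folders("./img"):
        print(old, "->", new)
```

Running it a second time on the same folder does nothing, so it is safe to apply to the training and test folders one after the other.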
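With the dataset prepared, the next step is the network design.
2. CNN network design
Entering net = vgg16 or net = resnet18 in the MATLAB Command Window displays the corresponding existing network object. If the network is not installed yet, follow the prompt to add it.
Entering deepNetworkDesigner in the MATLAB Command Window opens the Deep Network Designer window, where you can load an existing network or design your own from scratch.
Once the architecture is finished, click the "Analyze" button to check it automatically and inspect the network structure.
Back in the designer window, click the Export button and choose to export the network-structure code. Select and copy that code into a new function file so that the network can be built by calling the function.
2.1 Modifying VggNet
The original VggNet has too many layers and is overly complex for this dataset: it overfits, and the accuracy is low.
The network is therefore redesigned and simplified from the original VggNet by reducing the number of layers and adding several dropout layers. The resulting model suits this dataset better and reaches a higher accuracy.
In addition, the input layer is changed to accept 96×96 images, and data augmentation is applied.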
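The augmentation used in the networks below is the 'randfliplr' option of the input layer, which, as far as I know, mirrors each training image left to right with a 50% chance. The NumPy sketch below illustrates that idea only; it is not MATLAB's implementation, and the function name `random_flip_lr` and the parameter `p` are hypothetical.

```python
import numpy as np


def random_flip_lr(batch, rng, p=0.5):
    """Flip each H x W x C image in an N x H x W x C batch left-right with probability p."""
    flip = rng.random(batch.shape[0]) < p
    out = batch.copy()
    # Reverse the width axis of the selected images only.
    out[flip] = out[flip][:, :, ::-1, :]
    return out, flip


rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(8, 96, 96, 3), dtype=np.uint8)
augmented, flipped = random_flip_lr(batch, rng)
print("flipped images:", int(flipped.sum()), "of", len(batch))
```

A horizontal flip suits these classes because a mirrored airplane or cat is still a plausible example of the same class, which would not hold for data such as text or digits.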
get_vggnet.m
function [layers, lgraph] = get_vggnet()
layers = [
    imageInputLayer([96 96 3],'Name','imageinput','DataAugmentation','randfliplr')

    convolution2dLayer([5 5],64,'Name','conv_1','Padding','same')
    batchNormalizationLayer('Name','bn_1')
    reluLayer('Name','relu_1')
    maxPooling2dLayer([2 2],'Name','maxpool_1','Padding','same','Stride',[2 2])

    convolution2dLayer([5 5],128,'Name','conv_2','Padding','same')
    batchNormalizationLayer('Name','bn_2')
    reluLayer('Name','relu_2')
    maxPooling2dLayer([2 2],'Name','maxpool_2','Padding','same','Stride',[2 2])

    convolution2dLayer([5 5],128,'Name','conv_3','Padding','same')
    batchNormalizationLayer('Name','bn_3')
    reluLayer('Name','relu_3')
    dropoutLayer(0.4,'Name','dp_1')
    maxPooling2dLayer([2 2],'Name','maxpool_3','Padding','same','Stride',[2 2])

    convolution2dLayer([5 5],256,'Name','conv_4','Padding','same')
    batchNormalizationLayer('Name','bn_4')
    reluLayer('Name','relu_4')
    dropoutLayer(0.4,'Name','dp_2')
    maxPooling2dLayer([2 2],'Name','maxpool_4','Padding','same','Stride',[2 2])

    convolution2dLayer([5 5],256,'Name','conv_5','Padding','same')
    batchNormalizationLayer('Name','bn_5')
    reluLayer('Name','relu_5')
    dropoutLayer(0.4,'Name','dp_3')
    maxPooling2dLayer([2 2],'Name','maxpool_5','Padding','same','Stride',[2 2])

    dropoutLayer(0.5,'Name','dp_4')
    fullyConnectedLayer(512,'Name','fc_1')
    reluLayer('Name','relu_6')
    fullyConnectedLayer(512,'Name','fc_2')
    reluLayer('Name','relu_7')
    dropoutLayer(0.5,'Name','dp_5')
    fullyConnectedLayer(10,'Name','fc_3')
    softmaxLayer('Name','softmax')
    classificationLayer('Name','classoutput')];
lgraph = layerGraph(layers);
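To see what the five pooling stages do to a 96×96 input, the following sketch traces the feature-map size through get_vggnet. It assumes that a 'same'-padded layer with stride s turns a side length n into ceil(n / s); the helper name `same_output` is hypothetical.

```python
import math


def same_output(size, stride):
    """Spatial size after a 'same'-padded layer: ceil(size / stride)."""
    return math.ceil(size / stride)


size = 96
channels = [64, 128, 128, 256, 256]  # filters of conv_1 ... conv_5
for i, ch in enumerate(channels, start=1):
    # each 'same' convolution keeps the size; each 2x2 stride-2 max pool halves it
    size = same_output(size, 2)
    print("after maxpool_%d: %d x %d x %d" % (i, size, size, ch))

fc_inputs = size * size * channels[-1]
fc1_weights = fc_inputs * 512
print("inputs to fc_1:", fc_inputs)
print("weights in fc_1:", fc1_weights)
```

Under these assumptions the map shrinks from 96 to 3 pixels per side, so fc_1 sees a modest number of inputs even though the input images are much larger than CIFAR-10's.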
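2.2 Modifying ResNet
Several dropout layers are added to the original ResNet. This reduces overfitting to some extent and improves accuracy.
In addition, the input layer is changed to accept 96×96 images, and data augmentation is applied.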
get_resnet.m
function [layers, lgraph] = get_resnet()
netWidth = 16;
layers = [
    imageInputLayer([96 96 3],'Name','input','DataAugmentation','randfliplr')
    convolution2dLayer(3,netWidth,'Padding','same','Name','convInp')
    batchNormalizationLayer('Name','bn_res')
    reluLayer('Name','relu_sp')

    convolutionalUnit(netWidth,1,'conv_sa1')
    additionLayer(2,'Name','add_11')
    reluLayer('Name','relu_11')
    convolutionalUnit(netWidth,1,'conv_sa2')
    additionLayer(2,'Name','add_12')
    reluLayer('Name','relu_12')
    dropoutLayer(0.4,'Name','dp_1')

    convolutionalUnit(2*netWidth,2,'conv_sc1')
    additionLayer(2,'Name','add_21')
    reluLayer('Name','relu_21')
    convolutionalUnit(2*netWidth,1,'conv_sc2')
    additionLayer(2,'Name','add_22')
    reluLayer('Name','relu_22')
    dropoutLayer(0.4,'Name','dp_2')

    convolutionalUnit(4*netWidth,2,'conv_se1')
    additionLayer(2,'Name','add_31')
    reluLayer('Name','relu_31')
    convolutionalUnit(4*netWidth,1,'conv_se2')
    additionLayer(2,'Name','add_32')
    reluLayer('Name','relu_32')
    dropoutLayer(0.4,'Name','dp_3')

    averagePooling2dLayer(8,'Name','globalPool')
    dropoutLayer(0.5,'Name','dp_4')
    fullyConnectedLayer(10,'Name','fcFinal')
    softmaxLayer('Name','softmax')
    classificationLayer('Name','classoutput')
    ];
lgraph = layerGraph(layers);

% Identity shortcuts inside the first stage
lgraph = connectLayers(lgraph,'relu_sp','add_11/in2');
lgraph = connectLayers(lgraph,'relu_11','add_12/in2');

% Projection shortcut into the second stage (1x1 convolution, stride 2)
skip1 = [
    convolution2dLayer(1,2*netWidth,'Stride',2,'Name','skipConv1')
    batchNormalizationLayer('Name','skipBN1')];
lgraph = addLayers(lgraph,skip1);
lgraph = connectLayers(lgraph,'relu_12','skipConv1');
lgraph = connectLayers(lgraph,'skipBN1','add_21/in2');
lgraph = connectLayers(lgraph,'relu_21','add_22/in2');

% Projection shortcut into the third stage (1x1 convolution, stride 2)
skip2 = [
    convolution2dLayer(1,4*netWidth,'Stride',2,'Name','skipConv2')
    batchNormalizationLayer('Name','skipBN2')];
lgraph = addLayers(lgraph,skip2);
lgraph = connectLayers(lgraph,'relu_22','skipConv2');
lgraph = connectLayers(lgraph,'skipBN2','add_31/in2');
lgraph = connectLayers(lgraph,'relu_31','add_32/in2');
layers = lgraph.Layers;

function layers = convolutionalUnit(numF,stride,tag)
% Two 3x3 convolutions with batch normalization; only the first one is strided.
layers = [
    convolution2dLayer(3,numF,'Padding','same','Stride',stride,'Name',[tag,'conv1'])
    batchNormalizationLayer('Name',[tag,'BN1'])
    reluLayer('Name',[tag,'relu1'])
    convolution2dLayer(3,numF,'Padding','same','Name',[tag,'conv2'])
    batchNormalizationLayer('Name',[tag,'BN2'])];
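The spatial sizes in get_resnet can be traced by hand in the same way. The sketch below assumes that 'same'-padded convolutions give ceil(n / stride), that unpadded layers give floor((n - window) / stride) + 1, and that averagePooling2dLayer(8) uses its default stride of 1. The helper names are hypothetical.

```python
def conv_same(size, stride):
    """'same'-padded convolution: output size is ceil(size / stride)."""
    return -(-size // stride)


def unpadded(size, window, stride=1):
    """Unpadded convolution or pooling: floor((size - window) / stride) + 1."""
    return (size - window) // stride + 1


net_width = 16
stage1_size = conv_same(96, 1)           # convInp and stage 1, stride 1
stage2_size = conv_same(stage1_size, 2)  # conv_sc1 has stride 2
stage3_size = conv_same(stage2_size, 2)  # conv_se1 has stride 2

# The 1x1 stride-2 shortcut convolutions must match the main branch sizes,
# otherwise the addition layers could not add their two inputs.
skip1_size = unpadded(stage1_size, 1, 2)
skip2_size = unpadded(stage2_size, 1, 2)

pooled = unpadded(stage3_size, 8)        # averagePooling2dLayer(8), stride 1 assumed
fc_inputs = pooled * pooled * 4 * net_width
print("stage sizes:", stage1_size, stage2_size, stage3_size)
print("shortcut sizes:", skip1_size, skip2_size)
print("after average pooling:", pooled, "x", pooled)
print("inputs to fcFinal:", fc_inputs)
```

If these assumptions are right, the layer named globalPool does not reduce the map to 1×1 for a 96×96 input. An 8×8 window would be global for 32×32 images such as CIFAR-10, whereas here a 24×24 window would be needed. Whether changing it helps accuracy is not something this sketch can say.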
3. Training
3.1 Training with the modified VggNet
options_train settings
options_train = trainingOptions('sgdm', ...
    'MaxEpochs', MaxEpochs, ...
    'InitialLearnRate', 0.01, ...
    'L2Regularization', 0.01, ...
    'Verbose', true, ...
    'MiniBatchSize', 128, ...
    'Shuffle', 'every-epoch', ...
    'Plots', 'training-progress', ...
    'ValidationData', handles.augimdsValidation, ...
    'ValidationFrequency', 10, ...
    'ExecutionEnvironment', ExecutionEnvironment);
Parameter | Value |
---|---|
Training epochs | 100 |
Learning rate | 0.01 |
Mini-batch size | 128 |
L2 regularization factor | 0.01 |

Training result: accuracy 72.55%
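For readers unfamiliar with the 'sgdm' solver, the sketch below shows one common form of stochastic gradient descent with momentum and an L2 penalty, using the learning rate and regularization factor from the table. The momentum value 0.9 is an assumption (the options above do not set it), and the function name `sgdm_step` is hypothetical. It illustrates the update rule and is not MATLAB's internal code.

```python
import numpy as np


def sgdm_step(w, grad, velocity, lr=0.01, momentum=0.9, l2=0.01):
    """One SGD-with-momentum update with an L2 penalty (l2 / 2) * ||w||^2 added to the loss."""
    total_grad = grad + l2 * w  # the penalty contributes l2 * w to the gradient
    velocity = momentum * velocity - lr * total_grad
    return w + velocity, velocity


# Toy problem: minimise 0.5 * ||w - target||^2, whose gradient is (w - target).
target = np.array([1.0, -2.0])
w = np.zeros(2)
v = np.zeros(2)
first_w, _ = sgdm_step(w, w - target, v)
for _ in range(500):
    w, v = sgdm_step(w, w - target, v)
print("after one step:", first_w)
print("after 500 steps:", w)
```

On this toy problem the weights settle slightly short of the target, at target / (1 + l2), which shows how the L2 term pulls the weights towards zero.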
3.2 Training with the modified ResNet
options_train settings
options_train = trainingOptions('sgdm', ...
    'MaxEpochs', MaxEpochs, ...
    'InitialLearnRate', 0.001, ...
    'L2Regularization', 0.01, ...
    'Verbose', true, ...
    'MiniBatchSize', 128, ...
    'Shuffle', 'every-epoch', ...
    'Plots', 'training-progress', ...
    'ValidationData', handles.augimdsValidation, ...
    'ValidationFrequency', 10, ...
    'ExecutionEnvironment', ExecutionEnvironment);
Parameter | Value |
---|---|
Training epochs | 100 |
Learning rate | 0.001 |
Mini-batch size | 128 |
L2 regularization factor | 0.01 |

Training result: accuracy 65.36%
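The settings above also determine how long a run is in iterations. The sketch below assumes that all 5000 training images are used for training and that an incomplete final mini-batch is dropped in each epoch; neither is stated in the text. The function name `training_schedule` is hypothetical.

```python
def training_schedule(num_images, batch_size, epochs, validation_frequency):
    """Return (iterations per epoch, total iterations, approximate validation runs)."""
    # Assumption: an incomplete final mini-batch is dropped each epoch.
    per_epoch = num_images // batch_size
    total = per_epoch * epochs
    return per_epoch, total, total // validation_frequency


per_epoch, total, validations = training_schedule(5000, 128, 100, 10)
print("iterations per epoch:", per_epoch)
print("total iterations:", total)
print("validation runs (approx.):", validations)
```

Validating every 10 iterations means the validation set is evaluated several times per epoch, so raising ValidationFrequency is one way to shorten a run.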
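4. GitHub code and additional notes
The code is available on GitHub: CNNItemRec-MATLAB
My GPU is a GTX 1050 with 2 GB of memory, which limits the network design. With a more complex architecture (for example, more dropout layers) or larger parameters such as more convolution filters, training fails with an out-of-memory error.
With enough GPU memory, you can increase the number of convolution filters and add more dropout layers to obtain a better model with higher accuracy.
Note: if a low-end GPU reports an out-of-memory error, reduce MiniBatchSize, or reduce the parameters of the convolution2dLayer and fullyConnectedLayer layers, and reduce the number of dropout layers.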
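A rough estimate shows why MiniBatchSize is the first setting to reduce. The sketch below computes the memory for the output of a single layer, conv_1 in get_vggnet, over one mini-batch. It assumes single-precision values (4 bytes each), and the function name `activation_bytes` is hypothetical.

```python
def activation_bytes(height, width, channels, batch_size, bytes_per_value=4):
    """Memory for one layer's output over a whole mini-batch (single precision assumed)."""
    return height * width * channels * batch_size * bytes_per_value


MIB = 1024 ** 2
for batch_size in (128, 64, 32):
    conv1 = activation_bytes(96, 96, 64, batch_size)  # conv_1 output in get_vggnet
    print("batch %3d: conv_1 output = %.0f MiB" % (batch_size, conv1 / MIB))
```

The batch-normalization and ReLU layers that follow conv_1 produce outputs of the same size, and training also needs memory for gradients. The first block alone can therefore take a large share of a 2 GB card at a batch size of 128. Halving the batch size halves this part of the memory use.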
For a much stronger model, see: 92.45% on CIFAR-10 in Torch
Reference book: 《计算机视觉与深度学习实战》 (Computer Vision and Deep Learning in Practice)