CNN Object Classification and Recognition (MATLAB-based)

slaythedragon

First published 2020-12-17 14:39:22, last modified 2023-05-13 10:56:24

Column: Deep Learning. Tags: deep learning, neural networks, computer vision, matlab

Article link: https://blog.csdn.net/kkill_youokk_/article/details/109842140


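1. Dataset: STL-10
- 1.1 Dataset overview
- 1.2 Dataset preprocessing
2. CNN network design
- 2.1 Modifying VggNet
- 2.2 Modifying ResNet
3. Training
- 3.1 Training with the modified VggNet
- 3.2 Training with the modified ResNet
4. GitHub code and additional notes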

1. Dataset: STL-10

1.1 Dataset overview

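We use the STL-10 dataset. STL-10 is derived from CIFAR-10 with some modifications: compared with CIFAR-10, it has fewer labeled training images per class, and its images have a higher resolution (96×96).
The dataset has 10 classes: airplane, bird, car, cat, deer, dog, horse, monkey, ship, and truck. The training set contains 500 images per class and the test set contains 800 images per class.
Official STL-10 dataset page: STL-10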
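The binary files are plain fixed-size records, so the label mapping and the expected file sizes can be checked with simple arithmetic. A minimal Python sketch, assuming (as the STL-10 documentation states) that the labels in train_y.bin and test_y.bin run from 1 to 10 in the class order listed above; CLASS_NAMES, label_to_name and expected_file_sizes are illustrative names, not part of the original code:

```python
# Assumption: STL-10 label files store 1-based class indices (1..10)
# in the same order as the class list above.
CLASS_NAMES = ["airplane", "bird", "car", "cat", "deer",
               "dog", "horse", "monkey", "ship", "truck"]

HEIGHT, WIDTH, DEPTH = 96, 96, 3
BYTES_PER_IMAGE = HEIGHT * WIDTH * DEPTH  # one uint8 per pixel per channel


def label_to_name(label):
    """Map a 1-based STL-10 label to its class name."""
    if not 1 <= label <= len(CLASS_NAMES):
        raise ValueError(f"label out of range: {label}")
    return CLASS_NAMES[label - 1]


def expected_file_sizes(images_per_class):
    """Expected byte sizes of the X/y .bin pair for one split."""
    n = images_per_class * len(CLASS_NAMES)
    return {"X.bin": n * BYTES_PER_IMAGE, "y.bin": n}  # 1 byte per label


if __name__ == "__main__":
    print(label_to_name(1), label_to_name(10))  # airplane truck
    print(expected_file_sizes(500))  # {'X.bin': 138240000, 'y.bin': 5000}
    print(expected_file_sizes(800))  # {'X.bin': 221184000, 'y.bin': 8000}
```

Comparing these numbers with the sizes of the downloaded train_X.bin and test_X.bin is a quick way to detect a truncated download before converting anything.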

1.2 Dataset preprocessing

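The downloaded dataset is in binary (.bin) form, so it first has to be converted into image files.
In PyCharm, create a directory named binConvert and copy the downloaded stl10_binary folder into it.
Create a new file convert.py.
Since the dataset had already been downloaded, the download code (download_and_extract and its call in __main__) is commented out; if you have not downloaded the dataset yet, uncomment that part to download it.
Generating the training-set images
convert.py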

from __future__ import print_function

import os, sys, tarfile, errno
import numpy as np
import matplotlib.pyplot as plt

if sys.version_info >= (3, 0, 0):
    import urllib.request as urllib  # ugly but works
else:
    import urllib

try:
    from imageio import imsave
except ImportError:
    # fallback for old environments only: scipy.misc.imsave was removed in SciPy 1.2
    from scipy.misc import imsave

print(sys.version_info)

# image shape
HEIGHT = 96
WIDTH = 96
DEPTH = 3

# size of a single image in bytes
SIZE = HEIGHT * WIDTH * DEPTH

# path to the directory with the data
DATA_DIR = './stl10_binary'

# url of the binary data
DATA_URL = 'http://ai.stanford.edu/~acoates/stl10/stl10_binary.tar.gz'

# path to the binary train file with image data
DATA_PATH = './stl10_binary/train_X.bin'

# path to the binary train file with labels
LABEL_PATH = './stl10_binary/train_y.bin'


def read_labels(path_to_labels):
    """
    :param path_to_labels: path to the binary file containing labels from the STL-10 dataset
    :return: an array containing the labels
    """
    with open(path_to_labels, 'rb') as f:
        labels = np.fromfile(f, dtype=np.uint8)
        return labels


def read_all_images(path_to_data):
    """
    :param path_to_data: the file containing the binary images from the STL-10 dataset
    :return: an array containing all the images
    """

    with open(path_to_data, 'rb') as f:
        # read whole file in uint8 chunks
        everything = np.fromfile(f, dtype=np.uint8)

        # We force the data into 3x96x96 chunks, since the
        # images are stored in "column-major order", meaning
        # that "the first 96*96 values are the red channel,
        # the next 96*96 are green, and the last are blue."
        # The -1 is since the size of the pictures depends
        # on the input file, and this way numpy determines
        # the size on its own.

        images = np.reshape(everything, (-1, 3, 96, 96))

        # Now transpose the images into a standard image format
        # readable by, for example, matplotlib.imshow
        # You might want to comment this line or reverse the shuffle
        # if you will use a learning algorithm like CNN, since they like
        # their channels separated.
        images = np.transpose(images, (0, 3, 2, 1))
        return images


def read_single_image(image_file):
    """
    CAREFUL! - this method uses a file as input instead of the path - so the
    position of the reader will be remembered outside of context of this method.
    :param image_file: the open file containing the images
    :return: a single image
    """
    # read a single image, count determines the number of uint8's to read
    image = np.fromfile(image_file, dtype=np.uint8, count=SIZE)
    # force into image matrix
    image = np.reshape(image, (3, 96, 96))
    # transpose to standard format
    # You might want to comment this line or reverse the shuffle
    # if you will use a learning algorithm like CNN, since they like
    # their channels separated.
    image = np.transpose(image, (2, 1, 0))
    return image


def plot_image(image):
    """
    :param image: the image to be plotted in a 3-D matrix format
    :return: None
    """
    plt.imshow(image)
    plt.show()


def save_image(image, name):
    imsave("%s.png" % name, image, format="png")


# def download_and_extract():
#     """
#     Download and extract the STL-10 dataset
#     :return: None
#     """
#     dest_directory = DATA_DIR
#     if not os.path.exists(dest_directory):
#         os.makedirs(dest_directory)
#     filename = DATA_URL.split('/')[-1]
#     filepath = os.path.join(dest_directory, filename)
#     if not os.path.exists(filepath):
#         def _progress(count, block_size, total_size):
#             sys.stdout.write('\rDownloading %s %.2f%%' % (filename,
#                                                           float(count * block_size) / float(total_size) * 100.0))
#             sys.stdout.flush()
#
#         filepath, _ = urllib.urlretrieve(DATA_URL, filepath, reporthook=_progress)
#         print('Downloaded', filename)
#         tarfile.open(filepath, 'r:gz').extractall(dest_directory)


def save_images(images, labels):
    print("Saving images to disk")
    # one sub-folder per label: ./img/1/ ... ./img/10/
    for i, (image, label) in enumerate(zip(images, labels)):
        directory = './img/' + str(label) + '/'
        os.makedirs(directory, exist_ok=True)  # no error if the folder already exists
        filename = directory + str(i)
        print(filename)
        save_image(image, filename)


if __name__ == "__main__":
    # download data if needed
    # download_and_extract()

    # test to check if the image is read correctly
    with open(DATA_PATH, 'rb') as f:  # binary mode is required for np.fromfile
        image = read_single_image(f)
        plot_image(image)

    # test to check if the whole dataset is read correctly
    images = read_all_images(DATA_PATH)
    print(images.shape)

    labels = read_labels(LABEL_PATH)
    print(labels.shape)

    # save images to disk
    save_images(images, labels)
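The reshape/transpose in read_all_images relies on the column-major layout described in its comments. Because STL-10 images are square, getting the axis order wrong would not raise an error; it would silently transpose every image. The sketch below (hypothetical decode_records and encode_image helpers, not part of convert.py) round-trips a small non-square synthetic image to confirm that reshaping to (N, depth, width, height) and transposing with (0, 3, 2, 1) yields (N, row, col, channel):

```python
import numpy as np


def decode_records(raw, height=96, width=96, depth=3):
    """Decode STL-10-style records into (N, height, width, depth) images.

    Each record stores the red plane, then green, then blue, and each
    plane is column-major (all rows of column 0 first). Reshaping to
    (N, depth, width, height) and transposing with (0, 3, 2, 1) gives
    (N, row, col, channel), the same steps as read_all_images.
    """
    images = np.reshape(raw, (-1, depth, width, height))
    return np.transpose(images, (0, 3, 2, 1))


def encode_image(img):
    """Inverse of decode_records for one (H, W, C) image (test helper)."""
    # channel by channel, each channel flattened column-major (Fortran order)
    planes = [img[:, :, c].flatten(order="F") for c in range(img.shape[2])]
    return np.concatenate(planes)


if __name__ == "__main__":
    # A 2x3 RGB image with distinct values; non-square on purpose.
    img = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3)
    decoded = decode_records(encode_image(img), height=2, width=3, depth=3)
    print(decoded.shape)                    # (1, 2, 3, 3)
    print(np.array_equal(decoded[0], img))  # True
```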

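Running the code produces an img folder containing 10 sub-folders, one per class (named after the numeric labels 1 to 10), each holding 500 training images.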

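Next, generate the test-set images.
First move or rename the generated img folder (for example, to train) so that the training images are kept and the test images are not written into the same folders.
Then change the DATA_PATH and LABEL_PATH definitions near the top of convert.py to: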

# path to the binary test file with image data
DATA_PATH = './stl10_binary/test_X.bin'

# path to the binary test file with labels
LABEL_PATH = './stl10_binary/test_y.bin'

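Run the script again: a new img folder is generated with 10 sub-folders, one per class, each containing 800 test images.
Manually rename the numbered sub-folders of both the training set and the test set to their class names, as shown in the figure below.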
[Figure: training-set and test-set folders renamed to class names]
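The manual renaming step can also be scripted. A minimal sketch, assuming the 1-based label order of STL-10 and assuming the two img folders were renamed to train and test (both folder names are hypothetical); rename_label_folders is an illustrative helper, not part of the original code:

```python
import os

# Class order matches the 1-based labels in the STL-10 binary files.
CLASS_NAMES = ["airplane", "bird", "car", "cat", "deer",
               "dog", "horse", "monkey", "ship", "truck"]


def rename_label_folders(root):
    """Rename sub-folders '1'..'10' under root to their class names.

    Folders that are missing or already renamed are skipped, so the
    function is safe to run more than once.
    """
    renamed = []
    for label, name in enumerate(CLASS_NAMES, start=1):
        src = os.path.join(root, str(label))
        dst = os.path.join(root, name)
        if os.path.isdir(src) and not os.path.exists(dst):
            os.rename(src, dst)
            renamed.append((str(label), name))
    return renamed


if __name__ == "__main__":
    # Hypothetical folder names for the two converted splits.
    for split in ("train", "test"):
        if os.path.isdir(split):
            print(split, rename_label_folders(split))
```

Using class names as folder names matters on the MATLAB side: an imageDatastore created with 'LabelSource','foldernames' takes its labels directly from these folder names.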
Dataset preprocessing is complete; next comes the network design.

2. CNN network design

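Entering net = vgg16 or net = resnet18 in the MATLAB Command Window returns the corresponding pretrained network; if the network's support package is not installed yet, follow the prompt to install it.
Entering deepNetworkDesigner in the Command Window opens the Deep Network Designer app, where you can load an existing network or design a new one from scratch.
Once the network is designed, click the "Analyze" button to check the network automatically and inspect its structure.
Back in the designer window, click Export and choose to export the code that builds the network; copy that code into a new function file and save it so the network can be created by calling the function directly.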

2.1 Modifying VggNet

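The original VggNet has many layers and is too complex for a dataset with only 500 training images per class: it overfits, and the accuracy is low.
The network is therefore redesigned and simplified on the basis of the original VggNet: the number of layers is reduced and several dropout layers are added. This reduces overfitting, so the model suits this dataset better and reaches higher accuracy.
In addition, the input layer is changed to 96×96×3 images, and data augmentation (random left-right flipping, 'randfliplr') is applied.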

get_vggnet.m

function [layers, lgraph] = get_vggnet()
layers = [
    imageInputLayer([96 96 3],'Name','imageinput','DataAugmentation','randfliplr')

    convolution2dLayer([5 5],64,'Name','conv_1',"Padding","same")
    batchNormalizationLayer('Name','bn_1')
    reluLayer('Name','relu_1')
    maxPooling2dLayer([2 2],'Name','maxpool_1','Padding','same','Stride',[2 2])

    convolution2dLayer([5 5],128,'Name','conv_2','Padding','same')
    batchNormalizationLayer('Name','bn_2')
    reluLayer('Name','relu_2')
    maxPooling2dLayer([2 2],'Name','maxpool_2','Padding','same','Stride',[2 2])

    convolution2dLayer([5 5],128,'Name','conv_3','Padding','same')
    batchNormalizationLayer('Name','bn_3')
    reluLayer('Name','relu_3')
    dropoutLayer(0.4,'Name','dp_1')
    maxPooling2dLayer([2 2],'Name','maxpool_3','Padding','same','Stride',[2 2])
    
    convolution2dLayer([5 5],256,'Name','conv_4','Padding','same')
    batchNormalizationLayer('Name','bn_4')
    reluLayer('Name','relu_4')
    dropoutLayer(0.4,'Name','dp_2')
    maxPooling2dLayer([2 2],'Name','maxpool_4','Padding','same','Stride',[2 2])

    convolution2dLayer([5 5],256,'Name','conv_5','Padding','same')
    batchNormalizationLayer('Name','bn_5')
    reluLayer('Name','relu_5')
    dropoutLayer(0.4,'Name','dp_3')
    maxPooling2dLayer([2 2],'Name','maxpool_5','Padding','same','Stride',[2 2])
    
    dropoutLayer(0.5,'Name','dp_4')
    fullyConnectedLayer(512,'Name','fc_1')
    reluLayer('Name','relu_6')
    fullyConnectedLayer(512,'Name','fc_2')
    reluLayer('Name','relu_7')
    dropoutLayer(0.5,'Name','dp_5')
    fullyConnectedLayer(10,'Name','fc_3')
    softmaxLayer('Name','softmax')
    classificationLayer('Name','classoutput')];

lgraph = layerGraph(layers);
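Every convolution in get_vggnet uses 'Padding','same' with stride 1, so only the five stride-2 max-pooling layers change the spatial size; with 'same' padding MATLAB gives an output size of ceil(input/stride). The Python sketch below traces the size from 96 down to 3 and derives the input width and parameter count of fc_1. The helper names are illustrative and not part of the author's code:

```python
def pooled_size(size, pool=2, stride=2, same_padding=True):
    """Output side length of a pooling layer.

    With 'same' padding the output is ceil(size / stride); without
    padding it is floor((size - pool) / stride) + 1.
    """
    if same_padding:
        return -(-size // stride)  # ceiling division
    return (size - pool) // stride + 1


def vgg_feature_sizes(input_size=96, num_blocks=5):
    """Spatial size after each conv block of get_vggnet (convs keep size)."""
    sizes = [input_size]
    for _ in range(num_blocks):
        sizes.append(pooled_size(sizes[-1]))
    return sizes


def conv_params(k, c_in, c_out):
    """Weights plus biases of a k x k convolution layer."""
    return k * k * c_in * c_out + c_out


if __name__ == "__main__":
    sizes = vgg_feature_sizes()
    print(sizes)               # [96, 48, 24, 12, 6, 3]
    flat = sizes[-1] ** 2 * 256
    print(flat)                # 2304 inputs to fc_1
    print(flat * 512 + 512)    # 1180160 parameters in fc_1
    channels = [3, 64, 128, 128, 256, 256]
    print([conv_params(5, a, b) for a, b in zip(channels, channels[1:])])
    # [4864, 204928, 409728, 819456, 1638656]
```

The trace shows why five pooling stages are about the limit for 96×96 input: a sixth stride-2 stage would reduce the 3×3 map to 2×2 (with 'same' padding) and discard most of the remaining spatial layout.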

2.2 Modifying ResNet

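Several dropout layers are added to the original ResNet design; this reduces overfitting to some extent, so the model generalizes better and reaches higher accuracy.
In addition, the input layer is changed to 96×96×3 images, and data augmentation is applied.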
get_resnet.m

function [layers, lgraph] = get_resnet()
netWidth = 16;
layers = [
    imageInputLayer([96 96 3],'Name','input','DataAugmentation','randfliplr')
    convolution2dLayer(3,netWidth,'Padding','same','Name','convInp')
    batchNormalizationLayer('Name','bn_res')
    reluLayer('Name','relu_sp')
    
    convolutionalUnit(netWidth,1,'conv_sa1')
    additionLayer(2,'Name','add_11')
    reluLayer('Name','relu_11')
    convolutionalUnit(netWidth,1,'conv_sa2')
    additionLayer(2,'Name','add_12')
    reluLayer('Name','relu_12')
    dropoutLayer(0.4,'Name','dp_1')
    
    convolutionalUnit(2*netWidth,2,'conv_sc1')
    additionLayer(2,'Name','add_21')
    reluLayer('Name','relu_21')
    convolutionalUnit(2*netWidth,1,'conv_sc2')
    additionLayer(2,'Name','add_22')
    reluLayer('Name','relu_22')
    dropoutLayer(0.4,'Name','dp_2')
    
    convolutionalUnit(4*netWidth,2,'conv_se1')
    additionLayer(2,'Name','add_31')
    reluLayer('Name','relu_31')
    convolutionalUnit(4*netWidth,1,'conv_se2')
    additionLayer(2,'Name','add_32')
    reluLayer('Name','relu_32')
    dropoutLayer(0.4,'Name','dp_3')
    
    averagePooling2dLayer(8,'Name','globalPool')
    dropoutLayer(0.5,'Name','dp_4')
    fullyConnectedLayer(10,'Name','fcFinal')
    softmaxLayer('Name','softmax')
    classificationLayer('Name','classoutput')
    ];
lgraph = layerGraph(layers);
lgraph = connectLayers(lgraph,'relu_sp','add_11/in2');
lgraph = connectLayers(lgraph,'relu_11','add_12/in2');
skip1 = [
    convolution2dLayer(1,2*netWidth,'Stride',2,'Name','skipConv1')
    batchNormalizationLayer('Name','skipBN1')];
lgraph = addLayers(lgraph,skip1);
lgraph = connectLayers(lgraph,'relu_12','skipConv1');
lgraph = connectLayers(lgraph,'skipBN1','add_21/in2');

lgraph = connectLayers(lgraph,'relu_21','add_22/in2');
skip2 = [
    convolution2dLayer(1,4*netWidth,'Stride',2,'Name','skipConv2')
    batchNormalizationLayer('Name','skipBN2')];
lgraph = addLayers(lgraph,skip2);
lgraph = connectLayers(lgraph,'relu_22','skipConv2');
lgraph = connectLayers(lgraph,'skipBN2','add_31/in2');
lgraph = connectLayers(lgraph,'relu_31','add_32/in2');

layers = lgraph.Layers;

function layers = convolutionalUnit(numF,stride,tag)
layers = [
    convolution2dLayer(3,numF,'Padding','same','Stride',stride,'Name',[tag,'conv1'])
    batchNormalizationLayer('Name',[tag,'BN1'])
    reluLayer('Name',[tag,'relu1'])
    convolution2dLayer(3,numF,'Padding','same','Name',[tag,'conv2'])
    batchNormalizationLayer('Name',[tag,'BN2'])];
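The residual shortcuts only work if each 1×1 stride-2 skip convolution produces the same spatial size as the main path, and the final pooling layer determines how many inputs fcFinal receives. The Python sketch below traces the main path of get_resnet under MATLAB's size rules (ceil(input/stride) for 'same' padding, floor((input - kernel)/stride) + 1 without padding). STAGES, conv_out and resnet_trace are illustrative names; the stage table is read off the layer list above:

```python
def conv_out(size, kernel, stride=1, same_padding=False):
    """Output side length of a conv/pooling layer (MATLAB size rules)."""
    if same_padding:
        return -(-size // stride)         # ceil(size / stride)
    return (size - kernel) // stride + 1  # no padding


# (name tag, width multiplier, stride of the first unit in the stage)
STAGES = [("conv_sa", 1, 1), ("conv_sc", 2, 2), ("conv_se", 4, 2)]


def resnet_trace(input_size=96, net_width=16, pool=8, pool_stride=1):
    """Trace (layer, side, channels) along get_resnet's main path."""
    side = conv_out(input_size, 3, 1, same_padding=True)  # convInp
    trace = [("convInp", side, net_width)]
    for tag, mult, stride in STAGES:
        new_side = conv_out(side, 3, stride, same_padding=True)
        if stride > 1:
            # the 1x1 stride-2 skip conv (no padding) must match the main path
            assert conv_out(side, 1, stride) == new_side
        side = new_side
        trace.append((tag, side, net_width * mult))
    side = conv_out(side, pool, pool_stride)  # averagePooling2dLayer(pool)
    trace.append(("globalPool", side, trace[-1][2]))
    return trace


if __name__ == "__main__":
    for name, side, ch in resnet_trace():
        print(f"{name:10s} {side}x{side}x{ch}")
    _, side, ch = resnet_trace()[-1]
    print("fcFinal inputs:", side * side * ch)  # 18496
```

The trace brings out one detail of the layer named globalPool: averagePooling2dLayer uses a default stride of 1, and after two stride-2 stages the map entering it is 24×24, so averagePooling2dLayer(8) yields a 17×17×64 output rather than a global 1×1×64 average. If true global pooling is intended, averagePooling2dLayer(24) or globalAveragePooling2dLayer would produce 1×1×64 (resnet_trace(pool=24) shows this). Changing it would of course alter the trained model, so the accuracy reported below refers to the code as written.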

3. Training

3.1 Training with the modified VggNet

Training options (options_train)

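% MaxEpochs and ExecutionEnvironment are set elsewhere in the author's GUI code;
% the values below are examples (MaxEpochs matches the table).
MaxEpochs = 100;
ExecutionEnvironment = 'auto';   % 'auto' | 'cpu' | 'gpu'
% handles.augimdsValidation: validation augmented image datastore kept in the GUI's handles struct
options_train = trainingOptions('sgdm',...
    'MaxEpochs',MaxEpochs,...
    'InitialLearnRate',0.01,...
    'L2Regularization', 0.01, ...
    'Verbose',true,'MiniBatchSize', 128,...
    'Shuffle','every-epoch',...
    'Plots','training-progress',...
    'ValidationData',handles.augimdsValidation , ...
    'ValidationFrequency',10, ...   % validate every 10 iterations
    'ExecutionEnvironment', ExecutionEnvironment);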

Parameter	Value
Max epochs (MaxEpochs)	100
Initial learning rate	0.01
Mini-batch size	128
L2 regularization factor	0.01
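trainingOptions('sgdm', ...) trains with stochastic gradient descent with momentum (MATLAB's default momentum is 0.9), and 'L2Regularization' adds λ/2·‖w‖² to the loss, i.e. λ·w to the gradient. The NumPy sketch below shows how the 0.01 learning rate and 0.01 L2 factor from the table enter one update; sgdm_step and minimize_quadratic are hypothetical helpers illustrating the update rule, not MATLAB's internal implementation:

```python
import numpy as np


def sgdm_step(w, grad, velocity, lr=0.01, momentum=0.9, l2=0.01):
    """One SGD-with-momentum step with L2 regularization.

    The regularized loss is E(w) + l2/2 * ||w||^2, so its gradient is
    grad + l2 * w. velocity holds the previous update (w_t - w_{t-1}),
    giving w_{t+1} = w_t - lr * g + momentum * (w_t - w_{t-1}).
    """
    g = grad + l2 * w
    velocity = momentum * velocity - lr * g
    return w + velocity, velocity


def minimize_quadratic(target, steps=2000, lr=0.01, momentum=0.9, l2=0.01):
    """Run sgdm_step on E(w) = 0.5 * ||w - target||^2 starting from w = 0."""
    w = np.zeros_like(target, dtype=float)
    v = np.zeros_like(w)
    for _ in range(steps):
        w, v = sgdm_step(w, w - target, v, lr, momentum, l2)  # grad E = w - target
    return w


if __name__ == "__main__":
    target = np.array([1.0, -2.0])
    # L2 pulls the optimum from target to target / (1 + l2)
    print(minimize_quadratic(target), target / 1.01)
```

On this toy problem the L2 term moves the minimizer from target to target/(1 + λ), which is the "shrink weights toward zero" effect that helps against overfitting on a small dataset like STL-10.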
Training results

Accuracy: 72.55%

3.2 Training with the modified ResNet

Training options (options_train)

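% MaxEpochs (100, see the table), ExecutionEnvironment and handles.augimdsValidation
% (validation datastore) are defined elsewhere in the author's GUI code
options_train = trainingOptions('sgdm',...
    'MaxEpochs',MaxEpochs,...
    'InitialLearnRate',0.001,...
    'L2Regularization', 0.01, ...
    'Verbose',true,'MiniBatchSize', 128,...
    'Shuffle','every-epoch',...
    'Plots','training-progress',...
    'ValidationData',handles.augimdsValidation , ...
    'ValidationFrequency',10, ...   % validate every 10 iterations
    'ExecutionEnvironment', ExecutionEnvironment);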

Parameter	Value
Max epochs (MaxEpochs)	100
Initial learning rate	0.001
Mini-batch size	128
L2 regularization factor	0.01
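Since 'ValidationFrequency' is counted in iterations rather than epochs, it helps to know how many iterations these settings produce. A small sketch, assuming all 5,000 STL-10 training images form the training set and that, as the MATLAB documentation describes for trainNetwork, the incomplete last mini-batch of each epoch is discarded; schedule is a hypothetical helper:

```python
def schedule(num_train, mini_batch, max_epochs, val_freq):
    """Iteration counts for mini-batch training that drops the last
    incomplete mini-batch of each epoch."""
    iters_per_epoch = num_train // mini_batch
    total_iters = iters_per_epoch * max_epochs
    num_validations = total_iters // val_freq  # approximate count of validation passes
    return iters_per_epoch, total_iters, num_validations


if __name__ == "__main__":
    # Assumption: 5,000 training images, settings from the tables above.
    print(schedule(5000, 128, 100, 10))  # (39, 3900, 390)
```

Under these assumptions each epoch uses 39 × 128 = 4,992 images; with 'Shuffle','every-epoch' the 8 left-over images change from epoch to epoch, so no image is permanently excluded. Validating every 10 iterations means roughly 390 validation passes over the run, which adds noticeably to training time.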
Training results

Accuracy: 65.36%

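4. GitHub code and additional notes

The code is available on GitHub: CNNItemRec-MATLAB
My GPU is a GTX 1050 with 2 GB of memory, which constrained the network design: with more complex structures (for example, more dropout layers) or more convolution filters, training fails with an out-of-memory error.
With enough GPU memory, increasing the number of convolution filters and adding more dropout layers may yield a better model with higher accuracy.
Note: if the GPU is low-end and training reports insufficient memory, reduce MiniBatchSize, reduce the number of filters in the convolution2dLayer layers and the output sizes of the hidden fullyConnectedLayer layers (the final layer must keep 10 outputs, one per class), and use fewer dropout layers.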
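To see why MiniBatchSize is the first knob to turn on a 2 GB card, a back-of-envelope estimate of forward activations for get_vggnet helps. The sketch below assumes float32 storage and that each convolution, batch-normalization and ReLU output is kept as a separate tensor; it ignores weights, gradients, pooling and dropout outputs and workspace memory, so it is only an order-of-magnitude estimate, not how MATLAB actually allocates GPU memory. VGG_CONV_OUTPUTS and activation_mb are illustrative names:

```python
BYTES_PER_FLOAT = 4  # single precision

# (height, width, channels) of each conv output in get_vggnet; under the
# stated assumption, conv, BN and ReLU each produce a tensor of this shape.
VGG_CONV_OUTPUTS = [(96, 96, 64), (48, 48, 128), (24, 24, 128),
                    (12, 12, 256), (6, 6, 256)]


def activation_mb(mini_batch, layers=VGG_CONV_OUTPUTS, tensors_per_block=3):
    """Rough forward-activation memory in MB for one mini-batch."""
    elems_per_image = sum(h * w * c for h, w, c in layers) * tensors_per_block
    return mini_batch * elems_per_image * BYTES_PER_FLOAT / 2**20


if __name__ == "__main__":
    for batch in (128, 64, 32):
        print(batch, activation_mb(batch))  # 1471.5, 735.75, 367.875
```

Under these assumptions a mini-batch of 128 already needs about 1.5 GB for these forward tensors alone, before gradients are stored, and the estimate scales linearly with the batch size, so halving MiniBatchSize roughly halves this part of the memory footprint. Widening the first convolution layers has a similar effect, since the 96×96 feature maps dominate the total.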
For much stronger results, see: 92.45% on CIFAR-10 in Torch
Reference book: 《计算机视觉与深度学习实战》 (Computer Vision and Deep Learning in Practice)
