CNN Object Classification and Recognition with MATLAB

1. Dataset Selection: STL-10

1.1 Dataset Overview

The STL-10 dataset is used. STL-10 is adapted from CIFAR-10; compared with CIFAR-10, each class has fewer labeled training images, and the images have a higher resolution (96×96).
The dataset contains 10 classes: airplane, bird, car, cat, deer, dog, horse, monkey, ship, and truck. The training set has 500 images per class, and the test set has 800 images per class.
Official STL-10 dataset page: STL-10

1.2 Dataset Preprocessing

The downloaded dataset is in binary (.bin) format and needs to be converted to image files.
Create a binConvert directory in PyCharm and copy the downloaded stl10_binary folder into it.
Create convert.py.
Because the dataset has already been downloaded, the download code is commented out; if you have not downloaded it yet, uncomment that part to fetch the dataset.
First, generate the training images.
convert.py

from __future__ import print_function

import os, sys, tarfile, errno
import numpy as np
import matplotlib.pyplot as plt

if sys.version_info >= (3, 0, 0):
    import urllib.request as urllib  # ugly but works
else:
    import urllib

try:
    from imageio import imsave
except ImportError:
    from scipy.misc import imsave

print(sys.version_info)

# image shape
HEIGHT = 96
WIDTH = 96
DEPTH = 3

# size of a single image in bytes
SIZE = HEIGHT * WIDTH * DEPTH

# path to the directory with the data
DATA_DIR = './stl10_binary'

# url of the binary data
DATA_URL = 'http://ai.stanford.edu/~acoates/stl10/stl10_binary.tar.gz'

# path to the binary train file with image data
DATA_PATH = './stl10_binary/train_X.bin'

# path to the binary train file with labels
LABEL_PATH = './stl10_binary/train_y.bin'


def read_labels(path_to_labels):
    """
    :param path_to_labels: path to the binary file containing labels from the STL-10 dataset
    :return: an array containing the labels
    """
    with open(path_to_labels, 'rb') as f:
        labels = np.fromfile(f, dtype=np.uint8)
        return labels


def read_all_images(path_to_data):
    """
    :param path_to_data: the file containing the binary images from the STL-10 dataset
    :return: an array containing all the images
    """

    with open(path_to_data, 'rb') as f:
        # read whole file in uint8 chunks
        everything = np.fromfile(f, dtype=np.uint8)

        # We force the data into 3x96x96 chunks, since the
        # images are stored in "column-major order", meaning
        # that "the first 96*96 values are the red channel,
        # the next 96*96 are green, and the last are blue."
        # The -1 is since the size of the pictures depends
        # on the input file, and this way numpy determines
        # the size on its own.

        images = np.reshape(everything, (-1, 3, 96, 96))

        # Now transpose the images into a standard image format
        # readable by, for example, matplotlib.imshow
        # You might want to comment this line or reverse the shuffle
        # if you will use a learning algorithm like CNN, since they like
        # their channels separated.
        images = np.transpose(images, (0, 3, 2, 1))
        return images


def read_single_image(image_file):
    """
    CAREFUL! - this method uses a file as input instead of the path - so the
    position of the reader will be remembered outside of context of this method.
    :param image_file: the open file containing the images
    :return: a single image
    """
    # read a single image, count determines the number of uint8's to read
    image = np.fromfile(image_file, dtype=np.uint8, count=SIZE)
    # force into image matrix
    image = np.reshape(image, (3, 96, 96))
    # transpose to standard format
    # You might want to comment this line or reverse the shuffle
    # if you will use a learning algorithm like CNN, since they like
    # their channels separated.
    image = np.transpose(image, (2, 1, 0))
    return image


def plot_image(image):
    """
    :param image: the image to be plotted in a 3-D matrix format
    :return: None
    """
    plt.imshow(image)
    plt.show()


def save_image(image, name):
    imsave("%s.png" % name, image, format="png")


# def download_and_extract():
#     """
#     Download and extract the STL-10 dataset
#     :return: None
#     """
#     dest_directory = DATA_DIR
#     if not os.path.exists(dest_directory):
#         os.makedirs(dest_directory)
#     filename = DATA_URL.split('/')[-1]
#     filepath = os.path.join(dest_directory, filename)
#     if not os.path.exists(filepath):
#         def _progress(count, block_size, total_size):
#             sys.stdout.write('\rDownloading %s %.2f%%' % (filename,
#                                                           float(count * block_size) / float(total_size) * 100.0))
#             sys.stdout.flush()
#
#         filepath, _ = urllib.urlretrieve(DATA_URL, filepath, reporthook=_progress)
#         print('Downloaded', filename)
#         tarfile.open(filepath, 'r:gz').extractall(dest_directory)


def save_images(images, labels):
    print("Saving images to disk")
    i = 0
    for image in images:
        label = labels[i]
        directory = './img/' + str(label) + '/'
        try:
            os.makedirs(directory, exist_ok=True)
        except OSError as exc:
            if exc.errno == errno.EEXIST:
                pass
        filename = directory + str(i)
        print(filename)
        save_image(image, filename)
        i = i + 1


if __name__ == "__main__":
    # download data if needed
    # download_and_extract()

    # test to check if the image is read correctly
    with open(DATA_PATH, 'rb') as f:
        image = read_single_image(f)
        plot_image(image)

    # test to check if the whole dataset is read correctly
    images = read_all_images(DATA_PATH)
    print(images.shape)

    labels = read_labels(LABEL_PATH)
    print(labels.shape)

    # save images to disk
    save_images(images, labels)

Running the script creates an img folder containing 10 subfolders, one per class, each holding 500 training images.

Next, generate the test set images.
Delete the generated img folder.
Change the DATA_PATH and LABEL_PATH definitions in convert.py:

# path to the binary test file with image data
DATA_PATH = './stl10_binary/test_X.bin'

# path to the binary test file with labels
LABEL_PATH = './stl10_binary/test_y.bin'

Run the script again; the new img folder again contains 10 class subfolders, this time with 800 test images per class.
Manually rename each of the generated training and test class folders to its class name.
The dataset is now ready; next comes the network design.
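As a quick sanity check, the renamed folders can be loaded into MATLAB with imageDatastore, which uses the folder names as class labels. A minimal sketch, assuming the training and test images were placed in ./img_train and ./img_test (these folder names are an assumption; adjust the paths to your own layout):

imdsTrain = imageDatastore('./img_train', ...
    'IncludeSubfolders',true,'LabelSource','foldernames');
imdsTest  = imageDatastore('./img_test', ...
    'IncludeSubfolders',true,'LabelSource','foldernames');

countEachLabel(imdsTrain)   % should list 10 classes with 500 images each
countEachLabel(imdsTest)    % should list 10 classes with 800 images each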

2. CNN Network Design

Entering net = vgg16 and net = resnet18 in the MATLAB command window returns the corresponding pretrained network objects; if a network's support package is not yet installed, MATLAB prompts you to add it.
Entering deepNetworkDesigner in the command window opens the Deep Network Designer app, where you can load an existing network or design a new one from scratch.
Once the network structure is complete, click the Analyze button to run an automatic check and inspect the architecture.
Back in the design window, click Export and choose to generate code for the network; save the generated code as a function so it can be called directly later.
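For reference, the same steps can also be driven from the command window; analyzeNetwork performs the same check as the Analyze button. A short sketch (the variable names are arbitrary):

% Load the pretrained reference networks; MATLAB prompts to install the
% corresponding support packages if they are missing.
net_vgg = vgg16;
net_res = resnet18;

% Open Deep Network Designer, optionally preloaded with one of the networks.
deepNetworkDesigner(net_res)

% Check a network (or layer graph) for errors and inspect its structure.
analyzeNetwork(net_vgg)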

2.1 Modifying VGGNet

Using the original VGGNet as-is, the network is too deep and complex for this dataset: it overfits and the accuracy stays low.
The network is therefore redesigned and simplified on top of the original VGGNet: the number of layers is reduced and several dropout layers are added, which fits the data better and yields a model more suitable for this dataset, with higher accuracy.
In addition, the input layer is changed to the 96×96 image size and data augmentation is applied.

get_vggnet.m

function [layers, lgraph] = get_vggnet()
layers = [
    imageInputLayer([96 96 3],'Name','imageinput','DataAugmentation','randfliplr')

    convolution2dLayer([5 5],64,'Name','conv_1',"Padding","same")
    batchNormalizationLayer('Name','bn_1')
    reluLayer('Name','relu_1')
    maxPooling2dLayer([2 2],'Name','maxpool_1','Padding','same','Stride',[2 2])

    convolution2dLayer([5 5],128,'Name','conv_2','Padding','same')
    batchNormalizationLayer('Name','bn_2')
    reluLayer('Name','relu_2')
    maxPooling2dLayer([2 2],'Name','maxpool_2','Padding','same','Stride',[2 2])

    convolution2dLayer([5 5],128,'Name','conv_3','Padding','same')
    batchNormalizationLayer('Name','bn_3')
    reluLayer('Name','relu_3')
    dropoutLayer(0.4,'Name','dp_1')
    maxPooling2dLayer([2 2],'Name','maxpool_3','Padding','same','Stride',[2 2])
    
    convolution2dLayer([5 5],256,'Name','conv_4','Padding','same')
    batchNormalizationLayer('Name','bn_4')
    reluLayer('Name','relu_4')
    dropoutLayer(0.4,'Name','dp_2')
    maxPooling2dLayer([2 2],'Name','maxpool_4','Padding','same','Stride',[2 2])

    convolution2dLayer([5 5],256,'Name','conv_5','Padding','same')
    batchNormalizationLayer('Name','bn_5')
    reluLayer('Name','relu_5')
    dropoutLayer(0.4,'Name','dp_3')
    maxPooling2dLayer([2 2],'Name','maxpool_5','Padding','same','Stride',[2 2])
    
    dropoutLayer(0.5,'Name','dp_4')
    fullyConnectedLayer(512,'Name','fc_1')
    reluLayer('Name','relu_6')
    fullyConnectedLayer(512,'Name','fc_2')
    reluLayer('Name','relu_7')
    dropoutLayer(0.5,'Name','dp_5')
    fullyConnectedLayer(10,'Name','fc_3')
    softmaxLayer('Name','softmax')
    classificationLayer('Name','classoutput')];

lgraph = layerGraph(layers);
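
For example, the exported function can then be called directly and checked before training (a brief usage sketch):

[layers, lgraph] = get_vggnet();
analyzeNetwork(lgraph)   % verify layer sizes and the 10-class output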

2.2 Modifying ResNet

Adding some dropout layers to the original ResNet reduces overfitting to a certain extent, so the model fits the data better and reaches higher accuracy.
In addition, the input layer is changed to the 96×96 image size and data augmentation is applied.
get_resnet.m

function [layers, lgraph] = get_resnet()
netWidth = 16;
layers = [
    imageInputLayer([96 96 3],'Name','input','DataAugmentation','randfliplr')
    convolution2dLayer(3,netWidth,'Padding','same','Name','convInp')
    batchNormalizationLayer('Name','bn_res')
    reluLayer('Name','relu_sp')
    
    convolutionalUnit(netWidth,1,'conv_sa1')
    additionLayer(2,'Name','add_11')
    reluLayer('Name','relu_11')
    convolutionalUnit(netWidth,1,'conv_sa2')
    additionLayer(2,'Name','add_12')
    reluLayer('Name','relu_12')
    dropoutLayer(0.4,'Name','dp_1')
    
    convolutionalUnit(2*netWidth,2,'conv_sc1')
    additionLayer(2,'Name','add_21')
    reluLayer('Name','relu_21')
    convolutionalUnit(2*netWidth,1,'conv_sc2')
    additionLayer(2,'Name','add_22')
    reluLayer('Name','relu_22')
    dropoutLayer(0.4,'Name','dp_2')
    
    convolutionalUnit(4*netWidth,2,'conv_se1')
    additionLayer(2,'Name','add_31')
    reluLayer('Name','relu_31')
    convolutionalUnit(4*netWidth,1,'conv_se2')
    additionLayer(2,'Name','add_32')
    reluLayer('Name','relu_32')
    dropoutLayer(0.4,'Name','dp_3')
    
    averagePooling2dLayer(8,'Name','globalPool')
    dropoutLayer(0.5,'Name','dp_4')
    fullyConnectedLayer(10,'Name','fcFinal')
    softmaxLayer('Name','softmax')
    classificationLayer('Name','classoutput')
    ];
lgraph = layerGraph(layers);
lgraph = connectLayers(lgraph,'relu_sp','add_11/in2');
lgraph = connectLayers(lgraph,'relu_11','add_12/in2');
skip1 = [
    convolution2dLayer(1,2*netWidth,'Stride',2,'Name','skipConv1')
    batchNormalizationLayer('Name','skipBN1')];
lgraph = addLayers(lgraph,skip1);
lgraph = connectLayers(lgraph,'relu_12','skipConv1');
lgraph = connectLayers(lgraph,'skipBN1','add_21/in2');

lgraph = connectLayers(lgraph,'relu_21','add_22/in2');
skip2 = [
    convolution2dLayer(1,4*netWidth,'Stride',2,'Name','skipConv2')
    batchNormalizationLayer('Name','skipBN2')];
lgraph = addLayers(lgraph,skip2);
lgraph = connectLayers(lgraph,'relu_22','skipConv2');
lgraph = connectLayers(lgraph,'skipBN2','add_31/in2');
lgraph = connectLayers(lgraph,'relu_31','add_32/in2');

layers = lgraph.Layers;

function layers = convolutionalUnit(numF,stride,tag)
layers = [
    convolution2dLayer(3,numF,'Padding','same','Stride',stride,'Name',[tag,'conv1'])
    batchNormalizationLayer('Name',[tag,'BN1'])
    reluLayer('Name',[tag,'relu1'])
    convolution2dLayer(3,numF,'Padding','same','Name',[tag,'conv2'])
    batchNormalizationLayer('Name',[tag,'BN2'])];

3. Training

3.1 Training with the Modified VGGNet

options_train settings (MaxEpochs, handles.augimdsValidation, and ExecutionEnvironment are defined elsewhere in the project code):

options_train = trainingOptions('sgdm',...
    'MaxEpochs',MaxEpochs,...
    'InitialLearnRate',0.01,...
    'L2Regularization',0.01,...
    'MiniBatchSize',128,...
    'Verbose',true,...
    'Shuffle','every-epoch',...
    'Plots','training-progress',...
    'ValidationData',handles.augimdsValidation,...
    'ValidationFrequency',10,...
    'ExecutionEnvironment',ExecutionEnvironment);
Parameters
Epochs: 100
Learning rate: 0.01
Mini-batch size: 128
L2 regularization: 0.01
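For context, here is a minimal end-to-end sketch of how these options might be wired up with trainNetwork, assuming the folder layout from section 1.2 and that get_vggnet.m is on the MATLAB path; as a simplification, the test images double as the validation set, and MaxEpochs and ExecutionEnvironment are filled in with plausible values:

imdsTrain = imageDatastore('./img_train', ...
    'IncludeSubfolders',true,'LabelSource','foldernames');
imdsTest  = imageDatastore('./img_test', ...
    'IncludeSubfolders',true,'LabelSource','foldernames');

% Wrap the datastores so images are delivered at the 96x96x3 input size.
augimdsTrain      = augmentedImageDatastore([96 96 3], imdsTrain);
augimdsValidation = augmentedImageDatastore([96 96 3], imdsTest);

[~, lgraph] = get_vggnet();

options_train = trainingOptions('sgdm', ...
    'MaxEpochs',100, ...
    'InitialLearnRate',0.01, ...
    'L2Regularization',0.01, ...
    'MiniBatchSize',128, ...
    'Shuffle','every-epoch', ...
    'Plots','training-progress', ...
    'ValidationData',augimdsValidation, ...
    'ValidationFrequency',10, ...
    'Verbose',true, ...
    'ExecutionEnvironment','auto');

net = trainNetwork(augimdsTrain, lgraph, options_train);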
Training results
Accuracy: 72.55%

3.2 Training with the Modified ResNet

options_train settings (identical to section 3.1 except for the lower initial learning rate):

options_train = trainingOptions('sgdm',...
    'MaxEpochs',MaxEpochs,...
    'InitialLearnRate',0.001,...
    'L2Regularization',0.01,...
    'MiniBatchSize',128,...
    'Verbose',true,...
    'Shuffle','every-epoch',...
    'Plots','training-progress',...
    'ValidationData',handles.augimdsValidation,...
    'ValidationFrequency',10,...
    'ExecutionEnvironment',ExecutionEnvironment);
Parameters
Epochs: 100
Learning rate: 0.001
Mini-batch size: 128
L2 regularization: 0.01
Training results
Accuracy: 65.36%
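In both cases the reported accuracy can be reproduced by classifying the held-out test set with the trained network. A hedged sketch, assuming net is the network returned by trainNetwork and imdsTest is the test datastore created in the training sketch above:

augimdsTest = augmentedImageDatastore([96 96 3], imdsTest);
predLabels  = classify(net, augimdsTest);
accuracy    = mean(predLabels == imdsTest.Labels)   % fraction of correct predictions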

4. GitHub Code and Additional Notes

The code is available on GitHub: CNNItemRec-MATLAB
My GPU is a GTX 1050 with 2 GB of memory, which constrains the network design: with a more complex structure (for example, more dropout layers) or a larger number of convolution filters, training aborts with an out-of-memory error.
With enough GPU memory, you can increase the number of convolution filters and add more dropout layers to obtain a better model and higher accuracy.
Note: if your GPU is low-end and reports an out-of-memory error, reduce MiniBatchSize or the sizes of the convolution2dLayer and fullyConnectedLayer layers, and use fewer dropout layers; a memory-friendlier configuration is sketched below.
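As a rough illustration of the first suggestion, a memory-friendlier set of training options might look like the following sketch (the values are assumptions, not tested settings):

options_small = trainingOptions('sgdm', ...
    'MaxEpochs',100, ...
    'InitialLearnRate',0.01, ...
    'MiniBatchSize',32, ...              % smaller batches to fit in ~2 GB of GPU memory
    'Shuffle','every-epoch', ...
    'ExecutionEnvironment','auto');      % falls back to CPU if no usable GPU is found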
For a model with far better results, see: 92.45% on CIFAR-10 in Torch
Reference book: 《计算机视觉与深度学习实战》 (Computer Vision and Deep Learning in Practice)

