Training Inception V4 on Your Own Dataset

Author: YYLin-AI

Copyright belongs to all the proletarians of the world.

Article link: https://blog.csdn.net/qq_41776781/article/details/94476538

Preface:

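Inception v1-v4 is a series of models released by Google. In this post I briefly introduce the characteristics of each of them, and at the end I use Inception v4 to classify satellite images. First, a word on my (admittedly shallow) understanding of the Inception family. The deep convolutional models before it focused mainly on making the network deeper without overfitting, and paid little attention to its width. The Inception family increases both the depth and the width of the network. If you are not familiar with Inception, this probably doesn't mean much yet, so let's see how the experts explain it.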

Characteristics of Inception v1-v4:

First: what characterizes the Inception family as a whole?

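My understanding: earlier networks simply stack convolutions, each layer feeding its result to the next. Inception is different: it defines a module that applies several different convolutions to the same input and concatenates their results as the module's output. Why do it this way? Because experiments show that it works well.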

An article on Zhihu puts it this way:

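The most distinctive feature of GoogLeNet is its use of the Inception module. The goal is a network with a good local topology: several convolutions or pooling operations are applied to the input image in parallel, and all their outputs are concatenated into one very deep feature map. Because different convolutions (1x1, 3x3, 5x5, etc.) and pooling extract different information from the input, running them in parallel and combining all the results yields a better image representation.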
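To make "parallel branches, then concatenation" concrete, here is a deliberately tiny pure-Python sketch (no TensorFlow). The 2x2 input, the toy 1x1 weights and the two-branch layout are made up for illustration; a real Inception module uses learned weights, ReLU activations and more branches (inception_A later in this post has four). The point is that every branch keeps the spatial size, so the outputs can be stacked along the channel axis.

```python
# Feature maps are nested lists indexed as fmap[row][col][channel] (H x W x C).

def conv1x1(fmap, weights):
    """1x1 convolution: mixes the channels of each pixel independently.
    weights[c_out][c_in]; no bias or activation, to keep the sketch minimal."""
    return [[[sum(wt * x for wt, x in zip(w_row, pixel)) for w_row in weights]
             for pixel in row]
            for row in fmap]


def maxpool3x3_same(fmap):
    """3x3 max pooling with stride 1 and 'same' padding: H and W are unchanged."""
    h, w, c = len(fmap), len(fmap[0]), len(fmap[0][0])
    out = []
    for i in range(h):
        out_row = []
        for j in range(w):
            window = [fmap[r][s]
                      for r in range(max(0, i - 1), min(h, i + 2))
                      for s in range(max(0, j - 1), min(w, j + 2))]
            out_row.append([max(p[ch] for p in window) for ch in range(c)])
        out.append(out_row)
    return out


def concat_channels(*branches):
    """Concatenate branch outputs along the channel axis, like tf.concat(..., axis=-1)."""
    h, w = len(branches[0]), len(branches[0][0])
    return [[[v for b in branches for v in b[i][j]] for j in range(w)]
            for i in range(h)]


def toy_inception(fmap):
    """Two parallel branches on the same input, then channel concatenation."""
    branch1 = conv1x1(fmap, [[1.0, 0.0], [0.5, 0.5]])  # 2 -> 2 channels (toy weights)
    branch2 = maxpool3x3_same(fmap)                     # keeps the 2 input channels
    return concat_channels(branch1, branch2)


fmap = [[[1, 2], [3, 4]],
        [[5, 6], [7, 8]]]  # a 2x2 map with 2 channels
out = toy_inception(fmap)  # 2x2 map with 2 + 2 = 4 channels
print(out[0][0])           # [1.0, 1.5, 7, 8]
```

The output channel count is simply the sum of the branch channel counts, which is exactly how the tf.concat(..., axis=-1) calls in the Inception v4 code below build their outputs.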

Second: what distinguishes each Inception version, and what problem does each one solve?

This question is best answered by the experts!

First, the list of papers (the error rates are ImageNet top-5 test errors):

[v1] Going Deeper with Convolutions, 6.67% test error, http://arxiv.org/abs/1409.4842
[v2] Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 4.8% test error, http://arxiv.org/abs/1502.03167
[v3] Rethinking the Inception Architecture for Computer Vision, 3.5% test error, http://arxiv.org/abs/1512.00567
[v4] Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning, 3.08% test error, http://arxiv.org/abs/1602.07261

The general ideas:

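Inception v1 places 1x1, 3x3 and 5x5 convolutions and 3x3 pooling side by side. This increases the width of the network and also makes it adapt better to different scales.
v2 improves on v1 in two ways. First, it adds Batch Normalization (BN) layers to reduce internal covariate shift (the change in the distribution of each layer's inputs during training): each layer's activations are normalized to zero mean and unit variance over the mini-batch, then scaled and shifted by learned parameters. Second, following VGG, it replaces the 5x5 convolution in the Inception module with two 3x3 convolutions, which reduces the number of parameters and speeds up computation.
The most important improvement in v3 is factorization: a 7x7 convolution is split into two one-dimensional convolutions (1x7 and 7x1), and likewise a 3x3 into 1x3 and 3x1. This speeds up computation (the saved compute can be spent on making the network deeper), and turning one conv into two further increases depth and adds non-linearity. Also note that the input size grows from 224x224 to 299x299, and that the modules for the 35x35, 17x17 and 8x8 grids are designed more carefully.
v4 studies whether combining Inception modules with residual connections helps. It finds that residual connections greatly accelerate training and also improve performance, resulting in the Inception-ResNet v2 network. The paper also designs a deeper, more streamlined Inception v4 that performs on par with Inception-ResNet v2.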
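The parameter savings claimed for v2 and v3 are easy to check by counting weights. The sketch below assumes, purely for illustration, that every layer has the same channel count C = 64 and ignores biases; it compares a 5x5 conv with two stacked 3x3 convs, and a 7x7 conv with a 1x7 + 7x1 pair, and also confirms that the stacked layers see the same window as the single large kernel.

```python
def conv_params(kh, kw, c_in, c_out):
    """Number of weights in one conv layer (biases ignored)."""
    return kh * kw * c_in * c_out


def receptive_field(kernels):
    """Receptive field (height, width) of stacked stride-1 convs.
    Each (kh, kw) kernel grows the field by kh - 1 rows and kw - 1 columns."""
    h = 1 + sum(kh - 1 for kh, _ in kernels)
    w = 1 + sum(kw - 1 for _, kw in kernels)
    return h, w


C = 64  # hypothetical channel count, kept the same in every layer

five_by_five = conv_params(5, 5, C, C)
two_three_by_three = 2 * conv_params(3, 3, C, C)
seven_by_seven = conv_params(7, 7, C, C)
one_by_seven_pair = conv_params(1, 7, C, C) + conv_params(7, 1, C, C)

print(five_by_five, two_three_by_three)     # 102400 73728
print(seven_by_seven, one_by_seven_pair)    # 200704 57344
print(receptive_field([(3, 3), (3, 3)]))    # (5, 5)
print(receptive_field([(1, 7), (7, 1)]))    # (7, 7)
```

So two 3x3 layers cover a 5x5 window with 18/25 (72%) of the weights, and a 1x7 + 7x1 pair covers 7x7 with 14/49 (about 29%), while each extra layer adds another non-linearity.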

Inception v4 architecture:

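This looks somewhat different from the models introduced earlier because the Inception family packs most operations into blocks, as the code later in this post shows. The left figure is the overall structure and the right figure is the stem. See also this Zhihu article: https://zhuanlan.zhihu.com/p/52802896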
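As a rough textual companion to the architecture figure, the sketch below traces the spatial size along the main path of Inception-v4 as described in the paper: a 299x299 input, a stem made of 3x3 convs and pools (several with stride 2 and 'valid' padding), then the 35x35, 17x17 and 8x8 grids separated by Reduction-A and Reduction-B. The layer list is my simplification of the paper's stem, not a model implementation; stride-1 'same' layers inside the Inception-A/B/C blocks do not change the size and are skipped.

```python
def conv_out(size, kernel, stride=1, padding="valid"):
    """Output height/width of a conv or pooling layer.
    'valid': floor((size - kernel) / stride) + 1; 'same': ceil(size / stride)."""
    if padding == "same":
        return -(-size // stride)  # ceiling division
    return (size - kernel) // stride + 1


def trace_inception_v4_paper(size=299):
    """(stage name, spatial size) along the main path of the paper's network."""
    sizes = [("input", size)]
    s = conv_out(size, 3, 2)            # stem: 3x3 conv, stride 2, valid
    sizes.append(("stem conv1", s))
    s = conv_out(s, 3)                  # stem: 3x3 conv, valid
    sizes.append(("stem conv2", s))
    s = conv_out(s, 3, padding="same")  # stem: 3x3 conv, same (size unchanged)
    s = conv_out(s, 3, 2)               # stem: max pool || 3x3 conv, stride 2, valid
    sizes.append(("stem concat 1", s))
    s = conv_out(s, 3)                  # stem: both branches end in a 3x3 valid conv
    sizes.append(("stem concat 2", s))
    s = conv_out(s, 3, 2)               # stem: 3x3 conv || max pool, stride 2, valid
    sizes.append(("Inception-A x4", s))
    s = conv_out(s, 3, 2)               # Reduction-A
    sizes.append(("Inception-B x7", s))
    s = conv_out(s, 3, 2)               # Reduction-B
    sizes.append(("Inception-C x3", s))
    return sizes


for name, size in trace_inception_v4_paper():
    print(f"{name:>15}: {size}x{size}")
```

The sizes come out as 299, 149, 147, 73, 71, 35, 17 and 8, which is where the paper's 35x35 / 17x17 / 8x8 grid names come from.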

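As before, let's start with the main program:

The code needed hardly any changes, which suggests my template adapts well: because it is simple, it is easy to reuse.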

# -*- coding: utf-8 -*-
# @Time    : 2019/7/2 20:39
# @Author  : YYLin
# @Email   : 854280599@qq.com
# @File    : Inception_v4_train.py
import inception_V4
import tensorflow as tf
import os
import cv2
import numpy as np
from keras.utils import to_categorical

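# os.environ['CUDA_VISIBLE_DEVICES'] = "-1"  # uncomment to run on the CPU only
# Parameters used by the model
batch_size = 16
img_high = 100
img_width = 100
Channel = 3
label = 9  # number of classes

# Hyperparameters used only by DenseNet (left over from the template; unused here)
growth_k = 12
nb_block = 2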

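# Input placeholders
inputs = tf.placeholder(tf.float32, [batch_size, img_high, img_width, Channel], name='inputs')
y = tf.placeholder(dtype=tf.float32, shape=[batch_size, label], name='label')
# Kept from the template: inception_V4.inference() does not use these two placeholders,
# so feeding them has no effect on the network
keep_prob = tf.placeholder("float")
is_train = tf.placeholder(tf.bool)


# The third argument is the number of classes; it must be a Python int, not a placeholder
score = inception_V4.inference(inputs, batch_size, label)
softmax_result = tf.nn.softmax(score)

# Loss and optimizer. softmax_cross_entropy_with_logits_v2 works on the raw logits and is
# numerically stable; -sum(y * log(softmax)) becomes NaN/inf once a probability underflows to 0
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=y, logits=score))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

# Accuracy of the predictions
correct_prediction = tf.equal(tf.argmax(softmax_result, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))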


# Load a dataset (adapted from my earlier AC-GAN code).
# Only the images and their labels are needed; the text files are ignored.
# Each sub-folder of `path` is one class.
def load_satellite_images(dataset='train'):
    if dataset == 'train':
        path = '../Dataset/baidu/train_image/train'
    else:
        path = '../Dataset/baidu/valid_image/valid'

    img_list = []
    label_list = []
    # sorted() makes the folder -> label mapping deterministic and identical for
    # the train and valid folders (os.listdir returns entries in arbitrary order)
    for dir_counter, child_dir in enumerate(sorted(os.listdir(path))):
        child_path = os.path.join(path, child_dir)
        for dir_image in sorted(os.listdir(child_path)):
            img = cv2.imread(os.path.join(child_path, dir_image))
            if img is None:  # skip files OpenCV cannot decode
                continue
            # make sure every image matches the input placeholder size
            img = cv2.resize(img, (img_width, img_high))
            img_list.append(img)
            label_list.append(dir_counter)

    # Keep the images as uint8 to save memory; they are scaled to [0, 1] per batch
    X = np.array(img_list, dtype=np.uint8)
    Y = to_categorical(label_list, label)
    return X, Y


# Draw a random batch and scale the pixels to [0, 1]
def next_batch(X, Y, batch_size):
    data_index = np.random.choice(X.shape[0], batch_size, replace=False)
    x_batch = X[data_index].astype(np.float32) / 255.0
    y_batch = Y[data_index]
    return x_batch, y_batch


# Read both datasets from disk once, instead of at every training step
X_train, Y_train = load_satellite_images('train')
X_valid, Y_valid = load_satellite_images('valid')

# Feed the data and train
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(500000 // batch_size):
        img, img_label = next_batch(X_train, Y_train, batch_size)

        # keep_prob is fed for compatibility with the template, but inference() ignores it
        train_feed = {inputs: img, y: img_label, keep_prob: 0.5}
        if i % 20 == 0:
            train_accuracy = accuracy.eval(feed_dict=train_feed)
            print("step %d, training accuracy %g" % (i, train_accuracy))
        train_step.run(feed_dict=train_feed)

        # Accuracy on a random validation batch
        if i % 50 == 0:
            img_valid, img_valid_label = next_batch(X_valid, Y_valid, batch_size)
            valid_score = accuracy.eval(feed_dict={inputs: img_valid, y: img_valid_label, keep_prob: 1.0})
            print("step %d, valid accuracy %g" % (i, valid_score))
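Computing the cross-entropy as -sum(y * log(softmax(z))), i.e. taking the log of the softmax probabilities directly, breaks down as soon as the probability of the true class underflows to 0: the log becomes -inf and the loss becomes inf or NaN. The pure-Python sketch below compares that literal formula with the log-sum-exp form, which works from the logits and never takes the log of a tiny number. Fused ops such as TensorFlow's tf.nn.softmax_cross_entropy_with_logits_v2 compute the loss from the logits in this spirit; the code here is only an illustration, not TensorFlow's implementation.

```python
import math


def naive_cross_entropy(logits, one_hot):
    """-sum(y * log(softmax(z))), computed literally."""
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # tf.log(0) gives -inf; emulate that instead of letting math.log raise
    logs = [math.log(p) if p > 0 else -math.inf for p in probs]
    return -sum(y * lp for y, lp in zip(one_hot, logs))


def stable_cross_entropy(logits, one_hot):
    """Same loss via log-sum-exp:
    -sum(y_i * log softmax(z)_i) = sum(y_i * (logsumexp(z) - z_i))."""
    m = max(logits)  # subtracting the max keeps every exp() in [0, 1]
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return sum(y * (lse - z) for y, z in zip(one_hot, logits))


# A very confident wrong prediction: the true class has logit -800
print(naive_cross_entropy([-800.0, 0.0], [1, 0]))   # inf
print(stable_cross_entropy([-800.0, 0.0], [1, 0]))  # 800.0
```

On well-behaved logits the two functions agree; they only differ once exp() underflows, which is exactly when the literal formula starts producing inf/NaN gradients.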

Next comes the core code of this post: Inception v4.

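First: the Inception v4 code is fairly long, so let's simply follow its overall naming. Comparing the left figure with the names of the main functions, we find inception_A, reduction_A, inception_B, reduction_B and inception_C, so all the main modules are present.

Second: the stem in inference() is much simpler than the one in the right figure: a single 3x3 convolution with stride 1x1 followed by a 3x3 max pool with stride 2, so its strides do not match the figure. Honestly, because Inception is built from blocks, it is hard to say exactly what effect this has. In practice the results were not much different from earlier ones, so the model can be used as is.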
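Because this code uses 'SAME' padding throughout and the training script feeds 100x100 images, the grids are not the paper's 35x35 / 17x17 / 8x8. The sketch below traces the shapes by hand in pure Python, assuming a 100x100 input and copying the channel counts from the module definitions that follow (for example 96 x 4 for inception_A and 384 + 256 + 256 + 128 for inception_B). It shows that the feature map entering the fully connected layers is 7x7x1536, so the first 1024-unit dense layer holds about 77 million weights.

```python
def same_out(size, stride):
    """Spatial size after a TF 'SAME'-padded conv/pool: ceil(size / stride)."""
    return -(-size // stride)


def trace_this_code(size=100):
    """(height, width, channels) of the feature map fed to the flatten step."""
    # stem: 3x3 conv with 192 filters (stride 1), then 3x3 max pool (stride 2)
    s = same_out(same_out(size, 1), 2)
    # 4 x inception_A, stride 1: 96 + 96 + 96 + 96 channels
    c = 96 * 4
    # reduction_A: the max-pool path keeps c channels, plus two 384-filter conv paths
    s, c = same_out(s, 2), c + 384 + 384
    # 7 x inception_B, stride 1: 384 + 256 + 256 + 128 channels
    c = 384 + 256 + 256 + 128
    # reduction_B: max-pool path keeps c, conv paths end with 192 and 320 filters
    s, c = same_out(s, 2), c + 192 + 320
    # 3 x inception_C, stride 1: 256 + 2 * 256 + 2 * 256 + 256 channels
    c = 256 * 6
    # average pooling: pool 7, stride 1, 'valid' padding (the tf.layers default)
    s = s - 7 + 1
    return s, s, c


h, w, c = trace_this_code()
flat = h * w * c
print((h, w, c), flat, flat * 1024)  # (7, 7, 1536) 75264 77070336
```

The spatial sizes along the way are 100 → 50 (Inception-A) → 25 (Inception-B) → 13 (Inception-C) → 7 after the 7x7 average pool.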

# -*- coding: utf-8 -*-
# @Time    : 2019/7/2 9:57
# @Author  : YYLin
# @Email   : 854280599@qq.com
# @File    : inception_V4.py
# Reference code: https://github.com/cena001plus/inception (includes training and test sets)
import tensorflow as tf


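# Create a variable initialized from a truncated normal distribution
def define_variable(shape, name):
    # name must be a keyword argument: the second positional argument of tf.Variable is `trainable`
    return tf.Variable(tf.truncated_normal(shape, stddev=0.1), name=name)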


# Max-pooling helper (not used by the network below)
def max_pool(name, l_input, k1, k2):
    return tf.nn.max_pool(l_input, ksize=[1, k1, k1, 1], strides=[1, k2, k2, 1], padding='SAME', name=name)


# network structure: inception_A
def inception_A(input):
    p1f11 = 96
    p2f11 = 64
    p2f22 = 96
    p3f11 = 64
    p3f22 = 96
    p3f33 = 96
    p4f11 = 96
    path1 = tf.layers.conv2d(input, p1f11, 1, padding='same', activation=tf.nn.relu)
    path2 = tf.layers.conv2d(input, p2f11, 1, padding='same', activation=tf.nn.relu)
    path2 = tf.layers.conv2d(path2, p2f22, 3, padding='same', activation=tf.nn.relu)
    path3 = tf.layers.conv2d(input, p3f11, 1, padding='same', activation=tf.nn.relu)
    path3 = tf.layers.conv2d(path3, p3f22, 3, padding='same', activation=tf.nn.relu)
    path3 = tf.layers.conv2d(path3, p3f33, 3, padding='same', activation=tf.nn.relu)
    path4 = tf.layers.average_pooling2d(input, pool_size=3, strides=1, padding='same')
    path4 = tf.layers.conv2d(path4, p4f11, 1, padding='same', activation=tf.nn.relu)
    out = tf.concat((path1, path2, path3, path4), axis=-1)
    return out


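# network structure: Reduction_A
# The paper uses k=192, l=224, m=256, n=384 filters for Inception-v4; this version uses 384 everywhere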
def reduction_A(input):
    channel = 384
    path1 = tf.layers.max_pooling2d(input, pool_size=3, strides=2, padding='same')
    path2 = tf.layers.conv2d(input, channel, 3, strides=2, padding='same', activation=tf.nn.relu)
    path3 = tf.layers.conv2d(input, channel, 1, padding='same', activation=tf.nn.relu)
    path3 = tf.layers.conv2d(path3, channel, 3, padding='same', activation=tf.nn.relu)
    path3 = tf.layers.conv2d(path3, channel, 3, strides=2, padding='same', activation=tf.nn.relu)
    out = tf.concat((path1, path2, path3), axis=-1)
    return out


# network structure: inception_B
def inception_B(input):
    p1f11 = 384
    p2f11 = 192
    p2f22 = 224
    p2f33 = 256
    p3f11 = 192
    p3f22 = 192
    p3f33 = 224
    p3f44 = 224
    p3f55 = 256
    p4f11 = 128
    path1 = tf.layers.conv2d(input, p1f11, 1, padding='same', activation=tf.nn.relu)
    path2 = tf.layers.conv2d(input, p2f11, 1, padding='same', activation=tf.nn.relu)
    path2 = tf.layers.conv2d(path2, p2f22, [1, 7], padding='same',activation=tf.nn.relu)
    path2 = tf.layers.conv2d(path2, p2f33, [7, 1], padding='same', activation=tf.nn.relu)
    path3 = tf.layers.conv2d(input, p3f11, 1, padding='same',activation=tf.nn.relu)
    path3 = tf.layers.conv2d(path3, p3f22, [1, 7], padding='same',activation=tf.nn.relu)
    path3 = tf.layers.conv2d(path3, p3f33, [7, 1], padding='same',activation=tf.nn.relu)
    path3 = tf.layers.conv2d(path3, p3f44, [1, 7], padding='same',activation=tf.nn.relu)
    path3 = tf.layers.conv2d(path3, p3f55, [7, 1], padding='same',activation=tf.nn.relu)
    path4 = tf.layers.average_pooling2d(input, pool_size=3, strides=1, padding='same')
    path4 = tf.layers.conv2d(path4, p4f11, 1, padding='same',activation=tf.nn.relu)
    out = tf.concat((path1, path2, path3, path4), axis=-1)
    return out


# network structure: Reduction_B
def reduction_B(input):
    p2f11 = 192
    p2f22 = 192
    p3f11 = 256
    p3f22 = 256
    p3f33 = 320
    p3f44 = 320
    path1 = tf.layers.max_pooling2d(input, pool_size=3, strides=2, padding='same')
    path2 = tf.layers.conv2d(input, p2f11, 1,  padding='same', activation=tf.nn.relu)
    path2 = tf.layers.conv2d(path2, p2f22, 3, strides=2, padding='same', activation=tf.nn.relu)
    path3 = tf.layers.conv2d(input, p3f11, 1, padding='same', activation=tf.nn.relu)
    path3 = tf.layers.conv2d(path3, p3f22, [1, 7], padding='same', activation=tf.nn.relu)
    path3 = tf.layers.conv2d(path3, p3f33, [7, 1], padding='same', activation=tf.nn.relu)
    path3 = tf.layers.conv2d(path3, p3f44, 3, strides=2, padding='same', activation=tf.nn.relu)
    out = tf.concat((path1, path2, path3), axis=-1)
    return out


# network structure: inception_C
def inception_C(input):
    p1f11 = 256
    p2f11 = 384
    p2f11_1 = 256
    p2f11_2 = 256
    p3f11 = 384
    p3f22 = 448
    p3f33 = 512
    p3f33_1 = 256
    p3f33_2 = 256
    p4f11 = 256
    path1 = tf.layers.conv2d(input, p1f11, 1, padding='same', activation=tf.nn.relu)
    path2 = tf.layers.conv2d(input, p2f11, 1, padding='same',activation=tf.nn.relu)
    path2_1 = tf.layers.conv2d(path2, p2f11_1, [1, 3], padding='same', activation=tf.nn.relu)
    path2_2 = tf.layers.conv2d(path2, p2f11_2, [3, 1], padding='same', activation=tf.nn.relu)
    path3 = tf.layers.conv2d(input, p3f11, 1, padding='same', activation=tf.nn.relu)
    path3 = tf.layers.conv2d(path3, p3f22, [1, 3], padding='same', activation=tf.nn.relu)
    path3 = tf.layers.conv2d(path3, p3f33, [3, 1], padding='same', activation=tf.nn.relu)
    path3_1 = tf.layers.conv2d(path3, p3f33_1, [3, 1], padding='same', activation=tf.nn.relu)
    path3_2 = tf.layers.conv2d(path3, p3f33_2, [1, 3], padding='same', activation=tf.nn.relu)
    path4 = tf.layers.average_pooling2d(input, pool_size=3, strides=1, padding='same')
    path4 = tf.layers.conv2d(path4, p4f11, 1, padding='same', activation=tf.nn.relu)
    out = tf.concat((path1, path2_1, path2_2, path3_1, path3_2, path4), axis=-1)
    return out


# Network definition
# Input: images, a 4-D tf.float32 tensor of shape [batch_size, height, width, channels]
# Returns: logits, a float tensor of shape [batch_size, n_classes]
def inference(images, batch_size,  n_classes):
    w_conv1 = define_variable([3, 3, 3, 192], name="W")
    b_conv1 = define_variable([192], name="B")
    conv1 = tf.nn.conv2d(images, w_conv1, strides=[1, 1, 1, 1], padding='SAME')
    relu1 = tf.nn.relu(conv1 + b_conv1)

    pool1 = tf.nn.max_pool(relu1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1],padding='SAME')

    # inception_A
    # 35x35 grid in the paper (50x50 here for a 100x100 input)
    inception_a1 = inception_A(pool1)
    inception_a2 = inception_A(inception_a1)
    inception_a3 = inception_A(inception_a2)
    inception_a4 = inception_A(inception_a3)

    # reduction_A
    # from 35x35 to 17x17 in the paper (50x50 to 25x25 here)
    reduction_a = reduction_A(inception_a4)

    # inception_B
    # 17x17 grid in the paper (25x25 here)
    inception_b1 = inception_B(reduction_a)
    inception_b2 = inception_B(inception_b1)
    inception_b3 = inception_B(inception_b2)
    inception_b4 = inception_B(inception_b3)
    inception_b5 = inception_B(inception_b4)
    inception_b6 = inception_B(inception_b5)
    inception_b7 = inception_B(inception_b6)

    # reduction_B
    # from 17x17 to 8x8 in the paper (25x25 to 13x13 here)
    reduction_b = reduction_B(inception_b7)

    # inception_C
    # 8x8 grid in the paper (13x13 here)
    inception_c1 = inception_C(reduction_b)
    inception_c2 = inception_C(inception_c1)
    inception_c3 = inception_C(inception_c2)

    # 7x7 average pooling, stride 1, default 'valid' padding: 13x13 -> 7x7 for a 100x100 input
    net = tf.layers.average_pooling2d(inception_c3, 7, 1, name="avgpool")  # -> [batch, 7, 7, 1536]

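    # Dropout. In TF1 the second argument of tf.nn.dropout is keep_prob (the probability of
    # keeping a unit). It is a constant 0.8 here, so dropout is also applied during validation
    with tf.variable_scope('dropout') as scope:
        drop_out = tf.nn.dropout(net, 0.8)
        print('Shape of the last feature map:', drop_out.shape)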

    # For a 100x100 input the shape is (batch_size, 7, 7, 1536); flatten it to (batch_size, -1)
    reshape = tf.reshape(drop_out, shape=[batch_size, -1])
    dim = reshape.get_shape()[1].value
    weights1 = tf.Variable(tf.truncated_normal(shape=[dim, 1024], stddev=0.005, dtype=tf.float32),name='weights', dtype=tf.float32)
    biases1 = tf.Variable(tf.constant(value=0.1, dtype=tf.float32, shape=[1024]),name='biases', dtype=tf.float32)
    local6 = tf.nn.relu(tf.matmul(reshape, weights1) + biases1)

    weights = tf.Variable(tf.truncated_normal(shape=[1024, n_classes], stddev=0.005, dtype=tf.float32), name='softmax_linear', dtype=tf.float32)
    biases = tf.Variable(tf.constant(value=0.1, dtype=tf.float32, shape=[n_classes]), name='biases', dtype=tf.float32)
    logits = tf.add(tf.matmul(local6, weights), biases, name='softmax_linear')

    return logits
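In inference(), dropout is applied as tf.nn.dropout(net, 0.8); in TF1 the second argument is keep_prob, the probability of keeping each unit, and survivors are scaled up by 1/keep_prob ("inverted dropout") so the expected activation is unchanged. Because the value is a constant rather than the keep_prob placeholder, it is applied at validation time too. The pure-Python sketch below illustrates what keep_prob=0.8 does; it is a stand-in for intuition, not TensorFlow's implementation.

```python
import random


def inverted_dropout(values, keep_prob, rng):
    """Keep each value with probability keep_prob and scale the survivors by
    1 / keep_prob, so the expected output equals the input."""
    if keep_prob >= 1.0:
        return list(values)  # keep_prob = 1.0 disables dropout (what validation should use)
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


rng = random.Random(42)
sample = inverted_dropout([1.0] * 10, 0.8, rng)  # each entry is either 0.0 or 1.25
print(sample)
```

To switch dropout off for validation, the keep probability would have to come from the keep_prob placeholder (fed 1.0 at validation) instead of the constant 0.8.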