TensorFlow Beginner Tutorial (10): CIFAR-10 Image Recognition with a Two-Layer CNN

__Fang Wei__

Category: tensorflow. Tags: tensorflow, CIFAR10

Article link: https://blog.csdn.net/rookie_wei/article/details/80409533

#
# Author: 韦访
# Blog: https://blog.csdn.net/rookie_wei
# WeChat: 1007895847
# When adding me on WeChat, please mention that you came from CSDN
# Everyone is welcome to learn together
#

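1. Overview

In the previous part we analyzed the official CIFAR10 image recognition code. Stopping there felt insufficient, though: it neither generalizes what we learned nor consolidates the earlier material. So in this part I combine the two-layer convolutional neural network from earlier parts with what we covered last time and write a demo of my own.

Environment:

OS: Windows 10, 64-bit

GPU: GTX 1080 Ti

Python: Python 3.7

TensorFlow: 1.15.0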

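2. Code walkthrough

Downloading the dataset

The first step is still to download the dataset, using the maybe_download_and_extract function from the previous part. The code is as follows: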

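import os
import sys
import tarfile
import urllib.request

import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (1.15.0 in this environment)

# Check whether the CIFAR-10 data exists; download and extract it if not
def maybe_download_and_extract(dir):
    """Download and extract the tarball from Alex's website."""
    DATA_URL = 'https://www.cs.toronto.edu/~kriz/cifar-10-binary.tar.gz'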
    if not os.path.exists(dir):
        os.makedirs(dir)
    filename = DATA_URL.split('/')[-1]
    filepath = os.path.join(dir, filename)
    if not os.path.exists(filepath):
        def _progress(count, block_size, total_size):
            sys.stdout.write('\r>> Downloading %s %.1f%%' % (filename,
                                                             float(count * block_size) / float(total_size) * 100.0))
            sys.stdout.flush()

        filepath, _ = urllib.request.urlretrieve(DATA_URL, filepath, _progress)
        print()
        statinfo = os.stat(filepath)
        print('Successfully downloaded', filename, statinfo.st_size, 'bytes.')
    extracted_dir_path = os.path.join(dir, 'cifar-10-batches-bin')
    if not os.path.exists(extracted_dir_path):
        # The with-block closes the archive once extraction finishes
        with tarfile.open(filepath, 'r:gz') as tar:
            tar.extractall(dir)

Reading images and labels

Next comes the code that reads the images and labels. It is the same as in the previous part:

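# Read one sample; each sample consists of a label followed by the image data
def get_record(queue):
    print('get_record')
    # Define the label size, the image width, height and depth, and the image and record sizes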
    label_bytes = 1
    image_width = 32
    image_height = 32
    image_depth = 3
    image_bytes = image_width * image_height * image_depth
    record_bytes = label_bytes + image_bytes

    # Read the data one fixed-length record at a time
    reader = tf.FixedLengthRecordReader(record_bytes)
    key, value = reader.read(queue)

    # Decode the raw bytes into a 1-D uint8 tensor.
    # For example:
    # source = 'abcde'
    # record_bytes = tf.decode_raw(source, tf.uint8)
    # evaluates to [ 97  98  99 100 101]
    record_bytes = tf.decode_raw(value, tf.uint8)

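    # The label is the first byte of each record
    label_data = tf.cast(tf.strided_slice(record_bytes, [0], [label_bytes]), tf.int32)

    # The image data runs from just after the label to the end of the record;
    # tf.reshape then turns it into a 3-D tensor
    depth_major = tf.reshape(
        tf.strided_slice(record_bytes, [label_bytes],[label_bytes + image_bytes]),
        [image_depth, image_height, image_width])

    # Transpose. The tensor above is laid out as [depth, height, width], i.e. the red,
    # green and blue values each sit in their own plane. With only 3 pixels that layout
    # would be RRRGGGBBB, whereas image data is normally RGBRGBRGB
    # ([height, width, depth]), hence the transpose.
    # Note: these comments reflect my own understanding and may not be exact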
    image_data = tf.transpose(depth_major, [1, 2, 0])

    return label_data, image_data

def _generate_image_and_label_batch(image, label, min_queue_examples,
                                    batch_size, shuffle):
  num_preprocess_threads = 1
  if shuffle:
    images, label_batch = tf.train.shuffle_batch(
        [image, label],
        batch_size=batch_size,
        num_threads=num_preprocess_threads,
        capacity=min_queue_examples + 3 * batch_size,
        min_after_dequeue=min_queue_examples)
  else:
    images, label_batch = tf.train.batch(
        [image, label],
        batch_size=batch_size,
        num_threads=num_preprocess_threads,
        capacity=min_queue_examples + 3 * batch_size)

  # Display the training images in the visualizer.
  tf.summary.image('images', images)

  return images, tf.reshape(label_batch, [batch_size])

# Check whether the CIFAR-10 data files exist; return False if any of them is missing
def check_cifar10_data_files(filenames):
    for file in filenames:
        if not os.path.exists(file):
            print('Not found cifar10 data.')
            return False
    return True

# Preprocessing before reading images: check that the CIFAR10 data exists and exit if it does not.
# If it does, create a filename queue with string_input_producer,
# read the image label and image data through get_record, and return them in batches
def get_image(data_path):
    filenames = [os.path.join(data_path, "data_batch_%d.bin" % i) for i in range(1, 6)]
    print(filenames)
    if not check_cifar10_data_files(filenames):
        sys.exit(1)

    # Create the filename queue
    queue = tf.train.string_input_producer(filenames)
    # Read the image label and the image data
    label, image = get_record(queue)
    # Convert the image data to float32
    reshaped_image = tf.cast(image, tf.float32)

    # Data augmentation follows
    # Randomly crop the image to 24*24 pixels
    distorted_image = tf.random_crop(reshaped_image, [height, width, 3])

    # Randomly flip the image left-right
    distorted_image = tf.image.random_flip_left_right(distorted_image)

    # Randomly adjust the brightness
    distorted_image = tf.image.random_brightness(distorted_image,
                                                 max_delta=63)
    # Randomly change the contrast
    distorted_image = tf.image.random_contrast(distorted_image,
                                               lower=0.2, upper=1.8)

    # Standardize the image (zero mean, unit variance per image)
    float_image = tf.image.per_image_standardization(distorted_image)

    # Set the shapes of tensors.
    float_image.set_shape([height, width, 3])
    label.set_shape([1])

    min_fraction_of_examples_in_queue = 0.4
    min_queue_examples = int(NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN *
                             min_fraction_of_examples_in_queue)
    return _generate_image_and_label_batch(float_image, label,
                                           min_queue_examples, batch_size,
                                           shuffle=True)
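To make the record layout and the augmentation steps more concrete, here is a minimal NumPy sketch that runs outside the TensorFlow queue pipeline. It decodes one hand-built record the same way get_record does, then applies a random crop, a random left-right flip and per-image standardization. The fake record, the helper name augment and the variables planes, noisy and out are my own illustration, not part of the original demo, and the brightness and contrast steps are left out to keep it short.

```python
import numpy as np

# --- Decode one record: 1 label byte followed by 3 * 32 * 32 image bytes ---
label_bytes = 1
image_height, image_width, image_depth = 32, 32, 3
image_bytes = image_height * image_width * image_depth

# Hypothetical record: label 7, then a red plane of 10s, a green plane of 20s
# and a blue plane of 30s (real records come from the data_batch_*.bin files)
planes = [np.full(image_height * image_width, v, dtype=np.uint8) for v in (10, 20, 30)]
record = np.concatenate([np.array([7], dtype=np.uint8)] + planes)
assert record.size == label_bytes + image_bytes  # 3073 bytes per record

label = int(record[0])
depth_major = record[label_bytes:].reshape(image_depth, image_height, image_width)
image = depth_major.transpose(1, 2, 0)  # [depth, height, width] -> [height, width, depth]
print(label, image.shape, image[0, 0])  # 7 (32, 32, 3) [10 20 30]


# --- Augment: random crop, random flip, per-image standardization ---
def augment(img, height=24, width=24, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    # Pick the top-left corner of the crop window uniformly at random
    top = rng.integers(0, img.shape[0] - height + 1)
    left = rng.integers(0, img.shape[1] - width + 1)
    crop = img[top:top + height, left:left + width, :].astype(np.float32)
    if rng.random() < 0.5:
        crop = crop[:, ::-1, :]  # mirror left-right
    # Same idea as tf.image.per_image_standardization: zero mean, unit variance,
    # with the stddev floored at 1/sqrt(N) so flat images do not divide by ~0
    adjusted_std = max(float(crop.std()), 1.0 / np.sqrt(crop.size))
    return (crop - crop.mean()) / adjusted_std

rng = np.random.default_rng(0)
noisy = rng.integers(0, 256, size=(32, 32, 3)).astype(np.float32)
out = augment(noisy, rng=rng)
print(out.shape)  # (24, 24, 3)
```

Each pixel of the decoded image now carries its three channel values together, which is the layout the crop and flip operations expect.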

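Helper functions

Next we wrap the convolutional layer, the fully connected layer, the loss function and so on into functions. Note that the learning rate here is set with exponential decay. The code is as follows: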

NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN = 50000
batch_size = 128
height = 24
width = 24

# Constants describing the training process.
MOVING_AVERAGE_DECAY = 0.9999     # The decay to use for the moving average.
NUM_EPOCHS_PER_DECAY = 350.0      # Epochs after which learning rate decays.
LEARNING_RATE_DECAY_FACTOR = 0.1  # Learning rate decay factor.
INITIAL_LEARNING_RATE = 0.0001       # Initial learning rate.

# Initialize the filter weights
def weight_variable(shape):
    return tf.Variable(tf.truncated_normal(shape, stddev=0.1))

# Initialize the biases
def bias_variable(shape):
    return tf.Variable(tf.constant(0.1, shape=shape))

# Convolution
def conv2d(x, W):
    # strides gives the step along each dimension; usually strides[0] = strides[3] = 1
    # padding is either "SAME" or "VALID"; "SAME" pads the borders with zeros
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding="SAME")

# Pooling
def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")

# Convolutional layer
def conv_layer(input, filter_shape, bias_shape):
    W = weight_variable(filter_shape)
    b = bias_variable(bias_shape)
    # Convolve with conv2d, then apply ReLU as the activation function
    h = tf.nn.relu(conv2d(input, W) + b)
    # Pool after the convolution
    return max_pool_2x2(h)

# Fully connected layer
def dense(input, weight_shape, bias_shape, reshape):
    W = weight_variable(weight_shape)
    b = bias_variable(bias_shape)
    # Flatten the input back into vectors
    h = tf.reshape(input, reshape)
    # Use ReLU as the activation function
    return tf.nn.relu(tf.matmul(h, W) + b)

# dropout
def dropout(input):
    # Use dropout regularization to prevent overfitting
    keep_prob = tf.placeholder(tf.float32)
    return keep_prob, tf.nn.dropout(input, keep_prob)

# Softmax output
def softmax(input, weight_shape, bias_shape):
    W = weight_variable(weight_shape)
    b = bias_variable(bias_shape)
    # Finally, Softmax turns the outputs into class probabilities
    return tf.nn.softmax(tf.matmul(input, W) + b)

# Define the loss function and the optimizer
def optimizer(label, y):
    global_step = tf.train.get_or_create_global_step()
    num_batches_per_epoch = NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN / batch_size
    decay_steps = int(num_batches_per_epoch * NUM_EPOCHS_PER_DECAY)

    # Decay the learning rate exponentially based on the number of steps.
    lr = tf.train.exponential_decay(INITIAL_LEARNING_RATE,
                                    global_step,
                                    decay_steps,
                                    LEARNING_RATE_DECAY_FACTOR,
                                    staircase=True)
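    # Cross-entropy: sum over the classes of each example, then average over the batch.
    # The clip keeps tf.log away from log(0).
    cross_entropy = -tf.reduce_sum(
        label * tf.log(tf.clip_by_value(y, 1e-10, 1.0)), axis=1)
    loss = tf.reduce_mean(cross_entropy)
    # Pass global_step so it advances on every update and the decay can take effect
    train_op = tf.train.AdamOptimizer(lr).minimize(loss, global_step=global_step)
    return train_op, loss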

# Compute the model's prediction accuracy
def accuracy(label, y):
    pred = tf.equal(tf.argmax(y, 1), tf.argmax(label, 1))
    return tf.reduce_mean(tf.cast(pred, tf.float32))
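To get a feel for the exponential decay settings above, the staircase schedule can be sketched in plain Python. This is only an illustration of the formula initial_rate * decay_factor ** floor(step / decay_steps) with the constants from this section; the helper name staircase_lr is made up, and it assumes the step counter advances once per training update.

```python
NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN = 50000
batch_size = 128
NUM_EPOCHS_PER_DECAY = 350.0
LEARNING_RATE_DECAY_FACTOR = 0.1
INITIAL_LEARNING_RATE = 0.0001

# Number of training steps between two learning rate drops
num_batches_per_epoch = NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN / batch_size
decay_steps = int(num_batches_per_epoch * NUM_EPOCHS_PER_DECAY)

def staircase_lr(step):
    # With staircase=True the exponent is an integer, so the rate drops in discrete jumps
    return INITIAL_LEARNING_RATE * LEARNING_RATE_DECAY_FACTOR ** (step // decay_steps)

print(decay_steps)
for step in (0, decay_steps - 1, decay_steps, 189999):
    print(step, staircase_lr(step))
```

With these constants decay_steps comes out at 136718, so over the 190000 steps of this demo the learning rate drops exactly once, from 0.0001 to roughly 0.00001.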

Network structure

The network structure is the same as before. The code is as follows:

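# Network structure
def net(input, label):

    # First convolutional layer
    # The filter is a 5×5×3 tensor:
    # 5×5 is the filter size and 3 is the depth, because CIFAR-10 images are RGB and have three channels.
    # 32 means we create 32 filters of size 5×5×3, giving 32 feature maps after the convolution (one per filter)
    c1 = conv_layer(input, [5, 5, 3, 32], [32])

    # Second convolutional layer
    # The first layer outputs depth 32, so the filter depth here is also 32; 64 is the depth after the second convolution
    c2 = conv_layer(c1, [5, 5, 32, 64], [64])

    # Fully connected layer
    # After two convolutional layers the image is 6×6 (the first pooling outputs (24/2)×(24/2),
    # the second pooling outputs (12/2)×(12/2)), with depth 64.
    # We add a fully connected layer with 1024 neurons here, so the weight W has shape [6 * 6 * 64, 1024]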
    f1 = dense(c2, [6 * 6 * 64, 1024], [1024], [-1, 6 * 6 * 64])

    # dropout
    keep_prob, h = dropout(f1)

    # Softmax
    y = softmax(h, [1024, 10], [10])

    # Define the loss function and the optimizer
    op, loss = optimizer(label, y)

    # Compute the prediction accuracy
    acc = accuracy(label, y)

    return acc, op, keep_prob, loss
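The 6 * 6 * 64 figure used for the fully connected layer can be checked with a few lines of arithmetic. The sketch below assumes what the helper functions use: "SAME" padding, a convolution stride of 1 and a 2×2 pooling stride of 2, in which case each output side is ceil(input / stride). The function name layer_output_size is my own.

```python
import math

def layer_output_size(size, conv_stride=1, pool_stride=2):
    # With "SAME" padding the output side is ceil(input / stride)
    after_conv = math.ceil(size / conv_stride)
    return math.ceil(after_conv / pool_stride)

size = 24                              # side of the cropped input image
sizes = [size]
for _ in range(2):                     # two conv + pool layers
    size = layer_output_size(size)
    sizes.append(size)

depth = 64                             # number of filters in the second layer
flat = size * size * depth
print(sizes, flat)                     # [24, 12, 6] 2304
```

If the crop size or the number of layers changes, the first dimension of the dense weight shape and the reshape target have to change with it.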

The main function

The main function is as follows:

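def main(argv=None):
    dir = "./cifar10_dataset/"
    # Check whether the CIFAR-10 data exists; download and extract it if not
    maybe_download_and_extract(dir)

    # Get the batched image data and labels
    image_batch, label_batch = get_image(os.path.join(dir, 'cifar-10-batches-bin/'))

    # Create the placeholder x that holds the CIFAR10 image data;
    # None in [None, height, width, 3] means the batch dimension is not fixed
    x = tf.placeholder(tf.float32, [None, height , width , 3])
    # label holds the true label of each input image;
    # to make comparison easy, the labels are converted to one-hot form after being read
    label = tf.placeholder(tf.float32, [None, 10])

    # Build the network
    acc, op, keep_prob, loss = net(x, label)

    with tf.Session() as sess:
        # Initialize the variables
        sess.run(tf.global_variables_initializer())
        coord = tf.train.Coordinator()
        # The input queues only really start here
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)
        try:
            for i in range(190000):
                labels, images = sess.run([label_batch, image_batch])
                # print(labels)
                # np.eye is a convenient way to convert the labels to one-hot form
                labels = np.eye(10, dtype=float)[labels]

                # Feed the data into the network and train;
                # keep_prob below 1 keeps dropout active during training
                sess.run(op, feed_dict={x: images, label: labels, keep_prob: 0.5})
                if i % 100 == 0:
                    # Evaluate with dropout switched off
                    train_accuracy = sess.run(acc, feed_dict={x: images, label: labels, keep_prob: 1.0})
                    print("step %d, training accuracy %g" % (i, train_accuracy))
        except tf.errors.OutOfRangeError:
            print('Done..')
        finally:
            # Stop the queue threads and wait for them to finish
            coord.request_stop()
            coord.join(threads)

if __name__ == '__main__':
    main()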
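The np.eye trick used in the training loop is easy to try on its own. The sketch below uses three made-up labels; in the demo the labels come from the input pipeline as a 1-D integer array with one entry per image in the batch.

```python
import numpy as np

# Hypothetical batch of three integer class labels
labels = np.array([3, 0, 9])

# Row k of the 10x10 identity matrix is the one-hot vector for class k,
# so indexing the matrix with the label array picks one row per label
one_hot = np.eye(10, dtype=float)[labels]

print(one_hot.shape)   # (3, 10)
print(one_hot[0])      # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
```

Taking the argmax of each row recovers the original integer label, which is what the accuracy function relies on.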

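To recap: main first downloads the dataset, then gets the image data and labels, and then creates the placeholders. Note that the placeholder shape here differs from before: x is now a 4-D tensor of shape [None, height, width, 3]. The output is as follows: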