Training a CIFAR-10 Image Recognition Model with TensorFlow


文科升


Article link: https://blog.csdn.net/moyu123456789/article/details/83963973


1. Goals

2. The CIFAR-10 dataset and related helper methods

3. How TensorFlow reads data

4. Training a CIFAR-10 recognition model with TensorFlow

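These are my study notes on Chapter 2 of the book 《21个项目玩转深度学习：基于TensorFlow的实践详解》 (21 Projects for Mastering Deep Learning: A Practical Guide Based on TensorFlow).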

1. Goals

These notes cover three topics: 1. TensorFlow's data-reading mechanism; 2. data augmentation; 3. training a CIFAR-10 image recognition model with TensorFlow.

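2. The CIFAR-10 dataset and related helper methods

CIFAR-10 is a collection of 32×32 color images: 50,000 training images and 10,000 test images, in 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck.

The official TensorFlow example for CIFAR-10 ships a set of code files, including cifar10.py, cifar10_input.py, cifar10_train.py, and cifar10_eval.py, all of which are used later in these notes.

We can use these files to work with the CIFAR-10 images. For example, the following code downloads the dataset: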

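# Import the cifar10 module that already exists in the current directory
import cifar10
# Import TensorFlow (this code uses the TensorFlow 1.x API)
import tensorflow as tf

# tf.app.flags.FLAGS is TensorFlow's global flag store; it also handles command-line arguments
FLAGS = tf.app.flags.FLAGS
# The cifar10 module predefines tf.app.flags.FLAGS.data_dir as the CIFAR-10 data path;
# here we change that path to cifar10_data/
FLAGS.data_dir = 'cifar10_data/'

# Download and extract the data files if they are not already present
cifar10.maybe_download_and_extract()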

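After this code runs, a cifar10_data folder appears in the current directory, containing the downloaded CIFAR-10 data. Inside cifar10_data there are two entries: the original archive cifar-10-binary.tar.gz and the folder cifar-10-batches-bin, which holds the extracted data. The cifar10_data/cifar-10-batches-bin folder contains 8 files: data_batch_1.bin through data_batch_5.bin hold the training images, test_batch.bin holds the test images, batches.meta.txt lists the class names, and readme.html is the dataset's readme. At this point the CIFAR-10 data is available locally.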

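3. How TensorFlow reads data

What is data reading? Taking image data as an example, the process is shown in the figure below:

Reading data means loading it from disk into memory, where the CPU or GPU can compute on it. Computation can start only once the data is in memory. Suppose reading takes 0.1 s and computing takes 0.9 s: out of every 1 s, the CPU or GPU then sits idle for 0.1 s, which wastes 10% of the available compute time.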
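The cost in this example can be checked with a few lines of arithmetic. The sketch below uses only the two timings from the text; the variable names are illustrative, and the overlapped case assumes reading and computing can run fully in parallel, which is the idea the next paragraph introduces.

```python
# Per-step timings taken from the example in the text (seconds).
read_time = 0.1
compute_time = 0.9

# Serial execution: the processor waits while each batch is read.
serial_step = read_time + compute_time
idle_fraction = read_time / serial_step

# Overlapped execution (assumption: perfect overlap): the next batch is read
# while the current one is processed, so the slower stage sets the pace.
overlapped_step = max(read_time, compute_time)
speedup = serial_step / overlapped_step

print("idle fraction when serial: %.0f%%" % (idle_fraction * 100))
print("speedup from overlapping: %.2fx" % speedup)
```

Because computing dominates in this example, hiding the read time can only recover the share of each step that was spent idle.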

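To solve this, reading and computing are placed in two separate threads, and the reading thread loads data into a queue in memory, as shown in the figure below:

The reading thread keeps loading data from disk into the in-memory queue, and whenever the CPU/GPU needs data it takes it directly from memory. This removes the idle time the processor would otherwise spend waiting on I/O.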
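The two-thread arrangement can be sketched with Python's standard library alone. This is not TensorFlow's implementation: the reader and compute functions, the fake payloads, and the None sentinel are all assumptions made for illustration.

```python
import queue
import threading

def reader(paths, buffer):
    """Producer: stands in for the thread that reads files from disk."""
    for path in paths:
        # A real reader would load the file's bytes here; we fake the payload.
        buffer.put((path, "bytes-of-" + path))
    buffer.put(None)  # sentinel: no more data

def compute(buffer, results):
    """Consumer: stands in for the CPU/GPU taking ready data from memory."""
    while True:
        item = buffer.get()
        if item is None:
            break
        path, payload = item
        results.append((path, len(payload)))

memory_queue = queue.Queue(maxsize=2)  # bounded in-memory buffer
results = []
producer = threading.Thread(
    target=reader, args=(["A.jpg", "B.jpg", "C.jpg"], memory_queue))
consumer = threading.Thread(target=compute, args=(memory_queue, results))
producer.start()
consumer.start()
producer.join()
consumer.join()
print(results)
```

The bounded queue keeps the reader from running arbitrarily far ahead of the consumer, which caps how much memory is spent on data that has not been used yet.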

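With this in place, TensorFlow's file-reading mechanism is easy to explain. TensorFlow adds one more layer in front of the in-memory queue: a "filename queue". Why? This requires the concept of an epoch. Running one epoch means passing over the entire dataset once, and training usually runs for many epochs. Training for two epochs means passing over the dataset twice.

TensorFlow reads files with this two-queue design, "filename queue + in-memory queue", which makes epochs easy to manage. The figures below illustrate the mechanism for a dataset of three files A, B, and C. To run one epoch, A, B, and C are each placed in the filename queue once, followed by an end-of-queue marker.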

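Once the program starts, the in-memory queue first reads A (A is dequeued from the filename queue), as shown below:

B and C are then read in turn, as shown below:

If another read is attempted at this point, the system detects the end marker and raises an OutOfRange exception; the surrounding code catches it and finishes. That is TensorFlow's data-reading mechanism. To run n epochs, A, B, and C are placed in the filename queue n times in order, followed by the end marker.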

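So how are these two queues created in TensorFlow?

The filename queue is created with tf.train.string_input_producer. It takes a list of filenames and converts it into a filename queue. It has two important parameters. num_epochs is the number of epochs described above. shuffle controls whether the file order is shuffled within an epoch: with shuffle=False, the files enter the filename queue in the order A, B, C in every epoch; with shuffle=True, the order within each epoch is randomized.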
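The effect of num_epochs and shuffle can be imitated in plain Python. The sketch below is a simplified model, not TensorFlow code: build_filename_queue, dequeue, and the OutOfRange class are hypothetical names, and the fixed seed is only there to make the run repeatable.

```python
import random

class OutOfRange(Exception):
    """Hypothetical stand-in for the error raised when the queue is exhausted."""

def build_filename_queue(filenames, num_epochs, shuffle, seed=0):
    """Return the order in which filenames would be queued."""
    rng = random.Random(seed)
    queued = []
    for _ in range(num_epochs):
        epoch = list(filenames)
        if shuffle:
            rng.shuffle(epoch)  # order changes only within an epoch
        queued.extend(epoch)
    return queued

def dequeue(queued):
    """Pop the next filename, or signal that the end marker was reached."""
    if not queued:
        raise OutOfRange("filename queue is exhausted")
    return queued.pop(0)

ordered = build_filename_queue(["A", "B", "C"], num_epochs=2, shuffle=False)
shuffled = build_filename_queue(["A", "B", "C"], num_epochs=2, shuffle=True)

reads = []
try:
    while True:
        reads.append(dequeue(ordered))
except OutOfRange:
    pass  # all epochs consumed
print(reads)
print(shuffled)
```

Even with shuffling enabled, every file appears exactly once per epoch; only the order inside each group of three changes.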

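The in-memory queue does not need to be created by hand; it is enough to use a reader object to read from the filename queue.

After tf.train.string_input_producer creates the filename queue, the system is still "stalled": no filenames have actually been enqueued. Only after tf.train.start_queue_runners is called do the threads that fill the queue start running. From then on the compute side can fetch data and do its work.

Two examples illustrate TensorFlow's data reading:

The first example, shown below, reads three images A, B, and C.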

import os
# Create the output directory if it does not exist yet
if not os.path.exists('read'):
    os.makedirs('read/')

# Import TensorFlow (this code uses the TensorFlow 1.x queue API)
import tensorflow as tf

# Create a Session
with tf.Session() as sess:
    # The three images to read: A.jpg, B.jpg, C.jpg
    filename = ['A.jpg', 'B.jpg', 'C.jpg']
    # string_input_producer creates a filename queue
    filename_queue = tf.train.string_input_producer(filename, shuffle=False, num_epochs=5)
    # The reader reads data from the filename queue via reader.read
    reader = tf.WholeFileReader()
    key, value = reader.read(filename_queue)
    # string_input_producer defines a local epoch-counter variable that must be initialized
    tf.local_variables_initializer().run()
    # The queue only starts filling after start_queue_runners is called
    threads = tf.train.start_queue_runners(sess=sess)
    i = 0
    while True:
        i += 1
        # Fetch the image data and save it
        image_data = sess.run(value)
        with open('read/test_%d.jpg' % i, 'wb') as f:
            f.write(image_data)
# The program ends by raising an OutOfRangeError, which signals that all epochs
# have finished and the queue has been closed

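After the code runs, the read directory contains the images in the order shown in the figure below: within each epoch the three images appear in the order A, B, C.

Changing shuffle=False to shuffle=True in tf.train.string_input_producer(filename, shuffle=False, num_epochs=5) and rerunning the program gives the result below: the order of the three images within each epoch is now shuffled.

The second example uses TensorFlow to read the CIFAR-10 dataset and save it as image files. It has four steps:

Step 1: create the filename queue with tf.train.string_input_producer;

Step 2: read the data with reader.read. When each file is a single image, use tf.WholeFileReader(); when a file consists of records with a fixed number of bytes, use tf.FixedLengthRecordReader().

Step 3: start filling the filename queue with tf.train.start_queue_runners;

Step 4: fetch the images with sess.run().

Following these steps, cifar10_extract.py is implemented as follows: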

# Import cifar10_input from the current directory; this module reads the CIFAR-10 data
import cifar10_input
# Import TensorFlow (1.x API) and the other modules used below
import tensorflow as tf
import os
# Note: scipy.misc.toimage was removed in SciPy 1.2, so this script needs an older SciPy
import scipy.misc


def inputs_origin(data_dir):
  # There are 5 filenames, data_batch_1.bin through data_batch_5.bin;
  # all of them contain training images
  filenames = [os.path.join(data_dir, 'data_batch_%d.bin' % i)
               for i in range(1, 6)]
  # Check that every file exists
  for f in filenames:
    if not tf.gfile.Exists(f):
      raise ValueError('Failed to find file: ' + f)
  # Wrap the list of filenames in a TensorFlow queue
  filename_queue = tf.train.string_input_producer(filenames)
  # cifar10_input.read_cifar10 is a ready-made function that reads records from the queue;
  # the uint8image attribute of its result read_input is the image Tensor
  read_input = cifar10_input.read_cifar10(filename_queue)
  # Convert the image to floating point
  reshaped_image = tf.cast(read_input.uint8image, tf.float32)
  # The returned reshaped_image is the tensor of a single image.
  # Think of it this way: every call to sess.run(reshaped_image) yields one image
  return reshaped_image

if __name__ == '__main__':
  # Create a session
  with tf.Session() as sess:
    # Call inputs_origin; cifar10_data/cifar-10-batches-bin is the folder with the downloaded data
    reshaped_image = inputs_origin('cifar10_data/cifar-10-batches-bin')
    # The start_queue_runners step is essential.
    # Earlier we created filename_queue = tf.train.string_input_producer(filenames);
    # that queue only starts working once start_queue_runners is called.
    # Without it, sess.run below would block forever
    threads = tf.train.start_queue_runners(sess=sess)
    # Initialize variables
    sess.run(tf.global_variables_initializer())
    # Create the folder cifar10_data/raw/
    if not os.path.exists('cifar10_data/raw/'):
      os.makedirs('cifar10_data/raw/')
    # Save 30 images
    for i in range(30):
      # Each sess.run(reshaped_image) yields one image
      image_array = sess.run(reshaped_image)
      # Save the image to disk
      scipy.misc.toimage(image_array).save('cifar10_data/raw/%d.jpg' % i)

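After the code runs, the following images appear in the cifar10_data/raw/ directory:

The inputs_origin function called above fetches the image data. It covers the first two steps: creating the filename queue with tf.train.string_input_producer and reading with a reader. Its return value reshaped_image is a Tensor corresponding to one training image. Within inputs_origin, the function cifar10_input.read_cifar10 deserves a closer look to see how the reading is done: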

def read_cifar10(filename_queue):
  """Reads and parses examples from CIFAR10 data files.

  Recommendation: if you want N-way read parallelism, call this function
  N times.  This will give you N independent Readers reading different
  files & positions within those files, which will give better mixing of
  examples.

  Args:
    filename_queue: A queue of strings with the filenames to read from.

  Returns:
    An object representing a single example, with the following fields:
      height: number of rows in the result (32)
      width: number of columns in the result (32)
      depth: number of color channels in the result (3)
      key: a scalar string Tensor describing the filename & record number
        for this example.
      label: an int32 Tensor with the label in the range 0..9.
      uint8image: a [height, width, depth] uint8 Tensor with the image data
  """

  class CIFAR10Record(object):
    pass
  result = CIFAR10Record()

  # Dimensions of the images in the CIFAR-10 dataset.
  # See http://www.cs.toronto.edu/~kriz/cifar.html for a description of the
  # input format.
  label_bytes = 1  # 2 for CIFAR-100
  result.height = 32
  result.width = 32
  result.depth = 3
  # Number of bytes in one image
  image_bytes = result.height * result.width * result.depth
  # Every record consists of a label followed by the image, with a
  # fixed number of bytes for each.
  # Total record size: label bytes plus image bytes
  record_bytes = label_bytes + image_bytes

  # Read a record, getting filenames from the filename_queue.  No
  # header or footer in the CIFAR-10 format, so we leave header_bytes
  # and footer_bytes at their default of 0.
  # Use FixedLengthRecordReader to read fixed-size records
  reader = tf.FixedLengthRecordReader(record_bytes=record_bytes)
  result.key, value = reader.read(filename_queue)

  # Convert from a string to a vector of uint8 that is record_bytes long.
  record_bytes = tf.decode_raw(value, tf.uint8)

  # The first bytes represent the label, which we convert from uint8->int32.
  result.label = tf.cast(
      tf.strided_slice(record_bytes, [0], [label_bytes]), tf.int32)

  # The remaining bytes after the label represent the image, which we reshape
  # from [depth * height * width] to [depth, height, width].
  depth_major = tf.reshape(
      tf.strided_slice(record_bytes, [label_bytes],
                       [label_bytes + image_bytes]),
      [result.depth, result.height, result.width])
  # Convert from [depth, height, width] to [height, width, depth].
  result.uint8image = tf.transpose(depth_major, [1, 2, 0])

  return result
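
The record layout that read_cifar10 relies on (1 label byte followed by 3 × 32 × 32 pixel bytes stored channel by channel) can be checked without TensorFlow. The sketch below is a pure-Python illustration of the same slicing and transposition; parse_record and the synthetic record are assumptions made for the example, not part of cifar10_input.py.

```python
LABEL_BYTES = 1
HEIGHT, WIDTH, DEPTH = 32, 32, 3
IMAGE_BYTES = HEIGHT * WIDTH * DEPTH
RECORD_BYTES = LABEL_BYTES + IMAGE_BYTES  # 1 + 3072 bytes per record

def parse_record(record):
    """Split one raw record into (label, image[row][col][channel])."""
    if len(record) != RECORD_BYTES:
        raise ValueError("expected %d bytes, got %d" % (RECORD_BYTES, len(record)))
    label = record[0]
    pixels = record[LABEL_BYTES:]
    plane = HEIGHT * WIDTH  # bytes in one color plane
    # Stored order is [depth, height, width]; rebuild as [height, width, depth].
    image = [[[pixels[c * plane + y * WIDTH + x] for c in range(DEPTH)]
              for x in range(WIDTH)]
             for y in range(HEIGHT)]
    return label, image

# Synthetic record: label 3, first plane all 10, second all 20, third all 30.
fake = bytes([3]) + bytes([10]) * 1024 + bytes([20]) * 1024 + bytes([30]) * 1024
label, image = parse_record(fake)
print(label, image[0][0])
```

Each pixel of the parsed image carries one value from each of the three planes, which is the effect of the tf.transpose call with [1, 2, 0] in the function above.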

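4. Training a CIFAR-10 recognition model with TensorFlow

1) Data augmentation

Data augmentation is a way to increase the number of training samples. Deep learning generally needs a large number of training samples, and more samples usually yield a better model. For image data, augmentation means applying transformations such as translation, scaling, and color changes to artificially enlarge the training set.

Data augmentation is valid only if the transformations do not change an image's original label.

Training the CIFAR-10 model also uses data augmentation to improve performance. The code is in the distorted_inputs() function of cifar10_input.py: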

def distorted_inputs(data_dir, batch_size):
  """Construct distorted input for CIFAR training using the Reader ops.

  Args:
    data_dir: Path to the CIFAR-10 data directory.
    batch_size: Number of images per batch.

  Returns:
    images: Images. 4D tensor of [batch_size, IMAGE_SIZE, IMAGE_SIZE, 3] size.
    labels: Labels. 1D tensor of [batch_size] size.
  """
  filenames = [os.path.join(data_dir, 'data_batch_%d.bin' % i)
               for i in xrange(1, 6)]
  for f in filenames:
    if not tf.gfile.Exists(f):
      raise ValueError('Failed to find file: ' + f)

  # Create a queue that produces the filenames to read.
  filename_queue = tf.train.string_input_producer(filenames)

  # Read examples from files in the filename queue.
  read_input = read_cifar10(filename_queue)
  reshaped_image = tf.cast(read_input.uint8image, tf.float32)

  height = IMAGE_SIZE
  width = IMAGE_SIZE

  # Image processing for training the network. Note the many random
  # distortions applied to the image.

  # Randomly crop a [height, width] section of the image.
  distorted_image = tf.random_crop(reshaped_image, [height, width, 3])

  # Randomly flip the image horizontally.
  distorted_image = tf.image.random_flip_left_right(distorted_image)

  # Randomly change brightness and contrast. Because these operations are not
  # commutative, consider randomizing the order of their operation.
  distorted_image = tf.image.random_brightness(distorted_image,
                                               max_delta=63)
  distorted_image = tf.image.random_contrast(distorted_image,
                                             lower=0.2, upper=1.8)

  # Subtract off the mean and divide by the standard deviation of the pixels.
  float_image = tf.image.per_image_standardization(distorted_image)

  # Set the shapes of tensors.
  float_image.set_shape([height, width, 3])
  read_input.label.set_shape([1])

  # Ensure that the random shuffling has good mixing properties.
  min_fraction_of_examples_in_queue = 0.4
  min_queue_examples = int(NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN *
                           min_fraction_of_examples_in_queue)
  print('Filling queue with %d CIFAR images before starting to train. '
        'This will take a few minutes.' % min_queue_examples)

  # Generate a batch of images and labels by building up a queue of examples.
  return _generate_image_and_label_batch(float_image, read_input.label,
                                         min_queue_examples, batch_size,
                                         shuffle=True)
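
To make the individual transformations concrete, here is a pure-Python sketch of random cropping, random horizontal flipping, and per-image standardization on an image stored as nested lists. It is an illustration under stated assumptions rather than the TensorFlow implementation: the helper names are hypothetical, brightness and contrast changes are left out for brevity, and the lower bound on the divisor in standardize mirrors what I understand tf.image.per_image_standardization to do.

```python
import math
import random

def random_crop(image, size, rng):
    """Cut a random size x size window from an image stored as [row][col][channel]."""
    top = rng.randint(0, len(image) - size)
    left = rng.randint(0, len(image[0]) - size)
    return [row[left:left + size] for row in image[top:top + size]]

def random_flip_left_right(image, rng):
    """Mirror the image horizontally with probability 0.5."""
    if rng.random() < 0.5:
        return [row[::-1] for row in image]
    return image

def standardize(image):
    """Shift to zero mean and scale by the standard deviation."""
    values = [v for row in image for pixel in row for v in pixel]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    # Lower bound on the divisor avoids division by zero for uniform images.
    scale = max(math.sqrt(variance), 1.0 / math.sqrt(len(values)))
    return [[[(v - mean) / scale for v in pixel] for pixel in row] for row in image]

rng = random.Random(0)
# Synthetic 32x32x3 image whose pixel values depend on position.
original = [[[float(y + x + c) for c in range(3)] for x in range(32)]
            for y in range(32)]
augmented = standardize(
    random_flip_left_right(random_crop(original, 24, rng), rng))
print(len(augmented), len(augmented[0]), len(augmented[0][0]))
```

None of these steps alters what the picture depicts, which is why the label can be reused unchanged for every augmented copy.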

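2) Building the CIFAR-10 recognition model

The model is built in the inference() function of cifar10.py, shown below. It uses two convolutional layers and three fully connected layers.

# The argument images holds the input images, here the images after data augmentation
# The function returns the logits of each image over the classes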
def inference(images):
  """Build the CIFAR-10 model.

  Args:
    images: Images returned from distorted_inputs() or inputs().

  Returns:
    Logits.
  """
  # We instantiate all variables using tf.get_variable() instead of
  # tf.Variable() in order to share variables across multiple GPU training runs.
  # If we only ran this model on a single GPU, we could simplify this function
  # by replacing all instances of tf.get_variable() with tf.Variable().
  #
  # Build the first convolutional layer, conv1
  with tf.variable_scope('conv1') as scope:
    kernel = _variable_with_weight_decay('weights',
                                         shape=[5, 5, 3, 64],
                                         stddev=5e-2,
                                         wd=0.0)
    conv = tf.nn.conv2d(images, kernel, [1, 1, 1, 1], padding='SAME')
    biases = _variable_on_cpu('biases', [64], tf.constant_initializer(0.0))
    pre_activation = tf.nn.bias_add(conv, biases)
    conv1 = tf.nn.relu(pre_activation, name=scope.name)
    _activation_summary(conv1)

  # Pooling after the first convolutional layer, pool1
  pool1 = tf.nn.max_pool(conv1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1],
                         padding='SAME', name='pool1')
  # Local response normalization, norm1
  norm1 = tf.nn.lrn(pool1, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75,
                    name='norm1')

  # Second convolutional layer, conv2
  with tf.variable_scope('conv2') as scope:
    kernel = _variable_with_weight_decay('weights',
                                         shape=[5, 5, 64, 64],
                                         stddev=5e-2,
                                         wd=0.0)
    conv = tf.nn.conv2d(norm1, kernel, [1, 1, 1, 1], padding='SAME')
    biases = _variable_on_cpu('biases', [64], tf.constant_initializer(0.1))
    pre_activation = tf.nn.bias_add(conv, biases)
    conv2 = tf.nn.relu(pre_activation, name=scope.name)
    _activation_summary(conv2)

  # Local response normalization, norm2
  norm2 = tf.nn.lrn(conv2, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75,
                    name='norm2')
  # Pooling after the second convolutional layer, pool2
  pool2 = tf.nn.max_pool(norm2, ksize=[1, 3, 3, 1],
                         strides=[1, 2, 2, 1], padding='SAME', name='pool2')

  # First fully connected layer, local3
  with tf.variable_scope('local3') as scope:
    # Move everything into depth so we can perform a single matrix multiply.
    reshape = tf.reshape(pool2, [FLAGS.batch_size, -1])
    dim = reshape.get_shape()[1].value
    weights = _variable_with_weight_decay('weights', shape=[dim, 384],
                                          stddev=0.04, wd=0.004)
    biases = _variable_on_cpu('biases', [384], tf.constant_initializer(0.1))
    local3 = tf.nn.relu(tf.matmul(reshape, weights) + biases, name=scope.name)
    _activation_summary(local3)

  # Second fully connected layer, local4
  with tf.variable_scope('local4') as scope:
    weights = _variable_with_weight_decay('weights', shape=[384, 192],
                                          stddev=0.04, wd=0.004)
    biases = _variable_on_cpu('biases', [192], tf.constant_initializer(0.1))
    local4 = tf.nn.relu(tf.matmul(local3, weights) + biases, name=scope.name)
    _activation_summary(local4)

  # linear layer(WX + b),
  # We don't apply softmax here because
  # tf.nn.sparse_softmax_cross_entropy_with_logits accepts the unscaled logits
  # and performs the softmax internally for efficiency.
  with tf.variable_scope('softmax_linear') as scope:
    weights = _variable_with_weight_decay('weights', [192, NUM_CLASSES],
                                          stddev=1/192.0, wd=0.0)
    biases = _variable_on_cpu('biases', [NUM_CLASSES],
                              tf.constant_initializer(0.0))
    softmax_linear = tf.add(tf.matmul(local4, weights), biases, name=scope.name)
    _activation_summary(softmax_linear)

  return softmax_linear
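
The tensor sizes flowing through inference() can be traced with simple arithmetic. The sketch below assumes the input images are 24×24 crops (IMAGE_SIZE is not shown in the listing above, so the value 24 is an assumption) and uses the rule that padding='SAME' yields ceil(size / stride) outputs per spatial dimension; same_out and the params dictionary are illustrative names.

```python
import math

def same_out(size, stride):
    """Spatial output size for padding='SAME': ceil(size / stride)."""
    return math.ceil(size / stride)

IMAGE_SIZE = 24   # assumption: crop size produced by distorted_inputs()
NUM_CLASSES = 10  # CIFAR-10 has 10 classes

size = IMAGE_SIZE
size = same_out(size, 1)  # conv1, stride 1
size = same_out(size, 2)  # pool1, stride 2
size = same_out(size, 1)  # conv2, stride 1
size = same_out(size, 2)  # pool2, stride 2
flat_dim = size * size * 64  # width of the flattened input to local3

# Weights plus biases per layer, read off the shapes used in inference().
params = {
    "conv1": 5 * 5 * 3 * 64 + 64,
    "conv2": 5 * 5 * 64 * 64 + 64,
    "local3": flat_dim * 384 + 384,
    "local4": 384 * 192 + 192,
    "softmax_linear": 192 * NUM_CLASSES + NUM_CLASSES,
}
total_params = sum(params.values())
print(flat_dim, total_params)
```

Under these assumptions most of the parameters sit in local3, the first fully connected layer, rather than in the convolutional layers.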

3) Training the model

Train the model with the following command:

python cifar10_train.py --train_dir cifar10_train/ --data_dir cifar10_data/

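--data_dir cifar10_data/ specifies where the CIFAR-10 data is stored, and --train_dir cifar10_train/ specifies a separate training folder that holds the model parameters and the training logs.

Once the command runs, training begins. The console prints the information below, reporting the loss every 10 steps.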

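4) Monitoring training progress in TensorBoard

In another terminal window, change to the current directory and start TensorBoard with: tensorboard --logdir cifar10_train/. Once the local server is running, open http://localhost:6006/ in a browser. The figure below shows the page after clicking total_loss_1 in TensorBoard; it plots how the loss changed up to a little over 7,000 training steps.

How TensorBoard displays training information, briefly: the training folder cifar10_train contains a file whose name starts with events.out. During training the program continuously writes log data to this file. When TensorBoard is run with the training folder specified, it finds this file automatically and displays the corresponding information on the web page.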

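5) Evaluating the model

The training folder cifar10_train/ also contains a file named checkpoint and several files whose names start with model.ckpt. TensorFlow saves the trained model parameters in checkpoints. The training program is set up to save a checkpoint every 10 minutes and to keep only the 5 most recent ones: if 5 checkpoints already exist when a new one is saved, the oldest is deleted.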
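The keep-the-latest-5 policy behaves like a fixed-length queue. The sketch below models only that bookkeeping with a deque; it does not touch any files, and save_checkpoint and the step numbers are made up for illustration.

```python
from collections import deque

MAX_TO_KEEP = 5  # the retention limit described in the text

def save_checkpoint(kept, step):
    """Record a new checkpoint name and return the one that gets dropped, if any."""
    removed = kept[0] if len(kept) == kept.maxlen else None
    kept.append("model.ckpt-%d" % step)  # a full deque discards its oldest entry
    return removed

kept = deque(maxlen=MAX_TO_KEEP)
# Hypothetical step numbers: seven saves, 1000 steps apart.
removed = [save_checkpoint(kept, step) for step in range(1000, 8000, 1000)]
dropped = [name for name in removed if name]
print(list(kept))
print(dropped)
```

After the seventh save, the two oldest checkpoints are gone and the five newest remain, matching the behavior described above.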

Opening the checkpoint file in a text editor shows the following:

Here model_checkpoint_path: indicates that the latest model is model.ckpt-50164, the result of training step 50164.
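The step number can also be pulled out of the checkpoint file programmatically. The sketch below parses a small sample string; the sample contents are illustrative (only the model.ckpt-50164 name comes from the text), and latest_checkpoint_step is a hypothetical helper.

```python
import re

# Illustrative contents; a real checkpoint file may list more paths.
checkpoint_text = '''model_checkpoint_path: "model.ckpt-50164"
all_model_checkpoint_paths: "model.ckpt-50164"
'''

def latest_checkpoint_step(text):
    """Return (path, step) from the model_checkpoint_path line, or None."""
    match = re.search(r'^model_checkpoint_path:\s*"([^"]+)"', text, re.MULTILINE)
    if match is None:
        return None
    path = match.group(1)
    # The training step is the number after the last hyphen in the name.
    step_match = re.search(r'-(\d+)$', path)
    step = int(step_match.group(1)) if step_match else None
    return path, step

print(latest_checkpoint_step(checkpoint_text))
```

Anchoring the pattern at the start of a line keeps it from matching the all_model_checkpoint_paths entries.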

Use cifar10_eval.py to measure the model's accuracy on the test set with the following command:

python cifar10_eval.py --data_dir cifar10_data/ --eval_dir cifar10_eval/ --checkpoint_dir cifar10_train/

The result is shown below: the model trained for 50,164 steps reaches an accuracy of 85.5%.

Code on GitHub: https://github.com/zhuwsh/21DeepLearningProjects/tree/master/Ch2_Cifar10
