Try a Kaggle Competition: Telling Cats from Dogs

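In the previous article, "Handling Very Large Datasets in Deep Learning", we used HDF5 files to handle large-scale datasets. A reader asked whether an HDF5 file is read into memory all at once and then accessed by key. It certainly is not. As that article mentioned, the resulting train.hdf5 file is as large as 30 GB, and loading all of it into memory would exhaust RAM. Instead, HDF5's file format lets us load a whole batch of images (say, 128) in a single read rather than reading them one at a time. In other words, this approach reduces the number of I/O operations, and because the stored images are raw pixel data, it also eliminates decoding time.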

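In this article we show how to read the HDF5 file and implement an AlexNet model from scratch. With this knowledge you can try a Kaggle competition yourself. Be warned, though: training on ImageNet-scale datasets, especially with deep models, is extremely time-consuming! Over a weekend I trained AlexNet from scratch on my machine (CPU: i7 6700, GPU: GTX 960, RAM: 8 GB). A single epoch took about a day, so the whole weekend bought me only two epochs, never mind a before/after comparison of training with and without HDF5 files. Looks like it's time to upgrade my hardware :(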

Image Preprocessing

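Before implementing AlexNet, let's introduce several image preprocessing methods. They are widely used in deep learning for computer vision and can noticeably improve image classification accuracy.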

Mean Subtraction

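In the previous article, while processing the image files, we computed the mean of each RGB channel over all images. Mean subtraction subtracts the corresponding channel mean from the R, G and B values of every pixel. As a formula:

$$R' = R - \mu_R,\qquad G' = G - \mu_G,\qquad B' = B - \mu_B$$

where $\mu_R$, $\mu_G$ and $\mu_B$ are the per-channel means computed over all images.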

For example, applying mean subtraction to a photo of a dog produces the image on the right.

Mean subtraction is a data normalization technique that helps reduce the effect of lighting variations.
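The same idea can be sketched with plain NumPy broadcasting, including the step of computing the channel means over all images. This is an illustration, not the author's code: it assumes the images are already loaded as a uint8 array of shape (N, H, W, 3) in OpenCV's B, G, R order, and `channel_means` and `subtract_mean` are hypothetical helper names.

```python
import numpy as np

def channel_means(images):
    # images: (N, H, W, 3) array; flatten every pixel of every image,
    # then average down the rows to get one mean per channel
    return images.reshape(-1, 3).mean(axis=0)

def subtract_mean(image, means):
    # broadcasting subtracts means[c] from every pixel of channel c
    return image.astype("float32") - means.astype("float32")

# tiny synthetic "dataset" (hypothetical values): 4 images of 2x2 pixels
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 2, 2, 3)).astype("uint8")

means = channel_means(images)
centered = subtract_mean(images[0], means)
print(means.shape, centered.dtype)
```

After subtraction, each channel of the whole dataset averages to roughly zero, which is exactly what the normalization is meant to achieve.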

With OpenCV's helper functions, mean subtraction is easy to implement:

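import cv2

class MeanPreprocessor:
  def __init__(self, r_mean, g_mean, b_mean):
    # per-channel means computed over all images
    self.r_mean, self.g_mean, self.b_mean = r_mean, g_mean, b_mean

  def preprocess(self, image):
    # split into channels (OpenCV stores images in B, G, R order)
    (B, G, R) = cv2.split(image.astype("float32"))

    # subtract the means for each channel
    R -= self.r_mean
    G -= self.g_mean
    B -= self.b_mean

    # merge the channels back and return the image
    return cv2.merge([B, G, R])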

Random Patches

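Patch extraction randomly crops an M x N region of the image during training. A CNN requires a fixed input size, so when our images don't match it, the usual approach is to resize them. If the images are larger than the input size, however, a better option is to randomly crop part of each image, which effectively reduces overfitting.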

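In the figure above, a 256x256 image is randomly cropped to 227x227. Because the crop position is random, the network sees a slightly different image each time. This acts as a form of data augmentation and helps reduce overfitting.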
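To make the idea concrete without sklearn, here is a NumPy sketch of what drawing a single random patch amounts to: choose a random top-left corner so the crop stays inside the image, then slice. As far as I know, `extract_patches_2d` with `max_patches=1` samples the patch position uniformly in the same way, but this is an illustration under that assumption, not the library's implementation; `random_crop` is a hypothetical name.

```python
import numpy as np

def random_crop(image, width, height, rng=None):
    # pick a random top-left corner so the crop stays inside the image
    rng = rng if rng is not None else np.random.default_rng()
    (h, w) = image.shape[:2]
    y = rng.integers(0, h - height + 1)
    x = rng.integers(0, w - width + 1)
    # NumPy indexes rows (y) first, then columns (x)
    return image[y:y + height, x:x + width]

image = np.zeros((256, 256, 3), dtype="uint8")  # stand-in for a training image
patch = random_crop(image, 227, 227, rng=np.random.default_rng(42))
print(patch.shape)  # (227, 227, 3)
```

With a 256x256 source and a 227x227 target there are 30 x 30 = 900 possible crop positions, which is where the augmentation effect comes from.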

We don't need to implement this from scratch either: a utility function from sklearn does it in one line:
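
from sklearn.feature_extraction.image import extract_patches_2d
class PatchPreprocessor:
  def __init__(self, width, height):
    self.width, self.height = width, height
  def preprocess(self, image):
    # extract one random crop; sklearn expects patch_size as (height, width)
    return extract_patches_2d(image, (self.height, self.width), max_patches=1)[0]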

Cropping

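Cropping is similar to the patch preprocessing above, with two differences:
1. It is applied to the validation set, whereas patch extraction is applied to the training data.
2. It always takes the four corners and the center region, plus the horizontal flip of each, so every image yields 10 samples.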

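Recall the article "[Improving Model Accuracy: Ensembles](https://juejin.im/user/57a3337979bc440054b081af/posts)": combining the outputs of several networks improves classification accuracy. Here we apply the same idea by averaging the class probabilities over the 10 samples.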
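The averaging step can be sketched as follows. `predict_fn` stands in for something like a Keras model's `predict` method; `predict_ten_crop` and `fake_predict` are hypothetical names, and the probabilities are made up purely for illustration.

```python
import numpy as np

def predict_ten_crop(predict_fn, crops):
    # crops: array of shape (10, H, W, C) produced by the crop preprocessor
    # predict_fn returns one row of class probabilities per crop
    probs = predict_fn(crops)          # shape (10, num_classes)
    mean_probs = probs.mean(axis=0)    # average over the 10 crops
    return mean_probs, int(np.argmax(mean_probs))

# hypothetical stand-in for model.predict: fixed probabilities for [cat, dog]
def fake_predict(batch):
    p = np.array([[0.4, 0.6]] * 5 + [[0.7, 0.3]] * 5)
    return p[: len(batch)]

crops = np.zeros((10, 227, 227, 3), dtype="float32")
mean_probs, label = predict_ten_crop(fake_predict, crops)
print(mean_probs, label)
```

Here half of the crops lean toward "dog", but the averaged probabilities (0.55 vs 0.45) favor "cat", so the final label is 0. Averaging lets confident crops outweigh uncertain ones instead of counting votes.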

There is no ready-made function for this preprocessing, but it isn't hard to write:

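import cv2
import numpy as np

class CropPreprocessor:
  def __init__(self, width, height, horiz=True, inter=cv2.INTER_AREA):
    # target crop size, whether to add mirrored crops, and resize interpolation
    # (the horiz/inter defaults here are assumptions)
    self.width = width
    self.height = height
    self.horiz = horiz
    self.inter = inter

  def preprocess(self, image):
    crops = []

    # grab the width and height of the image then use these dimensions to
    # define the corners of the image as (startx, starty, endx, endy)
    (h, w) = image.shape[:2]
    coords = [
      [0, 0, self.width, self.height],
      [w - self.width, 0, w, self.height],
      [w - self.width, h - self.height, w, h],
      [0, h - self.height, self.width, h]
    ]

    # compute the center crop of the image as well
    dw = int(0.5 * (w - self.width))
    dh = int(0.5 * (h - self.height))
    coords.append([dw, dh, w - dw, h - dh])

    for (startx, starty, endx, endy) in coords:
      # NumPy indexes rows (y) first, then columns (x)
      crop = image[starty:endy, startx:endx]
      # the center crop can be one pixel too large, so resize every crop
      crop = cv2.resize(crop, (self.width, self.height), interpolation=self.inter)
      crops.append(crop)

    if self.horiz:
      # compute the horizontal mirror flips for each crop
      mirrors = [cv2.flip(c, 1) for c in crops]
      crops.extend(mirrors)

    return np.array(crops)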
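Working through the crop coordinates for a 256x256 image and a 227x227 target shows why `cv2.resize` is called on every crop. The sketch below repeats only the coordinate logic of the preprocessor as a standalone function (`crop_boxes` is a hypothetical name) so the numbers can be checked without OpenCV.

```python
def crop_boxes(w, h, width, height):
    # four corners as (startx, starty, endx, endy), same order as the preprocessor
    coords = [
        [0, 0, width, height],
        [w - width, 0, w, height],
        [w - width, h - height, w, h],
        [0, h - height, width, h],
    ]
    # center crop: trim dw/dh from each side; int() rounds an odd margin down
    dw = int(0.5 * (w - width))
    dh = int(0.5 * (h - height))
    coords.append([dw, dh, w - dw, h - dh])
    return coords

for (sx, sy, ex, ey) in crop_boxes(256, 256, 227, 227):
    print((sx, sy, ex, ey), "size:", ex - sx, "x", ey - sy)
```

The four corner boxes are exactly 227x227, but the margin 256 - 227 = 29 is odd, so the center box is (14, 14, 242, 242), i.e. 228x228. The resize brings it back to the size the network expects.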

HDF5 Dataset Generator

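In "Handling Very Large Datasets in Deep Learning" we stored the dataset in HDF5 format. Now we need to read image data and class labels from the HDF5 file batch by batch.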

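import h5py
import numpy as np
from keras.utils import np_utils

class HDF5DatasetGenerator:
  def __init__(self, db_path, batch_size, preprocessors=None, binarize=True, classes=2):
    # open read-only; h5py reads data from disk only when a slice is requested
    self.db = h5py.File(db_path, "r")
    self.num_images = self.db["labels"].shape[0]
    self.batch_size = batch_size
    self.preprocessors = preprocessors
    self.binarize = binarize
    self.classes = classes

  def generator(self, passes=np.inf):
    epochs = 0
    # loop over the data `passes` times (forever by default, as Keras generators expect)
    while epochs < passes:
      # loop over the HDF5 dataset
      for i in np.arange(0, self.num_images, self.batch_size):
        # extract the images and labels from HDF5 dataset
        images = self.db["images"][i : i + self.batch_size]
        labels = self.db["labels"][i : i + self.batch_size]

        if self.binarize:
          labels = np_utils.to_categorical(labels, self.classes)

        if self.preprocessors is not None:
          proc_images = []
          for image in images:
            for p in self.preprocessors:
              image = p.preprocess(image)
            proc_images.append(image)
          images = np.array(proc_images)

        # hand one batch of (images, labels) to the caller
        yield (images, labels)
      epochs += 1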

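Each iteration reads batch_size images and labels and preprocesses them. The images and labels datasets are accessed much like arrays: [i : i + self.batch_size] reads the elements from index i up to, but not including, i + batch_size, and only that slice is loaded from disk.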
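The index arithmetic behind this loop can be checked without any HDF5 file. This pure-Python sketch (`batch_slices` is a hypothetical helper) lists the half-open ranges the generator reads for 300 images with a batch size of 128. The generator relies on HDF5 datasets being sliced like NumPy arrays, so the final slice simply returns the remaining 44 images.

```python
def batch_slices(num_images, batch_size):
    # same start indices as np.arange(0, num_images, batch_size) in the generator
    for i in range(0, num_images, batch_size):
        # half-open range: i is included, i + batch_size is not;
        # the end is clipped, so the last batch may be smaller
        yield i, min(i + batch_size, num_images)

print(list(batch_slices(300, 128)))  # [(0, 128), (128, 256), (256, 300)]
```

Since the last batch can be shorter than batch_size, any code consuming the generator should not assume a fixed batch dimension.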

AlexNet

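Compared with the deep learning models we implemented before, AlexNet is considerably more complex. Its layers are listed in the table below: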

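When AlexNet was first proposed, techniques such as batch normalization did not yet exist. In our implementation we add batch normalization after each activation, which is standard practice for most image classification tasks with convolutional neural networks. We also apply dropout after each POOL layer (and, at a higher rate of 0.5, after each fully connected layer) to further reduce overfitting.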

Earlier articles showed how concise Keras is; even for a network as complex as AlexNet, Keras needs only a few more lines of code.

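from keras.models import Sequential
from keras.layers import Conv2D, Activation, BatchNormalization, MaxPooling2D
from keras.layers import Dropout, Flatten, Dense
from keras.regularizers import l2
from keras import backend as K

class AlexNet:
  @staticmethod
  def build(width, height, depth, classes, reg=0.0002):
    model = Sequential()
    # Keras expects (rows, cols, channels), i.e. (height, width, depth)
    input_shape = (height, width, depth)
    channel_dim = -1

    if K.image_data_format() == "channels_first":
      input_shape = (depth, height, width)
      channel_dim = 1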

    # block #1: CONV => RELU => POOL
    model.add(Conv2D(96, (11, 11), strides=(4, 4), input_shape=input_shape,
                     padding="same", kernel_regularizer=l2(reg)))
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=channel_dim))
    model.add(MaxPooling2D(pool_size=(3, 3), strides=(2, 2)))
    model.add(Dropout(0.25))

    # block #2: CONV => RELU => POOL
    model.add(Conv2D(256, (5, 5),
                     padding="same", kernel_regularizer=l2(reg)))
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=channel_dim))
    model.add(MaxPooling2D(pool_size=(3, 3), strides=(2, 2)))
    model.add(Dropout(0.25))

    # block #3: CONV => RELU => CONV => RELU => CONV => RELU => POOL
    model.add(Conv2D(384, (3, 3),
                     padding="same", kernel_regularizer=l2(reg)))
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=channel_dim))
    model.add(Conv2D(384, (3, 3),
                     padding="same", kernel_regularizer=l2(reg)))
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=channel_dim))
    model.add(Conv2D(256, (3, 3),
                     padding="same", kernel_regularizer=l2(reg)))
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=channel_dim))
    model.add(MaxPooling2D(pool_size=(3, 3), strides=(2, 2)))
    model.add(Dropout(0.25))

    # block #4: FC => RELU
    model.add(Flatten())
    model.add(Dense(4096, kernel_regularizer=l2(reg)))
    model.add(Activation("relu"))
    model.add(BatchNormalization())
    model.add(Dropout(0.5))

    # block #5: FC => RELU
    model.add(Dense(4096, kernel_regularizer=l2(reg)))
    model.add(Activation("relu"))
    model.add(BatchNormalization())
    model.add(Dropout(0.5))

    # softmax classifier
    model.add(Dense(classes, kernel_regularizer=l2(reg)))
    model.add(Activation("softmax"))

    return model
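To see how the feature maps shrink through the blocks above, the spatial size can be traced by hand. This sketch assumes a 227x227 input (the patch size used earlier) and the standard output-size rules for Keras convolution and pooling layers: "same" padding gives ceil(input / stride), and the default "valid" padding of MaxPooling2D gives floor((input - pool) / stride) + 1. Calling `model.summary()` on the built model should report matching spatial sizes.

```python
import math

def conv_same(size, stride):
    # padding="same": output = ceil(input / stride)
    return math.ceil(size / stride)

def pool_valid(size, pool, stride):
    # default padding="valid": output = floor((input - pool) / stride) + 1
    return (size - pool) // stride + 1

size = 227                       # input crop size used in this article
size = conv_same(size, 4)        # block #1 CONV 11x11, stride 4 -> 57
size = pool_valid(size, 3, 2)    # block #1 POOL 3x3, stride 2 -> 28
size = conv_same(size, 1)        # block #2 CONV 5x5 keeps 28
size = pool_valid(size, 3, 2)    # block #2 POOL -> 13
size = conv_same(size, 1)        # block #3: the three 3x3 CONVs keep 13
size = pool_valid(size, 3, 2)    # block #3 POOL -> 6
flat = size * size * 256         # Flatten output fed to the first Dense(4096)
print(size, flat)                # 6 9216
```

The 9216 x 4096 weight matrix of the first fully connected layer alone holds about 37.7 million parameters, which is a large part of why training this model from scratch is so slow.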

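Next come training and testing. In deep learning, the training and testing process is much the same whether the model is simple or complex, so I won't belabor it here; interested readers can consult my complete code on GitHub. Note, however, that this training is very time-consuming: unless you have a very powerful GPU, think twice before trying it. In fact, for image classification tasks it is usually better to use transfer learning. Standing on the shoulders of giants not only saves effort but often gives better results.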

Complete code for all of the examples above is available in the sample repository I set up on GitHub. I'm also currently reading "Deep Learning for Computer Vision with Python".