Building powerful image classification models using very little data

水野与小太郎

Category: Machine Learning

Original article: https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html

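This article explores how to build an effective classifier with deep learning on a small dataset of only 2000 cat and dog images. It covers training a small network from scratch, using the bottleneck features of a pre-trained network, and fine-tuning the top layers of a pre-trained network, and shows how data augmentation, control of the model's entropic capacity, and weight regularization help avoid overfitting and reach a classification accuracy above 90%.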

1. Main Content

In this tutorial we build an effective classifier from only a small amount of data. We will go through the following approaches:

training a small network from scratch (as a baseline)
using the bottleneck features of a pre-trained network
fine-tuning the top layers of a pre-trained network

Along the way we will use the following Keras features:

fit_generator for training a Keras model using Python data generators
ImageDataGenerator for real-time data augmentation
layer freezing and model fine-tuning
...and more.

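2. Training with Only 2000 Samples

Download the cat and dog images from https://www.kaggle.com/c/dogs-vs-cats/data. Our training sample consists of only 1000 cat images and 1000 dog images, plus a separate set of images held out for testing.

We often hear that "deep learning is only relevant when you have a huge amount of data". This is not entirely correct. It is true that deep learning requires the ability to learn features automatically from the data, which is generally only possible when a lot of training data is available, especially for problems where the input samples are very high-dimensional (such as images). However, convolutional neural networks (a pillar algorithm of deep learning) are by design one of the best models available for most "perceptual" problems (such as image classification), even with very little data to learn from. Training a convnet from scratch on a small image dataset still yields reasonable results, without any custom feature engineering.

More importantly, deep learning models are by nature highly repurposable: you can take, say, an image classification or speech-to-text model trained on a large-scale dataset and reuse it on a significantly different problem with only minor changes, as we will see in this post. In computer vision in particular, many pre-trained models (usually trained on the ImageNet dataset) are now publicly available for download and can be used to bootstrap powerful vision models out of very little data (transfer learning).

3. Data Preprocessing and Data Augmentation

This is what is usually called data augmentation, and it helps us make the most of a small dataset. The step is essential: with very few samples a model can easily overfit, so augmentation is necessary if we want to build a good classifier from little data.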
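
To make the idea concrete, here is a minimal sketch in plain Python of two of the simplest transformations, a random horizontal flip and a rescale. It is not how Keras implements augmentation; the helper names and the tiny 2x3 "image" are hypothetical and exist only for illustration.

```python
import random


def horizontal_flip(image):
    """Mirror an image (a list of rows of pixel values) left to right."""
    return [row[::-1] for row in image]


def rescale(image, factor=1.0 / 255):
    """Multiply every pixel by a constant, e.g. to map 0-255 into 0-1."""
    return [[pixel * factor for pixel in row] for row in image]


def augment(image, rng, flip_probability=0.5):
    """Return a randomly flipped, rescaled copy; the input is left untouched."""
    if rng.random() < flip_probability:
        image = horizontal_flip(image)
    return rescale(image)


# A hypothetical 2x3 grayscale "image" with pixel values in 0-255.
tiny_image = [[0, 51, 255],
              [102, 153, 204]]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
variants = [augment(tiny_image, rng) for _ in range(4)]
```

Each call produces a slightly different view of the same picture, so the model never sees exactly the same input twice. The variants remain strongly correlated with the original, which is why augmentation alone does not remove the risk of overfitting.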

# Keras API
# Image augmentation generator
keras.preprocessing.image.ImageDataGenerator(
    featurewise_center=False,                   # bool, set the input mean to 0 over the dataset
    samplewise_center=False,                    # bool, set the mean of each sample to 0
    featurewise_std_normalization=False,        # bool, divide inputs by the std of the dataset
    samplewise_std_normalization=False,         # bool, divide each sample by its own std
    zca_whitening=False,                        # bool, apply ZCA whitening to the inputs
    zca_epsilon=1e-06,                          
    rotation_range=0,                           # int, range in degrees for random rotations
    width_shift_range=0.0,                      # float, fraction of total width for random horizontal shifts
    height_shift_range=0.0,                     # float, fraction of total height for random vertical shifts
    brightness_range=None, 
    shear_range=0.0, 
    zoom_range=0.0, 
    channel_shift_range=0.0, 
    fill_mode='nearest', 
    cval=0.0, 
    horizontal_flip=False,                     # bool, randomly flip inputs horizontally
    vertical_flip=False,                       # bool, randomly flip inputs vertically
    rescale=None,                              # rescaling factor applied to the data
    preprocessing_function=None,                
    data_format='channels_last', 
    validation_split=0.0, 
    interpolation_order=1, 
    dtype='float32'
)
# https://keras.io/preprocessing/image/
# Method
#1 Compute the statistics (mean, variance, etc.) needed by data-dependent transformations; only required when featurewise_center, featurewise_std_normalization or zca_whitening is used.
fit(x, augment=False, rounds=1, seed=None)
#2 Take numpy arrays and labels and yield batches of augmented/normalized data, looping indefinitely
flow(
    x, 
    y=None, 
    batch_size=32, 
    shuffle=True, 
    sample_weight=None, 
    seed=None, 
    save_to_dir=None, 
    save_prefix='', 
    save_format='png', 
    subset=None
)
#3 Take the path to a directory and yield batches of augmented/normalized data, looping indefinitely
flow_from_directory(
    directory, 
    target_size=(256, 256), 
    color_mode='rgb', 
    classes=None, 
    class_mode='categorical', 
    batch_size=32, 
    shuffle=True, 
    seed=None, 
    save_to_dir=None, 
    save_prefix='', 
    save_format='png', 
    follow_links=False, 
    subset=None, 
    interpolation='nearest'
)

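## 4 example
import os

import cv2
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest'
)

# cv2 loads images as BGR; convert to RGB so the saved files have correct colors
x = cv2.cvtColor(cv2.imread("1.jpg"), cv2.COLOR_BGR2RGB)
x = x.reshape((1,) + x.shape)  # add a batch dimension: (1, height, width, 3)

os.makedirs("img_save", exist_ok=True)  # save_to_dir must already exist
dataflow = datagen.flow(
    x,
    batch_size=1,
    save_to_dir="img_save",
    save_prefix='cat',
    save_format='jpg',
)

# the generator loops forever, so stop once 20 augmented images are saved
for i, batch in enumerate(dataflow):
    if i >= 19:
        break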

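4. Training a Powerful Classifier with Very Little Code

(a) Training a small convnet from scratch: 80% accuracy in 40 lines of code

Convnets are the right tool for image classification, so let's try training one on our data as the first experiment of this post. Since we have only a few examples, our number one concern should be overfitting. Data augmentation is one way to fight overfitting, but it is not enough, because our augmented samples are still highly correlated with one another. The main focus for fighting overfitting should be the entropic capacity of the model, that is, how much information the model is allowed to store. A model that can store a lot of information has the potential to be more accurate by leveraging more features, but it is also more at risk of storing irrelevant features. Meanwhile, a model that can only store a few features has to focus on the most significant features found in the data, and these are more likely to be truly relevant and to generalize better.

There are different ways to modulate entropic capacity. The main one is the choice of the number of parameters in the model, i.e. the number of layers and the size of each layer. Another is weight regularization, such as L1 or L2 regularization.

In our case we will use (1) a very small convnet with only a few filters per layer, (2) data augmentation, and (3) dropout. All three tricks serve to prevent overfitting.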
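
As a rough illustration of how layer sizes drive the parameter count, the sketch below counts the trainable parameters of a small convnet by hand. The architecture is an assumption made for the example: 150x150 RGB inputs, three unpadded 3x3 convolutions, each followed by 2x2 max pooling, then one hidden dense layer and a single output unit. The function names are hypothetical.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter for a 2D convolution."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(in_features, units):
    """Weights plus one bias per unit for a fully connected layer."""
    return (in_features + 1) * units


def small_convnet_params(height=150, width=150, channels=3,
                         conv_filters=(32, 32, 64), dense_units=64):
    """Count parameters of conv(3x3) -> pool(2x2) blocks plus a dense head."""
    total = 0
    for filters in conv_filters:
        total += conv2d_params(3, 3, channels, filters)
        # an unpadded 3x3 convolution trims 2 pixels per side length,
        # then 2x2 max pooling halves it (rounding down)
        height, width = (height - 2) // 2, (width - 2) // 2
        channels = filters
    flattened = height * width * channels
    total += dense_params(flattened, dense_units)  # hidden dense layer
    total += dense_params(dense_units, 1)          # single sigmoid output
    return total


baseline = small_convnet_params()                 # 1212513 parameters
narrower = small_convnet_params(conv_filters=(16, 16, 32), dense_units=32)
```

Under these assumptions almost all parameters sit in the first dense layer, so halving the filter and unit counts cuts the total to roughly a quarter. Shrinking the layers is therefore a direct way to limit how much the model can memorize.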
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense
from keras import backend as K


# dimensions of our images.
img_width, img_height = 150, 150

train_data_dir = 'data/train'
validation_data_dir = 'data/validation'
nb_train_samples = 2000
nb_validation_samples = 800
epochs = 50
batch_size = 16

if K.image_data_format() == 'channels_first':
    input_shape = (3, img_width, img_height)
else:
    input_shape = (img_width, img_height, 3)

model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=input_shape))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Flatten())
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

# this is the augmentation configuration we will use for training
train_datagen = ImageDataGenerator(
    rescale=1. / 255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

# this is the augmentation configuration we will use for testing:
# only rescaling
test_datagen = ImageDataGenerator(rescale=1. / 255)

train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='binary')

model.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples // batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples // batch_size)

model.save_weights('first_try.h5')
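(b) Using the bottleneck features of a pre-trained network: 90% accuracy in a minute

A more refined approach is to leverage a network pre-trained on a large dataset. Such a network has already learned features that are useful for most computer vision problems, and using them lets us reach a better accuracy.

We will use the VGG16 architecture, pre-trained on the ImageNet dataset. Because ImageNet contains several "cat" classes (Persian cat, Scottish Fold...) and many "dog" classes among its 1000 classes, this model has already learned features relevant to our classification problem. In fact, merely recording the model's softmax predictions over our data, rather than the bottleneck features, might be enough to solve our dogs vs. cats problem very well. However, the method we present here is more likely to generalize to a broader range of problems, including problems whose classes are absent from ImageNet (say, classifying Gigantopithecus).

Our strategy is as follows: we only instantiate the convolutional part of the model, everything up to the fully-connected layers. We then run this model once over our training and validation data and record the output (the "bottleneck features" of VGG16: the last activation maps before the fully-connected layers) in two numpy arrays. Finally, we train a small fully-connected model on top of the stored features.

We store the features offline, rather than adding the fully-connected model directly on top of a frozen convolutional base and running the whole thing, for computational efficiency. Running VGG16 is expensive, especially on a CPU, and we only want to do it once. Note that this prevents us from using data augmentation.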
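
To get a feel for what gets stored, the following sketch estimates the shape of one bottleneck feature map. It relies on two properties of the VGG16 convolutional base: its convolutions preserve spatial size, and it contains five 2x2 max-pooling layers, the last block having 512 channels. The function name and the storage estimate are illustrative only.

```python
def vgg16_bottleneck_shape(height, width, num_pools=5, channels=512):
    """Spatial size after five 2x2 poolings (each halves, rounding down)."""
    for _ in range(num_pools):
        height, width = height // 2, width // 2
    return (height, width, channels)


shape = vgg16_bottleneck_shape(150, 150)           # 150 -> 75 -> 37 -> 18 -> 9 -> 4
features_per_image = shape[0] * shape[1] * shape[2]

# Rough size of the stored training features, assuming 2000 images
# and 4 bytes per value (float32).
train_bytes = 2000 * features_per_image * 4
```

For 150x150 inputs each image is reduced to a 4x4x512 block, i.e. 8192 values, so the training features for 2000 images take roughly 65 MB as float32 and fit comfortably in memory.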
import numpy as np
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dropout, Flatten, Dense
from keras import applications

# dimensions of our images.
img_width, img_height = 150, 150

top_model_weights_path = 'bottleneck_fc_model.h5'
train_data_dir = 'data/train'
validation_data_dir = 'data/validation'
nb_train_samples = 2000
nb_validation_samples = 800
epochs = 50
batch_size = 16


def save_bottlebeck_features():
    datagen = ImageDataGenerator(rescale=1. / 255)

    # build the VGG16 network
    model = applications.VGG16(include_top=False, weights='imagenet')

    generator = datagen.flow_from_directory(
        train_data_dir,
        target_size=(img_width, img_height),
        batch_size=batch_size,
        class_mode=None,
        shuffle=False)
    bottleneck_features_train = model.predict_generator(
        generator, nb_train_samples // batch_size)
    # pass the file name so that NumPy opens the file in binary mode itself
    np.save('bottleneck_features_train.npy', bottleneck_features_train)

    generator = datagen.flow_from_directory(
        validation_data_dir,
        target_size=(img_width, img_height),
        batch_size=batch_size,
        class_mode=None,
        shuffle=False)
    bottleneck_features_validation = model.predict_generator(
        generator, nb_validation_samples // batch_size)
    # pass the file name so that NumPy opens the file in binary mode itself
    np.save('bottleneck_features_validation.npy',
            bottleneck_features_validation)


def train_top_model():
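    train_data = np.load('bottleneck_features_train.npy')
    # the generator was not shuffled, so the first half of the samples comes
    # from the first class directory (label 0) and the second half from the
    # second one (label 1); this assumes both classes have the same size
    train_labels = np.array(
        [0] * (nb_train_samples // 2) + [1] * (nb_train_samples // 2))

    validation_data = np.load('bottleneck_features_validation.npy')
    validation_labels = np.array(
        [0] * (nb_validation_samples // 2) + [1] * (nb_validation_samples // 2))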

    model = Sequential()
    model.add(Flatten(input_shape=train_data.shape[1:]))
    model.add(Dense(256, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1, activation='sigmoid'))

    model.compile(optimizer='rmsprop',
                  loss='binary_crossentropy', metrics=['accuracy'])

    model.fit(train_data, train_labels,
              epochs=epochs,
              batch_size=batch_size,
              validation_data=(validation_data, validation_labels))
    model.save_weights(top_model_weights_path)


save_bottlebeck_features()
train_top_model()
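(c) Fine-tuning the top layers of a pre-trained network

To further improve on the previous result, we can try to "fine-tune" the last convolutional block of the VGG16 model alongside the top-level classifier. Fine-tuning consists in starting from a trained network and re-training it on a new dataset. It is done in roughly three steps:

instantiate the convolutional base of VGG16 and load its weights
add our previously defined fully-connected model on top, and load its weights
freeze the layers of the VGG16 model up to the last convolutional block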
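
The third step depends on knowing where the last convolutional block begins. The sketch below works that out from a list of layer names. The names follow the block naming used by the Keras VGG16 model without its top, but the list is written out by hand here (the input layer's name in particular varies between versions), so treat it as an assumption and check it against `[layer.name for layer in model.layers]` on a real model.

```python
# Layer names of the VGG16 convolutional base, written out by hand.
VGG16_CONV_BASE_LAYERS = [
    "input_1",
    "block1_conv1", "block1_conv2", "block1_pool",
    "block2_conv1", "block2_conv2", "block2_pool",
    "block3_conv1", "block3_conv2", "block3_conv3", "block3_pool",
    "block4_conv1", "block4_conv2", "block4_conv3", "block4_pool",
    "block5_conv1", "block5_conv2", "block5_conv3", "block5_pool",
]


def first_trainable_index(layer_names, prefix="block5"):
    """Index of the first layer whose name starts with the given prefix."""
    for index, name in enumerate(layer_names):
        if name.startswith(prefix):
            return index
    raise ValueError("no layer name starts with %r" % prefix)


def trainable_flags(layer_names, prefix="block5"):
    """Map each layer name to True if it should stay trainable."""
    cut = first_trainable_index(layer_names, prefix)
    return {name: index >= cut for index, name in enumerate(layer_names)}


cut = first_trainable_index(VGG16_CONV_BASE_LAYERS)
flags = trainable_flags(VGG16_CONV_BASE_LAYERS)
```

With this list the cut falls at index 15, so everything before `block5_conv1` would be frozen and only the last block, plus whatever classifier sits on top, keeps learning. On a Keras model the same cut could be applied by setting `layer.trainable = False` for each layer in `model.layers[:cut]`.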
from keras import applications
from keras.preprocessing.image import ImageDataGenerator
from keras import optimizers
from keras.models import Sequential, Model
from keras.layers import Dropout, Flatten, Dense

# path to the model weights files.
weights_path = '../keras/examples/vgg16_weights.h5'
# weights of the top model trained on the bottleneck features;
# this path must match the file saved by that script
top_model_weights_path = 'fc_model.h5'
# dimensions of our images.
img_width, img_height = 150, 150

train_data_dir = 'cats_and_dogs_small/train'
validation_data_dir = 'cats_and_dogs_small/validation'
nb_train_samples = 2000
nb_validation_samples = 800
epochs = 50
batch_size = 16

# build the VGG16 network; a fixed input_shape (channels_last assumed)
# is needed so that the output shape of the base is fully defined
base_model = applications.VGG16(weights='imagenet', include_top=False,
                                input_shape=(img_height, img_width, 3))
print('Model loaded.')

# build a classifier model to put on top of the convolutional model
top_model = Sequential()
top_model.add(Flatten(input_shape=base_model.output_shape[1:]))
top_model.add(Dense(256, activation='relu'))
top_model.add(Dropout(0.5))
top_model.add(Dense(1, activation='sigmoid'))

# note that it is necessary to start with a fully-trained
# classifier, including the top classifier,
# in order to successfully do fine-tuning
top_model.load_weights(top_model_weights_path)

# add the model on top of the convolutional base
# (VGG16 is a functional Model, which has no .add() method)
model = Model(inputs=base_model.input,
              outputs=top_model(base_model.output))

# set the first 15 layers (up to the last conv block)
# to non-trainable (weights will not be updated)
for layer in model.layers[:15]:
    layer.trainable = False

# compile the model with a SGD/momentum optimizer
# and a very slow learning rate.
model.compile(loss='binary_crossentropy',
              optimizer=optimizers.SGD(lr=1e-4, momentum=0.9),
              metrics=['accuracy'])

# prepare data augmentation configuration
train_datagen = ImageDataGenerator(
    rescale=1. / 255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

test_datagen = ImageDataGenerator(rescale=1. / 255)

train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='binary')

# fine-tune the model
model.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples // batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples // batch_size)