keras ImageDataGenerator: batch data augmentation (generating many images from a few)

liuyunshengsir

Categories: opencv, deep learning

Article link: https://blog.csdn.net/liuyunshengsir/article/details/108078286

1. The class imbalance problem

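In most cases we assume that the classes are evenly distributed, and many algorithms are built on that assumption, but real-world data rarely looks like this. Take predicting machine failures: failures are rare, so there are very few failure samples. If you simply predict "normal" for every input, the accuracy is still very high, yet the model is useless. Situations like this are extremely common.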
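
To make the accuracy trap concrete, here is a small sketch with made-up numbers (the 990/10 split is purely illustrative, not data from a real machine): a "model" that always predicts normal scores 99% accuracy while catching none of the failures.

```python
# Illustrative labels: 0 = normal, 1 = failure (the counts are hypothetical)
labels = [0] * 990 + [1] * 10

# A useless "model" that predicts normal for every sample
predictions = [0] * len(labels)

# Accuracy: fraction of all predictions that are correct
correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)

# Recall on the failure class: fraction of real failures that were caught
failures_caught = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
failure_recall = failures_caught / labels.count(1)

print(f"accuracy = {accuracy:.2%}")              # accuracy = 99.00%
print(f"failure recall = {failure_recall:.2%}")  # failure recall = 0.00%
```

Looking at per-class recall instead of overall accuracy exposes the problem immediately.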

2. Common solutions

There are many solutions, and they fall into two broad categories (this may come up in interviews):

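1) Data level: change the training data, e.g. resample or augment the under-represented class

2) Algorithm level: change the model or its loss, e.g. weight the classes differently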
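
As a rough sketch of the two directions (the helper names below are hypothetical, not from any library, and the 900/100 counts are made up): on the data side you can work out how many augmented copies each minority image needs so that its class roughly catches up with the largest class; on the algorithm side you can weight each class inversely to its frequency, using the same formula as scikit-learn's `class_weight="balanced"`, and pass the resulting dict to Keras via `model.fit(..., class_weight=...)`.

```python
import math


def copies_per_image(class_counts):
    """Data level (hypothetical helper): augmented copies to generate per
    original image so each class roughly matches the largest class."""
    target = max(class_counts.values())
    return {cls: math.ceil(target / n) - 1 for cls, n in class_counts.items()}


def balanced_class_weights(class_counts):
    """Algorithm level: weight = n_samples / (n_classes * n_samples_in_class),
    the formula scikit-learn uses for class_weight='balanced'."""
    total = sum(class_counts.values())
    n_classes = len(class_counts)
    return {cls: total / (n_classes * n) for cls, n in class_counts.items()}


# Hypothetical dataset: 900 normal images (class 0), 100 failure images (class 1)
counts = {0: 900, 1: 100}
print(copies_per_image(counts))        # {0: 0, 1: 8}
print(balanced_class_weights(counts))  # class 1 gets 9x the weight of class 0
```

With 8 extra copies per failure image, the 100 originals become 900 images, matching the normal class.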

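In a real project there is often no time to tackle the problem on the algorithm side; usually all we want is something that works. Frustratingly, most examples online either use a built-in dataset or augment a single image. The more common need is to augment every image in one folder of the training set (for example, the folder of the under-represented class), which is why I wrote this.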

3. Solution

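import os

from keras.preprocessing.image import ImageDataGenerator, img_to_array, load_img

# Random transformations applied to every generated image
datagen = ImageDataGenerator(
        rotation_range=40,        # rotate by a random angle in [-40, 40] degrees
        width_shift_range=0.2,    # shift horizontally by up to 20% of the width
        height_shift_range=0.2,   # shift vertically by up to 20% of the height
        shear_range=0.2,          # shear intensity (angle in degrees)
        zoom_range=0.2,           # zoom by a random factor in [0.8, 1.2]
        horizontal_flip=True,     # randomly mirror left-right
        fill_mode='nearest')      # fill pixels exposed by the transforms with the nearest value

img = load_img('E:/person_management/gitspace/darknet/data/dog.jpg')  # this is a PIL image
x = img_to_array(img)  # NumPy array of shape (height, width, 3) with the default channels_last format
x = x.reshape((1,) + x.shape)  # add a batch axis: shape (1, height, width, 3)

# Create the output folder first; saving fails if it does not exist
os.makedirs('preview', exist_ok=True)

# the .flow() command below generates batches of randomly transformed images
# and saves the results to the `preview/` directory
i = 0
for batch in datagen.flow(x, batch_size=1,
                          save_to_dir='preview', save_prefix='dog', save_format='jpeg'):
    i += 1
    if i > 20:
        break  # stops after 21 images; otherwise the generator would loop indefinitely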
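
The code above still augments a single file. A minimal sketch of extending it to every image in one training folder follows. It assumes the same Keras 2-style `keras.preprocessing.image` API as above (`ImageDataGenerator` is deprecated in newer Keras/TensorFlow releases, so the import path may differ there). The helper names `list_images` and `augment_folder`, the extension list, and the folder paths in the usage comment are hypothetical.

```python
import os

# Assumed set of image extensions to pick up (adjust as needed)
IMAGE_EXTS = (".jpg", ".jpeg", ".png", ".bmp")


def list_images(folder):
    """Return sorted paths of the image files directly inside `folder`."""
    return sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.lower().endswith(IMAGE_EXTS)
        and os.path.isfile(os.path.join(folder, name))
    )


def augment_folder(src_dir, dst_dir, datagen, copies_per_image=20):
    """Save `copies_per_image` randomly augmented versions of every image in src_dir."""
    # Imported here so list_images can be used without Keras installed
    from keras.preprocessing.image import img_to_array, load_img

    os.makedirs(dst_dir, exist_ok=True)
    for path in list_images(src_dir):
        x = img_to_array(load_img(path))  # shape (height, width, 3)
        x = x.reshape((1,) + x.shape)     # add batch axis: (1, height, width, 3)
        # Name the outputs after the source file, e.g. "dog_0_1234.jpeg"
        prefix = os.path.splitext(os.path.basename(path))[0]
        flow = datagen.flow(x, batch_size=1, save_to_dir=dst_dir,
                            save_prefix=prefix, save_format="jpeg")
        for _ in range(copies_per_image):
            next(flow)  # each call transforms the image once and saves it to dst_dir


# Usage (placeholder paths):
# from keras.preprocessing.image import ImageDataGenerator
# datagen = ImageDataGenerator(rotation_range=40, horizontal_flip=True, fill_mode="nearest")
# augment_folder("train/fault", "train/fault_aug", datagen, copies_per_image=20)
```

Writing the results to a separate folder keeps the originals untouched, so you can inspect the augmented images before merging them into the training set.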