CV数据预处理

最新推荐文章于 2023-10-01 16:33:47 发布

我是天才很好

最新推荐文章于 2023-10-01 16:33:47 发布

阅读量1.7k

点赞数

分类专栏：计算机视觉

原文链接：https://blog.csdn.net/weixin_44791964/article/details/106871010

版权

计算机视觉专栏收录该内容

17 篇文章 7 订阅

订阅专栏

文章目录

1. 学习前言

进行训练的话，如果直接用原图进行训练，也是可以的（就如我们最喜欢Mnist手写体），但是大部分图片长和宽不一样，直接resize的话容易出问题。
除去resize的问题外，有些时候数据不足该怎么办呢，当然要用到数据增强啦。

2. 处理长宽不同的图片

对于很多分类、目标检测算法，输入的图片长宽是一样的，如 $224 * 224$ 、 $416 * 416$ 等。

$[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-2pxuUP3B-1596948434526)(C:\Users\Administrator\Desktop\CV数据预处理\1.jpg)]$

直接resize的话，图片就会失真。

$[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-IvgVYWZz-1596948434529)(C:\Users\Administrator\Desktop\CV数据预处理\2.png)]$

但是我们可以采用如下的代码，使其用padding的方式不失真。

from PIL import Image


def letterbox_image(image, size):
    # 对图片进行resize，使图片不失真。在空缺的地方进行padding
    iw, ih = image.size
    w, h = size
    scale = min(w/iw, h/ih)
    nw = int(iw*scale)
    nh = int(ih*scale)

    image = image.resize((nw,nh), Image.BICUBIC)
    new_image = Image.new('RGB', size, (128,128,128))
    new_image.paste(image, ((w-nw)//2, (h-nh)//2))
    return new_image

img = Image.open("2007_000039.jpg")
new_image = letterbox_image(img,[416,416])
new_image.show()

得到图片为：

$[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-6zy9rDlJ-1596948434531)(C:\Users\Administrator\Desktop\CV数据预处理\3.png)]$

3. 数据增强

3.1、在数据集内进行数据增强

这个的意思就是可以直接增加图片的方式进行数据增强。
其主要用到的函数是：

ImageDataGenerator(featurewise_center=False,  
                    samplewise_center=False, 
                    featurewise_std_normalization=False, 
                    samplewise_std_normalization=False, 
                    zca_whitening=False, 
                    zca_epsilon=1e-06, 
                    rotation_range=0, 
                    width_shift_range=0.0, 
                    height_shift_range=0.0, 
                    brightness_range=None, 
                    shear_range=0.0, 
                    zoom_range=0.0, 
                    channel_shift_range=0.0, 
                    fill_mode='nearest', 
                    cval=0.0, 
                    horizontal_flip=False, 
                    vertical_flip=False, 
                    rescale=None, 
                    preprocessing_function=None, 
                    data_format=None, 
                    validation_split=0.0, 
                    dtype=None)

对于我而言，常用的方法如下：

datagen = ImageDataGenerator(
        rotation_range=10,
        width_shift_range=0.1,
        height_shift_range=0.1,
        shear_range=0.2,
        zoom_range=0.1,
        horizontal_flip=False,
        brightness_range=[0.1, 2],
        fill_mode='nearest')

其中，参数的意义为：
1、rotation_range：旋转范围
2、width_shift_range：水平平移范围
3、height_shift_range：垂直平移范围
4、shear_range：float, 透视变换的范围
5、zoom_range：缩放范围
6、horizontal_flip：水平反转
7、brightness_range：图像随机亮度增强，给定一个含两个float值的list，亮度值取自上下限值间
8、fill_mode：‘constant’，‘nearest’，‘reflect’或‘wrap’之一，当进行变换时超出边界的点将根据本参数给定的方法进行处理。

实际使用时可以利用如下函数生成图像：

from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
import os 


datagen = ImageDataGenerator(
        rotation_range=10,
        width_shift_range=0.1,
        height_shift_range=0.1,
        shear_range=0.2,
        zoom_range=0.1,
        horizontal_flip=False,
        brightness_range=[0.1, 2],
        fill_mode='nearest')


trains = os.listdir("./train/")
for index,train in enumerate(trains):
    img = load_img("./train/" + train)
    x = img_to_array(img)
    x = x.reshape((1,) + x.shape)
    i = 0
    for batch in datagen.flow(x, batch_size=1,
                            save_to_dir='./train_out', save_prefix=str(index), save_format='jpg'):
        i += 1
        if i > 20:
            break

生成效果为：

$[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-icBlmUiT-1596948434540)(C:\Users\Administrator\Desktop\CV数据预处理\4.png)]$

3.2、在读取图片的时候数据增强

ImageDataGenerator是一个非常nice的增强方式，不过如果不想生成太多的图片，然后想要直接在读图的时候处理，也是可以的。
我们用到PIL中的ImageEnhance库。
1、亮度增强ImageEnhance.Brightness(image)
2、色度增强ImageEnhance.Color(image)
3、对比度增强ImageEnhance.Contrast(image)
4、锐度增强ImageEnhance.Sharpness(image)

在如下的函数中，可以通过改变Ehance函数中的参数实现不同的增强方式。

import os
import numpy as np
from PIL import Image
from PIL import ImageEnhance


def Enhance_Brightness(image):
    # 变亮，增强因子为0.0将产生黑色图像,为1.0将保持原始图像。
    # 亮度增强
    enh_bri = ImageEnhance.Brightness(image)
    brightness = np.random.uniform(0.6,1.6)
    image_brightened = enh_bri.enhance(brightness)
    return image_brightened

def Enhance_Color(image):
    # 色度,增强因子为1.0是原始图像
    # 色度增强
    enh_col = ImageEnhance.Color(image)
    color = np.random.uniform(0.4,2.6)
    image_colored = enh_col.enhance(color)
    return image_colored


def Enhance_contrasted(image):
    # 对比度，增强因子为1.0是原始图片
    # 对比度增强
    enh_con = ImageEnhance.Contrast(image)
    contrast = np.random.uniform(0.6,1.6)
    image_contrasted = enh_con.enhance(contrast)
    return image_contrasted

def Enhance_sharped(image):
    # 锐度，增强因子为1.0是原始图片
    # 锐度增强
    enh_sha = ImageEnhance.Sharpness(image)
    sharpness = np.random.uniform(0.4,4)
    image_sharped = enh_sha.enhance(sharpness)
    return image_sharped

def Add_pepper_salt(image):
    # 增加椒盐噪声
    img = np.array(image)
    rows,cols,_=img.shape
    
    random_int = np.random.randint(500,1000)
    for _ in range(random_int):
        x=np.random.randint(0,rows)
        y=np.random.randint(0,cols)
        if np.random.randint(0,2):
            img[x,y,:]=255
        else:
            img[x,y,:]=0
    img = Image.fromarray(img)
    return img


def Enhance(image_path, change_bri=1, change_color=1, change_contras=1, change_sha=1, add_noise=1):
    #读取图片
    image = Image.open(image_path)

    if change_bri==1:
        image = Enhance_Brightness(image)
    if change_color==1:
        image = Enhance_Color(image)
    if change_contras==1:
        image = Enhance_contrasted(image)
    if change_sha==1:
        image = Enhance_sharped(image)
    if add_noise==1:
        image = Add_pepper_salt(image)
    image.save("0.jpg")

Enhance("2007_000039.jpg")

原图：

$[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-ASjWKSZ8-1596948434541)(C:\Users\Administrator\Desktop\CV数据预处理\5.jpg)]$

效果如下：

$[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-LEJ84BfB-1596948434546)(C:\Users\Administrator\Desktop\CV数据预处理\6.jpg)]$

3.3、目标检测中的数据增强

在目标检测中如果要增强数据，并不是直接增强图片就好了，还要考虑到图片扭曲后框的位置。
也就是框的位置要跟着图片的位置进行改变。
原图：

$[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-zsznLBd3-1596948434547)(C:\Users\Administrator\Desktop\CV数据预处理\7.jpg)]$

增强后：

$[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-0eGgitMg-1596948434550)(C:\Users\Administrator\Desktop\CV数据预处理\8.png)]$

from PIL import Image, ImageDraw
import numpy as np
from matplotlib.colors import rgb_to_hsv, hsv_to_rgb


def rand(a=0, b=1):
    return np.random.rand()*(b-a) + a

def get_random_data(annotation_line, input_shape, random=True, max_boxes=20, jitter=.3, hue=.1, sat=1.5, val=1.5, proc_img=True):
    '''random preprocessing for real-time data augmentation'''
    line = annotation_line.split()
    image = Image.open(line[0])
    iw, ih = image.size  # 原图片尺寸
    h, w = input_shape   # 网络输入图片尺寸
    box = np.array([np.array(list(map(int,box.split(',')))) for box in line[1:]])

    # resize image   jitter抖动
    new_ar = w/h * rand(1-jitter,1+jitter)/rand(1-jitter,1+jitter)
    scale = rand(.7, 1.3)
    if new_ar < 1:
        nh = int(scale*h)
        nw = int(nh*new_ar)
    else:
        nw = int(scale*w)
        nh = int(nw/new_ar)
    image = image.resize((nw,nh), Image.BICUBIC)

    # place image
    dx = int(rand(0, w-nw))
    dy = int(rand(0, h-nh))
    new_image = Image.new('RGB', (w,h), (128,128,128))
    new_image.paste(image, (dx, dy))  # 将图像填充为中间图像，两侧为灰色的样式
    image = new_image

    # flip image or not
    flip = rand()<.5
    if flip: image = image.transpose(Image.FLIP_LEFT_RIGHT)
    # 图像翻转
    # tf.image.flip_up_down：上下翻转
    # tf.image.flip_left_right：左右翻转
    # tf.image.transpose_image：对角线翻转


    # distort image  色调（H, Hue），饱和度（S,Saturation），明度（V, Value）
    hue = rand(-hue, hue) 
    sat = rand(1, sat) if rand()<.5 else 1/rand(1, sat)
    val = rand(1, val) if rand()<.5 else 1/rand(1, val)
    x = rgb_to_hsv(np.array(image)/255.)
    x[..., 0] += hue
    x[..., 0][x[..., 0]>1] -= 1
    x[..., 0][x[..., 0]<0] += 1
    x[..., 1] *= sat
    x[..., 2] *= val
    x[x>1] = 1
    x[x<0] = 0
    image_data = hsv_to_rgb(x) # numpy array, 0 to 1

    # correct boxes
    box_data = np.zeros((max_boxes,5))
    if len(box)>0:
        np.random.shuffle(box)
        box[:, [0,2]] = box[:, [0,2]]*nw/iw + dx  # 原图中的坐标w转化为resize后的w坐标
        box[:, [1,3]] = box[:, [1,3]]*nh/ih + dy
        if flip: box[:, [0,2]] = w - box[:, [2,0]]
        box[:, 0:2][box[:, 0:2]<0] = 0
        box[:, 2][box[:, 2]>w] = w
        box[:, 3][box[:, 3]>h] = h
        box_w = box[:, 2] - box[:, 0]
        box_h = box[:, 3] - box[:, 1]
        box = box[np.logical_and(box_w>1, box_h>1)] # discard invalid box 删掉无效的框
        if len(box)>max_boxes: box = box[:max_boxes]
        box_data[:len(box)] = box
    
    return image_data, box_data

if __name__ == "__main__":
    line = r"F:\Collection\yolo_Collection\keras-yolo3-master\VOCdevkit/VOC2007/JPEGImages/00001.jpg 738,279,815,414,0"

    image_data, box_data = get_random_data(line,[416,416])

    left, top, right, bottom  = box_data[0][0:4]

    img = Image.fromarray((image_data*255).astype(np.uint8))

    draw = ImageDraw.Draw(img)

    draw.rectangle([left, top, right, bottom])
    img.show()