A Hand-Holding Keras Implementation of Faster R-CNN, Part 4

Mr-MegRob

Last modified 2023-09-09 11:46:23

First published 2021-10-11 14:19:19

Article link: https://blog.csdn.net/yx123919804/article/details/120683511

The previous article covered how to label each anchor box. With the labels in place, can we start training right away? Not quite: a few preparation steps remain

1. Data augmentation

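Although the VOC dataset seems to contain plenty of images, training results are still not that good without augmentation. We start with the simple cases: a left-right flip, an up-down flip, and both flips combined. Rotation, warping, and so on are left to you. Just remember that whenever an image is augmented, its labels must be transformed in the same way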

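import copy # used for copy.deepcopy below; harmless if already imported

# Data augmentation: left-right flip, up-down flip, and both flips combined
# data_pair: one element of the data returned by data_set_path
# train_num: number of anchors that take part in training at a time
def data_augment(data_pair, train_num):
    augmented = [] # augmented data to return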
    
    img_src = cv.imread(data_pair[0])
    img_new, scale = new_size_image(img_src, SHORT_SIZE)
    feature_size = (img_new.shape[0] // FEATURE_STRIDE, img_new.shape[1] // FEATURE_STRIDE)
    anchors = create_train_anchors(feature_size, base_anchors, FEATURE_STRIDE)
    
    # Original image and labels -------------------------------------------
    ground_truth = get_ground_truth(data_pair[1], data_pair[2], CATEGORIES)
    # ground_truth must be scaled by the same factor as the image
    for gt in ground_truth:
        gt[0][0] = round(gt[0][0] * scale)
        gt[0][1] = round(gt[0][1] * scale)
        gt[0][2] = round(gt[0][2] * scale)
        gt[0][3] = round(gt[0][3] * scale)
    
    rpn_cls_label, gt_boxes = get_rpn_cls_label(img_new.shape, anchors, ground_truth, train_num = train_num)
    augmented.append([img_new, rpn_cls_label, gt_boxes])
    # Original image and labels -------------------------------------------
    
    # Left-right flip and labels ------------------------------------------
    # Work on a deep copy so the original ground_truth stays untouched
    gt_copy = copy.deepcopy(ground_truth)
    x_flip = cv.flip(img_new, 1) # flip the image left-right
    for gt in gt_copy: # flip the labels left-right
        gt[0][0] = x_flip.shape[1] - 1 - gt[0][0]
        gt[0][2] = x_flip.shape[1] - 1 - gt[0][2]
        gt[0][0], gt[0][2] = gt[0][2], gt[0][0]
        
    rpn_cls_label, gt_boxes = get_rpn_cls_label(x_flip.shape, anchors, gt_copy, train_num = train_num)
    augmented.append([x_flip, rpn_cls_label, gt_boxes])
    # Left-right flip and labels ------------------------------------------
    
    # Up-down flip and labels ---------------------------------------------
    # Work on a deep copy so the original ground_truth stays untouched
    gt_copy = copy.deepcopy(ground_truth)
    y_flip = cv.flip(img_new, 0) # flip the image up-down
    for gt in gt_copy: # flip the labels up-down
        gt[0][1] = y_flip.shape[0] - 1 - gt[0][1]
        gt[0][3] = y_flip.shape[0] - 1 - gt[0][3]
        gt[0][1], gt[0][3] = gt[0][3], gt[0][1]
        
    rpn_cls_label, gt_boxes = get_rpn_cls_label(y_flip.shape, anchors, gt_copy, train_num = train_num)
    augmented.append([y_flip, rpn_cls_label, gt_boxes])
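    # Up-down flip and labels ---------------------------------------------
    
    # Left-right and up-down flip and labels ------------------------------
    # Work on a deep copy so the original ground_truth stays untouched
    gt_copy = copy.deepcopy(ground_truth)
    xy_flip = cv.flip(img_new, -1) # flip the image both left-right and up-down
    for gt in gt_copy: # flip the labels both left-right and up-down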
        gt[0][0] = xy_flip.shape[1] - 1 - gt[0][0]
        gt[0][1] = xy_flip.shape[0] - 1 - gt[0][1]
        gt[0][2] = xy_flip.shape[1] - 1 - gt[0][2]
        gt[0][3] = xy_flip.shape[0] - 1 - gt[0][3]
        
        gt[0][0], gt[0][2] = gt[0][2], gt[0][0]
        gt[0][1], gt[0][3] = gt[0][3], gt[0][1]
        
    rpn_cls_label, gt_boxes = get_rpn_cls_label(xy_flip.shape, anchors, gt_copy, train_num = train_num)
    augmented.append([xy_flip, rpn_cls_label, gt_boxes])
    # Left-right and up-down flip and labels ------------------------------
    
    return augmented
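
The coordinate arithmetic used for the labels above can be isolated into a small helper. The following is only a sketch: `flip_box` is a hypothetical name that does not appear in the series code, and it assumes boxes are stored as (x1, y1, x2, y2) in pixel coordinates, matching the `gt[0]` layout used in data_augment. The `mode` argument follows the flipCode convention of cv.flip.

```python
def flip_box(box, img_shape, mode):
    """Flip an (x1, y1, x2, y2) box inside an image of shape (rows, cols, ...).

    mode follows the flipCode of cv.flip:
    1 = left-right, 0 = up-down, -1 = both.
    flip_box is a hypothetical helper used only for illustration.
    """
    rows, cols = img_shape[:2]
    x1, y1, x2, y2 = box

    if mode in (1, -1):
        # Mirror the x coordinates, then swap so that x1 <= x2 still holds
        x1, x2 = cols - 1 - x2, cols - 1 - x1

    if mode in (0, -1):
        # Mirror the y coordinates, then swap so that y1 <= y2 still holds
        y1, y2 = rows - 1 - y2, rows - 1 - y1

    return [x1, y1, x2, y2]


box = [10, 20, 50, 80]
shape = (100, 200, 3)            # rows = 100, cols = 200

print(flip_box(box, shape, 1))   # [149, 20, 189, 80]
print(flip_box(box, shape, 0))   # [10, 19, 50, 79]
print(flip_box(box, shape, -1))  # [149, 19, 189, 79]
```

Applying the same flip twice returns the original box, which is a quick sanity check for this kind of label transform.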

Now augment one image and look at the result

# Test data_augment
titles = ["original", "x_flip", "y_flip", "xy_flip"]
plt.figure("augmented", figsize = (12, 8))

print(train_set[idx]) # idx is the random index generated in Part 2 of this series
augmented = data_augment(train_set[idx], train_num = 32)

for i, data in enumerate(augmented):
    img_copy = data[0].copy()
    feature_size = (img_copy.shape[0] // FEATURE_STRIDE, img_copy.shape[1] // FEATURE_STRIDE)
    anchors = create_train_anchors(feature_size, base_anchors, FEATURE_STRIDE)
    
    for j, a in enumerate(anchors):
        if POS_VAL == data[1][j]:
            gt = data[2][j]
            # check that the gt returned by get_rpn_cls_label is correct
            cv.rectangle(img_copy, (gt[0], gt[1]), (gt[2], gt[3]), (255, 55, 55), 2)
            cv.rectangle(img_copy, (a[0], a[1]), (a[2], a[3]), (0, 255, 0), 2)
        
        elif NEG_VAL == data[1][j]:
            cv.rectangle(img_copy, (a[0], a[1]), (a[2], a[3]), (0, 0, random.randint(128, 255)), 1)
        
    plt.subplot(2, 2, i + 1)
    plt.title(titles[i], color = 'gray')
    plt.imshow(img_copy[..., : : -1]) # reverse the channel order (BGR to RGB) so the colors display correctly
plt.show()

('data_set\\007152.jpg', 'data_set\\007152.xml', 'xml')

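[Figure: augment]
Negative samples are plentiful and are picked at random, so they are not symmetric across the flips. Positive samples are few and are all selected every time, so they are symmetric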

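2. Reading training data with a Generator

With data augmentation done, it is time to think about how to feed training data into the network

The VOC dataset has quite a lot of images, and augmentation multiplies them, so loading everything into memory at once may be more than your machine can handle. Instead, read the training data with a Generator, which loads only as much as is needed at a time. To understand Generators, you first need to understand and master the yield keyword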
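
For readers new to yield, here is a minimal sketch of the idea, separate from the series code. `batch_reader` is a hypothetical function that only groups items into batches: execution pauses at each yield and resumes from the same point on the next call to next(), so only one batch exists in memory at a time.

```python
def batch_reader(items, batch_size):
    """Yield lists of batch_size items, one batch at a time.

    batch_reader is a hypothetical example, not part of the series code.
    """
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) >= batch_size:
            # Hand the batch to the caller and pause here until next() is called again
            yield batch
            batch = []
    # A trailing partial batch is simply dropped in this sketch


reader = batch_reader(range(5), 2)
print(next(reader))  # [0, 1]
print(next(reader))  # [2, 3]
```

The input generator for the network follows the same pattern: accumulate data until the batch is full, yield it, then reset the accumulators.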

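# Input data generator for the network
# data_set: list of training or test data
# categories: list of categories
# train_num: number of anchors that take part in training
# batch_size: number of images fed to the network at a time
# augment_fun: data augmentation function
# train_mode: True: training mode, False: test mode
# shuffle_enable: whether to shuffle the data
# Yields images and labels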
def input_reader(data_set, categories, batch_size = 1, train_num = TRAIN_NUM,
                 augment_fun = None, train_mode = True, shuffle_enable = True):
    assert(isinstance(data_set, tuple) or isinstance(data_set, list))
    
    stop_now = False
    data_nums = len(data_set)
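    index_list = [x for x in range(data_nums)] # indices used to shuffle the order of data_set
    
    x = []       # images to return
    rpn_cls = [] # classification labels to return

    max_rows = 0 # largest number of rows among the images in a batch
    max_cols = 0 # largest number of columns among the images in a batch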
        
    while False == stop_now:
        if train_mode and shuffle_enable:
            shuffle(index_list)
            
        for i in index_list:
            is_with_label = 3 == len(data_set[i]) # 3 == len(data_set[i]) means image with labels, otherwise image only
            data_list = [] # list of images and labels
            
            if is_with_label:
                if augment_fun and train_mode:
                    data_list.extend(augment_fun(data_set[i], train_num))
                else:
                    # Same as the beginning of augment_fun (data_augment), so no further explanation
                    img_src = cv.imread(data_set[i][0])
                    img_new, scale = new_size_image(img_src, SHORT_SIZE)
                    feature_size = (img_new.shape[0] // FEATURE_STRIDE, img_new.shape[1] // FEATURE_STRIDE)
                    anchors = create_train_anchors(feature_size, base_anchors, FEATURE_STRIDE)
                    ground_truth = get_ground_truth(data_set[i][1], data_set[i][2], categories)
                    for gt in ground_truth:
                        gt[0][0] = round(gt[0][0] * scale)
                        gt[0][1] = round(gt[0][1] * scale)
                        gt[0][2] = round(gt[0][2] * scale)
                        gt[0][3] = round(gt[0][3] * scale)

                    rpn_cls_label, gt_boxes = get_rpn_cls_label(img_new.shape, anchors,
                                                                ground_truth, train_num = train_num)
                    data_list.append([img_new, rpn_cls_label, gt_boxes])
            else:
                train_mode = False
                img_src = cv.imread(data_set[i])
                img_new, scale = new_size_image(img_src, SHORT_SIZE)
                data_list.append([img_new, [], []]) # keep the same structure as in training
                    
            for data in data_list:
                x.append(data[0])
                rpn_cls.append(data[1])
                max_rows = max(max_rows, x[-1].shape[0])
                max_cols = max(max_cols, x[-1].shape[1])

                if len(x) >= batch_size:
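                    # Images of different sizes cannot be trained together in one batch, so unify their sizes
                    # Pad with 0 below when rows < max rows, and on the right when columns < max columns
                    # The labels must be padded along with the images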
                    new_shape = (max_rows // FEATURE_STRIDE, max_cols // FEATURE_STRIDE)

                    for j, img in enumerate(x):
                        # feature map size corresponding to the original image
                        old_shape = (img.shape[0] // FEATURE_STRIDE, img.shape[1] // FEATURE_STRIDE)
                        # assign to x[j] here, not to img
                        x[j] = cv.copyMakeBorder(img,
                                                 0, max_rows - img.shape[0], 0, max_cols - img.shape[1],
                                                 cv.BORDER_CONSTANT, (0, 0, 0))

                        if is_with_label:
                            # pad in the row direction
                            if new_shape[0] - old_shape[0] > 0:
                                pad_num = (new_shape[0] - old_shape[0]) * old_shape[1] * ANCHOR_NUM
                                y_pad = [NEUTRAL] * pad_num
                                rpn_cls[j].extend(y_pad)

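                            # pad in the column direction
                            # row padding is simply appended at the end; column padding is not contiguous, so it is inserted at the end of each row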
                            if new_shape[1] - old_shape[1] > 0:
                                pad_pos = old_shape[1] * ANCHOR_NUM 

                                pad_num = (new_shape[1] - old_shape[1]) * ANCHOR_NUM
                                y_pad = [NEUTRAL] * pad_num
                                for r in range(new_shape[0]):
                                    # insert() cannot be used here: it would insert y_pad as a single element
                                    rpn_cls[j][pad_pos: pad_pos] = y_pad
                                    pad_pos += (pad_num + old_shape[1] * ANCHOR_NUM)

                    # return the data
                    x = np.array(x).astype(np.float32) / 255.0
                    rpn_cls = np.array(rpn_cls).astype(np.float32)
                    if is_with_label:
                        rpn_cls = rpn_cls.reshape((-1, new_shape[0], new_shape[1], ANCHOR_NUM))
                        
                    yield x, rpn_cls
                    
                    x = []
                    rpn_cls = []
                    max_rows = 0
                    max_cols = 0
                    
        if False == train_mode:
            stop_now = True

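The key point in input_reader is that when a batch mixes images from different source files (for example batch_size > 4 with the four-way augmentation above), the images do not necessarily have the same size, so they must be padded to a common size and the labels padded to match. The test output below makes this easier to see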
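
The label padding can also be expressed on the reshaped label grid, which may be easier to reason about than list slicing. The sketch below is an alternative written with np.pad, not the method used in input_reader. `pad_labels_np` and `pad_labels_list` are hypothetical names, and the fill value -1 is only a placeholder for whatever NEUTRAL is defined as in the series code. The list version mirrors the slicing logic of input_reader so the two results can be compared.

```python
import numpy as np

def pad_labels_np(labels, old_shape, new_shape, anchor_num, fill):
    """Pad a flat label list on the bottom and right of its (rows, cols, anchors) grid."""
    grid = np.asarray(labels).reshape(old_shape[0], old_shape[1], anchor_num)
    pad_rows = new_shape[0] - old_shape[0]
    pad_cols = new_shape[1] - old_shape[1]
    # Pad only after the existing data: below the rows and right of the columns
    return np.pad(grid, ((0, pad_rows), (0, pad_cols), (0, 0)),
                  mode = "constant", constant_values = fill)

def pad_labels_list(labels, old_shape, new_shape, anchor_num, fill):
    """Same result as a flat list, using slice insertion like input_reader does."""
    labels = list(labels)
    # Row direction: whole rows of the old width are appended at the end
    labels.extend([fill] * ((new_shape[0] - old_shape[0]) * old_shape[1] * anchor_num))

    # Column direction: insert the padding at the end of every row
    pad_num = (new_shape[1] - old_shape[1]) * anchor_num
    if pad_num > 0:
        pad_pos = old_shape[1] * anchor_num
        for _ in range(new_shape[0]):
            labels[pad_pos: pad_pos] = [fill] * pad_num
            pad_pos += pad_num + old_shape[1] * anchor_num
    return labels


FILL = -1                      # placeholder for NEUTRAL, an assumption
labels = list(range(12))       # a 2 x 3 feature map with 2 anchors per position
padded = pad_labels_np(labels, (2, 3), (3, 5), 2, FILL)

print(padded.shape)            # (3, 5, 2)
print(padded.reshape(-1).tolist() == pad_labels_list(labels, (2, 3), (3, 5), 2, FILL))  # True
```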

# Test input_reader
# train_num is set to 32 here for display purposes, otherwise the boxes are far too dense
show_reader = input_reader(train_set, CATEGORIES, batch_size = 8, train_num = 32, augment_fun = data_augment)

x, y = next(show_reader)
batch_size = x.shape[0]
print("train image shape: ", x.shape)
print("label shape: ", y.shape)

SHOW_COLUMNS = 4
SHOW_ROWS = max(1, batch_size // SHOW_COLUMNS) + 1
plt.figure("batch_images", figsize = (12, SHOW_ROWS * 3))

for i in range(batch_size):
    feature_size = (x[0].shape[0] // FEATURE_STRIDE, x[0].shape[1] // FEATURE_STRIDE)
    anchors = create_train_anchors(feature_size, base_anchors, FEATURE_STRIDE)
    
    if 0 == i:
        print("\nanchrors in single image: ", len(anchors))
        
    positives = 0
    idxs = tf.where(K.not_equal(y[i], NEUTRAL))
    for idx in idxs:
        idx = (i, int(idx[0]), int(idx[1]), int(idx[2]))
        rgb = (0.0, 1.0, 0.0) if POS_VAL == y[idx] else (0.0, 0.0, 1.0)
        positives = positives + 1 if POS_VAL == y[idx] else positives

        idx = int(idx[1] * feature_size[1] * ANCHOR_NUM + idx[2] * ANCHOR_NUM + idx[3])
        a = anchors[idx]
        cv.rectangle(x[i], (a[0], a[1]), (a[2], a[3]), rgb, 2)
        
    plt.subplot(SHOW_ROWS, SHOW_COLUMNS, i + 1)
    plt.title("positive = " + str(positives), color = 'gray')
    plt.imshow(x[i][..., : : -1])
plt.show()

train image shape:  (8, 400, 400, 3)
label shape:  (8, 25, 25, 9)

anchrors in single image:  5625

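[Figure: reader_show]
The output images are padded only at the bottom and on the right, which keeps the label handling simple. If that bothers you, you can of course pad on all four sides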

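One issue here is that some anchor boxes are drawn into the black padded area. That is because the test loop regenerates the anchor boxes without discarding or clipping them; this does not happen during actual training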
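
If you want the test drawing to stay inside the image, the regenerated anchors could be dropped or clipped against the image bounds before drawing. The following is a rough sketch, not code from the series: `inside_mask` and `clip_anchors` are hypothetical helpers, and anchors are assumed to be (x1, y1, x2, y2) rows, consistent with how they are drawn above. For a padded batch, the bounds to use would be those of the unpadded image.

```python
import numpy as np

def inside_mask(anchors, rows, cols):
    """Boolean mask of the anchors that lie completely inside a rows x cols image."""
    a = np.asarray(anchors)
    return (a[:, 0] >= 0) & (a[:, 1] >= 0) & (a[:, 2] <= cols - 1) & (a[:, 3] <= rows - 1)

def clip_anchors(anchors, rows, cols):
    """Copy of the anchors with every coordinate clipped to the image bounds."""
    a = np.asarray(anchors).copy()
    a[:, [0, 2]] = np.clip(a[:, [0, 2]], 0, cols - 1)  # x coordinates
    a[:, [1, 3]] = np.clip(a[:, [1, 3]], 0, rows - 1)  # y coordinates
    return a


anchors = [[-8, -8, 24, 24], [10, 10, 50, 50], [300, 300, 420, 380]]
rows, cols = 375, 400          # size of the unpadded image, example values

print(inside_mask(anchors, rows, cols).tolist())   # [False, True, False]
print(clip_anchors(anchors, rows, cols).tolist())  # [[0, 0, 24, 24], [10, 10, 50, 50], [300, 300, 399, 374]]
```

Dropping keeps only anchors that never touched the border, while clipping keeps every anchor but changes its shape, so which one fits depends on what you want the picture to show.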

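3. Code download

The sample code is available for download as a Jupyter Notebook

Previous: A Hand-Holding Keras Implementation of Faster R-CNN, Part 3
Next: A Hand-Holding Keras Implementation of Faster R-CNN, Part 5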
