Distortion-Free Resize and Restoring Results

半片青柠

Last modified 2024-03-22 16:59:14

Category: Deep Learning. Tags: computer vision, artificial intelligence, deep learning, opencv, object detection

First published 2024-03-22 16:55:50

Article link: https://blog.csdn.net/sinat_40587853/article/details/136933514

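This article describes how to resize images for object detection with a distortion-free resize that preserves the original aspect ratio, walks through an implementation of the letterbox_image function, and shows how to map the box coordinates returned by the model back onto the original image.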

1. Preface

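Object detection tasks generally expect an input image of shape (A, A, 3), i.e. a square image. In practice most images in datasets and at deployment sites are not square, so the image has to be resized before inference.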

2. Distorting resize (not recommended)

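Forcing the image directly to the target size breaks its original aspect ratio. This distorting resize is strongly discouraged.
Figure: distorting resize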
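
To make the distortion concrete, the sketch below computes the per-axis scale factors that a forced resize applies. The helper name `stretch_factors` and the 720x1280 example frame are illustrative assumptions, not part of the original post.

```python
def stretch_factors(image_hw, target_hw):
    """Return (fy, fx): vertical and horizontal scale factors of a forced resize."""
    h, w = image_hw
    th, tw = target_hw
    return th / h, tw / w


# Hypothetical 720x1280 (height x width) frame forced into a 640x640 input.
fy, fx = stretch_factors((720, 1280), (640, 640))
print(fy, fx)    # the two factors differ, so objects are squashed horizontally
print(fy / fx)   # 1.0 would mean no distortion
```

Whenever the two factors differ, circles become ellipses and box aspect ratios seen by the model no longer match the scene.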

3. Distortion-free resize (recommended)

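The idea behind the distortion-free resize is simple. First compute a single scale factor scale from the longer side (more precisely, the smaller of the two target-to-image ratios, which for a square target is the ratio of the longer side). Resize the whole image by scale, so that the longer side reaches the target size. Then add gray bars at both ends of the shorter side until it also reaches the target size. That completes the distortion-free resize.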
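
The arithmetic of these steps can be sketched without any image library. The function name `letterbox_geometry` and the example sizes are assumptions made for illustration, and Python's `round` is used for the resized dimensions, which may differ by a pixel from what an image library produces.

```python
def letterbox_geometry(image_hw, target_hw):
    """Return (scale, (new_h, new_w), (top, bottom, left, right))."""
    h, w = image_hw
    th, tw = target_hw
    # One scale for both axes; the smaller ratio keeps both sides within the target.
    scale = min(th / h, tw / w)
    new_h, new_w = round(h * scale), round(w * scale)
    # Split the leftover space as evenly as possible between the two ends.
    top = (th - new_h) // 2
    bottom = th - new_h - top
    left = (tw - new_w) // 2
    right = tw - new_w - left
    return scale, (new_h, new_w), (top, bottom, left, right)


# Hypothetical 720x1280 (height x width) frame, 640x640 model input.
print(letterbox_geometry((720, 1280), (640, 640)))
# → (0.5, (360, 640), (140, 140, 0, 0))
```

Here the width is the longer side, so it is scaled to exactly 640, and the 280 leftover rows are split into two 140-pixel bars above and below.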

import cv2
import numpy as np


def resize_with_proportion(image, target_shape):
    # Scale the image uniformly so that it fits inside target_shape, given as (h, w).
    target_shape = np.array(target_shape)
    # The smaller of the two ratios guarantees that neither side exceeds the target.
    scale = float(np.min(target_shape / np.shape(image)[:2]))
    new_image = cv2.resize(image, None, fx=scale, fy=scale, interpolation=cv2.INTER_LINEAR)
    return new_image


def letterbox_image_pad(image, target_shape):
    # Add a padding border around the resized image.
    new_image = resize_with_proportion(image, target_shape)
    nih, niw = new_image.shape[:2]
    th, tw = (int(v) for v in target_shape)
    top = (th - nih) // 2
    bottom = th - top - nih
    left = (tw - niw) // 2
    right = tw - left - niw
    # Gray (128) bars, consistent with the text and with letterbox_image below.
    pad_img = cv2.copyMakeBorder(new_image, top, bottom, left, right, cv2.BORDER_CONSTANT, value=(128, 128, 128))
    return pad_img


def letterbox_image(image, target_shape):
    # Copy the resized image into the centre of a gray 'canvas' of the target size.
    # Assumes a 3-channel image.
    new_image = resize_with_proportion(image, target_shape)
    nih, niw = new_image.shape[:2]
    th, tw = (int(v) for v in target_shape)
    top = (th - nih) // 2
    left = (tw - niw) // 2

    pad_img = np.full((th, tw, 3), 128, dtype=new_image.dtype)
    pad_img[top:nih + top, left:niw + left] = new_image

    return pad_img

Figure: distortion-free resize

4. Aside: restoring box coordinates

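The box coordinates returned by the model are offsets relative to the size of the input image. If the input image was scaled with the distortion-free resize, the padding has to be removed from the returned results before they can be applied to the original image.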

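def correct_boxes(result, input_shape, image_shape):
    # result: boxes normalized to the letterboxed input. Assuming the shapes are
    # (h, w) as in the functions above, the box order is (x1, y1, x2, y2).
    input_shape = np.asarray(input_shape, dtype=np.float64)
    image_shape = np.asarray(image_shape, dtype=np.float64)
    # Size of the valid (unpadded) region inside the model input.
    new_shape = image_shape * np.min(input_shape / image_shape)

    # Normalized padding on the top/left side.
    offset = (input_shape - new_shape) / 2. / input_shape
    # Converts normalized input coordinates to pixels on the original image.
    scale = input_shape / new_shape * image_shape

    # Shapes are (h, w) while boxes are (x, y, x, y), hence the index swap.
    scale_for_boxs = [scale[1], scale[0], scale[1], scale[0]]
    offset_for_boxs = [offset[1], offset[0], offset[1], offset[0]]

    result = (result - np.array(offset_for_boxs)) * np.array(scale_for_boxs)

    return result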

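First subtract the padding offset on the top/left side. This gives the target's offset measured directly from the origin of the valid region (equivalently, the valid region new_image as a whole has been translated to the origin). Multiplying by input_shape then turns this into a pixel value, and that value refers to the same point, the same pixel, in the shifted input_image and in new_image. input_shape and new_shape are measured on the same pixel scale, so all that remains is to convert the value from the new_shape scale to the image_shape scale (divide by new_shape, multiply by image_shape) to obtain coordinates on the original image. The reasoning takes a few turns, so it is worth working through slowly.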
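
A round trip is a convenient way to check this reasoning. The sketch below uses plain Python, with the hypothetical helpers `to_letterbox` and `from_letterbox`; it assumes (x1, y1, x2, y2) boxes, (h, w) shapes, and unrounded padding. It maps a pixel box on the original image into normalized letterbox coordinates and back, using the same arithmetic as correct_boxes written per coordinate.

```python
def to_letterbox(box_xyxy, image_hw, input_hw):
    """Map a pixel box on the original image to normalized letterbox coordinates."""
    ih, iw = image_hw
    th, tw = input_hw
    scale = min(th / ih, tw / iw)
    new_h, new_w = ih * scale, iw * scale
    off_x, off_y = (tw - new_w) / 2, (th - new_h) / 2   # padding in pixels
    x1, y1, x2, y2 = box_xyxy
    return [(x1 * scale + off_x) / tw, (y1 * scale + off_y) / th,
            (x2 * scale + off_x) / tw, (y2 * scale + off_y) / th]


def from_letterbox(box_norm, image_hw, input_hw):
    """Inverse mapping: subtract the normalized padding, then rescale to the original image."""
    ih, iw = image_hw
    th, tw = input_hw
    scale = min(th / ih, tw / iw)
    new_h, new_w = ih * scale, iw * scale
    off_x, off_y = (tw - new_w) / 2 / tw, (th - new_h) / 2 / th   # normalized padding
    sx, sy = tw / new_w * iw, th / new_h * ih
    x1, y1, x2, y2 = box_norm
    return [(x1 - off_x) * sx, (y1 - off_y) * sy,
            (x2 - off_x) * sx, (y2 - off_y) * sy]


# Hypothetical box on a 720x1280 image, 640x640 model input.
norm = to_letterbox([100, 200, 500, 600], (720, 1280), (640, 640))
print(norm)   # → [0.078125, 0.375, 0.390625, 0.6875]
restored = from_letterbox(norm, (720, 1280), (640, 640))
print([round(v, 3) for v in restored])   # → [100.0, 200.0, 500.0, 600.0]
```

Because the resize functions above round the resized size and floor the padding to whole pixels, restored coordinates on real images may differ from the exact values by a fraction of a pixel.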