Optimizing Image Data Preprocessing with the letter_box Operation

Lunar*

Last modified: 2024-06-19 10:52:20

Tags: object detection, yolo

First published: 2024-06-19 10:47:44

Article link: https://blog.csdn.net/qq_45141261/article/details/139796005

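Introduction

In computer vision and deep learning, data preprocessing is one of the key steps in model training. Sound image preprocessing can improve training efficiency and can also improve the performance of the final model. This article introduces a commonly used image preprocessing technique, the letter_box operation: it explains how the operation works, lists its advantages, and implements it in Python.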

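How the letter_box operation works

The letter_box operation is an image preprocessing method that resizes an image while preserving its original aspect ratio. The image is scaled by a single factor, the smaller of the two target-to-source ratios, so that it fits entirely inside the target size. The side that falls short of the target is then padded with a constant color so that the whole image meets the model's size requirements, for example that each side length is a multiple of a given number such as 32.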
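
The arithmetic behind this can be followed without touching any pixels. The sketch below is a minimal illustration under the assumption that the target size itself is a multiple of the stride; the helper name `letterbox_geometry` is hypothetical and only mirrors the scale and padding computation described above.

```python
def letterbox_geometry(shape, new_shape=(640, 640), stride=32, auto=True):
    """Return (resized_wh, output_wh) for a letterbox resize.

    shape and new_shape are (height, width). Hypothetical helper for illustration.
    """
    h, w = shape
    r = min(new_shape[0] / h, new_shape[1] / w)  # one scale factor for both axes
    new_w, new_h = int(round(w * r)), int(round(h * r))
    # Padding needed to reach the full target size
    dw, dh = new_shape[1] - new_w, new_shape[0] - new_h
    if auto:
        # Keep only the padding needed to reach the next multiple of stride
        dw, dh = dw % stride, dh % stride
    return (new_w, new_h), (new_w + dw, new_h + dh)


resized, padded = letterbox_geometry((1080, 1920))
print(resized, padded)  # (640, 360) (640, 384)
```

A 1920x1080 frame is scaled by 1/3 to 640x360, and only 24 rows of padding are added to reach 384, the next multiple of 32. With `auto=False` the same frame would be padded to the full 640x640.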

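Advantages of letter_box

Less information loss and geometric distortion: because the original aspect ratio is preserved, objects are not stretched or squashed by the resize, which keeps the image content faithful to the original.
Minimal padding: by padding only up to the nearest stride multiple instead of the full target size, letter_box limits the downsides of excessive padding, such as extra uninformative pixels (noise) and lower training efficiency.
Faster data loading: preprocessing the dataset once and saving the processed images and labels avoids repeating the same preprocessing during training, which reduces the computational load and speeds up data loading.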
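
The first two points can be put into numbers. The following sketch is an illustration only, assuming a square target whose side is a multiple of the stride; the function name `compare_resize_strategies` is hypothetical. It reports how much a plain resize would stretch the image, and what share of the output is padding with full versus minimal padding.

```python
def compare_resize_strategies(shape, target=640, stride=32):
    """Return (distortion, full_pad_share, min_pad_share) for an image of (height, width)."""
    h, w = shape
    # Plain resize to a square: each axis gets its own scale factor
    sx, sy = target / w, target / h
    distortion = max(sx, sy) / min(sx, sy)  # 1.0 means no stretching

    # Letterbox: one shared scale factor, then pad
    r = min(sx, sy)
    new_w, new_h = int(round(w * r)), int(round(h * r))
    content = new_w * new_h
    full_pad_share = 1 - content / (target * target)
    out_w = new_w + (target - new_w) % stride
    out_h = new_h + (target - new_h) % stride
    min_pad_share = 1 - content / (out_w * out_h)
    return distortion, full_pad_share, min_pad_share


d, full, minimal = compare_resize_strategies((1080, 1920))
print(f"{d:.2f} {full:.2%} {minimal:.2%}")  # 1.78 43.75% 6.25%
```

For a 1920x1080 image, a plain resize to 640x640 scales the vertical axis about 1.78 times more than the horizontal one, while letterboxing keeps the ratio intact. Padding to the full square makes 43.75% of the output padding; padding to the nearest multiple of 32 brings that down to 6.25%.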

Python implementation

The following Python script shows how to implement letter_box image processing and adjust the corresponding labels:

import cv2
import numpy as np
from pathlib import Path
import os
from concurrent.futures import ThreadPoolExecutor, as_completed
from tqdm import tqdm

def letterbox(im, new_shape=(640, 640), color=(114, 114, 114), auto=True, scaleFill=False, scaleup=True, stride=32):
    # Resize the image and pad it so that each side meets the stride-multiple constraint
    shape = im.shape[:2]  # current shape [height, width]
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)

    # Scale ratio (new / old)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
    if not scaleup:  # only scale down, never up (for better validation mAP)
        r = min(r, 1.0)

    # Compute padding
    ratio = r, r  # width, height ratios
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # width, height padding
    if auto:  # minimum rectangle: pad only up to the next multiple of stride
        dw, dh = np.mod(dw, stride), np.mod(dh, stride)  # width, height padding
    elif scaleFill:  # stretch to fill, no padding
        dw, dh = 0.0, 0.0
        new_unpad = (new_shape[1], new_shape[0])
        ratio = new_shape[1] / shape[1], new_shape[0] / shape[0]  # width, height ratios

    dw /= 2  # split the padding between the two sides
    dh /= 2

    if shape[::-1] != new_unpad:  # resize only if the size changes
        im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)

    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add the border

    return im, ratio, (dw, dh)

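def adjust_labels(label_file, orig_shape, ratio, pad, out_shape):
    # Read a YOLO label file (class x_center y_center width height, normalized to the
    # original image) and re-normalize the boxes to the letterboxed image
    h0, w0 = orig_shape  # original image height, width in pixels
    h1, w1 = out_shape   # letterboxed image height, width in pixels
    left, top = pad      # padding actually added on the left and top, in pixels
    new_lines = []
    for line in label_file.read_text().splitlines():
        parts = line.split()
        if len(parts) != 5:  # skip blank or malformed lines
            continue
        class_id, x, y, w, h = map(float, parts)
        # normalized -> original pixels -> scaled and shifted pixels -> normalized
        x = (x * w0 * ratio[0] + left) / w1
        y = (y * h0 * ratio[1] + top) / h1
        w = w * w0 * ratio[0] / w1
        h = h * h0 * ratio[1] / h1
        new_lines.append(f"{int(class_id)} {x:.6f} {y:.6f} {w:.6f} {h:.6f}")
    return new_lines

def letterbox_file(img_file, label_file, res_img_path, res_label_path, new_shape):
    # Process a single image and its label file
    img_file_path = Path(img_file)
    label_file_path = Path(label_file)

    # np.fromfile + cv2.imdecode also works for paths containing non-ASCII characters
    img = cv2.imdecode(np.fromfile(str(img_file_path), dtype=np.uint8), cv2.IMREAD_UNCHANGED)
    if img is None:
        raise ValueError(f"Cannot decode image: {img_file_path}")
    res_img, ratio, (dw, dh) = letterbox(img, new_shape)
    # letterbox returns the per-side padding as floats; recover the integer
    # left/top padding with the same rounding it applies internally
    pad = int(round(dw - 0.1)), int(round(dh - 0.1))

    res_img_file_path = Path(res_img_path) / img_file_path.name
    res_label_file_path = Path(res_label_path) / label_file_path.name
    cv2.imencode('.jpg', res_img)[1].tofile(str(res_img_file_path))

    # Read the label file and adjust the labels
    new_labels = adjust_labels(label_file_path, img.shape[:2], ratio, pad, res_img.shape[:2])
    res_label_file_path.write_text("\n".join(new_labels))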

if __name__ == "__main__":
    ori_img_path = '/data/temp/temp-tes-data/res-img/'
    ori_label_path = '/data/temp/temp-tes-data/res-labels/'
    res_img_path = '/data/temp/temp-tes-data/resize-img/'
    res_label_path = '/data/temp/temp-tes-data/resize-labels/'
    new_shape = (1280, 1280)

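    # Make sure the output directories exist
    Path(res_img_path).mkdir(parents=True, exist_ok=True)
    Path(res_label_path).mkdir(parents=True, exist_ok=True)

    # Pair each image with its label by relative path instead of relying on two
    # independent directory listings being in the same order
    # (assumes the label directory mirrors the image directory layout)
    for img_path in sorted(Path(ori_img_path).rglob('*.jpg')):
        label_path = Path(ori_label_path) / img_path.relative_to(ori_img_path).with_suffix('.txt')
        if label_path.exists():
            letterbox_file(str(img_path), str(label_path), res_img_path, res_label_path, new_shape)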
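
The label adjustment can be checked by hand on a single box. The sketch below is a minimal worked example, assuming YOLO-format labels (x_center, y_center, width, height) normalized to the original image; the helper name `remap_yolo_box` is hypothetical. A normalized coordinate is first converted to pixels of the original image, then scaled and shifted by the padding, then normalized again by the size of the letterboxed image.

```python
def remap_yolo_box(box, orig_shape, ratio, pad, out_shape):
    """Map one normalized YOLO box to the letterboxed image.

    orig_shape and out_shape are (height, width) in pixels, ratio is
    (width_ratio, height_ratio), pad is the (left, top) padding in pixels.
    Hypothetical helper for illustration.
    """
    x, y, w, h = box
    h0, w0 = orig_shape
    h1, w1 = out_shape
    left, top = pad
    return (
        (x * w0 * ratio[0] + left) / w1,  # new normalized x_center
        (y * h0 * ratio[1] + top) / h1,   # new normalized y_center
        w * w0 * ratio[0] / w1,           # new normalized width
        h * h0 * ratio[1] / h1,           # new normalized height
    )


# A 1920x1080 image letterboxed to 640x384: scale 1/3, 12 px of padding above and below
box = remap_yolo_box((0.5, 0.5, 0.2, 0.4), (1080, 1920), (1 / 3, 1 / 3), (0, 12), (384, 640))
print([round(v, 4) for v in box])  # [0.5, 0.5, 0.2, 0.375]
```

The centered box stays centered and its normalized width is unchanged because no horizontal padding was added, while its normalized height shrinks from 0.4 to 0.375 because the padded frame is taller than the scaled content.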

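Using the code in practice

Multithreaded processing and progress bar
To process large numbers of images more efficiently, we can use Python's concurrent.futures module for multithreaded processing. In addition, the tqdm library can add a visual progress bar so that progress can be monitored in real time.

The following example processes images and labels with multiple threads and shows a progress bar: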

from concurrent.futures import ThreadPoolExecutor, as_completed
from tqdm import tqdm

def multi_letterbox(img_list, label_list, res_img_path, res_label_path, new_shape, n_works=10):
    # Process images and labels with a thread pool
    with ThreadPoolExecutor(max_workers=n_works) as executor:
        futures = []
        for img_file, label_file in zip(img_list, label_list):
            # Submit the task to the thread pool
            futures.append(executor.submit(letterbox_file, img_file, label_file, res_img_path, res_label_path, new_shape))

        # Show a progress bar with tqdm
        for future in tqdm(as_completed(futures), total=len(futures), desc="Processing images and labels"):
            future.result()  # re-raise any worker exception instead of silently dropping it

if __name__ == "__main__":
    ori_img_path = '/data/temp/temp-tes-data/res-img/'
    ori_label_path = '/data/temp/temp-tes-data/res-labels/'
    res_img_path = '/data/temp/temp-tes-data/resize-img/'
    res_label_path = '/data/temp/temp-tes-data/resize-labels/'
    new_shape = (1280, 1280)
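    n_works = 20  # number of threads

    # Make sure the output directories exist
    Path(res_img_path).mkdir(parents=True, exist_ok=True)
    Path(res_label_path).mkdir(parents=True, exist_ok=True)

    # Pair each image with its label by relative path so the two lists stay aligned
    # (assumes the label directory mirrors the image directory layout)
    img_list, label_list = [], []
    for img_path in sorted(Path(ori_img_path).rglob('*.jpg')):
        label_path = Path(ori_label_path) / img_path.relative_to(ori_img_path).with_suffix('.txt')
        if label_path.exists():
            img_list.append(str(img_path))
            label_list.append(str(label_path))
    multi_letterbox(img_list, label_list, res_img_path, res_label_path, new_shape, n_works)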