[Dataset Preprocessing for Object Detection] Subclassing Dataset to Define Your Own Dataset [Code Included]

爱吃肉的鹏

Last modified 2022-02-24 15:56:20


Column: Build Your Own Object Detector. Tags: deep learning, pytorch, neural networks, object detection

First published 2022-02-24 14:25:49

Article link: https://blog.csdn.net/z240626191s/article/details/123108750


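In deep learning training, designing an effective convolutional neural network is only part of the job; what matters even more is how the data is processed. Training data has to be preprocessed before training starts. For an object detection network, this means first splitting the data into a training set and a held-out set, then processing the labels and bounding boxes before anything is fed to the network. This article uses the VOC dataset format as the example and walks through preprocessing a dataset so that it can be fed to an object detection network. [Code included]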

Splitting into training and validation sets

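import numpy as np

annotation_path = r'2007_train.txt'
with open(annotation_path, encoding='utf-8') as f:
    lines = f.readlines()
np.random.seed(10101)  # fixed seed so the shuffle, and therefore the split, is reproducible
np.random.shuffle(lines)
np.random.seed(None)  # re-seed so later random calls are not tied to the fixed seed
val = 0.1  # fraction of the data held out for validation
num_val = int(len(lines) * val)
num_train = len(lines) - num_val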
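The snippet above computes num_val and num_train but does not show how the two counts are used. A minimal sketch, assuming the first num_train shuffled lines become the training set and the rest the validation set. The helper name split_lines and the in-memory sample lines are hypothetical, and a local random generator is used here instead of re-seeding the global one:

```python
import numpy as np


def split_lines(lines, val=0.1, seed=10101):
    """Shuffle annotation lines reproducibly and split them into train/val lists."""
    lines = list(lines)                 # copy so the caller's list is left untouched
    rng = np.random.RandomState(seed)   # local generator: the global NumPy seed is not modified
    rng.shuffle(lines)
    num_val = int(len(lines) * val)
    num_train = len(lines) - num_val
    return lines[:num_train], lines[num_train:]


# Hypothetical annotation lines standing in for the contents of 2007_train.txt
sample = [f"img_{i}.jpg 1,2,3,4,0\n" for i in range(20)]
train_lines, val_lines = split_lines(sample)
print(len(train_lines), len(val_lines))
```

The two lists can then be passed to two separate dataset instances, one for training and one for validation.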

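Next, define your own dataset-processing code. It mainly subclasses the Dataset class from torch.utils.data. [Note: this only preprocesses the dataset; it does not load it. Loading the preprocessed dataset is done with DataLoader from torch.utils.data.]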

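from torch.utils.data import Dataset

class MyDatasets(Dataset):
    def __init__(self, train_line, image_size, is_train):
        super(MyDatasets, self).__init__()
        self.train_line = train_line  # list of annotation lines
        self.train_batches = len(train_line)  # number of samples (despite the name, not batches)
        self.image_size = image_size  # network input size
        self.is_train = is_train  # whether this is the training set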

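Subclassing Dataset

Subclassing the Dataset parent class mainly means overriding two methods: def __len__(self): [returns the length of the dataset] and def __getitem__(self, index): [returns one sample and its labels]. Note that the latter makes the dataset indexable: it is called with an integer index, as the index parameter shows.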

    def __len__(self):  # return the number of samples in the dataset
        return self.train_batches

    def __getitem__(self, index):  # return the image and labels for one sample
        lines = self.train_line

        if self.is_train:
            img, y = self.get_data(lines[index], self.image_size[0:2], random=False)
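To see why these two methods are enough, here is a minimal sketch in plain Python, without PyTorch. The class name ToyDataset and its contents are hypothetical; the point is only that an object with __len__ and __getitem__ can be sized and indexed, which is what a map-style dataset has to provide.

```python
class ToyDataset:
    """Hypothetical stand-in for a map-style dataset: sized and indexable."""

    def __init__(self, samples):
        self.samples = list(samples)

    def __len__(self):
        # Number of samples, not number of batches
        return len(self.samples)

    def __getitem__(self, index):
        # Called with one integer index; returns one (input, label) pair
        x = self.samples[index]
        return x, x * x


ds = ToyDataset([1, 2, 3])
pairs = [ds[i] for i in range(len(ds))]
print(len(ds), pairs)
```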

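I'll use my own dataset to show what the attributes used by these two methods contain [this assumes the dataset has already been split and the annotation files converted to a txt file]

Output of self.train_batches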

In [1]: self.train_batches
Out[1]: 799

Output of self.train_line

In [2]: lines = self.train_line

Out[3]:
['E:\\VOCdevkit/VOC2007/JPEGImages/23.jpg 37,27,134,209,0 233,1,387,210,0 20,220,167,414,0 258,226,348,413,0\n',
 'E:\\VOCdevkit/VOC2007/JPEGImages/263.jpg 110,83,401,407,0\n',
 'E:\\VOCdevkit/VOC2007/JPEGImages/117.jpg 205,115,342,329,0\n',
···········································

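As you can see, self.train_line holds the lines read from the txt file. In the first entry, E:\\VOCdevkit/VOC2007/JPEGImages is the directory and 23.jpg is the image name. 37,27,134,209,0 233,1,387,210,0 20,220,167,414,0 258,226,348,413,0 are the boxes: in each comma-separated group the first four numbers are the bbox coordinates and the trailing 0 is the class of that object. In the code, lines is a list.

Defining the data-processing function: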

def get_data(self, annotation_line, input_shape, random=True):
    line = annotation_line.split()
    image = Image.open(line[0])  # line[0] is the image path, line[1:] holds the boxes and labels
    iw, ih = image.size  # actual size of the input image
    h, w = input_shape  # network input size
    box = np.array([np.array(list(map(int, box.split(',')))) for box in line[1:]])  # convert the box strings to an array

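Pass an entry of the lines list above [the data from the txt file] into get_data, where it becomes the parameter annotation_line. Here I load only one image from the lines list for the demonstration.

annotation_line.split() splits the line on whitespace, separating the image path (directory plus file name) from each bbox-and-class group

So the output of line = annotation_line.split() is: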

In [5]: line
Out[5]:
['E:\\VOCdevkit/VOC2007/JPEGImages/23.jpg',
 '37,27,134,209,0',
 '233,1,387,210,0',
 '20,220,167,414,0',
 '258,226,348,413,0']
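The parsing step can be tried without any image file. The sketch below wraps it in a hypothetical helper, parse_annotation, and adds one thing the article's code does not do: it reshapes the result so that a line with no boxes still yields an array with 5 columns. It assumes, like the article's format, that the image path contains no spaces.

```python
import numpy as np


def parse_annotation(annotation_line):
    """Split one annotation line into an image path and an (N, 5) integer box array."""
    parts = annotation_line.split()          # whitespace split also drops the trailing "\n"
    path, box_fields = parts[0], parts[1:]
    boxes = np.array([[int(v) for v in field.split(",")] for field in box_fields],
                     dtype=np.int64)
    # With no boxes np.array gives shape (0,); reshape so column slicing still works
    return path, boxes.reshape(-1, 5)


line = "E:\\VOCdevkit/VOC2007/JPEGImages/263.jpg 110,83,401,407,0\n"
path, boxes = parse_annotation(line)
print(path, boxes.shape)
```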

Opening the image

image = Image.open(line[0])  # line[0] is the image path, line[1:] holds the boxes and labels

We can display the image to take a look

image.show()

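Reading the image size and bbox information

iw, ih = image.size  # actual size of the input image
h, w = input_shape  # network input size
box = np.array([np.array(list(map(int, box.split(',')))) for box in line[1:]])  # convert the box strings to an array

iw, ih are the actual size of the input image

h, w are the network input size, for example 300*300 for SSD and 416*416 for YOLO. Here I use a 300*300 network input, i.e. h = w = 300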

In [9]: iw,ih=image.size

In [10]: iw,ih
Out[10]: (416, 416)
------------------------------------------------------------------

h=w=300

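The box output: look closely at the output below. This array (or matrix) has 5 columns; columns 1 to 4 are the bbox coordinates and the last column is the class label. Within the four bbox columns, columns 1 and 2 are the top-left corner and columns 3 and 4 are the bottom-right corner. This is very important and will be used again later.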

In [13]: box
Out[13]:
array([[ 37, 27, 134, 209, 0],
[233, 1, 387, 210, 0],
[ 20, 220, 167, 414, 0],
[258, 226, 348, 413, 0]])

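Our network input is 300*300 but the image is actually 416*416, so the original image has to be scaled to fit the network input.

Image scaling

First compute the scale factor, i.e. the ratio of the network input size to the original image size. Taking the smaller of the two ratios and using it for both axes keeps the scaled image free of distortion: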

scale = min(w / iw, h / ih)

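This guarantees that the width or the height matches the target size; dx and dy are the leftover padding on each side: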
nw = int(iw * scale)
nh = int(ih * scale)
dx = (w - nw) // 2  
dy = (h - nh) // 2

In [15]: scale
Out[15]: 0.7211538461538461

In [18]: nw,nh
Out[18]: (300, 300)
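In the example above the image is square, so dx and dy are both 0 and the padding never shows up. A small sketch with a hypothetical 600x400 image makes the effect visible; the helper name letterbox_params is an assumption, the arithmetic is the same as above.

```python
def letterbox_params(iw, ih, w, h):
    """Return (nw, nh, dx, dy) for fitting an iw x ih image into a w x h canvas."""
    scale = min(w / iw, h / ih)   # the smaller ratio keeps the whole image inside the canvas
    nw = int(iw * scale)
    nh = int(ih * scale)
    dx = (w - nw) // 2            # horizontal padding on the left
    dy = (h - nh) // 2            # vertical padding on the top
    return nw, nh, dx, dy


# Hypothetical 600x400 image fitted into a 300x300 network input
print(letterbox_params(600, 400, 300, 300))  # (300, 200, 0, 50)
```

Here the width limits the scale (0.5), the resized image is 300x200, and 50 pixels of padding go above and below it.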

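image = image.resize((nw, nh), Image.BICUBIC) # shrink the image with bicubic interpolation

Create a new image filled with gray. The scaled image may not fill the whole network input, so the borders are padded with gray bars.

new_image = Image.new('RGB', (w, h), (128, 128, 128))

You can take a look at the newly created image with new_image.show()

Then paste the resized image onto the gray canvas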

new_image.paste(image, (dx, dy))
image_data = np.array(new_image, np.float32)
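The resize-and-pad steps can be checked end to end on a synthetic image, so no file on disk is needed. This is a sketch under assumptions: the function name letterbox and the solid-red test image are hypothetical, and Pillow must be installed.

```python
import numpy as np
from PIL import Image


def letterbox(image, w, h, fill=(128, 128, 128)):
    """Resize a PIL image to fit inside w x h without distortion, padding with gray."""
    iw, ih = image.size
    scale = min(w / iw, h / ih)
    nw, nh = int(iw * scale), int(ih * scale)
    dx, dy = (w - nw) // 2, (h - nh) // 2
    resized = image.resize((nw, nh), Image.BICUBIC)
    canvas = Image.new('RGB', (w, h), fill)
    canvas.paste(resized, (dx, dy))
    return np.array(canvas, np.float32)


# Hypothetical solid-red 600x400 image instead of a file read from disk
src = Image.new('RGB', (600, 400), (255, 0, 0))
image_data = letterbox(src, 300, 300)
print(image_data.shape, image_data[0, 0], image_data[150, 150])
```

The top-left pixel lies in the gray bar, while the center pixel comes from the resized image.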

Bbox processing

box_data = np.zeros((len(box), 5)) # all-zero matrix with the same shape as the bbox array

Out[30]:
array([[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.]])

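The image has been scaled, so the bboxes change accordingly. Compute the scaled bboxes [scale the x and y coordinates in the first 4 columns of box separately; because box is an integer array, the results are truncated to integers]

box[:, [0, 2]] = box[:, [0, 2]] * nw / iw + dx  # scale and shift the x coordinates
box[:, [1, 3]] = box[:, [1, 3]] * nh / ih + dy  # scale and shift the y coordinates

The bboxes for the scaled image: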

array([[ 26, 19, 96, 150, 0],
[168, 0, 279, 151, 0],
[ 14, 158, 120, 298, 0],
[186, 162, 250, 297, 0]])
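The integer results above come from assigning float values back into an integer array, which drops the fractional part. The sketch below reproduces the first two rows of the article's output and, for comparison, does the same computation on a float copy. Whether the sub-pixel difference matters depends on the task; this is only an illustration.

```python
import numpy as np

box = np.array([[37, 27, 134, 209, 0],
                [233, 1, 387, 210, 0]])
iw, ih = 416, 416                  # original image size
nw, nh, dx, dy = 300, 300, 0, 0    # resized size and padding offsets

# Float version: keeps the fractional part of the scaled coordinates
box_f = box.astype(np.float32)
box_f[:, [0, 2]] = box_f[:, [0, 2]] * nw / iw + dx
box_f[:, [1, 3]] = box_f[:, [1, 3]] * nh / ih + dy

# Integer version, as in the article: the assignment truncates toward zero
box_i = box.copy()
box_i[:, [0, 2]] = box_i[:, [0, 2]] * nw / iw + dx
box_i[:, [1, 3]] = box_i[:, [1, 3]] * nh / ih + dy

print(box_f[0])
print(box_i[0])
```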

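Process the coordinates further so that scaled coordinates do not overflow the image or become negative

# clamp the top-left corner to avoid negative coordinates
box[:, 0:2][box[:, 0:2] < 0] = 0

# clamp the bottom-right corner so it does not exceed the input boundary
box[:, 2][box[:, 2] > w] = w   # box[:, 2] > w is a boolean mask selecting the rows whose column 2 exceeds w
box[:, 3][box[:, 3] > h] = h

# compute the size of the scaled boxes
box_w = box[:, 2] - box[:, 0]  # column 2 minus column 0 gives the box width
box_h = box[:, 3] - box[:, 1]

box_w and box_h each have 4 values here because my annotation contains 4 bounding boxes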

In [40]: box_w,box_h
Out[40]: (array([ 70, 111, 106, 64]), array([131, 151, 140, 135]))

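Keep only the valid bounding boxes [a valid box is one whose width and height are both greater than 1]

box = box[np.logical_and(box_w > 1, box_h > 1)] # logical AND selects the valid boxes
box_data = np.zeros((len(box), 5))
# copy the valid boxes into the all-zero box_data
box_data[:len(box)] = box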
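All four boxes in the article's example are valid, so the filter removes nothing there. The sketch below uses hypothetical boxes, including one that lies entirely outside the image, to show the clamping and filtering at work. It uses np.maximum and np.minimum in place of the boolean-mask assignments; the helper name clip_and_filter is an assumption.

```python
import numpy as np


def clip_and_filter(box, w, h):
    """Clip (x1, y1, x2, y2, cls) boxes to a w x h image and drop degenerate ones."""
    box = box.copy()
    box[:, 0:2] = np.maximum(box[:, 0:2], 0)   # top-left corner: no negative coordinates
    box[:, 2] = np.minimum(box[:, 2], w)       # bottom-right corner: stay inside the image
    box[:, 3] = np.minimum(box[:, 3], h)
    box_w = box[:, 2] - box[:, 0]
    box_h = box[:, 3] - box[:, 1]
    keep = np.logical_and(box_w > 1, box_h > 1)
    return box[keep]


# Hypothetical boxes: one normal, one sticking out of the image, one lying outside it
boxes = np.array([[26, 19, 96, 150, 0],
                  [-5, 250, 120, 320, 1],
                  [310, 10, 350, 60, 2]])
print(clip_and_filter(boxes, 300, 300))
```

The third box ends up with a negative width after clamping, so it is dropped.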

Now let's go back to

def __getitem__(self, index):

        if self.is_train:
            img, y = self.get_data(lines[index], self.image_size[0:2], random=False)

img and y are now the results returned by our get_data, i.e. the image_data and box_data shown above

Take the box coordinates [without the class column]

boxes = np.array(y[:, :4], dtype=np.float32)

In [49]: boxes
Out[49]:
array([[ 26., 19., 96., 150.],
[168., 0., 279., 151.],
[ 14., 158., 120., 298.],
[186., 162., 250., 297.]], dtype=float32)

Process the box coordinates further by normalizing them

boxes[:, 0] = boxes[:, 0] / self.image_size[1]
boxes[:, 1] = boxes[:, 1] / self.image_size[0]
boxes[:, 2] = boxes[:, 2] / self.image_size[1]
boxes[:, 3] = boxes[:, 3] / self.image_size[0]

In [55]: boxes
Out[55]:
array([[0.08666667, 0.06333333, 0.32 , 0.5 ],
[0.56 , 0. , 0.93 , 0.50333333],
[0.04666667, 0.52666664, 0.4 , 0.99333334],
[0.62 , 0.54 , 0.8333333 , 0.99 ]], dtype=float32)
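The four per-column divisions can also be written as one broadcast division. A sketch, assuming image_size is ordered (height, width, channels), which is consistent with x being divided by image_size[1] and y by image_size[0] above; the third entry is an assumption.

```python
import numpy as np

image_size = (300, 300, 3)   # assumed layout: (height, width, channels)
h, w = image_size[0:2]

boxes = np.array([[26, 19, 96, 150],
                  [168, 0, 279, 151]], dtype=np.float32)

# x coordinates are divided by the width, y coordinates by the height
scale = np.array([w, h, w, h], dtype=np.float32)
normalized = boxes / scale    # broadcasting applies the divisor to every row
print(normalized)
```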

Clamp the box coordinates to the range [0, 1]

boxes = np.maximum(np.minimum(boxes, 1), 0)

Then concatenate the processed coordinate matrix with the class column to get the complete bbox information [including the class label]

y = np.concatenate([boxes, y[:, -1:]], axis=-1)

In [59]: y
Out[59]:
array([[0.08666667, 0.06333333, 0.31999999, 0.5 , 0. ],
[0.56 , 0. , 0.93000001, 0.50333333, 0. ],
[0.04666667, 0.52666664, 0.40000001, 0.99333334, 0. ],
[0.62 , 0.54000002, 0.83333331, 0.99000001, 0. ]])

The y above is the final bbox array: the first 4 columns are the bounding-box coordinates and the last column is the class
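The extra digits in Out[59] compared with Out[55] (for example 0.31999999 instead of 0.32) are most likely a dtype effect: boxes is float32, while the class column comes from box_data, which np.zeros creates as float64, so np.concatenate promotes the result to float64. A small sketch under that assumption:

```python
import numpy as np

boxes = np.array([[0.32, 0.5]], dtype=np.float32)   # normalized coordinates
cls = np.zeros((1, 1))                               # np.zeros defaults to float64

y = np.concatenate([boxes, cls], axis=-1)
print(y.dtype)      # float64: the float32 values are widened, exposing their rounding error

y32 = y.astype(np.float32)   # same effect as the later np.array(y, dtype=np.float32)
print(y32.dtype)
```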

img = np.array(img, dtype=np.float32) # convert the image to an array

tmp_inp = np.transpose(img - MEANS, (2, 0, 1))  # tmp_inp has shape (3, 300, 300)

tmp_targets = np.array(y, dtype=np.float32) # convert the labels to an array
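MEANS is not defined anywhere in the article. The sketch below assumes it is a tuple of three per-channel means; the value (104, 117, 123) is one that often appears in SSD code bases and is used here only as a placeholder. It shows how the subtraction broadcasts over the channel axis and how the transpose changes the layout from height x width x channels to channels x height x width.

```python
import numpy as np

# Assumption: MEANS is not defined in the article. (104, 117, 123) is a value
# commonly seen in SSD code bases; substitute the means your network expects.
MEANS = (104, 117, 123)

# Stand-in for the letterboxed image, laid out as H x W x C
img = np.full((300, 300, 3), 128, dtype=np.float32)

centered = img - MEANS                        # broadcasts over the last (channel) axis
tmp_inp = np.transpose(centered, (2, 0, 1))   # H x W x C -> C x H x W
print(tmp_inp.shape, tmp_inp[:, 0, 0])
```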

--------------------------------------------------------------------------------------------------------------------------------

We now have the final processed image data and the label data (bounding-box coordinates plus class)

Complete code:

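import numpy as np
from PIL import Image
from torch.utils.data import Dataset

# Assumption: MEANS is not defined in the original article. These per-channel means
# are a common choice in SSD code bases; replace them with your network's values.
MEANS = (104, 117, 123)


class MyDatasets(Dataset):
    def __init__(self, train_line, image_size, is_train):
        super(MyDatasets, self).__init__()
        self.train_line = train_line  # list of annotation lines
        self.train_batches = len(train_line)  # number of samples
        self.image_size = image_size  # network input size, (height, width, ...)
        self.is_train = is_train  # whether this is the training set

    def get_data(self, annotation_line, input_shape, random=True):

        line = annotation_line.split()
        image = Image.open(line[0])  # line[0] is the image path, line[1:] holds the boxes and labels
        iw, ih = image.size  # actual size of the input image
        h, w = input_shape  # network input size
        box = np.array([np.array(list(map(int, box.split(',')))) for box in line[1:]])  # convert the box strings to an array
        if not random:
            # resize the image without distortion
            scale = min(w / iw, h / ih)
            nw = int(iw * scale)
            nh = int(ih * scale)
            dx = (w - nw) // 2  # floor division: width of the gray bar on the left
            dy = (h - nh) // 2
            image = image.resize((nw, nh), Image.BICUBIC)  # shrink the image with bicubic interpolation
            new_image = Image.new('RGB', (w, h), (128, 128, 128))
            new_image.paste(image, (dx, dy))
            image_data = np.array(new_image, np.float32)

            # process the ground-truth boxes
            box_data = np.zeros((len(box), 5))
            if len(box) > 0:
                np.random.shuffle(box)
                box[:, [0, 2]] = box[:, [0, 2]] * nw / iw + dx  # scale and shift the x coordinates
                box[:, [1, 3]] = box[:, [1, 3]] * nh / ih + dy  # scale and shift the y coordinates

                # clamp the top-left corner to avoid negative coordinates
                box[:, 0:2][box[:, 0:2] < 0] = 0

                # clamp the bottom-right corner so it does not exceed the input boundary
                box[:, 2][box[:, 2] > w] = w
                box[:, 3][box[:, 3] > h] = h

                # compute the size of the scaled boxes
                box_w = box[:, 2] - box[:, 0]
                box_h = box[:, 3] - box[:, 1]

                # keep only boxes wider and taller than 1 pixel
                box = box[np.logical_and(box_w > 1, box_h > 1)]
                box_data = np.zeros((len(box), 5))
                box_data[:len(box)] = box

            return image_data, box_data

        # only the random=False path is covered in this article
        raise NotImplementedError('random augmentation is not covered in this article')

    def __len__(self):  # return the number of samples in the dataset

        return self.train_batches

    def __getitem__(self, index):  # return the image and labels for one sample
        lines = self.train_line

        # both branches currently use the same deterministic preprocessing
        if self.is_train:
            img, y = self.get_data(lines[index], self.image_size[0:2], random=False)
        else:
            img, y = self.get_data(lines[index], self.image_size[0:2], random=False)

        boxes = np.array(y[:, :4], dtype=np.float32)

        # normalize: x by the width, y by the height
        boxes[:, 0] = boxes[:, 0] / self.image_size[1]
        boxes[:, 1] = boxes[:, 1] / self.image_size[0]
        boxes[:, 2] = boxes[:, 2] / self.image_size[1]
        boxes[:, 3] = boxes[:, 3] / self.image_size[0]

        boxes = np.maximum(np.minimum(boxes, 1), 0)  # clamp to [0, 1]
        y = np.concatenate([boxes, y[:, -1:]], axis=-1)  # append the class column

        img = np.array(img, dtype=np.float32)

        tmp_inp = np.transpose(img - MEANS, (2, 0, 1))  # H x W x C -> C x H x W
        tmp_targets = np.array(y, dtype=np.float32)

        return tmp_inp, tmp_targets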
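Because each image can contain a different number of boxes, the targets returned by __getitem__ have different shapes, and the default batching of torch.utils.data.DataLoader, which stacks samples, cannot combine them. One common workaround is a custom function passed as the collate_fn argument. A minimal NumPy-only sketch follows; the name detection_collate is hypothetical, and the article does not show which approach the author used.

```python
import numpy as np


def detection_collate(batch):
    """Stack images into one array; keep the per-image targets in a list."""
    images = np.stack([img for img, _ in batch])     # (B, 3, H, W)
    targets = [tgt for _, tgt in batch]              # B arrays of shape (N_i, 5)
    return images, targets


# Hypothetical samples shaped like the (tmp_inp, tmp_targets) pairs returned above
batch = [
    (np.zeros((3, 300, 300), np.float32), np.zeros((4, 5), np.float32)),
    (np.zeros((3, 300, 300), np.float32), np.zeros((1, 5), np.float32)),
]
images, targets = detection_collate(batch)
print(images.shape, [t.shape for t in targets])
```

With PyTorch installed, such a function would be passed as DataLoader(dataset, batch_size=..., collate_fn=detection_collate); converting the arrays to tensors is left out of this sketch.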
