[SSD Algorithm] The Most Complete Code Walkthrough: Data Part (Pascal VOC dataset)

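Preface

The data part of this SSD code walkthrough covers the whole data path: downloading the dataset, applying data augmentation, and finally wrapping everything in a PyTorch data_loader.

Along the way it covers most of the augmentation methods used in object detection: brightness, contrast, hue, cropping, expansion, and so on.

Read together with the earlier "[SSD Algorithm] The Most Complete Code Walkthrough: Core Part", it should give a fresh and thorough picture of SSD. It also helps with other detection algorithms: the fundamentals are shared, and the differences lie mainly in the algorithm and network design.

⛳️ The two parts are best read together. I hope they are useful!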

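Contents

  • Downloading the data
  • The dataset
  • Data augmentation
    • 1. Data type conversion
    • 2. Transform Compose
    • 3. IoU computation
    • 4. Bbox coordinate conversion
    • 5. Image Resize
    • 6. Color space conversion
    • 7. Hue change
    • 8. Saturation change
    • 9. Brightness change
    • 10. Contrast change
    • 11. Color channel swap
    • 12. Image mirroring
    • 13. Random image cropping
    • 14. Image expansion
    • Summary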

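Downloading the data

Go to your own data folder and run the script below to download and extract VOC2007 (trainval and test). The script does not fetch VOC2012, which the dataset code further down also reads, so that has to be obtained separately.

Script: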

cd ./data
echo "Downloading VOC2007 trainval ..."
# download the data
curl -LO http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
echo "Downloading VOC2007 test data ..."
curl -LO http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
echo "Done downloading."

# extract the data
echo "Extracting trainval ..."
tar -xvf VOCtrainval_06-Nov-2007.tar
echo "Extracting test ..."
tar -xvf VOCtest_06-Nov-2007.tar
echo "removing tars ..."
# delete the archives
rm VOCtrainval_06-Nov-2007.tar
rm VOCtest_06-Nov-2007.tar

Then tidy things up a little so that VOC2007 and VOC2012 sit in the same directory, as follows:

├── data
│    ├── VOC
│         ├── VOCdevkit
│              ├── VOC2007
│              ├── VOC2012

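The dataset

1. Annotation Transform:

✔️ VOCAnnotationTransform() extracts and converts the VOC XML annotations: it normalizes the bbox coordinates, maps each class name to an index through a dictionary, and finally assembles the data as: [[xmin, ymin, xmax, ymax, label_ind],...]

  • VOC classes and root directory: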
VOC_CLASSES = (
    'aeroplane', 'bicycle', 'bird', 'boat',
    'bottle', 'bus', 'car', 'cat', 'chair',
    'cow', 'diningtable', 'dog', 'horse',
    'motorbike', 'person', 'pottedplant',
    'sheep', 'sofa', 'train', 'tvmonitor')

VOC_ROOT = "E:/TZ_WK/VOC/VOCdevkit"
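  • VOCAnnotationTransform:
class VOCAnnotationTransform(object):
    """
    Converts the bbox coordinates in a VOC annotation to normalized values;
    converts class names to indices through a lookup dictionary.
    Args:
        class_to_ind: (dict) class name -> index
        keep_difficult: whether to keep objects marked difficult=1
    """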
    def __init__(self, class_to_ind=None, keep_difficult=False):
        self.class_to_ind = class_to_ind or dict(
                zip(VOC_CLASSES, range(len(VOC_CLASSES))))
        self.keep_difficult = keep_difficult

    def __call__(self, target, width, height):
        """
        Args:
            target: the ET.Element read from the xml file
            width: image width
            height: image height
        Return:
            res: list, [bbox coords, class index]
                -->eg: [[xmin, ymin, xmax, ymax, label_ind],...]
        """
        res = []
        for obj in target.iter('object'):
            # skip objects marked difficult unless keep_difficult is set
            difficult = int(obj.find('difficult').text) == 1
            if not self.keep_difficult and difficult:
                continue

            # read the fields we need from the xml
            name = obj.find('name').text.lower().strip()
            bbox = obj.find('bndbox')
            # bbox corner names, in output order
            pts = ['xmin', 'ymin', 'xmax', 'ymax']
            bndbox = []
            for i, pt in enumerate(pts):
                # VOC pixel coordinates are 1-based; shift them to 0-based
                cur_pt = int(bbox.find(pt).text) - 1
                # normalize: x / width, y / height
                cur_pt = cur_pt / width if i % 2 == 0 else cur_pt / height

                bndbox.append(cur_pt)

            # look up the index of the class name
            label_idx = self.class_to_ind[name]

            bndbox.append(label_idx)

            res += [bndbox] 

        return res

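# Debugging. This assumes target is the root ET.Element of one annotation xml
# and width / height are the size of the corresponding image.
if __name__ == "__main__":
    vocan = VOCAnnotationTransform()
    res = vocan(target, width, height)
    print('The transform res:')
    print(res)
Output: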
The transform res:
[[0.13314447592067988, 0.478, 0.5495750708215298, 0.74, 11],
 [0.019830028328611898, 0.022, 0.9943342776203966, 0.994, 14]]
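
The debug snippet above expects target, width and height to exist already. As a minimal self-contained sketch of the same conversion, the block below parses a small inline annotation with the standard library. The XML string, the parse_annotation helper and the two-entry CLASS_TO_IND dictionary are illustrative assumptions; unlike the class above, the sketch reads the image size from the `<size>` element instead of receiving it as arguments.

```python
import xml.etree.ElementTree as ET

# Hypothetical inline annotation in VOC layout (353 x 500 image, two objects).
XML = """
<annotation>
  <size><width>353</width><height>500</height><depth>3</depth></size>
  <object>
    <name>dog</name><difficult>0</difficult>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>person</name><difficult>0</difficult>
    <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>498</ymax></bndbox>
  </object>
</annotation>
"""

# Assumed subset of the class -> index mapping built from VOC_CLASSES.
CLASS_TO_IND = {'dog': 11, 'person': 14}


def parse_annotation(root, class_to_ind):
    """Return [[xmin, ymin, xmax, ymax, label_ind], ...] with normalized coords."""
    size = root.find('size')
    width = int(size.find('width').text)
    height = int(size.find('height').text)
    rows = []
    for obj in root.iter('object'):
        box = obj.find('bndbox')
        row = []
        for i, key in enumerate(['xmin', 'ymin', 'xmax', 'ymax']):
            value = int(box.find(key).text) - 1  # shift to 0-based pixels
            # even positions are x (divide by width), odd are y (divide by height)
            row.append(value / width if i % 2 == 0 else value / height)
        row.append(class_to_ind[obj.find('name').text.strip().lower()])
        rows.append(row)
    return rows


rows = parse_annotation(ET.fromstring(XML), CLASS_TO_IND)
for row in rows:
    print(row)
```

The numbers in the inline XML were picked so that the rows agree with the output printed above, for example 47 / 353 for the first xmin.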

2. VOC Detection Dataset:

✔️ Using the Annotation Transform and the VOC directory structure, read each image together with its bboxes and labels to build the VOC dataset.

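import os
import xml.etree.ElementTree as ET

import cv2
import numpy as np
import torch
import torch.utils.data as data


class VOCDetection(data.Dataset):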
    def __init__(self, root, 
                 image_sets = [('2007', 'trainval'), ('2012', 'trainval')],
                 transform = None, 
                 target_transform = VOCAnnotationTransform(),):

        self.root = root
        self.image_set = image_sets
        self.transform = transform
        self.target_transform = target_transform
        # path template for the annotations (bbox and label)
        self._annopath = os.path.join('%s', 'Annotations', '%s.xml')
        # path template for the images
        self._imgpath = os.path.join('%s', 'JPEGImages', '%s.jpg')

        self.ids = list()
        for (year, name) in image_sets:
            rootpath = os.path.join(self.root, 'VOC' + year)
            for line in open(os.path.join(rootpath, 'ImageSets', 'Main', name + '.txt')):
                self.ids.append((rootpath, line.strip()))  

    def __getitem__(self, index):
        img_id = self.ids[index]
        # annotation (label) information
        target = ET.parse(self._annopath % img_id).getroot()
        # read the image
        img = cv2.imread(self._imgpath % img_id)
        h, w, c = img.shape

        # Annotation transform
        if self.target_transform is not None:
            target = self.target_transform(target, w, h)

        # transform: data augmentation
        if self.transform is not None:
            target = np.array(target)
            # transform
            img, boxes, labels = self.transform(img, target[:, :4], target[:, 4])

            # convert the image from BGR to RGB
            img = img[:, :, (2, 1, 0)]

            # merge bbox and label into shape (N, 5); np.hstack takes a single tuple
            target = np.hstack((boxes, np.expand_dims(labels, axis=1)))

        else:
            target = np.array(target)

        return torch.from_numpy(img).permute(2, 0, 1), target, h, w

    def __len__(self):
        return len(self.ids)
Debug code:
Data = VOCDetection(VOC_ROOT)
data_loader = data.DataLoader(Data, batch_size=1,
                                  num_workers=0,
                                  shuffle=True,
                                  pin_memory=True)
print('the data length is:', len(data_loader))

# class name -> index
class_to_ind = dict(zip(VOC_CLASSES, range(len(VOC_CLASSES))))

# index -> class name
ind_to_class = {v: k for k, v in class_to_ind.items()}

# load the data
for datas in data_loader:
    img, target, h, w = datas
    img = img.squeeze(0).permute(1, 2, 0).numpy().astype(np.uint8)
    target = target[0].float()

    # restore the bbox coordinates to pixel values of the original image
    target[:, 0] *= w.float()
    target[:, 2] *= w.float()
    target[:, 1] *= h.float()
    target[:, 3] *= h.float()

    # truncate to integers (np.int0 was removed in NumPy 2.0)
    target = target.numpy().astype(np.int64)
    for i in range(target.shape[0]):
        x1, y1, x2, y2, label = (int(v) for v in target[i])
        # draw the rectangle
        img = cv2.rectangle(img, (x1, y1), (x2, y2), (0, 0, 255), 2)
        # write the class name
        img = cv2.putText(img, ind_to_class[label], (x1, y1 - 25),
                    cv2.FONT_HERSHEY_SIMPLEX, .5, (255, 255, 0), 1)
    # show the result
    cv2.imshow('imgs', img)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
    break
Output
the data length is: 16551

[Figure: output image with the bounding boxes and class names drawn]
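
The way the dataset builds its index can be tried without the real data. The sketch below, under the assumption of a tiny fake devkit created in a temporary directory (the image ids and the build_ids helper are hypothetical), collects (rootpath, image_id) pairs from ImageSets/Main and fills the '%s' path template with one of them.

```python
import os
import tempfile


def build_ids(root, image_sets):
    """Collect (rootpath, image_id) pairs from ImageSets/Main/<name>.txt."""
    ids = []
    for year, name in image_sets:
        rootpath = os.path.join(root, 'VOC' + year)
        list_file = os.path.join(rootpath, 'ImageSets', 'Main', name + '.txt')
        with open(list_file) as f:
            ids.extend((rootpath, line.strip()) for line in f if line.strip())
    return ids


# Build a tiny fake devkit (hypothetical ids) just to exercise the function.
root = tempfile.mkdtemp()
main_dir = os.path.join(root, 'VOC2007', 'ImageSets', 'Main')
os.makedirs(main_dir)
with open(os.path.join(main_dir, 'trainval.txt'), 'w') as f:
    f.write('000005\n000007\n')

ids = build_ids(root, [('2007', 'trainval')])

# Each id is a 2-tuple, so it fills both '%s' slots of the template at once.
annopath = os.path.join('%s', 'Annotations', '%s.xml')
first_xml = annopath % ids[0]
print(len(ids), first_xml)
```

Storing the root path next to each id is what lets one dataset object mix several years, since every entry knows which VOC folder it came from.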

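Data augmentation

1. Data type conversion

✔️ Before transforming an image, convert it from uint8 to np.float32, which makes the later arithmetic easier (uint8 values would wrap around or truncate).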

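import cv2
import numpy as np
from numpy import random  # the random.* calls below are numpy.random


class ConvertFromInts(object):
    """
    Converts the image from uint8 to float32
    """
    def __call__(self, image, boxes=None, labels=None):

        return image.astype(np.float32), boxes, labels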
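
To see why the conversion matters, here is a small sketch with made-up pixel values: adding to a uint8 array wraps modulo 256, while the float32 copy keeps the true value, which can then be clipped.

```python
import numpy as np

pixels = np.array([250, 100, 5], dtype=np.uint8)

# uint8 arithmetic wraps modulo 256 instead of saturating: 250 + 10 -> 4
wrapped = pixels + np.uint8(10)

# float32 keeps the true value (260), which can then be clipped to [0, 255]
as_float = pixels.astype(np.float32) + 10.0
clipped = np.clip(as_float, 0, 255)

print(wrapped.tolist())
print(clipped.tolist())
```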

2. Transform Compose

✔️ There are many augmentation methods (contrast, brightness, hue, and so on), hence many transforms; Compose() chains these transforms together.

class Compose(object):
    """
    Chains several augmentation methods together
    Args:
        transform: (list[Transform]): list of transforms
    Example:
        >>> augmentations.Compose([
        >>>     transforms.CenterCrop(10),
        >>>     transforms.ToTensor(),])
    """
    def __init__(self, transform):
        self.transform = transform

    def __call__(self, img, boxes=None, labels=None):
        for t in self.transform:
            img, boxes, labels = t(img, boxes, labels)

        return img, boxes, labels
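
The only contract Compose relies on is that every transform takes and returns the triple (img, boxes, labels). A minimal sketch with two hypothetical toy transforms (add_one and double are not part of the article's code, and the Compose below is a re-statement for the demo):

```python
class Compose(object):
    """Minimal re-statement of the Compose idea for this demo."""
    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, img, boxes=None, labels=None):
        for t in self.transforms:
            img, boxes, labels = t(img, boxes, labels)
        return img, boxes, labels


# Two hypothetical toy transforms that follow the (img, boxes, labels) contract.
def add_one(img, boxes=None, labels=None):
    return [v + 1 for v in img], boxes, labels


def double(img, boxes=None, labels=None):
    return [v * 2 for v in img], boxes, labels


pipeline = Compose([add_one, double])
img, boxes, labels = pipeline([1, 2, 3], boxes=[[0, 0, 1, 1]], labels=[7])
print(img)     # [4, 6, 8]: add_one runs first, then double
print(boxes)   # boxes and labels pass through untouched
```

Because the transforms run in list order, swapping them would give [3, 5, 7] instead; the same order sensitivity applies to the real augmentation pipeline.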

3. IoU computation

✔️ When cropping an image we need the IoU between the crop rectangle and the image's bboxes, to make sure the crop keeps a valid region.

def iou_numpy(box_a, box_b):
    '''
    Computes the IoU between several boxes and one single box;
    Args:
        box_a: multiple bounding boxes, shape [N, 4]
        box_b: the crop rectangle, a single bounding box, shape [4]
    Return:
        iou: shape [N]
    '''
    # intersection: top-left is the max of the corners, bottom-right the min
    lt = np.maximum(box_a[:, :2], box_b[:2])
    rb = np.minimum(box_a[:, 2:], box_b[2:])
    wh = np.clip((rb - lt), a_min=0, a_max=np.inf)
    inter = wh[:, 0]*wh[:, 1]

    area_a = ((box_a[:, 2] - box_a[:, 0]) * 
              (box_a[:, 3] - box_a[:, 1]))

    area_b = ((box_b[2] - box_b[0]) * 
              (box_b[3] - box_b[1]))

    # IoU = intersection / union
    union = area_a + area_b - inter
    iou = inter / union

    return iou
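
As a quick sanity check of the definition (IoU is the intersection area divided by the union area), here is a small worked example. The function name iou_one_vs_many and the box values are illustrative assumptions.

```python
import numpy as np


def iou_one_vs_many(boxes, rect):
    """IoU between each row of boxes [N, 4] and a single rect [4]."""
    lt = np.maximum(boxes[:, :2], rect[:2])      # top-left of the intersection
    rb = np.minimum(boxes[:, 2:], rect[2:])      # bottom-right of the intersection
    wh = np.clip(rb - lt, 0, None)               # no overlap -> width/height 0
    inter = wh[:, 0] * wh[:, 1]
    area_boxes = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    area_rect = (rect[2] - rect[0]) * (rect[3] - rect[1])
    return inter / (area_boxes + area_rect - inter)


boxes = np.array([[0, 0, 2, 2],    # identical to rect      -> IoU 1
                  [1, 1, 3, 3],    # overlaps a 1x1 corner  -> IoU 1/7
                  [5, 5, 6, 6]],   # disjoint               -> IoU 0
                 dtype=np.float32)
rect = np.array([0, 0, 2, 2], dtype=np.float32)

ious = iou_one_vs_many(boxes, rect)
print(ious)
```

For the second box the intersection is 1 and the union is 4 + 4 - 1 = 7, hence 1/7.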

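4. Bbox coordinate conversion

✔️ During augmentation, some steps need absolute pixel coordinates so that the bbox follows the image change, while others are simpler with normalized coordinates, for example Resize.

  • Normalized --> original image size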
class ToAbsoluteCoords(object):
    """
    Converts normalized boxes back to pixel coordinates of the image
    """
    def __call__(self, image, boxes=None, labels=None):
        h, w, c = image.shape
        boxes[:, 0] *= w
        boxes[:, 2] *= w
        boxes[:, 1] *= h
        boxes[:, 3] *= h

        return image, boxes, labels
  • Original image size --> normalized
class ToPercentCoords(object):
    """
    Normalizes pixel-coordinate boxes by the image size
    """
    def __call__(self, image, boxes=None, labels=None):

        h, w, c = image.shape

        boxes[:, 0] = boxes[:, 0] / w
        boxes[:, 2] = boxes[:, 2] / w
        boxes[:, 1] = boxes[:, 1] / h
        boxes[:, 3] = boxes[:, 3] / h

        return image, boxes, labels
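
A small numeric round trip makes the two conversions concrete. This is a sketch with made-up sizes; the helper functions to_absolute and to_percent are illustrative and, unlike the classes above, work on a copy instead of modifying the boxes in place.

```python
import numpy as np


def to_absolute(boxes, height, width):
    """Normalized [0, 1] boxes -> pixel coordinates."""
    out = boxes.copy()
    out[:, 0::2] *= width    # xmin, xmax
    out[:, 1::2] *= height   # ymin, ymax
    return out


def to_percent(boxes, height, width):
    """Pixel coordinates -> normalized [0, 1] boxes."""
    out = boxes.copy()
    out[:, 0::2] /= width
    out[:, 1::2] /= height
    return out


h, w = 200, 400
norm = np.array([[0.1, 0.2, 0.5, 0.8]], dtype=np.float32)

pixels = to_absolute(norm, h, w)       # about [40, 40, 200, 160]
back = to_percent(pixels, h, w)        # recovers the normalized box

# Normalized boxes are unaffected by a resize; only the multiplier changes.
resized = to_absolute(norm, 300, 300)  # about [30, 60, 150, 240]
print(pixels, back, resized)
```

This is why the pipeline switches to normalized coordinates right before Resize: the boxes need no update when the image is scaled.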

5. Image Resize

✔️ Input images come in different sizes, so they are resized to one common size before entering the network.

class Resize(object):
    """
    Image Resize
    """
    def __init__(self, size=300):
        self.size = size

    def __call__(self, image, boxes=None, labels=None):
        image = cv2.resize(image, (self.size, self.size))

        return image, boxes, labels

[Figure: Resize]

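6. Color space conversion

✔️ The hue and saturation changes below operate in HSV, so the color space has to be converted to HSV first.

class ConvertColor(object):
    """
    Conversion between BGR and HSV
    """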
    def __init__(self, current='BGR', transform='HSV'):
        self.current = current
        self.transform = transform

    def __call__(self, image, boxes=None, labels=None):
        # BGR TO HSV
        if self.current == 'BGR' and self.transform =='HSV':
            image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
        # HSV TO BGR
        elif self.current == 'HSV' and self.transform == 'BGR':
            image = cv2.cvtColor(image, cv2.COLOR_HSV2BGR)

        else:
            raise NotImplementedError

        return image, boxes, labels

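7. Hue change

  • A hue change is done in HSV space by modifying the H value;
  • for float32 images (IPL_DEPTH_32F), H ranges over 0-360
class RandomHue(object):
    """
    Randomly shifts the hue (for np.float32 images in HSV space, H lies in [0, 360]);
    the input image must be in HSV format;
    """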
    def __init__(self, delta=18.0):
        assert delta >= 0.0 and delta <= 360.0
        self.delta = delta

    def __call__(self, image, boxes=None, labels=None):
        if random.randint(2):
            print('hue')
            # shift the h value
            image[:, :, 0] += random.uniform(-self.delta, self.delta)

            # h lies in [0, 360], so wrap around at both ends
            image[:, :, 0][image[:, :, 0] > 360.0] -= 360.0
            image[:, :, 0][image[:, :, 0] < 0.0] += 360.0

        return image, boxes, labels

[Figure: RandomHue]
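
Hue is an angle, so a shift has to wrap around instead of being clipped. The sketch below applies a fixed, non-random delta to a few hue values to show the wrap; shift_hue is a hypothetical helper, not part of the article's code.

```python
import numpy as np


def shift_hue(hue, delta):
    """Shift hue values (degrees) by delta and wrap them back into [0, 360]."""
    out = hue.astype(np.float32) + delta
    out[out > 360.0] -= 360.0   # e.g. 370 -> 10
    out[out < 0.0] += 360.0     # e.g. -10 -> 350
    return out


hue = np.array([350.0, 10.0, 180.0], dtype=np.float32)

up = shift_hue(hue, 20.0)      # 10, 30, 200
down = shift_hue(hue, -20.0)   # 330, 350, 160
print(up.tolist(), down.tolist())
```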

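8. Saturation change

  • A saturation change is done in HSV space by modifying the S value;
  • for float32 images (IPL_DEPTH_32F), S ranges over 0-1
class RandomSaturation(object):
    """
    Random saturation change; the input image must be in HSV format
    """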
    def __init__(self, lower=0.5, upper=1.5):
        self.lower = lower
        self.upper = upper
        assert self.upper >= self.lower, "saturation upper must be >= lower."
        assert self.lower >= 0, "saturation lower must be non-negative."

    def __call__(self, image, boxes=None, labels=None):
        if random.randint(2):
            print('saturation')
            image[:, :, 1] *= random.uniform(self.lower, self.upper)

            # S lies in [0, 1], so clip to that range
            image[:, :, 1] = np.clip(image[:, :, 1], 0., 1.0)

        return image, boxes, labels

[Figure: RandomSaturation]

9. Brightness change

  • A brightness change only needs a delta added in RGB space;
  • the result must be kept within 0~255
class RandomBrightness(object):
    """
    Random brightness change;
    formula: img(x) = img(x) + b
    """
    def __init__(self, delta=32):
        assert delta >= 0.0
        assert delta <= 255.0
        self.delta = delta

    def __call__(self, image, boxes=None, labels=None):
        if random.randint(2):
            delta = random.uniform(-self.delta, self.delta)
            image += delta

            # clip the image to [0, 255.0]
            image = np.clip(image, 0, 255)

        return image, boxes, labels

[Figure: RandomBrightness]

10. Contrast change

  • A contrast change only needs a multiplication by alpha in RGB space;
  • the result must be kept within 0~255
class RandomContrast(object):
    """
    Random contrast change;
    formula: img(x) = a * img(x)
    """
    def __init__(self, lower=0.5, upper=1.5):
        self.lower = lower
        self.upper = upper
        assert self.upper >= self.lower, "contrast upper must be >= lower."
        assert self.lower >= 0, "contrast lower must be non-negative."

    # expects float image
    def __call__(self, image, boxes=None, labels=None):
        if random.randint(2):
            alpha = random.uniform(self.lower, self.upper)
            image *= alpha

            # clip the image to [0, 255.0]
            image = np.clip(image, 0, 255)

        return image, boxes, labels

[Figure: RandomContrast]
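
Brightness and contrast are the additive and multiplicative halves of the same linear map, img = alpha * img + beta, followed by a clip. The sketch below applies fixed, non-random values so the effect can be checked by hand; adjust is a hypothetical helper, whereas the article applies the two steps as separate classes.

```python
import numpy as np


def adjust(image, alpha=1.0, beta=0.0):
    """Apply img = alpha * img + beta, then clip to [0, 255]."""
    out = image.astype(np.float32) * alpha + beta
    return np.clip(out, 0, 255)


pixels = np.array([0.0, 100.0, 200.0], dtype=np.float32)

brighter = adjust(pixels, beta=32.0)          # 32, 132, 232
higher_contrast = adjust(pixels, alpha=1.5)   # 0, 150, 255 (300 is clipped)
print(brighter.tolist(), higher_contrast.tolist())
```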

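11. Color channel swap

✔️ Randomly permute the channels of the image in RGB space to imitate different lighting effects

class SwapChannels(object):
    """
    Image channel swap
    Args:
        swaps: (int triple), the channel order to apply
                eg: (2, 1, 0)
    """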

    def __init__(self, swaps):
        self.swaps = swaps

    def __call__(self, image):
        image = image[:, :, self.swaps]
        return image  

class RandomLightingNoise(object):
    """
    Color change produced by permuting the image channels
    """
    def __init__(self):
        self.perms = ((0, 1, 2), (0, 2, 1),
                      (1, 0, 2), (1, 2, 0),
                      (2, 0, 1), (2, 1, 0))

    def __call__(self, image, boxes=None, labels=None):
        if random.randint(2):
            print('RandomLightingNoise')
            swap = self.perms[random.randint(len(self.perms))]
            shuffle = SwapChannels(swap) 
            image = shuffle(image)

        return image, boxes, labels

[Figure: RandomLightingNoise]
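
The channel swap is plain NumPy indexing on the last axis. A minimal sketch on a single made-up pixel, listing the result of each of the six permutations:

```python
import numpy as np

# One pixel with channel values 10, 20, 30 (shape: height 1, width 1, 3 channels).
image = np.array([[[10, 20, 30]]], dtype=np.float32)

perms = ((0, 1, 2), (0, 2, 1),
         (1, 0, 2), (1, 2, 0),
         (2, 0, 1), (2, 1, 0))

# Indexing the last axis with a permutation reorders the channels.
swapped = [image[:, :, p][0, 0].tolist() for p in perms]
for p, values in zip(perms, swapped):
    print(p, values)
```

The permutation (2, 1, 0) is the same reordering the dataset uses to turn BGR into RGB.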

12. Image mirroring

✔️ Mirroring means flipping the image left-right, which augments the data.

class RandomMirror(object):
    """
    Randomly mirrors the image
    """
    def __call__(self, image, boxes, labels):
        w = image.shape[1]
        if random.randint(2):
            # flip the image horizontally
            image = image[:, ::-1]

            # the box coordinates must change accordingly
            boxes = boxes.copy()
            boxes[:, 0::2] = w - boxes[:, 2::-2]

        return image, boxes, labels

[Figure: original image]

[Figure: mirrored image]
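
The box update in the mirror step is compact, so here it is traced on one made-up box. After a horizontal flip the new xmin is w - xmax and the new xmax is w - xmin; the slice 2::-2 picks columns 2 and 0 in that order. mirror_boxes is a hypothetical helper for the demo.

```python
import numpy as np


def mirror_boxes(boxes, width):
    """Flip pixel-coordinate boxes horizontally inside an image of given width."""
    out = boxes.copy()
    # columns (0, 2) receive width - columns (2, 0): xmin and xmax swap roles
    out[:, 0::2] = width - boxes[:, 2::-2]
    return out


boxes = np.array([[10, 20, 30, 40]], dtype=np.float32)
flipped = mirror_boxes(boxes, 100)        # [[70, 20, 90, 40]]
restored = mirror_boxes(flipped, 100)     # flipping twice gives the original

# The image itself is flipped by reversing the width axis.
image = np.arange(6).reshape(1, 6, 1)
flipped_image = image[:, ::-1]
print(flipped.tolist(), flipped_image[0, :, 0].tolist())
```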

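13. Random image cropping

✔️ Random cropping is used heavily in image augmentation. The cropping procedure is:

  • randomly choose the size of the crop rectangle;
  • determine the coordinates of the crop rectangle from that size;
  • compute the IoU between the crop rectangle and the bounding boxes in the image;
  • discard crop rectangles whose IoU does not meet the requirement;
  • crop the image and update the bounding box coordinates.

[Figure: cropping diagram]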
class RandomSampleCrop(object):
    """
    Randomly crops the image
    """
    def __init__(self):
        self.sample_options = (
            # use the original image unchanged
            None,
            # (min_iou, max_iou)
            (0.1, None),
            (0.3, None),
            (0.7, None),
            (0.9, None),
            # randomly sample a patch
            (None, None),
        )
    def __call__(self, image, boxes=None, labels=None):
        print('crop now...')
        h, w, _ = image.shape
        while True:
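            # pick by index: np.random.choice rejects this ragged tuple on recent NumPy
            mode = self.sample_options[random.randint(len(self.sample_options))]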
            if mode is None:
                return image, boxes, labels

            min_iou, max_iou = mode
            if min_iou is None:
                min_iou = float('-inf')
            if max_iou is None:
                max_iou = float('inf')

            # try up to 50 times
            for i in range(50):
                current_image = image

                ww = random.uniform(0.3 * w, w)
                hh = random.uniform(0.3 * h, h)

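                # keep the aspect ratio within [0.5, 2]
                if hh / ww < 0.5 or hh / ww > 2:
                    continue

                # sample the top-left corner so the crop stays inside the image
                left = random.uniform(0, w - ww)
                top = random.uniform(0, h - hh)

                # the crop rectangle
                rect = np.array([int(left), int(top), int(left+ww), int(top+hh)])

                # IoU between the crop rectangle and the gt boxes
                overlap = iou_numpy(boxes, rect)

                # reject the crop if any IoU falls outside [min_iou, max_iou]
                if overlap.min() < min_iou or max_iou < overlap.max():
                    continue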

                current_image = current_image[rect[1]:rect[3], rect[0]:rect[2]]

                centers = (boxes[:, :2] + boxes[:, 2:]) / 2.0

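                # gt boxes whose center lies below and to the right of the crop's top-left corner
                m1 = (rect[0] < centers[:, 0]) * (rect[1] < centers[:, 1])

                # gt boxes whose center lies above and to the left of the crop's bottom-right corner
                m2 = (rect[2] > centers[:, 0]) * (rect[3] > centers[:, 1])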

                mask = m1 * m2

                if not mask.any():
                    continue

                current_boxes = boxes[mask, :].copy()
                current_labels = labels[mask]

                # clip each box's top-left corner to the crop rectangle (point A in the diagram)
                current_boxes[:, :2] = np.maximum(current_boxes[:, :2],
                                                  rect[:2])

                # shift the top-left corner into the crop's coordinate system
                current_boxes[:, :2] -= rect[:2]

                current_boxes[:, 2:] = np.minimum(current_boxes[:, 2:],
                                                  rect[2:])
                # shift the bottom-right corner into the crop's coordinate system
                current_boxes[:, 2:] -= rect[:2]

                return current_image, current_boxes, current_labels

[Figure: RandomSampleCrop]
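
The last steps of the crop (keep only boxes whose center falls inside the crop rectangle, clip them to the rectangle, and move them into the crop's coordinate system) can be traced on a fixed example. The rectangle, boxes and the crop_boxes helper below are illustrative assumptions; the random sampling and the IoU check are left out on purpose.

```python
import numpy as np


def crop_boxes(boxes, labels, rect):
    """Keep boxes centered inside rect, clip them, and shift to crop coordinates."""
    centers = (boxes[:, :2] + boxes[:, 2:]) / 2.0
    inside = ((rect[0] < centers[:, 0]) & (rect[1] < centers[:, 1]) &
              (rect[2] > centers[:, 0]) & (rect[3] > centers[:, 1]))
    kept = boxes[inside].copy()
    # clip to the rectangle, then subtract its top-left corner
    kept[:, :2] = np.maximum(kept[:, :2], rect[:2]) - rect[:2]
    kept[:, 2:] = np.minimum(kept[:, 2:], rect[2:]) - rect[:2]
    return kept, labels[inside]


boxes = np.array([[40, 60, 120, 140],     # center (80, 100): inside the crop
                  [160, 160, 180, 180]],  # center (170, 170): outside the crop
                 dtype=np.float32)
labels = np.array([11, 14])
rect = np.array([50, 50, 150, 150])       # xmin, ymin, xmax, ymax of the crop

kept, kept_labels = crop_boxes(boxes, labels, rect)
print(kept.tolist(), kept_labels.tolist())   # [[0.0, 10.0, 70.0, 90.0]] [11]
```

The first box sticks out of the crop on the left (xmin 40 < 50), so its xmin is clipped to the crop edge and becomes 0 in crop coordinates.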

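14. Image expansion

✔️ Pick a random size larger than the original image, fill that canvas with a given pixel value, then place the original image at a random position inside it; this expands the original image.

class Expand(object):
    """
    Randomly expands the image
    """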
    def __init__(self, mean):
        self.mean = mean

    def __call__(self ,image, boxes, labels):
        if random.randint(2):
            return image, boxes, labels
        h, w, c = image.shape
        ratio = random.uniform(1, 4)
        left = random.uniform(0, w*ratio - w)
        top = random.uniform(0, h*ratio - h)

        expand_image = np.zeros((int(h*ratio), int(w*ratio), c), 
                                dtype=image.dtype)

        # fill with the mean value
        expand_image[:,:,:] = self.mean
        # paste the original image
        expand_image[int(top):int(top+h), int(left):int(left+w)] = image

        image = expand_image

        # shift the box coordinates accordingly
        boxes = boxes.copy()
        boxes[:, :2] += (int(left), int(top))
        boxes[:, 2:] += (int(left), int(top))

        return image, boxes, labels

[Figure: Expand]
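
With the random draws replaced by fixed values, the expansion is easy to verify by hand. In the sketch below ratio, left and top are passed in as arguments (an assumption made for the demo; the class above samples them), and the image is a tiny made-up 2 x 3 array.

```python
import numpy as np


def expand(image, boxes, mean, ratio, left, top):
    """Paste image onto a larger mean-filled canvas and shift the boxes."""
    h, w, c = image.shape
    canvas = np.empty((int(h * ratio), int(w * ratio), c), dtype=image.dtype)
    canvas[:, :, :] = mean                        # fill with the mean value
    canvas[top:top + h, left:left + w] = image    # paste the original image
    shifted = boxes.copy()
    shifted[:, :2] += (left, top)                 # move the top-left corners
    shifted[:, 2:] += (left, top)                 # move the bottom-right corners
    return canvas, shifted


image = np.full((2, 3, 3), 255, dtype=np.float32)    # height 2, width 3
boxes = np.array([[0, 0, 3, 2]], dtype=np.float32)   # box covering the image

canvas, shifted = expand(image, boxes, mean=(104, 117, 123),
                         ratio=2, left=1, top=1)
print(canvas.shape)        # (4, 6, 3)
print(shifted.tolist())    # [[1.0, 1.0, 4.0, 3.0]]
```

Objects become relatively smaller on the larger canvas, which is the point of this step: after the final Resize the network sees more small targets.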

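Summary

✔️ Finally, all of the methods above are combined into one Python class for data augmentation.

class PhotometricDistort(object):
    """
    Combines the brightness, contrast, saturation and hue changes in one class
    """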
    def __init__(self):
        self.pd = [
            RandomContrast(),
            ConvertColor(transform='HSV'),
            RandomSaturation(),
            RandomHue(),
            ConvertColor(current='HSV', transform='BGR'),
            RandomContrast()
        ]
        self.rand_brightness = RandomBrightness()
        self.rand_light_noise = RandomLightingNoise()

    def __call__(self, image, boxes, labels):
        im = image.copy()
        im, boxes, labels = self.rand_brightness(im, boxes, labels)
        if random.randint(2):
            distort = Compose(self.pd[:-1])
        else:
            distort = Compose(self.pd[1:])

        im, boxes, labels = distort(im, boxes, labels)

        return self.rand_light_noise(im, boxes, labels)


# class that combines all of the image augmentation methods
class SSDAugmentation(object):
    def __init__(self, size=300, mean=(104, 117, 123)):
        self.mean = mean
        self.size = size
        self.augment = Compose([
            ConvertFromInts(),  # convert to float32
            ToAbsoluteCoords(), # to absolute pixel coordinates
            PhotometricDistort(), # photometric distortions
            Expand(self.mean),  # expansion
            RandomSampleCrop(), # cropping
            RandomMirror(), # mirroring
            ToPercentCoords(), # back to normalized coordinates
            Resize(self.size), # Resize
            ToAbsoluteCoords(), # to pixel coordinates of the resized image
            #SubtractMeans(self.mean), # subtract the mean
        ])

    def __call__(self, image, boxes, labels):
        return self.augment(image, boxes, labels)
Sample output:

[Figure: sample augmented images]