Pascal VOCdata数据集读取（pytorch)

最新推荐文章于 2024-04-29 22:01:40 发布

骚年，你渴望力量嘛？

最新推荐文章于 2024-04-29 22:01:40 发布

阅读量3.2k

点赞数 4

分类专栏：神经网络与深度学习理论基础文章标签： python 深度学习列表

本文链接：https://blog.csdn.net/weixin_45199303/article/details/109046457

版权

神经网络与深度学习理论基础专栏收录该内容

8 篇文章 0 订阅

订阅专栏

本文介绍了如何使用PyTorch加载PASCAL VOC数据集，包括数据集的目录结构、读取数据的方式，以及如何显示带有标签的图片。在读取数据集中，通过`voc_trainset`获取样本，展示图片尺寸、模式，并解析目标信息，特别是对物体边界框的显示。最后，代码遍历数据集并显示每个图片中的物体及其边界框。

摘要由CSDN通过智能技术生成

数据集介绍

数据集读取

加载带有标签数据的图片

数据集介绍

数据集下载链接

下载完成后的数据集如图所示：

文件夹结构
└── VOCdevkit     #根目录
    └── VOC2007 #不同年份的数据集，这里只下载了2007的
        ├── Annotations        #存放标签文件，与JPEGImages中的图片一一对应
        ├── ImageSets          #该目录下存放的都是txt文件，txt文件中每一行包含一个图片的名称
        ├── JPEGImages         #存放源图片

数据集读取

使用如下语句对train数据进行读取，返回的是一个元组，包含了标签(target)，图片（img)等信息。

voc_trainset = datasets.VOCDetection('H:/pytorch_exercise/data', year='2007', image_set='train', download=False)

输出voc_trainset的样例个数：

print('Number of samples: ', len(voc_trainset))

获得图片信息和标签信息：返回target为dict字典

img, target = voc_trainset[0]  # 获取第一个sample

在此可以对img类和target字典中的信息进行读取

img的值为：

<PIL.Image.Image image mode=RGB size=353x500 at 0x1FF63FEC828>

读取图片的大小和编码格式

print("picture size：", img.size)
print("picture mode：", img.mode)

target的值为：

{'annotation': {'folder': 'VOC2007', 'filename': '000001.jpg', 'source': {'database': 'The VOC2007 Database', 'annotation': 'PASCAL VOC2007', 'image': 'flickr', 'flickrid': '341012865'}, 'owner': {'flickrid': 'Fried Camels', 'name': 'Jinky the Fruit Bat'}, 'size': {'width': '353', 'height': '500', 'depth': '3'}, 'segmented': '0', 'object': [{'name': 'dog', 'pose': 'Left', 'truncated': '1', 'difficult': '0', 'bndbox': {'xmin': '48', 'ymin': '240', 'xmax': '195', 'ymax': '371'}}, {'name': 'person', 'pose': 'Left', 'truncated': '1', 'difficult': '0', 'bndbox': {'xmin': '8', 'ymin': '12', 'xmax': '352', 'ymax': '498'}}]}}

target为字典，可以按照python字典的读取方式进行读取，下面举例：读取第一张图片的第二个物体的名称，返回的是person

print(target["annotation"]["object"][1]["name"])

加载带有标签数据的图片

import torchvision.datasets as datasets
import torchvision
import torch
import numpy as np
import cv2

voc_trainset = datasets.VOCDetection('H:/pytorch_exercise/data', year='2007', image_set='train', download=False)#加载数据集

print('Number of samples: ', len(voc_trainset))


# img, target = voc_trainset[0]  # 获取第一个sample
# print("picture size：", img.size)
# print("picture mode：", img.mode)
# print(target["annotation"]["object"][1]["name"])

# 显示矩形框
def show_object_rect(image: np.ndarray, bndbox):
    pt1 = bndbox[:2]
    pt2 = bndbox[2:]
    image_show = image
    return cv2.rectangle(image_show, pt1, pt2, (0, 255, 255), 2)


# 显示物体名字
def show_object_name(image: np.ndarray, name: str, p_tl):
    return cv2.putText(image, name, p_tl, 1, 1, (255, 0, 0))


# 从第一张图片遍历所有图片
for i, sample in enumerate(voc_trainset, 1):
    image, annotation = sample[0], sample[1]['annotation']
    objects = annotation['object']
    show_image = np.array(image)
    print('{} object:{}'.format(i, len(objects)))  # format函数对两个字符进行拼接
    # isinstance 用来判断一个对象的变量类型
    # 一个object中只有一个物体时，使用以下语句
    if not isinstance(objects, list):
        object_name = objects['name']
        object_bndbox = objects['bndbox']
        x_min = int(object_bndbox['xmin'])
        y_min = int(object_bndbox['ymin'])
        x_max = int(object_bndbox['xmax'])
        y_max = int(object_bndbox['ymax'])
        show_image = show_object_rect(show_image, (x_min, y_min, x_max, y_max))
        show_image = show_object_name(show_image, object_name, (x_min, y_min))
    # 否则object如果有多个变量，使用以下语句
    else:
        for j in objects:
            object_name = j['name']
            object_bndbox = j['bndbox']
            x_min = int(object_bndbox['xmin'])
            y_min = int(object_bndbox['ymin'])
            x_max = int(object_bndbox['xmax'])
            y_max = int(object_bndbox['ymax'])
            show_image = show_object_rect(show_image, (x_min, y_min, x_max, y_max))
            show_image = show_object_name(show_image, object_name, (x_min, y_min))
    show_image = cv2.cvtColor(show_image, cv2.COLOR_BGR2RGB)
    cv2.imshow('image', show_image)
    cv2.waitKey(0)

print(voc_trainset)
print('Down load ok')

一个结果如下图

BUG&&思考：

1.本地VOCdata数据集加载过程中，出现RuntimeError: Dataset not found or corrupted. You can use download=True to download it 。大多是因为加载路径没设置好，加载路径要设置为包含VOCdevkit文件夹的路径，而不是包含VOC2007文件夹的路径

2.程序中if not isinstance(objects, list)的意思是把objects类的类型与list列表类型进行对比，即判断objects是否为列表。为什么要判断呢？当objects中有多个物体时候，则需要进行循环，读取多个物体的名称，当只有单个物体时候，只需要执行一次就行。