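PyTorch ships with a ready-made dataset class, torchvision.datasets.ImageFolder, for reading image datasets. Put the images of each class into their own subdirectory under the dataset root; the subdirectory name (not the image file name) is used as the label. For the two classes used in this post, the root contains the subdirectories overground/ and underground/.
Reading the data: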
import torch
from torchvision import datasets

train_datasets = datasets.ImageFolder(train_path, transform=train_transforms)
train_data_size = len(train_datasets)  # number of images found under train_path
train_data = torch.utils.data.DataLoader(train_datasets, batch_size=batch_size, shuffle=True)
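When the data is read, ImageFolder assigns each class folder an integer index (the folder names are sorted and numbered from 0), because the model needs numeric labels. While using it I ran into a question: how do you know which index corresponds to which folder? The following code shows the mapping: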
train_directory = config.TRAIN_DATASET_DIR
train_datasets = datasets.ImageFolder(train_directory)
print('data_size:', len(train_datasets))
print('data_class:', train_datasets.classes)
print('class_id:', train_datasets.class_to_idx)
out:
data_class: ['overground', 'underground']
class_id: {'overground': 0, 'underground': 1}
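To go the other way, from an index predicted by the model back to a folder name, the mapping can simply be inverted. The snippet below is a minimal sketch: the dictionary is copied from the output above, and idx_to_class and predicted are hypothetical names used only for illustration.

```python
# Mapping copied from the output above; with a real dataset this would be
# train_datasets.class_to_idx.
class_to_idx = {'overground': 0, 'underground': 1}

# Invert it so that an integer label can be turned back into a folder name.
idx_to_class = {idx: name for name, idx in class_to_idx.items()}

# Hypothetical class indices, standing in for model predictions.
predicted = [1, 0, 1]
predicted_names = [idx_to_class[i] for i in predicted]
print(predicted_names)
```

Since train_datasets.classes is a list ordered by index, train_datasets.classes[i] gives the same lookup without building a second dictionary.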
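You can also set a breakpoint and inspect the dataset object in a debugger to see the label assigned to each image; the (image path, class index) pairs are stored in train_datasets.samples.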
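If no debugger is at hand, the per-image labels can also be reasoned about from the directory layout alone. The sketch below is a rough, standard-library-only imitation of the labelling rule (sorted subdirectory names numbered from 0), not torchvision's actual implementation; list_samples and the file names a.png, b.png, c.png are hypothetical, and it skips the image-extension filtering that ImageFolder performs.

```python
import os
import tempfile


def list_samples(root):
    """Return (classes, class_to_idx, samples) for an ImageFolder-style layout."""
    # Class names are the immediate subdirectories of root, in sorted order.
    classes = sorted(e.name for e in os.scandir(root) if e.is_dir())
    class_to_idx = {name: i for i, name in enumerate(classes)}
    samples = []
    for name in classes:
        class_dir = os.path.join(root, name)
        # Every file in a class folder gets that folder's index as its label.
        for fname in sorted(os.listdir(class_dir)):
            samples.append((os.path.join(class_dir, fname), class_to_idx[name]))
    return classes, class_to_idx, samples


# Build a throwaway dataset root with empty placeholder files.
root = tempfile.mkdtemp()
layout = {'underground': ['a.png'], 'overground': ['b.png', 'c.png']}
for name, files in layout.items():
    os.makedirs(os.path.join(root, name))
    for fname in files:
        open(os.path.join(root, name, fname), 'w').close()

classes, class_to_idx, samples = list_samples(root)
for path, label in samples:
    print(os.path.relpath(path, root), '->', label, classes[label])
```

With a real dataset, looping over train_datasets.samples and looking up train_datasets.classes[label] yields the same kind of listing.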