Custom data loading in PyTorch

Latest recommended article published on 2024-07-23 11:33:29

CCB_307

Article link: https://blog.csdn.net/qq_24306353/article/details/82292329

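Data loading in PyTorch mainly involves three classes:
1. Dataset
2. DataLoader
3. DataLoaderIter (the iterator object a DataLoader creates when you iterate over it)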
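
To see how these three classes fit together, here is a minimal sketch. It assumes in-memory toy tensors wrapped in `torch.utils.data.TensorDataset` rather than real image files, and the sizes are made up for illustration. The iterator is obtained through `iter()` instead of being named directly, since its class name has varied between PyTorch versions.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy data: 10 single-channel 28x28 "images" with integer labels.
images = torch.zeros(10, 1, 28, 28)
labels = torch.arange(10)

dataset = TensorDataset(images, labels)      # Dataset: indexable and has a length
loader = DataLoader(dataset, batch_size=4)   # DataLoader: groups samples into batches
batch_iter = iter(loader)                    # iterator object created by the DataLoader

first_images, first_labels = next(batch_iter)
print(first_images.shape)  # torch.Size([4, 1, 28, 28])
print(first_labels)        # tensor([0, 1, 2, 3])
```

In a training loop you would normally write `for images, labels in loader:` and let Python create the iterator for you.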

Defining your own Dataset class

Define your own Dataset class (subclassing torch.utils.data.Dataset) and implement two methods:
1. __getitem__()
2. __len__()

import numpy as np
import torch
from PIL import Image
from torch.utils import data


class MyDataset(data.Dataset):
    # Read the txt files that list the image paths and the labels, one entry per line
    def __init__(self, imagelistfile, labellistfile):
        with open(imagelistfile) as fb:
            self.imagelistpaths = fb.readlines()
        with open(labellistfile) as fb:
            self.labellistpaths = fb.readlines()

    # Load the image and the label at the given index
    def __getitem__(self, index):
        img_path = self.imagelistpaths[index]
        label = self.labellistpaths[index]
        img = Image.open(img_path.strip())  # read the image with PIL
        # Image.open returns a PIL image, not an array, so convert it before reshaping
        img = np.asarray(img, dtype=np.uint8)
        # Reshape img from [height, width] to [channel, height, width] to match the
        # network input (this assumes 28x28 single-channel images)
        img = np.reshape(img, (1, 28, 28))
        img = torch.tensor(img)
        label = torch.tensor(int(label))  # convert the label to a tensor
        return img, label  # return tensors so the DataLoader can stack them into batches

    def __len__(self):
        return len(self.imagelistpaths)
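
MyDataset needs image files and list files on disk, so a quick way to check the `__getitem__` / `__len__` contract is an in-memory stand-in. The sketch below is only an illustration: `ArrayDataset`, the random images, and the labels are hypothetical and not part of the original post. It follows the same steps (convert to a uint8 array, add the channel dimension, return tensors).

```python
import numpy as np
import torch
from torch.utils import data


class ArrayDataset(data.Dataset):
    """Hypothetical in-memory stand-in for MyDataset (for illustration only)."""

    def __init__(self, images, labels):
        self.images = images   # sequence of [height, width] uint8 arrays
        self.labels = labels   # sequence of integer class ids

    def __getitem__(self, index):
        img = np.asarray(self.images[index], dtype=np.uint8)
        img = img[np.newaxis, ...]   # [height, width] -> [1, height, width]
        return torch.tensor(img), torch.tensor(int(self.labels[index]))

    def __len__(self):
        return len(self.images)


# Made-up data: five random 28x28 grayscale images with labels 0..4.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)
fake_labels = [0, 1, 2, 3, 4]

dataset = ArrayDataset(fake_images, fake_labels)
img, label = dataset[2]
print(len(dataset))          # 5
print(img.shape, img.dtype)  # torch.Size([1, 28, 28]) torch.uint8
print(label)                 # tensor(2)
```

Indexing with `dataset[2]` calls `__getitem__(2)`, and `len(dataset)` calls `__len__()`; these are the two hooks a DataLoader relies on.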

The DataLoader class

class torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=False, sampler=None, batch_sampler=None, num_workers=0, collate_fn=<function default_collate>, pin_memory=False, drop_last=False)
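
As a rough illustration of a few of these parameters, the sketch below compares `drop_last=False` and `drop_last=True` and checks what `shuffle=True` does. The toy `TensorDataset` and its sizes are assumptions chosen so that the batch size does not divide the dataset evenly; `num_workers` is left at its default of 0, so loading happens in the main process.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy data: 10 samples, so batch_size=4 leaves a remainder of 2.
dataset = TensorDataset(torch.zeros(10, 1, 28, 28), torch.arange(10))

keep_last = DataLoader(dataset, batch_size=4, shuffle=False, drop_last=False)
drop_last = DataLoader(dataset, batch_size=4, shuffle=False, drop_last=True)

print(len(keep_last))  # 3 batches: 4 + 4 + 2
print(len(drop_last))  # 2 batches: the incomplete one is dropped
print([labels.tolist() for _, labels in keep_last])
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]

# shuffle=True reorders the samples each epoch but still visits every one once.
shuffled = DataLoader(dataset, batch_size=4, shuffle=True)
seen = sorted(label for _, labels in shuffled for label in labels.tolist())
print(seen == list(range(10)))  # True
```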