pytorch提供了一个数据读取的方法,其由两个类构成:
(1)torch.utils.data.Dataset
(2)torch.utils.data.DataLoader
自定义类的说明
在定义自己的数据类时,需要继承torch.utils.data.Dataset,并且至少要重载两个方法：__len__ 和 __getitem__,其中
(1)__len__返回的是数据集的大小
(2)__getitem__实现索引数据集中的某一个数据
# 一个自定义类示例
import os

import numpy as np
import pandas as pd
import torch
from torch.utils.data import DataLoader, Dataset

# NOTE(review): `io.imread` in MyDataset.__getitem__ comes from scikit-image
# (`from skimage import io`) — a third-party dependency; confirm it is
# installed and imported before running this example.
class MyDataset(Dataset):  # subclasses torch.utils.data.Dataset
    """Face-landmarks dataset backed by a CSV annotation file.

    Each CSV row holds an image file name (column 0) followed by the
    landmark coordinates in the remaining columns.
    """

    def __init__(self, csv_file, root_dir, transform=None):
        """
        Args:
            csv_file (string): Path to the csv file with annotations.
            root_dir (string): Directory with all the images.
            transform (callable, optional): Optional transform to be applied
                on a sample.
        """
        self.landmarks_frame = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.transform = transform

    def __getitem__(self, idx):
        """Load and return the sample at position ``idx``.

        Samples are read lazily, one at a time, so the whole dataset
        never has to be resident in memory at once.
        """
        if torch.is_tensor(idx):
            idx = idx.tolist()
        # Image path = root dir joined with the file name in column 0.
        image = io.imread(
            os.path.join(self.root_dir, self.landmarks_frame.iloc[idx, 0]))
        # Columns 1..N are the annotation coordinates; cast to float and
        # reshape into (num_landmarks, 2) pairs.
        landmarks = (np.array([self.landmarks_frame.iloc[idx, 1:]])
                     .astype('float')
                     .reshape(-1, 2))
        sample = {'image': image, 'landmarks': landmarks}
        if self.transform:
            sample = self.transform(sample)
        return sample

    def __len__(self):
        # Dataset size == number of annotation rows in the CSV.
        return len(self.landmarks_frame)
# Demo: iterate over the whole dataset and print each sample's shapes.
# Fix: the class defined above is named MyDataset — the original code
# instantiated an undefined name (FaceLandmarksDataset), which raises
# NameError at runtime.
face_dataset = MyDataset(csv_file='data/faces/face_landmarks.csv', root_dir='data/faces/')
for i in range(len(face_dataset)):
    sample = face_dataset[i]
    print(i, sample['image'].shape, sample['landmarks'].shape)