The standard pattern for writing PyTorch Dataset and DataLoader classes

Notes on the Dataset class

An abstract class representing a :class:`Dataset`.

All datasets that represent a map from keys to data samples should subclass
it. All subclasses should overwrite :meth:`__getitem__`, supporting fetching a
data sample for a given key. Subclasses could also optionally overwrite
:meth:`__len__`, which is expected to return the size of the dataset by many
:class:`~torch.utils.data.Sampler` implementations and the default options
of :class:`~torch.utils.data.DataLoader`.

.. note::
  :class:`~torch.utils.data.DataLoader` by default constructs an index
  sampler that yields integral indices.  To make it work with a map-style
  dataset with non-integral indices/keys, a custom sampler must be provided.
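The note above can be made concrete: if a map-style dataset is keyed by strings instead of integers, the DataLoader needs a custom sampler that yields those keys. A minimal sketch (the class names DictDataset and KeySampler are hypothetical, not part of PyTorch):

```python
import torch
from torch.utils.data import Dataset, DataLoader, Sampler

class DictDataset(Dataset):
    """Map-style dataset whose keys are strings, not integers."""
    def __init__(self, mapping):
        self.mapping = mapping

    def __getitem__(self, key):
        return self.mapping[key]

    def __len__(self):
        return len(self.mapping)

class KeySampler(Sampler):
    """Yields the dataset's string keys instead of integral indices."""
    def __init__(self, keys):
        self.keys = list(keys)

    def __iter__(self):
        return iter(self.keys)

    def __len__(self):
        return len(self.keys)

data = {"a": torch.tensor(1.0), "b": torch.tensor(2.0)}
ds = DictDataset(data)
# without KeySampler, the default sampler would yield 0, 1, ... and fail
loader = DataLoader(ds, sampler=KeySampler(data.keys()), batch_size=1)
print([batch.item() for batch in loader])  # [1.0, 2.0]
```
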

This is from the official PyTorch documentation. It means that Dataset is an abstract class: every custom dataset must subclass it before it can be used.

When writing your own Dataset class you must override the __getitem__ method, and you should also override __len__ (samplers and the default DataLoader options rely on it).

__getitem__: returns the data sample at a given index

__len__: returns the size of the dataset

import os
from torch.utils.data import Dataset
from PIL import Image

class MyData(Dataset):

    def __init__(self, root_dir, label_dir):
        self.root_dir = root_dir
        self.label_dir = label_dir
        self.path = os.path.join(self.root_dir, self.label_dir)
        # list of image file names under root_dir/label_dir
        self.img_path = os.listdir(self.path)

    def __getitem__(self, index):
        image_name = self.img_path[index]
        image_item_path = os.path.join(self.root_dir, self.label_dir, image_name)
        img = Image.open(image_item_path)
        # the folder name doubles as the class label ("ants" or "bees")
        label = self.label_dir
        return img, label

    def __len__(self):
        return len(self.img_path)




root_dir = r"../Pytorch学习/dataset/train"
ants_label_dir = "ants"
bees_label_dir = "bees"

ants_data = MyData(root_dir=root_dir, label_dir=ants_label_dir)
print(len(ants_data))
bees_data = MyData(root_dir=root_dir, label_dir=bees_label_dir)
print(len(bees_data))

# Dataset overloads the + operator, so the training set becomes the concatenation of the two smaller datasets
train_data = ants_data + bees_data
print(len(train_data))
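The + works because the Dataset base class implements __add__, which returns a torch.utils.data.ConcatDataset wrapping both datasets. A self-contained sketch using tensor datasets instead of the image folders above:

```python
import torch
from torch.utils.data import TensorDataset, ConcatDataset

# two small datasets: 3 samples and 5 samples
a = TensorDataset(torch.zeros(3, 2))
b = TensorDataset(torch.ones(5, 2))

# Dataset.__add__ wraps the two into a ConcatDataset
combined = a + b
print(type(combined).__name__)  # ConcatDataset
print(len(combined))            # 8
```

Indexing combined with 0..2 reaches samples of a, and 3..7 reaches samples of b.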

The DataLoader class is the other key piece when writing a neural network. In every epoch you iterate over the whole dataset; the loader slices the dataset into batches of batch_size samples, and within each epoch every batch goes through the same template:

start training --> zero the gradients --> compute predictions --> compute the loss --> backpropagate --> update the parameters --> next iteration

In pseudocode:

for epoch in range(epochs):
	# track the loss, accuracy, and other metrics
	for batch in data_loader:
		# compute predictions, compute the loss, backpropagate, step the optimizer
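The pseudocode above can be fleshed out into a runnable loop. This is a minimal sketch with a toy linear-regression model and random data; the model, data shapes, and hyperparameters are placeholders, not part of the original example:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# toy data: 64 samples with 10 features, and a regression target
X = torch.randn(64, 10)
y = torch.randn(64, 1)
loader = DataLoader(TensorDataset(X, y), batch_size=16, shuffle=True)

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

for epoch in range(2):
    for xb, yb in loader:
        optimizer.zero_grad()     # zero the gradients
        pred = model(xb)          # compute predictions
        loss = loss_fn(pred, yb)  # compute the loss
        loss.backward()           # backpropagate
        optimizer.step()          # update the parameters
```
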
from torch.utils.tensorboard import SummaryWriter
from torchvision.datasets import CIFAR10
import torchvision
from torch.utils.data import DataLoader

test_data = CIFAR10("../data", train=False, transform=torchvision.transforms.ToTensor(), download=True)

# batch_size = 4: each batch holds 4 samples
# shuffle = True: the sample order is reshuffled at the start of every epoch
test_loader = DataLoader(dataset=test_data, batch_size=4, shuffle=True, num_workers=0, drop_last=False)


writer = SummaryWriter("dataloader")
step = 0
for data in test_loader:
    imgs, targets = data
    writer.add_images("test_data", imgs, global_step=step)
    step += 1

writer.close()
# in the terminal, run: tensorboard --logdir=dataloader  to view the images in TensorBoard
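To see what batch_size and drop_last actually do without downloading CIFAR10, here is a small sketch over random tensors shaped like CIFAR10 samples (3 channels, 32x32):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 10 fake images shaped like CIFAR10 samples, plus integer targets
ds = TensorDataset(torch.randn(10, 3, 32, 32), torch.arange(10))

# drop_last=False keeps the final, smaller batch (10 = 4 + 4 + 2)
loader = DataLoader(ds, batch_size=4, shuffle=False, drop_last=False)
print([imgs.shape[0] for imgs, targets in loader])  # [4, 4, 2]

# drop_last=True discards the incomplete final batch
loader_drop = DataLoader(ds, batch_size=4, shuffle=False, drop_last=True)
print([imgs.shape[0] for imgs, targets in loader_drop])  # [4, 4]
```
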