1. Usage
import torch
import torch.utils.data as Data

x = torch.linspace(1, 10, 10)   # 10 data points: 1, 2, ..., 10
y = torch.linspace(10, 1, 10)   # 10 targets: 10, 9, ..., 1

torch_dataset = Data.TensorDataset(x, y)
loader = Data.DataLoader(
    dataset=torch_dataset,
    batch_size=5,      # 5 samples per batch -> 2 steps per epoch
    shuffle=True,      # reshuffle the data at every epoch
    num_workers=2,     # load data in 2 worker subprocesses (see the note below)
)

for epoch in range(3):
    for step, (batch_x, batch_y) in enumerate(loader):
        # the training step would go here
        print("epoch:{}, step:{}, batch_x:{}, batch_y:{}".format(epoch, step, batch_x, batch_y))
2. API
CLASS torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=False, sampler=None, batch_sampler=None, num_workers=0, collate_fn=None, pin_memory=False, drop_last=False, timeout=0, worker_init_fn=None, multiprocessing_context=None, generator=None)
Parameter | Description |
---|---|
dataset (Dataset) | The dataset from which to load the data. |
batch_size (int, optional) | Number of samples per batch (default: 1). Steps per epoch = total samples / batch_size. |
shuffle (bool, optional) | Set to True to reshuffle the data at every epoch (default: False; shuffling is usually preferable during training). |
sampler (Sampler or Iterable, optional) | Defines the strategy for drawing samples from the dataset. If specified, shuffle must not be set. |
num_workers (int, optional) | Number of subprocesses used for data loading. 0 means the data is loaded in the main process (default: 0). |
collate_fn (callable, optional) | Merges a list of samples into a mini-batch (see the sketch after this table). |
pin_memory (bool, optional) | If True, the data loader copies tensors into CUDA pinned memory before returning them. |
drop_last (bool, optional) | Set to True to drop the last incomplete batch if the dataset size is not divisible by batch_size. If False and the dataset size is not divisible by batch_size, the last batch will be smaller (default: False). |
timeout (numeric, optional) | If positive, the timeout for collecting a batch from workers; must be non-negative (default: 0). |
worker_init_fn (callable, optional) | If not None, called in each worker subprocess with the worker id as input, after seeding and before loading data (default: None). |
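The parameter most often left abstract is collate_fn: the DataLoader hands it a plain Python list of samples and expects a merged mini-batch back. Below is a minimal sketch, assuming a hypothetical variable-length dataset (VarLenDataset and pad_collate are illustrative names, not part of the PyTorch API); the default collate would fail on samples of unequal length, so the custom function pads each batch with torch.nn.utils.rnn.pad_sequence.

import torch
import torch.utils.data as Data
from torch.nn.utils.rnn import pad_sequence

class VarLenDataset(Data.Dataset):
    """Hypothetical dataset where each sample is a 1-D tensor of a different length."""
    def __init__(self, lengths):
        self.data = [torch.arange(n, dtype=torch.float32) for n in lengths]
    def __len__(self):
        return len(self.data)
    def __getitem__(self, idx):
        return self.data[idx]

def pad_collate(batch):
    # batch is a plain Python list of samples; merge it into one padded tensor
    lengths = torch.tensor([len(item) for item in batch])
    padded = pad_sequence(batch, batch_first=True)  # pad to the longest sample in this batch
    return padded, lengths

loader = Data.DataLoader(
    VarLenDataset([3, 5, 2, 4]),
    batch_size=2,
    collate_fn=pad_collate,  # the default collate would fail on unequal lengths
)

for padded, lengths in loader:
    print(padded.shape, lengths)

With batch_size=2 and four samples, the loop prints two batches, each padded only to the longest sequence within that batch.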
References:
https://pytorch.org/docs/stable/data.html?highlight=tensordataset#module-torch.utils.data
https://cloud.tencent.com/developer/article/1592676
https://ptorch.com/docs/1/utils-data