Overview
Every built-in dataset inherits from torchtext.data.Dataset, which in turn inherits from torch.utils.data.Dataset; each dataset class provides the `splits` and `iters` classmethods for loading data.
Approach 1: the splits classmethod
# imports assumed by the snippets below (legacy torchtext API)
from torchtext import data, datasets
from torchtext.vocab import GloVe

# set up fields
TEXT = data.Field(lower=True, include_lengths=True, batch_first=True)
LABEL = data.Field(sequential=False)
# make splits for data
train, test = datasets.IMDB.splits(TEXT, LABEL)
# build the vocabulary, attaching pretrained 300-d GloVe vectors
TEXT.build_vocab(train, vectors=GloVe(name='6B', dim=300))
LABEL.build_vocab(train)
# make iterators for the splits; device=0 targets GPU 0 (use device='cpu' on a CPU-only machine)
train_iter, test_iter = data.BucketIterator.splits(
    (train, test), batch_size=3, device=0)
Approach 2: the iters classmethod
# use default configurations: download the data, build the vocab,
# and return ready-made iterators in a single call
train_iter, test_iter = datasets.IMDB.iters(batch_size=4)
The WikiText-2 dataset
class torchtext.datasets.WikiText2(path, text_field, newline_eos=True, encoding='utf-8', **kwargs)