I was working through the official PyTorch torchtext tutorial, LANGUAGE TRANSLATION WITH TORCHTEXT, on Google Colab.
Running the following code raised an error:
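from torchtext.data import Field, BucketIterator
from torchtext.datasets import Multi30k  # required by Multi30k.splits below

# Source field: German text tokenized with spaCy
SRC = Field(tokenize = "spacy",
            tokenizer_language="de",
            init_token = '<sos>',
            eos_token = '<eos>',
            lower = True)
# Target field: English text tokenized with spaCy
TRG = Field(tokenize = "spacy",
            tokenizer_language="en",
            init_token = '<sos>',
            eos_token = '<eos>',
            lower = True)
# Download Multi30k and split it into train/validation/test sets
train_data, valid_data, test_data = Multi30k.splits(exts = ('.de', '.en'),
                                                    fields = (SRC, TRG))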
TypeError: __init__() got an unexpected keyword argument 'tokenizer_language'
However, the official torchtext 0.6.0 documentation clearly lists this parameter in the class definition:
CLASS torchtext.data.Field(sequential=True,
                           use_vocab=True,
                           init_token=None,
                           eos_token=None,
                           fix_length=None,
                           dtype=torch.int64,
                           preprocessing=None,
                           postprocessing=None,
                           lower=False,
                           tokenize=None,
                           tokenizer_language='en',
                           include_lengths=False,
                           batch_first=False,
                           pad_token='<pad>',
                           unk_token='<unk>',
                           pad_first=False,
                           truncate_first=False,
                           stop_words=None,
                           is_target=False)
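Documentation for one release does not necessarily describe the package that is actually installed, so it can help to ask the installed class directly which keyword arguments it accepts. The following is a minimal sketch using the standard-library inspect module. The helper name accepts_keyword and the two stand-in classes OldField and NewField are hypothetical and exist only for illustration; they are not part of torchtext.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if `func` can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    keyword_kinds = (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                     inspect.Parameter.KEYWORD_ONLY)
    if name in params and params[name].kind in keyword_kinds:
        return True
    # A **kwargs parameter accepts any keyword name.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical stand-ins for an older and a newer class signature.
class OldField:
    def __init__(self, tokenize=None, lower=False):
        pass


class NewField:
    def __init__(self, tokenize=None, tokenizer_language='en', lower=False):
        pass


print(accepts_keyword(OldField, "tokenizer_language"))
print(accepts_keyword(NewField, "tokenizer_language"))
```

In the notebook, the same helper could be called as accepts_keyword(Field, "tokenizer_language") after importing Field, assuming the import itself succeeds.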
So I checked which torchtext version Colab had installed:
!pip list
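As an alternative to scanning the full pip list output, the version of a single package can be read from Python. This is a minimal sketch that assumes Python 3.8 or later, where importlib.metadata is in the standard library; the helper name installed_version is hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string, or None when torchtext is not installed.
print("torchtext:", installed_version("torchtext"))
```

Returning None instead of raising keeps the check usable in a notebook cell even when the package is missing.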
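The installed torchtext turned out to be version 0.3.0, which explains why Field did not accept tokenizer_language. I restarted the Jupyter notebook (without a restart, the notebook reports that torchtext is already loaded and its version cannot be updated) and installed a newer version: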
!pip install torchtext==0.4.0
That solved the problem, and the code ran successfully:
downloading training.tar.gz
training.tar.gz: 100%|██████████| 1.21M/1.21M [00:02<00:00, 598kB/s]
downloading validation.tar.gz
validation.tar.gz: 100%|██████████| 46.3k/46.3k [00:00<00:00, 167kB/s]
downloading mmt_task1_test2016.tar.gz
mmt_task1_test2016.tar.gz: 100%|██████████| 66.2k/66.2k [00:00<00:00, 162kB/s]