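I kept running into errors while setting up an environment to run the code from "The Annotated Transformer". I could not find a complete, working setup guide, and for many of the errors I could not find any fix at all. I did get it working in the end, so I am writing the steps down here, partly so I do not forget them and partly in the hope that they help other beginners like me.
A caveat: I am very much a beginner, so I may not be able to explain why some of these fixes work. Corrections are welcome.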
I. Environment
Windows 11
Anaconda virtual environment (Annotated_Transformer)
Jupyter Notebook
II. Creating the virtual environment
1. Open Anaconda Prompt
2. Create the virtual environment
conda create -n Annotated_Transformer python=3.8.19
3. Activate the Annotated_Transformer environment:
conda activate Annotated_Transformer
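To confirm that the activated environment is the one actually in use, a quick interpreter check can help. This is only a minimal sketch: the helper name is_expected_python is made up for illustration, and the 3.8 target simply mirrors the conda create command above.

```python
import sys


def is_expected_python(version_info, major=3, minor=8):
    """Return True if the interpreter's major.minor version matches the target."""
    return (version_info[0], version_info[1]) == (major, minor)


if __name__ == "__main__":
    # Print the running interpreter's version, e.g. "3.8.19".
    print("Python", sys.version.split()[0])
    if not is_expected_python(sys.version_info):
        print("Warning: not Python 3.8; is Annotated_Transformer activated?")
```

Running this with python inside the activated prompt should print a 3.8.x version with no warning.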
III. Installing the Python libraries
(1) Install everything listed in the requirements file
pip install -r "D:\Download\requirements.txt"
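(Change the path to wherever you saved requirements.txt.)
Note: the spacy 3.2 version specified in the original requirements.txt fails with "issubclass() arg 1 must be a class". Change it to spacy 3.2.6.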
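If you would rather not edit the file by hand, the spacy pin can be rewritten with a short script. This is a sketch under assumptions: it assumes spacy appears on its own line in requirements.txt (for example as spacy==3.2), the helper name pin_spacy is hypothetical, and the path is the example path from above.

```python
import re
from pathlib import Path


def pin_spacy(requirements_text, version="3.2.6"):
    """Replace any line that specifies spacy with an exact pin to `version`."""
    pinned = []
    for line in requirements_text.splitlines():
        # Match "spacy" alone or followed by a version specifier,
        # but not other packages such as "spacy-legacy".
        if re.match(r"^\s*spacy\s*([=<>~!].*)?$", line, flags=re.IGNORECASE):
            pinned.append(f"spacy=={version}")
        else:
            pinned.append(line)
    return "\n".join(pinned) + "\n"


if __name__ == "__main__":
    path = Path(r"D:\Download\requirements.txt")  # adjust to your own path
    if path.exists():
        text = path.read_text(encoding="utf-8")
        path.write_text(pin_spacy(text), encoding="utf-8")
```

After running it, the pip install -r command above can be used unchanged.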
(2) Jupyter integration: register the environment as a Jupyter kernel
pip install ipykernel
python -m ipykernel install --name Annotated_Transformer
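Once the kernel is registered and selected in Jupyter, it is worth checking that the notebook really runs on the environment's interpreter. The sketch below can be pasted into a notebook cell; the helper name uses_env is made up for illustration, and it only looks for a folder named after the environment in the interpreter path.

```python
import sys
from pathlib import PureWindowsPath


def uses_env(executable, env_name="Annotated_Transformer"):
    """Return True if the interpreter path contains a folder named env_name."""
    # PureWindowsPath accepts both "\\" and "/" as separators.
    return env_name in PureWindowsPath(executable).parts


if __name__ == "__main__":
    print(sys.executable)
    print("Kernel uses the environment:", uses_env(sys.executable))
```

If the second line prints False, the notebook is probably still attached to the default kernel, and the kernel should be switched from the Jupyter menu.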
(3) Install the spaCy tokenizer models
If they are not installed, the notebook fails with: AttributeError: 'tuple' object has no attribute 'tb_frame'
1. The models can be downloaded from GitHub:
https://github.com/explosion/spacy-models/releases/tag/de_core_news_sm-3.2.0
https://github.com/explosion/spacy-models/releases/tag/en_core_web_sm-3.2.0
2. Download the files and install them locally. Note: adjust the paths to match your own download location.
pip install D:\Download\spacy\de_core_news_sm-3.2.0.tar.gz
pip install D:\Download\spacy\en_core_web_sm-3.2.0.tar.gz
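A quick way to check that both models landed in the active environment is to ask Python whether the packages are importable. This is a minimal sketch using only the standard library; the helper name missing_packages is hypothetical, and the model names are the two installed above.

```python
import importlib.util

MODELS = ("de_core_news_sm", "en_core_web_sm")


def missing_packages(names):
    """Return the names that cannot be found as importable packages."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(MODELS)
    print("Missing spaCy models:", missing or "none")
```

An empty result means both model packages are visible to the interpreter that ran the check, so run it from the Annotated_Transformer environment.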
(4) Multi30k dataset error
Error: Exception: Could not get the file at http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt/training.tar.gz. [RequestException] None.
1. First attempt: edit multi30k.py (this did not work)
(Path to multi30k.py: D:\Anaconda\envs\Annotated_Transformer\Lib\site-packages\torchtext\datasets. Adjust this to match where Anaconda is installed on your machine.)
# URL = {
# "train": r"http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt/training.tar.gz",
# "valid": r"http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt/validation.tar.gz",
# "test": r"http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt/mmt16_task1_test.tar.gz",
# }
# MD5 = {
# "train": "20140d013d05dd9a72dfde46478663ba05737ce983f478f960c1123c6671be5e",
# "valid": "a7aa20e9ebd5ba5adce7909498b94410996040857154dab029851af3a866da8c",
# "test": "0681be16a532912288a91ddd573594fbdd57c0fbb81486eff7c55247e35326c2",
# }
URL = {
"train": r"https://raw.githubusercontent.com/neychev/small_DL_repo/master/datasets/Multi30k/training.tar.gz",
"valid": r"https://raw.githubusercontent.com/neychev/small_DL_repo/master/datasets/Multi30k/validation.tar.gz",
"test": r"https://raw.githubusercontent.com/neychev/small_DL_repo/master/datasets/Multi30k/mmt16_task1_test.tar.gz",
}
MD5 = {
"train": "20140d013d05dd9a72dfde46478663ba05737ce983f478f960c1123c6671be5e",
"valid": "a7aa20e9ebd5ba5adce7909498b94410996040857154dab029851af3a866da8c",
"test": "6d1ca1dba99e2c5dd54cae1226ff11c2551e6ce63527ebb072a1f70f72a5cd36",
}
Error before the change: Exception: Could not get the file at http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt/training.tar.gz. [RequestException] None.
Error after the change: Could not get the file at https://raw.githubusercontent.com/neychev/small_DL_repo/master/datasets/Multi30k/validation.tar.gz. [RequestException] None.
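Before editing multi30k.py at all, it may save time to check whether a given URL is reachable from your machine. The sketch below uses only the standard library; the helper name is_reachable is made up, and a HEAD request is an assumption about what the server accepts, so treat a False result as a hint rather than proof.

```python
import urllib.error
import urllib.request


def is_reachable(url, timeout=10):
    """Return True if a HEAD request to url succeeds, False on common errors."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # Covers DNS failures, refused connections, timeouts, HTTP errors
        # such as 404, and strings that are not valid URLs.
        return False


# Example (performs a real network request when uncommented):
# print(is_reachable("https://raw.githubusercontent.com/neychev/small_DL_repo/master/datasets/Multi30k/training.tar.gz"))
```

If every URL comes back False, the problem is likely the network connection rather than the URL itself, and using a local copy as in the next step is the more reliable route.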
2. Point Multi30k at a local copy
Download the Multi30k dataset to D:\Download\spacy\datasets (or any folder you choose).
The datasets folder contains a Multi30k subfolder, which holds:
training.tar.gz
validation.tar.gz
mmt16_task1_test.tar.gz
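Before pointing the notebook at the local folder, the three archives can be checked for presence and integrity. This is a sketch under assumptions: the hash strings are copied from the edited multi30k.py above, they are 64 hex characters long (the length of a SHA-256 digest, so SHA-256 is assumed here despite the MD5 variable name), and the helper names are hypothetical.

```python
import hashlib
from pathlib import Path

# Values copied from the edited multi30k.py above (assumed to be SHA-256).
EXPECTED = {
    "training.tar.gz": "20140d013d05dd9a72dfde46478663ba05737ce983f478f960c1123c6671be5e",
    "validation.tar.gz": "a7aa20e9ebd5ba5adce7909498b94410996040857154dab029851af3a866da8c",
    "mmt16_task1_test.tar.gz": "6d1ca1dba99e2c5dd54cae1226ff11c2551e6ce63527ebb072a1f70f72a5cd36",
}


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_files(folder, expected):
    """Map each expected file name to 'missing', 'mismatch', or 'ok'."""
    status = {}
    for name, expected_hash in expected.items():
        path = Path(folder) / name
        if not path.is_file():
            status[name] = "missing"
        elif sha256_of(path) != expected_hash:
            status[name] = "mismatch"
        else:
            status[name] = "ok"
    return status


if __name__ == "__main__":
    folder = Path(r"D:\Download\spacy\datasets") / "Multi30k"  # adjust to your path
    for name, state in check_files(folder, EXPECTED).items():
        print(name, state)
```

A "mismatch" may only mean the archive came from a different source than the one these hashes describe (the original and mirror test archives have different hashes in the file above), so it is a prompt to double-check rather than a definite failure.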
In the ipynb notebook, change the line that raises the error:
# Original line:
# train, val, test = datasets.Multi30k(language_pair=("de", "en"))
# Change it to:
train, val, test = datasets.Multi30k(root=r'D:\Download\spacy\datasets',language_pair=("de", "en"))
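The vocabulary is saved to vocab.pt once it has been built, so after that the modified line can be changed back:
# Modified line:
# train, val, test = datasets.Multi30k(root=r'D:\Download\spacy\datasets',language_pair=("de", "en"))
# Change it back to: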
train, val, test = datasets.Multi30k(language_pair=("de", "en"))
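The reason the line can be reverted is a build-once, load-afterwards caching pattern: once the result file exists, the expensive step that reads the dataset is skipped. The sketch below illustrates the pattern only. It is not the notebook's code; load_or_build and build are illustrative names, and pickle stands in for whatever serialization the notebook uses for vocab.pt.

```python
import pickle
from pathlib import Path


def load_or_build(cache_path, build):
    """Load a cached object if the file exists; otherwise build and save it."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        # Cache hit: the build step (and the dataset it needs) is skipped.
        with cache_path.open("rb") as handle:
            return pickle.load(handle)
    result = build()  # the expensive step, e.g. reading the dataset
    with cache_path.open("wb") as handle:
        pickle.dump(result, handle)
    return result
```

The flip side of this pattern is that deleting the cache file forces a rebuild, at which point the dataset has to be readable again.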
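(5) Other possible errors
If you hit errors when running the ipynb file in Jupyter, the following fix may apply.
If the error points at the line "from .autonotebook import tqdm as notebook_tqdm", install ipywidgets: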
pip install ipywidgets