Environment Setup for The Annotated Transformer

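While setting up an environment to run the code from The Annotated Transformer, I kept running into errors. I could not find a complete, working setup guide, and for many of the errors I could not find any fix at all. I did eventually get everything working, so I am writing the steps down here, partly so that I don't forget them and partly in the hope that they save other beginners some trouble.

Disclaimer: I am very much a beginner myself, so I may not be able to explain why some of these steps work. Corrections are welcome.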

I. Environment

Windows 11

Anaconda virtual environment (Annotated_Transformer)

Jupyter Notebook

II. Creating the virtual environment

1. Open Anaconda Prompt

2. Create the virtual environment

conda create -n Annotated_Transformer python=3.8.19

3. Activate the Annotated_Transformer environment:

conda activate Annotated_Transformer
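Once the environment is active, it can be worth confirming that the interpreter in use really is the Python 3.8 one that was just created. This is easy to get wrong later, when the notebook may run on a different kernel. The following is a minimal sketch; the helper name matches_expected_python is made up for illustration.

```python
import sys


def matches_expected_python(version_info=sys.version_info, expected=(3, 8)):
    """Return True when the major.minor version equals the expected one."""
    return tuple(version_info[:2]) == tuple(expected)


# sys.executable shows which interpreter is running; inside the activated
# environment it should point into the Annotated_Transformer folder.
print("Interpreter:", sys.executable)
print("Python version:", sys.version.split()[0])
print("Matches 3.8:", matches_expected_python())
```

The same lines can be pasted into a notebook cell later to check which environment the Jupyter kernel is actually using.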

III. Installing the Python libraries

(1) Install everything listed in requirements.txt

pip install -r "D:\Download\requirements.txt"

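(Change the path to wherever you saved requirements.txt.)

Note: the spacy 3.2 version specified in the original requirements.txt fails with the error "issubclass() arg 1 must be a class"; change it to spacy 3.2.6.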
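If you would rather not edit the file by hand, the version change can be scripted. The sketch below assumes the requirement is written in the usual pip form (a line beginning with spacy, such as spacy==3.2). The function name pin_spacy and the sample text are made up for illustration and are not the contents of the real file.

```python
import re


def pin_spacy(requirements_text, version="3.2.6"):
    """Return the requirements text with the spacy line replaced by spacy==<version>."""
    # Match a whole line that starts with "spacy", but skip other packages whose
    # names merely begin with it (for example spacy-legacy).
    pattern = re.compile(r"^spacy(?![\w-])[^\n]*$", flags=re.MULTILINE | re.IGNORECASE)
    return pattern.sub(f"spacy=={version}", requirements_text)


# Hypothetical sample, not the real requirements.txt.
sample = "some-package==1.0\nspacy==3.2\nspacy-legacy==3.0\n"
print(pin_spacy(sample))
```

To apply it to the real file, read the text with pathlib.Path(...).read_text(), pass it through pin_spacy, and write the result back with write_text(), keeping a backup copy of the original first.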

(2) Make the environment available in Jupyter

pip install ipykernel
python -m ipykernel install --name Annotated_Transformer

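(3) Install the spaCy tokenizer models

If the models are not installed, the notebook fails with: AttributeError: 'tuple' object has no attribute 'tb_frame'

1. Download the model archives from GitHub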

https://github.com/explosion/spacy-models/releases/tag/de_core_news_sm-3.2.0

https://github.com/explosion/spacy-models/releases/tag/en_core_web_sm-3.2.0

2. Install the downloaded files locally. Note: change the paths to match where you saved them

pip install D:\Download\spacy\de_core_news_sm-3.2.0.tar.gz
pip install D:\Download\spacy\en_core_web_sm-3.2.0.tar.gz
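After installing the two archives, a quick check that both model packages are visible to the interpreter can save a confusing traceback later. The sketch below uses only the standard library; the helper name missing_packages is made up for illustration. It only confirms that the packages are installed, not that spaCy can load them.

```python
import importlib.util


def missing_packages(names):
    """Return the names that cannot be found as importable top-level packages."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The two package names correspond to the model archives installed above.
missing = missing_packages(["de_core_news_sm", "en_core_web_sm"])
print("Missing spaCy models:", missing or "none")
```

If both names come back as found, loading them in the notebook with spacy.load("de_core_news_sm") and spacy.load("en_core_web_sm") is the real test.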

(4) Multi30k dataset error

Error: Exception: Could not get the file at http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt/training.tar.gz. [RequestException] None.

1. Attempted fix: edit multi30k.py (this did not work)

(multi30k.py is located in D:\Anaconda\envs\Annotated_Transformer\Lib\site-packages\torchtext\datasets; adjust this to match where Anaconda is installed on your machine)

# URL = {
#     "train": r"http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt/training.tar.gz",
#     "valid": r"http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt/validation.tar.gz",
#     "test": r"http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt/mmt16_task1_test.tar.gz",
# }

# MD5 = {
#     "train": "20140d013d05dd9a72dfde46478663ba05737ce983f478f960c1123c6671be5e",
#     "valid": "a7aa20e9ebd5ba5adce7909498b94410996040857154dab029851af3a866da8c",
#     "test": "0681be16a532912288a91ddd573594fbdd57c0fbb81486eff7c55247e35326c2",
# }
URL = {
    "train": r"https://raw.githubusercontent.com/neychev/small_DL_repo/master/datasets/Multi30k/training.tar.gz",
    "valid": r"https://raw.githubusercontent.com/neychev/small_DL_repo/master/datasets/Multi30k/validation.tar.gz",
    "test": r"https://raw.githubusercontent.com/neychev/small_DL_repo/master/datasets/Multi30k/mmt16_task1_test.tar.gz",
}

MD5 = {
    "train": "20140d013d05dd9a72dfde46478663ba05737ce983f478f960c1123c6671be5e",
    "valid": "a7aa20e9ebd5ba5adce7909498b94410996040857154dab029851af3a866da8c",
    "test": "6d1ca1dba99e2c5dd54cae1226ff11c2551e6ce63527ebb072a1f70f72a5cd36",
}

Error before the change: Exception: Could not get the file at http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt/training.tar.gz. [RequestException] None.
Error after the change: Could not get the file at https://raw.githubusercontent.com/neychev/small_DL_repo/master/datasets/Multi30k/validation.tar.gz. [RequestException] None.

2. Use a local copy of Multi30k

Download the Multi30k dataset to a local folder, in my case D:\Download\spacy\datasets (use your own download path).
The datasets folder contains a Multi30k subfolder, which holds:
training.tar.gz
validation.tar.gz
mmt16_task1_test.tar.gz
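Before pointing the notebook at the local folder, it may help to confirm that the layout matches what is described above: three archives inside a Multi30k subfolder of the root. The following is a small sketch using only the standard library; the names EXPECTED_ARCHIVES and missing_archives are made up for illustration.

```python
from pathlib import Path

# Archive names as listed above.
EXPECTED_ARCHIVES = ("training.tar.gz", "validation.tar.gz", "mmt16_task1_test.tar.gz")


def missing_archives(root):
    """Return the expected archive names that are not present under <root>/Multi30k."""
    folder = Path(root) / "Multi30k"
    return [name for name in EXPECTED_ARCHIVES if not (folder / name).is_file()]


# Example path from the text above; replace it with your own download location.
print("Missing archives:", missing_archives(r"D:\Download\spacy\datasets") or "none")
```

An empty result means all three files are where the root argument expects them to be.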

Modify the line in the ipynb that raises the error

# train, val, test = datasets.Multi30k(language_pair=("de", "en"))  # change this to:
train, val, test = datasets.Multi30k(root=r'D:\Download\spacy\datasets', language_pair=("de", "en"))

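The resulting vocabulary is saved to vocab.pt, so once it has been built the modified line can be changed back

# train, val, test = datasets.Multi30k(root=r'D:\Download\spacy\datasets', language_pair=("de", "en"))  # change this back to:
train, val, test = datasets.Multi30k(language_pair=("de", "en"))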
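The line can be reverted because of a load-or-build pattern: once the cached file exists, it is loaded and the build step (with its dataset download) is skipped. The sketch below illustrates that pattern only. It uses pickle instead of torch.save, and the names load_or_build and build are hypothetical, not the notebook's own code.

```python
import pickle
from pathlib import Path


def load_or_build(cache_path, build):
    """Load a cached object if the file exists; otherwise build it and cache it."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        # Cache hit: the build step is skipped entirely.
        with cache_path.open("rb") as f:
            return pickle.load(f)
    # Cache miss: build once, then store the result for later runs.
    result = build()
    with cache_path.open("wb") as f:
        pickle.dump(result, f)
    return result
```

The flip side of this design is that deleting the cached file forces a rebuild, which would need the dataset again.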

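(5) Other possible errors

If running the ipynb in Jupyter produces one of the errors below, apply the corresponding fix

Error pointing at the line: from .autonotebook import tqdm as notebook_tqdm. Fix: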

pip install ipywidgets
