The previous post ended at step four; this one records the next few steps. Up to that point I had not run into any problems.
(5) # train a bi-directional language model
Command: python -u train.py -g 0 --train datasets/wikitext-103-raw/spacy_wikitext-103-raw.train --valid datasets/wikitext-103-raw/spacy_wikitext-103-raw.valid --vocab datasets/wikitext-103-raw/spacy_wikitext-103-raw.train.vocab.t50 -u 1024 --layer 1 --dropout 0.1 --batchsize 64 --out trained_bilm
Output: an error is reported.
Traceback (most recent call last):
File "train.py", line 9, in <module>
import chain_utils
File "/dnn4_added/wanglina/contextual_augmentation-master/chain_utils.py", line 7, in <module>
import progressbar
ModuleNotFoundError: No module named 'progressbar'
Fix: the progressbar module is missing, so install it.
$ pip install progressbar
Collecting progressbar
Building wheels for collected packages: progressbar
Building wheel for progressbar (setup.py) ... done
Stored in directory: /sunj/wanglina/.cache/pip/wheels/c0/e9/6b/ea01090205e285175842339aa3b491adeb4015206cda272ff0
Successfully built progressbar
Installing collected packages: progressbar
Successfully installed progressbar-2.5
Running it again, another error appears.
Output:
Traceback (most recent call last):
File "train.py", line 13, in <module>
from text_classification import text_datasets
File "/dnn4_added/wanglina/contextual_augmentation-master/text_classification/text_datasets.py", line 13, in <module>
from nlp_utils import make_vocab
ModuleNotFoundError: No module named 'nlp_utils'
Fix: the nlp_utils module is missing, so install it.
$ pip install nlp_utils
Collecting nlp_utils
Collecting tensorflow (from nlp_utils)
Downloading https://files.pythonhosted.org/packages/de/f0/96fb2e0412ae9692dbf400e5b04432885f677ad6241c088ccc5fe7724d69/tensorflow-1.14.0-cp36-cp36m-manylinux1_x86_64.whl (109.2MB)
|████████████████████████████████| 109.2MB 28kB/s
Collecting nltk (from nlp_utils)
Downloading https://files.pythonhosted.org/packages/87/16/4d247e27c55a7b6412e7c4c86f2500ae61afcbf5932b9e3491f8462f8d9e/nltk-3.4.4.zip (1.5MB)
|████████████████████████████████| 1.5MB 1.2MB/s
Ignoring singledispatch: markers 'python_version < "3.4"' don't match your environment
Requirement already satisfied: numpy in /*****************/python-3.6/lib/python3.6/site-packages (from nlp_utils) (1.16.4)
Requirement already satisfied: absl-py>=0.7.0 in //*****************/python-3.6/lib/python3.6/site-packages (from tensorflow->nlp_utils) (0.7.1)
Requirement already satisfied: gast>=0.2.0 in /*****************/python-3.6/lib/python3.6/site-packages (from tensorflow->nlp_utils) (0.2.2)
Collecting google-pasta>=0.1.6 (from tensorflow->nlp_utils)
Downloading https://files.pythonhosted.org/packages/d0/33/376510eb8d6246f3c30545f416b2263eee461e40940c2a4413c711bdf62d/google_pasta-0.1.7-py3-none-any.whl (52kB)
|████████████████████████████████| 61kB 315kB/s
Requirement already satisfied: wheel>=0.26 in /*****************/python-3.6/lib/python3.6/site-packages (from tensorflow->nlp_utils) (0.33.4)
Requirement already satisfied: protobuf>=3.6.1 in /*****************//python-3.6/lib/python3.6/site-packages (from tensorflow->nlp_utils) (3.7.1)
Requirement already satisfied: six>=1.10.0 in //*****************/python-3.6/lib/python3.6/site-packages (from tensorflow->nlp_utils) (1.12.0)
Requirement already satisfied: termcolor>=1.1.0 in /*****************/python-3.6/lib/python3.6/site-packages (from tensorflow->nlp_utils) (1.1.0)
Requirement already satisfied: grpcio>=1.8.6 in /*****************/python-3.6/lib/python3.6/site-packages (from tensorflow->nlp_utils) (1.22.0)
Requirement already satisfied: keras-applications>=1.0.6 in /*****************/python-3.6/lib/python3.6/site-packages (from tensorflow->nlp_utils) (1.0.8)
Requirement already satisfied: keras-preprocessing>=1.0.5 in /*****************/python-3.6/lib/python3.6/site-packages (from tensorflow->nlp_utils) (1.1.0)
Requirement already satisfied: astor>=0.6.0 in /*****************/python-3.6/lib/python3.6/site-packages (from tensorflow->nlp_utils) (0.8.0)
Collecting wrapt>=1.11.1 (from tensorflow->nlp_utils)
Collecting tensorflow-estimator<1.15.0rc0,>=1.14.0rc0 (from tensorflow->nlp_utils)
Downloading https://files.pythonhosted.org/packages/3c/d5/21860a5b11caf0678fbc8319341b0ae21a07156911132e0e71bffed0510d/tensorflow_estimator-1.14.0-py2.py3-none-any.whl (488kB)
|████████████████████████████████| 491kB 1.1MB/s
Collecting tensorboard<1.15.0,>=1.14.0 (from tensorflow->nlp_utils)
Downloading https://files.pythonhosted.org/packages/91/2d/2ed263449a078cd9c8a9ba50ebd50123adf1f8cfbea1492f9084169b89d9/tensorboard-1.14.0-py3-none-any.whl (3.1MB)
|████████████████████████████████| 3.2MB 1.0MB/s
Requirement already satisfied: setuptools in //*****************/python-3.6/lib/python3.6/site-packages (from protobuf>=3.6.1->tensorflow->nlp_utils) (28.8.0)
Requirement already satisfied: h5py in /*****************/python-3.6/lib/python3.6/site-packages (from keras-applications>=1.0.6->tensorflow->nlp_utils) (2.9.0)
Requirement already satisfied: werkzeug>=0.11.15 in /*****************/python-3.6/lib/python3.6/site-packages (from tensorboard<1.15.0,>=1.14.0->tensorflow->nlp_utils) (0.15.5)
Requirement already satisfied: markdown>=2.6.8 in /*****************//python-3.6/lib/python3.6/site-packages (from tensorboard<1.15.0,>=1.14.0->tensorflow->nlp_utils) (3.1.1)
Building wheels for collected packages: nltk, wrapt
Building wheel for nltk (setup.py) ... done
Stored in directory: /sunj/wanglina/.cache/pip/wheels/41/c8/31/48ace4468e236e0e8435f30d33e43df48594e4d53e367cf061
Building wheel for wrapt (setup.py) ... done
Stored in directory: /sunj/wanglina/.cache/pip/wheels/d7/de/2e/efa132238792efb6459a96e85916ef8597fcb3d2ae51590dfd
Successfully built nltk wrapt
ERROR: nltk 3.4.4 requires singledispatch, which is not installed.
ERROR: tensorflow-gpu 1.11.0 has requirement tensorboard<1.12.0,>=1.11.0, but you'll have tensorboard 1.14.0 which is incompatible.
ERROR: tensorboard 1.14.0 has requirement setuptools>=41.0.0, but you'll have setuptools 28.8.0 which is incompatible.
Installing collected packages: google-pasta, wrapt, tensorflow-estimator, tensorboard, tensorflow, nltk, nlp-utils
Found existing installation: tensorboard 1.11.0
Uninstalling tensorboard-1.11.0:
Successfully uninstalled tensorboard-1.11.0
Successfully installed google-pasta-0.1.7 nlp-utils-0.1 nltk-3.4.4 tensorboard-1.14.0 tensorflow-1.14.0 tensorflow-estimator-1.14.0 wrapt-1.11.2
WARNING: You are using pip version 19.1.1, however version 19.2.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Output:
Traceback (most recent call last):
File "train.py", line 13, in <module>
from text_classification import text_datasets
File "/dnn4_added/wanglina/contextual_augmentation-master/text_classification/text_datasets.py", line 13, in <module>
from nlp_utils import make_vocab
ImportError: cannot import name 'make_vocab'
Cause: check which module Python is actually importing with `import nlp_utils; print(nlp_utils)`. The nlp_utils installed from PyPI is an unrelated package (which is why it dragged in TensorFlow above) and contains no make_vocab; the repo ships its own copy in text_classification/nlp_utils.py.
Fix: copy text_classification/nlp_utils.py to /wln_install/python-3.6/lib/python3.6/site-packages/nlp_utils.py, so that this copy is the one found on the import path.
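A quick way to see where an import would be resolved from, before resorting to copying files into site-packages, is `importlib.util.find_spec`. This is a minimal sketch: the helper name `module_origin` is my own, and `json` stands in for a module that does exist.

```python
import importlib.util

def module_origin(name):
    """Return the file a module would be loaded from, or None if unimportable."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec else None

# A module that exists resolves to a concrete file on sys.path:
print(module_origin("json"))                # e.g. .../lib/python3.x/json/__init__.py
# An unimportable name resolves to nothing:
print(module_origin("no_such_module_xyz"))  # None
```

If the printed path points at a site-packages copy rather than the repo's own file, the wrong package is shadowing the one the code expects.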
继续运行:python -u train.py -g 0 --train datasets/wikitext-103-raw/spacy_wikitext-103-raw.train --valid datasets/wikitext-103-raw/spacy_wikitext-103-raw.valid --vocab datasets/wikitext-103-raw/spacy_wikitext-103-raw.train.vocab.t50 -u 1024 --layer 1 --dropout 0.1 --batchsize 64 --out trained_bilm
It errors out with:
{
"gpu": 0,
"out": "trained_bilm",
"batchsize": 64,
"epoch": 5,
"gradclip": 10,
"lr": 0.0001,
"unit": 1024,
"layer": 1,
"dropout": 0.1,
"vocab": "datasets/wikitext-103-raw/spacy_wikitext-103-raw.train.vocab.t50",
"train_path": "datasets/wikitext-103-raw/spacy_wikitext-103-raw.train",
"valid_path": "datasets/wikitext-103-raw/spacy_wikitext-103-raw.valid",
"resume": null,
"labeled_dataset": null,
"no_label": false,
"validation": false
}
Traceback (most recent call last):
File "train.py", line 168, in <module>
main()
File "train.py", line 75, in main
args.train_path, vocab, chain_length=1)
File "/dnn4_added/wanglina/contextual_augmentation-master/chain_utils.py", line 87, in __init__
get_last_only=get_last_only)
File "/dnn4_added/wanglina/contextual_augmentation-master/chain_utils.py", line 63, in make_chain_dataset
max_value=n_lines):
TypeError: __call__() got an unexpected keyword argument 'max_value'
Cause (this part is pasted from a classmate's answer — thanks!):
The problem is with the progressbar package, which is just a text progress bar; max_value is the keyword that tells the bar its total length. Some people suggest renaming the argument to maxval. (The likely root cause: `pip install progressbar` installs the old progressbar 2.5, whose API uses maxval, whereas this code appears to have been written against the newer progressbar2 package, which uses max_value; `pip install progressbar2` would probably also resolve it.) But digging into the source, the classmate found that renaming alone did not help, and neither did reinstalling the latest progressbar. Rather than fight it, they recommend switching to a different progress-bar tool:
from tqdm import tqdm
and changing the loop to:
for line in tqdm(io.open(path, encoding='utf-8'), total=n_lines):
That one-line change solves it.
Fix: edit chain_utils.py accordingly.
Add: from tqdm import tqdm
Before: for line in bar(io.open(path, encoding='utf-8'), max_value=n_lines):
After: for line in tqdm(io.open(path, encoding='utf-8'), total=n_lines):
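The tqdm replacement can be sanity-checked in isolation. The sketch below wraps a plain in-memory iterable the same way chain_utils.py wraps the file handle; total= plays the role that progressbar2's max_value did (the list of fake lines is made up for illustration):

```python
from tqdm import tqdm

# Stand-in for the lines of the training file; in chain_utils.py the
# iterable is io.open(path, encoding='utf-8') and total is n_lines.
lines = ["line %d\n" % i for i in range(1000)]

count = 0
for line in tqdm(lines, total=len(lines)):
    count += 1
print(count)  # all 1000 lines consumed; the bar reaches 100%
```

Because tqdm accepts any iterable, no other part of the loop body has to change.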
Run again:
python -u train.py -g 0 --train datasets/wikitext-103-raw/spacy_wikitext-103-raw.train --valid datasets/wikitext-103-raw/spacy_wikitext-103-raw.valid --vocab datasets/wikitext-103-raw/spacy_wikitext-103-raw.train.vocab.t50 -u 1024 --layer 1 --dropout 0.1 --batchsize 64 --out trained_bilm
Output:
{
"gpu": 0,
"out": "trained_bilm",
"batchsize": 64,
"epoch": 5,
"gradclip": 10,
"lr": 0.0001,
"unit": 1024,
"layer": 1,
"dropout": 0.1,
"vocab": "datasets/wikitext-103-raw/spacy_wikitext-103-raw.train.vocab.t50",
"train_path": "datasets/wikitext-103-raw/spacy_wikitext-103-raw.train",
"valid_path": "datasets/wikitext-103-raw/spacy_wikitext-103-raw.valid",
"resume": null,
"labeled_dataset": null,
"no_label": false,
"validation": false
}
100%|███████████████████████████████████████████████████████████████████████████| 5407481/5407481 [01:24<00:00, 63936.20it/s]
100%|███████████████████████████████████████████████████████████████████████████| 11415/11415 [00:00<00:00, 59487.21it/s]
#train = 4771160
#valid = 10116
#vocab = 49873
74549 iters per epoch
log and eval are scheduled at every (745, 'iteration') and (37250, 'iteration')
iter/epoch 74549
Training start
epoch iteration main/perp validation/main/perp elapsed_time
0 745 302.07 16118.7
0 1490 232.779 16573.7
0 2235 170.714 17027.5
0 2980 127.466 17481.7
0 3725 120.539 17934.5
0 4470 125.575 18387.5
0 5215 104.433 18839.6
0 5960 84.3001 19294.6
0 6705 79.2073 19749.8
0 7450 71.7358 20203.8
0 8195 77.5412 20656.9
0 8940 54.9137 21106.4
0 9685 70.0783 21561.3
0 10430 67.5645 22012.1
0 11175 69.1995 22465.2
The log goes on for a long time (I won't paste all of it)……
total [###################################...............] 71.13%
this epoch [###########################.......................] 55.67%
265150 iter, 3 epoch / 5 epochs
4 371010 19.6101 241281
4 371755 14.3248 241732
4 372500 19.0776 16.5097 242210
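As a sanity check on the trainer's bookkeeping, the "74549 iters per epoch" line follows directly from the reported dataset size and batch size. (Whether the script derives the 745-iteration log interval as 1% of an epoch is my assumption, but the numbers line up.)

```python
n_train = 4771160          # "#train" printed by the trainer
batchsize = 64             # --batchsize on the command line

iters_per_epoch = n_train // batchsize
print(iters_per_epoch)     # 74549, matching "74549 iters per epoch"

log_interval = iters_per_epoch // 100
print(log_interval)        # 745, matching the logging schedule
print(log_interval * 50)   # 37250, matching the eval schedule
```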
The bidirectional language model finished training after roughly three days. The machine is on the slow side — for a beginner, that's a painfully large time cost......
The next post will cover the steps that follow.