Reproducing a Data Augmentation Experiment (2) -- Contextual Augmentation: Data Augmentation by Words with Paradigmatic Relations

The previous post got as far as step 4; this one records the next few steps. Up to this point nothing had gone wrong.

(5) # train a bi-directional language model
Command (the flags correspond to the config JSON the script echoes at startup, shown further below: -g GPU id, -u hidden-unit size, --layer number of layers, --dropout dropout rate, --batchsize batch size, --out output directory):

python -u train.py -g 0 --train datasets/wikitext-103-raw/spacy_wikitext-103-raw.train --valid datasets/wikitext-103-raw/spacy_wikitext-103-raw.valid --vocab datasets/wikitext-103-raw/spacy_wikitext-103-raw.train.vocab.t50 -u 1024 --layer 1 --dropout 0.1 --batchsize 64 --out trained_bilm

Output: an error is reported.

Traceback (most recent call last):
  File "train.py", line 9, in <module>
    import chain_utils
  File "/dnn4_added/wanglina/contextual_augmentation-master/chain_utils.py", line 7, in <module>
    import progressbar
ModuleNotFoundError: No module named 'progressbar'

Fix: the progressbar module is not installed, so install it.

$ pip install progressbar
Collecting progressbar
  Downloading https://files.pythonhosted.org/packages/a3/a6/b8e451f6cff1c99b4747a2f7235aa904d2d49e8e1464e0b798272aa84358/progressbar-2.5.tar.gz
Building wheels for collected packages: progressbar
  Building wheel for progressbar (setup.py) ... done
  Stored in directory: /sunj/wanglina/.cache/pip/wheels/c0/e9/6b/ea01090205e285175842339aa3b491adeb4015206cda272ff0
Successfully built progressbar
Installing collected packages: progressbar
Successfully installed progressbar-2.5
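Note with hindsight: pip install progressbar pulls the legacy 2.5 release, whose ProgressBar takes maxval, while the progressbar2 fork takes max_value — the name chain_utils.py actually passes, which is what triggers the TypeError further down. A quick way to check which API you ended up with (this check is my own heuristic, not from the repo):

import inspect
import progressbar

# Legacy progressbar 2.5 exposes ProgressBar(maxval=...); the progressbar2
# fork exposes ProgressBar(max_value=...). chain_utils.py passes max_value=,
# so the legacy package fails later with a TypeError.
sig = inspect.signature(progressbar.ProgressBar.__init__)
print('progressbar2-style API' if 'max_value' in sig.parameters else 'legacy 2.5 API (maxval)')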

Running again, there is another error. Output:

Traceback (most recent call last):
  File "train.py", line 13, in <module>
    from text_classification import text_datasets
  File "/dnn4_added/wanglina/contextual_augmentation-master/text_classification/text_datasets.py", line 13, in <module>
    from nlp_utils import make_vocab
ModuleNotFoundError: No module named 'nlp_utils'

Attempted fix: the nlp_utils module is missing, so install it. (As becomes clear below, the nlp_utils on PyPI is an unrelated package; the repo ships its own nlp_utils.py.)

$ pip install nlp_utils
Collecting nlp_utils
  Downloading https://files.pythonhosted.org/packages/4b/c6/d73700ba68b7d894345334d37e3e2d27028c38ae1fa7c61e8c8b9a368c5b/nlp_utils-0.1-py3-none-any.whl
Collecting tensorflow (from nlp_utils)
  Downloading https://files.pythonhosted.org/packages/de/f0/96fb2e0412ae9692dbf400e5b04432885f677ad6241c088ccc5fe7724d69/tensorflow-1.14.0-cp36-cp36m-manylinux1_x86_64.whl (109.2MB)
    |████████████████████████████████| 109.2MB 28kB/s
Collecting nltk (from nlp_utils)
  Downloading https://files.pythonhosted.org/packages/87/16/4d247e27c55a7b6412e7c4c86f2500ae61afcbf5932b9e3491f8462f8d9e/nltk-3.4.4.zip (1.5MB)
     |████████████████████████████████| 1.5MB 1.2MB/s
  Ignoring singledispatch: markers 'python_version < "3.4"' don't match your environment
Requirement already satisfied: numpy in /*****************/python-3.6/lib/python3.6/site-packages (from nlp_utils) (1.16.4)
Requirement already satisfied: absl-py>=0.7.0 in /*****************/python-3.6/lib/python3.6/site-packages (from tensorflow->nlp_utils) (0.7.1)
Requirement already satisfied: gast>=0.2.0 in /*****************/python-3.6/lib/python3.6/site-packages (from tensorflow->nlp_utils) (0.2.2)
Collecting google-pasta>=0.1.6 (from tensorflow->nlp_utils)
  Downloading https://files.pythonhosted.org/packages/d0/33/376510eb8d6246f3c30545f416b2263eee461e40940c2a4413c711bdf62d/google_pasta-0.1.7-py3-none-any.whl (52kB)
     |████████████████████████████████| 61kB 315kB/s
Requirement already satisfied: wheel>=0.26 in /*****************/python-3.6/lib/python3.6/site-packages (from tensorflow->nlp_utils) (0.33.4)
Requirement already satisfied: protobuf>=3.6.1 in /*****************/python-3.6/lib/python3.6/site-packages (from tensorflow->nlp_utils) (3.7.1)
Requirement already satisfied: six>=1.10.0 in /*****************/python-3.6/lib/python3.6/site-packages (from tensorflow->nlp_utils) (1.12.0)
Requirement already satisfied: termcolor>=1.1.0 in /*****************/python-3.6/lib/python3.6/site-packages (from tensorflow->nlp_utils) (1.1.0)
Requirement already satisfied: grpcio>=1.8.6 in /*****************/python-3.6/lib/python3.6/site-packages (from tensorflow->nlp_utils) (1.22.0)
Requirement already satisfied: keras-applications>=1.0.6 in /*****************/python-3.6/lib/python3.6/site-packages (from tensorflow->nlp_utils) (1.0.8)
Requirement already satisfied: keras-preprocessing>=1.0.5 in /*****************/python-3.6/lib/python3.6/site-packages (from tensorflow->nlp_utils) (1.1.0)
Requirement already satisfied: astor>=0.6.0 in /*****************/python-3.6/lib/python3.6/site-packages (from tensorflow->nlp_utils) (0.8.0)
Collecting wrapt>=1.11.1 (from tensorflow->nlp_utils)
  Downloading https://files.pythonhosted.org/packages/23/84/323c2415280bc4fc880ac5050dddfb3c8062c2552b34c2e512eb4aa68f79/wrapt-1.11.2.tar.gz
Collecting tensorflow-estimator<1.15.0rc0,>=1.14.0rc0 (from tensorflow->nlp_utils)
  Downloading https://files.pythonhosted.org/packages/3c/d5/21860a5b11caf0678fbc8319341b0ae21a07156911132e0e71bffed0510d/tensorflow_estimator-1.14.0-py2.py3-none-any.whl (488kB)
     |████████████████████████████████| 491kB 1.1MB/s
Collecting tensorboard<1.15.0,>=1.14.0 (from tensorflow->nlp_utils)
  Downloading https://files.pythonhosted.org/packages/91/2d/2ed263449a078cd9c8a9ba50ebd50123adf1f8cfbea1492f9084169b89d9/tensorboard-1.14.0-py3-none-any.whl (3.1MB)
     |████████████████████████████████| 3.2MB 1.0MB/s
Requirement already satisfied: setuptools in /*****************/python-3.6/lib/python3.6/site-packages (from protobuf>=3.6.1->tensorflow->nlp_utils) (28.8.0)
Requirement already satisfied: h5py in /*****************/python-3.6/lib/python3.6/site-packages (from keras-applications>=1.0.6->tensorflow->nlp_utils) (2.9.0)
Requirement already satisfied: werkzeug>=0.11.15 in /*****************/python-3.6/lib/python3.6/site-packages (from tensorboard<1.15.0,>=1.14.0->tensorflow->nlp_utils) (0.15.5)
Requirement already satisfied: markdown>=2.6.8 in /*****************/python-3.6/lib/python3.6/site-packages (from tensorboard<1.15.0,>=1.14.0->tensorflow->nlp_utils) (3.1.1)
Building wheels for collected packages: nltk, wrapt
  Building wheel for nltk (setup.py) ... done
  Stored in directory: /sunj/wanglina/.cache/pip/wheels/41/c8/31/48ace4468e236e0e8435f30d33e43df48594e4d53e367cf061
  Building wheel for wrapt (setup.py) ... done
  Stored in directory: /sunj/wanglina/.cache/pip/wheels/d7/de/2e/efa132238792efb6459a96e85916ef8597fcb3d2ae51590dfd
Successfully built nltk wrapt
ERROR: nltk 3.4.4 requires singledispatch, which is not installed.
ERROR: tensorflow-gpu 1.11.0 has requirement tensorboard<1.12.0,>=1.11.0, but you'll have tensorboard 1.14.0 which is incompatible.
ERROR: tensorboard 1.14.0 has requirement setuptools>=41.0.0, but you'll have setuptools 28.8.0 which is incompatible.
Installing collected packages: google-pasta, wrapt, tensorflow-estimator, tensorboard, tensorflow, nltk, nlp-utils
  Found existing installation: tensorboard 1.11.0
    Uninstalling tensorboard-1.11.0:
      Successfully uninstalled tensorboard-1.11.0
Successfully installed google-pasta-0.1.7 nlp-utils-0.1 nltk-3.4.4 tensorboard-1.14.0 tensorflow-1.14.0 tensorflow-estimator-1.14.0 wrapt-1.11.2
WARNING: You are using pip version 19.1.1, however version 19.2.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

Running train.py again, the output is:

Traceback (most recent call last):
  File "train.py", line 13, in <module>
    from text_classification import text_datasets
  File "/dnn4_added/wanglina/contextual_augmentation-master/text_classification/text_datasets.py", line 13, in <module>
    from nlp_utils import make_vocab
ImportError: cannot import name 'make_vocab'

Cause: checking what actually got imported (import nlp_utils; print(nlp_utils)) shows the pip-installed nlp_utils contains no make_vocab at all — it is an unrelated package, not the module this repo expects.

Fix: copy the repo's own text_classification/nlp_utils.py to /wln_install/python-3.6/lib/python3.6/site-packages/nlp_utils.py.
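A sketch of that check and the copy in one place, assuming it is run from the repo root; the sysconfig lookup is my assumption about where site-packages lives, so adjust to your environment. (Uninstalling the PyPI nlp_utils first, or simply running train.py from the repo root so the local module wins, would arguably be cleaner.)

import shutil, sysconfig
import nlp_utils

# Which nlp_utils is Python actually importing? After `pip install nlp_utils`
# this points into site-packages at the unrelated PyPI package.
print(nlp_utils.__file__)

# Overwrite it with the repo's own module (same effect as the manual copy above).
site_packages = sysconfig.get_paths()["purelib"]
shutil.copy("text_classification/nlp_utils.py", site_packages + "/nlp_utils.py")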

Run again: python -u train.py -g 0 --train datasets/wikitext-103-raw/spacy_wikitext-103-raw.train --valid datasets/wikitext-103-raw/spacy_wikitext-103-raw.valid --vocab datasets/wikitext-103-raw/spacy_wikitext-103-raw.train.vocab.t50 -u 1024 --layer 1 --dropout 0.1 --batchsize 64 --out trained_bilm

Error:

{
  "gpu": 0,
  "out": "trained_bilm",
  "batchsize": 64,
  "epoch": 5,
  "gradclip": 10,
  "lr": 0.0001,
  "unit": 1024,
  "layer": 1,
  "dropout": 0.1,
  "vocab": "datasets/wikitext-103-raw/spacy_wikitext-103-raw.train.vocab.t50",
  "train_path": "datasets/wikitext-103-raw/spacy_wikitext-103-raw.train",
  "valid_path": "datasets/wikitext-103-raw/spacy_wikitext-103-raw.valid",
  "resume": null,
  "labeled_dataset": null,
  "no_label": false,
  "validation": false
}

Traceback (most recent call last):
  File "train.py", line 168, in <module>
    main()
  File "train.py", line 75, in main
    args.train_path, vocab, chain_length=1)
  File "/dnn4_added/wanglina/contextual_augmentation-master/chain_utils.py", line 87, in __init__
    get_last_only=get_last_only)
  File "/dnn4_added/wanglina/contextual_augmentation-master/chain_utils.py", line 63, in make_chain_dataset
    max_value=n_lines):
TypeError: __call__() got an unexpected keyword argument 'max_value'

Cause: (this part is pasted from a classmate's answer — thanks!)

This is a problem with the progressbar package — a text progress bar, nothing more. max_value is the total length of the bar, and in some situations people suggest renaming the argument to maxval (the name used by the legacy progressbar 2.5 that pip install progressbar fetched above; the repo's code appears to have been written against the progressbar2 fork, which does accept max_value). But dig into the source and you'll find it isn't simply a maxval-renaming problem — the rename alone doesn't help. At that point you could try reinstalling the latest progressbar, but frankly I don't recommend it. I recommend switching to a different progress-bar tool:

from tqdm import tqdm

Change this one line and the problem is solved:

for line in tqdm(io.open(path, encoding='utf-8'), total=n_lines):

Fix: modify the corresponding code in chain_utils.py as described above.

Add: from tqdm import tqdm

Before: for line in bar(io.open(path, encoding='utf-8'), max_value=n_lines):

After: for line in tqdm(io.open(path, encoding='utf-8'), total=n_lines):
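For context, here is a minimal, self-contained version of the patched loop; path and n_lines mirror the snippet above, and the line-counting pass is my assumption about how n_lines is obtained in chain_utils.py:

import io
from tqdm import tqdm

path = "datasets/wikitext-103-raw/spacy_wikitext-103-raw.train"

# Count lines first so tqdm can show a percentage rather than a bare counter.
with io.open(path, encoding='utf-8') as f:
    n_lines = sum(1 for _ in f)

# The patched loop: tqdm wraps any iterable, and total= plays the role that
# progressbar2's max_value= played in the original code.
for line in tqdm(io.open(path, encoding='utf-8'), total=n_lines):
    pass  # tokenize the line / build the chain dataset here

tqdm only draws the bar and passes each item through unchanged, so nothing else in make_chain_dataset needs to change.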

Run once more:

python -u train.py -g 0 --train datasets/wikitext-103-raw/spacy_wikitext-103-raw.train --valid datasets/wikitext-103-raw/spacy_wikitext-103-raw.valid --vocab datasets/wikitext-103-raw/spacy_wikitext-103-raw.train.vocab.t50 -u 1024 --layer 1 --dropout 0.1 --batchsize 64 --out trained_bilm

Output:

{
  "gpu": 0,
  "out": "trained_bilm",
  "batchsize": 64,
  "epoch": 5,
  "gradclip": 10,
  "lr": 0.0001,
  "unit": 1024,
  "layer": 1,
  "dropout": 0.1,
  "vocab": "datasets/wikitext-103-raw/spacy_wikitext-103-raw.train.vocab.t50",
  "train_path": "datasets/wikitext-103-raw/spacy_wikitext-103-raw.train",
  "valid_path": "datasets/wikitext-103-raw/spacy_wikitext-103-raw.valid",
  "resume": null,
  "labeled_dataset": null,
  "no_label": false,
  "validation": false
}

100%|████████████████████████████████| 5407481/5407481 [01:24<00:00, 63936.20it/s]
100%|████████████████████████████████| 11415/11415 [00:00<00:00, 59487.21it/s]

#train = 4771160
#valid = 10116
#vocab = 49873
74549 iters per epoch
log and eval are scheduled at every (745, 'iteration') and (37250, 'iteration')
iter/epoch 74549
Training start

epoch       iteration   main/perp   validation/main/perp  elapsed_time
0           745         302.07                            16118.7
0           1490        232.779                           16573.7
0           2235        170.714                           17027.5
0           2980        127.466                           17481.7
0           3725        120.539                           17934.5
0           4470        125.575                           18387.5
0           5215        104.433                           18839.6
0           5960        84.3001                           19294.6
0           6705        79.2073                           19749.8
0           7450        71.7358                           20203.8
0           8195        77.5412                           20656.9
0           8940        54.9137                           21106.4
0           9685        70.0783                           21561.3
0           10430       67.5645                           22012.1
0           11175       69.1995                           22465.2

The log goes on for a long while (not pasting all of it). As a sanity check on the numbers above: 74549 iterations per epoch matches the 4771160 training examples divided by the batch size of 64, and the log/eval intervals of 745 and 37250 iterations work out to roughly every 1% and every half of an epoch.

total [###################################...............] 71.13%
this epoch [###########################.......................] 55.67%
    265150 iter, 3 epoch / 5 epochs
4           371010      19.6101                           241281
4           371755      14.3248                           241732
4           372500      19.0776     16.5097               242210

 

The bi-directional language model finished training, taking about three days — training perplexity fell from around 302 to around 19, with a final validation perplexity of 16.5. The machine is slow, and for a beginner that is a steep time cost.

The next post will cover the remaining steps.
