Simplify the Usage of Lexicon in Chinese NER: pitfalls and fixes for running the paper's code

The versions listed on GitHub are Python 3.6 and PyTorch 0.4.1.

Pitfalls:

1. After setting up, running the code complained that transformers was missing. I ran pip install transformers directly, but the version it installed was too new to be compatible.

Fix: a quick search showed that transformers 3.4.0 is the version most people use with this code, so I installed 3.4.0.

2. After installing it, import transformers failed with ImportError: cannot import name '_softmax_backward_data'.

Fix: downgrade again, this time to transformers 2.1.1, which finally worked.
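
As a quick sanity check after pinning the version (a minimal sketch; it simply assumes the package was installed with pip install transformers==2.1.1):

import transformers
from transformers import BertTokenizer, BertModel  # the classes used later in functions.py and gazlstm.py

print(transformers.__version__)  # should print 2.1.1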

3. Running the program again produced an error:

Traceback (most recent call last):
  File "D:\program\envs\pytorch\lib\site-packages\urllib3\connection.py", line 175, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw
  File "D:\program\envs\pytorch\lib\site-packages\urllib3\util\connection.py", line 95, in create_connection
    raise err
  File "D:\program\envs\pytorch\lib\site-packages\urllib3\util\connection.py", line 85, in create_connection
    sock.connect(sa)
TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.

Fix: at first I assumed it was a connectivity problem and tried toggling the firewall and going through a VPN, with no luck. Then I stepped through the code in a debugger and found that it tries to reach a URL:

url:'https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-chinese-vocab.txt'

It dawned on me that the BERT files had never been downloaded locally; I had to fetch them myself (I admit I'm a beginner) and put them under a local path:

path = 'C:\\Users\\yuyuan\\.pytorch_pretrained_bert\\bert-base-chinese'

Then edit functions.py, add the path above, and change the corresponding call:

#tokenizer = BertTokenizer.from_pretrained('bert-base-chinese', do_lower_case=True)
tokenizer = BertTokenizer.from_pretrained(path)
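
As a quick check that the local vocab actually loads without any network access (just a sketch; the example sentence is arbitrary):

print(tokenizer.tokenize('中文命名实体识别'))  # bert-base-chinese splits Chinese text character by character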

gazlstm.py needs the same change:

if self.use_bert:
    #self.bert_encoder = BertModel.from_pretrained('bert-base-chinese')
    self.bert_encoder = BertModel.from_pretrained(path)
    for p in self.bert_encoder.parameters():
        p.requires_grad = False  # keep the BERT encoder frozen during training

OK, finally fixed; BERT now loads successfully (although this fix is crude: with the path hardcoded, I would have to edit the code again on another machine).
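
A slightly more portable alternative would be to download and export the files once on a machine with network access, then point path at the exported directory. A minimal sketch (not from the repo; it assumes transformers 2.1.1, where both classes provide save_pretrained):

# one-time export so the training machine never needs to hit the network
import os
from transformers import BertTokenizer, BertModel

save_dir = 'C:\\Users\\yuyuan\\.pytorch_pretrained_bert\\bert-base-chinese'  # same path as above
os.makedirs(save_dir, exist_ok=True)

tokenizer = BertTokenizer.from_pretrained('bert-base-chinese')  # downloads the vocab
model = BertModel.from_pretrained('bert-base-chinese')          # downloads config + weights
tokenizer.save_pretrained(save_dir)
model.save_pretrained(save_dir)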

4. Continuing on, after the program printed build batched crf..., another error appeared (by this point I was close to losing it):

cublas runtime error : the GPU program failed to execute at C:/ProgramData/Miniconda3/conda-bld/pytorch_1533096106539/work/aten/src/THC/THCBlas.cu:249

Fix: a lot of searching revealed a GPU/CUDA mismatch. My machine has an RTX 3050 Ti, which only supports GPU computation with CUDA 11.0 or later, but I had installed CUDA 9.0 in order to stay on PyTorch 0.4.1.

The only way out was to upgrade CUDA, but the PyTorch release matching CUDA 11.0 is 1.7.1, and a newer PyTorch was bound to cause some problems with this paper's code.

No way around it: reinstall CUDA 11.0, the matching cuDNN, and the matching PyTorch.
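
After the reinstall, a quick check that the GPU is visible (a minimal sketch, assuming PyTorch 1.7.1 built for CUDA 11.0):

import torch

print(torch.__version__)               # expect 1.7.1
print(torch.version.cuda)              # expect 11.0
print(torch.cuda.is_available())       # expect True
print(torch.cuda.get_device_name(0))   # expect the RTX 3050 Ti to show up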

5. The code now runs and training starts, but the console is flooded with warnings and the training output gets buried:

D:\undergraduation\LexiconAugmentedNER-master\model\gazlstm.py:151: UserWarning: masked_fill_ received a mask with dtype torch.uint8, this behavior is now deprecated,please use a mask with dtype torch.bool instead. (Triggered internally at  ..\aten\src\ATen\native\cuda\LegacyDefinitions.cpp:28.)
  gaz_embeds = gaz_embeds_d.data.masked_fill_(gaz_mask.data, 0)  #(b,l,4,g,ge)  ge:gaz_embed_dim
D:\undergraduation\LexiconAugmentedNER-master\model\crf.py:97: UserWarning: indexing with dtype torch.uint8 is now deprecated, please use a dtype torch.bool instead. (Triggered internally at  ..\aten\src\ATen/native/IndexingUtils.h:25.)
  masked_cur_partition = cur_partition.masked_select(mask_idx)
D:\undergraduation\LexiconAugmentedNER-master\model\crf.py:102: UserWarning: masked_scatter_ received a mask with dtype torch.uint8, this behavior is now deprecated,please use a mask with dtype torch.bool instead. (Triggered internally at  ..\aten\src\ATen\native\cuda\LegacyDefinitions.cpp:72.)
  partition.masked_scatter_(mask_idx, masked_cur_partition)
D:\undergraduation\LexiconAugmentedNER-master\model\crf.py:248: UserWarning: indexing with dtype torch.uint8 is now deprecated, please use a dtype torch.bool instead. (Triggered internally at  ..\aten\src\ATen/native/IndexingUtils.h:25.)
  tg_energy = tg_energy.masked_select(mask.transpose(1,0))
[W ..\aten\src\ATen\native\cuda\LegacyDefinitions.cpp:72] Warning: masked_scatter_ received a mask with dtype torch.uint8, this behavior is now deprecated,please use a mask with dtype torch.bool instead. (function masked_scatter__cuda)
[W ..\aten\src\ATen\native\cuda\LegacyDefinitions.cpp:28] Warning: masked_fill_ received a mask with dtype torch.uint8, this behavior is now deprecated,please use a mask with dtype torch.bool instead. (function masked_fill__cuda)
[W IndexingUtils.h:25] Warning: indexing with dtype torch.uint8 is now deprecated, please use a dtype torch.bool instead. (function expandTensors)

Everything after that is just the last three warnings repeating.

Fix: in the return statement of batchify_with_label, change mask to mask.bool():

 # print(bert_seq_tensor.type())
    return gazs, word_seq_tensor, biword_seq_tensor, word_seq_lengths, label_seq_tensor, layer_gaz_tensor, gaz_count_tensor,gaz_chars_tensor, gaz_mask_tensor, gazchar_mask_tensor, mask.bool(), bert_seq_tensor, bert_mask

That leaves one warning, reported from gazlstm.py.

Edit the get_tags function as well:

gaz_mask = gaz_mask_input.unsqueeze(-1).repeat(1,1,1,1,self.gaz_emb_dim)

# add this line
gaz_mask = gaz_mask.bool()

gaz_embeds = gaz_embeds_d.data.masked_fill_(gaz_mask.data, 0)  #(b,l,4,g,ge)  ge:gaz_embed_dim
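
For reference, a minimal standalone example (not from the repo) of why .bool() silences these warnings:

import torch

x = torch.randn(3)
mask_u8 = torch.tensor([1, 0, 1], dtype=torch.uint8)
x.masked_fill_(mask_u8, 0)         # uint8 mask: triggers the deprecation warning
x.masked_fill_(mask_u8.bool(), 0)  # bool mask: same result, no warning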

OK, no more warnings; training runs smoothly!
