ELMo in Practice

This post walks through two ways of using ELMo in practice:

I. Getting ELMo word embeddings directly through AllenNLP

1. Environment setup
See my earlier blog post: https://blog.csdn.net/Findingxu/article/details/91542654

2. Download the pretrained weights and the model options file
Weights:
https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x4096_512_2048cnn_2xhighway/elmo_2x4096_512_2048cnn_2xhighway_weights.hdf5
Model options (architecture configuration):
https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x4096_512_2048cnn_2xhighway/elmo_2x4096_512_2048cnn_2xhighway_options.json
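
If you would rather script the download than use a browser, here is a minimal sketch using only the Python standard library. The helper names `local_name` and `download_elmo_files` are made up for illustration; only the two URLs come from this post. Files that already exist locally are skipped.

```python
import os
import urllib.request
from urllib.parse import urlparse

BASE = "https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x4096_512_2048cnn_2xhighway/"
ELMO_URLS = [
    BASE + "elmo_2x4096_512_2048cnn_2xhighway_options.json",
    BASE + "elmo_2x4096_512_2048cnn_2xhighway_weights.hdf5",
]


def local_name(url):
    """Return the file name part of a URL (hypothetical helper)."""
    return os.path.basename(urlparse(url).path)


def download_elmo_files(target_dir="."):
    """Download both files into target_dir, skipping files already present."""
    os.makedirs(target_dir, exist_ok=True)
    paths = []
    for url in ELMO_URLS:
        path = os.path.join(target_dir, local_name(url))
        if not os.path.exists(path):
            urllib.request.urlretrieve(url, path)
        paths.append(path)
    return paths
```

Calling `download_elmo_files()` once in the working directory leaves both files next to your script, which matches the relative paths used in the next step.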

3. Usage code:

from allennlp.commands.elmo import ElmoEmbedder

# Local paths to the two files downloaded in step 2
options_file = "elmo_2x4096_512_2048cnn_2xhighway_options.json"
weight_file = "elmo_2x4096_512_2048cnn_2xhighway_weights.hdf5"

elmo = ElmoEmbedder(options_file, weight_file)

# Pre-tokenized sentences; batch_to_embeddings converts them to character ids internally
context_tokens = [['I', 'love', 'you', '.'], ['Sorry', ',', 'I', 'don', "'t", 'love', 'you', '.']]
elmo_embedding, elmo_mask = elmo.batch_to_embeddings(context_tokens)

# elmo_embedding: torch.Size([2, 3, 8, 1024]) = (batch_size, 3 layers, num_timesteps, 1024)
print(elmo_embedding, elmo_embedding.size())
# elmo_mask marks the positions that hold real tokens rather than padding:
# torch.Size([2, 8]) = (batch_size, num_timesteps)
print(elmo_mask, elmo_mask.size())
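
The output has three layers per token, so you usually need to reduce it before feeding it to a downstream model. Below is a rough sketch of one simple option: average the three layers, zero out the padded positions using the mask, and mean-pool into one vector per sentence. Plain averaging is my own choice for illustration (the ELMo paper learns a weighted combination of the layers), the function names are hypothetical, and random stand-in tensors with the shapes printed above are used so the snippet runs without the model files.

```python
import torch

# Stand-in tensors with the shapes printed above; in real use they come from
# elmo.batch_to_embeddings(context_tokens).
batch_size, num_layers, num_timesteps, dim = 2, 3, 8, 1024
elmo_embedding = torch.randn(batch_size, num_layers, num_timesteps, dim)
elmo_mask = torch.tensor([[1, 1, 1, 1, 0, 0, 0, 0],   # 4 real tokens, 4 padding
                          [1, 1, 1, 1, 1, 1, 1, 1]])  # 8 real tokens


def average_layers(embedding, mask):
    """Average the ELMo layers and zero out padded positions."""
    averaged = embedding.mean(dim=1)  # (batch_size, num_timesteps, dim)
    return averaged * mask.unsqueeze(-1).to(averaged.dtype)


def sentence_vectors(token_vectors, mask):
    """Mean-pool already-masked token vectors over the real tokens only."""
    lengths = mask.sum(dim=1, keepdim=True).clamp(min=1).to(token_vectors.dtype)
    return token_vectors.sum(dim=1) / lengths


token_vectors = average_layers(elmo_embedding, elmo_mask)
sent_vectors = sentence_vectors(token_vectors, elmo_mask)
print(token_vectors.shape)  # torch.Size([2, 8, 1024])
print(sent_vectors.shape)   # torch.Size([2, 1024])
```

Dividing by the true sentence length instead of `num_timesteps` keeps short sentences from being scaled down by their padding.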

 

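II. Using ELMo inside an AllenNLP model

AllenNLP is built on top of PyTorch and is a very convenient deep learning toolkit for NLP. For how it works and how to use it, see my post: AllenNLP之入门解读代码 (an introductory AllenNLP code walkthrough).

To use ELMo inside AllenNLP, just configure word_embeddings in the configuration file (JSON format).
An example follows. As above, it needs an options_file and a weight_file; if the download is slow, fetch the files first and replace the URLs with local paths. Note that this example points to the small model (2x1024_128_2048cnn_1xhighway), whose output dimension is 256, which is why the encoder's input_size is 256.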

  "model": {
    "type": "lstm_classifier",

    "word_embeddings": {
      "tokens": {
        "type": "elmo_token_embedder",
        "options_file": "https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x1024_128_2048cnn_1xhighway/elmo_2x1024_128_2048cnn_1xhighway_options.json",
        "weight_file": "https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x1024_128_2048cnn_1xhighway/elmo_2x1024_128_2048cnn_1xhighway_weights.hdf5",
        "do_layer_norm": false,
        "dropout": 0.5
      }
    },

    "encoder": {
      "type": "lstm",
      "input_size": 256,  # embedding_dim: output size of the ELMo embedder
      "hidden_size": 128  # hidden_dim
    }
  },
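
The encoder's input_size has to match the ELMo output size, which is easy to get wrong when you swap models. As a sketch, the expected size can be read from the options file: to my understanding it is twice the `projection_dim` under the `lstm` key, because the forward and backward directions are concatenated. The helper names below are hypothetical, and the two inline dictionaries only mimic the one field that is used, with values taken from the model names in this post.

```python
import json


def load_options(path):
    """Read a downloaded ELMo options JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def elmo_output_dim(options):
    """Forward and backward projections are concatenated, hence the factor 2."""
    return 2 * options["lstm"]["projection_dim"]


# Stand-ins for the two options files used in this post (assumed values)
small_options = {"lstm": {"projection_dim": 128}}     # 2x1024_128_2048cnn_1xhighway
original_options = {"lstm": {"projection_dim": 512}}  # 2x4096_512_2048cnn_2xhighway

print(elmo_output_dim(small_options))     # 256
print(elmo_output_dim(original_options))  # 1024
```

With a real file you would call `elmo_output_dim(load_options("elmo_2x1024_128_2048cnn_1xhighway_options.json"))` and use the result as input_size. Also keep in mind that, as far as I know, the dataset reader needs an `elmo_characters` token indexer under the same `tokens` key so that the embedder receives character ids.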

 
