A Walkthrough of the OpenNRE Source Code Structure

Chloe Chiu

Published 2021-02-23 22:35:40

Category: NLP. Tags: natural language processing, deep learning, pytorch

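Copyright notice: This is an original article by the author, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.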

Article link: https://blog.csdn.net/qq_43496336/article/details/114004712

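Summary (auto-generated by CSDN): This article walks through THUNLP's OpenNRE framework, focusing on BagRE and its internal component BagAttention. BagRE uses a PCNN encoder; its data loader wraps BagREDataset, which tokenizes each sentence, maps tokens to indices, and encodes positions to build the model input. BagAttention uses an attention mechanism to compute scores and make predictions. During training and evaluation, metrics such as P@N and F1 are computed.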

A reading of the source code of THUNLP's open-source OpenNRE model, using BagRE as the example.

structure

Hierarchy: framework (BagRE) - model (BagAttention) - encoder (PCNN)
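
The layering can be sketched with a few toy classes. This is only an illustration of how the three levels nest and of how the dataset borrows the encoder's `tokenize` method as a plain callable; `ToyEncoder`, `ToyBagModel`, `ToyDataset`, and `ToyFramework` are hypothetical names, the whitespace tokenizer is an assumption, and none of this is OpenNRE's actual code.

```python
class ToyEncoder:
    """Stands in for the encoder level (PCNN in this post)."""

    def __init__(self, token2id):
        self.token2id = token2id

    def tokenize(self, sentence):
        # Assumed whitespace tokenizer; unknown words map to "[UNK]".
        unk = self.token2id["[UNK]"]
        return [self.token2id.get(w, unk) for w in sentence.lower().split()]


class ToyBagModel:
    """Stands in for the model level (BagAttention in this post)."""

    def __init__(self, sentence_encoder, num_class, rel2id):
        self.sentence_encoder = sentence_encoder
        self.num_class = num_class
        self.id2rel = {i: r for r, i in rel2id.items()}


class ToyDataset:
    """Stands in for the dataset: it stores only the tokenizer callable."""

    def __init__(self, sentences, tokenizer):
        self.sentences = sentences
        self.tokenizer = tokenizer

    def __len__(self):
        return len(self.sentences)

    def __getitem__(self, index):
        return self.tokenizer(self.sentences[index])


class ToyFramework:
    """Stands in for the framework level (BagRE in this post)."""

    def __init__(self, model, sentences):
        self.model = model
        # The dataset receives the bound method, not the encoder object.
        self.dataset = ToyDataset(sentences, model.sentence_encoder.tokenize)


token2id = {"[UNK]": 0, "bill": 1, "founded": 2, "microsoft": 3}
encoder = ToyEncoder(token2id)
model = ToyBagModel(encoder, num_class=2, rel2id={"NA": 0, "founder": 1})
framework = ToyFramework(model, ["Bill founded Microsoft", "Bill likes Microsoft"])
print(framework.dataset[0], framework.dataset[1])
```

Passing the bound method keeps vocabulary handling in one place: the dataset never needs to know which encoder produced the indices.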

framework(BagRE)
- dataLoader = BagRELoader
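  - dataset = BagREDataset
    - tokenizer = model.sentence_encoder.tokenize (a function object: a method of the model that was passed in)
    - its getitem function:
      - see below
- model = BagAttention (passed in as an argument)
  - sentence_encoder = PCNNEncoder (passed in at instantiation)
    - inherits from the parent class BaseEncoder, which contains
      - tokenizer = WordTokenizer(vocab=self.token2id)
        
        token2id (dictionary of token->idx mapping)
      - word & pos1, pos2 embeddings
    - its tokenize function:
      - Sentence -> Token: split the sentence with BaseEncoder's tokenizer.tokenize (yielding a list of wordpieces & pos)
      - Token -> index: call tokenizer.convert_tokens_to_ids to get indexed_tokens (via utils.convert_by_vocab)
      - Position -> index | Mask | Padding: pos1, pos2, mask
      - return indexed_tokens, pos1, pos2, mask
    - its forward function:
      - arg: the return values above, batched, each of shape (B, L), where L is the sentence length
      - ret: a batch of vector representations, of shape (B, EMBED)
- num classes, id2rel mapping, used by the convolution and linear layers inside the model
- forward function:
  - first flatten with view into nsum sentences (a bag holds n sentences; a batch of B bags therefore holds nsum sentences in total, and scope records which sentences belong to which bag)
  - then encode to H = EMBED (the hidden size, here taken to be the word-embedding dim)
  - then apply attention to obtain scores, followed by softmax to obtain logits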
  - ```
  def forward(self, label, scope, token, pos1, pos2, mask=None, train=True, bag_size=0):
      """
      Args:
          label: (B), label of the bag
          scope: (B), scope for each bag
          token: (nsum, L), index of tokens
          pos1: (nsum, L), relative position to head entity
          pos2: (nsum, L), relative position to tail entity
          mask: (nsum, L), used for piece-wise CNN
      Return:
          logits, (B, N)
      """
      
      # 1. get representation, size (nsum, H)
      if bag_size 
```
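
Since the excerpt stops early, here is a minimal sketch of the attention step the list above describes, written in plain Python so the arithmetic is easy to follow. It assumes one common formulation of selective attention for training: each sentence is scored against the vector of the bag's labelled relation, the scores are softmax-normalised within the bag, and the weighted bag vector is projected onto every relation vector. The function name `bag_logits`, the `(start, end)` form of `scope`, and the reuse of `rel_weight` for both scoring and projection are assumptions of this sketch, not a copy of OpenNRE's code.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def bag_logits(rep, scope, label, rel_weight):
    """rep: nsum sentence vectors of size H (already encoded).
    scope: B (start, end) pairs, half-open ranges into rep.
    label: B relation ids, one per bag.
    rel_weight: N relation vectors of size H.
    Returns B rows of N logits."""
    logits = []
    for (start, end), y in zip(scope, label):
        bag = rep[start:end]                           # (n, H) sentences of one bag
        scores = [dot(s, rel_weight[y]) for s in bag]  # one score per sentence
        alpha = softmax(scores)                        # attention weights in the bag
        hidden = len(bag[0])
        bag_rep = [sum(a * s[h] for a, s in zip(alpha, bag)) for h in range(hidden)]
        logits.append([dot(bag_rep, w) for w in rel_weight])
    return logits


rep = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]   # nsum = 3, H = 2
scope = [(0, 2), (2, 3)]                      # bag 0 has two sentences, bag 1 has one
label = [0, 1]
rel_weight = [[1.0, 0.0], [0.0, 1.0]]         # N = 2 relations
logits = bag_logits(rep, scope, label, rel_weight)
print(logits)
```

In the first bag the sentence aligned with relation 0 receives the larger weight, so the bag vector leans toward it; a bag with a single sentence simply passes that sentence through.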

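The encoder's tokenize step listed earlier (Token -> index, Position -> index, Mask, Padding) can also be sketched in a few lines. This version assumes the sentence is already split into words and that the entity positions are given as token indices; the function name, the `[UNK]`/`[PAD]` entries, the offset-by-`max_length` position scheme, and the 1/2/3/0 mask values are assumptions chosen for illustration and may differ from what OpenNRE does.

```python
def tokenize(tokens, head_idx, tail_idx, token2id, max_length):
    """Turn a pre-split sentence into (indexed_tokens, pos1, pos2, mask)."""
    unk, pad = token2id["[UNK]"], token2id["[PAD]"]

    # Token -> index, then truncate or pad to max_length.
    indexed = [token2id.get(t.lower(), unk) for t in tokens][:max_length]
    n_real = len(indexed)
    indexed += [pad] * (max_length - n_real)

    # Position -> index: offset to an entity, shifted by max_length so the
    # value is non-negative and fits a table of size 2 * max_length.
    def rel_pos(i, anchor):
        return min(max(i - anchor + max_length, 0), 2 * max_length - 1)

    pos1 = [rel_pos(i, head_idx) for i in range(max_length)]
    pos2 = [rel_pos(i, tail_idx) for i in range(max_length)]

    # Mask for piece-wise pooling: 1 up to the first entity, 2 up to the
    # second entity, 3 after it, 0 for padding.
    first, second = sorted((head_idx, tail_idx))
    mask = []
    for i in range(max_length):
        if i >= n_real:
            mask.append(0)
        elif i <= first:
            mask.append(1)
        elif i <= second:
            mask.append(2)
        else:
            mask.append(3)
    return indexed, pos1, pos2, mask


token2id = {"[PAD]": 0, "[UNK]": 1, "bill": 2, "gates": 3, "founded": 4, "microsoft": 5}
words = ["Bill", "Gates", "founded", "Microsoft", "in", "1975"]
indexed_tokens, pos1, pos2, mask = tokenize(words, head_idx=0, tail_idx=3,
                                            token2id=token2id, max_length=8)
print(indexed_tokens, pos1, pos2, mask, sep="\n")
```

The mask is what lets a piece-wise CNN pool the three segments of the sentence separately, while the zeros keep padded positions out of every segment.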