[Paper Reading Notes 27] biaffine4NER: The Biaffine Classifier Applied to NER

Title

Named Entity Recognition as Dependency Parsing

Yu, J., Bohnet, B., & Poesio, M. (2020). Named Entity Recognition as Dependency Parsing. arXiv:2005.07150.
Code: https://github.com/juntaoy/biaffine-ner

Authors

Juntao Yu
Queen Mary University of London, UK

Bernd Bohnet
Google Research Netherlands

Massimo Poesio
Queen Mary University of London, UK

Abstract

This model is also proposed to handle nested NER.
Basic idea: following graph-based dependency parsing, the model takes a global view over the input sentence and processes it with a biaffine model. Concretely, the biaffine model scores every (start, end) token pair in the sentence (i.e. every span), and entities are extracted from these span scores.

Method

Main idea: entity extraction is cast as identifying start and end indices and assigning an entity type to the span formed by each (start, end) pair. A biaffine model on top of a multi-layer neural network outputs scores for all spans; candidate spans are then ranked by these scores, and the top-ranked spans that satisfy the flat or nested NER constraints are returned. The network architecture is shown in the figure below:

[Figure: overall network architecture]

Input

Concatenation of word embeddings and character embeddings;

Word encodings: BERT_Large, fastText.

The input dataset is preprocessed into the following format:

{"doc_key": "batch_01", 
"ners": [[[0, 0, "PER"], [3, 3, "GPE"], [5, 5, "GPE"]], 
[[3, 3, "PER"], [10, 14, "ORG"], [20, 20, "GPE"], [20, 25, "GPE"], [22, 22, "GPE"]], 
[]], 
"sentences": [["Anwar", "arrived", "in", "Shanghai", "from", "Nanjing", "yesterday", "afternoon", "."], 
["This", "morning", ",", "Anwar", "attended", "the", "foundation", "laying", "ceremony", "of", "the", "Minhang", "China-Malaysia", "joint-venture", "enterprise", ",", "and", "after", "that", "toured", "Pudong", "'s", "Jingqiao", "export", "processing", "district", "."], 
["(", "End", ")"]]}

The data contains three kinds of information: doc_key, ners, and sentences, i.e. the document id, the entities, and the sentences. Each entity is expressed as a (start, end, type) triple, and each sentence as an array of tokens.
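
As a quick sanity check, here is a small plain-Python sketch (just restating the example above) that maps the (start, end, type) triples back to mention strings; the end index appears to be inclusive:

example = {
    "doc_key": "batch_01",
    "sentences": [["Anwar", "arrived", "in", "Shanghai", "from", "Nanjing",
                   "yesterday", "afternoon", "."]],
    "ners": [[[0, 0, "PER"], [3, 3, "GPE"], [5, 5, "GPE"]]],
}
for sent, entities in zip(example["sentences"], example["ners"]):
    for start, end, label in entities:
        # start/end are token offsets within the sentence; end is inclusive
        print(label, "->", " ".join(sent[start:end + 1]))
# PER -> Anwar
# GPE -> Shanghai
# GPE -> Nanjing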

Next, the data format the model actually consumes:

[Figure: tensorized example fed to the model]

Now let's look at how this text is assembled into tensors:

def tensorize_example(self, example, is_training):
  ners = example["ners"]
  sentences = example["sentences"]

  # Pad every sentence to the longest sentence and every word to the longest word.
  max_sentence_length = max(len(s) for s in sentences)
  max_word_length = max(max(max(len(w) for w in s) for s in sentences), max(self.config["filter_widths"]))
  text_len = np.array([len(s) for s in sentences])
  tokens = [[""] * max_sentence_length for _ in sentences]
  char_index = np.zeros([len(sentences), max_sentence_length, max_word_length])
  context_word_emb = np.zeros([len(sentences), max_sentence_length, self.context_embeddings_size])
  lemmas = []
  if "lemmas" in example:
    lemmas = example["lemmas"]
  for i, sentence in enumerate(sentences):
    for j, word in enumerate(sentence):
      tokens[i][j] = word
      # Look up the pretrained word embedding; fall back to the lemma if available.
      if self.context_embeddings.is_in_embeddings(word):
        context_word_emb[i, j] = self.context_embeddings[word]
      elif lemmas and self.context_embeddings.is_in_embeddings(lemmas[i][j]):
        context_word_emb[i,j] = self.context_embeddings[lemmas[i][j]]
      # Character ids of the word, later fed to the character CNN.
      char_index[i, j, :len(word)] = [self.char_dict[c] for c in word]

  tokens = np.array(tokens)

  doc_key = example["doc_key"]

  # Precomputed contextual (BERT) embeddings for this document.
  lm_emb = self.load_lm_embeddings(doc_key)

  # Enumerate every candidate span (s, e) with e >= s for each sentence;
  # the label is 0 ("not an entity") unless the span is a gold entity.
  gold_labels = []
  if is_training:
    for sid, sent in enumerate(sentences):
      ner = {(s,e):self.ner_maps[t] for s,e,t in ners[sid]}
      for s in xrange(len(sent)):        # xrange: the repository code is Python 2
        for e in xrange(s,len(sent)):
          gold_labels.append(ner.get((s,e),0))
  gold_labels = np.array(gold_labels)

  example_tensors = (tokens, context_word_emb,lm_emb, char_index, text_len, is_training, gold_labels)

  return example_tensors
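
One detail worth noting: for a sentence of length n, gold_labels enumerates all n*(n+1)/2 candidate spans (s, e) with e >= s, where 0 means "not an entity". A tiny sketch with hypothetical label ids:

sentence = ["Anwar", "arrived", "in", "Shanghai"]
ner = {(0, 0): 1, (3, 3): 2}   # hypothetical ids, e.g. 1 = PER, 2 = GPE
gold_labels = []
for s in range(len(sentence)):
    for e in range(s, len(sentence)):
        gold_labels.append(ner.get((s, e), 0))
print(len(gold_labels))  # 10 = 4 * 5 / 2 candidate spans
print(gold_labels)       # [1, 0, 0, 0, 0, 0, 0, 0, 0, 2]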

Here char_index is a [num_sentences, max_sentence_length, max_word_length] tensor of character ids; it is used to build character embeddings, which are processed by a CNN.
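
The character representation of a word is produced by a CNN over its characters followed by max-over-time pooling. As a rough illustration only (a minimal NumPy sketch with made-up dimensions, not the repository's TensorFlow code):

import numpy as np

np.random.seed(0)
char_emb = np.random.randn(4, 8)      # a 4-character word, char-embedding dim 8
filters = np.random.randn(3, 8, 16)   # 16 convolution filters of width 3

# 1-D convolution over character positions ("valid" padding -> 2 windows)
windows = [char_emb[i:i + 3] for i in range(4 - 3 + 1)]
conv = np.stack([np.einsum("wc,wcf->f", w, filters) for w in windows])  # [2, 16]

word_char_repr = conv.max(axis=0)     # max-over-time pooling -> one vector per word
print(word_char_repr.shape)           # (16,)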

Next, look at what the concatenated input contains:

context_emb_list = []
context_emb_list.append(context_word_emb)    # pretrained word embeddings (fastText)
context_emb_list.append(aggregated_char_emb) # character CNN output
context_emb_list.append(aggregated_lm_emb)   # contextual LM embeddings (BERT)
context_emb = tf.concat(context_emb_list, 2) # concatenate along the feature dimension
context_emb = tf.nn.dropout(context_emb, self.lexical_dropout)

Hidden layers: BiLSTM

FFNNs

Two separate feed-forward networks (FFNNs) are applied on top of the BiLSTM output: one produces the start representation of each token and one produces the end representation.

[Formula: h_s(i) = FFNN_s(x_{s_i}), h_e(i) = FFNN_e(x_{e_i})]

The projection function is:

def projection(inputs, output_size, initializer=None):
  # ffnn with 0 hidden layers, i.e. a single linear output layer
  return ffnn(inputs, 0, -1, output_size, dropout=None, output_weights_initializer=initializer)

From the code, the BiLSTM output is fed into two separate projections (FFNNs with zero hidden layers, i.e. linear layers), one for span starts and one for span ends.
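
In other words, every token's BiLSTM output x is projected twice, once as a potential span start and once as a potential span end. A minimal NumPy sketch of that idea (hypothetical dimensions; the repository's version is in TensorFlow):

import numpy as np

np.random.seed(0)
x = np.random.randn(5, 6)          # BiLSTM outputs: 5 tokens, dim 6 (made-up sizes)

W_start, b_start = np.random.randn(6, 4), np.zeros(4)
W_end,   b_end   = np.random.randn(6, 4), np.zeros(4)

h_start = x @ W_start + b_start    # start representation of every token
h_end   = x @ W_end   + b_end      # end representation of every token
print(h_start.shape, h_end.shape)  # (5, 4) (5, 4)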

Classifier: biaffine classifier.

About the biaffine classifier:

[Formula: r_m(i) = h_s(i)^T U_m h_e(i) + W_m (h_s(i) ⊕ h_e(i)) + b_m]

The score tensor r has shape l × l × c, where l is the sentence length and c is the number of NER categories plus 1 (for the non-entity label); r covers every possible span.

The following formula assigns a category label to each span:
[Formula: y'(i) = argmax_m r_m(i)]

The spans whose predicted label is an entity type are then ranked by their scores.
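
Concretely, spans whose best category is the non-entity label are discarded, the remaining spans are sorted by score, and spans are accepted greedily as long as they do not clash with an already accepted span (for nested NER only partial, "crossing" overlaps are forbidden; for flat NER any overlap is). A hedged sketch of that post-processing with made-up scores:

# Candidate spans as (start, end, label, score); end is inclusive.
candidates = [(0, 3, "ORG", 5.1), (1, 2, "PER", 4.0), (2, 4, "GPE", 3.5)]

def crosses(a, b):
    # Partial overlap: the spans overlap but neither contains the other.
    (s1, e1), (s2, e2) = a, b
    return (s1 < s2 <= e1 < e2) or (s2 < s1 <= e2 < e1)

selected = []
for s, e, label, score in sorted(candidates, key=lambda c: -c[3]):
    if all(not crosses((s, e), (ps, pe)) for ps, pe, _, _ in selected):
        selected.append((s, e, label, score))
print(selected)
# (0, 3, ORG) and the nested (1, 2, PER) are kept;
# (2, 4, GPE) is dropped because it crosses (0, 3).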

The loss function is:
[Formula: span-level softmax cross-entropy loss]

In the code:

# Biaffine scoring of every (start, end) pair:
# [num_sentences, max_sentence_length, max_sentence_length, num_types + 1]
candidate_ner_scores = util.bilinear_classifier(candidate_starts_emb, candidate_ends_emb,
                                                self.dropout, output_size=self.num_types+1)
# Keep only valid candidate spans (e.g. end >= start, within the sentence length).
candidate_ner_scores = tf.boolean_mask(tf.reshape(candidate_ner_scores, [-1, self.num_types+1]),
                                       flattened_candidate_scores_mask)
# Span-level softmax cross-entropy against the gold span labels.
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=gold_labels, logits=candidate_ner_scores)
loss = tf.reduce_sum(loss)

The start and end representations are fed into the biaffine (bilinear) classifier together, and the loss is computed with softmax cross-entropy over the span labels.
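
For reference, the biaffine score combines a bilinear term between the start and end representations with a linear term over their concatenation, as in Dozat & Manning's biaffine parser [1]. A minimal NumPy sketch with made-up dimensions (not the repository's util.bilinear_classifier):

import numpy as np

np.random.seed(0)
n, d, c = 5, 4, 3 + 1              # tokens, representation dim, NER types + 1
h_start = np.random.randn(n, d)    # start representation of each token
h_end   = np.random.randn(n, d)    # end representation of each token

U = np.random.randn(c, d, d)       # one bilinear map per class
W = np.random.randn(c, 2 * d)      # linear weights over [h_start ; h_end]
b = np.zeros(c)

# r[i, j, m] = h_start[i]^T U[m] h_end[j] + W[m] . [h_start[i]; h_end[j]] + b[m]
bilinear = np.einsum("id,mde,je->ijm", h_start, U, h_end)
pairs = np.concatenate([np.broadcast_to(h_start[:, None, :], (n, n, d)),
                        np.broadcast_to(h_end[None, :, :], (n, n, d))], axis=-1)
linear = np.einsum("md,ijd->ijm", W, pairs)
r = bilinear + linear + b          # [n, n, c]: one score per (start, end, class)
print(r.shape)                     # (5, 5, 4)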

Experiments

Datasets:

For nested NER: ACE 2004, ACE 2005, GENIA;

For flat NER: CoNLL 2002, CoNLL 2003, OntoNotes.

Experiment 1: nested NER

[Table: nested NER results]

Experiment 2: flat NER

[Table: flat NER results]

Ablation Study

The OntoNotes dataset is used to quantify the contribution of each component of the network.

[Table: ablation results on OntoNotes]

Summary

This feels like a rather blunt method. The title says dependency parsing, but the paper does not actually use any dependency parsing; it only borrows the biaffine classifier. Biaffine attention was indeed introduced for dependency parsing, but that hardly justifies the title, which feels a bit like clickbait.

Still, the results are decent. Having read the code, though, the implementation seems to leave room for improvement.

The overall model is also relatively simple; the one genuinely useful part of the paper is the r scoring tensor.

Related Techniques

Flat Named Entity Recognition

Nested Named Entity Recognition

References

[1] Timothy Dozat and Christopher Manning. 2017. Deep biaffine attention for neural dependency parsing. In Proceedings of the 5th International Conference on Learning Representations (ICLR).

Vocabulary: comply, v. to obey, to abide by.
