2020.3 Enhanced Meta-Learning for Cross-Lingual Named Entity Recognition with Minimal Resources (reading notes)

Motivation

  • Problem Setting:
    • a) One source language with rich labeled data.
    • b) No labeled data in the target language.
  • Existing cross-lingual NER methods fall into two broad categories:
    • a) Label projection (generate labeled data in target languages)
      • parallel data + word alignment information + label projection
      • word-to-word or phrase-to-phrase translation + label projection
    • b) Direct model transfer (exploit language-independent features)
      • Cross-lingual word representations/clusters/wikifier features/gazetteers.
    • SOTA: multilingual BERT (direct model transfer)
  • This paper argues that the source-trained model used in direct model transfer can be improved further, because:
    • a) Given a test example, instead of testing on it directly, we can first fine-tune the source-trained model on similar examples.
      • How to retrieve similar examples? => Use a cross-lingual sentence representation model and compute the cosine similarity between source-target sentence pairs.
      • Similar in what sense? => In structure or semantics.
    • b) Because retrieval happens across different languages, the set of retrieved similar examples is small.
    • c) Therefore we can only fine-tune on a small set, and with only a few update steps.
      • => We need to adapt fast to new tasks (here, languages) with very limited data.
      • => Apply meta-learning! (Learn a good parameter initialization of the model that is more sensitive to the new task/data.)
  • The paper further proposes:
    • a) masking scheme
    • b) a max-loss term

Methodology

  • i) Constructing pseudo meta-NER tasks:
    • Treat each example as an independent pseudo test set.
    • Use the mBERT [CLS] vector as the sentence representation to compute cosine similarity.
    • The corresponding similar examples serve as the pseudo training set.
    • This yields N pseudo tasks (a retrieval sketch follows this list).
  • ii) Meta-training and Adaptation with Pseudo Tasks (a meta-update sketch follows this list):
  • iii) Masking Scheme:
    • Motivation: the learned representations of infrequent entities across different languages are not well aligned in the shared space (infrequent entities appear only rarely in mBERT's training corpus).
    • How? => Mask entities in each training example with a certain probability, to encourage the model to predict from context information (a masking sketch follows this list).
  • iv) Max Loss
    • Motivation: averaging the loss over all tokens weakens learning on the highest-loss token. => Put more effort into learning from high-loss tokens, which are likely to be corrected during meta-training (a hedged loss sketch follows this list).
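The sketch below illustrates the retrieval step in i): mBERT [CLS] vectors as sentence representations and cosine similarity to pick the most similar pool sentences for each query sentence. This is only a minimal illustration; the checkpoint name, `top_k`, and the batching are assumptions, not details from the paper.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
encoder = AutoModel.from_pretrained("bert-base-multilingual-cased")

def cls_embeddings(sentences):
    """Encode a list of sentences and return their [CLS] vectors."""
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state
    return hidden[:, 0]  # position 0 is the [CLS] token

def build_pseudo_tasks(query_sents, pool_sents, top_k=5):
    """For each query sentence (pseudo test set), retrieve the indices of the
    top_k most similar pool sentences (pseudo training set) by cosine similarity.
    During meta-training both sets come from the source language (skip the
    self-match in that case); at adaptation time the queries are target-language
    test sentences."""
    q = torch.nn.functional.normalize(cls_embeddings(query_sents), dim=-1)
    p = torch.nn.functional.normalize(cls_embeddings(pool_sents), dim=-1)
    sims = q @ p.T                                  # cosine similarity matrix
    topk = sims.topk(top_k, dim=-1).indices
    return [(i, idx.tolist()) for i, idx in enumerate(topk)]
```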
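For ii), the original notes contain only a figure. Below is a hedged, first-order MAML-style meta-update consistent with "learning a good parameter initialization"; the learning rates, number of inner steps, and the `loss_fn(model, batch)` interface are assumptions, and the paper's exact update may differ.

```python
import copy
import torch

def fomaml_step(model, tasks, loss_fn, inner_lr=1e-4, outer_lr=1e-4, inner_steps=2):
    """One meta-update over a list of pseudo tasks, each given as a
    (support_batch, query_batch) pair. First-order MAML approximation:
    the meta-gradient is the query-loss gradient at the adapted parameters."""
    meta_grads = [torch.zeros_like(p) for p in model.parameters()]
    for support_batch, query_batch in tasks:
        # Inner loop: adapt a copy of the model on the pseudo training set.
        fast_model = copy.deepcopy(model)
        inner_opt = torch.optim.SGD(fast_model.parameters(), lr=inner_lr)
        for _ in range(inner_steps):
            inner_opt.zero_grad()
            loss_fn(fast_model, support_batch).backward()
            inner_opt.step()
        # Outer loop: evaluate on the pseudo test set and accumulate gradients.
        query_loss = loss_fn(fast_model, query_batch)
        grads = torch.autograd.grad(query_loss, list(fast_model.parameters()))
        for acc, g in zip(meta_grads, grads):
            acc += g / len(tasks)
    # Apply the averaged meta-gradient to the original parameters.
    with torch.no_grad():
        for p, g in zip(model.parameters(), meta_grads):
            p -= outer_lr * g
```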
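A sketch of the masking scheme in iii), assuming BIO-style labels and token-level masking with a fixed probability; the masking probability and whether whole spans or single tokens are masked are assumptions here.

```python
import random

def mask_entities(tokens, labels, mask_prob=0.2, mask_token="[MASK]"):
    """Replace tokens that belong to an entity (label != 'O') with [MASK]
    with probability mask_prob, so the model is pushed to predict entity
    types from the surrounding context rather than memorized surface forms."""
    return [
        mask_token if label != "O" and random.random() < mask_prob else token
        for token, label in zip(tokens, labels)
    ]

# Example: ["Paris", "is", "nice"] with labels ["B-LOC", "O", "O"]
# may become ["[MASK]", "is", "nice"].
```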
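One plausible reading of the max-loss term in iv), since the formula appears only as a figure in the original notes: keep the averaged token-level cross entropy and add the largest per-token loss, scaled by a coefficient. The weight `lam` and this exact combination are assumptions.

```python
import torch.nn.functional as F

def ner_loss_with_max_term(logits, labels, lam=1.0, ignore_index=-100):
    """Averaged token-level cross entropy plus a max-loss term that puts
    extra weight on the hardest (highest-loss) token."""
    per_token = F.cross_entropy(
        logits.view(-1, logits.size(-1)),
        labels.view(-1),
        reduction="none",
        ignore_index=ignore_index,
    )
    per_token = per_token[labels.view(-1) != ignore_index]
    return per_token.mean() + lam * per_token.max()
```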

Experimental Results

(figures: experimental results from the paper)
