2020.3 Enhanced Meta-Learning for Cross-Lingual Named Entity Recognition with Minimal Resources (reading notes)

Motivation

  • Problem Setting:
    • a) One source language with rich labeled data.
    • b) No labeled data in the target language.
  • Existing cross-lingual NER methods fall into two broad categories:
    • a) Label projection (generate labeled data in target languages)
      • parallel data + word alignment information + label projection
      • word-to-word or phrase-to-phrase translation + label projection
    • b) Direct model transfer (exploit language-independent features)
      • Cross-lingual word representations/clusters/wikifier features/gazetteers.
    • SOTA: multilingual BERT (direct model transfer)
  • This paper argues that the source-trained model used in direct model transfer can be improved further, because:
    • a) Given a test example, instead of testing on it directly, we can first fine-tune the source-trained model on similar examples.
      • How to retrieve similar examples? => Use a cross-lingual sentence representation model and compute the cosine similarity between source-target sentence pairs.
      • Similar in what sense? => In structure or semantics.
    • b) Because retrieval happens across different languages, the set of retrieved similar examples is small.
    • c) Therefore we can only fine-tune on a small set, and with only a few update steps.
      • => We need to adapt fast to new tasks (here, languages) with very limited data.
      • => Apply meta-learning! (Learn a good parameter initialization of the model that is more sensitive to the new task/data.)
  • The paper further proposes:
    • a) masking scheme
    • b) a max-loss term

Methodology

  • i) Constructing pseudo meta-NER tasks:
    • Treat each example as an independent pseudo test set.
    • Use the mBERT [CLS] vector as the sentence representation to compute cosine similarity.
    • The corresponding similar examples serve as the pseudo training set.
    • This yields N pseudo tasks (a retrieval sketch follows this list).
  • ii) Meta-training and Adaptation with Pseudo Tasks (a meta-update sketch follows this list):
  • iii) Masking Scheme:
    • Motivation: the learned representations of infrequent entities across different languages are not well aligned in the shared space (infrequent entities appear only rarely in mBERT's training corpus).
    • How? => Mask entities in each training example with a certain probability, to encourage the model to predict from context information (a masking sketch follows this list).
  • iv) Max Loss
    • Motivation: averaging the loss over all tokens weakens learning on the highest-loss token. => Put more effort into learning from high-loss tokens, which are likely to be corrected during meta-training (a hedged loss sketch follows this list).
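The sketch below illustrates the retrieval step in i): mBERT [CLS] vectors as sentence representations and cosine similarity to pick the most similar pool sentences for each query sentence. This is only a minimal illustration; the checkpoint name, `top_k`, and the batching are assumptions, not details from the paper.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
encoder = AutoModel.from_pretrained("bert-base-multilingual-cased")

def cls_embeddings(sentences):
    """Encode a list of sentences and return their [CLS] vectors."""
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state
    return hidden[:, 0]  # position 0 is the [CLS] token

def build_pseudo_tasks(query_sents, pool_sents, top_k=5):
    """For each query sentence (pseudo test set), retrieve the indices of the
    top_k most similar pool sentences (pseudo training set) by cosine similarity.
    During meta-training both sets come from the source language (skip the
    self-match in that case); at adaptation time the queries are target-language
    test sentences."""
    q = torch.nn.functional.normalize(cls_embeddings(query_sents), dim=-1)
    p = torch.nn.functional.normalize(cls_embeddings(pool_sents), dim=-1)
    sims = q @ p.T                                  # cosine similarity matrix
    topk = sims.topk(top_k, dim=-1).indices
    return [(i, idx.tolist()) for i, idx in enumerate(topk)]
```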
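For ii), the original notes contain only a figure. Below is a hedged, first-order MAML-style meta-update consistent with "learning a good parameter initialization"; the learning rates, number of inner steps, and the `loss_fn(model, batch)` interface are assumptions, and the paper's exact update may differ.

```python
import copy
import torch

def fomaml_step(model, tasks, loss_fn, inner_lr=1e-4, outer_lr=1e-4, inner_steps=2):
    """One meta-update over a list of pseudo tasks, each given as a
    (support_batch, query_batch) pair. First-order MAML approximation:
    the meta-gradient is the query-loss gradient at the adapted parameters."""
    meta_grads = [torch.zeros_like(p) for p in model.parameters()]
    for support_batch, query_batch in tasks:
        # Inner loop: adapt a copy of the model on the pseudo training set.
        fast_model = copy.deepcopy(model)
        inner_opt = torch.optim.SGD(fast_model.parameters(), lr=inner_lr)
        for _ in range(inner_steps):
            inner_opt.zero_grad()
            loss_fn(fast_model, support_batch).backward()
            inner_opt.step()
        # Outer loop: evaluate on the pseudo test set and accumulate gradients.
        query_loss = loss_fn(fast_model, query_batch)
        grads = torch.autograd.grad(query_loss, list(fast_model.parameters()))
        for acc, g in zip(meta_grads, grads):
            acc += g / len(tasks)
    # Apply the averaged meta-gradient to the original parameters.
    with torch.no_grad():
        for p, g in zip(model.parameters(), meta_grads):
            p -= outer_lr * g
```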
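A sketch of the masking scheme in iii), assuming BIO-style labels and token-level masking with a fixed probability; the masking probability and whether whole spans or single tokens are masked are assumptions here.

```python
import random

def mask_entities(tokens, labels, mask_prob=0.2, mask_token="[MASK]"):
    """Replace tokens that belong to an entity (label != 'O') with [MASK]
    with probability mask_prob, so the model is pushed to predict entity
    types from the surrounding context rather than memorized surface forms."""
    return [
        mask_token if label != "O" and random.random() < mask_prob else token
        for token, label in zip(tokens, labels)
    ]

# Example: ["Paris", "is", "nice"] with labels ["B-LOC", "O", "O"]
# may become ["[MASK]", "is", "nice"].
```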
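One plausible reading of the max-loss term in iv), since the formula appears only as a figure in the original notes: keep the averaged token-level cross entropy and add the largest per-token loss, scaled by a coefficient. The weight `lam` and this exact combination are assumptions.

```python
import torch.nn.functional as F

def ner_loss_with_max_term(logits, labels, lam=1.0, ignore_index=-100):
    """Averaged token-level cross entropy plus a max-loss term that puts
    extra weight on the hardest (highest-loss) token."""
    per_token = F.cross_entropy(
        logits.view(-1, logits.size(-1)),
        labels.view(-1),
        reduction="none",
        ignore_index=ignore_index,
    )
    per_token = per_token[labels.view(-1) != ignore_index]
    return per_token.mean() + lam * per_token.max()
```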

Experimental Results

(figures: experimental results from the paper)
