[Paper Notes] Unicoder: A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks

We present Unicoder, a universal language encoder that is insensitive to different languages. Given an arbitrary NLP task, a model can be trained with Unicoder using training data in one language and directly applied to inputs of the same task in other languages.


What "universal" means here:

  • the encoder is insensitive to the specific language of the input
  • once trained on one language, it can be applied directly to other languages


Comparing to similar efforts such as Multilingual BERT (Devlin et al., 2018) and XLM (Lample and Conneau, 2019), three new cross-lingual pre-training tasks are proposed, including cross-lingual word recovery, cross-lingual paraphrase classification and cross-lingual masked language model. These tasks help Unicoder learn the mappings among different languages from more perspectives.


  • Building on Multilingual BERT and XLM, the paper proposes three new cross-lingual pre-training tasks (a sketch of the cross-lingual masked language model task follows below).

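To make the cross-lingual masked language model task concrete, here is a minimal sketch of how one training example might be built from a translation pair: the two sentences are concatenated and tokens are masked across both languages, so the model must use context from either language to recover them. The function name, the plain token-id lists, and the `-100` ignore-label convention are illustrative assumptions, not the paper's actual preprocessing code.

```python
import random

def make_cross_lingual_mlm_example(src_ids, tgt_ids, mask_id, mask_prob=0.15):
    """Build one cross-lingual MLM example from a sentence and its translation.

    Hypothetical helper for illustration only; `src_ids` and `tgt_ids` are
    token ids of a bilingual sentence pair from a parallel corpus.
    """
    tokens = src_ids + tgt_ids          # concatenate the bilingual pair
    labels = [-100] * len(tokens)       # -100 = position ignored by the loss
    for i in range(len(tokens)):
        if random.random() < mask_prob:
            labels[i] = tokens[i]       # predict the original token ...
            tokens[i] = mask_id         # ... from its masked position
    return tokens, labels
```

Because masked positions in one language can be recovered from the unmasked translation, the objective pushes the encoder to align representations across languages.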

Experiments are performed on two tasks: cross-lingual natural language inference (XNLI) and cross-lingual question answering (XQA), where XLM is our baseline.


Two evaluation tasks: cross-lingual natural language inference (XNLI) and cross-lingual question answering (XQA).

Introduction

Multilingual BERT trains a BERT model on multilingual Wikipedia, which covers 104 languages. As its vocabulary contains tokens from all of these languages, Multilingual BERT can be applied to cross-lingual tasks directly.


A few notes on Multilingual BERT:

  • covers 104 languages
  • uses a single vocabulary shared across all languages
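The shared vocabulary is what makes direct cross-lingual transfer possible: text in any of the covered languages is tokenized into the same WordPiece inventory. A quick check of this, assuming the Hugging Face `transformers` package and the public `bert-base-multilingual-cased` checkpoint are available:

```python
from transformers import AutoTokenizer

# One tokenizer, one shared vocabulary, many languages.
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

for text in ["A universal language encoder.", "一个通用的语言编码器。"]:
    print(tokenizer.tokenize(text))
# Both sentences map into the same WordPiece vocabulary, which is why a model
# fine-tuned on one language can be fed inputs in another without changes.
```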