Transformer Study Notes 1

Model categories:

  • GPT-like: auto-regressive (decoder models; past outputs become part of the current input, so producing y8 requires y1 through y7 but cannot use y9; typical use: text generation)

  1. GPT

  1. GPT2

  1. CTRL

  1. Transformer XL

  • BERT-like: auto-encoding (encoder models; they have access to the full context around each word, e.g. for sequence classification tasks)

  1. BERT

  1. ALBERT

  1. DistilBERT

  1. ELECTRA

  1. RoBERTa

  • BART-like: sequence-to-sequence (encoder-decoder models; they generate a new sequence from the input sequence; the decoder uses the encoder's feature vectors plus the outputs already produced to predict the next output, e.g. y3 uses y1 and y2; typical use: translation)

  1. BART

  1. mBART

  1. Marian

  1. T5

The encoder and decoder do not share weights, and you can mix and match encoders and decoders from different pretrained models (see the sketch below).
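A minimal sketch of this mix-and-match idea with Hugging Face's EncoderDecoderModel; the checkpoint names bert-base-uncased and gpt2 are just illustrative choices, not something fixed by the notes above.

```python
from transformers import EncoderDecoderModel

# Warm-start a sequence-to-sequence model from two independently pretrained
# checkpoints: a BERT encoder and a GPT-2 decoder. The two halves keep their
# own weights; nothing is shared between them.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased",  # encoder checkpoint
    "gpt2",               # decoder checkpoint (cross-attention layers are added and randomly initialized)
)

print(type(model.encoder).__name__)  # BERT encoder module
print(type(model.decoder).__name__)  # GPT-2 decoder module
```

Because the newly added cross-attention layers start from random weights, such a combined model still needs to be fine-tuned on a sequence-to-sequence task before it is useful.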

Language models:

  • models have been trained on large amounts of raw text in a self-supervised fashion

  • Self-supervised learning is a type of training in which the objective is automatically computed from the inputs of the model, which means that humans are not needed to label the data (see the sketch after this list).

  • transfer learning: the model is fine-tuned in a supervised way, that is, using human-annotated labels, on a given task. Pretraining transfers knowledge, but it also transfers the original model's biases.
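As a rough illustration of how a self-supervised objective is computed from the inputs themselves, here is a causal language modeling sketch; the gpt2 checkpoint is only an example.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

text = "Models are trained on large amounts of raw text."
inputs = tokenizer(text, return_tensors="pt")

# The labels are simply the input ids: the model shifts them internally so that
# each token is predicted from the tokens before it. No human annotation is used.
outputs = model(**inputs, labels=inputs["input_ids"])
print(outputs.loss)  # the self-supervised training loss
```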

Transformer architecture

  • The model is an encoder-decoder architecture: the encoder takes the input and builds a feature representation of it; the decoder uses the encoder's features together with other inputs to produce the output sequence.

  • Encoder-only models: Good for tasks that require understanding of the input, such as sentence classification and named entity recognition.

  • Decoder-only models: Good for generative tasks such as text generation

  • Encoder-decoder models or sequence-to-sequence models: Good for generative tasks that require an input, such as translation or summarization.
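A quick sketch mapping the three families above to typical pipeline tasks; the checkpoints named here are illustrative choices and the pipeline defaults may change across transformers versions.

```python
from transformers import pipeline

# Encoder-only: understanding the input, e.g. sentence classification.
classifier = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")
print(classifier("I really enjoyed these notes!"))

# Decoder-only: open-ended text generation.
generator = pipeline("text-generation", model="gpt2")
print(generator("Transformers are", max_new_tokens=20))

# Encoder-decoder: turning one sequence into another, e.g. translation.
translator = pipeline("translation_en_to_fr", model="t5-small")
print(translator("The encoder reads the input and the decoder writes the output."))
```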

Attention mechanism: Attention Is All You Need

  • a word by itself has a meaning, but that meaning is deeply affected by the context, which can be any other word (or words) before or after the word being studied.

  • encoder: can attend to all the words in the input

  • decoder: can only use the words it has already generated, plus the full encoder output; e.g. in a translation task, producing y4 requires y1 through y3

  • attention mask: prevents the model from paying attention to certain special words, such as padding tokens
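A small sketch of an attention mask in practice: when the tokenizer pads a batch to a common length, it returns an attention_mask that marks padding positions with 0 so the model ignores them; the bert-base-uncased checkpoint is just an example.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

batch = tokenizer(
    ["A short sentence.", "A much longer sentence that sets the padded length for the batch."],
    padding=True,  # pad the shorter sequence up to the longest one in the batch
    return_tensors="pt",
)

# 1 = real token the model may attend to, 0 = padding token to be ignored.
print(batch["input_ids"])
print(batch["attention_mask"])
```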
