1. BERT (Bidirectional Encoder Representations from Transformers) is an NLP model developed by Google, built on the Transformer's self-attention mechanism. Google released pre-trained checkpoints as open source so that anyone can fine-tune them, and the model achieved state-of-the-art results on many NLP tasks when it appeared in 2018.
Training method: autoencoding, i.e. masked language modeling (see the fill-mask sketch after this list)
Prediction target: given the surrounding context, predict one or more masked (missing) words
Input processing: bidirectional; it attends to a word's left and right context at the same time
Typical uses: understanding text in context, e.g. information extraction, question answering, sentiment analysis
Architecture: Transformer encoder
Language model type: discriminative
Strength: strong contextual understanding
Weakness: weaker at producing coherent generated text
GitHub - google-research/bert: TensorFlow code and pre-trained models for BERT
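A minimal sketch of BERT's masked-word prediction (the autoencoding objective above), using the Hugging Face transformers library; the bert-base-uncased checkpoint and the example sentence are assumptions for illustration, not part of the original notes.

```python
# Minimal fill-mask sketch: BERT predicts the masked word from both sides of context.
# Assumes the Hugging Face `transformers` library and the public "bert-base-uncased" checkpoint.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT sees the full bidirectional context and proposes candidates for [MASK].
for candidate in fill_mask("The capital of France is [MASK]."):
    print(candidate["token_str"], round(candidate["score"], 3))
```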
2. GPT (Generative Pre-trained Transformer)
Training method: autoregressive (see the generation sketch after this list)
Prediction target: given the preceding words, predict the next word
Input processing: unidirectional (GPT reads left to right; an autoregressive model could in principle also be trained right to left)
Typical uses: generative tasks such as article writing, poetry composition, etc.
Architecture: Transformer decoder
Language model type: generative
Strength: coherent generation
Weakness: comparatively weaker at understanding bidirectional context
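A minimal sketch of autoregressive, left-to-right generation in the GPT family, using the open GPT-2 checkpoint via the Hugging Face transformers library; the model name and prompt are assumptions for illustration.

```python
# Minimal text-generation sketch: the model predicts one next token at a time,
# conditioned only on the left context. Assumes the public "gpt2" checkpoint.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

out = generator("Once upon a time", max_new_tokens=30, do_sample=True)
print(out[0]["generated_text"])
```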
GPT-1:
Paper: Improving Language Understanding by Generative Pre-Training
GPT-2:
Paper: Language Models are Unsupervised Multitask Learners
GPT-3:
Paper: Language Models are Few-Shot Learners
GPT-4: multimodal; it can understand the content of images and reason about physics concepts
Papers:
1) Sparks of Artificial General Intelligence: Early experiments with GPT-4
2) GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models
3) GPT-4 Architecture, Infrastructure, Training Dataset, Costs, Vision, MoE
4) Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
5) Self-Consistency Improves Chain of Thought Reasoning in Language Models
6) Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Dataset: GSM8K, a benchmark of grade-school math word problems (see the chain-of-thought sketch below)
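The chain-of-thought and self-consistency papers above prompt the model with worked reasoning steps, then sample several reasoning chains and take a majority vote over the final answers. Below is a minimal sketch of that idea on a GSM8K-style problem; ask_model is a hypothetical placeholder for whatever generation call is available, not a real library function.

```python
# Chain-of-thought prompting + self-consistency voting, sketched on a GSM8K-style question.
from collections import Counter

FEW_SHOT_COT = """Q: Roger has 5 tennis balls. He buys 2 more cans of 3 balls each. How many balls does he have now?
A: Roger starts with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11. The answer is 11.

Q: {question}
A:"""

def ask_model(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call (API or local model)."""
    raise NotImplementedError

def self_consistent_answer(question: str, n_samples: int = 5) -> str:
    # Sample several reasoning chains and keep the most common final answer.
    prompt = FEW_SHOT_COT.format(question=question)
    answers = []
    for _ in range(n_samples):
        reasoning = ask_model(prompt)
        # Convention set by the exemplar: the final answer follows "The answer is".
        if "The answer is" in reasoning:
            answers.append(reasoning.split("The answer is")[-1].strip(" ."))
    return Counter(answers).most_common(1)[0][0] if answers else ""
```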
3. T5 (Text-to-Text Transfer Transformer): Google's encoder-decoder model that frames every NLP task as text-to-text (see the sketch below)
4. BART (Bidirectional and Auto-Regressive Transformers): a denoising sequence-to-sequence model that pairs a bidirectional encoder with an autoregressive decoder
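A minimal sketch of T5's text-to-text interface via the Hugging Face transformers library; the t5-small checkpoint and the translation task prefix are assumptions for illustration, not from the original notes.

```python
# T5 frames every task as text in, text out; the task is selected by a text prefix.
# Assumes the public "t5-small" checkpoint.
from transformers import pipeline

t5 = pipeline("text2text-generation", model="t5-small")

print(t5("translate English to German: The house is wonderful.")[0]["generated_text"])
```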
Papers worth reading carefully (to read):
1) Attention Is All You Need
2) BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
5. PaLM: a large language model developed by Google
6. Qwen (Tongyi Qianwen): Alibaba's open-source large language model series
Hands-on
Try GPT:
https://chat.openai.com