Training language models to follow instructions with human feedback


Abstract

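Making language models larger does not by itself make them better at following a user's intent. Model outputs can be:

  • untruthful
  • toxic
  • unhelpful to the user

In other words, these models are not "aligned" with their users.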

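On the given prompt distribution, outputs from the 1.3B InstructGPT model are better than outputs from the 175B GPT-3, even though it has more than 100 times fewer parameters.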

1 Introduction

Language modeling objective: predicting the next token on a webpage from the internet

Desired objective: follow the user’s instructions helpfully and safely (Radford et al., 2019; Brown et al., 2020; Fedus et al., 2021; Rae et al., 2021; Thoppilan et al., 2022)

We therefore say that the language modeling objective is misaligned.
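
For reference, the next-token objective described above is commonly written as a negative log-likelihood. This is the standard autoregressive formulation, not notation taken from these notes; the symbols $\theta$ (model parameters), $x_t$ (the token at position $t$), and $T$ (sequence length) are assumptions introduced here.

```latex
\mathcal{L}_{\mathrm{LM}}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(x_t \mid x_{<t}\right)
```

Nothing in this loss refers to instructions, helpfulness, or safety, which is the mismatch between the two objectives.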

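User intent falls into two categories: explicit intentions, such as following instructions, and implicit intentions, such as staying truthful and not being biased, toxic, or otherwise harmful.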
