GPT-3: The Good, the Bad, and the Ugly

If you follow the latest AI news, you have probably come across several stunning applications of the latest language model (LM) released by OpenAI: GPT-3. The applications this LM can fuel range from question answering to generating Python code, and the list of use cases is growing daily. Check out the following YouTube videos: GPT-3 demo and explanation, 14 cool GPT-3 apps and 14 more GPT-3 apps.

GPT-3 is currently in beta and only a restricted number of people have access, but it will be released to everybody on October 1st. OpenAI was very much interested in spreading the hype and showing amazing samples of cool applications, and as of September 22, 2020, their strategy has obviously worked out: while I was writing this blog post, Microsoft announced that it had acquired exclusive rights to the language model. OpenAI will probably continue to license access to the LM via an API, but the deal with Microsoft allowed OpenAI to get a return on its investment of $4.6 million, the estimated cost of training this massive LM.

Because OpenAI has been quite successful in its marketing, enlisting many people to post fascinating examples that are, strictly speaking, only anecdotal evidence of the model’s capabilities, one should view the current hype with some skepticism. People will most likely only post examples that confirm their bias that the machine “understands” language at a new level. At the same time, negative examples, such as the racist stories that are automatically generated when your prompt is “three muslims” (discussed further below), should raise concern that the model may do more harm than good.

Before I discuss “the Good, the Bad, and the Ugly” in more detail, let’s briefly review the main contribution of GPT-3. OpenAI released a previous version, GPT-2, last year, and the underlying technology has not changed since then. What has changed is scale: an enormous amount of data led to an LM with 175 billion parameters, compared to currently used LMs such as T5 with 11 billion parameters. After training the model on data largely crawled from the internet, the authors were able to show that the system could reach or even beat state-of-the-art systems on various NLP tasks (e.g., question answering, machine translation). Most impressive, however, was the fact that the system was never explicitly trained on these tasks and was able to achieve reasonable performance with no, one, or just a few examples (i.e., zero-shot/one-shot/few-shot learning).

Comparison between in-context learning and fine-tuning (source: https://arxiv.org/abs/2005.14165)

The figure from the GPT-3 paper illustrates how GPT-3 can be told, with just a handful of examples, how to do a task, in contrast to the traditional approach of fine-tuning a deep learning model by feeding it lots of examples (…). In addition, fine-tuning requires you to define the solution space (i.e., the number of labels) in advance, and you have to make sure you have enough examples in your training data so that the machine can learn how to distinguish the different classes. None of this is required when using GPT-3 (provided enough data for the task was available in the data that was fed to the LM).

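To make the figure’s contrast concrete, here is a minimal sketch of what zero-, one-, and few-shot prompts look like when assembled as plain text. It is Python string construction only, with no API call; the task description and the translation pairs are illustrative placeholders in the spirit of the paper’s figure, not quotes from it.

```python
# In-context learning: the task description, the demonstrations, and the
# query are all packed into one text prompt. Unlike fine-tuning, no
# gradient updates happen; the model simply continues the text.

task_description = "Translate English to French:"

# Illustrative demonstration pairs (hypothetical, for this sketch only).
examples = [
    ("sea otter", "loutre de mer"),
    ("peppermint", "menthe poivrée"),
]
query = "cheese"

def build_prompt(n_shots: int) -> str:
    """Assemble a prompt with n_shots demonstrations followed by the query."""
    lines = [task_description]
    for english, french in examples[:n_shots]:
        lines.append(f"{english} => {french}")
    lines.append(f"{query} =>")  # the model is expected to continue from here
    return "\n".join(lines)

zero_shot_prompt = build_prompt(0)  # task description + query only
one_shot_prompt = build_prompt(1)   # one demonstration
few_shot_prompt = build_prompt(2)   # several demonstrations
print(few_shot_prompt)
```
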
The Good

GPT-3 shows impressive results for a number of NLP tasks such as question answering (QA), generating code (or other formal languages/editorial assistance) and (fictional) story generation. These capabilities will most likely be incorporated into existing systems, yielding improvements over the current state of the art.

The GPT-3 paper shows, for example, impressive results for various QA tasks such as TriviaQA. It is quite promising that few-shot learning often shows better results than one-shot or zero-shot learning, indicating that with more labeled examples the LM may improve even more.

These are the applications that will probably benefit from GPT-3 the most:

  • Improving question answering (if used as an additional component of a complex QA system).
  • Supporting content creation (as we have already seen with auto-suggest). Drafting assistants for documents and query suggestions are applications that come to mind.
  • Easy access to deep learning technology will make it possible to create powerful applications without much training data.

In particular, the last bullet point could lead to the assumption that it will be easy to create applications without much technical knowledge. The GPT-3 playground lets everybody explore concepts that can be quickly tested via zero- or few-shot learning. The following example shows how somebody with just two examples can get answers to questions like “What are the elements of …?” I looked up the answers to the first two questions on the internet, and the answer to the third question was generated by the system. Without the two examples, the system’s answers were too short and not informative, but just two examples put it on the “right” path (bold is my input, normal font is GPT-3’s continuation):

Q: What are the elements of trespass?

A: Trespass is an invasion in to another’s exclusive right to possession of property. It is a direct unlawful interference with another’s property. Generally, the main elements of trespass are an unlawful intrusion or invasion upon a property, intent of intrusion, force, and consequent injury to an owner.

Q: What are the elements of fraud?

A: In the United States, common law generally identifies nine elements needed to establish fraud: (1) a representation of fact; (2) its falsity; (3) its materiality; (4) the representer’s knowledge of its falsity or ignorance of its truth; (5) the representer’s intent that it should be acted upon by the person in the manner reasonably contemplated; (6) the injured party’s ignorance of its falsity; (7) the injured party’s reliance on its truth; (8) the injured party’s right to rely thereon; and (9) the injured party’s consequent and proximate injury.

Q: What are the elements of a civil conspiracy?

A: (1) an agreement between two or more persons; (2) an overt act by one or more parties to further the purpose of the conspiracy; and (3) damages suffered by the injured party as a result of the conspiracy.

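For readers who want to try this themselves once access opens up, here is a rough sketch of how the two-shot prompt above could be sent through the beta API. It assumes the openai Python package as it existed during the beta and an API key with access; the engine name, sampling parameters, and the shortened answers are illustrative choices, not the exact playground settings I used.

```python
# Sketch of the two-shot legal QA experiment via the beta API.
# Assumes: `pip install openai` (the 2020-era package) and beta access.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

# The two worked Q/A pairs are the "shots"; the last question is left
# open for GPT-3 to complete. (Answers are shortened here for brevity.)
prompt = """Q: What are the elements of trespass?
A: Trespass is an invasion into another's exclusive right to possession of property. [...]

Q: What are the elements of fraud?
A: In the United States, common law generally identifies nine elements needed to establish fraud: [...]

Q: What are the elements of a civil conspiracy?
A:"""

response = openai.Completion.create(
    engine="davinci",   # the largest GPT-3 model exposed in the beta
    prompt=prompt,
    max_tokens=200,
    temperature=0.3,    # keep the continuation fairly conservative
    stop=["\nQ:"],      # stop before the model invents a follow-up question
)
print(response["choices"][0]["text"].strip())
```
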
Although these answers sound very impressive (at least to me as a non-lawyer), one should be aware of the risks that using the LM without any further safeguards might bring.

The Bad

One of the QA tasks GPT-3 was tested on was NaturalQS, which focuses on factual accuracy. GPT-3 underperformed on this task, whereas it got high marks for trivia questions. This behavior is troubling because it seems to indicate that question-answer pairs frequently found on the internet are more likely to be reproduced as correct answers, while the text understanding required to answer a complex question from just one example of text is clearly beyond the capability of the LM. If the answer sounds authoritative and is written in correct English, however, humans may not spot a wrong answer so easily.

As a matter of fact, it is getting more and more difficult for humans to distinguish news written by a machine from articles written by humans. One of the experiments reported in the GPT-3 paper showed that humans have a hard time identifying machine-generated news. The larger the LM got, the more trouble humans had correctly identifying machine-written news, and with the largest version of GPT-3 (175B parameters) the decision was basically a coin flip.

Another risk of using this LM unfiltered is the missing grounding of its answers. Even though a generated sentence may provide the correct answer, there is no way to back the statement up. The language model is grounded only in the frequencies of words, not in a deep understanding of, for example, statutes and case law. A recent academic paper by Emily Bender and Alexander Koller makes a similar criticism, arguing that the meaning of language cannot be learned from LMs.

An even more devastating rebuke of GPT-3 was delivered by Gary Marcus and Ernest Davis in a recent MIT Technology Review article. They showed that the model does not understand what it is generating by examining its continuations of complex situations that require social, biological, physical, or other kinds of reasoning (again, normal font is GPT-3’s continuation):

You poured yourself a glass of cranberry juice, but then you absentmindedly poured about a teaspoon of grape juice into it. It looks okay. You try sniffing it, but you have a bad cold, so you can’t smell anything. You are very thirsty. So you drink it.

You are now dead.

Somehow GPT-3 thinks that grape juice is poisonous, although the internet offers many drink recipes that contain cranberries and grapes as ingredients. Moreover, the conclusion that the drink may be fatal comes out of nowhere. Marcus and Davis conclude that GPT-3 “[i]s a fluent spouter of bullshit, but even with 175 billion parameters and 450 gigabytes of input data, it’s not a reliable interpreter of the world.”

In addition to these risks, the LM works well for language generation only, be it an answer or a fictional text. Other NLP tasks, on the other hand, cannot be solved so easily with the help of GPT-3. Typical tasks such as named entity extraction (i.e., labeling strings as, for example, company or person names) or text classification are more challenging for an LM, as the sketch below illustrates.

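As a concrete illustration of why span labeling is awkward in this setup, here is a sketch of forcing entity extraction into the completion format. The prompt layout, the faked continuation, and the parsing step are all my own assumptions for illustration, not an official recipe.

```python
# Sketch: coercing a pure language model into NER by asking it to emit
# entities as free text and then parsing that text back into labels.
# There is no native "label this span" interface in a completion API,
# so the output has to be recovered with brittle post-processing.

prompt = """Extract the companies and people from the sentence.

Sentence: Tim Cook announced that Apple will open a new office.
Companies: Apple
People: Tim Cook

Sentence: Satya Nadella said Microsoft acquired rights from OpenAI.
Companies:"""

# A real call would send `prompt` to the API; we fake a plausible
# continuation here to show the parsing problem.
completion = " Microsoft, OpenAI\nPeople: Satya Nadella"

def parse_entities(text: str) -> dict:
    """Recover structured labels from the model's free-text continuation."""
    entities = {"Companies": [], "People": []}
    current = "Companies"  # the prompt ended in the middle of this field
    for line in text.strip().splitlines():
        if ":" in line:
            current, _, line = line.partition(":")
            current = current.strip()
        entities[current] = [e.strip() for e in line.split(",") if e.strip()]
    return entities

print(parse_entities(completion))
# {'Companies': ['Microsoft', 'OpenAI'], 'People': ['Satya Nadella']}
```
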
The Ugly

It’s a well-known fact that NLP applications such as chatbots can sometimes be difficult to control, and one may end up with a program that spews out racist or sexist comments, as Microsoft had to learn when it released its chatbot Tay in 2016. To their credit, OpenAI addressed this problem right from the start: generated content that is toxic or simply political is flagged with a warning. It remains to be seen how they will control applications that may accidentally (or purposefully) generate racist or sexist language.

Warning generated at beta.openai.com (image by author)

Other beta users were also quick to point out that prompting GPT-3 with “three muslims” will often lead to text in which they are depicted as terrorists or criminals. My own experiments confirmed this bias, and I also found a similar tendency to portray people in a stereotypical fashion when I prompted the LM with other religious groups or nationalities.

Debiasing LMs is an active research topic in the community, and I expect to see even more activity in this area. OpenAI is clearly aware of this, and its terms of use devote a lot of space to how the API should and shouldn’t be used.

Conclusions

Despite the restrictions and the possibly toxic text GPT-3 may generate, I believe this LM is a fascinating new tool that will probably trigger improvements in NLP tasks that require generating language. Combined with other technology and the appropriate safeguards, it will push the AI capabilities we can use for our products even further. People may also come up with new applications of this technology that nobody has really thought of yet. Translating legalese into plain English may only be the start of the further innovation this technology will spur.

Translated from: https://medium.com/@schilderf/gpt-3-the-good-the-bad-and-the-ugly-5e2e5b7f0f66
