[Paper Reading] Use Tools! "Toolformer: Language Models Can Teach Themselves to Use Tools"


bylander


Article link: https://blog.csdn.net/bylander/article/details/139033517


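Paper link: https://ar5iv.labs.arxiv.org/html/2302.04761
What sets humans apart from animals is that humans use tools. Large models aim to imitate human thinking, so they ought to learn to use tools as well. Toolformer lets a model learn to use tools after only a small amount of human guidance.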

Abstract

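Language models (LMs) exhibit a remarkable ability to solve new tasks from just a few examples or textual instructions, especially at scale. Paradoxically, they also struggle with basic functionality such as arithmetic or factual lookup, where much simpler and smaller models excel. In this paper, we show that LMs can teach themselves to use external tools via simple APIs and thereby achieve the best of both worlds. We introduce Toolformer, a model trained to decide which APIs to call, when to call them, what arguments to pass, and how best to incorporate the results into future token prediction. This is done in a self-supervised way and requires only a handful of demonstrations for each API. We incorporate a range of tools, including a calculator, a question answering system, a search engine, a translation system, and a calendar. Toolformer achieves substantially improved zero-shot performance across a variety of downstream tasks, often competitive with much larger models, without sacrificing its core language modeling abilities.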

Introduction

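Large language models achieve impressive zero-shot and few-shot results on a variety of natural language processing tasks and show several emergent abilities. However, all of these models have inherent limitations that further scaling can at best only partially address. These include an inability to access up-to-date information about recent events and the related tendency to hallucinate facts, difficulty understanding low-resource languages, a lack of the mathematical skills needed for precise calculations, and unawareness of the progression of time.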

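A simple way to overcome these limitations is to give language models the ability to use external tools such as search engines, calculators, or calendars. However, existing approaches either rely on large amounts of human annotation or restrict tool use to task-specific settings, which hinders broader adoption of tool use in language models.
We therefore propose Toolformer, a model that learns to use tools in a novel way and satisfies the following desiderata: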

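Tool use should be learned in a self-supervised way, without requiring large amounts of human annotation. This matters not only because annotation is expensive, but also because what humans find useful may differ from what the model finds useful.
The language model should not lose any of its generality and should be able to decide for itself when and how to use which tool. Unlike existing approaches, this enables more comprehensive tool use that is not tied to specific tasks.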

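The paper's approach to these goals builds on the recent idea of using large LMs with in-context learning to generate entire datasets from scratch: given just a few human-written examples of how an API can be used, the LM annotates a large language modeling dataset with potential API calls. A self-supervised loss then determines which of these API calls actually help the model predict future tokens. Finally, the LM is finetuned on the API calls it considers useful. As shown in Figure 1, through this simple approach the LM learns to control a variety of tools and to choose for itself when and how to use which tool.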

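Because the approach is independent of the dataset used, it can be applied to exactly the same dataset that was originally used to pretrain the model. This ensures that the model loses none of its generality or language modeling ability. The paper runs experiments on a variety of downstream tasks and shows that, after learning to use tools, Toolformer, which is based on a pretrained GPT-J model with 6.7B parameters, achieves much stronger zero-shot results, clearly outperforming the much larger GPT-3 model and several other baselines on various tasks.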

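Figure 1 shows example predictions from Toolformer. The model autonomously decides to call different APIs (from top to bottom: a question answering system, a calculator, a machine translation system, and a Wikipedia search engine) to obtain information that is useful for completing a piece of text.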

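Figure 2 shows the key steps of the approach, using the question answering tool as an example: given an input text x, we first sample a position and corresponding API call candidates. We then execute these API calls and filter out all calls that do not reduce the loss on the following tokens. All remaining API calls are interleaved with the original text, producing a new text.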

Method

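The goal of Toolformer is to give a language model M the ability to use different tools by means of API calls.
Each API is required to have inputs and outputs that can be represented as text sequences. This allows API calls to be inserted seamlessly into any given text, using special tokens to mark the start and end of each call.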
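An API call c = (a_c, i_c), where a_c is the name of the API and i_c is the corresponding input, is represented as a linear sequence either without or with its result r (the API's output):

$$e(c) = \langle\mathrm{API}\rangle\; a_c(i_c)\; \langle/\mathrm{API}\rangle$$

$$e(c, r) = \langle\mathrm{API}\rangle\; a_c(i_c) \rightarrow r\; \langle/\mathrm{API}\rangle$$

Here "<API>", "</API>" and "→" are special tokens. In practice, to avoid modifying the existing language model's vocabulary, the paper uses the token sequences "[", "]" and "->" to represent "<API>", "</API>" and "→". For readability, it still refers to them as "<API>", "</API>" and "→" throughout this section.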
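To make the linearization concrete, here is a minimal Python sketch that builds e(c) and e(c, r) with the plain-text stand-ins "[", "]" and "->". The function name `linearize` and the exact spacing around "->" are my own assumptions for illustration, not code from the paper.

```python
from typing import Optional

# Plain-text stand-ins for the special tokens <API>, </API> and the arrow,
# following the "[", "]" and "->" convention described above.
API_START, API_END, RESULT_SEP = "[", "]", "->"


def linearize(api_name: str, api_input: str, result: Optional[str] = None) -> str:
    """Return e(c) if result is None, otherwise e(c, r).

    Passing result="" gives e(c, ε): the call followed by an empty result.
    """
    call = f"{api_name}({api_input})"
    if result is None:
        return f"{API_START}{call}{API_END}"
    return f"{API_START}{call} {RESULT_SEP} {result}{API_END}"


print(linearize("Calculator", "400 / 1400"))          # [Calculator(400 / 1400)]
print(linearize("Calculator", "400 / 1400", "0.29"))  # [Calculator(400 / 1400) -> 0.29]
```

Keeping "no result" (`None`) distinct from "empty result" (`""`) matters because the filtering step compares the loss with and without the result.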

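A plain-text dataset 𝒞 is converted into a dataset 𝒞* augmented with API calls. This is done in three steps, as shown in Figure 2. First, the LM's in-context learning ability is used to sample a large number of potential API calls. These API calls are then executed, and finally the paper checks whether the obtained responses help predict future tokens, which serves as the filtering criterion. After filtering, the API calls for the different tools are merged to obtain the augmented dataset 𝒞*, on which the LM is finetuned.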

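Key point: several works covered in earlier posts, such as SELF-INSTRUCT, SteerLM, and retrieval-based techniques, use similar ideas. The basic recipe is to re-annotate a dataset, either through prompting or with a small model, apply some filtering to obtain a new, higher-quality dataset, and then finetune the language model on that new dataset. The core idea is to reduce manual work, fully elicit the capabilities of large models through prompting, and let the model improve itself by bootstrapping.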

Below we go through the technical details of Toolformer one step at a time.

Sampling API Calls

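For each API, a prompt P(x) is written that contains examples showing the LM how to annotate a text with API calls.
Figure 3 shows an example prompt for the question answering tool; Appendix A.2 lists all prompts used.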
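As a rough illustration of how P(x) is formed: every prompt in Appendix A.2 ends with "Input: x Output:", and x is replaced by the text to be annotated. The sketch below assumes the template is a plain string whose last "Input: x" is the placeholder; `build_prompt` and the shortened `QA_TEMPLATE` are hypothetical names.

```python
PLACEHOLDER = "Input: x"

# Shortened version of the question answering prompt, for illustration only.
QA_TEMPLATE = (
    "Your task is to add calls to a Question Answering API to a piece of text. "
    "The questions should help you get information required to complete the text. "
    "You can call the API by writing \"[QA(question)]\" where \"question\" is the "
    "question you want to ask. Here are some examples of API calls:\n"
    "Input: Joe Biden was born in Scranton, Pennsylvania.\n"
    "Output: Joe Biden was born in [QA(\"Where was Joe Biden born?\")] Scranton, "
    "[QA(\"In which state is Scranton?\")] Pennsylvania.\n"
    "Input: x\n"
    "Output:"
)


def build_prompt(template: str, text: str) -> str:
    """Return P(x): the template with its final placeholder replaced by text."""
    # Split on the LAST occurrence so the few-shot examples stay untouched.
    head, sep, tail = template.rpartition(PLACEHOLDER)
    if not sep:
        raise ValueError("template has no 'Input: x' placeholder")
    return f"{head}Input: {text}{tail}"


prompt = build_prompt(QA_TEMPLATE, "Coca-Cola, or Coke, is a carbonated soft drink.")
print(prompt.splitlines()[-2:])
```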
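First, compute the probability of starting an API call at each position of the text sequence. The LM predicts tokens sequentially: for each position i ∈ {1, …, n}, given the prompt P(x) and the preceding tokens x_{1:i−1}, it computes the probability that the next token is <API>, p_i = p_M(<API> | P(x), x_{1:i−1}). Given a sampling threshold τ_s, only positions with p_i > τ_s are kept; if there are more than k such positions, only the top k are kept.
Then, for each position selected for an API call, the LM is given the sequence [P(x), x_1, …, x_{i−1}, <API>] as a prefix, with </API> as the end-of-sequence token, and samples up to m API calls.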
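The position-selection rule can be sketched as follows, assuming the probabilities p_i have already been computed by the LM (that part is not shown). `select_positions` is a hypothetical helper, and the numbers in the example are made up for illustration; they are not the paper's hyperparameters.

```python
from typing import List, Sequence


def select_positions(p_api: Sequence[float], tau_s: float, k: int) -> List[int]:
    """Pick candidate positions for API calls.

    p_api[i] is assumed to hold p_M(<API> | P(x), x_{1:i-1}) for the i-th
    position (0-based here), i.e. the probability that the model would open
    an API call right before token x_i.
    """
    # Keep every position whose <API> probability exceeds the threshold tau_s.
    candidates = [i for i, p in enumerate(p_api) if p > tau_s]
    # If there are more than k, keep only the k most probable ones...
    top_k = sorted(candidates, key=lambda i: p_api[i], reverse=True)[:k]
    # ...and return them in left-to-right text order.
    return sorted(top_k)


positions = select_positions([0.01, 0.30, 0.02, 0.50, 0.20], tau_s=0.05, k=2)
print(positions)  # [1, 3]
```

Sampling up to m calls at each selected position would then prompt the LM with the prefix described above and stop at </API>; that step needs a real model and is omitted here.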

Executing API Calls

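This step is straightforward: all API calls generated in the previous step are executed to obtain their results. How a call is executed depends entirely on the API itself; for example, it may involve calling another neural network, running a Python script, or using a retrieval system to search a large corpus. The response of each API call must be a single text sequence.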
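A toy version of the execution step, assuming every tool is a plain Python function that maps an input string to an output string. The `TOOLS` registry, the restricted `calculator` (only + - * / on numbers, parsed with the standard `ast` module instead of `eval`), the two-decimal rounding, and the `calendar` output format are all assumptions for illustration, not the paper's implementation.

```python
import ast
import datetime
import operator
from typing import Callable, Dict, Optional

# Binary operators the toy calculator accepts.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _eval(node: ast.AST) -> float:
    """Evaluate a parsed arithmetic expression without calling eval()."""
    if isinstance(node, ast.Expression):
        return _eval(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand)
    raise ValueError(f"unsupported expression: {ast.dump(node)}")


def calculator(expression: str) -> str:
    # Assumption: results are rounded to two decimal places.
    return str(round(_eval(ast.parse(expression, mode="eval")), 2))


def calendar(_unused: str = "", today: Optional[datetime.date] = None) -> str:
    # The calendar tool takes no input and simply reports the current date.
    today = today or datetime.date.today()
    return today.strftime("Today is %A, %B %d, %Y.")


# Hypothetical registry mapping API names to text-in/text-out functions.
TOOLS: Dict[str, Callable[[str], str]] = {
    "Calculator": calculator,
    "Calendar": calendar,
}


def execute(api_name: str, api_input: str) -> str:
    """Run one API call; every response is a single text sequence."""
    return TOOLS[api_name](api_input)


print(execute("Calculator", "400 / 1400"))   # 0.29
print(execute("Calculator", "18 + 12 * 3"))  # 54
```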

Filtering API Calls

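How does filtering work? Put simply, an API call is helpful to the model M if providing both its input and its output makes it easier for M to predict future tokens than receiving no API call at all or receiving only its input. Helpful calls are kept; all others are filtered out.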
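Formally, let i be the position of the API call c_i, let r_i be its response, and let (w_j | j ∈ ℕ) be a sequence of weights. The weighted cross-entropy loss of M over the tokens x_i, …, x_n when the model is prefixed with z is

$$L_i(z) = -\sum_{j=i}^{n} w_{j-i} \cdot \log p_M(x_j \mid z, x_{1:j-1})$$

Two losses are compared:

$$L_i^{+} = L_i\big(e(c_i, r_i)\big), \qquad L_i^{-} = \min\Big(L_i(\varepsilon),\; L_i\big(e(c_i, \varepsilon)\big)\Big)$$

where ε denotes an empty sequence. L_i^+ is the loss when the API call and its result are given to M as a prefix; L_i^- is the minimum of the losses obtained by making no API call at all and by providing only the API call's input without its result.
Given a filtering threshold τ_f, an API call is kept only if

$$L_i^{-} - L_i^{+} \geq \tau_f$$

that is, adding the API call together with its result must reduce the loss by at least τ_f.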
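The filtering criterion is easy to express once per-token log-probabilities are available. The sketch below assumes those log-probabilities have already been computed by an LM for the three prefixes (call with result, no call, call without result), and it assumes a linearly decaying weight w̃_t = max(0, 1 − 0.2·t), normalized to sum to 1, so that tokens right after the call count most. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def loss_weights(n: int, decay: float = 0.2) -> List[float]:
    """Normalized weights w_0..w_{n-1}; assumed form w~_t = max(0, 1 - decay*t)."""
    raw = [max(0.0, 1.0 - decay * t) for t in range(n)]
    total = sum(raw)
    return [w / total for w in raw] if total > 0 else raw


def weighted_loss(logprobs: Sequence[float]) -> float:
    """L_i(z) = -sum_j w_{j-i} * log p_M(x_j | z, x_{1:j-1}).

    logprobs[t] is log p_M(x_{i+t} | z, x_{1:i+t-1}) for the tokens that
    follow position i, scored with prefix z by some LM (not shown here).
    """
    weights = loss_weights(len(logprobs))
    return -sum(w * lp for w, lp in zip(weights, logprobs))


def keep_api_call(
    lp_call_and_result: Sequence[float],
    lp_no_call: Sequence[float],
    lp_call_only: Sequence[float],
    tau_f: float,
) -> bool:
    loss_plus = weighted_loss(lp_call_and_result)  # L_i^+
    loss_minus = min(weighted_loss(lp_no_call), weighted_loss(lp_call_only))  # L_i^-
    return loss_minus - loss_plus >= tau_f


# Made-up log-probabilities: the result makes the next tokens much more likely.
with_result = [math.log(0.9)] * 5
no_call = [math.log(0.1)] * 5
call_only = [math.log(0.2)] * 5
print(keep_api_call(with_result, no_call, call_only, tau_f=1.0))  # True
```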

Model Finetuning

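After filtering, the remaining API calls are merged with the original input: each call is inserted into the input text at the position determined earlier, producing a new text sequence. Texts with multiple API calls are handled the same way, which yields a new dataset containing API calls. The LM is then finetuned on this new dataset with a standard language modeling objective; the finetuned LM can decide, based on its own feedback, when and how to use which tool.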
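The merging step amounts to inserting each kept call into the text at its position. This sketch works on character offsets instead of token positions, which is a simplification; `merge_api_calls` is a hypothetical name, and the example calls use the "[ ... -> ... ]" notation described earlier.

```python
from typing import List, Tuple


def merge_api_calls(text: str, calls: List[Tuple[int, str]]) -> str:
    """Interleave linearized calls e(c_i, r_i) with the original text.

    Each entry is (offset, call): the call is inserted right before the
    character at `offset` in the original text, followed by a space.
    """
    merged = text
    # Insert from right to left so that earlier offsets remain valid.
    for offset, call in sorted(calls, key=lambda item: item[0], reverse=True):
        merged = merged[:offset] + call + " " + merged[offset:]
    return merged


x = "Joe Biden was born in Scranton, Pennsylvania."
x_star = merge_api_calls(x, [
    (x.index("Scranton"), "[QA(Where was Joe Biden born?) -> Scranton]"),
    (x.index("Pennsylvania"), "[QA(In which state is Scranton?) -> Pennsylvania]"),
])
print(x_star)
```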

Inference

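When generating text with the LM M after it has been finetuned as above, regular decoding is performed until M produces the "→" token, indicating that it next expects the response to an API call. At this point, decoding is interrupted, the corresponding API is called to obtain the response, and decoding resumes after the response and the </API> token have been inserted.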
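The interrupt-and-resume decoding loop can be sketched with a stand-in model. Here `next_token` is any function that returns the next token for the current text; `scripted_lm` simply replays fixed tokens, and the calculator is a stub returning a fixed string. Both are hypothetical and only show where decoding pauses and how the response and the closing "]" (standing in for </API>) are spliced in.

```python
import re
from typing import Callable, Dict, Iterable

# Matches an open call such as "[Calculator(400 / 1400)" at the end of a string.
OPEN_CALL = re.compile(r"\[(\w+)\((.*)\)\s*$")


def decode_with_tools(
    next_token: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    prompt: str = "",
    max_tokens: int = 100,
    eos: str = "<eos>",
) -> str:
    text = prompt
    for _ in range(max_tokens):
        token = next_token(text)
        if token == eos:
            break
        text += token
        if token.strip() != "->":
            continue
        # The model now expects an API response: pause decoding, run the call,
        # then append the result and the closing "]" before resuming.
        start = text.rfind("[")
        match = OPEN_CALL.match(text[start:len(text) - len(token)]) if start >= 0 else None
        if match and match.group(1) in tools:
            name, arg = match.groups()
            text += " " + tools[name](arg) + "]"
    return text


def scripted_lm(tokens: Iterable[str]) -> Callable[[str], str]:
    """Stand-in for a finetuned LM: replays fixed tokens, ignoring the context."""
    it = iter(tokens)
    return lambda _context: next(it, "<eos>")


lm = scripted_lm([
    "Out of 1400 participants, 400 (or ",
    "[Calculator(400 / 1400)",
    " ->",
    " 29%) passed the test.",
])
output = decode_with_tools(lm, {"Calculator": lambda expr: "0.29"})  # stub tool
print(output)
```

With a real model, `next_token` would wrap the finetuned LM's decoder, and the model would condition its continuation on the inserted result.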

API Tools

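The paper places two requirements on the tools it uses: (i) both their inputs and outputs can be represented as text sequences, and (ii) a few demonstrations of their intended use can be obtained.

The paper uses five tools: a question answering system, a Wikipedia search engine, a calculator, a calendar, and a machine translation system.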

Limitations

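No chained tool use: Toolformer currently cannot use the output of one tool as the input to another (that is, it does not support chained tool calls). This is because the API calls for each tool are generated independently, so the finetuning dataset contains no examples of chained tool use.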

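No interactive tool use: the current approach does not let the LM use tools interactively. This matters especially for tools such as search engines that can return hundreds of different results; for some applications, being able to browse those results or refine the search query as needed may be crucial.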

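Sensitivity to input wording: when deciding whether to call an API, models trained with Toolformer are often very sensitive to the exact wording of the input, probably because LMs are very sensitive to prompts in zero-shot and few-shot settings.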

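Low sample efficiency: depending on the tool, the approach can be sample-inefficient. For example, processing more than a million documents may yield only a few thousand useful calculator API calls.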

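Computational cost not considered: when deciding whether to make an API call, Toolformer currently does not take into account the tool-dependent computational cost of making that call.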

Prompts Used (Appendix A.2)

Below are the prompts used to sample API calls for each tool:

Question Answering
We use the following prompt for the question answering tool:

Your task is to add calls to a Question Answering API to a piece of text. The questions should help you get information required to complete the text. You can call the API by writing "[QA(question)]" where "question" is the question you want to ask. Here are some examples of API calls:

Input: Joe Biden was born in Scranton, Pennsylvania. Output: Joe Biden was born in [QA("Where was Joe Biden born?")] Scranton, [QA("In which state is Scranton?")] Pennsylvania.

Input: Coca-Cola, or Coke, is a carbonated soft drink manufactured by the Coca-Cola Company. Output: Coca-Cola, or [QA("What other name is Coca-Cola known by?")] Coke, is a carbonated soft drink manufactured by [QA("Who manufactures Coca-Cola?")] the Coca-Cola Company.

Input: x Output:

Calculator
We use the following prompt for the calculator:

Your task is to add calls to a Calculator API to a piece of text. The calls should help you get information required to complete the text. You can call the API by writing "[Calculator(expression)]" where "expression" is the expression to be computed. Here are some examples of API calls:

Input: The number in the next term is 18 + 12 x 3 = 54. Output: The number in the next term is 18 + 12 x 3 = [Calculator(18 + 12 * 3)] 54.

Input: The population is 658,893 people. This is 11.4Output: The population is 658,893 people. This is 11.4

Input: A total of 252 qualifying matches were played, and 723 goals were scored (an average of 2.87 per match). This is three times less than the 2169 goals last year. Output: A total of 252 qualifying matches were played, and 723 goals were scored (an average of [Calculator(723 / 252)] 2.87 per match). This is twenty goals more than the [Calculator(723 - 20)] 703 goals last year.

Input: I went to Paris in 1994 and stayed there until 2011, so in total, it was 17 years. Output: I went to Paris in 1994 and stayed there until 2011, so in total, it was [Calculator(2011 - 1994)] 17 years.

Input: From this, we have 4 * 30 minutes = 120 minutes. Output: From this, we have 4 * 30 minutes = [Calculator(4 * 30)] 120 minutes.

Input: x Output:

Wikipedia Search
We use the following prompt for the Wikipedia search tool:

Your task is to complete a given piece of text. You can use a Wikipedia Search API to look up information. You can do so by writing "[WikiSearch(term)]" where "term" is the search term you want to look up. Here are some examples of API calls:

Input: The colors on the flag of Ghana have the following meanings: red is for the blood of martyrs, green for forests, and gold for mineral wealth. Output: The colors on the flag of Ghana have the following meanings: red is for [WikiSearch("Ghana flag red meaning")] the blood of martyrs, green for forests, and gold for mineral wealth.

Input: But what are the risks during production of nanomaterials? Some nanomaterials may give rise to various kinds of lung damage. Output: But what are the risks during production of nanomaterials? [WikiSearch("nanomaterial production risks")] Some nanomaterials may give rise to various kinds of lung damage.

Input: Metformin is the first-line drug for patients with type 2 diabetes and obesity. Output: Metformin is the first-line drug for [WikiSearch("Metformin first-line drug")] patients with type 2 diabetes and obesity.

Input: x Output:

Machine Translation
We use the following prompt for the machine translation tool:

Your task is to complete a given piece of text by using a Machine Translation API. You can do so by writing "[MT(text)]" where text is the text to be translated into English. Here are some examples:

Input: He has published one book: O homem suprimido (“The Supressed Man”) Output: He has published one book: O homem suprimido [MT(O homem suprimido)] (“The Supressed Man”)

Input: In Morris de Jonge’s Jeschuah, der klassische jüdische Mann, there is a description of a Jewish writer Output: In Morris de Jonge’s Jeschuah, der klassische jüdische Mann [MT(der klassische jüdische Mann)], there is a description of a Jewish writer

Input: 南京高淳县住房和城乡建设局城市新区设计 a plane of reference Gaochun is one of seven districts of the provincial capital Nanjing Output: [MT(南京高淳县住房和城乡建设局城市新区设计)] a plane of reference Gaochun is one of seven districts of the provincial capital Nanjing

Input: x Output:

Calendar
We use the following prompt for the calendar tool:

Your task is to add calls to a Calendar API to a piece of text. The API calls should help you get information required to complete the text. You can call the API by writing "[Calendar()]" Here are some examples of API calls:

Input: Today is the first Friday of the year. Output: Today is the first [Calendar()] Friday of the year.

Input: The president of the United States is Joe Biden. Output: The president of the United States is [Calendar()] Joe Biden.

Input: The current day of the week is Wednesday. Output: The current day of the week is [Calendar()] Wednesday.

Input: The number of days from now until Christmas is 30. Output: The number of days from now until Christmas is [Calendar()] 30.

Input: The store is never open on the weekend, so today it is closed. Output: The store is never open on the weekend, so today [Calendar()] it is closed.

Input: x Output: