[Paper Reading] Use Tools! "Toolformer: Language Models Can Teach Themselves to Use Tools"







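  • Tool use should be learned in a self-supervised way, without requiring large amounts of human annotation. This matters not only because annotation is costly, but also because what humans find useful may differ from what the model finds useful.
  • The language model should not lose any of its generality, and it should be able to decide for itself when and how to use which tool. Unlike existing approaches, this allows a much more comprehensive use of tools that is not tied to specific tasks.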


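Because this approach is agnostic to the dataset being used, it can be applied to exactly the same dataset that was used to pretrain the model in the first place. This ensures that the model does not lose any of its generality or language modeling ability. The paper runs experiments on a variety of downstream tasks and shows that, after learning to use tools, Toolformer, which is based on a pretrained GPT-J model with 6.7B parameters, achieves much stronger zero-shot results and clearly outperforms the much larger GPT-3 model and several other baselines on various tasks.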




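The goal of Toolformer is to equip a language model M with the ability to use different tools through API calls.
Each API is required to have inputs and outputs that can be represented as text sequences. This allows API calls to be inserted seamlessly into any given text, with special tokens marking the start and end of each call.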
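Each API call is represented as a tuple c = (a_c, i_c), where a_c is the name of the API and i_c is the corresponding input. Given an API call c with a corresponding result r, the linearized sequences of the call without and with its result are:

$$e(c) = \texttt{<API>}\ a_c(i_c)\ \texttt{</API>}$$
$$e(c, r) = \texttt{<API>}\ a_c(i_c) \rightarrow r\ \texttt{</API>}$$

Here "<API>", "</API>" and "→" are special tokens. In practice, to avoid modifying the vocabulary of the existing language model, the paper uses the token sequences "[", "]" and "->" to represent "<API>", "</API>" and "→". For readability, this section still refers to them as "<API>", "</API>" and "→".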
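The linearization can be illustrated with a few lines of Python. This is only a sketch: the function name `linearize` and the exact whitespace around the arrow are assumptions made here, not details taken from the paper. It uses the plain-text stand-ins "[", "]" and "->" for the special tokens.

```python
from typing import Optional

# Plain-text stand-ins for the special tokens <API>, </API> and the arrow.
API_START, API_END, ARROW = "[", "]", "->"


def linearize(api_name: str, api_input: str, result: Optional[str] = None) -> str:
    """Return e(c) when result is None, otherwise e(c, r)."""
    call = f"{api_name}({api_input})"
    if result is None:
        return f"{API_START}{call}{API_END}"
    # The spacing around the arrow is an assumption made for readability.
    return f"{API_START}{call} {ARROW} {result}{API_END}"


print(linearize("Calculator", "723 / 252"))          # [Calculator(723 / 252)]
print(linearize("Calculator", "723 / 252", "2.87"))  # [Calculator(723 / 252) -> 2.87]
```

Because both forms are ordinary strings, they can be spliced into any text without changing the tokenizer.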

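Given a dataset C of plain texts, it is converted into a dataset 𝒞∗ augmented with API calls. This is done in three steps, as illustrated in Figure 2. First, the in-context learning ability of the language model is used to sample a large number of potential API calls. These calls are then executed, and finally each response is checked for whether it helps predict future tokens, which serves as the filtering criterion. After filtering, the API calls for the different tools are merged, yielding the augmented dataset 𝒞∗, on which the language model is finetuned.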


The technical details of Toolformer are covered in turn below.

Sampling API Calls

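For each API, the paper writes a prompt P(x) containing demonstrations that encourage the language model to annotate a text x = x_1, …, x_n with API calls.
Figure 3 shows an example prompt for the question answering tool; Appendix A.2 lists all prompts used.
The first step is to compute, for every position in the text sequence, the probability of starting an API call there. A language model predicts the next token sequentially, so for each position i ∈ {1, …, n}, the probability that the next token is <API>, given the prompt P(x) and the preceding input x_{1:i−1}, is

$$p_i = p_M(\texttt{<API>} \mid P(x), x_{1:i-1})$$

Given a sampling threshold τ_s, every position whose probability exceeds the threshold is kept, i.e. I = {i | p_i > τ_s}; if there are more than k such positions, only the top k are kept.
Then, for each position i ∈ I where an API call is to be inserted, the language model generates up to m API calls, given the sequence [P(x), x_1, …, x_{i−1}, <API>] as a prefix and </API> as the end-of-sequence token.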
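The position-selection step can be sketched as follows. The sketch assumes the per-position probabilities of the API start token have already been computed by the model and are passed in as a list; the function name `select_positions` is hypothetical, positions are 0-indexed here, and "top k" is read as the k highest probabilities.

```python
from typing import List


def select_positions(api_probs: List[float], threshold: float, k: int) -> List[int]:
    """Pick candidate positions for API calls.

    api_probs[i] is the (precomputed) probability that the model starts an
    API call at position i. Positions above the threshold are kept; if more
    than k remain, only the k most probable ones survive.
    """
    candidates = [i for i, p in enumerate(api_probs) if p > threshold]
    top = sorted(candidates, key=lambda i: api_probs[i], reverse=True)[:k]
    return sorted(top)  # return in text order


probs = [0.01, 0.2, 0.07, 0.5, 0.03, 0.3]
print(select_positions(probs, threshold=0.05, k=2))  # [3, 5]
```

For each returned position, the model would then be sampled up to m times to produce candidate calls; that sampling step needs a real language model and is left out of the sketch.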

Executing API Calls

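This step is straightforward: all API calls generated in the previous step are executed to obtain the corresponding results. How this is done depends entirely on the API itself; for example, it may involve calling another neural network, executing a Python script, or using a retrieval system to search a large corpus. The response to each API call must be a single text sequence.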
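One way to picture this step is a registry that maps each tool name to a function taking text and returning text. The sketch below is an illustration only: the `TOOLS` registry, the `execute` helper, the restriction to the four basic arithmetic operators and the two-decimal formatting are all assumptions made here, not the paper's implementation.

```python
import ast
import operator
from typing import Callable, Dict

# Arithmetic operators the toy calculator accepts.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expression: str) -> str:
    """Evaluate a simple arithmetic expression and return the result as text."""

    def ev(node: ast.AST) -> float:
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression")

    value = ev(ast.parse(expression, mode="eval").body)
    return f"{value:.2f}"  # formatting choice is an assumption


# Every tool has the same text-in, text-out signature.
TOOLS: Dict[str, Callable[[str], str]] = {"Calculator": calculator}


def execute(api_name: str, api_input: str) -> str:
    return TOOLS[api_name](api_input)


print(execute("Calculator", "723 / 252"))  # 2.87
```

Keeping every tool behind the same string-to-string signature is what lets the response be dropped straight back into the text.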

Filtering API Calls

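How are calls filtered? Put simply, an API call is considered helpful and is kept if providing the model M with both the input and the output of the call makes it easier to predict future tokens than receiving no API call at all or receiving only its input; otherwise the call is filtered out.
Formally, let i be the position of the API call c_i in the sequence x = x_1, …, x_n, and let r_i be the response from the API. Given a sequence of weights (w_j), the paper defines:

$$L_i(z) = -\sum_{j=i}^{n} w_{j-i} \cdot \log p_M(x_j \mid z, x_{1:j-1})$$

L_i(z) is the weighted cross-entropy loss of M over the tokens x_i, …, x_n when the model is prefixed with z. Two instantiations of this loss are compared:

$$L_i^{+} = L_i(e(c_i, r_i))$$
$$L_i^{-} = \min\big(L_i(\varepsilon),\ L_i(e(c_i, \varepsilon))\big)$$

L_i^+ is the loss when the API call and its result are given to M as a prefix.
L_i^- is the minimum of the losses obtained when making no API call at all and when providing only the input of the API call but not its result. Here ε denotes the empty sequence.
A call is kept only if adding it and its result reduces the loss by at least a filtering threshold τ_f compared with the better of these two alternatives, i.e. if L_i^- − L_i^+ ≥ τ_f.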
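The filtering rule can be sketched numerically. The sketch assumes the log-probabilities of the tokens after position i have already been obtained from the model under the three prefixes (call with result, no call, call with input only). The function names, the example weights and the threshold `tau_f` are illustrative assumptions, not values taken from the source.

```python
from typing import Sequence


def weighted_loss(token_logprobs: Sequence[float], w: Sequence[float]) -> float:
    """Weighted cross-entropy: -sum_t w[t] * log p(x_{i+t} | prefix, earlier tokens)."""
    return -sum(wt * lp for wt, lp in zip(w, token_logprobs))


def keep_call(
    lp_with_result: Sequence[float],
    lp_no_call: Sequence[float],
    lp_input_only: Sequence[float],
    w: Sequence[float],
    tau_f: float,
) -> bool:
    """Keep the call if it lowers the loss by at least tau_f."""
    l_plus = weighted_loss(lp_with_result, w)
    l_minus = min(weighted_loss(lp_no_call, w), weighted_loss(lp_input_only, w))
    return l_minus - l_plus >= tau_f


w = [0.5, 0.3, 0.2]                # made-up weights favouring nearby tokens
with_result = [-0.1, -0.2, -0.3]   # loss 0.17
no_call = [-2.0, -1.0, -0.5]       # loss 1.40
input_only = [-1.5, -1.0, -0.5]    # loss 1.15
print(keep_call(with_result, no_call, input_only, w, tau_f=0.5))  # True
```

Taking the minimum over the two baselines means a call is not credited merely for the information already contained in its input.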


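After filtering, the remaining API calls are merged with the original inputs. That is, for an input text x with an API call and result (c_i, r_i) at position i, the corresponding call is inserted at that position to form the new sequence x* = x_{1:i−1}, e(c_i, r_i), x_{i:n}; texts with multiple API calls are handled analogously. This yields the new dataset 𝒞∗ augmented with API calls. The language model is then finetuned on this dataset with a standard language modeling objective, after which it can decide when and how to use which tool, based purely on its own feedback.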
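The merging step amounts to splicing each surviving call in front of the token at its position. A minimal sketch, assuming whitespace-split words stand in for tokens, 0-indexed positions, and at most one call per position (the function name `augment` is hypothetical):

```python
from typing import Dict, List


def augment(tokens: List[str], calls: Dict[int, str]) -> List[str]:
    """Insert each linearized call e(c_i, r_i) directly before token i."""
    out: List[str] = []
    for i, tok in enumerate(tokens):
        if i in calls:
            out.append(calls[i])
        out.append(tok)
    return out


text = "Joe Biden was born in Scranton, Pennsylvania.".split()
calls = {5: "[QA(Where was Joe Biden born?) -> Scranton]"}
print(" ".join(augment(text, calls)))
# Joe Biden was born in [QA(Where was Joe Biden born?) -> Scranton] Scranton, Pennsylvania.
```

Apart from the inserted calls, the augmented text is identical to the original, which is why finetuning on it is expected to preserve the model's general language modeling ability.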


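When generating text with the language model M finetuned as described above, regular decoding is performed until M produces the "→" token, indicating that it next expects the response to an API call. At this point the decoding process is interrupted, the appropriate API is called to get a response, and decoding continues after inserting both the response and the </API> token.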
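The interrupted decoding loop can be sketched with a stand-in for the model. Everything here is an assumption for illustration: `next_token` is a hypothetical callable that plays the role of the finetuned model, tokens are plain strings, the call is assumed to be well formed, and `toy_calculator` only handles a single multiplication.

```python
from typing import Callable, Dict, List


def generate_with_tools(
    next_token: Callable[[List[str]], str],
    tools: Dict[str, Callable[[str], str]],
    prompt: List[str],
    max_steps: int = 50,
    eos: str = "<eos>",
) -> List[str]:
    tokens = list(prompt)
    for _ in range(max_steps):
        tok = next_token(tokens)
        if tok == eos:
            break
        tokens.append(tok)
        if tok == "->" and "[" in tokens:
            # Decoding pauses here: recover the pending call, e.g. "Calculator(4 * 30)".
            start = len(tokens) - 1 - tokens[::-1].index("[")
            call = "".join(tokens[start + 1 : -1])
            name, _, arg = call.partition("(")
            if arg.endswith(")"):
                arg = arg[:-1]
            response = tools[name](arg)
            # Insert the response and the closing token, then resume decoding.
            tokens.extend([response, "]"])
    return tokens


def toy_calculator(expr: str) -> str:
    a, b = expr.split("*")
    return str(int(a) * int(b))


# A scripted "model" that emits a fixed continuation.
script = iter(["[", "Calculator", "(", "4 * 30", ")", "->", "minutes", "."])
out = generate_with_tools(
    lambda toks: next(script, "<eos>"), {"Calculator": toy_calculator}, ["we", "have"]
)
print(" ".join(out))  # we have [ Calculator ( 4 * 30 ) -> 120 ] minutes .
```

With a real model, `next_token` would wrap a decoding step, and the arrow would be detected on token ids rather than strings.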


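The paper imposes two requirements on the API tools it uses: (i) both the inputs and outputs of the tool can be represented as text sequences, and (ii) a few demonstrations of its intended use can be obtained.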








Prompts from Appendix A.2


Question Answering
We use the following prompt for the question answering tool:

Your task is to add calls to a Question Answering API to a piece of text. The questions should help you get information required to complete the text. You can call the API by writing "[QA(question)]" where "question" is the question you want to ask. Here are some examples of API calls:

Input: Joe Biden was born in Scranton, Pennsylvania.
Output: Joe Biden was born in [QA("Where was Joe Biden born?")] Scranton, [QA("In which state is Scranton?")] Pennsylvania.

Input: Coca-Cola, or Coke, is a carbonated soft drink manufactured by the Coca-Cola Company.
Output: Coca-Cola, or [QA("What other name is Coca-Cola known by?")] Coke, is a carbonated soft drink manufactured by [QA("Who manufactures Coca-Cola?")] the Coca-Cola Company.

Input: x
Output:

Calculator
We use the following prompt for the calculator:

Your task is to add calls to a Calculator API to a piece of text. The calls should help you get information required to complete the text. You can call the API by writing "[Calculator(expression)]" where "expression" is the expression to be computed. Here are some examples of API calls:

Input: The number in the next term is 18 + 12 x 3 = 54.
Output: The number in the next term is 18 + 12 x 3 = [Calculator(18 + 12 * 3)] 54.

Input: The population is 658,893 people. This is 11.4% …
Output: The population is 658,893 people. This is 11.4% …

Input: A total of 252 qualifying matches were played, and 723 goals were scored (an average of 2.87 per match). This is three times less than the 2169 goals last year.
Output: A total of 252 qualifying matches were played, and 723 goals were scored (an average of [Calculator(723 / 252)] 2.87 per match). This is twenty goals more than the [Calculator(723 - 20)] 703 goals last year.

Input: I went to Paris in 1994 and stayed there until 2011, so in total, it was 17 years.
Output: I went to Paris in 1994 and stayed there until 2011, so in total, it was [Calculator(2011 - 1994)] 17 years.

Input: From this, we have 4 * 30 minutes = 120 minutes.
Output: From this, we have 4 * 30 minutes = [Calculator(4 * 30)] 120 minutes.

Input: x
Output:

Wikipedia Search
We use the following prompt for the Wikipedia search tool:

Your task is to complete a given piece of text. You can use a Wikipedia Search API to look up information. You can do so by writing "[WikiSearch(term)]" where "term" is the search term you want to look up. Here are some examples of API calls:

Input: The colors on the flag of Ghana have the following meanings: red is for the blood of martyrs, green for forests, and gold for mineral wealth.
Output: The colors on the flag of Ghana have the following meanings: red is for [WikiSearch("Ghana flag red meaning")] the blood of martyrs, green for forests, and gold for mineral wealth.

Input: But what are the risks during production of nanomaterials? Some nanomaterials may give rise to various kinds of lung damage.
Output: But what are the risks during production of nanomaterials? [WikiSearch("nanomaterial production risks")] Some nanomaterials may give rise to various kinds of lung damage.

Input: Metformin is the first-line drug for patients with type 2 diabetes and obesity.
Output: Metformin is the first-line drug for [WikiSearch("Metformin first-line drug")] patients with type 2 diabetes and obesity.

Input: x
Output:

Machine Translation
We use the following prompt for the machine translation tool:

Your task is to complete a given piece of text by using a Machine Translation API. You can do so by writing "[MT(text)]" where text is the text to be translated into English. Here are some examples:

Input: He has published one book: O homem suprimido (“The Supressed Man”)
Output: He has published one book: O homem suprimido [MT(O homem suprimido)] (“The Supressed Man”)

Input: In Morris de Jonge’s Jeschuah, der klassische jüdische Mann, there is a description of a Jewish writer
Output: In Morris de Jonge’s Jeschuah, der klassische jüdische Mann [MT(der klassische jüdische Mann)], there is a description of a Jewish writer

Input: 南京高淳县住房和城乡建设局 城市新区设计 a plane of reference Gaochun is one of seven districts of the provincial capital Nanjing
Output: [MT(南京高淳县住房和城乡建设局 城市新区设计)] a plane of reference Gaochun is one of seven districts of the provincial capital Nanjing

Input: x
Output:

Calendar
We use the following prompt for the calendar tool:

Your task is to add calls to a Calendar API to a piece of text. The API calls should help you get information required to complete the text. You can call the API by writing "[Calendar()]". Here are some examples of API calls:

Input: Today is the first Friday of the year.
Output: Today is the first [Calendar()] Friday of the year.

Input: The president of the United States is Joe Biden.
Output: The president of the United States is [Calendar()] Joe Biden.

Input: The current day of the week is Wednesday.
Output: The current day of the week is [Calendar()] Wednesday.

Input: The number of days from now until Christmas is 30.
Output: The number of days from now until Christmas is [Calendar()] 30.

Input: The store is never open on the weekend, so today it is closed.
Output: The store is never open on the weekend, so today [Calendar()] it is closed.

Input: x
Output:





