GPT最佳实践

官方文档:https://platform.openai.com/docs/guides/gpt-best-practices

获得更好结果的六种策略

1 写清楚指示

1.1 在你的请求中包含更多细节以获取更相关的答案

WorseBetter
How do I add numbers in Excel?How do I add up a row of dollar amounts in Excel? I want to do this automatically for a whole sheet of rows with all the totals ending up on the right in a column called “Total”.
Who’s president?Who was the president of Mexico in 2021, and how frequently are elections held?
Write code to calculate the Fibonacci sequence.Write a TypeScript function to efficiently calculate the Fibonacci sequence. Comment the code liberally to explain what each piece does and why it’s written that way.
Summarize the meeting notes.Summarize the meeting notes in a single paragraph. Then write a markdown list of the speakers and each of their key points. Finally, list the next steps or action items suggested by the speakers, if any.

1.2 让模型扮演一个角色

system消息可用于指定模型在其回复中使用的角色。例如:

SYSTEM
When I ask for help to write something, you will reply with a document that contains at least one joke or playful comment in every paragraph.
USER
Write a thank you note to my steel bolt vendor for getting the delivery in on time and in short notice. This made it possible for us to deliver an important order.

在这里插入图片描述

1.3 使用分隔符清楚地指示输入的不同部分

三引号、XML标记、分段标题等分隔符可以帮助划分要区别对待的文本部分。

对于此类简单的任务,使用分隔符可能不会对输出质量产生影响。然而,任务越复杂,消除任务细节的歧义就越重要。不要让GPTs工作来准确理解你对它们的要求。

三引号

USER
Summarize the text delimited by triple quotes with a haiku.

"""insert text here"""

XML标记

SYSTEM
You will be provided with a pair of articles (delimited with XML tags) about the same topic. First summarize the arguments of each article. Then indicate which of them makes a better argument and explain why.
USER
<article> insert first article here </article>

<article> insert second article here </article>

分段标题

SYSTEM
You will be provided with a thesis abstract and a suggested title for it. The thesis title should give the reader a good idea of the topic of the thesis but should also be eye-catching. If the title does not meet these criteria, suggest 5 alternatives.
USER
Abstract: insert abstract here

Title: insert title here

1.4 指定完成某个任务所需的步骤

一些任务最好指定为一系列步骤。明确地写出步骤可以使模型更容易遵循它们。

SYSTEM
Use the following step-by-step instructions to respond to user inputs.

Step 1 - The user will provide you with text in triple quotes. Summarize this text in one sentence with a prefix that says "Summary: ".

Step 2 - Translate the summary from Step 1 into Spanish, with a prefix that says "Translation: ".
USER
"""insert text here"""

1.5 提供示例

这也被称为“few-shot”提示。

SYSTEM
Answer in a consistent style.
USER
Teach me about patience.
ASSISTANT
The river that carves the deepest valley flows from a modest spring; the grandest symphony originates from a single note; the most intricate tapestry begins with a solitary thread.
USER
Teach me about the ocean.

1.6 指定期望的输出长度

您可以要求模型生成具有给定目标长度的输出。目标输出长度可以根据单词、句子、段落、要点等的数量来指定。但是请注意,指示模型生成特定数量的单词并不具有高精度。在指定了具体几个段落或bullet point后模型可以更可靠地生成。

指定单词数

USER
Summarize the text delimited by triple quotes in about 50 words.

"""insert text here"""

指定段落数

USER
Summarize the text delimited by triple quotes in 2 paragraphs.

"""insert text here"""

指定 bullet point数

USER
Summarize the text delimited by triple quotes in 3 bullet points.

"""insert text here"""

2 提供参考文本

GPTs可以自信地编造虚假答案,尤其是当被问及深奥的话题或引文和网址时。就像一张笔记可以帮助学生在考试中做得更好一样,向GPTs提供参考文本可以帮助他们以更少的捏造回答。

2.1 指示模型依据参考文本进行回答

如果我们可以为模型提供与当前查询相关的可信信息,那么我们可以指示模型使用提供的信息来编写其答案。

SYSTEM
Use the provided articles delimited by triple quotes to answer questions. If the answer cannot be found in the articles, write "I could not find an answer."
USER
<insert articles, each delimited by triple quotes>

Question: <insert question here>

鉴于GPTs的上下文窗口有限,为了应用这种策略,我们需要某种方法来动态查找与所问问题相关的信息。Embeddings可用于实现高效的知识检索。

2.2 指示模型使用参考文本中的引用来回答问题

如果输入已经补充了相关知识,那么可以通过引用提供的文档中的段落,直接请求模型在其答案中添加引用。请注意,输出中的引用可以通过在提供的文档中进行字符串匹配来进行程序验证。

SYSTEM
You will be provided with a document delimited by triple quotes and a question. Your task is to answer the question using only the provided document and to cite the passage(s) of the document used to answer the question. If the document does not contain the information needed to answer this question then simply write: "Insufficient information." If an answer to the question is provided, it must be annotated with a citation. Use the following format for to cite relevant passages ({"citation":}).
USER
"""<insert document here>"""

Question: <insert question here>

3 将复杂任务分解为多个简单子任务

复杂的任务往往比简单的任务有更高的错误率。此外,复杂的任务通常可以被重新定义为更简单任务的工作流,在这种工作流中,早期任务的输出被用来构造以后任务的输入。

3.1 使用意图分类来识别与用户请求最相关的指令

对于需要大量不同指令来处理不同case的任务,首先对查询类型进行分类并使用该分类确定所需的指令可能是有益的。这可以通过定义固定类别并对与处理给定类别任务相关的指令进行硬编码来实现。这个过程也可以递归地应用于将任务分解为一系列阶段。这种方法的优点是每个请求只包含执行任务下一个阶段所需的指令,与使用单个请求执行整个任务相比具有更低的错误率。这也可能使得成本更低,因为更大的prompt费用更高。

SYSTEM
You will be provided with customer service queries. Classify each query into a primary category and a secondary category. Provide your output in json format with the keys: primary and secondary.

Primary categories: Billing, Technical Support, Account Management, or General Inquiry.

Billing secondary categories:
- Unsubscribe or upgrade
- Add a payment method
- Explanation for charge
- Dispute a charge

Technical Support secondary categories:
- Troubleshooting
- Device compatibility
- Software updates

Account Management secondary categories:
- Password reset
- Update personal information
- Close account
- Account security

General Inquiry secondary categories:
- Product information
- Pricing
- Feedback
- Speak to a human
USER
I need to get my internet working again.

在这里插入图片描述

根据客户请求的分类,可以向GPT模型提供一组更具体的指令来处理后续步骤。例如,假设客户需要“故障排除”方面的帮助。

SYSTEM
You will be provided with customer service inquiries that require troubleshooting in a technical support context. Help the user by:

- Ask them to check that all cables to/from the router are connected. Note that it is common for cables to come loose over time.
- If all cables are connected and the issue persists, ask them which router model they are using
- Now you will advise them how to restart their device:
-- If the model number is MTD-327J, advise them to push the red button and hold it for 5 seconds, then wait 5 minutes before testing the connection.
-- If the model number is MTD-327S, advise them to unplug and replug it, then wait 5 minutes before testing the connection.
- If the customer's issue persists after restarting the device and waiting 5 minutes, connect them to IT support by outputting {"IT support requested"}.
- If the user starts asking questions that are unrelated to this topic then confirm if they would like to end the current chat about troubleshooting and classify their request according to the following scheme:

<insert primary/secondary classification scheme from above here>
USER
I need to get my internet working again.

在这里插入图片描述

请注意,模型已被指示发出特殊字符串来指示对话状态何时发生变化。这使我们能够将我们的系统变成一个状态机,状态决定注入哪些指令。通过跟踪状态,在该状态下哪些指令是相关的,以及允许从该状态进行哪些状态转换,我们可以在用户体验周围设置护栏,这在结构较少的方法中很难实现。

3.2 对于需要很长对话的对话应用程序,总结或过滤以前的对话

由于GPT具有固定的上下文长度,因此用户和助手之间的对话(整个对话都包含在上下文窗口中)不能无限期地继续。 这个问题有各种变通方法,其中之一是总结对话中的先前回合。一旦输入的大小达到预定的阈值长度,这可能会触发一个总结部分对话的查询,并且先前对话的摘要可以作为系统消息的一部分包含在内。或者,先前的对话可以在整个对话中在后台异步汇总。 另一种解决方案是动态选择与当前查询最相关的对话的先前部分。参考使用Embedding。

3.3 将长文档分段化并递归构造完整摘要

由于GPT具有固定的上下文长度,因此它们不能用于汇总超过上下文长度减去单个查询中生成的摘要长度的文本。 要总结一个很长的文档,比如一本书,我们可以使用一系列查询来总结文档的每个部分。部分摘要可以被连接和总结,产生摘要的摘要。这个过程可以递归进行,直到整个文档被总结。如果有必要使用关于前面部分的信息来理解后面的部分,那么另一个有用的技巧是在总结书中任何给定点之前包含文本的运行摘要,同时总结该点的内容。OpenAI使用GPT-3的变体在之前的研究中研究了总结书籍的过程的有效性。

4 给GPT一些思考时间

在回答之前要求一系列推理可以帮助GPTs更可靠地推理正确答案。

4.1 指示模型在匆忙下结论之前先得到自己的解决方案

有时,当我们明确指示模型在得出结论之前从第一原理进行推理时,我们会得到更好的结果。例如,假设我们想要一个模型来评估学生对数学问题的解决方案。最明显的方法是简单地询问模型学生的解决方案是否正确。

SYSTEM
Determine if the student's solution is correct or not.
USER
Problem Statement: I'm building a solar power installation and I need help working out the financials.
- Land costs $100 / square foot
- I can buy solar panels for $250 / square foot
- I negotiated a contract for maintenance that will cost me a flat $100k per year, and an additional $10 / square foot
What is the total cost for the first year of operations as a function of the number of square feet.

Student's Solution: Let x be the size of the installation in square feet.
1. Land cost: 100x
2. Solar panel cost: 250x
3. Maintenance cost: 100,000 + 100x
Total cost: 100x + 250x + 100,000 + 100x = 450x + 100,000
ASSISTANT
The student's solution is correct.

但是学生的解决方案实际上是不正确的!我们可以通过提示模型首先生成自己的解决方案来让模型成功地注意到这一点。

SYSTEM
First work out your own solution to the problem. Then compare your solution to the student's solution and evaluate if the student's solution is correct or not. Don't decide if the student's solution is correct until you have done the problem yourself.
USER
Problem Statement: I'm building a solar power installation and I need help working out the financials.
- Land costs $100 / square foot
- I can buy solar panels for $250 / square foot
- I negotiated a contract for maintenance that will cost me a flat $100k per year, and an additional $10 / square foot
What is the total cost for the first year of operations as a function of the number of square feet.

Student's Solution: Let x be the size of the installation in square feet.
1. Land cost: 100x
2. Solar panel cost: 250x
3. Maintenance cost: 100,000 + 100x
Total cost: 100x + 250x + 100,000 + 100x = 450x + 100,000

ASSISTANT
Let x be the size of the installation in square feet.
1. Land cost: 100x
2. Solar panel cost: 250x
3. Maintenance cost: 100,000 + 10x
Total cost: 100x + 250x + 100,000 + 10x = 360x + 100,000

The student's solution is incorrect. They made an error in the maintenance cost calculation, using 100x instead of 10x. The correct total cost for the first year of operations is 360x + 100,000.

4.2 使用内心独白或一系列的问题来隐藏模型的推理过程

先前的策略表明,在回答特定问题之前,模型有时需要详细地思考问题。对于某些应用程序,模型用于得出最终答案的推理过程可能不适合与用户共享。例如,在辅导应用程序中,我们可能希望鼓励学生自己解决问题,但模型关于学生解决方案的推理过程可能会向学生透露答案。

内心独白是一种可用于缓解这种情况的策略。内心独白的想法是指示模型将那些不应该向用户显示的输出部分放入一个结构化格式中,以便于解析。然后,在向用户呈现输出之前,对输出进行解析,只有部分输出被显示出来。

SYSTEM
Follow these steps to answer the user queries.

Step 1 - First work out your own solution to the problem. Don't rely on the student's solution since it may be incorrect. Enclose all your work for this step within triple quotes (""").

Step 2 - Compare your solution to the student's solution and evaluate if the student's solution is correct or not. Enclose all your work for this step within triple quotes (""").

Step 3 - If the student made a mistake, determine what hint you could give the student without giving away the answer. Enclose all your work for this step within triple quotes (""").

Step 4 - If the student made a mistake, provide the hint from the previous step to the student (outside of triple quotes). Instead of writing "Step 4 - ..." write "Hint:".
USER
Problem Statement: <insert problem statement>

Student Solution: <insert student solution>

或者,可以通过一系列查询来实现,其中除了最后一个查询外,所有查询的输出都对最终用户隐藏。 首先,我们可以要求模型自行解决问题。由于这个初始查询不需要学生的解决方案,因此可以省略。这提供了额外的优势,即模型的解决方案不会受到学生尝试的解决方案的影响。

接下来,我们可以让模型使用所有可用的信息来评估学生解决方案的正确性。

SYSTEM
Compare your solution to the student's solution and evaluate if the student's solution is correct or not.
USER
Problem statement: """<insert problem statement>"""

Your solution: """<insert model generated solution>"""

Student’s solution: """<insert student's solution>"""

最后,我们可以让模型利用自身的分析能力,在有用的导师角色中构建回复。

SYSTEM
You are a math tutor. If the student made an error, offer a hint to the student in a way that does not reveal the answer. If the student did not make an error, simply offer them an encouraging comment.
USER
Problem statement: """<insert problem statement>"""

Your solution: """<insert model generated solution>"""

Student’s solution: """<insert student's solution>"""

Analysis: """<insert model generated analysis from previous step>"""

4.3 询问模型在之前的结果中是否有遗漏的内容

假设我们正在使用一个模型来列出与特定问题相关的源摘录。在列出每个摘录后,模型需要确定它是否应该开始写另一个摘录,还是停止。如果源文档很大,模型通常会过早停止并未能列出所有相关的摘录。在这种情况下,通过提示模型进行后续查询,以找到之前漏掉的任何摘录,可以获得更好的性能。

SYSTEM
You will be provided with a document delimited by triple quotes. Your task is to select excerpts which pertain to the following question: "What significant paradigm shifts have occurred in the history of artificial intelligence."

Ensure that excerpts contain all relevant context needed to interpret them - in other words don't extract small snippets that are missing important context. Provide output in JSON format as follows:

[{"excerpt": "..."},
...
{"excerpt": "..."}]
USER
"""<insert document here>"""
ASSISTANT
[{"excerpt": "the model writes an excerpt here"},
...
{"excerpt": "the model writes another excerpt here"}]
USER
Are there more relevant excerpts? Take care not to repeat excerpts. Also ensure that excerpts contain all relevant context needed to interpret them - in other words don't extract small snippets that are missing important context.

5 使用外部工具

通过向GPTs提供其他工具的输出来弥补GPTs的弱点。例如,文本检索系统可以告诉GPTs相关文档。代码执行引擎可以帮助GPTs做数学和运行代码。如果一个任务可以通过工具而不是GPT更可靠或更有效地完成,卸载它以获得两者的最佳效果。

5.1 使用基于嵌入的搜索来实现高效的知识检索

如果提供作为其输入的外部信息,模型可以利用外部信息。这可以帮助模型生成更具见解和最新的响应。例如,如果用户询问有关特定电影的问题,将高质量的电影信息(例如演员,导演等)添加到模型的输入中可能会很有用。嵌入可以用于实现有效的知识检索,以便在运行时动态地将相关信息添加到模型输入中。

文本嵌入是可以衡量文本字符串相关性的向量。相似或相关的字符串将比不相关的字符串更接近。这个事实以及快速向量搜索算法的存在意味着可以使用嵌入来实现有效的知识检索。特别地,文本语料库可以分成块,每个块可以被嵌入和存储。然后可以嵌入给定的查询并执行向量搜索,以找到与查询最相关的文本块(即在嵌入空间中最接近的文本块)。

5.2 使用代码执行进行更精确的计算或调用外部API

GPTs不能单独准确地执行算术或长时间的计算。在需要这样做的情况下,可以指示模型编写和运行代码,而不是使用GPT自己去计算。特别是,可以指示模型将要运行的代码放入指定的格式,例如三个反引号。在产生输出后,可以提取并运行代码。最后,如果必要,代码执行引擎(例如Python解释器)的输出可以作为下一个查询的输入提供给模型。

SYSTEM
You can write and execute Python code by enclosing it in triple backticks, e.g. ```code goes here```. Use this to perform calculations.
USER
Find all real-valued roots of the following polynomial: 3*x**5 - 5*x**4 - 3*x**3 - 7*x - 10.

在这里插入图片描述

另一个代码执行的好用例是调用外部API。如果模型被指导正确使用API,它可以编写利用它的代码。可以通过提供文档和/或代码示例来指导模型如何使用API。

SYSTEM
You can write and execute Python code by enclosing it in triple backticks. Also note that you have access to the following module to help users send messages to their friends:

```python
import message
message.write(to="John", message="Hey, want to meetup after work?")```

在这里插入图片描述

警告:执行由模型生成的代码本质上并不安全,任何试图执行此操作的应用程序都应该采取预防措施。特别是需要一个沙盒式的代码执行环境来限制不受信任的代码可能造成的危害。

6 系统地测试更改

如果你能测量性能,提高性能会更容易。在某些情况下,对提示符的修改会在几个孤立的示例上实现更好的性能,但会在更具代表性的示例集上导致更差的整体性能。因此,为了确保更改对性能是净积极的,可能有必要定义一个全面的测试套件(也称为“eval”)。

有时候很难判断一个变化(例如新的指令或新的设计)是让你的系统变得更好还是更糟。看一些例子可能会暗示哪个更好,但是在样本量很小的情况下,很难区分真正的改进还是随机运气。也许这个变化会在某些输入上提高性能,但在其他输入上会降低性能。

评估程序(或“评估”)对于优化系统设计非常有用。好的评估应该具有以下特点:

  • 代表真实世界的使用(或至少多样化)
  • 包含许多测试用例以获得更大的统计能力(请参见下表的指南)
  • 易于自动化或重复执行
DIFFERENCE TO DETECTSAMPLE SIZE NEEDED FOR 95% CONFIDENCE
30%~10
10%~100
3%~1,000
1%~10,000

输出的评估可以由计算机、人类或混合进行。计算机可以使用客观标准(例如,具有单个正确答案的问题)自动化评估,以及一些主观或模糊标准,在这些标准中,模型输出由其他模型查询进行评估。OpenAI Evals是一个开源软件框架,提供创建自动化评估工具的工具。

当存在一系列可能被认为具有相同高质量的输出时(例如,对于长答案的问题),基于模型的评估可以很有用。基于模型的评估和需要人类评估的界限是模糊的,并且随着模型变得更加强大而不断变化。我们鼓励尝试实验,以确定基于模型的评估在您的用例中可以发挥多大作用。

6.1 参考黄金标准答案评估模型输出

假设已知一个问题的正确答案应该涉及到一组特定的已知事实。那么我们可以使用一个模型查询来计算答案中包含了多少个必要的事实。

例如,使用以下系统消息:

SYSTEM
You will be provided with text delimited by triple quotes that is supposed to be the answer to a question. Check if the following pieces of information are directly contained in the answer:

- Neil Armstrong was the first person to walk on the moon.
- The date Neil Armstrong first walked on the moon was July 21, 1969.

For each of these points perform the following steps:

1 - Restate the point.
2 - Provide a citation from the answer which is closest to this point.
3 - Consider if someone reading the citation who doesn't know the topic could directly infer the point. Explain why or why not before making up your mind.
4 - Write "yes" if the answer to 3 was yes, otherwise write "no".

Finally, provide a count of how many "yes" answers there are. Provide this count as {"count": <insert count here>}.

这是两个点都满足的示例输入:

SYSTEM
<insert system message above>
USER
"""Neil Armstrong is famous for being the first human to set foot on the Moon. This historic event took place on July 21, 1969, during the Apollo 11 mission."""

在这里插入图片描述

这是一个仅满足一个点的示例输入:

SYSTEM
<insert system message above>
USER
"""Neil Armstrong made history when he stepped off the lunar module, becoming the first person to walk on the moon."""

在这里插入图片描述

这是一个没有满足任何一点的示例输入:

SYSTEM
<insert system message above>
USER
"""In the summer of '69, a voyage grand,
Apollo 11, bold as legend's hand.
Armstrong took a step, history unfurled,
"One small step," he said, for a new world."""

有许多可能的基于模型的评估变体。考虑以下变体,它跟踪候选答案与黄金标准答案之间的重叠类型,并跟踪候选答案是否与黄金标准答案的任何部分相矛盾。

SYSTEM
Use the following steps to respond to user inputs. Fully restate each step before proceeding. i.e. "Step 1: Reason...".

Step 1: Reason step-by-step about whether the information in the submitted answer compared to the expert answer is either: disjoint, equal, a subset, a superset, or overlapping (i.e. some intersection but not subset/superset).

Step 2: Reason step-by-step about whether the submitted answer contradicts any aspect of the expert answer.

Step 3: Output a JSON object structured like: {"type_of_overlap": "disjoint" or "equal" or "subset" or "superset" or "overlapping", "contradiction": true or false}

以下是一个示例输入,其答案不符合标准,但与专家答案并不矛盾:

SYSTEM
<insert system message above>
USER
Question: """What event is Neil Armstrong most famous for and on what date did it occur? Assume UTC time."""

Submitted Answer: """Didn't he walk on the moon or something?"""

Expert Answer: """Neil Armstrong is most famous for being the first person to walk on the moon. This historic event occurred on July 21, 1969."""

在这里插入图片描述

这是一个与专家答案直接矛盾的示例输入:

SYSTEM
<insert system message above>
USER
Question: """What event is Neil Armstrong most famous for and on what date did it occur? Assume UTC time."""

Submitted Answer: """On the 21st of July 1969, Neil Armstrong became the second person to walk on the moon, following after Buzz Aldrin."""

Expert Answer: """Neil Armstrong is most famous for being the first person to walk on the moon. This historic event occurred on July 21, 1969."""

在这里插入图片描述

在这里插入图片描述

这是一个带有正确答案的示例输入,它还提供了比必要的更多的细节:

SYSTEM
<insert system message above>
USER
Question: """What event is Neil Armstrong most famous for and on what date did it occur? Assume UTC time."""

Submitted Answer: """At approximately 02:56 UTC on July 21st 1969, Neil Armstrong became the first human to set foot on the lunar surface, marking a monumental achievement in human history."""

Expert Answer: """Neil Armstrong is most famous for being the first person to walk on the moon. This historic event occurred on July 21, 1969."""

在这里插入图片描述

7 其他资源

For more inspiration, visit the OpenAI Cookbook, which contains example code and also links to third-party resources such as:

  • 1
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值