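Introduction to Self-Instruct

Self-Instruct: Aligning Language Models with Self-Generated Instructions

GitHub: self-instruct

Background

Large language models perform impressively, but they depend heavily on human-written instruction data, and annotating that data by hand is expensive. The paper proposes the Self-Instruct framework: a method that uses a large model to generate instruction data automatically.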

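Main steps

1. Instruction Generation

This step generates new instructions. The authors first wrote 175 seed tasks (each with one instruction and one instance) to initialize the task pool. At every step, 8 task instructions are sampled from the pool as in-context examples: 6 come from the human-written tasks and 2 come from instructions generated in earlier steps, which promotes diversity. The sampled instructions are assembled into a prompt following Table 6 of the paper, and the model is asked to generate new instructions.

1.1 Hyperparameters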
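The sampling described above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's code: the function names `sample_prompt_instructions` and `encode_prompt` and the header line "Come up with a series of tasks:" are assumptions made for the example, and the fallback to seed-only sampling when no generated instructions exist yet is also an assumption.

```python
import random


def sample_prompt_instructions(seed_instructions, machine_instructions,
                               num_total=8, num_machine=2, rng=random):
    """Pick in-context examples: up to `num_machine` generated ones, the rest human-written."""
    k_machine = min(num_machine, len(machine_instructions))
    picked = rng.sample(machine_instructions, k_machine)
    # Assumption: early on the pool has no generated instructions, so seeds fill every slot.
    picked += rng.sample(seed_instructions, num_total - k_machine)
    rng.shuffle(picked)
    return picked


def encode_prompt(instructions):
    """Number the examples and leave the next number open for the model to continue."""
    lines = ["Come up with a series of tasks:"]  # hypothetical header wording
    lines += [f"{i}. {text.strip()}" for i, text in enumerate(instructions, start=1)]
    lines.append(f"{len(instructions) + 1}.")
    return "\n".join(lines)
```

With 8 examples the prompt ends in "9.", so the model continues the numbered list from there.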

results = make_gpt3_requests(
    engine=args.engine,  # davinci
    prompts=batch_inputs,
    max_tokens=1024,
    temperature=0.7,
    top_p=0.5,
    frequency_penalty=0,
    presence_penalty=2,
    stop_sequences=["\n\n", "\n16", "16.", "16 ."],
    logprobs=1,
    n=1,
    best_of=1,
    api_key=args.api_key,
    organization=args.organization,
)

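2. Classification Task Identification

Classification and non-classification tasks are handled differently, so this step decides whether a generated instruction is a classification task. The few-shot prompt uses 12 classification instructions and 19 non-classification instructions drawn from the seed tasks.

The template is fixed; unlike the Table 6 prompt, it is not re-sampled at each step.

The template is as follows: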

template_1 = '''Can the following task be regarded as a classification task with finite output labels?

Task: Given my personality and the job, tell me if I would be suitable.
Is it classification? Yes

Task: Give me an example of a time when you had to use your sense of humor.
Is it classification? No

Task: Replace the placeholders in the given text with appropriate named entities.
Is it classification? No

Task: Fact checking - tell me if the statement is true, false, or unknown, based on your knowledge and common sense.
Is it classification? Yes

Task: Return the SSN number for the person.
Is it classification? No

Task: Detect if the Reddit thread contains hate speech.
Is it classification? Yes

Task: Analyze the sentences below to identify biases.
Is it classification? No

Task: Select the longest sentence in terms of the number of words in the paragraph, output the sentence index.
Is it classification? Yes

Task: Find out the toxic word or phrase in the sentence.
Is it classification? No

Task: Rank these countries by their population.
Is it classification? No

Task: You are provided with a news article, and you need to identify all the categories that this article belongs to. Possible categories include: Music, Sports, Politics, Tech, Finance, Basketball, Soccer, Tennis, Entertainment, Digital Game, World News. Output its categories one by one, seperated by comma.
Is it classification? Yes

Task: Given the name of an exercise, explain how to do it.
Is it classification? No

Task: Select the oldest person from the list.
Is it classification? Yes

Task: Find the four smallest perfect numbers.
Is it classification? No

Task: Does the information in the document supports the claim? You can answer "Support" or "Unsupport".
Is it classification? Yes

Task: Create a detailed budget for the given hypothetical trip.
Is it classification? No

Task: Given a sentence, detect if there is any potential stereotype in it. If so, you should explain the stereotype. Else, output no.
Is it classification? No

Task: Explain the following idiom to me, and try to give me some examples.
Is it classification? No

Task: Is there anything I can eat for a breakfast that doesn't include eggs, yet includes protein, and has roughly 700-1000 calories?
Is it classification? No

Task: Answer the following multiple choice question. Select A, B, C, or D for the final answer.
Is it classification? Yes

Task: Decide whether the syllogism is logically sound.
Is it classification? Yes

Task: How can individuals and organizations reduce unconscious bias?
Is it classification? No

Task: What are some things you can do to de-stress?
Is it classification? No

Task: Find out the largest one from a set of numbers. Output the number directly.
Is it classification? Yes

Task: Replace the <mask> token in the text with proper words that are consistent with the context. You can use multiple words for each <mask> token.
Is it classification? No

Task: Write a cover letter based on the given facts.
Is it classification? No

Task: Identify the pos tag of the word in the given sentence.
Is it classification? Yes

Task: Write a program to compute the sum of integers from k to n.
Is it classification? No

Task: In this task, you need to compare the meaning of the two sentences and tell if they are the same. Output yes or no.
Is it classification? Yes

Task: To make the pairs have the same analogy, write the fourth word.
Is it classification? No

Task: Given a set of numbers, find all possible subsets that sum to a given number.
Is it classification? No

Task: {instruction for the target task}'''

2.1 Hyperparameters

results = make_gpt3_requests(
    engine=args.engine,  # davinci
    prompts=prompts,
    max_tokens=3,
    temperature=0,
    top_p=0,
    frequency_penalty=0,
    presence_penalty=0,
    stop_sequences=["\n", "Task"],
    logprobs=1,
    n=1,
    best_of=1,
    api_key=args.api_key,
    organization=args.organization)
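To make the step concrete, here is a small sketch of how the fixed template might be filled in and how the very short completion (max_tokens=3) could be turned into a boolean. The helper names `build_clf_prompt` and `parse_is_classification` are hypothetical, and treating every answer other than "Yes" as non-classification is an assumption of this sketch.

```python
def build_clf_prompt(template, instruction):
    """Fill the fixed few-shot template and cue the model for a Yes/No answer."""
    filled = template.replace("{instruction for the target task}", instruction.strip())
    return filled + "\nIs it classification?"


def parse_is_classification(completion):
    """Map the completion to a boolean; anything that does not start with 'yes' is False."""
    return completion.strip().lower().startswith("yes")
```

Because temperature and top_p are both 0, the same instruction should receive the same label on every run, which suits a yes/no decision like this one.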

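3. Instance Generation

This step generates instances for each instruction, using one of two approaches: the Input-first Approach or the Output-first Approach.

Note:

1. The prompt templates for both approaches are fixed, so only a few instances are generated per task. The paper also notes that future work could build the prompt templates by random sampling in order to generate more instances per task.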
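The choice between the two approaches follows from the classification label of the previous step. A minimal sketch of that routing, assuming the two templates are available as strings; the function name `build_instance_prompt` is hypothetical, and the two placeholder spellings are taken from the templates quoted in this post:

```python
# The two templates in this post spell the placeholder with different capitalization.
PLACEHOLDERS = (
    "{instruction for the target task}",
    "{Instruction for the target task}",
)


def build_instance_prompt(instruction, is_classification,
                          input_first_template, output_first_template):
    """Pick the output-first template for classification tasks, input-first otherwise."""
    template = output_first_template if is_classification else input_first_template
    for placeholder in PLACEHOLDERS:
        if placeholder in template:
            return template.replace(placeholder, instruction.strip())
    raise ValueError("template has no target-task placeholder")
```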

3.1 Input-first Approach

This approach is mainly used to generate instances for non-classification tasks.

The Input-first Approach template is as follows:


input_first_template_for_gen = '''Come up with examples for the following tasks. Try to generate multiple examples when possible. If the task doesn't require additional input, you can generate the output directly.

Task: Which exercises are best for reducing belly fat at home?
Output:
- Lying Leg Raises
- Leg In And Out
- Plank
- Side Plank
- Sit-ups

Task: Extract all the country names in the paragraph, list them separated by commas.
Example 1
Paragraph: Dr. No is the sixth novel by the English author Ian Fleming to feature his British Secret Service agent James Bond. Written at Fleming's Goldeneye estate in Jamaica, it was first published in the United Kingdom by Jonathan Cape in 1958. In the novel Bond looks into the disappearance in Jamaica of two fellow MI6 operatives who had been investigating Doctor No. Bond travels to No's Caribbean island and meets Honeychile Rider, who is there to collect shells. They are captured and taken to a luxurious facility carved into a mountain. The character of Doctor No, the son of a German missionary and a Chinese woman, was influenced by Sax Rohmer's Fu Manchu stories. Dr. No was the first of Fleming's novels to face widespread negative reviews in Britain, but it was received more favourably in the United States.
Output: English, British, Jamaica, the United Kingdom, German, Chinese, Britain, the United States.

Task: Converting 85 F to Celsius.
Output: 85°F = 29.44°C

Task: Sort the given list ascendingly. 
Example 1
List: [10, 92, 2, 5, -4, 92, 5, 101]
Output: [-4, 2, 5, 5, 10, 92, 92, 101]
Example 2
Input 2 - List: [9.99, 10, -5, -1000, 5e6, 999]
Output: [-1000, -5, 9.99, 10, 999, 5e6]

Task: Suggest a better and more professional rephrasing of the following sentence.
Example 1
Sentence: This house is surprisingly not constructed very well, and you probably need more money to fix it after you buy it. If you ask me, I would suggest you to consider other candidates.
Output: This house does not seem to be constructed well, so you may need to spend more money to fix it after you purchase it. I would suggest that you look at other properties.
Example 2
Sentence: Just so you know, we did an experiment last week and found really surprising results - language model can improve itself!
Output: Our experiments last week demonstrated surprising results, proving that the language model can improve itself.

Task: Read the following paragraph and answer a math question about the paragraph. You need to write out the calculation for getting the final answer.
Example 1
Paragraph: Gun violence in the United States results in tens of thousands of deaths and injuries annually, and was the leading cause of death for children 19 and younger in 2020. In 2018, the most recent year for which data are available as of 2021, the Centers for Disease Control and Prevention's (CDC) National Center for Health Statistics reports 38,390 deaths by firearm, of which 24,432 were by suicide. The rate of firearm deaths per 100,000 people rose from 10.3 per 100,000 in 1999 to 12 per 100,000 in 2017, with 109 people dying per day or about 14,542 homicides in total, being 11.9 per 100,000 in 2018. In 2010, there were 19,392 firearm-related suicides, and 11,078 firearm-related homicides in the U.S. In 2010, 358 murders were reported involving a rifle while 6,009 were reported involving a handgun; another 1,939 were reported with an unspecified type of firearm. In 2011, a total of 478,400 fatal and nonfatal violent crimes were committed with a firearm.
Question: How many more firearm-related deaths were there in 2018 compared to 2010?
Output:
38390 - (19392 + 11078) = 38390 - 30470 = 7920. 
So, in 2018, there were 7920 more deaths by firearm than in 2010.

Task: Write Python code to solve this leetcode problem.
Example 1
Problem: You are given two non-empty linked lists representing two non-negative integers. The digits are stored in reverse order, and each of their nodes contains a single digit. Add the two numbers and return the sum as a linked list. You may assume the two numbers do not contain any leading zero, except the number 0 itself.
Output:
class Solution(object):
    def addTwoNumbers(self, l1, l2):
        carry = 0
        root = n = ListNode(0)
        while l1 or l2 or carry:
            v1 = v2 = 0
            if l1:
                v1 = l1.val
                l1 = l1.next
            if l2:
                v2 = l2.val
                l2 = l2.next
            carry, val = divmod(v1+v2+carry, 10)
            n.next = ListNode(val)
            n = n.next
        return root.next

Task: Solve the equation and find the value of X. Show your steps.
Example 1
Equation: 10X + 5 = 10
Output: 10X = 5,  X = 0.5
Example 2
Equation: X + Y + 120 = 100
Output: X + Y = -20, X = -20 - Y

Task: Write a program to compute the sum of integers from k to n.
Output:
def sum(k, n):
    sum = 0
    for i in range(k, n+1):
        sum += i
    return sum

Task: Select the oldest person from the given list.
Example 1
List: George Washington, Confucius, Michael Jordan, Michelangelo
Output: Confucious
Example 2
List: Alan Turing, Geoffrey Hinton, Yann LeCun, Yoshua Bengio
Output: Alan Turing

Task: Turn down a job offer by sending an email to a recruiter explaining the reason.
Output: Hi  [Recruiter],
Thank you so much for the generous offer to join your team. As we discussed, I’ve admired the company for a number of years, and am a proud endorser of its products. However, after further consideration of where I currently am in my career, I’ve decided to accept an offer at another company.
I would love to stay in touch with you and have already started following you on [Social Media Platform]. Again, thank you so much for your time and consideration.
Thanks again,
[Your Name]

Task: {instruction for the target task}'''

3.2 Output-first Approach

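This approach is mainly used to generate instances for classification tasks.

Why do classification tasks use the Output-first Approach? Because the Input-first Approach tends to generate inputs that are biased toward a single label, especially for classification tasks. The paper addresses this by generating the possible class labels first and then conditioning the input generation on each label.

The Output-first Approach template is as follows: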

output_first_template_for_clf = '''Given the classification task definition and the class labels, generate an input that corresponds to each of the class labels. If the task doesn't require input, just generate possible class labels.

Task: Classify the sentiment of the sentence into positive, negative, or mixed.
Class label: mixed
Sentence: I enjoy the flavor of the restaurant but their service is too slow.
Class label: Positive
Sentence: I had a great day today. The weather was beautiful and I spent time with friends and family.
Class label: Negative
Sentence: I was really disappointed by the latest superhero movie. I would not recommend it to anyone.

Task: Given a dialogue, classify whether the user is satisfied with the service. You should respond with "Satisfied" or "Unsatisfied".
Class label: Satisfied
Dialogue:
- Agent: Thank you for your feedback. We will work to improve our service in the future.
- Customer: I am happy with the service you provided. Thank you for your help.
Class label: Unsatisfied
Dialogue:
- Agent: I am sorry we will cancel that order for you, and you will get a refund within 7 business days.
- Customer: oh that takes too long. I want you to take quicker action on this.

Task: Given some political opinions, classify whether the person belongs to Democrats or Republicans.
Class label: Democrats
Opinion: I believe that everyone should have access to quality healthcare regardless of their income level.
Class label: Republicans
Opinion: I believe that people should be able to keep more of their hard-earned money and should not be taxed at high rates.

Task: Tell me if the following email is a promotion email or not.
Class label: Promotion
Email: Check out our amazing new sale! We've got discounts on all of your favorite products.
Class label: Not Promotion
Email: We hope you are doing well. Let us know if you need any help.

Task: Detect if the Reddit thread contains hate speech.
Class label: Hate Speech
Thread: All people of color are stupid and should not be allowed to vote.
Class label: Not Hate Speech
Thread: The best way to cook a steak on the grill.

Task:  Does the information in the document supports the claim? You can answer "Support" or "Unsupport".
Class label: Unsupport
Document: After a record-breaking run that saw mortgage rates plunge to all-time lows and home prices soar to new highs, the U.S. housing market finally is slowing. While demand and price gains are cooling, any correction is likely to be a modest one, housing economists and analysts say. No one expects price drops on the scale of the declines experienced during the Great Recession.
Claim: The US housing market is going to crash soon.
Class label: Support
Document: The U.S. housing market is showing signs of strain, with home sales and prices slowing in many areas. Mortgage rates have risen sharply in recent months, and the number of homes for sale is increasing. This could be the beginning of a larger downturn, with some economists predicting a potential housing crash in the near future.
Claim: The US housing market is going to crash soon.

Task: Answer the following multiple-choice question. Select A, B, C, or D for the final answer.
Class label: C
Question: What is the capital of Germany?
A. London
B. Paris
C. Berlin
D. Rome
Class label: D
Question: What is the largest planet in our solar system?
A) Earth
B) Saturn
C) Mars
D) Jupiter
Class label: A
Question: What is the process by which plants make their own food through photosynthesis?
A) Respiration
B) Fermentation
C) Digestion
D) Metabolism
Class label: B
Question: Who wrote the novel "The Great Gatsby"?
A) Ernest Hemingway
B) F. Scott Fitzgerald
C) J.D. Salinger
D) Mark Twain

Task: You need to read a code and detect if there is a syntax error or not. Output true if there is an error, output false if there is not.
Class label: true
Code:
def quick_sort(arr):
    if len(arr) < 2
        return arr
Class label: False
Code:
def calculate_average(numbers):
    total = 0
    for number in numbers:
        total += number
    return total / len(numbers)

Task: You are provided with a news article, and you need to identify all the categories that this article belongs to. Possible categories include Sports and Politics. Output its categories one by one, separated by a comma.
Class label: Sports
Article: The Golden State Warriors have won the NBA championship for the second year in a row.
Class label: Politics
Article: The United States has withdrawn from the Paris Climate Agreement.
Class label: Politics, Sports
Article: The government has proposed cutting funding for youth sports programs.

Task: Given a credit card statement, the cardholder's spending habits, and the account balance, classify whether the cardholder is at risk of defaulting on their payments or not.
Class label: At risk
Credit card statement: Purchases at high-end clothing stores and luxury hotels.
Cardholder's spending habits: Frequent purchases at luxury brands and high-end establishments.
Account balance: Over the credit limit and multiple missed payments.
Class label: Not at risk
Credit card statement: Purchases at grocery stores and gas stations.
Cardholder's spending habits: Regular purchases for necessary expenses and occasional dining out.
Account balance: Slightly below the credit limit and no missed payments.

Task: Given a social media post, the hashtags used, and a topic. classify whether the post is relevant to the topic or not.
Class label: Relevant
Post: I can't believe the government is still not taking action on climate change. It's time for us to take matters into our own hands.
Hashtags: #climatechange #actnow
Topic: Climate change
Class label: Not relevant 
Post: I just bought the new iPhone and it is amazing!
Hashtags: #apple #technology
Topic: Travel

Task: The answer will be 'yes' if the provided sentence contains an explicit mention that answers the given question. Otherwise, answer 'no'. 
Class label: Yes
Sentence: Jack played basketball for an hour after school.
Question: How long did Jack play basketball?
Class label: No
Sentence: The leaders of the Department of Homeland Security now appear before 88 committees and subcommittees of Congress.
Question: How often are they required to appear?

Task: Tell me what's the second largest city by population in Canada.
Class label: Montreal

Task: Classifying different types of mathematical equations, such as linear, and quadratic equations, based on the coefficients and terms in the equation.
Class label: Linear equation
Equation: y = 2x + 5
Class label: Quadratic equation
Equation: y = x^2 - 4x + 3

Task: Tell me the first number of the given list.
Class label: 1
List: 1, 2, 3
Class label: 2
List: 2, 9, 10

Task: Which of the following is not an input type? (a) number (b) date (c) phone number (d) email address (e) all of these are valid inputs.
Class label: (e)

Task:  {Instruction for the target task}'''

3.3 Hyperparameters

results = make_gpt3_requests(
    engine=args.engine,  # davinci
    prompts=prompts,
    # because the clf template is longer, we need to decrease the max_tokens
    max_tokens=300 if any(task_clf_types[task["instruction"]] for task in batch) else 350,
    temperature=0,
    top_p=0,
    frequency_penalty=0,
    presence_penalty=1.5,
    stop_sequences=[f"Example {args.max_instances_to_generate + 1}", "Task:"],
    logprobs=1,
    n=1,
    best_of=1,
    api_key=args.api_key,
    organization=args.organization)
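Once a completion comes back, it has to be split into (input, output) pairs. The sketch below is a simplified parser for input-first completions that follow the "Example N ... Output: ..." layout of the template above; it is not the repository's own parser, and the name `parse_input_first_completion` is hypothetical. Blocks without an "Output:" marker (for instance, ones cut off by max_tokens) are dropped, which is an assumption of this sketch.

```python
import re


def parse_input_first_completion(completion):
    """Split a completion into (input, output) pairs.

    Handles the two shapes used in the template: a bare "Output: ..." for tasks
    without input, and "Example N" blocks containing "<field>: ...\nOutput: ...".
    """
    instances = []
    # Drop the "Example N" headers and keep the text between them.
    blocks = re.split(r"Example\s*\d+\s*\n?", completion)
    for block in blocks:
        block = block.strip()
        if not block or "Output:" not in block:
            continue  # empty, truncated, or malformed block
        inst_input, inst_output = block.split("Output:", 1)
        instances.append((inst_input.strip(), inst_output.strip()))
    return instances
```

The stop sequence `Example {max_instances_to_generate + 1}` in the call above caps how many such blocks a single completion can contain.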

4. Filtering and Postprocessing

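Finally, the generated instructions and instances are post-processed. The main steps are:

1. Filter out any new instruction whose ROUGE-L similarity with an existing instruction is greater than 0.7.

2. Filter out instructions that contain certain keywords, such as images, pictures, and graphs, because language models usually cannot handle them.

3. Filter out instances that have the same input.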
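The three filters can be sketched in plain Python. This is an illustrative approximation only: ROUGE-L is computed here as an F1 score over lowercased whitespace tokens using the longest common subsequence, without the stemming or tokenization a dedicated ROUGE library would apply; the singular keyword forms are added for the example; and "same input" is read as keeping the first instance seen for each distinct input. All function names are hypothetical.

```python
import re


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """Simplified ROUGE-L F1 on lowercased whitespace tokens (no stemming)."""
    c, r = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(c, r) if c and r else 0
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)


# Keywords from the text above; the singular forms are an assumption of this sketch.
BLOCKED_KEYWORDS = {"image", "images", "picture", "pictures", "graph", "graphs"}


def keep_instruction(new_instruction, pool, threshold=0.7):
    """Apply the keyword filter, then the ROUGE-L similarity filter against the pool."""
    words = set(re.findall(r"[a-z]+", new_instruction.lower()))
    if words & BLOCKED_KEYWORDS:
        return False
    return all(rouge_l_f1(new_instruction, old) <= threshold for old in pool)


def dedupe_instances(instances):
    """Keep the first (input, output) pair seen for each distinct input."""
    seen, kept = set(), []
    for inst_input, inst_output in instances:
        if inst_input in seen:
            continue
        seen.add(inst_input)
        kept.append((inst_input, inst_output))
    return kept
```

In a generation loop, an instruction would be appended to the pool only when `keep_instruction` returns True, so later candidates are also compared against it.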

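Practical notes

1. The prompts proposed in the paper are in English, so a Chinese version has to be built separately.

2. Prompt construction is tightly coupled to the model; the same prompt does not suit every large model.

3. Even with the same model, different parameters change how well a prompt works, so this is a tuning process.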

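Self-Instruct prompt design based on ChatGLM

The prompts below were tested with chatglm_pro, temperature=0.7 and top_p=0.5, with all other parameters left at their defaults.

Zhipu AI Open Platform

Instruction generation prompt (Chinese, as tested)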

已经给出下面几个例子作为参考,要求你生成10条指令:
1. 描述一个你认为未来有发展潜力的新兴技术。
2. 头脑风暴列出可能的新年决心。
3. 针对当前环境问题,提出一些可行的解决方案。
4. 分析一部热门电影的成功因素,并提出自己的看法。
5. 跟我解释一下你跑步的时候在想什么。
6. 作为一名新上任的体育教练,你在工作的头30天有什么计划?
7. 用几个要点总结下面的文档。
8. 根据所给场景,编写一段感人的故事情节。

Example output

Instance generation prompt (Chinese, as tested)

请你针对给定的指令,生成相应的输入和输出,如果指令不需要输入,则可以只生成输出。我会先给你一些示例,你需要补全最后一个指令对应的输入和输出,如果可以,尝试生成更多的示例:
指令:抽取段落中的所有国家名称,以顿号分隔列出。
示例1
输入:《诺博士》是英国作家伊恩·弗莱明的第六部以英国特工詹姆斯·邦德为主角的小说。这部小说创作于弗莱明位于牙买加的黄金眼庄园,1958年由乔纳森·凯普在英国首次出版。在小说中,邦德调查了两名在牙买加失踪的军情六处特工,他们一直在调查诺博士。邦德前往诺斯加勒比岛,遇到了在那里收集贝壳的“蜜糖骑士”。他们被抓起来,带到一座雕刻在山上的豪华设施里。诺博士是一位德国传教士和一位中国妇女的儿子,他的角色受到了萨克斯·侯麦的《傅满洲故事》的影响。《诺博士》是弗莱明第一部在英国受到广泛负面评价的小说,但在美国却受到了更大的好评。
输出:英语、英国、牙买加、英国、德国、中国、英国、美国。
指令:将85华氏度转换成摄氏。
示例1
输入:
输出:85°F = 29.44°C
指令:解方程并求出x的值。
示例1
输入:10X + 5 = 10
输出:10X = 5, X = 0.5
示例2
输入:X + Y + 120 = 100
输出:X + Y = -20, X = -20 - Y
指令:对给定列表按升序排序。
示例1
输入:[10,92,2,5,- 4,92,5,101]
输出:[-4,2,5,5,10,92,92,101]
指令:用更专业,严谨的语气改写下列句子。
示例1
输入:令人惊讶的是,这房子的构造不是很好,你可能需要更多的钱来修理它。如果你问我,我会建议你考虑其他候选房子。
输出:这个房子看起来不是很好,所以你可能需要在购买后花更多的钱来修理它。我建议你看看其他的房产。
示例2
输入:就像你知道的,我们上周做了一个实验,发现了非常惊人的结果——语言模型可以自我提高!
输出:我们上周的实验展示了令人惊讶的结果,证明了语言模型可以自我改进。
指令:制定一个关于学习下列新技能的计划,并分享如何实施。
