[Paper Reading] Some implementation details in "SELF-INSTRUCT: Aligning Language Models with Self-Generated Instructions"

An earlier post gave a rough overview of the overall framework of the paper "SELF-INSTRUCT: Aligning Language Models with Self-Generated Instructions".

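Rereading it more carefully a couple of days later, I found several technical details worth learning from, so here are some supplementary notes.
Instruction generation, classification task identification, and instance generation are all done through prompt engineering, which also makes this a good chance to learn how to write prompts.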

Instruction data

The figure below gives an intuitive picture of what instruction data looks like:
[Figure: example of instruction data]

Instruction generation

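The paper bootstraps the task pool with 175 tasks (1 instruction and 1 instance per task). At each instruction generation step, 8 task instructions are sampled from this pool as in-context examples. Of the 8, 6 come from human-written tasks and 2 come from model-generated tasks produced in earlier steps, which promotes diversity. Instruction generation uses the following prompt:
[Figure: instruction generation prompt]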
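
The 6 + 2 sampling described above can be sketched roughly as follows. This is only a minimal illustration, not the paper's code: the function name `sample_prompt_instructions`, its parameters, and the fallback used when the model-generated pool is still too small are all assumptions made for this example.

```python
import random


def sample_prompt_instructions(seed_instructions, machine_instructions,
                               n_total=8, n_machine=2, rng=None):
    """Pick in-context instructions: mostly human-written, a few model-generated."""
    rng = rng or random.Random()
    # Assumption: early on the model-generated pool may hold fewer than
    # n_machine items, so take what is available.
    k_machine = min(n_machine, len(machine_instructions))
    picked = rng.sample(machine_instructions, k_machine)
    # Fill the remaining slots from the human-written seed tasks.
    picked += rng.sample(seed_instructions, n_total - k_machine)
    # Shuffle so model-generated tasks do not always appear first.
    rng.shuffle(picked)
    return picked
```

With a full pool this returns 8 instructions, 2 of them model-generated. On the very first round, when nothing has been generated yet, all 8 come from the seed tasks.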

Classification task identification

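Classification and non-classification tasks are handled by two different approaches, so the pipeline first has to determine whether a generated instruction represents a classification task. The paper uses 12 classification instructions and 19 non-classification instructions from the seed tasks as few-shot examples, with the following prompt: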


Can the following task be regarded as a classification task with finite output labels?

Task: Given my personality and the job, tell me if I would be suitable.
Is it classification? Yes

Task: Give me an example of a time when you had to use your sense of humor.
Is it classification? No

Task: Replace the placeholders in the given text with appropriate named entities.
Is it classification? No

Task: Fact checking - tell me if the statement is true, false, or unknown, based on your knowledge and common sense.
Is it classification? Yes

Task: Return the SSN number for the person.
Is it classification? No

Task: Detect if the Reddit thread contains hate speech.
Is it classification? Yes

Task: Analyze the sentences below to identify biases.
Is it classification? No

Task: Select the longest sentence in terms of the number of words in the paragraph, output the sentence index.
Is it classification? Yes

Task: Find out the toxic word or phrase in the sentence.
Is it classification? No

Task: Rank these countries by their population.
Is it classification? No

Task: You are provided with a news article, and you need to identify all the categories that this article belongs to. Possible categories include: Music, Sports, Politics, Tech, Finance, Basketball, Soccer, Tennis, Entertainment, Digital Game, World News. Output its categories one by one, seperated by comma.
Is it classification? Yes

Task: Given the name of an exercise, explain how to do it.
Is it classification? No

Task: Select the oldest person from the list.
Is it classification? Yes

Task: Find the four smallest perfect numbers.
Is it classification? No

Task: Does the information in the document supports the claim? You can answer “Support” or “Unsupport”.
Is it classification? Yes

Task: Create a detailed budget for the given hypothetical trip.
Is it classification? No

Task: Given a sentence, detect if there is any potential stereotype in it. If so, you should explain the stereotype. Else, output no.
Is it classification? No


Task: To make the pairs have the same analogy, write the fourth word.
Is it classification? No

Task: Given a set of numbers, find all possible subsets that sum to a given number.
Is it classification? No

Task: {instruction for the target task}
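
A prompt of this shape can be assembled from a list of labeled examples. The sketch below is one possible way to do it: the helper names `build_classification_prompt` and `parse_is_classification` are hypothetical, and appending an open "Is it classification?" line after the target task is an assumption about how the completion is elicited.

```python
CLF_HEADER = ("Can the following task be regarded as a classification task "
              "with finite output labels?")


def build_classification_prompt(examples, target_instruction):
    """examples: list of (instruction, is_classification) few-shot pairs."""
    blocks = [CLF_HEADER]
    for instruction, is_clf in examples:
        answer = "Yes" if is_clf else "No"
        blocks.append(f"Task: {instruction}\nIs it classification? {answer}")
    # Assumption: leave the answer open so the model completes it with Yes or No.
    blocks.append(f"Task: {target_instruction}\nIs it classification?")
    return "\n\n".join(blocks)


def parse_is_classification(completion):
    """Treat any completion that does not start with 'yes' as a No."""
    return completion.strip().lower().startswith("yes")
```

The boolean returned by `parse_is_classification` is what would then decide which instance generation prompt a task is sent to.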


Instance generation

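There are two instance generation approaches: the output-first approach is applied to the classification tasks identified in the previous step, and the input-first approach is applied to the remaining non-classification tasks.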

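Given an instruction and its task type, instances are generated for each instruction independently. This is challenging because the model has to understand what the target task is from the instruction alone, figure out which additional input fields are needed and generate them, and finally complete the task by producing the output. The paper finds that pretrained LMs can do this to a large extent when prompted with in-context instruction-input-output examples from other tasks. A natural way to do this is the input-first approach, where the LM is asked to first come up with the input fields based on the instruction and then produce the corresponding output. This generation order is similar to how models are used to respond to an instruction and input, except that here the in-context examples come from other tasks.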

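Note that in many cases there is no strict boundary between the instruction and the instance input. For example, "write an essay about school safety" can be a valid instruction that the model is expected to respond to directly, or it can be phrased as "write an essay about the following topic" as the instruction with "school safety" as the instance input. To encourage diversity in the data format, the paper allows such instructions that need no additional input, which is why some of the examples in the prompt have no additional input. The input-first prompt is as follows: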


Come up with examples for the following tasks. Try to generate multiple examples when possible. If the task doesn’t require additional input, you can generate the output directly.

Task: Which exercises are best for reducing belly fat at home?
Output:
- Lying Leg Raises
- Leg In And Out
- Plank
- Side Plank
- Sit-ups

Task: Extract all the country names in the paragraph, list them separated by commas.
Example 1
Paragraph: Dr. No is the sixth novel by the English author Ian Fleming to feature his British Secret Service agent James Bond. Written at Fleming’s Goldeneye estate in Jamaica, it was first published in the United Kingdom by Jonathan Cape in 1958. In the novel Bond looks into the disappearance in Jamaica of two fellow MI6 operatives who had been investigating Doctor No. Bond travels to No’s Caribbean island and meets Honeychile Rider, who is there to collect shells. They are captured and taken to a luxurious facility carved into a mountain. The character of Doctor No, the son of a German missionary and a Chinese woman, was influenced by Sax Rohmer’s Fu Manchu stories. Dr. No was the first of Fleming’s novels to face widespread negative reviews in Britain, but it was received more favourably in the United States.
Output: English, British, Jamaica, the United Kingdom, German, Chinese, Britain, the United States.

Task: Converting 85 F to Celsius.
Output: 85°F = 29.44°C

Task: Sort the given list ascendingly.
Example 1
List: [10, 92, 2, 5, -4, 92, 5, 101]
Output: [-4, 2, 5, 5, 10, 92, 92, 101]
Example 2
Input 2 - List: [9.99, 10, -5, -1000, 5e6, 999]
Output: [-1000, -5, 9.99, 10, 999, 5e6]

Task: Suggest a better and more professional rephrasing of the following sentence.
Example 1
Sentence: This house is surprisingly not constructed very well, and you probably need more money to fix it after you buy it. If you ask me, I would suggest you to consider other candidates.
Output: This house does not seem to be constructed well, so you may need to spend more money to fix it after you purchase it. I would suggest that you look at other properties.
Example 2
Sentence: Just so you know, we did an experiment last week and found really surprising results - language model can improve itself!
Output: Our experiments last week demonstrated surprising results, proving that the language model can improve itself.

Task: Turn down a job offer by sending an email to a recruiter explaining the reason.
Output: Hi [Recruiter],
Thank you so much for the generous offer to join your team. As we discussed, I’ve admired the company for a number of years, and am a proud endorser of its products. However, after further consideration of where I currently am in my career, I’ve decided to accept an offer at another company.
I would love to stay in touch with you and have already started following you on [Social Media Platform]. Again, thank you so much for your time and consideration.
Thanks again,
[Your Name]

Task: {Instruction for the target task}
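
A completion that follows this format then has to be split back into (input, output) instances. The sketch below is a simplified illustration: `parse_input_first` is a hypothetical helper, and it assumes each instance optionally starts with an "Example N" line and that everything before the first "Output:" in a chunk is the input. Real completions would likely need stop sequences and more defensive parsing.

```python
import re


def parse_input_first(completion):
    """Split a completion into (input, output) pairs; input is '' when absent."""
    # Each instance optionally starts with a line such as "Example 1".
    chunks = re.split(r"^Example\s+\d+\s*$", completion.strip(),
                      flags=re.MULTILINE)
    instances = []
    for chunk in chunks:
        chunk = chunk.strip()
        if not chunk:
            continue
        # Everything before the first "Output:" is treated as the input.
        head, sep, tail = chunk.partition("Output:")
        if not sep:
            continue  # no output marker: skip the malformed chunk
        instances.append((head.strip(), tail.strip()))
    return instances
```

A task that needs no extra input, like the temperature conversion example in the prompt, comes back as a single pair with an empty input string.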


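However, the paper finds that the input-first approach can generate inputs biased toward one label, especially for classification tasks (for example, for grammar error detection it usually generates grammatical input). It therefore proposes an output-first approach for classification tasks: first generate the possible class labels, then condition the input generation on each class label. The output-first prompt is as follows: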


Given the classification task definition and the class labels, generate an input that corresponds to each of the class labels. If the task doesn’t require input, just generate the correct class label.

Task: Classify the sentiment of the sentence into positive, negative, or mixed.
Class label: mixed
Sentence: I enjoy the flavor of the restaurant but their service is too slow.
Class label: Positive
Sentence: I had a great day today. The weather was beautiful and I spent time with friends.
Class label: Negative
Sentence: I was really disappointed by the latest superhero movie. I would not recommend it.

Task: Given a dialogue, classify whether the user is satisfied with the service. You should respond with “Satisfied” or “Unsatisfied”.
Class label: Satisfied
Dialogue:
- Agent: Thank you for your feedback. We will work to improve our service in the future.
- Customer: I am happy with the service you provided. Thank you for your help.
Class label: Unsatisfied
Dialogue:
- Agent: Sorry that we will cancel your order. You will get a refund within 7 business days.
- Customer: oh that takes too long. I want you to take quicker action on this.

Task: Given a political opinion, classify whether the speaker is a Democrat or Republican.
Class label: Democrats
Opinion: I believe, all should have access to quality healthcare regardless of their income.
Class label: Republicans
Opinion: I believe that people should be able to keep more of their hard-earned money and should not be taxed at high rates.

Task: Tell me if the following email is a promotion email or not.
Class label: Promotion
Email: Check out our amazing new sale! We’ve got discounts on all of your favorite products.
Class label: Not Promotion
Email: We hope you are doing well. Let us know if you need any help.

Task: Detect if the Reddit thread contains hate speech.
Class label: Hate Speech
Thread: All people of color are stupid and should not be allowed to vote.
Class label: Not Hate Speech
Thread: The best way to cook a steak on the grill.

Task: Does the document supports the claim? Answer with “Support” or “Unsupport”.
Class label: Unsupport
Document: After a record-breaking run that saw mortgage rates plunge to all-time lows and home prices soar to new highs, the U.S. housing market finally is slowing. While demand and price gains are cooling, any correction is likely to be a modest one, housing economists and analysts say. No one expects price drops on the scale of the declines experienced during the Great Recession.
Claim: The US housing market is going to crash soon.
Class label: Support
Document: The U.S. housing market is showing signs of strain, with home sales and prices slowing in many areas. Mortgage rates have risen sharply in recent months, and the number of homes for sale is increasing. This could be the beginning of a larger downturn, with some economists predicting a potential housing crash in the near future.
Claim: The US housing market is going to crash soon.

Task: Which of the following is not an input type? (a) number (b) date (c) phone number (d) email address (e) all of these are valid inputs.
Class label: (e)

Task: {instruction for the target task}
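
Because every input in an output-first completion follows a "Class label:" line, it is easy to recover (label, input) pairs and to check how evenly the labels are covered. The sketch below is only an illustration under that assumption: `parse_output_first` and `label_counts` are hypothetical helper names, not functions from the paper.

```python
from collections import Counter


def parse_output_first(completion):
    """Return (class_label, input_text) pairs from 'Class label: ...' blocks."""
    pairs = []
    # Split on the marker; the piece before the first marker is dropped.
    for block in completion.split("Class label:")[1:]:
        # The first line of each block is the label, the rest is the input.
        label, _, rest = block.strip().partition("\n")
        pairs.append((label.strip(), rest.strip()))
    return pairs


def label_counts(pairs):
    """Count instances per label to spot a skewed label distribution."""
    return Counter(label for label, _ in pairs)
```

A task with no input, such as the multiple-choice example at the end of the prompt, yields a pair whose input text is empty.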

