微软开源PromptWizard使用方法

微软开源PromptWizard使用方法

https://github.com/microsoft/PromptWizard

PromptWizard (PW) 旨在自动化和简化提示优化。它将 LLM 的迭代反馈与高效的探索和改进技术相结合,在几分钟内创建高效的prompts。

PW的核心是其自我进化和自适应机制,LLM 会同时迭代生成、评论和改进提示和示例。此过程通过反馈和综合确保持续改进,实现针对特定任务的整体优化。

如何运行 PromptWizard

安装

  1. 克隆仓库:
git clone https://github.com/microsoft/PromptWizard
cd PromptWizard

或者

直接上网站用http下载,然后进入当前路径PromptWizard
  1. 创建并激活虚拟环境
    如果有conda就用conda也行

    conda activate env

  2. 导入依赖(要在PromptWizard路径下,同时在虚拟环境内,当然没虚拟环境问题也不大,就是版本控制不太好)

    pip install -e .
    
  3. 进入demo找到gsm8k这个demo进行改造

    主要改造

  •  class GSM8k(DatasetSpecificProcessing):
    
         def dataset_to_jsonl(self, dataset_jsonl: str, **kwargs: Any) -> None:
             def extract_answer_from_output(completion):
                 # Your functions for metrics and prompt building
                 ans_re = compile(r"#### (\-?[0-9\.\,]+)")
                 self.INVALID_ANS = "[invalid]"
    
                 match = ans_re.search(completion)
                 if match:
                     match_str = match.group(1).strip()
                     match_str = match_str.replace(",", "")
                     return match_str
                 else:
                     return self.INVALID_ANS
    
             examples_set = []
    
             for _, sample in tqdm(enumerate(kwargs["dataset"]), desc="Evaluating samples"):
                 example = {
                   DatasetSpecificProcessing.QUESTION_LITERAL: sample['question'],
                   DatasetSpecificProcessing.ANSWER_WITH_REASON_LITERAL: sample['answer'],
                   DatasetSpecificProcessing.FINAL_ANSWER_LITERAL: extract_answer_from_output(sample["answer"])
                 }
                 examples_set.append(example)
    
             save_jsonlist(dataset_jsonl, examples_set, "w")
    
  • if not os.path.exists("data"):
        os.mkdir("data")
    
    dataset = load_dataset("./gsm8k/main")
    num_samples = 0
    for dataset_type in ['train','test']:
        data_list = []
        for data in dataset[dataset_type]:
            data_list.append({"question": data['question'], "answer": data['answer']})
            if num_samples == 100 and dataset_type == 'train': # We sample only 100 train examples and use 25 out them for training randomly
                break
            num_samples += 1
        gsm8k_processor.dataset_to_jsonl("data/"+ dataset_type+'.jsonl', dataset=data_list)
    

    一个是数据集,一个是将数据变成question和answer。

    data_list = [{},{}]这样的类型

    data_list的eg:

[
	{'question': 'Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?', 
	 'answer': 'Natalia sold 48/2 = <<48/2=24>>24 clips in May.\nNatalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n#### 72'}, 
	{'question': 'Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?', 
	 'answer': 'Weng earns 12/60 = $<<12/60=0.2>>0.2 per minute.\nWorking 50 minutes, she earned 0.2 x 50 = $<<0.2*50=10>>10.\n#### 10'}, 
	{'question': 'Betty is saving money for a new wallet which costs $100. Betty has only half of the money she needs. Her parents decided to give her $15 for that purpose, and her grandparents twice as much as her parents. How much more money does Betty need to buy the wallet?', 
	 'answer': "In the beginning, Betty has only 100 / 2 = $<<100/2=50>>50.\nBetty's grandparents gave her 15 * 2 = $<<15*2=30>>30.\nThis means, Betty needs 100 - 50 - 30 - 15 = $<<100-50-30-15=5>>5 more.\n#### 5"}
]

剩下就差不多一样了,就具体变question和answer,这具体的代码改改
class GSM8k(DatasetSpecificProcessing),注意def dataset_to_jsonl,这个类函数名尽量不更改。

注意点

如果要修改openai接口或者是改成本地的LLM模型接口,可以改PromptWizard/promptwizard/glue/common/llm/llm_mgr.py

这个文件关于openai接口的部分。

def call_api(messages):
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值