How to Use Microsoft's Open-Source PromptWizard

Article link: https://blog.csdn.net/gaos_n/article/details/146014680

https://github.com/microsoft/PromptWizard

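PromptWizard (PW) aims to automate and simplify prompt optimization. It combines iterative feedback from an LLM with efficient exploration and refinement techniques to produce effective prompts within minutes.

At the core of PW is a self-evolving, self-adapting mechanism: the LLM iteratively generates, critiques, and refines the prompt and its examples together. Through feedback and synthesis, this process keeps improving the result and yields an overall optimization tailored to the specific task.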
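
The generate, critique, and refine cycle described above can be pictured with a small loop. This is only an illustrative sketch, not PromptWizard's actual implementation: `refine_prompt`, `toy_mutate`, `toy_critique`, and `toy_score` are hypothetical names, and the toy functions stand in for what would be LLM calls so that the loop runs offline.

```python
from typing import Callable, List, Tuple


def refine_prompt(
    seed_prompt: str,
    mutate: Callable[[str], List[str]],
    score: Callable[[str], float],
    critique_and_fix: Callable[[str], str],
    rounds: int = 3,
) -> Tuple[str, float]:
    """Generic generate -> critique -> refine loop (illustrative only)."""
    best, best_score = seed_prompt, score(seed_prompt)
    for _ in range(rounds):
        # Generate candidate variations of the current best prompt.
        candidates = mutate(best)
        # Let the critic rewrite each candidate.
        improved = [critique_and_fix(c) for c in candidates]
        # Keep whichever candidate scores strictly higher than the current best.
        for cand in candidates + improved:
            s = score(cand)
            if s > best_score:
                best, best_score = cand, s
    return best, best_score


# Toy stand-ins for LLM calls (hypothetical, deterministic).
def toy_mutate(p: str) -> List[str]:
    return [p + " Think step by step.", p + " Be concise."]


def toy_critique(p: str) -> str:
    return p if "final answer" in p else p + " State the final answer clearly."


def toy_score(p: str) -> float:
    # Count how many desired phrases the prompt contains.
    return sum(k in p for k in ("step by step", "final answer"))


best, best_score = refine_prompt(
    "Solve the math problem.", toy_mutate, toy_score, toy_critique, rounds=2
)
print(best, best_score)
```

In a real setup the scoring function would evaluate the prompt on a handful of training examples, which is why the walkthrough below prepares a small dataset first.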

How to Run PromptWizard

Installation

Clone the repository:

git clone https://github.com/microsoft/PromptWizard
cd PromptWizard

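Or

Download the repository from the website over HTTP, then change into the PromptWizard directory.

Create and activate a virtual environment
If you have conda, you can use that instead (here `env` is the name of an environment you have already created):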

conda activate env
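Install the dependencies. Run this from inside the PromptWizard directory with the virtual environment active. Skipping the virtual environment also works, but dependency versions are then harder to control.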
```
pip install -e .
```
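
To check that the editable install is visible to the interpreter you are actually using, a quick lookup like the one below can help. It is a minimal sketch: `is_installed` is a hypothetical helper, and `promptwizard` is assumed to be the importable package name after `pip install -e .`.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located in the current environment."""
    return importlib.util.find_spec(module_name) is not None


# Assumption: the package installed from the repository is importable as "promptwizard".
print("promptwizard importable:", is_installed("promptwizard"))
```

If this prints `False`, the usual cause is that `pip` and `python` belong to different environments.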
Go to the demos, find the gsm8k demo, and adapt it.

The main parts to adapt:

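# Imports needed by the code below. The two promptwizard import paths follow
# the repository's gsm8k demo; verify them against your checkout.
import os
from re import compile
from typing import Any

from datasets import load_dataset
from tqdm import tqdm

from promptwizard.glue.common.utils.file import save_jsonlist
from promptwizard.glue.promptopt.techniques.common_logic import DatasetSpecificProcessing


class GSM8k(DatasetSpecificProcessing):
    # Only dataset_to_jsonl is shown; the other methods of the demo's class
    # (such as extract_final_answer) are left as they are in the demo.

    def dataset_to_jsonl(self, dataset_jsonl: str, **kwargs: Any) -> None:
        def extract_answer_from_output(completion):
            # GSM8K answers end with "#### <number>"; capture that number.
            ans_re = compile(r"#### (\-?[0-9\.\,]+)")
            self.INVALID_ANS = "[invalid]"

            match = ans_re.search(completion)
            if match:
                match_str = match.group(1).strip()
                match_str = match_str.replace(",", "")  # drop thousands separators
                return match_str
            else:
                return self.INVALID_ANS

        examples_set = []

        # Convert each {"question", "answer"} sample into the fields PromptWizard expects.
        for _, sample in tqdm(enumerate(kwargs["dataset"]), desc="Evaluating samples"):
            example = {
                DatasetSpecificProcessing.QUESTION_LITERAL: sample['question'],
                DatasetSpecificProcessing.ANSWER_WITH_REASON_LITERAL: sample['answer'],
                DatasetSpecificProcessing.FINAL_ANSWER_LITERAL: extract_answer_from_output(sample["answer"])
            }
            examples_set.append(example)

        save_jsonlist(dataset_jsonl, examples_set, "w")


# As in the demo; instantiation requires the full class, including the methods omitted above.
gsm8k_processor = GSM8k()

if not os.path.exists("data"):
    os.mkdir("data")

# Load GSM8K from a local copy of the dataset.
dataset = load_dataset("./gsm8k/main")
num_samples = 0
for dataset_type in ['train', 'test']:
    data_list = []
    for data in dataset[dataset_type]:
        # Keep only the first 100 train examples (25 of them are later used
        # for training, chosen at random); the test split is kept in full.
        if dataset_type == 'train' and num_samples == 100:
            break
        data_list.append({"question": data['question'], "answer": data['answer']})
        num_samples += 1
    gsm8k_processor.dataset_to_jsonl("data/" + dataset_type + '.jsonl', dataset=data_list)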

Two things need adapting: the dataset itself, and the step that turns each record into a question and an answer.
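
The final-answer field is derived from the answer text with the regular expression used in `dataset_to_jsonl`. The same extraction can be tried on its own, without PromptWizard installed. In this sketch `extract_gsm8k_answer` is a hypothetical standalone name; the pattern and the `[invalid]` fallback are the ones from the class above.

```python
import re

# Same pattern as in the demo: the number that follows "#### ".
ANSWER_RE = re.compile(r"#### (\-?[0-9\.\,]+)")
INVALID_ANS = "[invalid]"


def extract_gsm8k_answer(completion: str) -> str:
    """Return the number after '#### ' with commas removed, or '[invalid]'."""
    match = ANSWER_RE.search(completion)
    if match is None:
        return INVALID_ANS
    return match.group(1).strip().replace(",", "")


print(extract_gsm8k_answer("Natalia sold 48+24 = 72 clips.\n#### 72"))  # 72
print(extract_gsm8k_answer("The total is 1,250.\n#### 1,250"))          # 1250
print(extract_gsm8k_answer("no marker here"))                           # [invalid]
```

If your own dataset marks the final answer differently, this pattern is the piece to change.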

data_list is a list of dicts, i.e. it has the form [{}, {}].

An example data_list:

[
	{'question': 'Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?', 
	 'answer': 'Natalia sold 48/2 = <<48/2=24>>24 clips in May.\nNatalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n#### 72'}, 
	{'question': 'Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?', 
	 'answer': 'Weng earns 12/60 = $<<12/60=0.2>>0.2 per minute.\nWorking 50 minutes, she earned 0.2 x 50 = $<<0.2*50=10>>10.\n#### 10'}, 
	{'question': 'Betty is saving money for a new wallet which costs $100. Betty has only half of the money she needs. Her parents decided to give her $15 for that purpose, and her grandparents twice as much as her parents. How much more money does Betty need to buy the wallet?', 
	 'answer': "In the beginning, Betty has only 100 / 2 = $<<100/2=50>>50.\nBetty's grandparents gave her 15 * 2 = $<<15*2=30>>30.\nThis means, Betty needs 100 - 50 - 30 - 15 = $<<100-50-30-15=5>>5 more.\n#### 5"}
]
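
To feed your own data through the same code, each record needs a `question` and an `answer` whose last line is `#### ` followed by the final answer, as in the example above. The sketch below builds such records and serializes them as JSON Lines with the standard library. `to_record` and `to_jsonl` are hypothetical helpers, and the two questions are made-up placeholders, not GSM8K data.

```python
import json
from typing import Dict, List


def to_record(question: str, reasoning: str, final_answer: str) -> Dict[str, str]:
    """Build one GSM8K-style record: reasoning text, then '#### <final answer>'."""
    return {"question": question, "answer": f"{reasoning}\n#### {final_answer}"}


def to_jsonl(records: List[Dict[str, str]]) -> str:
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


# Placeholder records for illustration only.
data_list = [
    to_record("What is 6 times 7?", "6 * 7 = 42.", "42"),
    to_record("A box holds 12 eggs. How many eggs are in 3 boxes?", "12 * 3 = 36.", "36"),
]

print(to_jsonl(data_list))
```

A list built this way can be passed as `dataset=data_list` to `dataset_to_jsonl`, in place of the samples loaded from GSM8K.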