让我们用大模型们继续做题看他们能做对多少

最新推荐文章于 2024-09-30 10:46:34 发布

dotNET跨平台

最新推荐文章于 2024-09-30 10:46:34 发布

阅读量29

点赞数

文章标签： flask python 后端

原文链接：https://mp.weixin.qq.com/s?__biz=MzAwNTMxMzg1MA==&mid=2654099974&idx=7&sn=e61b77771a2d8693c8834f76c77a2f33&chksm=81421f93d719655c50ff5ba2c8e62bae791a15324e84331b87a6c15e46870aa9df0a71855fa2&scene=126&sessionid=0

版权

对于小白来说，面对“如何让大语言模型做题”这一话题，脑海中呈现出的便是用户自己一题一题地输入给LLMs。然而，对于程序员来说，该如何让它自动地读取题库、进而测评呢？谭亲怡同学借这篇稿子具体介绍了如何将LLMs、Azure OpenAI服务和GaoKao-Bench项目配合使用，以测评不同大语言模型针对不同学科的做题能力。这里的题是指高考题，高考题啊，评测结果说老实话让我大吃一惊！

以下为评测方式的说明：

调用Azure OpenAI服务

完整流程可至Tutorial.pdf中查看

我们需要使用以下命令安装 OpenAI Python 客户端库

pip install openai

注册Azure AI Studio账号，转到Azure AI Studio中的资源和密钥，检索api key以及endpoint（两个必要参数），使得后续能成功调用Azure OpenAI

代码修改

以下将逐步展示本项目修改原代码的部分，完整流程可至Tutorial.pdf中查看

在vscode中打开GaoKao-Bench项目（确保以及克隆其仓库），在openai_gpt4.py中更改引用包的函数为AzureOpenAI

from openai import AzureOpenAI

更改base_url

def __init__(self, api_key_list:List[str], base_url: str="your_base_url", organization: str=None, model_name:str="your_model_name", temperature:float=0.3, max_tokens: int=4096):

加入endpoint以及api key这两个参数

azure_endpoint="your_azure_endpoint",
                    api_key="your_api_key",
                    api_version="your_api_version"

在objective_bench.py中将字符编码改为utf-8，确保能成功编码

with open("Obj_Prompt.json", "r",encoding='utf-8') as f:

在bench_function.py中修改原字段的错误（将standard_answer改为answer），使得可以读取answer的json格式

standard_answer = data[i]['answer']

在bench_function.py中修改写入文件的方式为每循环一次写一次，使得程序员可以同步做题情况，代替了原代码写完所有的题再输出的形式，避免了因为一个卡顿而无法获得输出的情况（以下所示为修改后的字段）

def choice_test_changed(**kwargs):
    """

    Get answers of the Choice Questions

    """

    model_api = kwargs['model_api']
    model_name = kwargs['model_name']
    start_num = kwargs['start_num']
    end_num = kwargs['end_num']
    data = kwargs['data']['example']
    keyword = kwargs['keyword']
    prompt = kwargs['prompt']
    question_type = kwargs['question_type']
    save_directory = kwargs['save_directory']

    file_name = model_name + "_seperate_" + keyword + f"_{start_num}-{end_num - 1}.json"
    file_path = os.path.join(save_directory, file_name)

    for i in tqdm(range(start_num, end_num)):

        index = data[i]['index']
        question = data[i]['question'].strip() + '\n'
        year = data[i]['year']
        category = data[i]['category']
        score = data[i]['score']
        standard_answer = data[i]['answer']
        answer_lenth = len(standard_answer)
        analysis = data[i]['analysis']
        # 调用模型
        model_output = model_api(prompt, question)
        # 提取 model_answer
        model_answer = extract_choice_answer(model_output, question_type, answer_lenth)
        # TODO: which content of temp we expect

        dict = {
            'index': index,
            'year': year,
            'category': category,
            'score': score,
            'question': question,
            'standard_answer': standard_answer,
            'analysis': analysis,
            'model_answer': model_answer,
            'model_output': model_output
        }

        with open(file_path, 'a', encoding='utf-8') as f:
            if f.tell() == 0:
                # If it's the first entry, write the opening brace and the keyword
                f.write('{\n"keyword": "' + keyword + '",\n"example": [\n')
            json.dump(dict, f, ensure_ascii=False, indent=4)
            if i < end_num - 1:
                f.write(',\n')  # Add a comma after each entry except the last one
            else:
                f.write('\n]}\n')

        time.sleep(5)