64 Custom Router Retriever: Choosing the Right Retrieval Tool in LlamaIndex

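In this article, we show how to define a custom router retriever (RouterRetriever). For a given query, it selects one or more candidate retrievers to execute that query. The routing module (a BaseSelector) uses an LLM to decide dynamically which underlying retrieval tools to use. This is useful when the answer may come from one or more of several different data sources.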
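Before turning to LlamaIndex, the routing idea can be sketched in plain Python. This is a toy stand-in, not LlamaIndex's implementation. The `Tool` class, `keyword_selector`, and `route` are hypothetical names. A simple word-overlap rule replaces the LLM that a real BaseSelector would call.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical: a "retriever" is just a function from a query to text chunks.
Retriever = Callable[[str], List[str]]


@dataclass
class Tool:
    description: str  # the only thing the selector looks at
    retriever: Retriever


def keyword_selector(query: str, tools: List[Tool]) -> List[int]:
    """Toy stand-in for an LLM selector: pick every tool whose description
    shares a word with the query, falling back to the first tool."""
    q_words = set(query.lower().split())
    picked = [
        i for i, t in enumerate(tools)
        if q_words & set(t.description.lower().split())
    ]
    return picked or [0]


def route(query: str, tools: List[Tool], select=keyword_selector) -> List[str]:
    # Ask the selector which tools to use, then run only those retrievers.
    results: List[str] = []
    for i in select(query, tools):
        results.extend(tools[i].retriever(query))
    return results
```

A query can match zero, one, or several descriptions. This mirrors the difference between a single selector and a multi selector, discussed later in the article.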

Setup

First, install the required libraries and set your OpenAI API key:

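%pip install llama-index llama-index-llms-openai
import os

# The OpenAI integration reads the API key from this environment variable
os.environ["OPENAI_API_KEY"] = "sk-..."  # replace with your own key

# Allow nested event loops so async code can run inside Jupyter
import nest_asyncio

nest_asyncio.apply()

import logging
import sys

# Send INFO-level logs to stdout so they show up in the notebook
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().handlers = []
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))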

Download Data

Download the sample data (Paul Graham's essay "What I Worked On"):

!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

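Load Data

Load the documents and split them into nodes. Add the nodes to a shared docstore, then build three indices over them (summary, vector, and keyword). Each index is exposed as a retriever: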

from llama_index.core import (
    SimpleDirectoryReader,
    SimpleKeywordTableIndex,
    StorageContext,
    SummaryIndex,
    VectorStoreIndex,
)
from llama_index.core.node_parser import SentenceSplitter
from llama_index.llms.openai import OpenAI

# Load documents
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()

# Initialize the LLM (used later by the selectors) and the node splitter
llm = OpenAI(model="gpt-4")
splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)

# Initialize the storage context (in-memory by default) and register the nodes
storage_context = StorageContext.from_defaults()
storage_context.docstore.add_documents(nodes)

# Define three indices over the same nodes
summary_index = SummaryIndex(nodes, storage_context=storage_context)
vector_index = VectorStoreIndex(nodes, storage_context=storage_context)
keyword_index = SimpleKeywordTableIndex(nodes, storage_context=storage_context)

# Expose each index as a retriever
list_retriever = summary_index.as_retriever()
vector_retriever = vector_index.as_retriever()
keyword_retriever = keyword_index.as_retriever()
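The three retrievers differ in how they pick nodes, and that difference is what makes routing between them worthwhile. The sketch below illustrates the three strategies on plain strings. It is a simplification under stated assumptions:

- bag-of-words cosine similarity stands in for embeddings;
- whitespace splitting stands in for keyword extraction;
- `top_k=2` is an arbitrary choice;
- the function names are hypothetical, not LlamaIndex APIs.

```python
import math
from collections import Counter
from typing import List


def summary_retrieve(chunks: List[str], query: str) -> List[str]:
    # Summary-index style: ignore the query and return every chunk.
    return list(chunks)


def _bow(text: str) -> Counter:
    # Bag-of-words counts, a crude stand-in for an embedding vector.
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def vector_retrieve(chunks: List[str], query: str, top_k: int = 2) -> List[str]:
    # Vector-index style: rank chunks by similarity to the query, keep the top k.
    q = _bow(query)
    ranked = sorted(chunks, key=lambda c: _cosine(q, _bow(c)), reverse=True)
    return ranked[:top_k]


def keyword_retrieve(chunks: List[str], query: str) -> List[str]:
    # Keyword-table style: return chunks containing any word from the query.
    keywords = set(query.lower().split())
    return [c for c in chunks if keywords & set(c.lower().split())]
```

A broad question suits `summary_retrieve`, while a question naming specific entities suits the other two. The selector defined later makes that call with an LLM instead of hand-written rules.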

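Define Retriever Tools

Wrap each retriever in a RetrieverTool. The description tells the selector when the tool is appropriate, so it should state what kind of question each retriever serves: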

from llama_index.core.tools import RetrieverTool

list_tool = RetrieverTool.from_defaults(
    retriever=list_retriever,
    description=(
        "Will retrieve all context from Paul Graham's essay on What I Worked"
        " On. Don't use if the question only requires more specific context."
    ),
)
vector_tool = RetrieverTool.from_defaults(
    retriever=vector_retriever,
    description=(
        "Useful for retrieving specific context from Paul Graham essay on What"
        " I Worked On."
    ),
)
keyword_tool = RetrieverTool.from_defaults(
    retriever=keyword_retriever,
    description=(
        "Useful for retrieving specific context from Paul Graham essay on What"
        " I Worked On (using entities mentioned in query)"
    ),
)
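The descriptions are essentially what the selector sees about each tool, so their wording matters. As an illustration, a selector prompt might present them as a numbered list and map the model's 1-based answer back to a tool index. This is an illustrative sketch, not LlamaIndex's actual prompt template. `format_choices` and `parse_choice` are hypothetical helpers.

```python
from typing import List


def format_choices(descriptions: List[str]) -> str:
    # Number each description so the model can answer with a choice number.
    return "\n".join(f"({i + 1}) {d}" for i, d in enumerate(descriptions))


def parse_choice(answer: str, n_choices: int) -> int:
    # Convert a 1-based answer into a 0-based tool index; reject out-of-range answers.
    choice = int(answer.strip())
    if not 1 <= choice <= n_choices:
        raise ValueError(f"choice {choice} out of range 1..{n_choices}")
    return choice - 1
```

Two descriptions that read almost the same, like the vector and keyword tools above, give the selector little to distinguish them. The keyword tool's extra phrase about entities mentioned in the query is what separates it.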

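Define the Selector Module

Define the selector that routes each query to the appropriate retrieval tool(s). PydanticSingleSelector picks exactly one tool per query. PydanticMultiSelector may pick several, in which case the results of all selected retrievers are combined. Both Pydantic selectors rely on OpenAI function calling, so they need a function-calling model such as the gpt-4 LLM defined above: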

from llama_index.core.response.notebook_utils import display_source_node
from llama_index.core.retrievers import RouterRetriever
from llama_index.core.selectors import (
    PydanticMultiSelector,
    PydanticSingleSelector,
)

# PydanticSingleSelector: route each query to exactly one retriever
retriever = RouterRetriever(
    selector=PydanticSingleSelector.from_defaults(llm=llm),
    retriever_tools=[list_tool, vector_tool],
)

# A broad question is expected to be routed to the summary (list) retriever
nodes = retriever.retrieve(
    "Can you give me all the context regarding the author's life?"
)
for node in nodes:
    display_source_node(node)

# A specific question is expected to be routed to the vector retriever
nodes = retriever.retrieve("What did Paul Graham do after RISD?")
for node in nodes:
    display_source_node(node)

# PydanticMultiSelector: route a query to one or more retrievers
retriever = RouterRetriever(
    selector=PydanticMultiSelector.from_defaults(llm=llm),
    retriever_tools=[list_tool, vector_tool, keyword_tool],
)

nodes = retriever.retrieve(
    "What were notable events from the author's time at Interleaf and YC?"
)
for node in nodes:
    display_source_node(node)

# Async variant: top-level await works in a Jupyter/IPython cell
nodes = await retriever.aretrieve(
    "What were notable events from the author's time at Interleaf and YC?"
)
for node in nodes:
    display_source_node(node)
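When the multi selector picks several tools, the results from each selected retriever have to be combined. Our understanding is that RouterRetriever merges them keyed by node ID, so a node returned by two retrievers appears only once. We also understand that `aretrieve` queries the selected retrievers concurrently. The sketch below reproduces that idea with plain asyncio. `fan_out`, the tuple-based node type, and the example retrievers are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple

# Hypothetical node model: (node_id, text). An async retriever maps a query to nodes.
Node = Tuple[str, str]
AsyncRetriever = Callable[[str], Awaitable[List[Node]]]


async def fan_out(query: str, retrievers: List[AsyncRetriever]) -> List[Node]:
    # Run all selected retrievers concurrently.
    batches = await asyncio.gather(*(r(query) for r in retrievers))
    # Merge keyed by node ID: a node returned twice is kept once, in first-seen position.
    merged: Dict[str, Node] = {}
    for batch in batches:
        for node_id, text in batch:
            merged[node_id] = (node_id, text)
    return list(merged.values())
```

Outside a notebook, call it with `asyncio.run(fan_out(query, retrievers))`. Inside Jupyter, where an event loop is already running, `await fan_out(...)` directly.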

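A router retriever lets an LLM choose the appropriate retrieval tool(s) for each query at run time. This can make the retrieved context more relevant than sending every query to a single fixed retriever. Hope this helps!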
