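In this article, we show how to build a hybrid search system with Pinecone and LlamaIndex. We walk step by step through creating a Pinecone vector index, loading documents, building a Pinecone vector store, and querying it in hybrid mode.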
Creating a Pinecone index
First, we need to install the required libraries. Use the following commands:
%pip install llama-index-vector-stores-pinecone
!pip install "llama-index>=0.9.31" "pinecone-client>=3.0.0" "transformers[torch]"
Configure logging and import the required libraries:
import logging
import sys
import os
from pinecone import Pinecone, ServerlessSpec
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
os.environ["PINECONE_API_KEY"] = "your-pinecone-api-key"  # replace with your Pinecone API key
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"  # replace with your OpenAI API key
api_key = os.environ["PINECONE_API_KEY"]
pc = Pinecone(api_key=api_key)
# Create the Pinecone index.
# dimension=1536 matches OpenAI's text-embedding-ada-002 embeddings;
# Pinecone requires the dotproduct metric for sparse-dense (hybrid) queries.
pc.create_index(
    name="quickstart",
    dimension=1536,
    metric="dotproduct",
    spec=ServerlessSpec(cloud="aws", region="us-west-2"),
)
pinecone_index = pc.Index("quickstart")
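Running the cell above a second time fails, because an index named "quickstart" already exists. A minimal sketch of an idempotent version is shown below. The helper name `ensure_index` is hypothetical, and it assumes a `pinecone-client` 3.x client, where `list_indexes()` returns an object with a `names()` method.

```python
def ensure_index(client, name, dimension, metric, spec):
    """Create the index only if it is missing, then return a handle to it.

    `client` is assumed to be a `pinecone.Pinecone` instance (pinecone-client 3.x).
    """
    # Names of the indexes that already exist in the project
    existing_names = client.list_indexes().names()
    if name not in existing_names:
        client.create_index(
            name=name, dimension=dimension, metric=metric, spec=spec
        )
    return client.Index(name)
```

With the objects from this article, the call would look like `pinecone_index = ensure_index(pc, "quickstart", 1536, "dotproduct", ServerlessSpec(cloud="aws", region="us-west-2"))`.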
Downloading the data
The following commands download the data we will use:
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
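The `!mkdir` and `!wget` commands only work in a notebook with those tools available. If you are running a plain Python script, the same step can be sketched with the standard library. The function name `download_file` and the 30-second timeout are our own choices here, not part of either library.

```python
from pathlib import Path
from urllib.request import urlopen


def download_file(url, destination):
    """Download `url` to `destination`, creating parent directories as needed."""
    target = Path(destination)
    # Equivalent of `mkdir -p` for the target directory
    target.parent.mkdir(parents=True, exist_ok=True)
    # The timeout value is an arbitrary choice for this sketch
    with urlopen(url, timeout=30) as response:
        target.write_bytes(response.read())
    return target
```

To fetch the essay used in this article, pass the URL from the `wget` command above as `url` and `"data/paul_graham/paul_graham_essay.txt"` as `destination`.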
Loading documents and building the PineconeVectorStore
Use LlamaIndex to load the documents and build the Pinecone vector store:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.vector_stores.pinecone import PineconeVectorStore
from IPython.display import Markdown, display
# Load the documents
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
# Configure the storage context
from llama_index.core import StorageContext
vector_store = PineconeVectorStore(
    pinecone_index=pinecone_index,
    add_sparse_vector=True,  # also store a sparse vector per node, needed for hybrid queries
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)
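With `add_sparse_vector=True`, each node is stored with both a dense embedding and a sparse (keyword-style) vector. A common way to balance the two at query time is a convex combination: scale the dense query vector by a weight `alpha` and the sparse one by `1 - alpha`. Because the dot product is linear, the resulting score is `alpha * dense_score + (1 - alpha) * sparse_score`. The sketch below only illustrates that idea; `weight_hybrid_query` is a hypothetical helper, and we are not claiming it is the exact code that `PineconeVectorStore` runs.

```python
def weight_hybrid_query(dense, sparse, alpha):
    """Scale a dense vector by alpha and a sparse vector by (1 - alpha).

    `sparse` is assumed to be a dict with "indices" and "values" lists.
    alpha=1.0 means purely dense search, alpha=0.0 purely sparse search.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    weighted_dense = [value * alpha for value in dense]
    weighted_sparse = {
        "indices": list(sparse["indices"]),
        "values": [value * (1.0 - alpha) for value in sparse["values"]],
    }
    return weighted_dense, weighted_sparse
```

For example, with `alpha=0.5` both components contribute equally, while values closer to 1 favor semantic similarity over exact keyword overlap.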
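Querying the index
We can query the index with the following code:
# Hybrid mode combines dense (embedding) and sparse (keyword) matching.
# Set the logging level to DEBUG in the logging setup for more detailed output.
query_engine = index.as_query_engine(vector_store_query_mode="hybrid")
response = query_engine.query("What happened at Viaweb?")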
display(Markdown(f"<b>{response}</b>"))
Example
If you run the code above, you may see output similar to the following:
At Viaweb, Lisp was used as a programming language. The speaker gave a talk at a Lisp conference about how Lisp was used at Viaweb, and afterward, the talk gained a lot of attention when it was posted online. This led to a realization that publishing essays online could reach a wider audience than traditional print media. The speaker also wrote a collection of essays, which was later published as a book called "Hackers & Painters."
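To see which chunks of the essay an answer like this was built from, you can inspect the retrieved nodes on the response object. The helper below is a small sketch: `summarize_sources` is a hypothetical name, and it assumes each item exposes `score` and `node.get_content()`, as LlamaIndex's `NodeWithScore` objects in `response.source_nodes` do.

```python
def summarize_sources(source_nodes, max_chars=80):
    """Return one short line per retrieved node: rank, score, start of the text."""
    lines = []
    for rank, item in enumerate(source_nodes, start=1):
        # Collapse newlines and repeated spaces so each node fits on one line
        text = " ".join(item.node.get_content().split())
        score = "n/a" if item.score is None else f"{item.score:.3f}"
        lines.append(f"{rank}. score={score} {text[:max_chars]}")
    return lines
```

In the notebook, this could be used as `for line in summarize_sources(response.source_nodes): print(line)`.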
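Possible errors
- API key not set: if the PINECONE_API_KEY or OPENAI_API_KEY environment variable is not set correctly, calls to the corresponding service will fail. You can check for this up front and raise an EnvironmentError yourself:
if "OPENAI_API_KEY" not in os.environ:
    raise EnvironmentError("Environment variable OPENAI_API_KEY is not set")
- Network issues: downloading the data or calling the APIs can fail because of network problems. Checking your network connection is a good starting point.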
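For transient network failures, retrying with a growing delay is often enough. Here is a minimal sketch of that approach. The name `call_with_retries` and its defaults are our own assumptions. It only retries on `OSError`, which covers `urllib.error.URLError`, `ConnectionError`, and `TimeoutError`; errors raised by the Pinecone or OpenAI clients may use other exception types and would need to be added explicitly.

```python
import time


def call_with_retries(operation, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `operation()` and retry on OSError with exponential backoff.

    Waits base_delay, then 2 * base_delay, and so on between attempts.
    The last failure is re-raised once `attempts` calls have failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except OSError:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

Any zero-argument callable can be wrapped this way, for example a `lambda` that performs the data download.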