Exploring Vector Stores and Retrievers in LangChain: Efficiently Integrating Data Retrieval with LLM Workflows - CSDN Blog

Article link: https://blog.csdn.net/ahdfwcevnhrtds/article/details/142320909

Introduction

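When building complex natural language processing applications, efficient data retrieval is essential. LangChain's vector store and retriever abstractions make it easier to retrieve data from vector databases and other data sources, particularly in RAG (retrieval-augmented generation) scenarios. This article shows how to use both so that your data and your language model can work together efficiently.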

Main Content

The Document Abstraction

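LangChain uses the Document abstraction to represent a basic unit of text. Each document consists of page_content (a string) and metadata (a dictionary); the metadata holds information about the document, such as its source.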

from langchain_core.documents import Document

documents = [
    Document(page_content="Dogs are great companions, known for their loyalty and friendliness.", metadata={"source": "mammal-pets-doc"}),
    Document(page_content="Cats are independent pets that often enjoy their own space.", metadata={"source": "mammal-pets-doc"}),
    # More example documents...
]
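To make the two fields concrete, here is a minimal sketch that groups text by its source. `ToyDocument` is a hypothetical stand-in that only mirrors the `page_content` and `metadata` fields described above (it is not LangChain's class), and the `fish-pets-doc` entry is made-up sample data.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDocument:
    """Hypothetical stand-in with the same two fields as a Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


docs = [
    ToyDocument("Dogs are great companions.", {"source": "mammal-pets-doc"}),
    # Made-up entry, only here to show a second source value.
    ToyDocument("Goldfish are popular pets for beginners.", {"source": "fish-pets-doc"}),
]

# Group the text by source so each snippet stays tied to where it came from.
by_source = {}
for doc in docs:
    by_source.setdefault(doc.metadata["source"], []).append(doc.page_content)

print(by_source)
```

Keeping the source in metadata rather than in the text itself means it can be used for filtering or attribution without affecting what gets embedded.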

Vector Stores

A vector store stores and retrieves data by embedding text as numeric vectors. With a LangChain VectorStore, we can add documents to the store and query them by similarity.

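from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

# Embed every document and index the vectors in a Chroma store.
# OpenAIEmbeddings needs an OpenAI API key, for example via the
# OPENAI_API_KEY environment variable.
vectorstore = Chroma.from_documents(
    documents,
    embedding=OpenAIEmbeddings(),
)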
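To illustrate what "query by similarity" means, the following is a rough, dependency-free sketch of the idea. It is not how Chroma or OpenAI embeddings work internally: `toy_embed` is a hypothetical word-count embedding swapped in for a real embedding model, and `toy_similarity_search` is a hypothetical helper that ranks texts by cosine similarity.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical embedding: lowercase word counts instead of dense floats."""
    cleaned = text.lower().replace(".", " ").replace(",", " ")
    return Counter(cleaned.split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def toy_similarity_search(texts, query, k=1):
    """Return the k texts whose vectors are closest to the query vector."""
    query_vec = toy_embed(query)
    ranked = sorted(
        texts,
        key=lambda t: cosine_similarity(toy_embed(t), query_vec),
        reverse=True,
    )
    return ranked[:k]


texts = [
    "Dogs are great companions, known for their loyalty and friendliness.",
    "Cats are independent pets that often enjoy their own space.",
]
top = toy_similarity_search(texts, "cats", k=1)
print(top)
```

The word-count embedding only matches exact words, so a query such as "kitten" would score zero against both texts. A learned embedding model is what lets a real vector store match on meaning rather than on shared words.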

Retrievers

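Retrievers in LangChain are Runnables, so they support both synchronous and asynchronous invocation. We can use a retriever to pull data out of a vector store and feed it into more complex applications.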

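from langchain_core.runnables import RunnableLambda

# Wrap the vector store's similarity search as a Runnable and fix k=1,
# so each query returns only its single closest document.
retriever = RunnableLambda(vectorstore.similarity_search).bind(k=1)

# batch() runs the retriever once per query and returns one result list per query.
retriever.batch(["cat", "shark"])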
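The sync, batch, and async entry points can be pictured with a small sketch. `ToyRetriever` and `keyword_search` are hypothetical names used only for illustration: the class imitates the shape of the interface (one query in, a list of results out), and plain keyword matching is swapped in for vector search so the example runs without any services.

```python
import asyncio


class ToyRetriever:
    """Hypothetical stand-in for a retriever: one query in, a list of texts out."""

    def __init__(self, search_fn, k=1):
        self.search_fn = search_fn
        self.k = k

    def invoke(self, query):
        # Synchronous call for a single query.
        return self.search_fn(query, k=self.k)

    def batch(self, queries):
        # One result list per query, in the same order as the input.
        return [self.invoke(q) for q in queries]

    async def ainvoke(self, query):
        # Run the blocking search in a worker thread so the event loop stays free.
        return await asyncio.to_thread(self.invoke, query)


def keyword_search(query, k=1):
    """Hypothetical search: substring match instead of vector similarity."""
    corpus = [
        "Dogs are great companions, known for their loyalty and friendliness.",
        "Cats are independent pets that often enjoy their own space.",
    ]
    hits = [text for text in corpus if query.lower() in text.lower()]
    return hits[:k]


toy_retriever = ToyRetriever(keyword_search, k=1)
batch_results = toy_retriever.batch(["cat", "shark"])
async_result = asyncio.run(toy_retriever.ainvoke("dog"))
print(batch_results)
print(async_result)
```

One difference worth noting: the keyword sketch returns an empty list for "shark", whereas a top-k similarity search returns the nearest stored document even when nothing matches closely. LangChain vector stores also provide an `as_retriever` method, which is the more common way to obtain a retriever than wrapping `similarity_search` by hand.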

Code Example

The following simple example shows how to use a LangChain vector store and retriever to retrieve data and integrate it with an LLM:

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

llm = ChatOpenAI(model="gpt-3.5-turbo")

message = """
Answer this question using the provided context only.

{question}

Context:
{context}
"""

prompt = ChatPromptTemplate.from_messages([("human", message)])

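# The dict step runs the retriever on the input to fill "context" and passes
# the original input through unchanged as "question"; the result feeds the
# prompt, and the formatted prompt feeds the LLM.
rag_chain = {"context": retriever, "question": RunnablePassthrough()} | prompt | llm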

response = rag_chain.invoke("tell me about cats")
print(response.content)
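To see what the model receives, the data flow of the chain above can be sketched in plain Python without calling an LLM. `toy_retrieve` and `build_prompt` are hypothetical helpers: the first returns fixed text instead of querying a vector store, and the second mimics the fan-out into `context` and `question` followed by template filling.

```python
PROMPT_TEMPLATE = """
Answer this question using the provided context only.

{question}

Context:
{context}
"""


def toy_retrieve(question):
    """Hypothetical retriever: returns fixed text instead of querying a vector store."""
    return ["Cats are independent pets that often enjoy their own space."]


def build_prompt(question):
    # Step 1: fan the single input out into the two keys the prompt expects.
    inputs = {"context": toy_retrieve(question), "question": question}
    # Step 2: fill the template; joining the snippets keeps the context readable.
    return PROMPT_TEMPLATE.format(
        question=inputs["question"],
        context="\n".join(inputs["context"]),
    )


prompt_text = build_prompt("tell me about cats")
print(prompt_text)
```

Joining the retrieved snippets into plain text is a design choice of this sketch. In the chain above, whatever the retriever returns is placed into the {context} slot, so formatting the retrieved documents before they reach the prompt is a common refinement.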