Text Summarization with LangChain

大多_C

Published 2024-06-02 11:07:49


Article link: https://blog.csdn.net/weixin_46933702/article/details/139388743

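A Detailed Guide to Text Summarization with LangChain

LangChain can help you summarize the content of multiple documents with a large language model (LLM). This guide walks through the process: loading content with a document loader, the three common summarization methods (Stuff, Map-Reduce, and Refine), and the concrete implementation steps for each.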

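1. Installation and Setup

First, make sure LangChain is installed along with the packages the examples import (the community loaders, the OpenAI integration, tiktoken for token-based splitting, and BeautifulSoup for WebBaseLoader), and that the required environment variables are set.

pip install langchain langchain-community langchain-openai tiktoken beautifulsoup4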

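Set the following environment variables to start logging traces to LangSmith (optional). The ChatOpenAI examples below also read the OpenAI key from OPENAI_API_KEY:

import getpass
import os

# Optional: enable LangSmith tracing
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = getpass.getpass("LangSmith API key: ")

# Read by ChatOpenAI
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API key: ")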

2. Loading Documents

Use a document loader to load the content. For example, WebBaseLoader loads content from an HTML web page:

from langchain_community.document_loaders import WebBaseLoader

loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
docs = loader.load()
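
To make the loader's output less of a black box, here is a minimal standard-library sketch of the general idea: strip the markup from an HTML page and keep the visible text together with its source. It is not how WebBaseLoader is implemented; SimpleDocument and html_to_document are hypothetical names used only for illustration, and the only thing borrowed from LangChain is the page_content/metadata shape of a document.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a loaded document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_document(html: str, source: str) -> SimpleDocument:
    """Turn raw HTML into a text document that remembers where it came from."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return SimpleDocument(
        page_content="\n".join(parser.parts),
        metadata={"source": source},
    )
```

Whatever loader you use, the summarization chains below only care about the resulting text, so it is worth printing docs[0].page_content once to check what was actually extracted.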

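3. Three Common Summarization Methods

Method 1: Stuff

Concatenate the content of all documents into a single prompt and pass it to the LLM in one call. This works when the combined text fits in the model's context window, so it suits models with large context windows such as OpenAI's GPT-4 or Anthropic's Claude 3.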

from langchain.chains.combine_documents.stuff import StuffDocumentsChain
from langchain.chains.llm import LLMChain
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

# Define the prompt; {text} receives the concatenated documents
prompt_template = """Write a concise summary of the following:
"{text}"
CONCISE SUMMARY:"""
prompt = PromptTemplate.from_template(prompt_template)

# Define the LLM chain
llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo-16k")
llm_chain = LLMChain(llm=llm, prompt=prompt)

# Define the StuffDocumentsChain; document_variable_name must match the prompt variable
stuff_chain = StuffDocumentsChain(llm_chain=llm_chain, document_variable_name="text")

docs = loader.load()
result = stuff_chain.invoke(docs)
print(result["output_text"])
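
Conceptually, "stuffing" is just string concatenation followed by one model call. The sketch below shows that step in plain Python, without LangChain or an API key. The helper names build_stuff_prompt and fits_context are hypothetical, the blank-line separator is an assumption of this sketch, and the 4-characters-per-token ratio is only a rough rule of thumb, not a real tokenizer.

```python
PROMPT_TEMPLATE = (
    "Write a concise summary of the following:\n"
    '"{text}"\n'
    "CONCISE SUMMARY:"
)


def build_stuff_prompt(page_contents, separator="\n\n"):
    """Join every document's text and fill the single {text} slot."""
    stuffed = separator.join(page_contents)
    return PROMPT_TEMPLATE.format(text=stuffed)


def fits_context(prompt, max_tokens, chars_per_token=4):
    """Rough size check; the chars-per-token ratio is an assumption."""
    return len(prompt) / chars_per_token <= max_tokens


if __name__ == "__main__":
    prompt = build_stuff_prompt(["first doc", "second doc"])
    print(prompt)
    print("fits in 16k tokens:", fits_context(prompt, max_tokens=16_000))
```

If the size check fails for your documents, that is the signal to switch to Map-Reduce or Refine rather than Stuff.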

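Method 2: Map-Reduce

First summarize each document on its own (the map step), then combine those summaries into one global summary (the reduce step). If the intermediate summaries are still too long for a single prompt, they are collapsed in groups until they fit within token_max.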

from langchain.chains import MapReduceDocumentsChain, ReduceDocumentsChain
from langchain.chains.combine_documents.stuff import StuffDocumentsChain
from langchain.chains.llm import LLMChain
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(temperature=0)

# Map step: applied to each document individually
map_template = """The following is a set of documents
{docs}
Based on this list of docs, please identify the main themes
Helpful Answer:"""
map_prompt = PromptTemplate.from_template(map_template)
map_chain = LLMChain(llm=llm, prompt=map_prompt)

# Reduce step: combines the per-document summaries
reduce_template = """The following is a set of summaries:
{docs}
Take these and distill them into a final, consolidated summary of the main themes.
Helpful Answer:"""
reduce_prompt = PromptTemplate.from_template(reduce_template)
reduce_chain = LLMChain(llm=llm, prompt=reduce_prompt)

combine_documents_chain = StuffDocumentsChain(llm_chain=reduce_chain, document_variable_name="docs")

reduce_documents_chain = ReduceDocumentsChain(
    combine_documents_chain=combine_documents_chain,
    collapse_documents_chain=combine_documents_chain,
    token_max=4000,
)

map_reduce_chain = MapReduceDocumentsChain(
    llm_chain=map_chain,
    reduce_documents_chain=reduce_documents_chain,
    document_variable_name="docs",
    return_intermediate_steps=False,
)

result = map_reduce_chain.invoke(docs)
print(result["output_text"])
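
The control flow behind map, collapse, and reduce can be sketched in plain Python. This is an illustration under stated assumptions, not LangChain's implementation: summarize is any caller-supplied function (in practice an LLM call), count_tokens is a crude word count standing in for a real tokenizer, and all function names are hypothetical.

```python
def count_tokens(text):
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def group_by_budget(texts, token_max):
    """Pack consecutive texts into groups whose total stays within token_max."""
    groups, current, used = [], [], 0
    for text in texts:
        size = count_tokens(text)
        if current and used + size > token_max:
            groups.append(current)
            current, used = [], 0
        current.append(text)
        used += size
    if current:
        groups.append(current)
    return groups


def map_reduce_summarize(docs, summarize, token_max=4000):
    # Map: summarize each document independently
    summaries = [summarize(doc) for doc in docs]

    # Collapse: while the summaries are too long for one prompt,
    # summarize them group by group
    while len(summaries) > 1 and sum(count_tokens(s) for s in summaries) > token_max:
        groups = group_by_budget(summaries, token_max)
        collapsed = [summarize("\n\n".join(group)) for group in groups]
        if len(collapsed) == len(summaries):
            break  # no group could be merged; stop instead of looping forever
        summaries = collapsed

    # Reduce: one final call over whatever is left
    return summarize("\n\n".join(summaries))
```

Because the map calls are independent of each other, they can run in parallel, which is the main practical advantage of this method over Refine. The cost is that each document is summarized without seeing the others.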

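Method 3: Refine

Iterate over the documents while maintaining a rolling summary: at each step, the LLM produces an updated summary from the current summary and the next document.

from langchain.chains.summarize import load_summarize_chain

chain = load_summarize_chain(llm, chain_type="refine")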
result = chain.invoke(docs)
print(result["output_text"])
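
The rolling-summary loop is short enough to write out by hand. In the sketch below, initial_summary and refine_summary are hypothetical caller-supplied functions (in practice, two LLM prompts); the code shows only the iteration pattern, not LangChain's internals.

```python
def refine_summarize(docs, initial_summary, refine_summary):
    """Build a summary of docs[0], then update it once per remaining document."""
    if not docs:
        raise ValueError("need at least one document")

    # First call: summarize the first document from scratch
    summary = initial_summary(docs[0])

    # Remaining calls: each one sees the current summary plus one new document
    for doc in docs[1:]:
        summary = refine_summary(summary, doc)
    return summary


if __name__ == "__main__":
    # Toy functions that make the order of the calls visible
    result = refine_summarize(
        ["a", "b", "c"],
        initial_summary=lambda doc: doc.upper(),
        refine_summary=lambda summary, doc: f"{summary}+{doc}",
    )
    print(result)
```

Each step depends on the previous one, so the calls are strictly sequential: one model call per document. In exchange, later documents are summarized with the context of earlier ones.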

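4. Using AnalyzeDocumentChain

AnalyzeDocumentChain wraps text splitting and summarization in a single chain: it takes raw text, splits it with the given text splitter, and passes the resulting chunks to the summarization chain.

from langchain.chains import AnalyzeDocumentChain
from langchain_text_splitters import CharacterTextSplitter

# `chain` is the refine summarization chain created in Method 3
text_splitter = CharacterTextSplitter.from_tiktoken_encoder(chunk_size=1000, chunk_overlap=0)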
summarize_document_chain = AnalyzeDocumentChain(combine_docs_chain=chain, text_splitter=text_splitter)
result = summarize_document_chain.invoke(docs[0].page_content)
print(result["output_text"])
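
The "split, then combine" pattern can be illustrated without any dependencies. The sketch below makes two simplifying assumptions: it measures chunk_size in characters rather than tiktoken tokens, and it cuts at fixed offsets instead of at separators. split_text and analyze_document are hypothetical names, and combine stands in for any of the three summarization methods above.

```python
def split_text(text, chunk_size=1000, chunk_overlap=0):
    """Cut text into fixed-size character windows, optionally overlapping."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def analyze_document(text, combine, chunk_size=1000, chunk_overlap=0):
    """Split raw text into chunks, then hand all chunks to one combine step."""
    chunks = split_text(text, chunk_size, chunk_overlap)
    return combine(chunks)


if __name__ == "__main__":
    # Toy combine step that just shows the chunk boundaries
    print(analyze_document("abcdefghij", "|".join, chunk_size=4))
```

A nonzero chunk_overlap repeats the tail of one chunk at the head of the next, which can help when a sentence would otherwise be cut in half at a chunk boundary.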

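With the steps above, you can use LangChain to summarize the content of multiple documents efficiently and to give the LLM useful background context.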
