#Langchain | RAG | LLM # Producing a summary by running multiple chains in parallel: branching and merging, map and reduce

Latest recommended article published on 2024-08-18 10:00:00

向日葵花籽儿

Category: LangChain Tutorial. Tags: langchain, AIGC, prompt, LLM

Article link: https://blog.csdn.net/weixin_45312236/article/details/136811777

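This article shows how to use LLMs with pipeline-style processing (such as RunnableParallel) to branch and merge chains, and how to use the Map-Reduce approach to break a task apart, generate per-document summaries, and merge their core themes. It walks through the map stage, the reduce stage, and their use in real code, using a source text about LLM-powered autonomous agents and their components, planning, memory, and tool use.

Summary generated automatically by CSDN.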

Branching and Merging

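Here the output of one component is processed by two or more other components.
RunnableParallel lets you split, or fork, a chain so that several components can process the same input in parallel.
Other components can then join, or merge, the results to synthesize a final response. A chain of this kind produces a computation graph like the following: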

      Input
       / \
      /   \
 Branch1 Branch2
      \   /
       \ /
     Combine
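The graph above can be sketched in plain Python before looking at the LangChain version. This is only an analogue under stated assumptions, not how LangChain is implemented: `run_branches`, `list_pros`, `list_cons`, and `combine` are hypothetical names, and the two branch functions stand in for LLM calls.

```python
from concurrent.futures import ThreadPoolExecutor


def run_branches(branches, value):
    """Fan out: call every branch with the same input and collect results by key."""
    with ThreadPoolExecutor(max_workers=len(branches)) as pool:
        futures = {name: pool.submit(fn, value) for name, fn in branches.items()}
        return {name: future.result() for name, future in futures.items()}


# Hypothetical stand-ins for the two LLM-backed branches.
def list_pros(topic):
    return f"pros of {topic}"


def list_cons(topic):
    return f"cons of {topic}"


def combine(results):
    """Fan in: merge the branch outputs into one response."""
    return f"{results['pros']} / {results['cons']}"


merged = combine(run_branches({"pros": list_pros, "cons": list_cons}, "agile"))
print(merged)
```

The dict of named branches mirrors the shape of the graph: one input goes to every branch, and the merge step receives one result per key.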


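from operator import itemgetter

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

# Step 1: generate a base argument and wrap it in a dict under "base_response"
planner = (
    ChatPromptTemplate.from_template("Generate an argument about: {input}")
    | ChatOpenAI()
    | StrOutputParser()
    | {"base_response": RunnablePassthrough()}
)

# Branch 1: the positive aspects of the base argument
arguments_for = (
    ChatPromptTemplate.from_template(
        "List the pros or positive aspects of {base_response}"
    )
    | ChatOpenAI()
    | StrOutputParser()
)
# Branch 2: the negative aspects of the base argument
arguments_against = (
    ChatPromptTemplate.from_template(
        "List the cons or negative aspects of {base_response}"
    )
    | ChatOpenAI()
    | StrOutputParser()
)

# Merge step: combine the original response with both branches
final_responder = (
    ChatPromptTemplate.from_messages(
        [
            ("ai", "{original_response}"),
            ("human", "Pros:\n{results_1}\n\nCons:\n{results_2}"),
            ("system", "Generate a final response given the critique"),
        ]
    )
    | ChatOpenAI()
    | StrOutputParser()
)

chain = (
    planner
    | {
        "results_1": arguments_for,
        "results_2": arguments_against,
        "original_response": itemgetter("base_response"),
    }
    | final_responder
)


chain.invoke({"input": "agile development"})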

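Output:

 'While agile development has potential drawbacks and challenges, many organizations have successfully adopted and implemented this project management framework to great effect. The drawbacks mentioned above can be mitigated or overcome with proper training, support, and continuous improvement. Note also that not every drawback applies to every organization or project.\n\nFor example, although agile development may be complex at first, teams can quickly grasp its concepts and practices with proper training and guidance. The lack of predictability can be mitigated by techniques such as velocity tracking and release planning. Limited documentation can be addressed by balancing lightweight documentation with clear communication among team members. The dependency on team collaboration can be improved through effective communication channels and regular team-building activities.\n\nAgile development can be scaled and adapted to larger projects by using frameworks such as Scrum of Scrums or LeSS (Large-Scale Scrum). Concerns about speed versus quality can be addressed by incorporating quality assurance practices, such as continuous integration and automated testing, into the agile development process.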

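Code explanation:

This chain mixes sequential and parallel steps. The pipe operator `|` connects components into a pipeline in which each component's output is the next component's input, so components joined by `|` run in order. The dict literal in the middle of `chain`, however, is automatically coerced into a `RunnableParallel`: every value in the dict receives the same input (the output of `planner`), and the values are run concurrently.

The execution order is as follows:

1. `planner` runs and generates an argument about the input, returned as `{"base_response": ...}`.
2. `arguments_for`, `arguments_against`, and `itemgetter("base_response")` run in parallel on the output of `planner`, producing `results_1`, `results_2`, and `original_response`.
3. `final_responder` runs on the merged dict and generates the final response.

So the flow is serial between stages and parallel inside the branching stage, which matches the computation graph shown earlier.

Note that you do not need to add your own thread pool or process pool from `concurrent.futures` to get this behavior. With a synchronous `invoke`, `RunnableParallel` already dispatches its branches to a thread pool.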
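One way to check whether two branches really overlap in time is to make each branch wait for the other. The sketch below uses only the standard library and is an analogue, not LangChain code: `make_branch` is a hypothetical helper, and the barrier only releases when both branches are running at the same moment. If the branches ran one after another, the first would time out waiting and raise `BrokenBarrierError`.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# The barrier opens only when two threads are waiting on it at the same time.
barrier = threading.Barrier(2, timeout=5)


def make_branch(label):
    """Build a branch that proves it overlaps with its sibling before returning."""
    def run(value):
        barrier.wait()  # raises BrokenBarrierError if the sibling never arrives
        return f"{label}: {value}"
    return run


pros_branch = make_branch("pros")
cons_branch = make_branch("cons")

with ThreadPoolExecutor(max_workers=2) as pool:
    pros_future = pool.submit(pros_branch, "base response")
    cons_future = pool.submit(cons_branch, "base response")
    results = {
        "results_1": pros_future.result(),
        "results_2": cons_future.result(),
    }

print(results)
```

The same idea can be applied to a real chain by placing a waiting step in each branch, although with real LLM calls the overlap is usually visible from total latency alone.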

Map-Reduce

Both the LLMs and the prompts used in the map and reduce stages can be customized.

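Map-Reduce is an approach for processing and summarizing a large number of documents. It uses a large language model (LLM) to break the task into smaller, more manageable parts and then merges the partial results into a final, consolidated summary.
An overview of the steps in the Map-Reduce process:

Map stage: each document is processed on its own by an LLMChain, which generates a summary. You define an LLMChain with a specific prompt template that asks the model to identify the main themes of the document.

Reduce stage: the summaries produced by the map stage are merged into a single global summary. ReduceDocumentsChain takes the mapped results and reduces them to one output. If the cumulative size of the documents exceeds a preset token limit (for example, 4000 tokens), it recursively passes the documents in batches to a StuffDocumentsChain to create batch summaries.

Combining the map and reduce chains: the two chains are combined into one MapReduceDocumentsChain, which maps a chain over the documents and then merges the results. In this walkthrough the documents are first split with a text splitter so that each chunk stays close to a specified token count.

Execution and output: finally, running map_reduce_chain returns a summary of the main themes in the supplied list of documents. In this example the summary covers the concept of LLM-powered autonomous agents, system components, planning, memory, tool use, case studies, challenges, and citations and references.

The whole process is an iterative, layered form of summarization. It can handle a large volume of text and still distill the core information into a clear, organized structure. It is particularly suited to complex collections such as research reports, technical documentation, or large sets of log files.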
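The two stages described above can be sketched without any LLM. In this minimal sketch, `map_summarize` and `reduce_summaries` are hypothetical stand-ins for the map and reduce LLM calls (they keep the first sentence and join the results), and the sample documents are invented for illustration.

```python
def map_summarize(document):
    """Hypothetical stand-in for the map LLM call: keep only the first sentence."""
    return document.split(".")[0].strip() + "."


def reduce_summaries(summaries):
    """Hypothetical stand-in for the reduce LLM call: join the partial summaries."""
    return " ".join(summaries)


documents = [
    "Agents plan tasks. They break goals into subgoals.",
    "Agents keep memory. It can be short-term or long-term.",
    "Agents use tools. They call external APIs.",
]

# Map: one independent summary per document.
summaries = [map_summarize(doc) for doc in documents]
# Reduce: one consolidated summary over all partial summaries.
final_summary = reduce_summaries(summaries)
print(final_summary)
```

Because each map call depends only on its own document, the map stage is the part that can be run in parallel. The reduce stage needs all of the partial summaries at once.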

Step-by-step walkthrough

First, specify the LLMChain used to map each document to an individual summary:

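from langchain.chains import LLMChain, MapReduceDocumentsChain, ReduceDocumentsChain
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from langchain_text_splitters import CharacterTextSplitter

# temperature=0 keeps the summaries as deterministic as possible
llm = ChatOpenAI(temperature=0)

# Map: the prompt applied to each document (chunk) independently
map_template = """The following is a set of documents
{docs}
Based on this list of docs, please identify the main themes 
Helpful Answer:"""
map_prompt = PromptTemplate.from_template(map_template)
map_chain = LLMChain(llm=llm, prompt=map_prompt)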

Alternatively, use the Prompt Hub to store and fetch prompts:

from langchain import hub

map_prompt = hub.pull("rlm/map-prompt")
map_chain = LLMChain(llm=llm, prompt=map_prompt)

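ReduceDocumentsChain takes the results of the map step and reduces them to a single output. It wraps a generic CombineDocumentsChain (such as StuffDocumentsChain) and adds the ability to collapse the documents before passing them to the CombineDocumentsChain when their cumulative size exceeds token_max. In this example we can reuse the chain that combines our documents to collapse them as well.
So if the cumulative number of tokens in the mapped documents exceeds 4000, we recursively pass the documents to our StuffDocumentsChain in batches of fewer than 4000 tokens to create batch summaries. Once those batch summaries total fewer than 4000 tokens, we pass them all to the StuffDocumentsChain one last time to create the final summary.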
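The collapse behavior described above can be sketched as a loop. This is a simplified illustration under stated assumptions, not the library's implementation: tokens are approximated by whitespace-separated words, `combine` is a hypothetical stand-in for a StuffDocumentsChain call (it keeps the first three words of each text), and `batch_by_tokens` and `reduce_with_collapse` are hypothetical helper names.

```python
def count_tokens(text):
    """Assumption: approximate tokens by whitespace-separated words."""
    return len(text.split())


def combine(texts):
    """Hypothetical stand-in for the combine chain: keep 3 words of each text."""
    return " ".join(" ".join(text.split()[:3]) for text in texts)


def batch_by_tokens(texts, token_max):
    """Greedily group texts so that each batch stays within token_max."""
    batches, current, used = [], [], 0
    for text in texts:
        size = count_tokens(text)
        if current and used + size > token_max:
            batches.append(current)
            current, used = [], 0
        current.append(text)
        used += size
    if current:
        batches.append(current)
    return batches


def reduce_with_collapse(texts, token_max):
    """Collapse in batches until everything fits, then combine one last time."""
    total = sum(count_tokens(t) for t in texts)
    while total > token_max:
        texts = [combine(batch) for batch in batch_by_tokens(texts, token_max)]
        new_total = sum(count_tokens(t) for t in texts)
        if new_total >= total:  # guard against a collapse step that does not shrink
            raise ValueError("collapse step did not reduce the token count")
        total = new_total
    return combine(texts)


summaries = [
    "planning splits tasks into smaller subgoals",
    "memory stores context across long sessions",
    "tools call external APIs for data",
    "challenges include reliability and finite context",
]
final = reduce_with_collapse(summaries, token_max=12)
print(final)
```

With 24 words of input and a limit of 12, the loop first collapses two batches of two summaries each, and the final combine then runs on the two batch summaries.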

# Reduce
reduce_template = """The following is set of summaries:
{docs}
Take these and distill it into a final, consolidated summary of the main themes. 
Helpful Answer:"""
reduce_prompt = PromptTemplate.from_template(reduce_template)

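# Note we can also get this from the prompt hub, as noted above.
# Caution: this pulls the same "rlm/map-prompt" used for the map step and
# overwrites the reduce_template prompt defined above (see the repr below).
reduce_prompt = hub.pull("rlm/map-prompt")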

reduce_prompt

ChatPromptTemplate(input_variables=['docs'], messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['docs'], template='The following is a set of documents:\n{docs}\nBased on this list of docs, please identify the main themes \nHelpful Answer:'))])

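from langchain.chains import StuffDocumentsChain

# Build the reduce chain, which generates a summary
reduce_chain = LLMChain(llm=llm, prompt=reduce_prompt)

# Take a list of documents, combine them into a single string, and pass it to an LLMChain
combine_documents_chain = StuffDocumentsChain(
    llm_chain=reduce_chain,  # the chain that processes the combined text
    document_variable_name="docs"  # the prompt variable that receives the documents
)

# Combine and iteratively reduce the summaries produced by the map step
reduce_documents_chain = ReduceDocumentsChain(
    # The chain that is called last
    combine_documents_chain=combine_documents_chain,
    # Used to collapse documents if they exceed the context of StuffDocumentsChain
    collapse_documents_chain=combine_documents_chain,
    # The maximum number of tokens to group documents into
    token_max=4000,
)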

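This code uses several Chain classes from LangChain to build a pipeline that merges multiple documents and produces one summary. Each chain has its own job: reduce_chain generates a summary, combine_documents_chain stuffs the documents into a single prompt, and reduce_documents_chain iteratively reduces the documents according to the token limit.

Combine the map and reduce chains into one: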

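# Combine documents by mapping a chain over them, then combining the results
map_reduce_chain = MapReduceDocumentsChain(
    # Map chain: turns each document into a summary
    llm_chain=map_chain,
    # Reduce chain: merges the mapped summaries into one global summary
    reduce_documents_chain=reduce_documents_chain,
    # The variable name in llm_chain that receives the documents
    document_variable_name="docs",
    # Whether to return the map-step results in the output (False: do not return them)
    return_intermediate_steps=False,
)

# Create a character text splitter that measures chunk size with the tiktoken encoder
text_splitter = CharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=1000,  # target size of each chunk, in tokens
    chunk_overlap=0  # no overlap between chunks
)
# Split the documents into chunks (`docs` is a list of documents loaded earlier; loading is not shown here)
split_docs = text_splitter.split_documents(docs)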

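This code first defines a MapReduceDocumentsChain, which processes each document with map_chain and passes the results to reduce_documents_chain for reduction. document_variable_name names the prompt variable in llm_chain that receives the document text. return_intermediate_steps is set to False, so the output does not include the intermediate results of the map step.

Next, the code creates a CharacterTextSplitter with a target chunk size of 1000 and no overlap. Because the splitter is built with from_tiktoken_encoder, the size is measured in tokens rather than characters. split_documents then splits docs into chunks, which feed the map and reduce steps. CharacterTextSplitter only splits on its separator, so a piece of text that contains no separator can end up larger than chunk_size, which is why a warning like the following can appear: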
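To make the oversized-chunk case concrete, here is a simplified sketch of separator-based splitting. It is an illustration under stated assumptions, not the library's code: `split_text` is a hypothetical function, a token is approximated by a whitespace-separated word instead of a tiktoken token, and the sample text is invented.

```python
def split_text(text, chunk_size, separator="\n\n"):
    """Split on the separator, then greedily merge pieces up to chunk_size tokens.

    Assumption: a token is a whitespace-separated word.
    """
    chunks, current, used = [], [], 0
    for piece in text.split(separator):
        size = len(piece.split())
        if current and used + size > chunk_size:
            chunks.append(separator.join(current))
            current, used = [], 0
        current.append(piece)
        used += size
    if current:
        chunks.append(separator.join(current))
    return chunks


text = "one two three\n\nfour five\n\nsix seven eight nine ten eleven\n\ntwelve"
chunks = split_text(text, chunk_size=5)
sizes = [len(chunk.split()) for chunk in chunks]
print(sizes)
```

The third paragraph has six words and contains no separator, so it becomes a chunk of its own that is larger than the limit of five. A splitter that can fall back to finer separators avoids this, at the cost of cutting inside paragraphs.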

Created a chunk of size 1003, which is longer than the specified 1000

print(map_reduce_chain.run(split_docs))

Based on the list of documents provided, the main themes can be identified as follows:

1. LLM-powered autonomous agents: The documents discuss the concept of building agents with LLM as their core controller and highlight the potential of LLM beyond generating written content. They explore the capabilities of LLM as a general problem solver.

2. Agent system overview: The documents provide an overview of the components that make up a LLM-powered autonomous agent system, including planning, memory, and tool use. Each component is explained in detail, highlighting its role in enhancing the agent's capabilities.

3. Planning: The documents discuss how the agent breaks down large tasks into smaller subgoals and utilizes self-reflection to improve the quality of its actions and results.

4. Memory: The documents explain the importance of both short-term and long-term memory in an agent system. Short-term memory is utilized for in-context learning, while long-term memory allows the agent to retain and recall information over extended periods.

5. Tool use: The documents highlight the agent's ability to call external APIs for additional information and resources that may be missing from its pre-trained model weights. This includes accessing current information, executing code, and retrieving proprietary information.

6. Case studies and proof-of-concept examples: The documents provide examples of how LLM-powered autonomous agents can be applied in various domains, such as scientific discovery and generative agent simulations. These case studies serve as examples of the capabilities and potential applications of such agents.

7. Challenges: The documents acknowledge the challenges associated with building and utilizing LLM-powered autonomous agents, although specific challenges are not mentioned in the given set of documents.

8. Citation and references: The documents include a citation and reference section, indicating that the information presented is based on existing research and sources.

Overall, the main themes in the provided documents revolve around LLM-powered autonomous agents, their components and capabilities, planning, memory, tool use, case studies, and challenges.