Chapter 8: LangGraph - Taming Complex Agent Workflows with State Machines


Last modified: 2025-04-29 10:46:29



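Column: From Foundations to the Frontier: A Panoramic Analysis and Practice of LLM Ecosystem Frameworks. Tags: Artificial Intelligence, LangGraph, Framework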

First published: 2025-04-28 10:00:00

Article link: https://blog.csdn.net/YPeng_Gao/article/details/147554587


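Chapter guide: In the previous chapter we saw the limitations of the standard AgentExecutor when it comes to complex logic, state management, and error recovery. When an agent needs conditional branching, backtracking and replanning, or even collaboration with humans, AgentExecutor's fixed linear loop falls short. In this chapter we introduce LangGraph, a heavyweight member of the LangChain ecosystem. LangGraph does not replace basic agents; it offers a new paradigm for building more complex, more reliable, and more controllable agent applications. Borrowing the idea of a state machine, it lets us model an agent's execution flow explicitly as a graph in which state is passed between nodes and updated in a clearly defined way. This gives us much finer control over agent behavior. We will learn LangGraph's core concepts and, through hands-on labs, migrate a simple ReAct agent to LangGraph and then use it to build complex agent flows with advanced capabilities such as reflection, replanning, or human-in-the-loop interaction. Get ready for an upgrade in how you think about agent development!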

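Important: libraries and versions
LangGraph is a standalone library (pip install langgraph) that integrates tightly with LangChain's core library (langchain-core). The examples in this chapter assume recent versions of LangGraph and LangChain, so make sure your environment is up to date. The full examples also import langchain, langchain-openai, and python-dotenv, and ChatOpenAI needs an OPENAI_API_KEY.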
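A quick way to see which versions are installed before running the examples is the standard library's importlib.metadata. The helper below is a small sketch (installed_versions is a hypothetical name); it only reports versions and does not check whether they are compatible with each other.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package_name: version string, or None if the package is not installed}."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for pkg, ver in installed_versions(["langgraph", "langchain-core", "langchain-openai"]).items():
        print(f"{pkg}: {ver or 'not installed'}")
```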

8.1 Why LangGraph? Moving Beyond the Limits of AgentExecutor

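Chapter 7's AgentExecutor demonstrated the basic operating pattern of an agent, but it also exposed weaknesses in complex scenarios:

Implicit state management: the agent's internal state (such as the agent_scratchpad reasoning history) is handled opaquely, so developers find it hard to inspect or modify it precisely mid-flow, or to make complex decisions based on it.
Fixed linear loop: its core is a preset "LLM -> [Tool] -> LLM -> ..." loop, which makes the following hard to implement cleanly:
- Conditional branching: e.g. "if tool A fails, try tool B".
- Backtracking/replanning: e.g. "if the current path is a dead end, go back and rethink".
- Human-in-the-loop: e.g. "pause at a critical step and wait for user confirmation".
Difficult error recovery: robust recovery that involves multiple attempts, switching strategies, or user intervention is very clumsy to build.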

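These limitations make AgentExecutor better suited to agents with relatively simple logic. For complex agents that need fine-grained flow control, explicit state management, high reliability, and flexible interaction, we need LangGraph.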

LangGraph's Core Idea: the State Graph

LangGraph models an agent workflow as a graph, which is essentially a state machine:

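State: all information relevant to the agent at any moment is defined explicitly in a state object.
Nodes: nodes represent the processing steps of the flow (calling an LLM, executing a tool, and so on). Each node receives the current state, does its work, and returns a state update.
Edges: define how control flows and transitions between nodes. Conditional edges are the heart of the design: they choose the next step based on values in the current state, which enables branches, loops, and other complex logic.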

This makes the agent's state and flow explicit, transparent, traceable, controllable, and customizable.
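To make the state / node / edge vocabulary concrete, here is a plain-Python sketch of the loop a state graph runs. It is a toy, not LangGraph's implementation: run_state_machine, the END string, and the reducer handling are assumptions chosen to mirror the concepts above (nodes return partial updates, keys with a reducer are combined, a conditional edge picks the next node).

```python
import operator
from typing import Callable, Dict, Optional

END = "__end__"  # stand-in for langgraph.graph.END in this plain-Python sketch

Node = Callable[[dict], dict]
Router = Callable[[dict], str]


def run_state_machine(
    nodes: Dict[str, Node],
    edges: Dict[str, str],
    routers: Dict[str, Router],
    entry: str,
    state: dict,
    reducers: Optional[Dict[str, Callable]] = None,
    recursion_limit: int = 25,
) -> dict:
    """Run nodes from `entry` until a router or fixed edge returns END."""
    reducers = reducers or {}
    current = entry
    for _ in range(recursion_limit):
        update = nodes[current](state)
        merged = dict(state)
        for key, value in update.items():
            # Keys with a reducer are combined; all other keys are overwritten.
            if key in reducers and key in merged:
                merged[key] = reducers[key](merged[key], value)
            else:
                merged[key] = value
        state = merged
        # A conditional edge (router) takes precedence over a fixed edge.
        nxt = routers[current](state) if current in routers else edges[current]
        if nxt == END:
            return state
        current = nxt
    raise RuntimeError(f"Recursion limit of {recursion_limit} reached")


def increment(state: dict) -> dict:
    return {"count": state["count"] + 1, "log": [f"step {state['count'] + 1}"]}


def route(state: dict) -> str:
    return END if state["count"] >= 3 else "increment"


final = run_state_machine(
    nodes={"increment": increment},
    edges={},
    routers={"increment": route},
    entry="increment",
    state={"count": 0, "log": []},
    reducers={"log": operator.add},
)
print(final)  # {'count': 3, 'log': ['step 1', 'step 2', 'step 3']}
```

The routers here return node names directly; LangGraph's path_map adds one lookup step on top of this.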

8.2 LangGraph Core Components and Building Blocks

The key to mastering LangGraph is understanding the following building blocks:

1. `State`: defining the graph's state

This is the data structure that flows through the graph.

Common choice, TypedDict (from typing or typing_extensions):

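Assigns a type to each dictionary key.
Used with StateGraph, LangGraph automatically merges the partial-update dicts returned by nodes into the state; by default a returned key overwrites the previous value.
Annotating a key with Annotated (from typing on Python 3.9+, or typing_extensions) plus operator.add registers a reducer, so list-valued state is appended to instead of overwritten. Nodes must then return a list of new items for that key.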

from typing import Optional, Sequence, Tuple, Union
from typing_extensions import Annotated, TypedDict  # the typing_extensions versions also work on older Pythons
import operator
from langchain_core.messages import BaseMessage
from langchain_core.agents import AgentAction, AgentFinish

class AgentState(TypedDict):
    input: str
    chat_history: Optional[Sequence[BaseMessage]]
    agent_outcome: Optional[Union[AgentAction, AgentFinish]]
    # Annotated[..., operator.add] tells LangGraph how to merge: returned lists of (action, observation) tuples are appended
    intermediate_steps: Annotated[Sequence[Tuple[AgentAction, str]], operator.add]
    error_message: Optional[str]  # optional field for error handling
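How an Annotated reducer can be discovered and applied is easy to illustrate with the standard typing helpers. The sketch below reads Annotated[..., reducer] metadata from a TypedDict and merges updates the way the text describes; reducers_from_schema and merge are hypothetical helpers, and LangGraph's internal mechanism may differ in detail. It also shows why a node should return a list for an operator.add key: adding a bare tuple to a list raises TypeError.

```python
import operator
from typing import Annotated, List, Optional, Tuple, TypedDict, get_args, get_origin, get_type_hints


class DemoState(TypedDict):
    input: str
    agent_outcome: Optional[str]
    # The metadata after the type is the reducer used to merge updates.
    intermediate_steps: Annotated[List[Tuple[str, str]], operator.add]


def reducers_from_schema(schema) -> dict:
    """Collect {key: reducer} from fields declared as Annotated[..., reducer]."""
    reducers = {}
    for key, hint in get_type_hints(schema, include_extras=True).items():
        if get_origin(hint) is Annotated:
            metadata = get_args(hint)[1:]
            if metadata and callable(metadata[-1]):
                reducers[key] = metadata[-1]
    return reducers


def merge(state: dict, update: dict, reducers: dict) -> dict:
    """Overwrite plain keys; combine reducer keys with the existing value."""
    merged = dict(state)
    for key, value in update.items():
        if key in reducers and key in merged:
            merged[key] = reducers[key](merged[key], value)
        else:
            merged[key] = value
    return merged


reducers = reducers_from_schema(DemoState)
state = {"input": "weather?", "intermediate_steps": []}
state = merge(state, {"agent_outcome": "call tool"}, reducers)
state = merge(state, {"intermediate_steps": [("get_weather", "15 degrees")]}, reducers)
print(state["intermediate_steps"])  # [('get_weather', '15 degrees')]

# Returning a bare tuple instead of a one-element list fails: list + tuple is a TypeError.
try:
    merge(state, {"intermediate_steps": ("get_weather", "22 degrees")}, reducers)
except TypeError as exc:
    print(f"TypeError: {exc}")
```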

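Pydantic BaseModel:
- Provides stronger data validation.
- Note: recent LangGraph versions also accept a Pydantic model as the StateGraph state schema; nodes still return partial-update dicts, and validation is applied to the inputs of each node. Older versions had more limited support, so check the documentation for the version you use. For getting started and common scenarios, TypedDict is more direct.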

2. Nodes: defining processing nodes

Nodes are the units that do the actual work.

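Form: a plain Python function or a LangChain Runnable.
Input: the current State (a dict when the state is a TypedDict).
Operation: call an LLM, execute a tool, process data, and so on.
Output: a dict containing a partial update to the State. StateGraph merges these updates automatically: keys without a reducer are overwritten, and keys with a reducer (such as operator.add) are combined with the existing value.
Error handling: production-grade nodes should handle errors robustly. One approach is to add an error field to the State; the node catches exceptions, stores the error message in that field, and a conditional edge then routes the flow to an error-handling node.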

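# Example node function (a sketch; the point is the input/output contract)
def my_node_function(state: AgentState) -> dict:
    current_input = state["input"]
    # ... do some work with current_input ...
    new_data = "processed data"
    # Return only the keys that should change
    return {"processed_data_field": new_data}  # assumes the State declares a processed_data_field key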
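The error-handling advice above (store the error in the state, then branch on it) might look like this in plain Python. fetch_weather, tool_node, route_after_tool, and the "handle_error" node name are hypothetical; in a real graph, route_after_tool would be the condition function passed to add_conditional_edges.

```python
from typing import Optional, TypedDict


class ToolState(TypedDict, total=False):
    city: str
    observation: Optional[str]
    error_message: Optional[str]


def fetch_weather(city: str) -> str:
    """Hypothetical tool that fails for unknown cities."""
    known = {"london": "15 degrees celsius", "tokyo": "22 degrees celsius"}
    if city.lower() not in known:
        raise ValueError(f"Could not find city: {city}")
    return known[city.lower()]


def tool_node(state: ToolState) -> dict:
    """Catch exceptions and record them in the state instead of crashing the graph."""
    try:
        return {"observation": fetch_weather(state["city"]), "error_message": None}
    except Exception as exc:  # broad on purpose: any tool failure becomes state
        return {"observation": None, "error_message": str(exc)}


def route_after_tool(state: ToolState) -> str:
    """Condition function: send failures to a (hypothetical) error-handling node."""
    return "handle_error" if state.get("error_message") else "agent"


ok = {"city": "London", **tool_node({"city": "London"})}
bad = {"city": "Paris", **tool_node({"city": "Paris"})}
print(route_after_tool(ok), route_after_tool(bad))  # agent handle_error
```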

3. Edges: defining flow transitions

Edges connect nodes and determine the execution order and logic.

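graph.add_node(name: str, node: Union[Callable, Runnable]): adds a node.
graph.set_entry_point(name: str): sets the graph's starting node (in newer versions this is equivalent to graph.add_edge(START, name), with START imported from langgraph.graph).
graph.add_edge(start_name: str, end_name: str): defines an unconditional transition from start_name to end_name.
graph.add_conditional_edges(start_name: str, condition_func: Callable, path_map: Dict[str, str]): (the core!) defines a conditional branch.
- condition_func(state: State) -> str: receives the current state and returns a string, which must be one of the keys of path_map.
- path_map: a dict mapping each key returned by condition_func to the name of the next node (the value). It is optional; without it, condition_func must return node names directly.
END (imported from langgraph.graph): a special string constant marking the end of the flow. Both conditional and normal edges can point to END.
graph.set_finish_point(name: str): (rarely used) marks a node as a finish point.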
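The contract between condition_func and path_map fits in a few lines. This sketch mimics it in plain Python; resolve_next_node is a hypothetical helper, and END here is a local stand-in string rather than the constant imported from langgraph.graph. LangGraph performs its own validation, and its exact error type may differ.

```python
from typing import Callable, Dict

END = "__end__"  # local stand-in for the END constant from langgraph.graph


def resolve_next_node(state: dict, condition_func: Callable[[dict], str], path_map: Dict[str, str]) -> str:
    """Mimic a conditional edge: condition_func returns a key, path_map maps it to a node name."""
    key = condition_func(state)
    if key not in path_map:
        raise KeyError(f"condition_func returned {key!r}; expected one of {sorted(path_map)}")
    return path_map[key]


def should_continue(state: dict) -> str:
    """Same shape as the chapter's should_continue, keyed on a simple flag."""
    return "end" if state.get("finished") else "continue"


path_map = {"continue": "action", "end": END}
print(resolve_next_node({"finished": False}, should_continue, path_map))  # action
print(resolve_next_node({"finished": True}, should_continue, path_map))   # __end__
```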

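4. `StateGraph` / `Graph`: building the graph instance

StateGraph(YourStateType): the recommended builder (the class is StateGraph, imported from langgraph.graph); with a TypedDict state it merges node updates automatically.
Graph(): a more general, lower-level builder without a shared state schema. It is rarely needed, since custom merge logic can usually be expressed with per-key reducers in StateGraph.
After instantiation, call add_node, set_entry_point, add_edge, and add_conditional_edges to define the complete structure of the graph.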

5. `compile()`: compiling into a `Runnable`

app = graph.compile(...): compiles the defined graph into a standard LangChain Runnable.
The compiled app can be called like any LCEL chain (invoke, stream, batch).
Pass the checkpointer argument to enable state persistence.
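invoke returns the final state, while stream yields one event per step, and the event shape depends on stream_mode. As far as I know, "updates" yields {node_name: update} for each step and "values" yields the full state after each step, starting with the input. The plain-Python sketch below mimics only those two event shapes; stream_linear and invoke_linear are hypothetical helpers, not LangGraph functions.

```python
from typing import Callable, Iterator, List, Tuple

NodeSpec = Tuple[str, Callable[[dict], dict]]


def stream_linear(nodes: List[NodeSpec], state: dict, stream_mode: str = "updates") -> Iterator[dict]:
    """Run nodes in order, yielding events shaped like the 'updates' or 'values' stream modes."""
    if stream_mode not in ("updates", "values"):
        raise ValueError(f"Unsupported stream_mode: {stream_mode}")
    state = dict(state)
    if stream_mode == "values":
        yield dict(state)  # the input state comes first
    for name, node in nodes:
        update = node(state)
        state.update(update)  # simple overwrite merge; no reducers in this sketch
        if stream_mode == "updates":
            yield {name: update}  # only what this node returned, keyed by node name
        else:
            yield dict(state)  # full state snapshot after the node


def invoke_linear(nodes: List[NodeSpec], state: dict) -> dict:
    """invoke-style call: return only the final state."""
    result = dict(state)
    for snapshot in stream_linear(nodes, state, "values"):
        result = snapshot
    return result


nodes = [
    ("agent", lambda s: {"answer": s["input"].upper()}),
    ("reflect", lambda s: {"ok": True}),
]
updates = list(stream_linear(nodes, {"input": "hi"}, "updates"))
values = list(stream_linear(nodes, {"input": "hi"}, "values"))
print(updates)  # [{'agent': {'answer': 'HI'}}, {'reflect': {'ok': True}}]
print(invoke_linear(nodes, {"input": "hi"}))  # {'input': 'hi', 'answer': 'HI', 'ok': True}
```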

8.3 Hands-on Lab 1: Migrating a ReAct Agent to LangGraph

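Goal: by migrating the familiar ReAct flow, get a direct feel for LangGraph's advantages in making the flow explicit, keeping state controllable, and easing debugging.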

(The complete code below assembles the building blocks introduced in Section 8.2.)

# --- Complete migration example ---
import operator
from typing import Optional, Sequence, Tuple, Union

from typing_extensions import Annotated, TypedDict  # the typing_extensions versions also work on older Pythons
from dotenv import load_dotenv
from pydantic import BaseModel, Field  # needed for the tool's args_schema
from langchain import hub
from langchain.agents import create_react_agent
from langchain_core.agents import AgentAction, AgentFinish
from langchain_core.messages import BaseMessage
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, END

load_dotenv()

# --- 0. Define tools (reusing the robust version from Chapter 7) ---
class WeatherToolInput(BaseModel):
    city: str = Field(description="The city name.")
    unit: str = Field(default="celsius", description="Unit (celsius or fahrenheit).")

@tool(args_schema=WeatherToolInput)
def get_weather_robust(city: str, unit: str = "celsius") -> str:
    """Gets the current weather for a specific city. Handles potential errors."""
    # Simplified mock of the Section 7.2 implementation
    print(f"--- Getting weather for {city} in {unit} ---")
    try:
        if unit not in ["celsius", "fahrenheit"]:
            raise ValueError("Invalid unit.")
        if city.lower() == "london": return f"15 degrees {unit}."
        if city.lower() == "tokyo": return f"22 degrees {unit}."
        raise ValueError(f"API Error: Could not find city: {city}")
    except ValueError as e: return f"Error: {str(e)}"
    except Exception as e: return f"Unexpected error: {str(e)}"

tools = [get_weather_robust]
tool_map = {t.name: t for t in tools}

# --- 1. Define the State ---
class ReactAgentState(TypedDict):
    input: str
    # TypedDict does not support default values; keys not supplied in the input are simply absent
    chat_history: Optional[Sequence[BaseMessage]]
    agent_outcome: Optional[Union[AgentAction, AgentFinish]]
    # operator.add reducer: each list a node returns for this key is appended
    intermediate_steps: Annotated[Sequence[Tuple[AgentAction, str]], operator.add]

# --- 2. Define Nodes ---
try:
    llm = ChatOpenAI(model="gpt-4o", temperature=0)
    react_prompt = hub.pull("hwchase17/react")
    react_agent_runnable = create_react_agent(llm, tools, react_prompt)
except Exception as e:
    raise SystemExit(f"Error setting up agent runnable: {e}")

def run_agent(state: ReactAgentState) -> dict:
    """Calls the ReAct agent runnable."""
    print("--- Node: run_agent ---")
    inputs = {"input": state["input"], "intermediate_steps": state.get("intermediate_steps", [])}
    agent_outcome = react_agent_runnable.invoke(inputs)
    return {"agent_outcome": agent_outcome}

def execute_tools(state: ReactAgentState) -> dict:
    """Executes the tool."""
    print("--- Node: execute_tools ---")
    agent_action = state["agent_outcome"]
    if not isinstance(agent_action, AgentAction): raise ValueError("State has no action.")
    tool_to_use = tool_map.get(agent_action.tool)
    if not tool_to_use: raise ValueError(f"Tool '{agent_action.tool}' not found.")
    observation = tool_to_use.invoke(agent_action.tool_input)
    # Return a one-element list; the operator.add reducer appends it to intermediate_steps
    return {"intermediate_steps": [(agent_action, str(observation))]}

# --- 3. Define the Conditional Edge Logic ---
def should_continue(state: ReactAgentState) -> str:
    """Decides whether to continue or finish."""
    print("--- Condition: should_continue ---")
    if isinstance(state["agent_outcome"], AgentFinish):
        print("Decision: FINISH")
        return "end"
    else:
        print("Decision: CONTINUE")
        return "continue"

# --- 4. Build the Graph ---
print("\n--- Building LangGraph ---")
workflow = StateGraph(ReactAgentState)
workflow.add_node("agent", run_agent)
workflow.add_node("action", execute_tools)
workflow.set_entry_point("agent")
workflow.add_conditional_edges(
    "agent",
    should_continue,
    {"continue": "action", "end": END}
)
workflow.add_edge("action", "agent") # Loop back after action

# --- 5. Compile the Graph ---
app = workflow.compile()
print("LangGraph compiled.")

# --- 6. Run the Graph ---
print("\n--- Running LangGraph (Normal Case) ---")
inputs = {"input": "What is the weather in London?"}
try:
    # The second positional argument is the run config; recursion_limit caps the number of steps
    for event in app.stream(inputs, {"recursion_limit": 10}):
        print(event)  # by default each event is {node_name: state_update}
        print("-" * 10)
except Exception as e:
    print(f"Error invoking graph (normal): {e}")

print("\n--- Running LangGraph (Tool Error Case) ---")
inputs_error = {"input": "What is the weather in Paris?"}
try:
    for event in app.stream(inputs_error, {"recursion_limit": 10}):
        print(event)
        print("-" * 10)
except Exception as e:
    print(f"Error invoking graph (error): {e}")
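Running the lab requires an OpenAI key and network access. To trace the same agent -> action -> agent control flow offline, the sketch below replaces the LLM with a scripted policy. FakeAction, FakeFinish, scripted_agent, and run_react are hypothetical stand-ins, not LangChain classes, and the loop counts agent calls rather than graph steps (LangGraph's recursion_limit counts every step the graph executes).

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class FakeAction:
    """Hypothetical stand-in for langchain_core.agents.AgentAction."""
    tool: str
    tool_input: dict


@dataclass
class FakeFinish:
    """Hypothetical stand-in for langchain_core.agents.AgentFinish."""
    return_values: dict


def get_weather(city: str, unit: str = "celsius") -> str:
    """Mirrors the mock logic of get_weather_robust, without the @tool decorator."""
    if unit not in ("celsius", "fahrenheit"):
        return "Error: Invalid unit."
    temps = {"london": 15, "tokyo": 22}
    if city.lower() in temps:
        return f"{temps[city.lower()]} degrees {unit}."
    return f"Error: API Error: Could not find city: {city}"


def scripted_agent(question: str, steps: List[Tuple[FakeAction, str]]) -> Union[FakeAction, FakeFinish]:
    """Scripted 'LLM': call the tool once, then answer from the last observation."""
    if not steps:
        city = question.rstrip("?").split()[-1]  # naive: assumes the city is the last word
        return FakeAction(tool="get_weather", tool_input={"city": city})
    return FakeFinish(return_values={"output": f"Observation was: {steps[-1][1]}"})


def run_react(question: str, max_agent_calls: int = 5) -> Tuple[str, List[Tuple[FakeAction, str]]]:
    """agent -> (continue) -> action -> agent ... until the agent returns a finish."""
    steps: List[Tuple[FakeAction, str]] = []
    for _ in range(max_agent_calls):
        outcome = scripted_agent(question, steps)       # "agent" node
        if isinstance(outcome, FakeFinish):              # should_continue -> "end"
            return outcome.return_values["output"], steps
        observation = get_weather(**outcome.tool_input)  # "action" node
        steps = steps + [(outcome, observation)]         # what the operator.add reducer does
    raise RuntimeError("Too many agent calls")


answer, steps = run_react("What is the weather in London?")
print(answer)  # Observation was: 15 degrees celsius.
```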

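Code and debugging comparison

Explicit flow vs. implicit loop: in LangGraph the code (nodes, edges, conditions) defines the ReAct loop explicitly, whereas AgentExecutor encapsulates it.
State controllability: every LangGraph node receives the state dict, which makes the state easy to inspect and debug.
Debuggability: LangGraph's explicit structure, combined with LangSmith or a Python debugger, makes problems easier to locate.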
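One lightweight way to exploit the fact that every node receives the state dict is to wrap node functions so each call records its input and output. The decorator below is a hypothetical sketch; since the wrapper is a plain callable it could in principle be passed to add_node, but that is an assumption I have not checked against every LangGraph version.

```python
import copy
import functools
from typing import Callable, Dict, List

NodeFn = Callable[[dict], dict]


def traced(name: str, log: List[Dict]) -> Callable[[NodeFn], NodeFn]:
    """Decorator factory: record node name, input snapshot, and returned update on every call."""
    def decorator(node: NodeFn) -> NodeFn:
        @functools.wraps(node)
        def wrapper(state: dict) -> dict:
            snapshot = copy.deepcopy(state)  # freeze the input before the node can touch it
            update = node(state)
            log.append({"node": name, "input": snapshot, "update": update})
            return update
        return wrapper
    return decorator


trace: List[Dict] = []


@traced("agent", trace)
def agent_node(state: dict) -> dict:
    return {"agent_outcome": f"answer to {state['input']}"}


agent_node({"input": "hello"})
for entry in trace:
    print(entry)
```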

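8.4 Hands-on Lab 2: Building More Complex Agent Flows with LangGraph

Goal: put LangGraph's conditional branching and looping to work to build an agent with reflection, replanning, or human-in-the-loop capabilities.

Scenario: an agent with reflection (Self-Correction/Reflection)

Architecture (state graph): agent -> (continue) -> action -> agent; agent -> (end) -> reflect; reflect -> (replan) -> agent, or reflect -> (end) -> END.

Code (key parts):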

from langgraph.checkpoint.memory import MemorySaver # For checkpointing
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

# --- 1. Extend the State ---
class ReflectiveAgentState(ReactAgentState):  # inherits from ReactAgentState
    initial_answer: Optional[str]
    reflection: Optional[str]
    is_final_good: Optional[bool]

# --- 2. Define new Nodes ---
def reflect_on_answer(state: ReflectiveAgentState) -> dict:
    """Node to reflect on the quality of the initial answer."""
    print("--- Node: reflect_on_answer ---")
    # Ensure agent_outcome is AgentFinish and has output
    if not isinstance(state.get("agent_outcome"), AgentFinish):
        return {"is_final_good": True} # Skip reflection if not a final answer yet
    initial_answer = state["agent_outcome"].return_values.get("output", "")
    if not initial_answer:
        return {"is_final_good": True}  # Skip if no answer

    reflection_prompt_tmpl = """Critique the following answer based on the original query '{input}'. Is it complete, accurate, and directly addresses the question? Respond with 'Critique: [Your critique]' and then on a new line 'Final Answer Quality: [GOOD/BAD]'.

Answer to critique:
{answer_to_reflect}"""
    reflection_prompt = PromptTemplate.from_template(reflection_prompt_tmpl)
    # Use a capable LLM for reflection
    reflect_llm = ChatOpenAI(model="gpt-4o", temperature=0.1)
    reflection_chain = reflection_prompt | reflect_llm | StrOutputParser()

    try:
        reflection_output = reflection_chain.invoke({
            "input": state["input"],
            "answer_to_reflect": initial_answer
        })
        print(f"Reflection Output:\n{reflection_output}")
        critique = reflection_output.split("Critique:")[-1].split("Final Answer Quality:")[0].strip()
        is_good_str = reflection_output.split("Final Answer Quality:")[-1].strip().upper()
        is_good = "GOOD" in is_good_str
    except Exception as e:
        print(f"Error during reflection: {e}")
        critique = "Reflection failed."
        is_good = True # Default to good if reflection fails to avoid loops

    # Store results in state
    return {
        "initial_answer": initial_answer,
        "reflection": critique,
        "is_final_good": is_good
    }

# --- 3. Define new Conditional Edge Logic ---
def route_after_reflection(state: ReflectiveAgentState) -> str:
    """Decide whether to finish or retry based on reflection."""
    print("--- Condition: route_after_reflection ---")
    if state.get("is_final_good", True): # Default to True if not set
        print("Decision: FINISH after reflection")
        return "end"
    else:
        print("Decision: REPLAN based on reflection")
        # To replan, just route back to the agent node.
        # The agent's prompt or state handling might need adjustments
        # to explicitly consider the 'reflection' field if needed.
        return "replan"

# --- 4. Build the new Graph ---
print("\n--- Building Reflective Graph ---")
reflective_workflow = StateGraph(ReflectiveAgentState)

reflective_workflow.add_node("agent", run_agent)
reflective_workflow.add_node("action", execute_tools)
reflective_workflow.add_node("reflect", reflect_on_answer) # Add reflection node

reflective_workflow.set_entry_point("agent")

# Conditional edge after agent runs
reflective_workflow.add_conditional_edges(
    "agent",
    should_continue, # Decides action or potential finish
    {
        "continue": "action",
        "end": "reflect" # If agent thinks it's finished, go to reflect first
    }
)
# Edge after action goes back to agent
reflective_workflow.add_edge("action", "agent")
# Conditional edge after reflection
reflective_workflow.add_conditional_edges(
    "reflect",
    route_after_reflection, # Decides based on reflection quality
    {
        "replan": "agent", # If bad, loop back to agent
        "end": END         # If good, finish the graph
    }
)

# --- 5. Compile and run ---
# Optional: add checkpointing
checkpointer = MemorySaver()
reflective_app = reflective_workflow.compile(checkpointer=checkpointer)
print("Reflective LangGraph compiled.")

print("\n--- Running Reflective Graph ---")
inputs_reflect = {"input": "Explain LangGraph simply."}
# thread_id identifies the checkpoint thread; recursion_limit guards against endless replanning
config = {"configurable": {"thread_id": "user-reflect-1"}, "recursion_limit": 15}
try:
    for state_snapshot in reflective_app.stream(inputs_reflect, config=config, stream_mode="values"):
        # stream_mode="values" yields the FULL state after each step (keys are state fields, not node names)
        print("\nState snapshot:")
        print("{")
        for key, value in state_snapshot.items():
            text = repr(value)
            print(f"  '{key}': {text[:100]}{'...' if len(text) > 100 else ''}")
        print("}")
        print("-" * 20)
except Exception as e:
    print(f"Error invoking reflective graph: {e}")
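The reflection node parses the critic's reply with chained split calls, which works when the model follows the format exactly. A slightly more tolerant parser, sketched below with regular expressions, accepts brackets and any letter case around the verdict and makes the fallback explicit. parse_reflection is a hypothetical helper; its default_good=True fallback mirrors the original node's choice to avoid endless loops.

```python
import re
from typing import Tuple

_QUALITY_RE = re.compile(r"Final Answer Quality:\s*\[?\s*(GOOD|BAD)\b", re.IGNORECASE)
_CRITIQUE_RE = re.compile(r"Critique:\s*(.*?)\s*(?:Final Answer Quality:|$)", re.IGNORECASE | re.DOTALL)


def parse_reflection(text: str, default_good: bool = True) -> Tuple[str, bool]:
    """Extract (critique, is_good) from the reflection LLM's reply.

    If no verdict is found, return default_good so a malformed reply cannot
    force another replanning round.
    """
    critique_match = _CRITIQUE_RE.search(text)
    critique = critique_match.group(1).strip() if critique_match else text.strip()
    quality_match = _QUALITY_RE.search(text)
    if quality_match is None:
        return critique, default_good
    return critique, quality_match.group(1).upper() == "GOOD"


print(parse_reflection("Critique: Accurate but too brief.\nFinal Answer Quality: BAD"))
```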

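Code walkthrough:

By extending the State, adding a reflect node, and changing the conditional edges, we implement a reflection loop.
When the agent reaches a preliminary finish (AgentFinish), the flow first enters the reflect node.
The reflect node calls an LLM to critique the answer.
Based on the verdict (is_final_good), the route_after_reflection conditional edge either ends the flow (END) or returns to the agent node to replan.
Note that run_agent does not pass the reflection back to the agent, so a BAD verdict may simply regenerate a similar answer; what ultimately stops such a loop is the graph's recursion limit (25 steps by default, adjustable via recursion_limit in the config), which raises an error.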
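Because a BAD verdict routes straight back to agent, nothing in the state limits how many times the flow can replan. One common pattern, sketched below under assumptions, is a counter in the state plus a cap in the router. reflection_attempts, MAX_REFLECTIONS, count_reflection, and route_after_reflection_bounded are hypothetical names; in the graph, the reflect node would merge count_reflection's update into the dict it returns.

```python
from typing import TypedDict

MAX_REFLECTIONS = 2  # hypothetical cap; tune per use case


class ReflectionCounterState(TypedDict, total=False):
    is_final_good: bool
    reflection_attempts: int  # hypothetical field, not part of the chapter's state


def count_reflection(state: ReflectionCounterState) -> dict:
    """Partial update the reflect node would add: bump the attempt counter."""
    return {"reflection_attempts": state.get("reflection_attempts", 0) + 1}


def route_after_reflection_bounded(state: ReflectionCounterState) -> str:
    """Like route_after_reflection, but stops replanning after MAX_REFLECTIONS attempts."""
    if state.get("is_final_good", True):
        return "end"
    if state.get("reflection_attempts", 0) >= MAX_REFLECTIONS:
        return "end"  # accept the last answer instead of looping until the recursion limit
    return "replan"


# Simulate a critic that never approves: the loop still terminates.
state: ReflectionCounterState = {"is_final_good": False}
decisions = []
while True:
    state.update(count_reflection(state))
    decision = route_after_reflection_bounded(state)
    decisions.append(decision)
    if decision == "end":
        break
print(decisions)  # ['replan', 'end']
```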

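8.5 LangGraph's Practical Tools: Visualization, Debugging, Persistence, and Deployment

LangGraph comes with powerful supporting tools:

Visualization:

ASCII diagram: app.get_graph().print_ascii() (requires the grandalf package).
Image output: app.get_graph().draw_mermaid_png() renders through the Mermaid.ink web service by default, so it needs network access; draw_png() requires pygraphviz (pip install pygraphviz).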

# try:
#     png_bytes = reflective_app.get_graph().draw_mermaid_png()
#     with open("reflective_agent_graph.png", "wb") as f: f.write(png_bytes)
# except Exception as e: print(f"Could not draw graph: {e}")
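If rendering through Mermaid.ink or pygraphviz is not an option, the graph can still be shared as Mermaid text (compiled graphs also offer get_graph().draw_mermaid() for this). The sketch below builds such text by hand from an edge list for the reflective agent; to_mermaid is a hypothetical helper, and its output is simpler than LangGraph's own styling.

```python
from typing import List, Optional, Tuple

Edge = Tuple[str, str, Optional[str]]  # (source, target, optional condition label)


def to_mermaid(edges: List[Edge], entry: str) -> str:
    """Render an edge list as Mermaid flowchart text (no network or graphviz needed)."""
    lines = ["graph TD", f"    __start__ --> {entry}"]
    for source, target, label in edges:
        arrow = f"-->|{label}|" if label else "-->"
        lines.append(f"    {source} {arrow} {target}")
    return "\n".join(lines)


reflective_edges = [
    ("agent", "action", "continue"),
    ("agent", "reflect", "end"),
    ("action", "agent", None),
    ("reflect", "agent", "replan"),
    ("reflect", "__end__", "end"),
]
print(to_mermaid(reflective_edges, entry="agent"))
```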

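Debugging:
- LangSmith: integrates seamlessly, tracing each node's state inputs/outputs and internal calls.
- Python debugger: set breakpoints in node functions to inspect the State.
- stream(..., stream_mode="values"): yields a full snapshot of the state after each step.
Persistence (checkpointing):
- Purpose: automatically saves state while the graph runs. Enable it by passing the checkpointer argument to compile() (e.g. checkpointer=MemorySaver() for in-memory storage, or SqliteSaver, RedisSaver, etc. for durable storage; these ship as separate packages). Each run must then supply a thread_id under config["configurable"].
- Value:
  - Interrupt and resume: long-running flows, or flows that wait for external input (such as human-in-the-loop), can be paused and later resumed from the saved point instead of starting over.
  - Fault tolerance: after an unexpected interruption, execution can resume from the most recent checkpoint.
  - State tracking: inspect the intermediate states of past executions.
Deployment:
- The compiled app is a standard Runnable.
- It can be deployed as a REST API with LangServe.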
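Conceptually, a checkpointer stores a snapshot of the state after each step, keyed by thread_id, so a later run with the same thread_id can continue where the previous one stopped. ToyCheckpointer and run_steps below are a hypothetical, in-memory illustration of that idea only; LangGraph's real checkpointers store richer checkpoint data, and with them you would inspect saved state through app.get_state(config) or app.get_state_history(config).

```python
import copy
from typing import Callable, Dict, List, Optional

Step = Callable[[dict], dict]


class ToyCheckpointer:
    """Hypothetical in-memory checkpoint store keyed by thread_id (not LangGraph's MemorySaver)."""

    def __init__(self) -> None:
        self._history: Dict[str, List[dict]] = {}

    def put(self, thread_id: str, state: dict) -> None:
        self._history.setdefault(thread_id, []).append(copy.deepcopy(state))

    def latest(self, thread_id: str) -> Optional[dict]:
        snapshots = self._history.get(thread_id)
        return copy.deepcopy(snapshots[-1]) if snapshots else None

    def history(self, thread_id: str) -> List[dict]:
        return copy.deepcopy(self._history.get(thread_id, []))


def run_steps(
    steps: List[Step],
    thread_id: str,
    saver: ToyCheckpointer,
    initial: Optional[dict] = None,
    stop_after: Optional[int] = None,
) -> dict:
    """Run steps in order, checkpointing after each; resume from the latest checkpoint if one exists."""
    saved = saver.latest(thread_id)
    state = saved if saved is not None else dict(initial or {})
    done = state.get("_done", 0)  # number of steps already completed
    for i in range(done, len(steps)):
        state.update(steps[i](state))
        state["_done"] = i + 1
        saver.put(thread_id, state)
        if stop_after is not None and state["_done"] >= stop_after:
            break  # simulate an interruption, e.g. waiting for human approval
    return state


steps = [
    lambda s: {"plan": "draft"},
    lambda s: {"approved": True},
    lambda s: {"result": s["plan"] + " -> done"},
]
saver = ToyCheckpointer()
first = run_steps(steps, "thread-1", saver, initial={"input": "x"}, stop_after=1)
resumed = run_steps(steps, "thread-1", saver)  # picks up at the second step
print(resumed)
```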

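8.6 Summary: LangGraph, a Reliable Foundation for Production-Grade Agents

In this chapter we studied LangGraph in depth: a powerful agent-flow orchestration framework built on state-machine ideas. We saw how explicit state and flow control address the limitations of AgentExecutor, learned its core components and how to assemble them, and through the labs saw its advantages for building reliable, controllable, and debuggable complex agents, such as one with a reflection loop.

LangGraph is a strong choice for production-grade agent applications that need complex logic, human collaboration, robust error handling, and easy maintenance and iteration.

Compared with general-purpose workflow engines such as Airflow, LangGraph focuses on LLM agent scenarios and natively understands agent state and LangChain components.

With LangGraph in hand, we can finely control the complex behavior of a single agent. But this is not the end of the road: many real-world tasks need several agents working together to be done efficiently. In the next chapter we enter the world of multi-agent systems and learn how to let different agents converse, divide work, and collaborate, starting with the AutoGen framework.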

This content is also published on my WeChat official account: 智语Bot