LLMs | Self-Evolving Agents with Reflective and Memory-Augmented Abilities

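Large language models (LLMs) still struggle with continual decision-making in dynamic environments, a lack of long-term memory, and limited context windows. Prior work offers only partial remedies:

  • Meta-learning and multi-task learning have been used to improve the transferability and adaptability of LLMs;
  • For limited memory storage, MemGPT and MemoryBank each adopt a different memory-management strategy;
  • However, these methods usually target specific tasks or scenarios and lack a general framework.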

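The paper proposes a new framework: Self-evolving Agents with reflective and memory-augmented abilities (SAGE). SAGE comprises three agents: a user, an assistant, and a checker. By combining iterative feedback, a reflection mechanism, and a memory-optimization mechanism based on the Ebbinghaus forgetting curve, it significantly improves the agents' ability to handle multiple tasks and long-span information. Through self-evolution, the agents can adaptively adjust their strategies, optimize how information is stored and transmitted, and effectively reduce cognitive load.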
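The interplay between the assistant and the checker can be sketched as a bounded refinement loop. This is a minimal illustration rather than the paper's implementation: the names `run_feedback_loop`, `LoopResult`, and `max_rounds`, along with the toy assistant and checker, are assumptions made for this example, and no real model is called.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical role signatures; in a real system each would wrap an LLM call.
Assistant = Callable[[str, List[str]], str]        # (task, feedback so far) -> answer
Checker = Callable[[str, str], Tuple[bool, str]]   # (task, answer) -> (accepted, feedback)


@dataclass
class LoopResult:
    answer: str
    accepted: bool
    rounds: int
    feedback_log: List[str] = field(default_factory=list)


def run_feedback_loop(task: str, assistant: Assistant, checker: Checker,
                      max_rounds: int = 3) -> LoopResult:
    """Refine the answer until the checker accepts it or max_rounds is reached."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    feedback_log: List[str] = []
    answer = ""
    for round_no in range(1, max_rounds + 1):
        # The assistant sees all feedback collected in earlier rounds.
        answer = assistant(task, feedback_log)
        accepted, feedback = checker(task, answer)
        if accepted:
            return LoopResult(answer, True, round_no, feedback_log)
        feedback_log.append(feedback)
    # Iteration limit reached without acceptance: return the last attempt.
    return LoopResult(answer, False, max_rounds, feedback_log)


# Toy stand-ins for the two roles (assumptions for illustration only).
def toy_assistant(task: str, feedback: List[str]) -> str:
    return "Steel" if feedback else "The file seems to be an HTML document."


def toy_checker(task: str, answer: str) -> Tuple[bool, str]:
    if answer == "Steel":
        return True, ""
    return False, "Answer the question directly; ignore irrelevant document details."


result = run_feedback_loop("Which metal is produced by the Bessemer Process?",
                           toy_assistant, toy_checker)
print(result.answer, result.accepted, result.rounds)  # Steel True 2
```

Passing the full feedback log back to the assistant, rather than only the latest message, is a design choice of this sketch; it keeps earlier corrections visible in later rounds.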

[Schematic of the SAGE framework]

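  • Iterative feedback: the assistant keeps refining its output based on the checker's feedback until the checker judges the output correct or the iteration limit is reached.

  • Reflection mechanism: the assistant analyzes its experience from both successful and failed tasks and stores it in memory, so that it can make better decisions on future tasks.

  • MemorySyntax: combines the Ebbinghaus forgetting curve with linguistic knowledge to optimize the assistant's memory and external-storage management, so that important information is processed and retained effectively.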
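To make the forgetting-curve idea concrete, the sketch below stores reflections in a small memory and prunes entries whose retention has decayed. It is an illustration under stated assumptions, not the paper's MemorySyntax: the exponential form R = exp(-t/S) is one common way of writing the Ebbinghaus curve, and the class `ReflectiveMemory`, the single `keep_threshold`, and the `boost` applied on recall are all hypothetical choices for this example.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MemoryItem:
    text: str
    strength: float     # assumed memory strength S, in the same time units as `now`
    last_access: float  # time of the most recent write or recall


def retention(item: MemoryItem, now: float) -> float:
    """Exponential forgetting curve R = exp(-elapsed / S)."""
    elapsed = max(0.0, now - item.last_access)
    return math.exp(-elapsed / item.strength)


class ReflectiveMemory:
    """Toy store: recalled items grow stronger, stale items are pruned."""

    def __init__(self, keep_threshold: float = 0.3, boost: float = 2.0) -> None:
        self.keep_threshold = keep_threshold  # hypothetical cutoff for forgetting
        self.boost = boost                    # hypothetical strength multiplier on recall
        self.items: Dict[str, MemoryItem] = {}

    def store(self, key: str, text: str, now: float, strength: float = 1.0) -> None:
        if strength <= 0:
            raise ValueError("strength must be positive")
        self.items[key] = MemoryItem(text, strength, now)

    def recall(self, key: str, now: float) -> str:
        item = self.items[key]
        item.strength *= self.boost  # reuse slows future decay
        item.last_access = now
        return item.text

    def prune(self, now: float) -> List[str]:
        """Drop items whose retention fell below the threshold; return their keys."""
        forgotten = [k for k, it in self.items.items()
                     if retention(it, now) < self.keep_threshold]
        for k in forgotten:
            del self.items[k]
        return forgotten


memory = ReflectiveMemory()
memory.store("lesson", "Answer the question directly; do not describe the passage.", now=0.0)
memory.store("noise", "The passage mentions the Grand Canal of China.", now=0.0)
memory.recall("lesson", now=1.0)   # reused once, so its strength doubles to 2.0
forgotten = memory.prune(now=3.0)
print(forgotten)  # ['noise']
```

At time 3.0 the reused lesson has retention exp(-2/2) ≈ 0.37 and survives, while the untouched entry has exp(-3/1) ≈ 0.05 and is dropped. A fuller design might move decayed items to external storage instead of deleting them.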

[Illustration of a HotpotQA example using the SAGE framework]

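Experimental results show that SAGE significantly improves model performance on multiple benchmarks, with especially strong gains on smaller models. On AgentBench, SAGE not only strengthens strong baselines such as GPT-3.5 and GPT-4, but also significantly improves open-source models (qwen-7b, codellama-7b, llama2-7b, chatglm2-6b).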

[Baseline performance on AgentBench with and without the SAGE framework]

Through its feedback mechanism and memory optimization, SAGE effectively reduces basic errors and logical errors in complex tasks.

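[Distribution of execution outcomes across the six tasks (CLE: context limit exceeded; TLE: task limit exceeded)]

Exceeding the task limit is the main reason tasks fail to complete, which points to the limitations of LLM agents in reasoning and decision-making within a limited time frame.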

A TriviaQA case study using GPT with the SAGE framework

User:

Answer the question based on the given passages. Only give me the answer and do not output any
other words. The following are given passages:
History of canals in Great Britain - London Canal Museum History of canals in Great Britain Home >
History > Canals in Britain In the history of canals, Britain was not a pioneer. The Chinese can claim
that the Grand Canal of China was one of the first, in the tenth century,a. . . . . .
Answer the question based on the given passages. Only give me the answer and do not output any
other words.
Question: Which metal is produced by the Bessemer Process?
Answer:

Assistant:

The content of the file seems to be an HTML document related to the administrative division of Turkey,
including population statistics, charts, and maps. This file contains scripts, metadata, and structured
data typical for a web page.

Checker:

Clear Objectives: When answering user questions, the focus should be on directly addressing the
question without introducing irrelevant information. Document Handling: Before confirming whether
the document content is relevant to the question, detailed aspects of the document should not be
directly addressed.

Reflection:

Issue Description: The user wants to understand why the system outputs irrelevant HTML document
content based on user input and seeks detailed feedback and reflection. Reasons for the Error:
Misinterpretation of Input: The system mistakenly assumed that the user uploaded a file and attempted
to read its content, whereas the user only provided text input. Processing Logic Error: Without
clearly determining the user’s intent, the system directly parsed and outputted file content. User
Experience Problems: Irrelevant Output: The user expects concise, relevant answers, but the system
provided lengthy and irrelevant HTML content, affecting the user experience. Poor Communication:
Failure to accurately understand and respond to the user’s real needs resulted in off-topic responses.
In-depth Reflection and Improvement Suggestions: Clarify User Intent: Accurate Parsing of User
Input: Enhance the system’s natural language processing capabilities to accurately distinguish whether
the user uploaded a file or simply provided text input. User Feedback Mechanism: Implement a user
feedback mechanism to promptly adjust the understanding of user intent. For example, confirm the
issue to verify if the user uploaded a file. Optimize Processing Logic: Layered Processing Flow: In
system design, separate file processing logic from text response logic to avoid misjudgments. For
instance, when a user uploads a file, process the file content separately; when a user inputs text, provide
relevant answers directly. Pre-processing Check: Prior to handling files, perform pre-processing checks
to confirm if the file content meets expectations. . . . . . .

LLMs with SAGE response: Steel.

Ground truth: Steel.

Paper: Self-evolving Agents with reflective and memory-augmented abilities
https://arxiv.org/pdf/2409.00872