A Collection of the Latest LLM Agent Papers and Source Code, Covering Construction, Application, and Evaluation

深度之眼

Article link: https://blog.csdn.net/weixin_42645636/article/details/133889828


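The pursuit of artificial general intelligence (AGI) dates back to the mid-1950s, when AI researchers had high hopes that machines could acquire human-like thinking abilities. As research progressed, they found the goal far harder to reach than first imagined. Even today, AGI remains a long way off.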

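The good news is that research on autonomous agents has seen many breakthroughs at this year's top conferences. The problems of social interactivity and intelligence that long troubled AI agent researchers now have new directions for solutions, thanks to the development of large language models (LLMs).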

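To help readers keep up with the latest research on AI agents, I have compiled 52 of the latest 2023 papers on LLM-based agents, covering their construction, application, and evaluation.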

If you need the papers and source code, see the end of the article.

Surveys (2 papers)

1.A Survey on Large Language Model-based Autonomous Agents

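Summary: The paper first discusses how LLM-based autonomous agents are constructed, proposing a unified framework that covers most existing work. It then gives a comprehensive overview of their applications across the social sciences, natural sciences, and engineering. Finally, it examines the evaluation strategies commonly used for these agents. Building on prior studies, the authors also present several challenges and future directions for the field.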

2.The Rise and Potential of Large Language Model Based Agents: A Survey

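Summary: The paper first traces the concept of the agent from its philosophical origins to its development in AI, and explains why LLMs are a suitable foundation for agents. On that basis it proposes a general agent framework made up of brain, perception, and action modules that can be applied to different tasks. It then explores the wide range of applications in single-agent, multi-agent, and human-agent cooperation settings. It also discusses behavior, personality, and social phenomena in agent societies, along with the insights they offer for human society. Finally, it discusses key open problems and future directions for the field.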
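The brain, perception, and action split described in this survey can be pictured with a minimal Python sketch. Everything here is an illustrative assumption rather than code from the paper: the `Agent` class, its field names, and the rule-based stand-ins that take the place of a real LLM "brain".

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Agent:
    """Hypothetical agent wiring three modules together (names are assumptions)."""
    perceive: Callable[[str], str]           # perception: raw input -> observation
    think: Callable[[str, List[str]], str]   # brain: observation + memory -> decision
    act: Callable[[str], str]                # action: decision -> output
    memory: List[str] = field(default_factory=list)

    def step(self, raw_input: str) -> str:
        observation = self.perceive(raw_input)
        decision = self.think(observation, self.memory)
        self.memory.append(observation)      # remember what was observed
        return self.act(decision)


# Rule-based stand-ins; a real system would call an LLM inside `think`.
agent = Agent(
    perceive=lambda raw: raw.strip().lower(),
    think=lambda obs, mem: "greet" if "hello" in obs else "ask_clarification",
    act=lambda decision: {
        "greet": "Hi there!",
        "ask_clarification": "Could you rephrase?",
    }[decision],
)

first = agent.step("  Hello agent ")
second = agent.step("???")
```

Keeping the three modules as separate callables is only one possible design; it makes it easy to swap the toy `think` function for a model call without touching perception or action.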

Construction (22 papers)

1.CAMEL: Communicative Agents for “Mind” Exploration of Large Scale Language Model Society

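Summary: To address the challenge of achieving autonomous cooperation, the authors propose a novel communicative agent framework called role-playing. The approach uses inception prompting to guide chat agents toward completing a task while staying consistent with human intentions. The paper shows how role-playing can generate conversational data for studying the behaviors and capabilities of chat agents, providing a valuable resource for investigating conversational language models.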
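The role-playing idea can be sketched as two chat agents, each given a fixed opening prompt, that take turns until the instructing side signals completion. This is a toy outline under stated assumptions, not the CAMEL library's API: `role_play`, the prompt wording, the `<TASK_DONE>` marker, and the scripted stand-ins for the two models are all hypothetical.

```python
from typing import Callable, List, Tuple

# (system_prompt, incoming_message) -> reply; stands in for a chat model call.
Reply = Callable[[str, str], str]


def role_play(task: str, user_reply: Reply, assistant_reply: Reply,
              max_turns: int = 3,
              done_token: str = "<TASK_DONE>") -> List[Tuple[str, str]]:
    """Alternate between an instructing agent and a solving agent."""
    # Opening prompts fix each agent's role for the whole conversation.
    user_prompt = (f"You give one instruction at a time for the task: {task}. "
                   f"Reply {done_token} when the task is finished.")
    assistant_prompt = f"You carry out each instruction for the task: {task}."
    transcript: List[Tuple[str, str]] = []
    message = "Begin."
    for _ in range(max_turns):
        instruction = user_reply(user_prompt, message)
        if done_token in instruction:
            break
        transcript.append(("user", instruction))
        message = assistant_reply(assistant_prompt, instruction)
        transcript.append(("assistant", message))
    return transcript


def make_toy_user(steps: List[str]) -> Reply:
    """Scripted instructor that replays a fixed list of instructions."""
    remaining = list(steps)

    def reply(system_prompt: str, incoming: str) -> str:
        return remaining.pop(0) if remaining else "<TASK_DONE>"
    return reply


def toy_assistant(system_prompt: str, instruction: str) -> str:
    return f"Solution: done with '{instruction}'"


log = role_play(
    "write a sorting script",
    make_toy_user(["Instruction: define the function",
                   "Instruction: add a test"]),
    toy_assistant,
    max_turns=5,
)
```

The returned transcript is the kind of conversational data the summary refers to; with real models behind the two callables, each run would yield one dialogue for later analysis.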

2.Agent Instructs Large Language Models to be General Zero-Shot Reasoners

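Summary: The paper proposes having a purpose-built instructing agent interact with large language models to guide and strengthen their general language understanding and reasoning in the zero-shot setting. Evaluation on multiple datasets shows that the method generalizes to most tasks and achieves state-of-the-art zero-shot performance.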

3.Reflexion: language agents with verbal reinforcement learning

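Summary: The paper proposes a new framework called Reflexion, which reinforces language agents through linguistic feedback rather than weight updates. The agent verbally reflects on task feedback and stores the reflection in memory to induce better decisions in subsequent trials. The framework clearly outperforms baselines on a variety of tasks, giving language agents a fast and efficient mechanism for learning by trial and error.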
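The trial, feedback, and reflection cycle described above can be outlined roughly as follows. This is a simplified sketch, not the authors' implementation: `reflexion_loop` and the three toy callables are hypothetical stand-ins for the model calls and the evaluator, and no model weights are involved at all.

```python
from typing import Callable, List, Tuple


def reflexion_loop(
    attempt: Callable[[str, List[str]], str],     # (task, memory) -> answer
    evaluate: Callable[[str], Tuple[bool, str]],  # answer -> (success, feedback)
    reflect: Callable[[str, str], str],           # (answer, feedback) -> note
    task: str,
    max_trials: int = 3,
) -> Tuple[bool, List[str]]:
    """Retry a task, carrying verbal reflections forward instead of updating weights."""
    memory: List[str] = []
    for _ in range(max_trials):
        answer = attempt(task, memory)
        success, feedback = evaluate(answer)
        if success:
            return True, memory
        # Store a verbal reflection so the next trial can condition on it.
        memory.append(reflect(answer, feedback))
    return False, memory


# Toy stand-ins for the model and the environment.
def toy_attempt(task: str, memory: List[str]) -> str:
    # Only answers correctly once a stored reflection says to add.
    return "4" if any("add" in note for note in memory) else "22"


def toy_evaluate(answer: str) -> Tuple[bool, str]:
    return answer == "4", "expected the sum of 2 and 2"


def toy_reflect(answer: str, feedback: str) -> str:
    return f"Answer {answer!r} was wrong ({feedback}); next time add the numbers."


solved, notes = reflexion_loop(toy_attempt, toy_evaluate, toy_reflect,
                               "What is 2 + 2?")
```

In the toy run the first trial fails, one reflection is stored, and the second trial succeeds because the attempt function reads that reflection from memory.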

4.AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
5.Think-on-Graph: Deep and Responsible Reasoning of Large Language Model on Knowledge Graph
6.SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks
7.Tree of Thoughts: Deliberate Problem Solving with Large Language Models
8.AVIS: Autonomous Visual Information Seeking with Large Language Models
9.Towards End-to-End Embodied Decision Making via Multi-modal Large Language Model: Explorations with GPT4-Vision and Beyond
10.Clever Hans or Neural Theory of Mind? Stress Testing Social Reasoning in Large Language Models
11.Learning Distributed Representations of Sentences from Unlabelled Data
12.A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity
13.HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face
14.Large Language Models as Tool Makers
15.InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language
16.AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
17.InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
18.PandaGPT: One Model To Instruction-Follow Them All
19.Visual Instruction Tuning
20.MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
21.LLM+P: Empowering Large Language Models with Optimal Planning Proficiency
22.Agents: An Open-source Framework for Autonomous Language Agents

Applications (26 papers)

1.WebArena: A Realistic Web Environment for Building Autonomous Agents

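Summary: The paper builds a highly realistic and reproducible web environment spanning four common domains (e-commerce, social forums, collaborative software development, and content management) and designs a set of benchmark tasks that mimic everyday human internet use, in order to evaluate how well autonomous agents carry out complex natural-language commands. The experiments use agents that incorporate recent techniques such as reasoning before acting. The results show that even the current best GPT-4-based agent reaches an end-to-end task success rate of only 10.59% in this realistic setting, so completing complex tasks remains a major challenge.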

2.3D-LLM: Injecting the 3D World into Large Language Models

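Summary: The paper proposes injecting knowledge of the 3D world into large language models, yielding a new family of 3D language models (3D-LLMs). These models take 3D point clouds and their features as input and can perform a range of 3D-related tasks such as 3D captioning, 3D question answering, and 3D grounding. The authors design three prompting mechanisms to collect rich 3D-language training data, and train the models using a 3D feature extractor that works on rendered multi-view images, with 2D vision-language models as backbones.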

3.InterAct: Exploring the Potentials of ChatGPT as a Cooperative Agent

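Summary: This paper examines the integration of OpenAI's ChatGPT with embodied agent systems and evaluates its effect on an interactive decision-making benchmark. Drawing on the idea that people take on different roles according to their individual strengths, the authors propose the InterAct method. In this method, ChatGPT is fed a variety of prompts that assign it multiple roles, such as checker and sorter, and these roles are then integrated with the original language model. The study reports a notable 98% success rate in AlfWorld.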

4.The Hitchhiker's Guide to Program Analysis: A Journey with Large Language Models
5.Do Embodied Agents Dream of Pixelated Sheep: Embodied Decision Making using Language Guided World Modelling
6.SheetCopilot: Bringing Software Productivity to the Next Level through Large Language Models
7.ChatLLM Network: More brains, More intelligence
8.ProAgent: Building Proactive Cooperative AI with Large Language Models
9.MetaGPT: Meta Programming for Multi-Agent Collaborative Framework
10.ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate
11.A Virtual Conversational Agent for Teens with Autism Spectrum Disorder: Experimental Results and Design Lessons
12.Zero-shot Visual Relation Detection via Composite Visual Cues from Large Language Models
13.Multi-Turn Dialogue Agent as Sales' Assistant in Telemarketing
14.Agents: An Open-source Framework for Autonomous Language Agents
15.Improving Factuality and Reasoning in Language Models through Multiagent Debate
16.Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback
17.Multi-Agent Collaboration: Harnessing the Power of Intelligent LLM Agents
18.RoCo: Dialectic Multi-Robot Collaboration with Large Language Models
19.Plan4MC: Skill Reinforcement Learning and Planning for Open-World Minecraft Tasks
20.ChatMOF: An Autonomous AI System for Predicting and Generating Metal-Organic Frameworks
21.WebGPT: Browser-assisted question-answering with human feedback
22.Plan, Eliminate, and Track -- Language Models are Good Teachers for Embodied Agents
23.Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents
24.ScienceWorld: Is your Agent Smarter than a 5th Grader?
25.CGMI: Configurable General Multi-Agent Interaction Framework
26.SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks