Research Notes | Paper Digest: Latest Research Progress on Agents

Article link: https://blog.csdn.net/admin_maxin/article/details/144585413

This post shares 5 outstanding works selected from 45 papers published between 2024-12-13 and 2024-12-18:

Can Modern LLMs Act as Agent Cores in Radiology Environments?
Achieving Collective Welfare in Multi-Agent Reinforcement Learning via Suggestion Sharing
A systematic review of norm emergence in multi-agent systems
Agent-based Video Trimming
GROOT-2: Weakly Supervised Multi-Modal Instruction Following Agents

1. Can Modern LLMs Act as Agent Cores in Radiology Environments?

Authors: Qiaoyu Zheng, Chaoyi Wu, Pengcheng Qiu, Lisong Dai, Ya Zhang, Yanfeng Wang, Weidi Xie

1.1 Paper Abstract

Advancements in large language models (LLMs) have paved the way for LLM-based agent systems that offer enhanced accuracy and interpretability across various domains. Radiology, with its complex analytical requirements, is an ideal field for the application of these agents. This paper aims to investigate the pre-requisite question for building concrete radiology agents which is, ‘Can modern LLMs act as agent cores in radiology environments?’ To investigate it, we introduce RadABench with three-fold contributions: First, we present RadABench-Data, a comprehensive synthetic evaluation dataset for LLM-based agents, generated from an extensive taxonomy encompassing 6 anatomies, 5 imaging modalities, 10 tool categories, and 11 radiology tasks. Second, we propose RadABench-EvalPlat, a novel evaluation platform for agents featuring a prompt-driven workflow and the capability to simulate a wide range of radiology toolsets. Third, we assess the performance of 7 leading LLMs on our benchmark from 5 perspectives with multiple metrics. Our findings indicate that while current LLMs demonstrate strong capabilities in many areas, they are still not sufficiently advanced to serve as the central agent core in a fully operational radiology agent system. Additionally, we identify key factors influencing the performance of LLM-based agent cores, offering insights for clinicians on how to apply agent systems in real-world radiology practices effectively. All of our code and data are open-sourced in https://github.com/MAGIC-AI4Med/RadABench.