[晓理紫] Daily Paper Digest (with Chinese abstracts and source code/project links) -- Reinforcement Learning

Domain-specific paper subscription

Follow {晓理紫|小李子} for daily paper updates. If you find this useful, please share it with others who may need it. Thanks for your support.

If you find it helpful, please follow me for the latest papers, delivered on time every day.

Categories:

== Reinforcement Learning @ RL @ RLHF ==

标题: Stabilizing Contrastive RL: Techniques for Robotic Goal Reaching from Offline Data

作者: Chongyi Zheng, Benjamin Eysenbach, Homer Walke

PubTime: 2024-02-26

Downlink: http://arxiv.org/abs/2306.03346v2

Project: https://chongyi-zheng.github.io/stable_contrastive_rl

GitHub: https://github.com/chongyi-zheng/stable_contrastive_rl

中文摘要: 主要依赖于自我监督学习的机器人系统有可能减少学习控制策略所需的人工注释和工程工作量。与先前的机器人系统利用来自计算机视觉(CV)和自然语言处理(NLP)的自我监督技术的方式相同,我们的工作建立在先前的工作基础上,表明强化学习(RL)本身可以被视为自我监督的问题:学习在没有人类指定的奖励或标签的情况下达到任何目标。尽管看起来很有吸引力,但很少有(如果有的话)先前的工作证明了自我监督的RL方法如何实际部署在机器人系统上。通过首先研究这个任务的一个具有挑战性的模拟版本,我们发现了关于架构和超参数的设计决策,这些决策将成功率提高了2倍。这些发现为我们的主要结果奠定了基础:我们证明了基于对比学习的自我监督RL算法可以解决现实世界中基于图像的机器人操纵任务,任务由训练后提供的单个目标图像指定。

摘要: Robotic systems that rely primarily on self-supervised learning have the potential to decrease the amount of human annotation and engineering effort required to learn control strategies. In the same way that prior robotic systems have leveraged self-supervised techniques from computer vision (CV) and natural language processing (NLP), our work builds on prior work showing that reinforcement learning (RL) itself can be cast as a self-supervised problem: learning to reach any goal without human-specified rewards or labels. Despite the seeming appeal, little (if any) prior work has demonstrated how self-supervised RL methods can be practically deployed on robotic systems. By first studying a challenging simulated version of this task, we discover design decisions about architectures and hyperparameters that increase the success rate by 2×. These findings lay the groundwork for our main result: we demonstrate that a self-supervised RL algorithm based on contrastive learning can solve real-world, image-based robotic manipulation tasks, with tasks being specified by a single goal image provided after training.
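
A minimal sketch of the core idea, assuming the standard contrastive-RL recipe rather than the authors' released code: a goal-conditioned critic scores (state, action) pairs against goals with an inner product of two encoders and is trained with an InfoNCE-style loss, where goals reached later in the same trajectory are positives and other goals in the batch serve as negatives. Layer sizes and the inner-product critic are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContrastiveCritic(nn.Module):
    """Scores (state, action) pairs against goals via an inner product of embeddings."""
    def __init__(self, obs_dim, act_dim, goal_dim, repr_dim=64):
        super().__init__()
        self.sa_encoder = nn.Sequential(
            nn.Linear(obs_dim + act_dim, 256), nn.ReLU(), nn.Linear(256, repr_dim))
        self.g_encoder = nn.Sequential(
            nn.Linear(goal_dim, 256), nn.ReLU(), nn.Linear(256, repr_dim))

    def forward(self, obs, act, goal):
        phi = self.sa_encoder(torch.cat([obs, act], dim=-1))  # (B, d)
        psi = self.g_encoder(goal)                            # (B, d)
        return phi @ psi.t()                                  # (B, B) pairwise logits

def contrastive_critic_loss(critic, obs, act, future_goal):
    """InfoNCE loss: the i-th goal (reached later in the same trajectory) is the
    positive for the i-th (obs, act); the other rows in the batch are negatives."""
    logits = critic(obs, act, future_goal)
    labels = torch.arange(logits.shape[0], device=logits.device)
    return F.cross_entropy(logits, labels)
```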


标题: Contrastive Difference Predictive Coding

作者: Chongyi Zheng, Ruslan Salakhutdinov, Benjamin Eysenbach

PubTime: 2024-02-26

Downlink: http://arxiv.org/abs/2310.20141v2

Project: https://chongyi-zheng.github.io/td_infonce

GitHub: https://github.com/chongyi-zheng/td_infonce

中文摘要: 对未来的预测和推理是许多时间序列问题的核心。例如,目标条件强化学习可以被视为预测未来可能访问哪些状态的学习表示。虽然以前的方法使用对比预测编码来建模时间序列数据,但学习编码长期相关性的表示通常需要大量数据。在本文中,我们介绍了对比预测编码的时差版本,它将不同的时间序列数据拼接在一起,以减少学习未来事件预测所需的数据量。我们应用这种表示学习方法来导出目标条件RL的非策略算法。实验表明,与以往的RL方法相比,我们的方法在成功率中值上提高了2倍,并且能够更好地应对随机环境。在表格设置中,我们表明我们的方法比后继表示的样本效率高约20倍,比标准(蒙特卡罗)版本的对比预测编码的样本效率高约1500倍。

摘要: Predicting and reasoning about the future lie at the heart of many time-series questions. For example, goal-conditioned reinforcement learning can be viewed as learning representations to predict which states are likely to be visited in the future. While prior methods have used contrastive predictive coding to model time series data, learning representations that encode long-term dependencies usually requires large amounts of data. In this paper, we introduce a temporal difference version of contrastive predictive coding that stitches together pieces of different time series data to decrease the amount of data required to learn predictions of future events. We apply this representation learning method to derive an off-policy algorithm for goal-conditioned RL. Experiments demonstrate that, compared with prior RL methods, ours achieves 2× median improvement in success rates and can better cope with stochastic environments. In tabular settings, we show that our method is about 20× more sample efficient than the successor representation and 1500× more sample efficient than the standard (Monte Carlo) version of contrastive predictive coding.


标题: Brain Effective Connectivity Learning with Deep Reinforcement Learning

作者: Yilin Lu, Jinduo Liu, Junzhong Ji

PubTime: 2022-12

Downlink: https://ieeexplore.ieee.org/document/9995284/

Journal: 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

中文摘要: 近年来,使用功能磁共振成像(fMRI)数据推断不同大脑区域之间的大脑有效连接性(EC)是神经信息学中一项重要的高级研究。然而,由于神经成像数据的高噪声,当前的方法总是表现不佳。在本文中,我们提出了一种具有深度强化学习的有效连接学习方法,称为EC-DRL,旨在从fMRI数据中更准确地识别大脑有效连接。该方法基于actor-critic算法框架,使用编码器——解码器模型作为actor网络。更具体地说,编码器采用Transformer模型结构,解码器采用具有注意机制的双向长短期记忆网络。对模拟fMRI数据和真实世界fMRI数据的大量实验结果表明,与最先进的方法相比,EC-DRL可以更好地推断有效连接性。

摘要: In recent years, using functional magnetic resonance imaging (fMRI) data to infer brain effective connectivity (EC) between different brain regions is an important advanced study in neuroinformatics. However, current methods always perform not well due to the high noise of neuroimaging data. In this paper, we propose an effective connectivity learning method with deep reinforcement learning, called EC-DRL, aiming to more accurately identify the brain effective connectivity from fMRI data. The proposed method is based on the actor-critic algorithm framework, using the encoder-decoder model as the actor network. More specifically, the encoder adopts the Transformer model structure, and the decoder uses a bidirectional long-short-term memory network with an attention mechanism. A large number of experimental results on simulated fMRI data and real-world fMRI data show that EC-DRL can better infer effective connectivity compared to the state-of-the-art methods.
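
The abstract describes the actor as a Transformer encoder feeding a bidirectional LSTM decoder with attention. The sketch below only illustrates that wiring in PyTorch; the layer sizes, the dot-product attention, and the edge-scoring head are assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class ECActorSketch(nn.Module):
    """Illustrative actor: Transformer encoder over an fMRI time series, followed by a
    bidirectional LSTM decoder with a simple dot-product attention over its outputs."""
    def __init__(self, n_regions, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(n_regions, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, n_layers)
        self.decoder = nn.LSTM(d_model, d_model, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * d_model, n_regions * n_regions)  # scores for candidate EC edges

    def forward(self, x):                       # x: (B, T, n_regions) fMRI series
        h = self.encoder(self.embed(x))         # (B, T, d)
        dec, _ = self.decoder(h)                # (B, T, 2d)
        attn = torch.softmax(dec @ dec.transpose(1, 2), dim=-1)  # (B, T, T) attention weights
        ctx = (attn @ dec).mean(dim=1)          # pooled context, (B, 2d)
        return self.out(ctx)                    # edge logits, (B, n_regions * n_regions)
```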


标题: Implementation of Quantum Deep Reinforcement Learning Using Variational Quantum Circuits

作者: S Lokes, C Sakthi Jay Mahenthar, S Parvatha Kumaran

PubTime: 2022-10

Downlink: https://ieeexplore.ieee.org/document/10041479/

Journal: 2022 International Conference on Trends in Quantum Computing and Emerging Business Technologies (TQCEBT)

中文摘要: 随着技术的发展,解决机器学习和深度学习等复杂计算问题的需求激增。然而,即使是最强大的经典超级计算机也很难执行这些任务。量子计算的进步正在引领研究人员和科技巨头努力寻找更好的量子电路来完成机器学习任务。量子机器学习(QML)的当前工作确保了更少的内存消耗和更少的模型参数。然而,由于深度量子电路的不灵活性,在现有的量子计算设备上模拟经典的深度学习方法是费力的。因此,为有噪声的中尺度量子(NISQ)器件的QML设计可行的量子算法是必不可少的。提出的工作旨在通过将目标网络和经验重放重塑为VQC的表示,探索用于基于深度Q网络的强化学习的变分量子电路(VQC)。此外,为了减少模型参数,量子信息编码方案得到了比经典神经网络更好的结果。通过用目标网络和经验回放近似深度Q值函数,VQCs被用于策略选择和决策强化学习。

摘要: With the evolution of technology, the need to solve complex computational problems like machine learning and deep learning has shot up. However, even the most powerful classical supercomputers find it difficult to execute these tasks. Advancements in quantum computing are leading researchers and tech-giants who strive for better quantum circuits to do machine learning tasks. Current works on Quantum Machine Learning (QML) ensure less memory consumption and reduced model parameters. However, it is strenuous to simulate the classical deep learning approach on existing quantum computing devices due to the inflexibility of deep quantum circuits. Consequently, designing viable quantum algorithms for QML for noisy intermediate-scale quantum (NISQ) devices is essential. The proposed work aims to explore Variational Quantum Circuits (VQC) for Deep Q network-based Reinforcement Learning by remodeling the target network and experience replay into a representation of VQC. In addition to the reduction in model parameters, quantum information encoding schemes are used to achieve better results than classical neural networks. VQCs are employed for policy selection and decision-making reinforcement learning by approximating the deep Q-value function with target network and experience replay.


标题: Application of Deep Reinforcement Learning in UAVs: A Review

作者: Ruihui Wang, Li Xu

PubTime: 2022-08

Downlink: https://ieeexplore.ieee.org/document/10034357/

Journal: 2022 34th Chinese Control and Decision Conference (CCDC)

中文摘要: 深度强化学习是人工智能领域最重要的分支之一。它具有很强的高维度数据处理能力,主要用于解决控制和决策问题。无人机领域的许多问题都涉及控制和决策,因此深度强化学习在无人机领域有着广泛的应用。首先,本文介绍了经典强化学习的原理,并扩展到深度强化学习的原理。其次,本文描述了无人机领域中的控制和决策问题,并进一步介绍了深度强化学习在这些问题中的应用。最后,对深度强化学习在该领域的未来发展方向进行了展望,并为未来的相关研究提供了参考。

摘要: Deep reinforcement learning is one of the most important branches in the field of artificial intelligence. It has strong high-dimensional data processing capabilities and is mainly used to solve control and decision-making problems. Many problems in the field of unmanned aerial vehicle (UAV) involve control and decision-making, so deep reinforcement learning has a wide range of applications in the field of UAVs. First, this article introduces the principles of classic reinforcement learning and extends to the principles of deep reinforcement learning. Secondly, this article describes the control and decision-making problems in the field of UAVs, and further introduces the application of deep reinforcement learning to these problems. Finally, this article looks forward to the future development direction of deep reinforcement learning in this field, and provides references for future related research.


标题: Efficient and Stable Information Directed Exploration for Continuous Reinforcement Learning

作者: Mingzhe Chen, Xi Xiao, Wanpeng Zhang

PubTime: 2022-05

Downlink: https://ieeexplore.ieee.org/document/9746211/

Journal: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

中文摘要: 本文研究了强化学习算法的探索——开发困境。我们将信息导向抽样(一种测量策略信息增益的探索框架)应用于连续强化学习。为了稳定非策略学习过程并进一步提高样本效率,我们建议使用随机学习目标,并动态调整神经网络模型不同部分的更新数据比。实验表明,我们的方法比现有方法有了显著的改进,并且成功地完成了高度稀疏奖励信号的任务。

摘要: In this paper, we investigate the exploration-exploitation dilemma of reinforcement learning algorithms. We adapt the information directed sampling, an exploration framework that measures the information gain of a policy, to the continuous reinforcement learning. To stabilize the off-policy learning process and further improve the sample efficiency, we propose to use a randomized learning target and to dynamically adjust the update-to-data ratio for different parts of the neural network model. Experiments show that our approach significantly improves over existing methods and successfully completes tasks with highly sparse reward signals.
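
A rough illustration of the update-to-data (UTD) idea from the abstract: different parts of the agent (here, critic vs. actor) take a different number of gradient steps per environment step. The concrete ratios and the rule for adjusting them dynamically are not given in the abstract, so they are placeholders here, as is `collect_fn`.

```python
def train_step(n_env_steps, collect_fn, replay, critic, actor, critic_utd=20, actor_utd=1):
    """Illustrative loop with per-component update-to-data ratios.

    collect_fn() is a placeholder returning one environment transition;
    critic.update / actor.update are placeholders for one gradient step each.
    """
    for _ in range(n_env_steps):
        replay.add(collect_fn())              # one environment step into the buffer
        for _ in range(critic_utd):           # many critic updates per environment step
            critic.update(replay.sample())
        for _ in range(actor_utd):            # fewer actor updates per environment step
            actor.update(replay.sample())
```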


== Imitation Learning ==

标题: Leveraging Demonstrator-Perceived Precision for Safe Interactive Imitation Learning of Clearance-Limited Tasks

作者: Hanbit Oh, Takamitsu Matsubara

PubTime: 2024-04

Downlink: https://ieeexplore.ieee.org/document/10438830/

Journal: IEEE Robotics and Automation Letters

中文摘要: 交互式模仿学习是一种高效、无模型的方法,通过这种方法,机器人可以通过重复迭代学习策略的执行来学习任务,并通过查询人类演示来收集数据。然而,为间隙有限的任务(如工业插入)部署不成熟的策略会带来重大的碰撞风险。对于这样的任务,机器人应该检测碰撞风险,并在碰撞即将发生时通过将控制权交给人类来请求干预。前者需要精确的环境模型,这种需要极大地限制了IIL应用的范围。相比之下,人类通过调整自己的行为来避免执行任务时的碰撞,从而隐含地展示了环境的精确性。受人类行为的启发,这封信提出了一种新颖的交互式学习方法,该方法使用演示者感知精度作为人类干预的标准,称为演示者感知精度感知交互式模仿学习(DPIIL)。DPIIL通过观察人类演示中展示的速度-精度权衡来捕捉精度,并将控制权交给人类,以避免在估计高精度的状态下发生碰撞。DPIIL提高了交互式策略学习的安全性,并确保了效率,而无需明确提供环境的精确信息。我们通过模拟和真实机器人实验评估了DPIIL的有效性,这些实验训练了UR5e 6自由度机械臂来执行装配任务。我们的结果显著提高了训练的安全性,我们的最佳表现优于其他学习方法。

摘要: Interactive imitation learning is an efficient, model-free method through which a robot can learn a task by repetitively iterating an execution of a learning policy and a data collection by querying human demonstrations. However, deploying unmatured policies for clearance-limited tasks, like industrial insertion, poses significant collision risks. For such tasks, a robot should detect the collision risks and request intervention by ceding control to a human when collisions are imminent. The former requires an accurate model of the environment, a need that significantly limits the scope of IIL applications. In contrast, humans implicitly demonstrate environmental precision by adjusting their behavior to avoid collisions when performing tasks. Inspired by human behavior, this letter presents a novel interactive learning method that uses demonstrator-perceived precision as a criterion for human intervention called Demonstrator-perceived Precision-aware Interactive Imitation Learning (DPIIL). DPIIL captures precision by observing the speed-accuracy trade-off exhibited in human demonstrations and cedes control to a human to avoid collisions in states where high precision is estimated. DPIIL improves the safety of interactive policy learning and ensures efficiency without explicitly providing precise information of the environment. We assessed DPIIL’s effectiveness through simulations and real-robot experiments that trained a UR5e 6-DOF robotic arm to perform assembly tasks. Our results significantly improved training safety, and our best performance compared favorably with other learning methods.
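
An illustrative reading of the speed-accuracy heuristic described above: where the demonstrator moved slowly, assume the task demands high precision and cede control to the human. The nearest-neighbor speed proxy and the threshold rule are assumptions, not the paper's estimator.

```python
import numpy as np

def estimate_precision(demo_states, demo_speeds, query_state, k=5):
    """Proxy for demonstrator-perceived precision: in regions of the state space where
    the demonstrator slowed down, assume higher precision is required.
    (The k-nearest-neighbor averaging is an assumption, not the paper's model.)"""
    d = np.linalg.norm(demo_states - query_state, axis=1)
    nearest = np.argsort(d)[:k]
    local_speed = demo_speeds[nearest].mean()
    return 1.0 / (local_speed + 1e-6)        # slower demonstration -> higher precision estimate

def should_cede_control(precision_estimate, threshold):
    """Hand control back to the human when the estimated precision demand is high."""
    return precision_estimate > threshold
```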


标题: Graph-Based Distributed Control in Vehicular Communications Networks

作者: Jikui Zhao, Yudi Dong, Huaxia Wang

PubTime: 2023-06

Downlink: https://ieeexplore.ieee.org/document/10201143/

Journal: 2023 IEEE 97th Vehicular Technology Conference (VTC2023-Spring)

中文摘要: 本文提出了一种在车载网络中分配分布式频谱和电源资源的新算法,包括车对车(V2V)和车对基础设施(V2I)通信链路。所提出的算法利用模仿学习来训练遵循车辆系统局部结构的分布式策略,同时模仿集中式策略。这种方法保证了未来车辆通信网络中可靠、有效和智能的通信和控制。通过Matlab仿真和机器学习实验对所提出的模型进行了评估,结果表明所提出的方案比传统的全局优化方法更有效,因为它具有较小的传输开销,并且通过在车载网络中使用基于图的分布式模式来提高网络性能。

摘要: This paper proposes a novel algorithm for allocating distributed spectrum and power resources in vehicular networks, including both vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communication links. The proposed algorithm utilizes imitation learning to train distributed policies that adhere to the local structure of the vehicular system while imitating a centralized policy. This approach guarantees dependable, effective, and intelligent communication and control in the future generation of vehicular communication networks. The proposed model is evaluated through Matlab simulations and machine learning experiments, which demonstrate that the proposed scheme is more effective than traditional global optimization approaches, as it has a small transmission overhead and improves network performance by using graph-based distributed mode in vehicular networks.


标题: Multi-Agent Path Finding Using Imitation-Reinforcement Learning with Transformer

作者: Lin Chen, Yaonan Wang, Zhiqiang Miao

PubTime: 2022-12

Downlink: https://ieeexplore.ieee.org/document/10011833/

Journal: 2022 IEEE International Conference on Robotics and Biomimetics (ROBIO)

中文摘要: 多智能体路径寻找是一个为多个智能体从起始位置到目标无冲突地寻找最优路径集的问题,这对于大规模机器人系统至关重要。将模仿和强化学习应用于MAPF问题的求解,取得了一定的效果,为大型机器人系统的路径规划问题提供了可行的解决方案。目前的方法通过引入图形神经网络和agent之间的通信,提高了复杂环境下分布式策略引导agent规划路径的性能,但大大降低了系统的鲁棒性。本文通过引入Transformer model,开发了一种新的模仿强化学习框架,使算法能够在复杂环境中表现良好,而不依赖于代理之间的通信。与同类方法相比,实验表明,该方法训练的策略引导智能体从初始位置无碰撞地行驶到目标位置,并取得更好的性能。

摘要: Multi-Agent Path Finding is a problem of finding the optimal set of paths for multiple agents from the starting position to the goal without conflict, which is essential to large-scale robotic systems. Imitation and reinforcement learning are applied to solve the MAPF problem and have achieved certain results, which provides a feasible solution for the path planning problem of large-scale robot systems. The current method improves the performance of distributed strategy-guided agent planning paths in complex environments by introducing the communication between graph neural networks and agents but dramatically reduces the system’s robustness. This paper develops a novel imitation reinforcement learning framework by introducing Transformer, which enables algorithms to perform well in complex environments without relying on communication between agents. Compared with its counterparts, experiments show that the policy trained by our method guides the agent to drive from the initial position to the goal without collision and achieve better performance.


标题: From One Hand to Multiple Hands: Imitation Learning for Dexterous Manipulation From Single-Camera Teleoperation

作者: Yuzhe Qin, Hao Su, Xiaolong Wang

PubTime: 2022-10

Downlink: https://ieeexplore.ieee.org/document/9849105/

Journal: IEEE Robotics and Automation Letters

中文摘要: 我们提出从人类演示中用多指机器人手执行灵巧操作的模仿学习,并将策略转移到真实的机器人手。我们介绍了一种新颖的单摄像头遥操作系统,仅用iPad和电脑就能有效地收集3D演示。我们系统的一个关键贡献是,我们为模拟器中的每个用户构建了一个定制的机器人手,这是一个类似于操作员手的相同结构的机械手。它提供了一个直观的界面,避免了不稳定的人手重定向数据收集,导致大规模和高质量的数据。一旦收集了数据,定制的机器人手轨迹可以转换为不同的指定机器人手(制造的模型),以生成训练演示。通过使用我们的数据进行模仿学习,我们显示出比具有多个复杂操作任务的基线有很大的改进。重要的是,我们表明,当转移到真实的机器人时,我们学习的策略明显更加稳健。

摘要: We propose to perform imitation learning for dexterous manipulation with multi-finger robot hand from human demonstrations, and transfer the policy to the real robot hand. We introduce a novel single-camera teleoperation system to collect the 3D demonstrations efficiently with only an iPad and a computer. One key contribution of our system is that we construct a customized robot hand for each user in the simulator, which is a manipulator resembling the same structure of the operator’s hand. It provides an intuitive interface and avoid unstable human-robot hand retargeting for data collection, leading to large-scale and high quality data. Once the data is collected, the customized robot hand trajectories can be converted to different specified robot hands (models that are manufactured) to generate training demonstrations. With imitation learning using our data, we show large improvement over baselines with multiple complex manipulation tasks. Importantly, we show our learned policy is significantly more robust when transferring to the real robot.


标题: Type-2 Fuzzy Model-Based Movement Primitives for Imitation Learning

作者: Da Sun, Qianfang Liao, Amy Loutfi

PubTime: 2022-08

Downlink: https://ieeexplore.ieee.org/document/9729561/

Journal: IEEE Transactions on Robotics

中文摘要: 模仿学习是机器人技能学习领域的一个重要方向。它提供了一个用户友好和简单的解决方案,将人类演示转移到机器人身上。在本文中,我们将模糊理论融入到模仿学习中,开发了一种新的方法,称为基于第二类模糊模型的运动原语(T2FMP)。在该方法中,使用一组数据驱动的2型模糊模型来描述演示的输入输出关系。基于模糊模型,T2FMP可以有效地再现轨迹,而无需高计算成本或繁琐的参数设置。此外,它能很好地处理演示的变化,并对噪声具有鲁棒性。此外,我们开发了赋予T2FMP轨迹调制和叠加的扩展,以实现对各种场景的实时轨迹适应。超越现有的模仿学习方法,我们进一步扩展T2FMP来调节轨迹,以避免在非结构化、非凸和用噪声异常值检测的环境中发生碰撞。进行了几个实验来验证我们方法的有效性。

摘要: Imitation learning is an important direction in the area of robot skill learning. It provides a user-friendly and straightforward solution to transfer human demonstrations to robots. In this article, we integrate fuzzy theory into imitation learning to develop a novel method called type-2 fuzzy model-based movement primitives (T2FMP). In this method, a group of data-driven type-2 fuzzy models are used to describe the input–output relationships of demonstrations. Based on the fuzzy models, T2FMP can efficiently reproduce the trajectory without high computational costs or cumbersome parameter settings. Besides, it can well handle the variation of the demonstrations and is robust to noise. In addition, we develop extensions that endow T2FMP with trajectory modulation and superposition to achieve real-time trajectory adaptation to various scenarios. Going beyond existing imitation learning methods, we further extend T2FMP to regulate the trajectory to avoid collisions in the environment that is unstructured, nonconvex, and detected with noisy outliers. Several experiments are performed to validate the effectiveness of our method.


标题: Synthetic Data Generation using Imitation Training

作者: Aman Kishore, Tae Eun Choe, Junghyun Kwon

PubTime: 2021-10

Downlink: https://ieeexplore.ieee.org/document/9607682/

Journal: 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)

摘要: We propose a strategic approach to generate synthetic data in order to improve machine learning algorithms such as Deep Neural Networks (DNN). Utilization of synthetic data has shown promising results yet there are no specific rules or recipes on how to generate and cook synthetic data. We propose imitation training as a guideline of synthetic data generation to add more underrepresented entities and balance the data distribution for DNN to handle corner cases and resolve long tail problems. The proposed imitation training has a circular process with three main steps: First, the existing system is evaluated and failure cases such as false positive and false negative detections are sorted out; Secondly, synthetic data imitating such failure cases is created with domain randomization; Thirdly, we train a net-work with the existing data and the newly added synthetic data; We repeat these three steps until the evaluation metric converges. We validated the approach by experimenting on object detection in autonomous driving.
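
The three-step circular procedure in the abstract can be written as a small loop. The helper names (`evaluate`, `collect_failure_cases`, `render_synthetic`, `train`) are placeholders for a concrete detection pipeline, not functions from the paper.

```python
def imitation_training(model, real_data, eval_set, max_rounds=10, tol=1e-3):
    """Sketch of the circular imitation-training procedure described in the abstract."""
    prev_metric = evaluate(model, eval_set)                              # placeholder helper
    for _ in range(max_rounds):
        failures = collect_failure_cases(model, eval_set)                # 1) sort out FP/FN cases
        synthetic = render_synthetic(failures, domain_randomization=True)  # 2) imitate failures
        model = train(model, real_data + synthetic)                      # 3) retrain on mixed data
        metric = evaluate(model, eval_set)
        if abs(metric - prev_metric) < tol:                              # repeat until convergence
            break
        prev_metric = metric
    return model
```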


== Embodied Artificial Intelligence @ Robotic Agent @ Human-Robot Interaction ==

标题: Understanding Social Robots: Attribution of Intentional Agency to Artificial and Biological Bodies

作者: Tom Ziemke

PubTime: 2023-08

Downlink: https://ieeexplore.ieee.org/document/10302102/

Journal: Artificial Life

中文摘要: 机器人人工智能(AI)和人工生命的许多研究都集中在自主代理上,作为人工智能的具体化和情境化方法。这种系统通常被视为克服了许多与传统计算主义人工智能和认知科学相关的哲学问题,如基础问题(Harnad)或缺乏意向性(Searle),因为它们具有传统人工智能被认为缺乏的物理和感觉运动基础。例如,机器人割草机和自动驾驶汽车或多或少地可靠地避开障碍物,接近充电站,等等——因此可能被认为具有某种形式的人为意图或有意方向性。然而,应该注意的是,机器人与人共享物理环境的事实并不一定意味着它们与人类处于相同的感知和社会世界。对于遇到社交互动系统(如社交机器人或自动车辆)的人来说,这提出了一个巨大的挑战,即将它们解释为理解和预测其行为的有意代理,但也要记住人造身体的意向性与它们的自然对应物有着根本的不同。这一方面需要“暂停怀疑”,但另一方面也需要“暂停信仰”的能力。这种(归因的)人工意向性的双重性质在具身人工智能和社会机器人研究中得到了相当肤浅的解决。因此,有人认为,Bourgine和Varela关于人工生命是自主系统实践的概念需要以社会互动自主系统的实践来补充,以更好地理解人工和生物身体之间的差异及其在人与技术之间的社会互动背景下的影响为指导。

摘要: Much research in robotic artificial intelligence (AI) and Artificial Life has focused on autonomous agents as an embodied and situated approach to AI. Such systems are commonly viewed as overcoming many of the philosophical problems associated with traditional computationalist AI and cognitive science, such as the grounding problem (Harnad) or the lack of intentionality (Searle), because they have the physical and sensorimotor grounding that traditional AI was argued to lack. Robot lawn mowers and self-driving cars, for example, more or less reliably avoid obstacles, approach charging stations, and so on—and therefore might be considered to have some form of artificial intentionality or intentional directedness. It should be noted, though, that the fact that robots share physical environments with people does not necessarily mean that they are situated in the same perceptual and social world as humans. For people encountering socially interactive systems, such as social robots or automated vehicles, this poses the nontrivial challenge to interpret them as intentional agents to understand and anticipate their behavior but also to keep in mind that the intentionality of artificial bodies is fundamentally different from their natural counterparts. This requires, on one hand, a “suspension of disbelief ” but, on the other hand, also a capacity for the “suspension of belief.” This dual nature of (attributed) artificial intentionality has been addressed only rather superficially in embodied AI and social robotics research. It is therefore argued that Bourgine and Varela’s notion of Artificial Life as the practice of autonomous systems needs to be complemented with a practice of socially interactive autonomous systems, guided by a better understanding of the differences between artificial and biological bodies and their implications in the context of social interactions between people and technology.


标题: Explorative Synthetic Biology in AI: Criteria of Relevance and a Taxonomy for Synthetic Models of Living and Cognitive Processes

作者: Luisa Damiano, Pasquale Stano

PubTime: 2023-08

Downlink: https://ieeexplore.ieee.org/document/10302132/

Journal: Artificial Life

中文摘要: 本文从两个方面探讨了特刊“人工智能中的生物学:认知的硬件、软件和湿件建模的新前沿”的主题。它解决了硬件、软件和湿件模型与生物认知的科学理解的相关性问题,并阐明了合成生物学(被解释为认知的合成探索)可以为人工智能(AI)提供的贡献。本文提出的研究工作基于这样一种想法,即生物和认知过程的硬件、软件和湿件模型的相关性——即这些模型可以对生命和认知的科学理解做出的具体贡献——仍然不清楚,主要是因为缺乏明确的标准来评估合成模型可以以何种方式支持生物和认知现象的实验探索。我们的文章利用控制论和自生认识论的元素来定义一个参考框架,用于生命和认知的综合研究,能够生成一套评估标准和相关形式的分类,用于综合模型,能够克服其评估在目标过程的单纯模仿和完全复制之间的贫瘠、传统的两极分化。在这些工具的基础上,我们尝试性地绘制了合成生物学可以产生的生活和认知过程的湿件模型的相关性形式,并概述了将合成生物学技术应用于(具体化)人工智能研究领域的“组织相关方法”发展的规划方向。

摘要: This article tackles the topic of the special issue “Biology in AI: New Frontiers in Hardware, Software and Wetware Modeling of Cognition” in two ways. It addresses the problem of the relevance of hardware, software, and wetware models for the scientific understanding of biological cognition, and it clarifies the contributions that synthetic biology, construed as the synthetic exploration of cognition, can offer to artificial intelligence (AI). The research work proposed in this article is based on the idea that the relevance of hardware, software, and wetware models of biological and cognitive processes—that is, the concrete contribution that these models can make to the scientific understanding of life and cognition—is still unclear, mainly because of the lack of explicit criteria to assess in what ways synthetic models can support the experimental exploration of biological and cognitive phenomena. Our article draws on elements from cybernetic and autopoietic epistemology to define a framework of reference, for the synthetic study of life and cognition, capable of generating a set of assessment criteria and a classification of forms of relevance, for synthetic models, able to overcome the sterile, traditional polarization of their evaluation between mere imitation and full reproduction of the target processes. On the basis of these tools, we tentatively map the forms of relevance characterizing wetware models of living and cognitive processes that synthetic biology can produce and outline a programmatic direction for the development of “organizationally relevant approaches” applying synthetic biology techniques to the investigative field of (embodied) AI.


标题: Robotic Backchanneling in Online Conversation Facilitation: A Cross-Generational Study

作者: Sota Kobuki, Katie Seaborn, Seiki Tokunaga

PubTime: 2023-08

Downlink: https://ieeexplore.ieee.org/document/10309362/

Journal: 2023 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN)

中文摘要: 日本面临着许多与其老龄化社会相关的挑战,包括人口认知能力下降的速度越来越快,以及护理人员短缺。人们已经开始努力探索使用人工智能(AI)的解决方案,特别是可以与人交流的社交智能代理和机器人。然而,很少有关于这些智能体在各种日常情况下与老年人相容性的研究。为此,我们进行了一项用户研究,以评估一个机器人,该机器人作为旨在防止认知能力下降的群体对话协议的促进者。我们对机器人进行了改造,使其使用反向沟通,这是一种自然的人类说话方式,以增加机器人的接受能力和群体对话体验的乐趣。我们对年轻人和老年人进行了一项跨代研究。定性分析表明,年轻人认为反向通道版本的机器人比非反向通道机器人更友好、更值得信赖、更容易接受。最后,我们发现机器人的反向沟通引发了老年参与者的非语言反向沟通。

摘要: Japan faces many challenges related to its aging society, including increasing rates of cognitive decline in the population and a shortage of caregivers. Efforts have begun to explore solutions using artificial intelligence (AI), especially socially embodied intelligent agents and robots that can communicate with people. Yet, there has been little research on the compatibility of these agents with older adults in various everyday situations. To this end, we conducted a user study to evaluate a robot that functions as a facilitator for a group conversation protocol designed to prevent cognitive decline. We modified the robot to use backchannelling, a natural human way of speaking, to increase receptiveness of the robot and enjoyment of the group conversation experience. We conducted a cross-generational study with young adults and older adults. Qualitative analyses indicated that younger adults perceived the backchannelling version of the robot as kinder, more trustworthy, and more acceptable than the non-backchannelling robot. Finally, we found that the robot’s backchannelling elicited nonverbal backchanneling in older participants.


标题: Optimization of Humanoid Robot Designs for Human-Robot Ergonomic Payload Lifting

作者: Carlotta Sartore, Lorenzo Rapetti, Daniele Pucci

PubTime: 2022-11

Downlink: https://ieeexplore.ieee.org/document/10000222/

Journal: 2022 IEEE-RAS 21st International Conference on Humanoid Robots (Humanoids)

中文摘要: 当人类和人形机器人进行物理协作时,人体工程学是一个需要考虑的关键因素。假设一个给定的仿人机器人,目前存在几种控制体系结构来解决人机工程学物理人——机器人协作。本文进一步将机器人硬件参数作为协同有效载荷提升问题的优化变量。参数化机器人运动学和动力学的变量保证了它们的物理一致性,优化问题考虑了人体模型。通过利用所提出的建模框架,交互的人机工程学被最大化,这里由代理的能量消耗给出。在求解相关优化问题时,考虑了机器人运动学、动力学、硬件约束和人体几何形状。所提出的方法用于确定ergoCub机器人设计的最佳硬件参数,ergoCub机器人是一种具有一定程度的具体化智能的人形机器人,用于与人类进行人机工程学交互。对于优化问题,起点是iCub仿人机器人。所获得的机器人设计相对于射程限制在0.8-1.2米的iCub机器人达到0.8-1.5米范围内的高度负载。机器人能量消耗减少约33%,同时保持了人体工效学,总体上改善了交互性。

摘要: When a human and a humanoid robot collaborate physically, ergonomics is a key factor to consider. Assuming a given humanoid robot, several control architectures exist nowadays to address ergonomic physical human-robot collaboration. This paper takes one step further by considering robot hardware parameters as optimization variables in the problem of collaborative payload lifting. The variables that parametrize robot’s kinematics and dynamics ensure their physical consistency, and the human model is considered in the optimization problem. By leveraging the proposed modelling framework, the ergonomy of the interaction is maximized, here given by the agents’ energy expenditure. Robot kinematic, dynamics, hardware constraints and human geometries are considered when solving the associated optimization problem. The proposed methodology is used to identify optimum hardware parameters for the design of the ergoCub robot, a humanoid possessing a degree of embodied intelligence for ergonomic interaction with humans. For the optimization problem, the starting point is the iCub humanoid robot. The obtained robot design reaches loads at heights in the range of 0.8 - 1.5 m with respect to the iCub robot whose range is limited to 0.8-1.2 m. The robot energy expenditure is decreased by about 33%, meanwhile, the human ergonomy is preserved, leading overall to an improved interaction.


标题: Embodied-AI Wheelchair Framework with Hands-free Interface and Manipulation

作者: Jesse F. Leaman, Zongming Yang, Yasmine N. Elglaly

PubTime: 2022-10

Downlink: https://ieeexplore.ieee.org/document/9945310/

Journal: 2022 IEEE International Conference on Systems, Man, and Cybernetics (SMC)

摘要: Assistive robots can be found in hospitals and rehabilitation clinics, where they help patients maintain a positive disposition. Our proposed robotic mobility solution combines state of the art hardware and software to provide a safer, more independent, and more productive lifestyle for people with some of the most severe disabilities. New hardware includes, a retractable roof, manipulator arm, a hard backpack, a number of sensors that collect environmental data and processors that generate 3D maps for a hands-free human-machine interface.The proposed new system receives input from the user via head tracking or voice command, and displays information through augmented reality into the user’s field of view. The software algorithm will use a novel cycle of self-learning artificial intelligence that achieves autonomous navigation while avoiding collisions with stationary and dynamic objects. The prototype will be assembled and tested over the next three years and a publicly available version could be ready two years thereafter.


标题: Can you Empathize with Me? Development of a 360° Video-Training to Enhance Residents’ Empathic Abilities

作者: Maria Sansoni, Sabrina Bartolotta, Andrea Gaggioli

PubTime: 2022-10

Downlink: https://ieeexplore.ieee.org/document/9967557/

Journal: 2022 IEEE International Conference on Metrology for Extended Reality, Artificial Intelligence and Neural Engineering (MetroXRAINE)

摘要: Empathic care is an important element of the patient-provider relationship. However, several factors can undermine physicians’ ability to be empathic (e.g., burnout). The aim of this study is therefore to describe a 360° video-based VR protocol for residents who work in cancer care that combines embodied perspective-taking and theoretical empathy training to target empathy communication and burnout characteristics. To reach this goal, we will use 360° technology, storytelling, and embodied perspective-taking to let the resident experience the perspective of a patient who is receiving a cancer diagnosis. Psycho-educational training will be also employed to support participants in acquiring information about how to be empathic while communicating with a patient. We expect that this training may promote empathy and improve patient-provider communication while buffering the risk of burnout.


== Object Detection @ Segmentation @ Open Vocabulary Detection @ SAM ==

标题: FairSeg: A Large-Scale Medical Image Segmentation Dataset for Fairness Learning Using Segment Anything Model with Fair Error-Bound Scaling

作者: Yu Tian, Min Shi, Yan Luo

PubTime: 2024-02-27

Downlink: http://arxiv.org/abs/2311.02189v3

Project: https://ophai.hms.harvard.edu/harvard-fairseg10k

GitHub: https://github.com/Harvard-Ophthalmology-AI-Lab/FairSeg

中文摘要: 近年来,人工智能模型中的公平性获得了更多关注,尤其是在医学领域,因为医学模型中的公平性对人们的福祉和生活至关重要。需要高质量的医疗公平数据集来促进公平学习研究。现有的医学公平性数据集都用于分类任务,没有公平性数据集可用于医学分割,而医学分割是与分类同等重要的临床任务,它可以提供器官异常的详细空间信息,供临床医生评估。在本文中,我们提出了第一个用于医学分割的公平性数据集,名为Harvard-FairSeg,包含10,000个受试者样本。此外,我们提出了一种公平的误差界缩放方法,使用Segment Anything模型(SAM),用每个身份组中的误差界上限重新加权损失函数。我们预计,通过明确处理每个身份组中具有高训练错误的困难情况,可以提高分割性能的公平性。为了促进公平比较,我们利用一种新的公平尺度分割性能指标来在公平性背景下比较分割指标,如公平尺度Dice系数。通过全面的实验,我们证明了我们的公平误差界缩放方法与最先进的公平学习模型相比具有更好或相当的公平性能。数据集和代码可通过 https://ophai.hms.harvard.edu/harvard-fairseg10k 公开访问。

摘要: Fairness in artificial intelligence models has gained significantly more attention in recent years, especially in the area of medicine, as fairness in medical models is critical to people’s well-being and lives. High-quality medical fairness datasets are needed to promote fairness learning research. Existing medical fairness datasets are all for classification tasks, and no fairness datasets are available for medical segmentation, while medical segmentation is an equally important clinical task as classifications, which can provide detailed spatial information on organ abnormalities ready to be assessed by clinicians. In this paper, we propose the first fairness dataset for medical segmentation named Harvard-FairSeg with 10,000 subject samples. In addition, we propose a fair error-bound scaling approach to reweight the loss function with the upper error-bound in each identity group, using the segment anything model (SAM). We anticipate that the segmentation performance equity can be improved by explicitly tackling the hard cases with high training errors in each identity group. To facilitate fair comparisons, we utilize a novel equity-scaled segmentation performance metric to compare segmentation metrics in the context of fairness, such as the equity-scaled Dice coefficient. Through comprehensive experiments, we demonstrate that our fair error-bound scaling approach either has superior or comparable fairness performance to the state-of-the-art fairness learning models. The dataset and code are publicly accessible via https://ophai.hms.harvard.edu/harvard-fairseg10k.
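
A hedged sketch of the reweighting idea described above: rescale each identity group's contribution to the loss by that group's current error bound so that harder groups weigh more. Using the per-group maximum loss as the "upper error bound" is an assumption; the paper's exact bound may differ.

```python
import torch

def fair_error_bound_scaled_loss(per_sample_loss, group_ids, num_groups):
    """Reweight the loss by a per-identity-group error bound (proxy: per-group max loss)."""
    weights = torch.ones(num_groups, device=per_sample_loss.device)
    for g in range(num_groups):
        mask = group_ids == g
        if mask.any():
            weights[g] = per_sample_loss[mask].max().detach()   # proxy for the upper error bound
    weights = weights / weights.sum() * num_groups              # normalize weights around 1
    return (weights[group_ids] * per_sample_loss).mean()
```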


标题: Overcoming Dimensional Collapse in Self-supervised Contrastive Learning for Medical Image Segmentation

作者: Jamshid Hassanpour, Vinkle Srivastav, Didier Mutter

PubTime: 2024-02-27

Downlink: http://arxiv.org/abs/2402.14611v2

Project: https://biomedicalimaging.org/2024/

GitHub: https://github.com/CAMMA-public/med-moco

中文摘要: 当标记数据量有限时,自我监督学习(SSL)方法取得了巨大成功。在SSL中,模型通过解决借口任务来学习健壮的特征表示。一个这样的借口任务是对比学习,它涉及形成相似和不相似的输入样本对,指导模型区分它们。在这项工作中,我们研究了对比学习在医学图像分析领域的应用。我们的发现表明,MoCo v2,一种最先进的对比学习方法,在应用于医学图像时遇到了维度坍缩。这归因于医学图像之间共享的高度图像间相似性。为了解决这个问题,我们提出了两个关键的贡献:局部特征学习和特征去相关。局部特征学习提高了模型关注图像局部区域的能力,而特征去相关消除了特征之间的线性相关性。我们的实验结果表明,我们的贡献显著增强了模型在医学分割下游任务中的性能,无论是在线性评估还是完全微调设置中。这项工作说明了有效地使SSL技术适应医学成像任务特征的重要性。源代码将在以下网址公开:https://github.com/CAMMA-public/med-moco

摘要: Self-supervised learning (SSL) approaches have achieved great success when the amount of labeled data is limited. Within SSL, models learn robust feature representations by solving pretext tasks. One such pretext task is contrastive learning, which involves forming pairs of similar and dissimilar input samples, guiding the model to distinguish between them. In this work, we investigate the application of contrastive learning to the domain of medical image analysis. Our findings reveal that MoCo v2, a state-of-the-art contrastive learning method, encounters dimensional collapse when applied to medical images. This is attributed to the high degree of inter-image similarity shared between the medical images. To address this, we propose two key contributions: local feature learning and feature decorrelation. Local feature learning improves the ability of the model to focus on the local regions of the image, while feature decorrelation removes the linear dependence among the features. Our experimental findings demonstrate that our contributions significantly enhance the model’s performance in the downstream task of medical segmentation, both in the linear evaluation and full fine-tuning settings. This work illustrates the importance of effectively adapting SSL techniques to the characteristics of medical imaging tasks. The source code will be made publicly available at: https://github.com/CAMMA-public/med-moco
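
A common way to implement the feature-decorrelation idea mentioned above is to penalize the off-diagonal entries of the batch correlation matrix of the features; whether the paper uses exactly this regularizer is not stated, so treat the following as a generic sketch.

```python
import torch

def decorrelation_loss(z):
    """Drive off-diagonal entries of the feature correlation matrix toward zero,
    removing linear dependence between feature dimensions. z: (batch, dim)."""
    z = (z - z.mean(0)) / (z.std(0) + 1e-6)        # standardize each feature dimension
    n, d = z.shape
    corr = (z.t() @ z) / n                          # (d, d) correlation matrix
    off_diag = corr - torch.diag(torch.diag(corr))  # zero out the diagonal
    return (off_diag ** 2).sum() / d
```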


标题: Probabilistic 3D Multi-Object Cooperative Tracking for Autonomous Driving via Differentiable Multi-Sensor Kalman Filter

作者: Hsu-kuang Chiu, Chien-Yi Wang, Min-Hung Chen

PubTime: 2024-02-26

Downlink: http://arxiv.org/abs/2309.14655v2

Project: https://eddyhkchiu.github.io/dmstrack.github.io/

GitHub: https://github.com/eddyhkchiu/DMSTrack/

中文摘要: 目前最先进的自动驾驶汽车主要依靠每个单独的传感器系统来执行感知任务。这种框架的可靠性可能会受到遮挡或传感器故障的限制。为了解决这个问题,最近的研究提出使用车对车(V2V)通信与他人共享感知信息。然而,大多数相关工作只关注协同检测,而协同跟踪是一个探索不足的研究领域。一些最近的数据集,如V2V4Real,提供了3D多对象协作跟踪基准。然而,他们提出的方法主要使用协作检测结果作为标准的基于单传感器卡尔曼滤波器的跟踪算法的输入。在他们的方法中,来自不同互联自动驾驶汽车(CAV)的不同传感器的测量不确定性可能无法正确估计,以利用基于卡尔曼滤波器的跟踪算法的理论最优性。本文提出了一种新的基于可微多传感器卡尔曼滤波器的自动驾驶三维多目标协同跟踪算法。我们的算法学习估计每个检测的测量不确定性,可以更好地利用基于卡尔曼滤波器的跟踪方法的理论特性。实验结果表明,与V2V4Real中最先进的方法相比,我们的算法将跟踪精度提高了17%,而通信成本仅为0.037倍。我们的代码和视频可在 https://github.com/eddyhkchiu/DMSTrack/ 和 https://eddyhkchiu.github.io/dmstrack.github.io/ 获得。

摘要: Current state-of-the-art autonomous driving vehicles mainly rely on each individual sensor system to perform perception tasks. Such a framework’s reliability could be limited by occlusion or sensor failure. To address this issue, more recent research proposes using vehicle-to-vehicle (V2V) communication to share perception information with others. However, most relevant works focus only on cooperative detection and leave cooperative tracking an underexplored research field. A few recent datasets, such as V2V4Real, provide 3D multi-object cooperative tracking benchmarks. However, their proposed methods mainly use cooperative detection results as input to a standard single-sensor Kalman Filter-based tracking algorithm. In their approach, the measurement uncertainty of different sensors from different connected autonomous vehicles (CAVs) may not be properly estimated to utilize the theoretical optimality property of Kalman Filter-based tracking algorithms. In this paper, we propose a novel 3D multi-object cooperative tracking algorithm for autonomous driving via a differentiable multi-sensor Kalman Filter. Our algorithm learns to estimate measurement uncertainty for each detection that can better utilize the theoretical property of Kalman Filter-based tracking methods. The experiment results show that our algorithm improves the tracking accuracy by 17% with only 0.037x communication costs compared with the state-of-the-art method in V2V4Real. Our code and videos are available at https://github.com/eddyhkchiu/DMSTrack/ and https://eddyhkchiu.github.io/dmstrack.github.io/ .
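
For reference, a single differentiable Kalman measurement update: because every operation is differentiable, a network that predicts the per-detection measurement covariance R (as the abstract describes) can be trained end-to-end through it. The function below is a textbook update, not the paper's code.

```python
import torch

def kalman_update(x, P, z, H, R):
    """One Kalman measurement update.
    x: (n, 1) state mean, P: (n, n) state covariance,
    z: (m, 1) measurement, H: (m, n) observation matrix,
    R: (m, m) measurement noise covariance (here, e.g., predicted by a network)."""
    y = z - H @ x                                        # innovation
    S = H @ P @ H.t() + R                                # innovation covariance
    K = P @ H.t() @ torch.linalg.inv(S)                  # Kalman gain
    x_new = x + K @ y
    P_new = (torch.eye(P.shape[0], device=P.device, dtype=P.dtype) - K @ H) @ P
    return x_new, P_new
```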


标题: GAPartNet: Cross-Category Domain-Generalizable Object Perception and Manipulation via Generalizable and Actionable Parts

作者: Haoran Geng, Helin Xu, Chengyang Zhao

PubTime: 2023-06

Downlink: https://ieeexplore.ieee.org/document/10203924/

Journal: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

中文摘要: 多年来,研究人员一直致力于可概括的对象感知和操作,其中跨类别的可概括性是非常可取的,但探索不足。在这项工作中,我们建议通过可概括和可操作的部分(GAParts)来学习这种跨类别的技能。通过识别和定义9个GAPart类(盖子、把手等。)在27个对象类别中,我们构建了一个大规模的以零件为中心的交互式数据集GAPartNet,其中我们为1,166个对象上的8,489个零件实例提供了丰富的零件级注释(语义、姿势)。基于GAPartNet,我们研究了三个跨类别任务:零件分割、零件姿态估计和基于零件的对象操作。鉴于可见和不可见对象类别之间的显著领域差距,我们通过集成对抗性学习技术,从领域泛化的角度提出了一种鲁棒的3D分割方法。我们的方法远远优于所有现有的方法,无论是在可见还是不可见的类别上。此外,利用零件分割和姿态估计结果,我们利用GAPart姿态定义设计了基于零件的操纵启发式算法,该算法可以很好地推广到模拟器和真实世界中的未知对象类别。

摘要: For years, researchers have been devoted to generalizable object perception and manipulation, where cross-category generalizability is highly desired yet underexplored. In this work, we propose to learn such cross-category skills via Generalizable and Actionable Parts (GAParts). By identifying and defining 9 GAPart classes (lids, handles, etc.) in 27 object categories, we construct a large-scale part-centric interactive dataset, GAPartNet, where we provide rich, part-level annotations (semantics, poses) for 8,489 part instances on 1,166 objects. Based on GAPartNet, we investigate three cross-category tasks: part segmentation, part pose estimation, and partbased object manipulation. Given the significant domain gaps between seen and unseen object categories, we propose a robust 3D segmentation method from the perspective of domain generalization by integrating adversarial learning techniques. Our method outperforms all existing methods by a large margin, no matter on seen or unseen categories. Furthermore, with part segmentation and pose estimation results, we leverage the GAPart pose definition to design part-based manipulation heuristics that can generalize well to unseen object categories in both the simulator and the real world.
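
One standard adversarial ingredient for domain generalization is a gradient reversal layer (as in DANN): a domain classifier is trained on the part features while the reversed gradient pushes the backbone toward domain-invariant features. The abstract only says adversarial learning is integrated, so this is an assumed, generic building block rather than the paper's exact mechanism.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Gradient reversal layer: identity on the forward pass, negated (scaled)
    gradient on the backward pass."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None   # reverse the gradient flowing to the backbone

def grad_reverse(x, lambd=1.0):
    """Insert between the feature extractor and the domain classifier."""
    return GradReverse.apply(x, lambd)
```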


标题: Efficient Grasp Detection Network With Gaussian-Based Grasp Representation for Robotic Manipulation

作者: Hu Cao, Guang Chen, Zhijun Li

PubTime: 2023-06

Downlink: https://ieeexplore.ieee.org/document/9990918/

Journal: IEEE/ASME Transactions on Mechatronics

中文摘要: 深度学习方法在抓取检测领域取得了优异的成绩。然而,用于一般对象检测的基于深度学习的模型缺乏准确性和推理速度的适当平衡,导致在实时抓取任务中性能不佳。本工作提出了一种有效的抓取检测网络,该网络以n通道图像作为机器人抓取的输入。所提出的网络是一个轻量级的生成结构,用于在一个阶段进行抓取检测。具体来说,引入了基于高斯核的抓取表示来编码训练样本,体现了具有最高抓取置信度的最大中心。在瓶颈中插入感受野块以提高模型的特征辨别能力。此外,利用基于像素和通道的注意机制构建多维注意融合网络,通过抑制噪声特征和突出目标特征来融合有价值的语义信息。在康乃尔、贾卡德和扩展OCID grasp数据集上对所提出的方法进行了评估。实验结果表明,该方法具有良好的平衡精度和运行速度性能。该网络的运行速度为6 ms,在Cornell、Jacquard和extended OCID grasp数据集上实现了更好的性能,准确率分别为97.8%、95.6%和76.4%。随后,使用UR5机械臂在物理环境中获得了出色的抓取成功率。

摘要: Deep learning methods have achieved excellent results in the field of grasp detection. However, deep learning-based models for general object detection lack the proper balance of accuracy and inference speed, resulting in poor performance in real-time grasp tasks. This work proposes an efficient grasp detection network with n-channel images as inputs for robotic grasp. The proposed network is a lightweight generative structure for grasp detection in one stage. Specifically, a Gaussian kernel-based grasp representation is introduced to encode the training samples, embodying the maximum center that possesses the highest grasp confidence. A receptive field block is plugged into the bottleneck to improve the model’s feature discriminability. In addition, pixel-based and channel-based attention mechanisms are used to construct a multidimensional attention fusion network to fuse valuable semantic information, achieved by suppressing noisy features and highlighting object features. The proposed method is evaluated on the Cornell, Jacquard, and extended OCID grasp datasets. The experimental results show that our method achieves excellent balancing accuracy and running speed performance. The network gets a running speed of 6 ms, achieving better performance on the Cornell, Jacquard, and extended OCID grasp datasets with 97.8%, 95.6%, and 76.4% accuracy, respectively. Subsequently, an excellent grasp success rate in a physical environment is obtained using the UR5 robot arm.
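
A minimal sketch of the Gaussian-kernel grasp encoding mentioned in the abstract: each labeled grasp becomes a 2D Gaussian heatmap whose peak marks the grasp center with the highest confidence. The fixed sigma is an assumption.

```python
import numpy as np

def gaussian_grasp_map(h, w, center, sigma=4.0):
    """Encode a grasp label as an (h, w) Gaussian heatmap peaking (value 1.0) at the
    grasp center, so nearby pixels carry decaying grasp confidence."""
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = center
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))
```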


标题: Impression Network for Video Object Detection

作者: Congrui Hetang

PubTime: 2023-05

Downlink: https://ieeexplore.ieee.org/document/10165600/

Journal: 2023 IEEE 3rd International Conference on Information Technology, Big Data and Artificial Intelligence (ICIBA)

中文摘要: 视频中的对象检测比静态图像中的对象检测更具挑战性。事实证明,天真地逐帧应用对象检测器可能是缓慢和不准确的——视觉线索可能会因散焦和运动模糊而减弱,导致相应帧的失败。多帧特征融合方法被证明在提高准确性方面是有效的,但是它们在很大程度上牺牲了速度。特征传播方法被证明在提高速度方面是有效的,但代价是精度较低。那么,有没有可能同时提高速度和性能呢?受人类如何利用印象从模糊帧中识别物体的启发,我们提出了一种体现自然高效特征聚合机制的印象网络。在我们的框架中,通过迭代吸收稀疏提取的帧特征来建立印象特征。印象特征在视频中一路传播,增强了低质量帧的特征。这种印象机制使得以最小的开销在稀疏关键帧之间执行远程多帧特征融合成为可能。我们证明了印象网络比ImageNet VID上的每帧检测基线精确得多,同时速度快3倍(20 fps)。

摘要: Object detection in videos is more challenging than in static images. It’s proved that naively applying object detector frame by frame can be slow and inaccurate - visual clues can get weakened by defocus and motion blur, causing failure on corresponding frames. Multi-frame feature fusion methods proved effective in improving the accuracy, but they largely sacrifice the speed. Feature propagation methods proved effective in improving the speed, at the cost of lower accuracy. So is it possible to improve speed and performance at once? Inspired by how humans utilize impression to recognize objects from blurry frames, we propose Impression Network that embodies a natural and efficient feature aggregation mechanism. In our framework, an impression feature is established by iteratively absorbing sparsely extracted frame features. The impression feature is propagated all the way down the video, enhancing features of low-quality frames. This impression mechanism makes it possible to perform long-range multi-frame feature fusion among sparse keyframes with minimum overhead. We demonstrate that Impression Network is significantly more accurate than per-frame detection baseline on ImageNet VID, while being 3 times faster (20 fps).
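
The impression mechanism can be summarized as an iterative mixing of sparsely extracted keyframe features; the sketch below uses a fixed mixing weight, which is an assumption (the actual network may compute it adaptively).

```python
def update_impression(impression, frame_feature, weight=0.5):
    """Absorb the current keyframe feature into the long-term 'impression' feature,
    which is then propagated down the video to enhance low-quality frames."""
    if impression is None:                  # the first keyframe initializes the impression
        return frame_feature
    return (1.0 - weight) * impression + weight * frame_feature
```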


