[晓理紫]每日论文分享(有中文摘要,源码或项目地址)--强化学习等

专属领域论文订阅

VX 关注{晓理紫|小李子},每日更新论文,如感兴趣,请转发给有需要的同学,谢谢支持

如果你感觉对你有所帮助,请关注我,每日准时为你推送最新论文。

》》由于精力有限,今后就不再在CSDN上更新最新相关论文信息,VX公号会持续更新,有需要的可以关注,谢谢支持。

分类:

== RLHF ==

标题: Deep RL-based Volt-VAR Control and Attack Resiliency for DER-integrated Distribution Grids

作者: Kundan Kumar, Gelli Ravikumar

PubTime: 2024-02

Downlink: https://ieeexplore.ieee.org/document/10454163/

Journal: 2024 IEEE Power & Energy Society Innovative Smart Grid Technologies Conference (ISGT)

中文摘要: 将分布式能源(DERs)集成到电力系统中需要更先进的控制机制。电压-无功控制(VVC)是管理电压和无功功率的控制策略之一。随着电力系统复杂性的增加,需要开发一种利用深度强化学习(DRL)的自主鲁棒控制机制来提高电网性能并调整电压和无功功率设置。这些调整最大限度地减少了损耗,提高了电网的电压稳定性。在本文中,我们提出了一种新的方法来开发基于DRL的VVC框架和缓解技术,以防御针对DRL模型已训练控制策略的隐形白盒攻击。所提出的缓解技术作用于已训练的DRL,用于控制智能电网上的电压越限,以增强电网的稳定性并最小化电压越限。我们提出的缓解技术为基于DRL的VVC提供了更好的控制策略,成功缓解了智能电网环境中100%的电压越限。结果表明,该缓解技术增强了已训练DRL VVC代理的安全性和鲁棒性。

摘要: Integrating distributed energy resources (DERs) into a power system requires more advanced control mechanisms. One of the control strategies used for Volt-VAR control (VVC) is to manage voltage and reactive power. With the increase in the complexity of the power system, there is a need to develop an autonomous and robust control mechanism using deep reinforcement learning (DRL) to enhance grid performance and adjust voltage and reactive power settings. These adjustments minimize losses and enhance voltage stability in the grid. In this paper, we proposed a novel approach to develop a DRL-based VVC framework and mitigation techniques to protect against stealthy white-box attacks targeting the trained control policies of the DRL model. The mitigation technique on the trained DRL is proposed to control the voltage violations on the smart grid to enhance the stability of the grid and minimize voltage irregularities. Our proposed mitigation technique provided better control policies for DRL-based VVC, successfully mitigating 100 percent of voltage violations in the smart grid environment. The results show that the mitigation technique enhances the security and robustness of trained DRL VVC agents.
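
基于上述摘要,下面给出一个极简的示意代码,展示基于DRL的VVC中常见的奖励设计思路:对电压越限以及无功注入的损耗代理项进行惩罚。电压上下限(0.95/1.05 p.u.)与各权重均为本文假设的演示值,并非论文原文实现。

```python
import numpy as np

def vvc_reward(voltages, q_injections, v_min=0.95, v_max=1.05,
               w_viol=10.0, w_loss=1.0):
    """示意性VVC奖励:惩罚各母线电压越限(p.u.)与无功注入的损耗代理项。
    参数与权重均为假设值,仅用于说明奖励塑形思路。"""
    v = np.asarray(voltages, dtype=float)
    # 逐母线统计超出电压带的幅度
    violation = np.sum(np.maximum(v - v_max, 0.0) + np.maximum(v_min - v, 0.0))
    # 用无功注入的平方和近似网损项
    loss_proxy = np.sum(np.square(q_injections))
    return -(w_viol * violation + w_loss * loss_proxy)
```

在DRL训练循环中,智能体每一步调整无功设定值后即可用该函数计算即时奖励。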


标题: Optimizing Forensic Investigation and Security Surveillance with Deep Reinforcement Learning Techniques

作者: T J Nandhini, K Thinakaran

PubTime: 2023-12

Downlink: https://ieeexplore.ieee.org/document/10452551/

Journal: 2023 International Conference on Data Science, Agents & Artificial Intelligence (ICDSAAI)

中文摘要: 深度强化学习(DRL)已经成为提高高级取证分析和安全监控的准确性和效率的有用方法。这项研究给出了对DRL方法的使用进行全面评估的结果,提供了其性能指标、取证分析能力、安全监控功效和计算效率的全貌。我们的DRL模型表现出色,准确率为92%,精确度、召回率和F1分数指标证明了其在分类任务上的强大能力。86%的平均交并比(IoU)分数证明了它的空间感知能力。与以前的方法相比,我们的模型具有更高的检测率(95%),同时保持较低的假阳性率(3%)。结合DRL和经典技术的混合策略取得了均衡的性能,检测率为92%。我们的DRL模型在室内、室外和夜间等各种监控场景中的检测率始终超过90%,同时保持较低的误报率。这证明了它在实际安全应用中的灵活性和可靠性。尽管DRL模型很复杂,我们模型的训练周期为48小时,实时分析的推理时间仅为每帧15毫秒。这些结果证明了我们的技术在动态安全监控系统中的实用性。

摘要: Deep Reinforcement Learning (DRL) has emerged as a useful method for improving both accuracy and efficiency in advanced forensic analysis and security surveillance. This research gives the findings of a thorough assessment of the use of DRL methodologies, providing a full picture of its performance metrics, forensic analysis capabilities, security surveillance efficacy, and computational efficiency. Our DRL model performed admirably, with an accuracy of 92% and precision, recall, and F1 score metrics demonstrating robust capability for classification tasks. The average Intersection over Union (IOU) score of 86% demonstrates its spatial awareness. When compared to previous approaches, our model had a much higher detection rate (95%) while keeping a low false positive rate (3%). The hybrid strategy produced a balanced performance, with a detection rate of 92%, by using the capabilities of both DRL and classical techniques. Our DRL model consistently outperformed 90% detection rates in various surveillance scenarios, including indoor, outdoor, and night-time environments, while retaining a low false alarm rate. This demonstrates its flexibility and dependability in real-world security applications. Despite the complexity of DRL models, our model’s training period was 48 hours, and the inference time for real-time analysis was only 15 milliseconds per frame. These gains demonstrate the utility of our technique for use in dynamic security surveillance systems.
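
摘要中多次引用IoU(交并比)指标。下面是一个与论文实现无关的通用示意,计算两个轴对齐矩形框(x1, y1, x2, y2)的IoU:

```python
def iou(box_a, box_b):
    """两个轴对齐矩形框 (x1, y1, x2, y2) 的交并比。通用示意实现。"""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # 交集的宽和高(无重叠时截断为0)
    ix = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    iy = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = ix * iy
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```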


标题: Using Forwards-Backwards Models to Approximate MDP Homomorphisms

作者: Augustine N. Mavor-Parker, Matthew J. Sargent, Christian Pehle

PubTime: 2024-03-02

Downlink: http://arxiv.org/abs/2209.06356v3

中文摘要: 强化学习代理必须通过试错费力地学习哪些状态-动作对集合是值等价的,这通常需要大量的环境经验。已有工作提出了MDP同态,它将环境的MDP约简为抽象MDP,从而获得更好的采样效率。因此,当可以先验地构造合适的同态时(通常通过利用从业者对环境对称性的知识),已经取得了令人印象深刻的改进。我们提出了一种在离散动作空间中构造同态的新方法,该方法使用环境动力学的学习模型来推断哪些状态-动作对会到达相同的状态,这可以将状态-动作空间的大小缩小至多为原始动作空间基数的倍数。在MinAtar中,对所有游戏和优化器取平均后,我们报告了在低样本条件下比基于值的离策略基线提高近4倍的结果。

摘要: Reinforcement learning agents must painstakingly learn through trial and error what sets of state-action pairs are value equivalent – requiring an often prohibitively large amount of environment experience. MDP homomorphisms have been proposed that reduce the MDP of an environment to an abstract MDP, enabling better sample efficiency. Consequently, impressive improvements have been achieved when a suitable homomorphism can be constructed a priori – usually by exploiting a practitioner’s knowledge of environment symmetries. We propose a novel approach to constructing homomorphisms in discrete action spaces, which uses a learnt model of environment dynamics to infer which state-action pairs lead to the same state – which can reduce the size of the state-action space by a factor as large as the cardinality of the original action space. In MinAtar, we report an almost 4x improvement over a value-based off-policy baseline in the low sample limit, when averaging over all games and optimizers.
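
下面用一个玩具环境示意摘要中的核心思想:用(学习到的)前向模型找出会到达同一后继状态的状态-动作对,并为每组只保留一个代表动作,从而缩小有效的状态-动作空间。`toy_model`及动作名称均为本文虚构的演示设定,与论文代码无关。

```python
from collections import defaultdict

def reduce_state_actions(states, actions, predict_next):
    """按(学习到的)前向模型的预测后继状态对动作分组,
    每组保留一个代表动作,返回 {状态: {后继状态: 代表动作}}。"""
    representatives = {}
    for s in states:
        groups = defaultdict(list)
        for a in actions:
            groups[predict_next(s, a)].append(a)
        representatives[s] = {ns: acts[0] for ns, acts in groups.items()}
    return representatives

# 玩具一维链:在左边界处,"left" 与 "stay" 到达同一状态,因而等价
def toy_model(s, a):
    return max(0, min(2, s + {"left": -1, "stay": 0, "right": 1}[a]))
```

在状态0处,"left"与"stay"被合并为一组,有效动作数从3降为2。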


标题: Automatic Speech Recognition using Advanced Deep Learning Approaches: A survey

作者: Hamza Kheddar, Mustapha Hemis, Yassine Himeur

PubTime: 2024-03-02

Downlink: http://arxiv.org/abs/2403.01255v1

中文摘要: 深度学习(DL)的最新进展给自动语音识别(ASR)带来了重大挑战。ASR依赖于大量的训练数据集(包括机密数据集),并且需要大量的计算和存储资源。支持自适应的系统可以提高动态环境中的ASR性能。DL技术假设训练和测试数据来自同一领域,但这并不总是成立。深度迁移学习(DTL)、联邦学习(FL)和强化学习(RL)等高级DL技术可以解决这些问题。DTL允许使用小而相关的数据集得到高性能模型,FL支持在不持有数据集的情况下对机密数据进行训练,RL优化动态环境中的决策并降低计算成本。这项综述对基于DTL、FL和RL的ASR框架进行了全面的回顾,旨在提供对最新进展的见解,并帮助研究人员和从业者理解当前的挑战。此外,Transformer是在所综述ASR框架中被大量使用的高级DL技术,本综述也考察了它们在输入ASR序列中捕获长程依赖关系的能力。本文首先介绍DTL、FL、RL和Transformer的背景,然后采用精心设计的分类法来概述最新方法。随后进行批判性分析,以确定每个框架的优势和劣势。此外,还给出了一项比较研究,以突出现有的挑战,为未来的研究机会铺平道路。

摘要: Recent advancements in deep learning (DL) have posed a significant challenge for automatic speech recognition (ASR). ASR relies on extensive training datasets, including confidential ones, and demands substantial computational and storage resources. Enabling adaptive systems improves ASR performance in dynamic environments. DL techniques assume training and testing data originate from the same domain, which is not always true. Advanced DL techniques like deep transfer learning (DTL), federated learning (FL), and reinforcement learning (RL) address these issues. DTL allows high-performance models using small yet related datasets, FL enables training on confidential data without dataset possession, and RL optimizes decision-making in dynamic environments, reducing computation costs. This survey offers a comprehensive review of DTL, FL, and RL-based ASR frameworks, aiming to provide insights into the latest developments and aid researchers and professionals in understanding the current challenges. Additionally, transformers, which are advanced DL techniques heavily used in proposed ASR frameworks, are considered in this survey for their ability to capture extensive dependencies in the input ASR sequence. The paper starts by presenting the background of DTL, FL, RL, and Transformers and then adopts a well-designed taxonomy to outline the state-of-the-art approaches. Subsequently, a critical analysis is conducted to identify the strengths and weaknesses of each framework. Additionally, a comparative study is presented to highlight the existing challenges, paving the way for future research opportunities.


标题: On the Theory of Risk-Aware Agents: Bridging Actor-Critic and Economics

作者: Michal Nauman, Marek Cygan

PubTime: 2024-03-02

Downlink: http://arxiv.org/abs/2310.19527v2

中文摘要: SAC和TD3等风险感知强化学习(RL)算法在各种连续动作任务中的表现在经验上优于风险中性算法。然而,这些算法所采用的悲观目标的理论基础仍未建立,这引发了关于它们实际实现的是哪类策略的问题。在这项工作中,我们应用期望效用假设(经济学中的一个基本概念)来说明风险中性和风险感知的RL目标都可以解释为使用指数效用函数的期望效用最大化。这种方法揭示了风险感知策略实际上最大化的是价值的确定性等价(certainty equivalent),使它们与经典决策理论的原则保持一致。此外,我们提出了双演员-评论家(DAC)算法。DAC是一种风险感知、无模型的算法,具有两个不同的演员网络:用于时序差分学习的悲观演员和用于探索的乐观演员。我们在各种运动和操作任务上对DAC的评估表明,其样本效率和最终性能均有提升。值得注意的是,DAC在需要明显更少计算资源的同时,在复杂的dog和humanoid任务域中取得了与领先的基于模型的方法相当的性能。

摘要: Risk-aware Reinforcement Learning (RL) algorithms like SAC and TD3 were shown empirically to outperform their risk-neutral counterparts in a variety of continuous-action tasks. However, the theoretical basis for the pessimistic objectives these algorithms employ remains unestablished, raising questions about the specific class of policies they are implementing. In this work, we apply the expected utility hypothesis, a fundamental concept in economics, to illustrate that both risk-neutral and risk-aware RL goals can be interpreted through expected utility maximization using an exponential utility function. This approach reveals that risk-aware policies effectively maximize value certainty equivalent, aligning them with conventional decision theory principles. Furthermore, we propose Dual Actor-Critic (DAC). DAC is a risk-aware, model-free algorithm that features two distinct actor networks: a pessimistic actor for temporal-difference learning and an optimistic actor for exploration. Our evaluations of DAC across various locomotion and manipulation tasks demonstrate improvements in sample efficiency and final performance. Remarkably, DAC, while requiring significantly less computational resources, matches the performance of leading model-based methods in the complex dog and humanoid domains.
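
摘要中提到的"价值的确定性等价"可以用指数效用函数明确写出(以下记号为本文所加,并非论文原文):

```latex
% 指数效用(风险规避系数 \beta > 0)
U_\beta(x) = -\tfrac{1}{\beta}\, e^{-\beta x}

% 对随机回报 X,其确定性等价(certainty equivalent)为
\mathrm{CE}_\beta(X) = U_\beta^{-1}\!\big(\mathbb{E}[U_\beta(X)]\big)
                     = -\tfrac{1}{\beta} \ln \mathbb{E}\big[e^{-\beta X}\big]

% 当 \beta \to 0 时,\mathrm{CE}_\beta(X) \to \mathbb{E}[X],退化为风险中性目标
```

因此风险感知策略最大化的是 CE 而非期望回报,这正是摘要所述其与经典决策理论的对应关系。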


标题: Assisted Learning for Organizations with Limited Imbalanced Data

作者: Cheng Chen, Jiaying Zhou, Jie Ding

PubTime: 2024-03-02

Downlink: http://arxiv.org/abs/2109.09307v4

中文摘要: 在大数据时代,许多大型组织正在将机器学习集成到他们的工作流程中,以促进数据分析。然而,他们训练的模型的性能经常受到可用数据有限且不平衡的制约。在这项工作中,我们开发了一个辅助学习框架,以帮助组织提高其学习性能。这些组织有足够的计算资源,但必须遵守严格的数据共享和协作政策。他们有限且不平衡的数据经常导致有偏的推断和次优的决策。在辅助学习中,组织学习者从外部服务提供商处购买辅助服务,目标是仅用几轮辅助就提高其模型性能。我们为辅助深度学习和辅助强化学习开发了有效的随机训练算法。不同于现有的需要频繁传输梯度或模型的分布式算法,我们的框架允许学习者只需偶尔与服务提供商共享信息,但仍能获得接近oracle性能的模型,就像所有数据都集中在一起训练一样。

摘要: In the era of big data, many big organizations are integrating machine learning into their work pipelines to facilitate data analysis. However, the performance of their trained models is often restricted by limited and imbalanced data available to them. In this work, we develop an assisted learning framework for assisting organizations to improve their learning performance. The organizations have sufficient computation resources but are subject to stringent data-sharing and collaboration policies. Their limited imbalanced data often cause biased inference and sub-optimal decision-making. In assisted learning, an organizational learner purchases assistance service from an external service provider and aims to enhance its model performance within only a few assistance rounds. We develop effective stochastic training algorithms for both assisted deep learning and assisted reinforcement learning. Different from existing distributed algorithms that need to frequently transmit gradients or models, our framework allows the learner to only occasionally share information with the service provider, but still obtain a model that achieves near-oracle performance as if all the data were centralized.


== Imitation Learning ==

标题: CyberDemo: Augmenting Simulated Human Demonstration for Real-World Dexterous Manipulation

作者: Jun Wang, Yuzhe Qin, Kaiming Kuang

PubTime: 2024-03-01

Downlink: http://arxiv.org/abs/2402.14795v2

Project: https://cyber-demo.github.io

中文摘要: 我们介绍CyberDemo,这是一种新颖的机器人模仿学习方法,它利用仿真中的人类演示来完成真实世界的任务。通过在仿真环境中引入大量数据增强,CyberDemo在迁移到真实世界时优于传统的域内真实世界演示,能够应对各种物理和视觉条件。除了在数据收集方面经济且便利之外,CyberDemo在各种任务的成功率上也优于基线方法,并对先前未见过的物体表现出泛化能力。例如,尽管人类演示只涉及三瓣阀,它仍可以旋转新型的四瓣阀和五瓣阀。我们的研究证明了仿真人类演示在真实世界灵巧操作任务中的巨大潜力。更多详情见https://cyber-demo.github.io

摘要: We introduce CyberDemo, a novel approach to robotic imitation learning that leverages simulated human demonstrations for real-world tasks. By incorporating extensive data augmentation in a simulated environment, CyberDemo outperforms traditional in-domain real-world demonstrations when transferred to the real world, handling diverse physical and visual conditions. Regardless of its affordability and convenience in data collection, CyberDemo outperforms baseline methods in terms of success rates across various tasks and exhibits generalizability with previously unseen objects. For example, it can rotate novel tetra-valve and penta-valve, despite human demonstrations only involving tri-valves. Our research demonstrates the significant potential of simulated human demonstrations for real-world dexterous manipulation tasks. More details can be found at https://cyber-demo.github.io
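
下面是一个与论文实现无关的示意:在仿真演示帧上做简单的视觉/物理增强(亮度抖动、相机位姿噪声)。观测字段名与各参数范围均为本文假设,仅用于说明"仿真端数据增强"的一般做法。

```python
import numpy as np

def augment_demo(obs, rng):
    """对仿真演示帧做玩具级增强:亮度抖动 + 相机位姿小噪声。
    字段名 'rgb'/'cam_pose' 与噪声范围均为假设的演示设定。"""
    frame, cam_pose = obs["rgb"], obs["cam_pose"]
    # 视觉增强:随机亮度缩放并截断到合法像素范围
    frame = np.clip(frame * rng.uniform(0.7, 1.3), 0, 255)
    # "物理"增强:给相机位姿(6维)加小高斯扰动
    cam_pose = cam_pose + rng.normal(scale=0.01, size=cam_pose.shape)
    return {"rgb": frame, "cam_pose": cam_pose}
```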


标题: Imitation Learning Datasets: A Toolkit For Creating Datasets, Training Agents and Benchmarking

作者: Nathan Gavenski, Michael Luck, Odinaldo Rodrigues

PubTime: 2024-03-01

Downlink: http://arxiv.org/abs/2403.00550v1

Project: https://nathangavenski.github.io/

中文摘要: 模仿学习领域需要专家数据来在任务中训练代理。这种学习方法最常见的问题是缺乏可用数据,导致各项技术只能在各自自建的数据集上进行测试。创建数据集是一个繁琐的过程,需要研究人员从头训练专家代理、记录它们的交互,并用新创建的数据测试每种基准方法。此外,为每种新技术创建新的数据集会导致评估过程缺乏一致性,因为每个数据集的状态和动作分布可能差异很大。为此,这项工作旨在通过创建Imitation Learning Datasets来解决这些问题,这是一个工具包,提供:(i)精选的专家策略,并以多线程支持更快的数据集创建;(ii)现成可用、带有精确度量的数据集和技术;以及(iii)常见模仿学习技术的共享实现。演示链接:https://nathangavenski.github.io/#/il-datasets-video

摘要: Imitation learning field requires expert data to train agents in a task. Most often, this learning approach suffers from the absence of available data, which results in techniques being tested on its dataset. Creating datasets is a cumbersome process requiring researchers to train expert agents from scratch, record their interactions and test each benchmark method with newly created data. Moreover, creating new datasets for each new technique results in a lack of consistency in the evaluation process since each dataset can drastically vary in state and action distribution. In response, this work aims to address these issues by creating Imitation Learning Datasets, a toolkit that allows for: (i) curated expert policies with multithreaded support for faster dataset creation; (ii) readily available datasets and techniques with precise measurements; and (iii) sharing implementations of common imitation learning techniques. Demonstration link: https://nathangavenski.github.io/#/il-datasets-video


标题: Neeko: Leveraging Dynamic LoRA for Efficient Multi-Character Role-Playing Agent

作者: Xiaoyan Yu, Tongxu Luo, Yifan Wei

PubTime: 2024-03-01

Downlink: http://arxiv.org/abs/2402.13717v2

GitHub: https://github.com/weiyifan1023/Neeko

中文摘要: 大型语言模型(LLMs)彻底改变了开放域对话代理,但在多角色角色扮演(MCRP)场景中仍面临挑战。为了解决这个问题,我们提出了Neeko,一个为高效多角色模仿而设计的创新框架。与现有方法不同,Neeko采用了动态低秩适配器(LoRA)策略,使其能够无缝地适应不同的角色。我们的框架将角色扮演过程分解为代理预训练、多角色扮演和角色增量学习,有效地处理已见和未见的角色。这种动态方法,加上每个角色独有的LoRA块,增强了Neeko对各角色独特属性、个性和说话方式的适应性。因此,Neeko在MCRP上表现优于大多数现有方法,提供了更具吸引力和更多样的用户交互体验。代码和数据见https://github.com/weiyifan1023/Neeko。

摘要: Large Language Models (LLMs) have revolutionized open-domain dialogue agents but encounter challenges in multi-character role-playing (MCRP) scenarios. To address the issue, we present Neeko, an innovative framework designed for efficient multiple characters imitation. Unlike existing methods, Neeko employs a dynamic low-rank adapter (LoRA) strategy, enabling it to adapt seamlessly to diverse characters. Our framework breaks down the role-playing process into agent pre-training, multiple characters playing, and character incremental learning, effectively handling both seen and unseen roles. This dynamic approach, coupled with distinct LoRA blocks for each character, enhances Neeko’s adaptability to unique attributes, personalities, and speaking patterns. As a result, Neeko demonstrates superior performance in MCRP over most existing methods, offering more engaging and versatile user interaction experiences. Code and data are available at https://github.com/weiyifan1023/Neeko.
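
下面用numpy给出"按角色切换LoRA块"这一思路的极简示意(维度、初始化均为玩具设定,与Neeko实际实现无关):LoRA把权重更新约束为低秩乘积 B@A,切换角色即切换对应的(B, A)对,冻结的基础权重保持共享。

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2                      # 隐层维度与LoRA秩(玩具值)
W0 = rng.normal(size=(d, d))     # 冻结的基础权重

# 每个角色一对低秩矩阵 (B, A);B按LoRA惯例零初始化
lora_blocks = {c: (np.zeros((d, r)), rng.normal(size=(r, d)))
               for c in ("alice", "bob")}

def forward(x, character, alpha=1.0):
    """基础投影加上当前角色的低秩更新 (alpha/r) * B @ A。"""
    B, A = lora_blocks[character]
    return x @ (W0 + (alpha / r) * (B @ A)).T
```

B零初始化意味着训练开始时各角色的输出与基础模型一致,随后各角色的(B, A)独立更新。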


标题: PRIME: Scaffolding Manipulation Tasks with Behavior Primitives for Data-Efficient Imitation Learning

作者: Tian Gao, Soroush Nasiriany, Huihan Liu

PubTime: 2024-03-01

Downlink: http://arxiv.org/abs/2403.00929v1

中文摘要: 模仿学习已经显示出使机器人获得复杂操作行为的巨大潜力。然而,这些算法在长时程任务中具有很高的样本复杂度,复合误差会在任务时程内不断积累。我们提出了PRIME(基于基元的数据高效模仿),这是一个基于行为基元的框架,旨在提高模仿学习的数据效率。PRIME通过将任务演示分解为基元序列来搭建机器人任务,然后通过模仿学习训练一个高级控制策略来对基元进行排序。我们的实验表明,PRIME在多阶段操作任务中实现了显著的性能提升,在仿真中比最先进的基线成功率高10-34%,在物理硬件上高20-48%。

摘要: Imitation learning has shown great potential for enabling robots to acquire complex manipulation behaviors. However, these algorithms suffer from high sample complexity in long-horizon tasks, where compounding errors accumulate over the task horizons. We present PRIME (PRimitive-based IMitation with data Efficiency), a behavior primitive-based framework designed for improving the data efficiency of imitation learning. PRIME scaffolds robot tasks by decomposing task demonstrations into primitive sequences, followed by learning a high-level control policy to sequence primitives through imitation learning. Our experiments demonstrate that PRIME achieves a significant performance improvement in multi-stage manipulation tasks, with 10-34% higher success rates in simulation over state-of-the-art baselines and 20-48% on physical hardware.
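
摘要中"将演示分解为基元序列"的第一步可以粗略示意如下:用一个(假设的)动作分类器把连续动作流切分为带标签的基元片段。分类器与动作名称均为本文虚构,仅用于说明分段逻辑。

```python
def segment_demo(actions, classify):
    """把演示的动作流按分类标签切分为连续的基元片段,
    返回 [(基元标签, 该片段的动作列表), ...]。"""
    segments = []
    for a in actions:
        label = classify(a)
        if segments and segments[-1][0] == label:
            segments[-1][1].append(a)   # 与上一片段同类,归并
        else:
            segments.append((label, [a]))  # 开启新片段
    return segments

# 假设的分类器:夹爪指令归为 "grasp",其余归为 "reach"
classify = lambda a: "grasp" if a == "close_gripper" else "reach"
```

得到的基元序列即可作为高级策略的监督信号(状态到基元标签的映射)。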


== Embodied Artificial Intelligence@robotic agent@human robot interaction ==

标题: ROS-Causal: A ROS-based Causal Analysis Framework for Human-Robot Interaction Applications

作者: Luca Castri, Gloria Beraldo, Sariah Mghames

PubTime: 2024-02-29

Downlink: http://arxiv.org/abs/2402.16068v2

GitHub: https://github.com/lcastri/roscausal.git

中文摘要: 在人类共享空间中部署机器人需要理解附近代理和物体之间的交互。通过因果推断建模因果关系有助于预测人类行为并预判机器人干预。然而,一个关键的挑战在于,现有的因果发现方法目前缺乏在ROS(机器人学事实上的标准生态系统)内的实现,阻碍了其在机器人学中的有效利用。为了弥补这一差距,本文引入了ROS-Causal,这是一个基于ROS的框架,用于人机空间交互中的机载数据收集和因果发现。一个与ROS集成的专用模拟器展示了该方法的有效性,演示了机器人在数据收集期间在机载端生成因果模型。ROS-Causal可在GitHub上获得:https://github.com/lcastri/roscausal.git。

摘要: Deploying robots in human-shared spaces requires understanding interactions among nearby agents and objects. Modelling cause-and-effect relations through causal inference aids in predicting human behaviours and anticipating robot interventions. However, a critical challenge arises as existing causal discovery methods currently lack an implementation inside the ROS ecosystem, the standard de facto in robotics, hindering effective utilisation in robotics. To address this gap, this paper introduces ROS-Causal, a ROS-based framework for onboard data collection and causal discovery in human-robot spatial interactions. An ad-hoc simulator, integrated with ROS, illustrates the approach’s effectiveness, showcasing the robot onboard generation of causal models during data collection. ROS-Causal is available on GitHub: https://github.com/lcastri/roscausal.git.


标题: Towards a Common Understanding and Vision for Theory-Grounded Human-Robot Interaction (THEORIA)

作者: Glenda Hannibal, Nicholas Rabb, Theresa Law

PubTime: 2022-03

Downlink: https://ieeexplore.ieee.org/document/9889422/

Journal: 2022 17th ACM/IEEE International Conference on Human-Robot Interaction (HRI)

中文摘要: 虽然实践知识的积累为研究人员提供了许多关于成功的人机交互(HRI)的见解,但仍然缺乏关于理论知识作用的更广泛讨论。这是不幸的,因为当渴望将这一研究领域发展成为一门成熟的科学时,明确地将HRI中的理论和理论化视为重要的贡献也是重要的。通过我们提议的为期半天的互动研讨会,我们旨在为参与者提供一个充满活力的环境,讨论HRI理论知识的“什么、为什么和如何”,因为他们分享和学习彼此的经验和能力。从长远来看,本次研讨会的成果将为一个支持性的研究社区奠定基础,鼓励研究人员对基于理论的HRI工作进行进一步思考和合作。

摘要: While the accumulation of practical knowledge provided researchers with much insight into successful human-robot interaction (HRI), a broader discussion about the role of theoretical knowledge is still lacking. It is unfortunate because it is also important to explicitly consider theory and theorizing in HRI as crucial contributions when aspiring to develop this field of research into a mature science. With our proposed interactive half-day workshop, we aim to provide a vibrant setting for the participants to discuss the “what, why, and how” of theoretical knowledge in HRI, as they share and learn from each other’s experiences and competence. In the long-term perspective, the outcome of this workshop will lay the foundation for a supportive research community that encourages researchers to reflect and collaborate further on theory-grounded HRI work.


标题: Automated Continuous Force-Torque Sensor Bias Estimation

作者: Philippe Nadeau, Miguel Rogel Garcia, Emmett Wise

PubTime: 2024-03-02

Downlink: http://arxiv.org/abs/2403.01068v1

中文摘要: 六轴力-扭矩传感器通常安装在串联机器人的手腕上,以测量作用在机器人末端执行器上的外力和外扭矩。这些测量用于负载辨识、接触检测和人机交互等应用。通常,从力-扭矩传感器获得的测量值比从关节扭矩读数计算的估计值更精确,因为前者独立于机器人的动力学和运动学模型。然而,力-扭矩传感器的测量受到随时间漂移的偏置的影响,该偏置由温度变化、机械应力和其他因素的复合效应引起。在这项工作中,我们提出了一条流水线,用于连续估计安装在机器人手腕上的力-扭矩传感器的偏置及其漂移。流水线的第一个组件是一个卡尔曼滤波器,用于估计机器人关节的运动状态(位置、速度和加速度)。第二个组件是一个运动学模型,它将关节空间运动学映射到力-扭矩传感器的任务空间运动学。最后,第三个组件是一个卡尔曼滤波器,在假设安装于力-扭矩传感器远端的夹爪惯性参数确切已知的前提下,估计力-扭矩传感器的偏置及其漂移。

摘要: Six axis force-torque sensors are commonly attached to the wrist of serial robots to measure the external forces and torques acting on the robot’s end-effector. These measurements are used for load identification, contact detection, and human-robot interaction amongst other applications. Typically, the measurements obtained from the force-torque sensor are more accurate than estimates computed from joint torque readings, as the former is independent of the robot’s dynamic and kinematic models. However, the force-torque sensor measurements are affected by a bias that drifts over time, caused by the compounding effects of temperature changes, mechanical stresses, and other factors. In this work, we present a pipeline that continuously estimates the bias and the drift of the bias of a force-torque sensor attached to the wrist of a robot. The first component of the pipeline is a Kalman filter that estimates the kinematic state (position, velocity, and acceleration) of the robot’s joints. The second component is a kinematic model that maps the joint-space kinematics to the task-space kinematics of the force-torque sensor. Finally, the third component is a Kalman filter that estimates the bias and the drift of the bias of the force-torque sensor assuming that the inertial parameters of the gripper attached to the distal end of the force-torque sensor are known with certainty.
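
流水线第三部分(估计随时间漂移的偏置)可以用一个标量卡尔曼滤波器来示意:把偏置建模为随机游走,量测为已知受力信号加偏置加噪声。以下噪声方差、漂移速率等均为本文假设的演示参数,与论文实现无关。

```python
import numpy as np

def estimate_bias(measurements, true_signal, q=1e-4, r=0.25):
    """标量卡尔曼滤波器,跟踪缓慢漂移的传感器偏置。
    状态:偏置(随机游走,过程方差 q);量测:信号 + 偏置 + 噪声(方差 r)。"""
    bias, p = 0.0, 1.0
    history = []
    for z, s in zip(measurements, true_signal):
        p += q                        # 预测:偏置按随机游走演化
        k = p / (p + r)               # 卡尔曼增益
        bias += k * ((z - s) - bias)  # 用去除已知信号后的残差更新偏置
        p *= (1.0 - k)
        history.append(bias)
    return np.array(history)

rng = np.random.default_rng(1)
n = 2000
signal = np.sin(np.linspace(0.0, 20.0, n))          # 假设已知的真实受力
drift = 0.5 + 0.0002 * np.arange(n)                 # 缓慢漂移的偏置(真值)
z = signal + drift + rng.normal(scale=0.1, size=n)  # 带噪声的传感器读数
est = estimate_bias(z, signal)
```

滤波器在若干样本后即可贴近真实偏置,并以较小滞后跟随其漂移。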


标题: Composite Distributed Learning and Synchronization of Nonlinear Multi-Agent Systems with Complete Uncertain Dynamics

作者: Emadodin Jandaghi, Dalton L. Stein, Adam Hoburg

PubTime: 2024-03-01

Downlink: http://arxiv.org/abs/2403.00987v1

中文摘要: 本文讨论了在领导-跟随框架内,在异构非线性不确定性下运行的多智能体机器人机械臂系统网络中复合同步与学习控制这一具有挑战性的问题。提出了一种新的两层分布式自适应学习控制策略,包括第一层分布式协作估计器和第二层分散确定性学习控制器。第一层的主要目标是方便每个机器人代理估计领导者的信息。第二层负责使各个机器人代理跟踪期望的参考轨迹,并准确辨识和学习它们的非线性不确定动力学。所提出的分布式学习控制方案相比现有文献的进步在于,它能够处理具有完全不确定动力学(包括不确定质量矩阵)的机器人代理。该框架使机器人控制与环境无关,可用于从水下到太空等各种难以辨识系统动力学参数的环境。利用Lyapunov方法对闭环系统的稳定性和参数收敛性进行了严格分析。对多智能体机器人机械臂的数值仿真验证了该方案的有效性。辨识出的非线性动力学可以保存,并在系统重启时重复使用。

摘要: This paper addresses the challenging problem of composite synchronization and learning control in a network of multi-agent robotic manipulator systems operating under heterogeneous nonlinear uncertainties within a leader-follower framework. A novel two-layer distributed adaptive learning control strategy is introduced, comprising a first-layer distributed cooperative estimator and a second-layer decentralized deterministic learning controller. The primary objective of the first layer is to facilitate each robotic agent’s estimation of the leader’s information. The second layer is responsible for both enabling individual robot agents to track desired reference trajectories and accurately identifying and learning their nonlinear uncertain dynamics. The proposed distributed learning control scheme represents an advancement in the existing literature due to its ability to manage robotic agents with completely uncertain dynamics including uncertain mass matrices. This framework allows the robotic control to be environment-independent which can be used in various settings, from underwater to space where identifying system dynamics parameters is challenging. The stability and parameter convergence of the closed-loop system are rigorously analyzed using the Lyapunov method. Numerical simulations conducted on multi-agent robot manipulators validate the effectiveness of the proposed scheme. The identified nonlinear dynamics can be saved and reused whenever the system restarts.


== Object Detection@ Segmentation@Open vocabulary detection@SAM ==

标题: TUMTraf V2X Cooperative Perception Dataset

作者: Walter Zimmer, Gerhard Arya Wardana, Suren Sritharan

PubTime: 2024-03-02

Downlink: http://arxiv.org/abs/2403.01316v1

Project: https://tum-traffic-dataset.github.io/tumtraf-v2x

中文摘要: 协同感知为增强自动驾驶汽车的能力和改善道路安全提供了诸多好处。在车载传感器之外使用路侧传感器可以提高可靠性并扩展传感范围。外部传感器为自动驾驶车辆提供了更高的态势感知能力,并可避免遮挡。我们提出了CoopDet3D(一个协同的多模态融合模型)和TUMTraf-V2X(一个感知数据集),用于协同3D目标检测与跟踪任务。我们的数据集包含来自五个路侧传感器和四个车载传感器的2,000个标注点云和5,000个标注图像。它包括3万个带轨迹ID的3D框以及精确的GPS和IMU数据。我们标注了八个类别,并涵盖了包含交通违规、未遂碰撞、超车和掉头等具有挑战性驾驶行为的遮挡场景。通过多次实验,我们表明,与纯车载的相机-激光雷达融合模型相比,我们的CoopDet3D相机-激光雷达融合模型实现了+14.36的3D mAP提升。最后,我们在网站上公开了数据集、模型、标注工具和开发工具包:https://tum-traffic-dataset.github.io/tumtraf-v2x。

摘要: Cooperative perception offers several benefits for enhancing the capabilities of autonomous vehicles and improving road safety. Using roadside sensors in addition to onboard sensors increases reliability and extends the sensor range. External sensors offer higher situational awareness for automated vehicles and prevent occlusions. We propose CoopDet3D, a cooperative multi-modal fusion model, and TUMTraf-V2X, a perception dataset, for the cooperative 3D object detection and tracking task. Our dataset contains 2,000 labeled point clouds and 5,000 labeled images from five roadside and four onboard sensors. It includes 30k 3D boxes with track IDs and precise GPS and IMU data. We labeled eight categories and covered occlusion scenarios with challenging driving maneuvers, like traffic violations, near-miss events, overtaking, and U-turns. Through multiple experiments, we show that our CoopDet3D camera-LiDAR fusion model achieves an increase of +14.36 3D mAP compared to a vehicle camera-LiDAR fusion model. Finally, we make our dataset, model, labeling tool, and dev-kit publicly available on our website: https://tum-traffic-dataset.github.io/tumtraf-v2x.


标题: Benchmarking Segmentation Models with Mask-Preserved Attribute Editing

作者: Zijin Yin, Kongming Liang, Bing Li

PubTime: 2024-03-02

Downlink: http://arxiv.org/abs/2403.01231v1

GitHub: https://github.com/PRIS-CV/Pascal-EA

中文摘要: 在实践中部署分割模型时,评估它们在各种复杂场景中的行为至关重要。与以前仅考虑全局属性变化(如不利天气)的评估范式不同,我们研究了局部和全局属性变化以进行鲁棒性评估。为了实现这一点,我们构建了一个保留掩模的属性编辑流水线,通过精确控制结构信息来编辑真实图像的视觉属性。因此,原始分割标签可以被重新用于编辑的图像。使用我们的管道,我们构建了一个涵盖对象和图像属性(例如,颜色、材料、图案、风格)的基准。我们评估了各种各样的语义分割模型,从传统的闭集模型到最近的开放词汇大模型,评估了它们对不同类型变体的鲁棒性。我们发现局部和全局属性变化都会影响分割性能,并且模型的敏感性在不同的变化类型之间存在差异。我们认为局部属性与全局属性具有相同的重要性,应该在分割模型的鲁棒性评估中加以考虑。代码:https://github.com/PRIS-CV/Pascal-EA。

摘要: When deploying segmentation models in practice, it is critical to evaluate their behaviors in varied and complex scenes. Different from the previous evaluation paradigms only in consideration of global attribute variations (e.g. adverse weather), we investigate both local and global attribute variations for robustness evaluation. To achieve this, we construct a mask-preserved attribute editing pipeline to edit visual attributes of real images with precise control of structural information. Therefore, the original segmentation labels can be reused for the edited images. Using our pipeline, we construct a benchmark covering both object and image attributes (e.g. color, material, pattern, style). We evaluate a broad variety of semantic segmentation models, spanning from conventional close-set models to recent open-vocabulary large models on their robustness to different types of variations. We find that both local and global attribute variations affect segmentation performances, and the sensitivity of models diverges across different variation types. We argue that local attributes have the same importance as global attributes, and should be considered in the robustness evaluation of segmentation models. Code: https://github.com/PRIS-CV/Pascal-EA.


标题: OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics

作者: Peiqi Liu, Yaswanth Orru, Jay Vakil

PubTime: 2024-02-29

Downlink: http://arxiv.org/abs/2401.12202v2

Project: https://ok-robot.github.io

GitHub: https://github.com/ok-robot/ok-robot

中文摘要: 近年来,视觉、语言和机器人领域取得了显著进展。我们现在拥有能够基于语言查询识别物体的视觉模型、可以有效控制移动系统的导航系统,以及可以抓取各种物体的抓取模型。尽管取得了这些进步,机器人的通用应用仍然落后,即便它们所依赖的正是识别、导航和抓取这些基本能力。在本文中,我们采用系统优先的方法,开发了一个新的基于开放知识的机器人框架,称为OK-Robot。通过结合用于物体检测的视觉语言模型(VLMs)、用于移动的导航基元和用于物体操作的抓取基元,OK-Robot提供了一个无需任何训练即可执行拾取-放置(pick-and-drop)操作的集成解决方案。为了评估其性能,我们在10个真实家庭环境中运行OK-Robot。结果表明,OK-Robot在开放式拾取-放置任务中实现了58.5%的成功率,代表了开放词汇移动操作(OVMM)的最新水平,性能接近先前工作的1.8倍。在更干净、整洁的环境中,OK-Robot的成功率提高到82%。然而,从OK-Robot获得的最重要的洞见是:在将VLMs等开放知识系统与机器人模块相结合时,细微细节起着关键作用。我们实验的视频和代码可以在我们的网站上找到:https://ok-robot.github.io

摘要: Remarkable progress has been made in recent years in the fields of vision, language, and robotics. We now have vision models capable of recognizing objects based on language queries, navigation systems that can effectively control mobile systems, and grasping models that can handle a wide range of objects. Despite these advancements, general-purpose applications of robotics still lag behind, even though they rely on these fundamental capabilities of recognition, navigation, and grasping. In this paper, we adopt a systems-first approach to develop a new Open Knowledge-based robotics framework called OK-Robot. By combining Vision-Language Models (VLMs) for object detection, navigation primitives for movement, and grasping primitives for object manipulation, OK-Robot offers an integrated solution for pick-and-drop operations without requiring any training. To evaluate its performance, we run OK-Robot in 10 real-world home environments. The results demonstrate that OK-Robot achieves a 58.5% success rate in open-ended pick-and-drop tasks, representing a new state-of-the-art in Open Vocabulary Mobile Manipulation (OVMM) with nearly 1.8x the performance of prior work. On cleaner, uncluttered environments, OK-Robot’s performance increases to 82%. However, the most important insight gained from OK-Robot is the critical role of nuanced details when combining Open Knowledge systems like VLMs with robotic modules. Videos of our experiments and code are available on our website: https://ok-robot.github.io


标题: Customer Relationship Management Segment Analysis System

作者: A.R. Kavitha, S Sharon Roseline, S Mispha

PubTime: 2023-12

Downlink: https://ieeexplore.ieee.org/document/10452525/

Journal: 2023 International Conference on Data Science, Agents & Artificial Intelligence (ICDSAAI)

中文摘要: 客户关系管理(CRM)需要整合人员、流程和技术,目的是理解公司的客户。CRM是customer relationship management的首字母缩写,植根于由来已久的原则。目前,人们普遍认识到,理解和对待客户的方式会极大地影响公司未来的成功和盈利能力。因此,企业越来越多地投入大量资金来增强他们的客户管理策略。CRM体现了获取客户、培养客户忠诚度、提高客户盈利能力、行为和满意度的综合战略。所提出的系统包含许多活动,旨在通过无缝的数据库集成实现整个流程的自动化,从而确保用户能通过友好的界面与所提供的应用程序进行有效交互。系统对数据集使用K-means聚类,将客户划分为不同类别,并以相同的CSV数据格式向用户输出细分后的客户列表。

摘要: Customer relationship management (CRM) entails the integration of individuals, procedures, and technology with the aim of comprehending the clientele of a company. CRM, an acronym for customer relationship management, is rooted in long-standing principles. Presently, there is widespread recognition that the way in which customers are comprehended and treated significantly influences a company’s future success and profitability. Consequently, businesses are increasingly allocating substantial investments to enhance their customer management strategies. CRM embodies a comprehensive strategy for acquiring customers, fostering their loyalty, and improving customer profitability, behavior and satisfaction. The proposed system incorporates numerous activities aimed at automating the entire process through seamless database integration. This ensures a user-friendly interface for effective interaction with the provided application. K-means clustering is used for the datasets that inhibit customer segmentation on different categories and provide the output with the same CSV data to the user with the segmented list of customers.
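
摘要中的客户细分可用经典的K-means来示意。下面是一个纯numpy的Lloyd迭代极简实现(通用示意,与论文系统无关),输入可以是从CSV读入的客户特征矩阵:

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """极简Lloyd's k-means:返回每个样本的簇标签与各簇质心。"""
    rng = np.random.default_rng(seed)
    # 随机挑选 k 个样本作为初始质心
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # 按到最近质心的欧氏距离分配标签
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # 把每个质心移到其簇内样本的均值处
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids
```

实际使用时可先用`np.genfromtxt`等读入CSV并做特征归一化,再按标签把细分结果写回CSV。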


Title: Enhancing Retinal Vascular Structure Segmentation in Images With a Novel Design Two-Path Interactive Fusion Module Model

Authors: Rui Yang, Shunpu Zhang

PubTime: 2024-03-03

Downlink: http://arxiv.org/abs/2403.01362v1

Abstract: Precision in identifying and differentiating micro and macro blood vessels in the retina is crucial for the diagnosis of retinal diseases, although it poses a significant challenge. Current autoencoding-based segmentation approaches encounter limitations as they are constrained by the encoder and undergo a reduction in resolution during the encoding stage. The inability to recover lost information in the decoding phase further impedes these approaches and restricts their capacity to extract the retinal microvascular structure. To address this issue, we introduce Swin-Res-Net, a specialized module designed to enhance the precision of retinal vessel segmentation. Swin-Res-Net utilizes the Swin transformer, which uses shifted windows with displacement for partitioning, to reduce network complexity and accelerate model convergence. Additionally, the model incorporates interactive fusion with a functional module in the Res2Net architecture. Res2Net leverages multi-scale techniques to enlarge the receptive field of the convolutional kernel, enabling the extraction of additional semantic information from the image. This combination creates a new module that enhances the localization and separation of micro vessels in the retina. To improve the efficiency of processing vascular information, we have added a module that eliminates redundant information between the encoding and decoding steps. Our proposed architecture produces outstanding results, either meeting or surpassing those of other published models. The AUC reflects significant enhancements, achieving values of 0.9956, 0.9931, and 0.9946 in pixel-wise segmentation of retinal vessels across three widely used datasets: CHASE-DB1, DRIVE, and STARE, respectively. Moreover, Swin-Res-Net outperforms alternative architectures, demonstrating superior performance in both the IoU and F1 metrics.
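The shifted-window partitioning that the abstract credits with reducing network complexity can be illustrated in a few lines of NumPy. The feature-map and window sizes below are toy values, and the real model applies learned attention inside each window rather than just reshaping:

```python
import numpy as np

# Sketch of Swin-style window partitioning: attention is computed inside
# non-overlapping local windows, and alternating layers cyclically shift the
# feature map by half a window so information flows across window borders.

def window_partition(x, win):
    """Split an (H, W, C) map into (num_windows, win*win, C) token groups."""
    H, W, C = x.shape
    x = x.reshape(H // win, win, W // win, win, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, win * win, C)

def shifted_window_partition(x, win):
    """Cyclically shift by win//2 before partitioning (the 'SW' step)."""
    shifted = np.roll(x, shift=(-(win // 2), -(win // 2)), axis=(0, 1))
    return window_partition(shifted, win)

feat = np.arange(8 * 8 * 1, dtype=float).reshape(8, 8, 1)  # toy feature map
wins = window_partition(feat, win=4)           # 4 windows of 16 tokens each
swins = shifted_window_partition(feat, win=4)  # same windows, shifted content
print(wins.shape, swins.shape)
```

Because attention cost grows with the square of the token count, computing it per 16-token window instead of over all 64 positions is what keeps the complexity linear in image size; the shift is what prevents the windows from becoming isolated.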


Title: LiDARFormer: A Unified Transformer-based Multi-task Network for LiDAR Perception

Authors: Zixiang Zhou, Dongqiangzi Ye, Weijia Chen

PubTime: 2024-03-02

Downlink: http://arxiv.org/abs/2303.12194v2

Abstract: There is a recent trend in the LiDAR perception field towards unifying multiple tasks in a single strong network with improved performance, as opposed to using separate networks for each task. In this paper, we introduce a new LiDAR multi-task learning paradigm based on the transformer. The proposed LiDARFormer utilizes cross-space global contextual feature information and exploits cross-task synergy to boost the performance of LiDAR perception tasks across multiple large-scale datasets and benchmarks. Our novel transformer-based framework includes a cross-space transformer module that learns attentive features between the 2D dense Bird’s Eye View (BEV) and 3D sparse voxel feature maps. Additionally, we propose a transformer decoder for the segmentation task to dynamically adjust the learned features by leveraging the categorical feature representations. Furthermore, we combine the segmentation and detection features in a shared transformer decoder with cross-task attention layers to enhance and integrate the object-level and class-level features. LiDARFormer is evaluated on the large-scale nuScenes and the Waymo Open datasets for both 3D detection and semantic segmentation tasks, and it outperforms all previously published methods on both tasks. Notably, LiDARFormer achieves the state-of-the-art performance of 76.4% L2 mAPH and 74.3% NDS on the challenging Waymo and nuScenes detection benchmarks for a single model LiDAR-only method.
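At its core, the cross-space module's idea of letting one representation attend over the other reduces to ordinary scaled dot-product attention. The NumPy toy below uses made-up shapes and random features, and omits the learned query/key/value projections and everything else specific to LiDARFormer:

```python
import numpy as np

# Toy version of cross-space attention: queries from one representation
# (here, dense BEV cells) attend over keys/values from the other (sparse
# voxel features), so each BEV cell aggregates global 3D context.

def cross_attention(queries, keys, values):
    """Plain scaled dot-product attention: (Nq, d), (Nk, d) -> (Nq, d)."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    # numerically stable softmax over the key axis
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values

rng = np.random.default_rng(0)
bev_queries = rng.standard_normal((6, 8))   # 6 BEV cells, 8-dim features
voxel_feats = rng.standard_normal((10, 8))  # 10 sparse voxel features
fused = cross_attention(bev_queries, voxel_feats, voxel_feats)
print(fused.shape)
```

The same primitive, with task-specific queries, is what the shared decoder's cross-task attention layers build on: detection queries can attend over segmentation features and vice versa, which is how the object-level and class-level features get integrated.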

