[Xiaolizi (晓理紫)] Daily Paper Digest (with abstracts and source code or project links): Robotics and Reinforcement Learning

Published: January 21, 2024

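Categories:

Large language models (LLM)
Vision-language models (VLM)
Diffusion models
Visual navigation
Embodied AI and robotics
Reinforcement learning
Open vocabulary, detection, and segmentation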

== Embodied Artificial Intelligence@robotic agent@human robot interaction ==

Title: Augmented Reality User Interface for Command, Control, and Supervision of Large Multi-Agent Teams

Authors: Frank Regal, Chris Suarez, Fabian Parra

Abstract: Multi-agent human-robot teaming offers the potential to gather information about various environments more efficiently by exploiting and combining the strengths of humans and robots. In industries such as defense, search and rescue, and first response, heterogeneous human-robot teams show promise to accelerate data collection and improve team safety by removing humans from unknown and potentially hazardous situations. This work builds upon AugRE, an Augmented Reality (AR) based scalable human-robot teaming framework. It enables users to localize and communicate with 50+ autonomous agents. Through our efforts, users are able to command, control, and supervise agents in large teams, both line-of-sight and non-line-of-sight, without modifying the environment beforehand and without requiring users to carry typical hardware (e.g., joysticks, keyboards, laptops, tablets) in the field. The demonstrated work shows early indications that combining these AR-HMD-based user interaction modalities for command, control, and supervision will help improve human-robot team collaboration, robustness, and trust.

[Downlink:]http://arxiv.org/abs/2401.05665v1

[Project:]https://sites.google.com/view/xr-robotics-iros2023/home?authuser=0

Title: Unified Learning from Demonstrations, Corrections, and Preferences during Physical Human-Robot Interaction

Authors: Shaunak A. Mehta, Dylan P. Losey

Abstract: Humans can leverage physical interaction to teach robot arms. This physical interaction takes multiple forms depending on the task, the user, and what the robot has learned so far. State-of-the-art approaches focus on learning from a single modality, or combine multiple interaction types by assuming that the robot has prior information about the human’s intended task. By contrast, in this paper we introduce an algorithmic formalism that unites learning from demonstrations, corrections, and preferences. Our approach makes no assumptions about the tasks the human wants to teach the robot; instead, we learn a reward model from scratch by comparing the human’s inputs to nearby alternatives. We first derive a loss function that trains an ensemble of reward models to match the human’s demonstrations, corrections, and preferences. The type and order of feedback are up to the human teacher: we enable the robot to collect this feedback passively or actively. We then apply constrained optimization to convert our learned reward into a desired robot trajectory. Through simulations and a user study we demonstrate that our proposed approach more accurately learns manipulation tasks from physical human interaction than existing baselines, particularly when the robot is faced with new or unexpected objectives. Videos of our user study are available at: https://youtu.be/FSUJsTYvEKU

[Downlink:]http://arxiv.org/abs/2207.03395v2

[Project:]https://youtu.be/FSUJsTYvEKU
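
The idea of learning a reward by comparing the human's input to nearby alternatives can be illustrated with a small sketch. It assumes a softmax (Boltzmann-style) choice model and a one-dimensional waypoint; both are assumptions made for this example, not the loss function derived in the paper, and `log_likelihood` and `make_reward` are hypothetical helper names.

```python
import math


def log_likelihood(reward_fn, human_input, alternatives):
    """Log-probability that the human picks `human_input` over the alternatives,
    assuming choices are exponentially more likely when their reward is higher."""
    scores = [reward_fn(human_input)] + [reward_fn(x) for x in alternatives]
    top = max(scores)  # subtract the max for numerical stability
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
    return scores[0] - log_norm


def make_reward(goal):
    """Hypothetical reward: negative distance of a 1-D waypoint to a goal position."""
    return lambda waypoint: -abs(waypoint - goal)


human_waypoint = 1.0
nearby = [0.5, 1.5, 2.0]  # perturbed alternatives around the human's input

# A reward whose goal matches the human's input explains the data better
# than a reward whose goal sits elsewhere.
good = log_likelihood(make_reward(1.0), human_waypoint, nearby)
bad = log_likelihood(make_reward(2.0), human_waypoint, nearby)
```

Maximizing this log-likelihood over reward parameters would favor rewards under which the human's input beats its neighbors, which is the intuition the abstract describes.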

Title: StROL: Stabilized and Robust Online Learning from Humans

Authors: Shaunak A. Mehta, Forrest Meng, Andrea Bajcsy

Abstract: Robots often need to learn the human’s reward function online, during the
current interaction. This real-time learning requires fast but approximate
learning rules: when the human’s behavior is noisy or suboptimal, current
approximations can result in unstable robot learning. Accordingly, in this
paper we seek to enhance the robustness and convergence properties of gradient
descent learning rules when inferring the human’s reward parameters. We model
the robot’s learning algorithm as a dynamical system over the human preference
parameters, where the human’s true (but unknown) preferences are the
equilibrium point. This enables us to perform Lyapunov stability analysis to
derive the conditions under which the robot’s learning dynamics converge. Our
proposed algorithm (StROL) uses these conditions to learn robust-by-design
learning rules: given the original learning dynamics, StROL outputs a modified
learning rule that now converges to the human’s true parameters under a larger
set of human inputs. In practice, these autonomously generated learning rules
can correctly infer what the human is trying to convey, even when the human is
noisy, biased, and suboptimal. Across simulations and a user study we find that
StROL results in a more accurate estimate and less regret than state-of-the-art
approaches for online reward learning. See videos and code here:
https://github.com/VT-Collab/StROL_RAL

[Downlink:]http://arxiv.org/abs/2308.09863v2

[GitHub:]https://github.com/VT-Collab/StROL_RAL
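
As a rough illustration of the dynamical-system view in the StROL abstract (not the StROL algorithm itself), the sketch below treats a scalar gradient-descent learning rule as a discrete-time system and records a Lyapunov candidate V(theta) = (theta - theta_true)^2 at every step. The linear human model, the feature value `x`, the learning rate `alpha`, and all function names are assumptions made for this example.

```python
def learning_step(theta, x, human_action, alpha):
    """One gradient step on the squared error between human and predicted action."""
    predicted_action = theta * x
    return theta + alpha * x * (human_action - predicted_action)


def lyapunov(theta, theta_true):
    """Lyapunov candidate: squared distance to the true (unknown) preference."""
    return (theta - theta_true) ** 2


def run(theta0=0.0, theta_true=2.0, x=1.5, alpha=0.2, steps=20):
    """Simulate the learning dynamics with a noise-free human and record V each step."""
    theta = theta0
    values = [lyapunov(theta, theta_true)]
    for _ in range(steps):
        human_action = theta_true * x  # noise-free human, for clarity only
        theta = learning_step(theta, x, human_action, alpha)
        values.append(lyapunov(theta, theta_true))
    return theta, values


if __name__ == "__main__":
    final_theta, history = run()
    print(final_theta, history[-1])
```

In this toy setting the error contracts by a factor of |1 - alpha * x^2| per step, so V decreases whenever 0 < alpha * x^2 < 2. The harder case of noisy or biased human inputs, which the paper targets, is not modeled here.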

Title: Sample-efficient Reinforcement Learning in Robotic Table Tennis

Authors: Jonas Tebbe, Lukas Krauch, Yapeng Gao

Abstract: Reinforcement learning (RL) has achieved some impressive recent successes in
various computer games and simulations. Most of these successes are based on
having large numbers of episodes from which the agent can learn. In typical
robotic applications, however, the number of feasible attempts is very limited.
In this paper we present a sample-efficient RL algorithm applied to the example
of a table tennis robot. In table tennis every stroke is different, with
varying placement, speed and spin. An accurate return therefore has to be found
depending on a high-dimensional continuous state space. To make learning in few
trials possible the method is embedded into our robot system. In this way we
can use a one-step environment. The state space depends on the ball at hitting
time (position, velocity, spin) and the action is the racket state
(orientation, velocity) at hitting. An actor-critic based deterministic policy
gradient algorithm was developed for accelerated learning. Our approach
performs competitively both in a simulation and on the real robot in a number
of challenging scenarios. Accurate results are obtained without pre-training in
under 200 episodes of training. The video presenting our experiments is
available at https://youtu.be/uRAtdoL6Wpw.

[Downlink:]http://arxiv.org/abs/2011.03275v4

[Project:]https://youtu.be/uRAtdoL6Wpw

Title: Motion Control of Interactive Robotic Arms Based on Mixed Reality Development

Author: Hanxiao Chen

Abstract: Mixed Reality (MR) is constantly evolving to inspire new patterns of robot
manipulation for more advanced Human-Robot Interaction under the 4th
Industrial Revolution Paradigm. Considering that Mixed Reality aims to connect
physical and digital worlds to provide special immersive experiences, it is
necessary to establish the information exchange platform and robot control
systems within the developed MR scenarios. In this work, we mainly present
multiple effective motion control methods applied on different interactive
robotic arms (e.g., UR5, UR5e, myCobot) for the Unity-based development of MR
applications, including GUI control panel, text input control panel,
end-effector object dynamic tracking and ROS-Unity digital-twin connection.

[Downlink:]http://arxiv.org/abs/2401.01644v1

[Project:]http://www.icca.net/

Title: Transferability of HRI Research: Potential and Challenges

Author: Wafa Johal

Abstract: With the advancement of robotics and artificial intelligence, applications for robotics are flourishing. Human-robot interaction (HRI) is an important area of robotics as it allows robots to work closer to humans (with them or for them). One crucial factor for the success of HRI research is transferability, which refers to the ability of research outputs to be adopted by industry and provide benefits to society. In this paper, we explore the potential and challenges of transferability in HRI research. Firstly, we examine the current state of HRI research and identify various types of contributions that could lead to successful outcomes. Secondly, we discuss the potential benefits for each type of contribution and identify factors that could facilitate industry adoption of HRI research. However, we also recognize that there are several challenges associated with transferability, such as the diversity of well-defined job/skill-sets required from HRI practitioners, the lack of industry-led research, and the lack of standardization in HRI research methods. We discuss these challenges and propose potential solutions to bridge the gap between industry expectations and academic research in HRI.

[Downlink:]http://arxiv.org/abs/2401.05802v1

== Reinforcement Learning @ RL ==

Title: Bridging the Gap Between Target Networks and Functional Regularization

Authors: Alexandre Piche, Valentin Thomas, Joseph Marino

Abstract: Bootstrapping is behind many of the successes of Deep Reinforcement Learning.
However, learning the value function via bootstrapping often leads to unstable
training due to fast-changing target values. Target Networks are employed to
stabilize training by using an additional set of lagging parameters to estimate
the target values. Despite the popularity of Target Networks, their effect on
the optimization is still misunderstood. In this work, we show that they act as
an implicit regularizer. This regularizer has disadvantages such as being
inflexible and non-convex. To overcome these issues, we propose an explicit
Functional Regularization that is a convex regularizer in function space and
can easily be tuned. We analyze the convergence of our method theoretically and
empirically demonstrate that replacing Target Networks with the more
theoretically grounded Functional Regularization approach leads to better
sample efficiency and performance improvements.

[Downlink:]http://arxiv.org/abs/2210.12282v2

[Project:]https://openreview.net/forum?id=BFvoemrmqX
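
To make the contrast in this abstract concrete, here is a minimal sketch of the two per-transition losses for a tabular Q-function. The form of the functional-regularization loss (a bootstrapped target computed from the current parameters and treated as a constant, plus a quadratic penalty toward the lagging Q-values weighted by `kappa`) is one reading of the abstract; the function names and numbers are illustrative assumptions.

```python
def target_network_loss(q, q_lag, s, a, r, s_next, a_next, gamma):
    """Squared TD error whose bootstrapped target uses the lagging parameters."""
    target = r + gamma * q_lag[(s_next, a_next)]
    return (q[(s, a)] - target) ** 2


def functional_regularization_loss(q, q_lag, s, a, r, s_next, a_next, gamma, kappa):
    """Squared TD error with an up-to-date target, plus an explicit penalty
    that keeps Q(s, a) close to the lagging Q-values in function space."""
    target = r + gamma * q[(s_next, a_next)]  # held constant when differentiating
    td_term = (q[(s, a)] - target) ** 2
    penalty = kappa * (q[(s, a)] - q_lag[(s, a)]) ** 2
    return td_term + penalty


# Toy Q-tables: current estimates and a lagging copy.
q = {("s0", "a0"): 1.0, ("s1", "a0"): 2.0}
q_lag = {("s0", "a0"): 0.5, ("s1", "a0"): 1.0}

tn = target_network_loss(q, q_lag, "s0", "a0", 1.0, "s1", "a0", gamma=0.5)
fr = functional_regularization_loss(
    q, q_lag, "s0", "a0", 1.0, "s1", "a0", gamma=0.5, kappa=2.0
)
```

In this sketch `kappa` is the single knob that sets how strongly the current estimate is tied to the lagging one, which matches the abstract's claim that the explicit regularizer is easy to tune.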

Title: Understanding the Effects of RLHF on LLM Generalisation and Diversity

Authors: Robert Kirk, Ishita Mediratta, Christoforos Nalmpantis

Abstract: Large language models (LLMs) fine-tuned with reinforcement learning from
human feedback (RLHF) have been used in some of the most widely deployed AI
models to date, such as OpenAI’s ChatGPT or Anthropic’s Claude. While there has
been significant work developing these methods, our
understanding of the benefits and downsides of each stage in RLHF is still
limited. To fill this gap, we present an extensive analysis of how each stage
of the process (i.e., supervised fine-tuning (SFT), reward modelling, and RLHF)
affects two key properties: out-of-distribution (OOD) generalisation and output
diversity. OOD generalisation is crucial given the wide range of real-world
scenarios in which these models are being used, while output diversity refers
to the model’s ability to generate varied outputs and is important for a
variety of use cases. We perform our analysis across two base models on both
summarisation and instruction following tasks, the latter being highly relevant
for current LLM use cases. We find that RLHF generalises better than SFT to new
inputs, particularly as the distribution shift between train and test becomes
larger. However, RLHF significantly reduces output diversity compared to SFT
across a variety of measures, implying a tradeoff in current LLM fine-tuning
methods between generalisation and diversity. Our results provide guidance on
which fine-tuning method should be used depending on the application, and show
that more research is needed to improve the tradeoff between generalisation and
diversity.

[Downlink:]http://arxiv.org/abs/2310.06452v2

[GitHub:]https://github.com/facebookresearch/rlfh-gen-div
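
Output diversity can be measured in many ways. The sketch below uses a simple distinct n-gram ratio over a set of sampled outputs as one illustrative measure; it is an assumption for this example rather than the exact set of metrics used in the paper, and `distinct_n` is a hypothetical helper name.

```python
def distinct_n(outputs, n):
    """Fraction of n-grams that are unique across all outputs (whitespace tokens)."""
    total = 0
    unique = set()
    for text in outputs:
        tokens = text.split()
        for i in range(len(tokens) - n + 1):
            unique.add(tuple(tokens[i:i + n]))
            total += 1
    return len(unique) / total if total else 0.0


# Three different samples versus the same sample repeated three times.
varied = ["the cat sat", "a dog ran", "birds fly south"]
repetitive = ["the cat sat", "the cat sat", "the cat sat"]

varied_score = distinct_n(varied, 1)
repetitive_score = distinct_n(repetitive, 1)
```

A model that returns near-identical samples for the same prompt scores low on this ratio, which is the kind of reduction in diversity the abstract reports for RLHF relative to SFT.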

Title: Memory Gym: Towards Endless Tasks to Benchmark Memory Capabilities of Agents

Authors: Marco Pleines, Matthias Pallasch, Frank Zimmer

Abstract: Memory Gym presents a suite of 2D partially observable environments, namely
Mortar Mayhem, Mystery Path, and Searing Spotlights, designed to benchmark
memory capabilities in decision-making agents. These environments, originally
with finite tasks, are expanded into innovative, endless formats, mirroring the
escalating challenges of cumulative memory games such as “I packed my bag”.
This progression in task design shifts the focus from merely assessing sample
efficiency to also probing the levels of memory effectiveness in dynamic,
prolonged scenarios. To address the gap in available memory-based Deep
Reinforcement Learning baselines, we introduce an implementation that
integrates Transformer-XL (TrXL) with Proximal Policy Optimization. This
approach utilizes TrXL as a form of episodic memory, employing a sliding window
technique. Our comparative study between the Gated Recurrent Unit (GRU) and
TrXL reveals varied performances across different settings. TrXL, on the finite
environments, demonstrates superior sample efficiency in Mystery Path and
outperforms in Mortar Mayhem. However, GRU is more efficient on Searing
Spotlights. Most notably, in all endless tasks, GRU makes a remarkable
resurgence, consistently outperforming TrXL by significant margins. Website and
Source Code: https://github.com/MarcoMeter/endless-memory-gym/

[Downlink:]http://arxiv.org/abs/2309.17207v3

[GitHub:]https://github.com/MarcoMeter/endless-memory-gym/
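
The sliding-window idea mentioned in this abstract can be sketched with a fixed-length buffer: only the most recent `window` items stay visible, and older ones fall out. This is a simplified stand-in for TrXL's episodic memory, storing plain observations instead of cached hidden states; the class name and window length are assumptions.

```python
from collections import deque


class SlidingWindowMemory:
    """Keeps the most recent `window` items; older items are discarded."""

    def __init__(self, window):
        self.buffer = deque(maxlen=window)

    def add(self, item):
        """Append an item, evicting the oldest one if the window is full."""
        self.buffer.append(item)

    def context(self):
        """Return the items currently visible to the agent, oldest first."""
        return list(self.buffer)


memory = SlidingWindowMemory(window=3)
for step in range(5):
    memory.add(f"obs{step}")
```

After five steps with a window of three, only `obs2`, `obs3`, and `obs4` remain. In an endless task, information older than the window is no longer directly available to a memory of this kind.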

Title: DeXtreme: Transfer of Agile In-hand Manipulation from Simulation to Reality

Authors: Ankur Handa, Arthur Allshire, Viktor Makoviychuk

Abstract: Recent work has demonstrated the ability of deep reinforcement learning (RL)
algorithms to learn complex robotic behaviours in simulation, including in the
domain of multi-fingered manipulation. However, such models can be challenging
to transfer to the real world due to the gap between simulation and reality. In
this paper, we present our techniques to train a) a policy that can perform
robust dexterous manipulation on an anthropomorphic robot hand and b) a robust
pose estimator suitable for providing reliable real-time information on the
state of the object being manipulated. Our policies are trained to adapt to a
wide range of conditions in simulation. Consequently, our vision-based policies
significantly outperform the best vision policies in the literature on the same
reorientation task and are competitive with policies that are given privileged
state information via motion capture systems. Our work reaffirms the
possibilities of sim-to-real transfer for dexterous manipulation in diverse
kinds of hardware and simulator setups, and in our case, with the Allegro Hand
and Isaac Gym GPU-based simulation. Furthermore, it opens up possibilities for
researchers to achieve such results with commonly-available, affordable robot
hands and cameras. Videos of the resulting policy and supplementary
information, including experiments and demos, can be found at
https://dextreme.org/

[Downlink:]http://arxiv.org/abs/2210.13702v2

[Project:]https://dextreme.org/

Title: Multi-agent Reinforcement Learning for Cooperative Lane Changing of Connected and Autonomous Vehicles in Mixed Traffic

Authors: Wei Zhou, Dong Chen, Jun Yan

Abstract: Autonomous driving has attracted significant research interest in the past
two decades as it offers many potential benefits, including releasing drivers
from exhausting driving and mitigating traffic congestion, among others.
Despite promising progress, lane-changing remains a great challenge for
autonomous vehicles (AV), especially in mixed and dynamic traffic scenarios.
Recently, reinforcement learning (RL), a powerful data-driven control method,
has been widely explored for lane-changing decision making in AVs with
encouraging results demonstrated. However, the majority of those studies are
focused on a single-vehicle setting, and lane-changing in the context of
multiple AVs coexisting with human-driven vehicles (HDVs) has received scant
attention. In this paper, we formulate the lane-changing decision making of
multiple AVs in a mixed-traffic highway environment as a multi-agent
reinforcement learning (MARL) problem, where each AV makes lane-changing
decisions based on the motions of both neighboring AVs and HDVs. Specifically,
a multi-agent advantage actor-critic network (MA2C) is developed with a novel
local reward design and a parameter sharing scheme. In particular, a
multi-objective reward function is proposed to incorporate fuel efficiency,
driving comfort, and safety of autonomous driving. Comprehensive experimental
results, conducted under three different traffic densities and various levels
of human driver aggressiveness, show that our proposed MARL framework
consistently outperforms several state-of-the-art benchmarks in terms of
efficiency, safety and driver comfort.

[Downlink:]http://arxiv.org/abs/2111.06318v2
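
A multi-objective reward with a local reward design could look roughly like the sketch below: each vehicle's reward is a weighted sum of fuel, comfort, and safety terms, and each agent then averages rewards over itself and its neighboring AVs. The weights, the averaging rule, and the function names are assumptions for illustration and are not taken from the paper.

```python
def agent_reward(fuel, comfort, safety, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of per-objective scores for one vehicle (placeholder weights)."""
    w_fuel, w_comfort, w_safety = weights
    return w_fuel * fuel + w_comfort * comfort + w_safety * safety


def local_reward(agent, rewards, neighbors):
    """Average reward over an agent and its neighboring agents."""
    group = [agent] + list(neighbors.get(agent, []))
    return sum(rewards[i] for i in group) / len(group)


# Three hypothetical AVs; av1 is penalized on safety.
rewards = {
    "av1": agent_reward(fuel=0.5, comfort=1.0, safety=-2.0),
    "av2": agent_reward(fuel=1.0, comfort=0.5, safety=0.0),
    "av3": agent_reward(fuel=0.0, comfort=0.0, safety=0.0),
}
neighbors = {"av1": ["av2"], "av2": ["av1", "av3"], "av3": ["av2"]}

av1_local = local_reward("av1", rewards, neighbors)
av3_local = local_reward("av3", rewards, neighbors)
```

Averaging over a neighborhood rather than the whole fleet keeps each agent's learning signal tied to the vehicles it can actually influence, which is one plausible motivation for a local reward.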

Title: Adaptive Discounting of Training Time Attacks

Authors: Ridhima Bector, Abhay Aradhya, Chai Quek

Abstract: Among the most insidious attacks on Reinforcement Learning (RL) solutions are
training-time attacks (TTAs) that create loopholes and backdoors in the learned
behaviour. Not limited to a simple disruption, constructive TTAs (C-TTAs) are
now available, where the attacker forces a specific, target behaviour upon a
training RL agent (victim). However, even state-of-the-art C-TTAs focus on
target behaviours that could be naturally adopted by the victim if not for a
particular feature of the environment dynamics, which C-TTAs exploit. In this
work, we show that a C-TTA is possible even when the target behaviour is
un-adoptable due to both environment dynamics as well as non-optimality with
respect to the victim objective(s). To find efficient attacks in this context,
we develop a specialised flavour of the DDPG algorithm, which we term
gammaDDPG, that learns this stronger version of C-TTA. gammaDDPG dynamically
alters the attack policy planning horizon based on the victim’s current
behaviour. This improves effort distribution throughout the attack timeline and
reduces the effect of uncertainty the attacker has about the victim. To
demonstrate the features of our method and better relate the results to prior
research, we borrow a 3D grid domain from a state-of-the-art C-TTA for our
experiments. Code is available at “bit.ly/github-rb-gDDPG”.

[Downlink:]http://arxiv.org/abs/2401.02652v1
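
The link between a discount factor and a planning horizon can be shown with a short sketch: a smaller `gamma` shrinks the weight of distant rewards, which effectively shortens the horizon. The `choose_gamma` rule is purely hypothetical (including its direction and thresholds) and is only meant to show what "adapting the horizon to the victim's current behaviour" might look like; it is not the gammaDDPG update.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards weighted by gamma**t; smaller gamma means a shorter horizon."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def choose_gamma(victim_distance, near=0.5, far=0.95, threshold=1.0):
    """Hypothetical rule: plan short-term when the victim is already close to
    the target behaviour, and long-term otherwise."""
    return near if victim_distance < threshold else far


# A payoff that only arrives three steps in the future.
delayed = [0.0, 0.0, 0.0, 10.0]

short_sighted = discounted_return(delayed, choose_gamma(victim_distance=0.2))
far_sighted = discounted_return(delayed, choose_gamma(victim_distance=5.0))
```

With `gamma = 0.5` the delayed payoff is worth 1.25, while with `gamma = 0.95` it is worth about 8.57, so the same reward sequence looks very different under the two horizons.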

== Object Detection@ Segmentation@Open vocabulary detection ==

Title: OMG-Seg: Is One Model Good Enough For All Segmentation?

Authors: Xiangtai Li, Haobo Yuan, Wei Li

Abstract: In this work, we address various segmentation tasks, each traditionally tackled by distinct or partially unified models. We propose OMG-Seg, One Model that is Good enough to efficiently and effectively handle all the segmentation tasks, including image semantic, instance, and panoptic segmentation, as well as their video counterparts, open vocabulary settings, prompt-driven, interactive segmentation like SAM, and video object segmentation. To our knowledge, this is the first model to handle all these tasks in one model and achieve satisfactory performance. We show that OMG-Seg, a transformer-based encoder-decoder architecture with task-specific queries and outputs, can support over ten distinct segmentation tasks and yet significantly reduce computational and parameter overhead across various tasks and datasets. We rigorously evaluate the inter-task influences and correlations during co-training. Code and models are available at https://github.com/lxtGH/OMG-Seg.

[Downlink:]http://arxiv.org/abs/2401.10229v1

[Project:]https://lxtgh.github.io/project/omg_seg/

[GitHub:]https://github.com/lxtGH/OMG-Seg

Title: RAP-SAM: Towards Real-Time All-Purpose Segment Anything

Authors: Shilin Xu, Haobo Yuan, Qingyu Shi

Abstract: Advanced by the transformer architecture, vision foundation models (VFMs) have achieved remarkable progress in performance and generalization ability. Segment Anything Model (SAM) is one remarkable model that can achieve generalized segmentation. However, most VFMs cannot run in real time, which makes it difficult to transfer them into products. On the other hand, current real-time segmentation mainly has one purpose, such as semantic segmentation on the driving scene. We argue that diverse outputs are needed for real applications. Thus, this work explores a new real-time segmentation setting, named all-purpose segmentation in real-time, to transfer VFMs in real-time deployment. It contains three different tasks, including interactive segmentation, panoptic segmentation, and video segmentation. We aim to use one model to achieve the above tasks in real-time. We first benchmark several strong baselines. Then, we present Real-Time All Purpose SAM (RAP-SAM). It contains an efficient encoder and an efficient decoupled decoder to perform prompt-driven decoding. Moreover, we explore different training strategies and tuning methods to further boost co-training performance. Our code and model are available at https://github.com/xushilin1/RAP-SAM/.

[Downlink:]http://arxiv.org/abs/2401.10228v1

[Project:]https://xushilin1.github.io/rap_sam/

[GitHub:]https://github.com/xushilin1/RAP-SAM/

Title: Adversarial Supervision Makes Layout-to-Image Diffusion Models Thrive

Authors: Yumeng Li, Margret Keuper, Dan Zhang

Abstract: Despite the recent advances in large-scale diffusion models, little progress has been made on the layout-to-image (L2I) synthesis task. Current L2I models either suffer from poor editability via text or weak alignment between the generated image and the input layout. This limits their usability in practice. To mitigate this, we propose to integrate adversarial supervision into the conventional training pipeline of L2I diffusion models (ALDM). Specifically, we employ a segmentation-based discriminator which provides explicit feedback to the diffusion generator on the pixel-level alignment between the denoised image and the input layout. To encourage consistent adherence to the input layout over the sampling steps, we further introduce the multistep unrolling strategy. Instead of looking at a single timestep, we unroll a few steps recursively to imitate the inference process, and ask the discriminator to assess the alignment of denoised images with the layout over a certain time window. Our experiments show that ALDM enables layout faithfulness of the generated images, while allowing broad editability via text prompts. Moreover, we showcase its usefulness for practical applications: by synthesizing target distribution samples via text control, we improve domain generalization of semantic segmentation models by a large margin (~12 mIoU points).

[Downlink:]http://arxiv.org/abs/2401.08815v1

[Project:]https://yumengli007.github.io/ALDM/

[GitHub:]https://github.com/boschresearch/ALDM

Title: LESEN: Label-Efficient deep learning for Multi-parametric MRI-based Visual Pathway Segmentation

Authors: Alou Diakite, Cheng Li, Lei Xie

Abstract: Recent research has shown the potential of deep learning in multi-parametric
MRI-based visual pathway (VP) segmentation. However, obtaining labeled data for
training is laborious and time-consuming. Therefore, it is crucial to develop
effective algorithms in situations with limited labeled samples. In this work,
we propose a label-efficient deep learning method with self-ensembling (LESEN).
LESEN incorporates supervised and unsupervised losses, enabling the student and
teacher models to mutually learn from each other, forming a self-ensembling
mean teacher framework. Additionally, we introduce a reliable unlabeled sample
selection (RUSS) mechanism to further enhance LESEN’s effectiveness. Our
experiments on the human connectome project (HCP) dataset demonstrate the
superior performance of our method when compared to state-of-the-art
techniques, advancing multimodal VP segmentation for comprehensive analysis in
clinical and research settings. The implementation code will be available at:
https://github.com/aldiak/Semi-Supervised-Multimodal-Visual-Pathway-Delineation.

[Downlink:]http://arxiv.org/abs/2401.01654v1

[GitHub:]https://github.com/aldiak/Semi-Supervised-Multimodal-Visual-Pathway-Delineation
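
The mean teacher scheme named in this abstract is commonly implemented with two small pieces: the teacher's weights track an exponential moving average (EMA) of the student's weights, and an unsupervised consistency loss pulls the student's predictions toward the teacher's on unlabeled samples. The sketch below shows both on plain Python lists; the decay value, the squared-error consistency loss, and the function names are assumptions, not details taken from LESEN.

```python
def ema_update(teacher, student, decay=0.99):
    """Move each teacher weight toward the student weight by an exponential moving average."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def consistency_loss(student_probs, teacher_probs):
    """Mean squared difference between student and teacher predictions."""
    diffs = [(s - t) ** 2 for s, t in zip(student_probs, teacher_probs)]
    return sum(diffs) / len(diffs)


def total_loss(supervised, unsupervised, weight=1.0):
    """Supervised loss on labeled data plus a weighted loss on unlabeled data."""
    return supervised + weight * unsupervised


# Toy values: two weights and a two-class prediction.
teacher_weights = ema_update([0.0, 1.0], [1.0, 1.0], decay=0.5)
unsup = consistency_loss([0.5, 0.5], [1.0, 0.0])
loss = total_loss(supervised=0.25, unsupervised=unsup, weight=2.0)
```

Because the teacher changes slowly, its predictions on unlabeled scans act as relatively stable targets for the student even when labeled samples are scarce.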

Title: S3Net: Innovating Stereo Matching and Semantic Segmentation with a Single-Branch Semantic Stereo Network in Satellite Epipolar Imagery

Authors: Qingyuan Yang, Guanzhou Chen, Xiaoliang Tan

Abstract: Stereo matching and semantic segmentation are significant tasks in binocular
satellite 3D reconstruction. However, previous studies primarily view these as
independent parallel tasks, lacking an integrated multitask learning framework.
This work introduces a solution, the Single-branch Semantic Stereo Network
(S3Net), which innovatively combines semantic segmentation and stereo matching
using Self-Fuse and Mutual-Fuse modules. Unlike preceding methods that utilize
semantic or disparity information independently, our method identifies and
leverages the intrinsic link between these two tasks, leading to a more
accurate understanding of semantic information and disparity estimation.
Comparative testing on the US3D dataset proves the effectiveness of our S3Net.
Our model improves the mIoU in semantic segmentation from 61.38 to 67.39, and
reduces the D1-Error and average endpoint error (EPE) in disparity estimation
from 10.051 to 9.579 and 1.439 to 1.403 respectively, surpassing existing
competitive methods. Our code is available at: https://github.com/CVEO/S3Net.

[Downlink:]http://arxiv.org/abs/2401.01643v1

[GitHub:]https://github.com/CVEO/S3Net
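
For readers unfamiliar with the three metrics quoted in this abstract, the sketch below computes them on flat lists of pixels. The 3-pixel threshold for D1-Error is an assumed, commonly used convention rather than a value stated in the abstract, and the function names are hypothetical.

```python
def mean_iou(pred, gt, num_classes):
    """Mean intersection-over-union over classes present in prediction or ground truth."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


def endpoint_error(pred_disp, gt_disp):
    """Average absolute disparity error in pixels (EPE)."""
    return sum(abs(p - g) for p, g in zip(pred_disp, gt_disp)) / len(gt_disp)


def d1_error(pred_disp, gt_disp, threshold=3.0):
    """Percentage of pixels whose disparity error exceeds `threshold` pixels."""
    bad = sum(1 for p, g in zip(pred_disp, gt_disp) if abs(p - g) > threshold)
    return 100.0 * bad / len(gt_disp)


# Four toy pixels with class labels and disparities.
miou = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
epe = endpoint_error([1.0, 2.0, 10.0, 4.0], [1.0, 3.0, 5.0, 4.0])
d1 = d1_error([1.0, 2.0, 10.0, 4.0], [1.0, 3.0, 5.0, 4.0])
```

Higher mIoU is better, while lower EPE and D1-Error are better, which is why the abstract reports an increase in the first and decreases in the other two.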

Title: Context-Aware Interaction Network for RGB-T Semantic Segmentation

Authors: Ying Lv, Zhi Liu, Gongyang Li

Abstract: RGB-T semantic segmentation is a key technique for autonomous driving scene
understanding. For the existing RGB-T semantic segmentation methods, however,
the effective exploration of the complementary relationship between different
modalities is not implemented in the information interaction between multiple
levels. To address such an issue, the Context-Aware Interaction Network
(CAINet) is proposed for RGB-T semantic segmentation, which constructs
interaction space to exploit auxiliary tasks and global context for explicitly
guided learning. Specifically, we propose a Context-Aware Complementary
Reasoning (CACR) module aimed at establishing the complementary relationship
between multimodal features with the long-term context in both spatial and
channel dimensions. Further, considering the importance of global contextual
and detailed information, we propose the Global Context Modeling (GCM) module
and Detail Aggregation (DA) module, and we introduce specific auxiliary
supervision to explicitly guide the context interaction and refine the
segmentation map. Extensive experiments on two benchmark datasets of MFNet and
PST900 demonstrate that the proposed CAINet achieves state-of-the-art
performance. The code is available at https://github.com/YingLv1106/CAINet.

[Downlink:]http://arxiv.org/abs/2401.01624v1

[GitHub:]https://github.com/YingLv1106/CAINet

Source: https://blog.csdn.net/u011573853/article/details/135722706