Literature Reading 2: An Unsupervised Domain Adaptive Approach for Multimodal 2D Object Detection

(For personal study and reference only)

An Unsupervised Domain Adaptive Approach for Multimodal 2D Object Detection: proposes the first multimodal unsupervised domain adaptation framework for 2D object detection with RGB and lidar in autonomous driving.

Abstract— Integrating different representations from complementary sensing modalities is crucial for robust scene interpretation in autonomous driving. While deep learning architectures that fuse vision and range data for 2D object detection have thrived in recent years, the corresponding modalities can degrade in adverse weather or lighting conditions, ultimately leading to a drop in performance. Although domain adaptation methods attempt to bridge the domain gap between source and target domains, they do not readily extend to heterogeneous data distributions. In this work, we propose an unsupervised domain adaptation framework, which adapts a 2D object detector for RGB and lidar sensors to one or more target domains featuring adverse weather conditions. Our proposed approach consists of three components. First, a data augmentation scheme that simulates weather distortions is devised to add domain confusion and prevent overfitting on the source data. Second, to promote cross-domain foreground object alignment, we leverage the complementary features of multiple modalities through a multi-scale entropy-weighted domain discriminator. Finally, we use carefully designed pretext tasks to learn a more robust representation of the target domain data. Experiments performed on the DENSE dataset show that our method can substantially alleviate the domain gap under the single-target domain adaptation (STDA) setting and the less explored yet more general multi-target domain adaptation (MTDA) setting.

The approach builds on the well-known YOLO-v3 detector, modified to consist of two separate branches connected by an entropy-based fusion module. Besides taking the RGB and lidar feature maps as input, the network also extracts corresponding entropy maps from the RGB image and the sparse lidar depth map. After a convolution and a sigmoid activation, the task of these entropy maps is to enhance the most informative features while suppressing irrelevant ones; the entropy modules are placed after the convolutional layers and at the YOLO-v3 heads.
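The entropy weighting described above can be sketched with plain numpy. This is an illustrative sketch, not the paper's code: the window size, the histogram-based local entropy, and the mean-centering used as a stand-in for the learned convolution are all assumptions.

```python
import numpy as np

def local_entropy(img, win=8):
    """Windowed Shannon entropy of an 8-bit image (H, W) -> (H//win, W//win)."""
    H, W = img.shape
    out = np.zeros((H // win, W // win))
    for i in range(H // win):
        for j in range(W // win):
            patch = img[i * win:(i + 1) * win, j * win:(j + 1) * win]
            hist = np.bincount(patch.ravel(), minlength=256) / patch.size
            p = hist[hist > 0]
            out[i, j] = -(p * np.log2(p)).sum()  # high near edges, corners, objects
    return out

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def entropy_gate(features, entropy_map):
    """Scale feature maps (C, h, w) by a sigmoid-activated entropy map (h, w)."""
    gate = sigmoid(entropy_map - entropy_map.mean())  # stand-in for the learned conv
    return features * gate[None, :, :]
```

A uniform patch has zero entropy and is attenuated; a textured patch (e.g. an object boundary) has high entropy and passes through the gate with more weight.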

This paper makes the following contributions:

1. Leveraging Data Augmentations for Multiple Modalities

First, after observing that the domain gap manifests differently in RGB and lidar, several data augmentation techniques are applied to the source domain to bring the two domains closer together.

Lidar: points dropout, additive Gaussian noise, backscatter points.

RGB: color jittering, random horizontal flipping, scaling, translation.
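The lidar-side augmentations can be sketched as operations on a sparse depth map. A minimal sketch under assumptions: the depth map is a dense (H, W) array with 0 meaning "no return", and all probabilities and ranges below are illustrative values, not the paper's settings.

```python
import numpy as np

def augment_lidar_depth(depth, rng, drop_p=0.1, noise_std=0.5, backscatter_p=0.01):
    """Simulate weather distortions on a sparse lidar depth map (H, W), 0 = no return."""
    out = depth.copy()
    valid = out > 0
    # Points dropout: randomly remove a fraction of valid returns.
    drop = valid & (rng.random(out.shape) < drop_p)
    out[drop] = 0.0
    # Additive Gaussian noise: perturb the surviving depth values.
    keep = out > 0
    out[keep] += rng.normal(0.0, noise_std, keep.sum())
    # Backscatter points: spurious close-range returns, as caused by fog/snow.
    back = (out == 0) & (rng.random(out.shape) < backscatter_p)
    out[back] = rng.uniform(0.5, 5.0, back.sum())
    return np.clip(out, 0.0, None)
```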

2. Entropy-weighted Domain Adversarial Learning

Domain adversarial training seeks to align source- and target-domain features via a min-max game with a domain classifier. However, enforcing this alignment constraint on all features can lead to suboptimal performance, because some regions of the images and depth maps are not transferable across domains.

To counteract these negative effects during domain adversarial training, the method exploits an existing strength of the baseline: scaling features by each sensor's entropy channel. Local entropy assigns higher values to the more uncertain regions of image space, such as edges, corners, and foreground objects. Multiplying the deep features by entropy attenuates the background while giving foreground features more weight. The entropy module has one limitation, however: the input feature map to the detector comes from the lidar branch, which means the lidar entropy re-weights the fused features fed to the detection heads.

In contrast, the goal here is to learn domain-invariant foreground features by exploiting both modality-specific and modality-shared information. To mitigate the negative effect above, the fusion scheme is modified to compute the per-pixel maximum entropy over the lidar and RGB images.

The resulting entropy map is used in both streams; it exploits the most informative regions of image space captured by either sensor and reduces asymmetric modality noise under different weather conditions.
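The modified fusion can be illustrated as a per-pixel max of the two entropy maps, with the shared gate applied to both streams. A sketch only: the sigmoid gate and the array shapes are assumptions.

```python
import numpy as np

def fused_entropy(e_rgb, e_lidar):
    """Per-pixel maximum of the RGB and lidar entropy maps (the modified fusion)."""
    return np.maximum(e_rgb, e_lidar)

def weight_features(f_rgb, f_lidar, e_rgb, e_lidar):
    """Apply the shared max-entropy gate to BOTH streams; features are (C, H, W)."""
    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))
    gate = sigmoid(fused_entropy(e_rgb, e_lidar))
    return f_rgb * gate[None], f_lidar * gate[None]
```

Because both streams share one gate, a region that is informative in either sensor is preserved in both, unlike the baseline where lidar entropy alone re-weighted the fused features.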

After improving the fusion scheme, three domain discriminators Dk, k ∈ {1, 2, 3}, are added at P3, P4, and P5 in Figure 1 to align the entropy-weighted instance-level features across domains at different scales. Each discriminator learns to classify the modality-fused source and target features (fs and ft) into their respective domains by minimizing a least-squares loss.

To fool the discriminators, the feature extractor must in turn learn domain-invariant features by minimizing an adversarial loss.
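Reading "least-squares loss" as the standard LSGAN formulation with source labeled 1 and target labeled 0 (a common convention, assumed here rather than stated in these notes), the two sides of the min-max game can be sketched as:

```python
import numpy as np

def d_loss_ls(d_src, d_tgt):
    """Least-squares discriminator loss: push D_k(f_s) -> 1 and D_k(f_t) -> 0."""
    return np.mean((d_src - 1.0) ** 2) + np.mean(d_tgt ** 2)

def g_loss_ls(d_tgt):
    """Feature-extractor side: fool D_k by pushing D_k(f_t) -> 1 on target features."""
    return np.mean((d_tgt - 1.0) ** 2)
```

In training, each Dk minimizes `d_loss_ls` on its scale's entropy-weighted features while the shared feature extractor minimizes `g_loss_ls`, yielding features the discriminators cannot separate by domain.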

A feature matching loss Lfm is added for each discriminator to regularize training, minimizing the l2-distance between the discriminator's source and target features at each of its layers l.
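A sketch of Lfm, under the assumption that the raw per-layer discriminator features are compared directly (whether any pooling is applied first is not specified in these notes):

```python
import numpy as np

def feature_matching_loss(src_feats, tgt_feats):
    """L_fm: average l2-distance between the discriminator's intermediate
    source and target features, one entry per discriminator layer l.

    src_feats, tgt_feats: lists of same-shaped arrays, one per layer.
    """
    dists = [np.sqrt(((fs - ft) ** 2).sum())
             for fs, ft in zip(src_feats, tgt_feats)]
    return sum(dists) / len(dists)
```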

3. Self-supervised Learning for DA

The three pretext-task methods shown in the figure are applied to learn a more robust representation of the target-domain data.

Summary: Multimodal + UDA
