[Paper Reading] Co-Occ: Coupling Explicit Feature Fusion With Volume Rendering Regularization for Multi-Modal

骆驼穿针眼

Category column: bev papers. Article tags: paper reading

Article link: https://blog.csdn.net/weixin_55982578/article/details/139823965

Paper: https://arxiv.org/pdf/2404.04561v1
Code: https://github.com/Rorisis/Co-Occ?tab=readme-ov-file

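Q: What problem does this paper try to solve?

A: The paper proposes Co-Occ, a multi-modal 3D semantic occupancy prediction framework for autonomous driving. It targets the following challenges: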

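Modality heterogeneity: different sensors, such as LiDAR and cameras, capture different kinds of data, which makes fusing them difficult.

Modality misalignment: inaccurate extrinsic calibration between the sensors can make the mapping from 2D image features into 3D space imprecise.

Insufficient modality interaction: when fusing the modalities, existing methods may not fully exploit their complementary information, so important geometric and semantic information is lost.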
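The misalignment challenge can be made concrete with a pinhole projection. The sketch below is only an illustration and is not taken from the paper or its repository: the function name `project_point`, the yaw-only extrinsic, and the intrinsics are all hypothetical, and the LiDAR frame is assumed to already follow the camera axis convention (z forward).

```python
import math


def project_point(point, yaw_deg, translation, fx, fy, cx, cy):
    """Project a 3D point to pixel coordinates with a pinhole camera model.

    Assumption for illustration: the extrinsic is a yaw rotation about the
    vertical (y) axis followed by a translation, and the camera looks along +z.
    Returns (u, v), or None if the point lies behind the camera.
    """
    x, y, z = point
    yaw = math.radians(yaw_deg)
    # Apply the (possibly miscalibrated) extrinsic transform.
    xc = math.cos(yaw) * x + math.sin(yaw) * z + translation[0]
    yc = y + translation[1]
    zc = -math.sin(yaw) * x + math.cos(yaw) * z + translation[2]
    if zc <= 0:
        return None
    # Apply the intrinsics.
    return fx * xc / zc + cx, fy * yc / zc + cy


# The same point, 20 m ahead, with a perfect and a slightly wrong yaw.
u_ok, _ = project_point((0.0, 0.0, 20.0), 0.0, (0.0, 0.0, 0.0), 1000.0, 1000.0, 640.0, 360.0)
u_bad, _ = project_point((0.0, 0.0, 20.0), 0.5, (0.0, 0.0, 0.0), 1000.0, 1000.0, 640.0, 360.0)
pixel_shift = u_bad - u_ok
```

With these made-up numbers, a yaw error of half a degree moves the projection by roughly 8.7 pixels, which is enough to attach an image feature to the wrong voxel.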

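To address these problems, Co-Occ couples explicit LiDAR-camera feature fusion with implicit volume rendering regularization. The key idea is to use volume rendering in feature space to bridge the gap between 3D LiDAR scans and 2D images, while also acting as a physical regularizer that strengthens the fused LiDAR-camera volumetric representation.

Specifically, the Co-Occ framework has the following key components:

Geometric- and Semantic-aware Fusion (GSFusion) module: explicitly enhances LiDAR features through a K-nearest-neighbor (KNN) search, and selects the relevant camera features through a KNN gating operation.

Implicit volume rendering regularization: rays are cast from the camera and sampled uniformly through the scene, an auxiliary head predicts the density and color of each sample, and the resulting color and depth are projected back onto the 2D image plane, where they are supervised by the camera input image and a LiDAR-derived depth map.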
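The rendering step can be sketched for a single ray with the standard volume rendering quadrature. This is a minimal illustration in plain Python, not the repository's code: `render_ray` and `l1` are hypothetical names, the densities and colors stand in for the auxiliary head's outputs, and the L1 comparison is only an assumed form of the supervision.

```python
import math


def render_ray(densities, colors, t_near, t_far):
    """Composite per-sample density and color along one ray.

    Samples are placed uniformly between t_near and t_far (at bin midpoints).
    Returns (rendered_rgb, expected_depth, accumulated_opacity).
    """
    n = len(densities)
    delta = (t_far - t_near) / n          # uniform spacing between samples
    transmittance = 1.0                   # fraction of light not yet absorbed
    rgb = [0.0, 0.0, 0.0]
    depth = 0.0
    opacity = 0.0
    for i, (sigma, sample_rgb) in enumerate(zip(densities, colors)):
        t = t_near + (i + 0.5) * delta
        alpha = 1.0 - math.exp(-sigma * delta)   # opacity of this segment
        weight = transmittance * alpha           # contribution of this sample
        rgb = [c + weight * s for c, s in zip(rgb, sample_rgb)]
        depth += weight * t
        opacity += weight
        transmittance *= 1.0 - alpha
    return rgb, depth, opacity


def l1(pred, target):
    """Mean absolute error between two equal-length sequences (assumed loss form)."""
    return sum(abs(p - q) for p, q in zip(pred, target)) / len(pred)


# One ray with four samples; only the third sample is (almost) opaque.
rgb, depth, opacity = render_ray(
    densities=[0.0, 0.0, 50.0, 0.0],
    colors=[(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.2, 0.4, 0.6), (1.0, 1.0, 1.0)],
    t_near=0.0,
    t_far=4.0,
)
color_loss = l1(rgb, (0.2, 0.4, 0.6))   # compared against the image pixel
depth_loss = abs(depth - 2.5)           # compared against the LiDAR-derived depth
```

Because the rendered depth depends on where density accumulates along the ray, a depth target pushes the predicted volume to place occupancy at the right distance, which is how rendering can act as a regularizer on the fused 3D representation.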

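Q: How does the paper solve this problem?

A: The paper proposes the Co-Occ multi-modal 3D semantic occupancy prediction framework and addresses the problems above through the following key components and steps:

Geometric- and Semantic-aware Fusion (GSFusion) module:

A K-nearest-neighbor (KNN) search in the geometrically aligned voxel space finds the camera features that neighbor a given LiDAR coordinate.

A learnable KNN gating operation produces a semantic weight for each LiDAR feature, and these weights are fused with the LiDAR features to explicitly enhance them.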
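The two steps above can be sketched on toy data. This is a simplified illustration, not the paper's layer: the function names are hypothetical, and the gate form (a sigmoid over a linear projection of the averaged neighbor features) and the residual update are assumptions chosen to keep the example small.

```python
import math


def knn_indices(query, keys, k):
    """Indices of the k keys closest to query (Euclidean distance)."""
    ranked = sorted((math.dist(query, key), i) for i, key in enumerate(keys))
    return [i for _, i in ranked[:k]]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_knn_fusion(lidar_coords, lidar_feats, cam_coords, cam_feats,
                     gate_w, gate_b, k=2):
    """Enhance each LiDAR feature with its k nearest camera features.

    gate_w and gate_b play the role of the learnable gate parameters; here
    they are plain numbers supplied by the caller.
    """
    fused = []
    for coord, feat in zip(lidar_coords, lidar_feats):
        idx = knn_indices(coord, cam_coords, k)
        # Average the neighboring camera features channel by channel.
        neigh = [sum(cam_feats[i][d] for i in idx) / len(idx)
                 for d in range(len(feat))]
        # Scalar gate in (0, 1) deciding how much camera signal to admit.
        gate = sigmoid(sum(w * n for w, n in zip(gate_w, neigh)) + gate_b)
        fused.append([f + gate * n for f, n in zip(feat, neigh)])
    return fused


fused = gated_knn_fusion(
    lidar_coords=[(0.0, 0.0, 0.0)],
    lidar_feats=[[1.0, 0.0]],
    cam_coords=[(0.0, 0.0, 1.0), (0.0, 1.0, 0.0), (10.0, 10.0, 10.0)],
    cam_feats=[[2.0, 2.0], [4.0, 4.0], [100.0, 100.0]],
    gate_w=[0.0, 0.0],
    gate_b=0.0,
)
```

The distant camera voxel at (10, 10, 10) never reaches the LiDAR feature, which is the point of restricting the fusion to geometric neighbors before weighting them.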

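import torch
# Assumption: furthest_point_sample and ball_query follow the mmcv.ops signatures;
# the excerpt does not show its imports.

def fps_NN_fast(self, query, key, fps_num, radius, max_cluster_samples, dist_thresh, num):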
    """Efficient NN search for huge amounts of query and key (suppose queries are redundant)
    
        Behavior:
        1. apply FPS on query and generate representative queries (repr_query)
        2. calculate repr_queries' distances with all keys, and get the NN key
        3. apply ball query to assign the same NN key with the group center 

    """
    # Initialize query_NN_key_idx according to num (-1 means no neighbor assigned)
    if num == 1:
        query_NN_key_idx = torch.zeros_like(query[:, 0]).long() - 1
    else:
        query_NN_key_idx = (torch.zeros_like(query[:, 0]).long() - 1).repeat(num, 1)
    
    # Drop the first column and add a leading dimension
    query = query[:, 1:].unsqueeze(0)
    key = key[:, 1:].unsqueeze(0)

    if num == 1:
        # If the number of query points is at most fps_num
        if query.shape[1] <= fps_num:
            dist = torch.norm(query.float().unsqueeze(2) - key.float().unsqueeze(1), p=2, dim=-1)
            val, NN_key_idx = dist.squeeze(0).min(-1)
            valid_mask = val < dist_thresh
            query_NN_key_idx[valid_mask] = NN_key_idx[valid_mask]
            return query_NN_key_idx
        else:
            # Use FPS to generate representative queries
            repr_query_idx = furthest_point_sample(query.float().contiguous(), fps_num)
            repr_query = query[:, repr_query_idx[0].long(), :]
            dist = torch.norm(repr_query.float().unsqueeze(2) - key.float().unsqueeze(1), p=2, dim=-1)
            val, NN_key_idx = dist.squeeze(0).min(-1)
            valid_mask = val < dist_thresh

            # Use ball query to assign the same nearest-neighbor key
            query_group_idx = ball_query(0, radius, max_cluster_
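The excerpt is cut off in the source, but the three steps listed in its docstring (FPS on the queries, nearest key per representative, ball query to share that key with nearby queries) can be sketched in plain Python. This is a simplified stand-in, not the repository's implementation: the function names are hypothetical, it ignores `max_cluster_samples` and `num`, and how the original resolves overlapping balls is not shown, so the first-come assignment here is an assumption.

```python
import math


def farthest_point_sample(points, m):
    """Greedy FPS: start at index 0, then keep adding the point farthest from the chosen set."""
    chosen = [0]
    min_dist = [math.dist(p, points[0]) for p in points]
    while len(chosen) < m:
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return chosen


def nearest_key(point, keys):
    """Return (distance, index) of the key closest to point."""
    return min((math.dist(point, key), j) for j, key in enumerate(keys))


def approx_nn(queries, keys, fps_num, radius, dist_thresh):
    """For each query, the index of an (approximate) nearest key, or -1."""
    result = [-1] * len(queries)
    if len(queries) <= fps_num:
        # Few queries: exact nearest-neighbor search for every query.
        for i, q in enumerate(queries):
            d, j = nearest_key(q, keys)
            if d < dist_thresh:
                result[i] = j
        return result
    # Many queries: search only for representatives, then share the result.
    for r in farthest_point_sample(queries, fps_num):
        d, j = nearest_key(queries[r], keys)
        if d >= dist_thresh:
            continue
        for i, q in enumerate(queries):
            # Ball query: unassigned queries within radius reuse the representative's key.
            if result[i] == -1 and math.dist(q, queries[r]) <= radius:
                result[i] = j
    return result


queries = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (10.0, 0.0, 0.0), (10.1, 0.0, 0.0)]
keys = [(0.0, 0.0, 1.0), (10.0, 0.0, 1.0)]
approx = approx_nn(queries, keys, fps_num=2, radius=0.5, dist_thresh=5.0)
exact = approx_nn(queries, keys, fps_num=10, radius=0.5, dist_thresh=5.0)
```

On this toy input the approximate and exact paths agree, while the approximate path computes distances to the keys for only two of the four queries. That saving is what the docstring's "suppose queries are redundant" relies on.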
