[Paper Reading] [Survey] LiDAR-Camera Fusion Methods for Road Detection

麒麒哈尔

Category: Paper Reading. Tags: computer vision, autonomous driving, CNN, road detection

Article link: https://blog.csdn.net/wqwqqwqw1231/article/details/109905798

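This post surveys several road detection methods based on LiDAR-camera fusion, including SNE-RoadSeg and LidCamNet, which apply CNNs to depth images and point cloud data for road segmentation and detection. It also covers CRF-based fusion, such as Road Detection through CRF based LiDAR-Camera Fusion. All of these methods aim to improve road recognition accuracy in autonomous driving.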

Contents

Network
CRF
- Road Detection through CRF based LiDAR-Camera Fusion
- Road detection based on the fusion of Lidar and image data
Others
- Autonomous road detection and modeling for UGVs using vision-laser data fusion
Summary

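This post lists some LiDAR-camera fusion methods, mainly for the road detection problem. Road detection is really a semantic segmentation problem with only two classes: road and non-road. The mainstream approach today is CNN-based, which needs a large amount of training data. The other family of methods uses CRFs.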

Network

SNE-RoadSeg

Paper: SNE-RoadSeg: Incorporating Surface Normal Information into Semantic Segmentation for Accurate Freespace Detection
Published in: ECCV, 2020

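For a detailed walkthrough of the paper, see my other post. Here is a brief summary of the method:
The network takes two kinds of images as input: RGB and depth. The paper does not feed the depth image in directly; instead, the SNE module converts it into pixel-wise surface normals. Two ResNets then serve as encoders, and the decoder is built in a DenseNet-like form.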
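To make the idea of "pixel-wise normals from a depth image" concrete, here is a minimal sketch. It is not the SNE module from the paper; it is a generic estimator that back-projects each pixel to 3D and takes the cross product of the two image-axis tangent vectors. The intrinsics `fx`, `fy`, `cx`, `cy` and the function name are assumptions made for illustration.

```python
import numpy as np

def normals_from_depth(depth, fx, fy, cx, cy):
    """Estimate per-pixel unit normals from a dense depth image (generic, not SNE)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Back-project every pixel to a 3D point in the camera frame (pinhole model).
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    points = np.stack([x, y, depth], axis=-1)          # shape (H, W, 3)
    # Tangent vectors along image columns and rows via finite differences.
    d_du = np.gradient(points, axis=1)
    d_dv = np.gradient(points, axis=0)
    normals = np.cross(d_du, d_dv)
    norm = np.linalg.norm(normals, axis=-1, keepdims=True)
    return normals / np.maximum(norm, 1e-12)

# Hypothetical input: a fronto-parallel plane 5 m in front of the camera.
depth = np.full((4, 5), 5.0)
n = normals_from_depth(depth, fx=100.0, fy=100.0, cx=2.0, cy=1.5)
```

For a plane facing the camera, every normal points along the optical axis, which is an easy sanity check before trying real depth maps.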

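Although this paper is not strictly LiDAR-camera fusion, KITTI does not provide depth images, so the depth must have been recovered from the LiDAR data in some way; the paper does not go into detail.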

LidCamNet

Paper: LIDAR–camera fusion for road detection using fully convolutional neural networks
Published in: Robotics and Autonomous Systems, 2019

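The method first converts the LiDAR point cloud into a depth image; see Section 4 of the paper for details. The idea is again to align the LiDAR and camera data, project the LiDAR points onto the image, and then interpolate/fill the holes to obtain a dense depth image. The paper cites the work this procedure follows: Pedestrian detection combining rgb and dense lidar data, in: Intelligent Robots and Systems (IROS 2014).
[Figure: the resulting dense depth image]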
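The project-then-fill pipeline can be sketched as follows. This is a rough illustration under stated assumptions, not the procedure from Section 4 of the paper: `T_cam_lidar` (a 4x4 extrinsic matrix) and `K` (a 3x3 intrinsic matrix) are hypothetical names, and the hole filling is a simple mean over valid 3x3 neighbours standing in for the paper's upsampling.

```python
import numpy as np

def project_to_depth(points_lidar, T_cam_lidar, K, height, width):
    """Project LiDAR points (N, 3) into a sparse depth image (0 = no data)."""
    ones = np.ones((points_lidar.shape[0], 1))
    pts_cam = (T_cam_lidar @ np.hstack([points_lidar, ones]).T).T[:, :3]
    pts_cam = pts_cam[pts_cam[:, 2] > 0]               # keep points in front of the camera
    uvw = (K @ pts_cam.T).T
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    depth = np.full((height, width), np.inf)
    # Keep the nearest point when several points land on the same pixel.
    np.minimum.at(depth, (v[inside], u[inside]), pts_cam[inside, 2])
    depth[np.isinf(depth)] = 0.0
    return depth

def fill_holes(depth, iterations=1):
    """Fill empty pixels with the mean of their valid 3x3 neighbours."""
    d = depth.copy()
    h, w = d.shape
    for _ in range(iterations):
        padded = np.pad(d, 1)
        windows = np.stack([padded[i:i + h, j:j + w]
                            for i in range(3) for j in range(3)])
        count = (windows > 0).sum(axis=0)
        mean = windows.sum(axis=0) / np.maximum(count, 1)
        d = np.where((d == 0) & (count > 0), mean, d)
    return d

# Hypothetical toy setup: identity extrinsics and a tiny 4x5 image.
K = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [0.0, 0.0, 1.0]])
pts = np.array([[0.0, 0.0, 5.0], [0.0, 0.0, 3.0], [0.0, 0.0, -1.0], [10.0, 0.0, 1.0]])
sparse = project_to_depth(pts, np.eye(4), K, height=4, width=5)
dense = fill_holes(sparse, iterations=1)
```

Each `fill_holes` iteration grows the valid region by one pixel, so real LiDAR scans would need several iterations or a proper interpolation scheme.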
Now for the network, which is also very simple:

The encoder and decoder are just an FCN, in which L6-L14 are 3x3 dilated convolution layers. The paper proposes cross fusion to combine the two modalities:

In other words, at every scale the RGB and depth feature maps are combined by pixel-wise addition with learned weights.
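A learned-weight pixel-wise addition between two branches might look like the sketch below. The use of one scalar weight per direction (`a`, `b`) is an assumption for illustration; in a real network these would be trainable parameters, one pair per layer, updated by backpropagation.

```python
import numpy as np

def cross_fuse(rgb_feat, depth_feat, a, b):
    """Exchange information between two branches by weighted pixel-wise addition."""
    rgb_out = rgb_feat + a * depth_feat      # depth features flow into the RGB branch
    depth_out = depth_feat + b * rgb_feat    # RGB features flow into the depth branch
    return rgb_out, depth_out

# Hypothetical feature maps with shape (channels, height, width).
rgb_feat = np.ones((2, 4, 4))
depth_feat = 2.0 * np.ones((2, 4, 4))
rgb_out, depth_out = cross_fuse(rgb_feat, depth_feat, a=0.5, b=0.25)
same_rgb, same_depth = cross_fuse(rgb_feat, depth_feat, a=0.0, b=0.0)
```

With both weights at zero the two branches run independently, so the weights let training decide how much the modalities should mix at each depth.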

Road segmentation with image-LiDAR data fusion in deep neural network

Paper: Road segmentation with image-LiDAR data fusion in deep neural network
Published in: Multimedia Tools and Applications, 2019

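This method again projects the LiDAR points onto the image. The network is as follows:
[Figure: network architecture]
ResNet-50 serves as the encoder and produces feature maps at 1/4 to 1/32 resolution. Several RFUs then fuse in the LiDAR data and perform upsampling.

The RFU serves two purposes: 1) fusing low-resolution and high-resolution feature maps, and 2) fusing same-resolution feature maps from the image and the LiDAR. The image feature maps come from the encoder, while the LiDAR points projection is obtained by projecting the points into the image and then rescaling. In other words, the LiDAR points are projected to form a depth map, and an image pyramid is then built from that depth map.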
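Building a pyramid from the projected depth map can be sketched as below. The post does not say how the paper rescales the projection, so averaging the valid (non-zero) pixels in each block is an assumption chosen here because it tolerates sparse input; the function names are hypothetical.

```python
import numpy as np

def downscale_depth(depth, factor):
    """Shrink a sparse depth map by averaging the valid pixels in each block."""
    h, w = depth.shape
    # Crop so both dimensions are multiples of the factor.
    cropped = depth[:h - h % factor, :w - w % factor]
    blocks = cropped.reshape(h // factor, factor, w // factor, factor)
    total = blocks.sum(axis=(1, 3))
    count = (blocks > 0).sum(axis=(1, 3))
    return total / np.maximum(count, 1)     # blocks with no data stay 0

def depth_pyramid(depth, factors=(4, 8, 16, 32)):
    """Return one downscaled depth map per encoder scale."""
    return {f: downscale_depth(depth, f) for f in factors}

# Hypothetical sparse depth map with two valid pixels.
depth = np.zeros((64, 96))
depth[0, 0] = 10.0
depth[1, 1] = 20.0
pyramid = depth_pyramid(depth)
```

The factors match the 1/4 to 1/32 encoder resolutions, so each pyramid level lines up with a same-resolution image feature map.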