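The goal of semantic segmentation is to partition a scene into several meaningful parts, usually by assigning a semantic label to every pixel in the image (pixel-level semantic segmentation), or by detecting objects and labeling them pixel by pixel at the same time (instance-level semantic segmentation).
More recently, panoptic segmentation was proposed to unify pixel-level semantic segmentation and instance-level semantic segmentation.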
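The difference between the three tasks can be made concrete with a toy label map. The sketch below is only an illustration: the 2x4 "image", the class ids (0 = road, 1 = car) and the helper name `panoptic_labels` are all assumptions, not part of any dataset or library.

```python
# Pixel-level semantic segmentation: one class id per pixel.
# Assumed class ids for illustration: 0 = road, 1 = car.
semantic = [
    [1, 1, 0, 1],
    [1, 1, 0, 1],
]

# Instance-level segmentation: one instance id per object pixel.
# 0 means "no instance" (background such as road).
instance = [
    [1, 1, 0, 2],
    [1, 1, 0, 2],
]

def panoptic_labels(semantic, instance):
    """Pair every pixel's class id with its instance id (hypothetical helper)."""
    return [
        [(cls, inst) for cls, inst in zip(sem_row, inst_row)]
        for sem_row, inst_row in zip(semantic, instance)
    ]

panoptic = panoptic_labels(semantic, instance)

# Distinct segments in the panoptic map: one road region and two separate cars.
segments = sorted({label for row in panoptic for label in row})
print(segments)  # [(0, 0), (1, 1), (1, 2)]
```

The semantic map alone cannot tell the two cars apart, and the instance map alone says nothing about the road; the paired labels carry both pieces of information.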
1 Characteristics of the sensing modalities
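- Visual and thermal cameras: Images captured by visual cameras and thermal cameras provide detailed texture information about the vehicle's surroundings. Visual cameras are sensitive to lighting and weather conditions; thermal cameras are more robust to day/night changes because they detect infrared radiation related to the heat of objects. However, neither type of camera directly provides depth information.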
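- LiDAR (Light Detection And Ranging): provides accurate depth information about the surroundings in the form of 3D points. LiDAR is an active sensor: it measures the reflections of laser beams that it emits at a certain frequency. LiDAR is little affected by lighting conditions and is less affected than visual cameras by weather such as fog and rain. A typical LiDAR cannot capture the fine texture of objects, and its points become sparse for distant objects.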
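- Radar (Radio Detection And Ranging): Radar emits electromagnetic waves that are reflected by obstacles. It measures the signal runtime to estimate an object's distance, uses the Doppler effect to estimate its radial velocity, and also estimates its angle. Radar is robust under a wide range of lighting and weather conditions, but classifying objects from radar data is very challenging because of its low resolution. Radar is widely used in adaptive cruise control and traffic jam assistance systems. Millimeter-wave (mmWave) radar is a radar technology that uses short-wavelength electromagnetic waves.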
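The two measurement principles mentioned above, signal runtime for distance and the Doppler effect for radial velocity, follow from standard physics and can be sketched in a few lines. The function names are hypothetical, and the 77 GHz carrier in the usage lines is only an assumed example value for an automotive radar.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s

def range_from_time_of_flight(round_trip_s):
    """Distance to the reflector; the signal travels there and back."""
    return SPEED_OF_LIGHT * round_trip_s / 2.0

def radial_velocity_from_doppler(doppler_shift_hz, carrier_hz):
    """Radial velocity from the measured Doppler shift.

    Uses f_d = 2 * v / wavelength, so v = f_d * wavelength / 2.
    A positive shift means the object is approaching the sensor.
    """
    wavelength = SPEED_OF_LIGHT / carrier_hz
    return doppler_shift_hz * wavelength / 2.0

# A round trip of one microsecond corresponds to roughly 150 m.
distance_m = range_from_time_of_flight(1e-6)

# Assumed 77 GHz carrier: compute the shift produced by an object
# approaching at 10 m/s, then recover the velocity from that shift.
carrier_hz = 77e9
shift_hz = 2.0 * 10.0 * carrier_hz / SPEED_OF_LIGHT
velocity_mps = radial_velocity_from_doppler(shift_hz, carrier_hz)
```

Only the velocity component along the line of sight shows up in the Doppler shift, which is why the text speaks of radial velocity rather than full velocity.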
2 Deep semantic segmentation
Datasets for deep semantic segmentation | | |
---|---|---|
Cityscapes | KITTI | Toronto City |
Mapillary Vistas | ApolloScape | |
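
Focus | Work |
---|---|
Multi-class pixel-level semantic segmentation | [3], [4], [5] |
Road segmentation | [6], [7] |
Instance-level semantic segmentation of different traffic participants | [8], [9], [10] |
Semantic segmentation that incorporates global information | Dilated convolutions [11], [12], multi-scale prediction [13], and conditional random fields (CRFs) added as a post-processing step [14] |
Real-time semantic segmentation | A comparative study [15] of the real-time performance of several semantic segmentation architectures in terms of operations (GFLOPs) and inference speed (fps) |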
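To show why dilated convolutions help bring in wider context, here is a minimal 1D sketch in plain Python. It is not taken from any of the cited works; the function name `dilated_conv1d` and the toy signal are assumptions for illustration. With the same three kernel weights, a dilation of 2 makes each output cover five input positions instead of three.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid 1D cross-correlation with gaps of `dilation` between kernel taps."""
    # Number of input positions covered by one output value.
    span = (len(kernel) - 1) * dilation + 1
    return [
        sum(k * signal[start + i * dilation] for i, k in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]

signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 1, 1]

dense = dilated_conv1d(signal, kernel, dilation=1)
dilated = dilated_conv1d(signal, kernel, dilation=2)

print(dense)    # [6, 9, 12, 15, 18]
print(dilated)  # [9, 12, 15]
```

The dilated version sees a wider window without adding weights, which is the property segmentation networks exploit to aggregate more context at the same parameter cost.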
3 Multi-modal semantic segmentation
3.1 Multi-modal datasets
3.2 Challenges and problems in multi-modal semantic segmentation
References
- Chinese-language article on the survey below: 自动驾驶深度多模态目标检测和语义分割:数据集、方法和挑战
- Deep Multi-modal Object Detection and Semantic Segmentation for Autonomous Driving: Datasets, Methods, and Challenges
- A. Dewan, G. L. Oliveira, and W. Burgard, “Deep semantic classification for 3d lidar data,” in IEEE/RSJ Int. Conf. Intelligent Robots and Systems, 2017, pp. 3544–3549.
- L. Schneider et al., “Multimodal neural networks: RGB-D for semantic segmentation and object detection,” in Scandinavian Conf. Image Analysis. Springer, 2017, pp. 98–109.
- V. Badrinarayanan, A. Kendall, and R. Cipolla, “SegNet: A deep convolutional encoder-decoder architecture for image segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., no. 12, pp. 2481–2495, 2017.
- L. Caltagirone, S. Scheidegger, L. Svensson, and M. Wahde, “Fast lidar-based road detection using fully convolutional neural networks,” in IEEE Intelligent Vehicles Symp., 2017, pp. 1019–1024.
- M. Teichmann, M. Weber, M. Zoellner, R. Cipolla, and R. Urtasun, “MultiNet: Real-time joint semantic reasoning for autonomous driving,” in IEEE Intelligent Vehicles Symp., 2018.
- B. Wu, A. Wan, X. Yue, and K. Keutzer, “SqueezeSeg: Convolutional neural nets with recurrent CRF for real-time road-object segmentation from 3d lidar point cloud,” in IEEE Int. Conf. Robotics and Automation, May 2018, pp. 1887–1893.
- K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask R-CNN,” in Proc. IEEE Conf. Computer Vision, 2017, pp. 2980–2988.
- J. Uhrig, E. Rehder, B. Fröhlich, U. Franke, and T. Brox, “Box2Pix: Single-shot instance segmentation by assigning pixels to object boxes,” in IEEE Intelligent Vehicles Symp., 2018.
- L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, no. 4, pp. 834–848, 2018.
- A. Paszke, A. Chaurasia, S. Kim, and E. Culurciello, “ENet: A deep neural network architecture for real-time semantic segmentation,” arXiv:1606.02147 [cs.CV], 2016.
- A. Roy and S. Todorovic, “A multi-scale CNN for affordance segmentation in RGB images,” in Proc. Eur. Conf. Computer Vision. Springer, 2016, pp. 186–201.
- S. Zheng et al., “Conditional random fields as recurrent neural networks,” in Proc. IEEE Conf. Computer Vision, 2015, pp. 1529–1537.
- M. Siam, M. Gamal, M. Abdel-Razek, S. Yogamani, M. Jagersand, and H. Zhang, “A comparative study of real-time semantic segmentation for autonomous driving,” in Workshop Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2018, pp. 587–597.