Semantic segmentation, semantic classification
Hello there! This post is about a semantic segmentation approach for road surfaces. The focus is on road surface patterns: what kind of pavement the vehicle is driving on, whether the road is damaged, and where road markings, speed bumps and other features relevant to a vehicle navigation task are.
Here I will walk you through the approach step by step, based on the preprint available at ResearchGate [1]. The Ground Truth and the experiments use the RTK dataset [2], whose images were captured with a low-cost camera and show roads with different types of pavement and different levels of pavement quality.
It was fun to work on and I’m excited to share it. I hope you enjoy it too. 🤗
Introduction
The purpose of this approach is to verify how effective passive vision (a camera) is at detecting different patterns on the road, for example whether the road surface is asphalt, cobblestone or unpaved (dirt). This can be relevant for an intelligent vehicle, whether it is an autonomous vehicle or one equipped with an Advanced Driver-Assistance System (ADAS). Depending on the type of pavement, it may be necessary to adapt the way the vehicle is driven, whether for the safety of its users, to reduce wear on the vehicle, or for the comfort of the people inside it.
Another relevant aspect of this approach is the detection of potholes and water puddles, which can cause accidents and damage vehicles, and which are quite common in developing countries. The approach can also be useful for the departments or organizations responsible for maintaining highways and roads.
To achieve these objectives, Convolutional Neural Networks (CNNs) were used for the semantic segmentation of the road surface. I’ll talk more about that in the next sections.
Ground Truth
To train the neural network and to test and validate the results, a Ground Truth (GT) was created with 701 images from the RTK dataset. This GT is available on the dataset page and is composed of the following classes:
The approach and setup
Everything here was done in Google Colab, a free Jupyter notebook environment that gives us free access to GPUs, is super easy to use, and is also very helpful for organization and configuration. The amazing deep learning library fastai [3] was used as well. To be more precise, the step-by-step I will present is very much based on one of the lessons given by Jeremy Howard in one of his deep learning courses, in this case lesson3-camvid.
The CNN architecture used was U-NET [4], an architecture designed for semantic segmentation of medical images that has since been applied successfully to many other problems. The network uses a ResNet [5] based encoder together with a decoder. The experiments for this approach were done with resnet34 and resnet50.
For the data augmentation step, standard options from the fastai library were used, with horizontal rotations and perspective distortion being applied. fastai takes care of applying the same variation to both the original image and its mask (GT) image, so the labels stay aligned with the pixels they describe.
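To make the point about paired augmentation concrete, here is a minimal sketch in plain Python. It is not the fastai implementation: the function name `random_hflip_pair` is hypothetical, and a horizontal flip is used only as a simple stand-in for the transforms mentioned above. The idea it shows is that one random decision is taken and then applied to the image and to its mask together.

```python
import random

def random_hflip_pair(image, mask, p=0.5, rng=random):
    """Flip an image and its mask together with probability p.

    `image` and `mask` are 2-D lists (rows of pixel values / class ids).
    The random draw happens once, so both get the same transform.
    """
    if rng.random() < p:
        image = [row[::-1] for row in image]
        mask = [row[::-1] for row in mask]
    return image, mask

# Toy 2x3 image and its class mask (hypothetical values).
image = [[10, 20, 30],
         [40, 50, 60]]
mask = [[0, 0, 1],
        [0, 2, 1]]

# p=1.0 forces the flip, p=0.0 disables it.
flipped_image, flipped_mask = random_hflip_pair(image, mask, p=1.0)
same_image, same_mask = random_hflip_pair(image, mask, p=0.0)
```

If the flip were drawn separately for the image and the mask, half of the augmented samples would end up with labels on the wrong side of the road.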
A relevant point, which was of great importance for the definition of this approach, is that the classes of the GT are quite unbalanced: the background and the surface types (e.g. asphalt, paved or unpaved) have far more pixels than the other classes. In an image classification problem, replicating certain images from the dataset could help to balance the classes. Here it would not, because a replicated image also brings along its many pixels of the large classes, which further increases the difference in pixel count between the largest and the smallest classes. So, in the defined approach, class weights were used for balancing. 🤔
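As a rough illustration of class weighting, the sketch below counts pixels per class in a toy mask, derives inverse-frequency weights, and uses them in a weighted cross-entropy. The weighting formula, the function names and the toy values are all assumptions made for this example; the post does not say how the actual weights were chosen. The normalization (dividing by the sum of the target weights) mirrors what PyTorch's `torch.nn.CrossEntropyLoss(weight=...)` does with its default mean reduction.

```python
import math
from collections import Counter

def class_pixel_counts(masks, n_classes):
    """Count how many pixels of each class id appear in a list of 2-D masks."""
    counts = Counter()
    for mask in masks:
        for row in mask:
            counts.update(row)
    return [counts.get(c, 0) for c in range(n_classes)]

def inverse_frequency_weights(counts):
    """Assumed scheme: weight_c = total / (n_classes * count_c)."""
    total = sum(counts)
    n = len(counts)
    return [total / (n * c) if c > 0 else 0.0 for c in counts]

def weighted_cross_entropy(probs, targets, weights):
    """Weighted mean of -log p[target] over pixels."""
    num = sum(-weights[t] * math.log(p[t]) for p, t in zip(probs, targets))
    den = sum(weights[t] for t in targets)
    return num / den

# Toy mask: class 0 dominates, classes 1 and 2 are rare.
masks = [[[0, 0, 0, 0],
          [0, 0, 1, 1],
          [0, 0, 0, 2]]]
counts = class_pixel_counts(masks, n_classes=3)
weights = inverse_frequency_weights(counts)

# Two pixels: one from the big class predicted well, one from a rare class predicted badly.
probs = [[0.9, 0.05, 0.05], [0.5, 0.2, 0.3]]
targets = [0, 2]
weighted = weighted_cross_entropy(probs, targets, weights)
unweighted = weighted_cross_entropy(probs, targets, [1.0, 1.0, 1.0])
```

With these weights the badly predicted rare-class pixel dominates the loss, which is the effect class weighting is meant to have.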
Based on different experiments, it was realized that just applying the weights is not enough, because when improving the accuracy of the classes that contain a smaller amount of pixels, the classes that contain a larger amount of pixels (eg