From 4D Radar (Point Cloud & Radar Tensor)
RPFA-Net: A 4D RaDAR Pillar Feature Attention Network for 3D Object Detection (ITSC 2021)
Paper: https://ieeexplore.ieee.org/abstract/document/9564754
Code: https://github.com/Nerdmust/astyx-pcdet-radar
Paper Notes
Abstract:
3D object detection is a crucial problem in environmental perception for autonomous driving.
Currently, most works focus on LiDAR, cameras, or their fusion, while very few algorithms involve a RaDAR sensor, especially a 4D RaDAR providing 3D position and velocity information.
4D RaDAR can work well in bad weather and has higher performance than traditional 3D RaDAR, but it also contains a lot of noise and suffers from measurement ambiguities.
Existing 3D object detection methods cannot judge the heading of objects by focusing only on local features in sparse point clouds.
To better overcome this problem, we propose a new method named RPFA-Net that uses only a 4D RaDAR and utilizes a self-attention mechanism instead of PointNet to extract the point cloud's global features.
These global features containing long-distance information can effectively improve the network’s ability to regress the heading angle of objects and enhance detection accuracy.
Compared with the baseline, our method improves 3D mAP by 8.13% and BEV mAP by 5.52%.
Extensive experiments show that RPFA-Net surpasses state-of-the-art 3D detection methods on the Astyx HiRes 2019 dataset.
The code and pre-trained models are available at https://github.com/adept-thu/RPFA-Net.git.
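To make the abstract's core idea concrete, below is a minimal PyTorch sketch of self-attention over the points inside a pillar, used in place of PointNet-style per-point MLP + max pooling so that points can exchange long-range information before the pillar feature is pooled. The module name, feature dimensions, and pillar layout are illustrative assumptions, not RPFA-Net's actual implementation (see the released code for that).

```python
import torch
import torch.nn as nn

class PillarSelfAttention(nn.Module):
    """Sketch: intra-pillar self-attention in place of PointNet's MLP + max-pool."""
    def __init__(self, in_dim=9, embed_dim=64, num_heads=4):
        super().__init__()
        self.embed = nn.Linear(in_dim, embed_dim)  # lift raw point features
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, pillars, pad_mask):
        # pillars:  (P, N, in_dim) -- P pillars, N zero-padded points each
        # pad_mask: (P, N) bool, True where a slot is padding
        # (assumes every pillar contains at least one real point)
        x = self.embed(pillars)                                 # (P, N, E)
        ctx, _ = self.attn(x, x, x, key_padding_mask=pad_mask)  # global context
        x = self.norm(x + ctx)                                  # residual + norm
        x = x.masked_fill(pad_mask.unsqueeze(-1), float("-inf"))
        return x.max(dim=1).values                              # (P, E) pillar feature
```

A full detector would then scatter these pillar features back onto the BEV grid and run a 2D backbone and detection head, as in PointPillars.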
INTRODUCTION
3D object detection is one of the most challenging problems in the perception system of autonomous driving.
The weather adaptability, detection distance, and price of LiDAR restrict the application of 3D object detection.
In contrast, RaDAR makes up for the defects of LiDAR and is a sensor with better commercial prospects.
3D RaDAR can detect the horizontal position and velocity of objects, usually expressed as point clouds.
To adapt to the input format of deep learning networks, the data can be divided into grids or converted into bird's-eye-view images.
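As a concrete illustration of the gridding idea, a minimal BEV rasterization of a radar point cloud might look like the sketch below; the detection range and cell size are hypothetical values, not taken from any of the papers discussed here.

```python
import numpy as np

def radar_to_bev(points, x_range=(0.0, 70.0), y_range=(-40.0, 40.0), cell=0.16):
    """Rasterize radar points (N, >=2) with columns [x, y, ...] into a BEV
    occupancy grid. Ranges and resolution are illustrative assumptions."""
    w = int((x_range[1] - x_range[0]) / cell)
    h = int((y_range[1] - y_range[0]) / cell)
    bev = np.zeros((h, w), dtype=np.float32)
    xs = ((points[:, 0] - x_range[0]) / cell).astype(int)
    ys = ((points[:, 1] - y_range[0]) / cell).astype(int)
    keep = (xs >= 0) & (xs < w) & (ys >= 0) & (ys < h)
    bev[ys[keep], xs[keep]] = 1.0  # occupancy; could also accumulate RCS or Doppler
    return bev
```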
However, the sparsity of the RaDAR point cloud is still a challenge.
The appearance of PointNet [2] makes it possible to process such sparse data.
Andreas Danzer et al. [14] use Frustum PointNets [20] to classify, segment, and perform box regression on 3D RaDAR data. The advantage of this method is that it needs neither manually designed features nor a gridded point cloud and can take the point cloud directly as input.
However, 3D RaDAR data lack vertical information, so a 3D RaDAR cannot work in single-sensor mode in 3D object detection scenes. Many researchers therefore fuse 3D RaDAR with other sensors and add the 3D RaDAR point cloud as supplementary information to the detection network.
For example, CenterFusion [4] regresses the center point of the object via CenterNet [5]. It uses a truncation-based fusion of RaDAR information to supplement the image features and regresses the depth, direction, and velocity of objects.
In this method, RaDAR is only an auxiliary sensor providing depth and speed information, and the camera carries out the main work. Besides, RadarNet [21] is a fusion detection method for LiDAR and 3D RaDAR, which detects dynamic objects based on early voxel fusion and late attention fusion.
But its early fusion extracts voxel features from point clouds with different properties and then splices them along channels. This method ignores the difference between 3D RaDAR and LiDAR point clouds. Therefore, both industry and academia are actively introducing 4D RaDAR sensors. 3D RaDAR can only detect the horizontal position of objects, while 4D RaDAR sensors can detect the absolute 3D coordinates of objects. Consequently, it is imperative to propose suitable 3D object detection methods for 4D RaDAR.
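For clarity, "splicing along channels" amounts to concatenating the per-sensor voxel/BEV feature maps on the channel axis before a shared backbone; the shapes below are made up for illustration and are not RadarNet's actual code.

```python
import torch

# Early fusion by channel concatenation (illustrative shapes only):
# both sensors' voxel features are rendered on the same BEV grid, so they
# can be spliced along the channel dimension.
lidar_feat = torch.randn(1, 64, 200, 200)           # (batch, C_lidar, H, W)
radar_feat = torch.randn(1, 16, 200, 200)           # (batch, C_radar, H, W)
fused = torch.cat([lidar_feat, radar_feat], dim=1)  # (1, 80, 200, 200)
```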
However, the 4D RaDAR sensor appeared only recently, and there are few datasets. As far as we know, the only dataset currently including RaDAR data is nuScenes [3], but its 3D RaDAR only provides sparse 2D points, with only about 100 points per frame. Therefore, identifying different objects using only RaDAR data is a challenging task. If a 4D RaDAR point cloud with 3D coordinates and velocity information can be processed directly, the method is similar to the LiDAR techniques from the above work. In mature end-to-end 3D object detection methods based on LiDAR [15], [17], [18], [19], point clouds are usually transformed into voxel features, or features are extracted directly from the points. Although we can perform the same operations on a 4D RaDAR point cloud, the local features obtained by the above methods do not perform well on sparse data.
With the development of autonomous driving and sensor technology, new 4D RaDAR sensors have emerged, along with a 4D RaDAR dataset, Astyx HiRes2019 [6], which contains the 3D position and speed information of each object. The 4D point cloud in this dataset can measure the position and contour of objects just like a LiDAR point cloud, and each point carries velocity information. Since most works focus on LiDAR-related tasks, works using this dataset are very rare. The principles of RaDAR and LiDAR sensors are different, and their point clouds vary widely. The 4D RaDAR point cloud is often sparser and contains many noise points. Besides, due to its weak ability to describe an object's shape, an object's orientation cannot be judged from the point cloud of a single object. Existing 3D object detection methods developed on LiDAR datasets are unable to adapt to the 4D RaDAR dataset.
This paper proposes a 4D RaDAR-based 3D object detection network, RPFA-Net, which uses the self-attention mechanism to solve the above problems. RPFA-Net utilizes self-attention to extract the point cloud's global features and judges objects' orientation through these global characteristics. Because the orientation angle is more accurate, the overlap between the detection results and the ground truth is higher, effectively improving detection accuracy. We conduct model training and experimental verification on the 4D RaDAR dataset Astyx HiRes2019 [6]. The experimental results show that the detection accuracy of RPFA-Net is greatly improved compared with our baseline.
The structure of this work is as follows. Section II introduces related work on RaDAR for object classification, semantic segmentation, and object detection. Section III presents our proposed 3D object detection method. Section IV describes the dataset and training environment. Section V gives the experimental evaluation results, followed by conclusions in Section VI.
Multi-Class Road User Detection With 3+1D Radar in the View-of-Delft Dataset
Abstract:
Next-generation automotive radars provide elevation data in addition to range, azimuth, and Doppler velocity.
In this experimental study, we apply a state-of-the-art object detector (PointPillars), previously used for LiDAR 3D data, to such 3+1D radar data (where 1D refers to Doppler).
In ablation studies, we first explore the benefits of the additional elevation information, together with that of Doppler, radar cross section and temporal accumulation, in the context of multi-class road user detection.
We subsequently compare object detection performance on the radar and LiDAR point clouds, object class-wise and as a function of distance.
To facilitate our experimental study, we present the novel View-of-Delft (VoD) automotive dataset.
It contains 8,693 frames of synchronized and calibrated 64-layer LiDAR, (stereo) camera, and 3+1D radar data acquired in complex urban traffic.
It consists of 123,106 3D bounding box annotations of both moving and static objects, including 26,587 pedestrian, 10,800 cyclist, and 26,949 car labels.
Our results show that object detection on 64-layer LiDAR data still outperforms that on 3+1D radar data, but the addition of elevation information and integration of successive radar scans helps close the gap.
The VoD dataset is made freely available for scientific benchmarking at https://intelligent-vehicles.org/datasets/view-of-delft/.
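As a sketch of how such 3+1D radar points could be fed to a PointPillars-style detector, the snippet below stacks several consecutive scans (temporal accumulation) and builds per-point features covering the ablated attributes (elevation via z, RCS, Doppler, scan age). The column layout and function name are assumptions, not the official VoD devkit format, and in practice older scans would first be ego-motion compensated into the current frame.

```python
import numpy as np

def assemble_radar_features(scans):
    """Stack consecutive radar scans and build per-point features
    [x, y, z, rcs, doppler, age]. `scans` is a list of (N_i, 5) arrays with
    assumed columns [x, y, z, rcs, doppler]; scans[0] is the current frame."""
    feats = []
    for age, scan in enumerate(scans):
        dt = np.full((scan.shape[0], 1), float(age))  # scan age as a feature
        feats.append(np.hstack([scan, dt]))
    return np.vstack(feats)  # (sum N_i, 6) point features for the detector
```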