Motivation
- Recent methods such as PointPillars and MVF rely heavily on anchors to predict bounding-box parameters, but the authors argue that this is unnecessary and ineffective for two reasons:
- hyperparameters of anchors need to be fine-tuned case-by-case for different tasks or datasets
- anchors are sparsely distributed in each scene, leading to significant imbalance problems
- So the authors propose predicting a bounding box for each pillar instead of for each anchor.
- They also propose using a cylindrical view (CYV) projection to avoid perspective distortions.
- They use bilinear interpolation when projecting pillar features back to points, to prevent spatial aliasing.
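The cylindrical projection and the bilinear pillar-to-point lookup can be sketched in NumPy as below. This is a minimal illustration, not the authors' code: the function names (`cylindrical_coords`, `cyv_grid_coords`, `bilinear_sample`) and the grid parameters `z_min`, `dz`, `dphi` are hypothetical, and the sketch assumes the CYV image is indexed by height z (rows) and azimuth φ (columns). Blending the four surrounding cells, instead of reading only the single cell a point falls into, is what removes the aliasing.

```python
import numpy as np


def cylindrical_coords(points):
    """Map (N, 3) Cartesian points (x, y, z) to cylindrical (rho, phi, z)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rho = np.hypot(x, y)    # horizontal distance from the sensor axis
    phi = np.arctan2(y, x)  # azimuth in [-pi, pi]
    return np.stack([rho, phi, z], axis=1)


def cyv_grid_coords(cyl, z_min, dz, dphi):
    """Continuous (row, col) positions of points on a hypothetical (z, phi) grid.

    The -0.5 shifts from cell edges to cell centers, so a point lying exactly
    at a cell center maps to an integer index.
    """
    u = (cyl[:, 2] - z_min) / dz - 0.5
    v = (cyl[:, 1] + np.pi) / dphi - 0.5
    return u, v


def bilinear_sample(grid, u, v):
    """Sample an (H, W, C) feature grid at continuous (u, v); returns (N, C).

    Positions are clamped to the grid border (no wrap-around at the azimuth
    seam, which is a simplification).
    """
    H, W, _ = grid.shape
    u = np.clip(u, 0, H - 1)
    v = np.clip(v, 0, W - 1)
    u0 = np.floor(u).astype(int)
    v0 = np.floor(v).astype(int)
    u1 = np.minimum(u0 + 1, H - 1)
    v1 = np.minimum(v0 + 1, W - 1)
    du = (u - u0)[:, None]  # fractional offsets, broadcast over channels
    dv = (v - v0)[:, None]
    # Weighted sum of the four neighbouring cells.
    return ((1 - du) * (1 - dv) * grid[u0, v0]
            + (1 - du) * dv * grid[u0, v1]
            + du * (1 - dv) * grid[u1, v0]
            + du * dv * grid[u1, v1])
```

A nearest-cell lookup would be `grid[u.astype(int), v.astype(int)]`, giving every point in a cell the same feature; the bilinear version varies smoothly with the point's exact position.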
Method
- Overview of their method
- a point cloud is first projected into both the bird's-eye view (BEV) and the cylindrical view (CYV), and a separate CNN extracts features from each view. The multi-view features are then aggregated per point in the same way as in MVF. Next, the aggregated pointwise features are projected back onto the BEV map, which is passed through an additional CNN. Finally, a classification network and a regression network make the final predictions for each BEV pillar.
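The final per-pillar prediction step implies per-pillar training targets, which can be sketched as follows. This assumes a simple rule that may differ from the paper: a pillar is positive for a ground-truth box if its center lies inside the rotated BEV box, and its regression target is the offset from the pillar center to the box center plus the box length, width, and yaw. The helper names and the (x, y, length, width, yaw) box layout are hypothetical.

```python
import numpy as np


def pillar_centers(x_min, x_max, y_min, y_max, pillar_size):
    """Return an (nx, ny, 2) array of BEV pillar-center (x, y) coordinates."""
    nx = int(round((x_max - x_min) / pillar_size))
    ny = int(round((y_max - y_min) / pillar_size))
    xs = x_min + (np.arange(nx) + 0.5) * pillar_size
    ys = y_min + (np.arange(ny) + 0.5) * pillar_size
    gx, gy = np.meshgrid(xs, ys, indexing="ij")
    return np.stack([gx, gy], axis=-1)


def assign_pillar_targets(centers, boxes):
    """Hypothetical per-pillar target assignment (no anchors involved).

    centers: (nx, ny, 2) pillar centers.
    boxes:   (M, 5) ground-truth boxes as (x, y, length, width, yaw).
    Returns labels (nx, ny) holding the box index or -1 for background, and
    reg (nx, ny, 5) holding (dx, dy, length, width, yaw) for positive pillars.
    """
    nx, ny, _ = centers.shape
    labels = np.full((nx, ny), -1, dtype=int)
    reg = np.zeros((nx, ny, 5))
    for m, (bx, by, length, width, yaw) in enumerate(boxes):
        dx = centers[..., 0] - bx
        dy = centers[..., 1] - by
        # Rotate the center offset into the box's local frame.
        local_x = dx * np.cos(yaw) + dy * np.sin(yaw)
        local_y = -dx * np.sin(yaw) + dy * np.cos(yaw)
        inside = (np.abs(local_x) <= length / 2) & (np.abs(local_y) <= width / 2)
        # Later boxes overwrite earlier ones where boxes overlap.
        labels[inside] = m
        reg[inside, 0] = bx - centers[inside, 0]
        reg[inside, 1] = by - centers[inside, 1]
        reg[inside, 2:] = [length, width, yaw]
    return labels, reg
```

Every pillar inside a box gets its own target, so there are no anchor sizes or orientations to tune; a real implementation would still need a tie-breaking rule for overlapping boxes.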
Experiments
- Effectiveness of pillar-based predictions
- Effectiveness of CYV