Author | eyesighting    Editor | 自动驾驶之星
Original link: https://zhuanlan.zhihu.com/p/624241501
This article is shared for academic purposes only; if it infringes any rights, please contact us for removal.
Foreword:
As autonomous driving keeps advancing, the field has raced from BEV Transformers all the way to end-to-end approaches. We will be presenting a series of related papers to study and discuss together.
1. Bird's-Eye-View Survey Papers
VisionBEVPerceptionSurvey
Title: Vision-Centric BEV Perception: A Survey
Paper: https://arxiv.org/abs/2208.02797
BEVPerceptionReviewEvaluationRecipe
Title: Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe
Paper: https://arxiv.org/abs/2209.05324
Code: https://github.com/OpenPerceptionX/BEVPerception-Survey-Recipe
SurroundViewVision3DDetSurvey
Title: Surround-View Vision-based 3D Detection for Autonomous Driving: A Survey
Paper: https://arxiv.org/abs/2302.06650
VisionRadarFusionRobBEVDetSurvey
Title: Vision-RADAR fusion for Robotics BEV Detections: A Survey
Paper: https://arxiv.org/abs/2302.06643
2. Bird's-Eye-View Open-Source Algorithms
360BEV
Title: 360BEV: Panoramic Semantic Mapping for Indoor Bird's-Eye View
Paper: https://arxiv.org/abs/2303.11910
Code: https://github.com/jamycheung/360BEV
BEVDepth
Title: BEVDepth: Acquisition of Reliable Depth for Multi-view 3D Object Detection
Paper: https://arxiv.org/abs/2206.10092
Code: https://github.com/Megvii-BaseDetection/BEVDepth
BEVDet
Title: BEVDet: High-performance Multi-camera 3D Object Detection in Bird-Eye-View
Paper: https://arxiv.org/abs/2112.11790
Code: https://github.com/HuangJunJie2017/BEVDet
BEVDistill
Title: BEVDistill: Cross-Modal BEV Distillation for Multi-View 3D Object Detection
Paper: https://arxiv.org/abs/2211.09386
Code: https://github.com/zehuichen123/BEVDistill
BEVerse
Title: BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving
Paper: https://arxiv.org/abs/2205.09743
Code: https://github.com/zhangyp15/BEVerse
BEVFeatStitch
Title: Understanding Bird's-Eye View of Road Semantics using an Onboard Camera
Paper: https://arxiv.org/abs/2012.03040
Code: https://github.com/ybarancan/BEV_feat_stitch
BEVFormer
Title: BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers
Paper: https://arxiv.org/abs/2203.17270
Code: https://github.com/zhiqi-li/BEVFormer
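The core idea behind BEVFormer's spatial cross-attention is to tie each BEV query to reference points on the ground plane, project those points into the camera images with the known intrinsics and extrinsics, and sample image features at the projected locations. The snippet below is a minimal single-camera NumPy sketch of just that project-and-sample step; the grid extent, camera parameters, feature shapes, and the nearest-neighbour lookup (standing in for the paper's deformable attention) are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# BEV grid of reference points on the ground plane (z = 0) in the ego frame.
# Grid extent and resolution are assumed for illustration.
bev_h = bev_w = 50
xs = np.linspace(-25.0, 25.0, bev_w)          # lateral, metres
ys = np.linspace(1.0, 51.0, bev_h)            # longitudinal, metres
gx, gy = np.meshgrid(xs, ys)                  # (bev_h, bev_w)
ref = np.stack([gx, gy, np.zeros_like(gx), np.ones_like(gx)], axis=-1).reshape(-1, 4)

# Toy front camera: intrinsics K and ego->camera extrinsics T (assumed values).
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
T = np.eye(4)
T[:3, :3] = np.array([[1.0, 0.0,  0.0],       # cam x = ego x (right)
                      [0.0, 0.0, -1.0],       # cam y = -ego z (down)
                      [0.0, 1.0,  0.0]])      # cam z = ego y (forward)
T[1, 3] = 1.5                                 # camera mounted 1.5 m above ground (assumed)

cam = ref @ T.T                               # ego -> camera coordinates, (N, 4)
depth = cam[:, 2]
uvw = cam[:, :3] @ K.T                        # pinhole projection
uv = uvw[:, :2] / np.clip(uvw[:, 2:3], 1e-5, None)

# Assumed single-camera feature map of shape (C, H, W).
C, H, W = 64, 480, 640
img_feat = np.random.rand(C, H, W).astype(np.float32)
valid = (depth > 0.1) & (uv[:, 0] >= 0) & (uv[:, 0] < W) & (uv[:, 1] >= 0) & (uv[:, 1] < H)

# Nearest-neighbour sampling stands in for bilinear/deformable sampling here.
bev_feat = np.zeros((bev_h * bev_w, C), dtype=np.float32)
u = uv[valid, 0].astype(int)
v = uv[valid, 1].astype(int)
bev_feat[valid] = img_feat[:, v, u].T
bev_feat = bev_feat.reshape(bev_h, bev_w, C)
print(bev_feat.shape, int(valid.sum()), "BEV cells hit by this camera")
```

In the full model this sampling feeds multi-camera deformable attention with several pillar heights per query plus temporal self-attention over the previous BEV, but the projection geometry above is what makes the 2D-to-BEV lifting well-posed.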
BEVFormerV2
Title: BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision
Paper: https://arxiv.org/abs/2211.10439
Code: https://github.com/zhiqi-li/BEVFormer
BEVFusion
Title: BEVFusion: A Simple and Robust LiDAR-Camera Fusion Framework
Paper: https://arxiv.org/abs/2205.13790
Code: https://github.com/ADLab-AutoDrive/BEVFusion
BEV-LaneDet
Title: BEV-LaneDet: a Simple and Effective 3D Lane Detection Baseline
Paper: https://arxiv.org/abs/2210.06006
Code: https://github.com/gigo-team/bev_lane_det
BEVPlace
Title: BEVPlace: Learning LiDAR-based Place Recognition using Bird's Eye View Images
Paper: https://arxiv.org/abs/2302.14325
Code: https://github.com/zjuluolun/BEVPlace
BEVSimDet
Title: BEVSimDet: Simulated Multi-modal Distillation in Bird's-Eye View for Multi-view 3D Object Detection
Paper: https://arxiv.org/abs/2303.16818
Code: https://github.com/ViTAE-Transformer/BEVSimDet
BEVStereo
Title: BEVStereo: Enhancing Depth Estimation in Multi-view 3D Object Detection with Dynamic Temporal Stereo
Paper: https://arxiv.org/abs/2209.10248
Code: https://github.com/Megvii-BaseDetection/BEVStereo
Cam2BEV
Title: A Sim2Real Deep Learning Approach for the Transformation of Images from Multiple Vehicle-Mounted Cameras to a Semantically Segmented Image in Bird's Eye View
Paper: https://arxiv.org/abs/2005.04078
Code: https://github.com/ika-rwth-aachen/Cam2BEV
DeepIPC
Title: DeepIPC: Deeply Integrated Perception and Control for an Autonomous Vehicle in Real Environments
Paper: https://arxiv.org/abs/2207.09934
Code: https://github.com/oskarnatan/DeepIPC
DETR3D
Title: DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries
Paper: https://arxiv.org/abs/2110.06922
Code: https://github.com/WangYueFt/detr3d
Fast-BEV
Title: Fast-BEV: Towards Real-time On-vehicle Bird's-Eye View Perception
Paper: https://arxiv.org/abs/2301.07870
Code: https://github.com/Sense-GVT/Fast-BEV
FIERY
Title: FIERY: Future Instance Prediction in Bird's-Eye View from Surround Monocular Cameras
Paper: https://arxiv.org/abs/2104.10490
Code: https://github.com/wayveai/fiery
GKT-BEV
Title: Efficient and Robust 2D-to-BEV Representation Learning via Geometry-guided Kernel Transformer
Paper: https://arxiv.org/abs/2206.04584
Code: https://github.com/hustvl/GKT
HoPMV3D
Title: Temporal Enhanced Training of Multi-view 3D Object Detector via Historical Object Prediction
Paper: https://arxiv.org/abs/2304.00967
Code: https://github.com/Sense-X/HoP
Img2Maps
Title: Translating Images into Maps
Paper: https://arxiv.org/abs/2110.00966
Code: https://github.com/avishkarsaha/translating-images-into-maps
LaRa-BEV
Title: LaRa: Latents and Rays for Multi-Camera Bird's-Eye-View Semantic Segmentation
Paper: https://arxiv.org/abs/2206.13294
Code: https://github.com/valeoai/LaRa
LSS
Title: Lift, Splat, Shoot: Encoding Images From Arbitrary Camera Rigs by Implicitly Unprojecting to 3D
Paper: https://arxiv.org/abs/2008.05711
Code: https://nv-tlabs.github.io/lift-splat-shoot
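Lift-Splat-Shoot, and the BEVDet/BEVDepth line of work that builds on it, lifts each image feature along a predicted categorical depth distribution and then sum-pools the resulting frustum features onto a BEV grid. Below is a minimal single-camera NumPy sketch of that lift-and-splat step; the shapes, depth range, pinhole intrinsics, and grid resolution are assumptions for illustration, not any repository's actual configuration.

```python
import numpy as np

# Assumed feature-map and depth-bin sizes; real models use much larger ones.
C, H, W = 8, 16, 32                         # channels, feature height, feature width
D = 24                                      # number of discrete depth bins
depth_bins = np.linspace(2.0, 50.0, D)      # metric depth of each bin (assumed range)

feats = np.random.rand(C, H, W).astype(np.float32)     # per-pixel image features
logits = np.random.rand(D, H, W).astype(np.float32)    # per-pixel depth logits
depth_prob = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)

# "Lift": outer product of features and depth distribution -> frustum (C, D, H, W).
frustum = feats[:, None] * depth_prob[None]

# Back-project each pixel column at every candidate depth with a toy pinhole
# model (assumed fx, cx); the vertical axis is dropped since we pool onto a flat BEV.
fx, cx = 20.0, W / 2.0
u = np.meshgrid(np.arange(W), np.arange(H))[0]                  # (H, W) pixel u-coordinates
x = (u[None] - cx) / fx * depth_bins[:, None, None]             # lateral distance, (D, H, W)
z = np.broadcast_to(depth_bins[:, None, None], (D, H, W))       # forward distance, (D, H, W)

# "Splat": sum-pool frustum features into a flat BEV grid over (x, z).
bev_res, bev_size = 1.0, 64                                     # 1 m cells, 64 x 64 grid (assumed)
gx = np.clip((x / bev_res + bev_size // 2).astype(int), 0, bev_size - 1).ravel()
gz = np.clip((z / bev_res).astype(int), 0, bev_size - 1).ravel()
cell = gz * bev_size + gx                                       # linear BEV cell index per frustum cell

bev = np.zeros((C, bev_size * bev_size), dtype=np.float32)
flat = frustum.reshape(C, -1)
for c in range(C):                                              # scatter-add each channel into its BEV cell
    np.add.at(bev[c], cell, flat[c])
bev = bev.reshape(C, bev_size, bev_size)
print(bev.shape)                                                # (8, 64, 64)
```

The released implementations speed this pooling up (for example with a sort-and-cumulative-sum trick or a fused kernel) and run it over all cameras jointly, but the geometry is the same as in this sketch.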
MetaBEV
Title: MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation
Paper: https://arxiv.org/abs/2304.09801
Code: https://chongjiange.github.io/metabev.html
MotionNet
Title: MotionNet: Joint Perception and Motion Prediction for Autonomous Driving Based on Bird's Eye View Maps
Paper: https://arxiv.org/abs/2003.06754
Code: https://github.com/pxiangwu/MotionNet
PersFormer
Title: PersFormer: 3D Lane Detection via Perspective Transformer and the OpenLane Benchmark
Paper: https://arxiv.org/abs/2203.11089
Code: https://github.com/OpenPerceptionX/OpenLane
PETR
Title: PETR: Position Embedding Transformation for Multi-View 3D Object Detection
Paper: https://arxiv.org/abs/2203.05625
Code: https://github.com/megvii-research/PETR
PETRv2
Title: PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images
Paper: https://arxiv.org/abs/2206.01256
Code: https://github.com/megvii-research/PETR
PolarBEV
Title: Vision-based Uneven BEV Representation Learning with Polar Rasterization and Surface Estimation
Paper: https://arxiv.org/abs/2207.01878
Code: https://github.com/SuperZ-Liu/PolarBEV
RoboBEV
Title: RoboBEV: Towards Robust Bird's Eye View Perception under Corruptions
Paper: https://arxiv.org/abs/2304.06719
Code: https://github.com/Daniel-xsy/RoboBEV
STSU
Title: Structured Bird's-Eye-View Traffic Scene Understanding from Onboard Images
Paper: https://arxiv.org/abs/2110.01997
Code: https://github.com/ybarancan/STSU
TiG-BEV
Title: TiG-BEV: Multi-view BEV 3D Object Detection via Target Inner-Geometry Learning
Paper: https://arxiv.org/abs/2212.13979
Code: https://github.com/ADLab3Ds/TiG-BEV
TransFusion
Title: TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers
Paper: https://arxiv.org/abs/2203.11496
Code: https://github.com/XuyangBai/TransFusion
UniDistill
Title: UniDistill: A Universal Cross-Modality Knowledge Distillation Framework for 3D Object Detection in Bird's-Eye View
Paper: https://arxiv.org/abs/2303.15083
Code: https://github.com/megvii-research/CVPR2023-UniDistill
3. Occupancy Survey Papers
GridCentricFusPepADSurvey
Title: Grid-Centric Traffic Scenario Perception for Autonomous Driving: A Comprehensive Review
Paper: https://arxiv.org/abs/2303.01212
4. Occupancy Open-Source Algorithms
MonoScene
Title: MonoScene: Monocular 3D Semantic Scene Completion
Paper: https://arxiv.org/abs/2112.00726
Code: https://github.com/astra-vision/MonoScene
OccDepth
Title: OccDepth: A Depth-Aware Method for 3D Semantic Scene Completion
Paper: https://arxiv.org/abs/2302.13540
Code: https://github.com/megvii-research/OccDepth
OccFormer
Title: OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction
Paper: https://arxiv.org/abs/2304.05316
Code: https://github.com/zhangyp15/OccFormer
OpenOccupancy
Title: OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception
Paper: https://arxiv.org/abs/2303.03991
Code: https://github.com/JeffWang987/OpenOccupancy
SimpleOccupancy
Title: A Simple Attempt for 3D Occupancy Estimation in Autonomous Driving
Paper: https://arxiv.org/abs/2303.10076
Code: https://github.com/GANWANSHUI/SimpleOccupancy
StereoVoxelNet
Title: StereoVoxelNet: Real-Time Obstacle Detection Based on Occupancy Voxels from a Stereo Camera Using Deep Neural Networks
Paper: https://arxiv.org/abs/2209.08459
Code: https://github.com/RIVeR-Lab/stereovoxelnet
SurroundOcc
Title: SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving
Paper: https://arxiv.org/abs/2303.09551
Code: https://github.com/weiyithu/SurroundOcc
TPVFormer
Title: Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction
Paper: https://arxiv.org/pdf/2302.07817
Code: https://github.com/wzzheng/TPVFormer
VoxFormer
Title: VoxFormer: Sparse Voxel Transformer for Camera-based 3D Semantic Scene Completion
Paper: https://arxiv.org/abs/2302.12251
Code: https://github.com/NVlabs/VoxFormer
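Despite their very different feature extractors, the occupancy networks listed above share an output formulation: a dense 3D voxel grid in which every cell is classified as free space or as one of a set of semantic classes, trained with a per-voxel classification loss. The toy PyTorch head below illustrates only that output and loss structure; the channel count, grid size, class count, and the choice of class 0 as free space are assumptions, not the architecture of any specific paper in the list.

```python
import torch
import torch.nn as nn

class ToyOccupancyHead(nn.Module):
    """Per-voxel semantic classification over a 3D feature volume.

    A deliberately small stand-in for the decoding heads used by occupancy
    networks; channel count, grid size, and the number of classes are
    illustrative assumptions, not any paper's configuration.
    """
    def __init__(self, in_channels: int = 32, num_classes: int = 17):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv3d(in_channels, in_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(in_channels, num_classes, kernel_size=1),  # class logits per voxel
        )

    def forward(self, voxel_feats: torch.Tensor) -> torch.Tensor:
        # voxel_feats: (B, C, X, Y, Z) -> logits: (B, num_classes, X, Y, Z)
        return self.head(voxel_feats)

if __name__ == "__main__":
    B, C, X, Y, Z = 1, 32, 64, 64, 8                   # toy grid: 64 x 64 x 8 voxels (assumed)
    feats = torch.randn(B, C, X, Y, Z)                 # stand-in for lifted image features
    labels = torch.randint(0, 17, (B, X, Y, Z))        # class 0 treated as free space (assumption)
    model = ToyOccupancyHead(in_channels=C, num_classes=17)
    logits = model(feats)
    loss = nn.functional.cross_entropy(logits, labels) # per-voxel classification loss
    print(logits.shape, float(loss))
```

Real systems typically add class reweighting, masks for unobserved voxels, and auxiliary losses, and they evaluate with per-class IoU over the voxel grid rather than raw loss values.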
Summary
Earlier early-fusion and late-fusion perception pipelines are gradually being phased out. Unified algorithm frameworks built on multi-sensor fusion, multi-task learning, and temporal BEV and Occupancy representations will strongly push the large-scale deployment of L3/L4 autonomous driving, and the supporting chips, toolchains, methodologies, industry-chain relationships, and business models will change along with them.