Author | eyesighting    Editor | 自动驾驶之星
Original link: https://zhuanlan.zhihu.com/p/624241501
This article is shared for academic purposes only; if it infringes any rights, please contact us for removal.
Foreword:
As autonomous driving keeps advancing, the field has raced from BEV Transformers all the way to end-to-end approaches. We will be presenting a series of related papers to study and discuss together.
1. Bird's-Eye-View Survey Papers
VisionBEVPerceptionSurvey
Title: Vision-Centric BEV Perception: A Survey
Paper: https://arxiv.org/abs/2208.02797
BEVPerceptionReviewEvaluationRecipe
Title: Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe
Paper: https://arxiv.org/abs/2209.05324
Code: https://github.com/OpenPerceptionX/BEVPerception-Survey-Recipe
SurroundViewVision3DDetSurvey
Title: Surround-View Vision-based 3D Detection for Autonomous Driving: A Survey
Paper: https://arxiv.org/abs/2302.06650
VisionRadarFusionRobBEVDetSurvey
Title: Vision-RADAR fusion for Robotics BEV Detections: A Survey
Paper: https://arxiv.org/abs/2302.06643
2. Bird's-Eye-View Open-Source Algorithms
360BEV
Title: 360BEV: Panoramic Semantic Mapping for Indoor Bird's-Eye View
Paper: https://arxiv.org/abs/2303.11910
Code: https://github.com/jamycheung/360BEV
BEVDepth
Title: BEVDepth: Acquisition of Reliable Depth for Multi-view 3D Object Detection
Paper: https://arxiv.org/abs/2206.10092
Code: https://github.com/Megvii-BaseDetection/BEVDepth
BEVDet
Title: BEVDet: High-performance Multi-camera 3D Object Detection in Bird-Eye-View
Paper: https://arxiv.org/abs/2112.11790
Code: https://github.com/HuangJunJie2017/BEVDet
BEVDistill
Title: BEVDistill: Cross-Modal BEV Distillation for Multi-View 3D Object Detection
Paper: https://arxiv.org/abs/2211.09386
Code: https://github.com/zehuichen123/BEVDistill
BEVerse
Title: BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving
Paper: https://arxiv.org/abs/2205.09743
Code: https://github.com/zhangyp15/BEVerse
BEVFeatStitch
Title: Understanding Bird's-Eye View of Road Semantics using an Onboard Camera
Paper: https://arxiv.org/abs/2012.03040
Code: https://github.com/ybarancan/BEV_feat_stitch
BEVFormer
Title: BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers
Paper: https://arxiv.org/abs/2203.17270
Code: https://github.com/zhiqi-li/BEVFormer
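The core idea behind BEVFormer's spatial cross-attention is to tie each BEV query to reference points on the ground plane, project those points into the camera images with the known intrinsics and extrinsics, and sample image features at the projected locations. The snippet below is a minimal single-camera NumPy sketch of just that project-and-sample step; the grid extent, camera parameters, feature shapes, and the nearest-neighbour lookup (standing in for the paper's deformable attention) are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# BEV grid of reference points on the ground plane (z = 0) in the ego frame.
# Grid extent and resolution are assumed for illustration.
bev_h = bev_w = 50
xs = np.linspace(-25.0, 25.0, bev_w)          # lateral, metres
ys = np.linspace(1.0, 51.0, bev_h)            # longitudinal, metres
gx, gy = np.meshgrid(xs, ys)                  # (bev_h, bev_w)
ref = np.stack([gx, gy, np.zeros_like(gx), np.ones_like(gx)], axis=-1).reshape(-1, 4)

# Toy front camera: intrinsics K and ego->camera extrinsics T (assumed values).
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
T = np.eye(4)
T[:3, :3] = np.array([[1.0, 0.0,  0.0],       # cam x = ego x (right)
                      [0.0, 0.0, -1.0],       # cam y = -ego z (down)
                      [0.0, 1.0,  0.0]])      # cam z = ego y (forward)
T[1, 3] = 1.5                                 # camera mounted 1.5 m above ground (assumed)

cam = ref @ T.T                               # ego -> camera coordinates, (N, 4)
depth = cam[:, 2]
uvw = cam[:, :3] @ K.T                        # pinhole projection
uv = uvw[:, :2] / np.clip(uvw[:, 2:3], 1e-5, None)

# Assumed single-camera feature map of shape (C, H, W).
C, H, W = 64, 480, 640
img_feat = np.random.rand(C, H, W).astype(np.float32)
valid = (depth > 0.1) & (uv[:, 0] >= 0) & (uv[:, 0] < W) & (uv[:, 1] >= 0) & (uv[:, 1] < H)

# Nearest-neighbour sampling stands in for bilinear/deformable sampling here.
bev_feat = np.zeros((bev_h * bev_w, C), dtype=np.float32)
u = uv[valid, 0].astype(int)
v = uv[valid, 1].astype(int)
bev_feat[valid] = img_feat[:, v, u].T
bev_feat = bev_feat.reshape(bev_h, bev_w, C)
print(bev_feat.shape, int(valid.sum()), "BEV cells hit by this camera")
```

In the full model this sampling feeds multi-camera deformable attention with several pillar heights per query plus temporal self-attention over the previous BEV, but the projection geometry above is what makes the 2D-to-BEV lifting well-posed.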
BEVFormerV2
Title: BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision
Paper: https://arxiv.org/abs/2211.10439
Code: https://github.com/zhiqi-li/BEVFormer
BEVFusion
Title: BEVFusion: A Simple and Robust LiDAR-Camera Fusion Framework
Paper: https://arxiv.org/abs/2205.13790
Code: https://github.com/ADLab-AutoDrive/BEVFusion
BEV-LaneDet
Title: BEV-LaneDet: a Simple and Effective 3D Lane Detection Baseline
Paper: https://arxiv.org/abs/2210.06006
Code: https://github.com/gigo-team/bev_lane_det
BEVPlace
Title: BEVPlace: Learning LiDAR-based Place Recognition using Bird's Eye View Images
Paper: https://arxiv.org/abs/2302.14325
Code: https://github.com/zjuluolun/BEVPlace
BEVSimDet
Title: BEVSimDet: Simulated Multi-modal Distillation in Bird's-Eye View for Multi-view 3D Object Detection
Paper: https://arxiv.org/abs/2303.16818
Code: https://github.com/ViTAE-Transformer/BEVSimDet
BEVStereo
Title: BEVStereo: Enhancing Depth Estimation in Multi-view 3D Object Detection with Dynamic Temporal Stereo
Paper: https://arxiv.org/abs/2209.10248
Code: https://github.com/Megvii-BaseDetection/BEVStereo
Cam2BEV
Title: A Sim2Real Deep Learning Approach for the Transformation of Images from Multiple Vehicle-Mounted Cameras to a Semantically Segmented Image in Bird's Eye View
Paper: https://arxiv.org/abs/2005.04078
Code: https://github.com/ika-rwth-aachen/Cam2BEV
DeepIPC
Title: DeepIPC: Deeply Integrated Perception and Control for an Autonomous Vehicle in Real Environments
Paper: https://arxiv.org/abs/2207.09934
Code: https://github.com/oskarnatan/DeepIPC
DETR3D
Title: DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries
Paper: https://arxiv.org/abs/2110.06922
Code: https://github.com/WangYueFt/detr3d
Fast-BEV
Title: Fast-BEV: Towards Real-time On-vehicle Bird's-Eye View Perception
Paper: https://arxiv.org/abs/2301.07870
Code: https://github.com/Sense-GVT/Fast-BEV
FIERY
Title: FIERY: Future Instance Prediction in Bird's-Eye View from Surround Monocular Cameras
Paper: https://arxiv.org/abs/2104.10490
Code: https://github.com/wayveai/fiery
GKT-BEV
Title: Efficient and Robust 2D-to-BEV Representation Learning via Geometry-guided Kernel Transformer
Paper: https://arxiv.org/abs/2206.04584
Code: https://github.com/hustvl/GKT
HoPMV3D
Title: Temporal Enhanced Training of Multi-view 3D Object Detector via Historical Object Prediction
Paper: https://arxiv.org/abs/2304.00967
Code: https://github.com/Sense-X/HoP
Img2Maps
Title: Translating Images into Maps
Paper: https://arxiv.org/abs/2110.00966
Code: https://github.com/avishkarsaha/translating-images-into-maps
LaRa-BEV
Title: LaRa: Latents and Rays for Multi-Camera Bird's-Eye-View Semantic Segmentation
Paper: https://arxiv.org/abs/2206.13294
Code: https://github.com/valeoai/LaRa
LSS
Title: Lift, Splat, Shoot: Encoding Images From Arbitrary Camera Rigs by Implicitly Unprojecting to 3D
Paper: https://arxiv.org/abs/2008.05711
Code: https://nv-tlabs.github.io/lift-splat-shoot
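Lift-Splat-Shoot, and the BEVDet/BEVDepth line of work that builds on it, lifts each image feature along a predicted categorical depth distribution and then sum-pools the resulting frustum features onto a BEV grid. Below is a minimal single-camera NumPy sketch of that lift-and-splat step; the shapes, depth range, pinhole intrinsics, and grid resolution are assumptions for illustration, not any repository's actual configuration.

```python
import numpy as np

# Assumed feature-map and depth-bin sizes; real models use much larger ones.
C, H, W = 8, 16, 32                         # channels, feature height, feature width
D = 24                                      # number of discrete depth bins
depth_bins = np.linspace(2.0, 50.0, D)      # metric depth of each bin (assumed range)

feats = np.random.rand(C, H, W).astype(np.float32)     # per-pixel image features
logits = np.random.rand(D, H, W).astype(np.float32)    # per-pixel depth logits
depth_prob = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)

# "Lift": outer product of features and depth distribution -> frustum (C, D, H, W).
frustum = feats[:, None] * depth_prob[None]

# Back-project each pixel column at every candidate depth with a toy pinhole
# model (assumed fx, cx); the vertical axis is dropped since we pool onto a flat BEV.
fx, cx = 20.0, W / 2.0
u = np.meshgrid(np.arange(W), np.arange(H))[0]                  # (H, W) pixel u-coordinates
x = (u[None] - cx) / fx * depth_bins[:, None, None]             # lateral distance, (D, H, W)
z = np.broadcast_to(depth_bins[:, None, None], (D, H, W))       # forward distance, (D, H, W)

# "Splat": sum-pool frustum features into a flat BEV grid over (x, z).
bev_res, bev_size = 1.0, 64                                     # 1 m cells, 64 x 64 grid (assumed)
gx = np.clip((x / bev_res + bev_size // 2).astype(int), 0, bev_size - 1).ravel()
gz = np.clip((z / bev_res).astype(int), 0, bev_size - 1).ravel()
cell = gz * bev_size + gx                                       # linear BEV cell index per frustum cell

bev = np.zeros((C, bev_size * bev_size), dtype=np.float32)
flat = frustum.reshape(C, -1)
for c in range(C):                                              # scatter-add each channel into its BEV cell
    np.add.at(bev[c], cell, flat[c])
bev = bev.reshape(C, bev_size, bev_size)
print(bev.shape)                                                # (8, 64, 64)
```

The released implementations speed this pooling up (for example with a sort-and-cumulative-sum trick or a fused kernel) and run it over all cameras jointly, but the geometry is the same as in this sketch.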
MetaBEV
Title: MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation
Paper: https://arxiv.org/abs/2304.09801
Code: https://chongjiange.github.io/metabev.html
MotionNet
Title: MotionNet: Joint Perception and Motion Prediction for Autonomous Driving Based on Bird's Eye View Maps
Paper: https://arxiv.org/abs/2003.06754
Code: https://github.com/pxiangwu/MotionNet
PersFormer
Title: PersFormer: 3D Lane Detection via Perspective Transformer and the OpenLane Benchmark
Paper: https://arxiv.org/abs/2203.11089
Code: https://github.com/OpenPerceptionX/OpenLane
PETR
Title: PETR: Position Embedding Transformation for Multi-View 3D Object Detection
Paper: https://arxiv.org/abs/2203.05625
Code: https://github.com/megvii-research/PETR
PETRv2
Title: PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images
Paper: https://arxiv.org/abs/2206.01256
Code: https://github.com/megvii-research/PETR
PolarBEV
Title: Vision-based Uneven BEV Representation Learning with Polar Rasterization and Surface Estimation
Paper: https://arxiv.org/abs/2207.01878
Code: https://github.com/SuperZ-Liu/PolarBEV
RoboBEV
Title: RoboBEV: Towards Robust Bird's Eye View Perception under Corruptions
Paper: https://arxiv.org/abs/2304.06719
Code: https://github.com/Daniel-xsy/RoboBEV
STSU
Title: Structured Bird's-Eye-View Traffic Scene Understanding from Onboard Images
Paper: https://arxiv.org/abs/2110.01997
Code: https://github.com/ybarancan/STSU
TiG-BEV
Title: TiG-BEV: Multi-view BEV 3D Object Detection via Target Inner-Geometry Learning
Paper: https://arxiv.org/abs/2212.13979
Code: https://github.com/ADLab3Ds/TiG-BEV
TransFusion
Title: TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers
Paper: https://arxiv.org/abs/2203.11496
Code: https://github.com/XuyangBai/TransFusion
UniDistill
Title: UniDistill: A Universal Cross-Modality Knowledge Distillation Framework for 3D Object Detection in Bird's-Eye View
Paper: https://arxiv.org/abs/2303.15083
Code: https://github.com/megvii-research/CVPR2023-UniDistill
3. Occupancy Survey Papers
GridCentricFusPepADSurvey
Title: Grid-Centric Traffic Scenario Perception for Autonomous Driving: A Comprehensive Review
Paper: https://arxiv.org/abs/2303.01212
4. Occupancy Open-Source Algorithms
MonoScene
Title: MonoScene: Monocular 3D Semantic Scene Completion
Paper: https://arxiv.org/abs/2112.00726
Code: https://github.com/astra-vision/MonoScene
OccDepth
Title: OccDepth: A Depth-Aware Method for 3D Semantic Scene Completion
Paper: https://arxiv.org/abs/2302.13540
Code: https://github.com/megvii-research/OccDepth
OccFormer
Title: OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction
Paper: https://arxiv.org/abs/2304.05316
Code: https://github.com/zhangyp15/OccFormer
OpenOccupancy
Title: OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception
Paper: https://arxiv.org/abs/2303.03991
Code: https://github.com/JeffWang987/OpenOccupancy
SimpleOccupancy
Title: A Simple Attempt for 3D Occupancy Estimation in Autonomous Driving
Paper: https://arxiv.org/abs/2303.10076
Code: https://github.com/GANWANSHUI/SimpleOccupancy
StereoVoxelNet
Title: StereoVoxelNet: Real-Time Obstacle Detection Based on Occupancy Voxels from a Stereo Camera Using Deep Neural Networks
Paper: https://arxiv.org/abs/2209.08459
Code: https://github.com/RIVeR-Lab/stereovoxelnet
SurroundOcc
Title: SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving
Paper: https://arxiv.org/abs/2303.09551
Code: https://github.com/weiyithu/SurroundOcc
TPVFormer
Title: Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction
Paper: https://arxiv.org/pdf/2302.07817
Code: https://github.com/wzzheng/TPVFormer
VoxFormer
Title: VoxFormer: Sparse Voxel Transformer for Camera-based 3D Semantic Scene Completion
Paper: https://arxiv.org/abs/2302.12251
Code: https://github.com/NVlabs/VoxFormer
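Despite their very different feature extractors, the occupancy networks listed above share an output formulation: a dense 3D voxel grid in which every cell is classified as free space or as one of a set of semantic classes, trained with a per-voxel classification loss. The toy PyTorch head below illustrates only that output and loss structure; the channel count, grid size, class count, and the choice of class 0 as free space are assumptions, not the architecture of any specific paper in the list.

```python
import torch
import torch.nn as nn

class ToyOccupancyHead(nn.Module):
    """Per-voxel semantic classification over a 3D feature volume.

    A deliberately small stand-in for the decoding heads used by occupancy
    networks; channel count, grid size, and the number of classes are
    illustrative assumptions, not any paper's configuration.
    """
    def __init__(self, in_channels: int = 32, num_classes: int = 17):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv3d(in_channels, in_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(in_channels, num_classes, kernel_size=1),  # class logits per voxel
        )

    def forward(self, voxel_feats: torch.Tensor) -> torch.Tensor:
        # voxel_feats: (B, C, X, Y, Z) -> logits: (B, num_classes, X, Y, Z)
        return self.head(voxel_feats)

if __name__ == "__main__":
    B, C, X, Y, Z = 1, 32, 64, 64, 8                   # toy grid: 64 x 64 x 8 voxels (assumed)
    feats = torch.randn(B, C, X, Y, Z)                 # stand-in for lifted image features
    labels = torch.randint(0, 17, (B, X, Y, Z))        # class 0 treated as free space (assumption)
    model = ToyOccupancyHead(in_channels=C, num_classes=17)
    logits = model(feats)
    loss = nn.functional.cross_entropy(logits, labels) # per-voxel classification loss
    print(logits.shape, float(loss))
```

Real systems typically add class reweighting, masks for unobserved voxels, and auxiliary losses, and they evaluate with per-class IoU over the voxel grid rather than raw loss values.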
Summary
Earlier early-fusion and late-fusion perception pipelines are gradually being phased out. Unified algorithm frameworks built on multi-sensor fusion, multi-task learning, and temporal BEV and Occupancy representations will strongly push the large-scale deployment of L3/L4 autonomous driving, and the supporting chips, toolchains, methodologies, industry-chain relationships, and business models will change along with them.