CVPR 2025 | Autonomous Driving Paper Roundup

3D视觉工坊 (3D Vision Workshop)

Published 2025-05-11 00:04:59

Original link: https://mp.weixin.qq.com/s?__biz=MzU1MjY4MTA1MQ==&mid=2247727188&idx=3&sn=2335e2e73bd8424a15b1ef7a3fe4158c&chksm=fa2131b2f589ba70a8e6a40098d6755acfd06efb8269623219382ca7e9a633c359a0f0e46321&scene=126&sessionid=0

Editor: 3D视觉工坊 (3D Vision Workshop)

Source: https://zhuanlan.zhihu.com/p/1903225674010976913

Preface

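This article summarizes autonomous driving papers from CVPR 2025, 50 papers in total, and is intended as a reference for research and development.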

1. AD / Autonomous Driving

SplatAD

Title: SplatAD: Real-Time Lidar and Camera Rendering with 3D Gaussian Splatting for Autonomous Driving

Paper: https://arxiv.org/abs/2411.16816

Code: https://github.com/carlinds/splatad

Affiliation: Zenseact (Sweden), Chalmers University of Technology (Sweden)

Venue: CVPR 2025

OmniDrive

Title: OmniDrive: A Holistic LLM-Agent Framework for Autonomous Driving with 3D Perception, Reasoning and Planning

Paper: https://arxiv.org/abs/2405.01533

Code: https://github.com/NVlabs/OmniDrive

Affiliation: Beijing Institute of Technology, NVIDIA, Huazhong University of Science and Technology

Venue: CVPR 2025

CityWalker

Title: CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos

Paper: https://arxiv.org/abs/2411.17820

Code: https://github.com/ai4ce/CityWalker

Affiliation: New York University

Venue: CVPR 2025

CarPlanner

Title: CarPlanner: Consistent Auto-regressive Trajectory Planning for Large-scale Reinforcement Learning in Autonomous Driving

Paper: https://arxiv.org/abs/2502.19908

Code:

Affiliation: Zhejiang University, Cainiao Network

Venue: CVPR 2025

UniScene

Title: UniScene: Unified Occupancy-centric Driving Scene Generation

Paper: https://arxiv.org/abs/2412.05435

Code:

Affiliation: Shanghai Jiao Tong University, Eastern Institute of Technology (Ningbo), Tsinghua University, Megvii

Venue: CVPR 2025

DepthCrafter

Title: DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos

Paper: https://arxiv.org/abs/2409.02095

Code: https://github.com/Tencent/DepthCrafter

Affiliation: Tencent AI Lab, HKUST, Tencent ARC Lab

Venue: CVPR 2025

LiMoE

Title: LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes

Paper: https://arxiv.org/abs/2501.04004

Code: https://github.com/Xiangxu-0103/LiMoE

Affiliation: Nanjing University, National University of Singapore, Shanghai AI Lab

Venue: CVPR 2025

MonoTAKD

Title: MonoTAKD: Teaching Assistant Knowledge Distillation for Monocular 3D Object Detection

Paper: https://arxiv.org/abs/2404.04910

Code: https://github.com/hoiliu-0801/MonoTAKD

Affiliation: National Yang Ming Chiao Tung University, University of Washington

Venue: CVPR 2025

DiffusionDrive

Title: DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving

Paper: https://arxiv.org/abs/2411.15139

Code: https://github.com/hustvl/DiffusionDrive

Affiliation: Huazhong University of Science and Technology, Horizon Robotics

Venue: CVPR 2025

LLMDet

Title: LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models

Paper: https://arxiv.org/abs/2501.18954

Code: https://github.com/iSEE-Laboratory/LLMDet

Affiliation:

Venue: CVPR 2025

LSceneLLM

Title: LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences

Paper: https://arxiv.org/abs/2412.01292

Code:

Affiliation: South China University of Technology, Tencent Robotics X Lab, Northeastern University

Venue: CVPR 2025

CDSegNet

Title: An End-to-End Robust Point Cloud Semantic Segmentation Network with Single-Step Conditional Diffusion Models

Paper: https://arxiv.org/abs/2411.16308

Code: https://github.com/QWTforGithub/CDSegNet

Affiliation: Nanjing University of Science and Technology, Tsinghua University, Shandong University, Shanghai Jiao Tong University

Venue: CVPR 2025

V2X-R

Title: V2X-R: Cooperative LiDAR-4D Radar Fusion for 3D Object Detection with Denoising Diffusion

Paper: https://arxiv.org/abs/2411.08402

Code: https://github.com/ylwhxht/V2X-R

Affiliation: Xiamen University, ZongMu Technology, Shanghai Jiao Tong University

Venue: CVPR 2025

MomAD

Title: Don't Shake the Wheel: Momentum-Aware Planning in End-to-End Autonomous Driving

Paper: https://arxiv.org/abs/2503.03125

Code: https://github.com/adept-thu/MomAD

Affiliation: Beijing Jiaotong University, Horizon Robotics, Tsinghua University

Venue: CVPR 2025

FlexDrive

Title: FlexDrive: Toward Trajectory Flexibility in Driving Scene Reconstruction and Rendering

Paper: https://arxiv.org/abs/2502.21093

Code:

Affiliation: CUHK, Institute of Automation (Chinese Academy of Sciences), Beihang University

Venue: CVPR 2025

DriveScape

Title: DriveScape: Towards High-Resolution Controllable Multi-View Driving Video Generation

Paper: https://arxiv.org/abs/2409.05463

Code:

Affiliation: SenseTime, Northeastern University

Venue: CVPR 2025

SplatFlow

Title: SplatFlow: Self-Supervised Dynamic Gaussian Splatting in Neural Motion Flow Field for Autonomous Driving

Paper: https://arxiv.org/abs/2411.15482

Code:

Affiliation: Purdue University, Microsoft

Venue: CVPR 2025

T2SG

Title: T2SG: Traffic Topology Scene Graph for Topology Reasoning in Autonomous Driving

Paper: https://arxiv.org/abs/2411.18894

Code:

Affiliation: Beijing University of Posts and Telecommunications

Venue: CVPR 2025

GoalFlow

Title: GoalFlow: Goal-Driven Flow Matching for Multimodal Trajectories Generation in End-to-End Autonomous Driving

Paper: https://arxiv.org/abs/2503.05689

Code: https://github.com/YvanYin/GoalFlow

Affiliation: University of Chinese Academy of Sciences, Horizon Robotics, Nanjing University, Huazhong University of Science and Technology, Shanghai AI Lab

Venue: CVPR 2025

VisionPAD

Title: VisionPAD: A Vision-Centric Pre-training Paradigm for Autonomous Driving

Paper: https://arxiv.org/abs/2411.14716

Code:

Affiliation: Future Network of Intelligence Institute (FNii, Shenzhen), CUHK-Shenzhen, HKUST, Huawei Noah's Ark Lab

Venue: CVPR 2025

DiMA

Title: Distilling Multi-modal Large Language Models for Autonomous Driving

Paper: https://arxiv.org/abs/2501.09757

Code:

Affiliation: Johns Hopkins University, Qualcomm AI Research

Venue: CVPR 2025

ReconDreamer

Title: ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration

Paper: https://arxiv.org/abs/2411.19548

Code: https://github.com/GigaAI-research/ReconDreamer

Affiliation: GigaAI, Peking University, Li Auto, Institute of Automation (Chinese Academy of Sciences)

Venue: CVPR 2025

StreetCrafter

Title: StreetCrafter: Street View Synthesis with Controllable Video Diffusion Models

Paper: https://arxiv.org/abs/2412.13188

Code: https://github.com/zju3dv/street_crafter

Affiliation: Zhejiang University, Li Auto, Cornell University

Venue: CVPR 2025

DriveDreamer4D

Title: DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation

Paper: https://arxiv.org/abs/2410.13571

Code: https://github.com/GigaAI-research/DriveDreamer4D

Affiliation: GigaAI, Institute of Automation (Chinese Academy of Sciences), Li Auto, Peking University, TUM

Venue: CVPR 2025

DrivingSphere

Title: DrivingSphere: Building a High-fidelity 4D World for Closed-loop Simulation

Paper: https://arxiv.org/abs/2411.11252

Code (project page): https://yanty123.github.io/DrivingSphere/

Affiliation: University of Macau, Li Auto, Beijing Institute of Technology

Venue: CVPR 2025

UniVAD

Title: UniVAD: A Training-free Unified Model for Few-shot Visual Anomaly Detection

Paper: https://arxiv.org/abs/2412.03342

Code: https://github.com/FantasticGNU/UniVAD

Affiliation: Institute of Automation (Chinese Academy of Sciences)

Venue: CVPR 2025

2. E2E / End-to-End

GoalFlow

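Title: GoalFlow: Goal-Driven Flow Matching for Multimodal Trajectories Generation in End-to-End Autonomous Driving

Paper: https://arxiv.org/abs/2503.05689

Code: https://github.com/YvanYin/GoalFlow

Affiliation: University of Chinese Academy of Sciences, Horizon Robotics, Nanjing University, Huazhong University of Science and Technology, Shanghai AI Lab

Venue: CVPR 2025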

Don't Shake the Wheel (MomAD)

Title: Don't Shake the Wheel: Momentum-Aware Planning in End-to-End Autonomous Driving

Paper: https://arxiv.org/abs/2503.03125

Code: https://github.com/adept-thu/MomAD

Affiliation: Beijing Jiaotong University, Horizon Robotics

Venue: CVPR 2025

3. BEV / Bird's-Eye View

BEVDiffuser

Title: BEVDiffuser: Plug-and-Play Diffusion Model for BEV Denoising with Ground-Truth Guidance

Paper: https://arxiv.org/abs/2502.19694

Code:

Affiliation: Bosch North America, Bosch AI

Venue: CVPR 2025

ForestLPR

Title: ForestLPR: LiDAR Place Recognition in Forests Attentioning Multiple BEV Density Images

Paper: https://arxiv.org/abs/2503.04475

Code:

Affiliation: Shanghai Jiao Tong University

Venue: CVPR 2025

CorrBEV

Title: CorrBEV: Multi-View 3D Object Detection by Correlation Learning with Multi-modal Prototypes

Paper: https://cvpr.thecvf.com/virtual/2025/poster/34617

Code:

Affiliation:

Venue: CVPR 2025

4. Det / Detection

PO3AD

Title: PO3AD: Predicting Point Offsets toward Better 3D Point Cloud Anomaly Detection

Paper: https://arxiv.org/abs/2412.12617

Code:

Affiliation:

Venue: CVPR 2025

SearchDetect

Title: Search and Detect: Training-Free Long Tail Object Detection via Web-Image Retrieval

Paper: https://arxiv.org/abs/2409.18733

Code:

Affiliation:

Venue: CVPR 2025

MonoTAKD

Title: MonoTAKD: Teaching Assistant Knowledge Distillation for Monocular 3D Object Detection

Paper: https://arxiv.org/abs/2404.04910

Code: https://github.com/hoiliu-0801/MonoTAKD

Affiliation: National Yang Ming Chiao Tung University, University of Washington

Venue: CVPR 2025

5. Lane / Lane Detection

GLane3D

Title: GLane3D: Detecting Lanes with Graph of 3D Keypoints

Paper: https://cvpr.thecvf.com/virtual/2025/poster/33089

Code:

Affiliation:

Venue: CVPR 2025 Poster

6. Tracking

MambaVLT

Title: MambaVLT: Time-Evolving Multimodal State Space Model for Vision-Language Tracking

Paper: https://arxiv.org/abs/2411.15459

Code:

Affiliation: Harbin Institute of Technology (Shenzhen), Peng Cheng Laboratory (Shenzhen)

Venue: CVPR 2025

MITracker

Title: MITracker: Multi-View Integration for Visual Object Tracking

Paper: https://arxiv.org/abs/2502.20111

Code: https://github.com/XuM007/MITracker

Affiliation: ShanghaiTech University, Shanghai Jiao Tong University

Venue: CVPR 2025

GRAE-3DMOT

Title: GRAE-3DMOT: Geometry Relation-Aware Encoder for Online 3D Multi-Object Tracking

Paper: https://cvpr.thecvf.com/virtual/2025/poster/35168

Code:

Affiliation:

Venue: CVPR 2025

7. OCC / Occupancy

OccMamba

Title: OccMamba: Semantic Occupancy Prediction with State Space Models

Paper: https://arxiv.org/abs/2408.09859

Code:

Affiliation: University of Science and Technology of China, Shanghai AI Lab, Stanford University

Venue: CVPR 2025

GaussianWorld

Title: GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction

Paper: https://arxiv.org/abs/2412.10373

Code: https://github.com/zuosc19/GaussianWorld

Affiliation: Tsinghua University

Venue: CVPR 2025

GaussianFormer-2

Title: GaussianFormer-2: Probabilistic Gaussian Superposition for Efficient 3D Occupancy Prediction

Paper: https://arxiv.org/abs/2412.04384

Code: https://github.com/huang-yh/GaussianFormer

Affiliation: Tsinghua University, PhiGent Robotics

Venue: CVPR 2025

VoxelSplat

Title: VoxelSplat: Dynamic Gaussian Splatting as an Effective Loss for Occupancy and Flow Prediction

Paper: https://cvpr.thecvf.com/virtual/2025/poster/33444

Code:

Affiliation:

Venue: CVPR 2025 Poster

8. MAP / Mapping

InteractionMap

Title: InteractionMap: Improving Online Vectorized HDMap Construction with Interaction

Paper: https://cvpr.thecvf.com/virtual/2025/poster/34320

Code:

Affiliation:

Venue: CVPR 2025

DrivingByTheRules

Title: Driving by the Rules: A Benchmark for Integrating Traffic Sign Regulations into Vectorized HD Map

Paper: https://arxiv.org/abs/2410.23780

Code:

Affiliation: Xi'an Jiaotong University, Alibaba

Venue: CVPR 2025

9. Fusion

V2X-R

Title: V2X-R: Cooperative LiDAR-4D Radar Fusion for 3D Object Detection with Denoising Diffusion

Paper: https://arxiv.org/abs/2411.08402

Code: https://github.com/ylwhxht/V2X-R

Affiliation: Xiamen University, Shanghai Jiao Tong University

Venue: CVPR 2025

RICCARDO

Title: RICCARDO: Radar Hits Prediction and Convolution for Target Detection with Radar-Camera

Paper: https://cvpr.thecvf.com/virtual/2025/poster/33054

Code:

Affiliation:

Venue: CVPR 2025

10. MTL / Multi-Task Learning

TADFormer

Title: TADFormer: Task-Adaptive Dynamic Transformer for Efficient Multi-Task Learning

Paper: https://arxiv.org/abs/2501.04293

Code:

Affiliation: Seoul private university

Venue: CVPR 2025

11. PnC / Planning and Control

SceneTAP

Title: SceneTAP: Scene-Coherent Typographic Adversarial Planner against Vision-Language Models in Real-World Environments

Paper: https://arxiv.org/abs/2412.00114

Code:

Affiliation: Nanyang Technological University, University of Alberta, Tianjin University

Venue: CVPR 2025

STVR-SSMP

Title: Spatial-Temporal Visual Representation for Self-Supervised Motion Planning

Paper: https://cvpr.thecvf.com/virtual/2025/poster/32619

Code:

Affiliation:

Venue: CVPR 2025

DexDiffuser

Title: DexDiffuser: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation

Paper: https://www.researchgate.net/publication/386211316_DexDiffuser_Interaction-aware_Diffusion_Planning_for_Adaptive_Dexterous_Manipulation

Code:

Affiliation:

Venue: CVPR 2025

12. Calib / Calibration

RC-AutoCalib

Title: RC-AutoCalib: An End-to-End Radar-Camera Automatic Calibration Network

Paper: https://cvpr.thecvf.com/virtual/2025/poster/35011

Code:

Affiliation:

Venue: CVPR 2025

This article is shared for academic purposes only. In case of infringement, please contact us and it will be removed.
