A Comprehensive Roundup: Which Autonomous Driving Directions at CVPR 2024 Are Worth Watching?

Latest recommended article published on 2024-09-17 21:28:52

自动驾驶之心

Tags: autonomous driving, artificial intelligence, machine learning

Original link: https://mp.weixin.qq.com/s?__biz=Mzg2NzUxNTU1OA==&mid=2247594844&idx=1&sn=9e9043b6531b439318036e823c128104&chksm=cfdc0bdc9a41e68d3ba5a3b2b8c33d6a2ca8e592e0ccbd38351f6d031cd0ccdb7dd70a4b24e9&scene=126&sessionid=0

Editor | 自动驾驶之心

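Papers from CVPR 2024 are being released one after another, and 自动驾驶Daily has been following them closely. Today we round up notable work from the conference, covering end-to-end autonomous driving, large language models, Occupancy, SLAM, lane detection, 3D detection, cooperative perception, point cloud processing, MOT, millimeter-wave radar, NeRF, Gaussian Splatting, and other directions.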

We also recommend our CVPR 2024 repository: https://github.com/autodriving-heart/CVPR-2024-Papers-Autonomous-Driving. Feel free to bookmark and star it to keep up with the latest content.

1) End to End | End-to-End Autonomous Driving

Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving?

Paper: https://arxiv.org/pdf/2312.03031.pdf
Code: https://github.com/NVlabs/BEV-Planner

Visual Point Cloud Forecasting enables Scalable Autonomous Driving

Paper: https://arxiv.org/pdf/2312.17655.pdf
Code: https://github.com/OpenDriveLab/ViDAR

PlanKD: Compressing End-to-End Motion Planner for Autonomous Driving

Paper: https://arxiv.org/pdf/2403.01238.pdf
Code: https://github.com/tulerfeng/PlanKD

VLP: Vision Language Planning for Autonomous Driving

Paper: https://arxiv.org/abs/2401.05577

2) LLM Agent | Large Language Model Agents

ChatSim: Editable Scene Simulation for Autonomous Driving via LLM-Agent Collaboration

Paper: https://arxiv.org/pdf/2402.05746.pdf
Code: https://github.com/yifanlu0227/ChatSim

LMDrive: Closed-Loop End-to-End Driving with Large Language Models

Paper: https://arxiv.org/pdf/2312.07488.pdf
Code: https://github.com/opendilab/LMDrive

MAPLM: A Real-World Large-Scale Vision-Language Dataset for Map and Traffic Scene Understanding

Code: https://github.com/LLVM-AD/MAPLM

One Prompt Word is Enough to Boost Adversarial Robustness for Pre-trained Vision-Language Models

Paper: https://arxiv.org/pdf/2403.01849.pdf
Code: https://github.com/TreeLLi/APT

PromptKD: Unsupervised Prompt Distillation for Vision-Language Models

Paper: https://arxiv.org/pdf/2403.02781

RegionGPT: Towards Region Understanding Vision Language Model

Paper: https://arxiv.org/pdf/2403.02330

3) SSC: Semantic Scene Completion

Symphonize 3D Semantic Scene Completion with Contextual Instance Queries

Paper: https://arxiv.org/pdf/2306.15670.pdf
Code: https://github.com/hustvl/Symphonies

PaSCo: Urban 3D Panoptic Scene Completion with Uncertainty Awareness

Paper: https://arxiv.org/pdf/2312.02158.pdf
Code: https://github.com/astra-vision/PaSCo

4) OCC: Occupancy Prediction | Occupancy Perception

SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction

Paper: https://arxiv.org/pdf/2311.12754.pdf
Code: https://github.com/huang-yh/SelfOcc

Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications

Paper: https://arxiv.org/pdf/2311.17663.pdf
Code: https://github.com/haomo-ai/Cam4DOcc

PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation

Paper: https://arxiv.org/pdf/2306.10013.pdf
Code: https://github.com/Robertwyq/PanoOcc

5) Lane Detection

Lane2Seq: Towards Unified Lane Detection via Sequence Generation

Paper: https://arxiv.org/abs/2402.17172

6) Pre-training

UniPAD: A Universal Pre-training Paradigm for Autonomous Driving

Paper: https://arxiv.org/pdf/2310.08370.pdf
Code: https://github.com/Nightmare-n/UniPAD

7) AIGC | AI-Generated Content

Panacea: Panoramic and Controllable Video Generation for Autonomous Driving

Paper: https://arxiv.org/pdf/2311.16813.pdf
Code: https://github.com/wenyuqing/panacea

SemCity: Semantic Scene Generation with Triplane Diffusion

Paper:
Code: https://github.com/zoomin-lee/SemCity

BerfScene: Bev-conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation

Paper: https://arxiv.org/pdf/2312.02136.pdf
Code: https://github.com/zqh0253/BerfScene

8) 3D Object Detection

PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection

Paper: https://arxiv.org/pdf/2312.08371.pdf
Code: https://github.com/KuanchihHuang/PTT

VSRD: Instance-Aware Volumetric Silhouette Rendering for Weakly Supervised 3D Object Detection

Code: https://github.com/skmhrk1209/VSRD

CaKDP: Category-aware Knowledge Distillation and Pruning Framework for Lightweight 3D Object Detection

Code: https://github.com/zhnxjtu/CaKDP

CN-RMA: Combined Network with Ray Marching Aggregation for 3D Indoors Object Detection from Multi-view Images

Paper: https://arxiv.org/abs/2403.04198
Code: https://github.com/SerCharles/CN-RMA

UniMODE: Unified Monocular 3D Object Detection

Paper: https://arxiv.org/abs/2402.18573

Enhancing 3D Object Detection with 2D Detection-Guided Query Anchors

Paper: https://arxiv.org/abs/2403.06093
Code: https://github.com/nullmax-vision/QAF2D

SAFDNet: A Simple and Effective Network for Fully Sparse 3D Object Detection

Paper: https://arxiv.org/abs/2403.05817
Code: https://github.com/zhanggang001/HEDNet

RadarDistill: Boosting Radar-based Object Detection Performance via Knowledge Distillation from LiDAR Features

Paper: https://arxiv.org/pdf/2403.05061

9) Stereo Matching | Binocular Stereo Matching

MoCha-Stereo: Motif Channel Attention Network for Stereo Matching

Code: https://github.com/ZYangChen/MoCha-Stereo

Learning Intra-view and Cross-view Geometric Knowledge for Stereo Matching

Paper: https://arxiv.org/abs/2402.19270
Code: https://github.com/DFSDDDDD1199/ICGNet

Selective-Stereo: Adaptive Frequency Information Selection for Stereo Matching

Paper: https://arxiv.org/abs/2403.00486
Code: https://github.com/Windsrain/Selective-Stereo

10) Cooperative Perception

RCooper: A Real-world Large-scale Dataset for Roadside Cooperative Perception

Code: https://github.com/ryhnhao/RCooper

11) SLAM

SNI-SLAM: Semantic Neural Implicit SLAM

Paper: https://arxiv.org/pdf/2311.11016.pdf

CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition

Paper: https://arxiv.org/abs/2402.19231
Code: https://github.com/Lu-Feng/CricaVPR

12) Scene Flow Estimation

DifFlow3D: Toward Robust Uncertainty-Aware Scene Flow Estimation with Iterative Diffusion-Based Refinement

Paper: https://arxiv.org/pdf/2311.17456.pdf
Code: https://github.com/IRMVLab/DifFlow3D

3DSFLabelling: Boosting 3D Scene Flow Estimation by Pseudo Auto-labelling

Paper: https://arxiv.org/pdf/2402.18146.pdf
Code: https://github.com/jiangchaokang/3DSFLabelling

Regularizing Self-supervised 3D Scene Flows with Surface Awareness and Cyclic Consistency

Paper: https://arxiv.org/pdf/2312.08879.pdf
Code: https://github.com/vacany/sac-flow

13) Point Cloud

Point Transformer V3: Simpler, Faster, Stronger

Paper: https://arxiv.org/pdf/2312.10035.pdf
Code: https://github.com/Pointcept/PointTransformerV3

Rethinking Few-shot 3D Point Cloud Semantic Segmentation

Paper: https://arxiv.org/pdf/2403.00592.pdf
Code: https://github.com/ZhaochongAn/COSeg

PDF: A Probability-Driven Framework for Open World 3D Point Cloud Semantic Segmentation

Code: https://github.com/JinfengX/PointCloudPDF

14) Efficient Network

Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications

Paper: https://arxiv.org/pdf/2401.06197.pdf

RepViT: Revisiting Mobile CNN From ViT Perspective

Paper: https://arxiv.org/pdf/2307.09283.pdf
Code: https://github.com/THU-MIG/RepViT

15) Segmentation

OMG-Seg: Is One Model Good Enough For All Segmentation?

Paper: https://arxiv.org/pdf/2401.10229.pdf
Code: https://github.com/lxtGH/OMG-Seg

Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation

Paper: https://arxiv.org/pdf/2312.04265.pdf
Code: https://github.com/w1oves/Rein

SAM-6D: Segment Anything Model Meets Zero-Shot 6D Object Pose Estimation

Paper: https://arxiv.org/abs/2311.15707

SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation

Paper: https://arxiv.org/abs/2311.15537

Style Blind Domain Generalized Semantic Segmentation via Covariance Alignment and Semantic Consistence Contrastive Learning

Paper: https://arxiv.org/abs/2403.06122

16) Radar | Millimeter-Wave Radar

DART: Doppler-Aided Radar Tomography

Code: https://github.com/thetianshuhuang/dart

17) NeRF and Gaussian Splatting

Dynamic LiDAR Re-simulation using Compositional Neural Fields

Paper: https://arxiv.org/pdf/2312.05247.pdf
Code: https://github.com/prs-eth/Dynamic-LiDAR-Resimulation

GSNeRF: Generalizable Semantic Neural Radiance Fields with Enhanced 3D Scene Understanding

Paper: https://arxiv.org/abs/2403.03608

NARUTO: Neural Active Reconstruction from Uncertain Target Observations

Paper: https://arxiv.org/abs/2402.18771

DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization

Paper: https://arxiv.org/abs/2403.06912

S-DyRF: Reference-Based Stylized Radiance Fields for Dynamic Scenes

Paper: https://arxiv.org/pdf/2403.06205

SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting

Paper: https://arxiv.org/pdf/2403.05087

DaReNeRF: Direction-aware Representation for Dynamic Scenes

Paper: https://arxiv.org/pdf/2403.02265

18) MOT: Multi-object Tracking

Delving into the Trajectory Long-tail Distribution for Muti-object Tracking

Code: https://github.com/chen-si-jia/Trajectory-Long-tail-Distribution-for-MOT

DeconfuseTrack: Dealing with Confusion for Multi-Object Tracking

Paper: https://arxiv.org/abs/2403.02767

19) Multi-label Atomic Activity Recognition

Action-slot: Visual Action-centric Representations for Multi-label Atomic Activity Recognition in Traffic Scenes

Paper: https://arxiv.org/pdf/2311.17948.pdf
Code: https://github.com/HCIS-Lab/Action-slot

20) Motion Prediction

SmartRefine: A Scenario-Adaptive Refinement Framework for Efficient Motion Prediction

Code: https://github.com/opendilab/SmartRefine

21) Convolutional Networks

CAM Back Again: Large Kernel CNNs from a Weakly Supervised Object Localization Perspective

Paper: https://arxiv.org/abs/2403.06676
Code: https://github.com/snskysk/CAM-Back-Again

The contributing author is an invited guest of the 自动驾驶之心 Knowledge Planet community.