Motivation
In this paper we develop a one-stage detector and forecaster that exploits 3D point clouds produced by a LiDAR sensor together with dynamic HD maps containing semantic elements such as lanes, intersections, and traffic lights.
Our IntentNet is a fully-convolutional neural network that outputs three types of variables in a single forward pass: detection scores for the vehicle and background classes, high-level action probabilities corresponding to discrete intentions, and bounding box regressions for the current and future time steps that represent the intended trajectory.
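The three output heads can be sketched as 1x1 convolutions applied to a shared BEV feature map, so all three are produced in one forward pass. This is a minimal numpy sketch; the sizes (H, W, C, K, T) and the tiny `conv1x1` helper are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

# Hypothetical sizes (assumptions, not from the paper): BEV grid H x W,
# C backbone channels, K discrete intentions, T future time steps.
H, W, C, K, T = 64, 64, 32, 8, 10

rng = np.random.default_rng(0)
features = rng.standard_normal((C, H, W))  # shared backbone feature map

def conv1x1(x, out_ch, rng):
    """A 1x1 convolution: a per-cell linear map over the channel dimension."""
    w = rng.standard_normal((out_ch, x.shape[0])) * 0.01
    return np.einsum('oc,chw->ohw', w, x)

# Three heads computed over the same features in a single pass:
det_scores = conv1x1(features, 2, rng)             # vehicle vs. background logits per cell
intent_logits = conv1x1(features, K, rng)          # discrete-intention logits per cell
box_regression = conv1x1(features, (T + 1) * 4, rng)  # box params for current + T future steps
```

Each BEV cell thus carries a detection score, an intention distribution, and a trajectory of boxes, which is what makes the single-pass formulation possible.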
IntentNet is inspired by FaF [4], which performs joint detection and future prediction.
Advantages
(i) a more suitable architecture based on an early fusion of a larger number of previous LiDAR sweeps,
(ii) a parametrization of the map that allows our model to understand traffic constraints for all vehicles at once and
(iii) an improved loss function that includes a temporal discount factor to
account for the inherent ambiguity of the future.
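Point (iii) can be sketched as a per-step weighting of the future regression loss. This is an illustrative assumption: the discount `gamma` and the L1 base loss here are placeholders, not the paper's exact formulation.

```python
import numpy as np

def discounted_future_loss(pred, target, gamma=0.97):
    """pred, target: (T, D) arrays of box regressions over T future steps.
    Step t is weighted by gamma**t, so near-future errors count more than
    far-future ones, reflecting the growing ambiguity of the future."""
    T = pred.shape[0]
    weights = gamma ** np.arange(T)
    per_step = np.abs(pred - target).reshape(T, -1).mean(axis=1)  # L1 per step
    return float((weights * per_step).sum() / weights.sum())

# With a uniform per-step error of 1, the weighted average is also 1:
loss = discounted_future_loss(np.zeros((5, 4)), np.ones((5, 4)))
```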
Input parametrization
3D point cloud: we represent point clouds in bird’s eye view (BEV) as a 3D tensor, treating height as our channel dimension.
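A minimal sketch of this BEV parametrization: discretize the point cloud onto a ground-plane grid and bin height into channels, giving a 3D occupancy tensor. The ranges and resolutions below are illustrative assumptions, not the paper's values.

```python
import numpy as np

def points_to_bev(points, x_range=(0.0, 70.4), y_range=(-40.0, 40.0),
                  z_range=(-3.0, 1.0), res=0.2, z_res=0.5):
    """points: (N, 3) array of x, y, z. Returns a (Z, H, W) binary occupancy
    tensor in bird's eye view, with height bins as the channel dimension."""
    H = int((x_range[1] - x_range[0]) / res)
    W = int((y_range[1] - y_range[0]) / res)
    Z = int((z_range[1] - z_range[0]) / z_res)
    bev = np.zeros((Z, H, W), dtype=np.float32)
    xi = ((points[:, 0] - x_range[0]) / res).astype(int)
    yi = ((points[:, 1] - y_range[0]) / res).astype(int)
    zi = ((points[:, 2] - z_range[0]) / z_res).astype(int)
    # Keep only points that fall inside the grid:
    valid = (xi >= 0) & (xi < H) & (yi >= 0) & (yi < W) & (zi >= 0) & (zi < Z)
    bev[zi[valid], xi[valid], yi[valid]] = 1.0
    return bev

pts = np.array([[10.0, 0.0, -1.0],    # inside the grid
                [200.0, 0.0, 0.0]])   # out of range, dropped
bev = points_to_bev(pts)
```

Sweeps from previous time steps can be voxelized the same way and concatenated along the channel dimension for the early fusion mentioned above.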
Dynamic maps:
We form a BEV representation of our maps by rasterization.
We represent each semantic component in the map with a binary mask (values +1 or -1). Roads and intersections are represented as filled polygons covering the whole drivable surface. Lane boundaries are parametrized as poly-lines representing the left and right boundaries of lane segments.
In total, there are 17 binary masks used as map features, resulting in a 3D tensor that represents the map.
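Rasterizing one such element can be sketched as follows: a filled polygon (e.g. drivable surface) is turned into a {+1, -1} mask on the BEV grid. The grid size and the even-odd point-in-polygon test are illustrative; the paper does not specify its rasterizer.

```python
import numpy as np

def rasterize_polygon(poly, H=100, W=100):
    """poly: list of (x, y) vertices in grid coordinates. Cells whose centers
    lie inside the polygon get +1, all others -1 (the binary parametrization
    described above). Uses the even-odd crossing rule."""
    mask = -np.ones((H, W), dtype=np.float32)
    n = len(poly)
    for r in range(H):
        for c in range(W):
            x, y = c + 0.5, r + 0.5  # cell center
            inside = False
            for i in range(n):
                x1, y1 = poly[i]
                x2, y2 = poly[(i + 1) % n]
                # Does a horizontal ray from (x, y) cross this edge?
                if (y1 > y) != (y2 > y):
                    if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                        inside = not inside
            if inside:
                mask[r, c] = 1.0
    return mask

road = [(10, 10), (90, 10), (90, 90), (10, 90)]  # a filled drivable-area polygon
mask = rasterize_polygon(road)
```

Stacking 17 such masks (roads, intersections, lane boundaries, traffic lights, etc.) along the channel dimension yields the 3D map tensor, which can be concatenated with the LiDAR BEV tensor as network input.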