Parting with Misconceptions about Learning-based Vehicle Motion Planning (PDM-Closed)

https://github.com/autonomousvision/tuplan_garage

Abstract

The release of nuPlan marks a new era in vehicle motion planning research, offering the first large-scale real-world dataset and evaluation schemes requiring both precise short-term planning and long-horizon ego-forecasting. Existing systems struggle to simultaneously meet both requirements. Indeed, we find that these tasks are fundamentally misaligned and should be addressed independently. We further assess the current state of closed-loop planning in the field, revealing the limitations of learning-based methods in complex real-world scenarios and the value of simple rule-based priors such as centerline selection through lane graph search algorithms. More surprisingly, for the open-loop sub-task, we observe that the best results are achieved when using only this centerline as scene context (i.e., ignoring all information regarding the map and other agents). Combining these insights, we propose an extremely simple and efficient planner which outperforms an extensive set of competitors, winning the nuPlan planning challenge 2023.

1 Introduction

Despite learning-based systems’ success in vehicle motion planning research [1, 2, 3, 4, 5], a lack of standardized large-scale datasets for benchmarking holds back their transfer from research to applications [6, 7, 8]. The recent release of the nuPlan dataset and simulator [9], a collection of 1300 hours of real-world vehicle motion data, has changed this, enabling the development of a new generation of learned motion planners, which promise reduced manual design effort and improved scalability. Equipped with this new benchmark, we perform the first rigorous empirical analysis on a large-scale, open-source, and data-driven simulator for vehicle motion planning, including a comprehensive set of state-of-the-art (SoTA) planners [10, 11, 12] using the official metrics. Our analysis yields several surprising findings:
Open- and closed-loop evaluation are misaligned. Most learned planners are trained through the supervised learning task of forecasting the ego vehicle’s future motion conditioned on a desired goal location. We refer to this setting as ego-forecasting [2, 3, 13, 14]. In nuPlan, planners can be evaluated in two ways: (1) in open-loop evaluation, which measures ego-forecasting accuracy using distance-based metrics or (2) in closed-loop evaluation, which assesses the actual driving performance in simulation with metrics such as progress or collision rates. Open-loop evaluation lacks dynamic feedback and can have little correlation with closed-loop driving, as previously shown on the simplistic CARLA simulator [15, 16]. Our primary contribution lies in uncovering a negative correlation between both evaluation schemes. Learned planners excel at ego-forecasting but struggle to make safe closed-loop plans, whereas rule-based planners exhibit the opposite trend.
Rule-based planning generalizes. We surprisingly find that an established rule-based planning baseline from over twenty years ago [17] surpasses all SoTA learning-based methods in terms of closed-loop evaluation metrics on our benchmark. This contradicts the prevalent motivating claim used in most research on learned planners that rule-based planning faces difficulties in generalization. This was previously only verified on simpler benchmarks [4, 10, 11]. As a result, most current work on learned planning only compares to other learned methods, ignoring rule-based baselines [3, 5, 18].
A centerline is all you need for ego-forecasting. We implement a naïve learned planning baseline which does not incorporate any input about other agents in the scene and merely extrapolates the ego state given a centerline representation of the desired route. This baseline sets the new SoTA for open-loop evaluation on our benchmark. It does not require intricate scene representations (e.g. lane graphs, vectorized maps, rasterized maps, tokenized objects), which have been the central subject of inquiry in previous work [10, 11, 12]. None of these prior studies considered a simple centerline-only representation as a baseline, perhaps due to its extraordinary simplicity.

Reader's note: this also means the ego vehicle can only accelerate and decelerate along the centerline; it cannot perform lateral maneuvers such as lane changes or driving around obstacles.

Our contributions are as follows: (1) We demonstrate and analyze the misalignment between openand closed-loop evaluation schemes in planning. (2) We propose a lightweight extension of IDM [17] with real-time capability that achieves state-of-the-art closed-loop performance. (3) We conduct experiments with an open-loop planner, which is only conditioned on the current dynamic state and a centerline, showing that it outperforms sophisticated models with complex input representations. (4) By combining both models into a hybrid planner, we establish a simple baseline that outperformed 24 other, often learning-based, competing approaches and claimed victory in the nuPlan challenge 2023.

2 Related Work

Rule-based planning. Rule-based planners offer a structured, interpretable decision-making framework [17, 19, 20, 21, 22, 23, 24, 25, 26]. They employ explicit rules to determine an autonomous vehicle’s behavior (e.g., brake when an object is straight ahead). A seminal approach in rule-based planning is the Intelligent Driver Model (IDM [17]), which is designed to follow a leading vehicle in traffic while maintaining a safe distance. There exist extensions of IDM [27] which focus on enabling lane changes on highways. However, this is not the goal of our work. Instead, we extend IDM by executing multiple policies with different hyperparameters, and scoring them to select the best option.
Prior work also combines rule-based decision-making with learned components, e.g., with learned agent forecasts [28], affordance indicators [23, 24], cost-based imitation learning [4, 29, 30, 31, 32], or learning-based planning with rule-based safety filtering [33]. These hybrid planners often forecast future environmental states, enabling informed and contingent driving decisions. This forecasting can either be agent-centric [34, 35, 36], where trajectories are determined for each actor, or environment-centric [4, 31, 30, 29, 37, 38], involving occupancy or cost maps. Additionally, forecasting can be conditioned on the ego-plan, modeling the ego vehicle's influence on the scene's future [39, 40, 41, 42]. We employ an agent-centric forecasting module that is considerably simpler than existing methods, allowing for its use as a starting point in the newly released nuPlan framework.
Ego-forecasting. Unlike predictive planning, ego-forecasting methods use observational data to directly determine the future trajectory. Ego-forecasting approaches include both end-to-end methods [43] that utilize LiDAR scans [44, 45], RGB images [46, 47, 48, 49, 14, 50] or both [13, 51, 5, 52], as well as modular methods involving lower-dimensional inputs like bird's eye view (BEV) grids or state vectors [24, 53, 11, 54, 55, 56]. A concurrent study introduces a naive MLP that takes only the current dynamic state as input, yielding competitive ego-forecasting results on the nuScenes dataset [57] with no scene context input [58]. Our findings complement these results, differing by evaluating long-term (8s) ego-forecasting in the challenging 2023 nuPlan challenge scenario test distribution [9]. We show that in this setting, completely removing scene context (as in [58]) is harmful, whereas a simple centerline representation of the context is sufficient for strong open-loop performance.

3 Ego-forecasting and Planning are Misaligned

In this section, we provide the relevant background regarding the data-driven simulator nuPlan [9]. We describe two baselines for a preliminary experiment to demonstrate that although ego-forecasting and planning are often considered related tasks, they are not well-aligned given their definitions on nuPlan. Improvements in one task can often lead to degradation in the other.

3.1 Background

nuPlan. The nuPlan simulator is the first publicly available real-world planning benchmark and enables rapid prototyping and testing of motion planners. nuPlan constructs a simulated environment as closely as possible to a real-world driving setting through data-driven simulation [59, 60, 61, 62, 63, 64, 65]. This method extracts road maps, traffic patterns, and object properties (positions, orientations, and speeds) from a pre-recorded dataset consisting of 1,300 hours of real-world driving. These elements are then used to initialize scenarios, which are 15-second simulations employed to assess open-loop and closed-loop driving performance. Hence, in simulation, our methods rely on access to detailed HD map information and ground-truth perception, i.e., no localization errors, map imperfections, or misdetections are considered. In open-loop simulation, the entire log is merely replayed (for both the ego vehicle and other actors). Conversely, in closed-loop simulation, the ego vehicle operates under the control of the planner being tested. There are two versions of closed-loop simulation: non-reactive, where all other actors are replayed along their original trajectory, and reactive, where other vehicles employ an IDM planner [17], which we detail in the following.
Metrics. nuPlan offers three official evaluation metrics: open-loop score (OLS), closed-loop score non-reactive (CLS-NR), and closed-loop score reactive (CLS-R). Although CLS-NR and CLS-R are computed identically, they differ in background traffic behavior. Each score is a weighted average of sub-scores that are multiplied by a set of penalties. In OLS, the sub-scores account for displacement and heading errors, both average and final, over an extended period (8 seconds). Moreover, if the prediction error is above a threshold, the penalty results in an OLS score of zero for that scenario. Similarly, sub-scores in CLS comprise time-to-collision, progress along the experts' route, speed-limit compliance, and comfort. Multiplicative CLS penalties are at-fault collisions, drivable area or driving direction infringements, and not making progress. These penalties result in substantial CLS reductions, mostly to a zero scenario score, e.g., when colliding with a vehicle. Notably, the CLS primarily relies on short-term actions rather than on consistent long-term planning. All scores (incl. OLS/CLS) range from 0 to 100, where higher scores are better. Given the elaborate composition of nuPlan's metrics, we refer to the supplementary material for a detailed description.
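As a rough illustration of this scoring structure, here is a hypothetical sketch in Python; the sub-score names, weights, and penalty values below are placeholders, not the official nuPlan definitions:

```python
# Hypothetical sketch of nuPlan-style score aggregation: a weighted average
# of sub-scores, multiplied by a set of multiplicative penalties.

def aggregate_score(sub_scores: dict, weights: dict, penalties: list) -> float:
    """sub_scores and penalties lie in [0, 1]; returns a score in [0, 100]."""
    weighted = sum(weights[k] * sub_scores[k] for k in sub_scores)
    average = weighted / sum(weights[k] for k in sub_scores)
    multiplier = 1.0
    for p in penalties:  # e.g., 0.0 for an at-fault collision
        multiplier *= p
    return 100.0 * average * multiplier

# Example: good progress and comfort, but an at-fault collision zeroes the score.
score = aggregate_score(
    sub_scores={"time_to_collision": 1.0, "progress": 0.9, "comfort": 1.0},
    weights={"time_to_collision": 5.0, "progress": 5.0, "comfort": 2.0},
    penalties=[0.0],  # placeholder at-fault-collision penalty
)
```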
Intelligent Driver Model. The simple planning baseline IDM [17] not only simulates the non-ego vehicles in the CLS-R evaluation of nuPlan, but also serves as a baseline for the ego-vehicle’s planning. The nuPlan map is provided as a graph, with centerline segments functioning as nodes. After choosing a set of such nodes to follow via a graph search algorithm, IDM infers a longitudinal trajectory along the selected centerline. Given the current longitudinal position x, velocity v, and distance to the leading vehicle s along the centerline, it iteratively applies the following policy to calculate a longitudinal acceleration:
$$\frac{dv}{dt} = a \left( 1 - \left(\frac{v}{v_0}\right)^{\delta} - \left(\frac{s^*}{s}\right)^{2} \right)$$
The acceleration limit $a$, target speed $v_0$, safety margin $s^*$, and exponent $\delta$ are manually selected. Intuitively, the policy uses an acceleration $a$ unless the velocity is already close to $v_0$ or the leading vehicle is at a distance of only $s^*$. Additional details and our exact hyper-parameter choices can be found in the supplementary material.
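A minimal sketch of this policy in Python follows. The hyperparameter values are illustrative, and the full IDM [17] computes the safety margin $s^*$ dynamically from the ego speed and approach rate, which is simplified to a constant here:

```python
def idm_acceleration(v: float, s: float,
                     a: float = 1.0,       # acceleration limit (m/s^2), assumed value
                     v0: float = 15.0,     # target speed (m/s), assumed value
                     s_star: float = 5.0,  # safety margin (m), simplified to a constant
                     delta: float = 4.0) -> float:
    """IDM longitudinal acceleration given velocity v and gap s to the leader."""
    return a * (1.0 - (v / v0) ** delta - (s_star / s) ** 2)

def idm_rollout(x: float, v: float, gap: float, dt: float = 0.1, steps: int = 80):
    """Iteratively apply the policy to infer a longitudinal trajectory."""
    positions = []
    for _ in range(steps):
        acc = idm_acceleration(v, gap)
        v = max(0.0, v + acc * dt)
        x += v * dt
        gap = max(0.1, gap - v * dt)  # toy assumption: static leading vehicle
        positions.append(x)
    return positions
```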

3.2 Misalignment

Centerline-conditioned ego-forecasting. We now propose the Predictive Driver Model (Open), i.e., PDM-Open, which is a straightforward multi-layer perceptron (MLP) designed to predict future waypoints. The inputs to this MLP are the centerline ($c$) extracted by IDM and the ego history ($h$). To accommodate the high speeds (reaching up to 15 m/s) and ego-forecasting horizons (extending to 8 seconds) observed in nuPlan, the centerline is sampled with a resolution of 1 meter up to a length of 120 meters. Meanwhile, the ego history incorporates the positions, velocities, and accelerations of the vehicle over the previous two seconds, sampled at a rate of 5 Hz. Both $c$ and $h$ are linearly projected to feature vectors of size 512, concatenated, and input to the MLP $\phi_{\text{Open}}$, which has two 512-dimensional hidden layers. The outputs are the future waypoints for an 8-second horizon, spaced 0.5 seconds apart, expressed as $w_{\text{Open}} = \phi_{\text{Open}}(c, h)$. The model is trained using an L1 loss on our training dataset of 177k samples (described in Section 4). By design, PDM-Open is considerably simpler than existing learned planners [10, 12].
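A sketch of this architecture in PyTorch; the dimensions follow the description above, while details such as the activation function and the 2D layout of each history feature are assumptions:

```python
import torch
import torch.nn as nn

class PDMOpen(nn.Module):
    """Centerline-conditioned ego-forecasting MLP (sketch)."""
    def __init__(self, hidden: int = 512):
        super().__init__()
        # Centerline: 120 points at 1 m resolution, (x, y) each -> 240 values.
        self.centerline_proj = nn.Linear(120 * 2, hidden)
        # History: 2 s at 5 Hz = 10 steps of position/velocity/acceleration,
        # assumed 2D each -> 60 values.
        self.history_proj = nn.Linear(10 * 6, hidden)
        # Two 512-dimensional hidden layers; 16 output waypoints (8 s at 0.5 s).
        self.mlp = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 16 * 2),
        )

    def forward(self, centerline: torch.Tensor, history: torch.Tensor):
        c = self.centerline_proj(centerline.flatten(1))
        h = self.history_proj(history.flatten(1))
        return self.mlp(torch.cat([c, h], dim=-1)).view(-1, 16, 2)

# Training uses an L1 loss against ground-truth future waypoints:
# loss = (model(c, h) - w_gt).abs().mean()
```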
OLS vs. CLS. In Table 1, we benchmark the IDM and PDM-Open baselines using the nuPlan metrics. We present two IDM variants with different maximum acceleration values (the default $a = 1.0\,\mathrm{m/s^2}$ and $a = 0.1\,\mathrm{m/s^2}$) and four PDM-Open variants based on different inputs. We observe that reducing IDM's acceleration improves OLS but negatively impacts CLS. While IDM demonstrates strong closed-loop performance, PDM-Open outperforms IDM in open-loop even if it only uses the current ego state as input (first row). The past ego states (History) yield only a small improvement and lead to a drop in CLS. Most importantly, adding the centerline significantly contributes to ego-forecasting performance. A clear trade-off between CLS and OLS indicates a misalignment between the goals of ego-forecasting and planning. This sort of inverse correlation on nuPlan is unanticipated, considering the increasing use of ego-forecasting in current planning literature [3, 10, 12, 11]. While ego-forecasting is not necessary for driving performance, the nuPlan challenge requires both a high OLS and CLS.
[Table 1: OLS and CLS for two IDM variants (maximum acceleration) and four PDM-Open variants (input features)]
In Fig. 1, we illustrate the misalignment between the OLS and CLS metrics. In the depicted scenario, the rule-based IDM selects a different lane in comparison to the human driver. However, it maintains its position on the road throughout the simulation. This results in a high CLS yet a low OLS. Conversely, the learned PDM-Open generates predictions along the lane chosen by the human driver, thereby obtaining a high OLS. Nonetheless, as errors accumulate in its short-term predictions during the simulation [66, 67], the model’s trajectory veers off the drivable area, culminating in a subpar CLS.
Figure 1: Planning vs. ego-forecasting. We show a nuPlan scenario with the drivable area highlighted in gray and the original human trajectory as a dashed black line. In each snapshot, we show the ego agent and its prediction. (Left) There is a significant displacement between the IDM prediction (constrained to the rule-based centerline) and the human trajectory, resulting in a low open-loop score. (Middle + Right) After 0.5 seconds of simulation, the learned PDM-Open planner extrapolates its own errors and eventually leaves the road, resulting in a poor closed-loop score.

Reader's note: this passage is not clear to me. In open loop, isn't the ego on the same lane? Can PDM-Open learn to select the reference line?

3.3 Methods

We now extend IDM by incorporating several concepts from model predictive control, including forecasting, proposals, simulation, scoring, and selection, as illustrated in Fig. 2 (top). We call this model PDM-Closed. Note that as a first step, we still require a graph search to find a sequence of lanes along the route and extract their centerline, as in the IDM planner.
Figure 2: Architecture. PDM-Closed selects a centerline, forecasts the environment, and creates varying trajectory proposals. The proposals are simulated and scored to select a trajectory. The PDM-Hybrid module predicts offsets using the PDM-Closed centerline, trajectory, and ego history, and corrects only the long-horizon waypoints, thereby limiting the learned model's influence in closed-loop simulation.

Forecasting. In nuPlan, the simulator provides an orientation vector and speed for each dynamic agent such as a vehicle or pedestrian. We leverage a simple yet effective constant velocity forecasting [68, 69, 70] over the horizon F of 8 seconds at 10Hz.
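This forecast is trivial to implement. A sketch, assuming each agent state provides a 2D position, a heading angle, and a scalar speed:

```python
import numpy as np

def constant_velocity_forecast(position, heading, speed,
                               horizon_s: float = 8.0, hz: float = 10.0):
    """Propagate an agent along its heading at constant speed.

    Returns an array of shape (horizon_s * hz, 2) with future positions.
    """
    steps = int(horizon_s * hz)
    t = np.arange(1, steps + 1) / hz                       # timestamps (s)
    direction = np.array([np.cos(heading), np.sin(heading)])
    return np.asarray(position) + speed * t[:, None] * direction
```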
Proposals. In the process of calibrating the IDM planner, we observed a trade-off when selecting a single value for the target speed hyperparameter ($v_0$), which either yielded aggressive driving behavior or insufficient progress across various scenarios. Consequently, we generate a set of trajectory proposals by implementing IDM policies at five distinct target speeds, namely, {20%, 40%, 60%, 80%, 100%} of the designated speed limit. For each target speed, we also incorporate proposals with three lateral centerline offsets (±1m and 0m), thereby producing N = 15 proposals in total. To circumvent computational demands in subsequent stages, the proposals have a reduced horizon of H steps, which corresponds to 4 seconds at 10 Hz.
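A sketch of the proposal generation; `with_lateral_offset` and `idm_rollout` are assumed helpers for offsetting the centerline and for the longitudinal IDM policy:

```python
from itertools import product

SPEED_FRACTIONS = [0.2, 0.4, 0.6, 0.8, 1.0]   # fractions of the speed limit
LATERAL_OFFSETS = [-1.0, 0.0, 1.0]            # meters from the centerline

def generate_proposals(centerline, speed_limit, idm_rollout,
                       horizon_s: float = 4.0, hz: float = 10.0):
    """Return N = 15 trajectory proposals with a reduced horizon H."""
    proposals = []
    for frac, offset in product(SPEED_FRACTIONS, LATERAL_OFFSETS):
        path = centerline.with_lateral_offset(offset)  # assumed helper
        proposals.append(idm_rollout(path, v0=frac * speed_limit,
                                     steps=int(horizon_s * hz)))
    return proposals
```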
Simulation. Trajectories in nuPlan are simulated by iteratively retrieving actions from an LQR controller [71] and propagating the ego vehicle with a kinematic bicycle model [72, 73]. We simulate the proposals with the same parameters and a faster re-implementation of this two-stage pipeline. Thereby, the proposals are evaluated based on the expected movement in closed-loop.
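For intuition, a minimal kinematic bicycle update of the kind used to propagate the ego vehicle; the wheelbase value and the rear-axle state layout are assumptions, and nuPlan's actual controller and model parameters differ:

```python
import math

def bicycle_step(x, y, yaw, v, accel, steer, dt=0.1, wheelbase=3.1):
    """One step of a kinematic bicycle model (rear-axle reference point)."""
    x += v * math.cos(yaw) * dt
    y += v * math.sin(yaw) * dt
    yaw += v * math.tan(steer) / wheelbase * dt
    v += accel * dt
    return x, y, yaw, v
```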
Trajectory selection. Finally, PDM-Closed selects the highest-scoring proposal which is extended to the complete forecasting horizon F with the corresponding IDM policy. If the best trajectory is expected to collide within 2 seconds, the output is overwritten with an emergency brake maneuver.

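A sketch of this selection logic with the emergency-brake fallback; the scoring, collision-checking, extension, and braking routines are abstracted as assumed callables, and the 2-second threshold follows the text:

```python
def select_trajectory(proposals, score_fn, collides_within, extend_fn, brake_fn,
                      ttc_threshold_s: float = 2.0):
    """Pick the highest-scoring proposal; fall back to an emergency brake."""
    best = max(proposals, key=score_fn)
    trajectory = extend_fn(best)  # extend to the full horizon F with the IDM policy
    if collides_within(trajectory, ttc_threshold_s):
        return brake_fn()         # overwrite with an emergency brake maneuver
    return trajectory
```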
Enhancing long-horizon accuracy. To integrate the accurate ego-forecasting capabilities of PDM-Open with the precise short-term actions of PDM-Closed, we now propose a hybrid version of PDM, i.e., PDM-Hybrid. Specifically, PDM-Hybrid uses a learned module PDM-Offset to predict offsets to waypoints from PDM-Closed, as shown in Fig. 2 (bottom).
In practice, the LQR controller used in nuPlan relies exclusively on the first 2 seconds of the trajectory when determining actions in closed-loop. Therefore, applying the correction only to long-term waypoints (i.e., beyond 2 seconds by default, which we refer to as the correction horizon $C$) allows PDM-Hybrid to maintain closed-loop planning performance. The final planner outputs waypoints (up to the forecasting horizon $F$) $\{w_{\text{Hybrid}}^t\}_{t=0}^{F}$ that are given by:
$$w_{\text{Hybrid}}^t = w_{\text{Closed}}^t + \mathbb{1}_{[t > C]}\, \phi_{\text{Offset}}^{t}\!\left(\{w_{\text{Closed}}^t\}_{t=0}^{F},\, c,\, h\right)$$
where $c$ and $h$ are the centerline and history (identical to the inputs of PDM-Open), $\{w_{\text{Closed}}^t\}_{t=0}^{F}$ are the PDM-Closed waypoints added to the hybrid approach, and $\phi_{\text{Offset}}$ is an MLP. Its architecture is identical to $\phi_{\text{Open}}$ except for an extra linear projection to accommodate $w_{\text{Closed}}$ as an additional input.
在这里, c c c h h h 分别代表中心线和历史信息(与 PDM-Open 的输入相同)。 { w C l o s e d t } t = 0 F \{w_{Closed}^t\}_{t=0}^F {wClosedt}t=0F 是添加到混合方法中的 PDM-Closed 航点,而 ϕ Offset \phi_{\text{Offset}} ϕOffset 是一个多层感知器(MLP)。它的架构与 ϕ Open \phi_{\text{Open}} ϕOpen 相同,只是增加了一个额外的线性投影,以适应 w Closed w_{\text{Closed}} wClosed 作为额外的输入。
具体来说, ϕ Offset \phi_{\text{Offset}} ϕOffset 的输入包括:

  1. 中心线信息 c c c
  2. 历史信息 h h h
  3. PDM-Closed 计算的航点 { w C l o s e d t } t = 0 F \{w_{Closed}^t\}_{t=0}^F {wClosedt}t=0F

输出是预测的航点偏移 d Offset t {d}_{\text{Offset}}^t dOffsett。这些偏移被应用于 PDM-Closed 计算的航点,以生成混合方法的最终航点 { w H y b r i d t } t = 0 F \{w_{Hybrid}^t\}_{t=0}^F {wHybridt}t=0F
这种混合方法结合了 PDM-Open 的长期预测能力和 PDM-Closed 的短期精确动作,从而在保持闭环规划性能的同时,提高了长期预测的准确性
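A sketch of this fusion, assuming waypoints are sampled every 0.5 seconds and the offsets come from $\phi_{\text{Offset}}$:

```python
import numpy as np

def fuse_hybrid(w_closed: np.ndarray, offsets: np.ndarray,
                correction_horizon_s: float = 2.0, dt: float = 0.5) -> np.ndarray:
    """Apply learned offsets only beyond the correction horizon C.

    w_closed, offsets: arrays of shape (T, 2), one waypoint every dt seconds.
    """
    t = np.arange(len(w_closed)) * dt            # waypoint timestamps
    mask = (t > correction_horizon_s)[:, None]   # 1 beyond C, 0 before
    return w_closed + mask * offsets
```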

Reader's note: I don't see a description of the PDM-Open model here... (it was introduced back in Section 3.2).

It is important to note that PDM-Hybrid is designed with high modularity, enabling the substitution of individual components with alternative options when diverse requirements emerge. For example, we show results with a different open-loop module in the supplementary material. Given its overall simplicity, one interesting approach to explore involves incorporating modular yet differentiable algorithms as components, as seen in [34]. Exploring the integration of these modules within unified multi-task architectures is another interesting direction. We reserve such exploration for future work.
Reader's note: this passage highlights PDM-Hybrid's flexibility and extensibility, allowing researchers to replace or upgrade individual components of the system for specific application scenarios and requirements. This design philosophy not only helps the system adapt to a changing technical landscape, it also encourages innovation and optimization, since researchers can experiment with different algorithms and strategies to improve overall performance. In addition, integrating modular algorithms into a unified multi-task learning framework may bring further gains, as such algorithms can share and exploit useful information across tasks.

4 Experiments

We now outline our proposed benchmark and highlight the driving performance of our approach.
Val14 benchmark. We offer standardized data splits for training and evaluation. Training uses all 70 scenario types from nuPlan, restricted to a maximum of 4k scenarios per type, resulting in ∼177k training scenarios. For evaluation, we use 100 scenarios of the 14 scenario types considered by the leaderboard, totaling 1,118 scenarios. Despite minor imbalance (all 14 types do not have 100 available scenarios), our validation split aligns with the online leaderboard evaluation (Table 2 and Table 3), confirming the suitability of our Val14 benchmark as a proxy for the online test set.
Baselines. We include several additional SoTA approaches adopting ego-forecasting for planning in our study. Urban Driver [10] encodes polygons with PointNet layers and predicts trajectories with a linear layer after a multi-head attention block. Our study uses an implementation of Urban Driver trained in the open-loop setting. GC-PGP [12] clusters trajectory proposals based on route-constrained lane-graph traversals before returning the most likely cluster center. PlanCNN [11] predicts waypoints using a CNN from rasterized grid features without an ego state input. It shares several similarities to ChauffeurNet [8], a seminal work in the field. A preliminary version of PDM-Hybrid, which won the nuPlan competition, used GC-PGP as its ego-forecasting component, and we include this as a baseline. We provide a complete description of this version in the supplementary.
Results. Our results are presented in Table 2. PlanCNN achieves the best CLS among learned planners, possibly due to its design choice of removing ego state from input, trading OLS for enhanced CLS. Contrary to the community’s growing preference for graph- and vector-based scene representations in prediction and planning [74, 11, 75, 76], these results show no clear disadvantage of raster representations for the closed-loop task, with PlanCNN also offering a lower runtime. Surprisingly, the simplest rule-based approach in our study, IDM, outperforms the best learned planner, PlanCNN. Moreover, we observe PDM-Closed’s advantages over IDM in terms of CLS: an improvement from 76-77 to 92-93 as a result of the ideas from Section 3. Surprisingly, PDM-Open achieves the highest OLS of 86 with a runtime of only 7ms using only a centerline and the ego state as input. We observe that PDM-Open improves on other methods in accurate long-horizon lane-following, as detailed further in our supplementary material. Next, despite PDM-Closed’s unsatisfactory 42 OLS, PDM-Hybrid successfully combines PDM-Closed with PDM-Open. Both the centerline and graph versions of PDM-Hybrid achieve identical scores in our evaluation. However, the final centerline version, using PDM-Open instead of GC-PGP, is more efficient during inference. Finally, the privileged approach of outputting the ground-truth ego future trajectory (log replay) fails to achieve a perfect CLS, in part due to the nuPlan framework’s LQR controller occasionally drifting from the provided trajectory. PDM-Hybrid compensates for this by evaluating proposals based on the expected controller outcome, causing it to match/outperform log replay in closed-loop evaluation.
结果我们的结果呈现在 表2 中。PlanCNN 在所有学习型规划器中实现了最佳的CLS(闭合环路得分),这可能是因为它在输入中移除了自我状态的设计选择,以换取增强的CLS。与社区对基于图和向量的场景表示在预测和规划中日益增长的偏好[74, 11, 75, 76]相反,这些结果表明,对于闭环任务,光栅表示并没有明显的劣势PlanCNN还提供了更低的运行时间。令人惊讶的是,我们研究中最简单的基于规则的方法IDM超过了最佳学习型规划器PlanCNN。此外,我们观察到PDM-Closed在CLS方面相对于 IDM 的优势:由于第3节中的想法,从76-77提高到92-93。令人惊讶的是,PDM-Open以仅使用中心线和自我状态作为输入,实现了最高的OLS(开放环路得分)86,运行时间仅为7ms。我们观察到PDM-Open在准确的长期车道跟踪方面优于其他方法,详见我们的补充材料。接下来,尽管PDM-Closed的OLS仅为42,令人不满意,但PDM-Hybrid成功地将PDM-Closed与PDM-Open结合起来。在我们的评估中,PDM-Hybrid的中心线和图版本都取得了相同的分数。然而,最终的中心线版本,使用PDM-Open而不是GC-PGP,在推理过程中更高效。最后,输出地面真实自我未来轨迹(日志回放)的特权方法未能实现完美的CLS,部分原因是nuPlan框架的LQR控制器偶尔偏离提供轨迹。PDM-Hybrid通过基于预期的控制器结果评估提案来补偿这一点,使其在闭环评估中匹配/超越日志回放。
Table 2: Val14 benchmark. We show the closed-loop scores (reactive/non-reactive, CLS-R/CLS-NR), the open-loop score (OLS), and the runtime (ms) for several planners, and indicate the input representation (Rep.) each planner uses. PDM-Hybrid achieves strong ego-forecasting (OLS) and planning (CLS). * indicates a preliminary version of PDM-Hybrid that combines PDM-Closed with GC-PGP [12] and was used for our online leaderboard submission (Table 3).

Challenge. The 2023 nuPlan challenge saw the preliminary (graph) version of PDM-Hybrid rank first out of 25 participating teams. The leaderboard considers the mean of CLS-R, CLS-NR, and OLS. While open-loop performance lagged slightly, closed-loop performance excelled, resulting in an overall SoTA score. Unfortunately, due to the closure of the leaderboard, our final (centerline) version of PDM-Hybrid that replaces GC-PGP with the simpler PDM-Open module could not be benchmarked. All top contenders combined learned ego-forecasting with rule-based post-solvers or post-processing to boost CLS performance for the challenge [77, 78, 79]. Thus, we expect to see more hybrid approaches in the future.
Importantly, near identical scores were recorded for our submission on both our Val14 benchmark (Table 2) and the official leaderboard (Table 3). Note that the Urban Driver and IDM results on the leaderboard are provided by the nuPlan team, so they likely use different training data and hyper-parameters than our implementations from Table 2.
[Table 3: 2023 nuPlan challenge leaderboard]
Ablation Study. We delve into our design choices through an ablation study in Table 4. Table 4a displays PDM-Hybrid's closed-loop score reactive (CLS-R) and open-loop score (OLS) with varied correction horizons ($C$) from 0s to 3s. Applying the waypoint correction to all waypoints (i.e., $C = 0$) outperforms PDM-Open in OLS (87 vs. 86, see Table 2) but leads to a substantial drop in CLS-R compared to the default value of $C = 2$. On the other hand, a noticeable OLS decline occurs when initiating corrections deeper into the trajectory (e.g., $C = 3$), with minimal impact on CLS-R.
Table 4: Ablation study. We show the closed-loop score reactive (CLS-R), the open-loop score (OLS), and the runtime (ms). We study (a) different correction horizons for PDM-Hybrid, (b) ablating sub-modules of PDM-Closed, and (c) input and architecture choices for PDM-Open. The default configuration (highlighted in gray) achieves the best trade-off.

For PDM-Closed (Table 4b), we compare CLS-R and runtime (ms) with the base planner across three scenarios: removing lateral centerline offsets (“lat.”), longitudinal IDM proposals (“lon.”), and environment forecasting (“cast.”). Our analysis reveals that eliminating proposals diminishes CLS-R effectiveness but accelerates runtimes. Performance significantly drops when excluding the forecasting used for creating and evaluating proposals. However, the runtime remains nearly identical, showing the effectiveness of the simple forecasting mechanism.
As for PDM-Open (Table 4c), we test three variations: a shorter centerline (30m vs. 120m), a coarser centerline (every 10m vs. 1m), and a smaller MLP with a reduced hidden dimension (from 512 to 256). Both a smaller MLP and a reduced centerline length lead to performance degradation, but the impact remains relatively minor compared to disregarding the centerline altogether (Table 1, OLS=72). Meanwhile, the impact of a coarser centerline is negligible.

5 Discussion

Although rule-based planning is often criticized for its limited generalization, our results demonstrate strong performance in the closed-loop nuPlan task which best resembles real-world evaluation. Notably, open-loop success in part requires a trade-off in closed-loop performance. Consequently, imitation-trained ego-forecasting methods fare poorly in closed-loop. This suggests that rule-based planners remain promising and warrant further exploration. At the same time, given their poor performance out-of-the-box, there is room for improvement in imitation-based methods on nuPlan.
Integrating the strengths of closed-loop planning and open-loop ego-forecasting, we present a hybrid model. However, this does not enhance closed-loop driving performance; instead, it boosts open-loop performance while executing identical driving maneuvers. We conclude that considering precise open-loop ego-forecasting as a prerequisite for achieving long-term planning goals is misleading.
Acknowledging the potential importance of ego-forecasting for interpretability and assessing human-like behavior, we propose focusing this evaluation on the short horizon (e.g., 2 seconds) relevant for closed-loop driving. The current nuPlan OLS definition, requiring a unimodal 8-second ego-forecast, may only be useful for alternate applications, like setting goals for background agents in data-driven traffic simulations or allocating computational resources better, e.g. to prioritize perception or prediction in areas the ego-vehicle is expected to traverse. We discourage the use of open-loop metrics as a primary indicator of planning performance [18].
Limitations. While we significantly improve upon the established IDM model, PDM still does not execute lane-change maneuvers. Lane change attempts often lead to collisions when the ego-vehicle is between two lanes, resulting in a high penalty as per the nuPlan metrics. PDM relies on HD maps and precise offboard perception [80, 9] that may be unavailable in real-world driving situations. While real-world deployment was demonstrated for learning-based methods [81, 33, 8], it remains a significant challenge for rule-based approaches. Moreover, our experiments, aside from the held-out test set, have not specifically evaluated the model’s generalization capabilities when encountering distributional shifts, such as unseen towns or novel scenario types. They were all conducted on a single simulator, nuPlan. Therefore, it is important to recognize the limitations inherent in nuPlan’s data-driven simulation approach. When a planner advances more rapidly than the human driving log, objects materialize abruptly in front of the ego-vehicle during simulation. For CLS-NR, vehicles move independently as observed in reality, disregarding the ego agent, leading to excessively aggressive behavior. Conversely, CLS-R background agents rely on IDM and adhere strictly to the centerline, leading to unrealistically passive behavior. We see high value in developing a more refined reactive environment for future work.
Conclusion. In this paper, we identify prevalent misconceptions in learning-based vehicle motion planning. Based on our insights, we introduce PDM-Hybrid, which builds upon IDM and combines it with a learned ego-forecasting component. It surpassed a comprehensive set of competitors and claimed victory in the 2023 nuPlan competition.
