Motivation:
As the scenario becomes more complicated, tuning to improve the motion planner performance becomes increasingly difficult. To systematically solve this issue, we develop a data-driven auto-tuning framework based on the Apollo autonomous driving framework.
【CC】In short: as scenarios grow ever more complex, the only systematic way to address this problem is a data-driven approach.
Third, the expert driving data and information about the surrounding environment are collected and automatically labeled.
【CC】So the framework labels data automatically? The whole design is oriented toward automation.
Typically, two major approaches are used to develop such a map: learning via demonstration (imitation learning) or
through optimizing the current reward/cost functional.
【CC】Background: the two typical ways to build a motion planner are imitation learning or an optimization-based approach.
In an imitation learning system, the state-to-action mapping is learned directly from expert demonstrations; a multimodal distribution loss function is necessary but slows the training process.
【CC】The typical imitation-learning approach uses data to build a distribution mapping states to actions (state -> action); training is generally slow.
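A minimal sketch of the imitation-learning idea above: treat expert demonstrations as supervised (state, action) pairs and fit a mapping from one to the other. The linear model and all data below are toy assumptions for illustration, not the paper's architecture (a real planner would use a richer model and the multimodal loss mentioned in the excerpt).

```python
import numpy as np

# Toy illustration (assumption, not the paper's method): imitation
# learning as supervised regression from expert state-action pairs.
rng = np.random.default_rng(0)

true_W = np.array([[0.5, -1.0], [2.0, 0.3]])  # unknown expert policy
states = rng.normal(size=(200, 2))            # states the expert visited
actions = states @ true_W.T                   # expert demonstrations

# Fit W by least squares so that action ≈ W @ state.
X, *_ = np.linalg.lstsq(states, actions, rcond=None)
W_hat = X.T

print(np.allclose(W_hat, true_W, atol=1e-6))
```

Because the toy data is noiseless and the states span the state space, least squares recovers the expert policy exactly; with real driving logs the fit is only approximate, which is one reason the loss design matters.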
When optimizing through a reward functional, the reward/cost functional is typically provided by an expert or learned from data via inverse reinforcement learning (IRL).
【CC】On the optimization route, the cost function is either defined by experts or learned via IRL; this paper takes the IRL route.
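A hedged sketch of the IRL route: assume the cost of a trajectory is a linear combination of its features, c(xi) = theta . f(xi), and adjust theta so that the expert trajectory scores a lower cost than sampled alternatives. The feature values and the simple gradient update below are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def irl_weight_update(theta, f_expert, f_samples, lr=0.1):
    """One gradient step pushing expert cost below the mean sampled cost.

    Minimizes (theta . f_expert - mean_i theta . f_samples[i]) in theta.
    """
    grad = f_expert - f_samples.mean(axis=0)
    return theta - lr * grad

theta = np.zeros(3)                     # weights over 3 made-up features
f_expert = np.array([0.2, 0.1, 0.3])    # expert trajectory features
f_samples = np.array([[0.8, 0.5, 0.6],  # features of sampled
                      [0.6, 0.7, 0.4]]) # planner trajectories

for _ in range(50):
    theta = irl_weight_update(theta, f_expert, f_samples)

# After tuning, the expert trajectory should be the cheapest.
print(theta @ f_expert < (f_samples @ theta).min())
```

The learned theta can then serve directly as the tuning parameters of an optimization-based planner, which is what makes IRL a natural fit for auto-tuning.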
Expert driving data from different scenarios are easy to collect but are extremely difficult to reproduce in simulation since the ego car requires interaction with the surrounding environment.
【CC】The pain point of data acquisition: data is easy to collect but hard to reproduce in simulation, because the ego car interacts with the environment.
We build an auto-tuning system that includes both online trajectory optimization and offline parameter tuning.
【CC】Trajectories are optimized online and parameters are tuned offline; per the figure below, the trained cost function/parameters are fed back into the online system. However, the figure does not show the data-flow direction, so it is unclear whether a digital twin of the data is built.
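The online/offline split can be sketched as follows: offline tuning produces cost weights, and the online planner loads them to rank candidate trajectories. All names here (`OFFLINE_TUNED_WEIGHTS`, `online_select`) are hypothetical placeholders, not Apollo APIs.

```python
# Hypothetical sketch of how offline-tuned parameters feed the online
# planner: the tuner writes cost weights, the planner loads them and
# ranks candidate trajectories by learned cost.

OFFLINE_TUNED_WEIGHTS = [1.5, 0.8, 2.0]  # e.g. output of the offline IRL tuner

def trajectory_cost(features, weights):
    """Linear cost: weighted sum of trajectory features."""
    return sum(w * f for w, f in zip(weights, features))

def online_select(candidates, weights=OFFLINE_TUNED_WEIGHTS):
    """Online step: return the candidate with the lowest learned cost."""
    return min(candidates, key=lambda c: trajectory_cost(c, weights))

# Three candidates described by made-up features
# (jerk, obstacle-distance cost, lateral error):
cands = [(0.4, 0.9, 0.1), (0.2, 0.3, 0.2), (0.8, 0.1, 0.5)]
print(online_select(cands))
```

The design point the note raises still holds: this sketch only shows parameters flowing offline-to-online; whether driving logs flow back into the tuner (closing the loop) is not visible in the figure.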
Our motion planner module is not tied to a specific approach.
【CC】Given the Apollo framework, my guess is that the motion planner may have not only an optimization-based version but also an imitation-learning one. What matters is to "define" or "learn" a cost function that evaluates the optimized/generated results, and that is the focus of this paper's IRL.
The performance of these motion planners is evaluated with metrics that quantify both optimality and robustness. The optimality o