Agile Autonomous Obstacle Avoidance 01: Agile Autonomy

Learning High-Speed Flight in the Wild


Code: https://github.com/uzh-rpg/agile_autonomy

Abstract

  • Autonomous operation with onboard sensing and computation has been limited to low speeds. (limitation of prior methods)
  • Here we propose an end-to-end approach that can autonomously fly quadrotors through complex natural and human-made environments at high speeds, with purely onboard sensing and computation. (the paper's main contribution)
  • The key principle is to directly map noisy sensory observations to collision-free trajectories in a receding-horizon fashion.
  • Zero-shot sim-to-real transfer is achieved by simulating realistic sensor noise.
  • privileged learning: imitating an expert with access to privileged information. (Note: it is the expert, not the deployed policy, that sees the privileged information.)

Introduction

  • The limiting factor for autonomous agile flight in arbitrary unknown environments is the coupling of fast and robust perception with effective planning. The perception system has to be robust to disturbances such as sensor noise, motion blur, and changing illumination conditions. (high speed conflicts with perception robustness)
  • The division of the navigation task into the mapping and planning subtasks is attractive from an engineering perspective, because it enables parallel progress on each component and makes the overall system interpretable. However, it leads to pipelines that largely neglect interactions between the different stages and thus compound errors. (Treating mapping and planning as separate modules neglects the interactions between them.)
  • End-to-end approaches that imitate a human, trained on data:
    • collected in simulation(F. Sadeghi and S. Levine, “CAD2RL: real single-image flight without a single real image,” in Robotics: Science and Systems RSS, N. M. Amato, S. S. Srinivasa, N. Ayanian, and S. Kuindersma, Eds., 2017; E. Kaufmann, A. Loquercio, R. Ranftl, M. Müller, V. Koltun, and D. Scaramuzza, “Deep drone acrobatics,” RSS: Robotics, Science, and Systems, 2020.)
    • directly in the real world(D. Gandhi, L. Pinto, and A. Gupta, “Learning to fly by crashing,” in IEEE/RSJ Int. Conf. Intell. Robot. Syst. (IROS), 2017, pp. 3948–3955.)
  • we train the policy exclusively in simulation.
  • To this end, we utilize a stereo matching algorithm to provide depth images as input to the policy.
  • The crux is ensuring that the simulated sensor noise closely matches what the sensors produce in the real world.
  • train the navigation policy via privileged learning
  • Our policy takes a noisy depth image and inertial measurements as sensory inputs
  • produces a set of short-term trajectories together with an estimate of individual trajectory costs. (What are the costs for? Trajectory selection at test time.)
  • We train the policy using a multi-hypothesis winner-takes-all loss that adaptively maps the predicted trajectories to the best trajectories that have been found by the sampling-based expert.
  • At test time, we use the predicted trajectory costs to decide which trajectory to execute in a receding-horizon fashion.
  • zero-shot generalization setting:
    • we train on randomly generated obstacle courses composed of simple off-the-shelf objects, such as schematic trees and a small set of convex shapes such as cylinders and cubes.
    • then directly deploy the policy in the physical world without any adaptation or fine-tuning.
    • Our platform experiences conditions at test time that were never seen during training. (remarkable)
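The training and test-time procedures described above can be illustrated with a small NumPy sketch. This is only an interpretation of the bullets, not the paper's implementation: the function names (`wta_loss`, `select_trajectory`), the array shapes, and the choice of using the matching distance as the cost-regression target are all assumptions made here for illustration.

```python
import numpy as np

def wta_loss(pred_trajs, pred_costs, expert_trajs):
    """Multi-hypothesis winner-takes-all loss (sketch).

    pred_trajs:   (M, H, 3) M predicted short-horizon trajectories
    pred_costs:   (M,)      predicted cost for each hypothesis
    expert_trajs: (K, H, 3) best trajectories found by the sampling-based expert

    Each expert trajectory is adaptively matched to its closest predicted
    hypothesis (the "winner"), and only the winners are penalized on their
    trajectory error; each cost head is regressed toward the distance of
    its hypothesis to the nearest expert trajectory (an assumed target).
    """
    # pairwise mean squared distance between predictions and expert trajectories
    d = ((pred_trajs[:, None] - expert_trajs[None]) ** 2).mean(axis=(2, 3))  # (M, K)
    winners = d.argmin(axis=0)                    # best hypothesis per expert traj
    traj_loss = d[winners, np.arange(len(expert_trajs))].mean()
    cost_loss = ((pred_costs - d.min(axis=1)) ** 2).mean()
    return traj_loss + cost_loss

def select_trajectory(pred_trajs, pred_costs):
    """Receding-horizon selection (sketch): at each control step, execute the
    hypothesis with the lowest predicted cost, then replan on the next frame."""
    return pred_trajs[int(np.argmin(pred_costs))]
```

In a real training loop the distances would be computed on a differentiable tensor so that gradients flow only through the winning hypotheses; the NumPy version above just shows the matching logic.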

Results

  • The approach reduces the failure rate by up to 10 times with respect to state-of-the-art methods.
  • In all experiments, the drone was provided with a reference trajectory, which is not collision-free (Figure 3-C, depicted in red), to encode the intended flight path. This reference can be provided by a user or a higher-level planning algorithm. The drone is tasked to follow that flight path and make adjustments as necessary to avoid obstacles. (A reference path is always provided.)
  • We measure performance according to success rate, i.e. the percentage of successful runs over the total number of runs, where we consider a run successful if the drone reaches the goal location within a radius of 5 m without crashing.
  • Both trajectory types are not collision-free and would lead to a crash into obstacles if blindly executed. We flew the straight line at different average speeds in the range of 3 to 10 m s−1.
  • Success rate: 100% at 5 m s−1 → 80% at 7 m s−1 → 60% at 10 m s−1.
  • Compared with the Skydio R1 (its tracking feature): Skydio fails in every scenario, while this method succeeds in all of them.
  • Comparison with FastPlanner and a reactive baseline:
    • P. Florence, J. Carter, and R. Tedrake, “Integrated perception and control at high speed: Evaluating collision avoidance maneuvers without maps,” in Algorithmic Foundations of Robotics XII. Springer, 2020, pp. 304–319.
    • B. Zhou, F. Gao, L. Wang, C. Liu, and S. Shen, “Robust and efficient quadrotor trajectory generation for fast autonomous flight,” IEEE Robot. Autom. Lett., vol. 4, no. 4, pp. 3529–3536, 2019.
  • However, as the speed increases, the baselines’ performances quickly drop: already at 5 m s−1, no baseline is able to complete all runs without crashing. In contrast, our method can reliably fly at high speeds through all environments, achieving an average success rate of 70% at 10 m s−1.
  • Because the reactive baseline is conditioned only on the current observation, it is strongly affected by observation noise.
  • the FastPlanner baseline can reject outliers in the depth map by leveraging multiple observations, which makes it more robust to sensing errors. However, this method generally results in higher processing latency: multiple observations are required to add obstacles to the map, and, therefore, to plan trajectories that avoid them. This problem is worsened by high-speed motion, which generally results in little overlap between consecutive observations. (a drawback of FastPlanner-style methods)
  • We build a simulated forest [36] in a rectangular region R(l, w) of width w = 30 m and length l = 60 m, and fill it with trees that have a diameter of about 0.6 m. Trees are randomly placed according to a homogeneous Poisson point process P with intensity δ = 1/25 tree m−2 [36]. (a recipe for generating random forests; an unexpected takeaway)
  • when network inference is performed on the GPU, our approach is 25.3 times faster than FastPlanner and 7.4 times faster than the Reactive baseline.
  • When GPU inference is disabled, the network’s latency increases by only 8 ms, and our approach is still much faster than both baselines
  • The maximum speed depends on the sensing range. (a keen insight)
  • The FastPlanner baseline was only demonstrated up to speeds of 3 m s−1 in the original work [18], and thus was not designed to operate at high speeds. (another unexpected takeaway)
  • In contrast to the baselines, our approach is only marginally affected by the noisy depth readings, with only a 10% drop in performance at 10 m s−1, but no change in performance at lower speeds. This is because our policy, trained on depth from stereo, learns to account for common issues in the data such as discretization artifacts and missing values.
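The homogeneous Poisson point process used to place trees (bullet above) is easy to reproduce: draw the number of trees from a Poisson distribution with mean δ · l · w, then place them uniformly in the rectangle. A minimal sketch, where the function name and seed handling are choices made here rather than taken from the paper:

```python
import numpy as np

def sample_forest(length=60.0, width=30.0, intensity=1 / 25, seed=None):
    """Sample tree positions in a rectangle R(l, w) via a homogeneous
    Poisson point process with intensity δ (trees per square meter):
    the tree count is Poisson(δ * l * w) — on average δ*60*30 = 72 trees
    for the paper's parameters — and positions are i.i.d. uniform."""
    rng = np.random.default_rng(seed)
    n_trees = rng.poisson(intensity * length * width)
    # each row is an (x, y) tree position inside the rectangle
    return rng.uniform([0.0, 0.0], [length, width], size=(n_trees, 2))
```

Tree diameter (~0.6 m in the paper) would then be attached per tree when building the simulated environment; the point process only determines the trunk centers.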

Discussion

  • Comparison with traditional modular pipelines: The separation into multiple modules simplifies the implementation of engineered systems, enables parallel development of each component, and makes the overall system more interpretable. However, modularity comes at a high cost: the communication between modules introduces latency, errors compound across modules, and interactions between modules are not modeled.
  • Future directions: This is mainly because of the fact that, at speeds of 10 m s−1 or higher, feasible solutions require temporal consistency over a long time horizon and strong variations of the instantaneous flying speed as a function of the obstacle density. Combining our short-horizon controller for local obstacle avoidance with a long-term planner is a major opportunity for many robotics applications, including autonomous exploration, delivery, and cinematography.
  • The case for model-free (reinforcement) learning: Therefore, we believe that this problem represents a big opportunity for model-free methods, which have the potential to ease the engineering requirements.