Reading Notes: High Speed Obstacle Avoidance using Monocular Vision and Reinforcement Learning

 

 



Authors: Jeff Michels, Ashutosh Saxena, Andrew Y. Ng

 

Introduction

 

High speed navigation and obstacle avoidance, remote control car, unstructured outdoor environments.

 

Combines reinforcement learning, computer graphics and computer vision

 

A monocular vision obstacle detection algorithm based on supervised learning

 

Collect a dataset of several thousand images, each correlated with laser scan data giving the distance to the nearest obstacle in each direction

 

A supervised learning algorithm can accurately estimate the distances to the obstacles. This is the basic vision system. The output is fed into a higher-level controller trained using reinforcement learning.

 

Use a graphical driving simulator, synthetic images instead of real images and laser scan data

 

Use the graphical simulator to train the reinforcement learning algorithm, systematically varying the level of graphical realism

 

Using low-to-medium quality synthetic images to train can give reasonable results in real tests.

 

Combining synthetic and real images to train gives better results than either one alone.

 

Related work

Three categories of cues for depth from two-dimensional images: monocular cues, stereopsis, and motion parallax

 

Monocular vision and apprenticeship learning were used to drive a car on highly structured roads.

 

Vision system

Divide each image into vertical stripes

 

Synthetic images are inexpensive, and there is no noise in the ground truth.

 

In order to emphasize multiplicative rather than additive errors, we converted each distance to a log scale. Experiments training with linear distance give poor results.
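A minimal sketch of why the log conversion measures multiplicative error (the distances and predictions here are made up for illustration):

```python
import numpy as np

# Hypothetical nearest-obstacle distances (meters) for several stripes.
distances_m = np.array([1.5, 4.0, 12.0, 30.0])

# Training targets on a log scale: a 2x over-estimate at 2 m is then
# penalized the same as a 2x over-estimate at 20 m.
log_targets = np.log(distances_m)

# Hypothetical predictions, also in log space.
pred = np.log(np.array([3.0, 4.0, 6.0, 60.0]))

# Absolute error in log space corresponds to the ratio by which
# each estimate is off.
ratio_errors = np.exp(np.abs(pred - log_targets))
```

A squared-error loss on `log_targets` therefore penalizes being off by a factor, not by a fixed number of meters.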

 

Each image is divided into 16 stripes. Each stripe is divided into 11 vertically overlapping windows.

 

Coefficients representing texture energies and gradients are calculated as the feature vector.

 

Transform from RGB to YCbCr. For each window, we apply Laws’ masks to measure texture energies.
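A sketch of Laws’-mask texture energies on one window. The 2-D masks are outer products of 1-D Laws’ vectors; the particular set of three vectors and the window size below are illustrative, not taken from the paper:

```python
import numpy as np
from scipy.signal import convolve2d

# 1-D Laws' vectors: level, edge, and spot.
L5 = np.array([1, 4, 6, 4, 1], dtype=float)
E5 = np.array([-1, -2, 0, 2, 1], dtype=float)
S5 = np.array([-1, 0, 2, 0, -1], dtype=float)

def laws_energies(window):
    """Sum-of-absolute-response texture energy for each 5x5 Laws' mask."""
    energies = []
    for a in (L5, E5, S5):
        for b in (L5, E5, S5):
            mask = np.outer(a, b)  # 5x5 mask from two 1-D vectors
            resp = convolve2d(window, mask, mode="same", boundary="symm")
            energies.append(np.abs(resp).sum())
    return np.array(energies)

rng = np.random.default_rng(0)
window = rng.random((11, 11))   # hypothetical Y-channel window
feat = laws_energies(window)    # 9 texture-energy features
```

Concatenating such energies over a stripe's windows gives the stripe's feature vector.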

 

Texture gradients are an important cue in depth estimation.

 

In order to calculate texture gradients that are robust to noise in the image, we use a variant of the Radon transform and a variant of the Harris corner detector.

 

We trained linear models to estimate the log distance to the nearest obstacle in a stripe.

 

Simple minimization of the sum of squared errors produced nearly identical results to the more complex methods.
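The linear model with a sum-of-squared-errors objective can be sketched as ordinary least squares; the feature matrix and dimensions below are synthetic stand-ins for the stripe features and log-distance targets:

```python
import numpy as np

# Synthetic stand-in for per-stripe features X and log-distance targets y.
rng = np.random.default_rng(1)
n, d = 200, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)  # noisy log-distance targets

# Ordinary least squares: minimize ||X w - y||^2.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
pred = X @ w
```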

 

The real error metric to optimize in this case should be the mean time to crash.

 

The vehicle will be driving in unstructured terrain, so experiments in this domain are not easily repeatable.

 

Let α be a possible steering direction, chosen by picking the direction corresponding to the farthest predicted distance.
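The steering rule reduces to an argmax over per-stripe predictions. A sketch, where the stripe count, field of view, and predicted values are all hypothetical:

```python
import numpy as np

# Hypothetical predicted log distances, one per stripe (left to right).
predicted_log_dist = np.array([1.2, 0.8, 2.5, 2.1, 0.5])

# Map stripe index to a steering angle across an assumed +/-30 degree
# field of view (illustrative, not from the notes).
angles = np.linspace(-30.0, 30.0, len(predicted_log_dist))

# Steer toward the stripe with the farthest predicted obstacle.
alpha = angles[np.argmax(predicted_log_dist)]
```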

 

To calculate the relative depth error, we remove the mean from the true and estimated log-distances for each image.
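A sketch of this mean-removal step on one image (values are made up): subtracting each image's mean means a constant offset over the whole image contributes no relative error.

```python
import numpy as np

# Hypothetical true and estimated log distances for each stripe of one image.
true_log = np.array([1.0, 2.0, 3.0, 2.0])
est_log = np.array([1.5, 2.5, 3.5, 2.5])

# Remove the per-image mean so only the relative per-stripe structure is scored.
rel_true = true_log - true_log.mean()
rel_est = est_log - est_log.mean()
relative_error = np.mean(np.abs(rel_est - rel_true))
```

Here the estimate is uniformly off by 0.5, so the relative error is zero.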

 

Let hazard distance = 5 m denote the distance at which an obstacle becomes a hazard.
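One plausible reading of the hazard-based metric is misclassification at the 5 m threshold; this exact formula is an assumption for illustration, not taken from the notes:

```python
import numpy as np

HAZARD_DIST_M = 5.0

# Hypothetical true and estimated distances (meters) per stripe.
true_d = np.array([2.0, 8.0, 4.5, 20.0])
est_d = np.array([3.0, 6.0, 7.0, 15.0])

# A stripe is a hazard when its nearest obstacle is closer than 5 m;
# the error counts stripes whose hazard/no-hazard label is wrong.
true_hazard = true_d < HAZARD_DIST_M
est_hazard = est_d < HAZARD_DIST_M
hazard_error = np.mean(true_hazard != est_hazard)
```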

 

We combined the system trained on synthetic data with the one trained on real images in order to reduce the hazard-rate error, but this did not produce any improvement over the real-image system.

 

Control

 

We model the RC car control problem as a Markov decision process (MDP).


We then used the PEGASUS policy search algorithm.


The reward function was given as R(s) = -|v_{desired} - v_{actual}| - K * Crashed, where v_{desired} and v_{actual} are the desired and actual speeds of the car, and Crashed is a binary variable stating whether or not the car has crashed in that time step. Thus, the vehicle attempts to maintain the desired forward speed while minimizing contact with obstacles.
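The reward function translates directly into code. The penalty weight K below is a hypothetical value; the notes do not specify it:

```python
# Reward from the notes: penalize speed error plus a crash penalty.
K = 10.0  # hypothetical crash-penalty weight

def reward(v_desired, v_actual, crashed):
    """R(s) = -|v_desired - v_actual| - K * Crashed."""
    return -abs(v_desired - v_actual) - K * (1.0 if crashed else 0.0)

r_ok = reward(2.0, 1.5, False)    # only the speed-error penalty
r_crash = reward(2.0, 1.5, True)  # same speed error plus the crash penalty
```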


DragonFly spy camera, 320x240 pixel resolution, 20 Hz. Steering and throttle commands are sent back to the RC transmitter from the laptop.


Experimental Results

To be read.


Conclusion and Discussion

The experiments with the graphical simulator show that model-based RL holds great promise even in settings involving complex environments and complex perception. 


A vision system trained on computer graphics was able to give reasonable depth estimates on real image data, and a control policy trained in a graphical simulator worked well for real autonomous driving.


 
