Andrew Ng, High Speed Obstacle Avoidance using Monocular Vision and Reinforcement Learning: Reading Notes

Posted in Translation, November 20, 2015, 16:38:35



High Speed Obstacle Avoidance using Monocular Vision and Reinforcement Learning: Reading Notes

Original authors: Jeff Michels, Ashutosh Saxena, Andrew Y. Ng




High speed navigation and obstacle avoidance, remote control car, unstructured outdoor environments.


Combines reinforcement learning, computer graphics and computer vision.


A monocular vision obstacle detection algorithm based on supervised learning.


Collect a dataset of several thousand images, each correlated with a laser scan that gives the distance to the nearest obstacle in each direction.


A supervised learning algorithm can accurately estimate the distances to the obstacles. This is the basic vision system. Its output is fed into a higher-level controller trained using reinforcement learning.


Use a graphical driving simulator: synthetic images instead of real images and laser scan data.


Use the graphical simulator to train the reinforcement learning algorithm, systematically varying the level of graphical realism.


Training on low-to-medium quality synthetic images can give reasonable results in real tests.


Training on a combination of synthetic and real images gives better results than either alone.



Three categories of cues for depth from two-dimensional images: monocular cues, stereopsis, and motion parallax.


Monocular vision and apprenticeship learning were used to drive a car on highly structured roads.



Divide each image into vertical stripes


Synthetic images are inexpensive, and there is no noise in the ground truth.


In order to emphasize multiplicative rather than additive errors, we converted each distance to a log scale. Experiments training on linear distance give poor results.
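The log-scale conversion can be sketched as follows (the distance values here are made up for illustration):

```python
import numpy as np

# Hypothetical ground-truth distances (meters) for one image's stripes.
distances = np.array([2.0, 5.0, 17.0, 40.0])

# On a log scale, squared error penalizes multiplicative (ratio) errors:
# estimating a 2 m obstacle as 4 m costs the same as estimating 20 m as 40 m,
# whereas on a linear scale the far-away error would dominate the loss.
log_distances = np.log(distances)
```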


Each image is divided into 16 stripes. Each stripe is divided into 11 vertically overlapping windows.
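A minimal sketch of that layout; the window height and amount of vertical overlap are assumptions, since these notes do not record them:

```python
import numpy as np

def split_into_windows(image, n_stripes=16, n_windows=11):
    """Divide an image into vertical stripes and each stripe into
    vertically overlapping windows. Window height and overlap here
    are assumed values, not taken from the paper."""
    h, w = image.shape[:2]
    stripe_w = w // n_stripes
    win_h = h // 6                          # assumed window height
    step = (h - win_h) // (n_windows - 1)   # step < win_h, so windows overlap
    windows = []
    for s in range(n_stripes):
        stripe = image[:, s * stripe_w:(s + 1) * stripe_w]
        for k in range(n_windows):
            y0 = k * step
            windows.append(stripe[y0:y0 + win_h])
    return windows

windows = split_into_windows(np.zeros((240, 320)))  # 16 stripes * 11 windows
```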


Coefficients representing texture energies and gradients are calculated as the feature vector.


Transform from RGB to YCbCr. For each window, we apply Laws' masks to measure texture energies.
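A sketch of the Laws'-mask texture energy computation on a single channel. The choice of which 2-D masks to form, and summing absolute responses as the energy, are common conventions assumed here rather than details recorded in these notes:

```python
import numpy as np

# 1-D Laws' vectors; 2-D masks are their outer products.
L5 = np.array([1, 4, 6, 4, 1], dtype=float)    # level (local average)
E5 = np.array([-1, -2, 0, 2, 1], dtype=float)  # edge
S5 = np.array([-1, 0, 2, 0, -1], dtype=float)  # spot

def correlate2d_valid(img, kernel):
    """Plain 'valid' 2-D cross-correlation (the sign convention is
    immaterial for energy measures)."""
    kh, kw = kernel.shape
    out = np.empty((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i + kh, j:j + kw] * kernel).sum()
    return out

def laws_energy(channel, v_row, v_col):
    """Texture energy of one channel under the mask outer(v_row, v_col):
    sum of absolute filter responses over the window."""
    mask = np.outer(v_row, v_col)
    return np.abs(correlate2d_valid(channel, mask)).sum()

# A constant (textureless) window has zero edge energy, since E5 sums to 0.
flat_energy = laws_energy(np.ones((10, 10)), E5, L5)
```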


Texture gradients are an important cue in depth estimation.


In order to calculate texture gradients that are robust to noise in the image, we use a variant of the Radon transform and a variant of the Harris corner detector.


We trained linear models to estimate the log distance to the nearest obstacle in a stripe.


Simple minimization of the sum of squared errors produced nearly identical results to the more complex methods.
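The least-squares fit above can be sketched with synthetic stand-ins for the texture features (the dimensions and noise level below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: 500 stripes, each with a 24-dim texture feature
# vector, and log-distance targets generated from a known linear model.
X = rng.random((500, 24))
w_true = rng.standard_normal(24)
y = X @ w_true + 0.01 * rng.standard_normal(500)  # noisy log-distances

# Ordinary least squares: w_hat = argmin_w ||X w - y||^2
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
predicted_log_dist = X @ w_hat
```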


The real error metric to optimize in this case should be the mean time to crash.


The vehicle will be driving in unstructured terrain, so experiments in this domain are not easily repeatable.


Let α be a possible steering direction, chosen by picking the direction corresponding to the farthest predicted distance.
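A minimal sketch of that choice, mapping the winning stripe to a steering angle; the 60-degree field of view is an assumed value, not taken from the paper:

```python
import numpy as np

def choose_steering(predicted_distances, fov_deg=60.0):
    """Pick the steering direction alpha corresponding to the stripe
    with the farthest predicted distance. The field of view used to
    map stripe index to angle is an assumption."""
    n = len(predicted_distances)
    stripe_angles = np.linspace(-fov_deg / 2, fov_deg / 2, n)
    return stripe_angles[int(np.argmax(predicted_distances))]

# Sixteen per-stripe distance predictions; the maximum is in stripe 2,
# so the car steers toward that (left-of-center) direction.
preds = np.array([3.0, 4.0, 9.0, 7.0, 2.0, 1.5, 1.0, 0.8,
                  0.9, 1.2, 2.5, 3.5, 4.0, 4.2, 3.8, 3.0])
alpha = choose_steering(preds)
```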


To calculate the relative depth error, we remove the mean from the true and estimated log-distances for each image.


Let the hazard distance = 5 m denote the distance at which an obstacle becomes a hazard.
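The two evaluation quantities in these notes can be sketched as follows (per-image arrays of per-stripe distances are assumed):

```python
import numpy as np

HAZARD_DISTANCE = 5.0  # meters; obstacles nearer than this are hazards

def relative_depth_error(true_log_d, est_log_d):
    """Mean absolute error after removing each image's mean log-distance,
    leaving only the relative (per-stripe) depth error."""
    t = true_log_d - true_log_d.mean()
    e = est_log_d - est_log_d.mean()
    return np.abs(t - e).mean()

def hazard_error_rate(true_d, est_d, hazard=HAZARD_DISTANCE):
    """Fraction of stripes whose hazard / non-hazard label is wrong."""
    return float(np.mean((true_d < hazard) != (est_d < hazard)))
```

Note that a uniform multiplicative bias (a constant offset in log-distance) contributes nothing to the relative depth error, since the per-image mean removal cancels it.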


We combined the system trained on synthetic data with the one trained on real images in order to reduce the hazard-rate error, but this did not produce any improvement over the real-image system.




We model the RC car control problem as a Markov decision process (MDP).

We then used the PEGASUS policy search algorithm.

The reward function was given as R(s) = -|v_desired - v_actual| - K * Crashed, where v_desired and v_actual are the desired and actual speeds of the car, and Crashed is a binary variable stating whether or not the car has crashed in that time step. Thus, the vehicle attempts to maintain the desired forward speed while minimizing contact with obstacles.
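That reward is a one-liner; the crash-penalty weight K = 10 below is an assumed value, since these notes do not record it:

```python
def reward(v_desired, v_actual, crashed, K=10.0):
    """R(s) = -|v_desired - v_actual| - K * Crashed.
    The crash-penalty weight K is an assumed value."""
    return -abs(v_desired - v_actual) - (K if crashed else 0.0)
```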

DragonFly spy camera, 320×240 pixel resolution, 20 Hz. Steering and throttle commands are sent back to the RC transmitter from the laptop.

Experimental Results

To be read.

Conclusion and Discussion

The experiments with the graphical simulator show that model-based RL holds great promise even in settings involving complex environments and complex perception. 

A vision system trained on computer graphics was able to give reasonable depth estimate on real image data, and a control policy trained in a graphical simulator worked well on real autonomous driving. 

