Andrew Ng, High Speed Obstacle Avoidance using Monocular Vision and Reinforcement Learning: Reading Notes

Translation, 2015-11-20 16:38:35



High Speed Obstacle Avoidance using Monocular Vision and Reinforcement Learning: Reading Notes

Original authors: Jeff Michels, Ashutosh Saxena, Andrew Y. Ng




High speed navigation and obstacle avoidance, remote control car, unstructured outdoor environments.


Combines reinforcement learning, computer graphics, and computer vision.


A monocular vision obstacle detection algorithm based on supervised learning.


Collected a dataset of several thousand images, each correlated with laser scanner readings giving the distance to the nearest obstacle in each direction.


A supervised learning algorithm can accurately estimate the distances to the obstacles. This is the basic vision system. The output is fed into a higher-level controller trained using reinforcement learning.


Use a graphical driving simulator: synthetic images instead of real images and laser scan data.


Use the graphical simulator to train the reinforcement learning algorithm, systematically varying the level of graphical realism.


Training on low-to-medium quality synthetic images can give reasonable results in real tests.


Training on a combination of synthetic and real images gives better results than either alone.



Three categories of cues for depth from two-dimensional images: monocular cues, stereopsis, and motion parallax.


Monocular vision and apprenticeship learning were previously used to drive a car on highly structured roads.



Divide each image into vertical stripes


Synthetic images are inexpensive, and there is no noise in their ground truth.


In order to emphasize multiplicative rather than additive errors, we converted each distance to a log scale. Experiments training with linear distance give poor results.


Each image is divided into 16 stripes. Each stripe is divided into 11 vertically overlapping windows.
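As a sketch, the stripe/window decomposition might look like the following (the 16 stripes and 11 windows per stripe come from the notes; the 50% vertical overlap and exact tiling are my assumptions, not the paper's):

```python
import numpy as np

def split_into_windows(img, n_stripes=16, n_windows=11, overlap=0.5):
    # Split the image into n_stripes vertical stripes; within each
    # stripe take n_windows vertically overlapping windows.
    # (The 50% overlap is an illustrative assumption.)
    h, w = img.shape[:2]
    stripe_w = w // n_stripes
    # Window height chosen so the n_windows windows exactly tile the
    # image height at the given overlap.
    win_h = h / ((n_windows - 1) * (1 - overlap) + 1)
    step = win_h * (1 - overlap)
    windows = []
    for s in range(n_stripes):
        stripe = img[:, s * stripe_w:(s + 1) * stripe_w]
        for k in range(n_windows):
            y0 = int(round(k * step))
            windows.append(stripe[y0:y0 + int(round(win_h))])
    return windows

img = np.zeros((240, 320))          # camera resolution from the notes
wins = split_into_windows(img)      # 16 * 11 = 176 windows
```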


Coefficients representing texture energies and gradients are calculated as the feature vector.


Transform from RGB to YCbCr. For each window, we apply Laws' masks to measure texture energies.
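A minimal sketch of Laws'-mask texture energy for a single-channel window, using three of the classic 1-D Laws kernels; the paper's exact mask set, channel handling, and normalization may differ:

```python
import numpy as np

# Classic 1-D Laws kernels; 2-D masks are their outer products.
L5 = np.array([1, 4, 6, 4, 1], float)    # level
E5 = np.array([-1, -2, 0, 2, 1], float)  # edge
S5 = np.array([-1, 0, 2, 0, -1], float)  # spot

def laws_energy(window, row_k, col_k):
    # Separable 2-D convolution: rows first, then columns
    # ('valid' mode avoids border effects), then sum of absolute
    # responses as the texture energy.
    tmp = np.array([np.convolve(r, row_k, mode='valid') for r in window])
    out = np.array([np.convolve(c, col_k, mode='valid') for c in tmp.T]).T
    return np.abs(out).sum()
```

On a constant (textureless) window, any zero-sum kernel such as E5 gives zero energy, which is the sanity check that the energy responds to texture rather than brightness.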


Texture gradients are an important cue in depth estimation.


In order to calculate texture gradients that are robust to noise in the image, we use a variant of the Radon transform and a variant of the Harris corner detector.


We trained linear models to estimate the log distance to the nearest obstacle in a stripe.


Simple minimization of the sum of squared errors produced nearly identical results to the more complex methods.
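The least-squares fit can be sketched on synthetic data (all shapes and values here are illustrative, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 8                      # sample/feature counts are illustrative
X = rng.normal(size=(n, d))        # per-stripe feature vectors
w_true = rng.normal(size=d)
y = X @ w_true + 0.01 * rng.normal(size=n)   # synthetic log-distances

# Plain least squares: argmin_w ||Xw - y||^2
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
pred_log_dist = X @ w_hat          # predicted log distance per stripe
```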


The real error metric to optimize in this case should be the mean time to crash.


Because the vehicle drives on unstructured terrain, experiments in this domain are not easily repeatable.


Let α be a possible steering direction, chosen by picking the direction corresponding to the farthest predicted distance.
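The steering rule can be sketched as follows (the per-stripe bearings and ±60° field of view are assumed for illustration):

```python
import numpy as np

def pick_steering(pred_dists, stripe_angles):
    # Steer toward the stripe whose predicted distance is largest.
    return stripe_angles[int(np.argmax(pred_dists))]

# One bearing per image stripe; the field of view is an assumption.
angles = np.linspace(-60.0, 60.0, 16)
dists = np.array([2.0, 3.5, 9.1] + [1.0] * 13)   # toy per-stripe predictions
alpha = pick_steering(dists, angles)             # bearing of stripe 2
```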


To calculate the relative depth error, we remove the mean from the true and estimated log-distances for each image.
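A sketch of this mean-removed ("relative") error; the notes do not say whether the paper aggregates with mean absolute error or RMS, so mean absolute error is assumed here:

```python
import numpy as np

def relative_depth_error(true_log_d, est_log_d):
    # Subtract each image's mean before comparing, so the metric
    # scores relative (per-stripe) depth structure, not absolute scale.
    t = true_log_d - true_log_d.mean()
    e = est_log_d - est_log_d.mean()
    return np.mean(np.abs(t - e))

t = np.array([1.0, 2.0, 3.0])
# An estimate that is off by a constant offset has zero relative error.
err_shifted = relative_depth_error(t, t + 5.0)
```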


Let the hazard distance (5 m) denote the distance at which an obstacle becomes a hazard.


We combined the system trained on synthetic data with the one trained on real images in order to reduce the hazard-rate error, but this produced no improvement over the real-image system.




We model the RC car control problem as a Markov decision process (MDP).

We then used the PEGASUS policy search algorithm.

The reward function was given as R(s) = -abs(v_{desired} - v_{actual}) - K * Crashed, where v_{desired} and v_{actual} are the desired and actual speeds of the car, and Crashed is a binary variable stating whether or not the car has crashed in that time step. Thus, the vehicle attempts to maintain the desired forward speed while minimizing contact with obstacles.
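The reward can be transcribed directly (the penalty weight K is not given in these notes; the value below is illustrative):

```python
def reward(v_desired, v_actual, crashed, K=10.0):
    # R(s) = -|v_desired - v_actual| - K * Crashed
    # K trades off speed tracking against crash avoidance;
    # its actual value is not stated in the notes.
    return -abs(v_desired - v_actual) - K * float(crashed)
```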

DragonFly spy camera, 320×240 pixel resolution, 20 Hz. Steering and throttle commands are sent back to the RC transmitter from the laptop.

Experimental Results

To be read.

Conclusion and Discussion

The experiments with the graphical simulator show that model-based RL holds great promise even in settings involving complex environments and complex perception. 

A vision system trained on computer graphics was able to give reasonable depth estimate on real image data, and a control policy trained in a graphical simulator worked well on real autonomous driving. 

