
Andrew Ng, High Speed Obstacle Avoidance using Monocular Vision and Reinforcement Learning: Reading Notes

Tags: reinforcement learning

 

 



Original authors: Jeff Michels, Ashutosh Saxena, Andrew Y. Ng

 

Introduction

 

High-speed navigation and obstacle avoidance for a remote control car in unstructured outdoor environments.

Combines reinforcement learning, computer graphics, and computer vision.

A monocular vision obstacle detection algorithm based on supervised learning.

 

Collect a dataset of several thousand images, each correlated with a laser scanner reading that gives the nearest obstacle in each direction.

A supervised learning algorithm can accurately estimate the distances to the obstacles. This is the basic vision system. Its output is fed into a higher-level controller trained using reinforcement learning.

 

Use a graphical driving simulator, with synthetic images instead of real images and laser scan data.

Use the graphical simulator to train the reinforcement learning algorithm, systematically varying the level of graphical realism.

Training on low- to medium-quality synthetic images can give reasonable results in real tests.

Combining synthetic and real images for training gives better results than either one alone.

 

Related work

Three categories of cues for depth from two-dimensional images: monocular cues, stereopsis, and motion parallax.

Monocular vision and apprenticeship learning were used to drive a car on highly structured roads.

 

Vision system

Divide each image into vertical stripes.

Synthetic images are inexpensive, and there is no noise in the ground truth.

 

In order to emphasize multiplicative rather than additive errors, we converted each distance to a log scale. Experiments training with linear distances gave poor results.
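To see why a log scale emphasizes multiplicative errors, here is a minimal sketch. The two helper functions and the example distances are made up for illustration and are not taken from the paper. Overestimating a distance by a factor of two costs the same in log space whether the obstacle is near or far, while the linear squared error grows with the distance.

```python
import math


def squared_error_linear(true_m, pred_m):
    """Squared error on raw distances (in meters)."""
    return (pred_m - true_m) ** 2


def squared_error_log(true_m, pred_m):
    """Squared error on log distances; depends only on the ratio pred/true."""
    return (math.log(pred_m) - math.log(true_m)) ** 2


# Hypothetical (true, predicted) pairs, both off by a factor of two.
near = (1.0, 2.0)
far = (10.0, 20.0)

print(squared_error_linear(*near), squared_error_linear(*far))
print(squared_error_log(*near), squared_error_log(*far))
```

In linear space the far error dominates by a factor of 100, so a model trained that way would mostly ignore nearby obstacles, which are the ones that matter for avoidance.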

 

Each image is divided into 16 stripes. Each stripe is divided into 11 vertically overlapping windows.
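The layout can be sketched as follows. The notes do not say how much the windows overlap, so this sketch assumes a 50% vertical overlap, and it assumes the 320x240 camera resolution mentioned later in these notes. The function name `stripe_windows` is hypothetical.

```python
def stripe_windows(width=320, height=240, n_stripes=16, n_windows=11):
    """Return, per stripe, a list of (x0, x1, y0, y1) window bounds.

    Assumption: consecutive windows overlap by half their height, so
    win_h + (n_windows - 1) * (win_h / 2) == height.
    """
    stripe_w = width // n_stripes
    win_h = 2 * height // (n_windows + 1)
    stride = win_h // 2
    stripes = []
    for s in range(n_stripes):
        x0 = s * stripe_w
        column = [
            (x0, x0 + stripe_w, k * stride, k * stride + win_h)
            for k in range(n_windows)
        ]
        stripes.append(column)
    return stripes


layout = stripe_windows()
print(len(layout), len(layout[0]))
print(layout[0][0], layout[-1][-1])
```

Under these assumptions each stripe is 20 pixels wide and each window is 40 pixels tall with a stride of 20 pixels.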

 

Coefficients representing texture energies and gradients are calculated as the feature vector.

 

Transform from RGB to YCbCr. For each window, we apply Laws’ masks to measure texture energies.
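A rough sketch of these two steps is given below. It assumes the full-range BT.601 (JPEG-style) YCbCr conversion and the common 3x3 Laws' masks built from the L3, E3, and S3 vectors. The notes do not state which variant the paper uses, and summing absolute filter responses is only one common way to define a texture energy. All function names are hypothetical.

```python
from itertools import product


def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 conversion for one 8-bit RGB pixel (assumed variant)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


# 1-D Laws' vectors: level, edge, spot.
VECTORS = {"L3": [1, 2, 1], "E3": [-1, 0, 1], "S3": [-1, 2, -1]}


def outer(u, v):
    """Outer product of two 1-D vectors as a nested list."""
    return [[a * b for b in v] for a in u]


# The nine 3x3 masks, keyed by names such as "L3E3".
LAWS_MASKS = {
    n1 + n2: outer(VECTORS[n1], VECTORS[n2])
    for n1, n2 in product(VECTORS, repeat=2)
}


def texture_energy(window, mask):
    """Sum of absolute 3x3 filter responses over all valid positions."""
    rows, cols = len(window), len(window[0])
    total = 0.0
    for r in range(rows - 2):
        for c in range(cols - 2):
            response = sum(
                mask[i][j] * window[r + i][c + j]
                for i in range(3)
                for j in range(3)
            )
            total += abs(response)
    return total


# A flat (textureless) 4x4 patch: only the pure averaging mask responds.
flat = [[1.0] * 4 for _ in range(4)]
print({name: texture_energy(flat, m) for name, m in LAWS_MASKS.items()})
```

Every mask except L3L3 sums to zero, so flat regions produce no edge or spot energy, which is what makes these responses useful as texture features.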

 

Texture gradients are an important cue in depth estimation.

In order to calculate texture gradients that are robust to noise in the image, we use a variant of the Radon transform and a variant of the Harris corner detector.

 

We trained linear models to estimate the log distance to the nearest obstacle in a stripe.

Simple minimization of the sum of squared errors produced nearly identical results to the more complex methods.
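As a toy illustration of least-squares fitting on log distances, the sketch below fits a linear model with a single feature and a bias term in closed form. The real system uses a full feature vector per stripe. The single feature, the function names, and the numbers here are made up for illustration only.

```python
import math


def fit_least_squares(features, log_distances):
    """Closed-form least squares for log_distance ~ slope * feature + intercept."""
    n = len(features)
    mean_x = sum(features) / n
    mean_y = sum(log_distances) / n
    sxx = sum((x - mean_x) ** 2 for x in features)
    sxy = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(features, log_distances)
    )
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_distance(feature, slope, intercept):
    """Map the predicted log distance back to a distance."""
    return math.exp(slope * feature + intercept)


# Made-up training pairs: one texture feature per stripe and its log distance.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_least_squares(xs, ys)
print(slope, intercept, predict_distance(0.5, slope, intercept))
```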

 

The real error metric to optimize in this case should be the mean time to crash.

The vehicle will be driving in unstructured terrain, so experiments in this domain are not easily repeatable.

 

Let \alpha be a possible steering direction; the chosen direction is the one corresponding to the farthest predicted distance.
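Picking the steering direction then amounts to an argmax over the per-stripe predictions. The sketch below assumes one prediction per vertical stripe and returns the stripe index; the function name and the example values are hypothetical.

```python
def choose_steering_stripe(predicted_log_distances):
    """Return the index of the stripe with the farthest predicted distance.

    Ties are broken in favor of the leftmost stripe.
    """
    indices = range(len(predicted_log_distances))
    return max(indices, key=lambda i: predicted_log_distances[i])


# Made-up predictions for a 5-stripe image; stripe 3 looks the most open.
print(choose_steering_stripe([0.4, 1.1, 0.9, 2.3, 1.7]))  # prints 3
```

Since the log is monotonic, the argmax over log distances is the same as the argmax over raw distances.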

 

To calculate the relative depth error, we remove the mean from the true and estimated log-distances for each image.
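One way to read this for a single image is sketched below: subtract each vector's own mean, then compare the centered values. Using the mean absolute difference is an assumption of this sketch; the notes do not record the exact norm. The function name is hypothetical.

```python
def relative_depth_error(true_log, est_log):
    """Mean absolute difference between mean-centered log distances.

    Assumption: absolute differences averaged over stripes; a constant
    offset in log space (a global scale error) does not count.
    """
    n = len(true_log)
    mean_true = sum(true_log) / n
    mean_est = sum(est_log) / n
    return sum(
        abs((e - mean_est) - (t - mean_true))
        for t, e in zip(true_log, est_log)
    ) / n


# Made-up values: the estimate is off by a constant, so the relative error
# vanishes (up to floating-point rounding).
print(relative_depth_error([1.0, 2.0, 3.0], [1.5, 2.5, 3.5]))
```

Removing the means makes the metric insensitive to getting the overall scale wrong, so it measures only whether the relative ordering and spacing of depths across stripes is right.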

 

Let the hazard distance, set to 5 m, denote the distance at which an obstacle becomes a hazard.

We combined the system trained on synthetic data with the one trained on real images in order to reduce the hazard rate error, but this did not produce any improvement over the real-image system.

 

Control

 

We model the RC car control problem as a Markov decision process (MDP).


We then used the PEGASUS policy search algorithm.


The reward function was given as R(s) = -|v_{desired} - v_{actual}| - K * Crashed, where v_{desired} and v_{actual} are the desired and actual speeds of the car, and Crashed is a binary variable stating whether or not the car has crashed in that time step. Thus, the vehicle attempts to maintain the desired forward speed while minimizing contact with obstacles.
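The reward can be written directly as a small function. The value of K is not given in these notes, so the `crash_penalty` used below is a placeholder, and the speeds are made-up examples.

```python
def reward(v_desired, v_actual, crashed, crash_penalty):
    """R(s) = -|v_desired - v_actual| - K * Crashed, with K = crash_penalty."""
    return -abs(v_desired - v_actual) - crash_penalty * (1 if crashed else 0)


# Placeholder K = 10.0; speeds in arbitrary units.
print(reward(2.0, 1.5, crashed=False, crash_penalty=10.0))  # prints -0.5
print(reward(2.0, 1.5, crashed=True, crash_penalty=10.0))   # prints -10.5
```

A larger K makes the policy trade speed for safety, since a single crash then outweighs many time steps of driving below the desired speed.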


DragonFly spy camera, 320x240 pixel resolution, 20 Hz. Steering and throttle commands are sent back to the RC transmitter from the laptop.


Experimental Results

To be read.


Conclusion and Discussion

The experiments with the graphical simulator show that model-based RL holds great promise even in settings involving complex environments and complex perception. 


A vision system trained on computer graphics was able to give reasonable depth estimates on real image data, and a control policy trained in a graphical simulator worked well on real autonomous driving.


 
