High Speed Obstacle Avoidance using Monocular Vision and Reinforcement Learning (Reading Notes)
Original authors: Jeff Michels, Ashutosh Saxena, Andrew Y. Ng
Introduction
High-speed navigation and obstacle avoidance: a remote-control car in unstructured outdoor environments.
Combines reinforcement learning, computer graphics, and computer vision.
A monocular-vision obstacle detection algorithm based on supervised learning.
Collect a dataset of several thousand images, each paired with laser scans giving the distance to the nearest obstacle in each direction.
A supervised learning algorithm can accurately estimate the distances to obstacles. This is the basic vision system; its output is fed into a higher-level controller trained with reinforcement learning.
Use a graphical driving simulator: synthetic images instead of real images and laser scan data.
Use the graphical simulator to train the reinforcement learning algorithm, systematically varying the level of graphical realism.
Training on low-to-medium quality synthetic images can give reasonable results in real-world tests.
Training on a combination of synthetic and real images gives better results than either alone.
Related work
Three categories of cues for depth from two-dimensional images: monocular cues, stereopsis, and motion parallax.
Monocular vision and apprenticeship learning were used to drive a car on highly structured roads.
Vision system
Divide each image into vertical stripes
Synthetic images are inexpensive to generate, and their ground truth is noise-free.
To emphasize multiplicative rather than additive errors, each distance is converted to a log scale; experiments training on linear distances give poor results.
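The log-scale conversion can be sketched as follows; the clipping range and the use of the natural log are assumptions made here for illustration:

```python
import numpy as np

# Hypothetical sketch: convert measured distances (metres) to a log scale
# so regression penalizes multiplicative rather than additive error.
# The clipping bounds are illustrative assumptions, not from the paper.
def to_log_distance(distances_m, min_d=1.0, max_d=80.0):
    """Clip raw distances to a working range, then take the natural log."""
    d = np.clip(np.asarray(distances_m, dtype=float), min_d, max_d)
    return np.log(d)
```

On this scale, a factor-of-two error costs the same whether the obstacle is at 4 m or at 40 m.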
Each image is divided into 16 vertical stripes, and each stripe into 11 vertically overlapping windows.
Coefficients representing texture energies and texture gradients are computed as the feature vector.
Transform from RGB to YCbCr. For each window, apply Laws' masks to measure texture energies.
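A minimal sketch of Laws'-mask texture energies on a window's intensity (Y) channel, assuming the standard 1-D kernels L5 (level), E5 (edge), S5 (spot) and sum-of-absolute-response energies; the exact mask set used in the paper is not recorded in these notes:

```python
import numpy as np

# Standard 1-D Laws' kernels; 2-D masks are their outer products.
L5 = np.array([1, 4, 6, 4, 1], dtype=float)
E5 = np.array([-1, -2, 0, 2, 1], dtype=float)
S5 = np.array([-1, 0, 2, 0, -1], dtype=float)

def convolve2d_valid(img, kernel):
    """Minimal 'valid' 2-D convolution via sliding windows."""
    kh, kw = kernel.shape
    windows = np.lib.stride_tricks.sliding_window_view(img, (kh, kw))
    return np.einsum('ijkl,kl->ij', windows, kernel[::-1, ::-1])

def laws_energies(window_y):
    """Sum of absolute filter responses for each of the 9 Laws' masks."""
    feats = []
    for a in (L5, E5, S5):
        for b in (L5, E5, S5):
            mask = np.outer(a, b)
            feats.append(np.abs(convolve2d_valid(window_y, mask)).sum())
    return np.array(feats)
```

On a constant (textureless) window, only the L5-by-L5 mask responds; the edge and spot masks sum to zero, so their energies vanish.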
Texture gradients are an important cue in depth estimation.
To compute texture gradients that are robust to image noise, a variant of the Radon transform and a variant of the Harris corner detector are used.
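The Harris part of this cue can be illustrated with the textbook corner response (the paper uses a variant; the box smoothing with circular boundaries below is an assumption for brevity):

```python
import numpy as np

# Textbook Harris response: det(M) - k * trace(M)^2 over the smoothed
# structure tensor M. Shown only to illustrate the cue; not the paper's
# exact variant.
def harris_response(img, k=0.04):
    # Image gradients via finite differences (axis 0 = rows = y).
    Iy, Ix = np.gradient(img.astype(float))

    def box(a, r=1):
        # Box average via np.roll (circular boundary: an assumption).
        out = np.zeros_like(a)
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                out += np.roll(np.roll(a, dy, 0), dx, 1)
        return out / (2 * r + 1) ** 2

    Sxx, Syy, Sxy = box(Ix * Ix), box(Iy * Iy), box(Ix * Iy)
    det = Sxx * Syy - Sxy ** 2
    trace = Sxx + Syy
    return det - k * trace ** 2
```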
We trained linear models to estimate the log distance to the nearest obstacle in each stripe.
Simple minimization of the sum of squared errors produced nearly identical results to more complex methods.
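A minimal least-squares sketch of such a per-stripe linear model; the feature dimension and synthetic training data below are illustrative assumptions:

```python
import numpy as np

# Ordinary least squares on (feature vector, log distance) pairs,
# matching the note that plain squared-error minimization matched more
# complex methods. All dimensions and data here are illustrative.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))                 # per-stripe feature vectors
w_true = rng.normal(size=6)
y = X @ w_true + 0.01 * rng.normal(size=200)  # noisy log distances

Xb = np.hstack([X, np.ones((200, 1))])        # append a bias column
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)    # minimizes sum of squared errors
pred = Xb @ w                                 # predicted log distances
```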
The real error metric to optimize in this case should be the mean time to crash.
Since the vehicle drives in unstructured terrain, experiments in this domain are not easily repeatable.
Let alpha be a possible steering direction, chosen by picking the direction corresponding to the farthest predicted distance.
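This rule can be sketched as an argmax over the per-stripe predictions; the stripe-to-angle mapping and the field of view are illustrative assumptions:

```python
import numpy as np

# Pick the stripe with the farthest predicted (log) distance and steer
# toward it. The 60-degree field of view and the linear index-to-angle
# mapping are assumptions for illustration.
def choose_steering(pred_log_dist, fov_deg=60.0):
    n = len(pred_log_dist)
    best = int(np.argmax(pred_log_dist))
    # Map stripe index to an angle alpha in [-fov/2, +fov/2] degrees.
    return (best + 0.5) / n * fov_deg - fov_deg / 2.0
```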
To calculate the relative depth error, we remove the mean from the true and estimated log-distances for each image.
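A sketch of this mean-removed comparison; aggregating with the mean absolute difference is an illustrative choice:

```python
import numpy as np

# Relative depth error: subtract each signal's per-image mean so only the
# shape of the depth profile is compared, then aggregate (here via mean
# absolute difference, an assumption).
def relative_depth_error(true_log_d, est_log_d):
    t = np.asarray(true_log_d, dtype=float)
    e = np.asarray(est_log_d, dtype=float)
    return float(np.mean(np.abs((t - t.mean()) - (e - e.mean()))))
```

An estimate that is off by a constant in log space (i.e. all depths scaled by the same factor) incurs zero relative error.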
Let hazard distance = 5 m denote the distance at which an obstacle becomes a hazard.
Combining the system trained on synthetic data with the one trained on real images was tried in order to reduce the hazard-rate error, but it produced no improvement over the real-image system.
Control
We model the RC car control problem as a Markov decision process (MDP).
We then used the PEGASUS policy search algorithm.
The reward function was R(s) = -|v_desired - v_actual| - K * Crashed, where v_desired and v_actual are the desired and actual speeds of the car, and Crashed is a binary variable indicating whether the car crashed in that time step. Thus the vehicle attempts to maintain the desired forward speed while minimizing contact with obstacles.
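The reward can be written directly from the formula; the penalty weight K below is an assumed value, not the paper's:

```python
# R(s) = -|v_desired - v_actual| - K * Crashed, from the notes above.
# K trades off speed tracking against crash avoidance; K = 10.0 is an
# illustrative assumption.
def reward(v_desired, v_actual, crashed, K=10.0):
    return -abs(v_desired - v_actual) - K * (1.0 if crashed else 0.0)
```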
DragonFly camera, 320×240 pixel resolution, 20 Hz. Steering and throttle commands are sent from the laptop back to the RC transmitter.
Experimental Results
To be read.
Conclusion and Discussion
The experiments with the graphical simulator show that model-based RL holds great promise even in settings involving complex environments and complex perception.
A vision system trained on computer graphics was able to give reasonable depth estimate on real image data, and a control policy trained in a graphical simulator worked well on real autonomous driving.