High Speed Obstacle Avoidance using Monocular Vision and Reinforcement Learning (Reading Notes)
Original authors: Jeff Michels, Ashutosh Saxena, Andrew Y. Ng
Introduction
High-speed navigation and obstacle avoidance: a remote-control car in unstructured outdoor environments.
Combines reinforcement learning, computer graphics, and computer vision.
A monocular-vision obstacle detection algorithm based on supervised learning.
Collect a dataset of several thousand images, each paired with laser scans giving the distance to the nearest obstacle in each direction.
A supervised learning algorithm can accurately estimate the distances to obstacles. This is the basic vision system; its output is fed into a higher-level controller trained with reinforcement learning.
Use a graphical driving simulator: synthetic images instead of real images and laser scan data.
Use the graphical simulator to train the reinforcement learning algorithm, systematically varying the level of graphical realism.
Training on low-to-medium quality synthetic images can give reasonable results in real-world tests.
Training on a combination of synthetic and real images gives better results than either alone.
Related work
Three categories of cues for depth from two-dimensional images: monocular cues, stereopsis, and motion parallax.
Monocular vision and apprenticeship learning were used to drive a car on highly structured roads.
Vision system
Divide each image into vertical stripes
Synthetic images are inexpensive to generate, and their ground truth is noise-free.
To emphasize multiplicative rather than additive errors, each distance is converted to a log scale; experiments training on linear distances give poor results.
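The log-scale conversion can be sketched as follows; the clipping range and the use of the natural log are assumptions made here for illustration:

```python
import numpy as np

# Hypothetical sketch: convert measured distances (metres) to a log scale
# so regression penalizes multiplicative rather than additive error.
# The clipping bounds are illustrative assumptions, not from the paper.
def to_log_distance(distances_m, min_d=1.0, max_d=80.0):
    """Clip raw distances to a working range, then take the natural log."""
    d = np.clip(np.asarray(distances_m, dtype=float), min_d, max_d)
    return np.log(d)
```

On this scale, a factor-of-two error costs the same whether the obstacle is at 4 m or at 40 m.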
Each image is divided into 16 vertical stripes, and each stripe into 11 vertically overlapping windows.
Coefficients representing texture energies and texture gradients are computed as the feature vector.
Transform from RGB to YCbCr. For each window, apply Laws' masks to measure texture energies.
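A minimal sketch of Laws'-mask texture energies on a window's intensity (Y) channel, assuming the standard 1-D kernels L5 (level), E5 (edge), S5 (spot) and sum-of-absolute-response energies; the exact mask set used in the paper is not recorded in these notes:

```python
import numpy as np

# Standard 1-D Laws' kernels; 2-D masks are their outer products.
L5 = np.array([1, 4, 6, 4, 1], dtype=float)
E5 = np.array([-1, -2, 0, 2, 1], dtype=float)
S5 = np.array([-1, 0, 2, 0, -1], dtype=float)

def convolve2d_valid(img, kernel):
    """Minimal 'valid' 2-D convolution via sliding windows."""
    kh, kw = kernel.shape
    windows = np.lib.stride_tricks.sliding_window_view(img, (kh, kw))
    return np.einsum('ijkl,kl->ij', windows, kernel[::-1, ::-1])

def laws_energies(window_y):
    """Sum of absolute filter responses for each of the 9 Laws' masks."""
    feats = []
    for a in (L5, E5, S5):
        for b in (L5, E5, S5):
            mask = np.outer(a, b)
            feats.append(np.abs(convolve2d_valid(window_y, mask)).sum())
    return np.array(feats)
```

On a constant (textureless) window, only the L5-by-L5 mask responds; the edge and spot masks sum to zero, so their energies vanish.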
Texture gradients are an important cue in depth estimation.
To compute texture gradients that are robust to image noise, a variant of the Radon transform and a variant of the Harris corner detector are used.
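The Harris part of this cue can be illustrated with the textbook corner response (the paper uses a variant; the box smoothing with circular boundaries below is an assumption for brevity):

```python
import numpy as np

# Textbook Harris response: det(M) - k * trace(M)^2 over the smoothed
# structure tensor M. Shown only to illustrate the cue; not the paper's
# exact variant.
def harris_response(img, k=0.04):
    # Image gradients via finite differences (axis 0 = rows = y).
    Iy, Ix = np.gradient(img.astype(float))

    def box(a, r=1):
        # Box average via np.roll (circular boundary: an assumption).
        out = np.zeros_like(a)
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                out += np.roll(np.roll(a, dy, 0), dx, 1)
        return out / (2 * r + 1) ** 2

    Sxx, Syy, Sxy = box(Ix * Ix), box(Iy * Iy), box(Ix * Iy)
    det = Sxx * Syy - Sxy ** 2
    trace = Sxx + Syy
    return det - k * trace ** 2
```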
We trained linear models to estimate the log distance to the nearest obstacle in each stripe.
Simple minimization of the sum of squared errors produced nearly identical results to more complex methods.
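A minimal least-squares sketch of such a per-stripe linear model; the feature dimension and synthetic training data below are illustrative assumptions:

```python
import numpy as np

# Ordinary least squares on (feature vector, log distance) pairs,
# matching the note that plain squared-error minimization matched more
# complex methods. All dimensions and data here are illustrative.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))                 # per-stripe feature vectors
w_true = rng.normal(size=6)
y = X @ w_true + 0.01 * rng.normal(size=200)  # noisy log distances

Xb = np.hstack([X, np.ones((200, 1))])        # append a bias column
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)    # minimizes sum of squared errors
pred = Xb @ w                                 # predicted log distances
```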
The real error metric to optimize in this case should be the mean time to crash.
Since the vehicle drives in unstructured terrain, experiments in this domain are not easily repeatable.
Let alpha be a possible steering direction, chosen by picking the direction corresponding to the farthest predicted distance.
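This rule can be sketched as an argmax over the per-stripe predictions; the stripe-to-angle mapping and the field of view are illustrative assumptions:

```python
import numpy as np

# Pick the stripe with the farthest predicted (log) distance and steer
# toward it. The 60-degree field of view and the linear index-to-angle
# mapping are assumptions for illustration.
def choose_steering(pred_log_dist, fov_deg=60.0):
    n = len(pred_log_dist)
    best = int(np.argmax(pred_log_dist))
    # Map stripe index to an angle alpha in [-fov/2, +fov/2] degrees.
    return (best + 0.5) / n * fov_deg - fov_deg / 2.0
```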
To calculate the relative depth error, we remove the mean from the true and estimated log-distances for each image.
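A sketch of this mean-removed comparison; aggregating with the mean absolute difference is an illustrative choice:

```python
import numpy as np

# Relative depth error: subtract each signal's per-image mean so only the
# shape of the depth profile is compared, then aggregate (here via mean
# absolute difference, an assumption).
def relative_depth_error(true_log_d, est_log_d):
    t = np.asarray(true_log_d, dtype=float)
    e = np.asarray(est_log_d, dtype=float)
    return float(np.mean(np.abs((t - t.mean()) - (e - e.mean()))))
```

An estimate that is off by a constant in log space (i.e. all depths scaled by the same factor) incurs zero relative error.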
Let hazard distance = 5 m denote the distance at which an obstacle becomes a hazard.
Combining the system trained on synthetic data with the one trained on real images was tried in order to reduce the hazard-rate error, but it produced no improvement over the real-image system.
Control
We model the RC car control problem as a Markov decision process (MDP).
We then used the PEGASUS policy search algorithm.
The reward function was R(s) = -|v_desired - v_actual| - K * Crashed, where v_desired and v_actual are the desired and actual speeds of the car, and Crashed is a binary variable indicating whether the car crashed in that time step. Thus the vehicle attempts to maintain the desired forward speed while minimizing contact with obstacles.
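The reward can be written directly from the formula; the penalty weight K below is an assumed value, not the paper's:

```python
# R(s) = -|v_desired - v_actual| - K * Crashed, from the notes above.
# K trades off speed tracking against crash avoidance; K = 10.0 is an
# illustrative assumption.
def reward(v_desired, v_actual, crashed, K=10.0):
    return -abs(v_desired - v_actual) - K * (1.0 if crashed else 0.0)
```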
DragonFly camera, 320×240 pixel resolution, 20 Hz. Steering and throttle commands are sent from the laptop back to the RC transmitter.
Experimental Results
To be read.
Conclusion and Discussion
The experiments with the graphical simulator show that model-based RL holds great promise even in settings involving complex environments and complex perception.
A vision system trained on computer graphics was able to give reasonable depth estimate on real image data, and a control policy trained in a graphical simulator worked well on real autonomous driving.