Autonomous exploration of mobile robots through deep neural networks

Source

This article is from the INTERNATIONAL JOURNAL OF ADVANCED ROBOTIC SYSTEMS, published in the July-August 2017 issue:
July-August 2017: 1–9
DOI: 10.1177/1729881417703571
journals.sagepub.com/home/arx


Autonomous exploration of mobile robots through DNNs
Authors: Lei Tai, Shaohua Li and Ming Liu

Abstract

The autonomous exploration problem addresses a robot's need to explore unknown environments. The authors describe an indoor exploration algorithm whose hierarchical structure fuses several CNN layers with a decision-making process. The whole system is trained end to end, taking only visual information (RGB-D) as input and generating a sequence of main moving directions as output, so that the robot achieves autonomous exploration ability. The robot is a TurtleBot with a Kinect mounted on it. The model is trained and tested in a real-world environment, and the training data set is provided for download. The outputs on the test data are compared with human decisions. A Gaussian process latent variable model is used to visualize the feature maps of the last convolutional layer, demonstrating the effectiveness of this DNN-based exploration.
The authors also present libcnn, a novel and lightweight deep-learning library for handling the deep-learning side of robotic applications.

PS: in the end, only the depth map is actually used as input.

Keywords

Robot exploration, deep learning, CNN

Date received: 28 July 2016; accepted: 23 February 2017

Introduction [background]

Previous approaches to exploration are mostly based on probabilistic models and computed cost maps, like the work of Stachniss et al. [1]. The advantage of probability-based models is that they take uncertainty into account to describe real-world cases. However, most of these approaches only utilize geometric information, without any cognitive process [2,3]. From the perspective of intelligence, mapping input directly to output without further processing of the input information is a kind of low-level intelligence. It would be more satisfactory if a mobile robot could imitate the way human beings deal with such a task. Fortunately, deep learning, with its advantage in hierarchical feature extraction, provides a potential solution to this problem.

Motivation and bio-inspired perception

[The section first reviews some general ANN background, omitted here.] Regarding decision-making, recent work by Google DeepMind (http://www.deepmind.com) has shown that the decision-making process can be learned using a deep reinforcement learning model on Q-functions [4,5].
Note that all these state-of-the-art methods try to solve only one aspect of perception, or rely on strong assumptions about the distinctiveness of the input features (e.g., pixels from a gameplay). However, a practical robotic application is usually conducted with uncertainties in observations. This requirement makes it hard to design an effective and unified deep network able to realize a complete, though maybe seemingly simple, robotic task. Despite these considerations, the paper presents a unified deep network that performs efficient and human-like robotic exploration in real time.
This work tries to combine perception and control in a single deep network: the proposed structure fuses a convolutional neural network (CNN) with the decision-making process.

The CNN structure is used to detect and comprehend visual features, and the fully connected layers are used for decision-making. Apart from that, no additional modules or algorithms are required for the execution of the task. Overall, the paper presents a complete solution to autonomous exploration based on a single network structure.

Although there are several libraries for deep learning in computer vision and speech recognition, an ultra-fast and reliable library is still needed for robotic applications. As part of the contributions, the paper presents a novel deep-learning library, libcnn, which is optimized for robotics in terms of light weight and flexibility. It is used to support the implementation of all the related modules in the article.

Contributions

The article claims the following contributions:

  1. A novel deep-learning library especially optimized for robotic applications. Its featured modules include scene labelling, object recognition and decision-making.
  2. A deep-network solution towards human-like exploration for a mobile robot. It results in high similarity between robotic and human decisions, leading to effective and efficient robotic exploration. This is the first work to couple robotic perception and control in a real environment within a single network.
  3. A large indoor corridor data set with human decision labels, provided for download.
  4. Test results compared quantitatively with human decisions, and feature maps visualized with a Gaussian process latent variable model (GPLVM).

Related Work

Following dropout, a more generalized algorithm, namely 'DropConnect', was proposed. Instead of dropping out nodes in a network, a drop-connect network drops out connections, that is, the weights of the neural network; the authors prove that dropping nodes is just a special case of the proposed scheme.
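A minimal numpy sketch of the difference between the two schemes (toy tensors and function names are my own, not the paper's code): dropout masks whole activations, DropConnect masks individual weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p, rng):
    # Dropout: zero whole node activations with probability p.
    return x * (rng.random(x.shape) >= p)

def drop_connect(w, p, rng):
    # DropConnect: zero individual weights (connections) with probability p.
    return w * (rng.random(w.shape) >= p)

x = np.ones(4)       # activations of one layer
w = np.ones((4, 3))  # weights to the next layer

y_dropout = dropout(x, 0.5, rng) @ w           # dropped nodes, full weights
y_dropconnect = x @ drop_connect(w, 0.5, rng)  # full nodes, sparsified weights
```

Dropping node i is equivalent to dropping the entire i-th row of w at once, which is why dropout is a special case of DropConnect.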
In 2014, a fully convolutional network was proposed by Long et al. [6] that greatly reduces computational redundancy and can be adapted to inputs of arbitrary size.

The article then introduces its own model: simply a CNN followed by a soft-max that outputs the probability of each moving direction, with the directions then weighted by their respective confidences.

Exploration and confidence-based decision-making

[Figure 2: the CNN model]

  1. Only the depth map is used as the network input.
  2. Depth is a good representation of where the environment is traversable.

A traditional CNN is used to extract features, followed by fully connected layers and, at the end, a weak classifier: the soft-max classifier. Two decision strategies are compared:

  3. The soft-max output directly selects one of five commands, much like conventional CNN classification.
  4. A confidence-based decision-making strategy. Soft-max is still used, but each output is treated as the probability of a command, i.e. the agent's confidence in that command. Compared with strategy 3, this addresses the winner-take-all shortcoming. For example, when the probability of turning left is 0.3 and that of going straight is 0.29, strategy 3 would simply turn left, even though the agent is actually unsure whether to turn or go straight. If c1-c5 denote the confidence of each command, the overall moving direction is the confidence-weighted combination of the five commands.

(The second strategy seems better: it has a linearizing effect, with the final velocity synthesized from the confidences.)
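A small sketch of the confidence-weighted fusion, assuming a hypothetical mapping from the five commands to angular velocities (the specific values are my illustration, not the paper's):

```python
import numpy as np

def softmax(scores):
    # Numerically stable soft-max over the five command scores.
    e = np.exp(scores - scores.max())
    return e / e.sum()

# Assumed angular velocities (rad/s) for the five commands:
# hard left, soft left, straight, soft right, hard right.
ANGULAR = np.array([1.0, 0.5, 0.0, -0.5, -1.0])

def fused_turn(scores):
    # Treat the soft-max outputs c1..c5 as confidences and take their
    # weighted average instead of committing to the arg-max winner.
    c = softmax(scores)
    return float(c @ ANGULAR)
```

With a near-tie between "turn left" and "go straight", the weighted average yields a mild left turn rather than the hard left that arg-max would commit to.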

Network setup

The original depth map from the Kinect is 640 x 480. It is down-sampled to 1/4 per side, i.e. 160 x 120, to reduce computational cost. The down-sampled depth map is then fed through three stages of 'convolution + activation + pooling'.

The first convolution layer uses 32 convolution kernels of size 5 x 5, followed by a ReLU layer and a 2 x 2 pooling layer with stride 2. The second stage of convolution + activation + pooling is the same as the first. For the third stage, 64 convolutional kernels of size 5 x 5 are used, with the ReLU and pooling layers unchanged. This results in 64 feature maps of size 20 x 15.

The fully connected layer is made up of five nodes, and the last layer represents the score of each output state. The control commands consist of five states: one for going straight forward, two for turning left and two for turning right, as previously mentioned. The final decision is calculated by applying the soft-max function to the scores of the five possible states.

Experiments and results

The platform is a TurtleBot 2 equipped with a Microsoft Kinect sensor (effective sensing range 800 mm to 4000 mm) and a laptop with an Intel CPU and no GPU.

Human instruction and data gathering

A human drives the TurtleBot while the control commands and the corresponding frames are recorded.
[Figure: samples of the collected frames]

Sample results and evaluation

To evaluate the similarity between the agent's decisions and human decisions, the method is evaluated in two respects: first, the consistency between the robot-generated commands and the human control commands; second, the similarity between the agent's exploration trajectory and the human exploration trajectory.
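A minimal sketch of the first criterion, per-frame agreement between the robot-generated command and the human label (the function name and toy data are my own illustration, not the paper's code):

```python
def command_agreement(predicted, human):
    # Fraction of frames on which the two command sequences match.
    matches = sum(p == h for p, h in zip(predicted, human))
    return matches / len(human)

# Toy example over the five command classes 0..4:
print(command_agreement([2, 2, 1, 3, 2], [2, 2, 1, 4, 2]))  # 0.8
```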


