A Novel GRU-RNN Network Model for Dynamic Path Planning of Mobile Robot


Related work (others' approaches)

With the traditional APF method, the mobile robot easily becomes locked in local minima, suffers path oscillation in narrow passages, and cannot plan a path when neighboring obstacles are too close together.
Weerakoon et al. [4] solve the deadlock problem by replacing the traditional function with an exponential function. The APF method has also been combined with other intelligent algorithms to improve their parameters [5].
Summary: the traditional APF method and others' improvements to its deadlock problem.

The traditional ACO method does not handle well the balance between the premature-convergence problem and slow convergence speed. Chen et al. [6] propose combining a ‘‘scent pervasion’’ policy with a ‘‘one minus search’’ strategy to pre-process grid maps, which speeds up convergence and lets the robot complete path planning quickly. Cao …
Summary: how the traditional ACO (ant colony) algorithm's premature-convergence and slow-convergence problems are addressed.

However, in these studies, the environment and obstacles were taken to be static.
All of the above is work on static environments.
Bodhale et al. [11] successfully implemented dynamic path planning by combining the potential field method with a Monte Carlo positioning method.
APF + Monte Carlo localization achieves dynamic planning.
Rapidly-exploring random tree (RRT) and other methods rarely pay attention to information about obstacle movement [14], [15].
Most previous studies have examined collision avoidance planning strategies within the reinforcement learning (RL) framework [16]–[18]. Deep …
RRT and other algorithms do not pay attention to dynamic-obstacle information.
RL can avoid dynamic obstacles; what follows is how others have used RL for this.
Carrio et al. [18] used a combination of convolutional neural networks (CNN), gated recurrent unit (GRU) networks, and a Q-learning variant to solve the problem of unmanned aerial vehicle (UAV) control when only visual images were available as input.
Inoue et al. [19] proposed a novel method combining the rapidly-exploring random tree and a long short-term memory (LSTM) network, which overcomes the difficulty of acquiring a large amount of training data.

How the paper does it

Main goal: the development of a collision avoidance algorithm for a mobile robot.
An improved ACO (ACOplus) + improved APF (APFplus) serve as the teacher system.
ACOplus: the pheromone trail and state transition rules are improved to accelerate convergence.
APFplus: the influence of target-point gravitation is removed to avoid local locking.
The learning system is a GRU-RNN network.

TF transformation model

According to the conversion between the coordinate systems, the information of the robot and the surrounding obstacles in the global coordinate system can be obtained.
The article explains the tf transformation of the simulated environment; it is essentially the same as the standard tf transformation in ROS.
There is not much to look at here: it is just how the tf transformation converts the obstacle coordinate frames into the global frame.
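As a rough illustration of that conversion (my own sketch, not code from the paper), a 2D homogeneous transform can map an obstacle point measured in the robot/sensor frame into the global frame; the pose variable names below are assumptions.

```python
import numpy as np

def sensor_to_global(point_sensor, robot_x, robot_y, robot_theta):
    """Transform a 2D point from the robot/sensor frame into the global frame.

    point_sensor: (x, y) of an obstacle as measured in the robot frame.
    robot_x, robot_y, robot_theta: robot pose in the global frame.
    """
    # 2D homogeneous transform: rotate by theta, then translate by the robot pose.
    c, s = np.cos(robot_theta), np.sin(robot_theta)
    T = np.array([[c, -s, robot_x],
                  [s,  c, robot_y],
                  [0.0, 0.0, 1.0]])
    p = np.array([point_sensor[0], point_sensor[1], 1.0])
    gx, gy, _ = T @ p
    return gx, gy

# Example: an obstacle 1 m ahead of a robot at (2, 3) facing 90 degrees
print(sensor_to_global((1.0, 0.0), 2.0, 3.0, np.pi / 2))  # ~(2.0, 4.0)
```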

Robot model

Time is discretized with sampling period T, so after each sampling interval the new position (x, y) is given by the robot's kinematic update equation (shown as an image in the original post).
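The exact equation was only available as an image; assuming the usual unicycle-style model with linear velocity v_k and heading θ_k (symbols not confirmed by the excerpt), the discrete update commonly takes the form

$$
x_{k+1} = x_k + v_k \cos(\theta_k)\,T, \qquad
y_{k+1} = y_k + v_k \sin(\theta_k)\,T
$$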

Environment model

The environment model is constructed as a grid model (although this grid representation should really be continuous, not discrete).
The robot works in a two-dimensional environment, the space is divided into m × n grids, and (x0, y0) is the origin point.
In order to ensure the safety of the robot, the boundary of each obstacle is expanded by half the length of the robot.
A black grid indicates that the area is not accessible.
Obstacles in the mobile robot’s motion space can be divided into two types: known and unknown.
For obstacles in an unknown environment, because there is no positional information about them in advance, they can be detected only by the sensors carried by the robot itself, and new effective track points (including deterministic and uncertain points) must be added to the program as the motion progresses.
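A minimal sketch (assumed, not the paper's implementation) of the boundary-expansion step mentioned above on a grid map: every cell within half the robot length of an obstacle cell is also marked as not accessible.

```python
import numpy as np

def inflate_obstacles(grid, robot_length, cell_size):
    """Mark cells within half the robot length of any obstacle cell as blocked.

    grid: 2D array, 1 = obstacle (black grid), 0 = free.
    """
    radius = int(np.ceil((robot_length / 2.0) / cell_size))
    inflated = grid.copy()
    for (i, j) in np.argwhere(grid == 1):
        # expand a square neighbourhood around the original obstacle cell
        i0, i1 = max(0, i - radius), min(grid.shape[0], i + radius + 1)
        j0, j1 = max(0, j - radius), min(grid.shape[1], j + radius + 1)
        inflated[i0:i1, j0:j1] = 1
    return inflated
```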
Summary: the origin point, the m × n grid size, known vs. unknown obstacles, and obstacle information obtained from the laser sensor.
However, a circular detection region seems to be assumed here, and there are also some requirements on the detected obstacles.
For example, a vertex has the following characteristics:
• A vertex lies exactly on the detection edge of the sensor.
• There is an obstacle on one side of the vertex and none on the other.
• The vertex can be seen from any other vertex.
For the details, see the paper; in short, this simulates the point information a real laser scanner would return.
When a detected point is not a vertex of an object (which suggests the method is really aimed at rectangular objects), APF is used.
When it is a vertex, ACO can be used.
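A rough sketch of how such vertex candidates might be picked out of a laser scan, based only on the second characteristic above (obstacle on one side of the beam, free space on the other); this is my own reading, not the paper's detection procedure.

```python
import numpy as np

def vertex_candidates(ranges, max_range, eps=0.05):
    """Return beam indices where an obstacle edge (vertex candidate) appears.

    ranges: laser range readings, one per beam; readings close to
    max_range are treated as 'no obstacle detected'.
    """
    hit = np.asarray(ranges) < (max_range - eps)  # True where the beam hits something
    candidates = []
    for k in range(1, len(hit)):
        # obstacle on one side of this beam pair and free space on the other
        if hit[k] != hit[k - 1]:
            candidates.append(k if hit[k] else k - 1)
    return candidates
```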


Overall algorithm

When no obstacle is nearby, ACO is used: the algorithm takes the known environment information plus the obstacle information detected by the sensors and produces a path. When an obstacle comes within 2 m of the robot, APF is used to quickly avoid a collision. To accelerate ACO, the pheromone trail and state transition rules are improved.
The pseudo code of the autonomous collision avoidance algorithm of the teacher system can be found in the appendix.!!!
There is code ✌
(screenshot of the appendix pseudo code)
Damn, it turns out it's just that.
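A minimal sketch of the switching rule described above (ACO when no obstacle is nearby, APF once one comes within 2 m), using the 2 m threshold from the text; aco_plan and apf_step are stand-in callables, not the paper's API.

```python
import math

SAFE_DISTANCE = 2.0  # metres: switch to APF when an obstacle is inside this radius

def teacher_step(robot_xy, goal_xy, obstacles_xy, aco_plan, apf_step):
    """One decision step of the (assumed) ACOplus/APFplus teacher system.

    aco_plan / apf_step are caller-supplied planners standing in for the
    paper's improved ACO and improved APF; they are not defined here.
    """
    nearest = min(
        (math.hypot(ox - robot_xy[0], oy - robot_xy[1]) for ox, oy in obstacles_xy),
        default=math.inf,
    )
    if nearest < SAFE_DISTANCE:
        # An obstacle is within 2 m: react quickly with the potential-field step.
        return apf_step(robot_xy, goal_xy, obstacles_xy)
    # Otherwise plan (or re-plan) a path with the ant colony method.
    return aco_plan(robot_xy, goal_xy, obstacles_xy)
```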

Then a GRU-RNN network is used as the learning network.
(figure: GRU-RNN network structure)
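The network structure itself was only shown as a figure; as a generic illustration (layer sizes, input features, and the two-dimensional action output are my assumptions, not the paper's exact design), a GRU-RNN that maps a sequence of observations to a motion command might look like this in PyTorch.

```python
import torch
import torch.nn as nn

class GRUPolicyNet(nn.Module):
    """Generic GRU-RNN: a sequence of observations -> a motion command."""

    def __init__(self, obs_dim=24, hidden_dim=64, action_dim=2):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, action_dim)

    def forward(self, obs_seq, h0=None):
        # obs_seq: (batch, time, obs_dim), e.g. laser readings + goal offset
        out, h = self.gru(obs_seq, h0)
        action = self.head(out[:, -1, :])  # command taken from the last time step
        return action, h

# Training would imitate the teacher system: minimise the error between the
# network output and the teacher's action for the same observation sequence.
net = GRUPolicyNet()
dummy_obs = torch.zeros(1, 10, 24)
action, _ = net(dummy_obs)
```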

Conclusion

The learning works: the trained network produces its output from the inputs much faster than the teacher system does, and its ability to adapt to new environments is good.
Weakness: the trained network needs samples generated by the teacher system, and sometimes it cannot reach the target point accurately, although the probability of this is less than 2%.
In the future, they will focus on using deep reinforcement learning methods (such as deep Q-learning (DQN), deep deterministic policy gradient (DDPG), policy search, asynchronous advantage actor-critic (A3C), etc.) to handle robot navigation tasks by learning from the robot's own successful or failed experiences. Such methods would not need an extra teacher system to generate training samples.
