Course 12: Lesson 4 - Search
1. Motion Planning
Motion planning: given a map, a start point, a goal point, and a cost function (in the simplest case, the time needed to drive along a route), the program's goal is to find the minimum-cost path.
2. Exercise: Compute Cost
Turning may also incur a cost.
3. Exercise: Compute Cost 2
4. Exercise: Optimal Path
5. Exercise: Optimal Path 2
6. Exercise: Maze
7. Exercise: Maze 2
8. Exercise: First Search Program
This exercise asks you to return the minimum cost usedCost once the goal is found, as the list [usedCost, x, y]. The path itself is not printed, since that was not required. My implementation:
# -*- coding: utf-8 -*-
# ----------
# User Instructions:
#
# Define a function, search(), that returns a list
# in the form of [optimal path length, row, col]. For
# the grid shown below, your function should output
# [11, 4, 5].
#
# If there is no valid path from the start point
# to the goal, your function should return the string
# 'fail'
# ----------
# Grid format:
#   0 = Navigable space
#   1 = Occupied space

grid = [[0, 0, 1, 0, 0, 0],
        [0, 0, 1, 0, 0, 0],
        [0, 0, 0, 0, 1, 0],
        [0, 0, 1, 1, 1, 0],
        [0, 0, 0, 0, 1, 0]]
init = [0, 0]
goal = [len(grid)-1, len(grid[0])-1]
cost = 1

delta = [[-1, 0],  # go up
         [ 0,-1],  # go left
         [ 1, 0],  # go down
         [ 0, 1]]  # go right
delta_name = ['^', '<', 'v', '>']

def search(grid, init, goal, cost):
    path = []
    openlist = [[0, init[0], init[1]]]  # each entry is [g, x, y]
    closedlist = []
    obstacleslist = []
    while len(openlist) > 0:
        newOpenList = []
        for i in range(len(openlist)):
            # expand all valid neighbors of the current open cell
            for j in range(len(delta)):
                x = openlist[i][1] + delta[j][0]
                y = openlist[i][2] + delta[j][1]
                if (x >= 0 and x < len(grid) and
                        y >= 0 and y < len(grid[0])):
                    if [x, y] in closedlist:
                        continue
                    elif [x, y] in obstacleslist:
                        continue
                    elif grid[x][y] == 1:
                        obstacleslist.append([x, y])
                        continue
                    elif x == goal[0] and y == goal[1]:
                        path = [openlist[i][0] + cost, goal[0], goal[1]]
                        closedlist.append([x, y])
                        return path
                    else:
                        hasInclude = False
                        for k in range(len(newOpenList)):
                            if newOpenList[k][1] == x and newOpenList[k][2] == y:
                                hasInclude = True
                                if openlist[i][0] + cost < newOpenList[k][0]:
                                    newOpenList[k][0] = openlist[i][0] + cost
                                break
                        if not hasInclude:
                            newOpenList.append([openlist[i][0] + cost, x, y])
            if [openlist[i][1], openlist[i][2]] not in closedlist:
                closedlist.append([openlist[i][1], openlist[i][2]])
        openlist = newOpenList
    path = 'fail'
    return path

print(search(grid, init, goal, cost))
As you can see, this is a breadth-first search.
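For comparison, the same breadth-first search can be written more compactly with a FIFO queue (collections.deque), which avoids rebuilding the open list wave by wave. This is a sketch, not the course's reference solution; bfs_search and demo_grid are illustrative names.

```python
from collections import deque

def bfs_search(grid, init, goal, cost):
    """Breadth-first search; returns [path length, row, col] or 'fail'."""
    rows, cols = len(grid), len(grid[0])
    visited = [[False] * cols for _ in range(rows)]
    visited[init[0]][init[1]] = True
    queue = deque([[0, init[0], init[1]]])  # each entry is [g, x, y]
    while queue:
        g, x, y = queue.popleft()           # FIFO pop = level-order expansion
        if [x, y] == goal:
            return [g, x, y]
        for dx, dy in [[-1, 0], [0, -1], [1, 0], [0, 1]]:
            x2, y2 = x + dx, y + dy
            if (0 <= x2 < rows and 0 <= y2 < cols
                    and not visited[x2][y2] and grid[x2][y2] == 0):
                visited[x2][y2] = True      # mark when enqueued, not when popped
                queue.append([g + cost, x2, y2])
    return 'fail'

demo_grid = [[0, 0, 1, 0, 0, 0],
             [0, 0, 1, 0, 0, 0],
             [0, 0, 0, 0, 1, 0],
             [0, 0, 1, 1, 1, 0],
             [0, 0, 0, 0, 1, 0]]
print(bfs_search(demo_grid, [0, 0], [4, 5], 1))  # → [11, 4, 5]
```

Because every move costs the same, the first time BFS pops the goal it has already found a minimum-cost path, so no cost comparison against the open list is needed.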
9. Exercise: Expansion Grid
Done.
10. Exercise: Print Path
Done. The instructor's algorithm is more concise than mine.
11. The A* Algorithm
On grid searches it is more efficient than a plain search algorithm such as the breadth-first search from the earlier exercises.
Its key ingredient is a heuristic function.
h(x, y) returns an estimate of the distance from the current cell (x, y) to the goal. The video calls it the "Euclidean" distance, but the walkthrough actually uses the Manhattan distance. Obstacles in between are ignored when computing it.
The g value is the actual distance from the start to the current cell (x, y), i.e. the minimum cost accumulated along the way. The start cell has g = 0.
f = g + h(x, y) is computed for the cell currently under consideration (the (x, y) being examined from the open list).
f is called the evaluation function.
When expanding nodes from the open list, the cell with the smallest f value is taken first.
12. Exercise: Implement A*
In this example the heuristic values are supplied as a precomputed table: the Manhattan distance from each cell to the goal. (Note: if diagonal moves were allowed, the Manhattan distance would no longer be an appropriate choice.)
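The hard-coded heuristic table below is simply the Manhattan distance to the goal, so it could also be generated. A minimal sketch (manhattan_heuristic is my name, not from the course):

```python
def manhattan_heuristic(grid, goal):
    """Manhattan distance from every cell to the goal, ignoring obstacles."""
    return [[abs(goal[0] - r) + abs(goal[1] - c)
             for c in range(len(grid[0]))]
            for r in range(len(grid))]

# Reproduces the hard-coded table for a 5x6 grid with the goal at [4, 5]:
demo = manhattan_heuristic([[0] * 6 for _ in range(5)], [4, 5])
print(demo[0])  # → [9, 8, 7, 6, 5, 4]
```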
# -*- coding: utf-8 -*-
# -----------
# User Instructions:
#
# Modify the search function so that it becomes
# an A* search algorithm as defined in the previous
# lectures.
#
# Your function should return the expanded grid
# which shows, for each element, the count when
# it was expanded or -1 if the element was never expanded.
#
# If there is no path from init to goal,
# the function should return the string 'fail'
# ----------

grid = [[0, 1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 0]]
heuristic = [[9, 8, 7, 6, 5, 4],
             [8, 7, 6, 5, 4, 3],
             [7, 6, 5, 4, 3, 2],
             [6, 5, 4, 3, 2, 1],
             [5, 4, 3, 2, 1, 0]]
init = [0, 0]
goal = [len(grid)-1, len(grid[0])-1]
cost = 1

delta = [[-1, 0],  # go up
         [ 0,-1],  # go left
         [ 1, 0],  # go down
         [ 0, 1]]  # go right
delta_name = ['^', '<', 'v', '>']

def search(grid, init, goal, cost, heuristic):
    closed = [[0 for col in range(len(grid[0]))] for row in range(len(grid))]
    closed[init[0]][init[1]] = 1
    expand = [[-1 for col in range(len(grid[0]))] for row in range(len(grid))]
    action = [[-1 for col in range(len(grid[0]))] for row in range(len(grid))]

    x = init[0]
    y = init[1]
    g = 0
    f = heuristic[x][y]
    open = [[f, g, x, y]]

    found = False   # flag that is set when search is complete
    resign = False  # flag set if we can't expand
    count = 0

    while not found and not resign:
        if len(open) == 0:
            resign = True
            return 'fail'
        else:
            open.sort()
            open.reverse()
            next = open.pop()  # node with the smallest f value
            x = next[2]
            y = next[3]
            g = next[1]
            f = next[0]
            expand[x][y] = count
            count += 1

            if x == goal[0] and y == goal[1]:
                found = True
            else:
                for i in range(len(delta)):
                    x2 = x + delta[i][0]
                    y2 = y + delta[i][1]
                    if x2 >= 0 and x2 < len(grid) and y2 >= 0 and y2 < len(grid[0]):
                        if closed[x2][y2] == 0 and grid[x2][y2] == 0:
                            g2 = g + cost
                            h2 = heuristic[x2][y2]
                            f2 = g2 + h2
                            open.append([f2, g2, x2, y2])
                            closed[x2][y2] = 1
    return expand

print(search(grid, init, goal, cost, heuristic))
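The course code re-sorts the entire open list on every iteration. A common refinement (not required by the exercise) is a priority queue via Python's heapq module; a minimal sketch of just the pop/push pattern, with made-up node values:

```python
import heapq

# Nodes are (f, g, x, y). heapq keeps the smallest tuple at the front,
# so the sort()/reverse()/pop() dance is unnecessary.
open_list = []
heapq.heappush(open_list, (9, 0, 0, 0))   # start: f = g + h = 0 + 9
heapq.heappush(open_list, (11, 1, 1, 0))
heapq.heappush(open_list, (9, 1, 0, 1))

f, g, x, y = heapq.heappop(open_list)     # lowest-f node comes out first
print(f, g, x, y)  # → 9 0 0 0
```

Ties on f are broken by the next tuple element (here g), which is why (9, 0, 0, 0) is popped before (9, 1, 0, 1).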
13. A* in Action
14. Dynamic Programming
15. Exercise: Computing Value
16. Exercise: Computing Value 2
17. Exercise: Value Program
# -*- coding: utf-8 -*-
# ----------
# User Instructions:
#
# Create a function compute_value which returns
# a grid of values. The value of a cell is the minimum
# number of moves required to get from the cell to the goal.
#
# If a cell is a wall or it is impossible to reach the goal from a cell,
# assign that cell a value of 99.
# ----------

grid = [[0, 1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 0]]
goal = [len(grid)-1, len(grid[0])-1]
cost = 1  # the cost associated with moving from a cell to an adjacent one

delta = [[-1, 0],  # go up
         [ 0,-1],  # go left
         [ 1, 0],  # go down
         [ 0, 1]]  # go right
delta_name = ['^', '<', 'v', '>']

def compute_value(grid, goal, cost):
    # make sure your function returns a grid of values as
    # demonstrated in the previous video.
    obstaclesVal = 99
    value = [[obstaclesVal for col in range(len(grid[0]))] for row in range(len(grid))]
    isChanged = True
    while isChanged:
        isChanged = False
        for x in range(len(grid)):
            for y in range(len(grid[0])):
                if goal[0] == x and goal[1] == y:
                    if value[x][y] > 0:
                        value[x][y] = 0
                        isChanged = True
                elif grid[x][y] == 0:
                    for a in range(len(delta)):
                        x2 = x + delta[a][0]
                        y2 = y + delta[a][1]
                        if x2 >= 0 and x2 < len(grid) and y2 >= 0 and y2 < len(grid[0]) and grid[x2][y2] == 0:
                            v2 = value[x2][y2] + cost
                            if v2 < value[x][y]:
                                isChanged = True
                                value[x][y] = v2
    return value

print(compute_value(grid, goal, cost))
18. Exercise: Optimum Policy
Building on the previous code, record the move direction for each free cell and return the optimal policy.
# -*- coding: utf-8 -*-
grid = [[0, 1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 0]]
init = [0, 0]
goal = [len(grid)-1, len(grid[0])-1]
cost = 1  # the cost associated with moving from a cell to an adjacent one

delta = [[-1, 0],  # go up
         [ 0,-1],  # go left
         [ 1, 0],  # go down
         [ 0, 1]]  # go right
delta_name = ['^', '<', 'v', '>']

def optimum_policy(grid, goal, cost):
    value = [[99 for col in range(len(grid[0]))] for row in range(len(grid))]
    policy = [[' ' for col in range(len(grid[0]))] for row in range(len(grid))]
    change = True
    while change:
        change = False
        for x in range(len(grid)):
            for y in range(len(grid[0])):
                if goal[0] == x and goal[1] == y:
                    if value[x][y] > 0:
                        value[x][y] = 0
                        policy[x][y] = '*'
                        change = True
                elif grid[x][y] == 0:
                    for a in range(len(delta)):
                        x2 = x + delta[a][0]
                        y2 = y + delta[a][1]
                        if x2 >= 0 and x2 < len(grid) and y2 >= 0 and y2 < len(grid[0]) and grid[x2][y2] == 0:
                            v2 = value[x2][y2] + cost
                            if v2 < value[x][y]:
                                change = True
                                value[x][y] = v2
                                policy[x][y] = delta_name[a]
    for i in range(len(value)):
        print(value[i])
    return policy

print(optimum_policy(grid, goal, cost))
19. Exercise: Left Turn Policy
This exercise gets a bit tricky if you don't listen carefully to the hints.
My code:
# -*- coding: utf-8 -*-
forward = [[-1, 0],  # go up
           [ 0,-1],  # go left
           [ 1, 0],  # go down
           [ 0, 1]]  # go right
forward_name = ['up', 'left', 'down', 'right']

# action has 3 values: right turn, no turn, left turn
action = [-1, 0, 1]
action_name = ['R', '#', 'L']

# EXAMPLE INPUTS:
# grid format:
#   0 = navigable space
#   1 = unnavigable space
grid = [[1, 1, 1, 0, 0, 0],
        [1, 1, 1, 0, 1, 0],
        [0, 0, 0, 0, 0, 0],
        [1, 1, 1, 0, 1, 1],
        [1, 1, 1, 0, 1, 1]]
init = [4, 3, 0]  # given in the form [row, col, direction]
                  # direction = 0: up
                  #             1: left
                  #             2: down
                  #             3: right
goal = [2, 0]  # given in the form [row, col]
cost = [2, 1, 20]  # cost has 3 values, corresponding to making
                   # a right turn, no turn, and a left turn

grid = [[0, 0, 0, 0, 1, 1],
        [0, 0, 1, 0, 0, 0],
        [0, 0, 0, 0, 1, 0],
        [0, 0, 1, 1, 1, 0],
        [0, 0, 0, 0, 1, 0]]
init = [4, 5, 0]
goal = [4, 3]
cost = [1, 1, 1]

# EXAMPLE OUTPUT:
# calling optimum_policy2D with the given parameters should return
# [[' ', ' ', ' ', 'R', '#', 'R'],
#  [' ', ' ', ' ', '#', ' ', '#'],
#  ['*', '#', '#', '#', '#', 'R'],
#  [' ', ' ', ' ', '#', ' ', ' '],
#  [' ', ' ', ' ', '#', ' ', ' ']]
# ----------

# ----------------------------------------
# modify code below
# ----------------------------------------
def optimum_policy2D(grid, init, goal, cost):
    # one value/policy layer per orientation
    value = [[[999 for col in range(len(grid[0]))] for row in range(len(grid))]
             for orientation in range(len(forward))]
    policy = [[[' ' for col in range(len(grid[0]))] for row in range(len(grid))]
              for orientation in range(len(forward))]
    policy2D = [[' ' for col in range(len(grid[0]))] for row in range(len(grid))]
    #policy2D[goal[0]][goal[1]] = '*'

    change = True
    while change:
        change = False
        # go through all grid cells and calculate values
        for x in range(len(grid)):
            for y in range(len(grid[0])):
                for orientation in range(len(forward)):
                    if x == goal[0] and y == goal[1]:
                        if value[orientation][x][y] > 0:
                            change = True
                            value[orientation][x][y] = 0
                            policy[orientation][x][y] = '*'
                    elif grid[x][y] == 0:
                        # calculate the three ways to propagate value
                        for i in range(len(action)):
                            o2 = (orientation + action[i] + len(forward)) % len(forward)
                            x2 = x + forward[o2][0]
                            y2 = y + forward[o2][1]
                            cost_step = cost[i]
                            if x2 >= 0 and x2 < len(grid) and y2 >= 0 and y2 < len(grid[0]) and grid[x2][y2] == 0:
                                v2 = value[o2][x2][y2] + cost_step
                                if v2 < value[orientation][x][y]:
                                    value[orientation][x][y] = v2
                                    policy[orientation][x][y] = action_name[i]
                                    change = True

    print('=============== generate policy2D')
    x = init[0]
    y = init[1]
    orientation = init[2]
    policy2D[x][y] = policy[orientation][x][y]
    while policy[orientation][x][y] != '*':
        if policy[orientation][x][y] == '#':
            newOri = orientation
        elif policy[orientation][x][y] == 'R':
            newOri = (orientation - 1 + len(forward)) % len(forward)
        elif policy[orientation][x][y] == 'L':
            newOri = (orientation + 1) % len(forward)
        x = x + forward[newOri][0]
        y = y + forward[newOri][1]
        orientation = newOri
        policy2D[x][y] = policy[orientation][x][y]
    return policy2D

print("-----------------------------")
p2D = optimum_policy2D(grid, init, goal, cost)
if p2D != 'Fail':
    for i in range(len(p2D)):
        print(p2D[i])
My own takeaway: A* suits the case where the goal position is known but the map is not fully known, such as exploring a maze, because the search readily wanders into wrong branches and then backs out of them; the heuristic H only needs to be a rough estimate.
When the full map is known, dynamic programming is exact and adapts easily to changes.
Note that this problem forbids reversing or turning in place, leaving only 3 actions, which matches one-way streets in everyday driving.
I also found that when several routes have equal cost, only one of them is returned rather than all of them.
For example, with this input:
grid = [[0, 0, 0, 0, 1, 1],
        [0, 0, 1, 0, 0, 0],
        [0, 0, 0, 0, 1, 0],
        [0, 0, 1, 1, 1, 0],
        [0, 0, 0, 0, 1, 0]]
init = [4, 5, 0]
goal = [1, 0]
cost = [1, 1, 1]
it prints a result like:
['L', '#', '#', 'L', ' ', ' ']
['*', ' ', ' ', 'R', '#', 'L']
[' ', ' ', ' ', ' ', ' ', '#']
[' ', ' ', ' ', ' ', ' ', '#']
[' ', ' ', ' ', ' ', ' ', '#']
This happens because, when the 3D policy value for a given cell and orientation is computed, only one of the next positions achieving the minimum is recorded; when several moves tie for the minimum, policy[orientation][x][y] stores just one of the equally good directions. Likewise, when policy2D is generated, the walk stops the first time it reaches the goal.
Looking at Baidu Maps' route planning, a query between two places usually returns at least 2-3 candidate routes.
To modify this course's code to show the best few paths, I think you would cap the number of paths to display and then use A* to select candidate paths from start to goal.
After obtaining the value grid (policy values) around the start and goal with dynamic programming, those values can serve as the H (heuristic) values in A*, with f = g + H.
Starting from the start point, each time a branch is encountered, consider whether to add a new path. If the number of stored paths has reached the planned cache limit (which can exceed the number to display, to guard against too many early dead ends), replace the path with the highest f value. No closed array is needed; instead, check whether the directed candidate point already exists on the path that the current open point belongs to. When appending an open point, store the point, its orientation, and the path it belongs to. If the candidate's h value is still the invalid sentinel (here h is one array per orientation; from some cells in some orientations the goal cannot be reached), the goal is unreachable from it, so do not add it to the open list.
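The core idea above (dynamic-programming values reused as the A* heuristic) can be sketched for the plain 4-direction grid. This is a simplification of the orientation-aware 3D case described above, and dp_values is an illustrative name, not course code:

```python
def dp_values(grid, goal, cost=1):
    """Value iteration: minimum cost-to-goal of every free cell (99 = unreachable)."""
    rows, cols = len(grid), len(grid[0])
    value = [[99] * cols for _ in range(rows)]
    value[goal[0]][goal[1]] = 0
    changed = True
    while changed:              # sweep until no cell's value improves
        changed = False
        for x in range(rows):
            for y in range(cols):
                if grid[x][y] == 0 and [x, y] != goal:
                    for dx, dy in [[-1, 0], [0, -1], [1, 0], [0, 1]]:
                        x2, y2 = x + dx, y + dy
                        if 0 <= x2 < rows and 0 <= y2 < cols and grid[x2][y2] == 0:
                            v2 = value[x2][y2] + cost
                            if v2 < value[x][y]:
                                value[x][y] = v2
                                changed = True
    return value

grid = [[0, 1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 0]]
h = dp_values(grid, [4, 5])
# The Manhattan table gave h = 9 at the start, but the exact cost-to-goal is 11
# because of the walls. With an exact h, f = g + h is constant along an optimal
# path, so A* expands essentially only the optimal route.
print(h[0][0])  # → 11
```

This value grid can be passed directly as the `heuristic` argument of the earlier `search` function; unreachable cells keep the sentinel 99, which is how candidates that cannot reach the goal would be filtered out before being added to the open list.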
The course's standard code:
# -*- coding: utf-8 -*-
"""
Created on Fri May 18 19:21:36 2018
@author: Administrator
"""
# ----------
# User Instructions:
#
# Implement the function optimum_policy2D below.
#
# You are given a car in grid with initial state
# init. Your task is to compute and return the car's
# optimal path to the position specified in goal;
# the costs for each motion are as defined in cost.
#
# There are four motion directions: up, left, down, and right.
# Increasing the index in this array corresponds to making a
# left turn, and decreasing the index corresponds to making a
# right turn.

forward = [[-1, 0],  # go up
           [ 0,-1],  # go left
           [ 1, 0],  # go down
           [ 0, 1]]  # go right
forward_name = ['up', 'left', 'down', 'right']

# action has 3 values: right turn, no turn, left turn
action = [-1, 0, 1]
action_name = ['R', '#', 'L']

# EXAMPLE INPUTS:
# grid format:
#   0 = navigable space
#   1 = unnavigable space
grid = [[1, 1, 1, 0, 0, 0],
        [1, 1, 1, 0, 1, 0],
        [0, 0, 0, 0, 0, 0],
        [1, 1, 1, 0, 1, 1],
        [1, 1, 1, 0, 1, 1]]
init = [4, 3, 0]  # given in the form [row, col, direction]
                  # direction = 0: up
                  #             1: left
                  #             2: down
                  #             3: right
goal = [2, 0]  # given in the form [row, col]
cost = [2, 1, 20]  # cost has 3 values, corresponding to making
                   # a right turn, no turn, and a left turn

grid = [[0, 0, 0, 0, 1, 1],
        [0, 0, 1, 0, 0, 0],
        [0, 0, 0, 0, 1, 0],
        [0, 0, 1, 1, 1, 0],
        [0, 0, 0, 0, 1, 0]]
init = [4, 5, 0]
goal = [4, 3]
cost = [1, 1, 1]

# EXAMPLE OUTPUT:
# calling optimum_policy2D with the given parameters should return
# [[' ', ' ', ' ', 'R', '#', 'R'],
#  [' ', ' ', ' ', '#', ' ', '#'],
#  ['*', '#', '#', '#', '#', 'R'],
#  [' ', ' ', ' ', '#', ' ', ' '],
#  [' ', ' ', ' ', '#', ' ', ' ']]
# ----------

# ----------------------------------------
# modify code below
# ----------------------------------------
def optimum_policy2D(grid, init, goal, cost):
    # one value/policy layer per orientation
    value = [[[999 for col in range(len(grid[0]))] for row in range(len(grid))]
             for orientation in range(len(forward))]
    policy = [[[' ' for col in range(len(grid[0]))] for row in range(len(grid))]
              for orientation in range(len(forward))]
    policy2D = [[' ' for col in range(len(grid[0]))] for row in range(len(grid))]
    #policy2D[goal[0]][goal[1]] = '*'

    change = True
    while change:
        change = False
        # go through all grid cells and calculate values
        for x in range(len(grid)):
            for y in range(len(grid[0])):
                for orientation in range(len(forward)):
                    if x == goal[0] and y == goal[1]:
                        if value[orientation][x][y] > 0:
                            change = True
                            value[orientation][x][y] = 0
                            policy[orientation][x][y] = '*'
                    elif grid[x][y] == 0:
                        # calculate the three ways to propagate value
                        for i in range(len(action)):
                            o2 = (orientation + action[i] + len(forward)) % len(forward)
                            x2 = x + forward[o2][0]
                            y2 = y + forward[o2][1]
                            cost_step = cost[i]
                            if x2 >= 0 and x2 < len(grid) and y2 >= 0 and y2 < len(grid[0]) and grid[x2][y2] == 0:
                                v2 = value[o2][x2][y2] + cost_step
                                if v2 < value[orientation][x][y]:
                                    value[orientation][x][y] = v2
                                    policy[orientation][x][y] = action_name[i]
                                    change = True

    print('=============== generate policy2D')
    x = init[0]
    y = init[1]
    orientation = init[2]
    policy2D[x][y] = policy[orientation][x][y]
    while policy[orientation][x][y] != '*':
        newOri = -1
        if policy[orientation][x][y] == '#':
            newOri = orientation
        elif policy[orientation][x][y] == 'R':
            newOri = (orientation - 1 + len(forward)) % len(forward)
        elif policy[orientation][x][y] == 'L':
            newOri = (orientation + 1) % len(forward)
        else:  # ' '
            print('Failed to find a path!')
            break
        x = x + forward[newOri][0]
        y = y + forward[newOri][1]
        orientation = newOri
        policy2D[x][y] = policy[orientation][x][y]
    return policy2D

print("-----------------------------")
p2D = optimum_policy2D(grid, init, goal, cost)
if p2D != 'Fail':
    for i in range(len(p2D)):
        print(p2D[i])
20. Planning Conclusion
This lesson covered two search algorithms for discrete spaces:
1. A*, which finds a path using heuristic search.
2. Dynamic programming, which computes a complete policy that contains a path from any cell.
These two algorithms are the main path-planning methods in artificial intelligence.
The next lesson covers practical applications to robot motion, continuous state spaces, and what "control" means.