Course 12: Lesson 4 - Search

1.   Motion Planning


2.   Exercise: Compute Cost

3.   Exercise: Compute Cost 2

4.   Exercise: Optimal Path

5.   Exercise: Optimal Path 2

6.   Exercise: Maze

7.   Exercise: Maze 2

8.   Exercise: First Search Program

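This exercise asks for the minimum cost (usedCost) of reaching the goal, returned as the list [usedCost, x, y]. The path itself is not printed because the exercise does not ask for it. My implementation is as follows: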

# -*- coding: utf-8 -*-
# ----------
# User Instructions:
#
# Define a function, search() that returns a list
# in the form of [optimal path length, row, col]. For
# the grid shown below, your function should output
# [11, 4, 5].
#
# If there is no valid path from the start point
# to the goal, your function should return the string
# 'fail'
# ----------

# Grid format:
#   0 = Navigable space
#   1 = Occupied space

grid = [[0, 0, 1, 0, 0, 0],
        [0, 0, 1, 0, 0, 0],
        [0, 0, 0, 0, 1, 0],
        [0, 0, 1, 1, 1, 0],
        [0, 0, 0, 0, 1, 0]]
init = [0, 0]
goal = [len(grid)-1, len(grid[0])-1]
cost = 1

delta = [[-1, 0], # go up
         [ 0,-1], # go left
         [ 1, 0], # go down
         [ 0, 1]] # go right

delta_name = ['^', '<', 'v', '>']

def search(grid,init,goal,cost):
    # openlist is the current frontier; each entry is [cost so far, row, col]
    openlist = [[0, init[0], init[1]]]
    # closedlist holds the cells that have already been expanded
    closedlist = []

    while len(openlist) > 0:
        newOpenList = []
        for i in range(len(openlist)):
            # find all valid neighbors
            for j in range(len(delta)):
                x = openlist[i][1] + delta[j][0]
                y = openlist[i][2] + delta[j][1]
                if (x >= 0 and x < len(grid) and
                        y >= 0 and y < len(grid[0])):
                    if [x, y] in closedlist:
                        continue    # already expanded
                    elif grid[x][y] == 1:
                        continue    # obstacle
                    elif x == goal[0] and y == goal[1]:
                        return [openlist[i][0] + cost, goal[0], goal[1]]
                    else:
                        # add the neighbor to the next frontier, keeping the lowest cost
                        hasInclude = False
                        for k in range(len(newOpenList)):
                            if newOpenList[k][1] == x and newOpenList[k][2] == y:
                                hasInclude = True
                                if openlist[i][0] + cost < newOpenList[k][0]:
                                    newOpenList[k][0] = openlist[i][0] + cost
                        if not hasInclude:
                            newOpenList.append([openlist[i][0] + cost, x, y])
            if [openlist[i][1], openlist[i][2]] not in closedlist:
                closedlist.append([openlist[i][1], openlist[i][2]])
        openlist = newOpenList

    # the frontier is empty and the goal was never reached
    return 'fail'




9.   Exercise: Expansion Grid


10. Exercise: Print Path


11. The A* Algorithm



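h(x,y) returns the distance from the current cell (x,y) to the goal. The video calls it the "Euclidean" distance, but the explanation actually uses the Manhattan distance. Any obstacles that may lie in between are ignored in the calculation.


f = g + h(x,y): this sum is computed for the cell currently being examined, that is, the (x,y) reached while iterating over the open list.

The function f is called the evaluation function.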
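
As a small illustration of the two definitions above, the sketch below builds a Manhattan-distance heuristic table and evaluates f = g + h for one cell. The helper names `manhattan_heuristic` and `evaluate` are made up for this example and are not part of the course code; the 5x6 grid size and the goal at [4, 5] are borrowed from the Implement A* exercise.

```python
def manhattan_heuristic(rows, cols, goal):
    """Return a rows x cols table of Manhattan distances to goal (obstacles ignored)."""
    return [[abs(goal[0] - r) + abs(goal[1] - c) for c in range(cols)]
            for r in range(rows)]


def evaluate(g, heuristic, x, y):
    """Evaluation function f = g + h(x, y) for a cell taken from the open list."""
    return g + heuristic[x][y]


heuristic = manhattan_heuristic(5, 6, [4, 5])
for row in heuristic:
    print(row)

# A cell at (3, 0) reached after g = 3 moves: h = |4 - 3| + |5 - 0| = 6, so f = 9.
print(evaluate(3, heuristic, 3, 0))
```

The printed table should match the `heuristic` grid given in the Implement A* exercise, which is one way to confirm that the course uses the Manhattan distance rather than the Euclidean one.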


12. Exercise: Implement A*


# -*- coding: utf-8 -*-
# -----------
# User Instructions:
#
# Modify the search function so that it becomes
# an A* search algorithm as defined in the previous
# lectures.
#
# Your function should return the expanded grid
# which shows, for each element, the count when
# it was expanded or -1 if the element was never expanded.
#
# If there is no path from init to goal,
# the function should return the string 'fail'
# ----------

grid = [[0, 1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 0]]
heuristic = [[9, 8, 7, 6, 5, 4],
             [8, 7, 6, 5, 4, 3],
             [7, 6, 5, 4, 3, 2],
             [6, 5, 4, 3, 2, 1],
             [5, 4, 3, 2, 1, 0]]

init = [0, 0]
goal = [len(grid)-1, len(grid[0])-1]
cost = 1

delta = [[-1, 0 ], # go up
         [ 0, -1], # go left
         [ 1, 0 ], # go down
         [ 0, 1 ]] # go right

delta_name = ['^', '<', 'v', '>']

def search(grid,init,goal,cost,heuristic):
    # ----------------------------------------
    # modify the code below
    # ----------------------------------------
    closed = [[0 for col in range(len(grid[0]))] for row in range(len(grid))]
    closed[init[0]][init[1]] = 1

    expand = [[-1 for col in range(len(grid[0]))] for row in range(len(grid))]
    action = [[-1 for col in range(len(grid[0]))] for row in range(len(grid))]

    x = init[0]
    y = init[1]
    g = 0
    f = g + heuristic[x][y]

    # note: the names 'open' and 'next' shadow Python built-ins inside this function
    open = [[f, g, x, y]]

    found = False  # flag that is set when search is complete
    resign = False # flag set if we can't find expand
    count = 0

    while not found and not resign:
        if len(open) == 0:
            resign = True
            return 'fail'
        else:
            # sort descending so that pop() takes the entry with the smallest f
            open.sort()
            open.reverse()
            next = open.pop()
            x = next[2]
            y = next[3]
            g = next[1]
            f = next[0]
            expand[x][y] = count
            count += 1

            if x == goal[0] and y == goal[1]:
                found = True
            else:
                for i in range(len(delta)):
                    x2 = x + delta[i][0]
                    y2 = y + delta[i][1]
                    if x2 >= 0 and x2 < len(grid) and y2 >= 0 and y2 < len(grid[0]):
                        if closed[x2][y2] == 0 and grid[x2][y2] == 0:
                            g2 = g + cost
                            h2 = heuristic[x2][y2]
                            f2 = g2 + h2
                            open.append([f2, g2, x2, y2])
                            closed[x2][y2] = 1

    return expand



13. A* in Action

14. Dynamic Programming


15. Exercise: Computing Value

16. Exercise: Computing Value 2

17. Exercise: Value Program

# -*- coding: utf-8 -*-
# ----------
# User Instructions:
#
# Create a function compute_value which returns
# a grid of values. The value of a cell is the minimum
# number of moves required to get from the cell to the goal.
#
# If a cell is a wall or it is impossible to reach the goal from a cell,
# assign that cell a value of 99.
# ----------
grid = [[0, 1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 0]]
goal = [len(grid)-1, len(grid[0])-1]
cost = 1 # the cost associated with moving from a cell to an adjacent one

delta = [[-1, 0 ], # go up
         [ 0, -1], # go left
         [ 1, 0 ], # go down
         [ 0, 1 ]] # go right

delta_name = ['^', '<', 'v', '>']

def compute_value(grid,goal,cost):
    # ----------------------------------------
    # insert code below
    # ----------------------------------------

    # make sure your function returns a grid of values as
    # demonstrated in the previous video.

    obstaclesVal = 99
    value = [[obstaclesVal for col in range(len(grid[0]))] for row in range(len(grid))]

    # keep sweeping the grid until no value changes any more
    isChanged = True
    while isChanged:
        isChanged = False

        for x in range(len(grid)):
            for y in range(len(grid[0])):
                if goal[0] == x and goal[1] == y:
                    if value[x][y] > 0:
                        value[x][y] = 0
                        isChanged = True

                elif grid[x][y] == 0:
                    for a in range(len(delta)):
                        x2 = x + delta[a][0]
                        y2 = y + delta[a][1]

                        if x2 >= 0 and x2 < len(grid) and y2 >= 0 and y2 < len(grid[0]) and grid[x2][y2] == 0:
                            v2 = value[x2][y2] + cost

                            if v2 < value[x][y]:
                                isChanged = True
                                value[x][y] = v2
    return value



18. Exercise: Optimum Policy


# -*- coding: utf-8 -*-
grid = [[0, 1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 0]]

init = [0, 0]
goal = [len(grid)-1, len(grid[0])-1]
cost = 1 # the cost associated with moving from a cell to an adjacent one

delta = [[-1, 0 ], # go up
         [ 0, -1], # go left
         [ 1, 0 ], # go down
         [ 0, 1 ]] # go right

delta_name = ['^', '<', 'v', '>']

def optimum_policy(grid,goal,cost):
    # ----------------------------------------
    # modify code below
    # ----------------------------------------
    value = [[99 for col in range(len(grid[0]))] for row in range(len(grid))]
    policy = [[' ' for col in range(len(grid[0]))] for row in range(len(grid))]

    change = True

    while change:
        change = False

        for x in range(len(grid)):
            for y in range(len(grid[0])):
                if goal[0] == x and goal[1] == y:
                    if value[x][y] > 0:
                        value[x][y] = 0
                        policy[x][y] = '*'
                        change = True

                elif grid[x][y] == 0:
                    for a in range(len(delta)):
                        x2 = x + delta[a][0]
                        y2 = y + delta[a][1]

                        if x2 >= 0 and x2 < len(grid) and y2 >= 0 and y2 < len(grid[0]) and grid[x2][y2] == 0:
                            v2 = value[x2][y2] + cost

                            if v2 < value[x][y]:
                                change = True
                                value[x][y] = v2
                                # remember the move that produced the best value
                                policy[x][y] = delta_name[a]

    # the loop body was lost in the source; printing the value grid is assumed here
    for i in range(len(value)):
        print(value[i])

    return policy



19. Exercise: Left Turn Policy



# -*- coding: utf-8 -*-
forward = [[-1,  0], # go up
           [ 0, -1], # go left
           [ 1,  0], # go down
           [ 0,  1]] # go right
forward_name = ['up', 'left', 'down', 'right']

# action has 3 values: right turn, no turn, left turn
action = [-1, 0, 1]
action_name = ['R', '#', 'L']

# grid format:
#     0 = navigable space
#     1 = unnavigable space
grid = [[1, 1, 1, 0, 0, 0],
        [1, 1, 1, 0, 1, 0],
        [0, 0, 0, 0, 0, 0],
        [1, 1, 1, 0, 1, 1],
        [1, 1, 1, 0, 1, 1]]

init = [4, 3, 0] # given in the form [row,col,direction]
                 # direction = 0: up
                 #             1: left
                 #             2: down
                 #             3: right

goal = [2, 0] # given in the form [row,col]

cost = [2, 1, 20] # cost has 3 values, corresponding to making
                  # a right turn, no turn, and a left turn

# A second test case; these assignments override the ones above.
grid = [[0, 0, 0, 0, 1, 1],
        [0, 0, 1, 0, 0, 0],
        [0, 0, 0, 0, 1, 0],
        [0, 0, 1, 1, 1, 0],
        [0, 0, 0, 0, 1, 0]]
init = [4, 5, 0]
goal = [4, 3]
cost = [1, 1, 1]

# calling optimum_policy2D with the given parameters
# (the first grid, init, goal and cost above) should return
# [[' ', ' ', ' ', 'R', '#', 'R'],
#  [' ', ' ', ' ', '#', ' ', '#'],
#  ['*', '#', '#', '#', '#', 'R'],
#  [' ', ' ', ' ', '#', ' ', ' '],
#  [' ', ' ', ' ', '#', ' ', ' ']]
# ----------

# ----------------------------------------
# modify code below

def optimum_policy2D(grid,init,goal,cost):
    # value[orientation][row][col] and policy[orientation][row][col]
    value = [[[999 for col in range(len(grid[0]))] for row in range(len(grid))],
             [[999 for col in range(len(grid[0]))] for row in range(len(grid))],
             [[999 for col in range(len(grid[0]))] for row in range(len(grid))],
             [[999 for col in range(len(grid[0]))] for row in range(len(grid))]]

    policy = [[[' ' for col in range(len(grid[0]))] for row in range(len(grid))],
              [[' ' for col in range(len(grid[0]))] for row in range(len(grid))],
              [[' ' for col in range(len(grid[0]))] for row in range(len(grid))],
              [[' ' for col in range(len(grid[0]))] for row in range(len(grid))]]

    policy2D = [[' ' for col in range(len(grid[0]))] for row in range(len(grid))]

    change = True

    while change:
        change = False
        # go through all grid cells and calculate values
        for x in range(len(grid)):
            for y in range(len(grid[0])):
                for orientation in range(len(forward)):
                    if x == goal[0] and y == goal[1]:
                        if value[orientation][x][y] > 0:
                            change = True
                            value[orientation][x][y] = 0
                            policy[orientation][x][y] = '*'
                    elif grid[x][y] == 0:
                        # calculate the three ways to propagate value
                        for i in range(len(action)):
                            o2 = (orientation + action[i] + len(forward)) % len(forward)
                            x2 = x + forward[o2][0]
                            y2 = y + forward[o2][1]
                            cost_step = cost[i]

                            if x2 >= 0 and x2 < len(grid) and y2 >= 0 and y2 < len(grid[0]) and grid[x2][y2] == 0:
                                v2 = value[o2][x2][y2] + cost_step
                                if v2 < value[orientation][x][y]:
                                    value[orientation][x][y] = v2
                                    policy[orientation][x][y] = action_name[i]
                                    change = True

    print('===============generate policy2D')
    x = init[0]
    y = init[1]
    orientation = init[2]

    # follow the 3D policy from the start state and project it onto the 2D grid
    policy2D[x][y] = policy[orientation][x][y]
    while policy[orientation][x][y] != '*':
        if policy[orientation][x][y] == '#':
            newOri = orientation
        elif policy[orientation][x][y] == 'R':
            newOri = (orientation - 1 + len(forward)) % len(forward)
        elif policy[orientation][x][y] == 'L':
            newOri = (orientation + 1) % len(forward)
        else:
            # ' ' means no action leads to the goal from this state
            return 'Fail'

        x = x + forward[newOri][0]
        y = y + forward[newOri][1]
        orientation = newOri
        policy2D[x][y] = policy[orientation][x][y]

    return policy2D


p2D = optimum_policy2D(grid,init,goal,cost)
if p2D != 'Fail':
    for i in range(len(p2D)):
        print(p2D[i])







grid = [[0, 0, 0, 0, 1, 1],
        [0, 0, 1, 0, 0, 0],
        [0, 0, 0, 0, 1, 0],
        [0, 0, 1, 1, 1, 0],
        [0, 0, 0, 0, 1, 0]]
init = [4, 5, 0]
goal = [1, 0]
cost = [1, 1, 1]

Output for this configuration:

['L', '#', '#', 'L', ' ', ' ']
['*', ' ', ' ', 'R', '#', 'L']
[' ', ' ', ' ', ' ', ' ', '#']
[' ', ' ', ' ', ' ', ' ', '#']
[' ', ' ', ' ', ' ', ' ', '#']




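Once dynamic programming has produced the values (policy values) for the cells around the start and the goal, they can be used as the H value (the heuristic function value) in the A* algorithm, so that f = g + H.

Starting from the start point, every time a branch is encountered, decide whether a new path should be added. If the number of paths already kept has reached the planned cache limit (which may be larger than the number of paths to be displayed, so that many early failed paths do not use up the slots), replace the path with the highest f value. No closed array is needed; instead, check whether the candidate point, together with its orientation, already exists in the path that the current open point belongs to. When appending an open point, store the point, its orientation, and the path it belongs to. If the candidate's h value is still the invalid value, it cannot reach the goal and does not need to be added to the open list. (Here h means one array per orientation; for some orientations a cell cannot reach the goal.)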
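
A minimal sketch of the first idea only (DP values used as the A* heuristic H) might look like the following. It deliberately ignores orientation and the bounded multi-path bookkeeping described above, so it is a simplified stand-in and not the author's implementation. The names `compute_value`, `a_star_dp`, `DELTA` and `BLOCKED` are chosen for this example, and the grid is the one from the Implement A* exercise. `BLOCKED = 99` is assumed to be larger than any real path cost on these small grids.

```python
import heapq

DELTA = [(-1, 0), (0, -1), (1, 0), (0, 1)]  # up, left, down, right
BLOCKED = 99  # value for walls and for cells that cannot reach the goal


def compute_value(grid, goal, cost=1):
    """Dynamic programming: minimum cost from every cell to the goal."""
    rows, cols = len(grid), len(grid[0])
    value = [[BLOCKED] * cols for _ in range(rows)]
    value[goal[0]][goal[1]] = 0
    changed = True
    while changed:
        changed = False
        for x in range(rows):
            for y in range(cols):
                if grid[x][y] != 0:
                    continue
                for dx, dy in DELTA:
                    x2, y2 = x + dx, y + dy
                    if 0 <= x2 < rows and 0 <= y2 < cols and grid[x2][y2] == 0:
                        if value[x2][y2] + cost < value[x][y]:
                            value[x][y] = value[x2][y2] + cost
                            changed = True
    return value


def a_star_dp(grid, init, goal, cost=1):
    """A* with f = g + H, where H is the DP value grid. Returns the path cost or 'fail'."""
    h = compute_value(grid, goal, cost)
    start = (init[0], init[1])
    if h[start[0]][start[1]] >= BLOCKED:
        return 'fail'  # the start cannot reach the goal at all
    rows, cols = len(grid), len(grid[0])
    open_heap = [(h[start[0]][start[1]], 0, start[0], start[1])]  # (f, g, x, y)
    seen = {start}
    while open_heap:
        f, g, x, y = heapq.heappop(open_heap)
        if (x, y) == (goal[0], goal[1]):
            return g
        for dx, dy in DELTA:
            x2, y2 = x + dx, y + dy
            if not (0 <= x2 < rows and 0 <= y2 < cols):
                continue
            # skip walls, cells already queued, and cells whose H is the invalid value
            if grid[x2][y2] != 0 or (x2, y2) in seen or h[x2][y2] >= BLOCKED:
                continue
            seen.add((x2, y2))
            heapq.heappush(open_heap, (g + cost + h[x2][y2], g + cost, x2, y2))
    return 'fail'


grid = [[0, 1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 0]]
print(a_star_dp(grid, [0, 0], [4, 5]))  # prints 11
```

Because the DP values are exact costs to the goal, candidates whose H is still the invalid value can be dropped before they ever enter the open list, which is the pruning rule described in the paragraph above.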



# -*- coding: utf-8 -*-
"""
Created on Fri May 18 19:21:36 2018

@author: Administrator
"""
# ----------
# User Instructions:
#
# Implement the function optimum_policy2D below.
#
# You are given a car in grid with initial state
# init. Your task is to compute and return the car's
# optimal path to the position specified in goal;
# the costs for each motion are as defined in cost.
#
# There are four motion directions: up, left, down, and right.
# Increasing the index in this array corresponds to making
# a left turn, and decreasing the index corresponds to making a
# right turn.
forward = [[-1,  0], # go up
           [ 0, -1], # go left
           [ 1,  0], # go down
           [ 0,  1]] # go right
forward_name = ['up', 'left', 'down', 'right']

# action has 3 values: right turn, no turn, left turn
action = [-1, 0, 1]
action_name = ['R', '#', 'L']

# grid format:
#     0 = navigable space
#     1 = unnavigable space
grid = [[1, 1, 1, 0, 0, 0],
        [1, 1, 1, 0, 1, 0],
        [0, 0, 0, 0, 0, 0],
        [1, 1, 1, 0, 1, 1],
        [1, 1, 1, 0, 1, 1]]

init = [4, 3, 0] # given in the form [row,col,direction]
                 # direction = 0: up
                 #             1: left
                 #             2: down
                 #             3: right

goal = [2, 0] # given in the form [row,col]

cost = [2, 1, 20] # cost has 3 values, corresponding to making
                  # a right turn, no turn, and a left turn

# A second test case; these assignments override the ones above.
grid = [[0, 0, 0, 0, 1, 1],
        [0, 0, 1, 0, 0, 0],
        [0, 0, 0, 0, 1, 0],
        [0, 0, 1, 1, 1, 0],
        [0, 0, 0, 0, 1, 0]]
init = [4, 5, 0]
goal = [4, 3]
cost = [1, 1, 1]

# calling optimum_policy2D with the given parameters
# (the first grid, init, goal and cost above) should return
# [[' ', ' ', ' ', 'R', '#', 'R'],
#  [' ', ' ', ' ', '#', ' ', '#'],
#  ['*', '#', '#', '#', '#', 'R'],
#  [' ', ' ', ' ', '#', ' ', ' '],
#  [' ', ' ', ' ', '#', ' ', ' ']]
# ----------

# modify code below

def optimum_policy2D(grid,init,goal,cost):
    # value[orientation][row][col] and policy[orientation][row][col]
    value = [[[999 for col in range(len(grid[0]))] for row in range(len(grid))],
             [[999 for col in range(len(grid[0]))] for row in range(len(grid))],
             [[999 for col in range(len(grid[0]))] for row in range(len(grid))],
             [[999 for col in range(len(grid[0]))] for row in range(len(grid))]]

    policy = [[[' ' for col in range(len(grid[0]))] for row in range(len(grid))],
              [[' ' for col in range(len(grid[0]))] for row in range(len(grid))],
              [[' ' for col in range(len(grid[0]))] for row in range(len(grid))],
              [[' ' for col in range(len(grid[0]))] for row in range(len(grid))]]

    policy2D = [[' ' for col in range(len(grid[0]))] for row in range(len(grid))]

    change = True

    while change:
        change = False
        # go through all grid cells and calculate values
        for x in range(len(grid)):
            for y in range(len(grid[0])):
                for orientation in range(len(forward)):
                    if x == goal[0] and y == goal[1]:
                        if value[orientation][x][y] > 0:
                            change = True
                            value[orientation][x][y] = 0
                            policy[orientation][x][y] = '*'
                    elif grid[x][y] == 0:
                        # calculate the three ways to propagate value
                        for i in range(len(action)):
                            o2 = (orientation + action[i] + len(forward)) % len(forward)
                            x2 = x + forward[o2][0]
                            y2 = y + forward[o2][1]
                            cost_step = cost[i]

                            if x2 >= 0 and x2 < len(grid) and y2 >= 0 and y2 < len(grid[0]) and grid[x2][y2] == 0:
                                v2 = value[o2][x2][y2] + cost_step
                                if v2 < value[orientation][x][y]:
                                    value[orientation][x][y] = v2
                                    policy[orientation][x][y] = action_name[i]
                                    change = True

    print('===============generate policy2D')
    x = init[0]
    y = init[1]
    orientation = init[2]

    # follow the 3D policy from the start state and project it onto the 2D grid
    policy2D[x][y] = policy[orientation][x][y]
    while policy[orientation][x][y] != '*':
        newOri = -1
        if policy[orientation][x][y] == '#':
            newOri = orientation
        elif policy[orientation][x][y] == 'R':
            newOri = (orientation - 1 + len(forward)) % len(forward)
        elif policy[orientation][x][y] == 'L':
            newOri = (orientation + 1) % len(forward)
        else: # ' '
            print('Failed to find a path!')
            # stop here; the caller checks for this value
            return 'Fail'

        x = x + forward[newOri][0]
        y = y + forward[newOri][1]
        orientation = newOri
        policy2D[x][y] = policy[orientation][x][y]

    return policy2D


p2D = optimum_policy2D(grid,init,goal,cost)
if p2D != 'Fail':
    for i in range(len(p2D)):
        print(p2D[i])



20. Planning Conclusion


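1.        The A* algorithm, which uses heuristic search to find a path.

2.        Dynamic programming, which finds an overall policy; the policy gives the best action for every cell, and therefore a path to the goal from any cell.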



