Python实现两地铁站路径查询广度优先搜索BFS、启发式搜索算法

最新推荐文章于 2024-07-06 02:22:16 发布

Small_Fish25

最新推荐文章于 2024-07-06 02:22:16 发布

阅读量1.2k

点赞数 9

分类专栏： Python 专题 AI/机器学习专题文章标签： python 智慧城市

本文链接：https://blog.csdn.net/qq_42642142/article/details/119789811

版权

Python 专题同时被 2 个专栏收录

18 篇文章 1 订阅

订阅专栏

AI/机器学习专题

9 篇文章 4 订阅

订阅专栏

本文介绍了如何从网页抓取北京地铁数据，包括线路和站点信息，利用Python进行数据处理、网络图构建，并运用DFS和A*搜索算法寻找最短路径。通过requests库抓取数据，re和networkx库进行数据提取和可视化，展示了如何通过BFS和启发式搜索实现地铁线路导航。

摘要由CSDN通过智能技术生成

操作思路及流程：

从网页获取数据
对数据进行处理
数据建图
搜索算法设计

数据获取

需要的Package: requests

#准备了一个链接，通过requests来抓取页面的信息，并打印下
r = requests.get('http://map.amap.com/service/subway?_1469083453978&srhdata=1100_drw_beijing.json')
r.encoding = r.apparent_encoding
print(r.text)

部分数据如下：

在这里插入图片描述

通过返回的信息[一系列的字典类型]我们可以找到一些貌似是我们想要的键值对数据[站名 、经纬度坐标等]

数据处理及提取

需要的Package: re 、numpy 、networkx

提取前我们需要想一下我们想要获得什么数据？
-   所有线路信息
   - 字典存储dict：key：线路名称[字符串]；value：站点名称[列表]
   
-  所有站点信息
   - 字典存储dict：key：站点名称[字符串]；value：站点坐标[元组]

def get_lines_stations_info(text):
    # 遍历text格式数据，组成地点数据结构
    # 所有线路信息的dict：key：线路名称；value：站点名称list
    lines_info = {}
    # 所有站点信息的dict：key：站点名称；value：站点坐标(x,y)
    stations_info = {}

    pattern = re.compile('"st".*?"kn"')
    lines_list = pattern.findall(text)
    for i in range(len(lines_list)):
        # 地铁线路名
        pattern = re.compile('"ln":".*?"')
        line_name = pattern.findall(lines_list[i])[0][6:-1]
        # 站点信息list
        pattern = re.compile('"rs".*?"sp"')
        temp_list = pattern.findall(lines_list[i])
        station_name_list = []
        for j in range(len(temp_list)):
            # 站名
            pattern = re.compile('"n":".*?"')
            station_name = pattern.findall(temp_list[j])[0][5:-1]
            station_name_list.append(station_name)
            # 坐标(x,y)
            pattern = re.compile('"sl":".*?"')
            #对格式进行转化
            position = tuple(map(float, pattern.findall(temp_list[j])[0][6:-1].split(',')))
            # 数据加入站点信息dict
            stations_info[station_name] = position

        # 数据加入地铁线路dict
        lines_info[line_name] = station_name_list
    return lines_info, stations_infos

每写完一个函数，对其进行以下测试：

 lines_info, stations_info = get_lines_stations_info(r.text)
 print(lines_info)  #线路信息
 print(stations_info) #站点信息

在这里插入图片描述

数据可视化

import networkx as nx
# 此代码解决不显示汉字的问题  设置字体
matplotlib.rcParams['font.sans-serif'] = ['SimHei']
#指定figure的宽和高
plt.figure(figsize=(20, 20))
#定义一个无向图
stations_graph = nx.Graph()
#添加结点
stations_graph.add_nodes_from(list(stations_info.keys()))
#绘图
nx.draw(stations_graph, stations_info, with_labels=True, node_size=6)
#显示
plt.show()

在这里插入图片描述

图的创建

图一般有两种存储格式：邻接矩阵和邻接表

此次采用邻接表建图：

#将sta2加入key为sta1的字典的value中
def add_sta2tosta1(info,sta1,sta2):
    list_tmp = info.get(sta1)
    if not list_tmp:
        list_tmp = []
    list_tmp.append(sta2)
    info[sta1] = list_tmp
    return info
#根据线路信息创建邻接表
def create_map(lines_info):
    #邻接表: 字典形式key:站名 value:相邻的站名列表
    neighbor_info = {} #创建一个空字典
    for line_name,station_list in lines_info.items():
        for i in range(len(station_list) - 1):
            sta_one = station_list[i]
            sta_two = station_list[i+1]
            #添加信息  双方都要互相添加对方
            neighbor_info = add_sta2tosta1(neighbor_info,sta_one,sta_two)
            neighbor_info = add_sta2tosta1(neighbor_info,sta_two,sta_one)
    return neighbor_info

对函数进行测试：

nb_info = create_map(lines_info)
print(nb_info)

在这里插入图片描述

有了邻接表我们就可以对地图进行搜索遍历,在这之前，我们在对原来的图根据邻接表信息添加上折线

stations_connection_graph = nx.Graph(nb_info)
plt.figure(figsize=(30, 30))
nx.draw(stations_connection_graph, stations_info, with_labels=True, node_size=6)
plt.show()

谢谢有被图丑到^ ^

在这里插入图片描述

搜索算法设计

方法1：非启发式搜索 DFS、BFS

如果我们想找到一条最短路径，可以用DFS吗？答案是不可以。

当求解目的是必须找到最优解时，要使用BFS。
当求解目标是只要找到解即可，而且内存大小受限，则要使用DFS。

#BFD寻找最短路径
def BFS(lines_info,neighbor_info,start_station,end_station):
    # 搜索策略：以站点数量为cost（因为车票价格是按站算的）
    # 这种情况下，站点间的坐标距离难以转化为可靠的启发函数，所以只用简单的BFS算法
    # 由于每深一层就是cost加1，所以每层的cost都相同，算和不算没区别，所以省略
    #输入信息错误
    if (not neighbor_info.get(start_station)) or (not neighbor_info.get(end_station)):
        return None
    #搜索的路线：key:站点名 value:从start_station到key的路径
    paths = {}
    #添加初始站点
    paths[start_station] = [start_station]
    while True:
        #临时变量：存放下一层要遍历的结点  key:站点名 value:从start_station到key的路径
        paths_tmp = {}
        #遍历此次要遍历的所有结点
        for (k,val) in paths.items():
            #获取当前结点的邻接结点列表
            neighbors = neighbor_info[k].copy()
            if (len(val) >= 2):
                pre_station = val[-2]
                # 不往上一站走
                neighbors.remove(pre_station)
            #遍历邻接结点
            for station in neighbors:
                #如果已经走过了
                if station in paths:
                    continue
                #添加该节点
                path = val.copy()
                path.append(station)
                #入队
                paths_tmp[station] = path
                if station == end_station:
                    return path
        #更新队列
        paths = paths_tmp

测试：
print(BFS(lines_info,nb_info,'朝阳门','望京西'))
[‘朝阳门’, ‘东四十条’, ‘东直门’, ‘柳芳’, ‘光熙门’, ‘芍药居’, ‘望京西’]

方法2：启发式搜索路程最短优先

#以路径路程为cost的启发式搜索
def get_path_Astar(lines_info, neighbor_info, stations_info, from_station, to_station):
    # 搜索策略：以路径的站点间直线距离累加为cost，以当前站点到目标的直线距离为启发函数

    # 检查输入站点名称
    if not neighbor_info.get(from_station):
        print('起始站点“%s”不存在。请正确输入！'%from_station)
        return None
    if not neighbor_info.get(to_station):
        print('目的站点“%s”不存在。请正确输入！'%to_station)
        return None
    
    # 计算所有节点到目标节点的直线距离，备用
    distances = {}
    x,y = stations_info.get(to_station)
    for (k,v) in stations_info.items():
        x0,y0 = stations_info.get(k)
        l = ((x-x0)**2 + (y-y0)**2)**0.5
        distances[k] = l
        
    # 已搜索过的节点，dict
    # key=站点名称，value是已知的起点到此站点的最小cost
    searched = {}
    searched[from_station] = 0
    
    # 数据结构为pandas的dataframe
    # index为站点名称
    # g为已走路径，h为启发函数值（当前到目标的直线距离）
    nodes = pd.DataFrame([[[from_station], 0, 0, distances.get(from_station)]],
                         index=[from_station], columns=['path', 'cost', 'g', 'h'])
    print(nodes)
    
    count = 0
    while True:
        if count > 1000:
            break
        nodes.sort_values('cost', inplace=True)
        #index:估价函数最小的站点名称 node：一些其他属性信息
        for index, node in nodes.iterrows():
            #迭代次数+1
            count += 1
            # 向邻居中离目的地最短的那个站点搜索  neighbors：距离目的地最近的结点的邻居
            neighbors = neighbor_info.get(index).copy()
            if len(node['path']) >= 2:
                # 不向这个路径的反向去搜索
                neighbors.remove(node['path'][-2])
            for i in range(len(neighbors)):
                count += 1
                neighbor = neighbors[i]
                # g : 已经走的路径+此次走的距离
                g = node['g'] + get_distance(stations_info, index, neighbor)
                # h :当前结点到目标结点的距离
                h = distances[neighbor]
                # cost计算估价哈桑农户
                cost = g + h
                #将当前结点加入最短距离路径index中
                path = node['path'].copy()
                path.append(neighbor)
                # 找到目标，结束
                if neighbor == to_station:
                    print('共检索%d次。'%count)
                    return path
                #如果当前结点已经被搜索过了
                if neighbor in searched:
                    #并且现在搜索的路径不是最优，忽略
                    if g >= searched[neighbor]:
                        continue
                    else:
                        #否则修改 g 信息
                        searched[neighbor] = g
                        # 修改此站点对应的node信息 [先删除，再添加]
                        nodes.drop(neighbor, axis=0, inplace=True)#inplace=True表示在本来的数据上进行删除
                        row = pd.DataFrame([[path, cost, g, h]],index=[neighbor], columns=['path', 'cost', 'g', 'h'])
                        nodes = nodes.append(row)
                else:
                    #没有搜索过则进行标记
                    searched[neighbor] = g
                    row = pd.DataFrame([[path, cost, g, h]],index=[neighbor], columns=['path', 'cost', 'g', 'h'])
                    nodes = nodes.append(row)
            # 这个站点的所有邻居都搜索完了，删除这个节点
            nodes.drop(index, axis=0, inplace=True)

        # 外层for循环只跑第一行数据，然后重新sort后再计算
        continue     
        
    print('未能找到路径')
    return None

#获得两者之间的距离
def get_distance(stations_info, str1, str2):
    x1,y1 = stations_info.get(str1)
    x2,y2 = stations_info.get(str2)
    return ((x1-x2)**2 + (y1-y2)**2)** 0.5

测试：
paths = get_path_Astar(lines_info, nb_info, stations_info, '朝阳门', '望京西')
if paths:
 print("路径总计%d站。"%(len(paths)-1))
 print("-".join(paths))
共检索122次。
路径总计6站。
朝阳门-东四十条-东直门-柳芳-光熙门-芍药居-望京西

主函数代码：

if __name__ == '__main__':
    #获取数据
    r = requests.get('http://map.amap.com/service/subway?_1469083453978&srhdata=1100_drw_beijing.json')
    r.encoding = r.apparent_encoding
    #数据提取及处理
    lines_info, stations_info = get_lines_stations_info(r.text)
    #print(lines_info)
    #print(stations_info)

    # 画地铁图
    matplotlib.rcParams['font.sans-serif'] = ['SimHei']# 此代码解决不显示汉字的问题
    plt.figure(figsize=(20, 20))
    stations_graph = nx.Graph()
    stations_graph.add_nodes_from(list(stations_info.keys()))
    nx.draw(stations_graph, stations_info, with_labels=True, node_size=6)
    #plt.show()

    #建立图[连接表/连接矩阵]
    nb_info = create_map(lines_info)
    #print(nb_info)

    stations_connection_graph = nx.Graph(nb_info)
    plt.figure(figsize=(30, 30))
    nx.draw(stations_connection_graph, stations_info, with_labels=True, node_size=6)
    #plt.show()

    print(BFS(lines_info,nb_info,'朝阳门','望京西'))
    paths = get_path_Astar(lines_info, nb_info, stations_info, '朝阳门', '望京西')
    if paths:
        print("路径总计%d站。"%(len(paths)-1))
        print("-".join(paths))

Small_Fish25

关注

9
点赞
踩
22

收藏

觉得还不错? 一键收藏
6
评论
Python实现两地铁站路径查询广度优先搜索BFS、启发式搜索算法

操作思路及流程：从网页获取数据对数据进行处理数据建图搜索算法设计数据获取需要的Package: requests#准备了一个链接，通过requests来抓取页面的信息，并打印下r = requests.get('http://map.amap.com/service/subway?_1469083453978&srhdata=1100_drw_beijing.json')r.encoding = r.apparent_encodingprint(r.text)部分
复制链接

扫一扫