Scraping the Addresses of Cainiao Stations in a City

Latest recommended article published on 2025-05-14 15:07:12

husy有趣的恶魔

Tags: python, web scraping

Article link: https://blog.csdn.net/qq_41939998/article/details/135507733

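To test the hyper-heuristic algorithm designed in my paper, I need location data for warehouses and customer points. Customer points are numerous and widely scattered, which makes them hard to collect. Most parcels today, however, are first delivered to a pickup station and then collected by the customer, and stations are usually opened in densely populated areas, so a station can be treated as an aggregation point for a group of customers.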

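Taking Dongguan as an example, this article uses a web scraper to collect the addresses of all Cainiao stations in the city and then looks up the longitude and latitude of each one.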

The page http://www.iecity.com/dongguan/brand/75655.html currently lists 1,782 Cainiao stations in Dongguan. Regular expressions are used to extract each station's link and address from it.
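As a rough illustration of this kind of extraction, the sketch below runs two non-greedy patterns over a small hand-written HTML fragment. The fragment, the example.com links, and the addresses are all made up for the demonstration; the real page's markup may differ.

```python
import re

# Hypothetical listing markup, written only for this demonstration.
sample_html = (
    "<ul class='LifeList clearfix'>"
    "<li><a href='http://www.example.com/shop/1.html'>Station A</a>"
    "<p>地址：<span>东莞市示例路1号</span></p></li>"
    "<li><a href='http://www.example.com/shop/2.html'>Station B</a>"
    "<p>地址：<span>东莞市示例路2号</span></p></li>"
    "</ul>"
)

# Non-greedy groups capture the link target and the text inside the address span.
links = re.findall(r"<a href='(http:.*?)'", sample_html)
addresses = re.findall(r"地址：<span>(.*?)</span>", sample_html)

for link, address in zip(links, addresses):
    print(link, address)
```

Because each pattern is matched independently, the two lists stay aligned only if every entry has both a link and an address, so comparing their lengths is a cheap sanity check.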

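Each page holds at most 100 stations. Clicking through to the next page changes the URL to http://www.iecity.com/dongguan/brand/75655_2.html, so paging can be done by changing the numeric suffix. To guard against failures caused by network problems during scraping, each page is retried a limited number of times.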

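import re
import requests

# Browser-like User-Agent header sent with every request
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 Edg/120.0.0.0'}
urls1 = []  # links to the station detail pages
locs1 = []  # station addresses
for i in range(1, 19):  # 1782 stations at up to 100 per page -> 18 pages
    url = 'http://www.iecity.com/dongguan/brand/75655_{}.html'.format(i)
    k = 0  # number of failed attempts for this page
    while k < 8:
        try:
            html = requests.get(url, headers=headers, timeout=5)
            html.encoding = 'gbk'
            # Cut out the listing block, which ends just before the "地理位置分布" heading
            tmp = re.findall(r'''<ul class='LifeList clearfix'>[\s\S]*?<div class="item2"><h2>地理位置分布</h2>''', html.text)
            urls1.extend(re.findall(r"<a href='(http:.*?)'", tmp[0]))
            locs1.extend(re.findall(r'地址：<span>(.*?)</span>', tmp[0]))
            break
        except (requests.RequestException, IndexError):
            # Network error, or the listing block was not found in the response
            k += 1
            if k == 8:
                print('Page {} could not be reached'.format(i))
    print(i, k, len(urls1), len(locs1))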
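The retry logic in the loop above can also be factored into a small reusable helper. The following is only a sketch of that idea, not part of the original script: the name `with_retries` and its parameters are hypothetical, and it uses only the standard library so it can wrap any callable.

```python
import time


def with_retries(action, max_tries=8, pause=0.0, exceptions=(Exception,)):
    """Call action() until it succeeds or max_tries attempts have failed.

    Returns (result, failures); result is None when every attempt failed.
    """
    failures = 0
    while failures < max_tries:
        try:
            return action(), failures
        except exceptions:
            failures += 1
            if failures < max_tries:
                time.sleep(pause)  # brief pause before the next attempt
    return None, failures


# Demonstration with a callable that fails twice and then succeeds.
attempts = {'count': 0}

def flaky():
    attempts['count'] += 1
    if attempts['count'] < 3:
        raise ValueError('simulated network error')
    return 'ok'

print(with_retries(flaky))
```

In the scraper, the callable would be something like `lambda: requests.get(url, headers=headers, timeout=5)` with `exceptions=(requests.RequestException,)`, and a nonzero `pause` keeps the retries from hammering the site.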

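Once the street addresses are collected, the longitude and latitude still have to be looked up, which can be done with the AMap (Gaode) API. An API key is available after registering on the official site, AMap Open Platform (amap.com). Each address is sent to the geocoding endpoint, the response is parsed with regular expressions, and addresses that are not in Dongguan are discarded.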

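n = 0  # addresses processed
m = 0  # addresses that need a manual lookup
u = 'https://restapi.amap.com/v3/geocode/geo'   # fixed part of the API URL, before the "?"
with open(r"k.txt", "w", encoding="utf-8") as f:
    for loc in locs1:
        # Both query parameters go into one dict; replace the key with your own
        params = {'key': '21be9ab86b66627334ef30ba02c3a38c',
                  'address': loc}
        try:
            res = requests.get(u, params=params, timeout=5)
            # Coordinates ("longitude,latitude") of a match whose city is Dongguan
            lnglat = re.findall(r'"city":"东莞市",.*?"location":"(.*?)"', res.text)[0]
            # Formatted address of a match in Guangdong / Dongguan (city code 0769)
            addr = re.findall(r'"formatted_address":"(.{0,25})","country":"中国","province":"广东省","citycode":"0769","city":"东莞市"', res.text)[0]
            f.write(lnglat + addr + '\n')
        except (IndexError, requests.RequestException):
            # No Dongguan match in the response, or the request itself failed
            print(loc, 'is not a Dongguan address or the request failed; please look it up manually')
            m += 1
        n += 1
        print(n, m)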
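An alternative to matching the raw response text with regular expressions is to parse it as JSON. The sketch below assumes the response is a JSON object with a `geocodes` list whose entries carry `city`, `location`, and `formatted_address`, which matches the field names the regexes above rely on. The function name `parse_geocode` and the sample payload are hypothetical.

```python
import json


def parse_geocode(payload, city='东莞市'):
    """Return (longitude, latitude, formatted_address) for the first match
    in `city`, or None when the payload contains no such match."""
    for item in payload.get('geocodes') or []:
        location = item.get('location')
        # Skip entries in other cities or with a missing/odd location field
        if item.get('city') == city and isinstance(location, str) and ',' in location:
            lng, lat = location.split(',', 1)
            return float(lng), float(lat), item.get('formatted_address', '')
    return None


# Hand-written sample shaped like the fields used above; the values are made up.
sample = json.loads(
    '{"status":"1","geocodes":[{"formatted_address":"广东省东莞市示例路1号",'
    '"country":"中国","province":"广东省","citycode":"0769","city":"东莞市",'
    '"location":"113.75,23.02"}]}'
)
print(parse_geocode(sample))
```

With the live API, the payload would come from `res.json()`. Returning the coordinates and the address as separate values also makes it easy to write them to the output file with an explicit delimiter.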