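Today I learned a technique for scraping bus route information. The detailed steps follow, using Amap (高德地图) as the example:
Preparation
1. Obtain API access on the Amap Open Platform
Website: https://lbs.amap.com/
Register, then go to Console > Application Management > My Applications > Create New Application > Add Key, and select Web (JS API). Once the key is created you get a key string; copy it for later use.
2. Replace the value after "key=" in the URL below with the key you obtained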
url = 'https://restapi.amap.com/v3/bus/linename?s=rsv3&extensions=all&key=8bdd8fdfbc3d4147c6d2a4c94cef3d4a&output=json&city=南京&offset=1&keywords=88路&platform=JS'
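The query string can also be assembled from a dictionary instead of being edited by hand, which makes the key easy to swap and takes care of percent-encoding the Chinese characters. A minimal sketch using only the standard library; `build_url` and `YOUR_KEY` are illustrative names, not part of the Amap API:

```python
from urllib.parse import urlencode

BASE_URL = 'https://restapi.amap.com/v3/bus/linename'

def build_url(key, city, line):
    """Assemble the bus-line query URL from its parts (hypothetical helper)."""
    params = {
        's': 'rsv3',
        'extensions': 'all',
        'key': key,          # your own key goes here
        'output': 'json',
        'city': city,
        'offset': 1,
        'keywords': line,
        'platform': 'JS',
    }
    # urlencode percent-encodes non-ASCII values such as 南京 as UTF-8
    return BASE_URL + '?' + urlencode(params)

url = build_url('YOUR_KEY', '南京', '88路')
print(url)
```

The parameter names and values are the ones already present in the URL above; only the way they are joined differs.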
Coding
1. Import libraries
import requests
import json
import pandas as pd
from lxml import etree
import time
url = 'https://restapi.amap.com/v3/bus/linename?s=rsv3&extensions=all&key=8bdd8fdfbc3d4147c6d2a4c94cef3d4a&output=json&city=南京&offset=1&keywords=88路&platform=JS'
r = requests.get(url).text
rt = json.loads(r)
rt['buslines'][0]['name'] # get the name of the bus line
Output:
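A note on reproducing this step: indexing `rt['buslines'][0]` directly raises a `KeyError` or `IndexError` when the key is rejected or no line matches. The guard sketched below checks the response first. It assumes the response carries a `status` field equal to `'1'` on success and an `info` message otherwise; check this against a real response before relying on it. `first_busline` is a hypothetical helper, and the two sample dictionaries are made-up stand-ins for `rt`.

```python
def first_busline(rt):
    """Return the first bus line in a parsed response, or None if there is none."""
    # Assumption: status == '1' signals success; anything else is treated as an error
    if str(rt.get('status')) != '1':
        print('request failed:', rt.get('info'))
        return None
    buslines = rt.get('buslines') or []
    if not buslines:
        print('no matching bus line')
        return None
    return buslines[0]

# Made-up responses, for illustration only
ok = {'status': '1', 'info': 'OK', 'buslines': [{'name': 'demo line'}]}
bad = {'status': '0', 'info': 'INVALID_USER_KEY'}
print(first_busline(ok)['name'])
print(first_busline(bad))
```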
dt = {}
dt['line_name'] = rt['buslines'][0]['name']
dt['polyname'] = rt['buslines'][0]['polyline']
dt['total_price'] = rt['buslines'][0]['total_price']
st_name = []
st_coords = []
for st in rt['buslines'][0]['busstops']:
    st_name.append(st['name'])
    st_coords.append(st['location'])
dt['station_name'] = st_name
dt['station_coords'] = st_coords
print(dt)
dm = pd.DataFrame([dt])
dm
Output:
2. Write a function
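The station coordinates and the route polyline are stored above as raw strings. If they are needed as numbers (for plotting or distance calculations, say), they can be split into float pairs. The sketch below assumes each `location` looks like `'lng,lat'` and the polyline is a `;`-separated list of such pairs; `parse_point` and `parse_polyline` are hypothetical helpers, and the sample values are made up.

```python
def parse_point(text):
    """Convert 'lng,lat' into a (lng, lat) tuple of floats."""
    lng, lat = text.split(',')
    return float(lng), float(lat)

def parse_polyline(text):
    """Convert 'lng,lat;lng,lat;...' into a list of (lng, lat) tuples."""
    # Skip empty pieces so a trailing ';' does not cause an error
    return [parse_point(p) for p in text.split(';') if p]

# Made-up sample values in the assumed format
sample_location = '118.796,32.060'
sample_polyline = '118.796,32.060;118.797,32.061;118.799,32.063'
print(parse_point(sample_location))
print(parse_polyline(sample_polyline))
```

Applied to the dictionary built above, this would be `[parse_point(c) for c in dt['station_coords']]` and `parse_polyline(dt['polyname'])`.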
def get_dt(city, line, retries=3):
    url = 'https://restapi.amap.com/v3/bus/linename?s=rsv3&extensions=all&key=8bdd8fdfbc3d4147c6d2a4c94cef3d4a&output=json&city={}&offset=1&keywords={}&platform=JS'.format(city, line)
    try:
        # The request and parsing sit inside try so network and JSON errors are retried too
        r = requests.get(url).text
        rt = json.loads(r)
        if rt['buslines']:
            print('data available..')
            dt = {}
            dt['line_name'] = rt['buslines'][0]['name']
            dt['polyname'] = rt['buslines'][0]['polyline']
            dt['total_price'] = rt['buslines'][0]['total_price']
            st_name = []
            st_coords = []
            for st in rt['buslines'][0]['busstops']:
                st_name.append(st['name'])
                st_coords.append(st['location'])
            dt['station_name'] = st_name
            dt['station_coords'] = st_coords
            dm = pd.DataFrame([dt])
            print(dm)
            return dm
        else:
            print('no data in list..')  # a name was given but no data came back
    except Exception:
        print('error..try it again..')
        if retries > 0:
            # Bounded retry: give up after `retries` further attempts
            time.sleep(2)
            return get_dt(city, line, retries - 1)
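Retry logic is easier to reason about when the number of attempts is capped and the retry loop is kept apart from the request itself. The sketch below shows one way to do that. `fetch_with_retry` and the `flaky` stand-in are hypothetical names, and no network call is made.

```python
import time

def fetch_with_retry(fetch, retries=3, delay=2):
    """Call fetch() until it succeeds or the retry budget runs out."""
    for attempt in range(1, retries + 1):
        try:
            return fetch()
        except Exception as exc:
            print('attempt {} failed: {}'.format(attempt, exc))
            if attempt < retries:
                time.sleep(delay)  # wait before the next attempt
    return None  # every attempt failed

# Stand-in for a request that fails twice and then succeeds
calls = {'n': 0}

def flaky():
    calls['n'] += 1
    if calls['n'] < 3:
        raise ValueError('temporary failure')
    return {'status': '1'}

print(fetch_with_retry(flaky, retries=5, delay=0))
```

In the scraper, `fetch` could be something like `lambda: requests.get(url, timeout=10).json()`, so that a hung connection also ends in a retry rather than blocking forever.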
==========================================================================
Test result 1:
Test result 2: