[Python Web Scraping] Collecting Passenger Train Numbers Nationwide and the Locations of Their Origin and Terminal Stations

老城厢

Tags: python, big data

Article link: https://blog.csdn.net/ZuoZuodeSiXiang/article/details/107556550

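[Updated 2020-07-24]
This post uses Python to scrape a train-number lookup website and collect passenger train numbers nationwide, together with the locations (longitude and latitude) of each train's origin and terminal stations. Note that the coordinates are in the AMap (高德) coordinate system, so for visualization the "数据可视化" (data visualization) feature of the AMap Open Platform is recommended.
Reference code is given below.
Notes:
(1) Before using any feature of the AMap Open Platform, you need to apply for a key. This credential string plays a role similar to the "ak" on the Baidu platform.
(2) Before running the scraper, install the required libraries: requests, bs4, and the others the script imports (xlwt and retry).
(3) For how to use the features of the AMap Open Platform, refer to the official documentation.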

```python
# Collect railway train information
import time

import requests as rq
import xlwt
from bs4 import BeautifulSoup as bs
from retry import retry

rootpath = 'E:/PythonChris/Train_Schedule/'

# Index page that lists all train numbers
site_of_train = 'http://www.jt2345.com/huoche/checi'


@retry(tries=2, delay=2)
def geturl(url):
    """Download a page and return its text, decoded as GB2312."""
    headers = {'User-Agent': 'Chrome/72.0'}
    r = rq.get(url, headers=headers, timeout=5)
    r.raise_for_status()  # must be called; the bare attribute does nothing
    r.encoding = 'GB2312'
    return r.text


def gettrains(txt):
    """Collect all train numbers from the index page."""
    soup = bs(txt, 'html.parser')
    trains = [str(a.string) for a in soup.find_all('a')]
    # The first five links are skipped, as in the original code
    # (presumably they are not train numbers).
    return trains[5:]


trains = gettrains(geturl(site_of_train))
num_of_train = len(trains)

# Prepare the workbook: a summary line in row 0 and column titles in row 1
despath = rootpath + 'Train_Schedule_Detail.xls'
wb = xlwt.Workbook(encoding='UTF-8')
wt = wb.add_sheet('Detail', cell_overwrite_ok=True)
now_time = time.strftime('Updated: %Y-%m-%d %H:%M', time.localtime())
now_time = now_time + ', {} train numbers collected.'.format(num_of_train)
wt.write(0, 0, label=now_time)
fir = ['Train number', 'Train type', 'Origin station', 'Terminal station',
       'Departure time', 'Arrival time', 'Total travel time', 'Data updated',
       'Origin lng,lat', 'Terminal lng,lat']
for i, title in enumerate(fir):
    wt.write(1, i, label=title)


def specialsoup(txt):
    """Extract the fields of one train from its detail page."""
    soup = bs(txt, 'html.parser')
    cells = [str(p.string) for p in soup.find_all('td')]
    # Cell positions depend on the page layout; the order matches
    # the first eight entries of `fir`.
    return [cells[i] for i in (5, 8, 10, 18, 12, 14, 16, 22)]


# Geocoding
key = '***'  # replace with your own AMap key


@retry(tries=2, delay=2)
def getpoint(location):
    """Return the 'longitude,latitude' string of a station via AMap geocoding."""
    headers = {'User-Agent': 'Chrome/72.0'}
    url = 'https://restapi.amap.com/v3/geocode/geo'
    # '火车站' means 'railway station'; requests URL-encodes the parameters
    params = {'address': location + ' 火车站', 'output': 'XML', 'key': key}
    rx = rq.get(url, headers=headers, params=params, timeout=5)
    rx.raise_for_status()
    rx.encoding = 'UTF-8'
    so = bs(rx.text, 'html.parser')
    node = so.find('location')
    if node is None or node.string is None:
        raise ValueError('no coordinates returned for ' + location)
    return str(node.string)


def getinfor(trains):
    """Fetch every train's details, write them to the sheet, then save the workbook."""
    count = 0
    for t, train in enumerate(trains):
        try:
            url = site_of_train + '/' + str(train) + '.htm'
            infor_one = specialsoup(geturl(url))
            count += 1
            percent = '{:.3f}'.format(count / num_of_train * 100)
            for k in range(8):
                wt.write(t + 2, k, label=infor_one[k])
            wt.write(t + 2, 8, label=getpoint(infor_one[2]))
            wt.write(t + 2, 9, label=getpoint(infor_one[3]))
            print('Collected {}%. {}/{}'.format(percent, count, num_of_train))
        except Exception as e:
            # Skip this train, but report why instead of silently swallowing
            # every error (a bare except would also block Ctrl-C)
            print('Skipped {}: {}'.format(train, e))
            continue
    wb.save(despath)


getinfor(trains)
```
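
The script's `getpoint` function requests XML output and reads the `location` element. The geocoding response can also be handled as JSON, which avoids parsing markup. The following is only a minimal sketch, not the author's method: the helper name `parse_location` and the sample payload are made up for illustration, and the assumed response shape (a `status` of `"1"` on success, and a `geocodes` list whose entries carry a `"longitude,latitude"` string in `location`) should be checked against the official AMap documentation.

```python
def parse_location(payload):
    """Return (longitude, latitude) as floats, or None if there is no usable result.

    `payload` is the decoded JSON body of a geocoding response
    (assumed shape, see the note above).
    """
    if payload.get('status') != '1':
        return None
    geocodes = payload.get('geocodes') or []
    if not geocodes:
        return None
    location = geocodes[0].get('location')
    # Guard against a missing or non-string value before splitting
    if not isinstance(location, str) or ',' not in location:
        return None
    lng, lat = location.split(',')
    return float(lng), float(lat)


# Illustrative payload only; the coordinates are made-up sample values
sample = {
    'status': '1',
    'geocodes': [{'location': '116.427287,39.902842'}],
}
print(parse_location(sample))
```

With `requests`, the payload would typically come from `response.json()` after requesting JSON output instead of XML.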

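
`getinfor` calls `getpoint` twice for every train, and many trains share the same stations, so the same station may be geocoded many times. One possible refinement, which is not part of the original script, is to cache results with the standard library's `functools.lru_cache`. In the sketch below, `fake_getpoint` is a hypothetical stand-in for the real lookup so that the example runs without a network connection or a key.

```python
from functools import lru_cache

calls = []  # records which stations actually reach the lookup function


def fake_getpoint(station):
    """Hypothetical stand-in for getpoint: records the call, returns a placeholder."""
    calls.append(station)
    return '0.0,0.0'  # placeholder, not a real coordinate


# Wrap the lookup so each distinct station name is resolved only once
cached_getpoint = lru_cache(maxsize=None)(fake_getpoint)

for station in ['北京', '上海', '北京', '北京']:
    cached_getpoint(station)

print(len(calls))  # 2: only the two distinct stations were looked up
```

In the script itself, the same wrapper could be applied as `getpoint = lru_cache(maxsize=None)(getpoint)`. Calls that raise are not cached, so a failed lookup would be attempted again the next time that station appears.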