Python Exercise on Passenger Waiting Time: Predicting Bus Arrival Times with Python Data Processing (Part 1)

白宇翰

Published 2021-01-13 17:07:04


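Copyright notice: This is an original article by the blogger, licensed under CC 4.0 BY-SA. Please include a link to the original source and this notice when reposting.

Article link: https://blog.csdn.net/weixin_42202078/article/details/112889578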


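Summary (generated automatically by CSDN): This article presents a method for predicting bus arrival times with Python, covering data cleaning, grouping, and travel-interval calculation. Working on bus GPS data, it removes invalid records, separates the up-direction and down-direction routes, computes the travel interval between adjacent stations, and converts time strings to timestamps to obtain time differences. The source code at the end shows the full implementation.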

1. Data format

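id int: record ID

type int: record type; 41 = between-station record, 42 = entering/leaving an intermediate station, 43 = entering/leaving a terminal station

route_id int: route ID, e.g. 10454, 10069, 120881

bus_id varchar: bus number

station_id varchar: station number

lon decimal: longitude

lat decimal: latitude

speed decimal: speed

direction decimal: heading

gpsflag int: GPS status; 0 = valid, 1 = invalid

updownflag int: travel direction; 0 = up, 1 = down

inoutflag int: station entry/exit; 0 = entering, 1 = leaving

runningflag int: operating status; 0 = in service, 1 = out of service

onlineflag int: online status; 0 = online, 1 = offline

create_time timestamp: GPS time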

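That makes fifteen fields in total. The file has no header row, so the source code in section 4 refers to these fields by column index, 0 to 14, in the order listed above.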
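Because the file has no header row, the source code in section 4 addresses every field by its position. A minimal sketch, assuming a tab-separated text file with the fifteen fields in the order listed above, that attaches the field names instead; `COLUMNS` and `load_records` are hypothetical names chosen for illustration, and the sample record is made up.

```python
import io

import pandas as pd

# Field names in the order listed above (column indices 0 to 14)
COLUMNS = [
    "id", "type", "route_id", "bus_id", "station_id",
    "lon", "lat", "speed", "direction", "gpsflag",
    "updownflag", "inoutflag", "runningflag", "onlineflag", "create_time",
]


def load_records(source):
    """Read a headerless, tab-separated GPS file and name its columns (hypothetical helper)."""
    return pd.read_csv(source, sep="\t", header=None, names=COLUMNS,
                       dtype={"bus_id": str, "station_id": str})


# One invented record, read from an in-memory "file"
sample = "1\t42\t10454\tB01\tS03\t120.1\t30.2\t25.0\t90\t0\t0\t0\t0\t0\t2015-12-01 08:00:00\n"
df = load_records(io.StringIO(sample))
print(df.loc[0, "station_id"], df.loc[0, "create_time"])
```

With names attached, a filter such as `df[df["gpsflag"] == 0]` reads more clearly than `df[df[9] == 0]`.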

2. Basic data cleaning

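First, drop the route ID column, since this analysis covers only one route. Then remove invalid records using the operating status (runningflag), online status (onlineflag), and GPS validity (gpsflag) fields.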
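The cleaning rule can be sketched with named columns, assuming the field names from the list in section 1 (`clean` is a hypothetical helper and the three records are invented):

```python
import pandas as pd


def clean(df):
    """Drop route_id and keep only valid, in-service, online records."""
    df = df.drop(columns=["route_id"])  # single route, so route_id carries no information
    mask = (df["gpsflag"] == 0) & (df["runningflag"] == 0) & (df["onlineflag"] == 0)
    return df[mask]


# Invented records: only the first row passes all three checks
raw = pd.DataFrame({
    "route_id": [10454, 10454, 10454],
    "bus_id": ["B01", "B01", "B02"],
    "gpsflag": [0, 1, 0],      # 0 = valid GPS
    "runningflag": [0, 0, 1],  # 0 = in service
    "onlineflag": [0, 0, 0],   # 0 = online
})
cleaned = clean(raw)
print(cleaned)
```

Combining the three conditions in one mask is equivalent to the three successive filters in the source code.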

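Using the up/down flag (updownflag), split the cleaned data into two parts: the up-direction data and the down-direction data.

Next, group each part by bus (bus_id), producing two lists: one for the up-direction buses and one for the down-direction buses. Each list element holds one bus's records at every station it reached during the data collection period.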
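A sketch of the split and grouping, assuming named columns and a few invented records. Iterating a pandas `groupby` object yields `(key, sub-DataFrame)` pairs, so `list()` produces exactly the kind of per-bus list described here:

```python
import pandas as pd

# Invented cleaned records for two buses
df = pd.DataFrame({
    "bus_id": ["B01", "B02", "B01", "B02"],
    "updownflag": [0, 0, 1, 0],  # 0 = up, 1 = down
    "station_id": ["S1", "S1", "S9", "S2"],
})

up = df[df["updownflag"] == 0]
down = df[df["updownflag"] == 1]

# Each element is a (bus_id, DataFrame) pair, with groups sorted by bus_id
up_list = list(up.groupby("bus_id"))
down_list = list(down.groupby("bus_id"))

for bus_id, records in up_list:
    print(bus_id, list(records["station_id"]))
```

To process one bus, take the DataFrame from the pair, e.g. `up_list[0][1]`.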

3. Getting the time intervals

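Suppose we now have the records of a single bus. To compute the travel time between adjacent stations, we only need the type and inoutflag fields: keep the records where type is 42 (an intermediate station) and inoutflag is 0 (entering the station). From these records, the travel interval between every pair of consecutive stations can be computed. Finally, we reduce the data to just the station and the arrival time.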
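Selecting the arrival records for one bus can be sketched as a boolean mask on the two fields; the records below are invented, and named columns are assumed:

```python
import pandas as pd

# Invented records for one bus
bus = pd.DataFrame({
    "type": [43, 42, 42, 41, 42],
    "inoutflag": [1, 0, 1, 0, 0],  # 0 = entering, 1 = leaving
    "station_id": ["S0", "S1", "S1", "S1", "S2"],
    "create_time": ["2015-12-01 08:00:00", "2015-12-01 08:03:00",
                    "2015-12-01 08:03:40", "2015-12-01 08:05:00",
                    "2015-12-01 08:07:30"],
})

# Keep arrivals (inoutflag 0) at intermediate stations (type 42)
arrivals = bus[(bus["type"] == 42) & (bus["inoutflag"] == 0)]
# Reduce to station and arrival time only
arrivals = arrivals[["station_id", "create_time"]]
print(arrivals)
```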

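Since we want time intervals but only have arrival times, we use Python's time module to convert each time string into a Unix timestamp, compute the gap (time difference) between consecutive arrivals from that list, and add the gaps to the DataFrame as a new column built from a Series.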
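A sketch of the gap calculation on invented arrival times. It follows the text by converting with the `time` module and padding the last gap with 0, as the source code does; the vectorized pandas version at the end is an alternative, not something the original uses:

```python
import time

import pandas as pd

arrivals = pd.DataFrame({
    "station_id": ["S1", "S2", "S3"],
    "create_time": ["2015-12-01 08:03:00", "2015-12-01 08:07:30",
                    "2015-12-01 08:10:00"],
})

# Parse each time string and convert it to a Unix timestamp in seconds
stamps = [time.mktime(time.strptime(t, "%Y-%m-%d %H:%M:%S"))
          for t in arrivals["create_time"]]

# gap[i] = seconds from arrival i to arrival i + 1; the last entry is padded with 0
gap = [b - a for a, b in zip(stamps, stamps[1:])] + [0]
arrivals = arrivals.assign(gap=pd.Series(gap, index=arrivals.index))
print(arrivals)

# Alternative: the same gaps with pandas datetimes, no local-time conversion
t = pd.to_datetime(arrivals["create_time"])
gap_vec = (t.shift(-1) - t).dt.total_seconds().fillna(0)
```

The gaps here are 270 s and 150 s, followed by the padding 0. `time.mktime` interprets the string as local time, which does not affect differences unless a daylight-saving change falls between two records.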

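Finally, because the data contain errors and GPS transmissions are easily disturbed, obviously implausible values have to be removed. The source code below keeps only gaps between 60 and 600 seconds, and if fewer than half of a station's gaps fall in that range, the station's average is set to 0.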
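A sketch of that filtering and per-station averaging with named columns; `station_averages` is a hypothetical helper, the gaps are invented, and the 60 s and 600 s bounds are the ones used in the source code:

```python
import pandas as pd

# Invented gaps (seconds) from several trips of one bus
clr = pd.DataFrame({
    "station_id": ["S1", "S1", "S1", "S2", "S2", "S2"],
    "gap": [200.0, 220.0, 5000.0, 30.0, 20.0, 300.0],
})


def station_averages(clr, low=60, high=600):
    """Mean plausible gap per station, or 0 when under half the gaps are plausible."""
    result = {}
    for station, grp in clr.groupby("station_id"):
        ok = grp[(grp["gap"] > low) & (grp["gap"] < high)]
        # Same rule as the source code: too few plausible gaps -> report 0
        result[station] = 0 if len(ok) * 2 < len(grp) else ok["gap"].mean()
    return result


res = station_averages(clr)
print(res)
```

For S1, two of three gaps are plausible, so the average is (200 + 220) / 2 = 210; for S2 only one of three is, so it is reported as 0.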

4. Source code

```python
# -*- coding: utf-8 -*-
"""
Created on Tue Dec 15 19:51:52 2015

@author: Luyixiao
"""
import time

import pandas as pd


def disData(path):
    """Clean the raw GPS records and split them into per-bus up/down groups."""
    # Read the tab-separated txt file; it has no header, so columns are 0..14
    data = pd.read_table(path, header=None)
    daVal = data.drop(2, axis=1)      # drop the useless route_id column
    daVal = daVal[daVal[13] == 0]     # onlineflag: keep online records
    daVal = daVal[daVal[12] == 0]     # runningflag: keep in-service records
    daVal = daVal[daVal[9] == 0]      # gpsflag: keep valid GPS fixes
    # Drop lon, lat, speed, direction and the flags already used for filtering
    daVal = daVal.drop([5, 6, 7, 8, 9, 12, 13], axis=1)

    upRoad = daVal[daVal[10] == 0]    # updownflag 0: up-direction data
    downRoad = daVal[daVal[10] == 1]  # updownflag 1: down-direction data

    # Iterating a groupby yields (bus_id, DataFrame) pairs; collect them in lists
    upList = list(upRoad.groupby(3))      # column 3 = bus_id
    downList = list(downRoad.groupby(3))
    return upList, downList

# Each element of the returned lists is a (bus_id, DataFrame) pair;
# pass element[1] (the DataFrame) to timeGet.


def timeGet(da):
    """Average travel time from each station to the next one for a single bus."""
    inrec = da[da[11] == 0]           # inoutflag 0: make sure the bus enters the station
    inrec = inrec[inrec[1] == 42]     # type 42: intermediate-station record
    # Keep only station_id (4) and create_time (14), in time order
    clr = inrec.drop([0, 1, 3, 10, 11], axis=1).sort_values(14).copy()
    if clr.empty:
        return {}, []

    # Convert the time strings to Unix timestamps (seconds)
    timl = [time.mktime(time.strptime(t, '%Y-%m-%d %H:%M:%S')) for t in clr[14]]
    # gap[i] = time from arrival i to arrival i + 1; the last one is defined
    # as zero so that the lengths match
    gap = [timl[i + 1] - timl[i] for i in range(len(timl) - 1)]
    gap.append(0)
    clr['gap'] = pd.Series(gap, index=clr.index)  # add the column to the DataFrame

    kk = {}  # dict storing "station": "average time for this bus"
    for station, grp in clr.groupby(4):  # each station of this bus as one group
        # Discard implausible gaps: keep only 60 s < gap < 600 s
        temp = grp[(grp['gap'] > 60) & (grp['gap'] < 600)]
        # If fewer than half of the gaps are plausible, record 0 for the station
        if len(temp) * 2 < len(grp):
            ave = 0
        else:
            ave = temp['gap'].mean()
        kk[station] = ave
    return kk, gap
```

In short, iterating over a groupby result and turning it into a list is a very handy technique.

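This article is also shared on the blog “钱塘小甲子” (CSDN).

In case of infringement, please contact support@oschina.cn for removal.

This article is part of the OSC “源创计划” (original content program).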
