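I. Technical Challenge
How to build a loop in which each iteration represents a different time period, select the rows that fall within that period, and save each selection to its own file.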
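A minimal sketch of the idea on made-up data, before the real code. It assumes each window should be half-open, [start, end), so that no row lands in two windows; the toy frame, the `windows` dictionary, and the 20-minute spacing are illustrative only and not part of the original data.

```python
import pandas as pd

# Toy data: one row every 20 minutes, from 08:00 to 09:40.
idx = pd.date_range("2014-08-06 08:00:00", periods=6, freq="20min")
toy = pd.DataFrame({"value": range(6)}, index=idx)

start = pd.Timestamp("2014-08-06 08:00:00")
step = pd.Timedelta(hours=1)
windows = {}  # hypothetical container: starting hour -> rows in that hour

for _ in range(2):
    end = start + step
    # The boolean mask keeps rows with start <= time < end.
    windows[start.hour] = toy[(toy.index >= start) & (toy.index < end)]
    start = end  # slide the window forward by one step
```

Each pass of the loop handles one time period, and the window is advanced at the end of the pass. The real code in the next section follows the same pattern but writes each selection to disk.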
II. Code Implementation
1) Read the data and preprocess it
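import numpy as np
import pandas as pd
# Raw string, so the backslashes in the Windows path are not treated as escapes.
path = r'D:\DynamicPopulation\交通赛数据_上\20140806_train.txt'
# Open the file explicitly; the with-block closes it after reading.
with open(path) as f:
    data = pd.read_csv(f, names=['ID', 'lat', 'lon', 'passager', 'time'])
# change = current passenger flag minus the previous row's flag.
data['passager_1'] = data['passager'].shift(1)
data['change'] = data['passager'] - data['passager_1']
# The helper column is no longer needed.
data = data.drop(['passager_1'], axis=1)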
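The shift-and-subtract step above compares each row with the previous row of the file, whichever vehicle that row belongs to. A minimal sketch of a per-vehicle variant, assuming rows should only be compared within the same `ID` and in time order (the source does not state this); the sample values are made up.

```python
import pandas as pd

# Made-up sample: two vehicles, passenger flag 0/1 at three timestamps each.
sample = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2],
    "passager": [0, 1, 0, 1, 1, 0],
    "time": pd.to_datetime([
        "2014-08-06 08:00:00", "2014-08-06 08:05:00", "2014-08-06 08:10:00",
        "2014-08-06 08:00:00", "2014-08-06 08:05:00", "2014-08-06 08:10:00",
    ]),
})

# Order rows by vehicle and time, then difference the flag within each vehicle.
sample = sample.sort_values(["ID", "time"])
sample["change"] = sample.groupby("ID")["passager"].diff()
```

The first row of each vehicle gets `NaN` instead of a difference against another vehicle's last row, so it is never counted as a pickup or drop-off.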
2) Build the time index
# Parse the time column, use it as the index, and sort chronologically.
data['time'] = pd.to_datetime(data['time'])
data = data.set_index('time')
data = data.sort_index()
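3) Coordinate correction
# Shift the coordinates by fixed offsets, then drop the uncorrected columns.
data['lat_res'] = data['lat'] + 0.002325
data['lon_res'] = data['lon'] - 0.00252
data = data.drop(['lat', 'lon'], axis=1)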
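4) Select pickup and drop-off points
# change == 1: passenger flag went from 0 to 1 (pickup).
up_point = data.loc[data['change'] == 1]
# change == -1: passenger flag went from 1 to 0 (drop-off).
down_point = data.loc[data['change'] == -1]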
5) Loop over the time periods, index, and save (the focus of this post!)
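import datetime
import os
from pandas.tseries.offsets import Hour

start = datetime.datetime.strptime('2014-08-06 08:00:00', "%Y-%m-%d %H:%M:%S")
end = datetime.datetime.strptime('2014-08-06 09:00:00', "%Y-%m-%d %H:%M:%S")
save_dir = "D:\\DynamicPopulation\\交通赛数据_上\\20140803\\"
# i is the starting hour of each window; the final pass (i = 24)
# covers 00:00 to 01:00 of the following day.
for i in range(8, 25):
    dataName = '%02dup.txt' % i
    file_name = os.path.join(save_dir, dataName)
    # Label slicing on a DatetimeIndex includes both endpoints, so a row
    # stamped exactly at `end` also appears in the next hour's file.
    data_hour = up_point.loc[start:end]
    data_hour.to_csv(file_name)
    print("save:", dataName)
    # Slide the window forward by one hour.
    start = start + Hour()
    end = end + Hour()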
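As an alternative to advancing `start` and `end` by hand, the same hourly split can be sketched with `groupby` on the hour of the index. This is only a sketch under assumptions: the data covers a single day (otherwise the same hour on different days would be merged), the `events` frame stands in for `up_point`, and the temporary output folder is hypothetical.

```python
import os
import tempfile
import pandas as pd

# Made-up pickup events indexed by time (stand-in for up_point).
idx = pd.to_datetime([
    "2014-08-06 08:10:00", "2014-08-06 08:50:00",
    "2014-08-06 09:00:00", "2014-08-06 10:30:00",
])
events = pd.DataFrame({"ID": [1, 2, 3, 4]}, index=idx)
events.index.name = "time"

out_dir = tempfile.mkdtemp()  # hypothetical output folder
saved = []
# Grouping by the hour of the index puts every row in exactly one group.
for hour, chunk in events.groupby(events.index.hour):
    path = os.path.join(out_dir, "%02dup.txt" % hour)
    chunk.to_csv(path)
    saved.append(os.path.basename(path))
```

Two differences from the loop above are worth noting: a row stamped exactly on the hour goes only into the file for that hour, and hours with no rows produce no file at all.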