1-出租车数据的基础处理,由gps生成OD(pandas)

这段代码展示了如何使用Pandas库处理出租车轨迹数据,包括读取CSV文件,重命名列名,按车辆编号排序,识别状态变化(上车和下车点),并创建OD数据。通过数据清洗,筛选出状态改变行,并根据这些信息构建起点和终点信息。最终将结果保存到CSV文件中。
摘要由CSDN通过智能技术生成
import pandas as pd
#Read data
data = pd.read_csv(r'data-sample/TaxiData-Sample',header = None)
#rename the columns for the data
data.columns = ['VehicleNum', 'Stime', 'Lng', 'Lat', 'OpenStatus', 'Speed']
data
VehicleNumStimeLngLatOpenStatusSpeed
02227122:54:04114.16700022.71839900
12227118:26:26114.19059822.64780004
22227118:35:18114.20140122.64970000
32227116:02:46114.23349822.725901024
42227121:41:17114.23359722.720900019
.....................
16013022387320:20:03114.16014922.60693400
16013032387320:15:13114.16455122.605118148
16013042387320:16:23114.16801522.60608300
16013052387320:16:43114.16801522.60608300
16013062387320:26:39114.17064722.60438300

1601307 rows × 6 columns

data[data['VehicleNum'] == 22271]
VehicleNumStimeLngLatOpenStatusSpeed
02227122:54:04114.16700022.71839900
12227118:26:26114.19059822.64780004
22227118:35:18114.20140122.64970000
32227116:02:46114.23349822.725901024
42227121:41:17114.23359722.720900019
.....................
14322227111:31:09114.27940422.73900000
14332227111:33:09114.27940422.73900000
14342227110:43:56114.27950322.73900000
14352227110:51:48114.27950322.73920100
14362227100:07:41114.28270022.727800023

1437 rows × 6 columns

data[-(data['VehicleNum']==22271].head(2)
VehicleNumStimeLngLatOpenStatusSpeed
14373580701:53:46113.80989822.62680100
14383580701:43:46113.81330122.62360000
data=data.sort_values(by=['VehicleNum','Stime'])
data
VehicleNumStimeLngLatOpenStatusSpeed
392227100:00:49114.26650222.72820100
3972227100:01:48114.26650222.72820100
14132227100:02:47114.26650222.72820100
2442227100:03:46114.26650222.72820100
2472227100:04:45114.26889822.729500011
.....................
253203693423:49:54114.05670222.595800036
258313693423:50:42114.06030322.603701040
254853693423:52:32114.06919922.614500031
257493693423:53:02114.06739822.618401032
264643693423:53:32114.06729922.62159907

1601307 rows × 6 columns

data2=data.iloc[2460:2480]
data2
VehicleNumStimeLngLatOpenStatusSpeed
15504232233406:27:51114.03601822.519051019
15499002233406:28:11114.03524822.517817025
15493522233406:28:31114.03514922.51761605
15504222233406:28:51114.03456922.51796700
15472562233406:29:11114.03234922.518984021
15474182233406:29:31114.03125022.519550022
15474192233406:29:51114.03125022.51955000
15477642233406:30:11114.03125022.51955000
15504172233406:30:19114.03125022.51955010
15474162233406:30:39114.03009822.520317118
15472522233406:30:59114.02948022.521549135
15487782233406:31:19114.03061722.523132135
15500842233406:31:39114.03098322.523632111
15485972233406:31:59114.03098322.52363210
15485982233406:32:19114.03126522.52409915
15472572233406:32:39114.03244822.525850140
15474222233406:32:59114.03250122.52591711
15493502233406:33:19114.03250122.52591710
15493512233406:33:39114.03273022.52631814
15504212233406:33:59114.03373022.528099140
#数据整体往下平移了一行
data2['OpenStatus'].shift()
1550423    NaN
1549900    0.0
1549352    0.0
1550422    0.0
1547256    0.0
1547418    0.0
1547419    0.0
1547764    0.0
1550417    0.0
1547416    1.0
1547252    1.0
1548778    1.0
1550084    1.0
1548597    1.0
1548598    1.0
1547257    1.0
1547422    1.0
1549350    1.0
1549351    1.0
1550421    1.0
Name: OpenStatus, dtype: float64
#把平移后的数据整体赋值给新的一列OpenStatus1
data2['OpenStatus1']=data2['OpenStatus'].shift()
<ipython-input-59-8e1c0ecd6a11>:2: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data2['OpenStatus1']=data2['OpenStatus'].shift()
data2
VehicleNumStimeLngLatOpenStatusSpeedOpenStatus1
15504232233406:27:51114.03601822.519051019NaN
15499002233406:28:11114.03524822.5178170250.0
15493522233406:28:31114.03514922.517616050.0
15504222233406:28:51114.03456922.517967000.0
15472562233406:29:11114.03234922.5189840210.0
15474182233406:29:31114.03125022.5195500220.0
15474192233406:29:51114.03125022.519550000.0
15477642233406:30:11114.03125022.519550000.0
15504172233406:30:19114.03125022.519550100.0
15474162233406:30:39114.03009822.5203171181.0
15472522233406:30:59114.02948022.5215491351.0
15487782233406:31:19114.03061722.5231321351.0
15500842233406:31:39114.03098322.5236321111.0
15485972233406:31:59114.03098322.523632101.0
15485982233406:32:19114.03126522.524099151.0
15472572233406:32:39114.03244822.5258501401.0
15474222233406:32:59114.03250122.525917111.0
15493502233406:33:19114.03250122.525917101.0
15493512233406:33:39114.03273022.526318141.0
15504212233406:33:59114.03373022.5280991401.0
#数据清洗条件 #1.后者行等于前一行   # 2。但前后者行和中间行是不同的   # 3。三条数据车牌号相等
#数据清洗
data = data[-((data['OpenStatus'].shift(-1) == data['OpenStatus'].shift())&
(data['OpenStatus'].shift(-1) != data['OpenStatus'])&
(data['VehicleNum'].shift(-1) == data['VehicleNum'].shift())&
(data['VehicleNum'].shift(-1) == data['VehicleNum']))]
data
VehicleNumStimeLngLatOpenStatusSpeed
392227100:00:49114.26650222.72820100
3972227100:01:48114.26650222.72820100
14132227100:02:47114.26650222.72820100
2442227100:03:46114.26650222.72820100
2472227100:04:45114.26889822.729500011
.....................
253203693423:49:54114.05670222.595800036
258313693423:50:42114.06030322.603701040
254853693423:52:32114.06919922.614500031
257493693423:53:02114.06739822.618401032
264643693423:53:32114.06729922.62159907

1598866 rows × 6 columns

#识别OD
#OpenStatus从0变为1,说明上车,是O点,从1变为0,说明下车,是D点 
# 方法
# 将Open Status向上平移一行 赋值给Open Status1 用Open Status1- Open Status得到Status change
# Status change为-1说明是下车点  为1说明是上车点
data['OpenStatus1']=data['OpenStatus'].shift(-1)
<ipython-input-72-6240087fbc74>:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['OpenStatus1']=data['OpenStatus'].shift(-1)
data
VehicleNumStimeLngLatOpenStatusSpeedOpenStatus1
392227100:00:49114.26650222.728201000.0
3972227100:01:48114.26650222.728201000.0
14132227100:02:47114.26650222.728201000.0
2442227100:03:46114.26650222.728201000.0
2472227100:04:45114.26889822.7295000110.0
........................
253203693423:49:54114.05670222.5958000360.0
258313693423:50:42114.06030322.6037010400.0
254853693423:52:32114.06919922.6145000310.0
257493693423:53:02114.06739822.6184010320.0
264643693423:53:32114.06729922.62159907NaN

1598866 rows × 7 columns

data['Statuschange']=data['OpenStatus1']-data['OpenStatus']
<ipython-input-74-c1d32677d8c5>:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['Statuschange']=data['OpenStatus1']-data['OpenStatus']
data
VehicleNumStimeLngLatOpenStatusSpeedOpenStatus1Statuschange
392227100:00:49114.26650222.728201000.00.0
3972227100:01:48114.26650222.728201000.00.0
14132227100:02:47114.26650222.728201000.00.0
2442227100:03:46114.26650222.728201000.00.0
2472227100:04:45114.26889822.7295000110.00.0
...........................
253203693423:49:54114.05670222.5958000360.00.0
258313693423:50:42114.06030322.6037010400.00.0
254853693423:52:32114.06919922.6145000310.00.0
257493693423:53:02114.06739822.6184010320.00.0
264643693423:53:32114.06729922.62159907NaNNaN

1598866 rows × 8 columns

data.iloc[2920:2940] 
#Statuschange=-1就是下车点
VehicleNumStimeLngLatOpenStatusSpeedOpenStatus1Statuschange
15502382233409:05:01113.98235322.553267171.00.0
15468882233409:05:21113.98130022.5519331161.00.0
15505862233409:05:41113.98021722.5516001251.00.0
15468852233409:06:01113.97895122.5525491291.00.0
15482462233409:06:21113.97882122.552700130.0-1.0
15477522233409:06:23113.97882122.552700000.00.0
15477392233409:06:43113.97882122.552700000.00.0
15495062233409:07:03113.97882122.552700000.00.0
15502372233409:07:23113.97998022.5519160200.00.0
15493262233409:07:43113.98056822.5510180220.00.0
15485822233409:08:03113.98037022.550800050.00.0
15468862233409:08:23113.98037022.550800000.00.0
15468872233409:08:43113.98037022.550800000.00.0
15473932233409:09:03113.98037022.550800000.00.0
15477402233409:09:23113.98037022.550800000.00.0
15498742233409:09:43113.98003422.550400030.00.0
15479342233409:10:03113.98040022.5496670110.00.0
15473942233409:10:23113.98166722.5487000300.00.0
15480822233409:10:43113.98338322.5474170370.00.0
15491282233409:11:03113.98519922.5460000410.00.0
data.iloc[2840:2860]
##Statuschange=-1就是上车点
VehicleNumStimeLngLatOpenStatusSpeedOpenStatus1Statuschange
15486462233408:37:03114.05593122.538349000.00.0
15506482233408:37:45114.05719822.537832000.00.0
15484662233408:38:05114.05719822.537832000.00.0
15478262233408:38:16114.05719822.537832000.00.0
15486482233408:38:36114.05719822.537832000.00.0
15486492233408:38:56114.05719822.537832000.00.0
15490172233408:39:16114.05741922.537617021.01.0
15502982233408:39:20114.05741922.537617101.00.0
15502992233408:39:40114.05741922.537617101.00.0
15481452233408:40:00114.05636622.537600141.00.0
15484612233408:40:20114.05576322.537500171.00.0
15506392233408:40:40114.05290222.5356331711.00.0
15504602233408:41:20114.05503122.5346341271.00.0
15497542233408:42:00114.05413122.5312821271.00.0
15481392233408:42:20114.05223122.5312181321.00.0
15471192233408:42:40114.04893522.5316011391.00.0
15493802233408:43:00114.04636422.5315841461.00.0
15474402233408:43:20114.04340422.5313841551.00.0
15486082233408:43:40114.04053522.5310821521.00.0
15486012233408:44:00114.03769722.5309161471.00.0
data[(data['Statuschange']==1) | (data['Statuschange']==-1)]
VehicleNumStimeLngLatOpenStatusSpeedOpenStatus1Statuschange
1362227123:53:08114.20570422.7204000471.01.0
15487412233400:00:52114.11113022.5767501130.0-1.0
15483512233400:07:44114.08049822.5541820111.01.0
15496202233400:17:58114.08491522.540850120.0-1.0
15471822233400:18:56114.08491522.540850001.01.0
...........................
4590733680523:20:04114.13353022.6175501150.0-1.0
4581043680523:35:55114.11296822.549601091.01.0
4584163680523:42:55114.08950022.5380671510.0-1.0
4556813680523:46:12114.09121722.540768001.01.0
4539873680523:53:33114.12035422.544300100.0-1.0

30242 rows × 8 columns

#shift(-1)这一行要和原本的哪一行车牌要相同
data[((data['Statuschange']==1) | (data['Statuschange']==-1)) & (data['VehicleNum']==data['VehicleNum'].shift(-1))]
VehicleNumStimeLngLatOpenStatusSpeedOpenStatus1Statuschange
15487412233400:00:52114.11113022.5767501130.0-1.0
15483512233400:07:44114.08049822.5541820111.01.0
15496202233400:17:58114.08491522.540850120.0-1.0
15471822233400:18:56114.08491522.540850001.01.0
15476272233400:44:47114.05623622.633383130.0-1.0
...........................
4590733680523:20:04114.13353022.6175501150.0-1.0
4581043680523:35:55114.11296822.549601091.01.0
4584163680523:42:55114.08950022.5380671510.0-1.0
4556813680523:46:12114.09121722.540768001.01.0
4539873680523:53:33114.12035422.544300100.0-1.0

30016 rows × 8 columns

# 结果赋给oddata
oddata=data[((data['Statuschange']==1) | (data['Statuschange']==-1)) & (data['VehicleNum']==data['VehicleNum'].shift(-1))]
oddata
VehicleNumStimeLngLatOpenStatusSpeedOpenStatus1Statuschange
15487412233400:00:52114.11113022.5767501130.0-1.0
15483512233400:07:44114.08049822.5541820111.01.0
15496202233400:17:58114.08491522.540850120.0-1.0
15471822233400:18:56114.08491522.540850001.01.0
15476272233400:44:47114.05623622.633383130.0-1.0
...........................
4590733680523:20:04114.13353022.6175501150.0-1.0
4581043680523:35:55114.11296822.549601091.01.0
4584163680523:42:55114.08950022.5380671510.0-1.0
4556813680523:46:12114.09121722.540768001.01.0
4539873680523:53:33114.12035422.544300100.0-1.0

30016 rows × 8 columns

oddata[['VehicleNum','Stime','Lng','Lat','Statuschange']]
VehicleNumStimeLngLatStatuschange
15487412233400:00:52114.11113022.576750-1.0
15483512233400:07:44114.08049822.5541821.0
15496202233400:17:58114.08491522.540850-1.0
15471822233400:18:56114.08491522.5408501.0
15476272233400:44:47114.05623622.633383-1.0
..................
4590733680523:20:04114.13353022.617550-1.0
4581043680523:35:55114.11296822.5496011.0
4584163680523:42:55114.08950022.538067-1.0
4556813680523:46:12114.09121722.5407681.0
4539873680523:53:33114.12035422.544300-1.0

30016 rows × 5 columns

#提取有用的列,双[]
oddata=oddata[['VehicleNum','Stime','Lng','Lat','Statuschange']]
oddata
VehicleNumStimeLngLatStatuschange
15487412233400:00:52114.11113022.576750-1.0
15483512233400:07:44114.08049822.5541821.0
15496202233400:17:58114.08491522.540850-1.0
15471822233400:18:56114.08491522.5408501.0
15476272233400:44:47114.05623622.633383-1.0
..................
4590733680523:20:04114.13353022.617550-1.0
4581043680523:35:55114.11296822.5496011.0
4584163680523:42:55114.08950022.538067-1.0
4556813680523:46:12114.09121722.5407681.0
4539873680523:53:33114.12035422.544300-1.0

30016 rows × 5 columns

#修改列名 注意格式
oddata.columns=['VehicleNum','Stime','SLng','SLat','Statuschange']
oddata
VehicleNumStimeSLngSLatStatuschange
15487412233400:00:52114.11113022.576750-1.0
15483512233400:07:44114.08049822.5541821.0
15496202233400:17:58114.08491522.540850-1.0
15471822233400:18:56114.08491522.5408501.0
15476272233400:44:47114.05623622.633383-1.0
..................
4590733680523:20:04114.13353022.617550-1.0
4581043680523:35:55114.11296822.5496011.0
4584163680523:42:55114.08950022.538067-1.0
4556813680523:46:12114.09121722.5407681.0
4539873680523:53:33114.12035422.544300-1.0

30016 rows × 5 columns

#新建终点时间经纬度,值是起点上移一行,因为不是终点就是起点
oddata['Elng']=oddata['SLng'].shift(-1)
<ipython-input-110-74375a3c7844>:2: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  oddata['Elng']=oddata['SLng'].shift(-1)
oddata['ELat']=oddata['SLat'].shift(-1)
oddata['Etime']=oddata['Stime'].shift(-1)
<ipython-input-111-7dbbfabccb1e>:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  oddata['ELat']=oddata['SLat'].shift(-1)
oddata
VehicleNumStimeSLngSLatStatuschangeElngELatEtime
15487412233400:00:52114.11113022.576750-1.0114.08049822.55418200:07:44
15483512233400:07:44114.08049822.5541821.0114.08491522.54085000:17:58
15496202233400:17:58114.08491522.540850-1.0114.08491522.54085000:18:56
15471822233400:18:56114.08491522.5408501.0114.05623622.63338300:44:47
15476272233400:44:47114.05623622.633383-1.0114.09163722.54320002:38:35
...........................
4590733680523:20:04114.13353022.617550-1.0114.11296822.54960123:35:55
4581043680523:35:55114.11296822.5496011.0114.08950022.53806723:42:55
4584163680523:42:55114.08950022.538067-1.0114.09121722.54076823:46:12
4556813680523:46:12114.09121722.5407681.0114.12035422.54430023:53:33
4539873680523:53:33114.12035422.544300-1.0NaNNaNNaN

30016 rows × 8 columns

#只要起点
oddata[oddata['Statuschange']==1]
VehicleNumStimeSLngSLatStatuschangeElngELatEtime
15483512233400:07:44114.08049822.5541821.0114.08491522.54085000:17:58
15471822233400:18:56114.08491522.5408501.0114.05623622.63338300:44:47
15475112233402:38:35114.09163722.5432001.0114.09349822.55438202:46:52
15477892233403:58:46114.03881822.5532321.0114.05229922.60436604:13:57
15477642233406:30:11114.03125022.5195501.0114.06788622.52129906:41:19
...........................
4548323680522:49:01114.11436522.5506321.0114.11575322.55755022:50:30
4569003680522:51:55114.11540222.5580831.0114.11871322.54788223:03:03
4539863680523:03:42114.11848422.5478671.0114.13353022.61755023:20:04
4581043680523:35:55114.11296822.5496011.0114.08950022.53806723:42:55
4556813680523:46:12114.09121722.5407681.0114.12035422.54430023:53:33

15022 rows × 8 columns

#要筛选车牌也要一样 不然蹿车
oddata[(oddata['Statuschange']==1)&(oddata['VehicleNum']==oddata['VehicleNum'].shift(-1))]
VehicleNumStimeSLngSLatStatuschangeElngELatEtime
15483512233400:07:44114.08049822.5541821.0114.08491522.54085000:17:58
15471822233400:18:56114.08491522.5408501.0114.05623622.63338300:44:47
15475112233402:38:35114.09163722.5432001.0114.09349822.55438202:46:52
15477892233403:58:46114.03881822.5532321.0114.05229922.60436604:13:57
15477642233406:30:11114.03125022.5195501.0114.06788622.52129906:41:19
...........................
4548323680522:49:01114.11436522.5506321.0114.11575322.55755022:50:30
4569003680522:51:55114.11540222.5580831.0114.11871322.54788223:03:03
4539863680523:03:42114.11848422.5478671.0114.13353022.61755023:20:04
4581043680523:35:55114.11296822.5496011.0114.08950022.53806723:42:55
4556813680523:46:12114.09121722.5407681.0114.12035422.54430023:53:33

14818 rows × 8 columns

#删除一列
oddata.drop('Statuschange',axis=1)
VehicleNumStimeSLngSLatElngELatEtime
15487412233400:00:52114.11113022.576750114.08049822.55418200:07:44
15483512233400:07:44114.08049822.554182114.08491522.54085000:17:58
15496202233400:17:58114.08491522.540850114.08491522.54085000:18:56
15471822233400:18:56114.08491522.540850114.05623622.63338300:44:47
15476272233400:44:47114.05623622.633383114.09163722.54320002:38:35
........................
4590733680523:20:04114.13353022.617550114.11296822.54960123:35:55
4581043680523:35:55114.11296822.549601114.08950022.53806723:42:55
4584163680523:42:55114.08950022.538067114.09121722.54076823:46:12
4556813680523:46:12114.09121722.540768114.12035422.54430023:53:33
4539873680523:53:33114.12035422.544300NaNNaNNaN

30016 rows × 7 columns

oddate=oddata[(oddata['Statuschange']==1)&(oddata['VehicleNum']==oddata['VehicleNum'].shift(-1))].drop('Statuschange',axis=1)
oddate
VehicleNumStimeSLngSLatElngELatEtime
15483512233400:07:44114.08049822.554182114.08491522.54085000:17:58
15471822233400:18:56114.08491522.540850114.05623622.63338300:44:47
15475112233402:38:35114.09163722.543200114.09349822.55438202:46:52
15477892233403:58:46114.03881822.553232114.05229922.60436604:13:57
15477642233406:30:11114.03125022.519550114.06788622.52129906:41:19
........................
4548323680522:49:01114.11436522.550632114.11575322.55755022:50:30
4569003680522:51:55114.11540222.558083114.11871322.54788223:03:03
4539863680523:03:42114.11848422.547867114.13353022.61755023:20:04
4581043680523:35:55114.11296822.549601114.08950022.53806723:42:55
4556813680523:46:12114.09121722.540768114.12035422.54430023:53:33

14818 rows × 7 columns

#encoding='utf-8_sig'防止中文乱码 
oddate.to_csv(r'oddate.csv',index=None,encoding='utf-8_sig')

评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值