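The date field in the data is a string of digits (for example, 20220418000000 means 2022-04-18 00:00:00). The goal is to filter the male and female driver data by each day and save every day's rows to its own file.
Code 1: split by date and gender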
import pandas as pd

# Load the per-gender trip tables.
female_data = pd.read_csv('C:\\Users\\11357\\Desktop\\男女司机\\data_female.csv')
male_data = pd.read_csv('C:\\Users\\11357\\Desktop\\男女司机\\data_male.csv')

today = 20220418000000  # DEP_TIME is an integer in YYYYMMDDhhmmss form
count = 1000000         # adding this moves the date forward one day (valid only within one month)
while today < 20220425000000:
    next_day = today + count
    # Keep trips with today <= DEP_TIME < next_day and write them to a file named by MMDD.
    male_day = male_data[(male_data['DEP_TIME'] >= today) & (male_data['DEP_TIME'] < next_day)]
    male_day.to_csv('C:\\Users\\11357\\Desktop\\男女司机\\日期\\male_day' + str(today)[4:8] + '.csv', index=False, encoding='utf-8_sig')
    female_day = female_data[(female_data['DEP_TIME'] >= today) & (female_data['DEP_TIME'] < next_day)]
    female_day.to_csv('C:\\Users\\11357\\Desktop\\男女司机\\日期\\female_day' + str(today)[4:8] + '.csv', index=False, encoding='utf-8_sig')
    today += count
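The `+ 1000000` step in Code 1 works because the whole week falls inside April. Across a month boundary the same integer arithmetic would produce non-existent dates such as 20220431. As a minimal sketch of a more general approach, assuming `DEP_TIME` really is a 14-digit integer in YYYYMMDDhhmmss form, the column can be parsed into timestamps and grouped by calendar day. The helper name `split_by_day` and the toy rows are invented for illustration and are not part of the original data.

```python
import pandas as pd

def split_by_day(df, time_col="DEP_TIME"):
    """Return a dict mapping 'MMDD' to the rows of df that depart on that day."""
    # Parse the 14-digit integer (YYYYMMDDhhmmss) into real timestamps.
    stamps = pd.to_datetime(df[time_col].astype(str), format="%Y%m%d%H%M%S")
    # Build one 'MMDD' key per row; rename so it cannot clash with a column name.
    day_keys = stamps.dt.strftime("%m%d").rename("day")
    return {key: part for key, part in df.groupby(day_keys)}

# Toy rows that straddle a month boundary (made-up values).
toy = pd.DataFrame({
    "DEP_TIME": [20220430235959, 20220501000000, 20220501120000],
    "DRIVE_MILE": [1.5, 2.0, 3.5],
})
parts = split_by_day(toy)
```

Each value in `parts` is an ordinary DataFrame, so it can be written out with `to_csv` using the same file-name pattern as Code 1.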
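Code 2: split by date and gender, then compute each day's total driving time and mileage for male and female drivers, plus the averages over the week. Full code: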
import pandas as pd

# Load the per-gender trip tables.
female_data = pd.read_csv('C:\\Users\\11357\\Desktop\\男女司机\\data_female.csv')
male_data = pd.read_csv('C:\\Users\\11357\\Desktop\\男女司机\\data_male.csv')

today = 20220418000000  # DEP_TIME is an integer in YYYYMMDDhhmmss form
count = 1000000         # adding this moves the date forward one day (valid only within one month)

# One entry per day, in date order.
male_drive_mile = []
male_drive_time = []
female_drive_mile = []
female_drive_time = []

while today < 20220425000000:
    next_day = today + count
    # Male drivers: filter the day, accumulate totals, save the day's rows.
    male_day = male_data[(male_data['DEP_TIME'] >= today) & (male_data['DEP_TIME'] < next_day)]
    male_drive_mile.append(male_day['DRIVE_MILE'].sum())
    male_drive_time.append(male_day['DRIVE_TIME'].sum())
    male_day.to_csv('C:\\Users\\11357\\Desktop\\男女司机\\日期\\male_day' + str(today)[4:8] + '.csv', index=False, encoding='utf-8_sig')
    # Female drivers: same steps.
    female_day = female_data[(female_data['DEP_TIME'] >= today) & (female_data['DEP_TIME'] < next_day)]
    female_drive_mile.append(female_day['DRIVE_MILE'].sum())
    female_drive_time.append(female_day['DRIVE_TIME'].sum())
    female_day.to_csv('C:\\Users\\11357\\Desktop\\男女司机\\日期\\female_day' + str(today)[4:8] + '.csv', index=False, encoding='utf-8_sig')
    today += count

# Rows are the four metrics, columns are the seven days.
time_and_mile = pd.DataFrame([male_drive_mile,
                              male_drive_time,
                              female_drive_mile,
                              female_drive_time],
                             columns=['2022.4.18', '2022.4.19', '2022.4.20', '2022.4.21', '2022.4.22', '2022.4.23', '2022.4.24'])
time_and_mile.index = ['male_drive_mile', 'male_drive_time', 'female_drive_mile', 'female_drive_time']

def average(row):
    # Mean of the seven daily values in one row.
    total = 0
    i = 18
    while i < 25:
        total += row['2022.4.' + str(i)]
        i += 1
    return total / 7

time_and_mile['average'] = time_and_mile.apply(average, axis=1)
time_and_mile.to_csv('C:\\Users\\11357\\Desktop\\mile和time计算.csv', encoding='utf-8_sig')
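The numbers produced by Code 2 are wrong, probably because of a problem in the earlier filtering step; the method itself is correct.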
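One way to probe whether the filtering is at fault is to compute the daily totals a second, independent way and compare the two results. The sketch below is only an illustration under stated assumptions: `DEP_TIME` is an integer in YYYYMMDDhhmmss form, the function names `totals_by_window` and `totals_by_groupby` are hypothetical, and the toy rows are invented. It also counts rows that fall outside the week and checks that `DEP_TIME` has an integer dtype, since either could explain unexpected totals.

```python
import pandas as pd

WEEK_START = 20220418000000
WEEK_END = 20220425000000   # exclusive upper bound
ONE_DAY = 1000000           # one day in YYYYMMDDhhmmss integer arithmetic

def totals_by_window(df):
    """Daily sums using the same window comparison as Code 2."""
    rows = {}
    day = WEEK_START
    while day < WEEK_END:
        in_day = df[(df["DEP_TIME"] >= day) & (df["DEP_TIME"] < day + ONE_DAY)]
        rows[day // ONE_DAY] = in_day[["DRIVE_MILE", "DRIVE_TIME"]].sum()
        day += ONE_DAY
    # Transpose so that each row is one day (YYYYMMDD).
    return pd.DataFrame(rows).T

def totals_by_groupby(df):
    """Daily sums using integer division to get YYYYMMDD, then groupby."""
    in_week = df[(df["DEP_TIME"] >= WEEK_START) & (df["DEP_TIME"] < WEEK_END)]
    day_key = in_week["DEP_TIME"] // ONE_DAY
    return in_week.groupby(day_key)[["DRIVE_MILE", "DRIVE_TIME"]].sum()

# Toy rows (made-up values); the last one lies outside the week on purpose.
toy = pd.DataFrame({
    "DEP_TIME": [20220418083000, 20220418170000, 20220420120000, 20220426090000],
    "DRIVE_MILE": [10.0, 5.0, 7.5, 99.0],
    "DRIVE_TIME": [30.0, 20.0, 25.0, 60.0],
})

window = totals_by_window(toy)
# Days with no trips are missing from the groupby result; fill them with zero.
grouped = totals_by_groupby(toy).reindex(window.index, fill_value=0.0)

is_int = pd.api.types.is_integer_dtype(toy["DEP_TIME"])
outside = int(((toy["DEP_TIME"] < WEEK_START) | (toy["DEP_TIME"] >= WEEK_END)).sum())
mismatch = float((window - grouped).abs().max().max())
```

If `mismatch` is zero and `outside` matches expectations on the real files, the daily filter is probably behaving as intended, and the error is more likely to lie in how `data_female.csv` and `data_male.csv` were produced.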