Python和Pandas对时间数据的处理:以电动汽车充电数据为例

1、数据

电动汽车的充电数据形式如下 

订单号充电开始时间充电完成时间订单/时段总充电量(KWh)尖时电量峰时电量平时电量谷时电量
2023020105000026122023-02-01 00:03:262023-02-01 00:40:5228.4410.0000.0000.00028.441
2023020105000045702023-02-01 00:10:282023-02-01 01:01:3545.3190.0000.0000.00045.319
2023020105000047282023-02-01 00:10:322023-02-01 01:27:1357.3120.0000.0000.00057.312
2023020105000049592023-02-01 00:11:322023-02-01 01:16:4525.4920.0000.0000.00025.492
2023020105000069692023-02-01 00:22:592023-02-01 01:36:3844.8890.0000.0000.00044.889
2023020105000070762023-02-01 00:24:132023-02-01 00:40:539.1310.0000.0000.0009.131
2023020105000070822023-02-01 00:24:352023-02-01 00:40:518.1870.0000.0000.0008.187
2023020105000078652023-02-01 00:29:272023-02-01 01:31:3933.8940.0000.0000.00033.894
2023020105000086632023-02-01 00:35:202023-02-01 01:20:3128.2400.0000.0000.00028.240
2023020105000087552023-02-01 00:35:502023-02-01 01:32:0431.7520.0000.0000.00031.752

 2、目标任务

以5分钟为一个时段,需要知道每天每个时段的总充电量,最终结果如下所示

3、数据处理

Python代码在《excel_pandas.py》中,具体如下

import pandas as pd
import time
import os

start_time = time.time()  # 开始计时
carDF = pd.read_excel(io='./carChageTime.xlsx')# 读取xlsx(相对路径)


# 将时间字符串转换为时间戳
carDF['start_time'] = pd.to_datetime(carDF['充电开始时间'])
carDF['end_time'] = pd.to_datetime(carDF['充电完成时间'])

# 建立空表,以5分钟为间隔
start_index_time = pd.to_datetime('2023-02-01')
end_index_time = pd.to_datetime('2023-07-31')
power_index = pd.date_range(start=start_index_time, end=end_index_time, freq='5min')

power_excel = pd.DataFrame(data=0, columns=['power'], index=power_index)
pass

start_index_bill = 0
end_index_bill = 52000
for index in range(start_index_bill, end_index_bill):  # 260
    print(start_index_bill, '/', index, '/', end_index_bill)
    count = 0
    
    num_5minutes = (carDF['end_time'][index] - carDF['start_time'][index]) // pd.Timedelta("5T")
    if num_5minutes == 0:  # 0不能做除数
        power_average = carDF['订单/时段总充电量(KWh)'][index]  # 将每一个订单的总电量均分
        # print('num_5minutes=0', 'index=', index, 'power=', power_average)
        # print(carDF['start_time'][index], carDF['end_time'][index])
        for x in power_index:
            if carDF['start_time'][index] <= x:
                power_excel.loc[x] = power_excel.loc[x] + power_average
                break
    else:
        power_average = carDF['订单/时段总充电量(KWh)'][index] / num_5minutes
        for x in power_index:
            if carDF['start_time'][index] <= x <= carDF['end_time'][index]:
                count = count + 1
                if count > num_5minutes:
                    break
                else:
                    power_excel.loc[x] = power_excel.loc[x] + power_average
        pass
    
# 保存结果
realTime = time.strftime('%Y%m%d%H%M%S', time.localtime())
file_name = os.path.basename(__file__)[0:-3]
power_excel.to_csv('{0}_{1}_{2}_{3}.csv'.format(file_name, realTime, start_index_bill, end_index_bill))

# 打印程序的运行时间
end_time = time.time()
print("end_time-start_time={}".format(end_time - start_time))

最终生成一个《excel_pandas_slice_20230918194213_0_52000.csv》,内容如下

4、进一步处理

 Python代码在《csv_to_excel.py》中,具体如下

import pandas as pd
import time
import os

start_time = time.time()  # 开始计时
df_RNN = pd.read_csv('excel_pandas_slice_20230918194213_0_52000.csv')  # 读取xlsx(相对路径)

# 行名
start_index_time = pd.to_datetime('2023-02-01 00:05:00')  # Timestamp('2023-02-01 00:00:00')
end_index_time = pd.to_datetime('2023-02-02 00:00:00')  # Timestamp('2023-07-31 00:00:00')
power_index = pd.date_range(start=start_index_time, end=end_index_time, freq='5min')

# 列名
start_column_time = pd.to_datetime('2023-02-01')  # Timestamp('2023-02-01 00:00:00')
end_column_time = pd.to_datetime('2023-7-31')  # Timestamp('2023-07-31 00:00:00')
power_column = pd.date_range(start=start_column_time, end=end_column_time, freq='D')

power_excel = pd.DataFrame(data=0, columns=power_column, index=power_index)

# num_5minutes用于记录一天被划分为了多少个时间段
# 这里以5分钟为一个时间段,所以num_5minutes=288
t_step = pd.Timedelta("5T")  # 时间间隔5minutes
num_5minutes = (end_index_time - start_index_time) // t_step + 1  # 288
pass

for i in range(df_RNN.shape[0] // num_5minutes):
    power_excel.loc[:, power_column[i]] = list(df_RNN['power'][i * num_5minutes:(i + 1) * num_5minutes])

# 保存结果
realTime = time.strftime('%Y%m%d%H%M%S', time.localtime())
file_name = os.path.basename(__file__)[0:-3]
power_excel.to_excel('{0}_{1}.xlsx'.format(file_name, realTime))

# 打印程序的运行时间
end_time = time.time()
print("end_time-start_time={}".format(end_time - start_time))

 最终生成一个《excel_to_csv_20230918203118.xlsx》,内容如下

 这里在excel对《excel_to_csv_20230918203118.xlsx》的索引进行了设置单元格式处理

5、绘图 

绘图参考

EXCEL如何在一个图上画多条曲线-百度经验 (baidu.com)

参考另一篇博客

Python模块之pandas中时间戳之间的相互转换_飞由于度的博客-CSDN博客

  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值