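Filling missing values in a time series with pandas
For example, the data below is sampled every 15 minutes, but some timestamps are missing (there are no rows for 02:00 and 03:00):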
Date_time current_demand Temp_Mean humidity_Mean
0 2018-05-01 00:00 15951.0 300.904267 49.600000
1 2018-05-01 00:15 16075.0 300.904267 49.600000
2 2018-05-01 00:30 15977.0 300.904267 49.600000
3 2018-05-01 00:45 15945.0 300.837600 50.333333
4 2018-05-01 01:00 15868.0 298.889333 59.133333
5 2018-05-01 01:15 15583.0 298.889333 59.133333
6 2018-05-01 01:30 15470.0 298.756000 59.800000
7 2018-05-01 01:45 15301.0 298.756000 59.800000
8 2018-05-01 02:15 14946.0 298.756000 59.800000
9 2018-05-01 02:30 14736.0 298.756000 59.800000
10 2018-05-01 02:45 14630.0 298.502333 59.000000
11 2018-05-01 03:15 14350.0 298.502333 59.000000
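The gaps in the table above can also be located programmatically. This is a minimal sketch, assuming the series is meant to be on a regular 15-minute grid; the names `stamps`, `expected` and `missing` are illustrative and not part of the original code.

```python
import pandas as pd

# Timestamps copied from the table above.
stamps = pd.to_datetime([
    "2018-05-01 00:00", "2018-05-01 00:15", "2018-05-01 00:30",
    "2018-05-01 00:45", "2018-05-01 01:00", "2018-05-01 01:15",
    "2018-05-01 01:30", "2018-05-01 01:45", "2018-05-01 02:15",
    "2018-05-01 02:30", "2018-05-01 02:45", "2018-05-01 03:15",
])

# Full 15-minute grid between the first and last observed timestamps.
expected = pd.date_range(stamps.min(), stamps.max(), freq="15min")

# Grid points with no matching row are the missing timestamps.
missing = expected.difference(stamps)
print(missing)
```

Comparing against an explicit `pd.date_range` grid makes the assumed sampling interval visible, so it is easy to change if the data turns out to use a different frequency.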
The CSV file (modified):
Date_time,current_demand,Temp_Mean,humidity_Mean
2018-05-01 00:00,15951.0,300.904267,49.600000
2018-05-01 00:15,16075.0,300.904267,49.600000
2018-05-01 00:30,15977.0,300.904267,49.600000
2018-05-01 00:45,15945.0,300.837600,50.333333
2018-05-01 01:00,15868.0,298.889333,59.133333
2018-05-01 01:15,15583.0,298.889333,59.133333
2018-05-01 01:30,15470.0,298.756000,59.800000
2018-05-01 01:45,15301.0,298.756000,59.800000
2018-05-01 02:15,14946.0,298.756000,59.800000
2018-05-01 02:30,14736.0,298.756000,59.800000
2018-05-01 02:45,14630.0,298.502333,59.000000
2018-05-01 03:15,14350.0,298.502333,59.000000
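import pandas as pd
import numpy as np
# Load the CSV file shown above
df = pd.read_csv(r'submission.csv', sep=',')
df.shape
# Parse the timestamp column so the rows can be grouped by time
df['Date_time'] = pd.to_datetime(df['Date_time'])
# One group per 15-minute bin ('15min' is equivalent to '15T', which newer pandas versions deprecate)
grouper = pd.Grouper(key='Date_time', freq='15min')
# Bins with no rows become NaN rows; ffill() copies the previous row's values into them
res = df.groupby(grouper).first().ffill().reset_index()
res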
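As a point of comparison, the same regular grid can be built by reindexing with `asfreq` instead of grouping, and the gap can then be filled either by forward fill or by time-based interpolation. This is only a sketch, not the method used above: it assumes the timestamps are unique and already aligned to the 15-minute grid, and it inlines a few rows from the CSV so it runs without `submission.csv`. The names `sample`, `regular`, `filled_ffill` and `filled_interp` are illustrative.

```python
import io
import pandas as pd

# A few rows from the CSV above, inlined so the sketch is self-contained.
csv_text = """Date_time,current_demand,Temp_Mean,humidity_Mean
2018-05-01 01:30,15470.0,298.756000,59.800000
2018-05-01 01:45,15301.0,298.756000,59.800000
2018-05-01 02:15,14946.0,298.756000,59.800000
2018-05-01 02:30,14736.0,298.756000,59.800000
"""
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date_time"])

# Reindex onto a regular 15-minute grid; the missing 02:00 row appears as NaN.
regular = sample.set_index("Date_time").asfreq("15min")

# Forward fill copies the 01:45 row into 02:00.
filled_ffill = regular.ffill()
# Time interpolation estimates 02:00 from its neighbours 01:45 and 02:15.
filled_interp = regular.interpolate(method="time")

gap = pd.Timestamp("2018-05-01 02:00")
print(filled_ffill.loc[gap, "current_demand"])
print(filled_interp.loc[gap, "current_demand"])
```

Forward fill repeats the last observed reading, while interpolation assumes the value changes smoothly between neighbouring readings; which one is appropriate depends on the data.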
The result (res) is as follows: