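Resampling
Resampling is the process of converting a time series from one frequency to another. Converting higher-frequency data to a lower frequency is called downsampling; converting lower-frequency data to a higher frequency is called upsampling.
Downsampling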
import numpy as np
import pandas as pd
t = pd.DataFrame(np.random.uniform(10, 50, (100, 1)), index=pd.date_range('20170101', periods=100))
t
0
2017-01-01 42.009320
2017-01-02 27.031279
2017-01-03 14.262344
2017-01-04 29.221443
2017-01-05 22.895785
2017-01-06 26.814856
2017-01-07 19.565748
2017-01-08 47.893685
2017-01-09 43.359727
2017-01-10 39.692903
2017-01-11 24.697132
2017-01-12 16.197977
2017-01-13 27.223503
2017-01-14 38.642974
2017-01-15 46.869408
2017-01-16 12.787360
2017-01-17 11.652354
2017-01-18 35.901554
2017-01-19 17.519836
2017-01-20 44.663240
2017-01-21 44.642161
2017-01-22 37.451723
2017-01-23 21.280267
2017-01-24 44.991936
2017-01-25 49.983729
2017-01-26 44.994922
2017-01-27 26.077919
2017-01-28 25.978752
2017-01-29 14.818770
2017-01-30 14.156555
... ...
2017-03-12 43.302700
2017-03-13 15.709008
2017-03-14 35.354453
2017-03-15 37.885999
2017-03-16 38.062864
2017-03-17 29.039956
2017-03-18 37.100101
2017-03-19 14.473501
2017-03-20 48.391104
2017-03-21 24.301725
2017-03-22 36.347639
2017-03-23 42.361770
2017-03-24 30.042126
2017-03-25 27.018687
2017-03-26 22.962364
2017-03-27 47.031464
2017-03-28 28.647002
2017-03-29 43.053664
2017-03-30 32.750043
2017-03-31 29.264535
2017-04-01 49.336224
2017-04-02 21.064076
2017-04-03 18.191110
2017-04-04 40.548393
2017-04-05 17.578473
2017-04-06 19.759165
2017-04-07 28.063757
2017-04-08 26.345850
2017-04-09 35.661071
2017-04-10 32.292340
100 rows × 1 columns
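This is a time series with 100 rows at daily intervals. Changing the interval to monthly lowers the frequency, which is downsampling.
# Change the interval from daily to monthly
# using the resample method ('M' labels each bin by its month end; pandas 2.2+ spells this alias 'ME')
t.resample('M').sum()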
0
2017-01-31 948.297346
2017-02-28 888.290936
2017-03-31 960.922047
2017-04-30 288.840458
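The same downsampling step works with aggregations other than sum. The following is a minimal sketch, not the original author's code: the series `daily` and its values 0..99 are made up so the monthly results can be checked by hand, and the `'MS'` (month start) alias is used here only because it is spelled the same way across pandas versions, so the bin labels are month starts rather than the month ends shown above.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 100 daily values 0.0 .. 99.0 starting on 2017-01-01
daily = pd.Series(np.arange(100, dtype=float),
                  index=pd.date_range("2017-01-01", periods=100))

# Downsample from daily to monthly and compute several aggregates per month.
# 'MS' labels each bin with the first day of the month.
monthly = daily.resample("MS").agg(["sum", "mean", "max"])

print(monthly)
```

Each row of `monthly` summarizes one calendar month; the last row covers only the first ten days of April because the data ends on 2017-04-10.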
Upsampling
Upsampling is the reverse of downsampling: lower-frequency data is converted to a higher frequency.
# Upsampling
frame = pd.DataFrame(np.random.randn(2, 4),
                     index=pd.date_range('1/1/2000', periods=2, freq='W-WED'),
                     columns=['上海', '北京', '深圳', '广州'])
frame
上海 北京 深圳 广州
2000-01-05 1.010248 0.251598 1.131810 0.035474
2000-01-12 -0.221884 1.136224 -0.761822 0.056637
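# asfreq converts to the new frequency without aggregating; going from weekly to daily is upsampling
frame.resample('D').asfreq()
# Dates with no original value are filled with NaN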
上海 北京 深圳 广州
2000-01-05 1.010248 0.251598 1.131810 0.035474
2000-01-06 NaN NaN NaN NaN
2000-01-07 NaN NaN NaN NaN
2000-01-08 NaN NaN NaN NaN
2000-01-09 NaN NaN NaN NaN
2000-01-10 NaN NaN NaN NaN
2000-01-11 NaN NaN NaN NaN
2000-01-12 -0.221884 1.136224 -0.761822 0.056637
# Fill the missing values
frame.resample('D').ffill()  # forward fill: each NaN takes the previous value
上海 北京 深圳 广州
2000-01-05 1.010248 0.251598 1.131810 0.035474
2000-01-06 1.010248 0.251598 1.131810 0.035474
2000-01-07 1.010248 0.251598 1.131810 0.035474
2000-01-08 1.010248 0.251598 1.131810 0.035474
2000-01-09 1.010248 0.251598 1.131810 0.035474
2000-01-10 1.010248 0.251598 1.131810 0.035474
2000-01-11 1.010248 0.251598 1.131810 0.035474
2000-01-12 -0.221884 1.136224 -0.761822 0.056637
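Forward fill is only one way to fill the gaps created by upsampling. The sketch below compares a few alternatives on a small series; the name `weekly` and the values 0.0 and 7.0 are invented for illustration so that the filled results are easy to verify.

```python
import pandas as pd

# Hypothetical weekly series with two Wednesdays, one week apart
weekly = pd.Series([0.0, 7.0],
                   index=pd.date_range("2000-01-05", periods=2, freq="W-WED"))

# Upsample to daily without filling: the 6 new days are NaN
daily_nan = weekly.resample("D").asfreq()

# Forward fill, but carry each value forward at most 2 days
daily_ffill_limited = weekly.resample("D").ffill(limit=2)

# Backward fill: each NaN takes the next known value
daily_bfill = weekly.resample("D").bfill()

# Linear interpolation between the known points
daily_linear = weekly.resample("D").interpolate()

print(pd.DataFrame({"asfreq": daily_nan,
                    "ffill(limit=2)": daily_ffill_limited,
                    "bfill": daily_bfill,
                    "interpolate": daily_linear}))
```

Which option fits depends on the data: forward fill suits values that stay in effect until the next observation, while interpolation assumes the value changes smoothly between observations.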