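Data resampling
- Converting time series data from one frequency to another
- Downsampling: from a higher frequency to a lower one (for example, daily to monthly)
- Upsampling: from a lower frequency to a higher one (for example, every 3 days to daily)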
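As a rough illustration of the two directions, here is a minimal sketch on a tiny hand-made series. The names `daily`, `two_day`, and `back_to_daily` and the values are made up for this example and are not part of the original notes.

```python
import pandas as pd

# Six daily observations with easy-to-check values (illustrative data).
idx = pd.date_range("2011-01-01", periods=6, freq="D")
daily = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=idx)

# Downsampling: daily -> 2-day bins, each bin aggregated with sum().
two_day = daily.resample("2D").sum()

# Upsampling: 2-day -> daily; asfreq() adds the new rows but leaves them NaN.
back_to_daily = two_day.resample("D").asfreq()

print(two_day)        # 3 rows: fewer rows than the input
print(back_to_daily)  # 5 rows: the inserted days hold NaN
```

Downsampling needs an aggregation (here `sum`) because several input rows collapse into one; upsampling needs a fill rule because new rows have no data of their own.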
1. Downsampling
import numpy as np
import pandas as pd
# 90 days of random daily data starting on 2011-01-01
rng = pd.date_range('1/1/2011', periods=90, freq='D')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
ts.head()
2011-01-01 -1.025562
2011-01-02 0.410895
2011-01-03 0.660311
2011-01-04 0.710293
2011-01-05 0.444985
Freq: D, dtype: float64
# Sum the daily values into month-end bins (pandas 2.2+ prefers the alias 'ME' over 'M')
ts.resample('M').sum()
2011-01-31 2.510102
2011-02-28 0.583209
2011-03-31 2.749411
Freq: M, dtype: float64
# Sum the daily values into consecutive 3-day bins
ts.resample('3D').sum()
2011-01-01 0.045643
2011-01-04 -2.255206
2011-01-07 0.571142
2011-01-10 0.835032
2011-01-13 -0.396766
2011-01-16 -1.156253
2011-01-19 -1.286884
2011-01-22 2.883952
2011-01-25 1.566908
2011-01-28 1.435563
2011-01-31 0.311565
2011-02-03 -2.541235
2011-02-06 0.317075
2011-02-09 1.598877
2011-02-12 -1.950509
2011-02-15 2.928312
2011-02-18 -0.733715
2011-02-21 1.674817
2011-02-24 -2.078872
2011-02-27 2.172320
2011-03-02 -2.022104
2011-03-05 -0.070356
2011-03-08 1.276671
2011-03-11 -2.835132
2011-03-14 -1.384113
2011-03-17 1.517565
2011-03-20 -0.550406
2011-03-23 0.773430
2011-03-26 2.244319
2011-03-29 2.951082
Freq: 3D, dtype: float64
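The aggregation is not limited to `sum()`. As a sketch, assuming a small deterministic series (`ts_demo` is a hypothetical name, not from the notes above), several aggregations can be computed for the same bins in one call with `agg`:

```python
import pandas as pd

# Nine daily values 1..9 so each 3-day bin is easy to check by hand.
idx = pd.date_range("2011-01-01", periods=9, freq="D")
ts_demo = pd.Series(range(1, 10), index=idx, dtype="float64")

# One row per 3-day bin, one column per aggregation.
summary = ts_demo.resample("3D").agg(["sum", "mean", "count"])
print(summary)
```

For every bin, `sum` equals `mean * count`, which is the same relationship that links the 3-day sums shown above to 3-day means of the same data.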
2. Upsampling
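# Start from 3-day means, then upsample back to daily frequency
day3Ts = ts.resample('3D').mean()
day3Ts
# asfreq() inserts the new daily rows without filling them, so they are NaN
day3Ts.resample('D').asfreq()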
2011-01-01 0.015214
2011-01-02 NaN
2011-01-03 NaN
2011-01-04 -0.751735
2011-01-05 NaN
2011-01-06 NaN
2011-01-07 0.190381
2011-01-08 NaN
2011-01-09 NaN
2011-01-10 0.278344
2011-01-11 NaN
2011-01-12 NaN
2011-01-13 -0.132255
2011-01-14 NaN
2011-01-15 NaN
2011-01-16 -0.385418
2011-01-17 NaN
2011-01-18 NaN
2011-01-19 -0.428961
2011-01-20 NaN
2011-01-21 NaN
2011-01-22 0.961317
2011-01-23 NaN
2011-01-24 NaN
2011-01-25 0.522303
2011-01-26 NaN
2011-01-27 NaN
2011-01-28 0.478521
2011-01-29 NaN
2011-01-30 NaN
…
2011-02-28 NaN
2011-03-01 NaN
2011-03-02 -0.674035
2011-03-03 NaN
2011-03-04 NaN
2011-03-05 -0.023452
2011-03-06 NaN
2011-03-07 NaN
2011-03-08 0.425557
2011-03-09 NaN
2011-03-10 NaN
2011-03-11 -0.945044
2011-03-12 NaN
2011-03-13 NaN
2011-03-14 -0.461371
2011-03-15 NaN
2011-03-16 NaN
2011-03-17 0.505855
2011-03-18 NaN
2011-03-19 NaN
2011-03-20 -0.183469
2011-03-21 NaN
2011-03-22 NaN
2011-03-23 0.257810
2011-03-24 NaN
2011-03-25 NaN
2011-03-26 0.748106
2011-03-27 NaN
2011-03-28 NaN
2011-03-29 0.983694
Freq: D, Length: 88, dtype: float64
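Interpolation methods:
- ffill: fill a missing value with the previous valid value
- bfill: fill a missing value with the next valid value
- interpolate: fill missing values by interpolating between neighboring values (linear by default)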
# Forward-fill at most one missing value after each known value
day3Ts.resample('D').ffill(limit=1)
2011-01-01 0.015214
2011-01-02 0.015214
2011-01-03 NaN
2011-01-04 -0.751735
2011-01-05 -0.751735
2011-01-06 NaN
2011-01-07 0.190381
2011-01-08 0.190381
2011-01-09 NaN
2011-01-10 0.278344
2011-01-11 0.278344
2011-01-12 NaN
2011-01-13 -0.132255
2011-01-14 -0.132255
2011-01-15 NaN
2011-01-16 -0.385418
2011-01-17 -0.385418
2011-01-18 NaN
2011-01-19 -0.428961
2011-01-20 -0.428961
2011-01-21 NaN
2011-01-22 0.961317
2011-01-23 0.961317
2011-01-24 NaN
2011-01-25 0.522303
2011-01-26 0.522303
2011-01-27 NaN
2011-01-28 0.478521
2011-01-29 0.478521
2011-01-30 NaN
…
2011-02-28 0.724107
2011-03-01 NaN
2011-03-02 -0.674035
2011-03-03 -0.674035
2011-03-04 NaN
2011-03-05 -0.023452
2011-03-06 -0.023452
2011-03-07 NaN
2011-03-08 0.425557
2011-03-09 0.425557
2011-03-10 NaN
2011-03-11 -0.945044
2011-03-12 -0.945044
2011-03-13 NaN
2011-03-14 -0.461371
2011-03-15 -0.461371
2011-03-16 NaN
2011-03-17 0.505855
2011-03-18 0.505855
2011-03-19 NaN
2011-03-20 -0.183469
2011-03-21 -0.183469
2011-03-22 NaN
2011-03-23 0.257810
2011-03-24 0.257810
2011-03-25 NaN
2011-03-26 0.748106
2011-03-27 0.748106
2011-03-28 NaN
2011-03-29 0.983694
Freq: D, Length: 88, dtype: float64
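# Backward-fill at most one missing value before each known value
day3Ts.resample('D').bfill(limit=1)
# Fill every gap by linear interpolation between the neighboring known values
day3Ts.resample('D').interpolate('linear')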
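To compare the three fill methods side by side, here is a minimal sketch on a deterministic series. The names `sparse` and `result` and the values 0, 3, 6 are assumptions chosen so the outcome can be checked by hand.

```python
import pandas as pd

# Three known values spaced 3 days apart (illustrative data).
idx = pd.date_range("2011-01-01", periods=3, freq="3D")
sparse = pd.Series([0.0, 3.0, 6.0], index=idx)

# Upsample to daily frequency with each fill strategy.
result = pd.DataFrame({
    "asfreq": sparse.resample("D").asfreq(),          # no filling
    "ffill_1": sparse.resample("D").ffill(limit=1),   # copy forward, 1 step max
    "bfill_1": sparse.resample("D").bfill(limit=1),   # copy backward, 1 step max
    "linear": sparse.resample("D").interpolate("linear"),
})
print(result)
```

With `limit=1`, each two-day gap keeps one NaN, on the far side from the copied value. Linear interpolation fills both days, giving 0, 1, 2, ..., 6 for this input.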