TSAP : TimeSeries Analysis with Python
import pandas as pd
import numpy as np
rng = pd.date_range('1/1/2011', periods=10, freq='H')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
# The series is sampled at hourly intervals
ts
2011-01-01 00:00:00 -1.065583
2011-01-01 01:00:00 -0.586701
2011-01-01 02:00:00 -0.554193
2011-01-01 03:00:00 -0.316603
2011-01-01 04:00:00 0.534045
2011-01-01 05:00:00 -0.764800
2011-01-01 06:00:00 0.196573
2011-01-01 07:00:00 0.201643
2011-01-01 08:00:00 -0.694384
2011-01-01 09:00:00 0.555979
Freq: H, dtype: float64
# Change the interval to 45 minutes; with 'pad', each new timestamp takes the most recent earlier value
converted = ts.asfreq('45Min', method='pad')
converted
2011-01-01 00:00:00 -1.065583
2011-01-01 00:45:00 -1.065583
2011-01-01 01:30:00 -0.586701
2011-01-01 02:15:00 -0.554193
2011-01-01 03:00:00 -0.316603
2011-01-01 03:45:00 -0.316603
2011-01-01 04:30:00 0.534045
2011-01-01 05:15:00 -0.764800
2011-01-01 06:00:00 0.196573
2011-01-01 06:45:00 0.196573
2011-01-01 07:30:00 0.201643
2011-01-01 08:15:00 -0.694384
2011-01-01 09:00:00 0.555979
Freq: 45T, dtype: float64
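asfreq() changes the sampling frequency of the time index.
The method argument controls how timestamps missing from the original index are filled.
- method : {'backfill', 'bfill', 'pad', 'ffill', None}
- 'pad' / 'ffill' : use the last earlier observation
- 'backfill' / 'bfill' : use the next later observation
- None : leave the new timestamps as NaN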
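The fill options can be compared side by side on a small hand-made series. This is a minimal sketch: the values 1.0, 2.0, 3.0 and the 30-minute target frequency are illustrative assumptions, chosen so the result can be checked by eye, and are not taken from the run above.

```python
import pandas as pd

# Hypothetical hourly series with easy-to-follow values
idx = pd.date_range("2011-01-01", periods=3, freq="h")
s = pd.Series([1.0, 2.0, 3.0], index=idx)

# No fill method: timestamps absent from the original index become NaN
no_fill = s.asfreq("30min")

# ffill (same as pad): carry the last earlier observation forward
padded = s.asfreq("30min", method="ffill")

# bfill (same as backfill): pull the next later observation backward
backfilled = s.asfreq("30min", method="bfill")

# Show the three results as columns of one table
print(pd.DataFrame({"none": no_fill, "ffill": padded, "bfill": backfilled}))
```

At 00:30 the three columns read NaN, 1.0 and 2.0: nothing, the value from 00:00, and the value from 01:00.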
# backfill: a timestamp missing from the original index takes the next later observed value
ts.asfreq('45Min', method='backfill')
2011-01-01 00:00:00 -1.065583
2011-01-01 00:45:00 -0.586701
2011-01-01 01:30:00 -0.554193
2011-01-01 02:15:00 -0.316603
2011-01-01 03:00:00 -0.316603
2011-01-01 03:45:00 0.534045
2011-01-01 04:30:00 -0.764800
2011-01-01 05:15:00 0.196573
2011-01-01 06:00:00 0.196573
2011-01-01 06:45:00 0.201643
2011-01-01 07:30:00 -0.694384
2011-01-01 08:15:00 0.555979
2011-01-01 09:00:00 0.555979
Freq: 45T, dtype: float64
# bfill = backfill
ts.asfreq('45Min', method='bfill')
2011-01-01 00:00:00 -1.065583
2011-01-01 00:45:00 -0.586701
2011-01-01 01:30:00 -0.554193
2011-01-01 02:15:00 -0.316603
2011-01-01 03:00:00 -0.316603
2011-01-01 03:45:00 0.534045
2011-01-01 04:30:00 -0.764800
2011-01-01 05:15:00 0.196573
2011-01-01 06:00:00 0.196573
2011-01-01 06:45:00 0.201643
2011-01-01 07:30:00 -0.694384
2011-01-01 08:15:00 0.555979
2011-01-01 09:00:00 0.555979
Freq: 45T, dtype: float64
# ffill = pad: a missing timestamp takes the most recent earlier value
# e.g. 01:30:00 is filled with the value observed at 01:00:00
ts.asfreq('45Min', method='ffill')
2011-01-01 00:00:00 -1.065583
2011-01-01 00:45:00 -1.065583
2011-01-01 01:30:00 -0.586701
2011-01-01 02:15:00 -0.554193
2011-01-01 03:00:00 -0.316603
2011-01-01 03:45:00 -0.316603
2011-01-01 04:30:00 0.534045
2011-01-01 05:15:00 -0.764800
2011-01-01 06:00:00 0.196573
2011-01-01 06:45:00 0.196573
2011-01-01 07:30:00 0.201643
2011-01-01 08:15:00 -0.694384
2011-01-01 09:00:00 0.555979
Freq: 45T, dtype: float64
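# Switch to a lower frequency (90 minutes) with forward fill;
# every 90-minute timestamp already exists in converted, so the values are simply picked
converted.asfreq('90Min', method='ffill')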
2011-01-01 00:00:00 -1.065583
2011-01-01 01:30:00 -0.586701
2011-01-01 03:00:00 -0.316603
2011-01-01 04:30:00 0.534045
2011-01-01 06:00:00 0.196573
2011-01-01 07:30:00 0.201643
2011-01-01 09:00:00 0.555979
Freq: 90T, dtype: float64
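When every target timestamp already exists in the source index, as in the 45-minute to 90-minute conversion above, asfreq() only selects rows and the fill method has nothing to do. A minimal sketch with made-up values (10.0 to 50.0 are assumptions for illustration):

```python
import pandas as pd

# Hypothetical series on a 45-minute grid
idx = pd.date_range("2011-01-01", periods=5, freq="45min")
s = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0], index=idx)

# Downsample to 90 minutes with and without a fill method
plain = s.asfreq("90min")
filled = s.asfreq("90min", method="ffill")

print(plain)
# Both calls return the same rows because no timestamp had to be filled
print(plain.equals(filled))
```

The values at 00:45 and 02:15 are dropped, not aggregated, which is the key difference from resample().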
resample() vs. asfreq()
ts.asfreq('D').sum()
-1.0655834142614131
ts.resample('D').sum()
2011-01-01 -2.494026
Freq: D, dtype: float64
ts.asfreq('2H')
2011-01-01 00:00:00 -1.065583
2011-01-01 02:00:00 -0.554193
2011-01-01 04:00:00 0.534045
2011-01-01 06:00:00 0.196573
2011-01-01 08:00:00 -0.694384
Freq: 2H, dtype: float64
ts.resample('2H').sum()
2011-01-01 00:00:00 -1.652284
2011-01-01 02:00:00 -0.870797
2011-01-01 04:00:00 -0.230756
2011-01-01 06:00:00 0.398216
2011-01-01 08:00:00 -0.138405
Freq: 2H, dtype: float64
What is the difference between .resample() and .asfreq()?
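- asfreq() : selects the value at each sampled time point; values between those points are discarded
- resample() : groups the values that fall within each time interval, ready to be aggregated (e.g. with sum())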
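The distinction is easier to see on values that can be added in your head. The sketch below uses an assumed hourly series 1.0 to 6.0 (not the random data above) and a 2-hour target frequency.

```python
import pandas as pd

# Hypothetical hourly series: 1.0 at 00:00, 2.0 at 01:00, ..., 6.0 at 05:00
idx = pd.date_range("2011-01-01", periods=6, freq="h")
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=idx)

# asfreq: keep only the observations at 00:00, 02:00 and 04:00
picked = s.asfreq("2h")

# resample: group into 2-hour bins [00:00, 02:00), [02:00, 04:00), ... and aggregate
summed = s.resample("2h").sum()
averaged = s.resample("2h").mean()

print(pd.DataFrame({"asfreq": picked, "sum": summed, "mean": averaged}))
```

Each row of the printed table shares a timestamp, but only the asfreq column ignores the odd-hour observations; the sum and mean columns use every value in the bin.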