时间序列数据分析--Time Series--时间数据重采样

最新推荐文章于 2024-04-10 15:27:34 发布

bboysky45

最新推荐文章于 2024-04-10 15:27:34 发布

阅读量3.3k

点赞数 1

分类专栏： python 文章标签： Time Series

python 专栏收录该内容

59 篇文章 0 订阅

订阅专栏

重采样（resample）

将时间序列从一个频率转换到另一个频率的过程，需要使用聚合显示

pandas中的resample方法实现重采样

产生Resample对象
高频 -> 低频率
resample(freq).sum(),resample(freq).mean()............

#resample

import pandas as pd
import numpy as np

date_rng = pd.date_range('20170101', periods=100, freq='D')
ser_obj = pd.Series(range(len(date_rng)), index=date_rng)
print(ser_obj.head(10))
answer：
2017-01-01    0
2017-01-02    1
2017-01-03    2
2017-01-04    3
2017-01-05    4
2017-01-06    5
2017-01-07    6
2017-01-08    7
2017-01-09    8
2017-01-10    9
Freq: D, dtype: int64

# 统计每个月的数据总和
resample_month_sum = ser_obj.resample('M').sum()
# 统计每个月的数据平均
resample_month_mean = ser_obj.resample('M').mean()

print('按月求和：', resample_month_sum)
print('按月求均值：', resample_month_mean)
answer：
按月求和： 2017-01-31     465
2017-02-28    1246
2017-03-31    2294
2017-04-30     945
Freq: M, dtype: int64
按月求均值： 2017-01-31    15.0
2017-02-28    44.5
2017-03-31    74.0
2017-04-30    94.5
Freq: M, dtype: float64

降采样

将数据聚合到规整的低频率
OHLC重采样，open，high，low，close
使用groupby降采样

# 将数据聚合到5天的频率
five_day_sum_sample = ser_obj.resample('5D').sum()
five_day_mean_sample = ser_obj.resample('5D').mean()
five_day_ohlc_sample = ser_obj.resample('5D').ohlc()

print('降采样，sum..')
print(five_day_sum_sample.head())
降采样，sum..
2017-01-01     10
2017-01-06     35
2017-01-11     60
2017-01-16     85
2017-01-21    110
dtype: int64

print('降采样，ohlc')
print(five_day_ohlc_sample.head())
降采样，ohlc
            open  high  low  close
2017-01-01     0     4    0      4
2017-01-06     5     9    5      9
2017-01-11    10    14   10     14
2017-01-16    15    19   15     19
2017-01-21    20    24   20     24

# 使用groupby降采样
print(ser_obj.groupby(lambda x: x.month).sum())
answer
1     465
2    1246
3    2294
4     945
dtype: int32

print(ser_obj.groupby(lambda x: x.weekday).sum())
answer
0    750
1    665
2    679
3    693
4    707
5    721
6    735
dtype: int32

3.升采样

数据从低频至高频，需要插值，否则为NaN
常用的插值方法
- ffill(limit)，空值取前面limit个值填充
- bfill(limit)
- fillna('ffill‘）/ 'bfill'
- interpolate 插值算法

#升采样
df = pd.DataFrame(np.random.randn(5, 3),
                 index=pd.date_range('20170101', periods=5, freq='W-MON'),
                 columns=['S1', 'S2', 'S3'])
print(df)
answer
  S1        S2        S3
2017-01-02  0.087264 -0.047404 -0.754223
2017-01-09  1.148830  2.439266 -0.889873
2017-01-16  0.331767  0.918984  1.164783
2017-01-23 -0.582157  0.923737  1.938061
2017-01-30 -0.637087  0.143846 -1.500307

# 直接重采样会产生空值
print(df.resample('D').asfreq().head(10))
answer
S1        S2        S3
2017-01-02  0.003409 -0.939362  2.036451
2017-01-03       NaN       NaN       NaN
2017-01-04       NaN       NaN       NaN
2017-01-05       NaN       NaN       NaN
2017-01-06       NaN       NaN       NaN
2017-01-07       NaN       NaN       NaN
2017-01-08       NaN       NaN       NaN
2017-01-09  0.291274 -0.655332 -1.034041
2017-01-10       NaN       NaN       NaN
2017-01-11       NaN       NaN       NaN

#ffill
print(df.resample('D').ffill(2).head())
answer
S1        S2        S3
2017-01-02  0.003409 -0.939362  2.036451
2017-01-03  0.003409 -0.939362  2.036451
2017-01-04  0.003409 -0.939362  2.036451
2017-01-05       NaN       NaN       NaN
2017-01-06       NaN       NaN       NaN
2017-01-07       NaN       NaN       NaN
2017-01-08       NaN       NaN       NaN
2017-01-09  0.291274 -0.655332 -1.034041
2017-01-10  0.291274 -0.655332 -1.034041
2017-01-11  0.291274 -0.655332 -1.034041

print(df.resample('D').bfill())
print(df.resample('D').fillna('ffill'))
print(df.resample('D').interpolate('linear'))

bboysky45

关注

1
点赞
踩
11

收藏

觉得还不错? 一键收藏
0
评论
时间序列数据分析--Time Series--时间数据重采样

重采样（resample）将时间序列从一个频率转换到另一个频率的过程，需要使用聚合显示 pandas中的resample方法实现重采样产生Resample对象高频 -> 低频率 resample(freq).sum(),resample(freq).mean()............ #resampleimport pandas as pdimpor...
复制链接

扫一扫