Pandas Time Series Data: Resampling with resample (31)

†徐先森®

Category: Pandas Summary. Tags: pandas resampling, pandas resample

Article link: https://blog.csdn.net/qq_36622490/article/details/108600064

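In pandas, changing the frequency of a time series is called resampling, that is, converting the series from one frequency to another. This is done with the resample function. There are two kinds: upsampling (low frequency to high frequency) and downsampling (high frequency to low frequency). The object that resample returns offers a groupby-like set of methods for computing results from the data. Resampling a time series returns a Resampler object. Resampler is a class defined in the pandas.core.resample module, and dir can be used to list some of its methods.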

liao@liao:~/md$ python
Python 2.7.12 (default, Nov 12 2018, 14:36:49) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> import pandas.core.resample as pcr
>>> dir(pcr.Resampler)
['__bytes__', ......, '_wrap_result', 'agg', 'aggregate', 'apply', 'asfreq', 'ax', 'backfill', 'bfill', 'count', 'ffill', 'fillna', 'first', 'get_group', 'groups', 'indices', 'interpolate', 'last', 'max', 'mean', 'median', 'min', 'ndim', 'nearest', 'ngroups', 'nunique', 'obj', 'ohlc', 'pad', 'pipe', 'plot', 'prod', 'sem', 'size', 'std', 'sum', 'transform', 'var']

The output shows methods such as mean, pad, ohlc, std, first, and fillna, which can be used to process the resampled data.
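As a rough illustration of the point above, the sketch below builds a small series, confirms that resample returns a Resampler, and calls a few of the listed methods. The sample data, the variable names, and the "4min"/"2min" frequencies are assumptions chosen for the example, and it is written for a recent pandas on Python 3 rather than the Python 2.7 session shown above.

```python
import pandas as pd
from pandas.core.resample import Resampler

# Hypothetical sample data: 12 one-minute readings with values 0..11.
idx = pd.date_range("2020-01-01 00:00", periods=12, freq="min")
ts = pd.Series(range(12), index=idx)

# resample() only defines the buckets; it returns a Resampler object.
r = ts.resample("4min")
is_resampler = isinstance(r, Resampler)

# groupby-like reductions: one result per 4-minute bucket.
bucket_mean = r.mean()    # 1.5, 5.5, 9.5
bucket_first = r.first()  # 0, 4, 8
bucket_ohlc = r.ohlc()    # DataFrame with open/high/low/close columns

# Upsampling: going from 4-minute to 2-minute points creates new
# timestamps with no data, filled here by carrying the last value forward.
coarse = ts.resample("4min").first()
filled = coarse.resample("2min").ffill()  # 0, 0, 4, 4, 8

print(type(r).__name__)
print(bucket_ohlc)
print(filled)
```

The reductions (mean, first, ohlc) shrink the data to one row per bucket, while ffill is one of the fill methods meant for the upsampling direction.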

1 Downsampling

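Downsampling converts a high-frequency time series to a lower frequency: the time granularity becomes coarser and the data is aggregated. Suppose the original series has 100 time points and is converted to 10 lower-frequency points. Every 10 consecutive values of the original data are then grouped into one bucket. The original had 100 time points and 100 values; the result has 10 time points and therefore needs 10 values. What should those 10 values be? Each bucket can be represented by its mean (mean), its first value (first), or its last value (last), and that value serves as the resampled data for the next processing step or .... These values are obtained by calling the corresponding method on the object that resample returns. Because resample takes many parameters and is fairly hard to understand, we first build a time series, as shown in the figure below: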
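The 100-points-to-10-buckets case described above can be sketched as follows. The one-minute source frequency, the "10min" target frequency, the start date, and the values 0..99 are all assumptions made for illustration; they are not the series the original figure showed.

```python
import pandas as pd

# Hypothetical data: 100 one-minute points with values 0..99.
idx = pd.date_range("2020-01-01", periods=100, freq="min")
ts = pd.Series(range(100), index=idx)

# Group every 10 consecutive minutes into one bucket.
buckets = ts.resample("10min")

# One candidate value per bucket, side by side for comparison.
summary = pd.DataFrame({
    "mean": buckets.mean(),    # average of the 10 values in the bucket
    "first": buckets.first(),  # first value in the bucket
    "last": buckets.last(),    # last value in the bucket
    "count": buckets.count(),  # number of original points in the bucket
})

print(summary)
```

With this data the summary has 10 rows, each built from 10 original points; the first bucket, for example, has mean 4.5, first 0, and last 9.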
