resample: resampling

Lonelypatients°

Article link: https://blog.csdn.net/return_li/article/details/114142589

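resample: the general term for this kind of sampling is resampling.

Resampling is divided by the direction of the frequency change into downsampling and upsampling. Downsampling converts to a lower frequency (for example, daily data into 5-day bins), so the values in each bin must be aggregated. Upsampling converts to a higher frequency, which creates new timestamps that have no value until they are filled.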

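```python
# Downsampling
import numpy as np
import pandas as pd

data_index = pd.date_range('20190701', periods=12)
DataSeries = pd.Series(index=data_index, data=np.arange(1, 13))
data_5d = DataSeries.resample('5D').sum()  # group into 5-day bins (D: day) and sum each bin
# Whether or not filling is needed, the data must be resampled again:
# going back to daily frequency ('D') is upsampling, and ffill() fills the new slots
data_5d.resample('D').ffill()
```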

Notes:

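pd.date_range(): generates a fixed-frequency DatetimeIndex. When calling it you must supply at least two of start, end and periods, otherwise an error is raised (freq defaults to 'D', daily).
start: the first timestamp of the index
end: the last timestamp of the index (included by default)
periods: the number of timestamps to generate (int)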
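A minimal sketch of the start/end/periods rule, assuming a recent pandas version. The variable names idx_a, idx_b and error_message are illustrative only. It shows that start + periods and start + end describe the same index, and that start alone is rejected:

```python
import pandas as pd

# start + periods: 12 consecutive days (freq defaults to 'D', daily)
idx_a = pd.date_range('20190701', periods=12)
# start + end: every day from 2019-07-01 through 2019-07-12 (end included)
idx_b = pd.date_range('20190701', '20190712')

print(idx_a.equals(idx_b))  # True
print(len(idx_a))           # 12

# Only start given: the length of the range is undetermined, so pandas raises ValueError
error_message = None
try:
    pd.date_range('20190701')
except ValueError as err:
    error_message = str(err)
print(error_message)
```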
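pd.Series: builds a one-dimensional Series that uses data_index as its index
index: the index labels
data: the value stored at each index label
np.arange(): generates evenly spaced values over the half-open interval [start, stop), so np.arange(1, 13) gives 1 through 12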
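A short sketch of how the two calls fit together, reusing the names from the code above. It shows that the stop value of np.arange is excluded and that each value is attached to the date at the same position:

```python
import numpy as np
import pandas as pd

# Half-open interval [1, 13): 13 itself is not included
values = np.arange(1, 13)
print(values.tolist())  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

data_index = pd.date_range('20190701', periods=12)
# index= gives the labels, data= gives the value for each label, matched by position
DataSeries = pd.Series(index=data_index, data=values)

print(DataSeries.ndim)                                 # 1 (one-dimensional)
print(DataSeries.loc[pd.Timestamp('2019-07-03')])      # 3, looked up by its date label
```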
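resample(): groups a time series by a new frequency and returns a Resampler object; calling an aggregation method such as sum() on it produces the result
rule: the target frequency ("year: A", "month: M", "day: D", "hour: H", "minute: T", "second: S", "week: W"). Aliases can be combined, and each can be prefixed with an integer, e.g. 5 days (5D). pandas 2.2 and later deprecate A, M, H, T and S in favor of YE, ME, h, min and s.
convention: 'start' or 'end', default 'start'; it only applies when resampling a PeriodIndex
sum(): sums the values in each group. Older pandas also allowed resample(..., how='sum'), but how was deprecated and is not accepted by current versions.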
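The sketch below makes the bins of the '5D' rule visible on the article's data, assuming a recent pandas version. It also shows that the aggregation is chosen by the method called on the resampler rather than by how=. The names stats and weekly are illustrative; 'W' is the standard weekly alias, whose bins end on Sunday:

```python
import numpy as np
import pandas as pd

DataSeries = pd.Series(index=pd.date_range('20190701', periods=12),
                       data=np.arange(1, 13))

# '5D': consecutive 5-day bins starting at the first day
#   07-01..07-05 -> 1+2+3+4+5 = 15
#   07-06..07-10 -> 6+...+10  = 40
#   07-11..07-12 -> 11+12     = 23
data_5d = DataSeries.resample('5D').sum()
print(data_5d)

# Several aggregations at once; each becomes a column
stats = DataSeries.resample('5D').agg(['sum', 'mean', 'max'])
print(stats)

# 'W': weekly bins, each labelled by its closing Sunday (2019-07-07, 2019-07-14)
weekly = DataSeries.resample('W').sum()
print(weekly)
```

Since 2019-07-01 is a Monday, the first weekly bin covers 07-01 to 07-07 (sum 28) and the second covers 07-08 to 07-12 (sum 50).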
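ffill(): fills NaN values with the preceding value (forward fill)
bfill(): fills NaN values with the following value (backward fill)
asfreq(): does no filling, so the new slots stay NaN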
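A minimal sketch comparing the three options when the 5-day sums are upsampled back to daily frequency, assuming a recent pandas version. The names no_fill, forward and backward are illustrative:

```python
import numpy as np
import pandas as pd

DataSeries = pd.Series(index=pd.date_range('20190701', periods=12),
                       data=np.arange(1, 13))
data_5d = DataSeries.resample('5D').sum()  # 07-01: 15, 07-06: 40, 07-11: 23

# Upsample to daily: 07-02..07-05 and 07-07..07-10 are new slots with no value
no_fill = data_5d.resample('D').asfreq()   # new slots stay NaN
forward = data_5d.resample('D').ffill()    # copy the previous value forward
backward = data_5d.resample('D').bfill()   # copy the next value backward

print(pd.DataFrame({'asfreq': no_fill, 'ffill': forward, 'bfill': backward}))
```

The upsampled index runs from 07-01 to 07-11, the last bin label, not to 07-12. Resampling only spans the labels that already exist.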

Output (printing DataSeries):

```
2019-07-01     1
2019-07-02     2
2019-07-03     3
2019-07-04     4
2019-07-05     5
2019-07-06     6
2019-07-07     7
2019-07-08     8
2019-07-09     9
2019-07-10    10
2019-07-11    11
2019-07-12    12
Freq: D, dtype: int32
```