Pandas Study Notes: Time Series

bkzy

Category: Python. Tags: python, data analysis, numpy

Article link: https://blog.csdn.net/weixin_41621706/article/details/107869628

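Time series basics
Time series with duplicate indices
- Aggregating data with non-unique time indices
Date ranges, frequencies, and shifting
Periods and period arithmetic
- Building regular period sequences with period_range
- Period frequency conversion
Resampling and frequency conversion
- Downsampling
- Upsampling
Moving-window filtering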

import numpy as np
import pandas as pd
from datetime import datetime

Time series basics

# Prepare sample data
dates=[datetime(2020,1,2),datetime(2020,1,5),datetime(2020,1,7),datetime(2020,1,8),datetime(2020,1,10),datetime(2020,1,12)]
ts=pd.Series(np.random.randn(6),index=dates)
ts

2020-01-02    1.356709
2020-01-05    0.986863
2020-01-07    2.623709
2020-01-08    1.137484
2020-01-10   -0.050560
2020-01-12   -3.103191
dtype: float64

ts[::2]  # take every second element

2020-01-02    1.356709
2020-01-07    2.623709
2020-01-10   -0.050560
dtype: float64

Arithmetic between time series with different indices aligns automatically on the dates.

ts+ts[::2]

2020-01-02    2.713418
2020-01-05         NaN
2020-01-07    5.247418
2020-01-08         NaN
2020-01-10   -0.101120
2020-01-12         NaN
dtype: float64
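
As a small sketch of the alignment rule, the example below uses fixed made-up values (the names a, b, total, and filled are only illustrative) so that the result is reproducible. It also shows Series.add with fill_value, which is one way to avoid the NaN entries if a missing date should count as zero.

```python
import pandas as pd

# Hypothetical fixed values, chosen so the result is easy to check by hand.
idx = pd.to_datetime(['2020-01-02', '2020-01-05', '2020-01-07', '2020-01-08'])
a = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx)
b = a[::2]                        # keeps only 2020-01-02 and 2020-01-07

total = a + b                     # dates missing from b become NaN
filled = a.add(b, fill_value=0)   # treat a missing date as 0 instead

print(total)
print(filled)
```

Whether filling with zero is appropriate depends on what the data means; for measurements, leaving the NaN in place is often the safer choice.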

Pandas timestamps

ts.index.dtype  # timestamps are stored as datetime64 with nanosecond resolution

dtype('<M8[ns]')

# pandas timestamps are Timestamp objects; in older pandas versions they could also store frequency information, if any
stamp=ts.index[0]
stamp

Timestamp('2020-01-02 00:00:00')

Indexing, selection, and subsetting of time series

Label-based indexing and selection

ts[stamp]

1.3567089293080754

Indexing and selection with date strings

ts['20200102']

1.3567089293080754

ts['2020-1-2']

1.3567089293080754

ts['1/2/2020']

1.3567089293080754
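
The three spellings above work because pandas parses each string to the same Timestamp before the lookup. A minimal sketch with made-up data (ts_demo and the values 10 and 20 are illustrative only):

```python
import pandas as pd

ts_demo = pd.Series([10, 20],
                    index=pd.to_datetime(['2020-01-02', '2020-01-05']))

# All three spellings parse to the same Timestamp (month first for '1/2/2020').
stamps = [pd.Timestamp(s) for s in ('20200102', '2020-1-2', '1/2/2020')]
same = all(s == pd.Timestamp(2020, 1, 2) for s in stamps)

# .loc makes the label-based intent explicit.
value = ts_demo.loc['2020-01-02']

print(same, value)
```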

Selection by time range

longer_ts=pd.Series(np.random.randn(1000),index=pd.date_range('2000-1-1',periods=1000))
longer_ts

2000-01-01   -0.890305
2000-01-02   -1.249276
2000-01-03   -0.471520
2000-01-04   -1.565868
2000-01-05   -0.093146
                ...   
2002-09-22   -0.409223
2002-09-23   -0.905034
2002-09-24    2.135735
2002-09-25   -1.632820
2002-09-26    0.353677
Freq: D, Length: 1000, dtype: float64

# Select one year of data
longer_ts['2001']

2001-01-01   -0.486395
2001-01-02    2.311954
2001-01-03    0.895244
2001-01-04   -1.074664
2001-01-05    0.677931
                ...   
2001-12-27    0.501836
2001-12-28   -1.063273
2001-12-29   -2.059671
2001-12-30    0.837781
2001-12-31   -0.194221
Freq: D, Length: 365, dtype: float64

# Select one month of data
longer_ts['2001-05']

2001-05-01    0.283530
2001-05-02   -2.267712
2001-05-03    0.525122
2001-05-04    0.242721
2001-05-05   -0.741405
2001-05-06   -0.616603
2001-05-07   -1.251290
2001-05-08    0.769645
2001-05-09   -0.845802
2001-05-10    0.538778
2001-05-11   -0.213336
2001-05-12    2.654860
2001-05-13    1.233259
2001-05-14    1.305880
2001-05-15    1.084824
2001-05-16    1.524103
2001-05-17    0.932347
2001-05-18    2.043876
2001-05-19   -0.709258
2001-05-20   -0.372154
2001-05-21   -2.402400
2001-05-22   -0.863163
2001-05-23    0.854118
2001-05-24    0.746379
2001-05-25    0.721601
2001-05-26   -2.334035
2001-05-27    1.729615
2001-05-28   -1.442796
2001-05-29    1.914014
2001-05-30   -0.183838
2001-05-31    1.250308
Freq: D, dtype: float64

# Slice with datetime objects
longer_ts[datetime(2001,5,25):datetime(2001,6,3)]

2001-05-25    0.721601
2001-05-26   -2.334035
2001-05-27    1.729615
2001-05-28   -1.442796
2001-05-29    1.914014
2001-05-30   -0.183838
2001-05-31    1.250308
2001-06-01    2.921073
2001-06-02    0.227493
2001-06-03    0.925846
Freq: D, dtype: float64
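
The selections above can be checked on a reproducible series. The sketch below replaces the random values with a simple counter (long_demo is an illustrative name) and uses .loc, which behaves the same way for these lookups.

```python
import numpy as np
import pandas as pd

long_demo = pd.Series(np.arange(1000),
                      index=pd.date_range('2000-01-01', periods=1000))

year_2001 = long_demo.loc['2001']                   # every day in 2001
may_2001 = long_demo.loc['2001-05']                 # every day in May 2001
window = long_demo.loc['2001-05-25':'2001-06-03']   # both endpoints are included

print(len(year_2001), len(may_2001), len(window))
```

Unlike positional slicing, label-based slicing on dates includes the end label, which is why the last slice has ten rows.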

# Slice with timestamps that are not present in the series
print("Time series ts:\n",ts)
ts['2020-1-6':'2020-1-11']

Time series ts:
 2020-01-02    1.356709
2020-01-05    0.986863
2020-01-07    2.623709
2020-01-08    1.137484
2020-01-10   -0.050560
2020-01-12   -3.103191
dtype: float64

2020-01-07    2.623709
2020-01-08    1.137484
2020-01-10   -0.050560
dtype: float64
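
A reproducible version of this slice, assuming the same six dates but with the values 0 to 5 in place of random numbers (ts_demo is an illustrative name). It also shows truncate, an existing Series method that cuts a sorted series before or after a date.

```python
import pandas as pd

dates = pd.to_datetime(['2020-01-02', '2020-01-05', '2020-01-07',
                        '2020-01-08', '2020-01-10', '2020-01-12'])
ts_demo = pd.Series(range(6), index=dates)

# Neither endpoint is in the index; the slice returns what falls between them.
between = ts_demo.loc['2020-01-06':'2020-01-11']

# truncate() drops everything after (or before) a given date.
early = ts_demo.truncate(after='2020-01-09')

print(between)
print(early)
```

Both operations rely on the index being sorted in time order.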

Time series with duplicate indices

dates=pd.DatetimeIndex(['2000-1-1','2000-1-2','2000-1-2','2000-1-2','2000-1-3'])
dup_ts=pd.Series(np.arange(5),index=dates)
print(dup_ts)

2000-01-01    0
2000-01-02    1
2000-01-02    2
2000-01-02    3
2000-01-03    4
dtype: int32

dup_ts.index.is_unique  # check whether the index is unique

False

dup_ts['2000-1-3']  # a label that occurs once returns a scalar value

4

dup_ts['2000-1-2']  # a duplicated label returns a slice of the series

2000-01-02    1
2000-01-02    2
2000-01-02    3
dtype: int32

Aggregating data with non-unique time indices

grouped=dup_ts.groupby(level=0)
grouped.mean()  # aggregate the duplicated labels by their mean

2000-01-01    0
2000-01-02    2
2000-01-03    4
dtype: int32

grouped.count()  # aggregate the duplicated labels by counting them

2000-01-01    1
2000-01-02    3
2000-01-03    1
dtype: int64
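
As a sketch of two common follow-ups, the example below computes several statistics in one pass with agg and, as an alternative to aggregating, keeps only the first observation for each timestamp. The names dup_demo, summary, and first_only are illustrative.

```python
import pandas as pd

dates = pd.DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-02',
                          '2000-01-02', '2000-01-03'])
dup_demo = pd.Series([0, 1, 2, 3, 4], index=dates)

# One row per timestamp, with several statistics side by side.
summary = dup_demo.groupby(level=0).agg(['mean', 'count'])

# Alternatively, keep only the first observation for each timestamp.
first_only = dup_demo[~dup_demo.index.duplicated(keep='first')]

print(summary)
print(first_only)
```

Which of the two is right depends on the data: aggregation suits repeated measurements, while dropping duplicates suits accidental repeats.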

Date ranges, frequencies, and shifting

Generating date ranges

Specifying a start and an end

index=pd.date_range('2020-02-01','2020-04-01')  # start and end
print(index)

DatetimeIndex(['2020-02-01', '2020-02-02', '2020-02-03', '2020-02-04',
               '2020-02-05', '2020-02-06', '2020-02-07', '2020-02-08',
               '2020-02-09', '2020-02-10', '2020-02-11', '2020-02-12',
               '2020-02-13', '2020-02-14', '2020-02-15', '2020-02-16',
               '2020-02-17', '2020-02-18', '2020-02-19', '2020-02-20',
               '2020-02-21', '2020-02-22', '2020-02-23', '2020-02-24',
               '2020-02-25', '2020-02-26', '2020-02-27', '2020-02-28',
               '2020-02-29', '2020-03-01', '2020-03-02', '2020-03-03',
               '2020-03-04', '2020-03-05', '2020-03-06', '2020-03-07',
               '2020-03-08', '2020-03-09', '2020-03-10', '2020-03-11',
               '2020-03-12', '2020-03-13', '2020-03-14', '2020-03-15',
               '2020-03-16', '2020-03-17', '2020-03-18', '2020-03-19',
               '2020-03-20', '2020-03-21', '2020-03-22', '2020-03-23',
               '2020-03-24', '2020-03-25', '2020-03-26', '2020-03-27',
               '2020-03-28', '2020-03-29', '2020-03-30', '2020-03-31',
               '2020-04-01'],
              dtype='datetime64[ns]', freq='D')

Specifying a start and a number of periods

pd.date_range(start='2020-02-01',periods=20)  # start and number of periods

DatetimeIndex(['2020-02-01', '2020-02-02', '2020-02-03', '2020-02-04',
               '2020-02-05', '2020-02-06', '2020-02-07', '2020-02-08',
               '2020-02-09', '2020-02-10', '2020-02-11', '2020-02-12',
               '2020-02-13', '2020-02-14', '2020-02-15', '2020-02-16',
               '2020-02-17', '2020-02-18', '2020-02-19', '2020-02-20'],
              dtype='datetime64[ns]', freq='D')

Specifying an end and a number of periods

pd.date_range(end='2020-03-01',periods=20)  # end and number of periods

DatetimeIndex(['2020-02-11', '2020-02-12', '2020-02-13', '2020-02-14',
               '2020-02-15', '2020-02-16', '2020-02-17', '2020-02-18',
               '2020-02-19', '2020-02-20', '2020-02-21', '2020-02-22',
               '2020-02-23', '2020-02-24', '2020-02-25', '2020-02-26',
               '2020-02-27', '2020-02-28', '2020-02-29', '2020-03-01'],
              dtype='datetime64[ns]', freq='D')

Specifying a start, a number of periods, and a frequency

pd.date_range(start='2020-02-01',periods=20,freq="1h30min")  # start, number of periods, and frequency

DatetimeIndex(['2020-02-01 00:00:00', '2020-02-01 01:30:00',
               '2020-02-01 03:00:00', '2020-02-01 04:30:00',
               '2020-02-01 06:00:00', '2020-02-01 07:30:00',
               '2020-02-01 09:00:00', '2020-02-01 10:30:00',
               '2020-02-01 12:00:00', '2020-02-01 13:30:00',
               '2020-02-01 15:00:00', '2020-02-01 16:30:00',
               '2020-02-01 18:00:00', '2020-02-01 19:30:00',
               '2020-02-01 21:00:00', '2020-02-01 22:30:00',
               '2020-02-02 00:00:00', '2020-02-02 01:30:00',
               '2020-02-02 03:00:00', '2020-02-02 04:30:00'],
              dtype='datetime64[ns]', freq='90T')

The start value carries a time of day, but the timestamps should be normalized to midnight

pd.date_range('2020-03-01 12:56:37',periods=10,normalize=True)  # normalize the timestamps to midnight

DatetimeIndex(['2020-03-01', '2020-03-02', '2020-03-03', '2020-03-04',
               '2020-03-05', '2020-03-06', '2020-03-07', '2020-03-08',
               '2020-03-09', '2020-03-10'],
              dtype='datetime64[ns]', freq='D')
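
The five calls above can be collected into one short script and their endpoints checked. This is only a recap sketch; the variable names are illustrative, and '90min' is used as the frequency string for the 1.5-hour step.

```python
import pandas as pd

by_bounds = pd.date_range('2020-02-01', '2020-04-01')        # start and end
by_start = pd.date_range(start='2020-02-01', periods=20)     # start and length
by_end = pd.date_range(end='2020-03-01', periods=20)         # end and length
every_90min = pd.date_range('2020-02-01', periods=20, freq='90min')
midnight = pd.date_range('2020-03-01 12:56:37', periods=10, normalize=True)

print(len(by_bounds), by_end[0], every_90min[-1], midnight[0])
```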

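Shifting dates (forward and backward)

Shifting means moving data backward or forward through time. Both Series and DataFrame have a shift method for simple forward or backward shifts that leaves the index unchanged.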

ts=pd.Series(np.arange(4),index=pd.date_range('2020-1-1',periods=4,freq='M'))
ts

2020-01-31    0
2020-02-29    1
2020-03-31    2
2020-04-30    3
Freq: M, dtype: int32

# Shift the values two positions later
ts.shift(2)

2020-01-31    NaN
2020-02-29    NaN
2020-03-31    0.0
2020-04-30    1.0
Freq: M, dtype: float64

# Shift the values two positions earlier
ts.shift(-2)

2020-01-31    2.0
2020-02-29    3.0
2020-03-31    NaN
2020-04-30    NaN
Freq: M, dtype: float64

# When the frequency is known, the timestamps can be shifted instead of the data
ts.shift(2,freq='M')

2020-03-31    0
2020-04-30    1
2020-05-31    2
2020-06-30    3
Freq: M, dtype: int32

# Other frequencies can be passed as well
ts.shift(3,freq='D')

2020-02-03    0
2020-03-03    1
2020-04-03    2
2020-05-03    3
dtype: int32

ts.shift(1,freq='90T')  # 90T means 90 minutes; '90min' also works and is the preferred spelling in recent pandas

2020-01-31 01:30:00    0
2020-02-29 01:30:00    1
2020-03-31 01:30:00    2
2020-04-30 01:30:00    3
Freq: M, dtype: int32
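
A typical use of shift is computing the change from one observation to the next. The sketch below uses made-up prices (prices, pct, and moved are illustrative names) and also contrasts the plain shift with a shift by frequency.

```python
import pandas as pd

prices = pd.Series([100.0, 110.0, 99.0],
                   index=pd.date_range('2020-01-01', periods=3, freq='D'))

# Change relative to the previous observation; the first entry has no predecessor.
pct = prices / prices.shift(1) - 1

# Shifting by a frequency moves the labels and keeps every value.
moved = prices.shift(2, freq='D')

print(pct)
print(moved)
```

The plain shift discards data at one end and introduces NaN at the other, while the shift by frequency loses nothing because only the index changes.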

Shifting dates with offsets

from pandas.tseries.offsets import Day,MonthEnd
now=datetime.now()
now

datetime.datetime(2020, 8, 7, 15, 24, 29, 392792)

# Add 3 days
now+3*Day()

Timestamp('2020-08-10 15:24:29.392792')

# Move to the month end (pass normalize=True to land on midnight of that day)
now+MonthEnd()

Timestamp('2020-08-31 15:24:29.392792')

# Move to the end of the following month, keeping the time of day
now+MonthEnd(2)

Timestamp('2020-09-30 15:24:29.392792')

# Use rollforward and rollback to roll a date forward or backward explicitly
offset=MonthEnd()
offset.rollforward(now)

Timestamp('2020-08-31 15:24:29.392792')

offset.rollback(now)

Timestamp('2020-07-31 15:24:29.392792')
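
Because datetime.now() changes on every run, the sketch below repeats the offset operations with a fixed timestamp. It also shows one difference worth knowing: rollforward leaves a date that is already a month end where it is, while adding the offset moves on to the next month end. The names moment, anchor, daily, and per_month are illustrative.

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

moment = pd.Timestamp('2020-08-07 15:24:29')
offset = MonthEnd()

forward = offset.rollforward(moment)   # 2020-08-31 15:24:29
backward = offset.rollback(moment)     # 2020-07-31 15:24:29

# A date already on the anchor stays put under rollforward,
# whereas adding the offset jumps to the next month end.
anchor = pd.Timestamp('2020-08-31')
stays = offset.rollforward(anchor)     # 2020-08-31
jumps = anchor + offset                # 2020-09-30

# Group daily data by the month end that each date rolls forward to.
daily = pd.Series(1, index=pd.date_range('2020-01-30', periods=4))
per_month = daily.groupby(offset.rollforward).sum()

print(forward, backward, stays, jumps)
print(per_month)
```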

Periods and period arithmetic

# The week containing 2020-01-01 (weeks run Monday to Sunday)
p=pd.Period(2020,freq='W')
p

Period('2019-12-30/2020-01-05', 'W-SUN')

# The following week
p+1

Period('2020-01-06/2020-01-12', 'W-SUN')

# The previous week
p-1

Period('2019-12-23/2019-12-29', 'W-SUN')

Building regular period sequences with period_range

rng=pd.period_range('2020-01-01','2020-07-01',freq='M')
rng

PeriodIndex(['2020-01', '2020-02', '2020-03', '2020-04', '2020-05', '2020-06',
             '2020-07'],
            dtype='period[M]', freq='M')

The PeriodIndex class stores a sequence of periods and can serve as an axis index in any pandas data structure.

pd.Series(np.random.randn(7),index=rng)

2020-01   -0.180939
2020-02    1.128570
2020-03    0.172954
2020-04    1.478395
2020-05   -0.892944
2020-06   -1.834816
2020-07   -0.436892
Freq: M, dtype: float64

Period frequency conversion

asfreq converts Period and PeriodIndex objects to another frequency.

rng.asfreq("W")

PeriodIndex(['2020-01-27/2020-02-02', '2020-02-24/2020-03-01',
             '2020-03-30/2020-04-05', '2020-04-27/2020-05-03',
             '2020-05-25/2020-05-31', '2020-06-29/2020-07-05',
             '2020-07-27/2020-08-02'],
            dtype='period[W-SUN]', freq='W-SUN')
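
The output above contains the last week of each month because asfreq takes the end of each period by default when converting to a higher frequency. A small sketch with a single Period and the how argument (the variable names are illustrative):

```python
import pandas as pd

march = pd.Period('2020-03', freq='M')

first_day = march.asfreq('D', how='start')   # Period('2020-03-01', 'D')
last_day = march.asfreq('D', how='end')      # Period('2020-03-31', 'D')

# Going from a high to a low frequency returns the enclosing period.
back = last_day.asfreq('M')                  # Period('2020-03', 'M')

print(first_day, last_day, back)
```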

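Resampling and frequency conversion

Resampling

Resampling is the process of converting a time series from one frequency to another.

Downsampling: aggregating higher-frequency data to a lower frequency

Upsampling: converting lower-frequency data to a higher frequency

resample is the pandas method for frequency conversion.
resample has an API similar to groupby: call resample to group the data, then call an aggregation function. The parameters of the resample method are described below.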

Parameter	Description
rule	String or DateOffset object giving the target frequency (for example 'M', '5min', or Second(1)).
axis=0	{0 or 'index', 1 or 'columns'}, default 0. Which axis to use for up- or down-sampling. For `Series` this defaults to 0, i.e. along the rows. Must be a `DatetimeIndex`, `TimedeltaIndex` or `PeriodIndex`.
closed=None	{'right', 'left'}, default None. Which side of the bin interval is closed. The default is 'left' for all frequency offsets except 'M', 'A', 'Q', 'BM', 'BA', 'BQ', and 'W', which all default to 'right'.
label=None	{'right', 'left'}, default None. Which bin edge to label the bucket with. The default is 'left' for all frequency offsets except 'M', 'A', 'Q', 'BM', 'BA', 'BQ', and 'W', which all default to 'right'.
convention='start'	{'start', 'end', 's', 'e'}, default 'start'. For `PeriodIndex` only, controls whether to use the start or end of `rule`.
kind=None	{'timestamp', 'period'}, optional, default None. Pass 'timestamp' to convert the resulting index to a `DatetimeIndex` or 'period' to convert it to a `PeriodIndex`. By default the input representation is retained.
loffset=None	timedelta, default None. Adjusts the resampled time labels. Deprecated in pandas 1.1 and removed in 2.0.
base=0	int, default 0. For frequencies that evenly subdivide 1 day, the "origin" of the aggregated intervals. For example, for '5min' frequency, base could range from 0 through 4. Deprecated in pandas 1.1 and removed in 2.0; `origin` and `offset` replace it.
on=None	str, optional. For a DataFrame, the column to use instead of the index for resampling. The column must be datetime-like.
level=None	str or int, optional. For a MultiIndex, the level (name or number) to use for resampling. `level` must be datetime-like.

rng=pd.date_range('2000-01-01',periods=100,freq='D')
ts=pd.Series(np.random.randn(len(rng)),index=rng)
ts

2000-01-01   -0.906298
2000-01-02    0.199623
2000-01-03   -0.815801
2000-01-04   -0.947802
2000-01-05   -1.072683
                ...   
2000-04-05   -0.443355
2000-04-06   -0.038154
2000-04-07   -0.638983
2000-04-08    0.329782
2000-04-09    0.255591
Freq: D, Length: 100, dtype: float64

ts.resample('M').mean()

2000-01-31   -0.251985
2000-02-29   -0.172288
2000-03-31    0.261040
2000-04-30   -0.172212
Freq: M, dtype: float64
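
To make the groupby analogy concrete, the sketch below resamples a series of ones to month ends and compares the result with an ordinary groupby on monthly periods. It passes a MonthEnd offset object as the rule instead of the 'M' string; the names ones, monthly, and by_period are illustrative.

```python
import numpy as np
import pandas as pd
from pandas.tseries.offsets import MonthEnd

rng = pd.date_range('2000-01-01', periods=100, freq='D')
ones = pd.Series(np.ones(len(rng)), index=rng)

# A DateOffset object as the rule; each bin is labeled with its month end.
monthly = ones.resample(MonthEnd()).sum()

# The same grouping expressed with groupby on monthly periods.
by_period = ones.groupby(ones.index.to_period('M')).sum()

print(monthly)
print(by_period)
```

Summing ones simply counts the days per month, so the 100 days split into 31, 29, 31, and 9.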

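Downsampling

There are a few things to consider when downsampling:

Which side of each interval is closed.
How to label each aggregated bin: with the start of the interval or with its end.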

rng=pd.date_range('2020-01-01',periods=12,freq='T')
ts=pd.Series(np.arange(12),index=rng)
ts

2020-01-01 00:00:00     0
2020-01-01 00:01:00     1
2020-01-01 00:02:00     2
2020-01-01 00:03:00     3
2020-01-01 00:04:00     4
2020-01-01 00:05:00     5
2020-01-01 00:06:00     6
2020-01-01 00:07:00     7
2020-01-01 00:08:00     8
2020-01-01 00:09:00     9
2020-01-01 00:10:00    10
2020-01-01 00:11:00    11
Freq: T, dtype: int32

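# Bins are closed on the left and open on the right, and labeled with the left edge
# Defaults: closed='left', label='left'
# New label 00:00 = original [00:00,00:01,00:02,00:03,00:04]
# New label 00:05 = original [00:05,00:06,00:07,00:08,00:09]
# New label 00:10 = original [00:10,00:11]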
ts.resample('5min').sum()

2020-01-01 00:00:00    10
2020-01-01 00:05:00    35
2020-01-01 00:10:00    21
Freq: 5T, dtype: int32

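# Bins are open on the left and closed on the right, and labeled with the right edge
# New label 00:00 = original [00:00]
# New label 00:05 = original [00:01,00:02,00:03,00:04,00:05]
# New label 00:10 = original [00:06,00:07,00:08,00:09,00:10]
# New label 00:15 = original [00:11]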
ts.resample('5min',closed='right',label='right').sum()

2020-01-01 00:00:00     0
2020-01-01 00:05:00    15
2020-01-01 00:10:00    40
2020-01-01 00:15:00    11
Freq: 5T, dtype: int32

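# The loffset argument shifts the result labels
# (deprecated in pandas 1.1 and removed in 2.0; on newer versions, add the offset to the result index instead)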
ts.resample('5min',closed='right',label='right',loffset='-1s').sum()

2019-12-31 23:59:59     0
2020-01-01 00:04:59    15
2020-01-01 00:09:59    40
2020-01-01 00:14:59    11
Freq: 5T, dtype: int32
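
Newer pandas releases no longer accept loffset (it was deprecated in 1.1 and removed in 2.0). A sketch of the same three results that should run on both old and new versions: it spells the frequency as 'min' and shifts the labels by adjusting the index after resampling. The names ts_demo, left, right, and shifted are illustrative.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2020-01-01', periods=12, freq='min')
ts_demo = pd.Series(np.arange(12), index=rng)

# Default: bins closed and labeled on the left, e.g. [00:00, 00:05).
left = ts_demo.resample('5min').sum()

# Bins closed and labeled on the right, e.g. (00:00, 00:05].
right = ts_demo.resample('5min', closed='right', label='right').sum()

# Shift the labels back by one second without the loffset argument.
shifted = right.copy()
shifted.index = shifted.index - pd.Timedelta(seconds=1)

print(left)
print(right)
print(shifted)
```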

Upsampling

# Prepare test data
frame=pd.DataFrame(np.random.randn(2,4),
                   index=pd.date_range('2020-1-1',periods=2,freq='W-WED'),
                   columns=['科州','德州','新乡','鄂州'])
frame

	科州	德州	新乡	鄂州
2020-01-01	-0.230099	-0.221443	-0.000833	-0.057745
2020-01-08	-0.164813	-0.232991	-0.026346	0.828313

# Convert the weekly data to daily frequency
# Moving to a higher frequency leaves missing values in the gaps
df_daily=frame.resample('D').asfreq()
df_daily

	科州	德州	新乡	鄂州
2020-01-01	-0.230099	-0.221443	-0.000833	-0.057745
2020-01-02	NaN	NaN	NaN	NaN
2020-01-03	NaN	NaN	NaN	NaN
2020-01-04	NaN	NaN	NaN	NaN
2020-01-05	NaN	NaN	NaN	NaN
2020-01-06	NaN	NaN	NaN	NaN
2020-01-07	NaN	NaN	NaN	NaN
2020-01-08	-0.164813	-0.232991	-0.026346	0.828313

# The filling and interpolation methods available in fillna and reindex can also fill these gaps
frame.resample('D').ffill()

	科州	德州	新乡	鄂州
2020-01-01	-0.230099	-0.221443	-0.000833	-0.057745
2020-01-02	-0.230099	-0.221443	-0.000833	-0.057745
2020-01-03	-0.230099	-0.221443	-0.000833	-0.057745
2020-01-04	-0.230099	-0.221443	-0.000833	-0.057745
2020-01-05	-0.230099	-0.221443	-0.000833	-0.057745
2020-01-06	-0.230099	-0.221443	-0.000833	-0.057745
2020-01-07	-0.230099	-0.221443	-0.000833	-0.057745
2020-01-08	-0.164813	-0.232991	-0.026346	0.828313

# The limit argument caps how many rows are filled
frame.resample('D').ffill(limit=2)

	科州	德州	新乡	鄂州
2020-01-01	-0.230099	-0.221443	-0.000833	-0.057745
2020-01-02	-0.230099	-0.221443	-0.000833	-0.057745
2020-01-03	-0.230099	-0.221443	-0.000833	-0.057745
2020-01-04	NaN	NaN	NaN	NaN
2020-01-05	NaN	NaN	NaN	NaN
2020-01-06	NaN	NaN	NaN	NaN
2020-01-07	NaN	NaN	NaN	NaN
2020-01-08	-0.164813	-0.232991	-0.026346	0.828313
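
The three upsampling variants can be compared on a tiny frame with known values. The sketch below uses a single hypothetical column 'a' with the values 1 and 8, and adds linear interpolation as a third option next to asfreq and ffill.

```python
import pandas as pd

weekly = pd.DataFrame({'a': [1.0, 8.0]},
                      index=pd.date_range('2020-01-01', periods=2, freq='W-WED'))

gaps = weekly.resample('D').asfreq()           # new rows are left as NaN
capped = weekly.resample('D').ffill(limit=2)   # fill at most 2 rows per gap
linear = weekly.resample('D').interpolate()    # straight line between the points

print(gaps['a'].isna().sum(), capped['a'].isna().sum())
print(linear)
```

Forward filling repeats the last known value, while interpolation invents intermediate values, so the choice depends on whether the quantity is expected to change smoothly between observations.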

Moving-window filtering

# Prepare the data
import io
data="""
Time      Value                      
"2020-08-07 18:33:31"  1620.296021
"2020-08-07 18:33:32"  1613.534180
"2020-08-07 18:33:33"  1605.522705
"2020-08-07 18:33:35"  1594.238159
"2020-08-07 18:33:37"  1596.968994
"2020-08-07 18:33:39"  1603.658325
"2020-08-07 18:33:40"  1623.842773
"2020-08-07 18:33:42"  1623.842773
"2020-08-07 18:33:44"  1635.868896
"2020-08-07 18:33:46"  1641.348633
"2020-08-07 18:33:47"  1637.948730
"2020-08-07 18:33:49"  1638.491211
"2020-08-07 18:33:51"  1643.952759
"2020-08-07 18:33:53"  1651.620483
"2020-08-07 18:33:54"  1660.644775
"2020-08-07 18:33:56"  1660.644775
"2020-08-07 18:33:58"  1661.675415
"2020-08-07 18:34:00"  1657.371460
"2020-08-07 18:34:02"  1648.238647
"2020-08-07 18:34:03"  1640.046265
"2020-08-07 18:34:05"  1637.496460
"2020-08-07 18:34:09"  1651.855713
"2020-08-07 18:34:10"  1651.855713
"2020-08-07 18:34:12"  1654.098022
"2020-08-07 18:34:14"  1648.111938
"2020-08-07 18:34:16"  1640.751587
"2020-08-07 18:34:17"  1634.512451
"2020-08-07 18:34:19"  1635.977295
"2020-08-07 18:34:23"  1649.973145
"2020-08-07 18:34:24"  1649.973145
"2020-08-07 18:34:26"  1654.405273
"2020-08-07 18:34:28"  1656.955322
"2020-08-07 18:34:30"  1652.325562
"2020-08-07 18:34:31"  1652.325562
"""
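# sep=r'\s+' splits on whitespace (delim_whitespace=True is deprecated in recent pandas)
df = pd.read_table(io.StringIO(data), sep=r'\s+')
# Use the Time column as the index and convert it from strings to a DatetimeIndex
df = df.set_index('Time')
df.index = pd.to_datetime(df.index)
df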

	Value
Time
2020-08-07 18:33:31	1620.296021
2020-08-07 18:33:32	1613.534180
2020-08-07 18:33:33	1605.522705
2020-08-07 18:33:35	1594.238159
2020-08-07 18:33:37	1596.968994
2020-08-07 18:33:39	1603.658325
2020-08-07 18:33:40	1623.842773
2020-08-07 18:33:42	1623.842773
2020-08-07 18:33:44	1635.868896
2020-08-07 18:33:46	1641.348633
2020-08-07 18:33:47	1637.948730
2020-08-07 18:33:49	1638.491211
2020-08-07 18:33:51	1643.952759
2020-08-07 18:33:53	1651.620483
2020-08-07 18:33:54	1660.644775
2020-08-07 18:33:56	1660.644775
2020-08-07 18:33:58	1661.675415
2020-08-07 18:34:00	1657.371460
2020-08-07 18:34:02	1648.238647
2020-08-07 18:34:03	1640.046265
2020-08-07 18:34:05	1637.496460
2020-08-07 18:34:09	1651.855713
2020-08-07 18:34:10	1651.855713
2020-08-07 18:34:12	1654.098022
2020-08-07 18:34:14	1648.111938
2020-08-07 18:34:16	1640.751587
2020-08-07 18:34:17	1634.512451
2020-08-07 18:34:19	1635.977295
2020-08-07 18:34:23	1649.973145
2020-08-07 18:34:24	1649.973145
2020-08-07 18:34:26	1654.405273
2020-08-07 18:34:28	1656.955322
2020-08-07 18:34:30	1652.325562
2020-08-07 18:34:31	1652.325562

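# Rolling filter over a window of a fixed number of points
# The first rows have fewer points than the window size, so they come out as NaN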
df.Value.rolling(10).mean()

Time
2020-08-07 18:33:31            NaN
2020-08-07 18:33:32            NaN
2020-08-07 18:33:33            NaN
2020-08-07 18:33:35            NaN
2020-08-07 18:33:37            NaN
2020-08-07 18:33:39            NaN
2020-08-07 18:33:40            NaN
2020-08-07 18:33:42            NaN
2020-08-07 18:33:44            NaN
2020-08-07 18:33:46    1615.912146
2020-08-07 18:33:47    1617.677417
2020-08-07 18:33:49    1620.173120
2020-08-07 18:33:51    1624.016125
2020-08-07 18:33:53    1629.754358
2020-08-07 18:33:54    1636.121936
2020-08-07 18:33:56    1641.820581
2020-08-07 18:33:58    1645.603845
2020-08-07 18:34:00    1648.956714
2020-08-07 18:34:02    1650.193689
2020-08-07 18:34:03    1650.063452
2020-08-07 18:34:05    1650.018225
2020-08-07 18:34:09    1651.354675
2020-08-07 18:34:10    1652.144971
2020-08-07 18:34:12    1652.392724
2020-08-07 18:34:14    1651.139441
2020-08-07 18:34:16    1649.150122
2020-08-07 18:34:17    1646.433826
2020-08-07 18:34:19    1644.294409
2020-08-07 18:34:23    1644.467859
2020-08-07 18:34:24    1645.460547
2020-08-07 18:34:26    1647.151428
2020-08-07 18:34:28    1647.661389
2020-08-07 18:34:30    1647.708374
2020-08-07 18:34:31    1647.531128
Name: Value, dtype: float64

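# Rolling filter over a window of a fixed time span
# For time-based windows min_periods defaults to 1, so filtering starts at the first row and nothing is missing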
df.Value.rolling("10s").mean()

Time
2020-08-07 18:33:31    1620.296021
2020-08-07 18:33:32    1616.915101
2020-08-07 18:33:33    1613.117635
2020-08-07 18:33:35    1608.397766
2020-08-07 18:33:37    1606.112012
2020-08-07 18:33:39    1605.703064
2020-08-07 18:33:40    1608.294451
2020-08-07 18:33:42    1608.012288
2020-08-07 18:33:44    1613.069987
2020-08-07 18:33:46    1620.921732
2020-08-07 18:33:47    1627.751688
2020-08-07 18:33:49    1633.557169
2020-08-07 18:33:51    1636.908834
2020-08-07 18:33:53    1641.538452
2020-08-07 18:33:54    1645.667765
2020-08-07 18:33:56    1648.883789
2020-08-07 18:33:58    1652.838236
2020-08-07 18:34:00    1655.984944
2020-08-07 18:34:02    1656.699259
2020-08-07 18:34:03    1654.770223
2020-08-07 18:34:05    1650.912170
2020-08-07 18:34:09    1647.001709
2020-08-07 18:34:10    1645.898560
2020-08-07 18:34:12    1647.070435
2020-08-07 18:34:14    1648.683569
2020-08-07 18:34:16    1649.334595
2020-08-07 18:34:17    1646.864237
2020-08-07 18:34:19    1644.217834
2020-08-07 18:34:23    1641.865283
2020-08-07 18:34:24    1642.237525
2020-08-07 18:34:26    1644.968262
2020-08-07 18:34:28    1649.456836
2020-08-07 18:34:30    1652.726489
2020-08-07 18:34:31    1652.659668
Name: Value, dtype: float64
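
The difference between the two window types is easier to see on four points with an uneven time spacing. This is a sketch with made-up values (values, by_count, by_count_min1, and by_time are illustrative names).

```python
import pandas as pd

times = pd.to_datetime(['2020-08-07 18:33:31', '2020-08-07 18:33:32',
                        '2020-08-07 18:33:35', '2020-08-07 18:33:50'])
values = pd.Series([1.0, 2.0, 3.0, 4.0], index=times)

# Count-based window: needs 3 rows before it produces a value.
by_count = values.rolling(3).mean()

# Same window, but results are allowed from the first row onward.
by_count_min1 = values.rolling(3, min_periods=1).mean()

# Time-based window: averages the rows from the preceding 10 seconds.
by_time = values.rolling('10s').mean()

print(by_count)
print(by_count_min1)
print(by_time)
```

The last row shows the difference most clearly: the count-based window still averages the three most recent rows, while the time-based window contains only the final point, because the earlier ones are more than 10 seconds old.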

# Upsample to a one-second frequency
df_sec=df.resample('1s').ffill()
df_sec

	Value
Time
2020-08-07 18:33:31	1620.296021
2020-08-07 18:33:32	1613.534180
2020-08-07 18:33:33	1605.522705
2020-08-07 18:33:34	1605.522705
2020-08-07 18:33:35	1594.238159
...	...
2020-08-07 18:34:27	1654.405273
2020-08-07 18:34:28	1656.955322
2020-08-07 18:34:29	1656.955322
2020-08-07 18:34:30	1652.325562
2020-08-07 18:34:31	1652.325562

61 rows × 1 columns

# Time-based rolling filter on the upsampled data
df_sec.Value.rolling("10s").mean()

Time
2020-08-07 18:33:31    1620.296021
2020-08-07 18:33:32    1616.915101
2020-08-07 18:33:33    1613.117635
2020-08-07 18:33:34    1611.218903
2020-08-07 18:33:35    1607.822754
                          ...     
2020-08-07 18:34:27    1643.715161
2020-08-07 18:34:28    1645.959448
2020-08-07 18:34:29    1648.057251
2020-08-07 18:34:30    1649.692078
2020-08-07 18:34:31    1651.326904
Freq: S, Name: Value, Length: 61, dtype: float64

df.plot()
df.Value.rolling("10s").mean().plot()
df_sec.plot()
df_sec.Value.rolling("10s").mean().plot()

<matplotlib.axes._subplots.AxesSubplot at 0x1487fc43550>
