I. Time types
Timestamp
A single point in time.
Period
A span of time such as a day, month, quarter, or year.
Interval
The difference between an end time and a start time.
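As a minimal sketch of how these three concepts map onto pandas objects (the variable names `ts`, `month`, and `delta` are illustrative, not from the original):

```python
import pandas as pd

# A timestamp: a single point in time.
ts = pd.Timestamp("2020-07-10 10:00")

# A period: a whole span of time at a given frequency (here, one month).
month = pd.Period("2020-07", freq="M")

# An interval (time delta): the difference between two timestamps.
delta = pd.Timestamp("2020-07-12") - pd.Timestamp("2020-07-10")

print(ts)      # 2020-07-10 10:00:00
print(month)   # 2020-07
print(delta)   # 2 days 00:00:00

# The timestamp falls inside the period.
print(month.start_time <= ts <= month.end_time)  # True
```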
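II. Creating time series
1. Use the pandas date_range function to create a time series: periods is the number of timestamps to generate, and freq is the frequency. Note that pandas 2.2 and later deprecate the month-end alias 'M' in favor of 'ME'; the outputs below were produced with an older version.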
import pandas as pd
import numpy as np
data2020=pd.date_range('2020/01/15',periods=12,freq='M')
data2020
DatetimeIndex(['2020-01-31', '2020-02-29', '2020-03-31', '2020-04-30',
               '2020-05-31', '2020-06-30', '2020-07-31', '2020-08-31',
               '2020-09-30', '2020-10-31', '2020-11-30', '2020-12-31'],
              dtype='datetime64[ns]', freq='M')
2. Use a multiple of a base frequency, for example every 3 days: 3D
data=pd.date_range('2020/01/15',periods=20,freq='3D')
data
DatetimeIndex(['2020-01-15', '2020-01-18', '2020-01-21', '2020-01-24',
               '2020-01-27', '2020-01-30', '2020-02-02', '2020-02-05',
               '2020-02-08', '2020-02-11', '2020-02-14', '2020-02-17',
               '2020-02-20', '2020-02-23', '2020-02-26', '2020-02-29',
               '2020-03-03', '2020-03-06', '2020-03-09', '2020-03-12'],
              dtype='datetime64[ns]', freq='3D')
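To check that a multiplied frequency really spaces the timestamps evenly, the sketch below builds a short 3-day index and inspects the gaps between neighbors (the names `idx` and `gaps` are illustrative):

```python
import pandas as pd

# Five timestamps, three days apart, starting on 2020-01-15.
idx = pd.date_range("2020-01-15", periods=5, freq="3D")
print(idx)

# Difference between each timestamp and the previous one.
gaps = idx[1:] - idx[:-1]
print(gaps.unique())   # a single distinct gap of 3 days

print(idx[-1])         # 2020-01-27 00:00:00
```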
3. Generate a time series from a start time, an end time, and a frequency
pd.date_range('2016/01/01','2018/1/1',freq='M')
DatetimeIndex(['2016-01-31', '2016-02-29', '2016-03-31', '2016-04-30',
               '2016-05-31', '2016-06-30', '2016-07-31', '2016-08-31',
               '2016-09-30', '2016-10-31', '2016-11-30', '2016-12-31',
               '2017-01-31', '2017-02-28', '2017-03-31', '2017-04-30',
               '2017-05-31', '2017-06-30', '2017-07-31', '2017-08-31',
               '2017-09-30', '2017-10-31', '2017-11-30', '2017-12-31'],
              dtype='datetime64[ns]', freq='M')
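When both a start and an end are given, the number of timestamps follows from the frequency, and both endpoints are included if they land on it. A small sketch, using the month-start alias 'MS' rather than month end as an assumption to sidestep alias differences between pandas versions:

```python
import pandas as pd

# Daily frequency: both endpoints are included.
daily = pd.date_range("2016-01-01", "2016-01-10", freq="D")
print(len(daily))          # 10

# Month-start frequency: 2016-01-01 through 2018-01-01 inclusive.
month_starts = pd.date_range("2016-01-01", "2018-01-01", freq="MS")
print(len(month_starts))   # 25
print(month_starts[-1])    # 2018-01-01 00:00:00
```

With a month-end frequency, as in the example above, the same range yields 24 timestamps because the end of January 2018 lies past the end date.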
4. Example: using a time series as the index
time=pd.Series(np.random.randn(20),index=pd.date_range('2016-01-01',periods=20))
print(time)
2016-01-01 -0.448743
2016-01-02 -1.547889
2016-01-03 1.497664
2016-01-04 0.231215
2016-01-05 0.162859
2016-01-06 0.592765
2016-01-07 -0.212843
2016-01-08 0.290066
2016-01-09 0.071178
2016-01-10 0.560857
2016-01-11 0.143158
2016-01-12 -0.480019
2016-01-13 -0.468277
2016-01-14 -0.336452
2016-01-15 0.423408
2016-01-16 1.401993
2016-01-17 -0.130837
2016-01-18 1.289786
2016-01-19 -0.109606
2016-01-20 -0.659027
Freq: D, dtype: float64
A series with a datetime index can be sliced by date.
For example, select the data from 2016-01-10 onward:
time['2016-01-10':]
2016-01-10 0.560857
2016-01-11 0.143158
2016-01-12 -0.480019
2016-01-13 -0.468277
2016-01-14 -0.336452
2016-01-15 0.423408
2016-01-16 1.401993
2016-01-17 -0.130837
2016-01-18 1.289786
2016-01-19 -0.109606
2016-01-20 -0.659027
Freq: D, dtype: float64
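Date slicing also accepts a closed range or a partial date string. The sketch below uses a deterministic series (values 0 to 59, an assumption made so the results are reproducible) instead of random numbers:

```python
import numpy as np
import pandas as pd

# 60 daily values starting 2016-01-01; the last date is 2016-02-29 (leap year).
s = pd.Series(np.arange(60), index=pd.date_range("2016-01-01", periods=60))

# Open-ended slice: everything from 2016-02-25 onward.
tail = s["2016-02-25":]
print(len(tail))           # 5

# Closed slice: both endpoints are included.
window = s["2016-01-10":"2016-01-12"]
print(window.tolist())     # [9, 10, 11]

# Partial string: every row in February 2016.
feb = s.loc["2016-02"]
print(len(feb))            # 29
```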
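III. Filtering with truncate
before: discard all rows before the given time (the given time and everything after it are kept)
after: discard all rows after the given time (the given time and everything before it are kept)
The outputs below come from a separate run, so the random values differ from those shown earlier.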
time.truncate(before='2016-01-10')
2016-01-10 2.254659
2016-01-11 -0.087015
2016-01-12 -0.215592
2016-01-13 2.065045
2016-01-14 0.356837
2016-01-15 0.328997
2016-01-16 0.065772
2016-01-17 -1.089229
2016-01-18 1.009450
2016-01-19 1.189240
2016-01-20 0.190670
Freq: D, dtype: float64
time.truncate(after='2016-01-10')
2016-01-01 1.209789
2016-01-02 1.383691
2016-01-03 -0.168758
2016-01-04 -0.403158
2016-01-05 0.357184
2016-01-06 1.129640
2016-01-07 -0.586380
2016-01-08 1.141797
2016-01-09 -0.228825
2016-01-10 2.254659
Freq: D, dtype: float64
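The two arguments can also be combined, and the result matches label-based slicing. A small sketch on a deterministic series (the names `kept_from`, `kept_until`, and `middle` are illustrative):

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(20), index=pd.date_range("2016-01-01", periods=20))

# Drop everything before 2016-01-10; the boundary date itself is kept.
kept_from = s.truncate(before="2016-01-10")
print(len(kept_from))     # 11

# Drop everything after 2016-01-10; the boundary date itself is kept.
kept_until = s.truncate(after="2016-01-10")
print(len(kept_until))    # 10

# Both arguments together keep a closed date range.
middle = s.truncate(before="2016-01-05", after="2016-01-10")
print(middle.tolist())    # [4, 5, 6, 7, 8, 9]

# Same rows as slicing by date.
print(kept_from.equals(s["2016-01-10":]))   # True
```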
IV. Timestamps
A timestamp can be specified to a particular date or time.
pd.Timestamp('2020-07-10')
Timestamp('2020-07-10 00:00:00')
# specify the time down to the hour
pd.Timestamp('2020-07-10 10')
Timestamp('2020-07-10 10:00:00')
# specify a period; the frequency is inferred from the string
pd.Period('2020-12')
Period('2020-12', 'M')
pd.Period('2020/2/14')
Period('2020-02-14', 'D')
# time offsets
pd.Timedelta('1 day')
Timedelta('1 days 00:00:00')
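Once created, these objects expose their components as attributes. The following sketch shows a few of them (variable names are illustrative):

```python
import pandas as pd

ts = pd.Timestamp("2020-07-10 10:00")
print(ts.year, ts.month, ts.day, ts.hour)   # 2020 7 10 10
print(ts.day_name())                        # Friday

# The frequency of a Period is inferred from the string's resolution.
p_month = pd.Period("2020-12")
p_day = pd.Period("2020/2/14")
print(p_month.freqstr)           # M
print(p_day.freqstr)             # D
print(p_month.end_time.date())   # 2020-12-31

# A Timedelta can be converted to a number of seconds.
print(pd.Timedelta("1 day").total_seconds())   # 86400.0
```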
V. Time offset arithmetic
Use Timedelta (a duration) to shift a time by an offset.
pd.Period('2020-02-14 10:10')+pd.Timedelta('1 day')
Period('2020-02-15 10:10', 'T')
pd.Period('2020-02-14 10:10')+pd.Timedelta('1D1H')
Period('2020-02-15 11:10', 'T')
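The same kind of arithmetic works on Timestamp objects. The sketch below also uses pd.DateOffset for a calendar-aware shift, which the original does not cover, so treat it as a supplementary illustration:

```python
import pandas as pd

t = pd.Timestamp("2020-02-14 10:10")

# Fixed-length shifts with Timedelta.
print(t + pd.Timedelta("1 day"))             # 2020-02-15 10:10:00
print(t + pd.Timedelta(days=1, hours=1))     # 2020-02-15 11:10:00

# Calendar-aware shift: one month later, same day of month.
print(t + pd.DateOffset(months=1))           # 2020-03-14 10:10:00

# Adding an integer to a Period moves it by that many periods.
p = pd.Period("2020-02", freq="M")
print(p + 1)                                 # 2020-03
```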
Create a sequence of periods with the period_range function
# create 10 hourly periods
p1=pd.period_range('07-10-16 8:00',periods=10,freq='H')
p1
PeriodIndex(['2016-07-10 08:00', '2016-07-10 09:00', '2016-07-10 10:00',
             '2016-07-10 11:00', '2016-07-10 12:00', '2016-07-10 13:00',
             '2016-07-10 14:00', '2016-07-10 15:00', '2016-07-10 16:00',
             '2016-07-10 17:00'],
            dtype='period[H]', freq='H')
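A PeriodIndex can be converted to timestamps and back. This is a minimal sketch using monthly periods (the monthly frequency and the variable names are assumptions chosen for illustration):

```python
import pandas as pd

# Four monthly periods starting with January 2016.
months = pd.period_range("2016-01", periods=4, freq="M")
print(months)

# Convert each period to the timestamp at its start.
starts = months.to_timestamp(how="start")
print(starts[0])    # 2016-01-01 00:00:00

# Convert the timestamps back to monthly periods.
back = starts.to_period("M")
print(back.equals(months))   # True
```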
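VI. Resampling
- Converts a time series from one frequency to another
- Downsampling: from a higher frequency to a lower one
- Upsampling: from a lower frequency to a higher one
Downsampling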
rng=pd.date_range('1/1/2011',periods=90,freq='D')
ts=pd.Series(np.random.randn(len(rng)),index=rng)
ts.head()
2011-01-01 0.658997
2011-01-02 -0.134594
2011-01-03 2.559234
2011-01-04 1.289609
2011-01-05 0.616857
Freq: D, dtype: float64
Aggregate by month with sum (use mean for the average)
ts.resample('M').sum()
2011-01-31 7.616693
2011-02-28 -2.844589
2011-03-31 -6.117556
Freq: M, dtype: float64
Aggregate over 3-day bins
ts3=ts.resample('3D').sum()
ts3
2011-01-01 3.083638
2011-01-04 0.538523
2011-01-07 -0.254472
2011-01-10 -1.186759
…
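Because the series above is random, the arithmetic is hard to verify by eye. The sketch below repeats the 3-day downsampling on the values 1 to 9 (an assumption made for readability) and applies several aggregations at once:

```python
import numpy as np
import pandas as pd

rng = pd.date_range("2011-01-01", periods=9, freq="D")
ts = pd.Series(np.arange(1, 10), index=rng)   # values 1..9

# Group the days into 3-day bins and aggregate each bin.
three_day = ts.resample("3D").agg(["sum", "mean", "max"])
print(three_day)

# Bins are (1,2,3), (4,5,6), (7,8,9).
print(three_day["sum"].tolist())    # [6, 15, 24]
print(three_day["mean"].tolist())   # [2.0, 5.0, 8.0]
```

Each bin is labeled by its first day, which is why the 3-day output above starts at 2011-01-01 and then jumps to 2011-01-04.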
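Upsampling
Upsampling (for example, from 3-day totals to daily values) produces empty slots, so a fill method is needed to populate them:
- ffill: fill a gap with the preceding value
- bfill: fill a gap with the following value
- interpolate: fill gaps by linear interpolation
# forward fill, filling at most 1 consecutive gap (limit=1)
ts3.resample('D').ffill(limit=1)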
2011-01-01 3.083638
2011-01-02 3.083638
2011-01-03 NaN
2011-01-04 0.538523
2011-01-05 0.538523
…
2011-03-25 NaN
2011-03-26 -0.440214
2011-03-27 -0.440214
2011-03-28 NaN
2011-03-29 -0.470015
Freq: D, Length: 88, dtype: float64
# backward fill; bfill(limit=...) caps how many consecutive gaps are filled, and without the argument there is no cap
ts3.resample('D').bfill(limit=2)
2011-01-01 3.083638
2011-01-02 0.538523
2011-01-03 0.538523
2011-01-04 0.538523
2011-01-05 -0.254472
…
2011-03-25 -0.440214
2011-03-26 -0.440214
2011-03-27 -0.470015
2011-03-28 -0.470015
2011-03-29 -0.470015
Freq: D, Length: 88, dtype: float64
# linear interpolation: connect each pair of neighboring known points with a straight line
ts3.resample('D').interpolate()
2011-01-01 3.083638
2011-01-02 2.235266
2011-01-03 1.386895
2011-01-04 0.538523
2011-01-05 0.274192
…
2011-03-25 -0.743012
2011-03-26 -0.440214
2011-03-27 -0.450148
2011-03-28 -0.460082
2011-03-29 -0.470015
Freq: D, Length: 88, dtype: float64
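The three fill strategies are easier to compare on a tiny series with known values. This sketch uses the values 0, 3, 6 spaced three days apart (an assumption chosen so the interpolated result is obvious):

```python
import pandas as pd

ts3 = pd.Series([0.0, 3.0, 6.0],
                index=pd.date_range("2011-01-01", periods=3, freq="3D"))

# Forward fill, at most one gap after each known value.
forward = ts3.resample("D").ffill(limit=1)
print(forward.tolist())    # [0.0, 0.0, nan, 3.0, 3.0, nan, 6.0]

# Backward fill with no limit: every gap takes the next known value.
backward = ts3.resample("D").bfill()
print(backward.tolist())   # [0.0, 3.0, 3.0, 3.0, 6.0, 6.0, 6.0]

# Linear interpolation between neighboring known values.
linear = ts3.resample("D").interpolate()
print(linear.tolist())     # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```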
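VII. Rolling windows
Compute a statistic over a fixed-size window that slides along the series one step at a time; here, the mean of the 10 values in each window. The first 9 results are NaN because the window is not yet full.
Generate random data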
rng=pd.date_range('1/1/2020',freq='D',periods=600)
df=pd.DataFrame(np.random.randn(600),index=rng)
df
# create a rolling window of length 10
r=df.rolling(window=10)
r
Rolling [window=10,center=False,axis=0]
# mean over each window of length 10
r.mean()
0
2020-01-01 NaN
2020-01-02 NaN
2020-01-03 NaN
2020-01-04 NaN
2020-01-05 NaN
… …
2021-08-18 0.184945
2021-08-19 0.030524
2021-08-20 -0.051377
2021-08-21 -0.038885
2021-08-22 -0.066645
# maximum within each window
x=r.max().head(20)
x
2020-01-01 NaN
2020-01-02 NaN
2020-01-03 NaN
2020-01-04 NaN
2020-01-05 NaN
2020-01-06 NaN
2020-01-07 NaN
2020-01-08 NaN
2020-01-09 NaN
2020-01-10 1.889574
2020-01-11 1.889574
2020-01-12 1.217886
2020-01-13 1.217886
2020-01-14 2.097739
2020-01-15 2.097739
2020-01-16 2.097739
2020-01-17 2.097739
2020-01-18 2.097739
2020-01-19 2.097739
2020-01-20 2.097739
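The leading NaN values can be avoided with the min_periods argument of rolling. A minimal sketch on five known values with a window of 3 (both chosen for illustration):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0],
              index=pd.date_range("2020-01-01", periods=5))

# Default: a result is produced only once the window holds 3 values.
full = s.rolling(window=3).mean()
print(full.tolist())      # [nan, nan, 2.0, 3.0, 4.0]

# min_periods=1: use however many values are available so far.
partial = s.rolling(window=3, min_periods=1).mean()
print(partial.tolist())   # [1.0, 1.5, 2.0, 3.0, 4.0]
```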
Plotting the raw data and its rolling mean
df.plot(style='r--')
df.rolling(window=10).mean().plot(style='b')