29 Python Time Series Analysis (EDA of the US Consumer Confidence Index and Wikipedia Page Views, with Example Data)


小食青年


Article link: https://blog.csdn.net/qq_45425321/article/details/105351095


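This article introduces time series analysis in Python with the pandas library: generating time series, resampling data (downsampling and upsampling), and rolling-window operations. It also covers the ARIMA model, stressing the importance of stationarity, and shows how to choose model parameters from ACF and PACF plots. Finally, it walks through a practical analysis using the US Consumer Confidence Index and Wikipedia page views as case studies.

(Summary generated automatically by CSDN.)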

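Study notes on Tang Yudi's (唐宇迪) course 《python数据分析与机器学习实战》 (Python Data Analysis and Machine Learning in Practice)
29 Python Time Series Analysis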

1. Generating time series with pandas

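Common kinds of time data:
Timestamp: a specific point in time, e.g. 20:58:15 on 2020-04-06
Fixed period: a whole span such as the month of July 2016
Time interval: the span between a start time and an end time
The simplest function for creating a time series is date_range. Common frequency aliases: H (hour), D (day), M (month end). In pandas 2.2 and later the hour and month-end aliases are spelled h and ME, and the old spellings are deprecated.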

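import pandas as pd

# Dates can be written several ways: 2016 Jul 1, 7/1/2016, 2016-07-01, 2016/07/01
rng = pd.date_range('2016/07/01', periods=10, freq='D')  # freq='D' steps one day (the default); '3D' steps three days
# date_range also accepts the form (start, end, freq)
rng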

DatetimeIndex(['2016-07-01', '2016-07-02', '2016-07-03', '2016-07-04', '2016-07-05', '2016-07-06', '2016-07-07', '2016-07-08', '2016-07-09', '2016-07-10'], dtype='datetime64[ns]', freq='D')
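The comment in the code above mentions the (start, end, freq) form of date_range. A minimal sketch of it, with a 3-day step chosen purely for illustration (the names rng3 and same are hypothetical):

```python
import pandas as pd

# Start and end dates plus a frequency: one point every 3 days
rng3 = pd.date_range('2016-07-01', '2016-07-10', freq='3D')
print(rng3)

# The same index built from a start date and a number of periods
same = pd.date_range('2016-07-01', periods=4, freq='3D')
print(rng3.equals(same))  # True
```

The end date is only included when it falls exactly on a step; with '2016-07-09' as the end, the range would stop at 2016-07-07.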

Use the times as the index, so that later you can retrieve or slice data directly by time.

import numpy as np
import pandas as pd
time = pd.Series(np.random.randn(10),
                 index=pd.date_range("2016-1-1", periods=10))
time

2016-01-01 -0.070435
2016-01-02 -0.916814
2016-01-03 0.370355
2016-01-04 0.215701
2016-01-05 -0.266909
2016-01-06 -0.476030
2016-01-07 -0.600339
2016-01-08 0.792472
2016-01-09 -0.074305
2016-01-10 1.772727
Freq: D, dtype: float64
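Because the index holds dates, rows can be selected with date strings. A small sketch with deterministic values instead of random ones (the series s is hypothetical, chosen so the results are predictable):

```python
import numpy as np
import pandas as pd

# Values 0..9 on ten consecutive days starting 2016-01-01
s = pd.Series(np.arange(10), index=pd.date_range('2016-01-01', periods=10))

# A single day, looked up by its date string
print(s.loc['2016-01-03'])                 # 2

# A date-string slice; unlike positional slicing, both endpoints are included
print(s.loc['2016-01-03':'2016-01-05'])

# A partial string selects a whole period, here every day in January 2016
print(len(s.loc['2016-01']))               # 10
```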

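Filtering with truncate
before='2016-1-5' drops all data before 2016-01-05 (the bound itself is kept); after drops data later than a given date in the same way.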

time.truncate(before='2016-1-5')

2016-01-05 -0.266909
2016-01-06 -0.476030
2016-01-07 -0.600339
2016-01-08 0.792472
2016-01-09 -0.074305
2016-01-10 1.772727
Freq: D, dtype: float64
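before and after can also be combined to keep a closed date range. A minimal sketch on a hypothetical series s with values 0..9:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(10), index=pd.date_range('2016-01-01', periods=10))

# Keep 2016-01-05 through 2016-01-07; both bounds are inclusive
window = s.truncate(before='2016-01-05', after='2016-01-07')
print(window)
```

truncate expects a sorted index, so call sort_index() first if the data might be out of order.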

Creating timestamps:

print(pd.Timestamp("2016-07-1"))
print(pd.Timestamp("2016-07-1 10"))
print(pd.Timestamp("2016-07-1 10:15:1"))

2016-07-01 00:00:00
2016-07-01 10:00:00
2016-07-01 10:15:01
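pd.Timestamp also understands the date spellings listed in the first code comment. A quick check that they all land on the same instant, assuming the default month-first reading of '7/1/2016' (the names spellings and stamps are hypothetical):

```python
import pandas as pd

# The four spellings from the first code comment
spellings = ['2016 Jul 1', '7/1/2016', '2016-07-01', '2016/07/01']
stamps = [pd.Timestamp(text) for text in spellings]
print(stamps)

# All of them should equal midnight on 1 July 2016
print(all(t == pd.Timestamp(2016, 7, 1) for t in stamps))  # True
```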

Creating periods:

pd.Period("2016-07")

Period('2016-07', 'M')

pd.Period("2016-07-1 10:5:1")

Period('2016-07-01 10:05:01', 'S')
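Unlike a Timestamp, a Period covers a whole span of time. A small sketch showing the span of the monthly period above and how integer arithmetic moves it (the name p is hypothetical):

```python
import pandas as pd

# Frequency is inferred from the string: a monthly period
p = pd.Period('2016-07')
print(p.start_time)   # 2016-07-01 00:00:00
print(p.end_time)     # 2016-07-31 23:59:59.999999999

# Adding an integer moves by whole periods of the Period's own frequency
print(p + 1)          # 2016-08
```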

Adding and subtracting times:

pd.Timestamp('2016-01-01 10:10')+pd.Timedelta('1 day')

Timestamp('2016-01-02 10:10:00')
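Timedelta values can be combined, and subtracting two Timestamps gives a Timedelta back. A minimal sketch (the names start, later and gap are hypothetical):

```python
import pandas as pd

start = pd.Timestamp('2016-01-01 10:10')
# One day from a string, plus 2 hours 5 minutes from keyword arguments
later = start + pd.Timedelta('1 day') + pd.Timedelta(hours=2, minutes=5)
print(later)                 # 2016-01-02 12:15:00

# Timestamp minus Timestamp is a Timedelta
gap = later - start
print(gap)                   # 1 days 02:05:00
print(gap.total_seconds())   # 93900.0
```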

Converting timestamps to periods, and how the two differ

# '07-10-16 8:00' is parsed month-first, i.e. 2016-07-10 08:00
ts = pd.Series(range(10), pd.date_range('07-10-16 8:00', periods=10, freq='H'))
ts_period = ts.to_period()
ts_period

2016-07-10 08:00 0
2016-07-10 09:00 1
2016-07-10 10:00 2
2016-07-10 11:00 3
2016-07-10 12:00 4
2016-07-10 13:00 5
2016-07-10 14:00 6
2016-07-10 15:00 7
2016-07-10 16:00 8
2016-07-10 17:00 9
Freq: H, dtype: int64

print(ts_period['2016-07-10 08:30':'2016-07-10 11:45'])
print('————————')
print(ts['2016-07-10 08:30':'2016-07-10 11:45'])

2016-07-10 08:00 0
2016-07-10 09:00 1
2016-07-10 10:00 2
2016-07-10 11:00 3
Freq: H, dtype: int64
————————
2016-07-10 09:00:00 1
2016-07-10 10:00:00 2
2016-07-10 11:00:00 3
Freq: H, dtype: int64
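In the output above, the period-indexed slice keeps the 08:00 row because the Period labelled 08:00 stands for the whole hour from 08:00 to 08:59, which contains 08:30; the timestamp-indexed slice drops it because the instant 08:00 is earlier than 08:30. The sketch below checks the same idea on daily data. The series ts_d and period_d are hypothetical, and 'D' is used because that alias is spelled the same in all pandas versions:

```python
import pandas as pd

ts_d = pd.Series(range(3), index=pd.date_range('2016-07-10', periods=3, freq='D'))
period_d = ts_d.to_period()

# A daily Period label stands for the whole day...
first = period_d.index[0]
print(first.start_time)  # 2016-07-10 00:00:00
print(first.end_time)    # 2016-07-10 23:59:59.999999999

# ...whereas a Timestamp label is a single instant (midnight here)
print(ts_d.index[0])     # 2016-07-10 00:00:00

# to_timestamp() maps each Period back to its start time
print(period_d.to_timestamp().index.equals(ts_d.index))  # True
```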

2. Resampling (analyzing data at different granularities)

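Resampling converts time data from one frequency to another. Converting daily data to monthly data is downsampling; converting monthly data to daily data is upsampling.
In pandas, resampling is done with the resample method.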

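import numpy as np
import pandas as pd

# Downsampling: daily data aggregated into monthly and 3-day bins
rng = pd.date_range("1/1/2011", periods=90, freq='D')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
print(ts.head())
print('——————————')
print(ts.resample('M').sum())  # monthly sums; use .mean() for monthly means
print('——————————')
print(ts.resample('3D').sum().head())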

2011-01-01 -0.054447
2011-01-02 2.114083
2011-01-03 0.593211
2011-01-04 0.923929
2011-01-05 -0.480473
Freq: D, dtype: float64
——————————
2011-01-31 4.719470
2011-02-28 2.434292
2011-03-31 1.882250
Freq: M, dtype: float64
——————————
2011-01-01 2.652847
2011-01-04 2.060122
2011-01-07 -2.649867
2011-01-10 -2.167812
2011-01-13 0.268306
Freq: 3D, dtype: float64
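A downsampled bin can be summarized by several statistics at once. A minimal sketch, assuming hypothetical deterministic data (values 0..8 on nine consecutive days) so the bin results are easy to verify by hand:

```python
import pandas as pd

daily = pd.Series(range(9), index=pd.date_range('2011-01-01', periods=9, freq='D'))

# Each 3-day bin gets its sum, mean and maximum as separate columns
summary = daily.resample('3D').agg(['sum', 'mean', 'max'])
print(summary)
```

The bins are [0, 1, 2], [3, 4, 5] and [6, 7, 8], so the sums are 3, 12 and 21.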

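Upsampling, for example expanding 3-day aggregates into daily data, creates missing values, so you need to choose a strategy for filling them, such as interpolation.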

day3Ts = ts.resample('3D').mean()
print(day3Ts.resample('D').asfreq().head(6))

2011-01-01 0.884282
2011-01-02 NaN
2011-01-03 NaN
2011-01-04 0.686707
2011-01-05 NaN
2011-01-06 NaN
Freq: D, dtype: float64
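Besides leaving NaN, asfreq can fill the newly created slots with a constant. A minimal sketch on a hypothetical 3-day series (the names coarse, up and up0 are illustrative):

```python
import pandas as pd

# One value every third day
coarse = pd.Series([1.0, 4.0, 7.0], index=pd.date_range('2011-01-01', periods=3, freq='3D'))

# Plain upsampling to daily leaves the new days empty
up = coarse.resample('D').asfreq()
print(up.isna().sum())   # 4

# fill_value puts a constant into the new slots instead
up0 = coarse.resample('D').asfreq(fill_value=0.0)
print(up0.tolist())      # [1.0, 0.0, 0.0, 4.0, 0.0, 0.0, 7.0]
```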

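Filling methods for upsampling: ffill (fill a gap with the previous value), bfill (fill a gap with the next value), and interpolate (e.g. linear interpolation between known points).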

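# Forward fill, at most one missing value per gap (limit=1)
print(day3Ts.resample('D').ffill(1).head(6))
print("——————————————")
# Forward fill, up to two missing values per gap, so both gaps are filled
print(day3Ts.resample('D').ffill(2).head(6))
print("——————————————")
# Backward fill, up to two missing values per gap
print(day3Ts.resample('D').bfill(2).head(6))
print('——————————————')
# Linear interpolation: connect the known points with straight lines and read values off those lines
print(day3Ts.resample('D').interpolate('linear').head())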

2011-01-01 0.884282
2011-01-02 0.884282
2011-01-03 NaN
2011-01-04 0.686707
2011-01-05 0.686707
2011-01-06 NaN
Freq: D, dtype: float64
——————————————
2011-01-01 0.884282
2011-01-02 0.884282
2011-01-03 0.884282
2011-01-04 0.686707
2011-01-05 0.686707
2011-01-06 0.686707
Freq: D, dtype: float64
———————————————
2011-01-01 0.884282
2011-01-02 0.686707
2011-01-03 0.686707
2011-01-04 0.686707
2011-01-05 -0.883289
2011-01-06 -0.883289
Freq: D, dtype: float64
——————————————
2011-01-01 0.884282
2011-01-02 0.818424
2011-01-03 0.752566
2011-01-04 0.686707
2011-01-05 0.163375
Freq: D, dtype: float64
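To compare the three strategies side by side, it helps to use data whose filled values can be checked by hand. A minimal sketch with a hypothetical 3-day series of 1, 4, 7:

```python
import pandas as pd

coarse = pd.Series([1.0, 4.0, 7.0], index=pd.date_range('2011-01-01', periods=3, freq='3D'))

filled = pd.DataFrame({
    'ffill': coarse.resample('D').ffill(),                 # repeat the previous known value
    'bfill': coarse.resample('D').bfill(),                 # pull the next known value backwards
    'linear': coarse.resample('D').interpolate('linear'),  # straight line between known points
})
print(filled)
```

Because the known points rise by 3 every 3 days, the linear column reads 1 through 7, while ffill and bfill produce step shapes.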

3. Rolling windows

   Suppose we have a dataset with 365 points, one per day (2016~2017), and now want to know 2016/2/
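A rolling (sliding) window computes a statistic over a fixed number of consecutive points and moves along the series one step at a time, which smooths out short-term noise. A minimal sketch using pandas' Series.rolling, assuming hypothetical random-walk data with 365 daily points and a 10-day window (neither the data nor the window size comes from the original):

```python
import numpy as np
import pandas as pd

# Hypothetical daily data: a seeded random walk of 365 points
gen = np.random.default_rng(0)
days = pd.date_range('2016-01-01', periods=365, freq='D')
series = pd.Series(gen.standard_normal(365).cumsum(), index=days)

# 10-day rolling mean: each value averages that day and the 9 days before it
smooth = series.rolling(window=10).mean()

# The first 9 values are NaN because the window is not yet full
print(smooth.head(12))
```

Passing min_periods to rolling lets the window produce values before it is full, and center=True labels each result at the middle of its window instead of the end.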
