13. Handling Time Series with Pandas

Latest recommended article published on 2024-04-15 23:54:40

鸿神

Category: Pandas学习 (Pandas Study). Tags: python, data analysis, pandas

Article link: https://blog.csdn.net/qq_45488242/article/details/107878551

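Handling Time Series with Pandas

Pandas was originally created for financial modeling, so it comes with a powerful set of tools for working with dates, times, and time-indexed data.

The date and time data covered in this chapter fall into three categories:

Timestamps: a timestamp refers to a specific moment in time, for example 10:41 pm on July 30, 2020.
Time intervals and periods: a time interval is the length of time between two timestamps; periods are a special kind of interval in which every interval has the same length and the intervals do not overlap.
Time deltas or durations: an exact length of time, for example a program run time of 24 seconds.

The sections below describe how to work with each of these three date/time data types in Pandas.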

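Dates and times in Python

Before looking at the Pandas tools, we first go over Python's own date and time tools (including third-party libraries other than Pandas).

Although the Pandas time-series tools are better suited to data science problems, it helps to know the Python standard library and the other time-series tools.

Native Python dates and times: datetime and dateutil

Python's basic date and time functionality lives in the built-in datetime module. Combined with the third-party dateutil library, it lets us quickly perform many useful date and time operations.

Creating a date

We can build a date with the datetime type from the datetime module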

from datetime import datetime


datetime_1=datetime(year=2020,month=7,day=30)
print(datetime_1)
>>>
2020-07-30 00:00:00

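Alternatively, the parser module of the dateutil library can parse dates given as strings (it understands English-language date formats); the parse function returns a datetime object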

from dateutil import parser

datetime_1=parser.parse('30th of July, 2020')
datetime_2=parser.parse('July 30th,2020')
print(datetime_1)
print(datetime_2)
>>>
2020-07-30 00:00:00
2020-07-30 00:00:00

Formatting the output

Once we have a datetime object, we can call its strftime method to control the output format of the date

from datetime import datetime
from dateutil import parser

datetime_1=parser.parse('30th of July, 2020')
print(datetime_1.strftime('%A'))
>>>
Thursday

Here the standard format code %A prints the day of the week of the datetime object
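To make the format codes more concrete, here is a minimal sketch using only the standard library. The example date and the variable names are illustrative and not from the original article; it shows a few common strftime codes and the reverse operation, strptime, which parses a string against an explicit format.

```python
from datetime import datetime

# An arbitrary example moment: 22:41 on 30 July 2020.
d = datetime(year=2020, month=7, day=30, hour=22, minute=41)

iso_date = d.strftime('%Y-%m-%d')   # four-digit year, month, day
clock = d.strftime('%H:%M')         # 24-hour clock
weekday = d.strftime('%A')          # full weekday name (depends on the locale)

# strptime goes the other way: it parses a string using an explicit format.
parsed = datetime.strptime('2020-07-30 22:41', '%Y-%m-%d %H:%M')

print(iso_date, clock, weekday)
print(parsed == d)
```

Giving strptime an explicit format is stricter than dateutil's parser, which can be preferable when the input format is known in advance.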

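Python's datetime and dateutil are flexible and easy to use, but, as mentioned repeatedly in earlier chapters, native datetime objects perform worse on large datasets than NumPy's encoded, typed date arrays

NumPy dates and times: the datetime64 type

To address this weakness of Python's native datetime objects, the NumPy team developed its own time-series data type. datetime64 encodes dates as 64-bit integers, which makes date arrays very compact and saves memory.

Creating a date array in NumPy

To create a NumPy date array, simply specify the dtype np.datetime64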

import numpy as np

date_1=np.array('2020-07-30',dtype=np.datetime64)
print(date_1)
print(type(date_1))
print(date_1.dtype)
>>>
2020-07-30
<class 'numpy.ndarray'>
datetime64[D]

Arithmetic on NumPy date arrays

NumPy's broadcasting rules also apply to date arrays, except that the computation becomes date arithmetic

date_1=np.array('2020-07-01',dtype=np.datetime64)
print(date_1)
print(date_1+np.arange(start=1,stop=12,step=1))
>>>
2020-07-01
['2020-07-02' '2020-07-03' '2020-07-04' '2020-07-05' '2020-07-06'
 '2020-07-07' '2020-07-08' '2020-07-09' '2020-07-10' '2020-07-11'
 '2020-07-12']

Because NumPy stores dates in an encoded form (datetime64) and supports vectorized, broadcast operations, this kind of computation is very fast on large datasets
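As a further illustration of date arithmetic, the following sketch (with made-up dates and illustrative variable names) shows that subtracting two datetime64 values gives a timedelta64, and that comparisons are vectorized just like addition.

```python
import numpy as np

start = np.datetime64('2020-07-01')
end = np.datetime64('2020-07-30')

# Subtracting two datetime64 values yields a timedelta64 in the same unit (days).
gap = end - start
print(gap)

# Broadcasting: add an array of day offsets to a single date.
week = start + np.arange(7)
print(week.dtype)

# Comparisons are vectorized too and produce a boolean mask.
mask = week > np.datetime64('2020-07-04')
print(week[mask])
```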

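NumPy's datetime64 object

As shown above, NumPy can represent dates with a date array

A single date can also be represented by a datetime64 object, which uses 64 bits per value, so a datetime64 can encode a range of 2⁶⁴ times its base time unit

Creating a datetime64 object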

date_1=np.datetime64('2020-07-01')
print(date_1)
print(type(date_1))
>>>
2020-07-01
<class 'numpy.datetime64'>

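Units of a datetime64 object

Because the value is a signed 64-bit count of the base unit, there is a trade-off between resolution and time span: with nanosecond units, for example, a datetime64 can only cover 2⁶⁴ nanoseconds, a little under 600 years (about 292 years on either side of the zero point)

Note that when no unit is given, it is inferred from the input, for example

date_1=np.datetime64('2020-07-01')                      # unit inferred as day (D)
date_2=np.datetime64('2020-07-01 12:00')                # unit inferred as minute (m)
date_3=np.datetime64('2020-07-01 12:00:00.500000000')   # unit inferred as nanosecond (ns)

We can also specify the unit explicitly

date_1=np.datetime64('2020-07-01','ns')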

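The codes for specifying the date/time unit are

Code	Meaning	Time span (relative to the zero point)
Y	year	±9.2e18 years
M	month	±7.6e17 years
W	week	±1.7e17 years
D	day	±2.5e16 years
h	hour	±1.0e15 years
m	minute	±1.7e13 years
s	second	±2.9e11 years
ms	millisecond	±2.9e8 years
us	microsecond	±2.9e5 years
ns	nanosecond	±292 years
ps	picosecond	±106 days
fs	femtosecond	±2.6 hours
as	attosecond	±9.2 seconds

The zero point of the date scale is 00:00:00 on January 1, 1970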
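The relationship between the unit, the stored integer, and the 1970 zero point can be checked directly. This is a small sketch with illustrative variable names; it relies only on NumPy's astype conversions.

```python
import numpy as np

day = np.datetime64('2020-07-01')            # unit inferred as D
minute = np.datetime64('2020-07-01 12:00')   # unit inferred as m
nano = np.datetime64('2020-07-01', 'ns')     # unit forced to ns

print(day.dtype, minute.dtype, nano.dtype)

# The stored integer counts base units since 1970-01-01; here, days.
days_since_epoch = day.astype('int64')
same_count = (day - np.datetime64('1970-01-01')).astype('int64')
print(days_since_epoch, same_count)

# Converting to a coarser unit truncates the finer information.
month = minute.astype('datetime64[M]')
print(month)
```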

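Finally, although NumPy's datetime64 makes up for some shortcomings of Python's native datetime, it lacks many of the convenient methods and functions that datetime, and especially dateutil, provide. For that reason, Pandas is the best tool for working with dates and times.

Pandas date and time tools

Pandas builds its date and time handling on the Timestamp object

Timestamp combines the ease of use of datetime and dateutil with the efficient storage and vectorized interface of np.datetime64

Creating a Timestamp

We can create a Timestamp with the to_datetime function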

import pandas as pd

Timestamp_1=pd.to_datetime('2020-07-01')
print(Timestamp_1)
print(type(Timestamp_1))
>>>
2020-07-01 00:00:00
<class 'pandas._libs.tslibs.timestamps.Timestamp'>

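Calling datetime and dateutil methods

A Timestamp can be treated as a datetime object (it is a subclass of datetime.datetime), so datetime methods such as strftime work on it directly, and it can be passed to dateutil utilities that expect a datetime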

Timestamp_1=pd.to_datetime('2020-07-01')
print(Timestamp_1)
print(Timestamp_1.strftime('%A'))
>>>
2020-07-01 00:00:00
Wednesday

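Finally, from a group of Timestamp objects Pandas can build a DatetimeIndex, which can serve as the index of a Series or DataFrame

This already appeared in the Titanic example in the earlier discussion of Pandas pivot tables

A DatetimeIndex has all the features of an Index object and also provides attributes and methods for working with dates

Pandas time series: indexing by time

The Pandas time-series tools are particularly well suited to data indexed by timestamps

Creating a time-series index

As with an Index object, we simply construct it directly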

DateTimeIndex_1=pd.DatetimeIndex(['2020-07-01','2020-07-02','2020-07-03'])
print(DateTimeIndex_1)
print(type(DateTimeIndex_1))
>>>
DatetimeIndex(['2020-07-01', '2020-07-02', '2020-07-03'], dtype='datetime64[ns]', freq=None)
<class 'pandas.core.indexes.datetimes.DatetimeIndex'>

Using a time series as the index

Just as we assigned indexes to Series and DataFrame objects earlier, we can pass a DatetimeIndex as the index

DateTimeIndex_1=pd.DatetimeIndex(['2020-07-01','2020-07-02','2020-07-03'])
Series_1=pd.Series(np.random.randint(0,10,3),index=DateTimeIndex_1)
DataFrame_1=pd.DataFrame(np.random.randint(0,10,(3,4)),columns=list('ABCD'),index=DateTimeIndex_1)
print(Series_1)
print(DataFrame_1)
>>>
2020-07-01    8
2020-07-02    4
2020-07-03    3
dtype: int64
            A  B  C  D
2020-07-01  8  1  7  0
2020-07-02  9  0  2  1
2020-07-03  9  6  3  2

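As the two examples above show, Pandas handles time series very well. The following sections cover the different objects Pandas provides for different kinds of time information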
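One practical benefit of a DatetimeIndex is that rows can be selected with date strings. The sketch below uses a small made-up series (the names and dates are illustrative) to show inclusive date slicing, partial-string selection of a whole year, and a couple of the date attributes exposed by the index.

```python
import numpy as np
import pandas as pd

index = pd.DatetimeIndex(['2019-12-30', '2019-12-31', '2020-01-01', '2020-01-02'])
series = pd.Series(np.arange(4), index=index)

# Slicing with date strings; both endpoints are included.
end_of_2019 = series.loc['2019-12-30':'2019-12-31']

# A partial date string selects everything that falls inside it, here all of 2020.
in_2020 = series.loc['2020']

# The index exposes date attributes and methods.
years = series.index.year.tolist()
day_names = series.index.day_name().tolist()

print(end_of_2019)
print(in_2020)
print(years, day_names)
```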

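Pandas time-series objects

This section introduces the different objects Pandas uses to handle time series

For timestamps, Pandas provides the Timestamp object. As described above, it is a replacement for Python's native datetime, built on the more efficient numpy.datetime64 type. The corresponding index structure is DatetimeIndex.
For time periods, Pandas provides the Period object, which encodes a fixed-frequency interval based on numpy.datetime64. The corresponding index structure is PeriodIndex.
For time deltas or durations, Pandas provides the Timedelta class, a more efficient replacement for Python's native datetime.timedelta, built on numpy.timedelta64. The corresponding index structure is TimedeltaIndex.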
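The three scalar types can be tried out side by side. This is a minimal sketch with arbitrary example values; the variable names are illustrative.

```python
import pandas as pd

stamp = pd.Timestamp('2020-07-30 22:41')   # a single point in time
period = pd.Period('2020-07', freq='M')    # the whole month of July 2020
delta = pd.Timedelta(seconds=24)           # an exact duration

# A Period knows where it starts and ends.
print(period.start_time, period.end_time)

# Timestamp + Timedelta gives another Timestamp.
later = stamp + delta
print(later)

# A Timestamp can be mapped to the Period that contains it.
same_month = stamp.to_period('M') == period
print(same_month)
```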

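Creating Timestamp and DatetimeIndex objects was covered above; next we look at the commonly used to_datetime() function

The to_datetime() function

The Pandas to_datetime() function can parse many date and time formats, for example Python's native datetime objects and English-language date strings

What it returns depends on the input: passing a single value returns a Timestamp, while passing several values returns a DatetimeIndex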

from datetime import datetime
import pandas as pd

datetime_1=datetime(year=2020,month=7,day=31)
day_1='4th of July,2020'
day_2='20200731'
day_3='31-07-2020'
Timestamp_1=pd.to_datetime(datetime_1)
Timestamp_2=pd.to_datetime(day_1)
Timestamp_3=pd.to_datetime(day_2)
Timestamp_4=pd.to_datetime(day_3)
Timestamp_5=pd.to_datetime([Timestamp_1,Timestamp_2,Timestamp_3,Timestamp_4])
print(Timestamp_1,'\t',type(Timestamp_1))
print(Timestamp_2,'\t',type(Timestamp_2))
print(Timestamp_3,'\t',type(Timestamp_3))
print(Timestamp_4,'\t',type(Timestamp_4))
print(Timestamp_5,'\t',type(Timestamp_5))
>>>
2020-07-31 00:00:00      <class 'pandas._libs.tslibs.timestamps.Timestamp'>
2020-07-04 00:00:00      <class 'pandas._libs.tslibs.timestamps.Timestamp'>
2020-07-31 00:00:00      <class 'pandas._libs.tslibs.timestamps.Timestamp'>
2020-07-31 00:00:00      <class 'pandas._libs.tslibs.timestamps.Timestamp'>
DatetimeIndex(['2020-07-31', '2020-07-04', '2020-07-31', '2020-07-31'], dtype='datetime64[ns]', freq=None)       <class 'pandas.core.indexes.datetimes.DatetimeIndex'>
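When the input format is known, it can be stated explicitly instead of being inferred. The sketch below is not from the original article: it uses the format and errors parameters of to_datetime on a made-up list of day-first strings, and turns unparseable entries into NaT rather than raising.

```python
import pandas as pd

raw = ['31/07/2020', '01/08/2020', 'not a date']

# format states the layout explicitly (day/month/year);
# errors='coerce' turns unparseable entries into NaT instead of raising.
parsed = pd.to_datetime(raw, format='%d/%m/%Y', errors='coerce')
print(parsed)
```

An explicit format avoids ambiguity for strings such as 01/08/2020, which could otherwise be read as either January 8 or August 1.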

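The to_period() method

Note that the DatetimeIndex returned by to_datetime() above has dtype datetime64[ns]

To represent each value as a period of a given frequency rather than as a nanosecond timestamp, call the DatetimeIndex's to_period() method with a frequency code; the result is a PeriodIndex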

datetime_1=datetime(year=2020,month=7,day=31)
day_1='4th of July,2020'
day_2='20200731'
day_3=pd.Timestamp('31-07-2020')
DatetimeIndex_1=pd.to_datetime([datetime_1,day_1,day_2,day_3])
print(DatetimeIndex_1)
print(DatetimeIndex_1.to_period('D'),'\t',type(DatetimeIndex_1.to_period('D')))
>>>
DatetimeIndex(['2020-07-31', '2020-07-04', '2020-07-31', '2020-07-31'], dtype='datetime64[ns]', freq=None)

PeriodIndex(['2020-07-31', '2020-07-04', '2020-07-31', '2020-07-31'], dtype='period[D]', freq='D')       <class 'pandas.core.indexes.period.PeriodIndex'>
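The frequency code passed to to_period() decides how coarse the periods are. As a small sketch with made-up dates, converting to monthly periods maps several days of the same month to the same Period, and to_timestamp() converts back (by default to the start of each period).

```python
import pandas as pd

dates = pd.to_datetime(['2020-07-04', '2020-07-31', '2020-08-15'])

# Monthly periods: both July dates fall into the same period.
months = dates.to_period('M')
month_labels = [str(p) for p in months]
print(months)

# Back to timestamps; by default each period maps to its start.
starts = months.to_timestamp()
print(starts)
```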

In addition, subtracting one date from another yields a Timedelta; subtracting a date from a DatetimeIndex yields a TimedeltaIndex

datetime_1=datetime(year=2020,month=7,day=1)
day_1='2nd of July,2020'
day_2='20200703'
day_3=pd.Timestamp('07-04-2020')
DatetimeIndex_1=pd.to_datetime([datetime_1,day_1,day_2,day_3])
print(DatetimeIndex_1)
print(DatetimeIndex_1-DatetimeIndex_1[0],'\t',type(DatetimeIndex_1-DatetimeIndex_1[0]))
>>>
DatetimeIndex(['2020-07-01', '2020-07-02', '2020-07-03', '2020-07-04'], dtype='datetime64[ns]', freq=None)
TimedeltaIndex(['0 days', '1 days', '2 days', '3 days'], dtype='timedelta64[ns]', freq=None)     <class 'pandas.core.indexes.timedeltas.TimedeltaIndex'>

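The date_range() function

To make it convenient to create regular sequences, Pandas provides several functions:

pd.date_range() for timestamps
pd.period_range() for periods
pd.timedelta_range() for time deltas

Similar to np.arange(), pd.date_range() builds a sequence from a start date, an end date, and a frequency code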

Date_1=pd.date_range(start='20200701',end='20200712',freq='D')
Date_2=pd.date_range(start='20200701 12:00',end='20200701 20:00',freq='H')
print(Date_1)
print(Date_2)
>>>
DatetimeIndex(['2020-07-01', '2020-07-02', '2020-07-03', '2020-07-04',
               '2020-07-05', '2020-07-06', '2020-07-07', '2020-07-08',
               '2020-07-09', '2020-07-10', '2020-07-11', '2020-07-12'],
              dtype='datetime64[ns]', freq='D')
DatetimeIndex(['2020-07-01 12:00:00', '2020-07-01 13:00:00',
               '2020-07-01 14:00:00', '2020-07-01 15:00:00',
               '2020-07-01 16:00:00', '2020-07-01 17:00:00',
               '2020-07-01 18:00:00', '2020-07-01 19:00:00',
               '2020-07-01 20:00:00'],
              dtype='datetime64[ns]', freq='H')

Alternatively, we can leave out the end date and give only a start date, a number of periods, and a frequency code

Date_1=pd.date_range(start='20200701',periods=8,freq='D')
Date_2=pd.date_range(start='20200701 12:00',periods=8,freq='H')
print(Date_1)
print(Date_2)
>>>
DatetimeIndex(['2020-07-01', '2020-07-02', '2020-07-03', '2020-07-04',
               '2020-07-05', '2020-07-06', '2020-07-07', '2020-07-08'],
              dtype='datetime64[ns]', freq='D')
DatetimeIndex(['2020-07-01 12:00:00', '2020-07-01 13:00:00',
               '2020-07-01 14:00:00', '2020-07-01 15:00:00',
               '2020-07-01 16:00:00', '2020-07-01 17:00:00',
               '2020-07-01 18:00:00', '2020-07-01 19:00:00'],
              dtype='datetime64[ns]', freq='H')
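date_range() accepts other combinations of start, end, periods, and freq as well. The following sketch, with illustrative dates, shows two of them: counting backwards from an end date, and asking for a fixed number of evenly spaced points between two dates.

```python
import pandas as pd

# End date plus a number of periods: the range is counted backwards from the end.
last_three = pd.date_range(end='2020-07-12', periods=3, freq='D')
print(last_three)

# Start, end, and periods without freq: evenly spaced points between the two dates.
evenly_spaced = pd.date_range('2020-07-01', '2020-07-05', periods=3)
print(evenly_spaced)
```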

The period_range() and timedelta_range() functions

To create regular sequences of periods or time deltas, use period_range() or timedelta_range(). Note that the start of timedelta_range() is a duration such as '0 days'; a date-like string such as '20200701' is not interpreted as a date there

Date_1=pd.period_range(start='20200701',periods=8,freq='D')
Date_2=pd.timedelta_range(start='0 days',periods=8,freq='D')
print(Date_1)
print(Date_2)
>>>
PeriodIndex(['2020-07-01', '2020-07-02', '2020-07-03', '2020-07-04',
             '2020-07-05', '2020-07-06', '2020-07-07', '2020-07-08'],
            dtype='period[D]', freq='D')
TimedeltaIndex(['0 days', '1 days', '2 days', '3 days', '4 days', '5 days',
                '6 days', '7 days'],
               dtype='timedelta64[ns]', freq='D')

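Pandas frequencies and offsets

All of the convenience functions above accept a freq argument that sets the spacing between the generated values

Frequency codes can also be combined to get exactly the spacing we want

The frequency codes supported by freq are listed below. These are the aliases in use when this article was written; pandas 2.2 and later deprecate several of them (for example H, T, and S in favor of h, min, and s, and M, Q, and A in favor of ME, QE, and YE)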

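Code	Description
D	calendar day
W	weekly
M	month end
Q	quarter end
A	year end
H	hours
T	minutes
S	seconds
L	milliseconds
U	microseconds
N	nanoseconds
B	business day
BM	business month end
BQ	business quarter end
BA	business year end
BH	business hours
MS	month start
BMS	business month start
QS	quarter start
BQS	business quarter start
AS	year start
BAS	business year start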
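In addition, a three-letter month abbreviation can be appended to a quarterly or annual code to change the month it is anchored to

For example: Q-JAN, BQ-FEB, AS-MAR

Codes can also be combined with numbers to build new frequencies

For example: 2H30T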

Date_1=pd.date_range('20200701 8:00',periods=3,freq='QS-Apr')
Date_2=pd.date_range('20200701 8:00',periods=3,freq='2H30T')
Date_3=pd.date_range('20200701 8:00',periods=3,freq='1D2H30T')
print(Date_1)
print(Date_2)
print(Date_3)
>>>
DatetimeIndex(['2020-07-01 08:00:00', '2020-10-01 08:00:00',
               '2021-01-01 08:00:00'],
              dtype='datetime64[ns]', freq='QS-APR')
DatetimeIndex(['2020-07-01 08:00:00', '2020-07-01 10:30:00',
               '2020-07-01 13:00:00'],
              dtype='datetime64[ns]', freq='150T')
DatetimeIndex(['2020-07-01 08:00:00', '2020-07-02 10:30:00',
               '2020-07-03 13:00:00'],
              dtype='datetime64[ns]', freq='1590T')
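Frequencies do not have to be given as strings. As a sketch that avoids alias spelling differences between pandas versions, the freq argument also accepts offset objects from pd.offsets and Timedelta values; the dates and variable names here are illustrative.

```python
import pandas as pd

# Business-day offset: weekends are skipped (2020-07-01 is a Wednesday).
business_days = pd.date_range('2020-07-01', periods=5, freq=pd.offsets.BDay())
print(business_days)

# A Timedelta works as a frequency too: 2 hours 30 minutes between points.
step = pd.Timedelta(hours=2, minutes=30)
every_150_min = pd.date_range('2020-07-01 08:00', periods=3, freq=step)
print(every_150_min)
```

The second range is the same sequence the 2H30T string produces in the example above.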

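Resampling, shifting, and windowing in Pandas

The sections below use the history of Google's stock price to illustrate resampling, shifting, and windowing in Pandas

Getting the Google stock price data

pandas-datareader is a package built on Pandas that can fetch financial data from a number of sources, including Yahoo Finance, Google Finance, and others

However, because the script pandas-datareader used to download data from Yahoo Finance had problems and often returned incorrect data, we also need the fix-yahoo-finance library (since renamed yfinance, which is what the code below imports) to work around this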

import  pandas_datareader.data as web
import yfinance as yf
import datetime
yf.pdr_override()


google=web.get_data_yahoo('GOOGL',start='2004-08-19',end='2020-07-31',data_source='google')
print(google.head())
print(type(google))
>>>
[*********************100%***********************]  1 of 1 completed
                 Open       High        Low      Close  Adj Close    Volume
Date                                                                       
2004-08-19  50.050049  52.082081  48.028027  50.220219  50.220219  44659000
2004-08-20  50.555557  54.594593  50.300301  54.209209  54.209209  22834300
2004-08-23  55.430431  56.796795  54.579578  54.754753  54.754753  18256100
2004-08-24  55.675674  55.855854  51.836838  52.487488  52.487488  15247300
2004-08-25  52.532532  54.054054  51.991993  53.053055  53.053055   9188600
<class 'pandas.core.frame.DataFrame'>

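Because every run fetches the data from the web, it is fairly slow. We can instead call the DataFrame's to_csv() method to save the downloaded data in the current directory and simply read it back later. Note that after each read from the CSV file the index has to be converted to a DatetimeIndex again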

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import  pandas_datareader.data as web
import yfinance as yf
import datetime
yf.pdr_override()


google=web.get_data_yahoo('GOOGL',start='2004-08-19',end='2020-07-31',data_source='google')
google.to_csv('GoogleStock.csv')

Next we take the closing price (selecting a single column of a DataFrame returns a Series) and call the Series plot method to draw it

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import  pandas_datareader.data as web
import yfinance as yf
import datetime
yf.pdr_override()


google=web.get_data_yahoo('GOOGL',start='2004-08-19',end='2020-07-31',data_source='google')
print(google.head())
print(type(google))
google['Close'].plot()
plt.xlabel('Date')
plt.ylabel('Price')
plt.title('Price of the GOOGLE Stock')
plt.show()
>>>
[*********************100%***********************]  1 of 1 completed
                 Open       High        Low      Close  Adj Close    Volume
Date                                                                       
2004-08-19  50.050049  52.082081  48.028027  50.220219  50.220219  44659000
2004-08-20  50.555557  54.594593  50.300301  54.209209  54.209209  22834300
2004-08-23  55.430431  56.796795  54.579578  54.754753  54.754753  18256100
2004-08-24  55.675674  55.855854  51.836838  52.487488  52.487488  15247300
2004-08-25  52.532532  54.054054  51.991993  53.053055  53.053055   9188600
<class 'pandas.core.frame.DataFrame'>

[Figure: GOOGL closing price over time]

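Resampling and frequency conversion

When working with time-series data we often need to resample at a new frequency. For example, the Google stock prices above are at daily frequency; to work at monthly frequency we would resample them month by month

Resampling can be done with the resample() method or the asfreq() method. resample() is fundamentally a data aggregation: it groups the values that fall into each new interval and we then choose the aggregate to apply, such as mean(). asfreq() is fundamentally a data selection: it picks the value at each new time point, for example the value on the last day of each interval

Below, both methods are used to down-sample the data, here to the end of each business year (frequency code BA)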

google=pd.read_csv('GoogleStock.csv')
DatetimeIndex_1=pd.to_datetime(google['Date'])
google.index=DatetimeIndex_1
del google['Date']
google_close=google['Close']
google_close.plot(alpha=0.5,style='g-')
google_close.resample('BA').mean().plot(style='b:')
google_close.asfreq('BA').plot(style='r--')
plt.legend(['Original Curve','Resample Curve','Asfreq Curve'],loc='upper left')
plt.xlabel('Date')
plt.ylabel('Price')
plt.title('Price of the GOOGLE Stock')
plt.show()

[Figure: original closing price together with the resample('BA') mean and asfreq('BA') curves]
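The difference between aggregation and selection is easier to see on a tiny series than on the stock data. This sketch uses made-up daily values over two full weeks and the weekly code W (weeks ending on Sunday); the variable names are illustrative.

```python
import pandas as pd

# Two weeks of made-up daily values, starting on Monday 2020-07-06.
index = pd.date_range('2020-07-06', periods=14, freq='D')
daily = pd.Series(range(14), index=index, dtype='float64')

# resample aggregates: the mean of all seven values in each week.
weekly_mean = daily.resample('W').mean()

# asfreq selects: only the value on the last day (Sunday) of each week.
weekly_last = daily.asfreq('W')

print(weekly_mean)
print(weekly_last)
```

Both results are labeled with the same Sunday dates, but the first holds weekly averages while the second holds the single value observed on each Sunday.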

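Next we up-sample. Up-sampling introduces missing values, so asfreq() takes a method argument that specifies how to fill them

Below we resample the business-day data at daily frequency (so weekends are included) and compare forward-fill with backward-fill. The code uses Matplotlib axes (ax), which are covered in the Matplotlib material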

google=pd.read_csv('GoogleStock.csv')
DatetimeIndex_1=pd.to_datetime(google['Date'])
google.index=DatetimeIndex_1
del google['Date']
google_close=google['Close']
fig, ax=plt.subplots(2,sharex=True)

data=google_close.iloc[:10]
data.asfreq('D').plot(ax=ax[0],marker='o')
data.asfreq('D',method='bfill').plot(ax=ax[1],style='-o')
data.asfreq('D',method='ffill').plot(ax=ax[1],style='--o')
ax[1].legend(['Back-fill','Forward-fill'])
plt.show()

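[Figure: top panel, the first 10 closing prices up-sampled to daily frequency with gaps on non-trading days; bottom panel, back-fill versus forward-fill]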
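The filling behavior can be verified on two made-up prices, one on a Friday and one on the following Monday. This is a minimal sketch with illustrative names, not the stock data used above.

```python
import pandas as pd

# Friday 2020-07-03 and Monday 2020-07-06; the weekend is missing.
index = pd.to_datetime(['2020-07-03', '2020-07-06'])
prices = pd.Series([1.0, 2.0], index=index)

with_gaps = prices.asfreq('D')                  # weekend rows appear as NaN
forward = prices.asfreq('D', method='ffill')    # weekend takes Friday's value
backward = prices.asfreq('D', method='bfill')   # weekend takes Monday's value

print(with_gaps)
print(forward)
print(backward)
```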

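Time shifts

Another common time-series operation in Pandas is shifting data in time, that is, changing the time associated with the data

Pandas has two closely related methods for this: shift() and tshift(). Note that tshift() was deprecated in pandas 1.1 and removed in pandas 2.0; shift() with a freq argument does the same job

shift() shifts the data, while tshift() shifts the index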

google=pd.read_csv('GoogleStock.csv')
DatetimeIndex_1=pd.to_datetime(google['Date'])
google.index=DatetimeIndex_1
del google['Date']
google_close=google['Close']
fig, ax=plt.subplots(3,sharex=True)

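google_close=google_close.asfreq('D',method='pad')                  # fill non-trading days so missing values do not affect the shift

google_close.plot(ax=ax[0])
google_close.shift(900).plot(ax=ax[1])                              # shift the data by 900 rows
google_close.shift(900,freq='D').plot(ax=ax[2])                     # shift the index by 900 days (what tshift(900) did)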

local_max=pd.to_datetime('2007-11-05')
offset=pd.Timedelta(900,'D')

ax[0].legend(['Original Curve'],loc=2)
ax[0].get_xticklabels()[4].set(weight='heavy',color='red')
ax[0].axvline(local_max,alpha=0.3,color='red')

ax[1].legend(['Shift(900)'],loc=2)
ax[1].get_xticklabels()[4].set(weight='heavy',color='red')
ax[1].axvline(local_max+offset,alpha=0.3,color='red')


ax[2].legend(['Tshift(900)'],loc=2)
ax[2].get_xticklabels()[1].set(weight='heavy',color='red')
ax[2].axvline(local_max+offset,alpha=0.3,color='red')

plt.show()

[Figure: three panels showing the original curve, shift(900), and tshift(900)]
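The two kinds of shift are easy to tell apart on a three-element series. In this sketch (made-up values, illustrative names) the index shift is written as shift() with a freq argument, which also works on pandas versions where tshift() is no longer available.

```python
import pandas as pd

index = pd.date_range('2020-07-01', periods=3, freq='D')
s = pd.Series([1.0, 2.0, 3.0], index=index)

# Shift the data: the index stays put, values slide down, a NaN appears at the top.
moved_values = s.shift(1)

# Shift the index: values stay put, every date moves forward by one day.
moved_index = s.shift(1, freq='D')

print(moved_values)
print(moved_index)
```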

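Rolling windows

The third kind of time-series operation in Pandas is rolling statistics, which are computed with the rolling() method of Series and DataFrame objects

rolling() returns an object similar to the one returned by a groupby operation, on which the usual aggregations such as mean() and std() are available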

google=pd.read_csv('GoogleStock.csv')
DatetimeIndex_1=pd.to_datetime(google['Date'])
google.index=DatetimeIndex_1
del google['Date']
google_close=google['Close']

google_close=google_close.asfreq('D',method='pad')    

rolling=google_close.rolling(365,center=True)
data=pd.DataFrame({'Origin':google_close,'One-year Rolling Mean':rolling.mean(),'One-year Rolling Std':rolling.std()})
ax=data.plot(style=['-','--',':'])
ax.lines[0].set_alpha(0.3)
plt.show()
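To see what a rolling window computes, it helps to use a window small enough to check by hand. This sketch uses five made-up values and a window of 3, comparing the default trailing window with the centered window used above; the names are illustrative.

```python
import pandas as pd

index = pd.date_range('2020-07-01', periods=5, freq='D')
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=index)

# Trailing window: each value is the mean of the current and two previous rows.
trailing = s.rolling(3).mean()

# Centered window: each value is the mean of the previous, current, and next row.
centered = s.rolling(3, center=True).mean()

print(trailing)
print(centered)
```

Rows without a full window are NaN: the first two for the trailing window, and the first and last for the centered one.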