pandas time series

zk仔's blog

Category: python_数据分析 (Python data analysis)

Article link: https://blog.csdn.net/weixin_39532362/article/details/86553075


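The datetime module
- Date and time data types
- Converting between strings and datetime
Time series basics
Time zone handling
- Time zone conversion
- Timestamp objects and time zones
Periods
Plotting time series
- Moving window functions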

The datetime module

Date and time data types

import datetime

now = datetime.datetime.now()  # current date and time
now.year
now.day
now.second

delta = datetime.timedelta(100, 10)  # 100 days and 10 seconds
delta.days     # 100
delta.seconds  # 10

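Data types in the datetime module

Type	Description
date	Stores a calendar date (year, month, day)
time	Stores a time of day (hour, minute, second, microsecond)
datetime	Stores both date and time (year, month, day, hour, minute, second, microsecond)
timedelta	The difference between two datetime values (days, seconds, microseconds)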
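The relationship between datetime and timedelta in the table above can be sketched as follows. The specific dates are illustrative values chosen for this example, not taken from the original notes.

```python
import datetime

# Two arbitrary example instants (assumed values for illustration).
start = datetime.datetime(2011, 1, 7)
end = datetime.datetime(2011, 4, 17, 0, 0, 10)

# Subtracting two datetime values yields a timedelta.
delta = end - start
print(delta.days, delta.seconds)

# Adding a timedelta to a datetime yields a new, shifted datetime.
shifted = start + datetime.timedelta(days=100, seconds=10)
print(shifted)
```

A timedelta only stores days, seconds and microseconds, so larger units such as hours are folded into those three fields.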

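Converting between strings and datetime

import datetime
import pandas as pd
from dateutil.parser import parse

# datetime to string
stamp = datetime.datetime(2011, 1, 3)
str(stamp)
stamp.strftime('%Y-%m-%d')


# string to datetime
value = '2011-01-03'
values = ['2011-01-03', '2012-02-06']
datetime.datetime.strptime(value, '%Y-%m-%d')

parse(value)  # dateutil can parse most human-readable date formats
parse('6/12/2011', dayfirst=True)  # day comes first: 6 December 2011

print(pd.to_datetime(value))   # a single Timestamp
print(pd.to_datetime(values))  # returns a DatetimeIndex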

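datetime format codes

Code	Description
%Y	Four-digit year
%y	Two-digit year
%m	Two-digit month
%d	Two-digit day
%H	Hour on the 24-hour clock, two digits
%I	Hour on the 12-hour clock, two digits
%M	Two-digit minute
%S	Two-digit second
%w	Weekday as an integer, 0 is Sunday
%U	Week number of the year with Sunday as the first day of the week; days before the first Sunday are week 0
%W	Week number of the year with Monday as the first day of the week; days before the first Monday are week 0
%z	UTC time zone offset as +HHMM or -HHMM
%F	Shorthand for %Y-%m-%d
%D	Shorthand for %m/%d/%y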

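Locale-specific date formats

Code	Description
%a	Abbreviated weekday name
%A	Full weekday name
%b	Abbreviated month name
%B	Full month name
%c	Full date and time
%p	Locale equivalent of AM or PM
%x	Locale-appropriate date format
%X	Locale-appropriate time format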
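As a quick check of the format codes listed above, here is a minimal round trip between a datetime and a string. Only locale-independent codes are used, and the timestamp is an arbitrary example value.

```python
import datetime

stamp = datetime.datetime(2011, 1, 3, 14, 5, 9)

# Format with explicit codes, then parse the text back with the same pattern.
pattern = '%Y-%m-%d %H:%M:%S'
text = stamp.strftime(pattern)
parsed = datetime.datetime.strptime(text, pattern)

# %w gives the weekday as a digit (0 is Sunday); %y and %I are zero-padded.
weekday = stamp.strftime('%w')
short = stamp.strftime('%y %I')

print(text, weekday, short)
```

The locale-specific codes such as %a, %B and %c are left out here because their output depends on the environment the script runs in.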

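Time series basics

A Series indexed by datetime.datetime values gets a DatetimeIndex (old pandas versions called such a Series a TimeSeries).
pandas stores timestamps using NumPy's datetime64 type; check with ts.index.dtype.
The scalar values in a DatetimeIndex are pandas Timestamp objects, which can stand in for datetime objects wherever one is needed.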

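Indexing, selection, subsetting, construction

Labels can be given as date strings, datetime objects, or Timestamps.
Any string that can be parsed as a date works: ts['1/10/2012']
Passing only a year, or a year and month, selects that whole span: ts['2011'], ts['2011-09']
Slicing with dates works like slicing a regular Series: ts[datetime(2011,1,7):]
A slice may use timestamps that do not appear in the index to select a range.
ts.truncate(after='1/9/2001') drops everything after the given date.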
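The selection rules above can be tried on a small synthetic series. This is a sketch: the series ts and its values are made up for the example, and .loc is used for the string lookups because it behaves the same way across pandas versions.

```python
from datetime import datetime

import numpy as np
import pandas as pd

# 400 consecutive days starting on 1 Jan 2011 (synthetic example data).
dates = pd.date_range('2011-01-01', periods=400, freq='D')
ts = pd.Series(np.arange(400), index=dates)

year_2011 = ts.loc['2011']               # every observation in 2011
sept_2011 = ts.loc['2011-09']            # every observation in September 2011
tail = ts.loc[datetime(2012, 1, 7):]     # slice from a datetime onwards
head = ts.truncate(after='2011-01-09')   # drop everything after 9 Jan 2011

print(len(year_2011), len(sept_2011), len(tail), len(head))
```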

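Time series with duplicate indices

ts.index.is_unique returns False when the index contains duplicate timestamps.
Aggregate over non-unique timestamps with ts.groupby(level=0).mean()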
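A small sketch of the duplicate-timestamp case described above. The name dup_ts and its values are invented for the example.

```python
import pandas as pd

# 2 Jan 2000 appears twice in the index.
dates = pd.DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-02', '2000-01-03'])
dup_ts = pd.Series([0.0, 1.0, 3.0, 4.0], index=dates)

print(dup_ts.index.is_unique)

# level=0 groups on the index itself, so each timestamp becomes one group.
means = dup_ts.groupby(level=0).mean()
counts = dup_ts.groupby(level=0).count()
print(means)
print(counts)
```

After the aggregation the result has one row per distinct timestamp, so its index is unique again.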

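Date ranges, frequencies, and shifting

Generating date ranges

index = pd.date_range('4/1/2012', '6/1/2013')  # daily by default
index = pd.date_range(start='4/1/2012', periods=20)  # 20 days forward from the start
index = pd.date_range(end='4/1/2012', periods=20)  # 20 days ending on the end date

index = pd.date_range(start='4/1/2012 12:56:31', periods=20)  # the time of day is preserved

index = pd.date_range(start='4/1/2012 12:56:31', periods=20, normalize=True)  # normalize the times to midnight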

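Frequencies and date offsets

Frequency types

from pandas.tseries.offsets import Hour, Minute
hour = Hour()
hour = Hour(4)
Hour(2) + Minute(30)  # offsets can be added; the result is expressed in the smaller unit (150 minutes)

index = pd.date_range(start='4/1/2012 12:56:31', periods=20, freq='4h')  # every 4 hours
index = pd.date_range(start='4/1/2012 12:56:31', periods=20, freq='1h30min')  # every 1 hour 30 minutes
index = pd.date_range('1/1/2012', '1/1/2013', freq='BM')  # last business day of each month (spelled 'BME' in pandas 2.2 and later)
index = pd.date_range('1/1/2012', '1/1/2013', freq='WOM-3FRI')  # third Friday of each month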
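To make the frequency examples above concrete, the sketch below builds a 90-minute range from offset objects and lists the third Fridays of the first quarter of 2012. Passing an offset object rather than an alias string is a choice made for this example.

```python
import pandas as pd
from pandas.tseries.offsets import Hour, Minute

# Offsets add up; the sum can be passed directly as freq.
step = Hour(1) + Minute(30)
index = pd.date_range('2012-04-01 12:00', periods=4, freq=step)

# 'WOM-3FRI' means week-of-month: the third Friday of every month.
third_fridays = pd.date_range('2012-01-01', '2012-04-01', freq='WOM-3FRI')

print(index)
print(third_fridays)
```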

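Shifting

from datetime import datetime
from pandas.tseries.offsets import MonthEnd

ts / ts.shift(1) - 1  # percent change; a positive shift moves the data down (later), a negative shift moves it up (earlier)

ts.shift(2, freq='M')  # with freq, the timestamps move instead of the data (positive: later, negative: earlier), so no NA values appear

now = datetime.now()
now + MonthEnd()   # last day of the current month
now + MonthEnd(2)  # last day of next month

MonthEnd().rollforward(now)  # last day of the current month
MonthEnd().rollback(now)     # last day of the previous month

ts.groupby(MonthEnd().rollforward).mean()  # group by month and aggregate
ts.resample('M').mean()  # the same via resample (the old how='mean' argument was removed from pandas)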
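The difference between shifting the data and shifting the timestamps is easier to see on a tiny series. The values and dates below are made up for illustration.

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

ts = pd.Series([1.0, 2.0, 4.0, 8.0],
               index=pd.date_range('2000-01-15', periods=4, freq='D'))

lagged = ts.shift(1)            # data moves down one row; the first value becomes NaN
pct = ts / ts.shift(1) - 1      # period-over-period change
moved = ts.shift(2, freq='D')   # timestamps move two days later; the data is untouched

stamp = pd.Timestamp('2011-11-17')
end_this = stamp + MonthEnd()            # end of the current month
end_next = stamp + MonthEnd(2)           # end of the following month
end_prev = MonthEnd().rollback(stamp)    # end of the previous month

print(pct)
print(moved)
print(end_this, end_next, end_prev)
```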

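Base time series frequencies

Alias	Offset type	Description
D	Day	Every calendar day
B	BusinessDay	Every business day
H	Hour	Hourly
T or min	Minute	Every minute
S	Second	Every second
L or ms	Milli	Every millisecond (1/1,000 of a second)
U	Micro	Every microsecond (1/1,000,000 of a second)
M	MonthEnd	Last calendar day of each month
BM	BusinessMonthEnd	Last business day of each month
MS	MonthBegin	First calendar day of each month
BMS	BusinessMonthBegin	First business day of each month
W-MON	Week	Weekly, on the given day of the week (MON, TUE, WED, THU, FRI, SAT, SUN)
WOM-1MON	WeekOfMonth	The given weekday in the given week of each month, e.g. WOM-3FRI is the third Friday
Q-JAN	QuarterEnd	Quarterly, for a year ending in the given month; dates fall on the last calendar day of each quarter (JAN, FEB, MAR, APR, MAY, JUN, JUL, AUG, SEP, OCT, NOV, DEC)
BQ-JAN	BusinessQuarterEnd	Quarterly, for a year ending in the given month; dates fall on the last business day of each quarter
QS-JAN	QuarterBegin	Quarterly, with the quarter cycle anchored on the given month; dates fall on the first calendar day of each quarter
BQS-JAN	BusinessQuarterBegin	Quarterly, with the quarter cycle anchored on the given month; dates fall on the first business day of each quarter
A-JAN	YearEnd	Annual, on the last calendar day of the given month
BA-JAN	BusinessYearEnd	Annual, on the last business day of the given month
AS-JAN	YearBegin	Annual, on the first calendar day of the given month
BAS-JAN	BusinessYearBegin	Annual, on the first business day of the given month

Note: pandas 2.2 and later deprecate several of these spellings in favor of h, min, s, ms, us, ME, BME, QE, YE and YS.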

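Time zone handling

When series in different time zones are combined, the result is in UTC, because timestamps are stored in UTC internally.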

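Time zone conversion

tz_localize and tz_convert are instance methods of DatetimeIndex (a Series with a DatetimeIndex offers them too).

import pytz

pytz.common_timezones[-5:]  # list some time zone names

pytz.timezone('US/Eastern')  # get a time zone object

index = pd.date_range('3/9/2012', periods=10, freq='D', tz='UTC')  # create a time-zone-aware index

ts.tz_localize('UTC')  # localize: attach a time zone to naive timestamps

ts.tz_convert('US/Eastern')  # convert already-localized timestamps to another time zone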
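The localize-then-convert sequence can be sketched end to end as below. The series is synthetic, and the example assumes the 'US/Eastern' zone name is available in the installed time zone database.

```python
import numpy as np
import pandas as pd

index = pd.date_range('2012-03-09 09:30', periods=3, freq='D')
ts = pd.Series(np.arange(3.0), index=index)    # naive: no time zone attached

ts_utc = ts.tz_localize('UTC')                 # attach UTC without changing the clock time
ts_eastern = ts_utc.tz_convert('US/Eastern')   # same instants, shown in US Eastern time

# Combining series in different zones aligns on the underlying instants;
# the result is indexed in UTC.
combined = ts_utc + ts_eastern

print(ts_eastern)
print(combined.index.tz)
```

US daylight saving time began on 11 March 2012, so the converted index shows 04:30 for the first two days and 05:30 for the third.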

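Timestamp objects and time zones

from pandas.tseries.offsets import Hour

stamp = pd.Timestamp('2011-03-12 04:00')  # naive timestamp
stamp_utc = stamp.tz_localize('UTC')  # same as pd.Timestamp('2011-03-12 04:00', tz='UTC'); localizing an already aware stamp raises TypeError
stamp_utc.tz_convert('US/Eastern')
stamp_utc.value  # nanoseconds since 1970-01-01 (UTC)
stamp_utc + Hour()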

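Periods

pd.Period('2007', freq='A-DEC')  # create a Period object (calendar year 2007)

pd.period_range('2006', '2009', freq='A-DEC')  # a range of periods; returns a PeriodIndex

pd.PeriodIndex(['2001Q3', '2002Q2', '2003Q3'], freq='Q-DEC')  # build a PeriodIndex from strings

pd.PeriodIndex(year=data.year, quarter=data.quarter, freq='Q-DEC')  # combine separate year and quarter columns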

Period frequency conversion

Frequency conversion for PeriodIndex and time series

Quarterly period frequencies

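Converting between Timestamp and Period

ts.to_period('M')  # timestamps to monthly periods

ts.to_period('M').to_timestamp(how='end')  # periods back to timestamps; how is 'start' or 'end'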
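The period sections above list the calls without output, so here is a minimal sketch of period frequency conversion with asfreq and of the timestamp/period round trip. The data is invented, and how='start' is used in the last step only because it gives round midnight values.

```python
import pandas as pd

# Frequency conversion: a quarter expressed as its first or last month.
p = pd.Period('2007Q4', freq='Q-DEC')
first_month = p.asfreq('M', how='start')
last_month = p.asfreq('M', how='end')

# Timestamps -> monthly periods -> timestamps again.
ts = pd.Series([1, 2, 3],
               index=pd.to_datetime(['2000-01-31', '2000-02-29', '2000-03-31']))
pts = ts.to_period('M')
back = pts.to_timestamp(how='start')

print(first_month, last_month)
print(pts.index)
print(back.index)
```

Converting to periods discards the day within the month, so the round trip lands on the start (or end) of each month rather than on the original dates.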

Resampling and frequency conversion

Downsampling
Upsampling
convention
OHLC resampling

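Parameters of the resample method

Parameter	Description
rule	String or DateOffset giving the target frequency
how	Function name or list of functions producing the aggregated values, e.g. 'mean', 'ohlc', np.max; defaults to 'mean'; others include first, last, median, max, min
axis	Axis to resample on, default 0
fill_method	How to fill when upsampling: 'ffill' or 'bfill'
closed	Which end of each bin is closed (inclusive); current pandas defaults to 'left' for most frequencies and 'right' for month-end, quarter-end, year-end and weekly frequencies
label	Which bin edge labels the aggregated result; same defaults as closed
loffset	Time adjustment applied to the bin labels, e.g. '-1s' or Second(-1)
limit	Maximum number of periods to fill
kind	Aggregate to periods ('period') or timestamps ('timestamp'); defaults to the type of the input index
convention	When converting periods from low to high frequency, which end to use: 'start' or 'end'; current pandas defaults to 'start'

Note: how, fill_method, limit and loffset belong to the old resample signature. Current pandas chains a method instead, e.g. ts.resample(rule).mean() or ts.resample(rule).ffill(limit=2).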
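The effect of closed and label, together with OHLC aggregation and upsampling with a fill limit, can be sketched on twelve minutes of synthetic data. The method-chaining form is used here; the how and fill_method arguments from the table are the older spelling of the same operations.

```python
import numpy as np
import pandas as pd

index = pd.date_range('2000-01-01', periods=12, freq='min')
ts = pd.Series(np.arange(12), index=index)

# Downsample one-minute data into five-minute bins.
left = ts.resample('5min', closed='left', label='left').sum()
right = ts.resample('5min', closed='right', label='right').sum()

# Open, high, low and close of each bin.
bars = ts.resample('5min').ohlc()

# Upsample back to one minute, forward-filling at most two rows per gap.
filled = left.resample('min').ffill(limit=2)

print(left)
print(right)
print(bars)
print(filled)
```

With closed='right' the 00:00 observation falls into its own bin ending at 00:00, which is why that result has four rows instead of three.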

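Plotting time series

df.loc['2009'].plot()  # .ix was removed from pandas; .loc takes the same date strings
ser.loc['01-2001':'03-2011'].plot()

Moving window functions

# 250-day moving average
ser.rolling(250).mean().plot()
# 250-day rolling standard deviation
ser.rolling(250, min_periods=10).std()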

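Moving window and exponentially weighted functions

These are the old top-level function names; later pandas versions expose the same operations as methods of ser.rolling(window) and ser.ewm(...), e.g. ser.rolling(250).mean().

Function	Description
rolling_count	Number of non-NA observations in each window
rolling_sum	Moving window sum
rolling_mean	Moving window mean
rolling_median	Moving window median
rolling_var, rolling_std	Moving window variance and standard deviation
rolling_skew, rolling_kurt	Moving window skewness and kurtosis
rolling_min, rolling_max	Moving window minimum and maximum
rolling_quantile	Value at the given percentile or sample quantile of the window
rolling_corr, rolling_cov	Moving window correlation and covariance
rolling_apply	Apply a generic array function over a moving window
ewma	Exponentially weighted moving average
ewmvar, ewmstd	Exponentially weighted moving variance and standard deviation
ewmcorr, ewmcov	Exponentially weighted moving correlation and covariance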
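For readers on a current pandas release, the sketch below shows method-based equivalents of three entries from the table. The window size of 3 and the series ser are illustrative choices, far smaller than the 250-day window used earlier.

```python
import numpy as np
import pandas as pd

ser = pd.Series(np.arange(10.0),
                index=pd.date_range('2000-01-03', periods=10, freq='B'))

ma = ser.rolling(3).mean()                 # counterpart of rolling_mean(ser, 3)
sd = ser.rolling(3, min_periods=2).std()   # counterpart of rolling_std(ser, 3, min_periods=2)
ew = ser.ewm(span=3).mean()                # counterpart of ewma(ser, span=3)

print(ma)
print(sd)
print(ew)
```

Without min_periods a window yields NaN until it holds the full number of observations, which is why the first two moving averages are missing.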
