Time Series Processing in Python with the datetime and pandas Modules


Reclusiveman


Category: Machine Learning. Tags: time series, datetime, pandas, sliding window, resampling

Article link: https://blog.csdn.net/qq_40707407/article/details/81916339


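Working with time-typed data used to give me a headache every time. I finally ran out of patience and sat down to learn it properly, and it turned out not to be that complicated: a handful of basic methods covers most of the work. I will go through those methods one by one below.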

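The datetime module

# Import the datetime class from the datetime module
from datetime import datetime

# Get the current time
now = datetime.now()
print(now)

# Formatted output
print('Year: {}, Month: {}, Day: {}'.format(now.year, now.month, now.day))

# Compute the difference between two datetimes (the result is a timedelta)
diff = datetime(2018, 8, 20, 17) - datetime(2017, 7, 20, 15)
print(diff)  # 396 days, 2:00:00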
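
The difference computed above is a `timedelta` object, and it can be useful to pull individual pieces out of it. The following is a small sketch using only the standard library; the variable names `start`, `end` and `one_week_later` are my own and not part of the original example.

```python
from datetime import datetime, timedelta

start = datetime(2017, 7, 20, 15)
end = datetime(2018, 8, 20, 17)
diff = end - start            # subtracting two datetimes gives a timedelta

print(diff.days)              # whole days in the difference
print(diff.seconds)           # leftover seconds after the whole days
print(diff.total_seconds())   # the entire difference expressed in seconds

# A timedelta can also be added to a datetime to move along the timeline
one_week_later = start + timedelta(weeks=1)
print(one_week_later)
```

Note that `diff.seconds` only holds the part smaller than one day, so `total_seconds()` is usually the safer choice when a single number is needed.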

Converting between strings and datetime

from datetime import datetime

# datetime to string
dt_obj = datetime(2018, 8, 20)
str_obj = str(dt_obj)
print(type(str_obj))
print(str_obj)

# string to datetime
dt_str = '2018-08-20'
dt_obj2 = datetime.strptime(dt_str, '%Y-%m-%d')  # the string must match this format, otherwise ValueError is raised
print(type(dt_obj2))
print(dt_obj2)
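
`str()` always produces the default layout. When a specific layout is needed, `strftime` is the counterpart of `strptime`. Here is a minimal round-trip sketch; the format string and the variable names are illustrative choices of mine.

```python
from datetime import datetime

dt = datetime(2018, 8, 20, 17, 30)

# strftime: datetime -> string with an explicit layout
text = dt.strftime('%Y/%m/%d %H:%M')
print(text)

# strptime: string -> datetime, using the same layout
restored = datetime.strptime(text, '%Y/%m/%d %H:%M')
print(restored == dt)

# A string that does not match the format raises ValueError
try:
    datetime.strptime('20-08-2018', '%Y-%m-%d')
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
print(mismatch_detected)
```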

Parsing with dateutil

# Import the parser
from dateutil.parser import parse
dt_str2 = '2018/08/20'  # many common date formats can be parsed without a format string
dt_obj3 = parse(dt_str2)
print(type(dt_obj3))
print(dt_obj3)
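
Because `parse` guesses the format, ambiguous strings deserve some care. The sketch below, with input strings I made up for illustration, shows how the `dayfirst` argument changes the interpretation of a date such as `03/04/2018`.

```python
from datetime import datetime
from dateutil.parser import parse

# Different layouts are recognised without a format string
print(parse('2018/08/20'))
with_time = parse('Aug 20, 2018 5:30 PM')
print(with_time)

# '03/04/2018' is ambiguous: March 4th or April 3rd?
ambiguous = '03/04/2018'
month_first = parse(ambiguous)               # default reading: month/day/year
day_first = parse(ambiguous, dayfirst=True)  # forced reading: day/month/year
print(month_first, day_first)
```

If the layout of the input is known in advance, `datetime.strptime` with an explicit format is the more predictable option.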

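datetime in pandas

# Convert a group of strings to datetime values
import pandas as pd
s_obj = pd.Series(['2018/08/18', '2018/08/19', '2018-08-25', '2018-08-26'])
# The strings mix two separators; pandas >= 2.0 needs format='mixed' for that
# (on older versions, drop the format argument)
s_obj2 = pd.to_datetime(s_obj, format='mixed')
print(s_obj2)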
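
Real data often contains values that are not dates at all. As a hedged sketch, assuming a made-up series `raw` with one bad entry, `errors='coerce'` turns unparseable values into `NaT` instead of raising, and the `.dt` accessor then exposes the date parts.

```python
import pandas as pd

raw = pd.Series(['2018-08-18', '2018-08-19', 'not a date'])

# An explicit format keeps parsing strict; bad values become NaT
parsed = pd.to_datetime(raw, format='%Y-%m-%d', errors='coerce')
print(parsed)

# The .dt accessor gives element-wise access to date components
print(parsed.dt.year)
print(parsed.dt.day_name())
```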

Time series processing in pandas

# Imports
from datetime import datetime
import pandas as pd
import numpy as np

# Use a list of datetime objects as the index (this makes later processing much easier)
date_list = [datetime(2018, 2, 18), datetime(2018, 2, 19), 
             datetime(2018, 2, 25), datetime(2018, 2, 26), 
             datetime(2018, 3, 4), datetime(2018, 3, 5)]
time_s = pd.Series(np.random.randn(len(date_list)), index=date_list)
print(time_s)

# pd.date_range() generates a sequence of dates
dates = pd.date_range('2018-08-18', # start date
                      periods=5,    # number of periods to generate
                      freq='W-SAT') # frequency (weekly, anchored on Saturday)
print(dates)
print(pd.Series(np.random.randn(5), index=dates))
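
To get a feel for the `freq` argument, here is a short sketch comparing the weekly Saturday frequency used above with the business-day frequency `'B'`. The variable names `weekly` and `bdays` are mine.

```python
import pandas as pd

# Weekly dates anchored on Saturday: every generated date is a Saturday
weekly = pd.date_range('2018-08-18', periods=5, freq='W-SAT')
print(weekly.day_name())

# Business days only: 2018-08-18 is a Saturday, so the range starts on
# the following Monday
bdays = pd.date_range('2018-08-18', periods=3, freq='B')
print(bdays)
```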

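# Indexing: once the index holds datetimes, lookups become very convenient
# (the keys must fall inside the index, which covers February and March 2018)
# Pass any parseable date string
print(time_s['2018/02/18'])
# Pass a year and month to select the whole month
print(time_s['2018-2'])

# Slicing and filtering
print(time_s['2018-2-19':])

print(time_s.truncate(before='2018-2-20'))  # drop everything before this date
print(time_s.truncate(after='2018-2-20'))   # drop everything after this date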
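
Since the series above holds random values, the effect of each selection is hard to check by eye. The sketch below repeats the idea with the same dates but with the integers 0 to 5 as values (my substitution), so each result can be verified directly.

```python
from datetime import datetime
import pandas as pd

date_list = [datetime(2018, 2, 18), datetime(2018, 2, 19),
             datetime(2018, 2, 25), datetime(2018, 2, 26),
             datetime(2018, 3, 4), datetime(2018, 3, 5)]
ts = pd.Series(range(len(date_list)), index=date_list)

# Partial string indexing: everything in February 2018
feb = ts.loc['2018-02']
print(feb)

# Slicing by date strings includes both endpoints
window = ts.loc['2018-02-19':'2018-03-04']
print(window)

# truncate keeps the rows on or after the given date
after_20 = ts.truncate(before='2018-02-20')
print(after_20)
```

Unlike ordinary positional slices, label-based date slices include the end label.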

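# Dates can also be generated from a start and an end
time = pd.date_range('2018/08/18', '2018/08/28', freq='2D')  # freq is the step: '2D' means every two days; '3D', '5D', ... work the same way
print(time)

# shift moves the data along the index
ts = pd.Series(np.random.randn(5), index=pd.date_range('20180818', periods=5, freq='W-SAT'))
print(ts)
# Shift later: each value moves to the next date, the first becomes NaN
print(ts.shift(1))
# Shift earlier: each value moves to the previous date, the last becomes NaN
print(ts.shift(-1))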
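
A common use of `shift` is comparing each value with the previous one. The following sketch uses made-up weekly values to compute the absolute and relative change, and also shows that passing `freq` moves the index instead of the values.

```python
import pandas as pd

idx = pd.date_range('20180818', periods=5, freq='W-SAT')
ts = pd.Series([10.0, 12.0, 15.0, 15.0, 18.0], index=idx)

# Absolute change from the previous week (the first entry has no predecessor)
change = ts - ts.shift(1)
print(change)

# Relative change from the previous week
pct = ts / ts.shift(1) - 1
print(pct)

# With freq given, the timestamps move by one day and the values stay put
moved = ts.shift(1, freq='D')
print(moved)
```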

Resampling time series data with resample (important)

import pandas as pd
import numpy as np

# Generate data
date_rng = pd.date_range('20180101', periods=100, freq='D')
ser_obj = pd.Series(range(len(date_rng)), index=date_rng)
print(ser_obj.head(10))

# Monthly sum ('M' means month end; pandas >= 2.2 prefers the alias 'ME')
resample_month_sum = ser_obj.resample('M').sum()
# Monthly mean
resample_month_mean = ser_obj.resample('M').mean()
print(resample_month_sum)
print(resample_month_mean)
# Other intervals work too, for example 5 or 10 days
resample_5d_sum = ser_obj.resample('5D').sum()
resample_10d_mean = ser_obj.resample('10D').mean()
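
Several aggregations can be computed in one pass with `agg`. This is a sketch on the same 100-day series; I use the month-start alias `'MS'` here only because it is spelled the same across pandas versions, and the `groupby` variant is shown as an alternative way to reach the same monthly sums.

```python
import pandas as pd

date_rng = pd.date_range('20180101', periods=100, freq='D')
ser = pd.Series(range(len(date_rng)), index=date_rng)

# One row per month, one column per aggregation; 'MS' labels each
# bin with the first day of the month
monthly = ser.resample('MS').agg(['sum', 'mean', 'max'])
print(monthly)

# Alternative: group by the calendar month of each timestamp
by_period = ser.groupby(ser.index.to_period('M')).sum()
print(by_period)
```

Since the values are simply 0 to 99, the January sum is 0 + 1 + ... + 30 = 465, which makes the output easy to sanity-check.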

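# What we did above is downsampling: data at a short interval is aggregated to a longer one, for example from daily to monthly, using sum, mean and so on. Upsampling goes the other way, but it introduces missing values, which can be filled in several ways.

# Upsampling and handling missing data
# Generate weekly data
df = pd.DataFrame(np.random.randn(5, 3),
                 index=pd.date_range('20180101', periods=5, freq='W-MON'),
                 columns=['S1', 'S2', 'S3'])
print(df)
# Upsample to daily frequency (the new rows are NaN)
print(df.resample('D').asfreq())
# Forward fill: fill missing rows with the previous known values
print(df.resample('D').ffill(limit=2))  # fill at most two rows per gap; omit limit to fill everything
# Backward fill: fill missing rows with the next known values
print(df.resample('D').bfill())
# Fill by interpolation
print(df.resample('D').interpolate('linear'))  # linear interpolation between the known values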
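
With random data the difference between the fill strategies is hard to see. Below is a small sketch with three made-up weekly values (0, 7 and 14), chosen so that linear interpolation should reproduce the day number exactly.

```python
import pandas as pd

idx = pd.date_range('20180101', periods=3, freq='W-MON')
weekly = pd.Series([0.0, 7.0, 14.0], index=idx)

# Plain upsampling: the six days between two Mondays are NaN
gaps = weekly.resample('D').asfreq()
print(gaps.isna().sum())

# Forward fill, but only two days deep into each gap
filled = weekly.resample('D').ffill(limit=2)
print(filled)

# Linear interpolation between the known weekly points
interp = weekly.resample('D').interpolate('linear')
print(interp)
```

Forward fill suits values that stay constant until the next observation, while interpolation suits quantities that change smoothly.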

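Time series statistics: sliding windows

import pandas as pd
import numpy as np

# Generate data
ser_obj = pd.Series(np.random.randn(1000), 
                    index=pd.date_range('20180101', periods=1000))
ser_obj = ser_obj.cumsum()  # cumulative sum
print(ser_obj.head())

# rolling creates a sliding window
r_obj = ser_obj.rolling(window=5)  # window of 5 observations
print(r_obj)
print(r_obj.mean())  # mean of each window: the 5th value is the mean of the first five values, and so on (the first four are NaN)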
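
Two variations on the window are worth knowing: `min_periods` controls how many observations a window needs before it produces a value, and a window can be given as a time span instead of a count. A minimal sketch on six made-up daily values:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
              index=pd.date_range('20180101', periods=6))

# Default: a value only appears once the window holds 3 observations
strict = s.rolling(window=3).mean()
print(strict)

# min_periods=1: partial windows at the start are averaged as well
lenient = s.rolling(window=3, min_periods=1).mean()
print(lenient)

# Time-based window: each row sums the observations of the last 3 days
by_time = s.rolling('3D').sum()
print(by_time)
```

Time-based windows are handy when observations are irregularly spaced, because the window then always covers the same span of time rather than the same number of rows.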

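# Plot to inspect the result
import matplotlib.pyplot as plt

# pandas can plot directly, which is very convenient: the index is used as the x axis by default, another benefit of a datetime index
ser_obj.plot(style='r--')                          # raw series, red dashed line
ser_obj.rolling(window=10).mean().plot(style='b')  # 10-point rolling mean, blue line
plt.show()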

OK, those are some of the common methods for analyzing time series data. I hope they are helpful to readers.
