Working with Date Data, Part 2: the pandas Library

Latest recommended article published 2024-07-14 15:07:49

邓旭东HIT

Article link: https://blog.csdn.net/weixin_38008864/article/details/103105120

Time-handling functions in pandas

Get the current time and return it in the standard year-month-day format, e.g. 2017-01-04.
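
A minimal sketch of that first task, assuming local time is what is wanted: pd.Timestamp.now() returns the current time and strftime('%Y-%m-%d') formats it as year-month-day. The variable names are illustrative, and the printed date depends on when the code runs.

```python
import pandas as pd

# Current local time as a pandas Timestamp
now = pd.Timestamp.now()

# Format it as YYYY-MM-DD, e.g. 2017-01-04
today_str = now.strftime('%Y-%m-%d')
print(today_str)
```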

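Commonly used functions:

pd.date_range() generates a range of timestamps
pd.bdate_range() generates a range of business days; unlike date_range(), its default freq is 'B' (Monday to Friday only)
df.asfreq() converts a time series to a given interval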

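Generating a range from a start and an end

pd.date_range(start, end, freq) generates a range of timestamps

The freq parameter is a letter code (M, D, H, Min, ...), optionally prefixed with a number. D means one day and M means one month (month end), so 20D means 20 days and 5M means 5 months. In pandas 2.2 and later, M and H are deprecated in favor of ME and h.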

import pandas as pd

# Generate timestamps from 2017-10-11 to 2017-10-30 at 5-day intervals
pd.date_range('20171011', '20171030', freq='5D')

DatetimeIndex(['2017-10-11', '2017-10-16', '2017-10-21', '2017-10-26'], dtype='datetime64[ns]', freq='5D')

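Generating a range forward from a start

pd.date_range(date_string, periods=5, freq='T') generates a range of timestamps (T is the alias for minutes; pandas 2.2 and later prefer min)

periods: the number of timestamps to generate, an integer

freq: the time unit (month, day, hour, minute, second): M D H ...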
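
The signature above uses a minute frequency, while the example that follows uses months. As a small sketch of the minute case, with an illustrative variable name and the 'min' spelling chosen because both older and newer pandas releases accept it:

```python
import pandas as pd

# Five timestamps one minute apart, starting at 2017-12-31 12:50.
# 'min' is used instead of 'T' so the call works across pandas versions.
minute_rng = pd.date_range('20171231 12:50', periods=5, freq='min')
print(minute_rng)
```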

import pandas as pd

# Starting at 2017-12-31 12:50, generate 5 timestamps at month-end intervals

tm_rng = pd.date_range('20171231 12:50',periods=5,freq='M')

print(type(tm_rng))

print(tm_rng)

<class 'pandas.core.indexes.datetimes.DatetimeIndex'>

DatetimeIndex(['2017-12-31 12:50:00', '2018-01-31 12:50:00','2018-02-28 12:50:00', '2018-03-31 12:50:00',
'2018-04-30 12:50:00'],dtype='datetime64[ns]', freq='M')

date_range() returns a DatetimeIndex, so we can use it as the index of a Series or DataFrame.

# Build a Series with the date range as its index

tm_series = pd.Series(range(len(tm_rng)),index=tm_rng)

tm_series

2017-12-31 12:50:00    0
2018-01-31 12:50:00    1
2018-02-28 12:50:00    2
2018-03-31 12:50:00    3
2018-04-30 12:50:00    4
Freq: M, dtype: int64

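Generating a range backward (or forward) from a given point

pd.bdate_range(end, periods, freq): ends at end and generates periods timestamps going backward, spaced by freq

pd.bdate_range(start, periods, freq): starts at start and generates periods timestamps going forward, spaced by freq

bdate_range() defaults to freq='B' (business days, Monday to Friday). The examples here pass freq='D', so they return every calendar day, exactly as date_range() would.

# 5 days going backward, ending at 2018-01-01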

print(pd.bdate_range(end='20180101',periods=5,freq='D'))

DatetimeIndex(['2017-12-28', '2017-12-29', '2017-12-30', '2017-12-31','2018-01-01'],dtype='datetime64[ns]', freq='D')

# 5 days going forward, starting at 2018-01-01

print(pd.bdate_range(start='20180101',periods=5,freq='D'))

DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04','2018-01-05'],dtype='datetime64[ns]', freq='D')
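
Because both calls above pass freq='D', they do not show how bdate_range() differs from date_range(). The sketch below leaves freq at its default of 'B' and compares the result with calendar days. The variable names are illustrative. Note that the default business-day frequency only skips weekends and does not know about public holidays.

```python
import pandas as pd

# Default freq='B': count 5 business days back from Monday 2018-01-01
business_days = pd.bdate_range(end='20180101', periods=5)

# Same end point with date_range and freq='D': 5 calendar days
calendar_days = pd.date_range(end='20180101', periods=5, freq='D')

print(business_days)  # skips Saturday 2017-12-30 and Sunday 2017-12-31
print(calendar_days)  # includes the weekend
```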

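Changing the time interval

asfreq() works on a DataFrame or Series and converts its time index to a new interval.

dataframe.asfreq(freq='interval', method='fill method', fill_value='value for new missing entries')

freq: the same codes as above (M, D, H, Min, ...), optionally combined with a number. 20D means 20 days and 5M means 5 months.

method: how to fill the rows created by the conversion, either pad (forward fill) or backfill (backward fill)

fill_value: the value to use for rows created by the conversion. NaN values already present in the data are left unchanged.

# Change the interval to 20 days, forward-filling (pad) the new rows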

tm_series.asfreq('20D',method='pad')

2017-12-31 12:50:00    0
2018-01-20 12:50:00    0
2018-02-09 12:50:00    1
2018-03-01 12:50:00    2
2018-03-21 12:50:00    2
2018-04-10 12:50:00    3
2018-04-30 12:50:00    4
Freq: 20D, dtype: int64

# Change the interval to 20 days, backward-filling (backfill) the new rows

tm_series.asfreq('20D',method='backfill')

2017-12-31 12:50:00    0
2018-01-20 12:50:00    1
2018-02-09 12:50:00    2
2018-03-01 12:50:00    3
2018-03-21 12:50:00    3
2018-04-10 12:50:00    4
2018-04-30 12:50:00    4
Freq: 20D, dtype: int64

# Change the interval to 100 hours; with no fill method the new rows are NaN

tm_series.asfreq('100H')

2017-12-31 12:50:00    0.0
2018-01-04 16:50:00    NaN
2018-01-08 20:50:00    NaN
2018-01-13 00:50:00    NaN
.....
2018-04-10 12:50:00    NaN
2018-04-14 16:50:00    NaN
2018-04-18 20:50:00    NaN
2018-04-23 00:50:00    NaN
2018-04-27 04:50:00    NaN
Freq: 100H, dtype: float64

# Change the interval to 100 hours, filling the new rows with a fixed value
tm_series.asfreq('100H',fill_value='missing')

2017-12-31 12:50:00          0
2018-01-04 16:50:00    missing
2018-01-08 20:50:00    missing
2018-01-13 00:50:00    missing
.....
2018-04-14 16:50:00    missing
2018-04-18 20:50:00    missing
2018-04-23 00:50:00    missing
2018-04-27 04:50:00    missing
Freq: 100H, dtype: object
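
The asfreq() examples above all operate on a Series. As a sketch of the same call on a DataFrame, assuming a single hypothetical column named value: the month-end index is built with the pd.offsets.MonthEnd() offset object rather than a string alias, so the code does not depend on how the alias is spelled in a given pandas version.

```python
import pandas as pd

# Month-end index with the same start point and length as tm_rng above
idx = pd.date_range('20171231 12:50', periods=5, freq=pd.offsets.MonthEnd())

# 'value' is a hypothetical column name used only for illustration
df = pd.DataFrame({'value': range(len(idx))}, index=idx)

# Convert the whole frame to a 20-day grid, forward-filling the new rows
df_20d = df.asfreq('20D', method='pad')
print(df_20d)
```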

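Normalizing date formats

pd.to_datetime() converts date strings written in different styles into a single datetime type. In pandas 2.0 and later, a Series that mixes several formats like the one below needs format='mixed'; older versions infer the format of each element separately.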

data = pd.Series(['May 20, 2017','2017-07-12','20170930','2017/10/11','2017 12 11'])

pd.to_datetime(data)

0   2017-05-20
1   2017-07-12
2   2017-09-30
3   2017-10-11
4   2017-12-11
dtype: datetime64[ns]
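
When all the strings share one known layout, passing it explicitly avoids relying on format inference. A minimal sketch, using hypothetical input data with one deliberately invalid entry: format= fixes the expected layout and errors='coerce' turns anything that does not match into NaT instead of raising.

```python
import pandas as pd

# Hypothetical data: compact YYYYMMDD strings plus one invalid entry
raw = pd.Series(['20170930', '20171011', 'not a date'])

# Parse with an explicit layout; entries that do not match become NaT
parsed = pd.to_datetime(raw, format='%Y%m%d', errors='coerce')
print(parsed)
```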

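Extracting data for a given date

Below, tm_rng holds 20 timestamps at 5-hour intervals, and we only want the data for 2018-01-02. Both Series and DataFrame support selecting a time range with a date string.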

import pandas as pd
import numpy as np

tm_rng = pd.date_range('2017-12-31 12:00:00',periods=20,freq='5H')

tm_series = pd.Series(np.random.randn(len(tm_rng)), index=tm_rng)

print(type(tm_series))

print(tm_series)

<class 'pandas.core.series.Series'>
2017-12-31 12:00:00    0.618465
2017-12-31 17:00:00   -0.963631
2017-12-31 22:00:00   -0.782348
.....
2018-01-04 06:00:00   -0.681123
2018-01-04 11:00:00   -0.710626
Freq: 5H, dtype: float64

# Keep only the rows of tm_series that fall on 2018-01-02
tm_series['2018-01-02']

2018-01-02 04:00:00    0.293941
2018-01-02 09:00:00   -1.437363
2018-01-02 14:00:00   -0.527275
2018-01-02 19:00:00    1.140872
Freq: 5H, dtype: float64

# Keep all rows from 2018 (only the three 2017-12-31 rows are dropped)
tm_series['2018']

2018-01-01 03:00:00   -0.363019
2018-01-01 08:00:00    0.426922
2018-01-01 13:00:00   -1.118425
2018-01-01 18:00:00    0.956300
.....
2018-01-03 20:00:00   -1.967839
2018-01-04 01:00:00   -0.654029
2018-01-04 06:00:00   -0.681123
2018-01-04 11:00:00   -0.710626
Freq: 5H, dtype: float64

dft = pd.DataFrame(np.random.randn(len(tm_rng)), index=tm_rng)

print(type(dft))
print(dft)

<class 'pandas.core.frame.DataFrame'>
                            0
2017-12-31 12:00:00  0.213331
2017-12-31 17:00:00  1.920131
2017-12-31 22:00:00 -1.608645
2018-01-01 03:00:00 -0.226439
2018-01-01 08:00:00 -0.558741
.....

2018-01-03 20:00:00  0.866822
2018-01-04 01:00:00 -0.361902
2018-01-04 06:00:00  0.902717
2018-01-04 11:00:00 -0.431569

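# Select by date on a DataFrame: keep only the rows for 2018-01-04.
# Use .loc here: row selection with dft['2018-01-04'] was deprecated
# and raises KeyError in pandas 2.0 and later.

print(type(dft.loc['2018-01-04']))

print(dft.loc['2018-01-04'])

<class 'pandas.core.frame.DataFrame'>
                            0
2018-01-04 01:00:00 -0.361902
2018-01-04 06:00:00  0.902717
2018-01-04 11:00:00 -0.431569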
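
Date strings also work as slice endpoints, which selects a span of days rather than a single one. A sketch under a few assumptions: the names rng, series and two_days are illustrative, the values are simple integers instead of random numbers so the result is reproducible, and a pd.Timedelta is passed as freq so the code does not depend on the spelling of the hour alias.

```python
import numpy as np
import pandas as pd

# Same 5-hour grid as above: 20 timestamps starting 2017-12-31 12:00
rng = pd.date_range('2017-12-31 12:00:00', periods=20,
                    freq=pd.Timedelta(hours=5))
series = pd.Series(np.arange(len(rng)), index=rng)

# Both endpoints are inclusive: every row on 2018-01-01 and 2018-01-02
two_days = series.loc['2018-01-01':'2018-01-02']
print(two_days)
```

The slice keeps the five rows from 2018-01-01 and the four rows from 2018-01-02. The same .loc slice works on a DataFrame with a DatetimeIndex.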
