当DataFrame的索引为时间对象时,可以很方便的进行数据的筛选。
时间范围生成
#生成时间范围
pd.date_range('2010-01-01','2010-01-08')
>>>DatetimeIndex(['2010-01-01', '2010-01-02', '2010-01-03', '2010-01-04',
'2010-01-05', '2010-01-06', '2010-01-07', '2010-01-08'],
dtype='datetime64[ns]', freq='D')
#指定时间长度生成时间范围
pd.date_range('2010-01-01',periods=10)
>>>DatetimeIndex(['2010-01-01', '2010-01-02', '2010-01-03', '2010-01-04',
'2010-01-05', '2010-01-06', '2010-01-07', '2010-01-08',
'2010-01-09', '2010-01-10'],
dtype='datetime64[ns]', freq='D')
#指定时间长度(10)、间隔(隔一个月)生成时间范围
pd.date_range('2010-01-01',periods=10,freq='M')
>>>DatetimeIndex(['2010-01-31', '2010-02-28', '2010-03-31', '2010-04-30',
'2010-05-31', '2010-06-30', '2010-07-31', '2010-08-31',
'2010-09-30', '2010-10-31'],
dtype='datetime64[ns]', freq='M')
#指定时间长度(10)、间隔(隔1个小时20分钟)生成时间范围
pd.date_range('2010-01-01',periods=10,freq='1h20min')
>>>DatetimeIndex(['2010-01-01 00:00:00', '2010-01-01 01:20:00',
'2010-01-01 02:40:00', '2010-01-01 04:00:00',
'2010-01-01 05:20:00', '2010-01-01 06:40:00',
'2010-01-01 08:00:00', '2010-01-01 09:20:00',
'2010-01-01 10:40:00', '2010-01-01 12:00:00'],
dtype='datetime64[ns]', freq='80T')
#按工作日生成时间范围
pd.date_range('2010-01-01',periods=10,freq='B')
>>>DatetimeIndex(['2010-01-01', '2010-01-04', '2010-01-05', '2010-01-06',
'2010-01-07', '2010-01-08', '2010-01-11', '2010-01-12',
'2010-01-13', '2010-01-14'],
dtype='datetime64[ns]', freq='B')
时间筛选
#创建一个索引为日期1000天,值从0到999的Series
a=pd.Series(np.arange(1000),index=pd.date_range('2010-01-05',periods=1000))
>>> 2010-01-05 0
2010-01-06 1
2010-01-07 2
...
2012-09-29 998
2012-09-30 999
#筛选1月份的数据
a['2010-01']
>>> 2010-01-05 0
2010-01-06 1
2010-01-07 2
...
2010-01-30 25
2010-01-31 26
#时间切片筛选2010到2011-02-03的数据
a['2010':'2011-02-02']
>>> 2010-01-05 0
2010-01-06 1
...
2011-02-01 392
2011-02-02 393
筛选数据运算
a=pd.Series(np.arange(1000),index=pd.date_range('2010-01-05',periods=1000))
>>> 2010-01-05 0
2010-01-06 1
2010-01-07 2
...
2012-09-29 998
2012-09-30 999
#按每周求和,每7天进行相加求和输出
a.resample('W').sum()
>>> 2010-01-10 15
2010-01-17 63
2010-01-24 112
2010-01-31 161
...
2012-09-09 6825
2012-09-16 6874
2012-09-23 6923
2012-09-30 6972
#按每月求平均值
a.resample('M').mean()
>>> 2010-01-31 13.0
2010-02-28 40.5
2010-03-31 70.0
...
2012-07-31 923.0
2012-08-31 954.0
2012-09-30 984.5
读取excel时将字符串列转化为时间类型的列
pd.read_csv('demo.csv',parse_dates = ['col1'],infer_datetime_format=True)