1. Time series with duplicate indices
import numpy as np
import pandas as pd
# 2000-01-02 appears three times on purpose
dates = pd.DatetimeIndex(['1/1/2000', '1/2/2000', '1/2/2000', '1/2/2000', '1/3/2000'])
dup_ts = pd.Series(np.arange(5), index=dates)
print(dup_ts)
Output:
2000-01-01 0
2000-01-02 1
2000-01-02 2
2000-01-02 3
2000-01-03 4
dtype: int32
Checking the index's is_unique attribute tells us whether its labels are all unique.
print(dup_ts.index.is_unique)
Output:
False
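The check above only says that duplicates exist. As a minimal sketch of how to find which labels repeat, the snippet below rebuilds the same series and uses `Index.duplicated`; the variable names `unique_flag`, `dup_mask`, and `dup_labels` are illustrative and not part of the original notes.

```python
import numpy as np
import pandas as pd

# Same data as in the notes: 2000-01-02 appears three times
dates = pd.DatetimeIndex(['1/1/2000', '1/2/2000', '1/2/2000', '1/2/2000', '1/3/2000'])
dup_ts = pd.Series(np.arange(5), index=dates)

# False, because at least one label is repeated
unique_flag = dup_ts.index.is_unique

# keep=False marks every occurrence of a repeated label, not just the later ones
dup_mask = dup_ts.index.duplicated(keep=False)

# The distinct labels that occur more than once
dup_labels = dup_ts.index[dup_mask].unique()

print(unique_flag)
print(dup_labels)
```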
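Indexing into this time series produces either a scalar value or a slice, depending on whether the selected timestamp is duplicated.
print(dup_ts['1/3/2000'])  # not duplicated: scalar
print(dup_ts['1/2/2000'])  # duplicated: slice
Output: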
4
2000-01-02 1
2000-01-02 2
2000-01-02 3
dtype: int32
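Because the return type depends on the data, code that consumes the result may want a predictable shape. The sketch below uses `.loc` with explicit `pd.Timestamp` keys rather than the date strings used above (a substitution made here for clarity), and shows that passing a one-element list always yields a Series. The names `single`, `repeated`, and `always_series` are illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.DatetimeIndex(['1/1/2000', '1/2/2000', '1/2/2000', '1/2/2000', '1/3/2000'])
dup_ts = pd.Series(np.arange(5), index=dates)

# Unique label: .loc returns a scalar
single = dup_ts.loc[pd.Timestamp('2000-01-03')]

# Repeated label: .loc returns a Series holding every matching row
repeated = dup_ts.loc[pd.Timestamp('2000-01-02')]

# A list of labels always returns a Series, even for a unique label
always_series = dup_ts.loc[[pd.Timestamp('2000-01-03')]]

print(single)
print(repeated)
print(always_series)
```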
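Suppose you want to aggregate the data that share non-unique timestamps. One way is to use groupby and pass level=0 (the only level of the index!).
grouped = dup_ts.groupby(level=0)
print(grouped.mean())   # one mean per distinct timestamp
print(grouped.count())  # number of rows per distinct timestamp
Output: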
2000-01-01 0
2000-01-02 2
2000-01-03 4
dtype: int32
2000-01-01 1
2000-01-02 3
2000-01-03 1
dtype: int64
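The aggregation above can be sketched end to end as follows. One thing to be aware of: the output shown in these notes came from an older environment, and in recent pandas versions `mean()` is expected to return floating-point values, so treat the dtype as environment-dependent. The names `means` and `counts` are illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.DatetimeIndex(['1/1/2000', '1/2/2000', '1/2/2000', '1/2/2000', '1/3/2000'])
dup_ts = pd.Series(np.arange(5), index=dates)

# level=0 groups rows by the index labels themselves
grouped = dup_ts.groupby(level=0)

means = grouped.mean()    # one value per distinct timestamp
counts = grouped.count()  # how many rows each timestamp had

# After aggregation every timestamp appears exactly once
print(means)
print(counts)
print(means.index.is_unique)
```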
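2. Date ranges, frequencies, and shifting
import datetime
dates = [datetime.datetime(2011, 1, 2), datetime.datetime(2011, 1, 5),
         datetime.datetime(2011, 1, 7), datetime.datetime(2011, 1, 8),
         datetime.datetime(2011, 1, 10), datetime.datetime(2011, 1, 12)]
ts = pd.Series(np.random.randn(6), index=dates)
print(ts)
# Recent pandas returns a Resampler object here; asfreq() materializes the daily series
print(ts.resample('D').asfreq())
Output: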
2011-01-02 1.068995
2011-01-05 0.564281
2011-01-07 1.910822
2011-01-08 -0.339067
2011-01-10 -1.671388
2011-01-12 -0.679710
dtype: float64
2011-01-02 1.068995
2011-01-03 NaN
2011-01-04 NaN
2011-01-05 0.564281
2011-01-06 NaN
2011-01-07 1.910822
2011-01-08 -0.339067
2011-01-09 NaN
2011-01-10 -1.671388
2011-01-11 NaN
2011-01-12 -0.679710
Freq: D, dtype: float64
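The conversion to a fixed daily frequency shown above can be sketched with deterministic values, so the result is easy to check. This assumes a recent pandas, where `resample('D')` returns a Resampler that must be followed by a method such as `asfreq()` or `ffill()`; the values 1.0 to 6.0 replace the random numbers from the notes and are purely illustrative.

```python
import datetime

import pandas as pd

dates = [datetime.datetime(2011, 1, 2), datetime.datetime(2011, 1, 5),
         datetime.datetime(2011, 1, 7), datetime.datetime(2011, 1, 8),
         datetime.datetime(2011, 1, 10), datetime.datetime(2011, 1, 12)]
# Fixed values instead of np.random.randn, so results are reproducible
ts = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=dates)

# asfreq() inserts a row for every calendar day; missing days become NaN
daily = ts.resample('D').asfreq()

# ffill() carries the last observed value forward instead of leaving NaN
filled = ts.resample('D').ffill()

print(daily)
print(filled)
```

Which fill strategy is appropriate depends on the data: leaving NaN keeps the gaps visible, while forward filling assumes the last observation still holds.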
3. Generating date ranges
index = pd.date_range('4/1/2012', '6/1/2012')
print(index)
Output:
DatetimeIndex(['2012-04-01', '2012-04-02', '2012-04-03', '2012-04-04',
'2012-04-05', '2012-04-06', '2012-04-07', '2012-04-08',
'2012-04-09', '2012-04-10', '2012-04-11', '2012-04-12',
'2012-04-13', '2012-04-14', '2012-04-15', '2012-04-16',
'2012-04-17', '2012-04-18', '2012-04-19', '2012-04-20',
'2012-04-21', '2012-04-22', '2012-04-23', '2012-04-24',
'2012-04-25', '2012-04-26', '2012-04-27', '2012-04-28',
'2012-04-29', '2012-04-30', '2012-05-01', '2012-05-02',
'2012-05-03', '2012-05-04', '2012-05-05', '2012-05-06',
'2012-05-07', '2012-05-08', '2012-05-09', '2012-05-10',
'2012-05-11', '2012-05-12', '2012-05-13', '2012-05-14',
'2012-05-15', '2012-05-16', '2012-05-17', '2012-05-18',
'2012-05-19', '2012-05-20', '2012-05-21', '2012-05-22',
'2012-05-23', '2012-05-24', '2012-05-25', '2012-05-26',
'2012-05-27', '2012-05-28', '2012-05-29', '2012-05-30',
'2012-05-31', '2012-06-01'],
dtype='datetime64[ns]', freq='D')
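As a small check on the output above, the sketch below confirms that both endpoints are included and that the resulting index carries a daily frequency. ISO-formatted date strings are used here instead of the month/day/year strings in the notes, only to avoid any ambiguity in parsing.

```python
import pandas as pd

# Both the start and the end date are included in the result
index = pd.date_range('2012-04-01', '2012-06-01')

n_days = len(index)          # 30 days in April + 31 in May + 1 June
first, last = index[0], index[-1]
freq = index.freqstr         # daily frequency is the default

print(n_days, first, last, freq)
```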
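By default, date_range generates daily timestamps. If you pass only a start date or only an end date, you must also pass the number of periods to generate.
print(pd.date_range(start='4/1/2012', periods=20))
print(pd.date_range(end='6/1/2012', periods=20))
Output: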
DatetimeIndex(['2012-04-01', '2012-04-02', '2012-04-03', '2012-04-04',
'2012-04-05', '2012-04-06', '2012-04-07', '2012-04-08',
'2012-04-09', '2012-04-10', '2012-04-11', '2012-04-12',
'2012-04-13', '2012-04-14', '2012-04-15', '2012-04-16',
'2012-04-17', '2012-04-18', '2012-04-19', '2012-04-20'],