Pandas Time Series: Indexing

Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/bqw18744018044/article/details/80906343
from datetime import datetime
import pandas as pd
import numpy as np

1. Building a Series indexed by datetime

dates = [datetime(2018,7,1),datetime(2018,7,3),datetime(2018,7,5),datetime(2018,7,7),datetime(2018,7,9),datetime(2018,7,11)]
ts = pd.Series(np.random.randn(6),index=dates)
ts
2018-07-01    0.578942
2018-07-03    0.465359
2018-07-05    0.037308
2018-07-07   -2.784810
2018-07-09   -0.053657
2018-07-11   -0.421860
dtype: float64
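Here is a small check of what pandas does with the list of datetime objects. It is a sketch that assumes a recent pandas version. It swaps np.random.randn for np.arange so the values are reproducible. The datetimes become a DatetimeIndex of Timestamp objects. Although the dates are evenly spaced, no frequency is attached to the index until you set or infer one.

```python
from datetime import datetime

import numpy as np
import pandas as pd

dates = [datetime(2018, 7, 1), datetime(2018, 7, 3), datetime(2018, 7, 5),
         datetime(2018, 7, 7), datetime(2018, 7, 9), datetime(2018, 7, 11)]
# Deterministic values instead of np.random.randn, so results are reproducible
ts = pd.Series(np.arange(6, dtype=float), index=dates)

print(type(ts.index).__name__)     # DatetimeIndex
print(type(ts.index[0]).__name__)  # Timestamp
# Evenly spaced dates, but no frequency is stored on the index
print(ts.index.freq)               # None
# pd.infer_freq can recover the 2-day spacing from the values
print(pd.infer_freq(ts.index))     # 2D
```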

2. Indexing

stamp = ts.index[2]
ts[stamp]
0.037308214634515072
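The same element can be reached in several ways. The sketch below uses the same dates but takes deterministic values from np.arange instead of random numbers. It compares the Timestamp lookup with .loc, with an equivalent datetime object, and with positional .iloc. All of them should return the same value.

```python
from datetime import datetime

import numpy as np
import pandas as pd

dates = [datetime(2018, 7, d) for d in (1, 3, 5, 7, 9, 11)]
ts = pd.Series(np.arange(6, dtype=float), index=dates)

stamp = ts.index[2]  # a pandas Timestamp for 2018-07-05
# Label-based lookups: the Timestamp itself, .loc, or an equivalent datetime
print(ts[stamp], ts.loc[stamp], ts[datetime(2018, 7, 5)])  # 2.0 2.0 2.0
# Position-based lookup of the same element
print(ts.iloc[2])  # 2.0
```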

The same date can be written as different strings, and every spelling selects the same element:

print(ts['2018-07-01'])
print(ts['20180701'])
0.578941720925
0.578941720925
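Below is a sketch of the same idea with a few more spellings, again using deterministic values from np.arange in place of the random data. Strings are parsed into a Timestamp before the lookup, so any unambiguous form of the date works. Note that '7/1/2018' is read month first (US order). A date that parses fine but is not in the index raises KeyError.

```python
from datetime import datetime

import numpy as np
import pandas as pd

dates = [datetime(2018, 7, d) for d in (1, 3, 5, 7, 9, 11)]
ts = pd.Series(np.arange(6, dtype=float), index=dates)

# Several spellings of 1 July 2018, as strings and as date objects
keys = ['2018-07-01', '20180701', '7/1/2018',
        datetime(2018, 7, 1), pd.Timestamp('2018-07-01')]
values = [ts.loc[k] for k in keys]
print(values)  # every entry is 0.0

# A well-formed date that is not in the index is a KeyError
try:
    ts.loc['2018-07-02']
except KeyError:
    print('2018-07-02 is not in the index')
```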

Passing only a year, or a year and a month, selects that whole period as a slice:

longer_ts = pd.Series(np.random.randn(1000),index=pd.date_range('1/1/2000',periods=1000))
longer_ts.shape
(1000,)
longer_ts['2001'] # slice by year
2001-01-01   -0.183849
2001-01-02    0.393347
2001-01-03   -1.777193
2001-01-04    0.113331
2001-01-05   -0.283405
2001-01-06    0.448394
2001-01-07   -0.062456
2001-01-08    1.162382
2001-01-09    0.912657
2001-01-10   -1.054139
2001-01-11    1.639748
2001-01-12    0.026114
2001-01-13    1.526940
2001-01-14    0.590537
2001-01-15   -2.123738
2001-01-16    0.836568
2001-01-17   -0.741981
2001-01-18    0.039363
2001-01-19   -1.251572
2001-01-20   -0.223897
2001-01-21   -1.306646
2001-01-22    0.977636
2001-01-23   -1.874165
2001-01-24    0.074528
2001-01-25   -0.547662
2001-01-26   -1.464749
2001-01-27   -0.400133
2001-01-28    1.082618
2001-01-29    0.370398
2001-01-30   -0.745193
                ...   
2001-12-02   -0.699917
2001-12-03   -0.127025
2001-12-04    0.126134
2001-12-05    0.906386
2001-12-06    0.534549
2001-12-07    0.419908
2001-12-08    1.415131
2001-12-09   -1.601909
2001-12-10   -1.232961
2001-12-11   -0.676176
2001-12-12   -0.714718
2001-12-13    1.143975
2001-12-14   -1.087204
2001-12-15    1.752753
2001-12-16   -3.039599
2001-12-17   -0.597569
2001-12-18    0.055790
2001-12-19    0.379972
2001-12-20   -1.410376
2001-12-21   -2.095945
2001-12-22   -0.035397
2001-12-23   -0.202549
2001-12-24    0.377027
2001-12-25   -0.820194
2001-12-26   -1.138857
2001-12-27    0.491915
2001-12-28    1.188331
2001-12-29   -0.680069
2001-12-30    1.608267
2001-12-31    1.723339
Freq: D, Length: 365, dtype: float64
longer_ts['2001-05'] # slice by year and month
2001-05-01   -1.440953
2001-05-02   -2.132345
2001-05-03    1.132536
2001-05-04   -0.365506
2001-05-05    0.997308
2001-05-06    0.017255
2001-05-07    1.880290
2001-05-08    0.819983
2001-05-09    1.697819
2001-05-10   -3.067531
2001-05-11    0.637673
2001-05-12   -0.587333
2001-05-13    0.518774
2001-05-14    0.823871
2001-05-15   -0.474210
2001-05-16   -0.746972
2001-05-17    0.822030
2001-05-18    2.103642
2001-05-19    1.074490
2001-05-20    1.012978
2001-05-21    0.324720
2001-05-22   -0.096673
2001-05-23   -0.085382
2001-05-24    1.455619
2001-05-25    0.120917
2001-05-26   -0.639450
2001-05-27    0.804710
2001-05-28    0.721796
2001-05-29   -0.887137
2001-05-30    0.416457
2001-05-31    0.960286
Freq: D, dtype: float64
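The sketch below checks the sizes of these partial-string selections with deterministic data. It uses np.arange instead of random values, and the names spring and df are illustrative. It also shows that a slice of partial strings covers each endpoint period in full. On a DataFrame, the row selection should go through .loc.

```python
import numpy as np
import pandas as pd

idx = pd.date_range('1/1/2000', periods=1000)
longer_ts = pd.Series(np.arange(1000), index=idx)

print(idx[0].date(), idx[-1].date())  # 2000-01-01 2002-09-26
print(len(longer_ts['2001']))         # 365
print(len(longer_ts['2001-05']))      # 31

# A slice of partial strings covers whole periods at both ends
spring = longer_ts['2001-03':'2001-05']
print(spring.index[0].date(), spring.index[-1].date(), len(spring))  # 2001-03-01 2001-05-31 92

# For a DataFrame, select rows by partial string through .loc
df = pd.DataFrame({'value': np.arange(1000)}, index=idx)
print(df.loc['2001-05'].shape)  # (31, 1)
```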

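Slicing with datetime objects works as well. This is not specific to Series: on a DataFrame with a DatetimeIndex, the same row slice can be taken with .loc.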

ts[datetime(2018,7,3):]
2018-07-03    0.465359
2018-07-05    0.037308
2018-07-07   -2.784810
2018-07-09   -0.053657
2018-07-11   -0.421860
dtype: float64
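Because the index is sorted, the endpoints of a slice do not have to be present in it, which makes slicing a convenient range query. This is a sketch with deterministic values and illustrative names (window, rows, df). Label slices include both endpoints. A DataFrame takes the same row slice through .loc, and Series.truncate gives an equivalent result.

```python
from datetime import datetime

import numpy as np
import pandas as pd

dates = [datetime(2018, 7, d) for d in (1, 3, 5, 7, 9, 11)]
ts = pd.Series(np.arange(6, dtype=float), index=dates)

# Endpoints need not exist in the (sorted) index; label slices include both ends
window = ts['2018-07-02':'2018-07-08']
print(window.index.day.tolist())  # [3, 5, 7]

# The same row slice on a DataFrame goes through .loc
df = pd.DataFrame({'a': np.arange(6), 'b': np.arange(6) * 10}, index=dates)
rows = df.loc[datetime(2018, 7, 3):datetime(2018, 7, 7)]
print(rows['b'].tolist())  # [10, 20, 30]

# truncate keeps the entries between two dates, inclusive
print(ts.truncate(before='2018-07-04', after='2018-07-09').tolist())  # [2.0, 3.0, 4.0]
```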

Checking whether the index contains duplicates

dates = pd.DatetimeIndex(['7/4/2018','7/5/2018','7/5/2018','7/5/2018','7/6/2018'])
dup_ts = pd.Series(np.arange(5),index=dates)
dup_ts.index.is_unique
False
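Besides is_unique, the index can tell you which entries repeat. This is a minimal sketch with an illustrative name (mask). Index.duplicated marks every occurrence after the first, or every occurrence before the last with keep='last'. Negating that mask is a quick way to keep one entry per timestamp without aggregating.

```python
import numpy as np
import pandas as pd

dates = pd.DatetimeIndex(['7/4/2018', '7/5/2018', '7/5/2018', '7/5/2018', '7/6/2018'])
dup_ts = pd.Series(np.arange(5), index=dates)

print(dup_ts.index.has_duplicates)  # True
mask = dup_ts.index.duplicated()    # keep='first' is the default
print(mask.tolist())                # [False, False, True, True, False]
print(dup_ts[~mask].tolist())       # [0, 1, 4]
# keep='last' retains the final occurrence instead
print(dup_ts[~dup_ts.index.duplicated(keep='last')].tolist())  # [0, 3, 4]
```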

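Selecting a duplicated timestamp returns every matching entry. Grouping by the index (level=0) and aggregating collapses the duplicates into one value per timestamp: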

dup_ts['7/5/2018']
2018-07-05    1
2018-07-05    2
2018-07-05    3
dtype: int32
dup_ts.groupby(level=0).mean()
2018-07-04    0
2018-07-05    2
2018-07-06    4
dtype: int32
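mean is only one choice of aggregation. This sketch tries a few others on the same dup_ts, with grouped as an illustrative name. count shows how many entries each timestamp had, while first and last keep a single original value instead of averaging. The dtype of the mean result may differ between pandas versions, so the comment on that line gives only the values.

```python
import numpy as np
import pandas as pd

dates = pd.DatetimeIndex(['7/4/2018', '7/5/2018', '7/5/2018', '7/5/2018', '7/6/2018'])
dup_ts = pd.Series(np.arange(5), index=dates)

grouped = dup_ts.groupby(level=0)
print(grouped.mean().tolist())   # means 0, 2 and 4 (dtype varies by pandas version)
print(grouped.count().tolist())  # [1, 3, 1]
print(grouped.first().tolist())  # [0, 1, 4]
print(grouped.last().tolist())   # [0, 3, 4]
# Every aggregation leaves one row per timestamp
print(grouped.sum().index.is_unique)  # True
```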