pandas Time Series Basics

code_new_life

Article link: https://blog.csdn.net/weixin_38656890/article/details/79776403

The most commonly used kind of time series in pandas is a Series indexed by timestamps:

from datetime import datetime

import numpy as np
import pandas as pd
from pandas import Series

dates = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7),
         datetime(2011, 1, 8), datetime(2011, 1, 10), datetime(2011, 1, 12)]
ts = Series(np.random.randn(6), index=dates)

ts
2011-01-02   -1.739738
2011-01-05   -0.813930
2011-01-07   -0.083642
2011-01-08    0.418713
2011-01-10    0.116473
2011-01-12   -1.048764
dtype: float64

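# ts is an ordinary Series; its DatetimeIndex is what makes it a time series
# (very old pandas releases reported a separate TimeSeries type here):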
type(ts)
pandas.core.series.Series


ts.index   # the index is stored as a DatetimeIndex
DatetimeIndex(['2011-01-02', '2011-01-05', '2011-01-07', '2011-01-08',
               '2011-01-10', '2011-01-12'],
              dtype='datetime64[ns]', freq=None)
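
As a small illustration of the index behaviour shown above, the sketch below checks that the elements of a DatetimeIndex are pandas Timestamp objects. The series `demo_ts` and its values are made up for this example and are not the random data used in the article.

```python
from datetime import datetime

import pandas as pd

# Illustrative values only; the article's series uses random numbers
stamps = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7)]
demo_ts = pd.Series([1.0, 2.0, 3.0], index=stamps)

first_stamp = demo_ts.index[0]
is_datetime_index = isinstance(demo_ts.index, pd.DatetimeIndex)
is_timestamp = isinstance(first_stamp, pd.Timestamp)
# pd.Timestamp subclasses datetime, so it compares equal to the original object
same_moment = first_stamp == datetime(2011, 1, 2)
```

Because Timestamp is a datetime subclass, plain datetime objects and Timestamps can generally be used interchangeably as lookup keys.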


# Arithmetic between time series aligns automatically on the dates

ts + ts[::2]

2011-01-02   -3.479476
2011-01-05         NaN
2011-01-07   -0.167284
2011-01-08         NaN
2011-01-10    0.232945
2011-01-12         NaN
dtype: float64
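
The NaN rows above appear because those dates exist in only one of the two operands. A minimal sketch of this alignment, using made-up values and the hypothetical names `full` and `every_other`, also shows one way to treat the missing side as zero with `Series.add(..., fill_value=0)`:

```python
import pandas as pd

idx = pd.to_datetime(['2011-01-02', '2011-01-05', '2011-01-07', '2011-01-08'])
full = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx)
every_other = full[::2]  # keeps 2011-01-02 and 2011-01-07

# Plain addition: dates missing from one operand become NaN
aligned_sum = full + every_other

# Series.add with fill_value: the missing operand is treated as 0
filled_sum = full.add(every_other, fill_value=0)
```

Whether filling with zero is appropriate depends on what the data means; the default NaN makes the mismatch visible instead of hiding it.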

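Because a time series is simply a Series with a DatetimeIndex, indexing and selection behave largely the same as for any other Series. There is also a more convenient form: you can pass a string that can be interpreted as a date.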

ts['1/10/2011']
0.11647269713229616

ts['20110110']
0.11647269713229616
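
To show that the different spellings of a date resolve to the same entry, here is a short sketch with made-up values (`demo_ts` is a hypothetical series, and `.loc` is used to make the label-based lookup explicit):

```python
from datetime import datetime

import pandas as pd

idx = pd.to_datetime(['2011-01-08', '2011-01-10', '2011-01-12'])
demo_ts = pd.Series([0.5, 1.5, 2.5], index=idx)

by_slash_style = demo_ts.loc['1/10/2011']         # month/day/year string
by_compact = demo_ts.loc['20110110']              # YYYYMMDD string
by_datetime = demo_ts.loc[datetime(2011, 1, 10)]  # datetime object
```

All three lookups select the entry for 10 January 2011. Strings such as '1/10/2011' are read month-first by default, so an unambiguous form like '2011-01-10' is often the safer choice.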

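With a longer time series, passing just a year, or a year and month, selects all the data for that year or month.
Slicing with a date range returns the entries that fall within that range. Note that a slice produced this way is a view on the original time series rather than a copy (under the Copy-on-Write behaviour of newer pandas versions, writes to the slice no longer propagate back).
There is also an equivalent instance method, truncate, that cuts a time series between two dates: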

ts.truncate(after='1/9/2011')

2011-01-02   -1.739738
2011-01-05   -0.813930
2011-01-07   -0.083642
2011-01-08    0.418713
dtype: float64
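
The year, year-month and date-range selection described before the truncate example can be sketched on a longer series. The series `longer_ts` below is hypothetical (1000 consecutive days of made-up values), chosen only so that the selections span more than one year:

```python
import numpy as np
import pandas as pd

# Hypothetical longer series: 1000 consecutive days starting 2000-01-01
longer_ts = pd.Series(np.arange(1000.0),
                      index=pd.date_range('2000-01-01', periods=1000))

year_2001 = longer_ts.loc['2001']       # every day in 2001
may_2001 = longer_ts.loc['2001-05']     # every day in May 2001

# Slicing by date strings includes both endpoints
window = longer_ts.loc['2001-01-01':'2001-01-10']

# truncate drops everything before `before` and after `after`
truncated = longer_ts.truncate(before='2001-01-01', after='2001-01-10')
```

Here `window` and `truncated` cover the same ten days, which is the sense in which truncate is equivalent to slicing.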

When several observations share the same timestamp, the index's is_unique attribute tells you whether the index is unique:

dates = pd.DatetimeIndex(['1/1/2000', '1/2/2000', '1/2/2000', '1/2/2000',
                          '1/3/2000'])
dup_ts = Series(np.arange(5), index=dates)
dup_ts

2000-01-01    0
2000-01-02    1
2000-01-02    2
2000-01-02    3
2000-01-03    4
dtype: int32

dup_ts.index.is_unique
False


# Indexing on a duplicated date now returns every matching value


dup_ts['1/2/2000']
2000-01-02    1
2000-01-02    2
2000-01-02    3
dtype: int32
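
The result of indexing therefore depends on whether the requested date is duplicated. The sketch below rebuilds the same series and contrasts the two cases; the use of `Index.duplicated(keep=False)` to list all repeated rows is an addition for illustration, not something the original example does.

```python
import pandas as pd

dates = pd.DatetimeIndex(['1/1/2000', '1/2/2000', '1/2/2000', '1/2/2000',
                          '1/3/2000'])
dup_ts = pd.Series(range(5), index=dates)

single = dup_ts.loc['1/3/2000']    # date appears once: a scalar comes back
several = dup_ts.loc['1/2/2000']   # date appears three times: a Series comes back

# Boolean mask that is True for every row whose timestamp occurs more than once
dup_mask = dup_ts.index.duplicated(keep=False)
repeated_rows = dup_ts[dup_mask]
```

Code that indexes by date and expects a scalar should check `is_unique` first, since a duplicated timestamp silently changes the return type.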




# To aggregate data with non-unique timestamps, use groupby and pass level=0 (the index itself)
grouped = dup_ts.groupby(level=0)
grouped.mean()

2000-01-01    0
2000-01-02    2
2000-01-03    4
dtype: int32

grouped.count()

2000-01-01    1
2000-01-02    3
2000-01-03    1
dtype: int64
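
As a recap of the aggregation step, the sketch below rebuilds the duplicated series and collapses it to one row per timestamp. The last line, which keeps only the first observation per timestamp via `Index.duplicated`, is an alternative added here for illustration and is not part of the original example. Recent pandas versions may display the means as floats (0.0, 2.0, 4.0) rather than the integers shown above.

```python
import pandas as pd

dates = pd.DatetimeIndex(['1/1/2000', '1/2/2000', '1/2/2000', '1/2/2000',
                          '1/3/2000'])
dup_ts = pd.Series(range(5), index=dates)

# level=0 groups by the values of the (single-level) index
grouped = dup_ts.groupby(level=0)
means = grouped.mean()     # one value per distinct timestamp
counts = grouped.count()   # number of observations per timestamp

# Alternative: keep only the first observation for each timestamp
first_seen = dup_ts[~dup_ts.index.duplicated(keep='first')]
```

Aggregating with groupby combines the duplicated observations, while the `duplicated` mask discards all but one of them; which is appropriate depends on the data.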