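1. Indexing
Label-based indexing on a time series accepts date strings in a variety of formats, as well as datetime.datetime objects.
- The series is sorted chronologically, so ordering is not a concern when looking up labels
- The same label-based lookups also work on a DataFrame, where row selection goes through .loc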
import numpy as np
import pandas as pd
from datetime import datetime
# Indexing
rng = pd.date_range('2017/1', '2017/3')
ts = pd.Series(np.random.rand(len(rng)), index=rng)
print("ts = \n", ts)
print("-" * 50)
print("ts.head() = \n", ts.head())
print("-" * 200)
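# Basic positional indexing
# (newer pandas versions deprecate integer positions via ts[0]; ts.iloc[0] is the explicit form)
print("Positional index: ts[0] = ", ts[0])
print("Positional index: ts[:2] = \n", ts[:2])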
print("-" * 200)
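# Label-based indexing accepts date strings in a variety of formats, as well as datetime.datetime
# The series is sorted chronologically, so ordering is not a concern when looking up labels
# The same lookups also work on a DataFrame through .loc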
print("ts['2017/1/2'] = ", ts['2017/1/2'])
print("ts['20170103'] = ", ts['20170103'])
print("ts['1/10/2017'] = ", ts['1/10/2017'])
print("ts[datetime(2017, 1, 20)] = ", ts[datetime(2017, 1, 20)])
print("-" * 200)
Output:
ts =
2017-01-01 0.551172
2017-01-02 0.676984
2017-01-03 0.449515
2017-01-04 0.029888
2017-01-05 0.760317
2017-01-06 0.237550
2017-01-07 0.447621
2017-01-08 0.765687
2017-01-09 0.594706
2017-01-10 0.127133
2017-01-11 0.585002
2017-01-12 0.715092
2017-01-13 0.452857
2017-01-14 0.002166
2017-01-15 0.919406
2017-01-16 0.661433
2017-01-17 0.816985
2017-01-18 0.054109
2017-01-19 0.941522
2017-01-20 0.577710
2017-01-21 0.896383
2017-01-22 0.062862
2017-01-23 0.765347
2017-01-24 0.592148
2017-01-25 0.278556
2017-01-26 0.090711
2017-01-27 0.772405
2017-01-28 0.685413
2017-01-29 0.564777
2017-01-30 0.249494
2017-01-31 0.353693
2017-02-01 0.641812
2017-02-02 0.744452
2017-02-03 0.802991
2017-02-04 0.286702
2017-02-05 0.505531
2017-02-06 0.147288
2017-02-07 0.412554
2017-02-08 0.690443
2017-02-09 0.219935
2017-02-10 0.631287
2017-02-11 0.283691
2017-02-12 0.637356
2017-02-13 0.414368
2017-02-14 0.670913
2017-02-15 0.982919
2017-02-16 0.787294
2017-02-17 0.783862
2017-02-18 0.110436
2017-02-19 0.631306
2017-02-20 0.857404
2017-02-21 0.697764
2017-02-22 0.990373
2017-02-23 0.876479
2017-02-24 0.617759
2017-02-25 0.370738
2017-02-26 0.523457
2017-02-27 0.074906
2017-02-28 0.875270
2017-03-01 0.455254
Freq: D, dtype: float64
--------------------------------------------------
ts.head() =
2017-01-01 0.551172
2017-01-02 0.676984
2017-01-03 0.449515
2017-01-04 0.029888
2017-01-05 0.760317
Freq: D, dtype: float64
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Positional index: ts[0] = 0.5511722618400913
Positional index: ts[:2] =
2017-01-01 0.551172
2017-01-02 0.676984
Freq: D, dtype: float64
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
ts['2017/1/2'] = 0.6769837637858711
ts['20170103'] = 0.4495150651749722
ts['1/10/2017'] = 0.12713279349021678
ts[datetime(2017, 1, 20)] = 0.5777095683188953
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
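To illustrate the note that the same lookups work on a DataFrame, here is a minimal sketch. The DataFrame `df`, its `value` column, and the deterministic values are assumptions made for illustration and are not part of the original example; row selection is done through `.loc`.

```python
import numpy as np
import pandas as pd
from datetime import datetime

# Hypothetical DataFrame on the same daily index as above, with deterministic values.
rng = pd.date_range('2017/1', '2017/3')
df = pd.DataFrame({'value': np.arange(len(rng), dtype=float)}, index=rng)

# On a DataFrame, rows are selected by date through .loc.
row_by_string = df.loc['2017/1/2']               # one row, returned as a Series
row_by_datetime = df.loc[datetime(2017, 1, 20)]  # datetime objects work as well
january = df.loc['2017-01']                      # a month string selects all of January

print(row_by_string['value'])
print(row_by_datetime['value'])
print(len(january))
```

Plain `df['2017/1/2']` would be interpreted as a column lookup, which is why the sketch uses `.loc` throughout.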
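2. Slicing
Slicing by label works the same way as label slicing on an ordinary Series: the end point is included. When the end is given as a date-only string, the whole of that day is included.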
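As a small illustration of the end-inclusive rule on an ordinary Series, the sketch below contrasts a label slice with a positional slice. The Series `s` and its labels are made up for this example.

```python
import pandas as pd

# Hypothetical Series with string labels.
s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

by_label = s['a':'c']       # label slice: the end label 'c' is included
by_position = s.iloc[0:2]   # positional slice: position 2 is excluded

print(by_label.tolist())
print(by_position.tolist())
```

The label slice returns three elements, while the positional slice returns two. Date-string slices on a time series follow the label rule.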
import numpy as np
import pandas as pd
# Slicing
# Note: newer pandas versions spell this frequency alias in lowercase ('12h')
rng = pd.date_range('2017/1', '2017/3', freq='12H')
ts = pd.Series(np.random.rand(len(rng)), index=rng)
print("ts = \n", ts)
print('-' * 200)
# Same as label slicing on an ordinary Series: the end point is included
data1 = ts['2017/1/5':'2017/1/10']
print("data1 = ts['2017/1/5':'2017/1/10'] = \n", data1)
print('-' * 200)
# Passing a month string returns the slice covering that whole month
data2 = ts['2017/2']
data3 = data2.head()
print("data2 = ts['2017/2'] = \n", data2)
print('-' * 50)
print("data3 = ts['2017/2'].head() = \n", data3)
print('-' * 200)
Output:
ts =
2017-01-01 00:00:00 0.494033
2017-01-01 12:00:00 0.820702
2017-01-02 00:00:00 0.616621
2017-01-02 12:00:00 0.011143
2017-01-03 00:00:00 0.940433
...
2017-02-27 00:00:00 0.978302
2017-02-27 12:00:00 0.414231
2017-02-28 00:00:00 0.218717
2017-02-28 12:00:00 0.580957
2017-03-01 00:00:00 0.090996
Freq: 12H, Length: 119, dtype: float64
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
data1 = ts['2017/1/5':'2017/1/10'] =
2017-01-05 00:00:00 0.652041
2017-01-05 12:00:00 0.773052
2017-01-06 00:00:00 0.463288
2017-01-06 12:00:00 0.335351
2017-01-07 00:00:00 0.099362
2017-01-07 12:00:00 0.883344
2017-01-08 00:00:00 0.426475
2017-01-08 12:00:00 0.580315
2017-01-09 00:00:00 0.863783
2017-01-09 12:00:00 0.494119
2017-01-10 00:00:00 0.577613
2017-01-10 12:00:00 0.168280
Freq: 12H, dtype: float64
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
data2 = ts['2017/2'] =
2017-02-01 00:00:00 0.906511
2017-02-01 12:00:00 0.208719
2017-02-02 00:00:00 0.831267
2017-02-02 12:00:00 0.496934
2017-02-03 00:00:00 0.882586
2017-02-03 12:00:00 0.269308
2017-02-04 00:00:00 0.767492
2017-02-04 12:00:00 0.928533
2017-02-05 00:00:00 0.404165
2017-02-05 12:00:00 0.573177
2017-02-06 00:00:00 0.298927
2017-02-06 12:00:00 0.987986
2017-02-07 00:00:00 0.097949
2017-02-07 12:00:00 0.971335
2017-02-08 00:00:00 0.194750
2017-02-08 12:00:00 0.224471
2017-02-09 00:00:00 0.628354
2017-02-09 12:00:00 0.487055
2017-02-10 00:00:00 0.166684
2017-02-10 12:00:00 0.644644
2017-02-11 00:00:00 0.479011
2017-02-11 12:00:00 0.035003
2017-02-12 00:00:00 0.694782
2017-02-12 12:00:00 0.784163
2017-02-13 00:00:00 0.740384
2017-02-13 12:00:00 0.983730
2017-02-14 00:00:00 0.010376
2017-02-14 12:00:00 0.026971
2017-02-15 00:00:00 0.012298
2017-02-15 12:00:00 0.679321
2017-02-16 00:00:00 0.594517
2017-02-16 12:00:00 0.260168
2017-02-17 00:00:00 0.405923
2017-02-17 12:00:00 0.856798
2017-02-18 00:00:00 0.615552
2017-02-18 12:00:00 0.261799
2017-02-19 00:00:00 0.786273
2017-02-19 12:00:00 0.316262
2017-02-20 00:00:00 0.457370
2017-02-20 12:00:00 0.975753
2017-02-21 00:00:00 0.232189
2017-02-21 12:00:00 0.373186
2017-02-22 00:00:00 0.506089
2017-02-22 12:00:00 0.849335
2017-02-23 00:00:00 0.623559
2017-02-23 12:00:00 0.215287
2017-02-24 00:00:00 0.985915
2017-02-24 12:00:00 0.998497
2017-02-25 00:00:00 0.294932
2017-02-25 12:00:00 0.993772
2017-02-26 00:00:00 0.852245
2017-02-26 12:00:00 0.957576
2017-02-27 00:00:00 0.978302
2017-02-27 12:00:00 0.414231
2017-02-28 00:00:00 0.218717
2017-02-28 12:00:00 0.580957
Freq: 12H, dtype: float64
--------------------------------------------------
data3 = ts['2017/2'].head() =
2017-02-01 00:00:00 0.906511
2017-02-01 12:00:00 0.208719
2017-02-02 00:00:00 0.831267
2017-02-02 12:00:00 0.496934
2017-02-03 00:00:00 0.882586
Freq: 12H, dtype: float64
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
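The slicing behaviour above can be checked with deterministic data. The following is a sketch rather than the original code: the values are replaced by a simple counter, the 12-hour step is written as a `pd.Timedelta` to avoid depending on a particular alias spelling, and the comparison with `pd.Timestamp` end points is added for illustration.

```python
import numpy as np
import pandas as pd

# Same 12-hour index as above, with the step given as a Timedelta.
rng = pd.date_range('2017-01-01', '2017-03-01', freq=pd.Timedelta(hours=12))
ts = pd.Series(np.arange(len(rng)), index=rng)

# Date-only string end point: all of 2017-01-10 is included (00:00 and 12:00).
by_string = ts.loc['2017-01-05':'2017-01-10']

# Timestamp end point: an exact instant, so the slice stops at 2017-01-10 00:00.
by_timestamp = ts.loc[pd.Timestamp('2017-01-05'):pd.Timestamp('2017-01-10')]

# A month string selects every entry in that month.
february = ts.loc['2017-02']

print(len(ts), len(by_string), len(by_timestamp), len(february))
```

The string slice holds one more entry than the Timestamp slice because the string `'2017-01-10'` stands for the whole day rather than for midnight only.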
3. Time series with duplicate index entries
import numpy as np
import pandas as pd
# The index contains duplicates; is_unique shows that the values are unique but the index is not
dates = pd.DatetimeIndex(['1/1/2015', '1/2/2015', '1/3/2015', '1/4/2015', '1/1/2015', '1/2/2015'])
ts = pd.Series(np.random.rand(6), index=dates)
print("ts = \n", ts)
print('-' * 50)
print("ts.is_unique = {0}, ts.index.is_unique = {1}".format(ts.is_unique, ts.index.is_unique))
print('-' * 200)
# A duplicated index label returns multiple values
data1 = ts['20150101']
print("data1 = \n{0} \ntype(data1) = {1}".format(data1, type(data1)))
print('-' * 50)
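# As the output shows, a label that occurs only once still comes back as a Series here, since the index is not unique
data2 = ts['20150104']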
print("data2 = \n{0} \ntype(data2) = {1}".format(data2, type(data2)))
print('-' * 200)
# Group by the index with groupby; duplicated entries are collapsed here by taking their mean
data3 = ts.groupby(level=0).mean()
print("data3 = \n{0} \ntype(data3) = {1}".format(data3, type(data3)))
print('-' * 200)
Output:
ts =
2015-01-01 0.488589
2015-01-02 0.621012
2015-01-03 0.657300
2015-01-04 0.164756
2015-01-01 0.078192
2015-01-02 0.899275
dtype: float64
--------------------------------------------------
ts.is_unique = True, ts.index.is_unique = False
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
data1 =
2015-01-01 0.488589
2015-01-01 0.078192
dtype: float64
type(data1) = <class 'pandas.core.series.Series'>
--------------------------------------------------
data2 =
2015-01-04 0.164756
dtype: float64
type(data2) = <class 'pandas.core.series.Series'>
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
data3 =
2015-01-01 0.283390
2015-01-02 0.760143
2015-01-03 0.657300
2015-01-04 0.164756
dtype: float64
type(data3) = <class 'pandas.core.series.Series'>
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
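Averaging is only one way to resolve duplicates. As a hedged alternative that the original does not cover, the sketch below keeps the first occurrence of each timestamp using `Index.duplicated`, and repeats the `groupby` mean on fixed values so the result is easy to verify by hand. The values 1.0 to 6.0 are assumptions chosen for readability.

```python
import pandas as pd

dates = pd.DatetimeIndex(['1/1/2015', '1/2/2015', '1/3/2015',
                          '1/4/2015', '1/1/2015', '1/2/2015'])
ts = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=dates)

# True for every repeated occurrence of a timestamp after its first appearance.
dup_mask = ts.index.duplicated(keep='first')

# Option 1: keep only the first value recorded for each timestamp.
first_only = ts[~dup_mask]

# Option 2 (as in the text above): average the values that share a timestamp.
averaged = ts.groupby(level=0).mean()

print(first_only.index.is_unique, first_only.tolist())
print(averaged.tolist())
```

Which option fits depends on the data: keeping the first value suits accidental re-recordings, while the mean suits repeated measurements of the same instant.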