Pandas Time Series Data (26)

Pandas Time Series Data - datetime


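Time series data is an important form of structured data in finance, economics, neuroscience, physics and other fields. Observations are organized by time, and the main purpose of time series analysis is to forecast the future from existing historical data. Most economic data is published as time series. Depending on how often observations are made, the time in a time series can be a year, a quarter, a month, or any other unit of time. The most basic time series type in pandas is a Series whose index elements are timestamps (Timestamp). Python and pandas provide many built-in tools and modules for creating time series data.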

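  • The datetime module. In Python's standard datetime module, 1) the date class creates calendar dates, 2) the time class creates times of day (hours, minutes, seconds), and 3) the datetime class describes a date together with a time of day.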
import datetime
cur = datetime.datetime(2018, 12, 30, 15, 30, 59)
print(cur, type(cur))
d = datetime.date(2018, 12, 30)
print(d)
t = datetime.datetime.now()   # the current local date and time
print(t)

Output (the last line is the time at which the program was run):

2018-12-30 15:30:59 <class 'datetime.datetime'>
2018-12-30
2018-12-16 15:35:42.757826
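The time class from point 2) does not appear in the code above. The following minimal sketch, using only the standard library, shows how date, time and datetime relate. datetime.combine joins a date and a time into a datetime, and .date()/.time() split it apart again:

```python
import datetime

# A date, a time of day, and the datetime that combines them
d = datetime.date(2018, 12, 30)
t = datetime.time(15, 30, 59)              # time of day only, no date
dt = datetime.datetime.combine(d, t)       # join the date and the time

print(t)                                   # 15:30:59
print(dt)                                  # 2018-12-30 15:30:59
print(dt.date() == d, dt.time() == t)      # True True
print(dt.year, dt.month, dt.day, dt.hour)  # 2018 12 30 15
```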

4) The timedelta class in the datetime module represents a time interval (a difference between two times).

import datetime
cur0 = datetime.datetime(2018, 12, 30, 15, 30, 59)
print(cur0)
cur1 = cur0 + datetime.timedelta(days=1)
print(cur1)
cur2 = cur0 + datetime.timedelta(minutes=10)
print(cur2)
cur3 = cur0 + datetime.timedelta(minutes=29, seconds=1)
print(cur3)

Output:

2018-12-30 15:30:59 #cur0
2018-12-31 15:30:59 #cur1
2018-12-30 15:40:59 #cur2
2018-12-30 16:00:00 #cur3
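The relationship also works in reverse: subtracting two datetime objects yields a timedelta. A short sketch using the same times as above:

```python
import datetime

cur0 = datetime.datetime(2018, 12, 30, 15, 30, 59)
cur3 = datetime.datetime(2018, 12, 30, 16, 0, 0)

# Subtracting two datetimes gives a timedelta
diff = cur3 - cur0
print(diff)                  # 0:29:01
print(diff.total_seconds())  # 1741.0

# A timedelta can also be subtracted to move backwards in time
print(cur0 - datetime.timedelta(days=1))  # 2018-12-29 15:30:59
```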
  • Creating time series data from datetime values, i.e. using times created with datetime as the index.
from datetime import datetime, timedelta
import numpy as np
import pandas as pd
b = datetime(2018, 12, 16, 17, 30, 55)
vi = np.random.randn(60)            # 60 random values
ind = []
for x in range(60):
    bi = b + timedelta(minutes=x)   # one timestamp per minute
    ind.append(bi)
ts = pd.Series(vi, index=ind)
print(ts[:5])

Output:

2018-12-16 17:30:55   -1.469098
2018-12-16 17:31:55   -0.583046
2018-12-16 17:32:55   -0.775167
2018-12-16 17:33:55   -0.740570
2018-12-16 17:34:55   -0.287118
dtype: float64

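The first column of the result is the time, at one-minute intervals, and the second column is the data value. The statement ts = pd.Series(vi, index=ind) uses the list of datetime objects as the index, which pandas stores as a DatetimeIndex.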
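A sketch of the same construction written with a list comprehension, plus two ways of reading values back through the time index: an exact Timestamp lookup and a slice given as date strings (pandas' partial string indexing). Because the values are random, only positions and lengths are checked:

```python
from datetime import datetime, timedelta
import numpy as np
import pandas as pd

b = datetime(2018, 12, 16, 17, 30, 55)
# The same 60 one-minute-apart timestamps, built in one expression
ind = [b + timedelta(minutes=x) for x in range(60)]
ts = pd.Series(np.random.randn(60), index=ind)

print(type(ts.index).__name__)  # DatetimeIndex

# Exact lookup by timestamp returns a single value
print(ts.loc[pd.Timestamp("2018-12-16 17:31:55")] == ts.iloc[1])  # True

# A slice given as strings covers 17:30:00 up to the end of minute 17:32
window = ts.loc["2018-12-16 17:30":"2018-12-16 17:32"]
print(len(window))  # 3
```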

 

Pandas Time Series Data - Creating a Timestamp

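In pandas, time series can be built with the Timestamp class, available as pd.Timestamp (very old pandas versions also exposed it as pandas.tslib.Timestamp, a module path that no longer exists). This section covers the basic usage of this class.

import pandas as pd
cur0 = pd.Timestamp("2018-12-26 17:30:36")
print(cur0)
cur0 = pd.Timestamp("17:30:36")   # time only: the date defaults to today
print(cur0)

Output (the second line carries the date on which the program was run):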

2018-12-26 17:30:36
2018-12-16 17:30:36
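Besides a string, a Timestamp can also be built from a Python datetime object or from keyword components. The sketch below builds the same moment three ways and shows that a Timestamp is itself a subclass of datetime.datetime:

```python
from datetime import datetime
import pandas as pd

ts0 = pd.Timestamp("2018-12-26 17:30:36")
ts1 = pd.Timestamp(datetime(2018, 12, 26, 17, 30, 36))  # from a datetime object
ts2 = pd.Timestamp(year=2018, month=12, day=26,
                   hour=17, minute=30, second=36)       # from components

print(ts0 == ts1 == ts2)          # True
print(ts0.day_name())             # Wednesday
print(isinstance(ts0, datetime))  # True: Timestamp subclasses datetime
```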

pandas' Timedelta class (pd.Timedelta) represents a time interval.

import pandas as pd
from datetime import datetime
cur0 = pd.Timestamp("2018-12-26 17:30:36")
print(cur0)
cur0 = pd.Timestamp("17:30:36")
print(cur0)
cur1 = cur0 + pd.Timedelta(days=1)
print(cur1)
cur2 = datetime(2018, 12, 16, 17, 30, 36) + pd.Timedelta(days=1)
print(cur2)

Output:

2018-12-26 17:30:36 # cur0
2018-12-16 17:30:36 # cur0
2018-12-17 17:30:36 # cur1
2018-12-17 17:30:36 # cur2
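As with the standard library, subtracting two Timestamps gives a Timedelta, and a Timedelta can also be parsed from a string. A small sketch:

```python
import pandas as pd

start = pd.Timestamp("2018-12-16 17:30:36")
end = pd.Timestamp("2018-12-26 17:30:36")

gap = end - start   # subtracting Timestamps gives a Timedelta
print(gap)          # 10 days 00:00:00
print(gap.days)     # 10

# A Timedelta parsed from a string
step = pd.Timedelta("1 days 2 hours")
print(start + step)  # 2018-12-17 19:30:36
```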

Building time series data with pandas' Timedelta:

from datetime import datetime
import numpy as np
import pandas as pd
b = datetime(2018, 12, 16, 17, 30, 55)
vi = np.random.randn(60)
ind = []
for x in range(60):
    bi = b + pd.Timedelta(minutes=x)
    ind.append(bi)
ts = pd.Series(vi, index=ind)
print(ts[:5])

Output:

2018-12-16 17:30:55   -0.816316
2018-12-16 17:31:55   -0.914680
2018-12-16 17:32:55   -0.304760
2018-12-16 17:33:55   -1.339267
2018-12-16 17:34:55    1.578459
dtype: float64
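The Python loop can be replaced by a single vectorized step. This is a minimal sketch, assuming pd.to_timedelta to turn the integers 0..59 into minute offsets; adding them to one Timestamp produces a DatetimeIndex directly:

```python
from datetime import datetime
import numpy as np
import pandas as pd

b = datetime(2018, 12, 16, 17, 30, 55)
# 0..59 converted to 0..59 minutes in one call (a TimedeltaIndex)
offsets = pd.to_timedelta(np.arange(60), unit="min")
ind = pd.Timestamp(b) + offsets   # Timestamp + TimedeltaIndex -> DatetimeIndex
ts = pd.Series(np.random.randn(60), index=ind)
print(ts[:5])
```

The date_range function covered next is the more common way to generate such regularly spaced times.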

Pandas Time Series Data - the date_range function

In pandas, the date_range function generates a collection of times, i.e. a sequence of timestamps. It works somewhat like the range function, except that its arguments are times rather than integers.

  • freq sets the interval between consecutive times.
import numpy as np
import pandas as pd
cur0 = pd.date_range('2018-12-16', '2019-01-01', freq="2D")
print(cur0)
cur1 = pd.date_range('12/16/2018', '2019-01-01', freq="W")
print(cur1)
cur2 = pd.date_range('2018-12-16 17:30:30', '2019-01-01', freq="6H")
print(cur2)
cur3 = pd.date_range('2018-12-16', '2019-08-01', freq="M")
print(cur3)
cur4 = pd.date_range('2010-12-16', '2019-01-01', freq="Y")
print(cur4)
cur5 = pd.date_range('2010', '2019', freq="AS")
print(cur5)

Output (only the first ten entries of the 6H index are shown):

DatetimeIndex(['2018-12-16', '2018-12-18', '2018-12-20', '2018-12-22',
               '2018-12-24', '2018-12-26', '2018-12-28', '2018-12-30',
               '2019-01-01'],
              dtype='datetime64[ns]', freq='2D')
DatetimeIndex(['2018-12-16', '2018-12-23', '2018-12-30'], dtype='datetime64[ns]', freq='W-SUN')
DatetimeIndex(['2018-12-16 17:30:30', '2018-12-16 23:30:30',
               '2018-12-17 05:30:30', '2018-12-17 11:30:30',
               '2018-12-17 17:30:30', '2018-12-17 23:30:30',
               '2018-12-18 05:30:30', '2018-12-18 11:30:30',
               '2018-12-18 17:30:30', '2018-12-18 23:30:30'],
              dtype='datetime64[ns]', freq='6H')
DatetimeIndex(['2018-12-31', '2019-01-31', '2019-02-28', '2019-03-31',
               '2019-04-30', '2019-05-31', '2019-06-30', '2019-07-31'],
              dtype='datetime64[ns]', freq='M')
DatetimeIndex(['2010-12-31', '2011-12-31', '2012-12-31', '2013-12-31',
               '2014-12-31', '2015-12-31', '2016-12-31', '2017-12-31',
               '2018-12-31'],
              dtype='datetime64[ns]', freq='A-DEC')
DatetimeIndex(['2010-01-01', '2011-01-01', '2012-01-01', '2013-01-01',
               '2014-01-01', '2015-01-01', '2016-01-01', '2017-01-01',
               '2018-01-01', '2019-01-01'],
              dtype='datetime64[ns]', freq='AS-JAN')

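freq="2D" means a step of two days, freq="6H" a step of 6 hours, and freq="M" a step of one month, placed at the month end. The commonly used values of the freq parameter of date_range are listed in the table below.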

Alias       Description
B           business day frequency
C           custom business day frequency
D           calendar day frequency
W           weekly frequency
M           month end frequency
SM          semi-month end frequency (15th and end of month)
BM          business month end frequency
CBM         custom business month end frequency
MS          month start frequency
SMS         semi-month start frequency (1st and 15th)
BMS         business month start frequency
CBMS        custom business month start frequency
Q           quarter end frequency
BQ          business quarter end frequency
QS          quarter start frequency
BQS         business quarter start frequency
A, Y        year end frequency
BA, BY      business year end frequency
AS, YS      year start frequency
BAS, BYS    business year start frequency
BH          business hour frequency
H           hourly frequency
T, min      minutely frequency
S           secondly frequency
L, ms       milliseconds
U, us       microseconds
N           nanoseconds

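In the table, T means minutes and B means business days. Note that pandas 2.2 and later deprecate several of these aliases in favor of new spellings (for example M→ME, Q→QE, A/Y→YE, AS→YS, BM→BME, H→h, T→min, S→s, L→ms, U→us, N→ns); the examples here use the older spellings and show the output of an older pandas version. Next, date_range can be used to create a time series.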
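Each alias in the table corresponds to an offset object in pd.offsets, and date_range accepts either form. A sketch showing that a string and the equivalent object give the same dates; passing the object (here pd.offsets.MonthEnd) is one way to sidestep the alias renames across pandas versions:

```python
import pandas as pd

# "2D" and Day(2) describe the same step
a = pd.date_range("2018-12-16", periods=4, freq="2D")
b = pd.date_range("2018-12-16", periods=4, freq=pd.offsets.Day(2))
print(a.equals(b))  # True

# MonthEnd() is the object behind the month-end alias
m = pd.date_range("2018-12-16", periods=3, freq=pd.offsets.MonthEnd())
print(list(m.strftime("%Y-%m-%d")))  # ['2018-12-31', '2019-01-31', '2019-02-28']
```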

import numpy as np
import pandas as pd
cur0 = pd.date_range('2018-12-16', '2019-02-05', freq="B")
# print(cur0, len(cur0))
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts[:14])

Output:

2018-12-17    0.128278
2018-12-18   -0.128049
2018-12-19    0.872805
2018-12-20   -0.809540
2018-12-21   -0.104894
2018-12-24    0.720047
2018-12-25    0.965698
2018-12-26    0.926640
2018-12-27   -1.505794
2018-12-28    0.246031
2018-12-31   -0.536505
2019-01-01    1.609414
2019-01-02    0.459005
2019-01-03    0.347774
Freq: B, dtype: float64

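The first column shows that Saturdays and Sundays are absent: freq="B" produces only business days (Monday to Friday). Public holidays are not excluded, however; 2018-12-25 and 2019-01-01 still appear.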
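To also skip holidays, the custom business day offset ("C" in the table) accepts a list of dates to exclude. In this sketch the holiday list is a hypothetical one containing only the two dates noted above; a real calendar would supply its own:

```python
import pandas as pd

# Hypothetical holiday list: the two dates that appear in the output above
holidays = ["2018-12-25", "2019-01-01"]
cbd = pd.offsets.CustomBusinessDay(holidays=holidays)
idx = pd.date_range("2018-12-16", "2019-01-04", freq=cbd)

print(pd.Timestamp("2018-12-25") in idx)  # False
print((idx.dayofweek < 5).all())          # True: still no weekends
print(len(idx))                           # 13
```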

The next example generates dates that all fall on the same day of the week, here Wednesday.

import numpy as np
import pandas as pd
cur0 = pd.date_range('2018-12-16', '2019-02-05', freq="W-WED")
print(cur0)

Output:

DatetimeIndex(['2018-12-19', '2018-12-26', '2019-01-02', '2019-01-09',
               '2019-01-16', '2019-01-23', '2019-01-30'],
              dtype='datetime64[ns]', freq='W-WED')
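One quick way to confirm the anchoring is DatetimeIndex.day_name(), which returns the weekday name of each entry:

```python
import pandas as pd

cur0 = pd.date_range("2018-12-16", "2019-02-05", freq="W-WED")
# Every generated date should be a Wednesday
print(set(cur0.day_name()))  # {'Wednesday'}
print(len(cur0))             # 7
```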
  • periods sets the number of times to generate.
import numpy as np
import pandas as pd
cur0 = pd.date_range('2018-12-16 18:30:34', periods=5, freq='2h20min')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)

Output (140T is 140 minutes, i.e. 2 h 20 min):

2018-12-16 18:30:34   -0.289575
2018-12-16 20:50:34   -0.782106
2018-12-16 23:10:34    0.152276
2018-12-17 01:30:34   -0.661511
2018-12-17 03:50:34   -1.676650
Freq: 140T, dtype: float64
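date_range is defined by any three of start, end, periods and freq. A sketch of the combinations: start with periods, end with periods (counted backwards), and start with end and periods but no freq, which spaces the points evenly. Here the 2 h 20 min step is passed as a pd.Timedelta instead of a string:

```python
import pandas as pd

step = pd.Timedelta(hours=2, minutes=20)

# start + periods + freq: the end is computed
a = pd.date_range("2018-12-16 18:30:34", periods=5, freq=step)

# end + periods + freq: counted backwards from the end
b = pd.date_range(end="2018-12-17 03:50:34", periods=5, freq=step)

# start + end + periods, no freq: evenly spaced points
c = pd.date_range("2018-12-16", "2018-12-17", periods=5)

print(a.equals(b))  # True
print(c[1])         # 2018-12-16 06:00:00
```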

 

Pandas Time Series Data - date_range parameters in detail

  • freq="T" generates the time series at one-minute intervals; it is equivalent to "min".
import numpy as np
import pandas as pd
cur0 = pd.date_range('2018-12-16 18:30:34', periods=5, freq='T')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)
cur0 = pd.date_range('2018-12-16 18:30:34', periods=5, freq='min')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)

Output:

2018-12-16 18:30:34    0.489893
2018-12-16 18:31:34    0.000442
2018-12-16 18:32:34   -0.465273
2018-12-16 18:33:34   -0.173814
2018-12-16 18:34:34   -0.603672
Freq: T, dtype: float64
2018-12-16 18:30:34    0.690540
2018-12-16 18:31:34   -0.815213
2018-12-16 18:32:34    0.460163
2018-12-16 18:33:34    1.515437
2018-12-16 18:34:34   -0.832920
Freq: T, dtype: float64
  • freq="S" generates the time series at intervals of seconds; units can be combined, as in '3T10S' below.
import numpy as np
import pandas as pd
cur0 = pd.date_range('2018-12-16 18:30:34', periods=5, freq='3T10S')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)

Output:

2018-12-16 18:30:34   -1.078270
2018-12-16 18:33:44   -0.120087
2018-12-16 18:36:54    1.863152
2018-12-16 18:40:04   -0.601866
2018-12-16 18:43:14    0.881057
Freq: 190S, dtype: float64

Here the interval is 3 minutes 10 seconds, shown in the output as 190S (190 seconds).
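The same 3 min 10 s step can be expressed as a pd.Timedelta and passed directly as freq, which makes the 190-second equivalence explicit:

```python
import pandas as pd

step = pd.Timedelta(minutes=3, seconds=10)
print(step.total_seconds())  # 190.0

# Same spacing as freq='3T10S'
idx = pd.date_range("2018-12-16 18:30:34", periods=5, freq=step)
print(idx[-1])               # 2018-12-16 18:43:14
```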

  • freq="H" generates the time series at hourly intervals.
import numpy as np
import pandas as pd
cur0 = pd.date_range('2018-12-16 18:30:34', periods=5, freq='2H')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)

Output:

2018-12-16 18:30:34   -0.182473
2018-12-16 20:30:34    1.037907
2018-12-16 22:30:34   -0.175579
2018-12-17 00:30:34   -0.586400
2018-12-17 02:30:34   -0.334369
Freq: 2H, dtype: float64

The result shows that consecutive times are 2 hours apart.

  • freq="B" generates the time series on business days.
import numpy as np
import pandas as pd
cur0 = pd.date_range('2018-12-16 18:30:34', periods=10, freq='B')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)

Output:

2018-12-17 18:30:34    0.011285
2018-12-18 18:30:34    0.972737
2018-12-19 18:30:34    0.109900
2018-12-20 18:30:34   -0.969465
2018-12-21 18:30:34   -0.885282
2018-12-24 18:30:34   -1.722596
2018-12-25 18:30:34    0.678189
2018-12-26 18:30:34    0.402022
2018-12-27 18:30:34   -0.740186
2018-12-28 18:30:34    1.302828
Freq: B, dtype: float64

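December 22 and 23 (Saturday and Sunday) are missing from the result. The start date, 2018-12-16, is also a Sunday, so the series begins on the following Monday, 2018-12-17.

  • freq="D" generates the time series at daily intervals.
import numpy as np
import pandas as pd
cur0 = pd.date_range('2018-12-16 18:30:34', periods=10, freq='2D')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)

Output: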

2018-12-16 18:30:34    0.327716
2018-12-18 18:30:34    0.784813
2018-12-20 18:30:34    1.432993
2018-12-22 18:30:34    1.148707
2018-12-24 18:30:34    0.996547
2018-12-26 18:30:34   -0.210021
2018-12-28 18:30:34   -0.175977
2018-12-30 18:30:34    0.473569
2019-01-01 18:30:34    0.642001
2019-01-03 18:30:34    0.675140
Freq: 2D, dtype: float64

In the result only the day changes, in steps of 2 days.

  • freq="W" generates the time series at weekly intervals; by default it is anchored on Sunday, i.e. it is equivalent to "W-SUN".
import numpy as np
import pandas as pd
cur0 = pd.date_range('2018-12-15 18:30:34', periods=10, freq='W')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)
cur0 = pd.date_range('2018-12-15 18:30:34', periods=10, freq='W-SUN')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)

Output:

2018-12-16 18:30:34    0.557365
2018-12-23 18:30:34   -0.306496
2018-12-30 18:30:34   -1.172465
2019-01-06 18:30:34    0.434073
2019-01-13 18:30:34    0.106500
2019-01-20 18:30:34    0.773861
2019-01-27 18:30:34   -0.236211
2019-02-03 18:30:34   -0.303260
2019-02-10 18:30:34    0.974439
2019-02-17 18:30:34   -0.356273
Freq: W-SUN, dtype: float64
2018-12-16 18:30:34    0.180012
2018-12-23 18:30:34   -0.977006
2018-12-30 18:30:34    0.095408
2019-01-06 18:30:34   -0.097709
2019-01-13 18:30:34   -0.401469
2019-01-20 18:30:34   -0.283461
2019-01-27 18:30:34   -1.138246
2019-02-03 18:30:34   -1.675089
2019-02-10 18:30:34    0.511324
2019-02-17 18:30:34    0.728807
Freq: W-SUN, dtype: float64

The start date, 2018-12-15, is a Saturday; the first entry in the result is Sunday 2018-12-16, consecutive entries are 7 days apart, and there are 10 entries in total (periods=10).

import numpy as np
import pandas as pd
cur0 = pd.date_range('2018-12-15 18:30:34', periods=10, freq='W')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)
cur0 = pd.date_range('2018-12-15 18:30:34', periods=10, freq='W-FRI')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)

Output:

2018-12-16 18:30:34   -1.133046
2018-12-23 18:30:34   -1.083898
2018-12-30 18:30:34   -1.503690
2019-01-06 18:30:34   -0.866094
2019-01-13 18:30:34   -0.945356
2019-01-20 18:30:34    0.021928
2019-01-27 18:30:34   -0.591696
2019-02-03 18:30:34   -1.710630
2019-02-10 18:30:34    2.121283
2019-02-17 18:30:34    0.739256
Freq: W-SUN, dtype: float64
2018-12-21 18:30:34    2.082080
2018-12-28 18:30:34    1.368807
2019-01-04 18:30:34    0.599276
2019-01-11 18:30:34   -0.149521
2019-01-18 18:30:34    1.134686
2019-01-25 18:30:34   -0.582935
2019-02-01 18:30:34   -0.470655
2019-02-08 18:30:34    0.983203
2019-02-15 18:30:34   -0.067618
2019-02-22 18:30:34   -0.736081
Freq: W-FRI, dtype: float64

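The statement cur0 = pd.date_range('2018-12-15 18:30:34', periods=10, freq='W-FRI') starts from 2018-12-15 (a Saturday) and generates 10 times that all fall on a Friday: the first Friday after 2018-12-15 is 2018-12-21, the second is 2018-12-28, and so on. In general, "W-" followed by a weekday (W-MON, W-FRI, ...) generates one time per week on that weekday.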
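In all the outputs above, the time of day of the start value (18:30:34) is carried over to every generated entry. date_range's normalize=True option moves the start to midnight first, so that only dates remain:

```python
import pandas as pd

# Time of day from the start is kept on every entry
a = pd.date_range("2018-12-15 18:30:34", periods=3, freq="W-FRI")
# normalize=True moves everything to midnight
b = pd.date_range("2018-12-15 18:30:34", periods=3, freq="W-FRI", normalize=True)

print(a[0])  # 2018-12-21 18:30:34
print(b[0])  # 2018-12-21 00:00:00
```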

  • freq="M" generates the time series monthly, anchored at the end of each month, while freq="MS" anchors it at the start of each month.
import numpy as np
import pandas as pd
cur0 = pd.date_range('2018-12-15 18:30:34', periods=10, freq='M')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)
cur0 = pd.date_range('2018-12-15 18:30:34', periods=10, freq='MS')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)

Output:

2018-12-31 18:30:34    2.844877
2019-01-31 18:30:34   -0.405763
2019-02-28 18:30:34    1.048116
2019-03-31 18:30:34   -0.353364
2019-04-30 18:30:34    1.146974
2019-05-31 18:30:34   -2.594504
2019-06-30 18:30:34    1.149964
2019-07-31 18:30:34    0.152655
2019-08-31 18:30:34    0.456799
2019-09-30 18:30:34    0.356193
Freq: M, dtype: float64
2019-01-01 18:30:34   -0.410882
2019-02-01 18:30:34   -1.349693
2019-03-01 18:30:34    0.363404
2019-04-01 18:30:34    0.352792
2019-05-01 18:30:34    0.334477
2019-06-01 18:30:34    0.181288
2019-07-01 18:30:34   -0.936703
2019-08-01 18:30:34   -0.512834
2019-09-01 18:30:34   -0.243987
2019-10-01 18:30:34    0.727383
Freq: MS, dtype: float64

The first month end after 2018-12-15 is 2018-12-31, and the first month start is 2019-01-01.
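The "first month end / first month start after the start date" rule can be checked with the offsets' rollforward method, which moves a date to the next anchor point unless it already sits on one:

```python
import pandas as pd

t = pd.Timestamp("2018-12-15")
print(pd.offsets.MonthEnd().rollforward(t))    # 2018-12-31 00:00:00
print(pd.offsets.MonthBegin().rollforward(t))  # 2019-01-01 00:00:00

# A date already on a month end stays where it is
print(pd.offsets.MonthEnd().rollforward(pd.Timestamp("2018-12-31")))  # 2018-12-31 00:00:00
```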

  • freq="BM" generates the time series on the last business day of each month, which is not necessarily the last calendar day of the month; likewise freq="BMS" uses the first business day of each month.
import numpy as np
import pandas as pd
cur0 = pd.date_range('2018-12-15 18:30:34', periods=10, freq='BM')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)
cur0 = pd.date_range('2018-12-15 18:30:34', periods=10, freq='BMS')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)

Output:

2018-12-31 18:30:34    0.338989
2019-01-31 18:30:34   -0.074689
2019-02-28 18:30:34   -1.309663
2019-03-29 18:30:34    0.139394
2019-04-30 18:30:34   -0.519024
2019-05-31 18:30:34    0.573932
2019-06-28 18:30:34    0.551329
2019-07-31 18:30:34   -0.849871
2019-08-30 18:30:34   -0.685058
2019-09-30 18:30:34   -0.160009
Freq: BM, dtype: float64
2019-01-01 18:30:34    0.499660
2019-02-01 18:30:34   -0.912324
2019-03-01 18:30:34    0.412629
2019-04-01 18:30:34    1.222422
2019-05-01 18:30:34   -0.618880
2019-06-03 18:30:34    0.132562
2019-07-01 18:30:34    0.721672
2019-08-01 18:30:34   -1.086498
2019-09-02 18:30:34   -1.670070
2019-10-01 18:30:34   -2.165835
Freq: BMS, dtype: float64

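Note that 2019-03-29 is not the last day of March, because 2019-03-30 and 2019-03-31 fall on a weekend. Likewise 2019-06-03 is not the first day of June but is its first business day, since 2019-06-01 and 2019-06-02 are a Saturday and a Sunday.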
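The calendar and business variants can be compared side by side with the corresponding offset objects (MonthEnd, BMonthEnd, BMonthBegin) and rollforward:

```python
import pandas as pd

print(pd.Timestamp("2019-03-31").day_name())                             # Sunday
print(pd.offsets.MonthEnd().rollforward(pd.Timestamp("2019-03-15")))     # 2019-03-31 00:00:00
print(pd.offsets.BMonthEnd().rollforward(pd.Timestamp("2019-03-15")))    # 2019-03-29 00:00:00
print(pd.offsets.BMonthBegin().rollforward(pd.Timestamp("2019-05-15")))  # 2019-06-03 00:00:00
```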

  • freq="Q" generates the time series quarterly, anchored at the end of each quarter, and freq="QS" at the start of each quarter.
import numpy as np
import pandas as pd
cur0 = pd.date_range('2018-12-15 18:30:34', periods=10, freq='Q')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)
cur0 = pd.date_range('2018-12-15 18:30:34', periods=10, freq='QS')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)

Output:

2018-12-31 18:30:34    0.364439
2019-03-31 18:30:34   -0.295537
2019-06-30 18:30:34    0.562707
2019-09-30 18:30:34   -0.226738
2019-12-31 18:30:34    0.623051
2020-03-31 18:30:34   -0.675792
2020-06-30 18:30:34   -0.848371
2020-09-30 18:30:34   -0.805518
2020-12-31 18:30:34   -0.061498
2021-03-31 18:30:34    0.291014
Freq: Q-DEC, dtype: float64
2019-01-01 18:30:34   -0.236873
2019-04-01 18:30:34   -1.399436
2019-07-01 18:30:34    1.011018
2019-10-01 18:30:34    1.254754
2020-01-01 18:30:34   -0.569184
2020-04-01 18:30:34   -1.480181
2020-07-01 18:30:34   -0.396710
2020-10-01 18:30:34    1.157218
2021-01-01 18:30:34   -0.119259
2021-04-01 18:30:34    0.773836
Freq: QS-JAN, dtype: float64

Q can of course also be combined with B, just like M above (BQ and BQS in the table).
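A sketch of the business-quarter variant, written with the offset objects QuarterEnd and BQuarterEnd (both default to quarters ending in March, June, September and December). Quarter ends that fall on a weekend are replaced by the last business day of the quarter:

```python
import pandas as pd

q = pd.date_range("2018-12-15", periods=4, freq=pd.offsets.QuarterEnd())
bq = pd.date_range("2018-12-15", periods=4, freq=pd.offsets.BQuarterEnd())

print(list(q.strftime("%Y-%m-%d")))   # ['2018-12-31', '2019-03-31', '2019-06-30', '2019-09-30']
print(list(bq.strftime("%Y-%m-%d")))  # ['2018-12-31', '2019-03-29', '2019-06-28', '2019-09-30']
```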

  • freq="A" generates the time series yearly, anchored at the end of each year, and freq="AS" at the start of each year.
import numpy as np
import pandas as pd
cur0 = pd.date_range('2018-12-15 18:30:34', periods=10, freq='A')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)
cur0 = pd.date_range('2018-12-15 18:30:34', periods=10, freq='AS')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)

Output:

2018-12-31 18:30:34   -0.058588
2019-12-31 18:30:34   -0.676757
2020-12-31 18:30:34   -0.368606
2021-12-31 18:30:34   -0.820318
2022-12-31 18:30:34    0.959945
2023-12-31 18:30:34   -0.144216
2024-12-31 18:30:34    0.827481
2025-12-31 18:30:34    1.812374
2026-12-31 18:30:34   -1.473202
2027-12-31 18:30:34   -1.633083
Freq: A-DEC, dtype: float64
2019-01-01 18:30:34   -0.037793
2020-01-01 18:30:34    1.067194
2021-01-01 18:30:34   -1.517820
2022-01-01 18:30:34   -0.101716
2023-01-01 18:30:34    0.413106
2024-01-01 18:30:34   -0.912453
2025-01-01 18:30:34    0.197084
2026-01-01 18:30:34   -0.513032
2027-01-01 18:30:34   -0.027010
2028-01-01 18:30:34   -0.263569
Freq: AS-JAN, dtype: float64
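The "-DEC" in the A-DEC shown in the output means the years end in December. Annual offsets can be anchored on another month; this sketch uses pd.offsets.YearEnd(month=6), e.g. for a fiscal year running from July to June:

```python
import pandas as pd

# Years ending on June 30 instead of December 31
y = pd.date_range("2018-12-15", periods=3, freq=pd.offsets.YearEnd(month=6))
print(list(y.strftime("%Y-%m-%d")))  # ['2019-06-30', '2020-06-30', '2021-06-30']
```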
