Time Series Basics
Import the required libraries
import numpy as np
import pandas as pd
from pandas import Series,DataFrame
Import the datetime class from the standard library
from datetime import datetime
Creating datetime objects
Create a single datetime by passing the year, month, and day:
t1=datetime(2009,10,20)
t1
datetime.datetime(2009, 10, 20, 0, 0)
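The two trailing zeros in the output above are the hour and minute, which default to midnight when only a date is given. The sketch below makes that explicit and also shows pd.Timestamp, pandas' own datetime scalar, built from a standard datetime. The second datetime (t2) is an illustrative value, not from the original.

```python
from datetime import datetime

import pandas as pd

# Only year, month and day given: hour, minute and second default to 0
t1 = datetime(2009, 10, 20)
print(t1.hour, t1.minute, t1.second)  # 0 0 0

# Hour and minute can be passed as the 4th and 5th positional arguments
t2 = datetime(2009, 10, 20, 14, 30)

# pandas' scalar type Timestamp can be constructed from a datetime
ts = pd.Timestamp(t2)
print(ts)  # 2009-10-20 14:30:00
```

A Timestamp compares equal to the datetime it was built from, so the two can be mixed when looking up values.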
Create several datetime objects at once and store them in a list:
date_list=[
datetime(2016,9,1),
datetime(2016,9,10),
datetime(2017,9,1),
datetime(2017,9,20),
datetime(2017,10,1)
]
date_list
[datetime.datetime(2016, 9, 1, 0, 0),
datetime.datetime(2016, 9, 10, 0, 0),
datetime.datetime(2017, 9, 1, 0, 0),
datetime.datetime(2017, 9, 20, 0, 0),
datetime.datetime(2017, 10, 1, 0, 0)]
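Writing each datetime by hand is not the only option. As a sketch, pd.to_datetime can parse the same dates from ISO-format strings; the resulting values compare equal, element by element, to the datetime objects in date_list.

```python
from datetime import datetime

import pandas as pd

date_list = [
    datetime(2016, 9, 1),
    datetime(2016, 9, 10),
    datetime(2017, 9, 1),
    datetime(2017, 9, 20),
    datetime(2017, 10, 1),
]

# The same five dates, parsed from strings into a DatetimeIndex
parsed = pd.to_datetime(
    ["2016-09-01", "2016-09-10", "2017-09-01", "2017-09-20", "2017-10-01"]
)

# Each Timestamp in the index equals the corresponding datetime
print(list(parsed) == date_list)  # True
```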
Create a Series with date_list as its index; pandas converts the list of datetime objects into a DatetimeIndex:
s1=Series(np.random.rand(5),index=date_list)
s1
2016-09-01 0.886547
2016-09-10 0.642827
2017-09-01 0.926886
2017-09-20 0.187911
2017-10-01 0.277650
dtype: float64
s1.values
array([0.88654715, 0.64282712, 0.92688552, 0.18791143, 0.27764988])
s1.index
DatetimeIndex(['2016-09-01', '2016-09-10', '2017-09-01', '2017-09-20',
'2017-10-01'],
dtype='datetime64[ns]', freq=None)
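Because the index is a DatetimeIndex, date components can be read off it directly. The sketch below uses fixed values from np.arange instead of np.random.rand (an assumption made only so the output is reproducible). freq is None because the five dates are not evenly spaced.

```python
from datetime import datetime

import numpy as np
import pandas as pd

date_list = [datetime(2016, 9, 1), datetime(2016, 9, 10), datetime(2017, 9, 1),
             datetime(2017, 9, 20), datetime(2017, 10, 1)]
# Fixed values instead of random ones, so results are predictable
s1 = pd.Series(np.arange(5, dtype=float), index=date_list)

idx = s1.index
print(type(idx).__name__)   # DatetimeIndex
print(idx.year.tolist())    # [2016, 2016, 2017, 2017, 2017]
print(idx.month.tolist())   # [9, 9, 9, 9, 10]
print(idx.freq)             # None (irregular spacing, no inferred frequency)
```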
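Accessing elements of a datetime-indexed Series
Access by position: position 1 is the second element, because positions start at 0. (Older code writes s1[1]; recent pandas versions deprecate treating integer keys in [] as positions, so .iloc states the intent explicitly.)
s1.iloc[1]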
0.6428271197631065
Access by label, passing a datetime object:
s1[datetime(2016,9,10)]
0.6428271197631065
Pass the date directly as a string:
s1['2016-9-10']
0.6428271197631065
The compact YYYYMMDD string form also works:
s1['20160910']
0.6428271197631065
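The four lookups above all reach the same element. The sketch below repeats them side by side, using .iloc for position and .loc for labels; the values come from np.arange rather than random numbers (an assumption for reproducibility), so every lookup should return 1.0.

```python
from datetime import datetime

import numpy as np
import pandas as pd

date_list = [datetime(2016, 9, 1), datetime(2016, 9, 10), datetime(2017, 9, 1),
             datetime(2017, 9, 20), datetime(2017, 10, 1)]
s1 = pd.Series(np.arange(5, dtype=float), index=date_list)

by_position = s1.iloc[1]                     # second element by position
by_datetime = s1.loc[datetime(2016, 9, 10)]  # label given as a datetime
by_iso_str = s1.loc["2016-09-10"]            # label given as an ISO date string
by_compact = s1.loc["20160910"]              # label given as a compact YYYYMMDD string
print(by_position, by_datetime, by_iso_str, by_compact)  # 1.0 1.0 1.0 1.0
```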
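September 2016 has two entries. Shortening the compact form to just the year and month, with no day, raises an error: '201609' is not recognized as a year-month string (in the chain below, the date parser rejects it with "month must be in 1..12"), so the lookup ends in a KeyError.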
s1['201609']
---------------------------------------------------------------------------
TypeError: 'str' object cannot be interpreted as an integer
...
KeyError: '201609'
...
ValueError: month must be in 1..12
...
ParserError: month must be in 1..12: 201609
...
ValueError: could not convert string to Timestamp
...
KeyError Traceback (most recent call last)
<ipython-input-16-190579de8e62> in <module>
----> 1 s1['201609']
...
KeyError: '201609'
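If month keys arrive in the compact YYYYMM form, one possible workaround (an assumption, not part of the original) is to parse them with pd.to_datetime and an explicit format, then render them in the hyphenated YYYY-MM form that the lookup accepts. Fixed values replace the random ones so the result can be checked.

```python
from datetime import datetime

import numpy as np
import pandas as pd

date_list = [datetime(2016, 9, 1), datetime(2016, 9, 10), datetime(2017, 9, 1),
             datetime(2017, 9, 20), datetime(2017, 10, 1)]
s1 = pd.Series(np.arange(5, dtype=float), index=date_list)

compact_key = "201609"  # year + month with no separator, as in the failing call

# An explicit format tells the parser the digits mean year then month
month_key = pd.to_datetime(compact_key, format="%Y%m").strftime("%Y-%m")
print(month_key)  # 2016-09

sept_2016 = s1.loc[month_key]  # partial string lookup: the whole month
print(len(sept_2016))  # 2
```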
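Written with a hyphen, the year-month string works and returns all of the September 2016 data (partial string indexing):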
s1['2016-09']
2016-09-01 0.886547
2016-09-10 0.642827
dtype: float64
Return the September 2017 data:
s1['2017-09']
2017-09-01 0.926886
2017-09-20 0.187911
dtype: float64
Return all data from 2016:
s1['2016']
2016-09-01 0.886547
2016-09-10 0.642827
dtype: float64
Return all data from 2017:
s1['2017']
2017-09-01 0.926886
2017-09-20 0.187911
2017-10-01 0.277650
dtype: float64
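Partial date strings also work as slice bounds on a sorted DatetimeIndex. In the sketch below (fixed values instead of random ones, for reproducibility), the end bound "2017-09" covers the whole month, so both September 2017 entries are included, and a day that is not in the index can still serve as a start bound.

```python
from datetime import datetime

import numpy as np
import pandas as pd

date_list = [datetime(2016, 9, 1), datetime(2016, 9, 10), datetime(2017, 9, 1),
             datetime(2017, 9, 20), datetime(2017, 10, 1)]
s1 = pd.Series(np.arange(5, dtype=float), index=date_list)

# From the start of Sept 2016 through the end of Sept 2017 (both inclusive)
sept_to_sept = s1.loc["2016-09":"2017-09"]
print(len(sept_to_sept))  # 4

# 2017-09-15 is not a label, but slicing a sorted index still works
after_mid_sept = s1.loc["2017-09-15":]
print(len(after_mid_sept))  # 2
```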
s1
2016-09-01 0.886547
2016-09-10 0.642827
2017-09-01 0.926886
2017-09-20 0.187911
2017-10-01 0.277650
dtype: float64
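Generating a range of datetimes with date_range
start and end are the first and last timestamps of the range, periods is the number of timestamps to generate (a count, not an interval), and freq is the step between them, defaulting to 'D' (one calendar day).
100 days starting from 2016-01-01: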
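As a sketch of how these parameters combine: start plus periods counts forward, start plus end fills in every step between two dates (both ends included by default), and end plus periods counts backwards. The specific dates other than 2016-01-01 are illustrative.

```python
import pandas as pd

# start + periods: freq defaults to 'D' (one calendar day)
by_periods = pd.date_range("2016-01-01", periods=100)
print(by_periods[-1])  # 2016-04-09 00:00:00 (2016 is a leap year)

# start + end: every day between the two dates, both ends included
by_bounds = pd.date_range(start="2016-01-01", end="2016-01-10")
print(len(by_bounds))  # 10

# end + periods: the range is built backwards from the end date
backwards = pd.date_range(end="2016-01-10", periods=3)
print(backwards.strftime("%Y-%m-%d").tolist())  # ['2016-01-08', '2016-01-09', '2016-01-10']
```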
date_list_new=pd.date_range('2016-01-01', periods=100)
date_list_new
DatetimeIndex(['2016-01-01', '2016-01-02', '2016-01-03', '2016-01-04',
'2016-01-05', '2016-01-06', '2016-01-07', '2016-01-08',
'2016-01-09', '2016-01-10', '2016-01-11', '2016-01-12',
'2016-01-13', '2016-01-14', '2016-01-15', '2016-01-16',
'2016-01-17', '2016-01-18', '2016-01-19', '2016-01-20',
'2016-01-21', '2016-01-22', '2016-01-23', '2016-01-24',
'2016-01-25', '2016-01-26', '2016-01-27', '2016-01-28',
'2016-01-29', '2016-01-30', '2016-01-31', '2016-02-01',
'2016-02-02', '2016-02-03', '2016-02-04', '2016-02-05',
'2016-02-06', '2016-02-07', '2016-02-08', '2016-02-09',
'2016-02-10', '2016-02-11', '2016-02-12', '2016-02-13',
'2016-02-14', '2016-02-15', '2016-02-16', '2016-02-17',
'2016-02-18', '2016-02-19', '2016-02-20', '2016-02-21',
'2016-02-22', '2016-02-23', '2016-02-24', '2016-02-25',
'2016-02-26', '2016-02-27', '2016-02-28', '2016-02-29',
'2016-03-01', '2016-03-02', '2016-03-03', '2016-03-04',
'2016-03-05', '2016-03-06', '2016-03-07', '2016-03-08',
'2016-03-09', '2016-03-10', '2016-03-11', '2016-03-12',
'2016-03-13', '2016-03-14', '2016-03-15', '2016-03-16',
'2016-03-17', '2016-03-18', '2016-03-19', '2016-03-20',
'2016-03-21', '2016-03-22', '2016-03-23', '2016-03-24',
'2016-03-25', '2016-03-26', '2016-03-27', '2016-03-28',
'2016-03-29', '2016-03-30', '2016-03-31', '2016-04-01',
'2016-04-02', '2016-04-03', '2016-04-04', '2016-04-05',
'2016-04-06', '2016-04-07', '2016-04-08', '2016-04-09'],
dtype='datetime64[ns]', freq='D')
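With freq set to weekly ('W'), the range starts at 2016-01-03 rather than 2016-01-01: 'W' is shorthand for 'W-SUN', weeks anchored on Sunday, and 2016-01-03 is the first Sunday on or after the start date.
date_list_new=pd.date_range('2016-01-01', periods=100, freq='W')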
date_list_new
DatetimeIndex(['2016-01-03', '2016-01-10', '2016-01-17', '2016-01-24',
'2016-01-31', '2016-02-07', '2016-02-14', '2016-02-21',
'2016-02-28', '2016-03-06', '2016-03-13', '2016-03-20',
'2016-03-27', '2016-04-03', '2016-04-10', '2016-04-17',
'2016-04-24', '2016-05-01', '2016-05-08', '2016-05-15',
'2016-05-22', '2016-05-29', '2016-06-05', '2016-06-12',
'2016-06-19', '2016-06-26', '2016-07-03', '2016-07-10',
'2016-07-17', '2016-07-24', '2016-07-31', '2016-08-07',
'2016-08-14', '2016-08-21', '2016-08-28', '2016-09-04',
'2016-09-11', '2016-09-18', '2016-09-25', '2016-10-02',
'2016-10-09', '2016-10-16', '2016-10-23', '2016-10-30',
'2016-11-06', '2016-11-13', '2016-11-20', '2016-11-27',
'2016-12-04', '2016-12-11', '2016-12-18', '2016-12-25',
'2017-01-01', '2017-01-08', '2017-01-15', '2017-01-22',
'2017-01-29', '2017-02-05', '2017-02-12', '2017-02-19',
'2017-02-26', '2017-03-05', '2017-03-12', '2017-03-19',
'2017-03-26', '2017-04-02', '2017-04-09', '2017-04-16',
'2017-04-23', '2017-04-30', '2017-05-07', '2017-05-14',
'2017-05-21', '2017-05-28', '2017-06-04', '2017-06-11',
'2017-06-18', '2017-06-25', '2017-07-02', '2017-07-09',
'2017-07-16', '2017-07-23', '2017-07-30', '2017-08-06',
'2017-08-13', '2017-08-20', '2017-08-27', '2017-09-03',
'2017-09-10', '2017-09-17', '2017-09-24', '2017-10-01',
'2017-10-08', '2017-10-15', '2017-10-22', '2017-10-29',
'2017-11-05', '2017-11-12', '2017-11-19', '2017-11-26'],
dtype='datetime64[ns]', freq='W-SUN')
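A quick check of the reasoning above: 2016-01-01 falls on a Friday, the 'W' alias resolves to 'W-SUN', and the first generated timestamp is therefore a Sunday.

```python
import pandas as pd

start = pd.Timestamp("2016-01-01")
print(start.day_name())  # Friday

weekly = pd.date_range("2016-01-01", periods=3, freq="W")
print(weekly.freqstr)         # W-SUN
print(weekly[0].day_name())   # Sunday
```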
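Setting freq to 'W-MON' anchors the weeks on Monday instead, so the range starts at 2016-01-04, the first Monday on or after the start date:
date_list_new=pd.date_range('2016-01-01', periods=100, freq='W-MON')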
date_list_new
DatetimeIndex(['2016-01-04', '2016-01-11', '2016-01-18', '2016-01-25',
'2016-02-01', '2016-02-08', '2016-02-15', '2016-02-22',
'2016-02-29', '2016-03-07', '2016-03-14', '2016-03-21',
'2016-03-28', '2016-04-04', '2016-04-11', '2016-04-18',
'2016-04-25', '2016-05-02', '2016-05-09', '2016-05-16',
'2016-05-23', '2016-05-30', '2016-06-06', '2016-06-13',
'2016-06-20', '2016-06-27', '2016-07-04', '2016-07-11',
'2016-07-18', '2016-07-25', '2016-08-01', '2016-08-08',
'2016-08-15', '2016-08-22', '2016-08-29', '2016-09-05',
'2016-09-12', '2016-09-19', '2016-09-26', '2016-10-03',
'2016-10-10', '2016-10-17', '2016-10-24', '2016-10-31',
'2016-11-07', '2016-11-14', '2016-11-21', '2016-11-28',
'2016-12-05', '2016-12-12', '2016-12-19', '2016-12-26',
'2017-01-02', '2017-01-09', '2017-01-16', '2017-01-23',
'2017-01-30', '2017-02-06', '2017-02-13', '2017-02-20',
'2017-02-27', '2017-03-06', '2017-03-13', '2017-03-20',
'2017-03-27', '2017-04-03', '2017-04-10', '2017-04-17',
'2017-04-24', '2017-05-01', '2017-05-08', '2017-05-15',
'2017-05-22', '2017-05-29', '2017-06-05', '2017-06-12',
'2017-06-19', '2017-06-26', '2017-07-03', '2017-07-10',
'2017-07-17', '2017-07-24', '2017-07-31', '2017-08-07',
'2017-08-14', '2017-08-21', '2017-08-28', '2017-09-04',
'2017-09-11', '2017-09-18', '2017-09-25', '2017-10-02',
'2017-10-09', '2017-10-16', '2017-10-23', '2017-10-30',
'2017-11-06', '2017-11-13', '2017-11-20', '2017-11-27'],
dtype='datetime64[ns]', freq='W-MON')
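The same rule applies to any weekday anchor: the start date is rolled forward to the next matching weekday, and kept as-is if it already matches. The sketch below compares a few anchors; the choice of anchors is illustrative.

```python
import pandas as pd

first_dates = {}
for anchor in ["W-SUN", "W-MON", "W-WED", "W-FRI"]:
    # First timestamp generated for each weekly anchor, starting 2016-01-01
    first = pd.date_range("2016-01-01", periods=1, freq=anchor)[0]
    first_dates[anchor] = first
    print(anchor, first.date(), first.day_name())
```

Since 2016-01-01 is itself a Friday, 'W-FRI' keeps the start date, while the other anchors move forward to 2016-01-03, 2016-01-04, and 2016-01-06.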
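With freq='h', the timestamps are one hour apart. (The output below comes from an older pandas that displays the alias as 'H'; pandas 2.2 and later deprecate 'H' in favor of the lowercase 'h'.)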
date_list_new=pd.date_range('2016-01-01', periods=100,freq='h')
date_list_new
DatetimeIndex(['2016-01-01 00:00:00', '2016-01-01 01:00:00',
'2016-01-01 02:00:00', '2016-01-01 03:00:00',
'2016-01-01 04:00:00', '2016-01-01 05:00:00',
'2016-01-01 06:00:00', '2016-01-01 07:00:00',
'2016-01-01 08:00:00', '2016-01-01 09:00:00',
'2016-01-01 10:00:00', '2016-01-01 11:00:00',
'2016-01-01 12:00:00', '2016-01-01 13:00:00',
'2016-01-01 14:00:00', '2016-01-01 15:00:00',
'2016-01-01 16:00:00', '2016-01-01 17:00:00',
'2016-01-01 18:00:00', '2016-01-01 19:00:00',
'2016-01-01 20:00:00', '2016-01-01 21:00:00',
'2016-01-01 22:00:00', '2016-01-01 23:00:00',
'2016-01-02 00:00:00', '2016-01-02 01:00:00',
'2016-01-02 02:00:00', '2016-01-02 03:00:00',
'2016-01-02 04:00:00', '2016-01-02 05:00:00',
'2016-01-02 06:00:00', '2016-01-02 07:00:00',
'2016-01-02 08:00:00', '2016-01-02 09:00:00',
'2016-01-02 10:00:00', '2016-01-02 11:00:00',
'2016-01-02 12:00:00', '2016-01-02 13:00:00',
'2016-01-02 14:00:00', '2016-01-02 15:00:00',
'2016-01-02 16:00:00', '2016-01-02 17:00:00',
'2016-01-02 18:00:00', '2016-01-02 19:00:00',
'2016-01-02 20:00:00', '2016-01-02 21:00:00',
'2016-01-02 22:00:00', '2016-01-02 23:00:00',
'2016-01-03 00:00:00', '2016-01-03 01:00:00',
'2016-01-03 02:00:00', '2016-01-03 03:00:00',
'2016-01-03 04:00:00', '2016-01-03 05:00:00',
'2016-01-03 06:00:00', '2016-01-03 07:00:00',
'2016-01-03 08:00:00', '2016-01-03 09:00:00',
'2016-01-03 10:00:00', '2016-01-03 11:00:00',
'2016-01-03 12:00:00', '2016-01-03 13:00:00',
'2016-01-03 14:00:00', '2016-01-03 15:00:00',
'2016-01-03 16:00:00', '2016-01-03 17:00:00',
'2016-01-03 18:00:00', '2016-01-03 19:00:00',
'2016-01-03 20:00:00', '2016-01-03 21:00:00',
'2016-01-03 22:00:00', '2016-01-03 23:00:00',
'2016-01-04 00:00:00', '2016-01-04 01:00:00',
'2016-01-04 02:00:00', '2016-01-04 03:00:00',
'2016-01-04 04:00:00', '2016-01-04 05:00:00',
'2016-01-04 06:00:00', '2016-01-04 07:00:00',
'2016-01-04 08:00:00', '2016-01-04 09:00:00',
'2016-01-04 10:00:00', '2016-01-04 11:00:00',
'2016-01-04 12:00:00', '2016-01-04 13:00:00',
'2016-01-04 14:00:00', '2016-01-04 15:00:00',
'2016-01-04 16:00:00', '2016-01-04 17:00:00',
'2016-01-04 18:00:00', '2016-01-04 19:00:00',
'2016-01-04 20:00:00', '2016-01-04 21:00:00',
'2016-01-04 22:00:00', '2016-01-04 23:00:00',
'2016-01-05 00:00:00', '2016-01-05 01:00:00',
'2016-01-05 02:00:00', '2016-01-05 03:00:00'],
dtype='datetime64[ns]', freq='H')
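The last timestamp follows from simple arithmetic: 100 timestamps span 99 one-hour steps, which is 4 days and 3 hours, so the range ends at 2016-01-05 03:00, matching the output above.

```python
import pandas as pd

hourly = pd.date_range("2016-01-01", periods=100, freq="h")

# 99 steps of one hour between the first and last timestamp
print(hourly[-1] - hourly[0])  # 4 days 03:00:00
print(hourly[-1])              # 2016-01-05 03:00:00
```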
With freq='5h', the timestamps are five hours apart:
date_list_new=pd.date_range('2016-01-01', periods=100,freq='5h')
date_list_new
DatetimeIndex(['2016-01-01 00:00:00', '2016-01-01 05:00:00',
'2016-01-01 10:00:00', '2016-01-01 15:00:00',
'2016-01-01 20:00:00', '2016-01-02 01:00:00',
'2016-01-02 06:00:00', '2016-01-02 11:00:00',
'2016-01-02 16:00:00', '2016-01-02 21:00:00',
'2016-01-03 02:00:00', '2016-01-03 07:00:00',
'2016-01-03 12:00:00', '2016-01-03 17:00:00',
'2016-01-03 22:00:00', '2016-01-04 03:00:00',
'2016-01-04 08:00:00', '2016-01-04 13:00:00',
'2016-01-04 18:00:00', '2016-01-04 23:00:00',
'2016-01-05 04:00:00', '2016-01-05 09:00:00',
'2016-01-05 14:00:00', '2016-01-05 19:00:00',
'2016-01-06 00:00:00', '2016-01-06 05:00:00',
'2016-01-06 10:00:00', '2016-01-06 15:00:00',
'2016-01-06 20:00:00', '2016-01-07 01:00:00',
'2016-01-07 06:00:00', '2016-01-07 11:00:00',
'2016-01-07 16:00:00', '2016-01-07 21:00:00',
'2016-01-08 02:00:00', '2016-01-08 07:00:00',
'2016-01-08 12:00:00', '2016-01-08 17:00:00',
'2016-01-08 22:00:00', '2016-01-09 03:00:00',
'2016-01-09 08:00:00', '2016-01-09 13:00:00',
'2016-01-09 18:00:00', '2016-01-09 23:00:00',
'2016-01-10 04:00:00', '2016-01-10 09:00:00',
'2016-01-10 14:00:00', '2016-01-10 19:00:00',
'2016-01-11 00:00:00', '2016-01-11 05:00:00',
'2016-01-11 10:00:00', '2016-01-11 15:00:00',
'2016-01-11 20:00:00', '2016-01-12 01:00:00',
'2016-01-12 06:00:00', '2016-01-12 11:00:00',
'2016-01-12 16:00:00', '2016-01-12 21:00:00',
'2016-01-13 02:00:00', '2016-01-13 07:00:00',
'2016-01-13 12:00:00', '2016-01-13 17:00:00',
'2016-01-13 22:00:00', '2016-01-14 03:00:00',
'2016-01-14 08:00:00', '2016-01-14 13:00:00',
'2016-01-14 18:00:00', '2016-01-14 23:00:00',
'2016-01-15 04:00:00', '2016-01-15 09:00:00',
'2016-01-15 14:00:00', '2016-01-15 19:00:00',
'2016-01-16 00:00:00', '2016-01-16 05:00:00',
'2016-01-16 10:00:00', '2016-01-16 15:00:00',
'2016-01-16 20:00:00', '2016-01-17 01:00:00',
'2016-01-17 06:00:00', '2016-01-17 11:00:00',
'2016-01-17 16:00:00', '2016-01-17 21:00:00',
'2016-01-18 02:00:00', '2016-01-18 07:00:00',
'2016-01-18 12:00:00', '2016-01-18 17:00:00',
'2016-01-18 22:00:00', '2016-01-19 03:00:00',
'2016-01-19 08:00:00', '2016-01-19 13:00:00',
'2016-01-19 18:00:00', '2016-01-19 23:00:00',
'2016-01-20 04:00:00', '2016-01-20 09:00:00',
'2016-01-20 14:00:00', '2016-01-20 19:00:00',
'2016-01-21 00:00:00', '2016-01-21 05:00:00',
'2016-01-21 10:00:00', '2016-01-21 15:00:00'],
dtype='datetime64[ns]', freq='5H')
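A multiplied alias such as '5h' is equivalent to the offset object pd.offsets.Hour(5). The sketch below checks that every gap is five hours and that the last timestamp lands 99 × 5 = 495 hours (20 days 15 hours) after the start, at 2016-01-21 15:00 as in the output above.

```python
import pandas as pd

five_hourly = pd.date_range("2016-01-01", periods=100, freq="5h")

# Gaps between consecutive timestamps (the first diff is NaT, so drop it)
steps = five_hourly.to_series().diff().dropna()
print((steps == pd.Timedelta(hours=5)).all())  # True

# The same range built from an explicit offset object
same = pd.date_range("2016-01-01", periods=100, freq=pd.offsets.Hour(5))
print(same.equals(five_hourly))  # True

print(five_hourly[-1])  # 2016-01-21 15:00:00
```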
Use date_list_new as the index of a new Series, giving a Series indexed by the 5-hourly timestamps:
s2=Series(np.random.rand(100),index=date_list_new)
s2
2016-01-01 00:00:00 0.895959
2016-01-01 05:00:00 0.392156
2016-01-01 10:00:00 0.650885
2016-01-01 15:00:00 0.504900
2016-01-01 20:00:00 0.484126
...
2016-01-20 19:00:00 0.133861
2016-01-21 00:00:00 0.135461
2016-01-21 05:00:00 0.338000
2016-01-21 10:00:00 0.813742
2016-01-21 15:00:00 0.588442
Freq: 5H, Length: 100, dtype: float64
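Partial string indexing works on this finer-grained index too: a date string such as "2016-01-02" selects every 5-hourly entry on that day. The sketch uses np.arange instead of random values (an assumption for reproducibility), so the selected entries are positions 5 through 9.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2016-01-01", periods=100, freq="5h")
# Sequential values instead of np.random.rand so the selection is easy to check
s2 = pd.Series(np.arange(100), index=idx)

# All entries on 2016-01-02: 01:00, 06:00, 11:00, 16:00 and 21:00
jan2 = s2.loc["2016-01-02"]
print(jan2)
```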