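CodingDict Pandas tutorial http://codingdict.com/article/8270
Python Financial Quantitative Analysis, taught by a Tsinghua computer science PhD https://www.bilibili.com/video/BV1i741147LS?t
Handling time objects with the Python standard library: datetime
Data types in the datetime module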
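Type | Description |
---|---|
date | Stores a calendar date (year, month, day) in the Gregorian calendar |
time | Stores a time of day as hours, minutes, seconds, and microseconds |
datetime | Stores both the date and the time |
timedelta | Represents the difference between two datetime values (days, seconds, microseconds) |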
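To make the four types in the table concrete, here is a minimal sketch; the specific dates and times are arbitrary values chosen for illustration.

```python
import datetime

d = datetime.date(2017, 6, 27)        # calendar date only
t = datetime.time(14, 30, 15, 500)    # hour, minute, second, microsecond
dt = datetime.datetime.combine(d, t)  # date and time together

# Subtracting two datetime values yields a timedelta.
delta = dt - datetime.datetime(2017, 6, 26)

print(dt)                                             # 2017-06-27 14:30:15.000500
print(delta.days, delta.seconds, delta.microseconds)  # 1 52215 500
```

A timedelta keeps only days, seconds, and microseconds internally, which is why the 14.5 hours show up as 52215 seconds.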
Standard library methods:
Method | Description |
---|---|
strftime | Formats a date as a string |
strptime | Parses a string into a date |
strftime
import datetime
a = datetime.datetime(2017,6,27)
b = a.strftime('%Y-%m-%d')  # %m = month, %d = day (%M would be minutes)
print(a)
print(b)
Output
2017-06-27 00:00:00
2017-06-27
strptime
date = '2017-6-26'
datetime2 = datetime.datetime.strptime(date,'%Y-%m-%d')
print(datetime2)
Output
2017-06-26 00:00:00
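Because strftime and strptime are inverses when given the same format string, a round trip is a quick way to check a format. The sketch below assumes an arbitrary timestamp and a format that also includes the time of day.

```python
import datetime

original = datetime.datetime(2017, 6, 27, 9, 5, 0)
fmt = '%Y-%m-%d %H:%M:%S'  # %m is the month, %M is the minute

text = original.strftime(fmt)                     # datetime -> str
restored = datetime.datetime.strptime(text, fmt)  # str -> datetime

print(text)                  # 2017-06-27 09:05:00
print(restored == original)  # True
```

Mixing up %m and %M is the most common mistake here: the call still succeeds, but the month position is filled with minutes.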
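The third-party library dateutil
import dateutil.parser
import pandas as pd
The dateutil.parser.parse() function needs no format string and accepts a wide variety of date and time formats, for example "2021-2-14", "2021/2/14", "02/03/2012", and "2020/JAN/01". An ambiguous string such as "02/03/2012" is read month-first by default.
a = dateutil.parser.parse('2001-01-01')
print(a)
a = pd.to_datetime(['2001-01-01','2001/01/02'])
print(a)
Output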
2001-01-01 00:00:00
DatetimeIndex(['2001-01-01', '2001-01-02'], dtype='datetime64[ns]', freq=None)
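pd.to_datetime converts a whole batch of strings into datetime objects in one call and returns a DatetimeIndex, so its result is well suited for use as the index of pandas data. Note that pandas 2.0 and later expect one consistent format per call, so mixed input like the example above may need format='mixed'.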
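As a small sketch of the format-free parsing described above, the snippet below feeds a few of the listed strings to dateutil.parser.parse. The dayfirst argument is a real dateutil option; the sample list itself is just an illustration.

```python
from dateutil import parser

samples = ['2021-2-14', '2021/2/14', '02/03/2012']
for s in samples:
    print(s, '->', parser.parse(s))

# Ambiguous input: month-first by default, day-first on request.
month_first = parser.parse('02/03/2012')
day_first = parser.parse('02/03/2012', dayfirst=True)
print(month_first.date(), day_first.date())  # 2012-02-03 2012-03-02
```

When the source of the data is known to write the day first, passing dayfirst=True explicitly is safer than relying on the default.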
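Handling groups of dates with pandas
The pandas.date_range function generates a range of dates.
Parameter | Description |
---|---|
start | Start time |
end | End time |
periods | Number of entries to generate |
freq | Frequency; defaults to 'D'. Options include H (hour), W (week), B (business day), M (month), S (second), A (year). Newer pandas releases prefer h, s, ME, and YE in place of H, S, M, and A. |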
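To illustrate how the parameters combine, here is a minimal sketch using two frequencies from the table, 'B' and 'W'. The date ranges are arbitrary; it relies on 2021-01-01 being a Friday and on 'W' anchoring to Sundays by default.

```python
import pandas as pd

# start + periods + freq: five business days starting on a Friday,
# so the weekend of January 2-3 is skipped.
business = pd.date_range('2021-01-01', periods=5, freq='B')
print(business)

# start + end + freq: weekly entries, which fall on Sundays by default.
weekly = pd.date_range('2021-01-01', '2021-01-31', freq='W')
print(weekly)
```

Usually two of start, end, and periods are given together with freq; pandas fills in the rest.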
Example 1: from '2021-01-01' to '2021-02-13'
a = pd.date_range('2021-01-01','2021-02-13')
print(a)
Output
DatetimeIndex(['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04',
'2021-01-05', '2021-01-06', '2021-01-07', '2021-01-08',
'2021-01-09', '2021-01-10', '2021-01-11', '2021-01-12',
'2021-01-13', '2021-01-14', '2021-01-15', '2021-01-16',
'2021-01-17', '2021-01-18', '2021-01-19', '2021-01-20',
'2021-01-21', '2021-01-22', '2021-01-23', '2021-01-24',
'2021-01-25', '2021-01-26', '2021-01-27', '2021-01-28',
'2021-01-29', '2021-01-30', '2021-01-31', '2021-02-01',
'2021-02-02', '2021-02-03', '2021-02-04', '2021-02-05',
'2021-02-06', '2021-02-07', '2021-02-08', '2021-02-09',
'2021-02-10', '2021-02-11', '2021-02-12', '2021-02-13'],
dtype='datetime64[ns]', freq='D')
Example 2: specifying the length
a = pd.date_range('2021-01-01',periods=30)
print(a)
Output
DatetimeIndex(['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04',
'2021-01-05', '2021-01-06', '2021-01-07', '2021-01-08',
'2021-01-09', '2021-01-10', '2021-01-11', '2021-01-12',
'2021-01-13', '2021-01-14', '2021-01-15', '2021-01-16',
'2021-01-17', '2021-01-18', '2021-01-19', '2021-01-20',
'2021-01-21', '2021-01-22', '2021-01-23', '2021-01-24',
'2021-01-25', '2021-01-26', '2021-01-27', '2021-01-28',
'2021-01-29', '2021-01-30'],
dtype='datetime64[ns]', freq='D')
Example 3: specifying the frequency with freq
a = pd.date_range('2021-01-01',periods=30,freq='h')  # one entry per hour
print(a)
Output:
DatetimeIndex(['2021-01-01 00:00:00', '2021-01-01 01:00:00',
'2021-01-01 02:00:00', '2021-01-01 03:00:00',
'2021-01-01 04:00:00', '2021-01-01 05:00:00',
'2021-01-01 06:00:00', '2021-01-01 07:00:00',
'2021-01-01 08:00:00', '2021-01-01 09:00:00',
'2021-01-01 10:00:00', '2021-01-01 11:00:00',
'2021-01-01 12:00:00', '2021-01-01 13:00:00',
'2021-01-01 14:00:00', '2021-01-01 15:00:00',
'2021-01-01 16:00:00', '2021-01-01 17:00:00',
'2021-01-01 18:00:00', '2021-01-01 19:00:00',
'2021-01-01 20:00:00', '2021-01-01 21:00:00',
'2021-01-01 22:00:00', '2021-01-01 23:00:00',
'2021-01-02 00:00:00', '2021-01-02 01:00:00',
'2021-01-02 02:00:00', '2021-01-02 03:00:00',
'2021-01-02 04:00:00', '2021-01-02 05:00:00'],
dtype='datetime64[ns]', freq='H')
Time series
import numpy as np
sr = pd.Series(np.arange(100),index=pd.date_range('2020-7-1',periods=100))
print(sr)
print(sr.index)
Output
2020-07-01 0
2020-07-02 1
2020-07-03 2
2020-07-04 3
2020-07-05 4
..
2020-10-04 95
2020-10-05 96
2020-10-06 97
2020-10-07 98
2020-10-08 99
Freq: D, Length: 100, dtype: int32
DatetimeIndex(['2020-07-01', '2020-07-02', '2020-07-03', '2020-07-04',
'2020-07-05', '2020-07-06', '2020-07-07', '2020-07-08',
'2020-07-09', '2020-07-10', '2020-07-11', '2020-07-12',
'2020-07-13', '2020-07-14', '2020-07-15', '2020-07-16',
'2020-07-17', '2020-07-18', '2020-07-19', '2020-07-20',
'2020-07-21', '2020-07-22', '2020-07-23', '2020-07-24',
'2020-07-25', '2020-07-26', '2020-07-27', '2020-07-28',
'2020-07-29', '2020-07-30', '2020-07-31', '2020-08-01',
'2020-08-02', '2020-08-03', '2020-08-04', '2020-08-05',
'2020-08-06', '2020-08-07', '2020-08-08', '2020-08-09',
'2020-08-10', '2020-08-11', '2020-08-12', '2020-08-13',
'2020-08-14', '2020-08-15', '2020-08-16', '2020-08-17',
'2020-08-18', '2020-08-19', '2020-08-20', '2020-08-21',
'2020-08-22', '2020-08-23', '2020-08-24', '2020-08-25',
'2020-08-26', '2020-08-27', '2020-08-28', '2020-08-29',
'2020-08-30', '2020-08-31', '2020-09-01', '2020-09-02',
'2020-09-03', '2020-09-04', '2020-09-05', '2020-09-06',
'2020-09-07', '2020-09-08', '2020-09-09', '2020-09-10',
'2020-09-11', '2020-09-12', '2020-09-13', '2020-09-14',
'2020-09-15', '2020-09-16', '2020-09-17', '2020-09-18',
'2020-09-19', '2020-09-20', '2020-09-21', '2020-09-22',
'2020-09-23', '2020-09-24', '2020-09-25', '2020-09-26',
'2020-09-27', '2020-09-28', '2020-09-29', '2020-09-30',
'2020-10-01', '2020-10-02', '2020-10-03', '2020-10-04',
'2020-10-05', '2020-10-06', '2020-10-07', '2020-10-08'],
dtype='datetime64[ns]', freq='D')
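One benefit of a time series is that you can select data by date, for example all data from July 2012 or all data from 2012.
Example:
sr = pd.Series(np.arange(1000),index=pd.date_range('2012-7-3',periods=1000))
print(sr)
print(sr['2012-07']) # select all data from July 2012
print(sr['2012']) # select all data from 2012
print(sr['2012-8-1':'2012-9-30']) # slice out the data for August and September 2012
Output: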
2012-07-03 0
2012-07-04 1
2012-07-05 2
2012-07-06 3
2012-07-07 4
...
2015-03-25 995
2015-03-26 996
2015-03-27 997
2015-03-28 998
2015-03-29 999
Freq: D, Length: 1000, dtype: int32
2012-07-03 0
2012-07-04 1
2012-07-05 2
2012-07-06 3
2012-07-07 4
2012-07-08 5
2012-07-09 6
2012-07-10 7
2012-07-11 8
2012-07-12 9
2012-07-13 10
2012-07-14 11
2012-07-15 12
2012-07-16 13
2012-07-17 14
2012-07-18 15
2012-07-19 16
2012-07-20 17
2012-07-21 18
2012-07-22 19
2012-07-23 20
2012-07-24 21
2012-07-25 22
2012-07-26 23
2012-07-27 24
2012-07-28 25
2012-07-29 26
2012-07-30 27
2012-07-31 28
Freq: D, dtype: int32
2012-07-03 0
2012-07-04 1
2012-07-05 2
2012-07-06 3
2012-07-07 4
...
2012-12-27 177
2012-12-28 178
2012-12-29 179
2012-12-30 180
2012-12-31 181
Freq: D, Length: 182, dtype: int32
2012-08-01 29
2012-08-02 30
2012-08-03 31
2012-08-04 32
2012-08-05 33
..
2012-09-26 85
2012-09-27 86
2012-09-28 87
2012-09-29 88
2012-09-30 89
Freq: D, Length: 61, dtype: int32
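The selections above can be double-checked by counting rows. The sketch below rebuilds the same series and uses .loc, which is a choice made here for explicitness rather than something the original example used.

```python
import numpy as np
import pandas as pd

sr = pd.Series(np.arange(1000), index=pd.date_range('2012-07-03', periods=1000))

july = sr.loc['2012-07']                     # one month
year = sr.loc['2012']                        # one year
aug_sep = sr.loc['2012-08-01':'2012-09-30']  # both endpoints are included

print(len(july), len(year), len(aug_sep))    # 29 182 61
```

Unlike positional slicing, a label-based date slice includes the end date, which is why August plus September gives 61 rows rather than 60.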