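Time formats in Python get confusing, especially once pandas and NumPy are involved, and working with time-stamped data quickly becomes dizzying. Drawing on answers from Stack Overflow, this post records the differences between the time objects in Python's standard datetime module, in NumPy, and in pandas.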
The datetime standard library of Python
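It has only four main objects:
- time - time of day only, measured in hours, minutes, seconds, and microseconds
- date - only year, month, and day
- datetime - all the fields of date and time combined
- timedelta - a span of time, stored with days as its largest unit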
import datetime
datetime.time(hour=1,minute=25,second=61,microsecond=6333)
Traceback (most recent call last):
File "<ipython-input-2-8e2667fea8f6>", line 1, in <module>
datetime.time(hour=1,minute=25,second=61,microsecond=6333)
ValueError: second must be in 0..59
datetime.time(hour=1,minute=25,second=22,microsecond=6333)
Out[3]: datetime.time(1, 25, 22, 6333)
datetime.date(year=2018,month=9,day=23)
Out[4]: datetime.date(2018, 9, 23)
datetime.datetime(year =2018,month=9,day=23,hour=20,minute=22,second=30,microsecond=3155)
Out[5]: datetime.datetime(2018, 9, 23, 20, 22, 30, 3155)
datetime.timedelta(days=3,minutes=55)
Out[6]: datetime.timedelta(3, 3300)
datetime.timedelta(days=3,minutes=55) + datetime.datetime(year =2018,month=9,day=23,hour=20,minute=22,second=30,microsecond=3155)
Out[7]: datetime.datetime(2018, 9, 26, 21, 17, 30, 3155)
datetime.date(2018,9,23)
Out[8]: datetime.date(2018, 9, 23)
datetime.date(2018,23,9)
Traceback (most recent call last):
File "<ipython-input-40-258ea9b432d0>", line 1, in <module>
datetime.date(2018,23,9)
ValueError: month must be in 1..12
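As the session shows, I tried second > 59 on a whim, and that is not allowed. Also, if you pass the arguments in the default order (year, month, day, hour, minute, second), you can leave out the keywords year=, month=, and so on.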
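To tie the four objects together, here is a minimal sketch using only the standard library. The helper `is_valid_time` is a hypothetical name introduced for illustration; it simply wraps the range check that produced the ValueError in the session.

```python
import datetime

# Positional arguments follow the order
# year, month, day, hour, minute, second, microsecond.
d = datetime.date(2018, 9, 23)
t = datetime.time(1, 25, 22, 6333)

# combine() builds a datetime from a date and a time.
dt = datetime.datetime.combine(d, t)

# timedelta normalizes its arguments into days, seconds, and microseconds.
delta = datetime.timedelta(days=3, minutes=55)
stored = (delta.days, delta.seconds, delta.microseconds)

# Subtracting two datetimes yields a timedelta again.
gap = (dt + delta) - dt


def is_valid_time(hour, minute, second):
    """Hypothetical helper: True if datetime.time accepts these fields."""
    try:
        datetime.time(hour, minute, second)
        return True
    except ValueError:
        return False
```

Here `stored` holds the 55 minutes as 3300 seconds, which matches the timedelta output shown in the session.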
Numpy's datetime64 and timedelta64 objects
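NumPy does not have separate date and time objects; there is a single datetime64 object that represents a moment in time. The datetime object in the datetime module has microsecond precision (10^-6 s), whereas NumPy's datetime64 can go down to attoseconds (10^-18 s). It is more flexible and accepts more kinds of input.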
import numpy as np
np.datetime64(5,'ns')
Out[9]: numpy.datetime64('1970-01-01T00:00:00.000000005')
np.datetime64('2018-09-23')
Out[10]: numpy.datetime64('2018-09-23')
np.datetime64('2018-9-23')
Traceback (most recent call last):
File "<ipython-input-11-5f3797908da0>", line 1, in <module>
np.datetime64('2018-9-23')
ValueError: Error parsing datetime string "2018-9-23" at position 5
np.datetime64('2018/09/23')
Traceback (most recent call last):
File "<ipython-input-12-fbe5ac53716b>", line 1, in <module>
np.datetime64('2018/09/23')
ValueError: Error parsing datetime string "2018/09/23" at position 4
np.datetime64('2018-09-23 05:00')
Out[13]: numpy.datetime64('2018-09-23T05:00')
np.timedelta64(5,'D')
Out[15]: numpy.timedelta64(5,'D')
np.datetime64('2018-09-23 05:00') - np.datetime64('2018-09-23 04:00:59')
Out[16]: numpy.timedelta64(3541,'s')
This shows that datetime64 is quite strict about format. An integer input must come with a unit, and a string must follow the xxxx-xx-xx xx:xx:xx pattern; for example, writing 2018-09-24 as 2018-9-24 fails.
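The unit handling behind these results can be sketched as follows. This is an illustration, not the only way to do it: `np.datetime_data` is used to read the unit off the dtype, and the standard library's `strptime` is one assumed workaround for strings that are not zero-padded.

```python
import datetime
import numpy as np

# The unit is inferred from the string: a date-only string gives days.
day = np.datetime64('2018-09-23')
unit_of_day = np.datetime_data(day.dtype)[0]

# A string with hours and minutes gives a minute unit.
minute = np.datetime64('2018-09-23 05:00')
unit_of_minute = np.datetime_data(minute.dtype)[0]

# The unit can be changed with astype.
as_seconds = day.astype('datetime64[s]')

# An integer is an offset from the Unix epoch and needs an explicit unit.
five_days = np.datetime64(5, 'D')

# Arithmetic between different units uses the finer one (seconds here).
diff = minute - np.datetime64('2018-09-23 04:00:59')

# Loosely formatted strings can be parsed by the standard library first.
loose = np.datetime64(
    datetime.datetime.strptime('2018/9/23', '%Y/%m/%d').date()
)
```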
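Pandas' Timestamp and Timedelta objects
These two build on NumPy's time types. A pandas Timestamp also represents a single moment in time; it is very similar to datetime but offers more functionality. It can be constructed with pd.Timestamp or pd.to_datetime.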
import pandas as pd
pd.Timestamp(1234.1256537)#default ns
Out[19]: Timestamp('1970-01-01 00:00:00.000001234')
pd.Timestamp(1234.1256537, unit='h')#change units
Out[21]: Timestamp('1970-02-21 10:07:32.354399999')
pd.Timestamp('2018-9-23 5:00')
Out[22]: Timestamp('2018-09-23 05:00:00')
pd.to_datetime('2018-9-23 5:00')
Out[23]: Timestamp('2018-09-23 05:00:00')
pd.to_datetime(['2018-9-23 5:00','2018-9-23 15:00'])
Out[24]: DatetimeIndex(['2018-09-23 05:00:00', '2018-09-23 15:00:00'], dtype='datetime64[ns]', freq=None)
pd.to_datetime(['2018-9-23 5:00'])
Out[25]: DatetimeIndex(['2018-09-23 05:00:00'], dtype='datetime64[ns]', freq=None)
pd.to_datetime(['2018-9-23 5:00','2018-9-23 15:00'])[0]
Out[26]: Timestamp('2018-09-23 05:00:00')
a = pd.DataFrame([['2018-9-24 12:00',1,3],['2018-9-24 11:00',2,4],['2018-9-24 10:00',5,9]],columns=['date','num1','num2'])
a
Out[27]:
date num1 num2
0 2018-9-24 12:00 1 3
1 2018-9-24 11:00 2 4
2 2018-9-24 10:00 5 9
a.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
date 3 non-null object
num1 3 non-null int64
num2 3 non-null int64
dtypes: int64(2), object(1)
memory usage: 152.0+ bytes
b = a.date.apply(lambda x:pd.Timestamp(x))
b
Out[28]:
0 2018-09-24 12:00:00
1 2018-09-24 11:00:00
2 2018-09-24 10:00:00
Name: date, dtype: datetime64[ns]
b[0]
Out[29]: Timestamp('2018-09-24 12:00:00')
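This shows that pandas is lenient about the time format: 2018-9-24 is accepted. One thing puzzles me, though: printing the Series reports its dtype as datetime64[ns], yet each individual element comes back as a Timestamp. Why?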
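On the dtype question: as far as I understand it, pandas stores the whole column as a NumPy datetime64 array and wraps each value in a Timestamp only when it is pulled out on its own. The sketch below checks that, and also uses the vectorized `pd.to_datetime` in place of the row-by-row apply. The variable names are made up for illustration.

```python
import datetime
import pandas as pd

raw = pd.Series(['2018-9-24 12:00', '2018-9-24 11:00', '2018-9-24 10:00'])

# Vectorized conversion of the whole column at once.
s = pd.to_datetime(raw)

# Kind 'M' means the column is backed by a NumPy datetime64 dtype.
column_kind = s.dtype.kind

# A single element is handed back as a pandas Timestamp ...
first = s.iloc[0]
element_is_timestamp = isinstance(first, pd.Timestamp)

# ... and Timestamp is itself a subclass of the standard datetime.
element_is_datetime = isinstance(first, datetime.datetime)

# Subtracting two Timestamps gives a pandas Timedelta.
gap = s.iloc[0] - s.iloc[1]
```

So the two views do not contradict each other: the dtype describes how the column is stored, and Timestamp is the scalar type used for a single value.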
Convert Python datetime to datetime64 and Timestamp
Both conversions are simple:
dt = datetime.datetime(2018,9,24,13,39,40,34676)
dt
Out[59]: datetime.datetime(2018, 9, 24, 13, 39, 40, 34676)
np.datetime64(dt)
Out[60]: numpy.datetime64('2018-09-24T13:39:40.034676')
pd.Timestamp(dt)
Out[61]: Timestamp('2018-09-24 13:39:40.034676')
pd.to_datetime(dt)
Out[62]: Timestamp('2018-09-24 13:39:40.034676')
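The same conversions also work on a whole list of datetime objects. A small sketch, assuming microsecond resolution is wanted on the NumPy side (the list `dts` is invented for the example):

```python
import datetime
import numpy as np
import pandas as pd

dts = [
    datetime.datetime(2018, 9, 24, 13, 39, 40, 34676),
    datetime.datetime(2018, 9, 25, 8, 0, 0),
]

# A list of datetimes becomes a datetime64 array when the dtype is given.
arr = np.array(dts, dtype='datetime64[us]')

# pd.to_datetime turns the same list into a DatetimeIndex.
idx = pd.to_datetime(dts)
```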
Convert datetime64 to datetime and Timestamp
The former is more involved: first convert to a float, then to a datetime. The latter is easier: just use pd.Timestamp or pd.to_datetime().
dt64 = np.datetime64('2017-10-24 05:34:00.136562')
dt64
Out[30]: numpy.datetime64('2017-10-24T05:34:00.136562')
unix_epoch = np.datetime64(0, 's')
one_second = np.timedelta64(1, 's')
seconds_since_epoch = (dt64 - unix_epoch) / one_second
seconds_since_epoch
Out[32]: 1508823240.1365621
datetime.datetime.utcfromtimestamp(seconds_since_epoch)
Out[33]: datetime.datetime(2017, 10, 24, 5, 34, 0, 136562)
pd.to_datetime(dt64)
Out[34]: Timestamp('2017-10-24 05:34:00.136562')
pd.Timestamp(dt64)
Out[35]: Timestamp('2017-10-24 05:34:00.136562')
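Two alternatives to the float route may be worth knowing, especially since `datetime.utcfromtimestamp` is deprecated as of Python 3.12. This is a sketch under the assumption that the datetime64 value has microsecond or coarser resolution; with nanosecond units, `item()` returns a plain integer instead.

```python
import datetime
import numpy as np

dt64 = np.datetime64('2017-10-24 05:34:00.136562')

# For a microsecond-unit value, item() returns a datetime.datetime directly.
via_item = dt64.item()

# Integer route: count microseconds since the epoch, then add them
# to the epoch as a timedelta. No float rounding is involved.
micros = int(dt64.astype('datetime64[us]').astype('int64'))
epoch = datetime.datetime(1970, 1, 1)
via_integer = epoch + datetime.timedelta(microseconds=micros)
```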
Convert Timestamp to datetime and datetime64
This one is also fairly simple, as the code shows.
ts = pd.Timestamp('2018-9-24 10:22:46.3654')
ts.to_pydatetime()#python's datetime
Out[37]: datetime.datetime(2018, 9, 24, 10, 22, 46, 365400)
ts.to_datetime64()
Out[38]: numpy.datetime64('2018-09-24T10:22:46.365400000')
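One caveat follows from the precision difference mentioned earlier: a Timestamp can carry nanoseconds, which the standard datetime cannot hold. A minimal sketch, using an invented value with trailing nanoseconds; the warning that pandas emits in this case is silenced here only to keep the example quiet.

```python
import warnings
import numpy as np
import pandas as pd

# Note the three extra digits: 123 nanoseconds.
ts = pd.Timestamp('2018-09-24 10:22:46.365400123')

# to_datetime64 keeps the nanoseconds.
as_np = ts.to_datetime64()

# to_pydatetime cannot keep them; pandas warns and drops them.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    as_py = ts.to_pydatetime()
```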
Can these types all be compared with one another?
dt64
Out[63]: numpy.datetime64('2017-10-24T05:34:00.136562')
dt
Out[64]: datetime.datetime(2018, 9, 24, 13, 39, 40, 34676)
ts
Out[65]: Timestamp('2018-09-24 10:22:46.365400')
dt64>dt
Out[66]: False
ts>dt
Out[67]: False
ts>dt64
Out[68]: True
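Printed on their own, both of those values look like timestamps, so why does comparing them raise an error saying that float and timestamp cannot be compared?
I am a bit confused.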
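I have not traced where that error comes from, but one way to sidestep mixed-type comparisons is to convert every value to a single type first. The sketch below normalizes the three values from this post to pd.Timestamp before comparing; choosing pandas as the common type is my assumption, and converting everything to datetime would work along the same lines.

```python
import datetime
import numpy as np
import pandas as pd

dt64 = np.datetime64('2017-10-24 05:34:00.136562')
dt = datetime.datetime(2018, 9, 24, 13, 39, 40, 34676)
ts = pd.Timestamp('2018-9-24 10:22:46.3654')

# Normalize everything to pandas Timestamp before comparing.
a, b, c = (pd.Timestamp(x) for x in (dt64, dt, ts))

dt64_after_dt = a > b
ts_after_dt = c > b
ts_after_dt64 = c > a

# Once the types match, built-ins such as max() work as expected.
latest = max(a, b, c)
```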