Datawhale Data Analysis Team Study, Task 9

Date and Time Data Types and Tools

from datetime import datetime
now = datetime.now()
now
>>>
datetime.datetime(2019, 8, 25, 20, 42, 13, 790158)
In [2]:
now.year,now.month,now.day
Out[2]:
(2019, 8, 25)

datetime stores both the date and the time, down to the microsecond. timedelta represents the time difference between two datetime objects:

delta = datetime(2011,1,7) - datetime(2008,6,24,8,15)
delta
>>>
datetime.timedelta(days=926, seconds=56700)
In [5]:

delta.days
Out[5]:
926
In [6]:

delta.seconds
Out[6]:
56700
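
Note that delta.seconds holds only the remainder left over after the whole days, not the full duration. As a minimal sketch (the variable names below are illustrative), the standard-library total_seconds() method gives the whole span, and it can be rebuilt from the two components:

```python
from datetime import datetime

delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15)

# days and seconds are separate components; seconds is always in [0, 86400)
seconds_per_day = 24 * 60 * 60

# Full duration in seconds, returned as a float
total = delta.total_seconds()

# Recombine the components by hand (microseconds are 0 in this example)
recombined = delta.days * seconds_per_day + delta.seconds
print(total, recombined)
```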

You can add (or subtract) a timedelta, or a multiple of one, to a datetime object. This produces a new, shifted object:

from datetime import timedelta
start = datetime(2011,1,7)
start + timedelta(12)
>>>
datetime.datetime(2011, 1, 19, 0, 0)
In [8]:

start - 2 * timedelta(12)
Out[8]:
datetime.datetime(2010, 12, 14, 0, 0)
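
The single positional argument in timedelta(12) is a number of days. As a small sketch (names are illustrative only), keyword arguments make the unit explicit and can be combined:

```python
from datetime import datetime, timedelta

start = datetime(2011, 1, 7)

# timedelta(12) and timedelta(days=12) describe the same offset
same_offset = timedelta(12) == timedelta(days=12)

# Units can be mixed in one timedelta
shifted = start + timedelta(days=12, hours=6, minutes=30)
print(same_offset, shifted)
```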

Data types in the datetime module:
[Image: table of the types in the datetime module]

Converting Between Strings and datetime

A datetime object can be formatted as a string with str or with the strftime method (which takes a format string):

stamp = datetime(2011,1,2)
str(stamp)
>>>
'2011-01-02 00:00:00'
In [11]:

stamp.strftime('%Y-%m-%d')
Out[11]:
'2011-01-02'

datetime format codes:
[Image: table of datetime format codes]

datetime.strptime uses the same format codes to convert a string into a date:

value = '2011-01-03'
datetime.strptime(value,'%Y-%m-%d')
>>>
datetime.datetime(2011, 1, 3, 0, 0)
In [16]:

datestrs = ['7/6/2011', '8/6/2011']
[datetime.strptime(x,'%m/%d/%Y') for x in datestrs]
Out[16]:
[datetime.datetime(2011, 7, 6, 0, 0), datetime.datetime(2011, 8, 6, 0, 0)]
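
strptime needs the exact format, so input that mixes several layouts has to be tried against each one. The sketch below assumes a hypothetical helper, parse_with_formats, which is not part of the standard library; it simply returns the first format that succeeds:

```python
from datetime import datetime

def parse_with_formats(text, formats=("%Y-%m-%d", "%m/%d/%Y")):
    """Hypothetical helper: return the first successful strptime parse."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format did not match; try the next one
    raise ValueError(f"no known format matches {text!r}")

parsed = [parse_with_formats(s) for s in ["2011-01-03", "7/6/2011"]]
print(parsed)
```

The order of the formats matters when a string could match more than one of them, so the most specific format should come first.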

The parser.parse method in dateutil parses a date directly, without a format string:

from dateutil.parser import parse
parse('2011-01-03')
>>>
datetime.datetime(2011, 1, 3, 0, 0)
In [18]:

parse('Jan 31, 1997 10:45 PM')
Out[18]:
datetime.datetime(1997, 1, 31, 22, 45)
In [19]:

parse('6/12/2011', dayfirst=True)  # dayfirst=True handles dates where the day comes before the month
Out[19]:
datetime.datetime(2011, 12, 6, 0, 0)
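
To make the effect of dayfirst concrete, the following sketch parses the same ambiguous string both ways. It assumes the third-party dateutil package is installed (it ships as a pandas dependency):

```python
from datetime import datetime
from dateutil.parser import parse

ambiguous = '6/12/2011'

# Default interpretation: month/day/year
month_first = parse(ambiguous)

# With dayfirst=True: day/month/year
day_first = parse(ambiguous, dayfirst=True)

print(month_first, day_first)
```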

to_datetime parses many different date representations:

import pandas as pd
datestrs = ['2011-07-06 12:00:00', '2011-08-06 00:00:00']
pd.to_datetime(datestrs)
>>>
DatetimeIndex(['2011-07-06 12:00:00', '2011-08-06 00:00:00'], dtype='datetime64[ns]', freq=None)
In [22]:

# Handle missing values
idx = pd.to_datetime(datestrs + [None])
idx
Out[22]:
DatetimeIndex(['2011-07-06 12:00:00', '2011-08-06 00:00:00', 'NaT'], dtype='datetime64[ns]', freq=None)
In [23]:

idx[2]   # NaT (Not a Time) is the null value for timestamp data in pandas
Out[23]:
NaT
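
NaT values can be detected with pd.isna, just like NaN. The sketch below also uses the errors='coerce' option of pd.to_datetime, so that an unparseable string becomes NaT instead of raising; the sample strings are made up for illustration:

```python
import pandas as pd

# One valid date, one unparseable string, and one missing value (illustrative data)
raw = ['2011-07-06 12:00:00', 'not a date', None]

# errors='coerce' turns anything that cannot be parsed into NaT
idx = pd.to_datetime(raw, errors='coerce')

# Boolean mask marking the NaT positions
mask = pd.isna(idx)

# Keep only the valid timestamps
valid = idx[~mask]
print(mask, valid)
```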

Date formats:
[Image: table of date formats]

Time Series Basics

The most basic kind of time series in pandas is a Series indexed by timestamps, which outside pandas are often represented as Python strings or datetime objects:

from datetime import datetime
import numpy as np
dates = [datetime(2011, 1, 2), datetime(2011, 1, 5),
         datetime(2011, 1, 7), datetime(2011, 1, 8),
         datetime(2011, 1, 10), datetime(2011, 1, 12)]
ts = pd.Series(np.random.randn(6),index = dates)
ts
>>>
2011-01-02   -0.047613
2011-01-05   -0.109200
2011-01-07   -0.723960
2011-01-08    0.621621
2011-01-10   -0.415737
2011-01-12   -0.329828
dtype: float64
In [26]:

ts.index
Out[26]:
DatetimeIndex(['2011-01-02', '2011-01-05', '2011-01-07', '2011-01-08',
               '2011-01-10', '2011-01-12'],
              dtype='datetime64[ns]', freq=None)
In [27]:

ts + ts[::2]  # ts[::2] takes every second element; the addition aligns on dates
Out[27]:
2011-01-02   -0.095227
2011-01-05         NaN
2011-01-07   -1.447920
2011-01-08         NaN
2011-01-10   -0.831474
2011-01-12         NaN
dtype: float64
In [28]:

ts.index.dtype
Out[28]:
dtype('<M8[ns]')
In [30]:

stamp = ts.index[0]
stamp
Out[30]:
Timestamp('2011-01-02 00:00:00')
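
Because the values above are random, the alignment behavior is easier to see with fixed numbers. A minimal sketch with made-up values, using iloc for the positional slice:

```python
from datetime import datetime
import pandas as pd

dates = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7)]
ts = pd.Series([1.0, 2.0, 3.0], index=dates)

# Keep the 1st and 3rd rows (positional slice with step 2)
every_other = ts.iloc[::2]

# Addition aligns on the dates; a date missing from one side yields NaN
total = ts + every_other
print(total)
```

The datetime objects passed as the index are converted to a DatetimeIndex, and each element read back from it is a pandas Timestamp.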

Indexing, Selection, and Subsetting

stamp = ts.index[2]
ts[stamp]
>>>
-0.7239600611192241
In [32]:

ts['1/10/2011']
Out[32]:
-0.41573694830555685
In [33]:

ts['20110110']
Out[33]:
-0.41573694830555685
In [36]:

longer_ts = pd.Series(np.random.randn(1000),index = pd.date_range('1/1/2000',periods = 1000))
longer_ts
Out[36]:
2000-01-01    1.100388
2000-01-02    0.130130
2000-01-03   -2.055608
2000-01-04   -1.831481
2000-01-05   -1.148219
2000-01-06    0.511518
2000-01-07    0.937228
2000-01-08    1.375040
2000-01-09   -0.574676
2000-01-10   -1.069409
2000-01-11   -0.254553
2000-01-12    0.049901
2000-01-13    0.122048
2000-01-14   -0.569365
2000-01-15   -0.424824
2000-01-16    0.483661
2000-01-17    0.086624
2000-01-18    0.952121
2000-01-19   -0.164448
2000-01-20    1.865059
2000-01-21   -0.528312
2000-01-22    1.055838
2000-01-23   -0.741040
2000-01-24   -1.244783
2000-01-25    1.759633
2000-01-26    0.562466
2000-01-27   -0.213264
2000-01-28    0.357678
2000-01-29    0.727353
2000-01-30   -1.445755
                ...   
2002-08-28    0.309759
2002-08-29   -0.967562
2002-08-30   -0.563879
2002-08-31   -0.222950
2002-09-01   -1.399885
2002-09-02    0.137064
2002-09-03    1.035472
2002-09-04    1.375936
2002-09-05    1.523842
2002-09-06   -0.250198
2002-09-07   -0.032714
2002-09-08    1.312375
2002-09-09   -0.306416
2002-09-10    0.447024
2002-09-11    0.067726
2002-09-12    0.669652
2002-09-13   -0.622149
2002-09-14   -0.701277
2002-09-15    0.187466
2002-09-16    0.242639
2002-09-17    0.904932
2002-09-18   -0.542465
2002-09-19   -0.164287
2002-09-20   -0.344578
2002-09-21   -0.549160
2002-09-22   -0.920146
2002-09-23   -1.445779
2002-09-24    0.656674
2002-09-25    0.909611
2002-09-26    0.389822
Freq: D, Length: 1000, dtype: float64
In [37]:

longer_ts['2001']
Out[37]:
2001-01-01   -1.980710
2001-01-02    0.432760
2001-01-03    0.087654
2001-01-04   -0.111670
2001-01-05    0.159273
2001-01-06    0.394524
2001-01-07   -0.544867
2001-01-08    0.190084
2001-01-09   -1.367214
2001-01-10    0.375764
2001-01-11   -0.538164
2001-01-12    1.226165
2001-01-13    1.379185
2001-01-14    0.087215
2001-01-15    0.579437
2001-01-16    1.168259
2001-01-17    0.184381
2001-01-18   -0.790780
2001-01-19   -0.848215
2001-01-20    0.730526
2001-01-21    1.101298
2001-01-22   -0.169520
2001-01-23    0.457182
2001-01-24    1.560211
2001-01-25    1.554455
2001-01-26   -1.302019
2001-01-27   -0.050070
2001-01-28    0.906810
2001-01-29   -1.036318
2001-01-30    0.352866
                ...   
2001-12-02    3.161263
2001-12-03   -0.344133
2001-12-04   -1.724998
2001-12-05    0.992645
2001-12-06    0.249976
2001-12-07    0.854470
2001-12-08    1.206356
2001-12-09    0.227613
2001-12-10   -1.582625
2001-12-11   -0.543272
2001-12-12    0.500589
2001-12-13    1.296602
2001-12-14    0.747176
2001-12-15   -0.558653
2001-12-16    0.073324
2001-12-17    0.235366
2001-12-18   -0.186527
2001-12-19    1.454964
2001-12-20   -2.227943
2001-12-21   -0.487161
2001-12-22   -1.753786
2001-12-23   -1.360177
2001-12-24   -0.362365
2001-12-25    0.362488
2001-12-26    0.614667
2001-12-27    0.422250
2001-12-28    1.140720
2001-12-29    2.556641
2001-12-30    0.644311
2001-12-31   -1.113480
Freq: D, Length: 365, dtype: float64
In [38]:

longer_ts['2001-05']
Out[38]:
2001-05-01    0.135882
2001-05-02    0.690614
2001-05-03    0.111474
2001-05-04   -1.474460
2001-05-05    0.565190
2001-05-06    0.136883
2001-05-07   -0.282811
2001-05-08   -0.478677
2001-05-09    0.944554
2001-05-10   -0.242830
2001-05-11   -0.694054
2001-05-12    1.719454
2001-05-13   -1.259836
2001-05-14    0.097119
2001-05-15   -0.253572
2001-05-16    0.840243
2001-05-17    1.627034
2001-05-18    0.948645
2001-05-19   -0.405268
2001-05-20    0.760941
2001-05-21    0.878004
2001-05-22   -0.854182
2001-05-23    1.001909
2001-05-24    1.658426
2001-05-25   -1.047567
2001-05-26   -0.328606
2001-05-27    0.651273
2001-05-28    0.008510
2001-05-29    0.711464
2001-05-30   -0.384322
2001-05-31    0.213481
Freq: D, dtype: float64
In [39]:

ts[datetime(2011,1,7):]
Out[39]:
2011-01-07   -0.723960
2011-01-08    0.621621
2011-01-10   -0.415737
2011-01-12   -0.329828
dtype: float64
In [40]:

ts
Out[40]:
2011-01-02   -0.047613
2011-01-05   -0.109200
2011-01-07   -0.723960
2011-01-08    0.621621
2011-01-10   -0.415737
2011-01-12   -0.329828
dtype: float64
In [41]:

ts['1/6/2011':'1/11/2011']
Out[41]:
2011-01-07   -0.723960
2011-01-08    0.621621
2011-01-10   -0.415737
dtype: float64
In [42]:

ts.truncate(after='1/9/2011')
Out[42]:
2011-01-02   -0.047613
2011-01-05   -0.109200
2011-01-07   -0.723960
2011-01-08    0.621621
dtype: float64
In [43]:

dates = pd.date_range('1/1/2000', periods=100, freq='W-WED')
long_df = pd.DataFrame(np.random.randn(100, 4),index=dates,columns=['Colorado', 'Texas','New York', 'Ohio'])
long_df.loc['5-2001']
Out[43]:
            Colorado     Texas  New York      Ohio
2001-05-02 -0.271322 -0.062810 -1.149354 -2.649799
2001-05-09 -0.568013 -0.945264 -0.810973 -0.872749
2001-05-16  1.079429 -0.782920  0.340388 -1.009138
2001-05-23 -0.592091  0.212066  1.083095  1.697421
2001-05-30  0.093989 -0.067399 -1.916981  2.265589
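
The selection styles in this section can be checked against a series with known values. The sketch below reuses the weekly Wednesday index from the last cell with made-up integer data, and relies on two behaviors: a partial date string selects a whole period, and slice endpoints are inclusive and need not be present in the index:

```python
import numpy as np
import pandas as pd

# 100 consecutive Wednesdays, starting with the first one on or after 2000-01-01
dates = pd.date_range('2000-01-01', periods=100, freq='W-WED')
s = pd.Series(np.arange(100), index=dates)

# Partial string indexing: every row that falls in May 2001
may_2001 = s.loc['2001-05']

# Range slice whose endpoints are not in the index themselves
window = s.loc['2001-05-01':'2001-05-10']

# truncate drops everything after the given date
head = s.truncate(after='2000-01-20')

print(len(may_2001), len(window), len(head))
```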

Time Series with Duplicate Indices
