pandas DatetimeIndex indexing

freq参数对应的缩写:Offset aliases

实际上freq就有点类似于下面DatetimeIndex的resolution,表示Timestamp精确到月(M),天(D),小时(H),分钟(T),秒(S)等等
参考:Offset aliases

缩写对应的含义
Bbusiness day frequency
Ccustom business day frequency 可自行定制的工作日(即挑出某几天)
Dcalendar day frequency 日历天,即按顺序排下来的
Wweekly frequency
Mmonth end frequency 月份!!注意别和分钟弄混了
SMsemi-month end frequency (15th and end of month)
BMbusiness month end frequency
CBMcustom business month end frequency
MSmonth start frequency
SMSsemi-month start frequency (1st and 15th)
BMSbusiness month start frequency
CBMScustom business month start frequency
Qquarter end frequency
BQbusiness quarter end frequency
QSquarter start frequency
BQSbusiness quarter start frequency
A, Yyear end frequency
BA, BYbusiness year end frequency
AS, YSyear start frequency
BAS, BYSbusiness year start frequency
BHbusiness hour frequency
Hhourly frequency 小时!!
T, minminutely frequency 分钟!!别和月份M弄混了
Ssecondly frequency 秒!!
L,ms milliseconds
U,us microseconds
Nnanoseconds



DatetimeIndex 与 Timestamp 的含义

在这里插入图片描述
DatetimeIndex 实际是对Timestamp的索引,Timestamp虽然都是有有hour, minute, second,但是一旦组合起来,变成了Timestamp,那就不一定了。

参考 exact-indexing

As discussed in previous section, indexing a DatetimeIndex with a partial string depends on the “accuracy” of the period, in other words how specific the interval is in relation to the resolution of the index. In contrast, indexing with Timestamp or datetime objects is exact, because the objects have exact meaning. These also follow the semantics of including both endpoints.


These Timestamp and datetime objects have exact hours, minutes, and seconds

意思是,若Dataframe或者Series的index是DatetimeIndex,则partial string selection可以做到的的最小分辨率(resolution)即最小的区间取决于这个DatetimeIndex的分辨率。具体的resolution可通过

DatetimeIndex.resolution

进行查看。

相反,Timestamp or datetime objects都一定是精确到了second的,即即使是没有指定的情况,下,他们也是包含了精确的hours、minutes 和 seconds

以Timestamp or datetime objects作为index的Dataframe或者Series,是一定能够通过形如’%Y-%m-%d %h:%m:%s’的index进行exact indexing。而DatetimeIndex则没办法,超过了原有的resolution则会出现下一小结中的错误: DatetimeIndex 中partial slicing与resolution的问题


DatetimeIndex 中partial slicing与resolution的问题

参考 slice-vs-exact-match

Warning However, if the string is treated as an exact match, the selection in DataFrame’s [] will be column-wise and not row-wise, see Indexing Basics. For example dft_minute[‘2011-12-31 23:59’] will raise KeyError as ‘2012-12-31 23:59’ has the same resolution as the index and there is no column with such name:
To always have unambiguous selection, whether the row is treated as a slice or a single selection, use .loc.
在这里插入图片描述

即DF的[]操作,当string的resolution和DatetimeIndex的resolution相同时会判断为exact match,因此会优先会判断为column-wise的,i.e.,优先匹配columns names。但是column里面没有这样的name,因此会报错 Keyerror

可以通过loc来避免这个问题。

也可以通过用resolution更小的string如’2011-12-31 23’ 来避免

Example:

In[19]: corr_matrix_hour.head()
Out[19]: 
co2                      pm10     pm1d0     pm2d5
datetime                                         
2017-09-06 02:00:00  0.524205  0.515984  0.535088
2017-09-06 03:00:00  0.271715  0.371295  0.287175
2017-09-06 04:00:00 -0.006424 -0.051545 -0.040707
2017-09-06 05:00:00 -0.146557 -0.329352 -0.205284
2017-09-06 06:00:00  0.190316  0.130562  0.206690
In[20]: corr_matrix_hour.index.resolution
Out[20]: 'hour'

可以看到是精确到hour


直接用相同或者更精确的resolution会报错

In[20]: corr_matrix_hour.index.resolution
Out[20]: 'hour'
In[21]: corr_matrix_hour['2017-09-06 02']
Traceback (most recent call last):
  File "C:\Users\walter\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\indexes\base.py", line 3078, in get_loc
    return self._engine.get_loc(key)
  File "pandas\_libs\index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: '2017-09-06 02'

但是通过loc则可以解决。或者用resolution更小的

In[23]: corr_matrix_hour.loc['2017-09-06 02:00:00']
Out[23]: 
co2
pm10     0.524205
pm1d0    0.515984
pm2d5    0.535088
Name: 2017-09-06 02:00:00, dtype: float64
In[24]: corr_matrix_hour.loc['2017-09-06']
Out[24]: 
co2                      pm10     pm1d0     pm2d5
datetime                                         
2017-09-06 02:00:00  0.524205  0.515984  0.535088
2017-09-06 03:00:00  0.271715  0.371295  0.287175
2017-09-06 04:00:00 -0.006424 -0.051545 -0.040707
2017-09-06 05:00:00 -0.146557 -0.329352 -0.205284
2017-09-06 06:00:00  0.190316  0.130562  0.206690
2017-09-06 07:00:00  0.320764  0.433057  0.316239
2017-09-06 08:00:00  0.478715  0.427422  0.448940
2017-09-06 09:00:00  0.104344  0.176117  0.172312
2017-09-06 10:00:00  0.681273  0.747524  0.695242
2017-09-06 11:00:00  0.578912  0.695337  0.597895
2017-09-06 12:00:00  0.268466  0.306412  0.274614
2017-09-06 13:00:00  0.469630  0.463735  0.509890
2017-09-06 14:00:00  0.432735  0.441847  0.397852
2017-09-06 15:00:00  0.357006  0.368656  0.457821
2017-09-06 16:00:00  0.356666  0.288574  0.367849
2017-09-06 17:00:00  0.200700  0.146319  0.142676
2017-09-06 18:00:00  0.137906  0.227373  0.203481
2017-09-06 19:00:00  0.092497  0.201569  0.096877
2017-09-06 20:00:00 -0.141685 -0.094120 -0.066167
2017-09-06 21:00:00  0.188640  0.183061  0.201465
2017-09-06 22:00:00  0.040039  0.023205  0.051618
2017-09-06 23:00:00 -0.042926 -0.000429 -0.047177



index为DatetimeIndex的DF/Seires,同时对index和column等加以判断并组合boolean array 的方法

主要是利用index.isin()以及pd.date_range()

其中特别要注意的是,pd.date_range(start=None, end=None, periods=None, freq=None),中的这四个参数,必须且只能指定其中的三个,空出一个,从而产生出对应DatetimeIndex。


Example:
In[68]: co2.head(10)
Out[68]: 
                     co2  pm10  pm1d0  pm2d5 Location
datetime                                             
2017-09-06 02:19:58  594    24     16     24  C215012
2017-09-06 02:19:58  579    24     14     24  C215019
2017-09-06 02:19:58  615    25     18     23  C215013
2017-09-06 02:19:58  626    25     16     25  C215016
2017-09-06 02:19:58  581    27     17     26  C215017
2017-09-06 02:19:58  564    26     16     23  C215014
2017-09-06 02:19:58  601    25     17     24  C215018
2017-09-06 02:19:58  752    35     19     29  C215011
2017-09-06 02:19:59  615    25     18     23  C215013
2017-09-06 02:19:59  626    26     17     26  C215016
In[69]: pd.date_range('2018-01-14 03:00', '2018-01-14 04:00', freq='S') 
#2018-01-04 3am到4am,抽出其中的每一秒组成一个DatetimeIndex
Out[69]: 
DatetimeIndex(['2018-01-14 03:00:00', '2018-01-14 03:00:01',
               '2018-01-14 03:00:02', '2018-01-14 03:00:03',
               '2018-01-14 03:00:04', '2018-01-14 03:00:05',
               '2018-01-14 03:00:06', '2018-01-14 03:00:07',
               '2018-01-14 03:00:08', '2018-01-14 03:00:09',
               ...
               '2018-01-14 03:59:51', '2018-01-14 03:59:52',
               '2018-01-14 03:59:53', '2018-01-14 03:59:54',
               '2018-01-14 03:59:55', '2018-01-14 03:59:56',
               '2018-01-14 03:59:57', '2018-01-14 03:59:58',
               '2018-01-14 03:59:59', '2018-01-14 04:00:00'],
              dtype='datetime64[ns]', length=3601, freq='S')

以下代码就是Location列为’C215012’,时间在2018-01-14 3AM到4AM之间的数据抽取出来

In[70]: datetime_boolean = co2.index.isin(pd.date_range('2018-01-14 03:00', '2018-01-14 04:00', freq='S'))
   ...: co2[datetime_boolean & (co2['Location'] == 'C215012')]
   ...: 


Out[70]: 
                     co2  pm10  pm1d0  pm2d5 Location
datetime                                             
2018-01-14 03:40:56  505    22     13     20  C215012
2018-01-14 03:40:58  505    22     14     21  C215012
2018-01-14 03:40:59  505    22     14     21  C215012
2018-01-14 03:41:00  505    22     14     21  C215012
2018-01-14 03:41:01  505    21     14     21  C215012
2018-01-14 03:41:02  505    21     14     21  C215012
2018-01-14 03:41:03  505    21     14     21  C215012
2018-01-14 03:41:04  505    21     13     20  C215012
2018-01-14 03:41:05  505    21     13     20  C215012
2018-01-14 03:41:06  505    21     13     20  C215012
2018-01-14 03:41:07  505    20     14     20  C215012
2018-01-14 03:41:08  505    20     14     20  C215012
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值