文章目录
freq参数对应的缩写:Offset aliases
实际上freq就有点类似于下面DatetimeIndex的resolution,表示Timestamp精确到月(M),天(D),小时(H),分钟(T),秒(S)等等
参考:Offset aliases
缩写 | 对应的含义 |
---|---|
B | business day frequency |
C | custom business day frequency 可自行定制的工作日(即挑出某几天) |
D | calendar day frequency 日历天,即按顺序排下来的 |
W | weekly frequency 周 |
M | month end frequency 月份!!注意别和分钟弄混了 |
SM | semi-month end frequency (15th and end of month) |
BM | business month end frequency |
CBM | custom business month end frequency |
MS | month start frequency |
SMS | semi-month start frequency (1st and 15th) |
BMS | business month start frequency |
CBMS | custom business month start frequency |
Q | quarter end frequency |
BQ | business quarter end frequency |
QS | quarter start frequency |
BQS | business quarter start frequency |
A, Y | year end frequency |
BA, BY | business year end frequency |
AS, YS | year start frequency |
BAS, BYS | business year start frequency |
BH | business hour frequency |
H | hourly frequency 小时!! |
T, min | minutely frequency 分钟!!别和月份M弄混了 |
S | secondly frequency 秒!! |
L, | ms milliseconds |
U, | us microseconds |
N | nanoseconds |
DatetimeIndex 与 Timestamp 的含义
DatetimeIndex 实际是对Timestamp的索引,Timestamp虽然都是有有hour, minute, second,但是一旦组合起来,变成了Timestamp,那就不一定了。
As discussed in previous section, indexing a DatetimeIndex with a partial string depends on the “accuracy” of the period, in other words how specific the interval is in relation to the resolution of the index. In contrast, indexing with Timestamp or datetime objects is exact, because the objects have exact meaning. These also follow the semantics of including both endpoints.
These Timestamp and datetime objects have exact hours, minutes, and seconds
意思是,若Dataframe或者Series的index是DatetimeIndex,则partial string selection可以做到的的最小分辨率(resolution)即最小的区间取决于这个DatetimeIndex的分辨率。具体的resolution可通过
DatetimeIndex.resolution
进行查看。
相反,Timestamp or datetime objects都一定是精确到了second的,即即使是没有指定的情况,下,他们也是包含了精确的hours、minutes 和 seconds
以Timestamp or datetime objects作为index的Dataframe或者Series,是一定能够通过形如’%Y-%m-%d %h:%m:%s’的index进行exact indexing。而DatetimeIndex则没办法,超过了原有的resolution则会出现下一小结中的错误: DatetimeIndex 中partial slicing与resolution的问题
DatetimeIndex 中partial slicing与resolution的问题
Warning However, if the string is treated as an exact match, the selection in DataFrame’s [] will be column-wise and not row-wise, see Indexing Basics. For example dft_minute[‘2011-12-31 23:59’] will raise KeyError as ‘2012-12-31 23:59’ has the same resolution as the index and there is no column with such name:
To always have unambiguous selection, whether the row is treated as a slice or a single selection, use .loc.
即DF的[]操作,当string的resolution和DatetimeIndex的resolution相同时,会判断为exact match,因此会优先会判断为column-wise的,i.e.,优先匹配columns names。但是column里面没有这样的name,因此会报错 Keyerror
可以通过loc来避免这个问题。
也可以通过用resolution更小的string如’2011-12-31 23’ 来避免
Example:
In[19]: corr_matrix_hour.head()
Out[19]:
co2 pm10 pm1d0 pm2d5
datetime
2017-09-06 02:00:00 0.524205 0.515984 0.535088
2017-09-06 03:00:00 0.271715 0.371295 0.287175
2017-09-06 04:00:00 -0.006424 -0.051545 -0.040707
2017-09-06 05:00:00 -0.146557 -0.329352 -0.205284
2017-09-06 06:00:00 0.190316 0.130562 0.206690
In[20]: corr_matrix_hour.index.resolution
Out[20]: 'hour'
可以看到是精确到hour
直接用相同或者更精确的resolution会报错
In[20]: corr_matrix_hour.index.resolution
Out[20]: 'hour'
In[21]: corr_matrix_hour['2017-09-06 02']
Traceback (most recent call last):
File "C:\Users\walter\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\indexes\base.py", line 3078, in get_loc
return self._engine.get_loc(key)
File "pandas\_libs\index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: '2017-09-06 02'
但是通过loc则可以解决。或者用resolution更小的
In[23]: corr_matrix_hour.loc['2017-09-06 02:00:00']
Out[23]:
co2
pm10 0.524205
pm1d0 0.515984
pm2d5 0.535088
Name: 2017-09-06 02:00:00, dtype: float64
In[24]: corr_matrix_hour.loc['2017-09-06']
Out[24]:
co2 pm10 pm1d0 pm2d5
datetime
2017-09-06 02:00:00 0.524205 0.515984 0.535088
2017-09-06 03:00:00 0.271715 0.371295 0.287175
2017-09-06 04:00:00 -0.006424 -0.051545 -0.040707
2017-09-06 05:00:00 -0.146557 -0.329352 -0.205284
2017-09-06 06:00:00 0.190316 0.130562 0.206690
2017-09-06 07:00:00 0.320764 0.433057 0.316239
2017-09-06 08:00:00 0.478715 0.427422 0.448940
2017-09-06 09:00:00 0.104344 0.176117 0.172312
2017-09-06 10:00:00 0.681273 0.747524 0.695242
2017-09-06 11:00:00 0.578912 0.695337 0.597895
2017-09-06 12:00:00 0.268466 0.306412 0.274614
2017-09-06 13:00:00 0.469630 0.463735 0.509890
2017-09-06 14:00:00 0.432735 0.441847 0.397852
2017-09-06 15:00:00 0.357006 0.368656 0.457821
2017-09-06 16:00:00 0.356666 0.288574 0.367849
2017-09-06 17:00:00 0.200700 0.146319 0.142676
2017-09-06 18:00:00 0.137906 0.227373 0.203481
2017-09-06 19:00:00 0.092497 0.201569 0.096877
2017-09-06 20:00:00 -0.141685 -0.094120 -0.066167
2017-09-06 21:00:00 0.188640 0.183061 0.201465
2017-09-06 22:00:00 0.040039 0.023205 0.051618
2017-09-06 23:00:00 -0.042926 -0.000429 -0.047177
index为DatetimeIndex的DF/Seires,同时对index和column等加以判断并组合boolean array 的方法
主要是利用index.isin()以及pd.date_range()
其中特别要注意的是,pd.date_range(start=None, end=None, periods=None, freq=None),中的这四个参数,必须且只能指定其中的三个,空出一个,从而产生出对应DatetimeIndex。
Example:
In[68]: co2.head(10)
Out[68]:
co2 pm10 pm1d0 pm2d5 Location
datetime
2017-09-06 02:19:58 594 24 16 24 C215012
2017-09-06 02:19:58 579 24 14 24 C215019
2017-09-06 02:19:58 615 25 18 23 C215013
2017-09-06 02:19:58 626 25 16 25 C215016
2017-09-06 02:19:58 581 27 17 26 C215017
2017-09-06 02:19:58 564 26 16 23 C215014
2017-09-06 02:19:58 601 25 17 24 C215018
2017-09-06 02:19:58 752 35 19 29 C215011
2017-09-06 02:19:59 615 25 18 23 C215013
2017-09-06 02:19:59 626 26 17 26 C215016
In[69]: pd.date_range('2018-01-14 03:00', '2018-01-14 04:00', freq='S')
#2018-01-04 3am到4am,抽出其中的每一秒组成一个DatetimeIndex
Out[69]:
DatetimeIndex(['2018-01-14 03:00:00', '2018-01-14 03:00:01',
'2018-01-14 03:00:02', '2018-01-14 03:00:03',
'2018-01-14 03:00:04', '2018-01-14 03:00:05',
'2018-01-14 03:00:06', '2018-01-14 03:00:07',
'2018-01-14 03:00:08', '2018-01-14 03:00:09',
...
'2018-01-14 03:59:51', '2018-01-14 03:59:52',
'2018-01-14 03:59:53', '2018-01-14 03:59:54',
'2018-01-14 03:59:55', '2018-01-14 03:59:56',
'2018-01-14 03:59:57', '2018-01-14 03:59:58',
'2018-01-14 03:59:59', '2018-01-14 04:00:00'],
dtype='datetime64[ns]', length=3601, freq='S')
以下代码就是Location列为’C215012’,时间在2018-01-14 3AM到4AM之间的数据抽取出来
In[70]: datetime_boolean = co2.index.isin(pd.date_range('2018-01-14 03:00', '2018-01-14 04:00', freq='S'))
...: co2[datetime_boolean & (co2['Location'] == 'C215012')]
...:
Out[70]:
co2 pm10 pm1d0 pm2d5 Location
datetime
2018-01-14 03:40:56 505 22 13 20 C215012
2018-01-14 03:40:58 505 22 14 21 C215012
2018-01-14 03:40:59 505 22 14 21 C215012
2018-01-14 03:41:00 505 22 14 21 C215012
2018-01-14 03:41:01 505 21 14 21 C215012
2018-01-14 03:41:02 505 21 14 21 C215012
2018-01-14 03:41:03 505 21 14 21 C215012
2018-01-14 03:41:04 505 21 13 20 C215012
2018-01-14 03:41:05 505 21 13 20 C215012
2018-01-14 03:41:06 505 21 13 20 C215012
2018-01-14 03:41:07 505 20 14 20 C215012
2018-01-14 03:41:08 505 20 14 20 C215012