I just created a few issues to address some features/conveniences that I think would be nice to have:
GH-856,
GH-857,
GH-858
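We are currently working on a revamp of the time series capabilities, and alignment at one-second resolution is possible now (though not with duplicates, so some functions would need to be written for that). I also want to support duplicate timestamps in a better way. However, this is really panel (3D) data, so one way you might restructure things is as follows: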
In [29]: df.pivot('Symbol', 'TimeStamp').stack()
Out[29]:
                   M1    M2   Price   Volume
Symbol TimeStamp
FUEL   9:58:40 AM   6  1.05  3.8544   100116
       9:58:47 AM   7  1.07  3.8599   102116
       9:59:09 AM   8  1.11  3.9099   105265
       9:59:11 AM   9  1.15  3.9490   109674
GBR    9:57:52 AM   2  0.34  3.7500    47521
       9:58:20 AM   3  0.45  3.8000    63211
       9:58:24 AM   4  0.46  3.8300    64251
MPET   9:57:52 AM   3  0.26  1.4200    44600
ORBC   9:59:02 AM   2  0.22  3.4000    10509
SUNH   9:59:09 AM   6  0.09  4.3700    24394
TBET   9:59:05 AM   2  8.03  2.1800  1121629
       9:59:14 AM   3  8.05  2.1900  1124179
XRA    9:58:08 AM   3  0.12  3.6167    42310
Note that this created a MultiIndex. Another way I could have gotten this:
In [32]: df.set_index(['Symbol', 'TimeStamp'])
Out[32]:
                    Price  M1    M2   Volume
Symbol TimeStamp
TBET   9:59:14 AM  2.1900   3  8.05  1124179
FUEL   9:59:11 AM  3.9490   9  1.15   109674
SUNH   9:59:09 AM  4.3700   6  0.09    24394
FUEL   9:59:09 AM  3.9099   8  1.11   105265
TBET   9:59:05 AM  2.1800   2  8.03  1121629
ORBC   9:59:02 AM  3.4000   2  0.22    10509
FUEL   9:58:47 AM  3.8599   7  1.07   102116
       9:58:40 AM  3.8544   6  1.05   100116
GBR    9:58:24 AM  3.8300   4  0.46    64251
       9:58:20 AM  3.8000   3  0.45    63211
XRA    9:58:08 AM  3.6167   3  0.12    42310
GBR    9:57:52 AM  3.7500   2  0.34    47521
MPET   9:57:52 AM  1.4200   3  0.26    44600
In [33]: df.set_index(['Symbol', 'TimeStamp']).sortlevel(0)
Out[33]:
                    Price  M1    M2   Volume
Symbol TimeStamp
FUEL   9:58:40 AM  3.8544   6  1.05   100116
       9:58:47 AM  3.8599   7  1.07   102116
       9:59:09 AM  3.9099   8  1.11   105265
       9:59:11 AM  3.9490   9  1.15   109674
GBR    9:57:52 AM  3.7500   2  0.34    47521
       9:58:20 AM  3.8000   3  0.45    63211
       9:58:24 AM  3.8300   4  0.46    64251
MPET   9:57:52 AM  1.4200   3  0.26    44600
ORBC   9:59:02 AM  3.4000   2  0.22    10509
SUNH   9:59:09 AM  4.3700   6  0.09    24394
TBET   9:59:05 AM  2.1800   2  8.03  1121629
       9:59:14 AM  2.1900   3  8.05  1124179
XRA    9:58:08 AM  3.6167   3  0.12    42310
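For anyone on a newer pandas release where `sortlevel` is no longer available, the same index-then-sort step can be sketched with `sort_index`. This is only a minimal illustration: the small `df` below is a hypothetical sample shaped like the data above, not the original frame.

```python
import pandas as pd

# Hypothetical sample: a few rows shaped like the data in this answer.
df = pd.DataFrame({
    "Symbol": ["TBET", "FUEL", "FUEL", "GBR"],
    "TimeStamp": ["9:59:14 AM", "9:59:11 AM", "9:58:40 AM", "9:57:52 AM"],
    "Price": [2.19, 3.949, 3.8544, 3.75],
    "Volume": [1124179, 109674, 100116, 47521],
})

# Build the (Symbol, TimeStamp) MultiIndex, then sort it.
# sort_index(level=0) plays the role that sortlevel(0) plays above.
indexed = df.set_index(["Symbol", "TimeStamp"]).sort_index(level=0)

# With a sorted MultiIndex, selecting one symbol is a plain .loc lookup.
fuel = indexed.loc["FUEL"]
print(indexed)
print(fuel)
```

Note that the timestamps here are still strings, so the ordering within each symbol is lexicographic rather than chronological; it happens to agree for these values.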
You can get this data in a true panel format like so:
In [35]: df.set_index(['TimeStamp', 'Symbol']).sortlevel(0).to_panel()
Out[35]:
Dimensions: 4 (items) x 11 (major) x 7 (minor)
Items: Price to Volume
Major axis: 9:57:52 AM to 9:59:14 AM
Minor axis: FUEL to XRA
In [36]: panel = df.set_index(['TimeStamp', 'Symbol']).sortlevel(0).to_panel()
In [37]: panel['Price']
Out[37]:
Symbol        FUEL     GBR    MPET    ORBC    SUNH    TBET     XRA
TimeStamp
9:57:52 AM     NaN    3.75    1.42     NaN     NaN     NaN     NaN
9:58:08 AM     NaN     NaN     NaN     NaN     NaN     NaN  3.6167
9:58:20 AM     NaN    3.80     NaN     NaN     NaN     NaN     NaN
9:58:24 AM     NaN    3.83     NaN     NaN     NaN     NaN     NaN
9:58:40 AM  3.8544     NaN     NaN     NaN     NaN     NaN     NaN
9:58:47 AM  3.8599     NaN     NaN     NaN     NaN     NaN     NaN
9:59:02 AM     NaN     NaN     NaN     3.4     NaN     NaN     NaN
9:59:05 AM     NaN     NaN     NaN     NaN     NaN    2.18     NaN
9:59:09 AM  3.9099     NaN     NaN     NaN    4.37     NaN     NaN
9:59:11 AM  3.9490     NaN     NaN     NaN     NaN     NaN     NaN
9:59:14 AM     NaN     NaN     NaN     NaN     NaN    2.19     NaN
You can then generate some plots from that data.
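The wide, one-column-per-symbol layout of `panel['Price']` can also be reached without a Panel object by unstacking the `Symbol` level of a MultiIndex. The following is a rough sketch under that assumption, again using a small hypothetical `df` rather than the original data.

```python
import pandas as pd

# Hypothetical sample: a few rows shaped like the data in this answer.
df = pd.DataFrame({
    "Symbol": ["GBR", "MPET", "XRA", "GBR"],
    "TimeStamp": ["9:57:52 AM", "9:57:52 AM", "9:58:08 AM", "9:58:20 AM"],
    "Price": [3.75, 1.42, 3.6167, 3.80],
    "Volume": [47521, 44600, 42310, 63211],
})

# Index by (TimeStamp, Symbol), sort, then move Symbol into the columns.
# The result has two column levels: the field (Price, Volume) and the symbol.
wide = df.set_index(["TimeStamp", "Symbol"]).sort_index().unstack("Symbol")

# Selecting one field gives a TimeStamp x Symbol table, with NaN where a
# symbol has no observation at that timestamp.
price = wide["Price"]
print(price)
```

This only works when each (TimeStamp, Symbol) pair appears once; `unstack` raises an error on duplicate index entries.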
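Note that the timestamps are still strings. I guess they could be converted to Python datetime.time objects, and things might be a bit easier to work with. I don't have many plans to provide a lot of support for raw times vs. timestamps (date + time), but if enough people need it I suppose I can be convinced :)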
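As a rough illustration of that conversion, the strings can be parsed with `pd.to_datetime` and reduced to `datetime.time` objects. The format string below is an assumption based on how the timestamps appear in the output above, and the `stamps` Series is a hypothetical sample.

```python
import datetime

import pandas as pd

# Hypothetical sample of timestamp strings in the format shown above.
stamps = pd.Series(["9:58:40 AM", "9:59:11 AM", "9:57:52 AM"])

# Parse as 12-hour clock times; pandas fills in a placeholder date.
parsed = pd.to_datetime(stamps, format="%I:%M:%S %p")

# Keep only the time-of-day part as datetime.time objects.
times = parsed.dt.time

# datetime.time objects compare chronologically, unlike the raw strings.
cutoff = datetime.time(9, 58, 0)
after_cutoff = [t for t in times if t >= cutoff]
print(after_cutoff)
```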
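If a single symbol has multiple observations within the same second, some of the methods above will not work. I'd like to build in better support for that in upcoming releases of pandas, so knowing your use cases would be helpful to me. Consider joining the mailing list (pystatsmodels).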
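In the meantime, one possible workaround for duplicates is to collapse them to a single row per (Symbol, TimeStamp) before reshaping. The sketch below assumes that keeping the last price and summing the volume is acceptable for your use case; that choice, and the small `df`, are illustrative assumptions only.

```python
import pandas as pd

# Hypothetical sample with two FUEL observations in the same second.
df = pd.DataFrame({
    "Symbol": ["FUEL", "FUEL", "FUEL", "GBR"],
    "TimeStamp": ["9:58:40 AM", "9:58:40 AM", "9:58:47 AM", "9:57:52 AM"],
    "Price": [3.85, 3.86, 3.8599, 3.75],
    "Volume": [100, 200, 300, 400],
})

# Collapse duplicates: last price and total volume per symbol per second.
# These aggregation rules are an assumption; pick whatever fits your data.
collapsed = df.groupby(["Symbol", "TimeStamp"]).agg(
    {"Price": "last", "Volume": "sum"}
)

# The index is now unique, so reshaping to one column per symbol works.
price = collapsed["Price"].unstack("Symbol")
print(price)
```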