I'm working with some data that is currently stored in 1-minute intervals, as shown below:
CREATE TABLE #MinuteData
(
    [Id] INT,
    [MinuteBar] DATETIME,
    [Open] NUMERIC(12,6),
    [High] NUMERIC(12,6),
    [Low] NUMERIC(12,6),
    [Close] NUMERIC(12,6)
);
INSERT INTO #MinuteData
( [Id],[MinuteBar],[Open],[High],[Low],[Close] )
VALUES ( 1,'2015-01-01 17:00:00',1.557870,1.557880,1.557880 ),
       ( 2,'2015-01-01 17:01:00',1.557900,
       ( 3,'2015-01-01 17:02:00',1.557960,1.558070,1.558040 ),
       ( 4,'2015-01-01 17:03:00',1.558080,1.558100,1.558040,1.558050 ),
       ( 5,'2015-01-01 17:04:00',1.558050,1.558020,1.558030 ),
       ( 6,'2015-01-01 17:05:00',1.558580,1.558710,1.557950 ),
       ( 7,'2015-01-01 17:06:00',1.557910,1.558120,1.557990 ),
       ( 8,'2015-01-01 17:07:00',1.557940,1.558250,1.558170 ),
       ( 9,'2015-01-01 17:08:00',1.558140,1.558200,1.558120 ),
       ( 10,'2015-01-01 17:09:00',1.558110,1.557970,1.557970 );
SELECT *
FROM #MinuteData;
DROP TABLE #MinuteData;
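The values track currency exchange rates, so for each 1-minute interval (bar) there is an Open price at the start of the minute and a Close price at the end of the minute. The High and Low values are the highest and lowest rates seen during that minute.
Desired Output
I'd like to reformat this data into 5-minute intervals to produce the following output: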
MinuteBar                Open      Close     Low       High
2015-01-01 17:00:00.000  1.557870  1.558030  1.557870  1.558100
2015-01-01 17:05:00.000  1.558580  1.557970  1.557870  1.558710
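This takes the Open value from the first minute of the five and the Close value from the last minute of the five. The High and Low values are the highest High and the lowest Low across the 5-minute period.
Current Solution
I have a solution that does this (below), but it feels inelegant because it relies on Id values and self joins. I also intend to run it on much larger datasets, so I'd like to do it more efficiently if possible: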
-- Create a column to allow grouping in 5 minute Intervals
SELECT Id, MinuteBar, [Open], High, Low, [Close], DATEDIFF(MINUTE, '2015-01-01T00:00:00', MinuteBar) / 5 AS Interval
INTO #5MinuteData
FROM #MinuteData
ORDER BY minutebar
-- Group by interval and aggregate prior to self join
SELECT Interval,MIN(MinuteBar) AS MinuteBar,MIN(Id) AS OpenId,MAX(Id) AS CloseId,MIN(Low) AS Low,MAX(High) AS High
INTO #DataMinMax
FROM #5MinuteData
GROUP BY Interval;
-- Self join to get the Open and Close values
SELECT t1.Interval,t1.MinuteBar,tOpen.[Open],tClose.[Close],t1.Low,t1.High
FROM #DataMinMax t1
INNER JOIN #5MinuteData tOpen ON tOpen.Id = OpenId
INNER JOIN #5MinuteData tClose ON tClose.Id = CloseId;
DROP TABLE #DataMinMax
DROP TABLE #5MinuteData
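To make the grouping column concrete: DATEDIFF returns a whole number of minutes and dividing it by 5 is integer division, so every five consecutive minutes share one Interval value. The sketch below assumes the #MinuteData table from the top of the question still exists; the IntervalStart column name is only illustrative and is not part of my current solution.

```sql
-- Sketch: show which 5-minute bucket each row falls into.
-- Assumes #MinuteData (defined above) has not been dropped yet.
SELECT MinuteBar,
       -- Minutes since the anchor date, integer-divided by 5.
       DATEDIFF(MINUTE, '2015-01-01T00:00:00', MinuteBar) / 5 AS Interval,
       -- Multiply back by 5 and add to the anchor to get the bucket start.
       -- IntervalStart is a hypothetical name used only for this example.
       DATEADD(MINUTE,
               (DATEDIFF(MINUTE, '2015-01-01T00:00:00', MinuteBar) / 5) * 5,
               '2015-01-01T00:00:00') AS IntervalStart
FROM #MinuteData
ORDER BY MinuteBar;
```

With the sample rows, 17:00 through 17:04 land in Interval 204 and 17:05 through 17:09 in Interval 205, so IntervalStart comes out as 17:00 and 17:05 respectively.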
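Attempt at Rework
Instead of the queries above, I've been looking at using FIRST_VALUE and LAST_VALUE, as they seem to be what I'm after, but I can't get them working with the grouping I'm doing. There may be a better solution than what I'm attempting, so I'm open to suggestions. Currently I'm trying this: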
SELECT MIN(MinuteBar) MinuteBar5,
       FIRST_VALUE([Open]) OVER (ORDER BY MinuteBar) AS opening,
       MAX(High) AS High,
       LAST_VALUE([Close]) OVER (ORDER BY MinuteBar) AS Closing,
       DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5 AS Interval
FROM #MinuteData
GROUP BY DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5
This gives me the error below. It is related to FIRST_VALUE and LAST_VALUE, because the query runs if I remove those lines:
Column '#MinuteData.MinuteBar' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
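As far as I can tell, the error arises because window functions are evaluated after GROUP BY, so the MinuteBar, [Open] and [Close] references inside them are no longer available as plain columns. One direction that might avoid this is to compute the window functions per interval first and group afterwards. The following is only a sketch under assumptions: it requires SQL Server 2012 or later, it assumes #MinuteData still exists, the CTE names Bucketed and Windowed are made up for the example, and I have not checked how it performs on a large dataset.

```sql
-- Sketch only: window functions first, grouping second.
WITH Bucketed AS
(
    SELECT MinuteBar, [Open], [High], [Low], [Close],
           DATEDIFF(MINUTE, '2015-01-01T00:00:00', MinuteBar) / 5 AS Interval
    FROM #MinuteData
),
Windowed AS
(
    SELECT Interval, MinuteBar, [High], [Low],
           FIRST_VALUE([Open]) OVER (PARTITION BY Interval
                                     ORDER BY MinuteBar) AS Opening,
           -- The explicit frame matters: the default frame stops at the
           -- current row, so LAST_VALUE would return each row's own Close.
           LAST_VALUE([Close]) OVER (PARTITION BY Interval
                                     ORDER BY MinuteBar
                                     ROWS BETWEEN UNBOUNDED PRECEDING
                                              AND UNBOUNDED FOLLOWING) AS Closing
    FROM Bucketed
)
SELECT MIN(MinuteBar) AS MinuteBar,
       MIN(Opening)   AS [Open],   -- constant within an Interval; MIN only collapses it
       MIN(Closing)   AS [Close],  -- likewise constant within an Interval
       MIN([Low])     AS [Low],
       MAX([High])    AS [High]
FROM Windowed
GROUP BY Interval
ORDER BY Interval;
```

The PARTITION BY Interval clause is what ties the first and last values to each 5-minute bucket; without it, the window would span the whole table.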