I have three processes, A, B and C, whose readings are stored in the following three tables:
```sql
CREATE TABLE processA
(date_time datetime, valueA int);
INSERT INTO processA
(date_time, valueA)
VALUES
('2013-1-8 22:10:00', 100),
('2013-1-8 22:15:00', 100),
('2013-1-8 22:30:00', 100),
('2013-1-8 22:35:00', 100),
('2013-1-8 22:40:00', 100),
('2013-1-8 22:45:00', 100),
('2013-1-8 22:50:00', 100),
('2013-1-8 23:05:00', 100),
('2013-1-8 23:10:00', 100),
('2013-1-8 23:20:00', 100),
('2013-1-8 23:25:00', 100),
('2013-1-8 23:35:00', 100),
('2013-1-8 23:40:00', 100),
('2013-1-9 00:05:00', 100),
('2013-1-9 00:10:00', 100);

CREATE TABLE processB
(date_time datetime, valueB decimal(4,2));
INSERT INTO processB
(date_time, valueB)
VALUES
('2013-1-08 21:46:00', 3),
('2013-1-08 22:11:00', 4),
('2013-1-08 22:31:00', 5),
('2013-1-08 22:36:00', 6),
('2013-1-08 22:41:00', 7),
('2013-1-08 23:06:00', 8),
('2013-1-08 23:20:00', 2),
('2013-1-08 23:46:00', 3),
('2013-1-09 00:34:00', 9);

CREATE TABLE processC
(date_time datetime, status varchar(4));
INSERT INTO processC
VALUES
('2013-1-08 18:00:00', 'yes'),
('2013-1-08 19:00:00', 'yes'),
('2013-1-08 20:00:00', 'yes'),
('2013-1-08 21:00:00', 'yes'),
('2013-1-08 22:00:00', 'yes'),
('2013-1-08 23:00:00', 'no'),
('2013-1-08 00:00:00', 'no'),
('2013-1-08 01:00:00', 'no');
```
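As you can see, each process takes its readings at different times.
> ProcessA readings (when they occur) are taken every 5 minutes.
> ProcessB readings occur at unpredictable times, but usually several times an hour.
> ProcessC always has an hourly value (yes or no).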
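First, I want to transform processB so that it has a reading every 5 minutes, aligned with processA; that would then let me do a simple join of the two tables on the 5-minute timestamps. For the transformation, each 5-minute slot should take the processB observation nearest to it within a [-30, 30) minute window around the slot. If two observations are equidistant, take their average. If there is no observation within the 30-minute window, set the value to null.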
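The resampling rule can be pinned down with a small Python sketch. It assumes the window is measured as the offset `observation time - slot time`, so an observation exactly 30 minutes before the slot is included and one exactly 30 minutes after is excluded. The names `PROCESS_B`, `parse` and `nearest_in_window` are hypothetical, chosen for illustration only.

```python
from datetime import datetime, timedelta

# processB readings copied from the INSERT above: (date_time, valueB)
PROCESS_B = [
    ("2013-1-08 21:46:00", 3),
    ("2013-1-08 22:11:00", 4),
    ("2013-1-08 22:31:00", 5),
    ("2013-1-08 22:36:00", 6),
    ("2013-1-08 22:41:00", 7),
    ("2013-1-08 23:06:00", 8),
    ("2013-1-08 23:20:00", 2),
    ("2013-1-08 23:46:00", 3),
    ("2013-1-09 00:34:00", 9),
]


def parse(ts: str) -> datetime:
    # strptime accepts the non-zero-padded month/day used in the question
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")


OBS_B = [(parse(ts), v) for ts, v in PROCESS_B]


def nearest_in_window(t, observations,
                      lower=timedelta(minutes=-30),
                      upper=timedelta(minutes=30)):
    """Value of the observation closest to t whose offset (obs - t) is in [lower, upper).

    Equidistant observations are averaged; None is returned if the window is empty.
    """
    candidates = [(abs(ts - t), v) for ts, v in observations if lower <= ts - t < upper]
    if not candidates:
        return None
    best = min(d for d, _ in candidates)
    tied = [v for d, v in candidates if d == best]
    return sum(tied) / len(tied)


for ts in ["2013-1-8 22:10:00", "2013-1-8 23:35:00", "2013-1-9 00:10:00"]:
    print(ts, nearest_in_window(parse(ts), OBS_B))
# 2013-1-8 22:10:00 4.0
# 2013-1-8 23:35:00 3.0
# 2013-1-9 00:10:00 6.0   (23:46 and 00:34 are both 24 minutes away, so 3 and 9 are averaged)
```

The linear scan is O(|A|·|B|), which is fine for data of this size; for large tables, sorting processB once and locating neighbours with `bisect` would avoid rescanning every observation.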
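Once I have that, I can do a simple join with processC on `%Y%m%d%H`, using something like the following, to get a final table with all the data aligned at 5-minute intervals: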
```sql
date_format(processA.date_time, '%Y%m%d%H') = date_format(processC.date_time, '%Y%m%d%H')
```
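The hour-key join can be mimicked in Python to check which status each 5-minute timestamp picks up. This is a sketch that assumes `strftime("%Y%m%d%H")` stands in for MySQL's `DATE_FORMAT(..., '%Y%m%d%H')` (both produce zero-padded keys such as `2013010822`); `hour_key`, `STATUS_BY_HOUR` and `status_for` are hypothetical names.

```python
from datetime import datetime

# processC rows copied from the INSERT above: (date_time, status)
PROCESS_C = [
    ("2013-1-08 18:00:00", "yes"),
    ("2013-1-08 19:00:00", "yes"),
    ("2013-1-08 20:00:00", "yes"),
    ("2013-1-08 21:00:00", "yes"),
    ("2013-1-08 22:00:00", "yes"),
    ("2013-1-08 23:00:00", "no"),
    ("2013-1-08 00:00:00", "no"),
    ("2013-1-08 01:00:00", "no"),
]


def hour_key(ts: str) -> str:
    """Python counterpart of DATE_FORMAT(ts, '%Y%m%d%H'), e.g. '2013010822'."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").strftime("%Y%m%d%H")


# One status per hour; the dictionary lookup plays the role of the equality join.
STATUS_BY_HOUR = {hour_key(ts): status for ts, status in PROCESS_C}


def status_for(ts: str):
    # None corresponds to the NULL a LEFT JOIN would produce for an unmatched hour
    return STATUS_BY_HOUR.get(hour_key(ts))


print(status_for("2013-1-8 22:50:00"))  # yes
print(status_for("2013-1-9 00:05:00"))  # None
```

With processC exactly as inserted, the 2013-1-9 readings find no matching hour, because the `00:00` and `01:00` rows are dated 2013-1-08; a LEFT JOIN in MySQL would return NULL for them.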
If anyone has any pointers or guidance, I'd really appreciate it.
Sample output:
```
'2013-1-8 22:10:00', 100, 4, yes
'2013-1-8 22:15:00', 100, 4, yes
'2013-1-8 22:30:00', 100, 5, yes
'2013-1-8 22:35:00', 100, 6, yes
'2013-1-8 22:40:00', 100, 7, yes
'2013-1-8 22:45:00', 100, 7, yes
'2013-1-8 22:50:00', 100, 7, yes
'2013-1-8 23:05:00', 100, 8, yes
'2013-1-8 23:10:00', 100, 8, no
'2013-1-8 23:20:00', 100, 2, no
'2013-1-8 23:25:00', 100, 2, no
'2013-1-8 23:35:00', 100, 3, no
'2013-1-9 00:05:00', 100, 3, no
'2013-1-9 00:10:00', 100, 6, no
```
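The nearest-observation step can also be written as one SQL query. The sketch below runs it on SQLite through Python's `sqlite3` module so it works without a MySQL server; that substitution is an assumption, as are the names `iso`, `RESAMPLE_SQL` and `resample_b`. Offsets are computed in whole seconds so that ties compare exactly. MySQL 8.0+ also supports `WITH`, and there the `strftime('%s', ...)` difference would become `TIMESTAMPDIFF(SECOND, a.date_time, b.date_time)`.

```python
import sqlite3
from datetime import datetime


def iso(ts: str) -> str:
    # SQLite's date functions only understand zero-padded ISO-8601 text
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").strftime("%Y-%m-%d %H:%M:%S")


A_TIMES = [
    "2013-1-8 22:10:00", "2013-1-8 22:15:00", "2013-1-8 22:30:00",
    "2013-1-8 22:35:00", "2013-1-8 22:40:00", "2013-1-8 22:45:00",
    "2013-1-8 22:50:00", "2013-1-8 23:05:00", "2013-1-8 23:10:00",
    "2013-1-8 23:20:00", "2013-1-8 23:25:00", "2013-1-8 23:35:00",
    "2013-1-8 23:40:00", "2013-1-9 00:05:00", "2013-1-9 00:10:00",
]
B_ROWS = [
    ("2013-1-08 21:46:00", 3), ("2013-1-08 22:11:00", 4), ("2013-1-08 22:31:00", 5),
    ("2013-1-08 22:36:00", 6), ("2013-1-08 22:41:00", 7), ("2013-1-08 23:06:00", 8),
    ("2013-1-08 23:20:00", 2), ("2013-1-08 23:46:00", 3), ("2013-1-09 00:34:00", 9),
]

RESAMPLE_SQL = """
WITH diffs AS (
    -- signed offset in seconds of every B reading relative to every A timestamp
    SELECT a.date_time AS t,
           b.valueB    AS v,
           CAST(strftime('%s', b.date_time) AS INTEGER)
             - CAST(strftime('%s', a.date_time) AS INTEGER) AS off
    FROM processA a CROSS JOIN processB b
),
cand AS (
    -- keep only readings inside the [-30, 30) minute window
    SELECT t, v, off FROM diffs WHERE off >= -1800 AND off < 1800
),
best AS (
    -- smallest absolute distance per A timestamp
    SELECT t, MIN(ABS(off)) AS d FROM cand GROUP BY t
)
SELECT a.date_time, a.valueA, AVG(c.v) AS valueB  -- AVG averages ties, NULL if no candidate
FROM processA a
LEFT JOIN best ON best.t = a.date_time
LEFT JOIN cand c ON c.t = a.date_time AND ABS(c.off) = best.d
GROUP BY a.date_time, a.valueA
ORDER BY a.date_time
"""


def resample_b():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE processA (date_time TEXT, valueA INTEGER)")
    con.execute("CREATE TABLE processB (date_time TEXT, valueB REAL)")
    con.executemany("INSERT INTO processA VALUES (?, 100)", [(iso(t),) for t in A_TIMES])
    con.executemany("INSERT INTO processB VALUES (?, ?)", [(iso(t), v) for t, v in B_ROWS])
    rows = con.execute(RESAMPLE_SQL).fetchall()
    con.close()
    return rows


for row in resample_b():
    print(row)
```

On the question's data this returns one row per processA timestamp, including `2013-01-08 23:40:00` with valueB 3.0, a timestamp the sample output does not list. The cross join is quadratic, which is acceptable for a few thousand rows but would need a narrower join condition on larger tables.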