mysql 直方图 索引,时间间隔数据的MySQL查询直方图

I have an event input of this type

event user

event start

event end

event type

Inserted to MySql table, each in its own row with user+start as primary key.

I need to query an histogram for a type by time interval (say minute) counting events occurred on each time interval.

something like:

SELECT count(*) as hits FROM events

WHERE type="browsing"

GROUP BY time_diff("2015-1-1" AND "2015-1-2") / 60 * second

but I could not find any way to do that in SQL besides writing code, any idea?

Sample data

user, start, end, type

1, 2015-1-1 12:00:00, 2015-1-1 12:03:59, browsing

2, 2015-1-1 12:03:00, 2015-1-1 12:06:00, browsing

2, 2015-1-1 12:03:00, 2015-1-1 12:06:00, eating

3, 2015-1-1 12:03:00, 2015-1-1 12:08:00, browsing

the result should look like this:

^

count |

browsing |

users | *

| * * * *

| * * * * * * * *

--|--|--|--|--|--|--|--|--|--> minute

0 1 2 3 4 5 6 7 8 9

解决方案

You can do this using group by with the level that you want. Here is an example using the data you gave:

First the SQL to create the table and populate it. The ID column here isn't "needed" but it is recommended if the table will be large or have indexes on it.

CREATE TABLE `test`.`events` (

`id` INT NOT NULL AUTO_INCREMENT,

`user` INT NULL,

`start` DATETIME NULL,

`end` DATETIME NULL,

`type` VARCHAR(45) NULL,

PRIMARY KEY (`id`));

INSERT INTO events (user, start, end, type) VALUES

(1, '2015-1-1 12:00:00', '2015-1-1 12:03:59', 'browsing'),

(2, '2015-1-1 12:03:00', '2015-1-1 12:06:00', 'browsing'),

(2, '2015-1-1 12:03:00', '2015-1-1 12:06:00', 'eating'),

(3, '2015-1-1 12:03:00', '2015-1-1 12:08:00', 'browsing');

To get a list of ordered pairs of number of minutes duration to number of events:

The query can then be easily written using the timestampdiff fuction, as shown below:

SELECT

TIMESTAMPDIFF(MINUTE, start, end) as minutes,

COUNT(*) AS numEvents

FROM

test.events

GROUP BY TIMESTAMPDIFF(MINUTE, start, end)

The output:

minutes numEvents

3 3

5 1

The first parameter in the select can be one of FRAC_SECOND, SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, or YEAR.

Here are some more examples of queries you can do:

Events by hour (floor function is applied)

SELECT

TIMESTAMPDIFF(HOUR, start, end) as hours,

COUNT(*) AS numEvents

FROM

test.events

GROUP BY TIMESTAMPDIFF(HOUR, start, end)

**Events by hour with better formatting **

SELECT

CONCAT("

COUNT(*) AS numEvents

FROM

test.events

GROUP BY TIMESTAMPDIFF(HOUR, start, end)

You can group by a variety of options, but this should definitely get you started. Most plotting packages will allow you to specify arbitrary x y coordinates, so you don't need to worry about the missing values on the x axis.

To get a list of ordered pairs of number of events at a specific time (for logging):

Note that this is left for reference.

Now for the queries. First you have to pick which item you want to use for the grouping. For example, a task might take more than a minute, so the start and end would be in different minutes. For all these examples, I am basing them off of the start time, since that is when the event actually took place.

To group event counts by minute, you can use a query like this:

SELECT

DATE_FORMAT(start, '%M %e, %Y %h:%i %p') as minute,

count(*) AS numEvents

FROM test.events

GROUP BY YEAR(start), MONTH(start), DAYOFMONTH(start), HOUR(start), MINUTE(start);

Note how this groups by all the items, starting with year, going the minute. I also have the minute displayed as a label. The resulting output looks like this:

minute numEvents

January 1, 2015 12:00 PM 1

January 1, 2015 12:03 PM 3

This is data that you could then take using php and prepare it for display by one of the many graphing libraries out there, plotting the minute column on the x axis, and plotting the numEvents on the y axis.

Here are some more examples of queries you can do:

Events by hour

SELECT

DATE_FORMAT(start, '%M %e, %Y %h %p') as hour,

count(*) AS numEvents

FROM test.events

GROUP BY YEAR(start), MONTH(start), DAYOFMONTH(start), HOUR(start);

Events by date

SELECT

DATE_FORMAT(start, '%M %e, %Y') as date,

count(*) AS numEvents

FROM test.events

GROUP BY YEAR(start), MONTH(start), DAYOFMONTH(start);

Events by month

SELECT

DATE_FORMAT(start, '%M %Y') as date,

count(*) AS numEvents

FROM test.events

GROUP BY YEAR(start), MONTH(start);

Events by year

SELECT

DATE_FORMAT(start, '%Y') as date,

count(*) AS numEvents

FROM test.events

GROUP BY YEAR(start);

I should also point out that if you have an index on the start column for this table, these queries will complete quickly, even with hundreds of millions of rows.

Hope this helps! Let me know if you have any other questions about this.

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值