This question has been asked before but I am facing a slightly different problem.
I have a table that logs events and stores their timestamps (as datetime). I need to break time up into chunks and get the number of events that occurred in each interval. The interval is configurable (say, from 5 minutes to 1 hour, and even beyond).
The obvious solution is to convert the datetime to a Unix timestamp, divide it by the number of seconds in the interval, take the floor, and multiply the result back by the number of seconds. Finally, convert the Unix timestamp back to the datetime format.
This works fine for small intervals.
select
from_unixtime(floor(unix_timestamp(event.timestamp)/300)*300) as start_time,
count(*) as total
from event
where timestamp>='2012-08-03 00:00:00'
group by start_time;
This gives the correct output
+---------------------+-------+
| start_time | total |
+---------------------+-------+
| 2012-08-03 00:00:00 | 11 |
| 2012-08-03 00:05:00 | 4 |
| 2012-08-03 00:10:00 | 4 |
| 2012-08-03 00:15:00 | 7 |
| 2012-08-03 00:20:00 | 8 |
| 2012-08-03 00:25:00 | 1 |
| 2012-08-03 00:30:00 | 1 |
| 2012-08-03 00:35:00 | 3 |
| 2012-08-03 00:40:00 | 3 |
| 2012-08-03 00:45:00 | 5 |
~~~~~OUTPUT SNIPPED~~~~~~~~~~~~
But if I increase the interval to, say, 1 hour (3600 sec), the buckets start on the half hour instead of on the hour:
mysql> select from_unixtime(floor(unix_timestamp(event.timestamp)/3600)*3600) as start_time, count(*) as total from event where timestamp>='2012-08-03 00:00:00' group by start_time;
+---------------------+-------+
| start_time | total |
+---------------------+-------+
| 2012-08-02 23:30:00 | 35 |
| 2012-08-03 00:30:00 | 30 |
| 2012-08-03 01:30:00 | 12 |
| 2012-08-03 02:30:00 | 18 |
| 2012-08-03 03:30:00 | 12 |
| 2012-08-03 04:30:00 | 4 |
| 2012-08-03 05:30:00 | 3 |
| 2012-08-03 06:30:00 | 13 |
| 2012-08-03 07:30:00 | 269 |
| 2012-08-03 08:30:00 | 681 |
| 2012-08-03 09:30:00 | 1523 |
| 2012-08-03 10:30:00 | 911 |
+---------------------+-------+
The reason the boundaries are off, as far as I can tell, is that unix_timestamp converts the time from my local timezone (GMT+05:30) to UTC before returning the numeric value.
So a value like 2012-08-03 00:00:00 is actually treated as 2012-08-02 18:30:00 UTC. Dividing by 3600 and taking the floor sets the minutes part to 00 (18:00:00 UTC). But from_unixtime then converts the result back to GMT+05:30, which gives me intervals that begin at 30 minutes past the hour.
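The arithmetic above can be checked directly. This is a minimal sketch that assumes the session time zone is +05:30 (set explicitly here for illustration); it also shows why the 5-minute version happens to work, since the 5.5-hour offset is a multiple of 300 seconds but not of 3600.

```sql
-- Assumption: the session time zone is +05:30, as described in the question.
SET time_zone = '+05:30';

SELECT
  -- Local midnight expressed as seconds since the epoch (i.e. 18:30:00 UTC).
  UNIX_TIMESTAMP('2012-08-03 00:00:00')        AS epoch_seconds,
  -- Remainder against a 5-minute bucket: 0, so 5-minute boundaries line up.
  UNIX_TIMESTAMP('2012-08-03 00:00:00') % 300  AS remainder_5_min,
  -- Remainder against a 1-hour bucket: 1800, so hourly boundaries are off by 30 minutes.
  UNIX_TIMESTAMP('2012-08-03 00:00:00') % 3600 AS remainder_1_hour,
  -- Flooring to the hour in UTC and converting back lands on a local half hour.
  FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP('2012-08-03 00:00:00') / 3600) * 3600) AS floored_hour;
```

With that offset, floored_hour comes back as 2012-08-02 23:30:00, which matches the first row of the hourly output above. Any interval that does not divide 1800 seconds evenly would be shifted in the same way.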
How do I ensure the query works correctly irrespective of the timezone? I use MySQL 5.1.52, so to_seconds() is not available.
EDIT:
The query should also work correctly irrespective of the interval (hours, minutes, or days). A generic solution would be appreciated.
Solution
You can use TIMESTAMPDIFF to group by intervals of time:
For a specified interval of hours, you can use:
SELECT '2012-08-03 00:00:00' +
INTERVAL FLOOR(TIMESTAMPDIFF(HOUR, '2012-08-03 00:00:00', timestamp) / n) * n HOUR AS start_time,
COUNT(*) AS total
FROM event
WHERE timestamp >= '2012-08-03 00:00:00'
GROUP BY start_time
Replace the occurrences of 2012-08-03 00:00:00 with your minimum input date.
Here n is your specified interval in hours (every 2 hours, 3 hours, etc.). You can do the same for minutes:
SELECT '2012-08-03 00:00:00' +
INTERVAL FLOOR(TIMESTAMPDIFF(MINUTE, '2012-08-03 00:00:00', timestamp) / n) * n MINUTE AS start_time,
COUNT(*) AS total
FROM event
WHERE timestamp >= '2012-08-03 00:00:00'
GROUP BY start_time
Here n is your specified interval in minutes (every 45 minutes, 90 minutes, etc.).
Be sure you're passing in your minimum input date (in this example 2012-08-03 00:00:00) as the second parameter to TIMESTAMPDIFF.
EDIT: If you don't want to worry about which interval unit to pick in the TIMESTAMPDIFF function, then just express the interval in seconds, with n as the number of seconds (300 = 5 minutes, 3600 = 1 hour, 7200 = 2 hours, etc.):
SELECT '2012-08-03 00:00:00' +
INTERVAL FLOOR(TIMESTAMPDIFF(SECOND, '2012-08-03 00:00:00', timestamp) / n) * n SECOND AS start_time,
COUNT(*) AS total
FROM event
WHERE timestamp >= '2012-08-03 00:00:00'
GROUP BY start_time
EDIT2: To address your comment about reducing the number of places in the statement where you have to pass in your minimum date, you can use the following (n is again the interval in seconds):
SELECT b.mindate +
INTERVAL FLOOR(TIMESTAMPDIFF(SECOND, b.mindate, timestamp) / n) * n SECOND AS start_time,
COUNT(*) AS total
FROM event
JOIN (SELECT '2012-08-03 00:00:00' AS mindate) b ON timestamp >= b.mindate
GROUP BY start_time
Then you only pass your minimum datetime parameter once, into the join subselect.
You can even add a second column to the join subselect for your interval in seconds (e.g. 3600) and name it something like secinterval. Then use b.secinterval in place of the interval value in the query, so you only have to pass in your minimum date AND your interval one time each.
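Putting that last suggestion together, a sketch of the query with both parameters in the join subselect might look like the following. The column name secinterval follows the naming suggested above; the table alias e and the value 3600 are only illustrative.

```sql
-- Both parameters (minimum date and interval in seconds) are supplied once,
-- in the derived table b. The 3600 here is an example value (1 hour).
SELECT b.mindate +
       INTERVAL FLOOR(TIMESTAMPDIFF(SECOND, b.mindate, e.timestamp) / b.secinterval)
                * b.secinterval SECOND AS start_time,
       COUNT(*) AS total
FROM event e
JOIN (SELECT '2012-08-03 00:00:00' AS mindate,
             3600                  AS secinterval) b
  ON e.timestamp >= b.mindate
GROUP BY start_time;
```

Because the buckets are computed as offsets from mindate rather than from the Unix epoch, no conversion to UTC is involved and the boundaries are anchored to whatever minimum date you pass in.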