1107 每日新用户统计
SQL架构
Create table If Not Exists Traffic_1107 (user_id int, activity ENUM('login', 'logout', 'jobs', 'groups', 'homepage'), activity_date date);
Truncate table Traffic_1107;
insert into Traffic_1107 (user_id, activity, activity_date) values ('1', 'login', '2019-05-01');
insert into Traffic_1107 (user_id, activity, activity_date) values ('1', 'homepage', '2019-05-01');
insert into Traffic_1107 (user_id, activity, activity_date) values ('1', 'logout', '2019-05-01');
insert into Traffic_1107 (user_id, activity, activity_date) values ('2', 'login', '2019-06-21');
insert into Traffic_1107 (user_id, activity, activity_date) values ('2', 'logout', '2019-06-21');
insert into Traffic_1107 (user_id, activity, activity_date) values ('3', 'login', '2019-01-01');
insert into Traffic_1107 (user_id, activity, activity_date) values ('3', 'jobs', '2019-01-01');
insert into Traffic_1107 (user_id, activity, activity_date) values ('3', 'logout', '2019-01-01');
insert into Traffic_1107 (user_id, activity, activity_date) values ('4', 'login', '2019-06-21');
insert into Traffic_1107 (user_id, activity, activity_date) values ('4', 'groups', '2019-06-21');
insert into Traffic_1107 (user_id, activity, activity_date) values ('4', 'logout', '2019-06-21');
insert into Traffic_1107 (user_id, activity, activity_date) values ('5', 'login', '2019-03-01');
insert into Traffic_1107 (user_id, activity, activity_date) values ('5', 'logout', '2019-03-01');
insert into Traffic_1107 (user_id, activity, activity_date) values ('5', 'login', '2019-06-21');
insert into Traffic_1107 (user_id, activity, activity_date) values ('5', 'logout', '2019-06-21');
Traffic 表:
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| user_id | int |
| activity | enum |
| activity_date | date |
+---------------+---------+
该表没有主键,它可能有重复的行。
activity 列是 ENUM 类型,可能取 ('login', 'logout', 'jobs', 'groups', 'homepage') 几个值之一。
编写一个 SQL 查询,以查询从今天起最多 90 天内,每个日期该日期首次登录的用户数。假设今天是 2019-06-30.
查询结果格式如下例所示:
Traffic 表:
+---------+----------+---------------+
| user_id | activity | activity_date |
+---------+----------+---------------+
| 1 | login | 2019-05-01 |
| 1 | homepage | 2019-05-01 |
| 1 | logout | 2019-05-01 |
| 2 | login | 2019-06-21 |
| 2 | logout | 2019-06-21 |
| 3 | login | 2019-01-01 |
| 3 | jobs | 2019-01-01 |
| 3 | logout | 2019-01-01 |
| 4 | login | 2019-06-21 |
| 4 | groups | 2019-06-21 |
| 4 | logout | 2019-06-21 |
| 5 | login | 2019-03-01 |
| 5 | logout | 2019-03-01 |
| 5 | login | 2019-06-21 |
| 5 | logout | 2019-06-21 |
+---------+----------+---------------+
Result 表:
+------------+-------------+
| login_date | user_count |
+------------+-------------+
| 2019-05-01 | 1 |
| 2019-06-21 | 2 |
+------------+-------------+
请注意,我们只关心用户数非零的日期.
ID 为 5 的用户第一次登陆于 2019-03-01,因此他不算在 2019-06-21 的的统计内。
解题
-- 1、先找出所有用户第一次登录的信息
-- 2、筛选出距今90天内的信息
-- 3、按照日期聚合计数
select first_date login_date, count(user_id) user_count
from
(select user_id, activity, min(activity_date) first_date from Traffic_1107
where activity = 'login' group by user_id) t1
where datediff("2019-06-30", first_date) <= 90
group by first_date;