Table: Traffic
+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| user_id       | int     |
| activity      | enum    |
| activity_date | date    |
+---------------+---------+
This table has no primary key, so it may contain duplicate rows.
The activity column is an ENUM type with the values ('login', 'logout', 'jobs', 'groups', 'homepage').
Write an SQL query that reports, for every date within at most 90 days before today, the number of users who logged in for the first time on that date. Assume today is 2019-06-30.
The query result format is shown in the following example:
Traffic table:
+---------+----------+---------------+
| user_id | activity | activity_date |
+---------+----------+---------------+
| 1       | login    | 2019-05-01    |
| 1       | homepage | 2019-05-01    |
| 1       | logout   | 2019-05-01    |
| 2       | login    | 2019-06-21    |
| 2       | logout   | 2019-06-21    |
| 3       | login    | 2019-01-01    |
| 3       | jobs     | 2019-01-01    |
| 3       | logout   | 2019-01-01    |
| 4       | login    | 2019-06-21    |
| 4       | groups   | 2019-06-21    |
| 4       | logout   | 2019-06-21    |
| 5       | login    | 2019-03-01    |
| 5       | logout   | 2019-03-01    |
| 5       | login    | 2019-06-21    |
| 5       | logout   | 2019-06-21    |
+---------+----------+---------------+

Result table:
+------------+------------+
| login_date | user_count |
+------------+------------+
| 2019-05-01 | 1          |
| 2019-06-21 | 2          |
+------------+------------+

Note that we only care about dates with a non-zero user count.
The user with id 5 first logged in on 2019-03-01, so he is not counted on 2019-06-21.
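To make the requirement concrete, here is a minimal plain-Python sketch of the same computation over the sample rows above. The names `rows` and `new_users_daily` are illustrative only and are not part of the problem. It also shows where the constant 2019-04-01 used in the queries comes from: it is 2019-06-30 minus 90 days, and the window is treated as inclusive on both ends.

```python
from collections import Counter
from datetime import date, timedelta

# Sample Traffic rows from the example: (user_id, activity, activity_date)
rows = [
    (1, "login", date(2019, 5, 1)), (1, "homepage", date(2019, 5, 1)),
    (1, "logout", date(2019, 5, 1)),
    (2, "login", date(2019, 6, 21)), (2, "logout", date(2019, 6, 21)),
    (3, "login", date(2019, 1, 1)), (3, "jobs", date(2019, 1, 1)),
    (3, "logout", date(2019, 1, 1)),
    (4, "login", date(2019, 6, 21)), (4, "groups", date(2019, 6, 21)),
    (4, "logout", date(2019, 6, 21)),
    (5, "login", date(2019, 3, 1)), (5, "logout", date(2019, 3, 1)),
    (5, "login", date(2019, 6, 21)), (5, "logout", date(2019, 6, 21)),
]


def new_users_daily(rows, today, days=90):
    """Return sorted (login_date, user_count) pairs for first logins in the window."""
    start = today - timedelta(days=days)  # 2019-06-30 - 90 days = 2019-04-01
    # Earliest login date per user; duplicate rows do not affect the minimum.
    first_login = {}
    for user_id, activity, activity_date in rows:
        if activity == "login":
            prev = first_login.get(user_id)
            if prev is None or activity_date < prev:
                first_login[user_id] = activity_date
    # Count users whose first login lies in [start, today]; the Counter only
    # ever holds dates with a non-zero count.
    counts = Counter(d for d in first_login.values() if start <= d <= today)
    return sorted(counts.items())


today = date(2019, 6, 30)
print(today - timedelta(days=90))  # 2019-04-01
for login_date, user_count in new_users_daily(rows, today):
    print(login_date, user_count)  # prints "2019-05-01 1" then "2019-06-21 2"
```

User 5's earliest login (2019-03-01) falls before the window, so the later 2019-06-21 login never enters the count, matching the note in the problem.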
Approach 1: use min() to select each user's earliest login_date.
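with tmp as (
    -- earliest login date per user, kept only if it falls in the window
    -- [2019-04-01, 2019-06-30] (2019-04-01 is 90 days before 2019-06-30)
    select user_id, min(activity_date) as login_date
    from traffic
    where activity = 'login'
    group by user_id
    having min(activity_date) between '2019-04-01' and '2019-06-30'
)
-- one row per first-login date; dates with no new users never appear
select login_date, count(*) as user_count
from tmp
group by login_date
order by login_date;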
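One way to check Approach 1 locally is to run it against the example data in an in-memory SQLite database via Python's standard `sqlite3` module. This is a sketch under a few assumptions: SQLite has no ENUM, so `activity` is stored as TEXT, and dates are stored as ISO strings, which compare correctly with BETWEEN. The HAVING clause repeats `min(activity_date)` instead of the `login_date` alias, which keeps the query portable outside MySQL. The helper `run` is a hypothetical name.

```python
import sqlite3

# Example Traffic rows: (user_id, activity, activity_date as ISO text)
SAMPLE = [
    (1, "login", "2019-05-01"), (1, "homepage", "2019-05-01"), (1, "logout", "2019-05-01"),
    (2, "login", "2019-06-21"), (2, "logout", "2019-06-21"),
    (3, "login", "2019-01-01"), (3, "jobs", "2019-01-01"), (3, "logout", "2019-01-01"),
    (4, "login", "2019-06-21"), (4, "groups", "2019-06-21"), (4, "logout", "2019-06-21"),
    (5, "login", "2019-03-01"), (5, "logout", "2019-03-01"),
    (5, "login", "2019-06-21"), (5, "logout", "2019-06-21"),
]

# Approach 1: earliest login per user, filtered to the 90-day window.
QUERY = """
with tmp as (
    select user_id, min(activity_date) as login_date
    from traffic
    where activity = 'login'
    group by user_id
    having min(activity_date) between '2019-04-01' and '2019-06-30'
)
select login_date, count(*) as user_count
from tmp
group by login_date
order by login_date
"""


def run(query, rows):
    """Load rows into a fresh in-memory traffic table and return the query result."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table traffic (user_id integer, activity text, activity_date text)"
        )
        conn.executemany("insert into traffic values (?, ?, ?)", rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


print(run(QUERY, SAMPLE))  # [('2019-05-01', 1), ('2019-06-21', 2)]
```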
Approach 2: rank each user's logins with row_number(), then keep num = 1 to select each user's earliest login_date.
with tmp as (
    -- number each user's logins from earliest (num = 1) to latest
    select user_id, activity_date,
           row_number() over (partition by user_id order by activity_date) as num
    from traffic
    where activity = 'login'
)
-- keep each user's first login, then restrict it to the 90-day window
-- (2019-04-01 is 90 days before 2019-06-30); filtering after ranking means
-- logins outside the window never hide a first login inside it
select activity_date as login_date, count(*) as user_count
from tmp
where num = 1
  and activity_date between '2019-04-01' and '2019-06-30'
group by activity_date
order by activity_date;
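The row_number() approach can be exercised the same way. This sketch again uses Python's `sqlite3` with an in-memory table (TEXT for the ENUM column, ISO date strings) and assumes the bundled SQLite is 3.25 or newer, since window functions were added in that release. The query ranks all login rows first and applies the date window only to the num = 1 row. To show why that order matters, the data adds one hypothetical user 6 whose first login is 2019-06-21 and who logs in again after "today" (2019-07-05); that user is still counted on 2019-06-21.

```python
import sqlite3

# Example Traffic rows plus a hypothetical user 6 (last two rows).
SAMPLE = [
    (1, "login", "2019-05-01"), (1, "homepage", "2019-05-01"), (1, "logout", "2019-05-01"),
    (2, "login", "2019-06-21"), (2, "logout", "2019-06-21"),
    (3, "login", "2019-01-01"), (3, "jobs", "2019-01-01"), (3, "logout", "2019-01-01"),
    (4, "login", "2019-06-21"), (4, "groups", "2019-06-21"), (4, "logout", "2019-06-21"),
    (5, "login", "2019-03-01"), (5, "logout", "2019-03-01"),
    (5, "login", "2019-06-21"), (5, "logout", "2019-06-21"),
    (6, "login", "2019-06-21"), (6, "login", "2019-07-05"),  # hypothetical
]

# Approach 2: rank logins per user, keep num = 1, then apply the window.
# Requires SQLite >= 3.25 for row_number() over (...).
QUERY = """
with tmp as (
    select user_id, activity_date,
           row_number() over (partition by user_id order by activity_date) as num
    from traffic
    where activity = 'login'
)
select activity_date as login_date, count(*) as user_count
from tmp
where num = 1
  and activity_date between '2019-04-01' and '2019-06-30'
group by activity_date
order by activity_date
"""


def run(query, rows):
    """Load rows into a fresh in-memory traffic table and return the query result."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table traffic (user_id integer, activity text, activity_date text)"
        )
        conn.executemany("insert into traffic values (?, ?, ?)", rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


print(run(QUERY, SAMPLE))  # [('2019-05-01', 1), ('2019-06-21', 3)]
```

Dropping the two user-6 rows (`SAMPLE[:-2]`) gives back the problem's expected result, `[('2019-05-01', 1), ('2019-06-21', 2)]`.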