第一题
问:请统计2021年国庆7天期间在北京接单至少3次的司机的平均接单数和平均兼职收入(暂不考虑平台佣金,直接计算完成的订单费用总额),结果保留三位小数。
输出示例:
city | avg_order_num | avg_income |
---|---|---|
北京 | 3.500 | 121.000 |
with table1 as(
select driver_id, count(order_id) as order_num,sum(fare) as income
from tb_get_car_order a
join tb_get_car_record b on a.order_id = b.order_id
where city = "北京"
and date_format(order_time, "%Y%m%d") between "20211001" and "20211007"
group by driver_id
having count(order_id) >= 3
)
select
"北京" as city,
round(avg(order_num),3) as avg_order_num,
round(avg(income),3) as avg_income
from table1
第二题
问:找到2021年10月有过取消订单记录的司机,计算他们每人全部已完成的有评分订单的平均评分及总体平均评分,保留1位小数。先按driver_id升序输出,再输出总体情况。
输出示例:
driver_id | avg_grade |
---|---|
202 | 4.3 |
203 | 4.8 |
总体 | 4.6 |
select
ifnull(driver_id,"总体) as driver_id,
round(avg(grade),1) as avg_grade
from tb_get_car_order
where driver_id in (select driver_id
from tb_get_car_order
where isnull(fare)
and date_format(order_time, "%Y-%m") = "2021-10")
and not isnull(grade)
group by driver_id
with rollup
第三题
问:已知用户登陆表log_t, 包含登陆日期和登录用户ID,请查询出截止到当前日期累积登录用户数及登录用户列表
输入示例:log_t
log_date | user_id |
---|---|
2024-01-01 | a |
2024-01-02 | a |
2024-01-02 | b |
2024-01-03 | b |
2024-01-04 | c |
2024-01-05 | b |
2024-01-05 | c |
2024-01-05 | d |
2024-01-05 | e |
输出示例:
log_date | user_cnt | user_list |
---|---|---|
2024-01-01 | 1 | ["a"] |
2024-01-02 | 2 | ["a","b"] |
2024-01-03 | 2 | ["a","b"] |
2024-01-04 | 3 | ["a","b","c"] |
2024-01-05 | 5 | ["a","b","c","d","e"] |
with temp as(
select log_date,
collect_set(user_id) over(order by log_date rows between unbounded preceding and current row) as user_list
from log_t
)
select log_date,
size(user_list) as user_cnt,
user_list
from temp
第四题
问:输出’2023-01-01‘当天,每个司机的最大接单间隔(按照秒计算),要注意存在司机上一单未完成就提前接到下一单的情况。
输入示例:
订单开始表
driver_id | order_id | start_time |
---|---|---|
1 | 1000 | 1672534145 |
1 | 1004 | 1672535945 |
1 | 1008 | 1672537745 |
1 | 1010 | 1672538945 |
订单结束表
driver_id | order_id | close_time |
---|---|---|
1 | 1000 | 1672536145 |
1 | 1004 | 1672537705 |
1 | 1008 | 1672538925 |
1 | 1010 | 1672539345 |
输出
driver_id | timegap |
---|---|
1 | 40 |
WITH temp AS (
SELECT
t1.driver_id, t1.order_id, t1.start_time, t2.close_time,
unix_timestamp(LEAD(t1.start_time, 1, NULL) OVER (PARTITION BY t1.driver_id ORDER BY t1.start_time)) - unix_timestamp(t2.close_time) AS timediff
FROM t1
LEFT JOIN t2 ON t1.order_id = t2.order_id
)
SELECT
driver_id, MAX(timediff) AS max_timediff
FROM temp
GROUP BY driver_id;
第五题
问:2023年1月1日完成首单的新司机的首单后的首周完单留存率(首单后的第一至第七天有完单行为),单个司机平均的7日及30日订单总金额ARPU(ARPU从首单当日算起,平均计算包括位留存的司机)
输入:t1(分区DATE, yyyy-MM-dd)
字段 | d_id | order_id | order_status | order_amount |
---|---|---|---|---|
中文 | 司机ID | 订单ID | 订单状态 | 订单金额 |
类型 | int | int | int | float |
备注 | —— | —— | 完单1;取消0 | 若取消则为空 |
字段 | d_id | create_date | first_order_date |
---|---|---|---|
中文 | 司机ID | 司机注册日期 | 司机首单日期 |
类型 | int | int | string(yyyy-MM-dd) |
备注 | —— | —— | 司机无首单则为空 |
输出:
Rate | ARPU_7d | ARPU_30d |
---|---|---|
0.6 | 20 | 35 |
WITH temp AS (
SELECT a.d_id, a.order_status, a.order_amount, a.DATE, b.first_order_date,
DATEDIFF(a.DATE, b.first_order_date) AS DATEGAP
FROM
(SELECT * FROM t1 WHERE DATE BETWEEN '2023-01-01' AND '2023-02-28') a
LEFT JOIN
(SELECT * FROM t2 WHERE first_order_date = '2023-01-01') b
ON a.d_id = b.d_id
),
retention AS (
SELECT
d_id,
COUNT(DISTINCT CASE WHEN DATEGAP BETWEEN 1 AND 7 AND order_status = 1 THEN d_id ELSE NULL END) AS is_rnt,
SUM(CASE WHEN DATEGAP BETWEEN 0 AND 6 THEN order_amount ELSE 0 END) AS arpu_7d,
SUM(CASE WHEN DATEGAP BETWEEN 0 AND 29 THEN order_amount ELSE 0 END) AS arpu_30d
FROM temp
GROUP BY d_id
)
SELECT
SUM(is_rnt) / COUNT(DISTINCT d_id) AS rate,
SUM(arpu_7d) / COUNT(DISTINCT d_id) AS arpu_7d_avg,
SUM(arpu_30d) / COUNT(DISTINCT d_id) AS arpu_30d_avg
FROM
retention;