order订单信息表 里记录了巴西乘客使用打车软件的信息,包括订单呼叫、应答、取消、完单时间。
-order_id:订单ID,呼叫订单识别号
-passenger_id:乘客ID,乘客识别号
-call_time:呼叫时间,乘客从应用上发出需要用车的请求的时间点(北京时间)
-grab_time:应答时间,司机点击接单的时间点(北京时间)
-cancel_time:取消时间,司机或乘客取消订单的时间(北京时间)
-finish_time:完单时间,司机点击到达目的地的时间点(北京时间)
注意:
(1)表中的时间是北京时间,巴西比中国慢11小时。
(2)应答时间列的数据值如果是“1970”年,表示该订单没有司机应答,属于无效订单。
指标含义:
应答率:呼叫订单被应答的比例
完单率:呼叫订单被完成订单的比例
呼叫应答时间:被应答订单从呼叫到被应答平均时长
1、创建表
CREATE TABLE IF NOT EXISTS login2
CREATE TABLE IF NOT EXISTS order2(
order_id INT,
passenger_id VARCHAR(10),
call_time DATETIME,
grab_time DATETIME,
cancel_time DATETIME,
finish_time DATETIME
);
2、插入数据
INSERT INTO order2 VALUES(1,'001','2020/3/20 18:08:00','2020/3/20 18:09:00','1971/1/01 00:00:00','2020/3/20 19:08:00'),
(2,'002','2020/3/20 18:08:00','2020/3/20 18:10:00','1971/1/01 00:00:00','2020/3/20 20:08:00'),
(3,'002','2020/3/20 18:08:00','2020/3/20 18:08:00','2020/3/20 18:09:00','1971/1/01 00:00:00'),
(4,'003','2020/3/24 18:08:00','1971/1/01 00:00:00','2020/3/24 18:09:00','1971/1/01 00:00:00'),
(5,'004','2020/3/24 18:08:00','1971/1/01 00:00:00','1971/1/01 00:00:00','1971/1/01 00:00:00'),
(6,'005','2020/3/27 18:08:00','2020/3/27 18:09:00','1971/1/01 00:00:00','2020/3/27 18:40:00');
3、逻辑SQL
以下所有问题都是基于巴西时间来的,要转换为巴西时间
第一种:
要将表中所有时间减11个小时DATE_SUB(datetime,INTERVAL 11 HOUR)
第二种:
时区转换,北京为东八区,GMT + 08:00。CONVERT_TZ(datetime,'+08:00','-03:00')
SELECT order_id,
passenger_id,
DATE_SUB(call_time,INTERVAL 11 HOUR) AS call_time,
DATE_SUB(grab_time,INTERVAL 11 HOUR) AS grab_time,
DATE_SUB(cancel_time,INTERVAL 11 HOUR) AS cancel_time,
DATE_SUB(finish_time,INTERVAL 11 HOUR) AS finish_time
FROM order2;
或
SELECT order_id,
passenger_id,
CONVERT_TZ(call_time,'+08:00','-03:00') AS call_time,
CONVERT_TZ(grab_time,'+08:00','-03:00') AS grab_time,
CONVERT_TZ(cancel_time,'+08:00','-03:00') AS cancel_time,
CONVERT_TZ(finish_time,'+08:00','-03:00') AS finish_time
FROM order2;
(1)订单的应答率、完单率
应答率:呼叫订单被应答的比例,即COUNT YEAR(grab_time)<>1970 的订单数/COUNT所有呼叫的订单数
完单率:呼叫订单被完成订单的比例,即COUNT YEAR(finish_time)<>1970 的订单数/COUNT所有呼叫的订单数
mysql> WITH t1 AS (SELECT order_id,
-> passenger_id,
-> DATE_SUB(call_time,INTERVAL 11 HOUR) AS call_time,
-> DATE_SUB(grab_time,INTERVAL 11 HOUR) AS grab_time,
-> DATE_SUB(cancel_time,INTERVAL 11 HOUR) AS cancel_time,
-> DATE_SUB(finish_time,INTERVAL 11 HOUR) AS finish_time
-> FROM order2)
-> SELECT COUNT(CASE WHEN YEAR(grab_time)=1970 THEN NULL ELSE grab_time END)/COUNT(*) AS '应答率',
-> COUNT(CASE WHEN YEAR(finish_time)=1970 THEN NULL ELSE grab_time END)/COUNT(*) AS '完单率'
-> FROM t1;
+-----------+-----------+
| 应答率 | 完单率 |
+-----------+-----------+
| 0.6667 | 0.5000 |
+-----------+-----------+
1 row in set (0.01 sec)
对显示有要求的可用ROUND()、CONCAT()函数转
(2)呼叫应答时间有多长
TIMESTAMPDIFF(unit,datetime_expr1,datetime_expr2)
unit可以为year,month,day,hour,minute,second等
区别于DATEDIFF(date1,date2)
返回相差的天数
mysql> WITH t1 AS (SELECT order_id,
-> passenger_id,
-> CONVERT_TZ(call_time,'+08:00','-03:00') AS call_time,
-> CONVERT_TZ(grab_time,'+08:00','-03:00') AS grab_time,
-> CONVERT_TZ(cancel_time,'+08:00','-03:00') AS cancel_time,
-> CONVERT_TZ(finish_time,'+08:00','-03:00') AS finish_time
-> FROM order2)
-> SELECT SUM(TIMESTAMPDIFF(minute,call_time,grab_time))/COUNT(*) AS '平均应答时间'
-> FROM t1
-> WHERE YEAR(grab_time)<>1970;
+--------------------+
| 平均应答时间 |
+--------------------+
| 1.0000 |
+--------------------+
1 row in set (0.00 sec)
(3)从这一周的数据来看,呼叫量最高的是哪一个小时(当地时间)?
呼叫量最少的是哪一个小时(当地时间)?
排序问题一般有几种情况:
order by
rank 三种窗口函数
where 子查询,eg比小明分数高的不超过2个
第一种:如果仅理解这周的每个小时,即不要考虑每天,只需要考虑本周的每个小时
考虑到呼叫量重复的情况,采用DENSE_RANK() OVER(),有并列名次,连续排序
根据题目要满足的条件:
①本周,因为为测试数据,根据数据内容指定了日期2020-3-27,我理解本周为当天开始往前数7天,故用DATEDIFF,如果理解为某一周,BETWEEN指定日期范围就行
②每小时呼叫量,按HOUR(call_time)分组,求COUNT呼叫量
③呼叫量最高,DENSE_RANK() 按降序排序,取排序为1的
同理,呼叫量最低,DENSE_RANK() 按升序排序,取排序为1的
mysql> WITH t1 AS (SELECT order_id,
-> passenger_id,
-> CONVERT_TZ(call_time,'+08:00','-03:00') AS call_time,
-> CONVERT_TZ(grab_time,'+08:00','-03:00') AS grab_time,
-> CONVERT_TZ(cancel_time,'+08:00','-03:00') AS cancel_time,
-> CONVERT_TZ(finish_time,'+08:00','-03:00') AS finish_time
-> FROM order2)
-> SELECT hour AS '呼叫量最高的小时'
-> FROM
-> (SELECT HOUR(call_time) AS hour,
COUNT(order_id) AS order_co, # ②HOUR分组求呼叫量
-> DENSE_RANK() OVER(ORDER BY COUNT(order_id) DESC) AS rk # ③排序,取排序为1
-> FROM t1
-> WHERE DATEDIFF('2020-3-27',call_time)<=7 # ①本周
-> GROUP BY HOUR(call_time))t2 # ②HOUR分组求呼叫量
-> WHERE rk=1; # ③排序,取排序为1
+--------------------------+
| 呼叫量最高的小时 |
+--------------------------+
| 7 |
+--------------------------+
1 row in set (0.00 sec)
**第二种:**如果要考虑本周每个小时的平均呼叫量最高/最低的小时,其实,想来想去纯属脑洞
根据题目要满足的条件
①本周,同上
②每小时呼叫量,如果求出每个小时总呼叫量/7,同上面并没有什么不同,就是除了7;
如果某个小时没有呼叫不被平均,比如周一至周三的下午6点分别有14个呼叫量,平均为14,而不是6
③排序,同上,DENSE_RANK()按平均值排序
纯属脑洞,不编了,就是如下
mysql> WITH t1 AS (SELECT order_id,
-> passenger_id,
-> CONVERT_TZ(call_time,'+08:00','-03:00') AS call_time,
-> CONVERT_TZ(grab_time,'+08:00','-03:00') AS grab_time,
-> CONVERT_TZ(cancel_time,'+08:00','-03:00') AS cancel_time,
-> CONVERT_TZ(finish_time,'+08:00','-03:00') AS finish_time
-> FROM order2)
-> SELECT hour AS '呼叫量最多的小时'
-> FROM
-> (SELECT hour,AVG(order_co) AS avg,DENSE_RANK() OVER(ORDER BY AVG(order_co) DESC ) AS rk
-> FROM
-> (SELECT DATE(call_time) AS date,HOUR(call_time) AS hour,COUNT(order_id) AS order_co
-> FROM t1
-> WHERE DATEDIFF('2020-3-27',call_time)<=7
-> GROUP BY DATE(call_time),HOUR(call_time))t2
-> GROUP BY hour)t3
-> WHERE rk=1;
+--------------------------+
| 呼叫量最多的小时 |
+--------------------------+
| 7 |
+--------------------------+
1 row in set (0.00 sec)
(3)呼叫订单第二天继续呼叫的用户比例有多少
mysql> WITH t1 AS (SELECT order_id,
-> passenger_id,
-> CONVERT_TZ(call_time,'+08:00','-03:00') AS call_time,
-> CONVERT_TZ(grab_time,'+08:00','-03:00') AS grab_time,
-> CONVERT_TZ(cancel_time,'+08:00','-03:00') AS cancel_time,
-> CONVERT_TZ(finish_time,'+08:00','-03:00') AS finish_time
-> FROM order2)
-> SELECT DATE(t1.call_time),COUNT(DISTINCT CASE WHEN DATEDIFF(t1.call_time,t2.call_time)=1 THEN t1.passenger_id ELSE NULL END)/COUNT(DISTINCT t1.passenger_id) AS '次日留存'
-> FROM t1 t1 LEFT JOIN t1 t2 ON t1.passenger_id = t2.passenger_id
-> GROUP BY DATE(t1.call_time);
+--------------------+--------------+
| DATE(t1.call_time) | 次日留存 |
+--------------------+--------------+
| 2020-03-20 | 0.0000 |
| 2020-03-24 | 0.0000 |
| 2020-03-27 | 0.0000 |
+--------------------+--------------+
3 rows in set (0.00 sec)
注:题目来自于《猴子数据分析》公众号