SQL | 部分时间函数 | 打车问题

order订单信息表 里记录了巴西乘客使用打车软件的信息,包括订单呼叫、应答、取消、完单时间。

-order_id:订单ID,呼叫订单识别号
-passenger_id:乘客ID,乘客识别号
-call_time:呼叫时间,乘客从应用上发出需要用车的请求的时间点(北京时间)
-grab_time:应答时间,司机点击接单的时间点(北京时间)
-cancel_time:取消时间,司机或乘客取消订单的时间(北京时间)
-finish_time:完单时间,司机点击到达目的地的时间点(北京时间)

注意:
(1)表中的时间是北京时间,巴西比中国慢11小时。
(2)应答时间列的数据值如果是“1970”年,表示该订单没有司机应答,属于无效订单。
指标含义:
应答率:呼叫订单被应答的比例
完单率:呼叫订单被完成订单的比例
呼叫应答时间:被应答订单从呼叫到被应答平均时长
1、创建表

CREATE TABLE IF NOT EXISTS login2
CREATE TABLE IF NOT EXISTS order2(
order_id INT,
passenger_id VARCHAR(10),
call_time DATETIME,
grab_time DATETIME,
cancel_time DATETIME,
finish_time DATETIME
);

2、插入数据

INSERT INTO order2 VALUES(1,'001','2020/3/20 18:08:00','2020/3/20 18:09:00','1971/1/01 00:00:00','2020/3/20 19:08:00'),
						 (2,'002','2020/3/20 18:08:00','2020/3/20 18:10:00','1971/1/01 00:00:00','2020/3/20 20:08:00'),
						 (3,'002','2020/3/20 18:08:00','2020/3/20 18:08:00','2020/3/20 18:09:00','1971/1/01 00:00:00'),
						 (4,'003','2020/3/24 18:08:00','1971/1/01 00:00:00','2020/3/24 18:09:00','1971/1/01 00:00:00'),
						 (5,'004','2020/3/24 18:08:00','1971/1/01 00:00:00','1971/1/01 00:00:00','1971/1/01 00:00:00'),
						 (6,'005','2020/3/27 18:08:00','2020/3/27 18:09:00','1971/1/01 00:00:00','2020/3/27 18:40:00');

3、逻辑SQL
以下所有问题都是基于巴西时间来的,要转换为巴西时间
第一种:
要将表中所有时间减11个小时DATE_SUB(datetime,INTERVAL 11 HOUR)

第二种:
时区转换,北京为东八区,GMT + 08:00。CONVERT_TZ(datetime,'+08:00','-03:00')

SELECT  order_id,
	    passenger_id,
	    DATE_SUB(call_time,INTERVAL 11 HOUR) AS call_time,
	    DATE_SUB(grab_time,INTERVAL 11 HOUR) AS grab_time,
	    DATE_SUB(cancel_time,INTERVAL 11 HOUR) AS cancel_time,
	    DATE_SUB(finish_time,INTERVAL 11 HOUR) AS finish_time
FROM order2;

SELECT  order_id,
	    passenger_id,
		CONVERT_TZ(call_time,'+08:00','-03:00') AS call_time,
		CONVERT_TZ(grab_time,'+08:00','-03:00') AS grab_time,
		CONVERT_TZ(cancel_time,'+08:00','-03:00') AS cancel_time,
		CONVERT_TZ(finish_time,'+08:00','-03:00') AS finish_time
 FROM order2;

(1)订单的应答率、完单率
应答率:呼叫订单被应答的比例,即COUNT YEAR(grab_time)<>1970 的订单数/COUNT所有呼叫的订单数
完单率:呼叫订单被完成订单的比例,即COUNT YEAR(finish_time)<>1970 的订单数/COUNT所有呼叫的订单数

mysql> WITH t1 AS (SELECT order_id,
	->   				passenger_id,
    ->    				DATE_SUB(call_time,INTERVAL 11 HOUR) AS call_time,
    ->    				DATE_SUB(grab_time,INTERVAL 11 HOUR) AS grab_time,
    ->    				DATE_SUB(cancel_time,INTERVAL 11 HOUR) AS cancel_time,
    ->    				DATE_SUB(finish_time,INTERVAL 11 HOUR) AS finish_time
    ->    		   FROM order2)
    -> SELECT COUNT(CASE WHEN YEAR(grab_time)=1970 THEN NULL ELSE grab_time END)/COUNT(*) AS '应答率',
    ->        COUNT(CASE WHEN YEAR(finish_time)=1970 THEN NULL ELSE grab_time END)/COUNT(*) AS '完单率'
    -> FROM t1;
+-----------+-----------+
| 应答率    | 完单率    |
+-----------+-----------+
|    0.6667 |    0.5000 |
+-----------+-----------+
1 row in set (0.01 sec)

对显示有要求的可用ROUND()、CONCAT()函数转

(2)呼叫应答时间有多长
TIMESTAMPDIFF(unit,datetime_expr1,datetime_expr2) unit可以为year,month,day,hour,minute,second等

区别于DATEDIFF(date1,date2) 返回相差的天数

mysql> WITH t1 AS (SELECT   order_id,
    ->       				passenger_id,
    ->       				CONVERT_TZ(call_time,'+08:00','-03:00') AS call_time,
    ->       				CONVERT_TZ(grab_time,'+08:00','-03:00') AS grab_time,
    ->       				CONVERT_TZ(cancel_time,'+08:00','-03:00') AS cancel_time,
    ->       				CONVERT_TZ(finish_time,'+08:00','-03:00') AS finish_time
    ->      		FROM order2)
    -> SELECT SUM(TIMESTAMPDIFF(minute,call_time,grab_time))/COUNT(*) AS '平均应答时间'
    -> FROM  t1
    -> WHERE YEAR(grab_time)<>1970;
+--------------------+
| 平均应答时间       |
+--------------------+
|             1.0000 |
+--------------------+
1 row in set (0.00 sec)

(3)从这一周的数据来看,呼叫量最高的是哪一个小时(当地时间)?
呼叫量最少的是哪一个小时(当地时间)?

排序问题一般有几种情况:
order by
rank 三种窗口函数
where 子查询,eg比小明分数高的不超过2个

第一种:如果仅理解这周的每个小时,即不要考虑每天,只需要考虑本周的每个小时
考虑到呼叫量重复的情况,采用DENSE_RANK() OVER(),有并列名次,连续排序
根据题目要满足的条件:
①本周,因为为测试数据,根据数据内容指定了日期2020-3-27,我理解本周为当天开始往前数7天,故用DATEDIFF,如果理解为某一周,BETWEEN指定日期范围就行
②每小时呼叫量,按HOUR(call_time)分组,求COUNT呼叫量
③呼叫量最高,DENSE_RANK() 按降序排序,取排序为1的
同理,呼叫量最低,DENSE_RANK() 按升序排序,取排序为1的

mysql> WITH t1 AS (SELECT	 order_id,
    ->       				passenger_id,
    ->       				CONVERT_TZ(call_time,'+08:00','-03:00') AS call_time,
    ->       				CONVERT_TZ(grab_time,'+08:00','-03:00') AS grab_time,
    ->       				CONVERT_TZ(cancel_time,'+08:00','-03:00') AS cancel_time,
    ->       				CONVERT_TZ(finish_time,'+08:00','-03:00') AS finish_time
    ->      		FROM order2)
    -> SELECT hour AS '呼叫量最高的小时'
    -> FROM
    -> 		(SELECT HOUR(call_time) AS hour,
    			    COUNT(order_id) AS order_co, # ②HOUR分组求呼叫量
    ->    		    DENSE_RANK() OVER(ORDER BY COUNT(order_id) DESC) AS rk  # ③排序,取排序为1
    -> 		FROM t1
    -> 		WHERE DATEDIFF('2020-3-27',call_time)<=7  # ①本周
    -> 		GROUP BY HOUR(call_time))t2 # ②HOUR分组求呼叫量
    -> WHERE rk=1; # ③排序,取排序为1
+--------------------------+
| 呼叫量最高的小时         |
+--------------------------+
|                        7 |
+--------------------------+
1 row in set (0.00 sec)

**第二种:**如果要考虑本周每个小时的平均呼叫量最高/最低的小时,其实,想来想去纯属脑洞
根据题目要满足的条件
①本周,同上
②每小时呼叫量,如果求出每个小时总呼叫量/7,同上面并没有什么不同,就是除了7;
如果某个小时没有呼叫不被平均,比如周一至周三的下午6点分别有14个呼叫量,平均为14,而不是6
③排序,同上,DENSE_RANK()按平均值排序
纯属脑洞,不编了,就是如下

mysql> WITH t1 AS (SELECT   order_id,
    ->       				passenger_id,
    ->       				CONVERT_TZ(call_time,'+08:00','-03:00') AS call_time,
    ->       				CONVERT_TZ(grab_time,'+08:00','-03:00') AS grab_time,
    ->       				CONVERT_TZ(cancel_time,'+08:00','-03:00') AS cancel_time,
    ->       				CONVERT_TZ(finish_time,'+08:00','-03:00') AS finish_time
    ->      		FROM order2)
    -> SELECT hour AS '呼叫量最多的小时'
    -> FROM
    -> 		(SELECT hour,AVG(order_co) AS avg,DENSE_RANK() OVER(ORDER BY AVG(order_co) DESC ) AS rk
    -> 		FROM
    -> 			(SELECT DATE(call_time) AS date,HOUR(call_time) AS hour,COUNT(order_id) AS order_co
    -> 			FROM t1
    -> 			WHERE DATEDIFF('2020-3-27',call_time)<=7
    -> 			GROUP BY DATE(call_time),HOUR(call_time))t2
    -> 		GROUP BY hour)t3
    -> WHERE rk=1;
+--------------------------+
| 呼叫量最多的小时         |
+--------------------------+
|                        7 |
+--------------------------+
1 row in set (0.00 sec)

(3)呼叫订单第二天继续呼叫的用户比例有多少

mysql> WITH t1 AS (SELECT   order_id,
    ->       				passenger_id,
    ->       				CONVERT_TZ(call_time,'+08:00','-03:00') AS call_time,
    ->       				CONVERT_TZ(grab_time,'+08:00','-03:00') AS grab_time,
    ->       				CONVERT_TZ(cancel_time,'+08:00','-03:00') AS cancel_time,
    ->       				CONVERT_TZ(finish_time,'+08:00','-03:00') AS finish_time
    ->      		FROM order2)
    -> SELECT DATE(t1.call_time),COUNT(DISTINCT CASE WHEN DATEDIFF(t1.call_time,t2.call_time)=1 THEN t1.passenger_id ELSE NULL END)/COUNT(DISTINCT t1.passenger_id) AS '次日留存'
    -> FROM t1 t1 LEFT JOIN t1 t2 ON t1.passenger_id = t2.passenger_id
    -> GROUP BY DATE(t1.call_time);
+--------------------+--------------+
| DATE(t1.call_time) | 次日留存     |
+--------------------+--------------+
| 2020-03-20         |       0.0000 |
| 2020-03-24         |       0.0000 |
| 2020-03-27         |       0.0000 |
+--------------------+--------------+
3 rows in set (0.00 sec)

注:题目来自于《猴子数据分析》公众号

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值