题目
现有一个交易表Transactions,内有id,country,state(列类型为 ["approved", "declined"]),amount金额,trans_date交易日期。编写一个 sql 查询来查找每个月和每个国家/地区的事务数及其总金额、已批准的事务数及其总金额。
以 任意顺序 返回结果表。
数据准备
Create table If Not Exists Transactions (id int, country varchar(4), state enum('approved', 'declined'), amount int, trans_date date);
Truncate table Transactions;
insert into Transactions (id, country, state, amount, trans_date) values ('121', 'US', 'approved', '1000', '2018-12-18');
insert into Transactions (id, country, state, amount, trans_date) values ('122', 'US', 'declined', '2000', '2018-12-19');
insert into Transactions (id, country, state, amount, trans_date) values ('123', 'US', 'approved', '2000', '2019-01-01');
insert into Transactions (id, country, state, amount, trans_date) values ('124', null, 'approved', '2000', '2019-01-07');
解答注意
这道题总体难度不大,共有两个解题思路
- 第一个是最简洁的
- 第二个稍微麻烦点,主要是因为有个地方(当left join的两个表里关联字段有null值,要怎么处理正确匹配上两个表的null)需要注意下,所以也放上来了
- 思路简单,有几点需要注意,分别放在两个方法下讲讲
方法一 :直接法(在统计与筛选同时走)
COUNT()
函数- 当使用
COUNT
函数统计某个列的非NULL值时,如果该列中包含NULL值,这些NULL值不会被计入总数。例如,如果有一个列,其中有5个非NULL值和3个NULL值,那么COUNT(列名)
的结果将是5。 - 当使用
COUNT
函数统计全表的行数时COUNT(*)
不关注列值是否为NULL
,它只计算表中的行数,只要行存在,就会被计数,即使这一行所有列的值都是NULL
。COUNT(1)
:这里的1是一个常量值,COUNT(1)
会统计结果集中的行数,与COUNT(*)
的效果相同。
COUNT(列名)
是按列统计的,但COUNT(*)
和COUNT(1)
是按行统计的,区别在于COUNT(列名)
忽略了NULL
值。
- 当使用
SELECT DATE_FORMAT(trans_date, '%Y-%m') AS month, country
, COUNT(*) AS trans_count
-- 用if函数做筛选,count(列名)不会考虑空值
, COUNT(if(state = 'approved', 1, NULL)) AS approved_count
, SUM(amount) AS trans_total_amount
-- 用if函数做筛选,如果状态为通过则返回amount值,如果不是通过,则返回0.最后做加总
, SUM(if(state = 'approved', amount, 0)) AS approved_total_amount
FROM Transactions
GROUP BY month, country
方法二:间接法(先筛选后合并统计)
这个方法思路也很清晰(写这个方法主要是以后需要注意到在left join的时候null值的处理,但是针对这个题最优解法还是方法一,方法二可行,在大数据集上可能带来性能问题,并且需要注意多个 NULL
值匹配时的逻辑问题。
-
先用两个公用表表达式得出来总计表 t1 和通过表 t2
WITH t1 AS ( SELECT LEFT(trans_date, 7) AS month, country, COUNT(*) AS trans_count, SUM(amount) AS trans_total_amount FROM Transactions GROUP BY LEFT(trans_date, 7), country ), t2 AS ( SELECT LEFT(trans_date, 7) AS month, country, COUNT(*) AS approved_count, SUM(amount) AS approved_total_amount FROM Transactions WHERE state = 'approved' GROUP BY LEFT(trans_date, 7), country )
-
再用Left join 做关联(这个地方就是易错点)
-
错误示例,这样得出来的结果很明显是有问题的,因为2019-01,country值为null的应该有一条通过的计数,但是结果显示为0
SELECT t1.month, t1.country, IFNULL(t1.trans_count, 0) AS trans_count, IFNULL(t2.approved_count, 0) AS approved_count, IFNULL(t1.trans_total_amount, 0) AS trans_total_amount, IFNULL(t2.approved_total_amount, 0) AS approved_total_amount FROM t1 LEFT JOIN t2 ON t1.month = t2.month AND t1.country = t2.country;
-
问题出在查询中对于
country
字段的处理上。在数据中,有一行的country
字段是NULL
。在SQL查询中,如果尝试将t1
和t2
通过month
和country
进行连接,但是当country
为NULL
时,这种连接会失败,因为NULL
与任何值(包括另一个NULL
)的比较都不会返回true
。 -
那么如何修正呢?
修正方法1
可以在连接条件中使用
OR
来特别处理country
为NULL
的情况。这样,即使country
是NULL
,只要month
相同,这两行也会被认为是匹配的。WITH t1 AS ( SELECT LEFT(trans_date, 7) AS month, country, COUNT(*) AS trans_count, SUM(amount) AS trans_total_amount FROM Transactions GROUP BY LEFT(trans_date, 7), country ), t2 AS ( SELECT LEFT(trans_date, 7) AS month, country, COUNT(*) AS approved_count, SUM(amount) AS approved_total_amount FROM Transactions WHERE state = 'approved' GROUP BY LEFT(trans_date, 7), country ) SELECT t1.month, t1.country, IFNULL(trans_count, 0) AS trans_count, IFNULL(approved_count, 0) AS approved_count, IFNULL(trans_total_amount, 0) AS trans_total_amount, IFNULL(approved_total_amount, 0) AS approved_total_amount FROM t1 LEFT JOIN t2 ON t1.month = t2.month AND (t1.country = t2.country OR (t1.country IS NULL AND t2.country IS NULL))
-
修正方法2
要解决这个问题,可以考虑在
GROUP BY
时对NULL
值进行处理,比如使用COALESCE()
函数将NULL
转换为一个默认值(如'UNKNOWN'
或其他你认为合适的值)。下面是修改后的查询: -
WITH t1 AS ( SELECT LEFT(trans_date, 7) AS month, COALESCE(country, 'UNKNOWN') AS country, -- 处理 NULL 值 COUNT(*) AS trans_count, SUM(amount) AS trans_total_amount FROM Transactions GROUP BY LEFT(trans_date, 7), COALESCE(country, 'UNKNOWN') ), t2 AS ( SELECT LEFT(trans_date, 7) AS month, COALESCE(country, 'UNKNOWN') AS country, -- 处理 NULL 值 COUNT(*) AS approved_count, SUM(amount) AS approved_total_amount FROM Transactions WHERE state = 'approved' GROUP BY LEFT(trans_date, 7), COALESCE(country, 'UNKNOWN') ) SELECT t1.month, t1.country, IFNULL(t1.trans_count, 0) AS trans_count, IFNULL(t2.approved_count, 0) AS approved_count, IFNULL(t1.trans_total_amount, 0) AS trans_total_amount, IFNULL(t2.approved_total_amount, 0) AS approved_total_amount FROM t1 LEFT JOIN t2 ON t1.month = t2.month AND t1.country = t2.country;
-