Leetcode 1193 每月交易(探究当有关联字段有NULL值如何做左右关联

题目

现有一个交易表Transactions,内有id,country,state(列类型为 ["approved", "declined"]),amount金额,trans_date交易日期。编写一个 sql 查询来查找每个月和每个国家/地区的事务数及其总金额、已批准的事务数及其总金额。

以 任意顺序 返回结果表。

数据准备

Create table If Not Exists Transactions (id int, country varchar(4), state enum('approved', 'declined'), amount int, trans_date date);
Truncate table Transactions;
insert into Transactions (id, country, state, amount, trans_date) values ('121', 'US', 'approved', '1000', '2018-12-18');
insert into Transactions (id, country, state, amount, trans_date) values ('122', 'US', 'declined', '2000', '2018-12-19');
insert into Transactions (id, country, state, amount, trans_date) values ('123', 'US', 'approved', '2000', '2019-01-01');
insert into Transactions (id, country, state, amount, trans_date) values ('124', null, 'approved', '2000', '2019-01-07');

解答注意

这道题总体难度不大,共有两个解题思路

  • 第一个是最简洁的
  • 第二个稍微麻烦点,主要是因为有个地方(当left join的两个表里关联字段有null值,要怎么处理正确匹配上两个表的null)需要注意下,所以也放上来了
  • 思路简单,有几点需要注意,分别放在两个方法下讲讲

方法一 :直接法(在统计与筛选同时走)

  • COUNT()函数
    • 当使用COUNT函数统计某个列的非NULL值时,如果该列中包含NULL值,这些NULL值不会被计入总数。例如,如果有一个列,其中有5个非NULL值和3个NULL值,那么COUNT(列名)的结果将是5。
    • 当使用COUNT函数统计全表的行数时
      • COUNT(*) 不关注列值是否为 NULL,它只计算表中的行数,只要行存在,就会被计数,即使这一行所有列的值都是 NULL
      • COUNT(1):这里的1是一个常量值,COUNT(1)会统计结果集中的行数,与COUNT(*)的效果相同。
    • COUNT(列名) 是按列统计的,但 COUNT(*)COUNT(1) 是按行统计的,区别在于 COUNT(列名) 忽略了 NULL

 

SELECT DATE_FORMAT(trans_date, '%Y-%m') AS month, country
	, COUNT(*) AS trans_count
	-- 用if函数做筛选,count(列名)不会考虑空值
	, COUNT(if(state = 'approved', 1, NULL)) AS approved_count
	, SUM(amount) AS trans_total_amount
	-- 用if函数做筛选,如果状态为通过则返回amount值,如果不是通过,则返回0.最后做加总
	, SUM(if(state = 'approved', amount, 0)) AS approved_total_amount
FROM Transactions
GROUP BY month, country

方法二:间接法(先筛选后合并统计)

这个方法思路也很清晰(写这个方法主要是以后需要注意到在left join的时候null值的处理,但是针对这个题最优解法还是方法一,方法二可行,在大数据集上可能带来性能问题,并且需要注意多个 NULL 值匹配时的逻辑问题。

  • 先用两个公用表表达式得出来总计表 t1 和通过表 t2

    WITH t1 AS (
        SELECT 
            LEFT(trans_date, 7) AS month,
            country,
            COUNT(*) AS trans_count,
            SUM(amount) AS trans_total_amount
        FROM Transactions
        GROUP BY LEFT(trans_date, 7), country
    ),
    t2 AS (
        SELECT 
            LEFT(trans_date, 7) AS month,
            country,
            COUNT(*) AS approved_count,
            SUM(amount) AS approved_total_amount
        FROM Transactions
        WHERE state = 'approved'
        GROUP BY LEFT(trans_date, 7), country
    )
    
  • 再用Left join 做关联(这个地方就是易错点)

    • 错误示例,这样得出来的结果很明显是有问题的,因为2019-01,country值为null的应该有一条通过的计数,但是结果显示为0

      	SELECT 
      		t1.month,
      		t1.country,
      		IFNULL(t1.trans_count, 0) AS trans_count,
      		IFNULL(t2.approved_count, 0) AS approved_count,
      		IFNULL(t1.trans_total_amount, 0) AS trans_total_amount,
      		IFNULL(t2.approved_total_amount, 0) AS approved_total_amount
      	FROM t1
      	LEFT JOIN t2 
      		ON t1.month = t2.month 
      		AND t1.country = t2.country;
      
    • 问题出在查询中对于country字段的处理上。在数据中,有一行的country字段是NULL。在SQL查询中,如果尝试将t1t2通过monthcountry进行连接,但是当countryNULL时,这种连接会失败,因为NULL与任何值(包括另一个NULL)的比较都不会返回true

    • 那么如何修正呢?

      修正方法1

      可以在连接条件中使用OR来特别处理countryNULL的情况。这样,即使countryNULL,只要month相同,这两行也会被认为是匹配的。

      WITH t1 AS (
          SELECT 
              LEFT(trans_date, 7) AS month,
              country,
              COUNT(*) AS trans_count,
              SUM(amount) AS trans_total_amount
          FROM Transactions
          GROUP BY LEFT(trans_date, 7), country
      ),
      t2 AS (
          SELECT 
              LEFT(trans_date, 7) AS month,
              country,
              COUNT(*) AS approved_count,
              SUM(amount) AS approved_total_amount
          FROM Transactions
          WHERE state = 'approved'
          GROUP BY LEFT(trans_date, 7), country
      )
      SELECT 
          t1.month,
          t1.country,
          IFNULL(trans_count, 0) AS trans_count,
          IFNULL(approved_count, 0) AS approved_count,
          IFNULL(trans_total_amount, 0) AS trans_total_amount,
          IFNULL(approved_total_amount, 0) AS approved_total_amount
      FROM 
          t1
      LEFT JOIN t2 
          ON t1.month = t2.month 
          AND (t1.country = t2.country OR (t1.country IS NULL AND t2.country IS NULL))
      

    • 修正方法2

      要解决这个问题,可以考虑在 GROUP BY 时对 NULL 值进行处理,比如使用 COALESCE() 函数将 NULL 转换为一个默认值(如 'UNKNOWN' 或其他你认为合适的值)。下面是修改后的查询:

    • WITH t1 AS (
          SELECT 
              LEFT(trans_date, 7) AS month,
              COALESCE(country, 'UNKNOWN') AS country,  -- 处理 NULL 值
              COUNT(*) AS trans_count,
              SUM(amount) AS trans_total_amount
          FROM Transactions
          GROUP BY LEFT(trans_date, 7), COALESCE(country, 'UNKNOWN')
      ),
      t2 AS (
          SELECT 
              LEFT(trans_date, 7) AS month,
              COALESCE(country, 'UNKNOWN') AS country,  -- 处理 NULL 值
              COUNT(*) AS approved_count,
              SUM(amount) AS approved_total_amount
          FROM Transactions
          WHERE state = 'approved'
          GROUP BY LEFT(trans_date, 7), COALESCE(country, 'UNKNOWN')
      )
      SELECT 
          t1.month,
          t1.country,
          IFNULL(t1.trans_count, 0) AS trans_count,
          IFNULL(t2.approved_count, 0) AS approved_count,
          IFNULL(t1.trans_total_amount, 0) AS trans_total_amount,
          IFNULL(t2.approved_total_amount, 0) AS approved_total_amount
      FROM t1
      LEFT JOIN t2 
          ON t1.month = t2.month 
          AND t1.country = t2.country;
      

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值