SUM(A)+SUM(B)和SUM(A+B)不相等问题踩坑

SUN(A)+SUM(B)!=SUM(A+B)的情况

在hive、presto、spark中,当字段A或字段B列中存在null值时,SUN(A)+SUM(B)!=SUM(A+B)

1.null + 数值 = null

   > select null + 0;
   +-------+
   |  _c0  |
   +-------+
   | NULL  |
   +-------+

2.在对null进行判断时,null无法和任何值(包括其本身)通过= != < > <= >=进行比较,值均为null。

   select null == 1;
   select null != 1;
   select null == null;
   select null >= 1;
   -- 结果都为null
   +-------+
   |  _c0  |
   +-------+
   | NULL  |
   +-------+

使用聚合函数遇到null值,结果分别如下

聚合函数col列值都为nullcol列值存在不为null
sum(col)null非null值的和
avg(col)null非null值的平均值
count(col)0非null值的个数
max(col)null非null值中最大值
min(col)null非null值中最小值

1. 验证

创建测试表,并插入测试数据(default库)

CREATE TABLE table1(customer_no STRING, stock_fee double, advertisement_fee double);

INSERT INTO table1 VALUES 
("C0001", NULL, 1130.52)
,("C0001", 0.1, 1084.72)
,("C0001", NULL, 1738.32)
,("C0001", NULL, 1817.67)
,("C0001", NULL, 3271.45)
,("C0002", NULL, 2587.76)
,("C0002", NULL, 1397.60)
,("C0002", NULL, 2871.75)
,("C0002", NULL, 812.08)
,("C0002", NULL, 1643.39)
,("C0003", 2635.23, 2587.76)
,("C0003", 1253.69, 1397.60)
,("C0003", 253.69, 2871.75)
,("C0003", 1528.12, 812.08)
,("C0003", 412.53, 1643.39);

select customer_no,sum(stock_fee) as sum,avg(stock_fee) as avg,count(stock_fee) as count,MAX(stock_fee) as max,MIN(stock_fee) as min from table1 group by customer_no;
+--------------+--------------------+---------------------+--------+----------+---------+
| customer_no  |        sum         |         avg         | count  |   max    |   min   |
+--------------+--------------------+---------------------+--------+----------+---------+
| C0001        | 0.2                | 0.1                 | 2      | 0.1      | 0.1     |
| C0002        | NULL               | NULL                | 0      | NULL     | NULL    |
| C0003        | 6083.259999999999  | 1216.6519999999998  | 5      | 2635.23  | 253.69  |
+--------------+--------------------+---------------------+--------+----------+---------+

1.1. 对单独的一列进行SUM

SELECT  customer_no
       ,SUM(stock_fee)
FROM table1
GROUP BY  customer_no;

+--------------+--------------------+
| customer_no  |        _c1         |
+--------------+--------------------+
| C0001        | 0.1                |
| C0002        | NULL               |
| C0003        | 6083.259999999999  |
+--------------+--------------------+
  • 当列值中存在NULL时(C0001),sum结果不会报错,且计算结果正确
  • 当列值中全部为NULL时(C0002),sum结果为null
  • 当列值中不存在NULL值时(C0003),对数值进行sum,无异议

1.2. SUM(A)+SUM(B)

SELECT  customer_no
       ,SUM(stock_fee)+SUM(advertisement_fee)
FROM table1
GROUP BY  customer_no;

+--------------+-----------+
| customer_no  |    _c1    |
+--------------+-----------+
| C0001        | 9042.78   |
| C0002        | NULL      |
| C0003        | 15395.84  |
+--------------+-----------+
  • 对于c0001,SUM(stock_fee)=0.1,SUM(advertisement_fee)=9042.68,因此SUM(stock_fee)+SUM(advertisement_fee)=9042.78
  • 对于c0002,SUM(stock_fee)=null,SUM(advertisement_fee)=9312.58,因此SUM(stock_fee)+SUM(advertisement_fee)=null
  • 对于c0003,SUM(stock_fee)=6083.259999999999,SUM(advertisement_fee)=9312.58,因此SUM(stock_fee)+SUM(advertisement_fee)=15395.84

1.3. SUM(A+B)

SELECT  customer_no
       ,SUM(stock_fee + advertisement_fee)
FROM table1
GROUP BY  customer_no;

+--------------+---------------------+
| customer_no  |         _c1         |
+--------------+---------------------+
| C0001        | 1084.82             |
| C0002        | NULL                |
| C0003        | 15395.839999999998  |
+--------------+---------------------+
  • 对于c0001,只有0.1 + 1084.72 不为null,因此SUM(stock_fee + advertisement_fee)结果为1084.82
  • 对于c0002,只有stock_fee + advertisement_fee都为null,因此SUM(stock_fee + advertisement_fee)结果为null
  • 对于c0003,只有stock_fee + advertisement_fee都不为为null,因此SUM(stock_fee + advertisement_fee)结果为15395.839999999998

2. 解决

想要sum时得到正确结果,在sum时对每个字段值都判断是否为null,当为null时置为0

以下二者结果相同

SELECT  customer_no
       ,SUM(if(stock_fee is null,0,stock_fee) + if(advertisement_fee is null,0,advertisement_fee))
FROM table1
GROUP BY  customer_no;

SELECT  customer_no
       ,SUM(if(stock_fee is null,0,stock_fee)) + SUM(if(advertisement_fee is null,0,advertisement_fee))
FROM table1
GROUP BY  customer_no;

+--------------+-----------+
| customer_no  |    _c1    |
+--------------+-----------+
| C0001        | 9042.78   |
| C0002        | 9312.58   |
| C0003        | 15395.84  |
+--------------+-----------+
  • 1
    点赞
  • 6
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值