SUN(A)+SUM(B)!=SUM(A+B)的情况
在hive、presto、spark中,当字段A或字段B列中存在null值时,SUN(A)+SUM(B)!=SUM(A+B)
1.null + 数值 = null
> select null + 0;
+-------+
| _c0 |
+-------+
| NULL |
+-------+
2.在对null进行判断时,null无法和任何值(包括其本身)通过=
!=
<
>
<=
>=
进行比较,值均为null。
select null == 1;
select null != 1;
select null == null;
select null >= 1;
-- 结果都为null
+-------+
| _c0 |
+-------+
| NULL |
+-------+
使用聚合函数遇到null值,结果分别如下:
聚合函数 | col列值都为null | col列值存在不为null |
---|---|---|
sum(col) | null | 非null值的和 |
avg(col) | null | 非null值的平均值 |
count(col) | 0 | 非null值的个数 |
max(col) | null | 非null值中最大值 |
min(col) | null | 非null值中最小值 |
1. 验证
创建测试表,并插入测试数据(default库)
CREATE TABLE table1(customer_no STRING, stock_fee double, advertisement_fee double);
INSERT INTO table1 VALUES
("C0001", NULL, 1130.52)
,("C0001", 0.1, 1084.72)
,("C0001", NULL, 1738.32)
,("C0001", NULL, 1817.67)
,("C0001", NULL, 3271.45)
,("C0002", NULL, 2587.76)
,("C0002", NULL, 1397.60)
,("C0002", NULL, 2871.75)
,("C0002", NULL, 812.08)
,("C0002", NULL, 1643.39)
,("C0003", 2635.23, 2587.76)
,("C0003", 1253.69, 1397.60)
,("C0003", 253.69, 2871.75)
,("C0003", 1528.12, 812.08)
,("C0003", 412.53, 1643.39);
select customer_no,sum(stock_fee) as sum,avg(stock_fee) as avg,count(stock_fee) as count,MAX(stock_fee) as max,MIN(stock_fee) as min from table1 group by customer_no;
+--------------+--------------------+---------------------+--------+----------+---------+
| customer_no | sum | avg | count | max | min |
+--------------+--------------------+---------------------+--------+----------+---------+
| C0001 | 0.2 | 0.1 | 2 | 0.1 | 0.1 |
| C0002 | NULL | NULL | 0 | NULL | NULL |
| C0003 | 6083.259999999999 | 1216.6519999999998 | 5 | 2635.23 | 253.69 |
+--------------+--------------------+---------------------+--------+----------+---------+
1.1. 对单独的一列进行SUM
SELECT customer_no
,SUM(stock_fee)
FROM table1
GROUP BY customer_no;
+--------------+--------------------+
| customer_no | _c1 |
+--------------+--------------------+
| C0001 | 0.1 |
| C0002 | NULL |
| C0003 | 6083.259999999999 |
+--------------+--------------------+
- 当列值中存在NULL时(C0001),sum结果不会报错,且计算结果正确
- 当列值中全部为NULL时(C0002),sum结果为null
- 当列值中不存在NULL值时(C0003),对数值进行sum,无异议
1.2. SUM(A)+SUM(B)
SELECT customer_no
,SUM(stock_fee)+SUM(advertisement_fee)
FROM table1
GROUP BY customer_no;
+--------------+-----------+
| customer_no | _c1 |
+--------------+-----------+
| C0001 | 9042.78 |
| C0002 | NULL |
| C0003 | 15395.84 |
+--------------+-----------+
- 对于c0001,SUM(stock_fee)=0.1,SUM(advertisement_fee)=9042.68,因此SUM(stock_fee)+SUM(advertisement_fee)=9042.78
- 对于c0002,SUM(stock_fee)=null,SUM(advertisement_fee)=9312.58,因此SUM(stock_fee)+SUM(advertisement_fee)=null
- 对于c0003,SUM(stock_fee)=6083.259999999999,SUM(advertisement_fee)=9312.58,因此SUM(stock_fee)+SUM(advertisement_fee)=15395.84
1.3. SUM(A+B)
SELECT customer_no
,SUM(stock_fee + advertisement_fee)
FROM table1
GROUP BY customer_no;
+--------------+---------------------+
| customer_no | _c1 |
+--------------+---------------------+
| C0001 | 1084.82 |
| C0002 | NULL |
| C0003 | 15395.839999999998 |
+--------------+---------------------+
- 对于c0001,只有0.1 + 1084.72 不为null,因此SUM(stock_fee + advertisement_fee)结果为1084.82
- 对于c0002,只有stock_fee + advertisement_fee都为null,因此SUM(stock_fee + advertisement_fee)结果为null
- 对于c0003,只有stock_fee + advertisement_fee都不为为null,因此SUM(stock_fee + advertisement_fee)结果为15395.839999999998
2. 解决
想要sum时得到正确结果,在sum时对每个字段值都判断是否为null,当为null时置为0
以下二者结果相同
SELECT customer_no
,SUM(if(stock_fee is null,0,stock_fee) + if(advertisement_fee is null,0,advertisement_fee))
FROM table1
GROUP BY customer_no;
SELECT customer_no
,SUM(if(stock_fee is null,0,stock_fee)) + SUM(if(advertisement_fee is null,0,advertisement_fee))
FROM table1
GROUP BY customer_no;
+--------------+-----------+
| customer_no | _c1 |
+--------------+-----------+
| C0001 | 9042.78 |
| C0002 | 9312.58 |
| C0003 | 15395.84 |
+--------------+-----------+