null值对group by和over中partition by的影响
结论:
当group by或partition by的key中包含null值时,null值像其他普通值一样被处理,并不会特殊处理。
验证
- 测试数据
CREATE TABLE table1(customer_no STRING, stock_fee double, advertisement_fee double);
INSERT INTO table1 VALUES
("C0001", 0.1, 1221.12)
,("C0001", NULL, 1130.52)
,("C0001", 0.1, 1084.72)
,("C0001", NULL, 1738.32)
,("C0001", NULL, 1817.67)
,("C0001", NULL, 3271.45)
,("C0002", NULL, 2587.76)
,("C0002", NULL, 1397.60)
,("C0002", NULL, 2871.75)
,("C0002", NULL, 812.08)
,("C0002", NULL, 1643.39)
,("C0003", 2635.23, 2587.76)
,("C0003", 1253.69, 1397.60)
,("C0003", 253.69, 2871.75)
,("C0003", 1528.12, 812.08)
,("C0003", 412.53, 1643.39);
- group by
SELECT customer_no
,stock_fee
FROM table1
WHERE customer_no = 'C0001'
GROUP BY customer_no
,stock_fee;
-- 结果如下
+--------------+------------+
| customer_no | stock_fee |
+--------------+------------+
| C0001 | NULL |
| C0001 | 0.1 |
+--------------+------------+
SELECT stock_fee
,COUNT(stock_fee)
FROM table1
WHERE customer_no = 'C0001'
GROUP BY stock_fee;
-- 结果如下
+------------+------+
| stock_fee | _c1 |
+------------+------+
| NULL | 0 |
| 0.1 | 2 |
+------------+------+
SELECT stock_fee
,COUNT(customer_no)
FROM table1
WHERE customer_no = 'C0001'
GROUP BY stock_fee;
-- 结果如下
+------------+------+
| stock_fee | _c1 |
+------------+------+
| NULL | 4 |
| 0.1 | 2 |
+------------+------+
当值中存在null时,null值也会被当成分组的一种,但是count()时,如果指定的是null的字段,则null不会被计数。
- partition by
SELECT customer_no
,stock_fee
,advertisement_fee
,row_number() over(partition by customer_no,stock_fee ORDER BY customer_no) AS rn
FROM table1
WHERE customer_no = 'C0001';
-- 结果如下
+--------------+------------+--------------------+-----+
| customer_no | stock_fee | advertisement_fee | rn |
+--------------+------------+--------------------+-----+
| C0001 | NULL | 1130.52 | 1 |
| C0001 | NULL | 1738.32 | 2 |
| C0001 | NULL | 1817.67 | 3 |
| C0001 | NULL | 3271.45 | 4 |
| C0001 | 0.1 | 1084.72 | 1 |
| C0001 | 0.1 | 1221.12 | 2 |
+--------------+------------+--------------------+-----+
SELECT customer_no
,stock_fee
,advertisement_fee
,row_number() over(partition by stock_fee ORDER BY customer_no) AS rn
FROM table1
WHERE customer_no = 'C0002';
-- 结果如下
+--------------+------------+--------------------+-----+
| customer_no | stock_fee | advertisement_fee | rn |
+--------------+------------+--------------------+-----+
| C0002 | NULL | 2587.76 | 1 |
| C0002 | NULL | 1397.6 | 2 |
| C0002 | NULL | 2871.75 | 3 |
| C0002 | NULL | 812.08 | 4 |
| C0002 | NULL | 1643.39 | 5 |
+--------------+------------+--------------------+-----+
在partition by中为null的值也会被正常处理。