本文为您介绍UNION、UNOIN ALL、UNION DISTINCT并集,INTERSECT、INTERSECT ALL、INTERSECT DISTINCT交集,EXCEPT、EXCEPT ALL、EXCEPT DISTINCT补集等SQL语法。
语法格式
select_statement UNION ALL select_statement;
select_statement UNION [DISTINCT] select_statement;
select_statement INTERSECT ALL select_statement;
select_statement INTERSECT [DISTINCT] select_statement;
select_statement EXCEPT ALL select_statement;
select_statement EXCEPT [DISTINCT] select_statement;
select_statement MINUS ALL select_statement;
select_statement MINUS [DISTINCT] select_statement;
语法说明
- UNION功能说明:求两个数据集的并集,即将两个数据集合并成一个数据集。
- 当UNION后指定参数ALL时,返回两个数据集的所有记录。示例如下。
返回结果如下。SELECT * FROM VALUES (1, 2), (1, 2), (3, 4) t(a, b) UNION ALL SELECT * FROM VALUES (1, 2), (1, 4) t(a, b);
+------------+------------+ | a | b | +------------+------------+ | 1 | 2 | | 1 | 4 | | 1 | 2 | | 1 | 2 | | 3 | 4 | +------------+------------+
- 存在多个UNION ALL时,支持通过括号指定UNION ALL的优先级。
SELECT * FROM src UNION ALL (SELECT * FROM src2 UNION ALL SELECT * FROM src3);
- 当UNION后不指定参数时,返回记录中会去掉重复的记录,效果等同于UNION DISTINCT。示例如下。
结果如下。SELECT * FROM VALUES (1, 2), (1, 2), (3, 4) t(a, b) UNION SELECT * FROM VALUES (1, 2), (1, 4) t(a, b); --等同于如下语句。 SELECT DISTINCT * FROM (<UNION ALL的结果>)t;
+------------+------------+ | a | b | +------------+------------+ | 1 | 2 | | 1 | 4 | | 3 | 4 | +------------+------------+
- UNION后如果有
CLUSTER BY
、DISTRIBUTE BY
、SORT BY
、ORDER BY
或者LIMIT
子句,当设置set odps.sql.type.system.odps2=false;
时,其作用于UNION的最后一个select_statement
;当设置set odps.sql.type.system.odps2=true;
时,作用于前面所有UNION的结果。示例如下。
返回结果如下。set odps.sql.type.system.odps2=true; SELECT explode(array(3, 1)) AS (a) UNION ALL SELECT explode(array(0, 4, 2)) AS (a) ORDER BY a LIMIT 3;
+------+ | a | +------+ | 0 | | 1 | | 2 | +------+
- 当UNION后指定参数ALL时,返回两个数据集的所有记录。示例如下。
- INTERSECT
功能说明:求两个数据集的交集,即输出两个数据集均包含的记录。
示例INTERSECT ALL
示例
返回结果如下。SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (5, 6) t(a, b) INTERSECT ALL SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (5, 7) t(a, b);
+------------+------------+ | a | b | +------------+------------+ | 1 | 2 | | 1 | 2 | | 3 | 4 | +------------+------------+
INTERSECT DISTINCT
示例
结果如下,等同于SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (5, 6) t(a, b) INTERSECT SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (5, 7) t(a, b);
SELECT DISTINCT * FROM (< INTERSECT ALL的结果 >) t;
语句。+------------+------------+ | a | b | +------------+------------+ | 1 | 2 | | 3 | 4 | +------------+------------+
- EXCEPT
功能说明:求第二个数据集在第一个数据集中的补集,即输出第一个数据集包含而第二个数据集不包含的记录。
示例EXCEPT ALL
示例。
返回结果如下。SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (3, 4), (5, 6), (7, 8) t(a, b) EXCEPT ALL SELECT * FROM VALUES (3, 4), (5, 6), (5, 6), (9, 10) t(a, b);
+------------+------------+ | a | b | +------------+------------+ | 1 | 2 | | 1 | 2 | | 3 | 4 | | 7 | 8 | +------------+------------+
EXCEPT DISTINCT
示例。
结果如下,相当于SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (3, 4), (5, 6), (7, 8) t(a, b) EXCEPT SELECT * FROM VALUES (3, 4), (5, 6), (5, 6), (9, 10) t(a, b);
SELECT DISTINCT * FROM left_branch EXCEPT ALL SELECT DISTINCT * FROM right_branch;
。+------------+------------+ | a | b | +------------+------------+ | 1 | 2 | | 7 | 8 | +------------+------------+
- MINUS
功能说明:等同于
EXCEPT
。
注意事项
- 集合操作的结果不一定会按序排列。
- 集合操作左右两个分支要求列个数必须一致。如果数据类型不一致,可能会进行隐式类型转换。由于兼容性原因,STRING类型和非STRING类型在集合操作中的隐式转换已被禁用。
- MaxCompute最多允许256个分支的集合操作,超出256个将报错。