main~
假设,我们有以下的两个行集:
A
id:int Name
1 Smith
1 Smith
2 Brown
3 Case
B
id:int Name
1 Smith
1 Smith
1 Smith
2 Brown
4 Dey
4 Dey
我们加载这些数据集:
a = EXTRACT
Id:int, Name:string
FROM @"/my/SampleData/SetOps_A.txt"
USING DefaultTextExtractor();
b = EXTRACT
Id:int, Name:string
FROM @"/my/SampleData/SetOps_B.txt"
USING DefaultTextExtractor();
实例(行集的合并):
存在两种,一种为UNION ALL(保留所有),一种为UNION(删除重复)
union_distinct = SELECT * FROM a
UNION DISTINCT
SELECT * FROM b;
OUTPUT union_distinct
TO @"/my/Outputs/union_distinct.txt";
结果:
UNION DISTINCT
id:int Name
1 Smith
2 Brown
3 Case
4 Dey
union_all = SELECT * FROM a
UNION ALL
SELECT * FROM b;
OUTPUT union_all
TO @"/my/Outputs/union_all.txt";
结果:
UNION ALL
id:int Name
1 Smith
1 Smith
2 Brown
3 Case
1 Smith
1 Smith
1 Smith
2 Brown
4 Dey
4 Dey
实例(集合的公共行)
可以使用INTERSECT来实现。
同样的存在两种,INTERSECT ALL(保留所有),INTERSECT(删除重复)。
rs1 = SELECT * FROM a
INTERSECT DISTINCT
SELECT * FROM b;
rs2 = SELECT * FROM a
INTERSECT ALL
SELECT * FROM b;
OUTPUT rs1
TO @"/my/Outputs/intersect.txt";
OUTPUT rs2
TO @"/my/Outputs/intersect-all.txt";
结果:
INTERSECT DISTINCT
id:int name
1 Smith
2 Brown
INTERSECT ALL
id:int name
1 Smith
1 Smith
2 Brown
实例(查找在左rowset中,不在右rowset中的所有行)
可以使用EXCEPT实现,同样的存在两种,EXCEPT ALL(保留全部),EXCEPT (删除重复)。
rs0 = SELECT * FROM a
EXCEPT DISTINCT
SELECT * FROM b;
rs1 = SELECT * FROM a
EXCEPT ALL
SELECT * FROM b;
rs2 = SELECT * FROM b
EXCEPT DISTINCT
SELECT * FROM a;
rs3 = SELECT * FROM b
EXCEPT ALL
SELECT * FROM a;
OUTPUT rs0 TO @"/my/Outputs/except_distinct_a_b.txt";
OUTPUT rs1 TO @"/my/Outputs/except-all_a_b.txt";
OUTPUT rs2 TO @"/my/Outputs/except_distinct_b_a.txt";
OUTPUT rs3 TO @"/my/Outputs/except-all_b_a.txt";
结果:
EXCEPT ALL (A,B)
id:int name
3 Case
EXCEPT DISTINCT (A,B)
id:int name
3 Case
EXCEPT ALL (B,A)
id:int name
1 Smith
4 Dey
4 Dey
EXCEPT DISTINCT (B,A)
id:int name
4 Dey