今天在测试full join的时候意外发现一个报错
with t1 as (
select
null as col1,null as col2,null as col3
)
,t2 as (
select
1 as col1,null as col2,null as col3
)
select
t1.*,t2.*
from t1
full join t2
on t1.col1 = t2.col1
and t1.col2 = t2.col2
and t1.col3 = t2.col3
报错:
org.apache.impala.common.NotImplementedException: Error generating a valid execution plan for this query. A FULL OUTER JOIN type with no equi-join predicates can only be executed with a single node plan.
一开始没看懂什么原因,我猜测是上面例子关联出来的数据都是Null值,才会报错,后来我就多加了一行t1.col1=1的数据进行测试,这样就可以保证关联字段值不都是null了,如下:
with t1 as (
select
null as col1,2 as col2,null as col3,
1 as col1,2 as col2,null as col3
)
,t2 as (
select
1 as col1,null as col2 ,null as col3
-- 1 as col1,2 as col2 ,null as col3
)
select
t1.*,t2.*
from t1
full join t2
on t1.col1 = t2.col1
and t1.col2 = t2.col2
and t1.col3 = t2.col3
结果还是一样,暂时结论是只有保证两个表中的col1或col2或col3中的至少一个字段的所有值不为null才不会报错,如下:
with t1 as (
select
1 as col1,2 as col2,null as col3
)
,t2 as (
select
1 as col1,null as col2 ,null as col3
-- 1 as col1,2 as col2 ,null as col3
)
select
t1.*,t2.*
from t1
full join t2
on t1.col1 = t2.col1
and t1.col2 = t2.col2
and t1.col3 = t2.col3
希望有大佬知道的可以指出具体原因;