Hive Partition Pruning Failed


weixin_34354173


Tags: Big Data

Original link: http://blog.51cto.com/boylook/1365734


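Yesterday I noticed that the following Hive query in production (t1 is partitioned by dt and hour):

select * from db1.t1 where dt between to_char(getdate('variables','-40'),'yyyymmdd') and 'variables' and hour='xxx'

could not use partition pruning, so it ran very inefficiently. Where was the problem?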

Digging up the source code of the to_char function makes the cause obvious:

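import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.udf.UDFType;

// deterministic = false declares to_char as non-deterministic; this is what keeps it out of partition pruning
@UDFType(deterministic = false)
@Description(name = "to_char",
        value = "_FUNC_(date, pattern) converts a string with yyyy-MM-dd HH:mm:ss pattern " +
                "to a string with given pattern.\n"
        + "_FUNC_(datetime, pattern) converts a string with yyyy-MM-dd pattern " +
                "to a string with given pattern.\n"
        + "_FUNC_(number [,format]) converts a number to a string\n",
        extended = "Example:\n"
        + " > SELECT to_char('2011-05-11 10:00:12', 'yyyyMMdd') FROM src LIMIT 1;\n"
        + "20110511\n"
)
// ... UDF class declaration follows (not shown in the original excerpt)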
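To show how such an annotation is consulted, here is a minimal, self-contained Java sketch. It defines a local stand-in `UDFType` annotation instead of depending on Hive's `org.apache.hadoop.hive.ql.udf.UDFType`, and the classes `ToChar` and `MyToChar` are hypothetical. Hive's `FunctionRegistry` appears to apply a similar rule: a UDF is treated as non-deterministic only if its class carries `@UDFType(deterministic = false)`, and a missing annotation means deterministic.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class DeterminismCheckSketch {

    // Local stand-in for Hive's UDFType annotation (assumption: same default, deterministic = true).
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface UDFType {
        boolean deterministic() default true;
    }

    // Hypothetical class mirroring the to_char declaration quoted above.
    @UDFType(deterministic = false)
    static class ToChar { }

    // Hypothetical replacement UDF: no annotation, so it falls back to the default.
    static class MyToChar { }

    // A UDF counts as non-deterministic only if it explicitly says so.
    static boolean isDeterministic(Class<?> udfClass) {
        UDFType type = udfClass.getAnnotation(UDFType.class);
        return type == null || type.deterministic();
    }

    public static void main(String[] args) {
        System.out.println("ToChar deterministic?   " + isDeterministic(ToChar.class));   // false
        System.out.println("MyToChar deterministic? " + isDeterministic(MyToChar.class)); // true
    }
}
```

Under this rule, a replacement UDF needs no special annotation. Leaving `@UDFType` off, or setting `deterministic = true`, is enough to pass the determinism check.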

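Note that this function is declared "non-deterministic". During partition pruning, Hive ignores an expression (does not use it to filter partitions) in three cases. Here a child node is null when the pruner cannot evaluate it from partition columns and constants alone.

1. For a logical operator (such as AND/OR), the expression is ignored if all of its child nodes are null.

2. Non-deterministic functions are always ignored.

3. In all other cases, the expression is ignored if any of its child nodes is null.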
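The three rules above can be modeled with a small Java sketch. The `Expr` node type and the `usable` method are hypothetical simplifications, not Hive's `ExprNodeDesc` or its actual pruner code. A node that is "null" in the rules is represented here by `usable` returning false. `getdate` is assumed to be deterministic. That assumption does not change the outcome, because `to_char` is already non-deterministic.

```java
import java.util.Arrays;
import java.util.List;

public class PruneRulesSketch {

    // Hypothetical, simplified expression node (not Hive's ExprNodeDesc).
    static final class Expr {
        final String name;            // function, column or constant text (label only)
        final boolean logical;        // true for AND / OR / NOT
        final boolean deterministic;  // what the function declares about itself
        final boolean partitionOnly;  // leaves only: partition column or constant
        final List<Expr> children;

        private Expr(String name, boolean logical, boolean deterministic,
                     boolean partitionOnly, Expr... children) {
            this.name = name;
            this.logical = logical;
            this.deterministic = deterministic;
            this.partitionOnly = partitionOnly;
            this.children = Arrays.asList(children);
        }

        static Expr leaf(String name, boolean partitionOnly) {
            return new Expr(name, false, true, partitionOnly);
        }

        static Expr func(String name, boolean deterministic, Expr... args) {
            return new Expr(name, false, deterministic, true, args);
        }

        static Expr logical(String name, Expr... args) {
            return new Expr(name, true, true, true, args);
        }
    }

    // Returns false where the pruner would treat the node as null (unusable for pruning).
    static boolean usable(Expr e) {
        if (e.children.isEmpty()) {
            return e.partitionOnly;  // non-partition columns cannot be evaluated
        }
        if (e.logical) {
            // Rule 1: a logical operator is ignored only if ALL children are unusable.
            return e.children.stream().anyMatch(PruneRulesSketch::usable);
        }
        if (!e.deterministic) {
            // Rule 2: non-deterministic functions are always ignored.
            return false;
        }
        // Rule 3: any other function is ignored if ANY child is unusable.
        return e.children.stream().allMatch(PruneRulesSketch::usable);
    }

    public static void main(String[] args) {
        Expr vars = Expr.leaf("'variables'", true);  // constant
        Expr lower = Expr.func("to_char", false,
                Expr.func("getdate", true, vars, Expr.leaf("'-40'", true)),
                Expr.leaf("'yyyymmdd'", true));
        Expr dtRange = Expr.func("between", true, Expr.leaf("dt", true), lower, vars);
        Expr hourEq = Expr.func("=", true, Expr.leaf("hour", true), Expr.leaf("'xxx'", true));
        Expr where = Expr.logical("and", dtRange, hourEq);

        System.out.println("dt range usable:    " + usable(dtRange)); // false (rule 2, then rule 3)
        System.out.println("hour filter usable: " + usable(hourEq));  // true
        System.out.println("WHERE usable:       " + usable(where));   // true, via hour only (rule 1)
    }
}
```

In this model the WHERE clause as a whole remains usable, but only through `hour = 'xxx'`. The `dt` range is dropped, so every `dt` partition for that hour still has to be scanned.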

The to_char here falls into exactly the second case, so we solved the problem by writing our own deterministic UDF.
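A deterministic replacement can be built around a pure function of its arguments. The sketch below assumes that `getdate('variables','-40')` means "the date 40 days before the `yyyyMMdd` date in `variables`". That reading and the name `shiftDays` are both assumptions, not the author's actual code. In Hive, such a method would be called from an `evaluate` method of a class extending `org.apache.hadoop.hive.ql.exec.UDF`, without `@UDFType(deterministic = false)`.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class ShiftDateSketch {

    // Hypothetical helper: shifts a yyyyMMdd date string by offsetDays.
    // It reads no clock and keeps no state, so equal inputs always give equal outputs,
    // which is what declaring a UDF deterministic promises.
    static String shiftDays(String yyyymmdd, int offsetDays) {
        LocalDate date = LocalDate.parse(yyyymmdd, DateTimeFormatter.BASIC_ISO_DATE);
        return date.plusDays(offsetDays).format(DateTimeFormatter.BASIC_ISO_DATE);
    }

    public static void main(String[] args) {
        // Lower bound of the dt range for a hypothetical variables = "20140301".
        System.out.println(shiftDays("20140301", -40)); // prints 20140120
    }
}
```

The result depends only on the arguments. The pruner is therefore allowed to evaluate the range bound against partition values, so only the matching `dt` partitions need to be read.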

Note: HIVE-1173 is another, similar case.
