Recently, when running SQL through Hive on the company cluster, I keep hitting the following error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask
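Oddly, hive.log contained nothing useful, so the root cause is still unclear.
Workaround notes so far
Since hive.log did not help, I printed the log straight to the console to look for the actual error: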
hive --hiveconf hive.root.logger=INFO,console
Reproducing the failure this way showed the following error:
java.io.FileNotFoundException: File does not exist: hdfs://cluster1:8020/tmp/hive/hadoop/_tez_session_dir/ffe122b1-08ed-4c84-8705-686594118764/.tez/application_1561361832540_3013/tez-conf.pb
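When the console output is long, it can help to save it to a file and check it for this specific error. The following is only a minimal sketch: the function name has_missing_tez_conf and the log file name are hypothetical, and the patterns are simply taken from the error text shown above.

```shell
# Hypothetical helper: succeed (exit 0) if the given log file contains
# the "File does not exist" error for a tez-conf.pb under _tez_session_dir.
# Two separate greps are used so it works whether the message and the
# HDFS path appear on one line or on two.
has_missing_tez_conf() {
  grep -q 'FileNotFoundException: File does not exist:' "$1" &&
    grep -q '_tez_session_dir/.*/tez-conf\.pb' "$1"
}
```

For example, after saving the console output of the hive command above to a file (say hive_console.log, an assumed name, captured with `2>&1 | tee hive_console.log`), running `has_missing_tez_conf hive_console.log && echo "tez-conf.pb missing"` would flag the error.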
Searching online for this error message turned up a configuration parameter that works around it:
set tez.client.asynchronous-stop=false;
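For scripted jobs, the same parameter can probably be passed on the command line instead of being typed into the session. The sketch below assumes that Tez properties given through --hiveconf are picked up by the Tez session, which I have not verified on this cluster; the wrapper name run_hive_sync_stop and the SQL file argument are hypothetical.

```shell
# Hypothetical wrapper: run a SQL file with asynchronous Tez client stop
# disabled for this invocation only.
# "$1" is the path to the SQL script to execute.
run_hive_sync_stop() {
  hive --hiveconf tez.client.asynchronous-stop=false -f "$1"
}
```

Usage would look like `run_hive_sync_stop query.sql`. Setting the parameter per invocation keeps the change limited to the jobs that actually hit the error, rather than changing the cluster-wide configuration.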
The explanation given online is as follows:
Cause:
The above issue occurs when there are multiple jobs triggered and Hive removes a session directory for some application failure while Tez Application Master is still using it. The Tez Application Master staging directory is part of Hive Scratch directory which is controlled by the Hive Session.
Solution:
To resolve this issue, block the closing of sessions until the Tez AM shuts down.
I have applied this setting for now and will keep watching to see whether the error comes back.