Problem Review
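Oozie 4.2 fixes the rerun bug in Oozie 4.1, and the cost of upgrading looked small, so one afternoon I started the upgrade. For the upgrade steps, see the Oozie 4.2 installation notes.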
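After restarting the Oozie server and opening the Oozie web UI, everything looked normal. I breathed a long sigh of relief: the upgrade had not caused any failures in production jobs.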
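But the good times were bound to be short. About an hour later, a production job raised an alert. After logging on and checking the Oozie history, I found that the alert was triggered by the failure of a java action that performs a bulkload. This job loads the HFiles generated by the previous action into an HBase table.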
Why did it fail? The Oozie server's oozie.log contained the following exception:
org.apache.oozie.action.hadoop.JavaMainException: org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile Trailer from file hdfs://xxxxxxx/user/hadoop/myproject/tmp/import_2_hbase_hfile/0000000-160223221418726-oozie-hado-W/meta/5bd82eb70707448d917ab7e319caec09
at org.apache.oozie.action.hadoop.JavaMain.run(JavaMain.java:58)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:39)
at org.apache.oozie.action.hadoop.JavaMain.main(JavaMain.java:36)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:226)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at