近期由于运维重启集群导致Flume异常关闭,总结一下原因和处理方法:
现象:查询Hive外部表失败,外部表数据有Flume写入
错误日志:
2017-11-01 00:15:37,077 WARN [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hive (auth:SIMPLE) cause:java.io.IOException: java.lang.reflect.InvocationTargetException
2017-11-01 00:15:37,079 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:265)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.(HadoopShimsSecure.java:212)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:332)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:721)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.(MapTask.java:169)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1796)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:251)
... 11 more
Caused by: java.io.IOException: Cannot obtain block length for LocatedBlock{BP-1461471655-10.2.35.25-1489124540398:blk_1176728682_102999133; getBlockSize()=2414080; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[10.2.35.52:50010,DS-d3f75610-c89b-4617-8c97-4460066476ad,DISK], DatanodeInfoWithStorage[10.2.35.53:50010,DS-ad8c89d8-5f5d-4491-9aef-602bce3c244a,DISK], DatanodeInfoWithStorage[10.2.35.29:50010,DS-e77934f2-799d-4c65-8715-59378e689e93,DISK], DatanodeInfoWithStorage[10.2.35.54:50010,DS-75fdf7a7-7177-4b0a-8939-720b718623e5,DISK]]}
at org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:427)
at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:335)
at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:271)
at org.apache.hadoop.hdfs.DFSInputStream.(DFSInputStream.java:263)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1585)
at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:326)
at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:322)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:322)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:783)
at org.apache.hadoop.mapred.LineRecordReader.(LineRecordReader.java:109)
at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.(CombineHiveRecordReader.java:67)
... 16 more
问题原因:CM重启前,未关闭Flume Agent,导致文件未被正常关闭。
处理方法:
方法一:暴力删除状态异常文件
hadoop fsck -openforwrite | egrep -v '^\.+$' | egrep "MISSING|OPENFORWRITE" | grep -o "/[^ ]*" | sed -e "s/:$//" | xargs -i hadoop fs -rmr {};
方法二:恢复租约
hdfs debug recoverLease -path -retries