Exception:
2021-12-06 23:25:15,656 INFO [main] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(260)) - Cleaning up the staging area /tmp/hadoop-yarn/staging/henry/.staging/job_1638759714369_0002
Exception in thread "main" java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:474)
at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:1600)
at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:947)
at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:944)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:954)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:262)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1588)
at com.henry.whale.mranalysis.tool.AnalysisTextTool.run(AnalysisTextTool.java:48)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
at com.henry.whale.mranalysis.AnalysisData.main(AnalysisData.java:8)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:318)
at org.apache.hadoop.util.RunJar.main(RunJar.java:232)
Cause analysis:
The parameter fs.hdfs.impl.disable.cache defaults to false, so the FileSystem instance created for a given Configuration is cached.
Once the job is submitted to the cluster, every caller that goes through getFileSystem with the same Configuration gets back the same cached FileSystem instance.
If any one of them closes that connection when it is finished, the shared instance is closed for everyone, and the next access from any other caller fails with the FileSystem closed exception shown above.
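A minimal sketch of this sharing behavior (assuming fs.defaultFS in core-site.xml points at an HDFS cluster; the class name is made up for illustration): two calls to FileSystem.get with the same Configuration return the same cached object, so closing it through one reference closes it for the other as well.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SharedFsDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // With fs.hdfs.impl.disable.cache=false (the default), both calls
        // go through FileSystem's static CACHE and return the same object.
        FileSystem fs1 = FileSystem.get(conf);
        FileSystem fs2 = FileSystem.get(conf);
        System.out.println(fs1 == fs2);   // prints true

        fs1.close();                      // closes the shared instance

        // Any later use through the other reference now fails with
        // java.io.IOException: Filesystem closed
        fs2.exists(new Path("/tmp"));
    }
}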
Solutions:
1. Modify the configuration file core-site.xml:
<property>
    <name>fs.hdfs.impl.disable.cache</name>
    <value>true</value>
</property>
2. Add this in the code:
conf.setBoolean("fs.hdfs.impl.disable.cache", true);
The FileSystem class keeps an internal static CACHE that holds the created instances for each file system type. Whether instances are cached is controlled per scheme by the key "fs.%s.impl.disable.cache" (with %s replaced by the scheme, e.g. hdfs, file, s3, s3n): with caching enabled, the first FileSystem instance created for a scheme is stored in the cache, and every subsequent get returns that same instance. Setting the key to true disables the cache, so each get builds a fresh, private instance, which removes the exception above.
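Either route gives the caller an instance it owns and is responsible for closing. As a small sketch (not tied to this project's code), the same effect can also be had with FileSystem.newInstance, which bypasses the cache without touching the configuration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PrivateFsDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Solution 2: disable the cache for the hdfs scheme, so every
        // FileSystem.get(conf) builds a fresh instance.
        conf.setBoolean("fs.hdfs.impl.disable.cache", true);
        FileSystem fs = FileSystem.get(conf);

        // Alternative with the cache left on: newInstance always skips the
        // cache and hands back a new, private FileSystem object.
        FileSystem privateFs = FileSystem.newInstance(conf);

        try {
            System.out.println(fs.exists(new Path("/tmp")));
        } finally {
            // A non-cached instance is owned by its caller; closing it here
            // cannot close anyone else's FileSystem reference.
            fs.close();
            privateFs.close();
        }
    }
}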
There is a little-known gotcha with the Hadoop FileSystem API: FileSystem.get returns the same object for every invocation with the same filesystem. So if it is closed anywhere, it is closed everywhere. You could debate the merits of this decision, but that's the way it is.
So, if you attempt to close your BufferedReader, and it tries to flush out some data it has buffered, but the underlying stream is connected to a FileSystem that is already closed, you'll get this error. Check your code for any other places you are closing a FileSystem object, and look for race conditions. Also, I believe Hadoop itself will at some point close the FileSystem, so to be safe, you should probably only be accessing it from within the Reducer's setup, reduce, or cleanup methods (or configure, reduce, and close, depending on which API you're using).
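A sketch of that advice with the new org.apache.hadoop.mapreduce API (the reducer class, key/value types, and side-output path are made up for illustration): obtain the FileSystem inside the task, close only the streams you open on it, and never close the FileSystem itself.

import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SafeFsReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private FileSystem fs;

    @Override
    protected void setup(Context context) throws IOException {
        // Look up the (possibly cached, shared) FileSystem once per task.
        fs = FileSystem.get(context.getConfiguration());
    }

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        context.write(key, new IntWritable(sum));

        // Side output: close the stream you opened, never the FileSystem.
        try (FSDataOutputStream out = fs.create(new Path("/tmp/side/" + key))) {
            out.writeBytes(key + "\t" + sum + "\n");
        }
    }

    @Override
    protected void cleanup(Context context) {
        // Deliberately no fs.close(): the instance may be shared through the
        // cache, and the framework closes it on shutdown.
    }
}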
References:
【错误处理】hadoop hdfs 读写错误解决: java.io.IOException: Filesystem closed (victorzzzz, CSDN): https://blog.csdn.net/victorzzzz/article/details/89926613
java.io.IOException: Filesystem closed (bitcarmanlee, CSDN)