主要参照的两篇博客:
Hadoop3.2.0使用详解.
hadoop3.1.0 HA高可用完全分布式集群的安装部署(详细教程).
搭建环境:
- centos8(5台)
- JDK 11.0.9
- hadoop 3.1.4
一.端口占用问题
2021-01-02 19:16:35,007 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping JournalNode metrics system...
2021-01-02 19:16:35,008 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: JournalNode metrics system stopped.
2021-01-02 19:16:35,008 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: JournalNode metrics system shutdown complete.
2021-01-02 19:16:35,008 ERROR org.apache.hadoop.hdfs.qjournal.server.JournalNode: Failed to start journalnode.
java.net.BindException: Port in use: 0.0.0.0:8480
at org.apache.hadoop.http.HttpServer2.constructBindException(HttpServer2.java:1226)
at org.apache.hadoop.http.HttpServer2.bindForSinglePort(HttpServer2.java:1248)
at org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:1307)
at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:1162)
at org.apache.hadoop.hdfs.qjournal.server.JournalNodeHttpServer.start(JournalNodeHttpServer.java:86)
at org.apache.hadoop.hdfs.qjournal.server.JournalNode.start(JournalNode.java:234)
at org.apache.hadoop.hdfs.qjournal.server.JournalNode.run(JournalNode.java:205)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
at org.apache.hadoop.hdfs.qjournal.server.JournalNode.main(JournalNode.java:415)
Caused by: java.io.IOException: Failed to bind to /0.0.0.0:8480
at org.eclipse.jetty.server.ServerConnector.openAcceptChannel(ServerConnector.java:346)
at org.eclipse.jetty.server.ServerConnector.open(ServerConnector.java:307)
at org.apache.hadoop.http.HttpServer2.bindListener(HttpServer2.java:1213)
at org.apache.hadoop.http.HttpServer2.bindForSinglePort(HttpServer2.java:1244)
... 8 more
Caused by: java.net.BindException: 地址已在使用
at java.base/sun.nio.ch.Net.bind0(Native Method)
at java.base/sun.nio.ch.Net.bind(Net.java:461)
at java.base/sun.nio.ch.Net.bind(Net.java:453)
at java.base/sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:227)
at java.base/sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:80)
at org.eclipse.jetty.server.ServerConnector.openAcceptChannel(ServerConnector.java:342)
... 11 more
解决方案:查看端口号是否被占用,netstat -anp |grep 端口号。
如果端口没被占用,重启解决一切。
二.权限问题
2021-01-02 18:58:22,430 INFO org.apache.hadoop.hdfs.qjournal.server.JournalNode: registered UNIX signal handlers for [TERM, HUP, INT]
2021-01-02 18:58:22,572 ERROR org.apache.hadoop.hdfs.qjournal.server.JournalNode: Failed to start JournalNode.
org.apache.hadoop.util.DiskChecker$DiskErrorException: Directory is not writable: /data/hdata/dfs/journal
at org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:167)
at org.apache.hadoop.util.DiskChecker.checkDirInternal(DiskChecker.java:100)
at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:77)
at org.apache.hadoop.hdfs.qjournal.server.JournalNode.validateAndCreateJournalDir(JournalNode.java:195)
at org.apache.hadoop.hdfs.qjournal.server.JournalNode.start(JournalNode.java:218)
at org.apache.hadoop.hdfs.qjournal.server.JournalNode.run(JournalNode.java:205)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
at org.apache.hadoop.hdfs.qjournal.server.JournalNode.main(JournalNode.java:415)
2021-01-02 18:58:22,584 ERROR org.apache.hadoop.hdfs.qjournal.server.JournalNode: Failed to start journalnode.
org.apache.hadoop.util.DiskChecker$DiskErrorException: Directory is not writable: /data/hdata/dfs/journal
at org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:167)
at org.apache.hadoop.util.DiskChecker.checkDirInternal(DiskChecker.java:100)
at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:77)
at org.apache.hadoop.hdfs.qjournal.server.JournalNode.validateAndCreateJournalDir(JournalNode.java:195)
at org.apache.hadoop.hdfs.qjournal.server.JournalNode.start(JournalNode.java:218)
at org.apache.hadoop.hdfs.qjournal.server.JournalNode.run(JournalNode.java:205)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
at org.apache.hadoop.hdfs.qjournal.server.JournalNode.main(JournalNode.java:415)
解决方案:权限问题,hadoop的数据我存储在了根目录下,用户和用户组都是root
drwxr-xr-x. 2 root root 6 1月 2 20:17 journal
hadoop用户没有w权限,文件写不进去。
我直接把hadoop的数据存到hadoop用户目录下,省时省力。
drwxrwxr-x. 4 hadoop hadoop 37 1月 3 13:46 journal
2021-01-02 20:18:04,650 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: registered UNIX signal handlers for [TERM, HUP, INT]
2021-01-02 20:18:05,321 INFO org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker: Scheduling a check for [DISK]file:/data/hdata/dfs/data
2021-01-02 20:18:05,386 WARN org.apache.hadoop.hdfs.server.datanode.checker.StorageLocationChecker: Exception checking StorageLocation [DISK]file:/data/hdata/dfs/data
EPERM: Operation not permitted
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.chmodImpl(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.chmod(NativeIO.java:381)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:836)
at org.apache.hadoop.fs.ChecksumFileSystem$1.apply(ChecksumFileSystem.java:508)
at org.apache.hadoop.fs.ChecksumFileSystem$FsOperation.run(ChecksumFileSystem.java:489)
at org.apache.hadoop.fs.ChecksumFileSystem.setPermission(ChecksumFileSystem.java:511)
at org.apache.hadoop.util.DiskChecker.mkdirsWithExistsAndPermissionCheck(DiskChecker.java:234)
at org.apache.hadoop.util.DiskChecker.checkDirInternal(DiskChecker.java:141)
at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:116)
at org.apache.hadoop.hdfs.server.datanode.StorageLocation.check(StorageLocation.java:239)
at org.apache.hadoop.hdfs.server.datanode.StorageLocation.check(StorageLocation.java:52)
at org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker$1.call(ThrottledAsyncChecker.java:142)
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
2021-01-02 20:18:05,388 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in secureMain
org.apache.hadoop.util.DiskChecker$DiskErrorException: Too many failed volumes - current valid volumes: 0, volumes configured: 1, volumes failed: 1, volume failures tolerated: 0
at org.apache.hadoop.hdfs.server.datanode.checker.StorageLocationChecker.check(StorageLocationChecker.java:231)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2804)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2719)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2761)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2905)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2929)
解决方案:和上面问题相似,权限问题,启动不了datanode,因为data目录数据写不了。
drwxr-xr-x. 2 root root 6 1月 2 20:17 data
三.缺包或找不到主类问题
Caused by: java.lang.NoClassDefFoundError: javax/activation/DataSource
at com.sun.xml.bind.v2.model.impl.RuntimeBuiltinLeafInfoImpl.<clinit>(RuntimeBuiltinLeafInfoImpl.java:457)
at com.sun.xml.bind.v2.model.impl.RuntimeTypeInfoSetImpl.<init>(RuntimeTypeInfoSetImpl.java:65)
at com.sun.xml.bind.v2.model.impl.RuntimeModelBuilder.createTypeInfoSet(RuntimeModelBuilder.java:133)
at com.sun.xml.bind.v2.model.impl.RuntimeModelBuilder.createTypeInfoSet(RuntimeModelBuilder.java:85)
at com.sun.xml.bind.v2.model.impl.ModelBuilder.<init>(ModelBuilder.java:156)
at com.sun.xml.bind.v2.model.impl.RuntimeModelBuilder.<init>(RuntimeModelBuilder.java:93)
at com.sun.xml.bind.v2.runtime.JAXBContextImpl.getTypeInfoSet(JAXBContextImpl.java:473)
at com.sun.xml.bind.v2.runtime.JAXBContextImpl.<init>(JAXBContextImpl.java:319)
at com.sun.xml.bind.v2.runtime.JAXBContextImpl$JAXBContextBuilder.build(JAXBContextImpl.java:1170)
at com.sun.xml.bind.v2.ContextFactory.createContext(ContextFactory.java:145)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:262)
at javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:249)
at javax.xml.bind.ContextFinder.find(ContextFinder.java:456)
at javax.xml.bind.JAXBContext.newInstance(JAXBContext.java:656)
at com.sun.jersey.api.json.JSONJAXBContext.<init>(JSONJAXBContext.java:255)
at org.apache.hadoop.yarn.server.resourcemanager.webapp.JAXBContextResolver.<init>(JAXBContextResolver.java:69)
at org.apache.hadoop.yarn.server.resourcemanager.webapp.JAXBContextResolver$$FastClassByGuice$$6a7be7f6.newInstance(<generated>)
at com.google.inject.internal.cglib.reflect.$FastConstructor.newInstance(FastConstructor.java:40)
at com.google.inject.internal.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:61)
at com.google.inject.internal.ConstructorInjector.provision(ConstructorInjector.java:105)
at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:85)
at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:267)
at com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46)
at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1103)
at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
at com.google.inject.internal.SingletonScope$1.get(SingletonScope.java:145)
at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:41)
at com.google.inject.internal.InjectorImpl$2$1.call(InjectorImpl.java:1016)
at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1092)
at com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:1012)
... 54 more
Caused by: java.lang.ClassNotFoundException: javax.activation.DataSource
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
... 88 more
解决方案:https://blog.csdn.net/slopop12/article/details/108710328
这种问题比较常见,缺啥就补啥。
[2021-01-03 14:52:02.446]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
错误: 找不到或无法加载主类 org.apache.hadoop.mapreduce.v2.app.MRAppMaster
原因: java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.v2.app.MRAppMaster
[2021-01-03 14:52:02.447]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
错误: 找不到或无法加载主类 org.apache.hadoop.mapreduce.v2.app.MRAppMaster
原因: java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.v2.app.MRAppMaster
解决方案:https://blog.csdn.net/qq_41684957/article/details/81710190
四.目录变文件问题
hdfs dfs -put /tmp/wc /tmp
org.apache.hadoop.fs.ParentNotDirectoryException: /tmp (is not a directory)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkIsDirectory(FSPermissionChecker.java:638)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkSimpleTraverse(FSPermissionChecker.java:629)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:604)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkTraverse(FSDirectory.java:1801)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkTraverse(FSDirectory.java:1819)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.resolvePath(FSDirectory.java:679)
at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:50)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3192)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1157)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:714)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:527)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1015)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:943)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2943)
解决方案:hdfs dfs -put /tmp/wc /tmp之前,先hdfs dfs -mkdir /tmp,不然tmp会被创成文件类型。
五.java空指针问题
2021-01-03 15:11:50,386 INFO mapreduce.Job: map 0% reduce 0%
2021-01-03 15:11:50,408 INFO mapreduce.Job: Job job_1609657572378_0002 failed with state FAILED due to: Application application_1609657572378_0002 failed 2 times due to AM Container for appattempt_1609657572378_0002_000002 exited with exitCode: 1
Failing this attempt.Diagnostics: [2021-01-03 15:11:49.602]Exception from container-launch.
Container id: container_1609657572378_0002_02_000001
Exit code: 1
[2021-01-03 15:11:49.606]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
[2021-01-03 15:11:49.606]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
For more detailed output, check the application tracking page: http://hadoop001:8088/cluster/app/application_1609657572378_0002 Then click on links to logs of each attempt.
. Failing the application.
终端显示的错误看不出来啥,查看web端历史日志,发现是java空指针问题。
2021-01-03 15:06:53,508 ERROR [Listener at 0.0.0.0/45119] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.NullPointerException
at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.register(RMCommunicator.java:178)
at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.serviceStart(RMCommunicator.java:122)
at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.serviceStart(RMContainerAllocator.java:280)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.serviceStart(MRAppMaster.java:979)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1293)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$6.run(MRAppMaster.java:1761)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1757)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1691)
Caused by: java.lang.NullPointerException
at org.apache.hadoop.mapreduce.v2.app.client.MRClientService.getHttpPort(MRClientService.java:177)
at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.register(RMCommunicator.java:159)
... 14 more
解决方案:https://blog.csdn.net/yj2434/article/details/107817519