Problems encountered while setting up Hadoop 3.1.4 (Apache distribution) in HA mode, and their solutions

The two blog posts I mainly followed:
"Hadoop3.2.0使用详解" (a Hadoop 3.2.0 usage guide)
"hadoop3.1.0 HA高可用完全分布式集群的安装部署(详细教程)" (a detailed tutorial on deploying a fully distributed HA cluster)

Environment:

  • CentOS 8 (5 machines)
  • JDK 11.0.9
  • Hadoop 3.1.4

1. Port already in use

2021-01-02 19:16:35,007 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping JournalNode metrics system...
2021-01-02 19:16:35,008 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: JournalNode metrics system stopped.
2021-01-02 19:16:35,008 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: JournalNode metrics system shutdown complete.
2021-01-02 19:16:35,008 ERROR org.apache.hadoop.hdfs.qjournal.server.JournalNode: Failed to start journalnode.
java.net.BindException: Port in use: 0.0.0.0:8480
	at org.apache.hadoop.http.HttpServer2.constructBindException(HttpServer2.java:1226)
	at org.apache.hadoop.http.HttpServer2.bindForSinglePort(HttpServer2.java:1248)
	at org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:1307)
	at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:1162)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNodeHttpServer.start(JournalNodeHttpServer.java:86)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNode.start(JournalNode.java:234)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNode.run(JournalNode.java:205)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNode.main(JournalNode.java:415)
Caused by: java.io.IOException: Failed to bind to /0.0.0.0:8480
	at org.eclipse.jetty.server.ServerConnector.openAcceptChannel(ServerConnector.java:346)
	at org.eclipse.jetty.server.ServerConnector.open(ServerConnector.java:307)
	at org.apache.hadoop.http.HttpServer2.bindListener(HttpServer2.java:1213)
	at org.apache.hadoop.http.HttpServer2.bindForSinglePort(HttpServer2.java:1244)
	... 8 more
Caused by: java.net.BindException: Address already in use
	at java.base/sun.nio.ch.Net.bind0(Native Method)
	at java.base/sun.nio.ch.Net.bind(Net.java:461)
	at java.base/sun.nio.ch.Net.bind(Net.java:453)
	at java.base/sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:227)
	at java.base/sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:80)
	at org.eclipse.jetty.server.ServerConnector.openAcceptChannel(ServerConnector.java:342)
	... 11 more

Solution: check whether the port is in use with netstat -anp | grep <port>.
If the port is not actually in use, a restart fixes everything.
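A minimal check-and-restart sequence (a sketch; 8480 is the default JournalNode HTTP port, and ss is a fallback since minimal CentOS 8 installs may not ship net-tools):

netstat -anp | grep 8480        # run as root so the owning PID/program is shown
ss -lntp | grep 8480            # alternative if netstat is not installed
hdfs --daemon stop journalnode  # if a stale JournalNode holds the port, stop it cleanly
hdfs --daemon start journalnode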

2. Permission problems

2021-01-02 18:58:22,430 INFO org.apache.hadoop.hdfs.qjournal.server.JournalNode: registered UNIX signal handlers for [TERM, HUP, INT]
2021-01-02 18:58:22,572 ERROR org.apache.hadoop.hdfs.qjournal.server.JournalNode: Failed to start JournalNode.
org.apache.hadoop.util.DiskChecker$DiskErrorException: Directory is not writable: /data/hdata/dfs/journal
	at org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:167)
	at org.apache.hadoop.util.DiskChecker.checkDirInternal(DiskChecker.java:100)
	at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:77)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNode.validateAndCreateJournalDir(JournalNode.java:195)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNode.start(JournalNode.java:218)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNode.run(JournalNode.java:205)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNode.main(JournalNode.java:415)
2021-01-02 18:58:22,584 ERROR org.apache.hadoop.hdfs.qjournal.server.JournalNode: Failed to start journalnode.
org.apache.hadoop.util.DiskChecker$DiskErrorException: Directory is not writable: /data/hdata/dfs/journal
	at org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:167)
	at org.apache.hadoop.util.DiskChecker.checkDirInternal(DiskChecker.java:100)
	at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:77)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNode.validateAndCreateJournalDir(JournalNode.java:195)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNode.start(JournalNode.java:218)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNode.run(JournalNode.java:205)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNode.main(JournalNode.java:415)

Solution: a permissions problem. I had stored Hadoop's data under a root-owned directory, with both user and group set to root:
drwxr-xr-x. 2 root root 6 Jan 2 20:17 journal
The hadoop user has no write (w) permission there, so the JournalNode cannot write its files.
I simply moved Hadoop's data under the hadoop user's home directory, which is quick and painless:
drwxrwxr-x. 4 hadoop hadoop 37 Jan 3 13:46 journal
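If you would rather keep the data under /data, handing the tree over to the hadoop user works just as well (a sketch, assuming dfs.journalnode.edits.dir points at /data/hdata/dfs/journal and the daemons run as hadoop):

sudo chown -R hadoop:hadoop /data/hdata
ls -ld /data/hdata/dfs/journal   # should now show hadoop hadoop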

2021-01-02 20:18:04,650 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: registered UNIX signal handlers for [TERM, HUP, INT]
2021-01-02 20:18:05,321 INFO org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker: Scheduling a check for [DISK]file:/data/hdata/dfs/data
2021-01-02 20:18:05,386 WARN org.apache.hadoop.hdfs.server.datanode.checker.StorageLocationChecker: Exception checking StorageLocation [DISK]file:/data/hdata/dfs/data
EPERM: Operation not permitted
	at org.apache.hadoop.io.nativeio.NativeIO$POSIX.chmodImpl(Native Method)
	at org.apache.hadoop.io.nativeio.NativeIO$POSIX.chmod(NativeIO.java:381)
	at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:836)
	at org.apache.hadoop.fs.ChecksumFileSystem$1.apply(ChecksumFileSystem.java:508)
	at org.apache.hadoop.fs.ChecksumFileSystem$FsOperation.run(ChecksumFileSystem.java:489)
	at org.apache.hadoop.fs.ChecksumFileSystem.setPermission(ChecksumFileSystem.java:511)
	at org.apache.hadoop.util.DiskChecker.mkdirsWithExistsAndPermissionCheck(DiskChecker.java:234)
	at org.apache.hadoop.util.DiskChecker.checkDirInternal(DiskChecker.java:141)
	at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:116)
	at org.apache.hadoop.hdfs.server.datanode.StorageLocation.check(StorageLocation.java:239)
	at org.apache.hadoop.hdfs.server.datanode.StorageLocation.check(StorageLocation.java:52)
	at org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker$1.call(ThrottledAsyncChecker.java:142)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
2021-01-02 20:18:05,388 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in secureMain
org.apache.hadoop.util.DiskChecker$DiskErrorException: Too many failed volumes - current valid volumes: 0, volumes configured: 1, volumes failed: 1, volume failures tolerated: 0
	at org.apache.hadoop.hdfs.server.datanode.checker.StorageLocationChecker.check(StorageLocationChecker.java:231)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2804)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2719)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2761)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2905)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2929)

Solution: similar to the problem above, a permissions issue: the DataNode cannot start because it cannot use its data directory.
drwxr-xr-x. 2 root root 6 Jan 2 20:17 data
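Note that the failure here is EPERM on chmod rather than a plain write error: the DataNode tries to chmod its storage directory to dfs.datanode.data.dir.perm (700 by default), and only the directory's owner may chmod it. So the same ownership fix applies (a sketch, assuming dfs.datanode.data.dir is /data/hdata/dfs/data):

sudo chown -R hadoop:hadoop /data/hdata/dfs/data
hdfs --daemon start datanode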

3. Missing JARs or "main class not found"

Caused by: java.lang.NoClassDefFoundError: javax/activation/DataSource
	at com.sun.xml.bind.v2.model.impl.RuntimeBuiltinLeafInfoImpl.<clinit>(RuntimeBuiltinLeafInfoImpl.java:457)
	at com.sun.xml.bind.v2.model.impl.RuntimeTypeInfoSetImpl.<init>(RuntimeTypeInfoSetImpl.java:65)
	at com.sun.xml.bind.v2.model.impl.RuntimeModelBuilder.createTypeInfoSet(RuntimeModelBuilder.java:133)
	at com.sun.xml.bind.v2.model.impl.RuntimeModelBuilder.createTypeInfoSet(RuntimeModelBuilder.java:85)
	at com.sun.xml.bind.v2.model.impl.ModelBuilder.<init>(ModelBuilder.java:156)
	at com.sun.xml.bind.v2.model.impl.RuntimeModelBuilder.<init>(RuntimeModelBuilder.java:93)
	at com.sun.xml.bind.v2.runtime.JAXBContextImpl.getTypeInfoSet(JAXBContextImpl.java:473)
	at com.sun.xml.bind.v2.runtime.JAXBContextImpl.<init>(JAXBContextImpl.java:319)
	at com.sun.xml.bind.v2.runtime.JAXBContextImpl$JAXBContextBuilder.build(JAXBContextImpl.java:1170)
	at com.sun.xml.bind.v2.ContextFactory.createContext(ContextFactory.java:145)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:262)
	at javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:249)
	at javax.xml.bind.ContextFinder.find(ContextFinder.java:456)
	at javax.xml.bind.JAXBContext.newInstance(JAXBContext.java:656)
	at com.sun.jersey.api.json.JSONJAXBContext.<init>(JSONJAXBContext.java:255)
	at org.apache.hadoop.yarn.server.resourcemanager.webapp.JAXBContextResolver.<init>(JAXBContextResolver.java:69)
	at org.apache.hadoop.yarn.server.resourcemanager.webapp.JAXBContextResolver$$FastClassByGuice$$6a7be7f6.newInstance(<generated>)
	at com.google.inject.internal.cglib.reflect.$FastConstructor.newInstance(FastConstructor.java:40)
	at com.google.inject.internal.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:61)
	at com.google.inject.internal.ConstructorInjector.provision(ConstructorInjector.java:105)
	at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:85)
	at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:267)
	at com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46)
	at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1103)
	at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
	at com.google.inject.internal.SingletonScope$1.get(SingletonScope.java:145)
	at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:41)
	at com.google.inject.internal.InjectorImpl$2$1.call(InjectorImpl.java:1016)
	at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1092)
	at com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:1012)
	... 54 more
Caused by: java.lang.ClassNotFoundException: javax.activation.DataSource
	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
	... 88 more

Solution: https://blog.csdn.net/slopop12/article/details/108710328
This kind of problem is fairly common: whatever is missing, add it.
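For background, javax.activation was removed from the JDK in Java 11, so on JDK 11.0.9 the ResourceManager web app can no longer find javax.activation.DataSource. One common remedy (a sketch; the jar version and target directory are assumptions, and the linked post may do it differently) is to drop a standalone activation jar into YARN's lib directory:

cd ${HADOOP_HOME}/share/hadoop/yarn/lib
wget https://repo1.maven.org/maven2/javax/activation/javax.activation-api/1.2.0/javax.activation-api-1.2.0.jar
yarn --daemon stop resourcemanager && yarn --daemon start resourcemanager   # restart so the jar is picked up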

[2021-01-03 14:52:02.446]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.v2.app.MRAppMaster


[2021-01-03 14:52:02.447]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.v2.app.MRAppMaster

Solution: https://blog.csdn.net/qq_41684957/article/details/81710190
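The usual cause is that the AM container's classpath does not include the MapReduce jars. A common fix (a sketch using standard Hadoop 3 properties; whether the linked post uses this exact variant is an assumption) is to paste the output of hadoop classpath into mapred-site.xml on every node:

hadoop classpath    # copy the printed classpath
# then add it to etc/hadoop/mapred-site.xml:
#   <property>
#     <name>mapreduce.application.classpath</name>
#     <value>...paste the hadoop classpath output here...</value>
#   </property>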

4. Directory created as a file

hdfs dfs -put /tmp/wc /tmp
org.apache.hadoop.fs.ParentNotDirectoryException: /tmp (is not a directory)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkIsDirectory(FSPermissionChecker.java:638)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkSimpleTraverse(FSPermissionChecker.java:629)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:604)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkTraverse(FSDirectory.java:1801)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkTraverse(FSDirectory.java:1819)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.resolvePath(FSDirectory.java:679)
	at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:50)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3192)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1157)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:714)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:527)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1015)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:943)
	at java.base/java.security.AccessController.doPrivileged(Native Method)
	at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2943)

Solution: run hdfs dfs -mkdir /tmp before hdfs dfs -put /tmp/wc /tmp; otherwise /tmp gets created as a plain file rather than a directory.
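The full sequence, including cleanup if /tmp has already been created as a file (a sketch; -p just makes mkdir idempotent):

hdfs dfs -rm /tmp            # only needed if /tmp already exists as a file
hdfs dfs -mkdir -p /tmp
hdfs dfs -put /tmp/wc /tmp
hdfs dfs -ls /tmp            # /tmp/wc should now be listed under the directory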

5. Java NullPointerException

2021-01-03 15:11:50,386 INFO mapreduce.Job:  map 0% reduce 0%
2021-01-03 15:11:50,408 INFO mapreduce.Job: Job job_1609657572378_0002 failed with state FAILED due to: Application application_1609657572378_0002 failed 2 times due to AM Container for appattempt_1609657572378_0002_000002 exited with  exitCode: 1
Failing this attempt.Diagnostics: [2021-01-03 15:11:49.602]Exception from container-launch.
Container id: container_1609657572378_0002_02_000001
Exit code: 1

[2021-01-03 15:11:49.606]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.


[2021-01-03 15:11:49.606]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.


For more detailed output, check the application tracking page: http://hadoop001:8088/cluster/app/application_1609657572378_0002 Then click on links to logs of each attempt.
. Failing the application.

The error shown in the terminal is not very informative; checking the history logs in the web UI revealed a Java NullPointerException.
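If log aggregation is enabled (an assumption about the setup), the same stack trace can also be pulled from the command line, using the application ID from the output above:

yarn logs -applicationId application_1609657572378_0002 | less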

2021-01-03 15:06:53,508 ERROR [Listener at 0.0.0.0/45119] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.NullPointerException
	at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.register(RMCommunicator.java:178)
	at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.serviceStart(RMCommunicator.java:122)
	at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.serviceStart(RMContainerAllocator.java:280)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.serviceStart(MRAppMaster.java:979)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
	at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1293)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$6.run(MRAppMaster.java:1761)
	at java.base/java.security.AccessController.doPrivileged(Native Method)
	at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1757)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1691)
Caused by: java.lang.NullPointerException
	at org.apache.hadoop.mapreduce.v2.app.client.MRClientService.getHttpPort(MRClientService.java:177)
	at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.register(RMCommunicator.java:159)
	... 14 more

Solution: https://blog.csdn.net/yj2434/article/details/107817519
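A hedged observation (my reading of the stack trace, not taken from the linked post): MRClientService.getHttpPort throws an NPE when the AM's embedded web app object is null, i.e. the web app failed to start earlier in the same run. On JDK 11 that earlier failure is often the same missing javax.activation classes from section 3, so it is worth grepping the AM log for it:

yarn logs -applicationId application_1609657572378_0002 | grep -B2 -A8 NoClassDefFoundError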
