I have been working through big-data tutorials lately and set up a local hadoop-ha cluster (hadoop: 2.10.0, hive: 1.2.2). I ran into the following problems and am recording them here.

Problem 1: MR jobs fail after the Hadoop cluster is switched to HA
Before HA was enabled, MR jobs ran fine. After enabling HA, running an HQL statement that triggers an MR job fails. The logs show the following error:
2020-03-22 00:12:05,577 ERROR [Listener at 0.0.0.0/37442] org.apache.hadoop.mapreduce.v2.app.client.MRClientService: Webapps failed to start. Ignoring for now:
java.lang.NullPointerException
at org.apache.hadoop.util.StringUtils.join(StringUtils.java:956)
at org.apache.hadoop.yarn.server.webproxy.amfilter.AmFilterInitializer.initFilter(AmFilterInitializer.java:74)
at org.apache.hadoop.http.HttpServer2.initializeWebServer(HttpServer2.java:463)
at org.apache.hadoop.http.HttpServer2.<init>(HttpServer2.java:409)
at org.apache.hadoop.http.HttpServer2.<init>(HttpServer2.java:112)
at org.apache.hadoop.http.HttpServer2$Builder.build(HttpServer2.java:333)
at org.apache.hadoop.yarn.webapp.WebApps$Builder.build(WebApps.java:315)
at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:401)
at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:397)
at org.apache.hadoop.mapreduce.v2.app.client.MRClientService.serviceStart(MRClientService.java:143)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1272)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$5.run(MRAppMaster.java:1746)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1742)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1673)
The log also shows the failure propagating through at org.apache.hadoop.mapreduce.v2.app.client.MRClientService.getHttpPort(MRClientService.java:177)
at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.register(RMCommunicator.java:156). After some searching I found the cause: with HA enabled, yarn-site.xml needs the following additional properties:
<property>
  <name>yarn.resourcemanager.webapp.address.rm1</name>
  <value>rm1-hostname:8088</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm2</name>
  <value>rm2-hostname:8088</value>
</property>
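For context, these webapp addresses attach to the RM ids that the HA settings in yarn-site.xml declare. A typical minimal HA block looks like the sketch below; the host names and cluster id are placeholders you would replace with your own:

```xml
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>yarn-cluster</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>rm1-hostname</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>rm2-hostname</value>
</property>
```

The `.rm1` / `.rm2` suffixes on the webapp addresses must match the ids listed in yarn.resourcemanager.ha.rm-ids, and both ResourceManagers need to be restarted for the change to take effect.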
Problem 2: after starting hiveserver2, calling MR from beeline fails
The exception:
java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: hdfs is not allowed to impersonate hdfs
at org.apache.hadoop.hive.ql.Context.getScratchDir(Context.java:285)
at org.apache.hadoop.hive.ql.Context.getMRScratchDir(Context.java:328)
at org.apache.hadoop.hive.ql.Context.getMRTmpPath(Context.java:389)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:225)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1676)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1435)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1218)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1082)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1077)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:154)
at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:71)
at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:206)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:218)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
From the exception, the hdfs account is not allowed to impersonate the hdfs user. According to posts online and the official docs, the cause is that no proxyuser rules for the account are configured in core-site.xml.
Official docs: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Superusers.html
So I added the following (my Hadoop cluster runs as the hadoop user, so that is the user in my settings; substitute your own run user):
<property>
  <name>hadoop.proxyuser.hadoop.groups</name>
  <value>*</value>
  <description>Allow the superuser hadoop to impersonate members of any group</description>
</property>
<property>
  <name>hadoop.proxyuser.hadoop.hosts</name>
  <value>*</value>
  <description>The superuser hadoop can connect from any host to impersonate a user</description>
</property>
Then refresh the NameNode's user/group mappings.
Without HA, use:
hdfs dfsadmin -refreshSuperUserGroupsConfiguration
With HA, refresh each NameNode explicitly:
hdfs dfsadmin -fs hdfs://namenode1-hostname -refreshSuperUserGroupsConfiguration
hdfs dfsadmin -fs hdfs://namenode2-hostname -refreshSuperUserGroupsConfiguration
Then refresh the YARN cluster's user mappings as well:
yarn rmadmin -refreshSuperUserGroupsConfiguration
That felt like a sure fix, so I immediately tried again. Still broken, same error. Frustrating. One thing stood out, though: my jobs run as the hadoop user, but the error names hdfs, the highest-privilege account. Since the users did not match, I started there and set export HADOOP_USER_NAME=hadoop in /etc/profile.
Even after running source it did not take effect, so I instead tried adding proxyuser rules for the hdfs account in core-site.xml as well.
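The extra rules for the hdfs account mirror the ones above; the property names follow the same hadoop.proxyuser.&lt;user&gt;.groups / .hosts pattern from the Superusers doc:

```xml
<property>
  <name>hadoop.proxyuser.hdfs.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hdfs.hosts</name>
  <value>*</value>
</property>
```

As with the earlier change, the refresh commands (hdfs dfsadmin / yarn rmadmin -refreshSuperUserGroupsConfiguration) need to be run again after editing core-site.xml.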
That finally cleared the error, but then an HDFS file-permission problem appeared. That one is much simpler: set the path from the error message to 777, i.e. hadoop fs -chmod -R 777 <path without permission>.
After that everything ran normally.