Installing a Single-Node Pseudo-Distributed CDH Hadoop Cluster

Up to now I had always installed three-node clusters; today I needed a single-node one. After the install, MapReduce jobs would not submit to YARN, and an afternoon of fiddling didn't fix it.

In MR1, jobs are submitted to the JobTracker; under YARN they should go to the ResourceManager. Instead, the job came up as a local job, and it turned out the following configuration was not taking effect:

  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
The setting below actually no longer exists under YARN, but after reading through the JobClient code I set it anyway, with the ResourceManager address as its value:

   <property>
        <name>mapred.job.tracker</name>
        <value>com3:8031</value>
   </property>

This time the client did reach the ResourceManager, but every submission ran into the "Unknown rpc kind RPC_WRITABLE" problem.

Inspection showed that the server side, i.e. the ResourceManager, registers each RPC kind it uses in a Map and can only handle RPC kinds that have been registered.

There are only two kinds in play here: Google's protobuf and Hadoop's Writable.

public class ProtobufRpcEngine implements RpcEngine {
  public static final Log LOG = LogFactory.getLog(ProtobufRpcEngine.class);
  
  static { // Register the rpcRequest deserializer for WritableRpcEngine 
    org.apache.hadoop.ipc.Server.registerProtocolEngine(
        RPC.RpcKind.RPC_PROTOCOL_BUFFER, RpcRequestWritable.class,
        new Server.ProtoBufRpcInvoker());
  }

But the server side only registers protobuf, so it cannot accept the Writable-kind messages the client uses when submitting a job, which causes the error above.

After reading the client code, I explicitly set the RPC engine for JobSubmissionProtocol, the protocol the client uses to submit jobs, to protobuf:

  <property>
    <name>rpc.engine.org.apache.hadoop.mapred.JobSubmissionProtocol</name>
    <value>org.apache.hadoop.ipc.ProtobufRpcEngine</value>
  </property>

This resulted in the following error:

Exception in thread "main" java.lang.NullPointerException
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.constructRpcRequest(ProtobufRpcEngine.java:138)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:193)
	at org.apache.hadoop.mapred.$Proxy10.getStagingAreaDir(Unknown Source)
	at org.apache.hadoop.mapred.JobClient.getStagingAreaDir(JobClient.java:1340)
	at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:102)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:954)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:948)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:948)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:566)
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:596)
	at mr.ref.WordCount.main(WordCount.java:90)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

Judging from the stack trace, the call does not even reach the ResourceManager this way: ProtobufRpcEngine makes assumptions about the protocol that the Writable-based JobSubmissionProtocol does not satisfy, and the invoker dies with a NullPointerException on the client side while constructing the RPC request. Either way the request cannot be handled.

Without that client-side engine setting, the error is the original one:

14/03/31 11:16:52 ERROR security.UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(java.io.IOException): Unknown rpc kind RPC_WRITABLE
Exception in thread "main" org.apache.hadoop.ipc.RemoteException(java.io.IOException): Unknown rpc kind RPC_WRITABLE
	at org.apache.hadoop.ipc.Client.call(Client.java:1238)
	at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:225)


Changing the client-side RPC engine was useless, so I looked at the server side instead. The default is:

<property>
	<name>yarn.ipc.rpc.class</name>
	<value>org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC</value>
</property>
I tried changing it to org.apache.hadoop.hbase.ipc.WritableRpcEngine and to other values, but every choice caused problems of its own, and there are several server-side protocol-related settings to juggle, so a different approach was needed.

So I went back to why mapreduce.framework.name was not taking effect. Tracing that setting led me to a JobClient whose init method differs from the JobClient in my Eclipse dependencies, and that is when I noticed this comment:

  /**
   * Connect to the default cluster
   * @param conf the job configuration.
   * @throws IOException
   */
  public void init(JobConf conf) throws IOException {
    setConf(conf);
    cluster = new Cluster(conf);   // Cluster chooses local vs. YARN submission based on mapreduce.framework.name
    clientUgi = UserGroupInformation.getCurrentUser();
  }

The JobClient I had actually been running with was still the MR1-era one: both /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-core-2.0.0-cdh4.5.0.jar and /usr/lib/hadoop-0.20-mapreduce/hadoop-core-2.0.0-mr1-cdh4.5.0.jar contain an org.apache.hadoop.mapred.JobClient, and only the former jar holds the YARN-era class.
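A quick way to confirm which jar you are actually getting the class from is simply to list the jar contents (a sanity check against this CDH 4.5.0 layout; adjust paths for other versions):

# Both jars contain org/apache/hadoop/mapred/JobClient.class
unzip -l /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-core-2.0.0-cdh4.5.0.jar | grep mapred/JobClient.class
unzip -l /usr/lib/hadoop-0.20-mapreduce/hadoop-core-2.0.0-mr1-cdh4.5.0.jar | grep mapred/JobClient.class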

After checking the CLASSPATH used when the job runs, I fixed it by editing /usr/lib/hadoop/libexec/hadoop-layout.sh, changing

HADOOP_MAPRED_HOME=${HADOOP_MAPRED_HOME:-"/usr/lib/hadoop-0.20-mapreduce"}

to

HADOOP_MAPRED_HOME=${HADOOP_MAPRED_HOME:-"/usr/lib/hadoop-mapreduce"}
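Before and after this edit it is worth looking at what the hadoop launcher actually puts on the classpath; a quick check along these lines (not part of any official procedure) makes the switch visible:

# After the fix, the mapreduce jars should come from /usr/lib/hadoop-mapreduce,
# not from /usr/lib/hadoop-0.20-mapreduce
hadoop classpath | tr ':' '\n' | grep mapreduce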

That picked up the right JobClient, but submitting a job still failed, and the cause was very hard to find: with YARN logging turned up to DEBUG there was still no ERROR on either the client or the server side. I come back to this problem below.


The whole detour above really boils down to an environment-variable setting. Only later, while installing the pseudo-distributed cluster, did I find that the CDH documentation already spells it out.

To submit jobs to YARN, set:

export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
To submit jobs to the (MR1) JobTracker, use the following, which is also the default:

export HADOOP_MAPRED_HOME=/usr/lib/hadoop-0.20-mapreduce
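A small sanity check before submitting a job; the config path below is the standard CDH /etc/hadoop/conf, adjust if yours differs:

# Both the environment variable and the framework setting should point at YARN
echo $HADOOP_MAPRED_HOME
grep -A 1 mapreduce.framework.name /etc/hadoop/conf/mapred-site.xml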

Back to the unresolved failure: after finishing the single-node install today, jobs still could not be submitted to YARN, and the problem was hard to track down; even the DEBUG logs contained no ERROR, only a WARN.

After submitting the job, the client console stopped printing anything. Here is the ResourceManager log from that moment; note the WARN and the FAILED state:

2014-03-31 19:50:50,870 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1396266549856_0001 State change from ACCEPTED to FAILED
2014-03-31 19:50:50,870 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.AppRemovedSchedulerEvent.EventType: APP_REMOVED
2014-03-31 19:50:50,870 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeCleanAppEvent.EventType: CLEANUP_APP
2014-03-31 19:50:50,870 DEBUG org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Processing com2:55147 of type CLEANUP_APP
2014-03-31 19:50:50,871 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: post-assignContainers
2014-03-31 19:50:50,871 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp: showRequests: application=application_1396266549856_0001 headRoom=memory: 6144 currentConsumption=0
2014-03-31 19:50:50,871 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp: showRequests: application=application_1396266549856_0001 request={Priority: 0, Capability: memory: 2048}
2014-03-31 19:50:50,871 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: Node after allocation com2:55147 resource = memory: 8192
2014-03-31 19:50:50,872 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: Application application_1396266549856_0001 requests cleared
2014-03-31 19:50:50,872 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.RMAppManagerEvent.EventType: APP_COMPLETED
2014-03-31 19:50:50,872 DEBUG org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: RMAppManager processing event for application_1396266549856_0001 of type APP_COMPLETED
************************************
2014-03-31 19:50:50,872 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root OPERATION=Application Finished - Failed TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App failed with state: FAILED PERMISSIONS=Application application_1396266549856_0001 failed 1 times due to AM Container for appattempt_1396266549856_0001_000001 exited with exitCode: 1 due to:
.Failing this attempt.. Failing the application. APPID=application_1396266549856_0001
2014-03-31 19:50:50,876 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary: appId=application_1396266549856_0001,name=word count,user=root,queue=default,state=FAILED,trackingUrl=com2:8088/proxy/application_1396266549856_0001/,appMasterHost=N/A,startTime=1396266647295,finishTime=1396266650870
************************************
2014-03-31 19:50:51,519 DEBUG org.apache.hadoop.ipc.Server:  got #55

The NodeManager log from the same time:

2014-03-31 19:50:50,528 WARN org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=root OPERATION=Container Finished - Failed   TARGET=ContainerImpl    RESULT=FAILURE  DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE    APPID=application_1396266549856_0001    CONTAINERID=container_1396266549856_0001_01_000001
2014-03-31 19:50:50,280 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [bash, /opt/data/hadoop/hadoop-yarn/nm-local-dir/usercache/root/appcache/application_1396266549856_0001/container_1396266549856_0001_01_000001/default_container_executor.sh]
2014-03-31 19:50:50,493 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from task is : 1
2014-03-31 19:50:50,493 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:
2014-03-31 19:50:50,494 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Processing container_1396266549856_0001_01_000001 of type UPDATE_DIAGNOSTICS_MSG
2014-03-31 19:50:50,494 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container container_1396266549856_0001_01_000001 completed with exit code 1
2014-03-31 19:50:50,495 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container exited with a non-zero exit code 1
2014-03-31 19:50:50,495 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerExitEvent.EventType: CONTAINER_EXITED_WITH_FAILURE
2014-03-31 19:50:50,495 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Processing container_1396266549856_0001_01_000001 of type CONTAINER_EXITED_WITH_FAILURE
2014-03-31 19:50:50,496 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1396266549856_0001_01_000001 transitioned from RUNNING to EXITED_WITH_FAILURE
2014-03-31 19:50:50,496 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncherEvent.EventType: CLEANUP_CONTAINER
2014-03-31 19:50:50,496 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1396266549856_0001_01_000001

I decided not to waste more time on this for now, because I then discovered that CDH already ships a ready-made configuration package, hadoop-conf-pseudo.x86_64; installing it just works. Here is a brief rundown of the installation steps.

Prepare the CDH repository configuration:

[cloudera-cdh4.2.1]
name=Cloudera's Distribution for Hadoop, Version 4.2.1
baseurl=http://archive-primary.cloudera.com/cdh4/redhat/6/x86_64/cdh/4.2.1/
gpgkey =  http://archive.cloudera.com/cdh4/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera
gpgcheck = 1
The physical host runs Fedora and needs these packages:

yum -y install createrepo yum-utils

Mirror the repository locally and build a local repo:
mkdir -p /var/www/cloudera-cdh4/cdh4/4.2.1/RPMS
reposync -p /var/www/cloudera-cdh4/cdh4/4.2.1/RPMS --repoid=cloudera-cdh4.2.1
createrepo -o /var/www/cloudera-cdh4/cdh4/4.2.1 /var/www/cloudera-cdh4/cdh4/4.2.1/RPMS
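If reposync and createrepo finished without errors, the repo metadata should now sit next to the RPMS directory; a quick check:

ls -l /var/www/cloudera-cdh4/cdh4/4.2.1/repodata/repomd.xml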

Expose the repository as a web directory so the virtual machines can install from it. My Apache is 2.4.7; the version shipped with RHEL 6.x may be older, and the access-control directives differ slightly.

On the 2.2.15 that ships with CentOS 6.4, the "Require all granted" line needs to be removed.

# cat /etc/httpd/conf.d/cloudera.conf
 
NameVirtualHost 192.168.3.1:80
<VirtualHost 192.168.3.1:80>
    DocumentRoot /var/www/cloudera-cdh4
    ServerName 192.168.3.1
    <Directory />
    Options All
    AllowOverride All
    Require all granted
    </Directory>
</VirtualHost>
The hosts where Hadoop is installed also need one more package, which is available on the ISO image:

yum -y install nc
Start httpd, then configure yum inside the virtual machine:

# cat /etc/yum.repos.d/cloudera-cdh4.2.1.repo 
[cloudera-cdh4.2.1]
name=cdh4.2.1
baseurl=http://192.168.3.1/cdh4/4.2.1/
gpgcheck = 0
enabled=1
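Before running yum it is worth checking from the virtual machine that the repository is actually reachable over HTTP (IP and path as configured above; httpd must already be running):

curl -s http://192.168.3.1/cdh4/4.2.1/repodata/repomd.xml | head -3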
Now Hadoop itself can be installed:

yum -y install hadoop.x86_64 hadoop-hdfs-namenode.x86_64 hadoop-hdfs-datanode.x86_64 
yum -y install hadoop-client.x86_64 hadoop-mapreduce.x86_64 hadoop-conf-pseudo.x86_64
yum -y install hadoop-yarn-resourcemanager.x86_64 hadoop-yarn-nodemanager.x86_64
A ready-made conf.pseudo is provided, which is a nice touch; point the alternatives system at it:
alternatives --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.pseudo/ 30
alternatives --set hadoop-conf /etc/hadoop/conf.pseudo/
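To confirm the switch took effect, check what the alternatives system now points at:

alternatives --display hadoop-conf
ls -l /etc/hadoop/conf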
Format the NameNode and start HDFS:
sudo -u hdfs hdfs namenode -format
/etc/init.d/hadoop-hdfs-namenode start
/etc/init.d/hadoop-hdfs-datanode start
Create the working directories:

sudo -u hdfs hadoop fs -mkdir -p /tmp/hadoop-yarn/staging/history/done_intermediate
sudo -u hdfs hadoop fs -chown -R mapred:mapred /tmp/hadoop-yarn/staging
sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
sudo -u hdfs hadoop fs -mkdir -p /var/log/hadoop-yarn
sudo -u hdfs hadoop fs -chown yarn:mapred /var/log/hadoop-yarn
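A quick look at what was just created before starting YARN (same paths and owners as above):

sudo -u hdfs hadoop fs -ls /tmp/hadoop-yarn/staging
sudo -u hdfs hadoop fs -ls /var/log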
Start YARN:

/etc/init.d/hadoop-yarn-resourcemanager start
/etc/init.d/hadoop-yarn-nodemanager start
Create the user directories:

sudo -u hdfs hadoop fs -mkdir /user/hdfs
sudo -u hdfs hadoop fs -chown hdfs /user/hdfs
sudo -u hdfs hadoop fs -mkdir /user/root
sudo -u hdfs hadoop fs -chown root /user/root
sudo -u hdfs hadoop fs -mkdir /user/mapred
sudo -u hdfs hadoop fs -chown mapred /user/mapred
sudo -u hdfs hadoop fs -mkdir /user/yarn
sudo -u hdfs hadoop fs -chown yarn /user/yarn
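And the freshly created home directories, each of which should be owned by its user:

sudo -u hdfs hadoop fs -ls /user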

Installation complete.

Now create a test user and try submitting a job; following the documentation, the user is called joe:

[root@com2 mr]# useradd joe
[root@com2 mr]# passwd joe
 
[root@com2 mr]# su joe
[joe@com2 mr]$ export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
[joe@com2 mr]# sudo -u hdfs hadoop fs -mkdir /user/joe
[joe@com2 mr]# sudo -u hdfs hadoop fs -chown joe /user/joe
 
[joe@com2 mr]$ hadoop fs -mkdir input
[joe@com2 mr]$ hadoop fs -put /etc/hadoop/conf/*.xml input
[joe@com2 mr]$ hadoop fs -ls input
Found 4 items
-rw-r--r--   1 joe supergroup       1461 2014-03-31 21:35 input/core-site.xml
-rw-r--r--   1 joe supergroup       1854 2014-03-31 21:35 input/hdfs-site.xml
-rw-r--r--   1 joe supergroup       1325 2014-03-31 21:35 input/mapred-site.xml
-rw-r--r--   1 joe supergroup       2262 2014-03-31 21:35 input/yarn-site.xml
Run a MapReduce job and check the result:
[joe@com2 mr]$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar grep input output23 'dfs[a-z.]+'
[joe@com2 mr]$ hadoop fs -ls output23
Found 2 items
-rw-r--r--   1 joe supergroup          0 2014-03-31 21:37 output23/_SUCCESS
-rw-r--r--   1 joe supergroup        150 2014-03-31 21:37 output23/part-r-00000
[joe@com2 mr]$
[joe@com2 mr]$ hadoop fs -cat output23/part-r-00000 | head
1   dfs.safemode.min.datanodes
1   dfs.safemode.extension
1   dfs.replication
1   dfs.namenode.name.dir
1   dfs.namenode.checkpoint.dir
1   dfs.datanode.data.dir



Reference:

http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Quick-Start/cdh4qs_topic_3_3.html

