ZooKeeper Cluster
A ZooKeeper cluster is needed first; see the earlier article: ZooKeeper Cluster.
Hadoop Configuration
core-site.xml:
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
<!-- Hadoop temporary directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/hadoop-2.8.0/tmp</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>4096</value>
</property>
<!-- ZooKeeper quorum addresses -->
<property>
<name>ha.zookeeper.quorum</name>
<value>192.168.56.1:2181,192.168.56.101:2181,192.168.56.102:2181</value>
</property>
<property>
<name>ha.zookeeper.session-timeout.ms</name>
<value>3000</value>
</property>
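With fs.defaultFS set to the logical nameservice, HDFS clients address the cluster as hdfs://mycluster rather than a single NameNode host. A minimal sketch of checking this once the cluster is up (the /tmp/ha-test path is just an arbitrary example):
bin/hdfs dfs -mkdir -p hdfs://mycluster/tmp/ha-test
bin/hdfs dfs -ls hdfs://mycluster/tmp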
hdfs-site.xml:
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>
<!-- mycluster has two NameNodes: nn1 and nn2 -->
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value>
</property>
<!-- RPC address of nn1 -->
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>192.168.56.1:9000</value>
</property>
<!-- RPC address of nn2 -->
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>192.168.56.101:9000</value>
</property>
<!-- HTTP address of nn1 -->
<property>
<name>dfs.namenode.http-address.mycluster.nn1</name>
<value>192.168.56.1:50070</value>
</property>
<!-- HTTP address of nn2 -->
<property>
<name>dfs.namenode.http-address.mycluster.nn2</name>
<value>192.168.56.101:50070</value>
</property>
<!-- Location of the NameNode shared edits (metadata) on the JournalNodes -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://192.168.56.1:8485;192.168.56.101:8485;192.168.56.102:8485/mycluster</value>
</property>
<!-- Local directory where each JournalNode stores its data -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/opt/hadoop-2.8.0/tmp/journal</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- Failover proxy provider used by clients to locate the active NameNode -->
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- Fencing methods; multiple methods are separated by newlines, one per line -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>
sshfence
shell(/bin/true)
</value>
</property>
<!-- The sshfence method requires passwordless SSH login -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>~/.ssh/id_rsa</value>
</property>
<!-- Timeout (ms) for the sshfence method -->
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>
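sshfence works by logging into the other NameNode over SSH and killing the stale process, so the two NameNode machines need passwordless SSH to each other with the key configured in dfs.ha.fencing.ssh.private-key-files. A rough sketch of setting that up (the root user is only an assumption; use whichever account runs the NameNodes):
ssh-keygen -t rsa -f ~/.ssh/id_rsa -N ""
ssh-copy-id root@192.168.56.1
ssh-copy-id root@192.168.56.101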
<!-- Storage location of the NameNode namespace (fsimage) -->
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///opt/hadoop-2.8.0/hdfs/name</value>
</property>
<!-- DataNode data storage location -->
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///opt/hadoop-2.8.0/hdfs/data</value>
</property>
<!-- Number of data replicas -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
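The local directories referenced above all come from this configuration, so they can be created up front on every node before the first format (Hadoop creates most of them itself on first start, so this is mainly a sanity check):
mkdir -p /opt/hadoop-2.8.0/tmp/journal
mkdir -p /opt/hadoop-2.8.0/hdfs/name
mkdir -p /opt/hadoop-2.8.0/hdfs/data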
mapred-site.xml:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- MapReduce JobHistory Server address; default port 10020 -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>0.0.0.0:10020</value>
</property>
<!-- MapReduce JobHistory Server web UI address; default port 19888 -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>0.0.0.0:19888</value>
</property>
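The JobHistory Server is not started by start-yarn.sh; if the addresses above are to be used it has to be launched separately, for example with the standard Hadoop 2.x daemon script:
sbin/mr-jobhistory-daemon.sh start historyserver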
yarn-site.xml:
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<!-- Enable automatic recovery of RM state -->
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<!-- Cluster id of the RM pair -->
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yrc</value>
</property>
<!-- ResourceManager ids -->
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<!-- Hostnames of the two RMs -->
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>192.168.56.1</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>192.168.56.101</value>
</property>
<!-- <property> <name>yarn.resourcemanager.ha.id</name> <value>rm1</value>
<description>If we want to launch more than one RM on a single node, we need
this configuration</description> </property> -->
<!-- ZooKeeper cluster addresses -->
<property>
<name>ha.zookeeper.quorum</name>
<value>192.168.56.1:2181,192.168.56.101:2181,192.168.56.102:2181</value>
</property>
<!-- ZooKeeper connection address for the RM state store -->
<property>
<name>yarn.resourcemanager.zk-state-store.address</name>
<value>192.168.56.1:2181,192.168.56.101:2181,192.168.56.102:2181</value>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>192.168.56.1:2181,192.168.56.101:2181,192.168.56.102:2181</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.zk-base-path</name>
<value>/yarn-leader-election</value>
<description>Optional setting. The default value is /yarn-leader-election</description>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
vi etc/hadoop/slaves:
192.168.56.1
192.168.56.101
192.168.56.102
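All of the files above (core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, slaves) must be identical on every node. A rough sketch of pushing the configuration directory from 01 to the other machines (assuming the same /opt/hadoop-2.8.0 install path and the root account on each host):
scp -r /opt/hadoop-2.8.0/etc/hadoop root@192.168.56.101:/opt/hadoop-2.8.0/etc/
scp -r /opt/hadoop-2.8.0/etc/hadoop root@192.168.56.102:/opt/hadoop-2.8.0/etc/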
Environment Variables
Add JAVA_HOME to hadoop-env.sh and yarn-env.sh:
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_121.jdk/Contents/Home
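Optionally, exporting the Hadoop directories in the shell profile makes the bin/ and sbin/ commands used below available without full paths; a small sketch assuming the /opt/hadoop-2.8.0 install path:
export HADOOP_HOME=/opt/hadoop-2.8.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin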
Starting the Hadoop cluster:
1) Start the ZooKeeper cluster: run zkServer.sh start on each of the 3 ZooKeeper servers
2) Start the JournalNodes: on the first machine run: sbin/hadoop-daemons.sh start journalnode
3) Format ZKFC so the HA znodes are created in ZooKeeper: on machine 01 run: hdfs zkfc -formatZK
4) Format HDFS: on machine 01: hdfs namenode -format
5) Start the NameNodes: on 01: sbin/hadoop-daemon.sh start namenode
on 02: bin/hdfs namenode -bootstrapStandby
then on 02: sbin/hadoop-daemon.sh start namenode
6) Start the DataNodes: on 01: sbin/hadoop-daemons.sh start datanode
7) Start the YARN cluster: on 01: sbin/start-yarn.sh (with RM HA, the standby ResourceManager on 02 typically needs to be started separately: sbin/yarn-daemon.sh start resourcemanager)
8) Start ZKFC: on 01: sbin/hadoop-daemons.sh start zkfc (see the verification sketch below)
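Once everything is up, the HA state can be verified by checking which NameNode and ResourceManager are active; a quick sketch (nn1/nn2 and rm1/rm2 are the ids configured above, and jps simply lists the running Java daemons on the current machine):
jps
bin/hdfs haadmin -getServiceState nn1
bin/hdfs haadmin -getServiceState nn2
bin/yarn rmadmin -getServiceState rm1
bin/yarn rmadmin -getServiceState rm2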
Web UI
After startup, the following can be accessed:
http://192.168.56.1:50070
http://192.168.56.101:50070
http://192.168.56.1:8088