Planning the machines and roles
This HA deployment uses three machines: hadoop1, hadoop2, and hadoop3.
The machines and their role assignments are as follows:
| hadoop1 | hadoop2 | hadoop3 |
|---|---|---|
| zookeeper1 | zookeeper2 | zookeeper3 |
| journalnode1 | journalnode2 | journalnode3 |
| NameNode1 | NameNode2 | |
| zkfc1 | zkfc2 | |
| DataNode1 | DataNode2 | DataNode3 |
| ResourceManager1 | ResourceManager2 | |
| NodeManager1 | NodeManager2 | NodeManager3 |
Hadoop configuration files
1. core-site.xml
```xml
<configuration>
    <!-- Set the HDFS nameservice to ns1 -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://ns1/</value>
    </property>
    <!-- Hadoop temporary directory -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/data/hadoop-2.7.5/data/</value>
    </property>
    <!-- ZooKeeper quorum addresses -->
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>hadoop1:2181,hadoop2:2181,hadoop3:2181</value>
    </property>
</configuration>
```
2. hdfs-site.xml
```xml
<configuration>
    <!-- The HDFS nameservice is ns1; this must match core-site.xml -->
    <property>
        <name>dfs.nameservices</name>
        <value>ns1</value>
    </property>
    <!-- ns1 has two NameNodes, nn1 and nn2 -->
    <property>
        <name>dfs.ha.namenodes.ns1</name>
        <value>nn1,nn2</value>
    </property>
    <!-- RPC address of nn1 -->
    <property>
        <name>dfs.namenode.rpc-address.ns1.nn1</name>
        <value>hadoop1:9000</value>
    </property>
    <!-- HTTP address of nn1 -->
    <property>
        <name>dfs.namenode.http-address.ns1.nn1</name>
        <value>hadoop1:50070</value>
    </property>
    <!-- RPC address of nn2 -->
    <property>
        <name>dfs.namenode.rpc-address.ns1.nn2</name>
        <value>hadoop2:9000</value>
    </property>
    <!-- HTTP address of nn2 -->
    <property>
        <name>dfs.namenode.http-address.ns1.nn2</name>
        <value>hadoop2:50070</value>
    </property>
    <!-- Where the NameNode shared edit log is stored on the JournalNodes -->
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://hadoop1:8485;hadoop2:8485;hadoop3:8485/ns1</value>
    </property>
    <!-- Local directory where each JournalNode stores its data -->
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/data/hadoop-2.7.5/journaldata</value>
    </property>
    <!-- Enable automatic NameNode failover -->
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
    <!-- Proxy provider that clients use to find the active NameNode -->
    <property>
        <name>dfs.client.failover.proxy.provider.ns1</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <!-- Fencing methods; multiple methods are separated by newlines, one per line -->
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>
            sshfence
            shell(/bin/true)
        </value>
    </property>
    <!-- sshfence requires passwordless SSH -->
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/hadoop/.ssh/id_rsa</value>
    </property>
    <!-- sshfence connection timeout (ms) -->
    <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
    </property>
</configuration>
```
3. mapred-site.xml
```xml
<configuration>
    <!-- Run MapReduce on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
```
4. yarn-site.xml
```xml
<configuration>
    <!-- Enable ResourceManager HA -->
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>
    <!-- Cluster id for the RM pair -->
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>yrc</value>
    </property>
    <!-- Logical ids of the two RMs -->
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>
    <!-- Hostname of each RM -->
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>hadoop1</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>hadoop2</value>
    </property>
    <!-- ZooKeeper quorum addresses -->
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>hadoop1:2181,hadoop2:2181,hadoop3:2181</value>
    </property>
    <!-- Auxiliary service needed for the MapReduce shuffle -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
```
Configuring slaves
Only /data/hadoop-2.7.5/etc/hadoop/slaves on hadoop1 needs to be modified:
```
hadoop1
hadoop2
hadoop3
```
Configuring passwordless SSH login
ssh-keygen -t rsa
ssh-copy-id hadoop1
Repeat the ssh-copy-id step for hadoop2 and hadoop3 so that every node can be reached without a password.
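The key distribution above can be sketched as a short loop; this assumes the `hadoop` user and the three hostnames planned earlier:

```shell
# Generate an RSA key pair (default path, empty passphrase for non-interactive use)
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

# Copy the public key to every node, including the local one,
# so the start scripts and sshfence can log in without a password
for host in hadoop1 hadoop2 hadoop3; do
    ssh-copy-id "$host"
done
```

If fencing should also work after a failover in the other direction, set up the same keys on hadoop2 as well.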
Startup
- First, start ZooKeeper on all of hadoop1, hadoop2, and hadoop3:
  ./zkServer.sh start    # start the server
  ./zkServer.sh status   # check its status
- Start the JournalNodes on all of hadoop1, hadoop2, and hadoop3:
  hadoop-daemon.sh start journalnode
- Format HDFS. On hadoop1, run:
  hadoop namenode -format
  To ensure the two NameNodes start from the same initial fsimage, copy the data directory over directly:
  scp -r /data/hadoop-2.7.5/data/ hadoop2:/data/hadoop-2.7.5/
  or, alternatively, run on hadoop2:
  hdfs namenode -bootstrapStandby
- Format the ZKFC state. On hadoop1, run:
  hdfs zkfc -formatZK
  This only creates the corresponding znodes in ZooKeeper, so it needs to be run just once.
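To confirm that the znodes were created, you can query ZooKeeper with its CLI; a quick check, assuming zkCli.sh from the ZooKeeper installation is on the PATH:

```shell
# After formatZK, the HA root znode should contain an entry for ns1
zkCli.sh -server hadoop1:2181 ls /hadoop-ha
```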
- Start HDFS
On hadoop1, run:
[hadoop@hadoop1 hadoop]$ start-dfs.sh
Starting namenodes on [hadoop1 hadoop2]
hadoop1: namenode running as process 3826. Stop it first.
hadoop2: namenode running as process 2965. Stop it first.
hadoop1: datanode running as process 4024. Stop it first.
hadoop3: datanode running as process 2928. Stop it first.
hadoop2: datanode running as process 3036. Stop it first.
Starting journal nodes [hadoop1 hadoop2 hadoop3]
hadoop2: journalnode running as process 2657. Stop it first.
hadoop3: journalnode running as process 2661. Stop it first.
hadoop1: journalnode running as process 3077. Stop it first.
Starting ZK Failover Controllers on NN hosts [hadoop1 hadoop2]
hadoop2: zkfc running as process 3171. Stop it first.
hadoop1: zkfc running as process 4461. Stop it first.
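With HDFS up, you can verify that exactly one NameNode is active, using the nn1/nn2 ids from hdfs-site.xml:

```shell
# One of these should print "active", the other "standby"
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
```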
- Start the YARN framework
On hadoop1, run:
[hadoop@hadoop1 hadoop]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /data/hadoop-2.7.5/logs/yarn-hadoop-resourcemanager-hadoop1.out
hadoop3: starting nodemanager, logging to /data/hadoop-2.7.5/logs/yarn-hadoop-nodemanager-hadoop3.out
hadoop2: starting nodemanager, logging to /data/hadoop-2.7.5/logs/yarn-hadoop-nodemanager-hadoop2.out
hadoop1: starting nodemanager, logging to /data/hadoop-2.7.5/logs/yarn-hadoop-nodemanager-hadoop1.out
Start the second ResourceManager on hadoop2:
[hadoop@hadoop2 hadoop]$ yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /data/hadoop-2.7.5/logs/yarn-hadoop-resourcemanager-hadoop2.out
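The RM pair can be checked the same way, using the rm1/rm2 ids from yarn-site.xml:

```shell
# One of these should print "active", the other "standby"
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2
```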
Everything is now running; verify with jps on each node:
[hadoop@hadoop1 hadoop]$ jps
3826 NameNode
3077 JournalNode
2967 QuorumPeerMain
4024 DataNode
5692 ResourceManager
4461 DFSZKFailoverController
5821 NodeManager
6223 Jps
[hadoop@hadoop2 hadoop]$ jps
2657 JournalNode
3890 Jps
3171 DFSZKFailoverController
2965 NameNode
3782 ResourceManager
3640 NodeManager
2585 QuorumPeerMain
3036 DataNode
[hadoop@hadoop3 hadoop]$ jps
2928 DataNode
2661 JournalNode
2566 QuorumPeerMain
3270 NodeManager
3403 Jps
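A simple way to exercise automatic failover is to kill the active NameNode and watch the standby take over. A sketch, assuming nn1 on hadoop1 is currently active:

```shell
# On hadoop1: kill the active NameNode (pid as shown by jps)
kill -9 $(jps | awk '/NameNode$/ {print $1}')

# A few seconds later, the former standby should report "active"
hdfs haadmin -getServiceState nn2

# Bring the killed NameNode back; it rejoins as standby
hadoop-daemon.sh start namenode
```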
Testing with a file-upload example
The cluster configuration files need to be copied onto the classpath of the test code.
```java
package cn.itcast.hadoop.hdfs;

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsUtailHA {
    public static void main(String[] args) throws Exception {
        // core-site.xml / hdfs-site.xml on the classpath supply the ns1 HA settings
        Configuration conf = new Configuration();
        // Connect to the logical nameservice (not a specific NameNode) as user "hadoop"
        FileSystem fs = FileSystem.get(new URI("hdfs://ns1/"), conf, "hadoop");
        // Upload a local file to the root of the HA file system
        fs.copyFromLocalFile(new Path("/home/hadoop/jdk-8u161-linux-x64.tar.gz"), new Path("hdfs://ns1/"));
        fs.close();
    }
}
```
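After running the class, the upload can be verified from any node; the client resolves the logical ns1 nameservice to whichever NameNode is currently active:

```shell
# The uploaded tarball should be listed at the root of ns1
hdfs dfs -ls hdfs://ns1/
```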