Let's get straight to the hands-on steps; the theory can be filled in later.
Official site: hadoop.apache.org
A Java environment is required. Hosts used in this walkthrough:
server1:172.25.254.1
server2:172.25.254.2
server3:172.25.254.3
server4:172.25.254.4
server5:172.25.254.5
## 1. Environment setup
#### 1. The hadoop user: every host's hadoop user must have the same uid, gid, and password
Create the hadoop user on server1 through server5 (server1 shown here as the example), making sure the uid and gid match across all five hosts.
[root@server1 ~]# useradd -u 1000 hadoop
[root@server1 ~]# id hadoop
uid=1000(hadoop) gid=1000(hadoop) groups=1000(hadoop)
[root@server1 ~]# passwd hadoop ##set the password to westos
Changing password for user hadoop.
New password: westos
BAD PASSWORD: The password is shorter than 8 characters
Retype new password: westos
passwd: all authentication tokens updated successfully.
This command works as well:
[root@server5 ~]# echo westos | passwd --stdin hadoop
Repeat the same steps on the other hosts.
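Since the same user setup has to happen on every node, a small dry-run loop can print the per-node commands for review instead of typing them five times. This is a sketch, not the tutorial's method: it only echoes the commands, and actually running them (for example by piping to `sh`) would assume root SSH access to each host.

```shell
# Print (do not run) the user-creation commands for each node.
gen_user_cmds() {
  for i in 1 2 3 4 5; do
    echo "ssh root@172.25.254.$i 'useradd -u 1000 hadoop; echo westos | passwd --stdin hadoop'"
  done
}
gen_user_cmds
```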
#### 2. Hadoop environment setup
[root@server1 ~]# su - hadoop
[hadoop@server1 ~]$ ls
hadoop-2.7.3.tar.gz jdk-7u79-linux-x64.tar.gz
[hadoop@server1 ~]$ tar zxf hadoop-2.7.3.tar.gz
[hadoop@server1 ~]$ tar zxf jdk-7u79-linux-x64.tar.gz
[hadoop@server1 ~]$ ln -s hadoop-2.7.3 hadoop
[hadoop@server1 ~]$ ln -s jdk1.7.0_79/ java
After extracting, go into the configuration directory and set Hadoop's Java environment variable:
[hadoop@server1 ~]$ cd hadoop/etc/hadoop/
[hadoop@server1 hadoop]$ vim hadoop-env.sh
24 # The java implementation to use.
25 export JAVA_HOME=/home/hadoop/java
[hadoop@server1 hadoop]$ vim slaves
172.25.254.1
[hadoop@server1 hadoop]$ pwd
/home/hadoop/hadoop/etc/hadoop
### Standalone mode
Run one of the bundled Java example jobs.
[hadoop@server1 hadoop]$ cd ..
[hadoop@server1 etc]$ cd ..
[hadoop@server1 hadoop]$ pwd
/home/hadoop/hadoop
Create a directory named input under /home/hadoop/hadoop:
[hadoop@server1 hadoop]$ mkdir input
[hadoop@server1 hadoop]$ pwd
/home/hadoop/hadoop
[hadoop@server1 hadoop]$ cp etc/hadoop/*.xml input/
[hadoop@server1 hadoop]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount input/ output ##wordcount is the word-counting example; when the job finishes, an output directory is created automatically in the current directory. View the result with cat output/*
[hadoop@server1 hadoop]$ cat output/*
(long output omitted)
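What wordcount does at cluster scale, a classic shell pipeline does on one machine: split text into words, group identical ones, and count them. A minimal local sketch of the same counting logic (not Hadoop itself):

```shell
# Count word frequencies the way the wordcount example does,
# but with plain shell tools: one word per line, group, count, rank.
printf 'hadoop is fun and hadoop is fast\n' \
  | tr -s ' ' '\n' \
  | sort \
  | uniq -c \
  | sort -rn
```

The `sort | uniq -c` pair plays the role of MapReduce's shuffle and reduce phases.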
## 2. Pseudo-distributed deployment (the base environment above is now in place)
[hadoop@server1 hadoop]$ pwd
/home/hadoop/hadoop
[hadoop@server1 hadoop]$ vim etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://172.25.254.1:9000</value>
</property>
</configuration>
[hadoop@server1 hadoop]$ vim etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value> ##replica count for uploaded files; the default is 3
</property>
</configuration>
Set up passwordless SSH, including from each host to itself:
[hadoop@server1 hadoop]$ exit
logout
[root@server1 ~]# vim /etc/hosts
172.25.254.1 server1
172.25.254.2 server2
172.25.254.3 server3
172.25.254.4 server4
172.25.254.5 server5
[hadoop@server1 hadoop]$ ssh-keygen
[hadoop@server1 hadoop]$ ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub server1
[hadoop@server1 hadoop]$ scp -r /home/hadoop/.ssh/ hadoop@server2:~
[hadoop@server1 hadoop]$ scp -r /home/hadoop/.ssh/ hadoop@server3:~
[hadoop@server1 hadoop]$ scp -r /home/hadoop/.ssh/ hadoop@server4:~
[hadoop@server1 hadoop]$ scp -r /home/hadoop/.ssh/ hadoop@server5:~
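The four scp commands above follow one pattern, so they can be generated rather than typed. As a sketch, this loop only prints the commands (running them for real assumes the hostnames resolve via /etc/hosts and the hadoop password is known):

```shell
# Print the key-distribution commands for the remaining nodes.
gen_key_cmds() {
  for h in server2 server3 server4 server5; do
    echo "scp -r /home/hadoop/.ssh/ hadoop@$h:~"
  done
}
gen_key_cmds
```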
Format the filesystem; this generates some files under /tmp/:
[hadoop@server1 hadoop]$ bin/hdfs namenode -format ##check that the exit status is 0
18/05/18 08:55:42 INFO util.ExitUtil: Exiting with status 0
Start HDFS:
[hadoop@server1 hadoop]$ sbin/start-dfs.sh
Set the Java environment variable:
[hadoop@server1 ~]$ vim .bash_profile
PATH=$PATH:$HOME/.local/bin:$HOME/bin:/home/hadoop/java/bin
[hadoop@server1 ~]$ source .bash_profile
Check the running processes with jps:
[hadoop@server1 ~]$ jps
7015 DataNode
7193 SecondaryNameNode
6890 NameNode
7356 Jps
View in a browser:
http://172.25.254.1:50070
Create the user directory on HDFS:
[hadoop@server1 hadoop]$ bin/hdfs dfs -mkdir /user/
[hadoop@server1 hadoop]$ bin/hdfs dfs -mkdir /user/hadoop
Check in the browser:
http://172.25.254.1:50070/explorer.html#/
Put some files into the directory and view them in the browser:
[hadoop@server1 hadoop]$ bin/hdfs dfs -put etc/hadoop/*.xml /user/hadoop/
[hadoop@server1 hadoop]$ bin/hdfs dfs -ls /user/hadoop/
If the browser shows Replication as 3, the replica count of 1 you configured did not take effect: stop HDFS, check hdfs-site.xml, delete the files under /tmp/, reformat, and start HDFS again. Be aware that reformatting destroys all existing data.
## 3. Fully distributed mode (with additional nodes)
[hadoop@server1 hadoop]$ cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml ##newer versions may not ship this template; just edit the file directly
[hadoop@server1 hadoop]$ vim etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
[hadoop@server1 hadoop]$ vim etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
[hadoop@server1 hadoop]$ sbin/start-yarn.sh
[hadoop@server1 hadoop]$ jps
11472 NodeManager
10529 NameNode
10660 DataNode
11701 Jps
10827 SecondaryNameNode
11373 ResourceManager
Browser:
http://172.25.254.1:8088
So far the only storage node is server1 itself; add server2 and server3.
Stop YARN and DFS first:
[hadoop@server1 hadoop]$ sbin/stop-yarn.sh
[hadoop@server1 hadoop]$ sbin/stop-dfs.sh
[hadoop@server1 hadoop]$ jps
12334 Jps
Deploy the NFS shared filesystem:
[hadoop@server1 hadoop]$ systemctl status rpcbind ##check the rpcbind status; if it is missing, install it, start it, and enable it at boot. Here it is already installed and enabled
nfs-utils is also installed:
[hadoop@server1 hadoop]$ rpm -q nfs-utils
nfs-utils-1.3.0-0.54.el7.x86_64
[hadoop@server1 hadoop]$ exit
logout
[root@server1 ~]# vim /etc/exports
/home/hadoop 172.25.254.0/255.255.255.0(rw,anonuid=1000,anongid=1000)
[root@server1 ~]# exportfs -rv
exporting 172.25.254.0/255.255.255.0:/home/hadoop
[root@server1 ~]# systemctl start nfs
[root@server1 ~]# systemctl enable nfs
[root@server1 ~]# showmount -e
Export list for server1:
/home/hadoop 172.25.254.0/255.255.255.0
Mount on server2:
[root@server2 ~]# yum install rpcbind -y
[root@server2 ~]# systemctl start rpcbind
[root@server2 ~]# systemctl enable rpcbind
[root@server2 ~]# yum install nfs-utils -y
[root@server2 ~]# systemctl start nfs
[root@server2 ~]# systemctl enable nfs
[root@server2 ~]# mount 172.25.254.1:/home/hadoop/ /home/hadoop/
[root@server2 ~]# df
To mount at boot, add an entry to /etc/fstab.
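For reference, a possible /etc/fstab entry for this mount. The `_netdev` option, which delays mounting until the network is up, is my suggestion and worth verifying for your distribution:

```
172.25.254.1:/home/hadoop  /home/hadoop  nfs  defaults,_netdev  0 0
```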
server3
[root@server3 ~]# yum install rpcbind nfs-utils -y
[root@server3 ~]# systemctl start nfs rpcbind
[root@server3 ~]# systemctl enable nfs rpcbind
[root@server3 ~]# mount 172.25.254.1:/home/hadoop/ /home/hadoop/
[root@server3 ~]# df
ll -d /home/hadoop/
Check on server1, server2, and server3 that the owner and group are both hadoop.
Switch to the hadoop user:
[root@server1 ~]# su - hadoop
[hadoop@server1 ~]$ cd hadoop
[hadoop@server1 hadoop]$ vim etc/hadoop/slaves
172.25.254.2
172.25.254.3
[hadoop@server1 hadoop]$ vim etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
Reformat:
[hadoop@server1 hadoop]$ sbin/stop-all.sh
[hadoop@server1 hadoop]$ bin/hadoop namenode -format
18/05/18 10:06:03 INFO util.ExitUtil: Exiting with status 0
[hadoop@server1 hadoop]$ sbin/start-dfs.sh
[hadoop@server1 hadoop]$ sbin/start-yarn.sh
[hadoop@server1 hadoop]$ jps
14978 SecondaryNameNode
15801 Jps
13915 ResourceManager
13564 NameNode
On server2 and server3:
[root@server2 ~]# su - hadoop
[hadoop@server2 ~]$ jps
3120 NodeManager
3057 DataNode
3284 Jps
[root@server3 ~]# su - hadoop
Last login: Fri May 18 08:54:26 EDT 2018 from 172.25.254.1 on pts/2
[hadoop@server3 ~]$ jps
3251 Jps
3085 NodeManager
3022 DataNode
Check the node information in the browser.
Same recipe as before: upload some files to HDFS.
[hadoop@server3 hadoop]$ bin/hdfs dfs -mkdir /westos
[hadoop@server3 hadoop]$ bin/hdfs dfs -mkdir /westos/linux
[hadoop@server3 hadoop]$ bin/hdfs dfs -put etc/hadoop/*.xml /westos/linux
The browser now shows two datanodes.
## Manually adding datanode server4
[root@server4 ~]# yum install rpcbind nfs-utils -y
[root@server4 ~]# systemctl start rpcbind nfs
[root@server4 ~]# systemctl enable rpcbind nfs
[root@server4 ~]# mount 172.25.254.1:/home/hadoop/ /home/hadoop/
[root@server4 ~]# su - hadoop
[hadoop@server4 ~]$ df
Because the filesystem is shared over NFS, an edit made on any node is visible everywhere:
[hadoop@server4 ~]$ vim hadoop/etc/hadoop/slaves
172.25.254.2
172.25.254.3
172.25.254.4
[hadoop@server4 ~]$ hadoop/sbin/hadoop-daemon.sh start datanode
[hadoop@server4 ~]$ jps
6738 Jps
6649 DataNode
[hadoop@server4 ~]$ hadoop/sbin/yarn-daemon.sh start nodemanager
[hadoop@server4 ~]$ jps
6649 DataNode
6772 NodeManager
6881 Jps
[hadoop@server4 hadoop]$ ps ax
Check the data storage status from server1:
[hadoop@server1 hadoop]$ bin/hdfs dfsadmin -report
Configured Capacity: 25725763584 (23.96 GB)
Present Capacity: 13218832384 (12.31 GB)
DFS Remaining: 13218615296 (12.31 GB)
DFS Used: 217088 (212 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
-------------------------------------------------
Live datanodes (3):
Name: 172.25.254.3:50010 (server3)
Hostname: server3
Decommission Status : Normal
Configured Capacity: 8575254528 (7.99 GB)
DFS Used: 106496 (104 KB)
Non DFS Used: 4960321536 (4.62 GB)
DFS Remaining: 3614826496 (3.37 GB)
DFS Used%: 0.00%
DFS Remaining%: 42.15%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri May 18 10:44:26 EDT 2018
Name: 172.25.254.2:50010 (server2)
Hostname: server2
Decommission Status : Normal
Configured Capacity: 8575254528 (7.99 GB)
DFS Used: 106496 (104 KB)
Non DFS Used: 5846388736 (5.44 GB)
DFS Remaining: 2728759296 (2.54 GB)
DFS Used%: 0.00%
DFS Remaining%: 31.82%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri May 18 10:44:26 EDT 2018
Name: 172.25.254.4:50010 (server4)
Hostname: server4
Decommission Status : Normal
Configured Capacity: 8575254528 (7.99 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 1700220928 (1.58 GB)
DFS Remaining: 6875029504 (6.40 GB)
DFS Used%: 0.00%
DFS Remaining%: 80.17%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri May 18 10:44:28 EDT 2018
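The report is plain text, so quick sanity checks can be scripted. This sketch extracts the live-datanode count from a saved report line; the hard-coded sample string stands in for `bin/hdfs dfsadmin -report | grep 'Live datanodes'`, which needs the live cluster:

```shell
# Pull the number out of the "Live datanodes (N):" line of a dfsadmin report.
line='Live datanodes (3):'
live=$(echo "$line" | sed -n 's/.*Live datanodes (\([0-9]*\)).*/\1/p')
echo "$live"   # → 3
```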
Upload a large file:
[hadoop@server1 hadoop]$ dd if=/dev/zero of=bigfile bs=1M count=500
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 87.6928 s, 6.0 MB/s
[hadoop@server1 hadoop]$ bin/hdfs dfs -put bigfile
put: `.': No such file or directory ##fails: the HDFS home directory /user/hadoop no longer exists after the reformat
Create the HDFS directories again:
[hadoop@server1 hadoop]$ bin/hdfs dfs -mkdir /user
[hadoop@server1 hadoop]$ bin/hdfs dfs -mkdir /user/hadoop
[hadoop@server1 hadoop]$ bin/hdfs dfs -put bigfile
The command blocks while the data is written.
The browser view below shows the upload in progress.
Upload complete.
As shown above, with two replicas the 500 MB file occupies about 1 GB in total; the data is split into blocks that are spread across the three datanodes.
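The block math behind that figure, assuming the default 128 MB block size (check `dfs.blocksize` in your build): a 500 MB file splits into 4 blocks, and with dfs.replication=2 the cluster stores 8 block replicas, roughly 1000 MB in total.

```shell
# ceil(size / block_size) blocks, each stored dfs.replication times.
size_mb=500; block_mb=128; replication=2
blocks=$(( (size_mb + block_mb - 1) / block_mb ))
replicas=$(( blocks * replication ))
echo "$blocks blocks, $replicas block replicas"   # → 4 blocks, 8 block replicas
```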
Take server3 down and migrate its data away:
[hadoop@server1 hadoop]$ bin/hdfs dfsadmin -report ##shows the data storage status; inspect it yourself if interested
[hadoop@server1 hadoop]$ vim etc/hadoop/host-exclude ##this file does not exist yet, so create it; its name must match the path configured in the file below
172.25.254.3 ##IP of the datanode to decommission
[hadoop@server1 hadoop]$ vim etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value> ##replica count; setting it to 3 makes the effect even more visible, since there are three datanodes
</property>
<property>
<name>dfs.hosts.exclude</name>
<value>/home/hadoop/hadoop/etc/hadoop/host-exclude</value>
</property>
</configuration>
[hadoop@server1 hadoop]$ bin/hdfs dfsadmin -refreshNodes
Refresh nodes successful
Check the storage status: server3 shows Decommission in progress, meaning data is being migrated; wait a moment.
[hadoop@server1 hadoop]$ bin/hdfs dfsadmin -report
Once it shows Decommissioned, the migration is done, and the usage on server2 and server4 has grown accordingly.
[hadoop@server1 hadoop]$ bin/hdfs dfsadmin -report
After migration server2 holds 504 MB and server4 503 MB, a bit over 1 GB in total.
Before migration server2 and server4 together held 875 MB; the rest was on server3.
Stop HDFS and YARN on server3 so it leaves the cluster.
## 4. Hadoop high availability
[hadoop@server1 hadoop]$ sbin/stop-all.sh
[hadoop@server1 hadoop]$ exit
[root@server1 ~]# rm -fr /tmp/*
[root@server1 ~]# su - hadoop
-------------
[hadoop@server2 ~]$ exit
[root@server2 ~]# rm -fr /tmp/*
[root@server2 ~]# su - hadoop
-----------
[hadoop@server3 ~]$ exit
[root@server3 ~]# rm -fr /tmp/*
[root@server3 ~]# su - hadoop
---------
[hadoop@server4 ~]$ exit
[root@server4 ~]# rm -fr /tmp/*
[root@server4 ~]# su - hadoop
### Installing ZooKeeper
The tarball is visible on server1 through server5 thanks to the NFS share.
[hadoop@server2 ~]$ tar zxf zookeeper-3.4.9.tar.gz ##this may take a moment over NFS
[hadoop@server2 ~]$ cd zookeeper-3.4.9
[hadoop@server2 zookeeper-3.4.9]$ cd conf/
[hadoop@server2 conf]$ ls
configuration.xsl log4j.properties zoo_sample.cfg
[hadoop@server2 conf]$ cp zoo_sample.cfg zoo.cfg
[hadoop@server2 conf]$ vim zoo.cfg
server.1=172.25.254.2:2888:3888
server.2=172.25.254.3:2888:3888
server.3=172.25.254.4:2888:3888
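Three servers is the smallest sensible ensemble: ZooKeeper stays available as long as a majority is up, so n servers tolerate floor((n-1)/2) failures. A one-liner to confirm the arithmetic:

```shell
# An ensemble of n servers tolerates floor((n-1)/2) failed servers.
n=3
echo "a $n-server ensemble tolerates $(( (n - 1) / 2 )) failure(s)"
```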
#### Assign each server its id
*server2
[hadoop@server2 conf]$ mkdir /tmp/zookeeper
[hadoop@server2 conf]$ cd /tmp/zookeeper/
[hadoop@server2 zookeeper]$ echo 1 > myid
[hadoop@server2 zookeeper]$ cat myid
1
*server3
[hadoop@server3 hadoop]$ mkdir /tmp/zookeeper
[hadoop@server3 hadoop]$ cd /tmp/zookeeper/
[hadoop@server3 zookeeper]$ echo 2 > myid
*server4
[hadoop@server4 ~]$ mkdir /tmp/zookeeper
[hadoop@server4 ~]$ cd /tmp/zookeeper/
[hadoop@server4 zookeeper]$ echo 3 > myid
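The myid steps above boil down to: the file in the dataDir holds this server's own id, which must match its `server.N` line in zoo.cfg. A sketch that is safe to run anywhere, using a scratch directory instead of /tmp/zookeeper:

```shell
# Write and read back a myid file, as done on each of the three servers.
datadir=$(mktemp -d)
echo 2 > "$datadir/myid"   # server3 uses id 2, per zoo.cfg above
cat "$datadir/myid"        # → 2
rm -rf "$datadir"
```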
On server2:
[hadoop@server2 zookeeper]$ cd /home/hadoop/zookeeper-3.4.9/bin ##add it to your PATH if you find this tedious; up to you
Start ZooKeeper:
*server2
[hadoop@server2 bin]$ ./zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.9/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[hadoop@server2 bin]$ ./zkServer.sh status ##this may report an error until a quorum is up; ignore it and start the others
ZooKeeper JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.9/bin/../conf/zoo.cfg
Error contacting service. It is probably not running.
*server3
[hadoop@server3 zookeeper]$ cd ~hadoop/zookeeper-3.4.9/bin/
[hadoop@server3 bin]$ ./zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.9/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[hadoop@server3 bin]$ ./zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.9/bin/../conf/zoo.cfg
Mode: leader ##leader role
*server4
[hadoop@server4 zookeeper]$ cd ~hadoop/zookeeper-3.4.9/bin/
[hadoop@server4 bin]$ ./zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.9/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[hadoop@server4 bin]$ ./zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.9/bin/../conf/zoo.cfg
Mode: follower ##follower role
Checking server2 again at this point, its ZooKeeper state is now follower.
### Run the client on the leader:
[hadoop@server3 zookeeper-3.4.9]$ bin/zkCli.sh
....(long banner omitted)
[zk: localhost:2181(CONNECTED) 0] ls
Prepare server5:
[root@server5 ~]# yum install rpcbind nfs-utils -y
[root@server5 ~]# systemctl start rpcbind nfs
[root@server5 ~]# systemctl enable rpcbind nfs
[root@server5 ~]# mount 172.25.254.1:/home/hadoop/ /home/hadoop/
[root@server5 ~]# df
There are many mounts left over from earlier docker work; the last entry shows the NFS share is mounted.
[root@server5 ~]# su - hadoop
[hadoop@server5 ~]$ df
Set the services to start at boot.
A regular user has no permission for this (run it as root instead):
[hadoop@server5 ~]$ chkconfig rpcbind on
### NameNode high availability
A fair amount of configuration follows.
[hadoop@server1 hadoop]$ pwd
/home/hadoop/hadoop
[hadoop@server1 hadoop]$ vim etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://masters</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>172.25.254.2:2181,172.25.254.3:2181,172.25.254.4:2181</value>
</property>
</configuration>
[hadoop@server1 hadoop]$ vim etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.nameservices</name>
<value>masters</value>
</property>
<property>
<name>dfs.ha.namenodes.masters</name>
<value>h1,h2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.masters.h1</name>
<value>172.25.254.1:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.masters.h1</name>
<value>172.25.254.1:50070</value>
</property>
<property>
<name>dfs.namenode.rpc-address.masters.h2</name>
<value>172.25.254.5:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.masters.h2</name>
<value>172.25.254.5:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://172.25.254.2:8485;172.25.254.3:8485;172.25.254.4:8485/masters</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/tmp/journaldata</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.masters</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>
sshfence
shell(/bin/true)
</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hadoop/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>
</configuration>
#### Start the HDFS cluster (in order)
##### 1. Start the ZooKeeper cluster on the three DNs
ZooKeeper was started earlier; start it again just in case. If it is already running, the script says so and prints the process id.
[hadoop@server2 zookeeper-3.4.9]$ bin/zkServer.sh start
[hadoop@server3 zookeeper-3.4.9]$ bin/zkServer.sh start
[hadoop@server4 zookeeper-3.4.9]$ bin/zkServer.sh start
##### 2. Start journalnode on the three DNs (for the first HDFS start, journalnode must be started before the namenode)
*server2
[hadoop@server2 zookeeper-3.4.9]$ cd
[hadoop@server2 ~]$ cd hadoop
[hadoop@server2 hadoop]$ sbin/hadoop-daemon.sh start journalnode
starting journalnode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-journalnode-server2.out
[hadoop@server2 hadoop]$ jps
2699 QuorumPeerMain
2797 JournalNode
2847 Jps
*server3
[hadoop@server3 ~]$ cd hadoop/
[hadoop@server3 hadoop]$ sbin/hadoop-daemon.sh start journalnode
starting journalnode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-journalnode-server3.out
[hadoop@server3 hadoop]$ jps
2677 QuorumPeerMain
2810 JournalNode
2859 Jps
*server4
[hadoop@server4 hadoop]$ pwd
/home/hadoop/hadoop
[hadoop@server4 hadoop]$ sbin/hadoop-daemon.sh start journalnode
starting journalnode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-journalnode-server4.out
[hadoop@server4 hadoop]$ jps
3258 JournalNode
2939 QuorumPeerMain
3307 Jps
### Format the HDFS cluster
Format on h1:
[hadoop@server1 hadoop]$ bin/hdfs namenode -format
18/05/18 12:11:17 INFO util.ExitUtil: Exiting with status 0
[hadoop@server1 hadoop]$ scp -r /tmp/hadoop-hadoop server5:/tmp ##NameNode metadata lives under /tmp by default, so copy it to h2; no password is needed
[hadoop@server1 hadoop]$ bin/hdfs zkfc -formatZK ##format the zookeeper state (run on h1 only)
[hadoop@server1 hadoop]$ sbin/start-dfs.sh ##start the hdfs cluster and watch the output closely
#### Check the status of each node
*server1
[hadoop@server1 hadoop]$ jps
5203 Jps
4798 NameNode
5118 DFSZKFailoverController
*server5
[hadoop@server5 ~]$ jps
5466 DFSZKFailoverController
5341 NameNode
5594 Jps
*server2
[hadoop@server2 hadoop]$ jps
2897 DataNode
3002 Jps
2699 QuorumPeerMain
2797 JournalNode
*server3
[hadoop@server3 hadoop]$ jps
2677 QuorumPeerMain
3015 Jps
2810 JournalNode
2909 DataNode
*server4
[hadoop@server4 hadoop]$ jps
3573 Jps
3416 DataNode
3258 JournalNode
2939 QuorumPeerMain
#### Check in the browser:
h1 (server1) is in active state, and h2 (server5) is in standby state.
Check from server3 (the leader):
[hadoop@server3 bin]$ ./zkCli.sh
Simulate a failover:
[hadoop@server1 hadoop]$ jps
3762 Jps
2633 NameNode
2958 DFSZKFailoverController
[hadoop@server1 hadoop]$ kill -9 2633
[hadoop@server1 hadoop]$ jps
2958 DFSZKFailoverController
3775 Jps
The browser now shows server5 as the active namenode.
Start server1's namenode again; it comes back in standby state:
[hadoop@server1 hadoop]$ sbin/hadoop-daemon.sh start namenode
On the leader you can see the namenode has switched to h2, i.e. server5.
*Kill server5 and the active namenode moves back to server1.
*Then start server5's namenode again.
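To confirm which side is active without the browser, Hadoop 2.x provides `bin/hdfs haadmin -getServiceState <nn-id>`. Since running it needs the live cluster, this sketch only prints the check commands for our two namenode ids:

```shell
# Print the state-check command for each configured namenode id (h1, h2).
gen_state_cmds() {
  for nn in h1 h2; do
    echo "bin/hdfs haadmin -getServiceState $nn"
  done
}
gen_state_cmds
```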
### RM (ResourceManager) high availability
In production, NN and RM are best kept on separate hosts, since both are resource-hungry.
Edit the mapred-site.xml file:
[hadoop@server1 hadoop]$ vim etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Edit the yarn-site.xml file:
[hadoop@server1 hadoop]$ vim etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>RM_CLUSTER</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>172.25.254.1</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>172.25.254.5</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>172.25.254.2:2181,172.25.254.3:2181,172.25.254.4:2181</value>
</property>
</configuration>
Start the YARN service:
[hadoop@server1 hadoop]$ sbin/start-yarn.sh
[hadoop@server1 hadoop]$ jps
16644 DFSZKFailoverController
17396 NameNode
18274 Jps
18180 ResourceManager
RM2 must be started manually:
[hadoop@server5 hadoop]$ sbin/yarn-daemon.sh start resourcemanager
[hadoop@server5 hadoop]$ jps
12898 ResourceManager
11923 DFSZKFailoverController
12503 NameNode
12930 Jps
The active RM is on server1; visiting server5:8088
redirects automatically to server1:8088.
#### YARN failover test
[hadoop@server1 hadoop]$ jps
16644 DFSZKFailoverController
18548 Jps
17396 NameNode
18180 ResourceManager
[hadoop@server1 hadoop]$ kill -9 18180
[hadoop@server1 hadoop]$ jps
16644 DFSZKFailoverController
17396 NameNode
18558 Jps
*The resource manager is now on server5.