Preparation
Required packages:
hadoop-2.7.4.tar.gz
zookeeper-3.4.6.tar.gz
jdk-8u144-linux-x64.tar.gz
Virtual machines: five CentOS 7 x86_64 VMs
master 192.168.10.4 hadoop jdk
master1 192.168.10.8 hadoop jdk
slave1 192.168.10.5 hadoop jdk zookeeper
slave2 192.168.10.6 hadoop jdk zookeeper
slave3 192.168.10.7 hadoop jdk zookeeper
- VM IP configuration, repository updates, and disabling the firewall are left to the reader (easy to search for)
- your laptop should ideally have 16 GB of RAM or more; give each master node 2-3 GB and every other node 1 GB
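For reference, a minimal /etc/hosts matching the IP plan above (assuming you use exactly these hostnames) would look like this on every node:
# /etc/hosts on every node (sketch matching the addresses listed above)
192.168.10.4 master
192.168.10.8 master1
192.168.10.5 slave1
192.168.10.6 slave2
192.168.10.7 slave3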
Installation
Installing ZooKeeper
Extract zookeeper-3.4.6.tar.gz into /usr/local on slave1:
tar -zxf zookeeper-3.4.6.tar.gz -C /usr/local/
cd /usr/local/zookeeper-3.4.6/conf/
cp zoo_sample.cfg zoo.cfg
Edit zoo.cfg and set the data directory:
vim zoo.cfg
dataDir=/usr/local/zookeeper-3.4.6/tmp
At the end of zoo.cfg, add one entry per quorum member.
Here 2888 is the port followers use to talk to the leader, and 3888 is used for leader election:
server.1=slave1:2888:3888
server.2=slave2:2888:3888
server.3=slave3:2888:3888
Create a tmp folder under /usr/local/zookeeper-3.4.6, then create an empty myid file inside it and write the server id into it; the id must match that node's server.N number:
mkdir /usr/local/zookeeper-3.4.6/tmp
touch /usr/local/zookeeper-3.4.6/tmp/myid
echo 1 > /usr/local/zookeeper-3.4.6/tmp/myid
With that, ZooKeeper on slave1 is configured. You can copy the whole directory straight to the other two slave nodes, then adjust the value inside each myid:
scp -r /usr/local/zookeeper-3.4.6/ slave2:/usr/local/
scp -r /usr/local/zookeeper-3.4.6/ slave3:/usr/local/
# on slave2
echo 2 > /usr/local/zookeeper-3.4.6/tmp/myid
# on slave3
echo 3 > /usr/local/zookeeper-3.4.6/tmp/myid
That completes the ZooKeeper configuration.
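As an optional sanity check, the three myid values should read 1, 2, 3; a minimal sketch (you will be prompted for passwords unless SSH keys are already in place):
# each node's myid must match its server.N entry in zoo.cfg
for h in slave1 slave2 slave3; do
  ssh root@$h cat /usr/local/zookeeper-3.4.6/tmp/myid
done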
Installing the JDK
Extract jdk-8u144-linux-x64.tar.gz into /usr/local on the master node.
Add the Java environment variables to ~/.bashrc (/etc/profile works just as well):
export JAVA_HOME=/usr/local/jdk1.8.0_144
export PATH=$PATH:$JAVA_HOME/bin
After setting the variables, source the file, then push the JDK to the other nodes to save time:
source ~/.bashrc
scp -r /usr/local/jdk1.8.0_144/ root@slave1:/usr/local
scp -r /usr/local/jdk1.8.0_144/ root@slave2:/usr/local
scp -r /usr/local/jdk1.8.0_144/ root@slave3:/usr/local
scp -r /usr/local/jdk1.8.0_144/ root@master1:/usr/local
Once the copies finish, add the same two export lines to ~/.bashrc on every node, source it, and run java -version to test.
If the Java version is printed, Java is set up:
[root@slave1 conf]# java -version
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
Configuring Hadoop
1. Create a hadoop account (do this on all five nodes):
useradd hadoop
passwd hadoop
2. Set up passwordless SSH among the five VMs as the hadoop user (the start scripts depend on it):
ssh-keygen -t rsa
# press Enter at each prompt until the key pair is generated
# copy the public key to every node, including this one
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@master
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@slave1
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@slave2
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@slave3
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@master1
# fix the permissions on the authorized_keys file
chmod 0600 ~/.ssh/authorized_keys
# afterwards, check that the file on every node contains all 5 public keys
cat ~/.ssh/authorized_keys
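A quick way to confirm the passwordless setup works is to loop over all hosts from each machine; every hostname should print without a password prompt (a minimal sketch):
# run as the hadoop user on each node; no password prompts should appear
for h in master master1 slave1 slave2 slave3; do
  ssh hadoop@$h hostname
done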
3. Extract Hadoop into /usr/local/ on the master node and rename the directory:
tar -zxf hadoop-2.7.4.tar.gz -C /usr/local/
mv /usr/local/hadoop-2.7.4 /usr/local/hadoop
4. Change the owner of the Hadoop tree to the hadoop user:
chown -R hadoop:hadoop /usr/local/hadoop
5. Configure the Hadoop environment variables:
su hadoop
vim ~/.bashrc
# append at the end of the file
export JAVA_HOME=/usr/local/jdk1.8.0_144
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_INSTALL=$HADOOP_HOME
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# save and quit, then remember to source the file
source ~/.bashrc
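Hadoop should now be on the PATH; as an optional smoke test:
hadoop version
# should report Hadoop 2.7.4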
6. Point Hadoop at the JDK:
cd /usr/local/hadoop/etc/hadoop
vim hadoop-env.sh
# add this to the file
export JAVA_HOME=/usr/local/jdk1.8.0_144
7. Edit the core-site.xml configuration file:
cd /usr/local/hadoop/etc/hadoop
vim core-site.xml
# this file defines the name under which the HA cluster serves clients
<configuration>
<!-- set the HDFS nameservice to ns -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://ns/</value>
</property>
<!-- Hadoop temp directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
</property>
<!-- ZooKeeper ensemble addresses -->
<property>
<name>ha.zookeeper.quorum</name>
<value>slave1:2181,slave2:2181,slave3:2181</value>
</property>
</configuration>
8. Edit hdfs-site.xml:
vim hdfs-site.xml
# most HDFS errors trace back to mistakes in this file
<configuration>
<!-- NameNode and DataNode directories -->
<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/local/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/local/hadoop/dfs/data</value>
</property>
<!-- note: no SecondaryNameNode in an HA cluster; the standby NameNode takes over its checkpointing role -->
<!-- replication factor -->
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<!-- HA configuration -->
<!-- the HDFS nameservice is ns; it must match core-site.xml -->
<property>
<name>dfs.nameservices</name>
<value>ns</value>
</property>
<property>
<name>dfs.ha.namenodes.ns</name>
<value>master,master1</value>
</property>
<!-- RPC addresses of master and master1 -->
<property>
<name>dfs.namenode.rpc-address.ns.master</name>
<value>master:9000</value>
</property>
<property>
<name>dfs.namenode.rpc-address.ns.master1</name>
<value>master1:9000</value>
</property>
<!-- HTTP addresses of master and master1 -->
<property>
<name>dfs.namenode.http-address.ns.master</name>
<value>master:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.ns.master1</name>
<value>master1:50070</value>
</property>
<!-- service RPC addresses -->
<property>
<name>dfs.namenode.servicerpc-address.ns.master</name>
<value>master:53310</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.ns.master1</name>
<value>master1:53310</value>
</property>
<!-- JournalNodes (NameNode edit-log synchronization) -->
<property>
<name>dfs.journalnode.http-address</name>
<value>0.0.0.0:8480</value>
</property>
<property>
<name>dfs.journalnode.rpc-address</name>
<value>0.0.0.0:8485</value>
</property>
<!-- where the NameNode metadata is stored on the JournalNodes -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://slave1:8485;slave2:8485;slave3:8485/ns</value>
</property>
<!-- local directory for JournalNode data -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/usr/local/hadoop/dfs/journal</value>
</property>
<!-- client failover proxy: lets clients find the active NameNode -->
<property>
<name>dfs.client.failover.proxy.provider.ns</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- fencing methods; sshfence relies on the passwordless SSH configured earlier -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>
sshfence
shell(/bin/true)
</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hadoop/.ssh/id_rsa</value>
</property>
<!-- sshfence connection timeout -->
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>
<!-- automatic failover via the ZooKeeper-based ZKFC -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>slave1:2181,slave2:2181,slave3:2181</value>
</property>
<!-- ZooKeeper session timeout, in milliseconds -->
<property>
<name>ha.zookeeper.session-timeout.ms</name>
<value>2000</value>
</property>
</configuration>
9. Edit yarn-site.xml:
vim yarn-site.xml
# resource-scheduling configuration
<configuration>
<property>
<!-- auxiliary service run on each NodeManager (mapreduce_shuffle here) -->
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.connect.retry-interval.ms</name>
<value>2000</value>
</property>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<!-- cluster ID -->
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yrc</value>
</property>
<!-- logical IDs of the two ResourceManagers -->
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<!--RM1-->
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>master</value>
</property>
<!--RM2-->
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>master1</value>
</property>
<!-- automatic RM failover -->
<property>
<name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- RM state recovery after a failure -->
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<!-- the two web UIs on port 8088 -->
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>master:8088</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>master1:8088</value>
</property>
<!-- ZooKeeper ensemble -->
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>slave1:2181,slave2:2181,slave3:2181</value>
</property>
</configuration>
10. Edit mapred-site.xml (Hadoop 2.7.4 ships it as a template, so copy it first):
cp mapred-site.xml.template mapred-site.xml
vim mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- MapReduce JobHistory server address -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
<!-- JobHistory web UI address -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>
</configuration>
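One file this walkthrough does not show is etc/hadoop/slaves, which start-dfs.sh and start-yarn.sh read to know where to launch DataNodes and NodeManagers. Assuming those daemons run on the three slaves, it would simply list them:
# /usr/local/hadoop/etc/hadoop/slaves
slave1
slave2
slave3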
11. Create the journal folder on master (it gets copied to the slaves along with the rest of the tree):
mkdir /usr/local/hadoop/dfs/journal
Then send the whole hadoop folder to the other VMs:
scp -r /usr/local/hadoop root@slave1:/usr/local/
scp -r /usr/local/hadoop root@slave2:/usr/local/
scp -r /usr/local/hadoop root@slave3:/usr/local/
scp -r /usr/local/hadoop root@master1:/usr/local/
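Since the copies above run as root, the files land on the other nodes owned by root; a short loop (a sketch, assuming root SSH access) restores ownership to the hadoop user everywhere:
# re-apply ownership on every receiving node
for h in slave1 slave2 slave3 master1; do
  ssh root@$h chown -R hadoop:hadoop /usr/local/hadoop
done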
12. On slave1, slave2, and slave3, relax the permissions on the ZooKeeper startup script; otherwise starting it as the hadoop user fails with permission errors:
chmod a+rwx /usr/local/zookeeper-3.4.6/bin/zkServer.sh
That completes the setup of a basic Hadoop HA cluster.
Starting Hadoop
1. Start the JournalNodes on slave1, slave2, and slave3:
hadoop-daemon.sh start journalnode
2. Format the NameNode on master (first startup only):
hdfs namenode -format
Formatting generates a tmp folder under /usr/local/hadoop; send it to master1:
scp -r /usr/local/hadoop/tmp master1:/usr/local/hadoop/
3. Start ZooKeeper on slave1, slave2, and slave3:
zkServer.sh start
4. Format the ZKFC state in ZooKeeper, on master (first setup only):
hdfs zkfc -formatZK
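To confirm the format worked, you can optionally look for the hadoop-ha znode in ZooKeeper using the zkCli.sh client shipped with ZooKeeper:
# run on any slave; the output should list the ns nameservice
/usr/local/zookeeper-3.4.6/bin/zkCli.sh -server slave1:2181 ls /hadoop-ha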
5. Sync the NameNode metadata on master1 (first setup only):
hdfs namenode -bootstrapStandby
6. Start the ZKFC on both NameNodes (master and master1):
hadoop-daemon.sh start zkfc
7. Start the standby ResourceManager on master1:
yarn-daemon.sh start resourcemanager
8. Start HDFS and YARN from master:
start-dfs.sh
start-yarn.sh
9. Start the history server on master:
mr-jobhistory-daemon.sh start historyserver
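With everything launched, you can verify the daemons with jps and ask HDFS which NameNode is active (the IDs master/master1 come from dfs.ha.namenodes.ns above); a minimal check:
# on master/master1 expect: NameNode, DFSZKFailoverController, ResourceManager
# (plus JobHistoryServer on master); on the slaves expect: DataNode,
# JournalNode, NodeManager, QuorumPeerMain
jps
# one of these should report "active", the other "standby"
hdfs haadmin -getServiceState master
hdfs haadmin -getServiceState master1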
At this point all Hadoop services are running, and you can open the web UIs to check:
- NameNode on master: http://master:50070
- NameNode on master1: http://master1:50070
- YARN: http://master:8088
- If you have any suggestions or comments, please contact me. Thanks!