Building a Hadoop Distributed Cluster

Modifying the hosts File

In the previous chapter, the Java environment was already configured on CentOS 7. We reuse the three CentOS 7 Linux machines from the Elasticsearch cluster to build a three-node Hadoop distributed cluster, with node01 as the master and node02 and node03 as slaves. Reference: http://hadoop.apache.org/docs/r3.2.1/hadoop-project-dist/hadoop-common/ClusterSetup.html

Node name   IP address
node01      192.168.92.90
node02      192.168.92.91
node03      192.168.92.92

To set the hostnames permanently, edit the hosts file on each machine to map the virtual machines' IPs to their hostnames, and disable the firewall. (On CentOS 7 the supported way to set a hostname persistently is hostnamectl set-hostname node01; editing HOSTNAME in /etc/sysconfig/network is the legacy CentOS 6 method.)

[root@node01 ~]# vim /etc/sysconfig/network
#########
HOSTNAME=node01
[root@node01 ~]# vim /etc/hosts
#########
192.168.92.90 node01
192.168.92.91 node02
192.168.92.92 node03
[root@node01 ~]# systemctl stop firewalld
[root@node01 ~]# systemctl disable firewalld.service

[root@node02 ~]# vim /etc/sysconfig/network
#########
HOSTNAME=node02
[root@node02 ~]# vim /etc/hosts
#########
192.168.92.90 node01
192.168.92.91 node02
192.168.92.92 node03
[root@node02 ~]# systemctl stop firewalld
[root@node02 ~]# systemctl disable firewalld.service

[root@node03 ~]# vim /etc/sysconfig/network
#########
HOSTNAME=node03
[root@node03 ~]# vim /etc/hosts
#########
192.168.92.90 node01
192.168.92.91 node02
192.168.92.92 node03
[root@node03 ~]# systemctl stop firewalld
[root@node03 ~]# systemctl disable firewalld.service
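With all three machines configured, a small helper (a hypothetical function, not part of any Hadoop tooling) can confirm that every hostname is mapped before moving on; a missing mapping is a common cause of nodes failing to join the cluster:

```shell
# check_hosts FILE HOST...  -> reports each host missing from FILE,
# returns non-zero if any mapping is absent.
check_hosts() {
    file="$1"; shift
    rc=0
    for h in "$@"; do
        grep -qw "$h" "$file" || { echo "missing: $h"; rc=1; }
    done
    return $rc
}

# Exercise it against a copy of the mapping used in this chapter:
printf '192.168.92.90 node01\n192.168.92.91 node02\n192.168.92.92 node03\n' > /tmp/hosts.test
check_hosts /tmp/hosts.test node01 node02 node03 && echo "all hosts mapped"
```

On the real machines, point it at /etc/hosts instead of the test copy.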

Configuring Passwordless SSH Login

Hadoop manages its nodes over SSH, so we configure passwordless SSH login so that node01 can log in to node02 and node03 without a password.

[root@node01 ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:UGcKoMkBmrZQXNzVTKoyOMFEWXPfY0LmZSZ7xfSwLrI root@node01
The key's randomart image is:
+---[RSA 2048]----+
|.+===oo.*+Bo+    |
|.*o+.o.B %o..+   |
|+.*   . B = . .  |
|o .o   o + o     |
| .o o . S . .    |
|   . o   o .     |
|        E        |
|                 |
|                 |
+----[SHA256]-----+
[root@node01 ~]# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[root@node01 ~]# ssh node01
Last login: Mon Feb 24 12:50:34 2020
[root@node01 ~]# scp ~/.ssh/id_rsa.pub root@node02:~/
root@node02's password:
id_rsa.pub             100%  393   351.3KB/s   00:00

[root@node02 ~]# mkdir ~/.ssh
[root@node02 ~]# cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
[root@node02 ~]# rm -rf ~/id_rsa.pub

[root@node01 ~]# ssh node02
Last login: Mon Feb 24 15:49:50 2020 from node01
[root@node02 ~]#

[root@node01 ~]# scp ~/.ssh/id_rsa.pub root@node03:~/
root@node03's password:
id_rsa.pub             100%  393   351.3KB/s   00:00

[root@node03 ~]# mkdir ~/.ssh
[root@node03 ~]# cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
[root@node03 ~]# rm -rf ~/id_rsa.pub

[root@node01 ~]# ssh node03
Last login: Mon Feb 24 15:45:09 2020 from 192.168.92.1
[root@node03 ~]#
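The scp/cat/rm sequence above can usually be replaced by ssh-copy-id, which appends the key on the remote host and also sets the permissions OpenSSH requires (~/.ssh must be 700, authorized_keys 600). A minimal sketch; the DRY_RUN guard just prints the commands so the loop can be inspected before running it:

```shell
# Push node01's public key to each slave. With DRY_RUN=1 the commands
# are only printed; set DRY_RUN=0 to actually run them (each prompts
# for that node's root password once).
DRY_RUN=1
for h in node02 node03; do
    cmd="ssh-copy-id -i $HOME/.ssh/id_rsa.pub root@$h"
    if [ "$DRY_RUN" = "1" ]; then
        echo "$cmd"
    else
        $cmd
    fi
done
```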

Downloading Hadoop

Download the official Hadoop binary release, version 3.2.1 here, from http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz. Once downloaded, extract it under /root/opt/module/hadoop/.

[root@node01 ~]# mkdir -p /root/opt/module/hadoop/
[root@node01 ~]# cd opt/module/hadoop/
[root@node01 hadoop]# wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
[root@node01 hadoop]# tar -zxvf hadoop-3.2.1.tar.gz
[root@node01 hadoop]# cd hadoop-3.2.1
[root@node01 hadoop-3.2.1]# ll
total 180
drwxr-xr-x. 2 hadoop hadoop    203 Sep 11 00:51 bin
drwxr-xr-x. 3 hadoop hadoop     20 Sep 10 23:58 etc
drwxr-xr-x. 2 hadoop hadoop    106 Sep 11 00:51 include
drwxr-xr-x. 3 hadoop hadoop     20 Sep 11 00:51 lib
drwxr-xr-x. 4 hadoop hadoop    288 Sep 11 00:51 libexec
-rw-rw-r--. 1 hadoop hadoop 150569 Sep 10 22:35 LICENSE.txt
-rw-rw-r--. 1 hadoop hadoop  22125 Sep 10 22:35 NOTICE.txt
-rw-rw-r--. 1 hadoop hadoop   1361 Sep 10 22:35 README.txt
drwxr-xr-x. 3 hadoop hadoop   4096 Sep 10 23:58 sbin
drwxr-xr-x. 4 hadoop hadoop     31 Sep 11 01:11 share

Modifying core-site.xml

Edit the configuration file core-site.xml in the etc/hadoop/ directory of the Hadoop installation. Two properties are added: fs.defaultFS, which points HDFS at its master node, i.e. the CentOS 7 machine hosting node01; and hadoop.tmp.dir, which specifies the directory Hadoop uses for its working data and must be created by hand first: mkdir -p /root/opt/data/tep

[root@node01 hadoop-3.2.1]# cd etc/hadoop/
[root@node01 hadoop]# vim core-site.xml
#############
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://node01:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/root/opt/data/tep</value>
    </property>
    <property>
  		<name>dfs.http.address</name>
  		<value>0.0.0.0:50070</value>
	</property>
</configuration>
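It is worth reading the values back to catch typos. On a node with Hadoop on the PATH, hdfs getconf -confKey fs.defaultFS reports the effective setting; the awk helper below (a hypothetical stand-in, exercised here against a throwaway copy of the file) extracts a value directly from the XML:

```shell
# get_prop NAME FILE -> prints the <value> that follows <name>NAME</name>
get_prop() {
    awk -v n="<name>$1</name>" '
        index($0, n) { found=1; next }
        found && match($0, /<value>[^<]*<\/value>/) {
            print substr($0, RSTART+7, RLENGTH-15); exit
        }' "$2"
}

# Exercise it against a copy of the configuration above:
cat > /tmp/core-site.test.xml <<'EOF'
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://node01:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/root/opt/data/tep</value>
    </property>
</configuration>
EOF
get_prop fs.defaultFS /tmp/core-site.test.xml
```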

Modifying hdfs-site.xml

Configure the number of replicas kept for each HDFS data block. The default is 3; the replication factor must not exceed the number of DataNodes in the cluster.

[root@node01 hadoop]# vim hdfs-site.xml
#############
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>node01:50090</value>
    </property>
    <property>
    	<name>dfs.namenode.http-address</name>
    	<value>node01:9870</value>
    </property>
</configuration>
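One caveat: in this cluster only node02 and node03 run DataNodes (node01 acts as the master, per the workers file configured below), so a replication factor of 3 cannot actually be satisfied and HDFS will report blocks as under-replicated. A quick arithmetic check (a sketch using this chapter's values) makes the constraint explicit:

```shell
# dfs.replication must not exceed the number of DataNodes; count the
# entries in the workers list and compare.
printf 'node02\nnode03\n' > /tmp/workers.test   # copy of the workers list used later
replication=3
datanodes=$(grep -c . /tmp/workers.test)
if [ "$replication" -gt "$datanodes" ]; then
    echo "replication $replication > $datanodes DataNodes: blocks will be under-replicated"
else
    echo "ok: $datanodes DataNodes can hold $replication replicas"
fi
```

For this topology, setting dfs.replication to 2 would match the number of DataNodes.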

Modifying mapred-site.xml

To make MapReduce use YARN for resource management and scheduling, edit mapred-site.xml:

[root@node01 hadoop]# vim mapred-site.xml
#############
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

Modifying yarn-site.xml

Set mapreduce_shuffle as the NodeManager auxiliary service, whitelist the Hadoop-related environment variables (required in Hadoop 3.x), and specify node01 as the ResourceManager host:

[root@node01 hadoop]# vim yarn-site.xml
#############
<configuration>
	<property>
		<name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
    	<name>yarn.nodemanager.env-whitelist</name>	   
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
    <property>
    	<name>yarn.resourcemanager.hostname</name>
    	<value>node01</value>
    </property>
</configuration>

Configuring Hadoop Environment Variables

[root@node01 hadoop]# vim /etc/profile
############
export HADOOP_HOME=/root/opt/module/hadoop/hadoop-3.2.1/
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
[root@node01 hadoop]# source /etc/profile
[root@node01 hadoop]# hadoop version
Hadoop 3.2.1
Source code repository https://gitbox.apache.org/repos/asf/hadoop.git -r b3cbbb467e22ea829b3808f4b7b01d07e0bf3842
Compiled by rohithsharmaks on 2019-09-10T15:56Z
Compiled with protoc 2.5.0
From source with checksum 776eaf9eee9c0ffc370bcbc1888737
This command was run using /root/opt/module/hadoop/hadoop-3.2.1/share/hadoop/common/hadoop-common-3.2.1.jar
[root@node01 hadoop]# vim hadoop-env.sh
############
export JAVA_HOME=/usr/local/java/jdk1.8.0_231
export HADOOP_LOG_DIR=/root/opt/data/tep

[root@node01 hadoop]# vim yarn-env.sh
############
export JAVA_HOME=/usr/local/java/jdk1.8.0_231

[root@node01 hadoop]# vim workers
############
node02
node03

[root@node01 hadoop]# cd ../..
[root@node01 hadoop-3.2.1]# vim sbin/start-dfs.sh
############
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root

[root@node01 hadoop-3.2.1]# vim sbin/stop-dfs.sh
############
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root


[root@node01 hadoop-3.2.1]# vim sbin/start-yarn.sh
############
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root

[root@node01 hadoop-3.2.1]# vim sbin/stop-yarn.sh
############
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root

Setting Up Zookeeper

ZooKeeper is a distributed, open-source coordination service for distributed applications. It is an open-source implementation of Google's Chubby and an important component of the Hadoop ecosystem.

Now set up Zookeeper:

[root@node01 ~]# mkdir -p opt/module/zookeeper
[root@node01 ~]# cd opt/module/zookeeper
[root@node01 zookeeper]# wget http://mirror.bit.edu.cn/apache/zookeeper/zookeeper-3.5.6/apache-zookeeper-3.5.6-bin.tar.gz
[root@node01 zookeeper]# tar -zxvf apache-zookeeper-3.5.6-bin.tar.gz
[root@node01 zookeeper]# cd apache-zookeeper-3.5.6-bin
[root@node01 apache-zookeeper-3.5.6-bin]# ll
total 32
drwxr-xr-x. 2 elsearch elsearch   232 Oct  9 04:14 bin
drwxr-xr-x. 2 elsearch elsearch    70 Feb 27 11:20 conf
drwxr-xr-x. 5 elsearch elsearch  4096 Oct  9 04:15 docs
drwxr-xr-x. 2 root     root      4096 Feb 27 11:02 lib
-rw-r--r--. 1 elsearch elsearch 11358 Oct  5 19:27 LICENSE.txt
drwxr-xr-x. 2 root     root        46 Feb 27 11:17 logs
-rw-r--r--. 1 elsearch elsearch   432 Oct  9 04:14 NOTICE.txt
-rw-r--r--. 1 elsearch elsearch  1560 Oct  9 04:14 README.md
-rw-r--r--. 1 elsearch elsearch  1347 Oct  5 19:27 README_packaging.txt
drwxr-xr-x. 3 root     root        35 Feb 27 11:30 zkdata
drwxr-xr-x. 3 root     root        23 Feb 27 11:23 zklog
[root@node01 apache-zookeeper-3.5.6-bin]# pwd
/root/opt/module/zookeeper/apache-zookeeper-3.5.6-bin
[root@node01 apache-zookeeper-3.5.6-bin]# mkdir zkdata
[root@node01 apache-zookeeper-3.5.6-bin]# mkdir zklog
[root@node01 apache-zookeeper-3.5.6-bin]# cd conf/
[root@node01 conf]# mv zoo_sample.cfg zoo.cfg
[root@node01 conf]# vim zoo.cfg
#############
dataDir=/root/opt/module/zookeeper/apache-zookeeper-3.5.6-bin/zkdata
dataLogDir=/root/opt/module/zookeeper/apache-zookeeper-3.5.6-bin/zklog
server.1=192.168.92.90:2888:3888
server.2=192.168.92.91:2888:3888
server.3=192.168.92.92:2888:3888
[root@node01 conf]# cd ../zkdata/
[root@node01 zkdata]# echo "1" >> myid
[root@node01 zkdata]# vim /etc/profile
##############
export ZOOKEEPER_HOME=/root/opt/module/zookeeper/apache-zookeeper-3.5.6-bin/
export PATH=$PATH:$ZOOKEEPER_HOME/bin
[root@node01 zkdata]# source /etc/profile
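Each Zookeeper server must have a myid file whose number matches its server.N entry in zoo.cfg; a mismatch keeps the quorum from forming. A quick sed (a sketch run against a throwaway copy of the server list above) prints the id expected for each address:

```shell
# Extract "id address" pairs from the server.N lines of zoo.cfg; the
# host at each address must write that id into zkdata/myid.
cat > /tmp/zoo.test.cfg <<'EOF'
server.1=192.168.92.90:2888:3888
server.2=192.168.92.91:2888:3888
server.3=192.168.92.92:2888:3888
EOF
sed -n 's/^server\.\([0-9][0-9]*\)=\([^:]*\):.*/\1 \2/p' /tmp/zoo.test.cfg
```

So 192.168.92.90 (node01) writes 1, node02 writes 2, and node03 writes 3, matching the echo "1" >> myid step above.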

9.1.10 Starting the Hadoop and Zookeeper Clusters

Before starting the Hadoop cluster, copy the installation from the master node01 to node02 and node03. The first time the cluster is started, HDFS must be formatted.

[root@node02 ~]# mkdir -p /root/opt
[root@node03 ~]# mkdir -p /root/opt

[root@node01 ~]# # copy to the two slave nodes
[root@node01 ~]# scp -rp opt/ root@node02:/root/
[root@node01 ~]# scp -rp opt/ root@node03:/root/

[root@node01 ~]# scp -rp /etc/profile node02:/etc/profile
[root@node01 ~]# scp -rp /etc/profile node03:/etc/profile

[root@node02 ~]# source /etc/profile
[root@node03 ~]# source /etc/profile

[root@node01 ~]# cd /root/opt/module/hadoop/hadoop-3.2.1
[root@node01 hadoop-3.2.1]# # format HDFS
[root@node01 hadoop-3.2.1]# bin/hdfs namenode -format
# If the log output contains the following line, the NameNode was formatted successfully.
2020-02-24 15:21:28,893 INFO common.Storage: Storage directory /root/opt/data/tep/dfs/name has been successfully formatted.
# start hadoop
[root@node01 hadoop-3.2.1]# sbin/start-all.sh

Run the jps command on each of the three machines to check whether the cluster started successfully. If the following services appear, startup succeeded; on node01 you should see the NameNode, ResourceManager, SecondaryNameNode, and Jps processes:

[root@node01 ~]# jps
3601 ResourceManager
3346 SecondaryNameNode
4074 Jps
3069 NameNode
[root@node02 ~]# jps
3473 Jps
3234 NodeManager
3114 DataNode
[root@node03 ~]# jps
3031 NodeManager
3256 Jps
2909 DataNode
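Rather than eyeballing jps output on every node, a small helper (hypothetical, shown here against the node01 output above) can diff the reported daemons against what each node is supposed to run:

```shell
# expect_daemons "JPS_OUTPUT" DAEMON...  -> prints which expected
# daemons are absent from the jps output, or confirms all are up.
expect_daemons() {
    out="$1"; shift
    missing=""
    for d in "$@"; do
        printf '%s\n' "$out" | grep -qw "$d" || missing="$missing $d"
    done
    if [ -z "$missing" ]; then echo "all daemons up"; else echo "missing:$missing"; fi
}

# Check the node01 output shown above:
node01_jps='3601 ResourceManager
3346 SecondaryNameNode
3069 NameNode'
expect_daemons "$node01_jps" NameNode ResourceManager SecondaryNameNode
```

On the slaves, the expected list would be DataNode and NodeManager; the jps output itself can be captured over ssh, e.g. expect_daemons "$(ssh node02 jps)" DataNode NodeManager.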

Now we can visit http://192.168.92.90:9870 or http://node01:9870; the Hadoop page is shown in Figure 9-1 below.

[Figure 9-1: Hadoop web page]

In the evolution from Hadoop 2.x to Hadoop 3.x, the web UI port also changed, from 50070 to 9870.

Ports that changed in Hadoop 3.x:

Category                 Service                    Hadoop 2.x  Hadoop 3.x
NameNode ports           NameNode                   8020        9820
NameNode ports           NameNode HTTP UI           50070       9870
NameNode ports           NameNode HTTPS UI          50470       9871
SecondaryNameNode ports  SecondaryNameNode HTTP     50091       9869
SecondaryNameNode ports  SecondaryNameNode HTTP UI  50090       9868
DataNode ports           DataNode IPC               50020       9867
DataNode ports           DataNode                   50010       9866
DataNode ports           DataNode HTTP UI           50075       9864
DataNode ports           DataNode HTTPS UI          50475       9865

We can also visit http://192.168.92.90:8088/ to view the cluster status, shown in Figure 9-2 below.

[Figure 9-2: Hadoop cluster status]

To stop the cluster, run the command sbin/stop-all.sh:

# stop hadoop
[root@node01 hadoop-3.2.1]# sbin/stop-all.sh

At this point, the Hadoop distributed cluster is up and running.
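A quick smoke test confirms the cluster really works end to end: round-trip a file through HDFS (a sketch; the /smoke path and file name are arbitrary). Any failure here usually means the DataNodes are not registering with the NameNode:

```shell
# Write a file into HDFS, read it back, then clean up.
echo "hello hadoop" > /tmp/smoke.txt
if command -v hdfs >/dev/null 2>&1; then
    hdfs dfs -mkdir -p /smoke
    hdfs dfs -put -f /tmp/smoke.txt /smoke/
    hdfs dfs -cat /smoke/smoke.txt     # should print: hello hadoop
    hdfs dfs -rm -r /smoke
else
    echo "hdfs not on PATH; run this on a cluster node"
fi
```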

Next, start the Zookeeper cluster. First edit the myid file that was copied to each node, then start Zookeeper on every node.

[root@node02 apache-zookeeper-3.5.6-bin]# vim zkdata/myid
2
[root@node03 apache-zookeeper-3.5.6-bin]# vim zkdata/myid
3

# start Zookeeper on each node
[root@node01 apache-zookeeper-3.5.6-bin]# bin/zkServer.sh start
[root@node02 apache-zookeeper-3.5.6-bin]# bin/zkServer.sh start
[root@node03 apache-zookeeper-3.5.6-bin]# bin/zkServer.sh start

# jps now also shows Zookeeper's QuorumPeerMain process
[root@node01]# jps
16962 Jps
16005 NameNode
16534 ResourceManager
16903 QuorumPeerMain
16282 SecondaryNameNode

[root@node02]# jps
8402 Jps
8037 NodeManager
7914 DataNode
8202 QuorumPeerMain

# check the Zookeeper election state
[root@node01 apache-zookeeper-3.5.6-bin]# bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /root/opt/module/zookeeper/apache-zookeeper-3.5.6-bin/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost.
Mode: follower
[root@node02 apache-zookeeper-3.5.6-bin]# bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /root/opt/module/zookeeper/apache-zookeeper-3.5.6-bin/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost.
Mode: leader
[root@node03 apache-zookeeper-3.5.6-bin]# bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /root/opt/module/zookeeper/apache-zookeeper-3.5.6-bin/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost.
Mode: follower

The Zookeeper cluster is now up. In this cluster, node02 is the leader, while node01 and node03 are followers.

Now try some simple client commands against Zookeeper:

[root@node02 apache-zookeeper-3.5.6-bin]# bin/zkCli.sh
[zk: localhost:2181(CONNECTED) 0] # ls: list the current contents of ZooKeeper
[zk: localhost:2181(CONNECTED) 0] ls /
[zookeeper]
[zk: localhost:2181(CONNECTED) 1] # ls2: list along with the node's status data
[zk: localhost:2181(CONNECTED) 1] ls2 /
[zookeeper]
cZxid = 0x0
ctime = Thu Jan 01 08:00:00 CST 1970
mZxid = 0x0
mtime = Thu Jan 01 08:00:00 CST 1970
pZxid = 0x0
cversion = -1
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 0
numChildren = 1
[zk: localhost:2181(CONNECTED) 2] # create: create a node
[zk: localhost:2181(CONNECTED) 2] create /zk myData
Created /zk
[zk: localhost:2181(CONNECTED) 3] # get: read a node's value
[zk: localhost:2181(CONNECTED) 3] get /zk
myData
[zk: localhost:2181(CONNECTED) 4] # set: update a node's value
[zk: localhost:2181(CONNECTED) 4] set /zk myData1
[zk: localhost:2181(CONNECTED) 5] get /zk
myData1
[zk: localhost:2181(CONNECTED) 6] # create: create a child node
[zk: localhost:2181(CONNECTED) 6] create /zk/zk01 myData2
Created /zk/zk01
[zk: localhost:2181(CONNECTED) 7] # stat: check the node's status
[zk: localhost:2181(CONNECTED) 7] stat /zk
cZxid = 0x100000008
ctime = Thu Feb 27 12:39:43 CST 2020
mZxid = 0x100000009
mtime = Thu Feb 27 12:42:37 CST 2020
pZxid = 0x10000000b
cversion = 1
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 7
numChildren = 1
[zk: localhost:2181(CONNECTED) 8] # rmr: remove a node and its children
[zk: localhost:2181(CONNECTED) 8] rmr /zk

This gives us a working introduction to Zookeeper. It offers far more than what is shown here, such as cluster management, distributed locks, distributed queues, leader election across a Zookeeper ensemble, and Java API programming.
