Modifying the hosts file
In the previous chapter, the Java environment was already configured on CentOS 7. Reusing the three Linux CentOS 7 machines from the Elasticsearch cluster, we build a three-node distributed Hadoop cluster, with node01 as the master and node02 and node03 as the workers. Reference: http://hadoop.apache.org/docs/r3.2.1/hadoop-project-dist/hadoop-common/ClusterSetup.html
NodeName | IP address |
---|---|
node01 | 192.168.92.90 |
node02 | 192.168.92.91 |
node03 | 192.168.92.92 |
To make the hostnames permanent, edit the hosts file to map each virtual machine's IP address to its hostname, and turn off the firewall:
[root@node01 ~]# vim /etc/sysconfig/network
#########
HOSTNAME=node01
[root@node01 ~]# vim /etc/hosts
#########
192.168.92.90 node01
192.168.92.91 node02
192.168.92.92 node03
[root@node01 ~]# systemctl stop firewalld
[root@node01 ~]# systemctl disable firewalld.service
[root@node02 ~]# vim /etc/sysconfig/network
#########
HOSTNAME=node02
[root@node02 ~]# vim /etc/hosts
#########
192.168.92.90 node01
192.168.92.91 node02
192.168.92.92 node03
[root@node02 ~]# systemctl stop firewalld
[root@node02 ~]# systemctl disable firewalld.service
[root@node03 ~]# vim /etc/sysconfig/network
#########
HOSTNAME=node03
[root@node03 ~]# vim /etc/hosts
#########
192.168.92.90 node01
192.168.92.91 node02
192.168.92.92 node03
[root@node03 ~]# systemctl stop firewalld
[root@node03 ~]# systemctl disable firewalld.service
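On CentOS 7 the `HOSTNAME=` line in /etc/sysconfig/network is a legacy mechanism; the systemd-native way is `hostnamectl`. A hedged alternative sketch (hostnames and IPs taken from the table above; the privileged commands are shown as comments since they must run as root on each specific machine):

```shell
# Build the hosts entries once (IPs and hostnames from the table above).
HOSTS_ENTRIES='192.168.92.90 node01
192.168.92.91 node02
192.168.92.92 node03'

# On each node, as root (the hostname argument differs per machine):
#   hostnamectl set-hostname node01          # systemd-native on CentOS 7
#   echo "$HOSTS_ENTRIES" >> /etc/hosts
#   systemctl stop firewalld && systemctl disable firewalld

echo "$HOSTS_ENTRIES" | wc -l    # three mappings, one per node
```

Keeping the mappings in one variable ensures all three nodes get identical /etc/hosts entries.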
Configuring passwordless SSH login
Hadoop manages its nodes over SSH, so passwordless SSH must be configured so that node01 can log in to node02 and node03 without a password.
[root@node01 ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:UGcKoMkBmrZQXNzVTKoyOMFEWXPfY0LmZSZ7xfSwLrI root@node01
The key's randomart image is:
+---[RSA 2048]----+
|.+===oo.*+Bo+ |
|.*o+.o.B %o..+ |
|+.* . B = . . |
|o .o o + o |
| .o o . S . . |
| . o o . |
| E |
| |
| |
+----[SHA256]-----+
[root@node01 ~]# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[root@node01 ~]# ssh node01
Last login: Mon Feb 24 12:50:34 2020
[root@node01 ~]# scp ~/.ssh/id_rsa.pub root@node02:~/
root@node02's password:
id_rsa.pub 100% 393 351.3KB/s 00:00
[root@node02 ~]# mkdir -p ~/.ssh
[root@node02 ~]# cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
[root@node02 ~]# rm -f ~/id_rsa.pub
[root@node01 ~]# ssh node02
Last login: Mon Feb 24 15:49:50 2020 from node01
[root@node02 ~]#
[root@node01 ~]# scp ~/.ssh/id_rsa.pub root@node03:~/
root@node03's password:
id_rsa.pub 100% 393 351.3KB/s 00:00
[root@node03 ~]# mkdir -p ~/.ssh
[root@node03 ~]# cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
[root@node03 ~]# rm -f ~/id_rsa.pub
[root@node01 ~]# ssh node03
Last login: Mon Feb 24 15:45:09 2020 from 192.168.92.1
[root@node03 ~]#
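As an alternative to the manual scp/cat steps above, `ssh-copy-id` (shipped with openssh-clients) performs the directory creation, key append, and permission fixes in one command. A dry-run sketch that only prints the commands to run from node01:

```shell
# Print the ssh-copy-id invocation for each worker (dry run; drop the
# echo to actually copy the key, entering each root password once).
for host in node02 node03; do
  echo "ssh-copy-id root@${host}"
done
```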
Downloading Hadoop
Download the official Hadoop binary release; here we use version 3.2.1: http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz. Once the download completes, extract and install it under /root/opt/module/hadoop/:
[root@node01 ~]# mkdir -p /root/opt/module/hadoop/
[root@node01 ~]# cd opt/module/hadoop/
[root@node01 hadoop]# wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
[root@node01 hadoop]# tar -zxvf hadoop-3.2.1.tar.gz
[root@node01 hadoop]# cd hadoop-3.2.1
[root@node01 hadoop-3.2.1]# ll
total 180
drwxr-xr-x. 2 hadoop hadoop    203 Sep 11 00:51 bin
drwxr-xr-x. 3 hadoop hadoop     20 Sep 10 23:58 etc
drwxr-xr-x. 2 hadoop hadoop    106 Sep 11 00:51 include
drwxr-xr-x. 3 hadoop hadoop     20 Sep 11 00:51 lib
drwxr-xr-x. 4 hadoop hadoop    288 Sep 11 00:51 libexec
-rw-rw-r--. 1 hadoop hadoop 150569 Sep 10 22:35 LICENSE.txt
-rw-rw-r--. 1 hadoop hadoop  22125 Sep 10 22:35 NOTICE.txt
-rw-rw-r--. 1 hadoop hadoop   1361 Sep 10 22:35 README.txt
drwxr-xr-x. 3 hadoop hadoop   4096 Sep 10 23:58 sbin
drwxr-xr-x. 4 hadoop hadoop     31 Sep 11 01:11 share
Modifying core-site.xml
Edit core-site.xml in the etc/hadoop/ directory of the Hadoop installation. Two properties are added: fs.defaultFS, which points HDFS at its master node, i.e. the CentOS 7 machine hosting node01; and hadoop.tmp.dir, which sets the directory Hadoop uses for temporary data and must be created manually with mkdir -p /root/opt/data/tep. The configuration below also sets the deprecated dfs.http.address key so the web UI stays reachable at the legacy 0.0.0.0:50070 address.
[root@node01 hadoop-3.2.1]# cd etc/hadoop/
[root@node01 hadoop]# vim core-site.xml
#############
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://node01:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/root/opt/data/tep</value>
</property>
<property>
<name>dfs.http.address</name>
<value>0.0.0.0:50070</value>
</property>
</configuration>
修改 hdfs-site.xml
配置HDFS上的数据块副本个数的参数,默认设置下,副本个数为3,设置的副本数量必须小于等于机器数量。
[root@node01 hadoop]# vim hdfs-site.xml
#############
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>node01:50090</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>node01:9870</value>
</property>
</configuration>
Modifying mapred-site.xml
To make MapReduce use YARN for resource management and scheduling, edit mapred-site.xml:
[root@node01 hadoop]# vim mapred-site.xml
#############
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Modifying yarn-site.xml
This configures the mapreduce_shuffle auxiliary service for the NodeManagers, whitelists the Hadoop-related environment variables (required in Hadoop 3.x), and sets node01 as the ResourceManager host:
[root@node01 hadoop]# vim yarn-site.xml
#############
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>node01</value>
</property>
</configuration>
Configuring the Hadoop environment variables
[root@node01 hadoop]# vim /etc/profile
############
export HADOOP_HOME=/root/opt/module/hadoop/hadoop-3.2.1/
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
[root@node01 hadoop]# source /etc/profile
[root@node01 hadoop]# hadoop version
Hadoop 3.2.1
Source code repository https://gitbox.apache.org/repos/asf/hadoop.git -r b3cbbb467e22ea829b3808f4b7b01d07e0bf3842
Compiled by rohithsharmaks on 2019-09-10T15:56Z
Compiled with protoc 2.5.0
From source with checksum 776eaf9eee9c0ffc370bcbc1888737
This command was run using /root/opt/module/hadoop/hadoop-3.2.1/share/hadoop/common/hadoop-common-3.2.1.jar
[root@node01 hadoop]# vim hadoop-env.sh
############
export JAVA_HOME=/usr/local/java/jdk1.8.0_231
export HADOOP_LOG_DIR=/root/opt/data/tep
[root@node01 hadoop]# vim yarn-env.sh
############
export JAVA_HOME=/usr/local/java/jdk1.8.0_231
[root@node01 hadoop]# vim workers
############
node02
node03
[root@node01 hadoop]# cd ../..
[root@node01 hadoop-3.2.1]# vim sbin/start-dfs.sh
############
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
[root@node01 hadoop-3.2.1]# vim sbin/stop-dfs.sh
############
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
[root@node01 hadoop-3.2.1]# vim sbin/start-yarn.sh
############
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
[root@node01 hadoop-3.2.1]# vim sbin/stop-yarn.sh
############
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
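Instead of patching the four sbin scripts, the same user settings can be exported once in etc/hadoop/hadoop-env.sh, which Hadoop 3.x reads at startup. A sketch that builds the lines to append (the append itself is shown as a comment; HDFS_DATANODE_SECURE_USER and HADOOP_SECURE_DN_USER are only needed for secure DataNode setups and are omitted here):

```shell
# The user variables the start/stop scripts need when running as root.
USER_EXPORTS='export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root'

# Append once instead of editing four scripts:
#   echo "$USER_EXPORTS" >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh
echo "$USER_EXPORTS" | grep -c '^export'    # 5 variables
```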
Setting up ZooKeeper
ZooKeeper is a distributed, open-source coordination service for distributed applications. It is an open-source implementation of Google's Chubby and an important component of the Hadoop ecosystem.
Next, we set up ZooKeeper:
[root@node01 ~]# mkdir -p /root/opt/module/zookeeper
[root@node01 ~]# cd /root/opt/module/zookeeper
[root@node01 zookeeper]# wget http://mirror.bit.edu.cn/apache/zookeeper/zookeeper-3.5.6/apache-zookeeper-3.5.6-bin.tar.gz
[root@node01 zookeeper]# tar -zxvf apache-zookeeper-3.5.6-bin.tar.gz
[root@node01 zookeeper]# cd apache-zookeeper-3.5.6-bin
[root@node01 apache-zookeeper-3.5.6-bin]# ll
total 32
drwxr-xr-x. 2 elsearch elsearch   232 Oct  9 04:14 bin
drwxr-xr-x. 2 elsearch elsearch    70 Feb 27 11:20 conf
drwxr-xr-x. 5 elsearch elsearch  4096 Oct  9 04:15 docs
drwxr-xr-x. 2 root     root      4096 Feb 27 11:02 lib
-rw-r--r--. 1 elsearch elsearch 11358 Oct  5 19:27 LICENSE.txt
drwxr-xr-x. 2 root     root        46 Feb 27 11:17 logs
-rw-r--r--. 1 elsearch elsearch   432 Oct  9 04:14 NOTICE.txt
-rw-r--r--. 1 elsearch elsearch  1560 Oct  9 04:14 README.md
-rw-r--r--. 1 elsearch elsearch  1347 Oct  5 19:27 README_packaging.txt
drwxr-xr-x. 3 root     root        35 Feb 27 11:30 zkdata
drwxr-xr-x. 3 root     root        23 Feb 27 11:23 zklog
[root@node01 apache-zookeeper-3.5.6-bin]# pwd
/root/opt/module/zookeeper/apache-zookeeper-3.5.6-bin
[root@node01 apache-zookeeper-3.5.6-bin]# mkdir zkdata
[root@node01 apache-zookeeper-3.5.6-bin]# mkdir zklog
[root@node01 apache-zookeeper-3.5.6-bin]# cd conf/
[root@node01 conf]# mv zoo_sample.cfg zoo.cfg
[root@node01 conf]# vim zoo.cfg
#############
dataDir=/root/opt/module/zookeeper/apache-zookeeper-3.5.6-bin/zkdata
dataLogDir=/root/opt/module/zookeeper/apache-zookeeper-3.5.6-bin/zklog
server.1=192.168.92.90:2888:3888
server.2=192.168.92.91:2888:3888
server.3=192.168.92.92:2888:3888
[root@node01 conf]# cd ../zkdata/
[root@node01 zkdata]# echo "1" >> myid
[root@node01 zkdata]# vim /etc/profile
##############
export ZOOKEEPER_HOME=/root/opt/module/zookeeper/apache-zookeeper-3.5.6-bin/
export PATH=$PATH:$ZOOKEEPER_HOME/bin
[root@node01 zkdata]# source /etc/profile
Starting the Hadoop and ZooKeeper clusters
Before starting the Hadoop cluster, copy the installation tree from the master node01 to node02 and node03. The first time the cluster is brought up, HDFS must be formatted.
[root@node02 ]# mkdir -p /root/opt
[root@node03 ]# mkdir -p /root/opt
[root@node01 ]# # copy the installation to the two worker nodes
[root@node01 ]# scp -rp opt/ root@node02:/root/opt/
[root@node01 ]# scp -rp opt/ root@node03:/root/opt/
[root@node01 ]# scp -rp /etc/profile node02:/etc/profile
[root@node01 ]# scp -rp /etc/profile node03:/etc/profile
[root@node02]# source /etc/profile
[root@node03]# source /etc/profile
[root@node01 ]# cd /root/opt/module/hadoop/hadoop-3.2.1
[root@node01 hadoop-3.2.1]# #格式化HDFS
[root@node01 hadoop-3.2.1]# bin/hdfs namenode -format
# If this line appears in the log output that follows, the NameNode was formatted successfully.
2020-02-24 15:21:28,893 INFO common.Storage: Storage directory /root/opt/data/tep/dfs/name has been successfully formatted.
# start Hadoop
[root@node01 hadoop-3.2.1]# sbin/start-all.sh
Run the jps command on each of the three machines to check whether the cluster started successfully. On node01 you should see the NameNode, ResourceManager, SecondaryNameNode, and Jps processes; the workers run DataNode and NodeManager:
[root@node01 ~]# jps
3601 ResourceManager
3346 SecondaryNameNode
4074 Jps
3069 NameNode
[root@node02 ~]# jps
3473 Jps
3234 NodeManager
3114 DataNode
[root@node03 ~]# jps
3031 NodeManager
3256 Jps
2909 DataNode
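With passwordless SSH in place, the three jps checks above can be run from a single terminal on node01. A sketch, printed as a dry run (uncomment the ssh line on the live cluster):

```shell
# Check every node's Java processes from one terminal.
for host in node01 node02 node03; do
  echo "== ${host} =="
  # ssh "${host}" jps    # uncomment when the cluster is actually running
done
```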
We can now open http://192.168.92.90:9870 or http://node01:9870 to reach the Hadoop web UI, shown in Figure 9-1.
Figure 9-1: Hadoop web UI
In the move from Hadoop 2.x to 3.x, the default web UI port changed from 50070 to 9870. The ports that changed in Hadoop 3.x are listed below:
Category | Service | Hadoop 2.x | Hadoop 3.x |
---|---|---|---|
NameNode ports | NameNode RPC | 8020 | 9820 |
NameNode ports | NameNode HTTP UI | 50070 | 9870 |
NameNode ports | NameNode HTTPS UI | 50470 | 9871 |
SecondaryNameNode ports | SecondaryNameNode HTTPS UI | 50091 | 9869 |
SecondaryNameNode ports | SecondaryNameNode HTTP UI | 50090 | 9868 |
DataNode ports | DataNode IPC | 50020 | 9867 |
DataNode ports | DataNode data transfer | 50010 | 9866 |
DataNode ports | DataNode HTTP UI | 50075 | 9864 |
DataNode ports | DataNode HTTPS UI | 50475 | 9865 |
We can also open http://192.168.92.90:8088/ to view the cluster status, shown in Figure 9-2.
Figure 9-2: Hadoop cluster status
To stop the cluster, run sbin/stop-all.sh:
# stop Hadoop
[root@node01 hadoop-3.2.1]# sbin/stop-all.sh
At this point, the distributed Hadoop cluster is up and running.
Next, start the ZooKeeper cluster: first adjust the myid file that was copied to each node, then start ZooKeeper on every node.
[root@node02 apache-zookeeper-3.5.6-bin]# vim zkdata/myid
2
[root@node03 apache-zookeeper-3.5.6-bin]# vim zkdata/myid
3
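Each node's myid must match its server.N line in zoo.cfg. A small sketch that derives the id-to-IP mapping directly from those three lines, so the two stay consistent:

```shell
# Parse the server.N=IP:2888:3888 lines and print which myid belongs where.
while IFS='=' read -r key value; do
  id="${key#server.}"        # the N in server.N
  ip="${value%%:*}"          # the IP before the first colon
  echo "${ip} -> myid ${id}"
done <<'EOF'
server.1=192.168.92.90:2888:3888
server.2=192.168.92.91:2888:3888
server.3=192.168.92.92:2888:3888
EOF
```

A mismatch between myid and zoo.cfg is a common reason the quorum fails to form.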
# start ZooKeeper on each node
[root@node01 apache-zookeeper-3.5.6-bin]# bin/zkServer.sh start
[root@node02 apache-zookeeper-3.5.6-bin]# bin/zkServer.sh start
[root@node03 apache-zookeeper-3.5.6-bin]# bin/zkServer.sh start
# jps now also shows the QuorumPeerMain process started by ZooKeeper
[root@node01]# jps
16962 Jps
16005 NameNode
16534 ResourceManager
16903 QuorumPeerMain
16282 SecondaryNameNode
[root@node02]# jps
8402 Jps
8037 NodeManager
7914 DataNode
8202 QuorumPeerMain
# check each node's role in the leader election
[root@node01 apache-zookeeper-3.5.6-bin]# bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /root/opt/module/zookeeper/apache-zookeeper-3.5.6-bin/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost.
Mode: follower
[root@node02 apache-zookeeper-3.5.6-bin]# bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /root/opt/module/zookeeper/apache-zookeeper-3.5.6-bin/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost.
Mode: leader
[root@node03 apache-zookeeper-3.5.6-bin]# bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /root/opt/module/zookeeper/apache-zookeeper-3.5.6-bin/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost.
Mode: follower
The ZooKeeper cluster is now up: node02 is the leader, while node01 and node03 are followers.
Next, try some basic operations with the ZooKeeper client:
[root@node02 apache-zookeeper-3.5.6-bin]# bin/zkCli.sh
[zk: localhost:2181(CONNECTED) 0] # ls: list what ZooKeeper currently contains
[zk: localhost:2181(CONNECTED) 0] ls /
[zookeeper]
[zk: localhost:2181(CONNECTED) 1] # ls2: show the node's children plus its stat data (deprecated in 3.5; use ls -s)
[zk: localhost:2181(CONNECTED) 1] ls2 /
[zookeeper]
cZxid = 0x0
ctime = Thu Jan 01 08:00:00 CST 1970
mZxid = 0x0
mtime = Thu Jan 01 08:00:00 CST 1970
pZxid = 0x0
cversion = -1
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 0
numChildren = 1
[zk: localhost:2181(CONNECTED) 2] # create: create a znode
[zk: localhost:2181(CONNECTED) 2] create /zk myData
Created /zk
[zk: localhost:2181(CONNECTED) 3] # get: read a znode's value
[zk: localhost:2181(CONNECTED) 3] get /zk
myData
[zk: localhost:2181(CONNECTED) 4] # set: update a znode's value
[zk: localhost:2181(CONNECTED) 4] set /zk myData1
[zk: localhost:2181(CONNECTED) 5] get /zk
myData1
[zk: localhost:2181(CONNECTED) 6] # create: create a child znode
[zk: localhost:2181(CONNECTED) 6] create /zk/zk01 myData2
Created /zk/zk01
[zk: localhost:2181(CONNECTED) 7] # stat: check a znode's status
[zk: localhost:2181(CONNECTED) 7] stat /zk
cZxid = 0x100000008
ctime = Thu Feb 27 12:39:43 CST 2020
mZxid = 0x100000009
mtime = Thu Feb 27 12:42:37 CST 2020
pZxid = 0x10000000b
cversion = 1
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 7
numChildren = 1
[zk: localhost:2181(CONNECTED) 8] # rmr: remove a znode and its children (deprecated in 3.5; use deleteall)
[zk: localhost:2181(CONNECTED) 8] rmr /zk
This gives us a working introduction to ZooKeeper. Of course, it can do far more than shown here: cluster management, distributed locks, distributed queues, cluster leader election, programming against the Java API, and more.