Hadoop fully distributed + ZooKeeper cluster + NameNode HA + YARN HA

1、CentOS 7 base environment

| System   | IP         | Hostname | Username | Password |
|----------|------------|----------|----------|----------|
| CentOS 7 | 10.1.1.101 | master   | root     | password |
| CentOS 7 | 10.1.1.102 | slave1   | root     | password |
| CentOS 7 | 10.1.1.103 | slave2   | root     | password |

| Component | Version | Download URL (Linux) |
|-----------|---------|----------------------|
| java      | 8       | https://www.oracle.com/java/technologies/javase/javase-jdk8-downloads.html |
| hadoop    | 2.7.1   | https://archive.apache.org/dist/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz |
| zookeeper | 3.4.8   | https://archive.apache.org/dist/zookeeper/zookeeper-3.4.8/zookeeper-3.4.8.tar.gz |

1.1、Disable the firewall

  • master

    [root@master ~]# systemctl stop firewalld 
    [root@master ~]# systemctl disable firewalld 
    Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
    Removed symlink /etc/systemd/system/basic.target.wants/firewalld.service.
    
  • slave1

    [root@slave1 ~]# systemctl stop firewalld 
    [root@slave1 ~]# systemctl disable firewalld 
    Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
    Removed symlink /etc/systemd/system/basic.target.wants/firewalld.service.
    
  • slave2

    [root@slave2 ~]# systemctl stop firewalld 
    [root@slave2 ~]# systemctl disable firewalld 
    Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
    Removed symlink /etc/systemd/system/basic.target.wants/firewalld.service.
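
A quick sanity check on each node that the firewall is really off (a minimal sketch; the exact output wording can vary between CentOS 7 releases):

    # Run on every node: firewalld should be stopped and disabled.
    systemctl is-active firewalld    # expected: inactive
    systemctl is-enabled firewalld   # expected: disabled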
    

1.2、Configure the hosts file

  • master

    [root@master ~]# vi /etc/hosts
    10.1.1.101 master
    10.1.1.102 slave1
    10.1.1.103 slave2
    [root@master ~]# scp /etc/hosts slave1:/etc/
    [root@master ~]# scp /etc/hosts slave2:/etc/
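
A minimal sanity check that the new names resolve on each host (assumes ping is available; only one packet is sent per host):

    # Each hostname should resolve to the address added above and answer a ping.
    for h in master slave1 slave2; do ping -c 1 "$h"; done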
    

1.3、Configure SSH

  1. Generate key pairs on all three hosts

    • master

      [root@master ~]# ssh-keygen -t rsa -P ''
      Generating public/private rsa key pair.
      Enter file in which to save the key (/root/.ssh/id_rsa): 
      Created directory '/root/.ssh'.
      Your identification has been saved in /root/.ssh/id_rsa.
      Your public key has been saved in /root/.ssh/id_rsa.pub.
      The key fingerprint is:
      94:ad:26:3c:7d:2f:1e:88:be:43:2c:39:50:f0:91:84 root@master
      The key's randomart image is:
      +--[ RSA 2048]----+
      |  .+o.           |
      |  E.o.   o       |
      |   ..   o .      |
      |  .  . o .       |
      |   . o+ S .      |
      |    + o= o .     |
      |     +. . o .    |
      |     ..  . o     |
      |      oo  .      |
      +-----------------+
      
    • slave1

      [root@slave1 ~]# ssh-keygen -t rsa -P ''
      Generating public/private rsa key pair.
      Enter file in which to save the key (/root/.ssh/id_rsa): 
      Created directory '/root/.ssh'.
      Your identification has been saved in /root/.ssh/id_rsa.
      Your public key has been saved in /root/.ssh/id_rsa.pub.
      The key fingerprint is:
      14:36:0b:e8:e3:59:76:f0:26:1d:a8:e7:53:f2:59:27 root@slave1
      The key's randomart image is:
      +--[ RSA 2048]----+
      |     ...+        |
      |    . oo.+       |
      |   . . +o.       |
      |    + *.* E .    |
      |   . B BSo o     |
      |    o o o        |
      |       .         |
      |                 |
      |                 |
      +-----------------+
      
    • slave2

      [root@slave2 ~]# ssh-keygen -t rsa -P ''
      Generating public/private rsa key pair.
      Enter file in which to save the key (/root/.ssh/id_rsa): 
      Created directory '/root/.ssh'.
      Your identification has been saved in /root/.ssh/id_rsa.
      Your public key has been saved in /root/.ssh/id_rsa.pub.
      The key fingerprint is:
      ba:8f:b0:86:a3:c9:70:50:cc:bf:db:16:3c:7e:49:65 root@slave2
      The key's randomart image is:
      +--[ RSA 2048]----+
      |                 |
      | o               |
      |  +              |
      | . .      E      |
      |.   ..  So       |
      | .   .+..        |
      |. ..o..+ .       |
      |ooo .=ooo        |
      |oo oo.+o.        |
      +-----------------+
      
  2. Distribute master's public key to all nodes (a login check follows below)

    • master

      [root@master ~]# ssh-copy-id -i master
      [root@master ~]# ssh-copy-id -i slave1
      [root@master ~]# ssh-copy-id -i slave2
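
A minimal check from master that key distribution worked; each command should print the remote hostname without prompting for a password:

    # Passwordless SSH check from master to every node.
    for h in master slave1 slave2; do ssh "$h" hostname; done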
      

1.4、Configure the NTP service

This lab requires a yum repository to be configured beforehand.

| Hostname | IP         | NTP role | Username | Password |
|----------|------------|----------|----------|----------|
| master   | 10.1.1.101 | Server   | root     | password |
| slave1   | 10.1.1.102 | Client   | root     | password |
| slave2   | 10.1.1.103 | Client   | root     | password |
  1. Install the NTP service online with yum

    • master

    [root@master ~]# yum install -y vim ntp
    [root@master ~]# systemctl start ntpd
    [root@master ~]# systemctl enable ntpd
    Created symlink from /etc/systemd/system/multi-user.target.wants/ntpd.service to /usr/lib/systemd/system/ntpd.service.
    [root@master ~]# systemctl status ntpd
    ● ntpd.service - Network Time Service
       Loaded: loaded (/usr/lib/systemd/system/ntpd.service; enabled; vendor preset: disabled)
       Active: active (running) since Sun 2021-06-20 09:14:48 CST; 1 day 9h ago
     Main PID: 12909 (ntpd)
       CGroup: /system.slice/ntpd.service
               └─12909 /usr/sbin/ntpd -u ntp:ntp -g
    
    Jun 20 09:14:48 master ntpd[12909]: Listen normally on 3 eno16777736 10.1.1.101 UDP 123
    Jun 20 09:14:48 master ntpd[12909]: Listen normally on 4 lo ::1 UDP 123
    Jun 20 09:14:48 master ntpd[12909]: Listen normally on 5 eno16777736 fe80::20c:29ff:fe5a:51a2 UDP 123
    Jun 20 09:14:48 master ntpd[12909]: Listening on routing socket on fd #22 for interface updates
    Jun 20 09:14:48 master ntpd[12909]: 0.0.0.0 c016 06 restart
    Jun 20 09:14:48 master ntpd[12909]: 0.0.0.0 c012 02 freq_set kernel 0.000 PPM
    Jun 20 09:14:48 master ntpd[12909]: 0.0.0.0 c011 01 freq_not_set
    Jun 20 09:14:55 master ntpd[12909]: 0.0.0.0 c61c 0c clock_step +120960.270610 s
    Jun 21 18:50:56 master ntpd[12909]: 0.0.0.0 c614 04 freq_mode
    Jun 21 18:50:57 master ntpd[12909]: 0.0.0.0 c618 08 no_sys_peer
    
    • slave1

      [root@slave1 ~]# yum install -y vim ntp
      [root@slave1 ~]# systemctl start ntpd
      [root@slave1 ~]# systemctl enable ntpd
      Created symlink from /etc/systemd/system/multi-user.target.wants/ntpd.service to /usr/lib/systemd/system/ntpd.service.
      [root@slave1 ~]# systemctl status  ntpd
      ● ntpd.service - Network Time Service
         Loaded: loaded (/usr/lib/systemd/system/ntpd.service; enabled; vendor preset: disabled)
         Active: active (running) since Sun 2021-06-20 04:47:27 CST; 1 day 14h ago
       Main PID: 11069 (ntpd)
         CGroup: /system.slice/ntpd.service
                 └─11069 /usr/sbin/ntpd -u ntp:ntp -g
      
      Jun 20 04:47:27 slave1 ntpd[11069]: Listen normally on 3 eno16777736 10.1.1.102 UDP 123
      Jun 20 04:47:27 slave1 ntpd[11069]: Listen normally on 4 lo ::1 UDP 123
      Jun 20 04:47:27 slave1 ntpd[11069]: Listen normally on 5 eno16777736 fe80::20c:29ff:fec8:8c15 UDP 123
      Jun 20 04:47:27 slave1 ntpd[11069]: Listening on routing socket on fd #22 for interface updates
      Jun 20 04:47:27 slave1 ntpd[11069]: 0.0.0.0 c016 06 restart
      Jun 20 04:47:27 slave1 ntpd[11069]: 0.0.0.0 c012 02 freq_set kernel 0.000 PPM
      Jun 20 04:47:27 slave1 ntpd[11069]: 0.0.0.0 c011 01 freq_not_set
      Jun 20 04:47:34 slave1 ntpd[11069]: 0.0.0.0 c61c 0c clock_step +137292.879993 s
      Jun 21 18:55:47 slave1 ntpd[11069]: 0.0.0.0 c614 04 freq_mode
      Jun 21 18:55:48 slave1 ntpd[11069]: 0.0.0.0 c618 08 no_sys_peer
      
    • slave2

      [root@slave2 ~]# yum install -y vim ntp
      [root@slave2 ~]# systemctl start ntpd
      [root@slave2 ~]# systemctl enable ntpd
      Created symlink from /etc/systemd/system/multi-user.target.wants/ntpd.service to /usr/lib/systemd/system/ntpd.service.
      [root@slave2 ~]# systemctl status  ntpd
      ● ntpd.service - Network Time Service
         Loaded: loaded (/usr/lib/systemd/system/ntpd.service; enabled; vendor preset: disabled)
         Active: active (running) since Sun 2021-06-20 02:21:35 CST; 1 day 16h ago
       Main PID: 10686 (ntpd)
         CGroup: /system.slice/ntpd.service
                 └─10686 /usr/sbin/ntpd -u ntp:ntp -g
      
      Jun 20 02:21:35 slave2 ntpd[10686]: Listen normally on 3 eno16777736 10.1.1.103 UDP 123
      Jun 20 02:21:35 slave2 ntpd[10686]: Listen normally on 4 lo ::1 UDP 123
      Jun 20 02:21:35 slave2 ntpd[10686]: Listen normally on 5 eno16777736 fe80::20c:29ff:fe43:d407 UDP 123
      Jun 20 02:21:35 slave2 ntpd[10686]: Listening on routing socket on fd #22 for interface updates
      Jun 20 02:21:35 slave2 ntpd[10686]: 0.0.0.0 c016 06 restart
      Jun 20 02:21:35 slave2 ntpd[10686]: 0.0.0.0 c012 02 freq_set kernel 0.000 PPM
      Jun 20 02:21:35 slave2 ntpd[10686]: 0.0.0.0 c011 01 freq_not_set
      Jun 20 02:21:44 slave2 ntpd[10686]: 0.0.0.0 c61c 0c clock_step +146100.650069 s
      Jun 21 18:56:44 slave2 ntpd[10686]: 0.0.0.0 c614 04 freq_mode
      Jun 21 18:56:45 slave2 ntpd[10686]: 0.0.0.0 c618 08 no_sys_peer
      
  2. Configure the NTP server (master)

    • master

      [root@master ~]# echo "driftfile /var/lib/ntp/drift" > /etc/ntp.conf 
      [root@master ~]# echo "restrict default nomodify notrap nopeer noquery" >> /etc/ntp.conf 
      [root@master ~]# echo "restrict 127.0.0.1" >> /etc/ntp.conf 
      [root@master ~]# echo "restrict ::1" >> /etc/ntp.conf 
      [root@master ~]# echo "server 127.127.1.0" >> /etc/ntp.conf 
      [root@master ~]# echo "Fudge 127.127.1.0 stratum 10" >> /etc/ntp.conf 
      [root@master ~]# echo "includefile /etc/ntp/crypto/pw" >> /etc/ntp.conf 
      [root@master ~]# echo "keys /etc/ntp/keys" >> /etc/ntp.conf 
      [root@master ~]# echo "disable monitor" >> /etc/ntp.conf
      
  3. Configure the NTP clients

    • slave1

      [root@slave1 ~]# echo "driftfile /var/lib/ntp/drift" > /etc/ntp.conf 
      [root@slave1 ~]# echo "server master" >> /etc/ntp.conf 
      [root@slave1 ~]# echo "driftfile /var/lib/ntp/drift" >> /etc/ntp.conf 
      [root@slave1 ~]# echo "server master" >>  /etc/ntp.conf 
      [root@slave1 ~]# echo "Fudge master stratum 10" >>  /etc/ntp.conf 
      [root@slave1 ~]# echo "includefile /etc/ntp/crypto/pw" >>  /etc/ntp.conf 
      [root@slave1 ~]# echo "keys /etc/ntp/keys" >>  /etc/ntp.conf 
      [root@slave1 ~]# echo "disable monitor" >>  /etc/ntp.conf
      [root@slave1 ~]# scp /etc/ntp.conf slave2:/etc/
      
  4. Set up a background time-sync cron job on the clients

    • slave1

      [root@slave1 ~]# echo "*/1 * * * * /usr/sbin/ntpdate -u master > /dev/null 2 >& 1" >  /var/spool/cron/update.cron
      [root@slave1 ~]# crontab /var/spool/cron/update.cron
      [root@slave1 ~]# systemctl restart crond
      [root@slave1 ~]# systemctl enable crond 
      
    • slave2

      [root@slave2 ~]# echo "*/1 * * * * /usr/sbin/ntpdate -u master > /dev/null 2 >& 1" >  /var/spool/cron/update.cron
      [root@slave2 ~]# crontab /var/spool/cron/update.cron
      [root@slave2 ~]# systemctl restart crond
      [root@slave2 ~]# systemctl enable crond 
      
  5. Restart the ntpd service on all three machines (a quick synchronization check follows the list)

    [root@master ~]# systemctl restart ntpd
    [root@slave1 ~]# systemctl restart ntpd
    [root@slave2 ~]# systemctl restart ntpd
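
A quick, hedged check that time synchronization works (ntpq ships with the ntp package; ntpdate is normally pulled in as a dependency):

    # On master: the local reference clock 127.127.1.0 should show up as a peer.
    ntpq -p

    # On slave1 and slave2: a one-off manual sync against master, the same command
    # the cron job runs every minute; -u avoids the port conflict with the running ntpd.
    /usr/sbin/ntpdate -u master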
    

2、Install Java

2.1、Uninstall OpenJDK

[root@master ~]# rpm -qa |grep openjdk
Uninstall every package returned by the query with rpm -e --nodeps "<package name>".

2.2、Install Java

  1. Extract the Java archive from /h3cu into /usr/local/src

    [root@master ~]# tar -xzvf /h3cu/jdk-8u144-linux-x64.tar.gz -C /usr/local/src/
    
  2. Rename the extracted directory to java

    [root@master ~]# mv /usr/local/src/jdk1.8.0_144 /usr/local/src/java
    
  3. Configure the Java environment variables (effective for the current user only)

    [root@master ~]# vi /root/.bash_profile
    export JAVA_HOME=/usr/local/src/java
    export PATH=$PATH:$JAVA_HOME/bin
    
  4. Reload the environment variables and check the Java version

    [root@master ~]# source /root/.bash_profile 
    [root@master ~]# java -version 
    java version "1.8.0_144"
    Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
    Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
    
  5. Distribute Java and the profile to slave1 and slave2 (a version check follows below)

    [root@master ~]# scp -r /usr/local/src/java slave1:/usr/local/src/
    [root@master ~]# scp -r /usr/local/src/java slave2:/usr/local/src/
    [root@master ~]# scp /root/.bash_profile slave1:/root/
    [root@master ~]# scp /root/.bash_profile slave2:/root/
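
A minimal check from master that Java reached both slaves; each should report version 1.8.0_144 (the profile is sourced explicitly because non-interactive SSH does not read it):

    for h in slave1 slave2; do ssh "$h" 'source /root/.bash_profile && java -version'; done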
    

3、Install the ZooKeeper cluster

Extract the ZooKeeper archive from /h3cu into /usr/local/src:

[root@master ~]# tar -xvzf /h3cu/zookeeper-3.4.8.tar.gz -C /usr/local/src/
  1. Rename the extracted directory to zookeeper

    [root@master ~]# mv /usr/local/src/zookeeper-3.4.8 /usr/local/src/zookeeper
    
  2. Configure the ZooKeeper environment variables and load them (current user only)

    [root@master ~]# vi /root/.bash_profile 
    export ZOOKEEPER_HOME=/usr/local/src/zookeeper
    export PATH=$PATH:$ZOOKEEPER_HOME/bin
    [root@master ~]# source /root/.bash_profile 
    
  3. Edit the zoo.cfg configuration file

    Modify dataDir and add the three server.N lines:

    [root@master ~]# cp /usr/local/src/zookeeper/conf/zoo_sample.cfg /usr/local/src/zookeeper/conf/zoo.cfg
    [root@master ~]# vi /usr/local/src/zookeeper/conf/zoo.cfg 
    dataDir=/usr/local/src/zookeeper/data
    server.1=master:2888:3888
    server.2=slave1:2888:3888
    server.3=slave2:2888:3888
    
  4. Create the data directory and the myid file

    [root@master ~]# mkdir /usr/local/src/zookeeper/data
    [root@master ~]# echo "1" > /usr/local/src/zookeeper/data/myid
    
  5. Distribute the files to slave1 and slave2

    [root@master ~]# scp -r /usr/local/src/zookeeper slave1:/usr/local/src/ 
    [root@master ~]# scp -r /usr/local/src/zookeeper slave2:/usr/local/src/
    [root@master ~]# scp /root/.bash_profile slave1:/root/
    [root@master ~]# scp /root/.bash_profile slave2:/root/
    
  6. Update the myid file on slave1 and slave2

    • slave1

      [root@slave1 ~]# echo 2 > /usr/local/src/zookeeper/data/myid 
      
    • slave2

      [root@slave2 ~]# echo 3 > /usr/local/src/zookeeper/data/myid 
      
  7. Start ZooKeeper on each node

    • master

      [root@master ~]# source /root/.bash_profile 
      [root@master ~]# zkServer.sh start 
      ZooKeeper JMX enabled by default
      Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
      Starting zookeeper ... STARTED
      
    • slave1

      [root@slave1 ~]# source /root/.bash_profile 
      [root@slave1 ~]# zkServer.sh start 
      ZooKeeper JMX enabled by default
      Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
      Starting zookeeper ... STARTED
      
    • slave2

      [root@slave2 ~]# source /root/.bash_profile 
      [root@slave2 ~]# zkServer.sh start 
      ZooKeeper JMX enabled by default
      Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
      Starting zookeeper ... STARTED
      
  8. Check the ZooKeeper cluster status on each node

    Note: the leader and follower roles are elected, not pinned to a particular machine; a quick smoke test with zkCli.sh follows the list.

    [root@master ~]# zkServer.sh status
    ZooKeeper JMX enabled by default
    Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
    Mode: follower
    [root@master ~]# jps
    17120 QuorumPeerMain
    17230 Jps
    [root@slave1 ~]# zkServer.sh status
    ZooKeeper JMX enabled by default
    Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
    Mode: leader
    [root@slave1 ~]# jps
    15721 Jps
    15050 QuorumPeerMain
    [root@slave2 ~]# zkServer.sh status
    ZooKeeper JMX enabled by default
    Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
    Mode: follower
    [root@slave2 ~]# jps
    14965 QuorumPeerMain
    15647 Jps
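
A small smoke test against the ensemble; the znode name /ha-test is arbitrary and used only for illustration:

    # Create, read and delete a test znode through the cluster.
    zkCli.sh -server master:2181 create /ha-test "hello"
    zkCli.sh -server master:2181 get /ha-test
    zkCli.sh -server master:2181 delete /ha-test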
    

4、Configure Hadoop HA

  1. Extract the Hadoop archive from /h3cu into /usr/local/src (run on master)

    [root@master ~]# tar -xzf /h3cu/hadoop-2.7.1.tar.gz -C /usr/local/src/
    
  2. Rename the extracted directory to hadoop

    [root@master ~]# mv /usr/local/src/hadoop-2.7.1 /usr/local/src/hadoop
    
  3. Configure the Hadoop environment variables (current user only)

    [root@master ~]# vi /root/.bash_profile 
    export HADOOP_HOME=/usr/local/src/hadoop
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
    
  4. Reload the environment variables and check the Hadoop version

    [root@master ~]# source /root/.bash_profile 
    [root@master ~]# hadoop version 
    Hadoop 2.7.1
    Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 15ecc87ccf4a0228f35af08fc56de536e6ce657a
    Compiled by jenkins on 2015-06-29T06:04Z
    Compiled with protoc 2.5.0
    From source with checksum fc0a1a23fc1868e4d5ee7fa2b28a58a
    This command was run using /usr/local/src/hadoop/share/hadoop/common/hadoop-common-2.7.1.jar
    
  5. Configure the slaves file

    [root@master ~]# vi /usr/local/src/hadoop/etc/hadoop/slaves 
    master
    slave1
    slave2
    

4.1 Configuration files

#### hadoop-env.sh

[root@master ~]# vi /usr/local/src/hadoop/etc/hadoop/hadoop-env.sh 
export JAVA_HOME=/usr/local/src/java

#### yarn-env.sh

[root@master ~]# vim /usr/local/src/hadoop/etc/hadoop/yarn-env.sh 
export JAVA_HOME=/usr/local/src/java

#### hdfs-site.xml

Command:

[root@master ~]# vi /usr/local/src/hadoop/etc/hadoop/hdfs-site.xml 

Configuration content:

<property>
  <!-- Whether HDFS file permission checking is enabled -->
  <name>dfs.permissions.enabled</name>
  <value>false</value>
</property>
<property>
  <!-- Number of block replicas -->
  <name>dfs.replication</name>
  <value>3</value>
</property>
<property>
  <!-- Local storage path for NameNode metadata -->
  <name>dfs.namenode.name.dir</name>
  <value>/usr/local/src/hadoop/dfs/name/data</value>
</property>
<property>
  <!-- Local storage path for DataNode blocks -->
  <name>dfs.datanode.data.dir</name>
  <value>/usr/local/src/hadoop/dfs/data/data</value>
</property>
<property>
  <!-- Logical name of the HDFS nameservice -->
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <!-- List of NameNode IDs in the nameservice -->
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <!-- RPC address the NameNode on master listens on -->
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>master:8020</value>
</property>
<property>
  <!-- RPC address the NameNode on slave1 listens on -->
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>slave1:8020</value>
</property>
<property>
  <!-- HTTP address the NameNode on master listens on -->
  <name>dfs.namenode.http-address.mycluster.nn1</name>
  <value>master:50070</value>
</property>
<property>
  <!-- HTTP address the NameNode on slave1 listens on -->
  <name>dfs.namenode.http-address.mycluster.nn2</name>
  <value>slave1:50070</value>
</property>
<property>
  <!-- URI of the JournalNode group the NameNodes write edits to and read edits from -->
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://master:8485;slave1:8485;slave2:8485/mycluster</value>
</property>
<property>
  <!-- Java class DFS clients use to determine which NameNode is currently active -->
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <!-- Enable automatic failover -->
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
<property>
  <!-- ZooKeeper quorum used for automatic failover -->
  <name>ha.zookeeper.quorum</name>
  <value>master:2181,slave1:2181,slave2:2181</value>
</property>
<property>
  <!-- Scripts or Java classes used to fence the active NameNode during a failover -->
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <!-- Comma-separated list of SSH private key files -->
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/root/.ssh/id_rsa</value>
</property>
<property>
  <!-- SSH connection timeout for fencing, in milliseconds -->
  <name>dfs.ha.fencing.ssh.connect-timeout</name>
  <value>30000</value>
</property>
<property>
  <!-- Number of NameNode handler threads for RPC requests from DataNodes -->
  <name>dfs.namenode.handler.count</name>
  <value>100</value>
</property>
<property>
  <!-- Allow access to HDFS over WebHDFS -->
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
<property>
  <!-- HDFS block size, set here to 256 MB -->
  <name>dfs.blocksize</name>
  <value>268435456</value>
</property>
#### core-site.xml

Command:

[root@master ~]# vim /usr/local/src/hadoop/etc/hadoop/core-site.xml

Configuration content:

<property>
  <!-- Default file system URI (the HA nameservice) -->
  <name>fs.defaultFS</name>
  <value>hdfs://mycluster</value>
</property>
<property>
  <!-- Local path where the JournalNodes store their data -->
  <name>dfs.journalnode.edits.dir</name>
  <value>/usr/local/src/hadoop/journalnode</value>
</property>
<property>
  <!-- Hadoop temporary file path -->
  <name>hadoop.tmp.dir</name>
  <value>/usr/local/src/hadoop/dfs/tmp</value>
</property>
<property>
  <!-- I/O buffer size -->
  <name>io.file.buffer.size</name>
  <value>4096</value>
</property>
<property>
  <!-- Allow proxy access from any host -->
  <name>hadoop.proxyuser.hduser.hosts</name>
  <value>*</value>
</property>
<property>
  <!-- Allow proxy access for all groups -->
  <name>hadoop.proxyuser.hduser.groups</name>
  <value>*</value>
</property>
#### yarn-site.xml

Command:

[root@master ~]# vim /usr/local/src/hadoop/etc/hadoop/yarn-site.xml

Configuration content:

<property>
  <!-- Enable ResourceManager HA (default: false) -->
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <!-- Cluster identifier; used by the elector so that an RM does not take over the active state of another cluster -->
  <name>yarn.resourcemanager.cluster-id</name>
  <value>RMcluster</value>
</property>
<property>
  <!-- List of logical RM IDs -->
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <!-- Hostnames of the two RMs (rm1 on master, rm2 on slave1) -->
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>master</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>slave1</value>
</property>
<property>
  <!-- Web UI address of rm1 (master) -->
  <name>yarn.resourcemanager.webapp.address.rm1</name>
  <value>master:8088</value>
</property>
<property>
  <!-- Web UI address of rm2 (slave1) -->
  <name>yarn.resourcemanager.webapp.address.rm2</name>
  <value>slave1:8088</value>
</property>
<property>
  <!-- Address of the ZooKeeper quorum, used for state storage and embedded leader election -->
  <name>yarn.resourcemanager.zk-address</name>
  <value>master:2181,slave1:2181,slave2:2181</value>
</property>
<property>
  <!-- Retry interval for reconnecting to the ResourceManager after losing contact -->
  <name>yarn.resourcemanager.connect.retry-interval.ms</name>
  <value>2000</value>
</property>
<property>
  <!-- Class implementing the RM state store used for recovery -->
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
  <!-- ResourceManager hostname -->
  <name>yarn.resourcemanager.hostname</name>
  <value>master</value>
</property>
<property>
  <!-- Enable automatic failover; by default this is on only when HA is enabled -->
  <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
<property>
  <!-- Wait interval before the MapReduce ApplicationMaster reconnects to the scheduler -->
  <name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
  <value>5000</value>
</property>
<property>
  <!-- Auxiliary service MapReduce applications need for shuffle -->
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <!-- Enable automatic RM state recovery -->
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>true</value>
</property>
#### mapred-site.xml

Command:

[root@master ~]# cp /usr/local/src/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/src/hadoop/etc/hadoop/mapred-site.xml
[root@master ~]# vim /usr/local/src/hadoop/etc/hadoop/mapred-site.xml

Configuration content:

<property>
  <!-- Use YARN to manage resources for MapReduce jobs on the cluster -->
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
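
All of the property blocks above must sit inside the <configuration> element of their respective files. Before distributing anything, a quick hedged sanity check that Hadoop resolves the edited values:

    # Ask Hadoop which values it actually sees.
    hdfs getconf -confKey fs.defaultFS        # expected: hdfs://mycluster
    hdfs getconf -confKey dfs.nameservices    # expected: mycluster
    hdfs getconf -namenodes                   # expected: master slave1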

4.2 Formatting and startup

Distribute the files
[root@master ~]# scp -r /usr/local/src/hadoop slave1:/usr/local/src/ & 
[root@master ~]# scp -r /usr/local/src/hadoop slave2:/usr/local/src/ & 
[root@master ~]# scp -r /root/.bash_profile slave1:/root/ & 
[root@master ~]# scp -r /root/.bash_profile slave2:/root/ & 
Format ZKFC
[root@master ~]# hdfs zkfc -formatZK
21/06/23 07:56:01 INFO tools.DFSZKFailoverController: Failover controller configured for NameNode NameNode at master/10.1.1.101:8020
21/06/23 07:56:02 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
21/06/23 07:56:02 INFO zookeeper.ZooKeeper: Client environment:host.name=master
21/06/23 07:56:02 INFO zookeeper.ZooKeeper: Client environment:java.version=1.8.0_144
...
21/06/23 07:56:07 INFO zookeeper.ZooKeeper: Session: 0x17a34723b95000b closed
21/06/23 07:56:07 INFO zookeeper.ClientCnxn: EventThread shut down
Start the JournalNodes
[root@master ~]# hadoop-daemons.sh start journalnode 
slave1: starting journalnode, logging to /usr/local/src/hadoop/logs/hadoop-root-journalnode-slave1.out
slave2: starting journalnode, logging to /usr/local/src/hadoop/logs/hadoop-root-journalnode-slave2.out
master: starting journalnode, logging to /usr/local/src/hadoop/logs/hadoop-root-journalnode-master.out
[root@master ~]# jps
21266 QuorumPeerMain
28184 JournalNode
28233 Jps
[root@slave1 ~]# jps
37478 Jps
37406 JournalNode
33215 QuorumPeerMain
[root@slave2 ~]# jps
36816 Jps
36744 JournalNode
33183 QuorumPeerMain
Format the NameNode
[root@master ~]# hdfs namenode -format 
21/06/23 08:01:31 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = master/10.1.1.101
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.7.1
....
21/06/23 08:01:33 INFO namenode.FSImage: Allocated new BlockPoolId: BP-298621373-10.1.1.101-1624406493315
21/06/23 08:01:33 INFO common.Storage: Storage directory /usr/local/src/hadoop/dfs/name/data has been successfully formatted.
21/06/23 08:01:33 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
21/06/23 08:01:33 INFO util.ExitUtil: Exiting with status 0
21/06/23 08:01:33 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/10.1.1.101
************************************************************/
Start the cluster and back up the NameNode metadata
[root@master ~]# start-all.sh 
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master slave1]
slave1: starting namenode, logging to /usr/local/src/hadoop/logs/hadoop-root-namenode-slave1.out
master: starting namenode, logging to /usr/local/src/hadoop/logs/hadoop-root-namenode-master.out
slave1: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-root-datanode-slave1.out
slave2: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-root-datanode-slave2.out
master: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-root-datanode-master.out
Starting journal nodes [master slave1 slave2]
slave2: journalnode running as process 36744. Stop it first.
slave1: journalnode running as process 37406. Stop it first.
master: journalnode running as process 28184. Stop it first.
Starting ZK Failover Controllers on NN hosts [master slave1]
slave1: starting zkfc, logging to /usr/local/src/hadoop/logs/hadoop-root-zkfc-slave1.out
master: starting zkfc, logging to /usr/local/src/hadoop/logs/hadoop-root-zkfc-master.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/src/hadoop/logs/yarn-root-resourcemanager-localhost.localdomain.out
slave2: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-root-nodemanager-slave2.out
slave1: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-root-nodemanager-slave1.out
master: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-root-nodemanager-master.out
[root@master ~]# jps
29136 NodeManager
28929 DFSZKFailoverController
21266 QuorumPeerMain
28184 JournalNode
28648 DataNode
29032 ResourceManager
29421 Jps
28431 NameNode
[root@slave1 ~]# jps
37697 DFSZKFailoverController
37768 NodeManager
37580 DataNode
37406 JournalNode
33215 QuorumPeerMain
37871 Jps
[root@slave2 ~]# jps
36976 NodeManager
36854 DataNode
37079 Jps
36744 JournalNode
33183 QuorumPeerMain
Synchronize the primary NameNode's metadata to the NameNode on slave1
[root@slave1 ~]# hdfs namenode -bootstrapStandby
21/06/23 08:05:33 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = slave1/10.1.1.102
STARTUP_MSG:   args = [-bootstrapStandby]
STARTUP_MSG:   version = 2.7.1
=====================================================
About to bootstrap Standby ID nn2 from:
           Nameservice ID: mycluster
        Other Namenode ID: nn1
  Other NN's HTTP address: http://master:50070
  Other NN's IPC  address: master/10.1.1.101:8020
             Namespace ID: 321931140
            Block pool ID: BP-298621373-10.1.1.101-1624406493315
               Cluster ID: CID-d443340a-5baa-411e-9dee-81ab0b4b08b8
           Layout version: -63
       isUpgradeFinalized: true
=====================================================
21/06/23 08:05:34 INFO common.Storage: Storage directory /usr/local/src/hadoop/dfs/name/data has been successfully formatted.
21/06/23 08:05:35 INFO namenode.TransferFsImage: Opening connection to http://master:50070/imagetransfer?getimage=1&txid=0&storageInfo=-63:321931140:0:CID-d443340a-5baa-411e-9dee-81ab0b4b08b8
21/06/23 08:05:35 INFO namenode.TransferFsImage: Image Transfer timeout configured to 60000 milliseconds
21/06/23 08:05:35 INFO namenode.TransferFsImage: Transfer took 0.00s at 0.00 KB/s
21/06/23 08:05:35 INFO namenode.TransferFsImage: Downloaded file fsimage.ckpt_0000000000000000000 size 351 bytes.
21/06/23 08:05:35 INFO util.ExitUtil: Exiting with status 0
21/06/23 08:05:35 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at slave1/10.1.1.102
************************************************************/
[root@slave1 ~]# hadoop-daemon.sh start namenode 
starting namenode, logging to /usr/local/src/hadoop/logs/hadoop-root-namenode-localhost.localdomain.out
[root@slave1 ~]# jps
38128 Jps
37697 DFSZKFailoverController
37959 NameNode
37768 NodeManager
37580 DataNode
37406 JournalNode
33215 QuorumPeerMain
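
Note that start-all.sh only launches a ResourceManager on the node it is run from, which is why the jps output on slave1 above shows no ResourceManager. A hedged sketch for bringing up the standby RM and checking the HA state of both HDFS and YARN (which member of each pair is active depends on the ZooKeeper election):

    # On slave1: start the standby ResourceManager.
    yarn-daemon.sh start resourcemanager

    # On master: one member of each pair should report active, the other standby.
    hdfs haadmin -getServiceState nn1
    hdfs haadmin -getServiceState nn2
    yarn rmadmin -getServiceState rm1
    yarn rmadmin -getServiceState rm2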

4.3 Test the cluster

4.3.1 Check the web UIs

(Screenshots of the HDFS and YARN web UIs omitted.)
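
If a browser is not at hand, roughly the same active/standby information can be read from the NameNode JMX endpoints; a hedged sketch (the State field of the NameNodeStatus bean reports active or standby):

    # Query each NameNode's status bean over HTTP.
    curl -s 'http://master:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus'
    curl -s 'http://slave1:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus'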

4.3.2 Run a MapReduce job
[root@master ~]# vim wordcount.txt
The lives of most men are determined by their environment
They accept the circumstancesamid which fate has thrown them 
not only with resignation but even with good will
They arelike streetcars running contentedly on their rails 
and they despise the sprightly flivver thatdashes in and 
out of the traffic and speeds so jauntily across the open country
I respectthem they are good citizens  good husbands and good fathers
and of course somebody hasto pay the taxes but I do not find them exciting
I am fascinated by the men few enough in allconscience
who take life in their own hands and seem to mould it to their own liking
It may bethat we have no such thing as free will
but at all events we have the illusion ofit  At acrossroad it does seem
to us that we might go either to the right or the left and  the choiceonce made
it is difficult to see that the whole course of the world history obliged us
to takethe turning we didI never met a more interesting man than Mayhew  He was
a lawyerin Detroit  He was an able anda successful one  By the time he was thirty-five
he had a large and a lucrative praaice  he hadamassed a competence  and he stood on
the threshold of a distinguished career  He had ana cute brain  anattractive personality
and uprightness  There was no reason why he shouldnot become  financially or politically
a power in the land  One evening he was sitting in his clubwith a group of friends and
they were perhaps a little worse (or the better) for liquor  One ofthem had recently come
from Italy and he told them of a house he had seen at Capri  a houseon the hill  overlooking
the Bay of Naples  with a large and shady garden  He described to themthe beauty of the
most beautifulisland in the Mediterranean 
[root@master ~]# hdfs dfs -put wordcount.txt /
[root@master ~]# hadoop jar /usr/local/src/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /wordcount.txt /output
21/06/23 08:28:08 INFO input.FileInputFormat: Total input paths to process : 1
21/06/23 08:28:08 INFO mapreduce.JobSubmitter: number of splits:1
21/06/23 08:28:08 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1624407510976_0001
21/06/23 08:28:08 INFO impl.YarnClientImpl: Submitted application application_1624407510976_0001
21/06/23 08:28:08 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1624407510976_0001/
21/06/23 08:28:08 INFO mapreduce.Job: Running job: job_1624407510976_0001
21/06/23 08:28:18 INFO mapreduce.Job: Job job_1624407510976_0001 running in uber mode : false
21/06/23 08:28:18 INFO mapreduce.Job:  map 0% reduce 0%
21/06/23 08:28:26 INFO mapreduce.Job:  map 100% reduce 0%
21/06/23 08:28:34 INFO mapreduce.Job:  map 100% reduce 100%
21/06/23 08:28:34 INFO mapreduce.Job: Job job_1624407510976_0001 completed successfully
21/06/23 08:28:34 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=2463
		FILE: Number of bytes written=242401
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=1829
		HDFS: Number of bytes written=1676
		HDFS: Number of read operations=6
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=6030
		Total time spent by all reduces in occupied slots (ms)=4895
		Total time spent by all map tasks (ms)=6030
		Total time spent by all reduce tasks (ms)=4895
		Total vcore-seconds taken by all map tasks=6030
		Total vcore-seconds taken by all reduce tasks=4895
		Total megabyte-seconds taken by all map tasks=6174720
		Total megabyte-seconds taken by all reduce tasks=5012480
	Map-Reduce Framework
		Map input records=24
		Map output records=314
		Map output bytes=2969
		Map output materialized bytes=2463
		Input split bytes=95
		Combine input records=314
		Combine output records=196
		Reduce input groups=196
		Reduce shuffle bytes=2463
		Reduce input records=196
		Reduce output records=196
		Spilled Records=392
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=172
		CPU time spent (ms)=2110
		Physical memory (bytes) snapshot=324399104
		Virtual memory (bytes) snapshot=4162691072
		Total committed heap usage (bytes)=219676672
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=1734
	File Output Format Counters 
		Bytes Written=1676
[root@master ~]# hdfs dfs -cat /output/* | head -5
(or	1
At	1
Bay	1
By	1
Capri	1
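
To confirm that automatic failover actually works, stop the currently active NameNode and watch the standby take over; a minimal sketch assuming nn1 on master is the active one at this point:

    # On master: stop the active NameNode.
    hadoop-daemon.sh stop namenode
    # Within a few seconds the ZKFC should promote nn2 on slave1.
    hdfs haadmin -getServiceState nn2    # expected: active
    # Bring the stopped NameNode back; it rejoins as standby.
    hadoop-daemon.sh start namenode
    hdfs haadmin -getServiceState nn1    # expected: standby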

