Setting Up a Fully Distributed Hadoop Cluster

First, prepare three machines and configure their IP addresses: in this walkthrough, 192.168.20.8 (master), 192.168.20.9 (slave1), and 192.168.20.10 (slave2).

1. Configure the Linux base environment

Set the server hostname

[root@localhost ~]# hostnamectl set-hostname master 

[root@localhost ~]# bash

[root@master ~]# hostname 

master 

Bind the hostname to the IP address

[root@master ~]# vi /etc/hosts

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4

::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.20.8 master

Check the SSH service status

 [root@master ~]# systemctl status sshd 

Disable the firewall

[root@master ~]# systemctl stop firewalld 

[root@master ~]# systemctl status firewalld

[root@master ~]# systemctl disable firewalld 

Create the hadoop user

[root@master ~]# useradd hadoop

[root@master ~]# echo "1234" |passwd --stdin hadoop 

Changing password for user hadoop.

passwd: all authentication tokens updated successfully.

2. Install the Java environment

The JDK package must be downloaded from the Oracle website at https://www.oracle.com/java/technologies/javase-jdk8-downloads.html. Hadoop 2.7.1, used in this tutorial, requires JDK 7 or later; here we use the package jdk-8u152-linux-x64.tar.gz.

Uninstall the bundled OpenJDK

List the installed Java packages:

[root@master ~]# rpm -qa  | grep java

javapackages-tools-3.4.1-11.el7.noarch

java-1.8.0-openjdk-1.8.0.352.b08-2.el7_9.x86_64

tzdata-java-2022e-1.el7.noarch

python-javapackages-3.4.1-11.el7.noarch

java-1.8.0-openjdk-headless-1.8.0.352.b08-2.el7_9.x86_64 

Uninstall these packages:

[root@master ~]# rpm -e --nodeps javapackages-tools-3.4.1-11.el7.noarch

[root@master ~]# rpm -e --nodeps java-1.8.0-openjdk-1.8.0.352.b08-2.el7_9.x86_64

[root@master ~]# rpm -e --nodeps tzdata-java-2022e-1.el7.noarch

[root@master ~]# rpm -e --nodeps python-javapackages-3.4.1-11.el7.noarch

[root@master ~]# rpm -e --nodeps java-1.8.0-openjdk-headless-1.8.0.352.b08-2.el7_9.x86_64

[root@master ~]# rpm -qa | grep java

To confirm the removal, run java -version again; the following output indicates that the packages were removed successfully:

[root@master ~]# java -version

bash: java: command not found

Install the JDK

[root@master ~]# tar -zxvf /opt/software/jdk-8u152-linux-x64.tar.gz  -C /usr/local/src/

[root@master ~]# ls /usr/local/src/ 

jdk1.8.0_152 

Set the Java environment variables

[root@master ~]# vi /etc/profile

Append the following two lines at the end of the file:

export JAVA_HOME=/usr/local/src/jdk1.8.0_152

export PATH=$PATH:$JAVA_HOME/bin 

Run source to apply the settings:

[root@master ~]# source /etc/profile 

Check that Java is available:

[root@master ~]# echo $JAVA_HOME

/usr/local/src/jdk1.8.0_152

[root@master ~]# java -version

java version "1.8.0_152"

Java(TM) SE Runtime Environment (build 1.8.0_152-b16)

Java HotSpot(TM) 64-Bit Server VM (build 25.152-b16, mixed mode)

If the Java version is displayed correctly, the JDK has been installed and configured successfully.
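As an optional extra check (a quick sketch), confirm that both the runtime and the compiler now resolve to the new JDK rather than any leftover system Java:

[root@master ~]# which java javac

/usr/local/src/jdk1.8.0_152/bin/java

/usr/local/src/jdk1.8.0_152/bin/javac

[root@master ~]# javac -version

javac 1.8.0_152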

3. Install the Hadoop software

Extract the Hadoop package

[root@master ~]# tar -zxvf /opt/software/hadoop-2.7.1.tar.gz  -C /usr/local/src/

[root@master ~]# ll /usr/local/src/

total 0

drwxr-xr-x. 9 10021 10021 149 Jun 29 2015 hadoop-2.7.1

drwxr-xr-x. 8    10   143 255 Sep 14 2017 jdk1.8.0_152

List the Hadoop directory to see its contents:

[root@master ~]# ll /usr/local/src/hadoop-2.7.1/ 

Configure the Hadoop environment variables

[root@master ~]# vi /etc/profile 

Append the following two lines at the end of the file:

export HADOOP_HOME=/usr/local/src/hadoop-2.7.1  

export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Run source to apply the settings:

[root@master ~]# source  /etc/profile 
 
Check that the settings took effect (running hadoop with no arguments should print its usage message):

[root@master ~]# hadoop 

Change the directory owner and group

[root@master ~]# chown -R hadoop:hadoop /usr/local/src/

[root@master ~]# ll /usr/local/src/

total 0

drwxr-xr-x. 9 hadoop hadoop 149 Jun 29 2015 hadoop-2.7.1

drwxr-xr-x. 8 hadoop hadoop 255 Sep 14 2017 jdk1.8.0_152

The contents of /usr/local/src are now owned by the hadoop user.

4. Set up a single-node Hadoop system

Edit the Hadoop configuration file

[root@master ~]# cd /usr/local/src/hadoop-2.7.1/

[root@master hadoop-2.7.1]# ls

bin  etc  include  lib  libexec  LICENSE.txt  NOTICE.txt  README.txt  sbin  share

[root@master hadoop-2.7.1]# vi  etc/hadoop/hadoop-env.sh  

Find the export JAVA_HOME line in the file and change it to:

export JAVA_HOME=/usr/local/src/jdk1.8.0_152 

Switch to the hadoop user

The Hadoop software will be run as the hadoop user.

[root@master hadoop-2.7.1]# su - hadoop

[hadoop@master ~]$ id  

uid=1001(hadoop) gid=1001(hadoop) groups=1001(hadoop) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

Create a directory for the input data

The input data will be stored in ~/input (the input directory under the hadoop user's home directory).

[hadoop@master ~]$ mkdir ~/input

[hadoop@master ~]$ ls  

input

Create the input data file

[hadoop@master ~]$ vi  input/data.txt

Enter the following content, then save and exit:

Hello World

Hello Hadoop

Hello Husan 

Test a MapReduce run

[hadoop@master ~]$ hadoop jar /usr/local/src/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount ~/input/data.txt ~/output

[hadoop@master ~]$ ll output/ 

[hadoop@master ~]$ cat output/part-r-00000  

Hadoop 1

Hello  3

Husan 1

World 1 
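Note that MapReduce refuses to write to an output directory that already exists, so to re-run the example, delete ~/output first:

[hadoop@master ~]$ rm -rf ~/output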

Next, prepare for the cluster: register all three nodes in /etc/hosts on master.

[root@master ~]# vi /etc/hosts

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4

::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.20.8 master

192.168.20.9 slave1

192.168.20.10 slave2

Generate SSH keys

[root@master ~]# rpm -qa | grep openssh

openssh-server-7.4p1-11.el7.x86_64

openssh-7.4p1-11.el7.x86_64

openssh-clients-7.4p1-11.el7.x86_64

[root@master ~]# rpm -qa | grep  rsync

rsync-3.1.2-11.el7_9.x86_64 

Switch to the hadoop user

[root@master ~]# su - hadoop

[hadoop@master ~]$ 

Generate a key pair on each node

[hadoop@master ~]$ ssh-keygen -t rsa

Generating public/private rsa key pair.

Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):  

Created directory '/home/hadoop/.ssh'.

Enter passphrase (empty for no passphrase):  

Enter same passphrase again:  

Your identification has been saved in /home/hadoop/.ssh/id_rsa.

Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.

The key fingerprint is:

SHA256:LOwqw+EjBHJRh9U1GdRHfbhV5+5BX+/hOHTEatwIKdU hadoop@master

The key's randomart image is:

+---[RSA 2048]----+

|   ..oo. o==...o+|

|  . ..  . o.oE+.=|

|   .     . o . *+|

|o .  . .  . o B.+|

|o.    o S    * =+|

| ..  . .    o +oo|

|.o .  .      o .o|

|. *  .        .  |

| . +.            |

+----[SHA256]-----+

查看"/home/hadoop/"下是否有".ssh"文件夹,且".ssh"文件下是否有两个刚生产的无密码密钥对。 

[hadoop@master ~]$ ls ~/.ssh/

id_rsa  id_rsa.pub 

Append id_rsa.pub to the authorized keys file

#master  

[hadoop@master ~]$ cat  ~/.ssh/id_rsa.pub >>  ~/.ssh/authorized_keys

[hadoop@master ~]$ ls ~/.ssh/

authorized_keys  id_rsa  id_rsa.pub

Change the permissions of the authorized_keys file

#master  

[hadoop@master ~]$ chmod  600  ~/.ssh/authorized_keys

[hadoop@master ~]$ ll ~/.ssh/

total 12

-rw-------. 1 hadoop hadoop  395 Nov 14 16:18 authorized_keys

-rw-------. 1 hadoop hadoop 1679 Nov 14 16:14 id_rsa

-rw-r--r--. 1 hadoop hadoop  395 Nov 14 16:14 id_rsa.pub

Configure the SSH service

#master  

[hadoop@master ~]$ su - root

Password:

Last login: Mon Nov 14 15:48:10 CST 2022 from 192.168.20.1 pts/1

[root@master ~]# vi /etc/ssh/sshd_config

PubkeyAuthentication yes   # find this line and remove the leading # comment marker

Restart the SSH service

[root@master ~]# systemctl restart sshd

Switch back to the hadoop user

[root@master ~]# su - hadoop 

Verify SSH login to the local machine

[hadoop@master ~]$ ssh localhost

Exchange SSH keys

Copy the master node's public key id_rsa.pub to each slave node

[hadoop@master ~]$ scp ~/.ssh/id_rsa.pub hadoop@slave1:~/

hadoop@slave1's password:  

id_rsa.pub                                                            100%  395   303.6KB/s   00:00    

[hadoop@master ~]$ scp ~/.ssh/id_rsa.pub hadoop@slave2:~/

The authenticity of host 'slave2 (192.168.20.10)' can't be established.

ECDSA key fingerprint is SHA256:KvO9HlwdCTJLStOxZWN7qrfRr8FJvcEw2hzWAF9b3bQ.

ECDSA key fingerprint is MD5:07:91:56:9e:0b:55:05:05:58:02:15:5e:68:db:be:73.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'slave2,192.168.20.10' (ECDSA) to the list of known hosts.

hadoop@slave2's password:

id_rsa.pub                                                            100%  395   131.6KB/s   00:00

On each slave node, append the public key copied from the master to the authorized_keys file

Log in to slave1 and slave2 as the hadoop user and run:

[hadoop@slave1 ~]$ cat ~/id_rsa.pub >>~/.ssh/authorized_keys  

[hadoop@slave2 ~]$ cat ~/id_rsa.pub >>~/.ssh/authorized_keys 

Delete the id_rsa.pub file on each slave node

[hadoop@slave1 ~]$ rm -rf ~/id_rsa.pub  

[hadoop@slave2 ~]$ rm  -rf ~/id_rsa.pub 
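As a side note, the scp/cat/chmod/rm sequence used above can usually be collapsed into one ssh-copy-id call per target (ssh-copy-id ships with the openssh-clients package queried earlier; it appends the key and fixes permissions itself):

[hadoop@master ~]$ ssh-copy-id hadoop@slave1

[hadoop@master ~]$ ssh-copy-id hadoop@slave2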

5. Configure the Hadoop platform environment

Change the hostname of the slave1 machine

[root@localhost ~]# hostnamectl set-hostname slave1

[root@localhost ~]# bash

[root@slave1 ~]# 

[root@slave1 ~]# vi /etc/hosts

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4

::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.20.8 master

192.168.20.9 slave1

192.168.20.10 slave2

Create the hadoop user and, as on master, give it a password (needed when the master first copies its key over):

[root@slave1 ~]# useradd hadoop

[root@slave1 ~]# echo "1234" | passwd --stdin hadoop

[root@slave1 ~]# su -  hadoop

[hadoop@slave1 ~]$

[hadoop@slave1 ~]$ ssh-keygen -t rsa

Generating public/private rsa key pair.

Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):  

Created directory '/home/hadoop/.ssh'.

Enter passphrase (empty for no passphrase):  

Enter same passphrase again:  

Your identification has been saved in /home/hadoop/.ssh/id_rsa.

Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.

The key fingerprint is:

SHA256:LOwqw+EjBHJRh9U1GdRHfbhV5+5BX+/hOHTEatwIKdU hadoop@slave1

The key's randomart image is:

+---[RSA 2048]----+

|   ..oo. o==...o+|

|  . ..  . o.oE+.=|

|   .     . o . *+|

|o .  . .  . o B.+|

|o.    o S    * =+|

| ..  . .    o +oo|

|.o .  .      o .o|

|. *  .        .  |

| . +.            |

+----[SHA256]-----+

#slave1  

[hadoop@slave1 ~]$ cat  ~/.ssh/id_rsa.pub >>  ~/.ssh/authorized_keys

[hadoop@slave1 ~]$ ls ~/.ssh/

authorized_keys  id_rsa  id_rsa.pub 

#slave1  

[hadoop@slave1 ~]$ chmod  600  ~/.ssh/authorized_keys

[hadoop@slave1 ~]$ ll ~/.ssh/

total 12

-rw-------. 1 hadoop hadoop  395 Nov 14 16:18 authorized_keys

-rw-------. 1 hadoop hadoop 1679 Nov 14 16:14 id_rsa

-rw-r--r--. 1 hadoop hadoop  395 Nov 14 16:14 id_rsa.pub

#slave1  

[hadoop@slave1 ~]$ su - root

Password:

Last login: Mon Nov 14 15:48:10 CST 2022 from 192.168.20.1 pts/1

[root@slave1 ~]# vi /etc/ssh/sshd_config

PubkeyAuthentication yes   # find this line and remove the leading # comment marker

[root@slave1 ~]# systemctl restart sshd

Change the hostname of the slave2 machine

[root@localhost ~]# hostnamectl set-hostname slave2

[root@localhost ~]# bash

[root@slave2 ~]#

[root@slave2 ~]# vi /etc/hosts

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4

::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.20.8 master

192.168.20.9 slave1

192.168.20.10 slave2

Create the hadoop user and, as on master, give it a password (needed when the master first copies its key over):

[root@slave2 ~]# useradd hadoop

[root@slave2 ~]# echo "1234" | passwd --stdin hadoop

[root@slave2 ~]# su -  hadoop

[hadoop@slave2 ~]$ 

[hadoop@slave2 ~]$ ssh-keygen -t rsa

Generating public/private rsa key pair.

Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):  

Created directory '/home/hadoop/.ssh'.

Enter passphrase (empty for no passphrase):  

Enter same passphrase again:  

Your identification has been saved in /home/hadoop/.ssh/id_rsa.

Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.

The key fingerprint is:

SHA256:LOwqw+EjBHJRh9U1GdRHfbhV5+5BX+/hOHTEatwIKdU hadoop@slave2

The key's randomart image is:

+---[RSA 2048]----+

|   ..oo. o==...o+|

|  . ..  . o.oE+.=|

|   .     . o . *+|

|o .  . .  . o B.+|

|o.    o S    * =+|

| ..  . .    o +oo|

|.o .  .      o .o|

|. *  .        .  |

| . +.            |

+----[SHA256]-----+

#slave2 

[hadoop@slave2 ~]$ cat  ~/.ssh/id_rsa.pub >>  ~/.ssh/authorized_keys

[hadoop@slave2 ~]$ ls ~/.ssh/

authorized_keys  id_rsa  id_rsa.pub

#slave2  

[hadoop@slave2 ~]$ chmod  600  ~/.ssh/authorized_keys

[hadoop@slave2 ~]$ ll ~/.ssh/

total 12

-rw-------. 1 hadoop hadoop  395 Nov 14 16:18 authorized_keys

-rw-------. 1 hadoop hadoop 1679 Nov 14 16:14 id_rsa

-rw-r--r--. 1 hadoop hadoop  395 Nov 14 16:14 id_rsa.pub

#slave2

[hadoop@slave2 ~]$ su - root

Password:

Last login: Mon Nov 14 15:48:10 CST 2022 from 192.168.20.8 pts/1

[root@slave2 ~]# vi /etc/ssh/sshd_config

PubkeyAuthentication yes   # find this line and remove the leading # comment marker

[root@slave2 ~]# systemctl restart sshd

Save each slave node's public key on the master

(1) Copy the slave1 node's public key to the master

 [hadoop@slave1 ~]$ scp ~/.ssh/id_rsa.pub hadoop@master:~/ 

(2) On the master node, append the public key copied from the slave to the authorized_keys file

[hadoop@master ~]$ cat ~/id_rsa.pub >>~/.ssh/authorized_keys  

(3) Delete the id_rsa.pub file on the master node

[hadoop@master ~]$ rm -rf ~/id_rsa.pub 

(4) Copy the slave2 node's public key to the master

[hadoop@slave2 ~]$ scp ~/.ssh/id_rsa.pub hadoop@master:~/ 

The authenticity of host 'master (192.168.20.8)' can't be established.

ECDSA key fingerprint is SHA256:KvO9HlwdCTJLStOxZWN7qrfRr8FJvcEw2hzWAF9b3bQ.

ECDSA key fingerprint is MD5:07:91:56:9e:0b:55:05:05:58:02:15:5e:68:db:be:73.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'master,192.168.20.8' (ECDSA) to the list of known hosts.

hadoop@master's password:

id_rsa.pub                                                            100%  395   326.6KB/s   00:00

[hadoop@slave2 ~]$ 

(5) On the master node, append the public key copied from the slave to the authorized_keys file

[hadoop@master ~]$ cat ~/id_rsa.pub >>~/.ssh/authorized_keys

(6) Delete the id_rsa.pub file on the master node

[hadoop@master ~]$ rm -rf ~/id_rsa.pub

Verify passwordless SSH login

View the authorized_keys file on the master node

[hadoop@master ~]$ cat ~/.ssh/authorized_keys

View the authorized_keys file on each slave node

[hadoop@slave1 ~]$ cat ~/.ssh/authorized_keys 

[hadoop@slave2 ~]$ cat ~/.ssh/authorized_keys

Verify passwordless login from master to each slave node

[hadoop@master ~]$ ssh slave1

Last login: Mon Nov 14 16:34:56 2022

[hadoop@slave1 ~]$  
 
[hadoop@master ~]$ ssh slave2

Last login: Mon Nov 14 16:49:34 2022 from 192.168.20.8

 [hadoop@slave2 ~]$ 
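With keys exchanged in both directions, a quick loop run as the hadoop user on master (an illustrative sketch) confirms that every hop works without a password prompt; each line should print the remote hostname:

[hadoop@master ~]$ for h in master slave1 slave2; do ssh $h hostname; done

master

slave1

slave2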

Configure the JDK environment on the two slave nodes, slave1 and slave2

[root@master ~]# cd /usr/local/src/

[root@master src]# ls

hadoop-2.7.1  jdk1.8.0_152

[root@master src]# scp -r jdk1.8.0_152  root@slave1:/usr/local/src/

[root@master src]# scp -r jdk1.8.0_152  root@slave2:/usr/local/src/ 

#slave1

[root@slave1 ~]# ls /usr/local/src/

jdk1.8.0_152

[root@slave1 ~]# vi /etc/profile   # append the following two lines at the end of this file

export JAVA_HOME=/usr/local/src/jdk1.8.0_152 

export PATH=$PATH:$JAVA_HOME/bin

[root@slave1 ~]# source /etc/profile

[root@slave1 ~]# java -version

java version "1.8.0_152"

Java(TM) SE Runtime Environment (build 1.8.0_152-b16)

Java HotSpot(TM) 64-Bit Server VM (build 25.152-b16, mixed mode) 
 
#slave2

[root@slave2 ~]# ls /usr/local/src/

jdk1.8.0_152

[root@slave2 ~]# vi /etc/profile   # append the following two lines at the end of this file

export JAVA_HOME=/usr/local/src/jdk1.8.0_152

export PATH=$PATH:$JAVA_HOME/bin

[root@slave2 ~]# source /etc/profile 

[root@slave2 ~]# java -version

java version "1.8.0_152"

Java(TM) SE Runtime Environment (build 1.8.0_152-b16)

Java HotSpot(TM) 64-Bit Server VM (build 25.152-b16, mixed mode)

6. Run the Hadoop cluster

Install Hadoop on the master node

1. Rename the hadoop-2.7.1 directory to hadoop

[root@master ~]# cd /usr/local/src/

[root@master src]# mv hadoop-2.7.1 hadoop

[root@master src]# ls

hadoop  jdk1.8.0_152

2. Configure the Hadoop environment variables

[root@master src]# yum install -y vim

[root@master src]# vim /etc/profile

[root@master src]# tail -n 4 /etc/profile

export JAVA_HOME=/usr/local/src/jdk1.8.0_152

export PATH=$PATH:$JAVA_HOME/bin

export HADOOP_HOME=/usr/local/src/hadoop

export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

3. Apply the configured Hadoop environment variables

[root@master src]# su - hadoop  

Last login: Mon Feb 28 15:55:37 CST 2022 from 192.168.20.1 pts/1

[hadoop@master ~]$ source /etc/profile

[hadoop@master ~]$ exit

logout

4. Modify the hadoop-env.sh configuration file

[root@master src]# cd /usr/local/src/hadoop/etc/hadoop/

[root@master hadoop]# vim hadoop-env.sh   # set the following line

export JAVA_HOME=/usr/local/src/jdk1.8.0_152

Configure hdfs-site.xml

[root@master hadoop]# vim hdfs-site.xml   # add the following configuration

[root@master hadoop]# tail -n 14 hdfs-site.xml  

<configuration>  

<property>  

  <name>dfs.namenode.name.dir</name>  

  <value>file:/usr/local/src/hadoop/dfs/name</value>  

</property>  

<property>  

  <name>dfs.datanode.data.dir</name>  

  <value>file:/usr/local/src/hadoop/dfs/data</value>

</property>  

<property>  

  <name>dfs.replication</name>    

  <value>3</value>  

</property> 

</configuration>

Note: dfs.replication is set to 3 here, while the cluster has only two DataNodes, so every block will be reported as under-replicated (the dfsadmin report later shows 88 under-replicated blocks); setting the value to 2 would avoid this.

Configure core-site.xml

[root@master hadoop]# vim core-site.xml   # add the following configuration

[root@master hadoop]# tail -n 14 core-site.xml  

<configuration>

<property>  

  <name>fs.defaultFS</name>  

  <value>hdfs://192.168.20.8:9000</value>  

</property>  

<property>  

  <name>io.file.buffer.size</name>  

  <value>131072</value>  

</property>  

<property>  

  <name>hadoop.tmp.dir</name>  

  <value>file:/usr/local/src/hadoop/tmp</value>  

</property>

</configuration>

Configure mapred-site.xml

[root@master hadoop]# pwd

/usr/local/src/hadoop/etc/hadoop

[root@master hadoop]# cp mapred-site.xml.template mapred-site.xml

[root@master hadoop]# vim mapred-site.xml   # add the following configuration

[root@master hadoop]# tail -n 14 mapred-site.xml

<configuration>

<property>  

  <name>mapreduce.framework.name</name>  

  <value>yarn</value>

</property>  

<property>  

  <name>mapreduce.jobhistory.address</name>  

  <value>master:10020</value> 

</property>  

<property>  

  <name>mapreduce.jobhistory.webapp.address</name>  

  <value>master:19888</value>

</property>

</configuration>
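The two job history addresses above only take effect if the JobHistory Server is actually running; it is not started by start-all.sh or start-dfs.sh, so once the cluster is up it must be launched separately (shown here for reference):

[hadoop@master hadoop]$ mr-jobhistory-daemon.sh start historyserver

Without it, the ports 10020 and 19888 simply refuse connections.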

Configure yarn-site.xml

[root@master hadoop]# vim yarn-site.xml   # add the following configuration

[root@master hadoop]# tail -n 32 yarn-site.xml  

<configuration>  
<!-- Site specific YARN configuration properties -->

<property>  

  <name>yarn.resourcemanager.address</name>  

  <value>master:8032</value>  

</property>  

<property>  

  <name>yarn.resourcemanager.scheduler.address</name>  

  <value>master:8030</value>  

</property>  

<property>  

  <name>yarn.resourcemanager.resource-tracker.address</name>  

  <value>master:8031</value>

</property>  

<property>  

  <name>yarn.resourcemanager.admin.address</name>  

  <value>master:8033</value>  

</property>  

<property>  

  <name>yarn.resourcemanager.webapp.address</name>  

  <value>master:8088</value>

</property>  

<property>  

  <name>yarn.nodemanager.aux-services</name>  

  <value>mapreduce_shuffle</value>  

</property>

<property>  

  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>

  <value>org.apache.hadoop.mapred.ShuffleHandler</value>  

</property>

</configuration>

Other Hadoop configuration

1. Configure the masters file (in Hadoop 2.x this file determines where the SecondaryNameNode runs)

[root@master hadoop]# vim masters

[root@master hadoop]# cat masters  

192.168.20.8

2. Configure the slaves file

[root@master hadoop]# vim slaves

[root@master hadoop]# cat slaves  

192.168.20.9

192.168.20.10

3. Create directories

[root@master hadoop]# mkdir /usr/local/src/hadoop/tmp

[root@master hadoop]# mkdir /usr/local/src/hadoop/dfs/name -p

[root@master hadoop]# mkdir /usr/local/src/hadoop/dfs/data -p

4. Change directory ownership

[root@master hadoop]# chown -R hadoop:hadoop /usr/local/src/hadoop/

5. Sync the configured Hadoop directory to the slave nodes

[root@master ~]# scp -r /usr/local/src/hadoop/ root@slave1:/usr/local/src/ 

[root@master ~]# scp -r /usr/local/src/hadoop/ root@slave2:/usr/local/src/ 

# slave1 configuration

[root@slave1 ~]# yum install -y vim

[root@slave1 ~]# vim /etc/profile

[root@slave1 ~]# tail -n 4 /etc/profile

export JAVA_HOME=/usr/local/src/jdk1.8.0_152

export PATH=$PATH:$JAVA_HOME/bin

export HADOOP_HOME=/usr/local/src/hadoop

export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

[root@slave1 ~]# chown -R hadoop:hadoop /usr/local/src/hadoop/

[root@slave1 ~]# su - hadoop

Last login: Thu Feb 24 11:29:00 CST 2022 from 192.168.20.1 pts/1

[hadoop@slave1 ~]$ source /etc/profile 


# slave2 configuration

[root@slave2 ~]# yum install -y vim

[root@slave2 ~]# vim /etc/profile

[root@slave2 ~]# tail -n 4 /etc/profile

export JAVA_HOME=/usr/local/src/jdk1.8.0_152

export PATH=$PATH:$JAVA_HOME/bin

export HADOOP_HOME=/usr/local/src/hadoop

export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

[root@slave2 ~]# chown -R hadoop:hadoop /usr/local/src/hadoop/

[root@slave2 ~]# su - hadoop

Last login: Thu Feb 24 11:29:19 CST 2022 from 192.168.20.1 pts/1

[hadoop@slave2 ~]$ source /etc/profile

7. Run the big data platform cluster

Format HDFS

Step 1: Format the NameNode

[root@master ~]# su - hadoop

[hadoop@master ~]$ cd /usr/local/src/hadoop/

[hadoop@master hadoop]$ bin/hdfs namenode -format

The tail of the output should end like this, indicating a clean shutdown after formatting:

20/05/02 16:21:50 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/192.168.20.8
************************************************************/
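If formatting succeeded, the log will also contain a line such as "Storage directory /usr/local/src/hadoop/dfs/name has been successfully formatted", and the name directory is now populated (the listing below is illustrative of a fresh 2.7.1 format):

[hadoop@master hadoop]$ ls dfs/name/current/

fsimage_0000000000000000000  fsimage_0000000000000000000.md5  seen_txid  VERSION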

Step 2: Start the NameNode

[hadoop@master hadoop]$ hadoop-daemon.sh start namenode  
 
starting namenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-namenode-master.out

Check the Java processes

[hadoop@master hadoop]$ jps    

3557 NameNode  

3624 Jps 

Start the DataNode on each slave node

[hadoop@slave1 hadoop]$ hadoop-daemon.sh start datanode

starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slave1.out

[hadoop@slave2 hadoop]$ hadoop-daemon.sh start datanode

starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slave2.out

[hadoop@slave1 hadoop]$ jps    

3557 DataNode  

3725 Jps    

[hadoop@slave2 hadoop]$ jps    

3557 DataNode  

3725 Jps 

Start the SecondaryNameNode

[hadoop@master hadoop]$ hadoop-daemon.sh start secondarynamenode  

starting secondarynamenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-secondarynamenode-master.out

[hadoop@master hadoop]$ jps    

34257 NameNode  

34449 SecondaryNameNode  

34494 Jps

Seeing both the NameNode and SecondaryNameNode processes confirms that HDFS has started successfully.
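As an aside, rather than starting each daemon by hand, the helper scripts in $HADOOP_HOME/sbin can bring up the whole HDFS layer at once: start-dfs.sh starts the NameNode and SecondaryNameNode on master and, driven by the slaves file, the DataNodes on slave1 and slave2 (stop-dfs.sh stops them again):

[hadoop@master hadoop]$ start-dfs.sh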

Check where HDFS stores its data

[hadoop@master hadoop]$ ll dfs/    

total 0

drwx------ 3 hadoop hadoop 21 Aug 14 15:26 data

drwxr-xr-x 3 hadoop hadoop 40 Aug 14 14:57 name

[hadoop@master hadoop]$ ll ./tmp/dfs  

total 0

drwxrwxr-x. 3 hadoop hadoop 21 May  2 16:34 namesecondary

View the HDFS report

[hadoop@master sbin]$ hdfs dfsadmin -report

Configured Capacity: 8202977280 (7.64 GB)  

Present Capacity: 4421812224 (4.12 GB)  

DFS Remaining: 4046110720 (3.77 GB)  

DFS Used: 375701504 (358.30 MB)  

DFS Used%: 8.50%  

Under replicated blocks: 88  

Blocks with corrupt replicas: 0  

Missing blocks: 0  

-------------------------------------------------  

Live datanodes (2):    

Name: 192.168.20.9:50010 (slave1)  

Hostname: slave1  

Decommission Status : Normal  

Configured Capacity: 4101488640 (3.82 GB)  

DFS Used: 187850752 (179.15 MB)  

Non DFS Used: 2109939712 (1.97 GB)  

DFS Remaining: 1803698176 (1.68 GB)  

DFS Used%: 4.58%  

DFS Remaining%: 43.98%  

Configured Cache Capacity: 0 (0 B)  

Cache Used: 0 (0 B)  

Cache Remaining: 0 (0 B)  

Cache Used%: 100.00%  

Cache Remaining%: 0.00%

Xceivers: 1  

Last contact: Mon May 04 18:32:32 CST 2020    

Name: 192.168.20.10:50010 (slave2)

Hostname: slave2

Decommission Status : Normal  

Configured Capacity: 4101488640 (3.82 GB)  

DFS Used: 187850752 (179.15 MB)  

Non DFS Used: 1671225344 (1.56 GB)  

DFS Remaining: 2242412544 (2.09 GB)  

DFS Used%: 4.58%  

DFS Remaining%: 54.67%  

Configured Cache Capacity: 0 (0 B)  

Cache Used: 0 (0 B)  

Cache Remaining: 0 (0 B)  

Cache Used%: 100.00%  

Cache Remaining%: 0.00%   

Xceivers: 1  

Last contact: Mon May 04 18:32:32 CST 2020

Enter http://master:50070 in the browser's address bar to open a page showing NameNode and DataNode information.

Enter http://master:50090 to view SecondaryNameNode information.

Enter http://master:8088 to view the YARN ResourceManager web UI.
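From the shell, a quick reachability check of the three web UIs (a sketch; each should return HTTP 200 once the corresponding daemons are running, and 8088 responds only after YARN has been started):

[hadoop@master ~]$ for port in 50070 50090 8088; do curl -s -o /dev/null -w "$port %{http_code}\n" http://master:$port; done

50070 200

50090 200

8088 200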

8. Install and configure the Hive component

Extract the installation file

(1) As the root user, extract the Hive package /opt/software/apache-hive-2.0.0-bin.tar.gz to /usr/local/src.

[root@master ~]# tar -zxvf /opt/software/apache-hive-2.0.0-bin.tar.gz -C /usr/local/src

(2) Rename the extracted apache-hive-2.0.0-bin directory to hive:

[root@master ~]# mv /usr/local/src/apache-hive-2.0.0-bin /usr/local/src/hive

(3) Change the owner and group of the hive directory to hadoop:

[root@master ~]# chown -R hadoop:hadoop /usr/local/src/hive

Set up the Hive environment

Uninstall the MariaDB database

(1) Stop the Linux system firewall and disable it from starting at boot.

# stop the firewall service

[root@master ~]# systemctl stop firewalld

# disable the firewall service at boot

[root@master ~]# systemctl disable firewalld

(2) Uninstall the MariaDB packages bundled with the Linux system.

1) First check which MariaDB packages are installed.

# query the installed mariadb packages

[root@master ~]# rpm -qa | grep mariadb

mariadb-libs-5.5.52-2.el7.x86_64

2) Uninstall the MariaDB package.

# uninstall the mariadb package

[root@master ~]# rpm -e --nodeps mariadb-libs-5.5.52-2.el7.x86_64

Install the MySQL database

(1) Install the MySQL packages in the following order: mysql-community-common, mysql-community-libs, then mysql-community-client.

# change to the MySQL package directory

[root@master ~]# cd /opt/software/mysql-5.7.18/

[root@master mysql-5.7.18]# rpm -ivh mysql-community-common-5.7.18-1.el7.x86_64.rpm

[root@master mysql-5.7.18]# rpm -ivh mysql-community-libs-5.7.18-1.el7.x86_64.rpm

[root@master mysql-5.7.18]# rpm -ivh mysql-community-client-5.7.18-1.el7.x86_64.rpm

(2) Install the mysql-community-server package.

[root@master mysql-5.7.18]# rpm -ivh mysql-community-server-5.7.18-1.el7.x86_64.rpm

(3) Modify the MySQL configuration: add the following entries to /etc/my.cnf, directly below the symbolic-links=0 line.

default-storage-engine=innodb

innodb_file_per_table

collation-server=utf8_general_ci

init-connect='SET NAMES utf8'

character-set-server=utf8

(4) Start the MySQL database.

[root@master ~]# systemctl start mysqld

(5) Check the MySQL service status. If the mysqld process is active (running), MySQL is running normally; if it shows failed, startup went wrong and /etc/my.cnf should be checked for mistakes.

[root@master ~]# systemctl status mysqld

(6) Look up the default MySQL password. The password generated at install time is saved in /var/log/mysqld.log; search that file for the password keyword.

[root@master ~]# cat /var/log/mysqld.log | grep password

2020-05-07T02:34:03.336724Z 1 [Note] A temporary password is generated for root@localhost: MPg5lhk4?>Ui    # the default password is MPg5lhk4?>Ui

The default password is generated randomly at install time, so it differs from one installation to the next.

(7) Initialize the MySQL database.

Run the mysql_secure_installation command to initialize MySQL. During initialization you must set a new root password that satisfies the security policy (upper- and lower-case letters, digits, and special characters); here we use Password123$.

The initialization asks the following interactive questions:

1) "Change the password for root?" asks whether to change the root password; type y and press Enter.

2) "Do you wish to continue with the password provided?" asks whether to keep the password just entered; type y and press Enter.

3) "Remove anonymous users?" asks whether to delete anonymous users; type y and press Enter.

4) "Disallow root login remotely?" asks whether to block remote root logins; type n and press Enter so that root can still log in remotely.

5) "Remove test database and access to it?" asks whether to delete the test database; type y and press Enter.

6) "Reload privilege tables now?" asks whether to reload the privilege tables; type y and press Enter.

The mysql_secure_installation session looks like this:

[root@master ~]# mysql_secure_installation

Securing the MySQL server deployment.

Enter password for user root:    # enter the default root password found in /var/log/mysqld.log

The 'validate_password' plugin is installed on the server.

The subsequent steps will run with the existing configuration of the plugin.

Using existing password for root.

Estimated strength of the password: 100

Change the password for root ? ((Press y|Y for Yes, any other key for No) : y

New password:                # enter the new password Password123$

Re-enter new password:       # re-enter the new password Password123$

Estimated strength of the password: 100

Do you wish to continue with the password provided?(Press y|Y for Yes, any other key for No) : y    # type y

By default, a MySQL installation has an anonymous user, allowing anyone to log into MySQL without having to have a user account created for them. This is intended only for testing, and to make the installation go a bit smoother. You should remove them before moving into a production environment.

Remove anonymous users? (Press y|Y for Yes, any other key for No) : y    # type y

Success.

Normally, root should only be allowed to connect from 'localhost'. This ensures that someone cannot guess at the root password from the network.

Disallow root login remotely? (Press y|Y for Yes, any other key for No) : n    # type n

... skipping.

By default, MySQL comes with a database named 'test' that anyone can access. This is also intended only for testing, and should be removed before moving into a production environment.

Remove test database and access to it? (Press y|Y for Yes, any other key for No) : y    # type y

- Dropping test database...

Success.

- Removing privileges on test database...

Success.  

Reloading the privilege tables will ensure that all changes made so far will take effect immediately.

Reload privilege tables now? (Press y|Y for Yes, any other key for No) : y    # type y

Success.

All done!

(8) Grant the root user privileges to access MySQL both locally and remotely.

[root@master ~]# mysql -uroot -p

Enter password:    # enter the newly set password Password123$

Welcome to the MySQL monitor. Commands end with ; or \g.

Your MySQL connection id is 20

Server version: 5.7.18 MySQL Community Server (GPL)

Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its

affiliates. Other names may be trademarks of their respective owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> grant all privileges on *.* to root@'localhost' identified by 'Password123$';    # grant root local access

Query OK, 0 rows affected, 1 warning (0.01 sec)

mysql> grant all privileges on *.* to root@'%' identified by 'Password123$';    # grant root remote access

Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> flush privileges;    # reload the privilege tables

Query OK, 0 rows affected (0.00 sec)

mysql> select user,host from mysql.user where user='root';

# check the root user's grants

+------+-----------+
| user | host      |
+------+-----------+
| root | %         |
| root | localhost |
+------+-----------+

2 rows in set (0.00 sec)

mysql> exit;    # quit the MySQL client

Bye
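Before wiring Hive to this database, it can help to confirm that mysqld is listening for TCP connections on port 3306 (a quick local check; ss ships with the iproute package on CentOS 7, and the output below is abridged):

[root@master ~]# ss -tlnp | grep 3306

LISTEN   0   80   :::3306   :::*   users:(("mysqld",...))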

Configure the Hive component

(1) Set the Hive environment variables and make them take effect.

# append the following configuration at the end of the file

[root@master ~]# vi /etc/profile

# set hive environment

export HIVE_HOME=/usr/local/src/hive

export PATH=$PATH:$HIVE_HOME/bin

# apply the environment variable configuration

[root@master ~]# source /etc/profile

(2) Modify the Hive configuration file.

Switch to the hadoop user for the following Hive configuration steps.

Copy hive-default.xml.template in /usr/local/src/hive/conf to hive-site.xml (cp is used rather than mv so the template is kept):

[root@master ~]# su - hadoop

[hadoop@master ~]$ cp /usr/local/src/hive/conf/hive-default.xml.template /usr/local/src/hive/conf/hive-site.xml

(3) Edit hive-site.xml with vi so that Hive connects to the MySQL database, and set the Hive temporary file storage paths.

[hadoop@master ~]$ vi /usr/local/src/hive/conf/hive-site.xml

1) Set the MySQL database connection (note that & must be escaped as &amp; inside XML):

<property>

  <name>javax.jdo.option.ConnectionURL</name>

  <value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>

  <description>JDBC connect string for a JDBC metastore</description>

</property>

2) Configure the MySQL root password:

<property>

  <name>javax.jdo.option.ConnectionPassword</name>

  <value>Password123$</value>

  <description>password to use against metastore database</description>

</property>

3) Metastore schema version verification; false is the default, so this usually needs no change:

<property>

  <name>hive.metastore.schema.verification</name>

  <value>false</value>

  <description>

    Enforce metastore schema version consistency.

    True: Verify that version information stored in the metastore matches the Hive jars, and disable automatic schema migration.

    False: Warn if the version information stored in the metastore doesn't match the Hive jars.

  </description>

</property>

4) Configure the JDBC driver:

<property>

  <name>javax.jdo.option.ConnectionDriverName</name>

  <value>com.mysql.jdbc.Driver</value>

  <description>Driver class name for a JDBC metastore</description>

</property>

5) Set the database user name javax.jdo.option.ConnectionUserName to root:

<property>

  <name>javax.jdo.option.ConnectionUserName</name>

  <value>root</value>

  <description>Username to use against metastore database</description>

</property>

6) Replace every occurrence of ${system:java.io.tmpdir}/${system:user.name} with the /usr/local/src/hive/tmp directory (or a subdirectory of it). Four settings need this change:

<name>hive.querylog.location</name>

<value>/usr/local/src/hive/tmp</value>

<name>hive.exec.local.scratchdir</name>

<value>/usr/local/src/hive/tmp</value>

<name>hive.downloaded.resources.dir</name>

<value>/usr/local/src/hive/tmp/resources</value>

<name>hive.server2.logging.operation.log.location</name>

<value>/usr/local/src/hive/tmp/operation_logs</value>

7) Create the tmp folder in the Hive installation directory:

[hadoop@master ~]$ mkdir /usr/local/src/hive/tmp

At this point, Hive installation and configuration are complete.

Initialize the Hive metastore

1) Copy the MySQL JDBC driver (/opt/software/mysql-connector-java-5.1.46.jar) into the lib directory of the Hive installation:

[hadoop@master ~]$ cp /opt/software/mysql-connector-java-5.1.46.jar /usr/local/src/hive/lib/

2) Restart Hadoop:

[hadoop@master lib]$ stop-all.sh

[hadoop@master lib]$ start-all.sh

3) Initialize the metastore database:

[hadoop@master ~]$ schematool -initSchema -dbType mysql

4) Start Hive:

[hadoop@master ~]$ hive
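As a final sanity check (output abridged), run a trivial statement in the Hive shell:

hive> show databases;

OK

default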

At this point, Hive has been set up successfully.
