This article walks through building a four-node, fully distributed Hadoop cluster. The Linux distribution is CentOS 7, the Hadoop version is 3.2.0, and the JDK version is 1.8.
I. Preparing the environment
1. Create four Linux virtual machines in VMware Workstation and assign each a static IP.
For how to create the Linux VMs and configure their network, see https://www.cnblogs.com/shireenlee4testing/p/9469855.html
2. Configure host name resolution (every node)
Edit the hosts file and add the master/slave name-to-IP mappings.
#vim /etc/hosts
192.168.44.3 hadoop01
192.168.44.4 hadoop02
192.168.44.5 hadoop03
192.168.44.6 hadoop04
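Before moving on, it is worth checking that the names actually resolve (a small sanity test I have added; hadoop02 stands in for any of the entries above):
#ping one of the mapped names
[root@hadoop01 opt]# ping -c 1 hadoop02
#or query the resolver directly
[root@hadoop01 opt]# getent hosts hadoop02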
3. Turn off the firewall (every node)
#Stop the service
[root@hadoop01 opt]# systemctl stop firewalld
#Disable it at boot
[root@hadoop01 opt]# systemctl disable firewalld
Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
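To double-check that the firewall is really off and will stay off after a reboot (an optional check, not in the original post):
[root@hadoop01 opt]# systemctl is-active firewalld
[root@hadoop01 opt]# systemctl is-enabled firewalld
#expect "inactive" and "disabled" respectively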
4. Set up passwordless SSH login
For how to configure passwordless login, see
https://www.cnblogs.com/shireenlee4testing/p/10366061.html
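In short, the referenced steps generate a key pair on the master and append the public key to every node's authorized_keys. A minimal sketch (assuming the root account and the host names configured above):
#generate a key pair without a passphrase
[root@hadoop01 ~]# ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
#push the public key to every node (each asks for that node's password once)
[root@hadoop01 ~]# for h in hadoop01 hadoop02 hadoop03 hadoop04; do ssh-copy-id root@$h; done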
5. Configure the Java environment (every node)
For how to configure the Java environment, see
https://www.cnblogs.com/shireenlee4testing/p/10368961.html
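In short, the referenced steps unpack a JDK 8 tarball and point JAVA_HOME at it. A sketch that assumes the JDK lands at /opt/jdk, the path hadoop-env.sh uses later in this article (the tarball name is borrowed from the deployment notes in the second half; your archive and directory names may differ):
[root@hadoop01 opt]# tar -zxvf jdk-8u11-linux-x64.tar.gz -C /opt
[root@hadoop01 opt]# ln -s /opt/jdk1.8.0_11 /opt/jdk
#append to /etc/profile, then source it
export JAVA_HOME=/opt/jdk
export PATH=$PATH:$JAVA_HOME/bin
[root@hadoop01 opt]# source /etc/profile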
II. Building the fully distributed Hadoop cluster
Installing and configuring Hadoop is essentially the same on every node, so install Hadoop on each node first, do all the configuration once on the master, and then push the modified configuration files to each slave with scp.
1. Download the Hadoop tarball, unpack it, and configure the Hadoop environment variables
For how to download the Hadoop tarball, see
https://www.cnblogs.com/shireenlee4testing/p/10365692.html
This article uses Hadoop 3.2.0. Pick a directory (for example /opt), upload the tarball to the Linux host with the rz command, unpack it there, configure the Hadoop environment variables, and make them take effect:
#Unpack into /opt
[root@hadoop01 opt]# tar -zxvf hadoop-3.2.0.tar.gz
#Symlink /opt/hadoop-3.2.0 to /opt/hadoop to simplify later configuration
[root@hadoop01 opt]# ln -s hadoop-3.2.0 hadoop
#Configure the Hadoop environment variables
[root@hadoop01 opt]# vim /etc/profile
#Hadoop
export HADOOP_HOME=/opt/hadoop # the unpacked installation directory
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
#Save, then reload profile so it takes effect
[root@hadoop01 opt]# source /etc/profile
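A quick way to confirm the variables took effect (my own check, not part of the original steps):
[root@hadoop01 opt]# echo $HADOOP_HOME
/opt/hadoop
[root@hadoop01 opt]# which hadoop
/opt/hadoop/bin/hadoop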
2. Set the JAVA_HOME parameter in the Hadoop environment scripts
#Change into etc/hadoop under the Hadoop installation directory
[root@hadoop01 ~]# cd /opt/hadoop/etc/hadoop
#Add or modify the following parameter in each of hadoop-env.sh, mapred-env.sh, and yarn-env.sh:
[root@hadoop01 hadoop]# vim hadoop-env.sh
[root@hadoop01 hadoop]# vim mapred-env.sh
[root@hadoop01 hadoop]# vim yarn-env.sh
export JAVA_HOME="/opt/jdk" # path to the JDK installation
#Verify that the Hadoop setup works
[root@hadoop01 hadoop]# hadoop version
Hadoop 3.2.0
Source code repository https://github.com/apache/hadoop.git -r e97acb3bd8f3befd27418996fa5d4b50bf2e17bf
Compiled by sunilg on 2019-01-08T06:08Z
Compiled with protoc 2.5.0
From source with checksum d3f0795ed0d9dc378e2c785d3668f39
This command was run using /opt/hadoop-3.2.0/share/hadoop/common/hadoop-common-3.2.0.jar
3. Modify the Hadoop configuration files
In the etc/hadoop directory under the Hadoop installation directory, edit core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, and workers, adjusting the values to your environment.
(1) core-site.xml (common component properties)
<configuration>
<property>
<!-- HDFS address -->
<name>fs.defaultFS</name>
<value>hdfs://hadoop01:9000</value>
</property>
<property>
<!-- Directory for temporary files; create the tmp directory under /opt/hadoop first -->
<name>hadoop.tmp.dir</name>
<value>/opt/hadoop/tmp</value>
</property>
</configuration>
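Create the temporary directory mentioned in the comment above before starting the cluster:
[root@hadoop01 opt]# mkdir -p /opt/hadoop/tmp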
(2) hdfs-site.xml (HDFS component properties)
<configuration>
<property>
<!-- NameNode web UI address on the master node -->
<name>dfs.namenode.http-address</name>
<value>hadoop01:50070</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/opt/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/opt/hadoop/dfs/data</value>
</property>
<property>
<!-- Replication factor; 3 is the default -->
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
<description>With this set to false, files can be created on HDFS without permission checks. Convenient, but guard against accidental deletions.</description>
</property>
</configuration>
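The name and data directories configured above can be created up front as well; the NameNode format and the DataNodes would normally create them on their own, but pre-creating them surfaces permission problems early (an optional step I have added):
[root@hadoop01 opt]# mkdir -p /opt/hadoop/dfs/name /opt/hadoop/dfs/data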
(3) mapred-site.xml (MapReduce component properties)
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<!-- Run MapReduce on YARN -->
</property>
</configuration>
(4) yarn-site.xml (resource-scheduling properties)
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<!-- Address of the YARN ResourceManager; without it, Active Nodes stays at 0 -->
<value>hadoop01</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<!-- How reducers fetch data -->
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
<description>Skip the virtual-memory check. Useful when running on virtual machines; with it disabled, later operations are less likely to fail.</description>
</property>
</configuration>
(5) workers file
#Add the slave nodes (host names work if hosts is configured; IP addresses work too)
[root@hadoop01 hadoop]# vim workers
hadoop02
hadoop03
hadoop04
4. Copy the configured directory to the slave nodes
[root@hadoop01 hadoop]# scp -r /opt/hadoop-3.2.0 root@hadoop02:/opt/
[root@hadoop01 hadoop]# scp -r /opt/hadoop-3.2.0 root@hadoop03:/opt/
[root@hadoop01 hadoop]# scp -r /opt/hadoop-3.2.0 root@hadoop04:/opt/
[root@hadoop01 hadoop]# scp -r /opt/hadoop root@hadoop02:/opt/
[root@hadoop01 hadoop]# scp -r /opt/hadoop root@hadoop03:/opt/
[root@hadoop01 hadoop]# scp -r /opt/hadoop root@hadoop04:/opt/
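Note that scp dereferences the /opt/hadoop symlink, so each slave receives two full copies of the tree. An alternative sketch (my own variation, not from the original post): copy only hadoop-3.2.0, recreate the symlink remotely, and push /etc/profile so the environment variables exist on the slaves too:
[root@hadoop01 hadoop]# for h in hadoop02 hadoop03 hadoop04; do
  ssh root@$h "ln -sfn /opt/hadoop-3.2.0 /opt/hadoop"
  scp /etc/profile root@$h:/etc/profile
done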
5. Edit the startup scripts to add the HDFS and YARN user definitions
Add the HDFS users: edit the following scripts and insert these definitions at the blank line near the top:
[root@hadoop01 hadoop-3.2.0]# vim sbin/start-dfs.sh
[root@hadoop01 hadoop-3.2.0]# vim sbin/stop-dfs.sh
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
Add the YARN users: edit the following scripts and insert these definitions at the blank line near the top:
[root@hadoop01 hadoop-3.2.0]# vim sbin/start-yarn.sh
[root@hadoop01 hadoop-3.2.0]# vim sbin/stop-yarn.sh
YARN_RESOURCEMANAGER_USER=root
HDFS_DATANODE_SECURE_USER=yarn
YARN_NODEMANAGER_USER=root
Note: without these definitions, startup aborts with errors like the following, caused by the missing user definitions:
ERROR: Attempting to launch hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting launch.
Starting datanodes
ERROR: Attempting to launch hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting launch.
Starting secondary namenodes [localhost.localdomain]
ERROR: Attempting to launch hdfs secondarynamenode as root
ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting launch.
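Instead of patching the four scripts, the same definitions can be exported once in etc/hadoop/hadoop-env.sh, which every launcher sources (equivalent effect; running the daemons as a dedicated non-root user would be the more conventional production choice):
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root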
6. Initialize and start
#Format the NameNode
[root@hadoop01 hadoop-3.2.0]# bin/hdfs namenode -format
#Start (either way works)
Option 1:
[root@hadoop01 hadoop-3.2.0]# sbin/start-all.sh
Option 2:
[root@hadoop01 hadoop-3.2.0]# sbin/start-dfs.sh
[root@hadoop01 hadoop-3.2.0]# sbin/start-yarn.sh
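The matching shutdown commands live in the same sbin directory:
[root@hadoop01 hadoop-3.2.0]# sbin/stop-all.sh
#or, separately
[root@hadoop01 hadoop-3.2.0]# sbin/stop-dfs.sh
[root@hadoop01 hadoop-3.2.0]# sbin/stop-yarn.sh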
7. Verify that Hadoop started
#Master node
[root@hadoop01 sbin]# jps
11329 NameNode
11831 ResourceManager
11592 SecondaryNameNode
12186 Jps
#Slave nodes
[root@hadoop02 hadoop]# jps
5152 SecondaryNameNode
5085 DataNode
5245 NodeManager
5357 Jps
[root@hadoop03 opt]# jps
5080 DataNode
5178 NodeManager
5278 Jps
[root@hadoop04 opt]# jps
5090 NodeManager
5190 Jps
4991 DataNode
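Beyond jps on each machine, the cluster view can be checked from the master; with this layout, expect three live DataNodes and three NodeManagers (an extra check I have added):
[root@hadoop01 sbin]# hdfs dfsadmin -report | grep 'Live datanodes'
[root@hadoop01 sbin]# yarn node -list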
8. Web access
Note: open the ports first, or simply keep the firewall off.
#Check the firewall status
firewall-cmd --state
#Stop it for now
systemctl stop firewalld
#Disable it at boot
systemctl disable firewalld
Open http://hadoop01:8088 in a browser for the ResourceManager page.
Open http://hadoop01:50070 in a browser for the Hadoop NameNode page.
Note: if the pages do not open by host name, add the mappings to the Windows hosts file at:
C:\Windows\System32\drivers\etc\hosts
192.168.44.3 hadoop01
192.168.44.4 hadoop02
192.168.44.5 hadoop03
192.168.44.6 hadoop04
[Attribution]
Everything above comes from:
https://www.cnblogs.com/shireenlee4testing/p/10472018.html
It is, frankly, a straight copy! But the steps have been verified and refined by me; they are reliable and worth keeping on record.
My own deployment follows:
Hadoop 3.2.0 deployment steps on Linux:
Note: log in to servers 178-181 as the ordinary user trs to carry out the deployment.
[Server 178]
I. Prepare the packages: hadoop-3.2.0.tar.gz and jdk-8u11-linux-x64.tar.gz
Package location: on server 10.50.144.178
Hadoop tarball: /home/trs/Hadoop/hadoop-3.2.0.tar.gz
JDK tarball: /home/trs/Hadoop/jdk-8u11-linux-x64.tar.gz
II. Install the JDK
1. Unpack under /home/trs/Hadoop/:
cd into /home/trs/Hadoop/ (cd /home/trs/Hadoop/), then unpack with:
tar -zxvf jdk-8u11-linux-x64.tar.gz -C /home/trs/Hadoop
This creates the jdk1.8.0_11 directory under /home/trs/Hadoop.
2. Configure the ordinary user's environment variables
Run: vim ~/.bash_profile
and edit the file so it contains:
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi
export JAVA_HOME=/home/trs/Hadoop/jdk1.8.0_11
export JRE_HOME=/home/trs/Hadoop/jdk1.8.0_11/jre
export CLASSPATH=.:$CLASSPATH:${JAVA_HOME}/lib:$JRE_HOME/lib
export PATH=$JAVA_HOME/bin:$PATH:$JRE_HOME/bin
3. Test whether the JDK is configured correctly
#Reload the configuration so it takes effect immediately:
source ~/.bash_profile
Then run the java and javac commands; if both print their usage text, the configuration works.
III. Install standalone Hadoop
1. Unpack under /home/trs/Hadoop/:
cd into /home/trs/Hadoop/ (cd /home/trs/Hadoop/), then unpack with:
tar -zxvf hadoop-3.2.0.tar.gz -C /home/trs/Hadoop
This creates the hadoop-3.2.0 directory under /home/trs/Hadoop.
2. Point Hadoop at the JDK
vi /home/trs/Hadoop/hadoop-3.2.0/etc/hadoop/hadoop-env.sh
Add the following line:
export JAVA_HOME=/home/trs/Hadoop/jdk1.8.0_11
3. Configure the ordinary user's environment variables
Run: vim ~/.bash_profile
and append at the bottom:
export HADOOP_HOME=/home/trs/Hadoop/hadoop-3.2.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
4. Test whether Hadoop is configured correctly
#Reload the configuration so it takes effect immediately:
source ~/.bash_profile
Then run: hadoop version
If it prints the version information, the configuration works.
5. A small test
5.1 Create an input directory and copy all the xml files under etc/hadoop into it:
mkdir -p ~/resource/input
cp /home/trs/Hadoop/hadoop-3.2.0/etc/hadoop/*.xml ~/resource/input
5.2 Use the MapReduce example jar to grep the input files for strings matching a pattern and write the result to the output directory:
/home/trs/Hadoop/hadoop-3.2.0/bin/hadoop jar /home/trs/Hadoop/hadoop-3.2.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0.jar grep ~/resource/input ~/resource/output 'dfs[a-z.]+'
5.3 View the result:
cat ~/resource/output/*
5.4 With a correct setup, the result looks like this:
trs@node178:~> cat ~/resource/output/*
1 dfsadmin
1 dfs.replication
trs@node178:~>
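One caveat when re-running the example: MapReduce refuses to start if the output directory already exists, so remove it first (general MapReduce behavior, not specific to these notes):
rm -rf ~/resource/output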
[Note]: the ordinary user's environment variables can also be configured as follows:
##########
vim ~/.bashrc #set the user environment so the Hadoop and java commands work from any directory
Add to the file:
export JAVA_HOME=/home/trs/Hadoop/jdk1.8.0_11
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export HADOOP_HOME=/home/trs/Hadoop/hadoop-3.2.0
export CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath):$CLASSPATH
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
##########
source ~/.bashrc #apply the changes
IV. Pseudo-distributed Hadoop installation (building on the standalone install in Part III)
1. Edit /home/trs/Hadoop/hadoop-3.2.0/etc/hadoop/core-site.xml with:
vi /home/trs/Hadoop/hadoop-3.2.0/etc/hadoop/core-site.xml
so that it contains:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
2. Edit /home/trs/Hadoop/hadoop-3.2.0/etc/hadoop/hdfs-site.xml with:
vi /home/trs/Hadoop/hadoop-3.2.0/etc/hadoop/hdfs-site.xml
so that it contains:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
3. Passwordless SSH setup:
3.1 Command: ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Output:
trs@node178:~/.ssh> ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Generating public/private rsa key pair.
Your identification has been saved in /home/trs/.ssh/id_rsa.
Your public key has been saved in /home/trs/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:4f8+BlMzV0ELlScla7MLDoO9eKBtCsHB8oOXW28TVBQ trs@node178
The key's randomart image is:
+---[RSA 2048]----+
| .E. .+=+|
| . . o++|
| . o o ++.|
| = o o + +..o |
| . B . S +..+. |
| . = + =o= . . |
| o . B +o. . |
| . + o .o |
| . oo. |
+----[SHA256]-----+
trs@node178:~/.ssh>
3.2 Command: cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
3.3 Command: chmod 0600 ~/.ssh/authorized_keys
3.4 Command: ssh node178
3.5 Expected result:
trs@node178:~/.ssh> ssh node178
Last login: Wed May 22 10:01:42 2019 from 127.0.0.1
trs@node178:~>
4. Format the file system
4.1 Run: /home/trs/Hadoop/hadoop-3.2.0/bin/hdfs namenode -format
4.2 Run: /home/trs/Hadoop/hadoop-3.2.0/sbin/start-dfs.sh
4.3 Run: jps
4.4 Expected result:
trs@node178:~> /home/trs/Hadoop/hadoop-3.2.0/sbin/start-dfs.sh
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [node178]
trs@node178:~> jps
17344 DataNode
17649 SecondaryNameNode
17873 Jps
17178 NameNode
trs@node178:~>
4.5 Open the NameNode web page (9870 is the Hadoop 3.x default NameNode web port; the cluster sections in this article override it to 50070 via dfs.namenode.http-address):
http://localhost:9870/
5. Create the directories for running MapReduce jobs, then repeat the earlier standalone test.
5.1 Command: /home/trs/Hadoop/hadoop-3.2.0/bin/hdfs dfs -mkdir -p /user/demo/input
Explanation: uses the hdfs command to create the /user/demo/input directory tree inside the HDFS file system
5.2 Command: hdfs dfs -put ~/resource/input/*.xml /user/demo/input
Explanation: uploads the local ~/resource/input/*.xml files to /user/demo/input in HDFS
5.3 Command: /home/trs/Hadoop/hadoop-3.2.0/bin/hadoop jar /home/trs/Hadoop/hadoop-3.2.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0.jar grep /user/demo/input output 'dfs[a-z.]+'
Explanation: greps the files under /user/demo/input in HDFS for strings matching dfs[a-z.]+ and writes the result to the output directory inside HDFS
5.4 Copy the output from the distributed file system to the local file system and inspect it
Command: hdfs dfs -get output output
Explanation: copies the HDFS output directory into a new local output directory under the current path
5.5 View the output:
Command: cat output/*
5.6 Expected result:
trs@node178:~/resource/output> cat output/*
1 dfsadmin
1 dfs.replication
trs@node178:~/resource/output>
5.7 View the output directly on the distributed file system
Command: hdfs dfs -cat output/*
Expected result:
trs@node178:~/resource/output> hdfs dfs -cat output/*
1 dfsadmin
1 dfs.replication
trs@node178:~/resource/output>
6. When finished, stop the daemons
Command: /home/trs/Hadoop/hadoop-3.2.0/sbin/stop-dfs.sh
7. YARN: to be continued…
V. Building the fully distributed Hadoop cluster
Installing and configuring Hadoop is essentially the same on every node, so install Hadoop on each node, do the configuration once on the master (node178), and then push the modified configuration files to each slave with scp.
1. Set the JAVA_HOME parameter in the Hadoop environment scripts, under /home/trs/Hadoop/hadoop-3.2.0/etc/hadoop
1.1 Add or modify the following parameter in each of hadoop-env.sh, mapred-env.sh, and yarn-env.sh:
export JAVA_HOME=/home/trs/Hadoop/jdk1.8.0_11
1.2 Verify that the Hadoop setup works
Command: hadoop version
Expected result:
trs@node178:~/Hadoop/hadoop-3.2.0/etc/hadoop> hadoop version
Hadoop 3.2.0
Source code repository https://github.com/apache/hadoop.git -r e97acb3bd8f3befd27418996fa5d4b50bf2e17bf
Compiled by sunilg on 2019-01-08T06:08Z
Compiled with protoc 2.5.0
From source with checksum d3f0795ed0d9dc378e2c785d3668f39
This command was run using /home/trs/Hadoop/hadoop-3.2.0/share/hadoop/common/hadoop-common-3.2.0.jar
trs@node178:~/Hadoop/hadoop-3.2.0/etc/hadoop>
2. Modify the remaining Hadoop configuration files
In the etc/hadoop directory under the Hadoop installation directory, edit core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, and workers, adjusting the values to your environment.
2.1 core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://node178:9000</value>
</property>
<property>
<!-- Directory for temporary files; create the tmp directory under /home/trs/Hadoop/hadoop-3.2.0/ first -->
<name>hadoop.tmp.dir</name>
<value>/home/trs/Hadoop/hadoop-3.2.0/tmp</value>
</property>
</configuration>
2.2 hdfs-site.xml
<configuration>
<property>
<!-- NameNode web UI address on the master node -->
<name>dfs.namenode.http-address</name>
<value>node178:50070</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/trs/Hadoop/hadoop-3.2.0/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/trs/Hadoop/hadoop-3.2.0/dfs/data</value>
</property>
<property>
<!-- Replication factor; 3 is the default -->
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
<description>With this set to false, files can be created on HDFS without permission checks. Convenient, but guard against accidental deletions.</description>
</property>
</configuration>
2.3 mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<!-- Run MapReduce on YARN -->
</property>
</configuration>
2.4 yarn-site.xml
<configuration>
<property>
<!-- Address of the YARN ResourceManager; without it, Active Nodes stays at 0 -->
<name>yarn.resourcemanager.hostname</name>
<value>node178</value>
</property>
<property>
<!-- How reducers fetch data -->
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
<description>Skip the virtual-memory check. Useful when running on virtual machines; with it disabled, later operations are less likely to fail.</description>
</property>
</configuration>
2.5 workers file, containing:
node179
node180
node181
3. Copy the configured directory to the other slave nodes. [First set up each machine's own ssh key, see the per-server sections below; then let the master (node178) log in to slave1 (node179), slave2 (node180), and slave3 (node181) without a password.]
3.1 node178 logs in to node179, node180, and node181 without a password
a. node179 setup
Copy the authorized_keys file to node179, then append that node's (node179's) own public key id_rsa.pub to the file.
a.1 Command: scp ~/.ssh/authorized_keys trs@node179:/home/trs/.ssh/
a.2 Expected output:
trs@node178:~> scp ~/.ssh/authorized_keys trs@node179:/home/trs/.ssh/
Password:
authorized_keys 100% 393 0.4KB/s
trs@node178:~>
a.3 Switch to node179 and append its id_rsa.pub to authorized_keys:
trs@node179:~/.ssh> ls
authorized_keys id_rsa id_rsa.pub known_hosts
trs@node179:~/.ssh> cat id_rsa.pub >> authorized_keys
trs@node179:~/.ssh>
a.4 Back on node178, verify the passwordless login:
trs@node178:~> ssh node179
Last login: Thu May 23 10:25:36 2019 from 10.50.144.179
trs@node179:~> exit
logout
Connection to node179 closed.
trs@node178:~>
a.5 Passwordless login to node179 works!
b. node180 setup
Copy the authorized_keys file to node180, then append that node's (node180's) own public key id_rsa.pub to the file.
b.1 Command: scp ~/.ssh/authorized_keys trs@node180:/home/trs/.ssh/
b.2 Expected output:
trs@node178:~> scp ~/.ssh/authorized_keys trs@node180:/home/trs/.ssh/
Password:
authorized_keys 100% 393 0.4KB/s
trs@node178:~>
b.3 Switch to node180 and append its id_rsa.pub to authorized_keys:
trs@node180:~/.ssh> ls
authorized_keys id_rsa id_rsa.pub known_hosts
trs@node180:~/.ssh> cat id_rsa.pub >> authorized_keys
trs@node180:~/.ssh>
b.4 Back on node178, verify the passwordless login:
trs@node178:~> ssh node180
Last login: Thu May 23 10:25:36 2019 from 10.50.144.180
trs@node180:~> exit
logout
Connection to node180 closed.
trs@node178:~>
b.5 Passwordless login to node180 works!
c. node181 setup
Copy the authorized_keys file to node181, then append that node's (node181's) own public key id_rsa.pub to the file.
c.1 Command: scp ~/.ssh/authorized_keys trs@node181:/home/trs/.ssh/
c.2 Expected output:
trs@node178:~> scp ~/.ssh/authorized_keys trs@node181:/home/trs/.ssh/
Password:
authorized_keys 100% 393 0.4KB/s
trs@node178:~>
c.3 Switch to node181 and append its id_rsa.pub to authorized_keys:
trs@node181:~/.ssh> ls
authorized_keys id_rsa id_rsa.pub known_hosts
trs@node181:~/.ssh> cat id_rsa.pub >> authorized_keys
trs@node181:~/.ssh>
c.4 Back on node178, verify the passwordless login:
trs@node178:~> ssh node181
Last login: Thu May 23 10:25:36 2019 from 10.50.144.181
trs@node181:~> exit
logout
Connection to node181 closed.
trs@node178:~>
c.5 Passwordless login to node181 works!
3.2 Copy the configured directory to node179, node180, and node181
a. Command: scp -r /home/trs/Hadoop trs@node179:/home/trs/
Explanation: copies the entire Hadoop directory, including subdirectories and files, into the trs home directory on node179
b. Command: scp -r /home/trs/Hadoop trs@node180:/home/trs/
Explanation: copies the entire Hadoop directory, including subdirectories and files, into the trs home directory on node180
c. Command: scp -r /home/trs/Hadoop trs@node181:/home/trs/
Explanation: copies the entire Hadoop directory, including subdirectories and files, into the trs home directory on node181
3.3 On node179, node180, and node181, configure the ordinary user's environment variables; see the [Server 179], [Server 180], and [Server 181] sections below.
4. Run and test
4.1 Command: hdfs namenode -format
Explanation: formatting creates its files under the temporary directory /home/trs/Hadoop/hadoop-3.2.0/tmp, avoiding the need to format before every start
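A related caveat (general HDFS behavior, not from the original notes): formatting again later gives the NameNode a new clusterID, and DataNodes that still hold the old ID will refuse to register. If a re-format is ever necessary, clear the old state on every node first:
rm -rf /home/trs/Hadoop/hadoop-3.2.0/tmp /home/trs/Hadoop/hadoop-3.2.0/dfs
hdfs namenode -format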
4.2 Command: /home/trs/Hadoop/hadoop-3.2.0/sbin/start-all.sh
Explanation: starts Hadoop's dfs and yarn together, equivalent to running start-dfs.sh and start-yarn.sh separately
4.3 Verify that Hadoop started
a. Master node node178
Command: jps
Output:
trs@node178:~/Hadoop/hadoop-3.2.0/sbin> ./start-all.sh
WARNING: Attempting to start all Apache Hadoop daemons as trs in 10 seconds.
WARNING: This is not a recommended production deployment configuration.
WARNING: Use CTRL-C to abort.
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [node178]
Starting resourcemanager
Starting nodemanagers
trs@node178:~/Hadoop/hadoop-3.2.0/sbin> jps
7028 ResourceManager
6663 SecondaryNameNode
6347 NameNode
7420 Jps
trs@node178:~/Hadoop/hadoop-3.2.0/sbin>
b. Slave node node179
Command: jps
Output:
trs@node179:~> jps
838 DataNode
1286 Jps
1049 NodeManager
trs@node179:~>
c. Slave node node180
Command: jps
Output:
trs@node180:~> jps
8689 DataNode
8868 NodeManager
9083 Jps
trs@node180:~>
d. Slave node node181
Command: jps
Output:
trs@node181:~> jps
11765 DataNode
11944 NodeManager
12203 Jps
trs@node181:~>
4.4 Deployment succeeded! Notes:
[Notes]: a. Only the master needs to run start-all.sh to start Hadoop; stop it with stop-all.sh;
b. As an improvement to the setup above, add the sbin path to the ordinary user's environment variables for convenience;
c. Specific features will be refined and documented in detail later;
d. To be continued…
VI. Acceptance test; required files: wc.jar, a.txt, and b.txt
wc.jar is under /home/trs/resource; a.txt and b.txt are under /home/trs/resource/test
1. hadoop fs -put /home/trs/resource/test /testwc1
2. hadoop jar wc.jar com.trs.hadoop.mr.black.WCRunner /testwc1 /testwc1r
3. hadoop fs -cat /testwc1r/part-r-00000 | grep -v 4$ > /home/trs/vdb20181111/vdbids.txt
[Path: /home/trs/Hadoop/hadoop-3.2.0]
[Server 179]
I.
1. Passwordless SSH setup:
1.1 Command: ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Output:
trs@node179:~/.ssh> ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Generating public/private rsa key pair.
Your identification has been saved in /home/trs/.ssh/id_rsa.
Your public key has been saved in /home/trs/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:6JOV5JlzLx/N25EGsWDjDV+ZUlRpk8N+D6sYA7T8zEk trs@node179
The key's randomart image is:
+---[RSA 2048]----+
| oo=|
| . .Bo|
| o.. = oo+o|
| +++E * *..|
| . S*.o = oo|
| . o oB. oo o|
| + .+o.o+ |
| . .o...o.|
| . . .|
+----[SHA256]-----+
trs@node179:~/.ssh>
1.2 Command: cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
1.3 Command: chmod 0600 ~/.ssh/authorized_keys
1.4 Command: ssh node179
1.5 Expected result:
trs@node179:~> cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
trs@node179:~> chmod 0600 ~/.ssh/authorized_keys
trs@node179:~> ssh node179
The authenticity of host 'node179 (10.50.144.179)' can't be established.
ECDSA key fingerprint is SHA256:5IMaWPLLwYck4zMXQFt4kU3JrGtdSdmwYSprlxGC2M8.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node179,10.50.144.179' (ECDSA) to the list of known hosts.
Last login: Thu May 23 10:23:28 2019 from 10.75.12.2
trs@node179:~> exit
logout
Connection to node179 closed.
trs@node179:~> ssh node179
Last login: Thu May 23 10:25:22 2019 from 10.50.144.179
trs@node179:~>
2. Configure the ordinary user's environment variables
2.1
##########
vim ~/.bashrc #set the user environment so the Hadoop and java commands work from any directory
Add to the file:
export JAVA_HOME=/home/trs/Hadoop/jdk1.8.0_11
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export HADOOP_HOME=/home/trs/Hadoop/hadoop-3.2.0
export CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath):$CLASSPATH
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
##########
2.2
source ~/.bashrc #apply the changes
2.3
Expected result:
trs@node179:~> java -version
java version "1.8.0_11"
Java(TM) SE Runtime Environment (build 1.8.0_11-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.11-b03, mixed mode)
trs@node179:~> hadoop version
Hadoop 3.2.0
Source code repository https://github.com/apache/hadoop.git -r e97acb3bd8f3befd27418996fa5d4b50bf2e17bf
Compiled by sunilg on 2019-01-08T06:08Z
Compiled with protoc 2.5.0
From source with checksum d3f0795ed0d9dc378e2c785d3668f39
This command was run using /home/trs/Hadoop/hadoop-3.2.0/share/hadoop/common/hadoop-common-3.2.0.jar
trs@node179:~>
[Server 180]
I.
1. Passwordless SSH setup:
1.1 Command: ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Output:
trs@node180:~> ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Generating public/private rsa key pair.
Your identification has been saved in /home/trs/.ssh/id_rsa.
Your public key has been saved in /home/trs/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:VQMa30cayaBwe0gnhN08wE5W0ib5eTU90A4ucZKsubU trs@node180
The key's randomart image is:
+---[RSA 2048]----+
| .=OB*+=o+. |
| .+B@*BoBoo.|
| +=+*oB.+..|
| .=oo.o . |
| S o.o |
| . E |
| |
| |
| |
+----[SHA256]-----+
trs@node180:~>
1.2 Command: cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
1.3 Command: chmod 0600 ~/.ssh/authorized_keys
1.4 Command: ssh node180
1.5 Expected result:
trs@node180:~> cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
trs@node180:~> chmod 0600 ~/.ssh/authorized_keys
trs@node180:~> ssh node180
Last login: Thu May 23 10:35:00 2019 from 10.50.144.180
trs@node180:~>
2. Configure the ordinary user's environment variables
2.1
##########
vim ~/.bashrc #set the user environment so the Hadoop and java commands work from any directory
Add to the file:
export JAVA_HOME=/home/trs/Hadoop/jdk1.8.0_11
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export HADOOP_HOME=/home/trs/Hadoop/hadoop-3.2.0
export CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath):$CLASSPATH
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
##########
2.2
source ~/.bashrc #apply the changes
2.3
Expected result:
trs@node180:~> java -version
java version "1.8.0_11"
Java(TM) SE Runtime Environment (build 1.8.0_11-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.11-b03, mixed mode)
trs@node180:~> hadoop version
Hadoop 3.2.0
Source code repository https://github.com/apache/hadoop.git -r e97acb3bd8f3befd27418996fa5d4b50bf2e17bf
Compiled by sunilg on 2019-01-08T06:08Z
Compiled with protoc 2.5.0
From source with checksum d3f0795ed0d9dc378e2c785d3668f39
This command was run using /home/trs/Hadoop/hadoop-3.2.0/share/hadoop/common/hadoop-common-3.2.0.jar
trs@node180:~>
[Server 181]
I.
1. Passwordless SSH setup:
1.1 Command: ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Output:
trs@node181:~> ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Generating public/private rsa key pair.
Your identification has been saved in /home/trs/.ssh/id_rsa.
Your public key has been saved in /home/trs/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:cECieGX7F/xaMlcsvezNkoBu5XFq24Lj4bDnYRaz/CM trs@node181
The key's randomart image is:
+---[RSA 2048]----+
| +.o |
| . + o o o |
|. o . . + . + |
| . . o + + . |
| . S B + |
| + % * + |
| . @.+ + o |
| BE=oo . |
| .++ooo. |
+----[SHA256]-----+
trs@node181:~>
1.2 Command: cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
1.3 Command: chmod 0600 ~/.ssh/authorized_keys
1.4 Command: ssh node181
1.5 Expected result:
trs@node181:~> cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
trs@node181:~> chmod 0600 ~/.ssh/authorized_keys
trs@node181:~> ssh node181
Last login: Thu May 23 10:42:31 2019 from 10.50.144.181
trs@node181:~>
2. Configure the ordinary user's environment variables
2.1
##########
vim ~/.bashrc #set the user environment so the Hadoop and java commands work from any directory
Add to the file:
export JAVA_HOME=/home/trs/Hadoop/jdk1.8.0_11
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export HADOOP_HOME=/home/trs/Hadoop/hadoop-3.2.0
export CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath):$CLASSPATH
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
##########
2.2
source ~/.bashrc #apply the changes
2.3
Expected result:
trs@node181:~> java -version
java version "1.8.0_11"
Java(TM) SE Runtime Environment (build 1.8.0_11-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.11-b03, mixed mode)
trs@node181:~> hadoop version
Hadoop 3.2.0
Source code repository https://github.com/apache/hadoop.git -r e97acb3bd8f3befd27418996fa5d4b50bf2e17bf
Compiled by sunilg on 2019-01-08T06:08Z
Compiled with protoc 2.5.0
From source with checksum d3f0795ed0d9dc378e2c785d3668f39
This command was run using /home/trs/Hadoop/hadoop-3.2.0/share/hadoop/common/hadoop-common-3.2.0.jar
trs@node181:~>