Hadoop Distributed Cluster Installation and Configuration
Hadoop Deployment
Cluster deployment plan
Do not install the NameNode and the SecondaryNameNode on the same server.
The ResourceManager also consumes a lot of memory; do not place it on the same machine as the NameNode or the SecondaryNameNode.
| | cpu101 | cpu102 | cpu103 |
|---|---|---|---|
| HDFS | NameNode<br>DataNode | DataNode | SecondaryNameNode<br>DataNode |
| YARN | NodeManager | ResourceManager<br>NodeManager | NodeManager |
Download:
https://archive.apache.org/dist/hadoop/common/hadoop-3.1.3/
Copy hadoop-3.1.3.tar.gz into the /opt/software directory.
Extract the archive into /opt/module:
tar -zxvf hadoop-3.1.3.tar.gz -C /opt/module/
Check that the extraction succeeded.
Add Hadoop to the environment variables.
Open the /etc/profile.d/my_env.sh file:
sudo vim /etc/profile.d/my_env.sh
#HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
Reload the profile so the new variables take effect:
source /etc/profile
Distribute the environment file to the other nodes (xsync is the custom rsync-based distribution script used throughout this setup):
sudo ~/bin/xsync /etc/profile.d/my_env.sh
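As a quick sanity check after sourcing the profile, a minimal sketch (the path below is the install location used in this guide) that confirms the Hadoop bin and sbin directories landed on PATH:

```shell
#!/bin/bash
# Sanity check: HADOOP_HOME is set and its bin/sbin directories are on PATH.
# The path matches the install location used in this guide.
export HADOOP_HOME=/opt/module/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

for d in bin sbin; do
  case ":$PATH:" in
    *":$HADOOP_HOME/$d:"*) echo "$d on PATH" ;;
    *)                     echo "$d missing"  ;;
  esac
done
```

With the export lines above this prints "bin on PATH" and "sbin on PATH"; on a configured node, `hadoop version` is the definitive check.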
Common Port Numbers

| Port | Hadoop 2.x | Hadoop 3.x |
|---|---|---|
| NameNode internal RPC | 8020 / 9000 | 8020 / 9000 / 9820 |
| NameNode HTTP UI | 50070 | 9870 |
| YARN web UI (job monitoring) | 8088 | 8088 |
| JobHistory server web UI | 19888 | 19888 |
Configuring the Cluster
Commonly used configuration files

| | Hadoop 2.x | Hadoop 3.x |
|---|---|---|
| Core configuration | core-site.xml | core-site.xml |
| HDFS configuration | hdfs-site.xml | hdfs-site.xml |
| YARN configuration | yarn-site.xml | yarn-site.xml |
| MapReduce configuration | mapred-site.xml | mapred-site.xml |
| Worker node list | slaves | workers |
Core configuration file: core-site.xml
vim core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- NameNode address -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://cpu101:8020</value>
</property>
<!-- Hadoop data storage directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/module/hadoop-3.1.3/data</value>
</property>
<!-- Static user for the HDFS web UI: cpu -->
<property>
<name>hadoop.http.staticuser.user</name>
<value>cpu</value>
</property>
<!-- Hosts from which the cpu (superuser) account may act as a proxy -->
<property>
<name>hadoop.proxyuser.cpu.hosts</name>
<value>*</value>
</property>
<!-- Groups whose users the cpu (superuser) account may proxy -->
<property>
<name>hadoop.proxyuser.cpu.groups</name>
<value>*</value>
</property>
<!-- Users the cpu (superuser) account may proxy -->
<property>
<name>hadoop.proxyuser.cpu.users</name>
<value>*</value>
</property>
</configuration>
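Before distributing an edited file it is worth confirming it is still well-formed XML, since a stray tag silently breaks daemon startup. A hypothetical sketch, checking a sample written to /tmp and assuming python3 is available; in practice point FILE at $HADOOP_HOME/etc/hadoop/core-site.xml:

```shell
#!/bin/bash
# Well-formedness check for a Hadoop *-site.xml file.
# FILE defaults to the sample written below; override it with the real config path.
FILE=${FILE:-/tmp/core-site-sample.xml}
cat > /tmp/core-site-sample.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://cpu101:8020</value>
  </property>
</configuration>
EOF
if python3 -c 'import sys, xml.dom.minidom as m; m.parse(sys.argv[1])' "$FILE"; then
  echo "XML OK: $FILE"
else
  echo "XML BROKEN: $FILE"
fi
```

The same check applies to hdfs-site.xml, yarn-site.xml, and mapred-site.xml below.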
HDFS configuration file: hdfs-site.xml
vim hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- NameNode web UI address -->
<property>
<name>dfs.namenode.http-address</name>
<value>cpu101:9870</value>
</property>
<!-- SecondaryNameNode web UI address -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>cpu103:9868</value>
</property>
<!-- HDFS metadata (NameNode) storage directory -->
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///opt/module/hadoop-3.1.3/dfs/name</value>
</property>
<!-- HDFS block (DataNode) storage directory -->
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///opt/module/hadoop-3.1.3/dfs/data</value>
</property>
<!-- Have clients reach DataNodes by hostname -->
<property>
<name>dfs.client.use.datanode.hostname</name>
<value>true</value>
</property>
<!-- Replication factor of 1 for this test environment -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
YARN configuration file: yarn-site.xml
vim yarn-site.xml
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<!-- Site specific YARN configuration properties -->
<!-- Enable the MapReduce shuffle auxiliary service -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- ResourceManager address -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>cpu102</value>
</property>
<!-- Environment variables inherited by containers -->
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
<!-- Enable log aggregation -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- Log aggregation server URL -->
<property>
<name>yarn.log.server.url</name>
<value>http://cpu102:19888/jobhistory/logs</value>
</property>
<!-- Retain aggregated logs for 7 days -->
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>
<!-- Whether to check the physical memory each task uses and kill tasks that exceed their allocation; default is true -->
<property>
<name>yarn.nodemanager.pmem-check-enabled</name>
<value>false</value>
</property>
<!-- Whether to check the virtual memory each task uses and kill tasks that exceed their allocation; default is true -->
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
<!--
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>9000</value>
<description>Maximum memory per container; default 8192 MB</description>
</property>
-->
<!--
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>3072</value>
<description>Minimum memory per container</description>
</property>
-->
<!--
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>3</value>
<description>Ratio of virtual to physical memory</description>
</property>
-->
</configuration>
MapReduce configuration file: mapred-site.xml
vim mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- Run MapReduce on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- Let the MapReduce ApplicationMaster find the Hadoop installation -->
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=/opt/module/hadoop-3.1.3</value>
</property>
<!-- Let map tasks find the Hadoop installation -->
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=/opt/module/hadoop-3.1.3</value>
</property>
<!-- Let reduce tasks find the Hadoop installation -->
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=/opt/module/hadoop-3.1.3</value>
</property>
<!-- JobHistory server RPC address -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>cpu102:10020</value>
</property>
<!-- JobHistory server web UI address -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>cpu102:19888</value>
</property>
</configuration>
Configure workers
vim workers
cpu101
cpu102
cpu103
No line in this file may end with a space, and the file may not contain blank lines.
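A quick way to catch the trailing-space / blank-line mistake is grep. A sketch over a sample file written to /tmp; in practice point it at $HADOOP_HOME/etc/hadoop/workers:

```shell
#!/bin/bash
# Flag lines with trailing whitespace, and blank lines, in a workers file.
# The sample below is well-formed, so the check passes.
cat > /tmp/workers <<'EOF'
cpu101
cpu102
cpu103
EOF
if grep -nE '[[:space:]]+$|^$' /tmp/workers; then
  echo "workers has trailing whitespace or blank lines"
else
  echo "workers OK"
fi
```

grep exits non-zero when no offending line matches, so a clean file prints "workers OK"; any bad line is printed with its line number.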
Distribute Hadoop to all nodes
xsync /opt/module/hadoop-3.1.3/
Starting the Whole Cluster
If the cluster is being started for the first time, format the NameNode on the cpu101 node. (Before formatting, be sure to stop any NameNode and DataNode processes left over from a previous start, then delete the data and logs directories on every node.)
hdfs namenode -format
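The pre-format cleanup described above can be sketched as follows for one node; run it on every node after stop-dfs.sh. HADOOP_DIR is an assumption matching the install path used in this guide:

```shell
#!/bin/bash
# Pre-format cleanup: remove old HDFS data and logs so a fresh
# NameNode format does not clash with stale cluster IDs.
HADOOP_DIR=${HADOOP_DIR:-/opt/module/hadoop-3.1.3}
rm -rf "$HADOOP_DIR/data" "$HADOOP_DIR/logs"
echo "cleaned data and logs under $HADOOP_DIR"
```

Skipping this step is a classic cause of DataNodes refusing to join after a re-format, because their stored cluster ID no longer matches the new NameNode's.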
Cluster start/stop script
vim myhadoop.sh
#!/bin/bash
if [ $# -lt 1 ]
then
echo "Usage: myhadoop.sh {start|stop}"
exit
fi
case $1 in
"start")
echo " =================== 启动 hadoop 集群 ==================="
echo " --------------- 启动 hdfs ---------------"
ssh cpu101 "/opt/module/hadoop-3.1.3/sbin/start-dfs.sh"
echo " --------------- 启动 yarn ---------------"
ssh cpu102 "/opt/module/hadoop-3.1.3/sbin/start-yarn.sh"
echo " --------------- 启动 historyserver ---------------"
ssh cpu102 "/opt/module/hadoop-3.1.3/bin/mapred --daemon start historyserver"
;;
"stop")
echo " =================== 关闭 hadoop 集群 ==================="
echo " --------------- 关闭 historyserver ---------------"
ssh cpu102 "/opt/module/hadoop-3.1.3/bin/mapred --daemon stop historyserver"
echo " --------------- 关闭 yarn ---------------"
ssh cpu102 "/opt/module/hadoop-3.1.3/sbin/stop-yarn.sh"
echo " --------------- 关闭 hdfs ---------------"
ssh cpu101 "/opt/module/hadoop-3.1.3/sbin/stop-dfs.sh"
;;
*)
echo "Input Args Error..."
;;
esac
Make the script executable (chmod +x is sufficient; 777 also grants write access to everyone):
chmod 777 myhadoop.sh
Start:
myhadoop.sh start
Stop:
myhadoop.sh stop
Web UIs
HDFS NameNode web UI:
http://cpu101:9870
JobHistory server:
http://cpu102:19888/jobhistory
SecondaryNameNode:
http://cpu103:9868/status.html
YARN All Applications:
http://cpu102:8088
Testing MapReduce
Create an input directory:
hadoop fs -mkdir /input
Upload a file:
hadoop fs -put word.txt /input
Delete any existing output directory on HDFS:
hadoop fs -rm -r /output
Run the wordcount example (from the Hadoop home directory, since the jar path is relative):
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount /input /output
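What the example job computes can be illustrated locally. A sketch with awk over a tiny sample file; the real job performs the same per-word aggregation, but distributed across YARN containers:

```shell
#!/bin/bash
# Local illustration of wordcount: count occurrences of each word,
# then print "word count" pairs in sorted order.
printf 'hello world\nhello hadoop\n' > /tmp/word.txt
awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
     END { for (w in count) print w, count[w] }' /tmp/word.txt | sort
```

For this sample it prints "hadoop 1", "hello 2", "world 1"; the real job writes equivalent pairs to /output/part-r-00000, viewable with hadoop fs -cat /output/part-r-00000.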