Environment:
Version: hadoop-3.2.4
Master node: 192.168.0.107
OS | Hostname | IP | Processes
---|---|---|---
centos7 | hmaster | 192.168.0.107 | ResourceManager, NodeManager, SecondaryNameNode, NameNode, DataNode, JobHistoryServer
centos7 | hnode1 | 192.168.0.108 | DataNode, NodeManager
centos7 | hnode2 | 192.168.0.109 | DataNode, NodeManager
Common ports
HDFS web UI: 9870
YARN cluster web UI: 8088
Hadoop job history web UI: 19888
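1. Set the hostnames
# Set the hostname on each machine (run the matching command on that machine)
hostnamectl set-hostname hmaster
hostnamectl set-hostname hnode1
hostnamectl set-hostname hnode2
# Reboot each machine after setting its hostname
reboot
# Edit the hosts file and add the three entries below
vim /etc/hosts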
192.168.0.107 hmaster
192.168.0.108 hnode1
192.168.0.109 hnode2
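The hosts entries above can also be added by script, which helps when the same step is repeated on three machines. The sketch below is one possible approach and not part of the original steps: `add_host_entry` is a hypothetical helper that appends a mapping only when the hostname is not already present, so running it twice does not duplicate lines.

```shell
# add_host_entry FILE IP NAME
# Append "IP NAME" to FILE unless NAME is already mapped on a non-comment line.
add_host_entry() {
    file=$1
    ip=$2
    name=$3
    # Look for NAME as a whole field after the address column.
    if awk -v n="$name" '
        /^[ \t]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }
    ' "$file"; then
        return 0
    fi
    printf '%s %s\n' "$ip" "$name" >> "$file"
}
```

As root on each machine this would be called as `add_host_entry /etc/hosts 192.168.0.107 hmaster`, and likewise for hnode1 and hnode2.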
2. Configure passwordless SSH login
# Run on every machine; press Enter at each prompt
[root@hmaster ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:rqnaxnC+7yLxIrVMw5TE3YDAvrH2vMTXt4aMMhBSIQg root@hmaster
The key's randomart image is:
+---[RSA 2048]----+
|Eo.+.o |
|.o= . . |
|.o . |
|.+o |
|.o= S |
| =O . .. |
|.++% .o.o. |
|. **B. =... |
| ..*B*= .. |
+----[SHA256]-----+
[root@hmaster ~]#
Copy hmaster's public key to all three machines (run on hmaster)
ssh-copy-id -i ~/.ssh/id_rsa.pub 192.168.0.107
ssh-copy-id -i ~/.ssh/id_rsa.pub 192.168.0.108
ssh-copy-id -i ~/.ssh/id_rsa.pub 192.168.0.109
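The three `ssh-copy-id` calls can be wrapped in a loop that also checks the result. This is a sketch under assumptions: `run` and `distribute_key` are hypothetical helpers, `DRY_RUN` is an invented variable for previewing the commands, and the hostnames are assumed to resolve through /etc/hosts. `BatchMode=yes` is a standard ssh option that makes the login fail instead of prompting for a password, so a failure means the key was not installed.

```shell
# Hypothetical helper: with DRY_RUN=1 the command is printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# distribute_key HOST...
# Copy the local public key to each host, then confirm key-based login works.
distribute_key() {
    for host in "$@"; do
        run ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "root@$host" || return 1
        # Fails rather than asking for a password if the key is not accepted.
        run ssh -o BatchMode=yes "root@$host" hostname || return 1
    done
}
```

On hmaster, `DRY_RUN=1; distribute_key hmaster hnode1 hnode2` previews the commands; unset `DRY_RUN` to run them.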
3. Download the Hadoop tarball
https://dlcdn.apache.org/hadoop/common/hadoop-3.2.4/hadoop-3.2.4.tar.gz
Place the tarball in /usr/local/lib on each machine, extract it, and create the required directories
cd /usr/local/lib
tar -zxvf hadoop-3.2.4.tar.gz
# Run on every machine
mkdir -p /usr/local/lib/hadoop-3.2.4/data/tmp
mkdir -p /usr/local/lib/hadoop-3.2.4/data/hdfs/nameNodeData
mkdir -p /usr/local/lib/hadoop-3.2.4/data/hdfs/dataNodeData
mkdir -p /usr/local/lib/hadoop-3.2.4/data/hdfs/nn/edits
mkdir -p /usr/local/lib/hadoop-3.2.4/data/hdfs/snn/name
mkdir -p /usr/local/lib/hadoop-3.2.4/data/hdfs/nn/snn/edits
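The same directory layout can be created from a single list, which makes it easier to keep the paths in step with hdfs-site.xml. A minimal sketch, assuming a hypothetical helper named `make_hadoop_dirs`; the subdirectory names are the ones referenced by hadoop.tmp.dir and the dfs.* directory properties in the configuration files (note `snn/name`, as used by dfs.namenode.checkpoint.dir).

```shell
# make_hadoop_dirs HADOOP_DIR
# Create every data directory referenced by core-site.xml and hdfs-site.xml.
make_hadoop_dirs() {
    base=$1
    for sub in tmp \
               hdfs/nameNodeData \
               hdfs/dataNodeData \
               hdfs/nn/edits \
               hdfs/snn/name \
               hdfs/nn/snn/edits; do
        mkdir -p "$base/data/$sub" || return 1
    done
}
```

It would be called as `make_hadoop_dirs /usr/local/lib/hadoop-3.2.4` on every machine.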
4. Edit the configuration
The configuration files to modify are: core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml, workers, and hadoop-env.sh
4.1 Detailed configuration
core-site.xml in detail
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hmaster:8020</value>
<description>NameNode hostname and port</description>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
<description>Buffer size in bytes for file reads and writes (131072 = 128 KB)</description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/usr/local/lib/hadoop-3.2.4/data/tmp</value>
<description>Directory for temporary files</description>
</property>
<!-- Proxy-user settings that Hive needs in order to connect to Hadoop and access HDFS -->
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hive.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hive.groups</name>
<value>*</value>
</property>
</configuration>
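After editing a file like this it is easy to lose track of which value a property ended up with. The sketch below is a rough way to read one value back from the file; `get_prop` is a hypothetical helper and it assumes the layout used here, with `<name>` and `<value>` each on their own line. On a working installation, `hdfs getconf -confKey fs.defaultFS` reports the value Hadoop actually resolves.

```shell
# get_prop FILE NAME
# Print the first <value> that follows <name>NAME</name> in FILE.
get_prop() {
    awk -v key="$2" '
        index($0, "<name>" key "</name>") { want = 1; next }
        want && match($0, /<value>.*<\/value>/) {
            # Drop the 7-character "<value>" and the 8-character "</value>".
            print substr($0, RSTART + 7, RLENGTH - 15)
            exit
        }
    ' "$1"
}
```

For example, `get_prop /usr/local/lib/hadoop-3.2.4/etc/hadoop/core-site.xml fs.defaultFS` should print the NameNode URI configured above.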
hdfs-site.xml in detail
<configuration>
<property>
<name>dfs.permissions.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/lib/hadoop-3.2.4/data/hdfs/nameNodeData</value>
<description>Local filesystem path where the NameNode persistently stores the namespace image and transaction logs</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/lib/hadoop-3.2.4/data/hdfs/dataNodeData</value>
<description>Comma-separated list of local directories where the DataNode stores its block files</description>
</property>
<property>
<name>dfs.namenode.edits.dir</name>
<value>file:/usr/local/lib/hadoop-3.2.4/data/hdfs/nn/edits</value>
</property>
<!-- Directory where the SecondaryNameNode stores the fsimage to be merged -->
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>file:/usr/local/lib/hadoop-3.2.4/data/hdfs/snn/name</value>
</property>
<!-- Directory where the SecondaryNameNode stores the edit logs to be merged -->
<property>
<name>dfs.namenode.checkpoint.edits.dir</name>
<value>file:/usr/local/lib/hadoop-3.2.4/data/hdfs/nn/snn/edits</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
<description>Number of replicas HDFS keeps for each block; the default is 3</description>
</property>
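<!-- dfs.webhdfs.enabled must be true in the NameNode's hdfs-site.xml; otherwise WebHDFS operations that list file or directory status, such as LISTSTATUS and GETFILESTATUS, cannot be used, because that information is held by the NameNode. -->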
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<!-- HDFS block size: 134217728 bytes is 128 MB, which is also the default block size in Hadoop 3 -->
<property>
<name>dfs.blocksize</name>
<value>134217728</value>
</property>
</configuration>
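dfs.blocksize is given in bytes, so a different block size has to be converted from megabytes first. The arithmetic is sketched below; `mb_to_bytes` is a hypothetical helper name. On a configured node, `hdfs getconf -confKey dfs.blocksize` shows the value in effect.

```shell
# mb_to_bytes MB
# Convert a size in megabytes to the byte count that dfs.blocksize expects.
mb_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

mb_to_bytes 128   # prints 134217728, the value used above
```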
yarn-site.xml in detail
Set the resource sizes used when running MapReduce jobs according to the memory and number of CPUs on each node of your own virtual machines
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hmaster</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- Containers that exceed their vmem or pmem limits fail with an error, so both resource checks are set to false here -->
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.nodemanager.pmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.log.server.url</name>
<value>http://hmaster:19888/jobhistory/logs</value>
</property>
<!-- Minimum amount of memory, in MB, allocated for a single container request -->
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>4096</value>
</property>
<!-- Total physical memory, in MB, on this node that can be allocated to containers; the default is 8192 -->
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>4096</value>
</property>
<!-- Total number of virtual CPU cores available to the NodeManager; set it according to the hardware, for example to the number of hardware threads -->
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>2</value>
</property>
<!-- Minimum number of virtual CPU cores a single container can request; the default is 1 -->
<property>
<name>yarn.scheduler.minimum-allocation-vcores</name>
<value>2</value>
</property>
<!-- Maximum number of virtual CPU cores a single container can request; the default is 4 -->
<property>
<name>yarn.scheduler.maximum-allocation-vcores</name>
<value>2</value>
</property>
</configuration>
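With these values it is worth checking how many minimum-size containers a node can hold. A back-of-the-envelope sketch follows; `containers_per_node` is a hypothetical helper. Whether the vcore figure actually limits scheduling depends on the resource calculator the scheduler is configured with, so treat the second number as an upper bound only in that case.

```shell
# containers_per_node NODE_MB MIN_MB NODE_VCORES MIN_VCORES
# Print how many minimum-size containers fit by memory and by vcores.
containers_per_node() {
    node_mb=$1
    min_mb=$2
    node_vcores=$3
    min_vcores=$4
    by_mem=$(( node_mb / min_mb ))
    by_cpu=$(( node_vcores / min_vcores ))
    echo "$by_mem $by_cpu"
}

# Values from yarn-site.xml above: 4096 MB and 2 vcores per node,
# 2048 MB and 2 vcores minimum per container.
containers_per_node 4096 2048 2 2   # prints "2 1"
```

Memory allows two containers per node, while the 2-vcore minimum allows only one; lowering yarn.scheduler.minimum-allocation-vcores to 1 would bring the two figures in line.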
mapred-site.xml in detail
<configuration>
<!-- Runtime framework for MapReduce jobs. The default is local; it is set to yarn here. -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.job.ubertask.enable</name>
<value>true</value>
</property>
<!-- Host and port of the JobHistory server -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>hmaster:10020</value>
</property>
<!-- Host and port of the JobHistory server web UI -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hmaster:19888</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
</configuration>
workers: all three machines are configured as worker nodes
hmaster
hnode1
hnode2
hadoop-env.sh
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
export JAVA_HOME=/usr/local/lib/jdk1.8.0_333
4.2 Distribute the configuration
Copy the configuration files to the other nodes
cd /usr/local/lib/hadoop-3.2.4/etc/hadoop
scp /usr/local/lib/hadoop-3.2.4/etc/hadoop/* root@hnode1:$PWD
scp /usr/local/lib/hadoop-3.2.4/etc/hadoop/* root@hnode2:$PWD
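With more nodes, the copy commands can be generated from the workers file instead of being typed per host. This is a sketch only: `print_scp_commands` is a hypothetical helper that prints one `scp` line per worker and skips the machine the files are copied from, so the output can be reviewed before it is run.

```shell
# print_scp_commands WORKERS_FILE CONF_DIR SELF
# Print an scp command for every host in WORKERS_FILE except SELF.
print_scp_commands() {
    workers=$1
    conf_dir=$2
    self=$3
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while read -r host || [ -n "$host" ]; do
        # Skip blank lines and the local machine.
        [ -z "$host" ] && continue
        [ "$host" = "$self" ] && continue
        printf 'scp %s/* root@%s:%s/\n' "$conf_dir" "$host" "$conf_dir"
    done < "$workers"
}
```

On hmaster, `print_scp_commands workers /usr/local/lib/hadoop-3.2.4/etc/hadoop hmaster` run from the configuration directory prints the two commands shown above; piping the output to `sh` executes them.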
# Configure environment variables
vim /etc/profile
export HADOOP_HOME=/usr/local/lib/hadoop-3.2.4
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_CLASSPATH=`hadoop classpath`
source /etc/profile
# Format the HDFS NameNode (run once, on hmaster only; formatting again erases the existing HDFS metadata)
hdfs namenode -format
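Because formatting discards the existing namespace, it can help to guard the command. The sketch below relies on the fact that a formatted NameNode directory contains a `current/VERSION` file; `namenode_is_formatted` is a hypothetical helper name.

```shell
# namenode_is_formatted NAME_DIR
# Succeed if NAME_DIR (the dfs.namenode.name.dir path) already holds
# NameNode metadata.
namenode_is_formatted() {
    [ -f "$1/current/VERSION" ]
}
```

It could be used as `namenode_is_formatted /usr/local/lib/hadoop-3.2.4/data/hdfs/nameNodeData || hdfs namenode -format`, so the format step runs only on a fresh directory.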
5. Start the cluster
Number of Hadoop processes on each node after startup (hmaster also runs the JobHistoryServer process):
hmaster: 6
hnode1: 2
hnode2: 2
start-all.sh
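# Start the JobHistory server (in Hadoop 3 this script is deprecated in favor of: mapred --daemon start historyserver)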
mr-jobhistory-daemon.sh start historyserver
[root@hmaster ~]# jps
59451 Jps
23826 DataNode
24435 ResourceManager
56182 JobHistoryServer
24062 SecondaryNameNode
23663 NameNode
24607 NodeManager
[root@hnode1 ~]# jps
71747 Jps
58485 DataNode
58600 NodeManager
[root@hnode2 ~]# jps
58928 NodeManager
58805 DataNode
72183 Jps
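Comparing the `jps` listings against the expected process list by eye gets tedious after a few restarts. A small sketch of an automated check follows; `missing_daemons` is a hypothetical helper that reads `jps` output on standard input and prints every expected process name that does not appear in it.

```shell
# missing_daemons "NAME NAME ..."
# Read jps output on stdin and print each expected name that is absent.
missing_daemons() {
    jps_out=$(cat)
    for name in $1; do
        # Column 2 of jps output is the process name; match it exactly so
        # that NameNode is not satisfied by SecondaryNameNode.
        printf '%s\n' "$jps_out" |
            awk -v n="$name" '$2 == n { found = 1 } END { exit !found }' ||
            printf '%s\n' "$name"
    done
}
```

On a worker node, `jps | missing_daemons "DataNode NodeManager"` prints nothing when both processes are running; on hmaster the list would contain all six names from the table at the top.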