Cluster Plan
Host Name | IP Address | Node Type
---|---|---
master.hadoop | 192.168.136.10 | NameNode / SecondaryNameNode / ResourceManager
slave1.hadoop | 192.168.136.11 | DataNode / NodeManager
slave2.hadoop | 192.168.136.12 | DataNode / NodeManager
Environment Preparation
- Install the master virtual machine
- Configure NAT networking
#vi /etc/sysconfig/network-scripts/ifcfg-ens33
TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=static
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=ens33
UUID=d051a32b-c399-4f06-8771-601b78d58a74
DEVICE=ens33
ONBOOT=yes
IPADDR=192.168.136.10
NETMASK=255.255.255.0
GATEWAY=192.168.136.2
// TODO
The VM cannot reach the external network, and the host cannot ping the NAT gateway.
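One common cause of this symptom is a mistyped `GATEWAY` key in the ifcfg file (e.g. `GETWAY`), which leaves the VM without a default route. A minimal check, assuming the standard ifcfg path (`check_gateway` is a hypothetical helper, not a system command):

```shell
# check_gateway: verify an ifcfg file defines a GATEWAY key (hypothetical helper)
check_gateway() {
    if grep -q '^GATEWAY=' "$1"; then
        echo "ok: $(grep '^GATEWAY=' "$1")"
    else
        echo "GATEWAY key missing or misspelled in $1"
    fi
}

# on the VM:
#   check_gateway /etc/sysconfig/network-scripts/ifcfg-ens33
# after fixing the file, apply it with: systemctl restart network
```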
- Configure hosts
#vi /etc/hosts
192.168.136.10 master.hadoop
192.168.136.11 slave1.hadoop
192.168.136.12 slave2.hadoop
- Disable the firewall
#systemctl stop firewalld.service // stop the firewall (it comes back after a reboot)
#systemctl disable firewalld.service // disable the firewall (stays off across reboots)
#firewall-cmd --state // check the firewall status
- Install JDK 8 to /usr/local/jdk1.8.0_201
#vi /etc/profile
export JAVA_HOME=/usr/local/jdk1.8.0_201
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
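After sourcing /etc/profile, a quick sanity check that $JAVA_HOME/bin actually landed on PATH (a minimal sketch; the two assignments mirror the profile lines above):

```shell
# mirror the /etc/profile exports, then confirm JAVA_HOME/bin is on PATH
JAVA_HOME=/usr/local/jdk1.8.0_201
PATH=$PATH:$JAVA_HOME/bin

case ":$PATH:" in
    *":$JAVA_HOME/bin:"*) echo "JAVA_HOME is on PATH" ;;
    *)                    echo "JAVA_HOME missing from PATH" ;;
esac
# then verify the JDK itself with: java -version
```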
Hadoop
Install Hadoop to /usr/local/hadoop-2.9.2
Configuration
- $HADOOP_HOME/etc/hadoop/core-site.xml
<!-- The name of the default file system -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://master.hadoop:9000</value>
</property>
<!-- A base for other temporary directories -->
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/tmp/hadoop</value>
</property>
<!-- The size of buffer for use in sequence files -->
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
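To confirm a value was picked up from a `*-site.xml` file without starting the cluster, the `<value>` following a given `<name>` can be pulled out with standard tools (`get_conf` is a hypothetical helper that assumes the one-element-per-line layout used above; once Hadoop is running, `hdfs getconf -confKey` does the same job authoritatively):

```shell
# get_conf: print the <value> on the line after a given <name> in a *-site.xml
# (hypothetical helper; assumes name and value sit on adjacent lines as above)
get_conf() {
    grep -A1 "<name>$2</name>" "$1" | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}

# example: get_conf $HADOOP_HOME/etc/hadoop/core-site.xml io.file.buffer.size
```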
- $HADOOP_HOME/etc/hadoop/hdfs-site.xml
<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/local/hadoop-2.9.2/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/local/hadoop-2.9.2/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master.hadoop:50090</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
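If you prefer to create the dfs.namenode.name.dir and dfs.datanode.data.dir storage directories up front (`hdfs namenode -format` also creates the name directory), a one-liner suffices; shown here against a scratch prefix so it can run anywhere:

```shell
# create the NameNode and DataNode storage directories
# (scratch prefix here; set HADOOP_PREFIX=/usr/local/hadoop-2.9.2 on the real nodes)
HADOOP_PREFIX=${HADOOP_PREFIX:-/tmp/hadoop-2.9.2}
mkdir -p "$HADOOP_PREFIX/dfs/name" "$HADOOP_PREFIX/dfs/data"
ls "$HADOOP_PREFIX/dfs"
```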
- $HADOOP_HOME/etc/hadoop/mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<final>true</final>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master.hadoop:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master.hadoop:19888</value>
</property>
- $HADOOP_HOME/etc/hadoop/yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master.hadoop:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master.hadoop:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master.hadoop:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master.hadoop:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master.hadoop:8088</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master.hadoop</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>2048</value>
</property>
- $HADOOP_HOME/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_201
- $HADOOP_HOME/etc/hadoop/yarn-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_201
- $HADOOP_HOME/etc/hadoop/slaves
slave1.hadoop
slave2.hadoop
Cluster
Clone the master VM twice as slave1.hadoop and slave2.hadoop, and update each clone's network configuration.
Configure passwordless SSH
- On master, run ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa to create a key pair with no passphrase: -t sets the key type (RSA; newer OpenSSH releases reject DSA keys by default), -P '' sets an empty passphrase, and -f sets where the key is saved
- On master, run cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys to append the public key to authorized_keys, which enables passwordless SSH login
- On master, run ssh master to test passwordless login
- On slave1.hadoop and slave2.hadoop, run mkdir ~/.ssh
- On master, run scp ~/.ssh/authorized_keys root@slave1.hadoop:~/.ssh/authorized_keys to copy the master node's public key to slave1.hadoop
- On master, run scp ~/.ssh/authorized_keys root@slave2.hadoop:~/.ssh/authorized_keys to copy the master node's public key to slave2.hadoop; the copy prompts for slave2.hadoop's login password
- On all three machines, run chmod 600 ~/.ssh/authorized_keys to set the required permissions on the key file
- On master, run ssh slave1.hadoop and ssh slave2.hadoop to verify that SSH is configured correctly
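Each clone then needs its own hostname and IPADDR. A sketch of the per-clone changes (`set_static_ip` is a hypothetical helper; `hostnamectl` is the standard CentOS 7 way to rename a host):

```shell
# set_static_ip: rewrite the IPADDR line of an ifcfg file (hypothetical helper)
set_static_ip() {
    sed -i "s/^IPADDR=.*/IPADDR=$2/" "$1"
}

# on slave1.hadoop:
#   hostnamectl set-hostname slave1.hadoop
#   set_static_ip /etc/sysconfig/network-scripts/ifcfg-ens33 192.168.136.11
#   systemctl restart network
```

The clones also inherit the master's interface UUID= line; removing it is a common precaution (NetworkManager regenerates a unique one on restart).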
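The two per-slave scp steps above generalize to a loop over the slave list; shown as a dry run (echo prints each command instead of executing it — drop the echo to run for real):

```shell
# distribute the master's authorized_keys to every slave (dry run)
for h in slave1.hadoop slave2.hadoop; do
    echo scp ~/.ssh/authorized_keys "root@$h:~/.ssh/authorized_keys"
done
```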
Running
Format the NameNode
#$HADOOP_HOME/bin/hdfs namenode -format
Start
- Start the cluster on master.hadoop and check the Java processes
#$HADOOP_HOME/sbin/start-all.sh
#jps
2996 SecondaryNameNode
6117 Jps
3163 ResourceManager
2796 NameNode
- Check the Java processes on slave1.hadoop and slave2.hadoop
#jps
2256 NodeManager
2137 DataNode
3164 Jps
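The jps listings above can be checked mechanically. A small sketch (`check_daemons` is a hypothetical helper) reads `jps` output on stdin and reports any expected daemon that is not running:

```shell
# check_daemons: read `jps` output on stdin, report each missing expected daemon
check_daemons() {
    out=$(cat)
    for d in "$@"; do
        # -w avoids e.g. "NameNode" matching inside "SecondaryNameNode"
        printf '%s\n' "$out" | grep -qw "$d" || echo "missing: $d"
    done
}

# on master.hadoop:  jps | check_daemons NameNode SecondaryNameNode ResourceManager
# on a slave:        jps | check_daemons DataNode NodeManager
```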
Stop
#$HADOOP_HOME/sbin/stop-all.sh