Preface
This article builds a fully distributed Hadoop cluster of three nodes (master, slave1, slave2). Each node runs on its own virtual machine with CentOS 7, and the Hadoop version is 2.6.0.
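Environment preparation
master: 192.168.233.137
slave1: 192.168.233.133
slave2: 192.168.233.138
Download the Hadoop 2.6.0 installation package from the official website
- Passwordless SSH login across the cluster
- Host name resolution through the hosts file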
[root@master hadoop]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.233.137 master
192.168.233.138 slave2
192.168.233.133 slave1
- Set the hostname
- On CentOS 7 the hostnamectl command can be used (run it on each node with that node's own name)
[root@master /]# hostnamectl set-hostname master
- Install the Java environment
- JDK installation is omitted here
- Configure the JAVA_HOME environment variable
- Disable the firewall and SELinux
[root@master /]# systemctl stop firewalld
[root@master /]# systemctl disable firewalld
[root@master /]# setenforce 0
[root@master /]# sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
- Synchronize time across the cluster with ntpdate (entries in /etc/crontab need a user field, here root)
yum install -y ntpdate
echo '0 5 * * * root /usr/sbin/ntpdate -u ntp.api.bz' >> /etc/crontab
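The passwordless SSH step is only named in the list above. One common way to do it is sketched below, assuming OpenSSH's ssh-keygen and ssh-copy-id are installed and that root is the login user on every node, as in the prompts shown. The function name and the RUN dry-run switch are hypothetical and not part of the original setup; with the default RUN=echo the commands are only printed.

```shell
# Hypothetical helper, not from the original article.
# RUN=echo (the default) only prints the commands; set RUN to the empty
# string to actually execute them.
NODES="master slave1 slave2"
RUN="${RUN-echo}"

setup_passwordless_ssh() {
    key="$HOME/.ssh/id_rsa"
    # Create a key pair with an empty passphrase unless one already exists.
    [ -f "$key" ] || $RUN ssh-keygen -t rsa -N "" -f "$key"
    for node in $NODES; do
        # Appends the public key to root's authorized_keys on the node;
        # asks for that node's password once.
        $RUN ssh-copy-id "root@$node"
    done
}

setup_passwordless_ssh
```

Once the printed commands look right, the same script can be rerun on master with RUN set to the empty string so that the commands are executed.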
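Installing Hadoop
Create the Hadoop working directory and unpack the archive there (hadoop-2.6.0.tar.gz must be placed in /data before running tar)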
[root@master /]# mkdir /data && cd /data
[root@master data]# tar xf hadoop-2.6.0.tar.gz
[root@master data]# mv hadoop-2.6.0 hadoop
Add the environment variables to /etc/profile
export JAVA_HOME=/opt/jdk1.8.0_112
export PATH=$JAVA_HOME/bin:$PATH
export HADOOP_HOME=/data/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
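After editing /etc/profile it is worth confirming that the two key variables really point at working installations. A minimal sketch of such a check; the function name check_hadoop_env is hypothetical, and it only tests that bin/java and bin/hadoop exist and are executable.

```shell
# Hypothetical check, not from the original article.
check_hadoop_env() {
    status=0
    # JAVA_HOME must contain an executable bin/java.
    if [ ! -x "${JAVA_HOME:-}/bin/java" ]; then
        echo "JAVA_HOME is not usable: ${JAVA_HOME:-unset}" >&2
        status=1
    fi
    # HADOOP_HOME must contain an executable bin/hadoop.
    if [ ! -x "${HADOOP_HOME:-}/bin/hadoop" ]; then
        echo "HADOOP_HOME is not usable: ${HADOOP_HOME:-unset}" >&2
        status=1
    fi
    return $status
}

# Typical use in a new shell on any node:
#   . /etc/profile && check_hadoop_env && hadoop version
```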
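Edit the Hadoop configuration files (located in /data/hadoop/etc/hadoop)
Configure core-site.xml by adding the following properties inside its <configuration> element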
<property>
<name>fs.default.name</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/data/hadoop/data/tmp</value>
</property>
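fs.default.name: the address of the NameNode. In Hadoop 2.x this key is deprecated in favor of fs.defaultFS, but it is still accepted.
hadoop.tmp.dir: the base directory for Hadoop's temporary files. The default is /tmp/hadoop-${user.name}, and /tmp may be cleared on reboot, so it is moved to a persistent path here.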
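The two properties above are fragments; in the real file they sit inside a single <configuration> root element. A minimal sketch that writes the complete file with the same values, assuming the configuration directory is /data/hadoop/etc/hadoop (the Hadoop 2.x layout under the install path used here). The function name write_core_site is hypothetical.

```shell
# Hypothetical helper, not from the original article.
# Writes a complete core-site.xml into the given configuration directory,
# overwriting any existing file.
write_core_site() {
    conf_dir="$1"
    cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop/data/tmp</value>
  </property>
</configuration>
EOF
}

# Usage on master:
#   write_core_site /data/hadoop/etc/hadoop
```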
Configure hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/data/hadoop/data/tmp/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/data/hadoop/data/tmp/dfs/data</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master:9001</value>
</property>
dfs.replication: the number of replicas kept for each block, two in this setup
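To make the effect of dfs.replication concrete: each block is stored that many times, so the raw disk space consumed across the DataNodes is approximately the logical file size multiplied by the replication factor. A small arithmetic sketch; the function name raw_usage_mb is hypothetical.

```shell
# raw_usage_mb SIZE_MB REPLICATION
# Prints the approximate number of MB consumed across all DataNodes.
raw_usage_mb() {
    echo $(( $1 * $2 ))
}

raw_usage_mb 1024 2   # a 1024 MB file with dfs.replication=2 prints 2048
```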
Configure mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
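The Hadoop 2.6.0 distribution ships only mapred-site.xml.template in its configuration directory, so mapred-site.xml normally has to be created before it can be edited. A minimal sketch, assuming the configuration directory is /data/hadoop/etc/hadoop; the function name ensure_mapred_site is hypothetical.

```shell
# Hypothetical helper, not from the original article.
ensure_mapred_site() {
    conf_dir="$1"
    # Copy the shipped template only if mapred-site.xml is not there yet,
    # so an already edited file is never overwritten.
    if [ ! -f "$conf_dir/mapred-site.xml" ]; then
        cp "$conf_dir/mapred-site.xml.template" "$conf_dir/mapred-site.xml"
    fi
}

# Usage on master:
#   ensure_mapred_site /data/hadoop/etc/hadoop
```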
Configure yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>
</configuration>
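The value 604800 for yarn.log-aggregation.retain-seconds is seven days expressed in seconds. A quick way to derive such values for other retention periods; the function name days_to_seconds is hypothetical.

```shell
# days_to_seconds DAYS
# Prints the number of seconds in the given number of days.
days_to_seconds() {
    echo $(( $1 * 24 * 60 * 60 ))
}

days_to_seconds 7   # prints 604800
```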
Configure hadoop-env.sh
export JAVA_HOME=/opt/jdk1.8.0_112
export HADOOP_PREFIX=/data/hadoop
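The steps here do not show it, but in Hadoop 2.x the start scripts (start-dfs.sh, start-yarn.sh) read the list of worker hosts from the slaves file in the configuration directory, which contains only localhost by default. A minimal sketch that writes it, assuming slave1 and slave2 should run the DataNode and NodeManager daemons and that the configuration directory is /data/hadoop/etc/hadoop. The function name write_slaves is hypothetical.

```shell
# Hypothetical helper, not from the original article.
# write_slaves CONF_DIR HOST...
write_slaves() {
    conf_dir="$1"
    shift
    # One host name per line; replaces the previous contents of the file.
    printf '%s\n' "$@" > "$conf_dir/slaves"
}

# Usage on master, before copying the directory to the other nodes:
#   write_slaves /data/hadoop/etc/hadoop slave1 slave2
```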
Copy the Hadoop directory to the two slave nodes
scp -r /data slave1:/
scp -r /data slave2:/
Then, on master, format the NameNode and start the Hadoop cluster
[root@master data]# hadoop namenode -format
[root@master data]# start-all.sh
[root@master data]# jps
On the slave nodes, run jps as well to check that the daemons started successfully
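What to look for in the jps output depends on the node. With the configuration above, master is expected to run NameNode, SecondaryNameNode and ResourceManager, and each slave DataNode and NodeManager (assuming the slaves are registered as worker hosts). A minimal sketch that checks jps output read from standard input; the function name check_daemons is hypothetical.

```shell
# Hypothetical helper, not from the original article.
# Reads jps output on stdin and reports every expected daemon that is absent.
check_daemons() {
    jps_output=$(cat)
    missing=0
    for daemon in "$@"; do
        # jps prints "<pid> <main class name>" per line; compare the name
        # exactly so that SecondaryNameNode does not count as NameNode.
        if ! printf '%s\n' "$jps_output" | awk '{print $2}' | grep -qx "$daemon"; then
            echo "missing: $daemon"
            missing=1
        fi
    done
    return $missing
}

# On master:  jps | check_daemons NameNode SecondaryNameNode ResourceManager
# On a slave: jps | check_daemons DataNode NodeManager
```

The function returns a non-zero status when any daemon is missing, so it can also be used in a script that stops on failure.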