The cluster layout is as follows:
192.168.188.111 master
192.168.188.112 slave1
192.168.188.113 slave2
I. Environment configuration
1. Edit hosts and hostname
Taking master as an example:
Edit /etc/hosts:
[root@master ~]# vim /etc/hosts
192.168.188.111 master
192.168.188.112 slave1
192.168.188.113 slave2
Edit the hostname:
[root@master ~]# vim /etc/hostname
Do the same on slave1 and slave2, setting their hostnames to slave1 and slave2 respectively. Then copy master's hosts entries to both slaves so all three /etc/hosts files match.
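The hosts edit can also be scripted. A minimal sketch that appends the three entries above; HOSTS_FILE defaults to a scratch copy so the sketch can be tried safely, and on a real node you would point it at /etc/hosts:

```shell
# Append the cluster's address-to-name mappings (from the listing above).
# HOSTS_FILE is a scratch default for a dry run; on a real node set
# HOSTS_FILE=/etc/hosts before running this.
HOSTS_FILE="${HOSTS_FILE:-./hosts.example}"
cat >> "$HOSTS_FILE" <<'EOF'
192.168.188.111 master
192.168.188.112 slave1
192.168.188.113 slave2
EOF
```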
2. Set up passwordless SSH login
Passwordless login is not the focus of this article, so the details are omitted; other tutorials cover it well.
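For completeness, the usual recipe is: generate a key pair on master, authorize it locally, then push the public key to each slave with ssh-copy-id. A minimal sketch, assuming the root account and default key paths:

```shell
# Create an RSA key pair (if one does not exist yet) and authorize it
# for logins to this host itself.
mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
[ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -t rsa -N "" -q -f "$HOME/.ssh/id_rsa"
cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"
chmod 600 "$HOME/.ssh/authorized_keys"
# Then push the key to the other nodes (requires the password once):
#   ssh-copy-id root@slave1
#   ssh-copy-id root@slave2
```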
3. Configure environment variables
[root@master ~]# vim /etc/profile
#java
export JAVA_HOME=/root/package/jdk1.8.0_121
export PATH=$PATH:$JAVA_HOME/bin
#spark
export SPARK_HOME=/root/package/spark-2.1.0-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin
#ANACONDA
export ANACONDA=/root/anaconda2
export PATH=$PATH:$ANACONDA/bin
#HADOOP
export HADOOP_HOME=/root/package/hadoop-2.7.3
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_INSTALL=$HADOOP_HOME
Run source /etc/profile to make the changes take effect:
[root@master ~]# source /etc/profile
Check that the configuration worked:
[root@master ~]# java -version
If the JDK version information is printed, Java is configured correctly.
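A quick way to confirm all of the exports took effect after sourcing the profile is to print each *_HOME and check that java resolves on PATH. The variable names match the exports above:

```shell
# Print every *_HOME exported in /etc/profile and check java on PATH.
for v in JAVA_HOME SPARK_HOME ANACONDA HADOOP_HOME; do
  printf '%s=%s\n' "$v" "$(printenv "$v")"
done
if command -v java >/dev/null 2>&1; then
  java -version 2>&1 | head -n 1
else
  echo "java not found on PATH"
fi
```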
II. Hadoop configuration
1. Configuration on master
First, install hadoop-2.7.3. I extracted the archive directly in the target directory, so no destination path is given after tar -zxvf (note the full .tar.gz file name):
[root@master package]# tar -zxvf hadoop-2.7.3.tar.gz
2. Configure hadoop-env.sh
All of hadoop-2.7.3's configuration files live under /root/package/hadoop-2.7.3/etc/hadoop.
[root@master hadoop]# vim hadoop-env.sh
Set JAVA_HOME:
# The java implementation to use.
export JAVA_HOME=/root/package/jdk1.8.0_121
3. Configure yarn-env.sh
[root@master hadoop]# vim yarn-env.sh
# some Java parameters
export JAVA_HOME=/root/package/jdk1.8.0_121
4. Edit slaves
[root@master hadoop]# vim slaves
Replace its contents with:
slave1
slave2
5. Configure core-site.xml
<configuration>
<!-- NameNode URI (the master's address) -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://192.168.188.111:9000</value>
</property>
<!-- Size of read/write buffer used in SequenceFiles. -->
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<!-- Hadoop temporary directory; create it manually beforehand -->
<property>
<name>hadoop.tmp.dir</name>
<value>/hadoop/tmp</value>
</property>
</configuration>
6. Configure hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>192.168.188.111:50090</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/root/hadoop/hdfs/namenode/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/root/hadoop/hdfs/datanode/dfs/data</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
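Before the first format, the local directories named in core-site.xml and hdfs-site.xml must exist on every node. A sketch that creates all three; BASE only exists so the sketch can be tried outside the cluster, and on the real nodes you would set BASE= (empty) so the paths match the XML values exactly:

```shell
# Create hadoop.tmp.dir plus the NameNode and DataNode directories
# referenced in the XML above. BASE is a dry-run prefix; use BASE= on
# the real nodes.
BASE="${BASE:-./dryrun}"
mkdir -p "${BASE}/hadoop/tmp" \
         "${BASE}/root/hadoop/hdfs/namenode/dfs/name" \
         "${BASE}/root/hadoop/hdfs/datanode/dfs/data"
```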
7. Configure mapred-site.xml
Copy the template first, then edit the copy:
[root@master hadoop]# cp mapred-site.xml.template mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>192.168.188.111:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>192.168.188.111:19888</value>
</property>
</configuration>
8. Configure yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:8088</value>
</property>
</configuration>
9. Then copy the configured directory from master to slave1 and slave2:
scp -r hadoop-2.7.3 root@192.168.188.112:/root/package
scp -r hadoop-2.7.3 root@192.168.188.113:/root/package
III. Starting Hadoop
1. Format the NameNode. Because Hadoop's environment variables are already configured, the format command does not have to be run from the Hadoop installation directory:
[root@master sbin]# hdfs namenode -format
2. Start all daemons:
[root@master sbin]# start-all.sh
IV. Checking the result with jps
After startup, run jps on master, slave1, and slave2 to list the Java processes.
On master you should see: NameNode, SecondaryNameNode, and ResourceManager.
On slave1 and slave2 you should see: DataNode and NodeManager.
If all of these processes are present, the startup succeeded.
V. Verifying through the web UIs
Open http://192.168.188.111:8088/ (the YARN ResourceManager UI).
Open http://192.168.188.111:50070/ (the HDFS NameNode UI).
If both pages load, the hadoop-2.7.3 fully distributed cluster is up and running.
VI. About the number of nodes shown
6.1 Adding master to the slaves file
Some readers ask why the nodes page at http://192.168.188.111:8088/ shows only slave1 and slave2, with no entry for the master host. This is because only slave1 and slave2 were listed when the slaves file was configured; if you add master as well, three nodes are shown, including one running on the master host.
The slaves file lives in:
/root/package/hadoop-2.7.3/etc/hadoop
Its contents would then be:
master
slave1
slave2
Stop all running services first, then restart them. The output of stop-all.sh looks like this:
[root@master sbin]# stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
17/04/26 13:12:36 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Stopping namenodes on [master]
master: stopping namenode
master: no datanode to stop
slave2: stopping datanode
slave1: stopping datanode
Stopping secondary namenodes [master]
master: stopping secondarynamenode
17/04/26 13:12:54 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
stopping yarn daemons
stopping resourcemanager
slave1: stopping nodemanager
master: no nodemanager to stop
slave2: stopping nodemanager
no proxyserver to stop
Then restart all of the services; the output looks like this:
[root@master sbin]# start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
17/04/26 13:13:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [master]
master: starting namenode, logging to /root/package/hadoop-2.7.3/logs/hadoop-root-namenode-master.out
master: starting datanode, logging to /root/package/hadoop-2.7.3/logs/hadoop-root-datanode-master.out
slave2: starting datanode, logging to /root/package/hadoop-2.7.3/logs/hadoop-root-datanode-slave2.out
slave1: starting datanode, logging to /root/package/hadoop-2.7.3/logs/hadoop-root-datanode-slave1.out
Starting secondary namenodes [master]
master: starting secondarynamenode, logging to /root/package/hadoop-2.7.3/logs/hadoop-root-secondarynamenode-master.out
17/04/26 13:13:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
starting yarn daemons
starting resourcemanager, logging to /root/package/hadoop-2.7.3/logs/yarn-root-resourcemanager-master.out
master: starting nodemanager, logging to /root/package/hadoop-2.7.3/logs/yarn-root-nodemanager-master.out
slave1: starting nodemanager, logging to /root/package/hadoop-2.7.3/logs/yarn-root-nodemanager-slave1.out
slave2: starting nodemanager, logging to /root/package/hadoop-2.7.3/logs/yarn-root-nodemanager-slave2.out
[root@master sbin]#
Note the extra line this time:
master: starting datanode, logging to /root/package/hadoop-2.7.3/logs/hadoop-root-datanode-master.out
Previously master only had:
master: starting namenode, logging to /root/package/hadoop-2.7.3/logs/hadoop-root-namenode-master.out
This is because master was just added to the slaves file. After stopping and restarting the services, master plays two roles: it is both a NameNode and a DataNode.
6.2 Leaving master out of the slaves file
With only slave1 and slave2 in the slaves file, the startup output is:
[root@master sbin]# start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
17/04/26 13:36:29 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [master]
master: starting namenode, logging to /root/package/hadoop-2.7.3/logs/hadoop-root-namenode-master.out
slave2: starting datanode, logging to /root/package/hadoop-2.7.3/logs/hadoop-root-datanode-slave2.out
slave1: starting datanode, logging to /root/package/hadoop-2.7.3/logs/hadoop-root-datanode-slave1.out
Starting secondary namenodes [master]
master: starting secondarynamenode, logging to /root/package/hadoop-2.7.3/logs/hadoop-root-secondarynamenode-master.out
17/04/26 13:36:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
starting yarn daemons
starting resourcemanager, logging to /root/package/hadoop-2.7.3/logs/yarn-root-resourcemanager-master.out
slave2: starting nodemanager, logging to /root/package/hadoop-2.7.3/logs/yarn-root-nodemanager-slave2.out
slave1: starting nodemanager, logging to /root/package/hadoop-2.7.3/logs/yarn-root-nodemanager-slave1.out
[root@master sbin]#
In this case master runs only the NameNode, with no DataNode.
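Besides the web UI, the registered DataNodes and NodeManagers can also be listed from the command line. The sketch below only prints the two commands as a dry run; on a real node (where hdfs and yarn are on PATH) you would run them directly:

```shell
# Dry run: print the verification commands. On a cluster node, run
# them without the echo; dfsadmin -report lists every registered
# DataNode, and `yarn node -list` lists every NodeManager.
cmd_hdfs="hdfs dfsadmin -report"
cmd_yarn="yarn node -list"
echo "$cmd_hdfs"
echo "$cmd_yarn"
```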