Hadoop 2.8.3 Cluster Environment Setup

Linux: CentOS 7
Hadoop version: hadoop-2.8.3
JDK: 1.8.0_161

Here I use the root user directly to set up a 3-node Hadoop cluster. Disable the firewall first; once the installation succeeds, open ports selectively for whatever reports errors at startup.

/etc/hosts

192.168.247.129 master
192.168.247.131 slave1
192.168.247.132 slave2
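
These entries need to be present in /etc/hosts on all three machines. A quick way to check that each hostname resolves, for example:

ping -c 1 slave1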

Configure the JDK environment

hadoop-2.8.3 requires JDK 1.8; JDK 1.8.0_161 is used here, the same version on all three machines. The configuration steps are omitted.
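
For reference, a minimal sketch of the JDK environment variables in /etc/profile, assuming the JDK is unpacked to /usr/local/java/jdk1.8.0_161 (the same path referenced in hadoop-env.sh later):

# set java (assumed install path)
export JAVA_HOME=/usr/local/java/jdk1.8.0_161
export PATH=$PATH:$JAVA_HOME/bin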

Configure passwordless SSH login

For details, see the previous post on configuring passwordless SSH login on CentOS 7.

Run the following commands on each of the three machines.

Generate the key pair

cd /root/.ssh/
ssh-keygen -t rsa

Copy the public key to the other machines

This effectively appends the generated public key id_rsa.pub to the end of authorized_keys on each of the other machines.

ssh-copy-id master
ssh-copy-id slave1
ssh-copy-id slave2

Verify

The first login may prompt for a password; after exit, log in again and confirm that no password is needed.
Make sure every pair of the three machines can log in to each other without a password:

ssh master
ssh slave1
ssh slave2

If the SSH public key does not take effect, make sure of two conditions: 1. the .ssh directory permission must be 700; 2. the .ssh/authorized_keys file permission must be 600.
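
For example, the commands to set those permissions (using the /root/.ssh path from above):

chmod 700 /root/.ssh
chmod 600 /root/.ssh/authorized_keys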

Configure Hadoop

Here I put the Hadoop distribution and its data directories under /opt. Create a hadoop directory under /opt, upload (or download) the archive there, and extract hadoop-2.8.3.tar.gz. Then, under /opt/hadoop, create a data directory hdfs, and inside it create three directories, data, name, and tmp, to hold the Hadoop file system's data.

 cd /opt
 mkdir hadoop
 cd hadoop
 tar -xzvf hadoop-2.8.3.tar.gz

 mkdir hdfs
 cd hdfs
 mkdir data name tmp

The resulting directory layout (screenshot omitted):

/opt/hadoop
├── hadoop-2.8.3
└── hdfs
    ├── data
    ├── name
    └── tmp

Add the HADOOP environment variables to the system

/etc/profile

# set hadoop
export HADOOP_HOME=/opt/hadoop/hadoop-2.8.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Putting $HADOOP_HOME/sbin on the PATH makes it convenient to run the cluster start/stop scripts directly from any shell, without cd'ing into sbin first.
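
Reload the profile so the new variables take effect in the current shell:

source /etc/profile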

Add JAVA_HOME to Hadoop

Append to the end of /opt/hadoop/hadoop-2.8.3/etc/hadoop/hadoop-env.sh:

export JAVA_HOME=/usr/local/java/jdk1.8.0_161

In principle, since /etc/profile already sets export JAVA_HOME=/usr/local/java/jdk1.8.0_161, hadoop-env.sh picks up that variable by default. However, daemons started over ssh do not necessarily source /etc/profile, so setting JAVA_HOME explicitly in hadoop-env.sh is safer. The default reference looks like this:

# The java implementation to use.
export JAVA_HOME=${JAVA_HOME}

Configure the cluster

Five configuration files are involved: core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, and slaves, one per component.
They live under /opt/hadoop/hadoop-2.8.3/etc/hadoop/.

File             Component
core-site.xml    Common
hdfs-site.xml    HDFS
mapred-site.xml  MapReduce
yarn-site.xml    YARN
slaves           slave node list
core-site.xml

/opt/hadoop/hadoop-2.8.3/etc/hadoop/core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>file:/opt/hadoop/hdfs/tmp</value>
                <description>A base for other temporary directories.</description>
        </property>
        <property>
                <name>io.file.buffer.size</name>
                <value>131072</value>
        </property>
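        <!-- Note: fs.default.name is the legacy key; Hadoop 2.x prefers fs.defaultFS, but the old key still works -->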
        <property>
                <name>fs.default.name</name>
                <value>hdfs://master:9000</value>
        </property>
        <property>
                <name>hadoop.proxyuser.root.hosts</name>
                <value>*</value>
        </property>
        <property>
                <name>hadoop.proxyuser.root.groups</name>
                <value>*</value>
        </property>
</configuration>
hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
        <property>
                <name>dfs.replication</name>
                <value>2</value>
        </property>
        <property>
                <name>dfs.namenode.name.dir</name>
                <value>file:/opt/hadoop/hdfs/name</value>
                <final>true</final>
        </property>
        <property>
                <name>dfs.datanode.data.dir</name>
                <value>file:/opt/hadoop/hdfs/data</value>
                <final>true</final>
        </property>
        <property>
                <name>dfs.namenode.secondary.http-address</name>
                <value>master:9001</value>
        </property>
        <property>
                <name>dfs.webhdfs.enabled</name>
                <value>true</value>
        </property>
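        <!-- Note: dfs.permissions is the legacy key; the Hadoop 2.x name is dfs.permissions.enabled -->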
        <property>
                <name>dfs.permissions</name>
                <value>false</value>
        </property>
</configuration>
mapred-site.xml

In /opt/hadoop/hadoop-2.8.3/etc/hadoop/, the MapReduce configuration file ships as mapred-site.xml.template; copy or rename it to mapred-site.xml.
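
For example:

cd /opt/hadoop/hadoop-2.8.3/etc/hadoop
cp mapred-site.xml.template mapred-site.xml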

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <!-- Tell the framework to run MapReduce on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
yarn-site.xml
<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->


<configuration>

        <!-- Site specific YARN configuration properties -->
        <property>
                <name>yarn.resourcemanager.address</name>
                <value>master:18040</value>
        </property>
        <property>
                <name>yarn.resourcemanager.scheduler.address</name>
                <value>master:18030</value>
        </property>
        <property>
                <name>yarn.resourcemanager.webapp.address</name>
                <value>master:18088</value>
        </property>
        <property>
                <name>yarn.resourcemanager.resource-tracker.address</name>
                <value>master:18025</value>
        </property>
        <property>
                <name>yarn.resourcemanager.admin.address</name>
                <value>master:18141</value>
        </property>
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
        <property>
                <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
                <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
</configuration>

Note: in Hadoop 1.x the value of yarn.nodemanager.aux-services was mapreduce.shuffle; Hadoop 2.x only accepts service names matching [a-zA-Z0-9_], so it must be mapreduce_shuffle here, otherwise NodeManager fails to start and the slave NodeManager services will not run.

Edit the slaves file and add the slave nodes

Remove the original localhost and replace it with the entries below. The slaves file ties all the nodes together into one cluster, so that the whole cluster can be started at once.

slave1
slave2
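
The same Hadoop directory and configuration must exist on all three machines. A minimal sketch for pushing the whole /opt/hadoop tree to the slaves, assuming the identical layout on every node:

scp -r /opt/hadoop root@slave1:/opt/
scp -r /opt/hadoop root@slave2:/opt/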

Run Hadoop

Format the Hadoop file system first, then start the cluster.

Format the NameNode

hadoop namenode -format

If formatting succeeds, you will see this in the console (in Hadoop 2.x, hdfs namenode -format is the preferred form of this command; the old one still works):

 common.Storage: Storage directory /opt/hadoop/hdfs/name has been successfully formatted.
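
One pitfall worth knowing: formatting assigns the NameNode a new clusterID, so if you ever re-format, the DataNodes will still hold the old ID under hdfs/data and refuse to register. Clearing the data and tmp directories on every node before re-formatting avoids this; a sketch, assuming the directory layout above:

rm -rf /opt/hadoop/hdfs/data/* /opt/hadoop/hdfs/tmp/*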

Start Hadoop

The individual commands here can be skipped; see start-all.sh below.
Startup order: namenode -> datanode -> HDFS -> YARN
Go to the ${HADOOP_HOME}/sbin directory.

Start the NameNode:
./hadoop-daemon.sh start namenode

Start the DataNodes:
./hadoop-daemons.sh start datanode

Start HDFS:
./start-dfs.sh

Start YARN:
./start-yarn.sh

Instead of starting each service one by one as above, I use start-all.sh, which starts all of the cluster services in one go. The startup output looks like this.
Start the cluster: start-all.sh

[root@master sbin]# start-all.sh 
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
master: starting namenode, logging to /opt/hadoop/hadoop-2.8.3/logs/hadoop-root-namenode-master.out
slave2: starting datanode, logging to /opt/hadoop/hadoop-2.8.3/logs/hadoop-root-datanode-slave2.out
slave1: starting datanode, logging to /opt/hadoop/hadoop-2.8.3/logs/hadoop-root-datanode-slave1.out
Starting secondary namenodes [master]
master: starting secondarynamenode, logging to /opt/hadoop/hadoop-2.8.3/logs/hadoop-root-secondarynamenode-master.out
starting yarn daemons
starting resourcemanager, logging to /opt/hadoop/hadoop-2.8.3/logs/yarn-root-resourcemanager-master.out
slave2: starting nodemanager, logging to /opt/hadoop/hadoop-2.8.3/logs/yarn-root-nodemanager-slave2.out
slave1: starting nodemanager, logging to /opt/hadoop/hadoop-2.8.3/logs/yarn-root-nodemanager-slave1.out

Stop the cluster: stop-all.sh

[root@master sbin]# stop-all.sh 
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
Stopping namenodes on [master]
master: stopping namenode
slave1: stopping datanode
slave2: stopping datanode
Stopping secondary namenodes [master]
master: stopping secondarynamenode
stopping yarn daemons
stopping resourcemanager
slave1: stopping nodemanager
slave2: stopping nodemanager
slave1: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
slave2: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
no proxyserver to stop

Because ${HADOOP_HOME}/sbin was added to the PATH, all of the scripts under sbin can be run from any shell.

Processes on each node

  • master
[root@master sbin]# jps
5793 Jps
5512 ResourceManager
5165 NameNode
5359 SecondaryNameNode
  • slave1, slave2
[root@slave1 ~]# jps
1700 DataNode
1830 NodeManager
1961 Jps
[root@slave2 ~]# jps
1971 Jps
1798 NodeManager
1689 DataNode
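
With all daemons up, a quick smoke test is to run the bundled pi example (the examples jar ships with the distribution; the path below assumes the standard 2.8.3 layout):

yarn jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.3.jar pi 2 5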

The logs directory

The ${HADOOP_HOME}/logs directory holds the log files. If startup fails, check the logs there to track down the problem.
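
For example, to inspect the end of the NameNode log on master (the file name follows the user-daemon-host pattern seen in the startup output above):

tail -n 100 $HADOOP_HOME/logs/hadoop-root-namenode-master.log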

Web UI

Open the web UI on master to confirm that the other two nodes are up and running.

http://master:50070/

Datanode Information

http://master:18088/cluster/nodes

cluster/nodes
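
The same information is also available from the command line; for example, the HDFS admin tool prints a report of the live DataNodes:

hdfs dfsadmin -report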
