(1) Hadoop Cluster Setup - 1.3 Hadoop Installation and Configuration

1. JDK Installation

Upload the JDK package to /opt/soft on machine h2, then extract it into /opt/module.

[hzhao@h2 ~]$ tar -zxvf /opt/soft/jdk-8u121-linux-x64.tar.gz -C /opt/module/

2. Hadoop Installation

Upload the Hadoop package to /opt/soft on machine h2, then extract it into /opt/module.

[hzhao@h2 ~]$ tar -zxvf /opt/soft/hadoop-2.7.2.tar.gz -C /opt/module/

3. Hadoop Configuration

  1. Configure Hadoop's core-site.xml file
[hzhao@h2 ~]$ vim /opt/module/hadoop-2.7.2/etc/hadoop/core-site.xml

The file contents are:

<configuration>
        <!--Address of the NameNode in HDFS-->
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://h1:9000</value>
        </property>
        <!--Base directory for files Hadoop generates at runtime-->
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/opt/module/hadoop-2.7.2/data/tmp</value>
        </property>
</configuration>
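
Because fs.defaultFS supplies the scheme and authority for any path without one, the two commands below are equivalent once HDFS is running (shown only as an illustration; they can be tried after step 6):

[hzhao@h2 ~]$ hadoop fs -ls hdfs://h1:9000/
[hzhao@h2 ~]$ hadoop fs -ls /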
  2. Configure Hadoop's hdfs-site.xml file
[hzhao@h2 ~]$ vim /opt/module/hadoop-2.7.2/etc/hadoop/hdfs-site.xml

The file contents are:

<configuration>
        <!--Host and port of the Hadoop SecondaryNameNode-->
        <property>
                <name>dfs.namenode.secondary.http-address</name>
                <value>h3:50090</value>
        </property>
</configuration>
  3. Configure Hadoop's yarn-site.xml file
[hzhao@h2 ~]$ vim /opt/module/hadoop-2.7.2/etc/hadoop/yarn-site.xml

The file contents are:

<configuration>
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
        <!--Address of the YARN ResourceManager-->
        <property>
                <name>yarn.resourcemanager.hostname</name>
                <value>h2</value>
        </property>
</configuration>
  4. Configure Hadoop's mapred-site.xml file
[hzhao@h2 ~]$ mv /opt/module/hadoop-2.7.2/etc/hadoop/mapred-site.xml.template /opt/module/hadoop-2.7.2/etc/hadoop/mapred-site.xml
[hzhao@h2 ~]$ vim /opt/module/hadoop-2.7.2/etc/hadoop/mapred-site.xml

The file contents are:

<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
</configuration>

Once the configuration above is done, use the xsync command to distribute Hadoop and the JDK to machines h1 and h3.

[hzhao@h2 ~]$ xsync /opt/module/hadoop-2.7.2/
[hzhao@h2 ~]$ xsync /opt/module/jdk1.8.0_121/
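
The xsync command is the rsync-based distribution script set up earlier in this series. In case it is missing, a minimal sketch (assuming passwordless ssh between the nodes, rsync installed everywhere, and an identical directory layout on every host) could look like this:

#!/bin/bash
# Distribute a file or directory to every machine in the cluster.
if (($# == 0))
then
        echo "Please supply a path to distribute!"
        exit
fi
# Resolve the absolute path of the argument.
dir=$(cd -P $(dirname $1); pwd)
name=$(basename $1)
echo "Path of the files to distribute: $dir/$name"
for host in h1 h2 h3
do
        echo -------------------------$host---------------------------
        rsync -av $dir/$name $host:$dir
done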

After distribution, configure the Hadoop and JDK environment variables on all three machines.

[hzhao@h2 ~]$ sudo vim /etc/profile

Append the following at the end:

JAVA_HOME=/opt/module/jdk1.8.0_121
HADOOP_HOME=/opt/module/hadoop-2.7.2

export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export PATH JAVA_HOME HADOOP_HOME
[hzhao@h2 ~]$ source /etc/profile
[hzhao@h2 ~]$ java -version
java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)

If you see the version output above, the installation is complete. Note that $HADOOP_HOME/sbin is on the PATH as well; the hadoop-daemon.sh and start-all.sh scripts used below live there.

4. Modify the ~/.bashrc File

On each of h1, h2, and h3, run the command below and append the following line at the end of the file.

[hzhao@h1 ~]$ vim ~/.bashrc
source /etc/profile
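
The reason for this step: ssh h1 <command>, as used by the xcall script below and by Hadoop's own start scripts, runs a non-interactive shell that reads ~/.bashrc but not /etc/profile, so without this line remote commands would not find java or the hadoop scripts. A quick way to verify:

[hzhao@h2 ~]$ ssh h1 java -version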

5. Write a Batch-Command Script

[hzhao@h2 ~]$ vim ~/bin/xcall
#!/bin/bash
# Run the same command on every machine in the cluster.
if (($# == 0))
then
        echo "Please supply the command to run!"
        exit
fi

echo "The command to run is: $*"
# Run the command on each host in turn.
for ((i = 1; i <= 3; i++))
do
        echo ----------------h$i---------------
        ssh h$i $*
done
[hzhao@h2 ~]$ chmod 755 ~/bin/xcall
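
A quick sanity check that the script and the passwordless ssh setup both work (hostname is just an arbitrary harmless command):

[hzhao@h2 ~]$ xcall hostname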

6. Start HDFS

  1. First, format the NameNode on the node where it is configured; here that is h1.
[hzhao@h1 ~]$ hadoop namenode -format

Output like the following indicates the format succeeded:

20/12/18 09:33:11 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1052576260-192.168.41.31-1608312791695
20/12/18 09:33:11 INFO common.Storage: Storage directory /opt/module/hadoop-2.7.2/data/tmp/dfs/name has been successfully formatted.
20/12/18 09:33:11 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
20/12/18 09:33:11 INFO util.ExitUtil: Exiting with status 0
20/12/18 09:33:11 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at h1/192.168.31.31
************************************************************/

  2. Start the NameNode on h1
[hzhao@h1 ~]$ hadoop-daemon.sh start namenode
starting namenode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-hzhao-namenode-h1.out
  3. Start the DataNodes on all machines
[hzhao@h2 ~]$ xcall hadoop-daemon.sh start datanode
The command to run is: hadoop-daemon.sh start datanode
----------------h1---------------
starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-hzhao-datanode-h1.out
----------------h2---------------
starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-hzhao-datanode-h2.out
----------------h3---------------
starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-hzhao-datanode-h3.out
  4. Start the SecondaryNameNode on machine h3
[hzhao@h3 ~]$ hadoop-daemon.sh start secondarynamenode
starting secondarynamenode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-hzhao-secondarynamenode-h3.out
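
HDFS should now be usable; a short smoke test (the /tmp/smoke path is arbitrary):

[hzhao@h2 ~]$ hadoop fs -mkdir -p /tmp/smoke
[hzhao@h2 ~]$ hadoop fs -put /etc/hosts /tmp/smoke
[hzhao@h2 ~]$ hadoop fs -ls /tmp/smoke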

7. Start YARN

  1. Start the YARN ResourceManager on machine h2
[hzhao@h2 ~]$ yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-hzhao-resourcemanager-h2.out
  2. Start the NodeManagers on all machines
[hzhao@h2 ~]$ xcall yarn-daemon.sh start nodemanager
The command to run is: yarn-daemon.sh start nodemanager
----------------h1---------------
starting nodemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-hzhao-nodemanager-h1.out
----------------h2---------------
starting nodemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-hzhao-nodemanager-h2.out
----------------h3---------------
starting nodemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-hzhao-nodemanager-h3.out
[hzhao@h2 ~]$ xcall jps
The command to run is: jps
----------------h1---------------
4608 DataNode
4503 NameNode
5133 NodeManager
5247 Jps
----------------h2---------------
6242 DataNode
7236 Jps
6840 ResourceManager
7118 NodeManager
----------------h3---------------
4163 DataNode
4277 SecondaryNameNode
4791 NodeManager
4907 Jps
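
With HDFS and YARN both up, an end-to-end check can be run with the examples jar that ships with Hadoop 2.7.2 (the pi arguments are arbitrary small values):

[hzhao@h2 ~]$ hadoop jar /opt/module/hadoop-2.7.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar pi 2 10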

8. Disable the Firewall

[hzhao@h2 ~]$ xcall sudo service iptables stop
The command to run is: sudo service iptables stop
----------------h1---------------
iptables: Setting chains to policy ACCEPT: filter [  OK  ]
iptables: Flushing firewall rules: [  OK  ]
iptables: Unloading modules: [  OK  ]
----------------h2---------------
iptables: Setting chains to policy ACCEPT: filter [  OK  ]
iptables: Flushing firewall rules: [  OK  ]
iptables: Unloading modules: [  OK  ]
----------------h3---------------
iptables: Setting chains to policy ACCEPT: filter [  OK  ]
iptables: Flushing firewall rules: [  OK  ]
iptables: Unloading modules: [  OK  ]
[hzhao@h2 ~]$ xcall sudo chkconfig iptables off
The command to run is: sudo chkconfig iptables off
----------------h1---------------
----------------h2---------------
----------------h3---------------

9. Modify the Local hosts File

On the Windows machine, append the following at the bottom of the C:\Windows\System32\drivers\etc\hosts file:

192.168.31.31 h1
192.168.31.32 h2
192.168.31.33 h3

This way, the Hadoop cluster can be accessed directly by hostname.

10. Access via Browser

Open http://h1:50070/ and click Datanodes to view the DataNodes.
Open http://h2:8088/ and click Nodes to view the NodeManagers.
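
If the Windows hosts entries are not in place yet, the same endpoints can be checked from any cluster node instead, for example:

[hzhao@h2 ~]$ curl -s http://h1:50070/dfshealth.html | head
[hzhao@h2 ~]$ curl -s http://h2:8088/cluster | head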

11. Start the Hadoop Cluster as a Group

1. The group-start scripts work by reading the hostnames of all cluster nodes, by default from HADOOP_HOME/etc/hadoop/slaves on the current machine.
2. They then loop over those hostnames, running ssh <hostname> hadoop-daemon.sh start XXX on each one (see the sketch after this list). For this to work:
	passwordless ssh must be configured from the current machine to every other node;
	~/.bashrc of the current user on every machine must contain source /etc/profile.
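
In essence, the group start behaves like this simplified sketch (not the real script, which also starts the NameNode and SecondaryNameNode on their configured hosts):

#!/bin/bash
# Simplified sketch of the group-start loop over the slaves file.
for host in $(cat /opt/module/hadoop-2.7.2/etc/hadoop/slaves)
do
        ssh $host hadoop-daemon.sh start datanode
done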
  1. On h2, the machine that will run the group start, configure the HADOOP_HOME/etc/hadoop/slaves file
[hzhao@h2 ~]$ vim /opt/module/hadoop-2.7.2/etc/hadoop/slaves
h1
h2
h3    
  2. Stop all previously started processes
[hzhao@h2 ~]$ stop-yarn.sh
stopping yarn daemons
stopping resourcemanager
h1: stopping nodemanager
h3: stopping nodemanager
h2: stopping nodemanager
no proxyserver to stop
[hzhao@h2 ~]$ stop-dfs.sh
Stopping namenodes on [h1]
h1: stopping namenode
h1: stopping datanode
h3: stopping datanode
h2: stopping datanode
Stopping secondary namenodes [h3]
h3: stopping secondarynamenode
[hzhao@h2 ~]$ xcall jps
The command to run is: jps
----------------h1---------------
5592 Jps
----------------h2---------------
7916 Jps
----------------h3---------------
5499 Jps
  3. Run the group-start command
[hzhao@h2 ~]$ start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [h1]
h1: starting namenode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-hzhao-namenode-h1.out
h1: starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-hzhao-datanode-h1.out
h3: starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-hzhao-datanode-h3.out
h2: starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-hzhao-datanode-h2.out
Starting secondary namenodes [h3]
h3: starting secondarynamenode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-hzhao-secondarynamenode-h3.out
starting yarn daemons
starting resourcemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-hzhao-resourcemanager-h2.out
h2: starting nodemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-hzhao-nodemanager-h2.out
h3: starting nodemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-hzhao-nodemanager-h3.out
h1: starting nodemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-hzhao-nodemanager-h1.out

Note: start-all.sh simply calls start-dfs.sh and then start-yarn.sh in turn.
start-dfs.sh can be run on any machine in the cluster and starts all HDFS processes.
start-yarn.sh, however, does not start the ResourceManager when run on a machine other than the ResourceManager host,
which is why start-all.sh is run on h2 here.

12. Configure the Job History Server and Log Aggregation

  1. Configure the mapred-site.xml file
[hzhao@h2 ~]$ vim /opt/module/hadoop-2.7.2/etc/hadoop/mapred-site.xml
<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
        <property>
                <name>mapreduce.jobhistory.address</name>
                <value>h1:10020</value>
        </property>
        <property>
                <name>mapreduce.jobhistory.webapp.address</name>
                <value>h1:19888</value>
        </property>
        <!--Log server URL for aggregated logs of frameworks running on YARN-->
        <property>
                <name>yarn.log.server.url</name>
                <value>http://h1:19888/jobhistory/logs</value>
        </property>
</configuration>
  2. Configure the yarn-site.xml file
[hzhao@h2 ~]$ vim /opt/module/hadoop-2.7.2/etc/hadoop/yarn-site.xml
<configuration>
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
        <!--Address of the YARN ResourceManager-->
        <property>
                <name>yarn.resourcemanager.hostname</name>
                <value>h2</value>
        </property>
        <!--Enable log aggregation-->
        <property>
                <name>yarn.log-aggregation-enable</name>
                <value>true</value>
        </property>
        <!--Retain logs for 7 days (604800 seconds)-->
        <property>
                <name>yarn.log-aggregation.retain-seconds</name>
                <value>604800</value>
        </property>
</configuration>
  3. Distribute the modified files
[hzhao@h2 ~]$ xsync /opt/module/hadoop-2.7.2/etc/hadoop/
Path of the files to distribute: /opt/module/hadoop-2.7.2/etc/hadoop
-------------------------h1---------------------------
sending incremental file list
hadoop/
hadoop/mapred-site.xml
hadoop/slaves
hadoop/yarn-site.xml

sent 2139 bytes  received 103 bytes  4484.00 bytes/sec
total size is 78316  speedup is 34.93
-------------------------h2---------------------------
sending incremental file list

sent 797 bytes  received 13 bytes  540.00 bytes/sec
total size is 78316  speedup is 96.69
-------------------------h3---------------------------
sending incremental file list
hadoop/
hadoop/mapred-site.xml
hadoop/slaves
hadoop/yarn-site.xml

sent 2139 bytes  received 103 bytes  4484.00 bytes/sec
total size is 78316  speedup is 34.93
  4. Restart the HDFS and YARN services
[hzhao@h2 ~]$ stop-all.sh
[hzhao@h2 ~]$ start-all.sh
  5. Start the JobHistory server on machine h1 (the host configured in mapreduce.jobhistory.address)
[hzhao@h1 ~]$ mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /opt/module/hadoop-2.7.2/logs/mapred-hzhao-historyserver-h1.out
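
Once a job has run with aggregation enabled, its logs can be fetched from any node with the yarn CLI (the application ID below is a placeholder; take the real one from the ResourceManager UI):

[hzhao@h2 ~]$ yarn logs -applicationId <application_id>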