(1) Hadoop Cluster Setup - 1.3 Hadoop Installation and Configuration
1. JDK installation
Upload the JDK archive to the /opt/soft directory on machine h2, then extract it into /opt/module. Note that tar needs the -C flag to select the target directory.
[hzhao@h2 ~]$ tar -zxvf /opt/soft/jdk-8u121-linux-x64.tar.gz -C /opt/module/
2. Hadoop installation
Likewise, upload the Hadoop archive to /opt/soft on machine h2 and extract it into /opt/module.
[hzhao@h2 ~]$ tar -zxvf /opt/soft/hadoop-2.7.2.tar.gz -C /opt/module/
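As a quick sanity check, listing /opt/module should now show both directories.
[hzhao@h2 ~]$ ls /opt/module
hadoop-2.7.2  jdk1.8.0_121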
3. Hadoop configuration
- Configure Hadoop's core-site.xml file
[hzhao@h2 ~]$ vim /opt/module/hadoop-2.7.2/etc/hadoop/core-site.xml
The configuration file contents are:
<configuration>
<!-- Specify the address of the NameNode in HDFS -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://h1:9000</value>
</property>
<!-- Specify the storage directory for files Hadoop generates at runtime -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/module/hadoop-2.7.2/data/tmp</value>
</property>
</configuration>
- Configure Hadoop's hdfs-site.xml file
[hzhao@h2 ~]$ vim /opt/module/hadoop-2.7.2/etc/hadoop/hdfs-site.xml
The configuration file contents are:
<configuration>
<!-- Specify the host of the Hadoop secondary NameNode -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>h3:50090</value>
</property>
</configuration>
- Configure Hadoop's yarn-site.xml file
[hzhao@h2 ~]$ vim /opt/module/hadoop-2.7.2/etc/hadoop/yarn-site.xml
The configuration file contents are:
<configuration>
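<!-- Configure the auxiliary shuffle service required by MapReduce -->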
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- Specify the address of the YARN ResourceManager -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>h2</value>
</property>
</configuration>
- Configure Hadoop's mapred-site.xml file; first rename the bundled template, then edit it.
[hzhao@h2 ~]$ mv /opt/module/hadoop-2.7.2/etc/hadoop/mapred-site.xml.template /opt/module/hadoop-2.7.2/etc/hadoop/mapred-site.xml
[hzhao@h2 ~]$ vim /opt/module/hadoop-2.7.2/etc/hadoop/mapred-site.xml
The configuration file contents are:
<configuration>
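<!-- Run MapReduce jobs on YARN instead of locally -->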
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
With the above configuration done, use the xsync command to distribute Hadoop and the JDK to machines h1 and h3.
[hzhao@h2 ~]$ xsync /opt/module/hadoop-2.7.2/
[hzhao@h2 ~]$ xsync /opt/module/jdk1.8.0_121/
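Here xsync is the user-written distribution script assumed to have been created in an earlier part of this series. If you do not have it yet, a minimal sketch built on rsync looks like this (the hzhao user and the h1-h3 hostnames match this cluster; adjust to your setup):
#!/bin/bash
# Minimal xsync sketch: distribute a file or directory to h1..h3 via rsync
if (($# == 0)); then
    echo "Please specify a path to distribute!"
    exit 1
fi
# Resolve the absolute parent directory and the file name
pdir=$(cd -P "$(dirname "$1")"; pwd)
fname=$(basename "$1")
echo "The path of the files to distribute is: $pdir/$fname"
# Copy to every node; rsync transfers only the differences
for ((i = 1; i <= 3; i++)); do
    echo -------------------------h$i---------------------------
    rsync -av "$pdir/$fname" "hzhao@h$i:$pdir"
done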
After distribution, configure the Hadoop and JDK environment variables on all three machines. Note that $HADOOP_HOME/sbin must be on the PATH as well, since the daemon scripts used below (hadoop-daemon.sh, start-all.sh, and so on) live there.
[hzhao@h2 ~]$ sudo vim /etc/profile
Append the following at the end:
JAVA_HOME=/opt/module/jdk1.8.0_121
HADOOP_HOME=/opt/module/hadoop-2.7.2
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export PATH JAVA_HOME HADOOP_HOME
[hzhao@h2 ~]$ source /etc/profile
[hzhao@h2 ~]$ java -version
java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)
Seeing the version output above means the installation is in place.
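Optionally, confirm the Hadoop binaries are on the PATH as well; the first line of output should report the Hadoop version (2.7.2 here).
[hzhao@h2 ~]$ hadoop version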
4. Modify the ~/.bashrc file
On each of h1, h2, and h3, open the file and append the following line at the end.
[hzhao@h1 ~]$ vim ~/.bashrc
source /etc/profile
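Since passwordless SSH between the nodes is assumed throughout this series, the same line can also be appended on all three machines from h2 in one loop (a convenience sketch, not required):
for i in 1 2 3; do
    ssh h$i "echo 'source /etc/profile' >> ~/.bashrc"
done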
5. Write a cluster-wide execution script
[hzhao@h2 ~]$ vim ~/bin/xcall
#!/bin/bash
# Run the same command on every machine in the cluster
if (($# == 0)); then
    echo "Please enter the command you want to run!"
    exit 1
fi
echo "The command to execute is $*"
# Execute the command on each node in turn
for ((i = 1; i <= 3; i++)); do
    echo ----------------h$i---------------
    ssh h$i $*
done
[hzhao@h2 ~]$ chmod 755 ~/bin/xcall
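A quick smoke test of the script; any command works, and hostname conveniently confirms that SSH to every node is wired up:
[hzhao@h2 ~]$ xcall hostname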
6. Start HDFS
- First, the NameNode must be formatted on the node it is configured to run on; here that is h1.
[hzhao@h1 ~]$ hadoop namenode -format
Output like the following indicates the format succeeded:
20/12/18 09:33:11 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1052576260-192.168.31.31-1608312791695
20/12/18 09:33:11 INFO common.Storage: Storage directory /opt/module/hadoop-2.7.2/data/tmp/dfs/name has been successfully formatted.
20/12/18 09:33:11 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
20/12/18 09:33:11 INFO util.ExitUtil: Exiting with status 0
20/12/18 09:33:11 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at h1/192.168.31.31
************************************************************/
- Start the NameNode on h1
[hzhao@h1 ~]$ hadoop-daemon.sh start namenode
starting namenode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-hzhao-namenode-h1.out
- Start the DataNodes on all nodes
[hzhao@h2 ~]$ xcall hadoop-daemon.sh start datanode
The command to execute is hadoop-daemon.sh start datanode
----------------h1---------------
starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-hzhao-datanode-h1.out
----------------h2---------------
starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-hzhao-datanode-h2.out
----------------h3---------------
starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-hzhao-datanode-h3.out
- Start the SecondaryNameNode on machine h3
[hzhao@h3 ~]$ hadoop-daemon.sh start secondarynamenode
starting secondarynamenode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-hzhao-secondarynamenode-h3.out
7. Start YARN
- Start the YARN ResourceManager on machine h2
[hzhao@h2 ~]$ yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-hzhao-resourcemanager-h2.out
- Start the NodeManagers on all nodes
[hzhao@h2 ~]$ xcall yarn-daemon.sh start nodemanager
The command to execute is yarn-daemon.sh start nodemanager
----------------h1---------------
starting nodemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-hzhao-nodemanager-h1.out
----------------h2---------------
starting nodemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-hzhao-nodemanager-h2.out
----------------h3---------------
starting nodemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-hzhao-nodemanager-h3.out
[hzhao@h2 ~]$ xcall jps
The command to execute is jps
----------------h1---------------
4608 DataNode
4503 NameNode
5133 NodeManager
5247 Jps
----------------h2---------------
6242 DataNode
7236 Jps
6840 ResourceManager
7118 NodeManager
----------------h3---------------
4163 DataNode
4277 SecondaryNameNode
4791 NodeManager
4907 Jps
8. Disable the firewall
[hzhao@h2 ~]$ xcall sudo service iptables stop
The command to execute is sudo service iptables stop
----------------h1---------------
iptables: Setting chains to policy ACCEPT: filter [ OK ]
iptables: Flushing firewall rules: [ OK ]
iptables: Unloading modules: [ OK ]
----------------h2---------------
iptables: Setting chains to policy ACCEPT: filter [ OK ]
iptables: Flushing firewall rules: [ OK ]
iptables: Unloading modules: [ OK ]
----------------h3---------------
iptables: Setting chains to policy ACCEPT: filter [ OK ]
iptables: Flushing firewall rules: [ OK ]
iptables: Unloading modules: [ OK ]
[hzhao@h2 ~]$ xcall sudo chkconfig iptables off
The command to execute is sudo chkconfig iptables off
----------------h1---------------
----------------h2---------------
----------------h3---------------
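Optionally, verify on every node that iptables is indeed stopped:
[hzhao@h2 ~]$ xcall sudo service iptables status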
9. Modify the local hosts file
On the Windows host, append the following to the end of C:\Windows\System32\drivers\etc\hosts:
192.168.31.31 h1
192.168.31.32 h2
192.168.31.33 h3
This way, the Hadoop cluster can be reached directly by hostname.
10. Access via browser
Visit http://h1:50070/ and click Datanodes to view the DataNodes.
Visit http://h2:8088/ and click Nodes to view the NodeManagers.
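The same endpoints can also be checked from the cluster side without a browser; with redirects followed, both should print 200:
[hzhao@h2 ~]$ curl -sL -o /dev/null -w "%{http_code}\n" http://h1:50070/
[hzhao@h2 ~]$ curl -sL -o /dev/null -w "%{http_code}\n" http://h2:8088/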
11. Start the whole Hadoop cluster at once
1. The cluster-start scripts work by collecting the hostnames of all cluster nodes, read by default from HADOOP_HOME/etc/hadoop/slaves on the current machine.
2. They then loop over those hostnames and run: ssh <hostname> hadoop-daemon.sh start XXX (see the sketch after this list).
For this to work, make sure passwordless SSH login is configured from the current machine to every other node,
and make sure source /etc/profile has been added to ~/.bashrc for the current user on every node.
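Conceptually, the HDFS side of that loop looks like the sketch below (illustration only; the real scripts in $HADOOP_HOME/sbin handle many more cases):
# Read each hostname from the slaves file and start a DataNode there
while read host; do
    ssh "$host" "hadoop-daemon.sh start datanode"
done < /opt/module/hadoop-2.7.2/etc/hadoop/slaves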
- Configure the HADOOP_HOME/etc/hadoop/slaves file on h2, the machine from which the cluster is started
[hzhao@h2 ~]$ vim /opt/module/hadoop-2.7.2/etc/hadoop/slaves
h1
h2
h3
- Stop all previously started processes
[hzhao@h2 ~]$ stop-yarn.sh
stopping yarn daemons
stopping resourcemanager
h1: stopping nodemanager
h3: stopping nodemanager
h2: stopping nodemanager
no proxyserver to stop
[hzhao@h2 ~]$ stop-dfs.sh
Stopping namenodes on [h1]
h1: stopping namenode
h1: stopping datanode
h3: stopping datanode
h2: stopping datanode
Stopping secondary namenodes [h3]
h3: stopping secondarynamenode
[hzhao@h2 ~]$ xcall jps
The command to execute is jps
----------------h1---------------
5592 Jps
----------------h2---------------
7916 Jps
----------------h3---------------
5499 Jps
- Run the cluster-start command
[hzhao@h2 ~]$ start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [h1]
h1: starting namenode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-hzhao-namenode-h1.out
h1: starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-hzhao-datanode-h1.out
h3: starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-hzhao-datanode-h3.out
h2: starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-hzhao-datanode-h2.out
Starting secondary namenodes [h3]
h3: starting secondarynamenode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-hzhao-secondarynamenode-h3.out
starting yarn daemons
starting resourcemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-hzhao-resourcemanager-h2.out
h2: starting nodemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-hzhao-nodemanager-h2.out
h3: starting nodemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-hzhao-nodemanager-h3.out
h1: starting nodemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-hzhao-nodemanager-h1.out
Note: start-all.sh simply calls start-dfs.sh and then start-yarn.sh.
start-dfs.sh can be run on any machine in the cluster and starts all HDFS processes.
start-yarn.sh, however, does not start the ResourceManager when run on a machine other than the one the ResourceManager is configured on;
that is why start-all.sh is run on h2 here.
12. Configure the job history server and log aggregation
- Configure the mapred-site.xml file
[hzhao@h2 ~]$ vim /opt/module/hadoop-2.7.2/etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>h1:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>h1:19888</value>
</property>
<!-- Log aggregation for third-party frameworks that run on YARN -->
<property>
<name>yarn.log.server.url</name>
<value>http://h1:19888/jobhistory/logs</value>
</property>
</configuration>
- Configure the yarn-site.xml file
[hzhao@h2 ~]$ vim /opt/module/hadoop-2.7.2/etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- Specify the address of the YARN ResourceManager -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>h2</value>
</property>
<!-- Enable log aggregation -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- Retain logs for 7 days (604800 seconds) -->
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>
</configuration>
- Distribute the modified files
[hzhao@h2 ~]$ xsync /opt/module/hadoop-2.7.2/etc/hadoop/
The path of the files to distribute is: /opt/module/hadoop-2.7.2/etc/hadoop
-------------------------h1---------------------------
sending incremental file list
hadoop/
hadoop/mapred-site.xml
hadoop/slaves
hadoop/yarn-site.xml
sent 2139 bytes received 103 bytes 4484.00 bytes/sec
total size is 78316 speedup is 34.93
-------------------------h2---------------------------
sending incremental file list
sent 797 bytes received 13 bytes 540.00 bytes/sec
total size is 78316 speedup is 96.69
-------------------------h3---------------------------
sending incremental file list
hadoop/
hadoop/mapred-site.xml
hadoop/slaves
hadoop/yarn-site.xml
sent 2139 bytes received 103 bytes 4484.00 bytes/sec
total size is 78316 speedup is 34.93
- Restart the HDFS and YARN services
[hzhao@h2 ~]$ stop-all.sh
[hzhao@h2 ~]$ start-all.sh
- Start the JobHistory server on machine h1 (it must run on h1, since mapreduce.jobhistory.address points there)
[hzhao@h1 ~]$ mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /opt/module/hadoop-2.7.2/logs/mapred-hzhao-historyserver-h1.out
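To confirm that the history server and log aggregation work end to end, one option is to run a stock example job and then fetch its aggregated logs by application id (substitute the id that YARN prints for your job):
[hzhao@h2 ~]$ hadoop jar /opt/module/hadoop-2.7.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar pi 2 10
[hzhao@h2 ~]$ yarn logs -applicationId <application_id>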