I. Hadoop Run Modes
Hadoop has three run modes: local (standalone) mode, pseudo-distributed mode, and fully distributed mode.
Local mode is Hadoop's default configuration. In this mode Hadoop uses the local filesystem rather than a distributed one, starts no Hadoop daemons, and runs the map and reduce tasks as parts of a single process. Local mode therefore runs on one machine only; it is used for developing and debugging MapReduce applications without the overhead of a full cluster.
In pseudo-distributed mode, Hadoop runs all of its processes on a single host, but it uses the distributed filesystem, and each job runs as an independent process managed by the resource-management service (the JobTracker in Hadoop 1.x; YARN's ResourceManager in Hadoop 2.x). Because a pseudo-distributed cluster has only one node, HDFS block replication is limited to a single copy, and the secondary-master and slave roles also run on the local host. Apart from not being truly distributed, its execution logic matches fully distributed mode, so developers commonly use it to test programs.
To get the real power of Hadoop, you need fully distributed mode. Since high availability built on ZooKeeper relies on an odd-numbered quorum, a fully distributed environment needs at least three nodes.
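Which mode a given installation is configured for can be read straight from core-site.xml: local mode leaves fs.defaultFS unset (or file:///), while the distributed modes point it at an hdfs:// URI. A minimal sketch (the sample config file and temp directory are illustrative stand-ins, not taken from this cluster):

```shell
# Illustrative check: read fs.defaultFS from a core-site.xml to tell local
# mode (unset / file:///) apart from a distributed mode (hdfs://...).
conf=$(mktemp -d)
cat > "$conf/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>
EOF

# Grab the <value> line that follows the fs.defaultFS <name> line.
fs_default=$(grep -A1 '<name>fs.defaultFS</name>' "$conf/core-site.xml" |
  sed -n 's/.*<value>\(.*\)<\/value>.*/\1/p')

case "$fs_default" in
  hdfs://*) mode="distributed" ;;
  *)        mode="local" ;;
esac
echo "configured mode: $mode"
```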
II. Environment Preparation
Hostname | IP | Role | OS Version
master | 192.168.22.128 | NameNode | CentOS release 6.8 (Final) x86_64
slave1 | 192.168.22.129 | DataNode | CentOS release 6.8 (Final) x86_64
slave2 | 192.168.22.130 | DataNode | CentOS release 6.8 (Final) x86_64
# cat /etc/redhat-release
CentOS release 6.8 (Final)
# uname -i
x86_64
Perform the following steps on all three machines:
(1) Disable the firewall
# iptables -F
# /etc/init.d/iptables save
iptables: Saving firewall rules to /etc/sysconfig/iptables:[ OK ]
# chkconfig iptables off
# chkconfig --list |grep iptables
iptables 0:off 1:off 2:off 3:off 4:off 5:off 6:off
# iptables -nvL
Chain INPUT (policy ACCEPT 231K packets, 407M bytes)
pkts bytes target prot opt in out source destination
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 252K packets, 1419M bytes)
pkts bytes target prot opt in out source destination
(2) Disable SELinux
1) Permanently disable SELinux (takes effect after a reboot)
# vim /etc/selinux/config
SELINUX=disabled
2) Temporarily disable SELinux
# setenforce 0
(3) Change the hostname
Using the master node as an example:
# vim /etc/sysconfig/network
HOSTNAME=master #set HOSTNAME to master
# hostname master #apply immediately; the file above only takes effect on reboot
# bash
(4) Configure hosts
# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.22.128 master
192.168.22.129 slave1
192.168.22.130 slave2
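The hosts entries above can be appended idempotently rather than edited by hand on each node. A sketch (the add_host helper is a hypothetical name, and it writes to a temp file for safety; on the real nodes HOSTS_FILE would be /etc/hosts):

```shell
# Sketch: add each cluster node to the hosts file only if its hostname is
# not already present. A temp file stands in for /etc/hosts here.
HOSTS_FILE=$(mktemp)
printf '127.0.0.1 localhost\n' > "$HOSTS_FILE"

add_host() {   # hypothetical helper: $1 = IP, $2 = hostname
  grep -qw "$2" "$HOSTS_FILE" || printf '%s %s\n' "$1" "$2" >> "$HOSTS_FILE"
}

add_host 192.168.22.128 master
add_host 192.168.22.129 slave1
add_host 192.168.22.130 slave2
add_host 192.168.22.128 master   # repeat is a no-op

grep -c '^192\.168\.22\.' "$HOSTS_FILE"   # 3 cluster entries
```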
(5) Install the JDK
# tar xzvf jdk-8u181-linux-x64.tar.gz
# mv jdk1.8.0_181 /usr/local/jdk1.8.0_181
# cd /usr/local/
# chown -R root:root jdk1.8.0_181
# ln -s jdk1.8.0_181 jdk
# vim /etc/profile #append at the end of the file
export JAVA_HOME=/usr/local/jdk1.8.0_181
export CLASSPATH=$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH
# source /etc/profile
# java -version
java version "1.8.0_181"
Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)
(6) Configure passwordless SSH (mutual trust)
# ssh-keygen #press Enter at every prompt; run on all three nodes
# cd /root/.ssh/
# ll
total 12
-rw------- 1 root root 1675 Aug 26 18:25 id_rsa
-rw-r--r-- 1 root root 393 Aug 26 18:25 id_rsa.pub
-rw-r--r-- 1 root root 806 Aug 26 18:22 known_hosts
# cat id_rsa.pub > authorized_keys
# cat authorized_keys #the authorized_keys file on each node must contain the public keys of all three nodes
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAn4mcF3ptqYzpfJVL+ei+bEu6FYMnh27jemcs/VJQzIFGlFIOM93FnkaJxUHOuHhfTN2GLPxvl0Ru6dXRsq38iCeBUxMBb9nU6GhECb3V40qz85J493yzoab3HtvB8ldbBnEh9W1V9SfT31DhAN3d8MFYGpOVbxUSfOV7huV/P+9r+o/Z7pVNQV8LO33xoEKIAPKueCbNac6mWMGfBX2NIdLB9gDck4x+lVMUN4wl0sOQxsHSHI7FLEIdKDR8SePUJ1M9pgaRSFOE54Iu726FWfQyDVuUzFii5AibaGK8iRpqVfGhtMXxhApKxi+Y2VsRoP3+iZzCKq1L1xO3s8uC+Q== root@master
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAre4RrLwB2u/+hyBn21bogNIL1io7lpXNncq88ZIpv8KenYOZO5MAQlUz91eGxtJUL0EGqEMC/hOly8BQY/VNg6PgVif89b1XWvN8Xiyq5Y3meY4aoHYwjkWwAkbyisd1d7lTuuKxFvX8fFG9XyUVlPS7LU0Hn0gJ1ehk2n5L/u0QMS57TqN/Q2jKCX9eQLSbAGpRmi13tAassC0sPJoATZYvwSobksFPRKtR7EqX6WC5tjvvkWqHnyz77Yy6klQ16o50IHKfgHxhsdqNZybigDi3Ydy0NLXB2Md8exr6e9eP2TsJ4MJrxlS7uX0+ZtDIrgN3KcxRPk+z7M5tp9IZ0Q== root@slave1
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAtK9Jec3GpWJsbMySXJcZuuRR1rbM3MW8H1SOlQ7oElWHGb+y1Yeh1sl3KgB52zSkJkVEXwFZIgNqkUG8rCvJG6rh/j22dKhwRFseFLWgY0Co+HavsFhXKixgDXY6Pw7Nl03djnsoxs0aKVWDJ8Nzg0LkgMpT04sy3UokVGJ9cHIbwrlG4lzNJDvuBJ2WT+Kw46rnihECpATI/KsXbse2Nrm/qANki9qtwQDZjm19yldGsyNGNHtph2XoXBnCG5MFrmFhrTFC7976vFCy3PvQsDjpCo1mR/dIMyRizhvYElZcNsQdWDsjKwdLWdI71DJW3Z0hUlaFwAMowkBq5+YWkQ== root@slave2
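The manual copy-and-paste above amounts to merging every node's id_rsa.pub into one deduplicated authorized_keys. A sketch with stub key files (the real files live under /root/.ssh/):

```shell
# Sketch of the merge above: concatenate every node's id_rsa.pub into one
# authorized_keys, dropping duplicates. Stub keys stand in for the real ones.
workdir=$(mktemp -d)
printf 'ssh-rsa AAAA...stub1 root@master\n' > "$workdir/master.pub"
printf 'ssh-rsa AAAA...stub2 root@slave1\n' > "$workdir/slave1.pub"
printf 'ssh-rsa AAAA...stub1 root@master\n' > "$workdir/dup.pub"   # duplicate key

cat "$workdir"/*.pub | sort -u > "$workdir/authorized_keys"
chmod 600 "$workdir/authorized_keys"   # sshd rejects overly permissive files
```

On systems that have it, ssh-copy-id root@slaveN performs the append and the permission fix in one step.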
Test passwordless login:
[root@master .ssh]# ssh slave1
Last login: Mon Aug 26 18:03:24 2019 from 192.168.22.1
[root@slave1 ~]# logout #or press Ctrl+D to exit
Connection to slave1 closed.
[root@master .ssh]# ssh slave2
Last login: Mon Aug 26 18:03:29 2019 from 192.168.22.1
[root@slave2 ~]# logout
Connection to slave2 closed.
III. Install and Configure Hadoop
Download hadoop-2.6.5 from: http://archive.apache.org/dist/hadoop/common/hadoop-2.6.5/
# tar xzvf /usr/local/src/hadoop-2.6.5.tar.gz
# mv hadoop-2.6.5 /usr/local/
# cd /usr/local
# chown -R root:root hadoop-2.6.5
# ln -s hadoop-2.6.5 hadoop
Edit the configuration files (under /usr/local/hadoop/etc/hadoop); it is recommended to back up each file before modifying it.
1. hadoop-env.sh
# vim hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_181
2. core-site.xml
# vim core-site.xml
<configuration>
<!-- RPC address of the HDFS master (NameNode) -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<!-- Base directory for Hadoop's runtime files -->
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
</property>
</configuration>
3. hdfs-site.xml
# vim hdfs-site.xml
<configuration>
<!-- HTTP address of the NameNode -->
<property>
<name>dfs.namenode.http-address</name>
<value>master:50070</value>
</property>
<!-- HTTP address of the SecondaryNameNode -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>slave1:50090</value>
</property>
<!-- Local storage path for NameNode metadata -->
<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/local/hadoop/name</value>
</property>
<!-- Number of HDFS block replicas -->
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<!-- Local storage path for DataNode blocks -->
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/local/hadoop/data</value>
</property>
</configuration>
4. mapred-site.xml
# mv mapred-site.xml.template mapred-site.xml
# vim mapred-site.xml
<configuration>
<!-- Tell the MapReduce framework to run on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
5. yarn-site.xml
# vim yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<!-- Which node runs the ResourceManager -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
<!-- Reducers fetch map output via mapreduce_shuffle -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
6. slaves
# vim slaves
slave1
slave2
7. Create the required directories
# mkdir /usr/local/hadoop/tmp
# mkdir /usr/local/hadoop/name
# mkdir /usr/local/hadoop/data
8. Copy the hadoop-2.6.5 directory to the slave nodes
# scp -r hadoop-2.6.5 slave1:/usr/local/hadoop-2.6.5
# scp -r hadoop-2.6.5 slave2:/usr/local/hadoop-2.6.5
Note that the configuration files reference paths under /usr/local/hadoop, so create the same symlink on each slave as well (ln -s /usr/local/hadoop-2.6.5 /usr/local/hadoop).
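Rather than typing one scp per host, the copy can be driven by the slaves file itself. A dry-run sketch (it only prints the commands; a temp file stands in for /usr/local/hadoop/etc/hadoop/slaves, and removing the echo would perform the actual copy):

```shell
# Dry-run sketch: generate one scp command per host listed in the slaves file.
slaves=$(mktemp)
printf 'slave1\nslave2\n' > "$slaves"

cmds=$(while read -r host; do
  [ -n "$host" ] || continue   # skip blank lines
  echo "scp -r /usr/local/hadoop-2.6.5 ${host}:/usr/local/hadoop-2.6.5"
done < "$slaves")
echo "$cmds"
```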
9. Edit the /etc/profile file
# vim /etc/profile #do this on all three nodes
export HADOOP_HOME=/usr/local/hadoop-2.6.5
export PATH=$JAVA_HOME/bin:$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# source /etc/profile
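Before starting any daemons, it's worth checking that JAVA_HOME and HADOOP_HOME actually point at real installations. A sketch (a temp directory layout stands in for the real /usr/local/jdk1.8.0_181 and /usr/local/hadoop-2.6.5):

```shell
# Sketch: verify that JAVA_HOME and HADOOP_HOME point at real installs.
root=$(mktemp -d)
mkdir -p "$root/jdk1.8.0_181/bin" "$root/hadoop-2.6.5/bin"
touch "$root/jdk1.8.0_181/bin/java" "$root/hadoop-2.6.5/bin/hadoop"

JAVA_HOME="$root/jdk1.8.0_181"
HADOOP_HOME="$root/hadoop-2.6.5"

env_ok=yes
[ -f "$JAVA_HOME/bin/java" ]     || env_ok=no
[ -f "$HADOOP_HOME/bin/hadoop" ] || env_ok=no
echo "environment check: $env_ok"
```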
IV. Start Hadoop
1. Format the NameNode (on the master only; do this only once)
# hadoop namenode -format
Note: hadoop namenode is a deprecated alias in Hadoop 2.x; hdfs namenode -format is the preferred form.
2. Start DFS and YARN
# start-dfs.sh
# start-yarn.sh
Alternatively, run start-all.sh to do both at once:
# start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
master: starting namenode, logging to /usr/local/hadoop-2.6.5/logs/hadoop-root-namenode-master.out
slave2: starting datanode, logging to /usr/local/hadoop-2.6.5/logs/hadoop-root-datanode-slave2.out
slave1: starting datanode, logging to /usr/local/hadoop-2.6.5/logs/hadoop-root-datanode-slave1.out
Starting secondary namenodes [slave1]
slave1: starting secondarynamenode, logging to /usr/local/hadoop-2.6.5/logs/hadoop-root-secondarynamenode-slave1.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop-2.6.5/logs/yarn-root-resourcemanager-master.out
slave1: starting nodemanager, logging to /usr/local/hadoop-2.6.5/logs/yarn-root-nodemanager-slave1.out
slave2: starting nodemanager, logging to /usr/local/hadoop-2.6.5/logs/yarn-root-nodemanager-slave2.out
Check the result.
On master:
# jps -m
28627 Jps -m
28380 ResourceManager
28126 NameNode
On slave1:
# jps -m
26752 Jps -m
26626 NodeManager
26557 SecondaryNameNode
26462 DataNode
On slave2:
# jps -m
25937 Jps -m
25810 NodeManager
25709 DataNode
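The per-node jps checks above can be scripted by comparing the jps output against the daemons each role should run. A sketch using a captured sample of the master's output (on a live node, jps_out=$(jps) instead; the expected-daemon list per role is an assumption based on this cluster's layout):

```shell
# Sketch: compare a node's jps output against the daemons its role expects.
jps_out='28380 ResourceManager
28126 NameNode
28627 Jps'

missing=''
for d in NameNode ResourceManager; do   # expected on the master
  echo "$jps_out" | grep -qw "$d" || missing="$missing $d"
done

if [ -z "$missing" ]; then
  echo "master: all expected daemons are running"
else
  echo "master: missing daemons:$missing"
fi
```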
V. Check Cluster Status
1. Check YARN
Visit the YARN web UI at http://192.168.22.128:8088 to verify YARN, as shown in the figure below:
2. Check HDFS
Visit the HDFS web UI at http://192.168.22.128:50070 to verify HDFS, as shown in the figure below:
3. Verify by creating a file on HDFS
[root@master native]# hadoop fs -ls /
[root@master native]# hadoop fs -mkdir /test
[root@master native]# hadoop fs -ls /
Found 1 items
drwxr-xr-x - root supergroup 0 2019-08-26 19:17 /test
[root@master native]# hadoop fs -put /etc/passwd /test
[root@master native]# hadoop fs -ls /test
Found 1 items
-rw-r--r-- 2 root supergroup 1336 2019-08-26 19:19 /test/passwd
Check on slave1:
[root@slave1 local]# hadoop fs -ls /test
Found 1 items
-rw-r--r-- 2 root supergroup 1336 2019-08-26 19:19 /test/passwd
Check on slave2:
[root@slave2 local]# hadoop fs -ls /test
Found 1 items
-rw-r--r-- 2 root supergroup 1336 2019-08-26 19:19 /test/passwd
At this point, the three-node distributed cluster is up and running.
Problems encountered during setup:
# hadoop fs -ls /
19/08/26 19:11:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
This warning appears when running commands because the system's preinstalled glibc is version 2.12, while Hadoop's bundled native library expects 2.14:
# cd /usr/local/hadoop/lib/native
# ldd libhadoop.so.1.0.0
./libhadoop.so.1.0.0: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by ./libhadoop.so.1.0.0)
linux-vdso.so.1 => (0x00007ffd537f8000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007fde0eef8000)
libc.so.6 => /lib64/libc.so.6 (0x00007fde0eb63000)
/lib64/ld-linux-x86-64.so.2 (0x00000030e6600000)
# ldd --version
ldd (GNU libc) 2.12
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.
Solution:
There are two options. One is to build and install glibc 2.14 specifically for Hadoop, but that is somewhat risky. The other is to simply suppress the warning via the log4j configuration:
# vim /usr/local/hadoop/etc/hadoop/log4j.properties #append the following line at the end
log4j.logger.org.apache.hadoop.util.NativeCodeLoader=ERROR
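If this fix is applied by a script, append the line only when it is missing so repeated runs never duplicate it. A sketch against a temp stand-in for log4j.properties:

```shell
# Sketch: idempotent append of the suppression line. A temp file stands in
# for /usr/local/hadoop/etc/hadoop/log4j.properties.
log4j=$(mktemp)
line='log4j.logger.org.apache.hadoop.util.NativeCodeLoader=ERROR'

grep -qxF "$line" "$log4j" || echo "$line" >> "$log4j"
grep -qxF "$line" "$log4j" || echo "$line" >> "$log4j"   # second run: no-op
```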