Hadoop Cluster Setup

I. Single-Node Setup

1. Download Hadoop

http://archive.apache.org/dist/
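
For example, to fetch and unpack 2.7.1 from the archive (the exact mirror path below is an assumption; browse the archive if it has moved):

wget http://archive.apache.org/dist/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz
tar -zxvf hadoop-2.7.1.tar.gz -C /utils/app/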

2. Configure Hadoop: edit the following five configuration files

File 1: hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_79
File 2: core-site.xml
<configuration>
	<!-- Address the NameNode (the HDFS master) listens on -->
	<property>
		<name>fs.defaultFS</name>
		<value>hdfs://niu:9000</value>
	</property>
	<!-- Directory where Hadoop stores its runtime files -->
	<property>
		<name>hadoop.tmp.dir</name>
		<value>/utils/app/hadoop-2.7.1/tmp</value>
	</property>
</configuration>

File 3: hdfs-site.xml
<configuration>
	<!-- HDFS replication factor -->
	<property>
		<name>dfs.replication</name>
		<value>1</value>
	</property>
</configuration>

File 4: mapred-site.xml

mapred-site.xml.template must be renamed first:

mv mapred-site.xml.template mapred-site.xml
<configuration>
	<!-- Tell the MapReduce framework to run on YARN -->
	<property>
		<name>mapreduce.framework.name</name>
		<value>yarn</value>
	</property>
</configuration>
File 5: yarn-site.xml
<configuration>
	<!-- Site specific YARN configuration properties -->
	<property>
		<name>yarn.resourcemanager.hostname</name>
		<value>niu</value>
	</property>
	<!-- Reducers fetch map output via mapreduce_shuffle -->
	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
	</property>
</configuration>

3. Add Hadoop to the environment variables

export JAVA_HOME=/usr/java/jdk1.7.0_79
export HADOOP_HOME=/utils/app/hadoop-2.7.1
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
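
Reload the profile and confirm the binaries resolve (a quick sanity check, assuming the paths above):

source /etc/profile
hadoop version    # should report Hadoop 2.7.1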

4. Initialize HDFS (format the file system)

[root@niu hadoop-2.7.1]# hdfs namenode -format

5. Start Hadoop

[root@niu sbin]# ./start-dfs.sh

[root@niu sbin]# ./start-yarn.sh
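
If both scripts succeed, jps should list the HDFS and YARN daemons (PIDs will differ):

[root@niu sbin]# jps
# expect NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager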

6. Access the web UIs

192.168.108.100:50070 (HDFS web UI)
192.168.108.100:8088 (YARN web UI)
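
As a quick smoke test, create a directory and upload a file (the names here are only illustrative):

hdfs dfs -mkdir /test
hdfs dfs -put /etc/profile /test
hdfs dfs -ls /test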

II. Pseudo-Distributed Mode

III. Fully Distributed Mode

1. Cluster Planning

Prerequisites

1. Change the Linux hostname
[root@new01 ~]# vim /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=new01
2. Change the IP address
[root@new01 ~]# ifconfig eth0 192.168.108.201
// or edit the interface configuration:
[root@new01 ~]# vim /etc/sysconfig/network-scripts/ifcfg-eth0 
DEVICE="eth0"
BOOTPROTO=none
IPV6INIT="yes"
NM_CONTROLLED="yes"
ONBOOT="yes"
IPADDR=192.168.108.201
GATEWAY=192.168.108.1
DNS1=8.8.8.8
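
Changes to ifcfg-eth0 take effect only after the network service restarts (CentOS 6 SysV style):

service network restart
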
3. Map hostnames to IP addresses

If your servers are rented or cloud hosts (e.g. Huawei Cloud or Alibaba Cloud),

/etc/hosts must map the internal (private) IP addresses to the hostnames.

[root@new01 ~]# vim /etc/hosts

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.108.201 new01
192.168.108.202 new02
192.168.108.203 new03
192.168.108.204 new04
192.168.108.205 new05
192.168.108.206 new06
192.168.108.207 new07
4. Disable the firewall
[root@new01 ~]# chkconfig iptables off
[root@new01 ~]# chkconfig iptables --list
iptables       	0:off	1:off	2:off	3:off	4:off	5:off	6:off
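
Note that chkconfig only disables iptables for subsequent boots; to stop the firewall that is already running (CentOS 6):

service iptables stop
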
5. Passwordless SSH login
[root@new01 .ssh]# cd ~/.ssh
[root@new01 .ssh]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): 
/root/.ssh/id_rsa already exists.
Overwrite (y/n)? y
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
1a:88:7d:e4:3b:aa:87:7e:10:6b:50:b3:ca:6d:65:c9 root@new01
The key's randomart image is:
+--[ RSA 2048]----+
|                 |
|  o              |
| . o. o          |
|. oo E           |
|.ooo= + S        |
|..+o . +         |
| ..o  +          |
|  . o. .         |
| .o+.            |
+-----------------+
[root@new01 .ssh]# ssh-copy-id new01
root@new01's password: 
Now try logging into the machine, with "ssh 'new01'", and check in:

  .ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.


// or simply:
[root@new01 .ssh]# cp id_rsa.pub authorized_keys

// For niunode10 to log in to niunode11 without a password, just copy niunode10's public key to niunode11:
[root@niunode10 .ssh]# ssh-copy-id niunode11
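
A login from niunode10 should then succeed without a password prompt:

[root@niunode10 .ssh]# ssh niunode11 hostname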

6. Install the JDK and configure environment variables
export JAVA_HOME=/usr/java/jdk1.7.0_79
export HADOOP_HOME=/utils/app/hadoop-2.7.1
export PATH=.:$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin

Cluster plan:

Hostname  IP               Software installed       Processes
new01     192.168.108.201  jdk, hadoop              NameNode, DFSZKFailoverController (zkfc)
new02     192.168.108.202  jdk, hadoop              NameNode, DFSZKFailoverController (zkfc)
new03     192.168.108.203  jdk, hadoop              ResourceManager
new04     192.168.108.204  jdk, hadoop              ResourceManager
new05     192.168.108.205  jdk, hadoop, zookeeper   DataNode, NodeManager, JournalNode, QuorumPeerMain
new06     192.168.108.206  jdk, hadoop, zookeeper   DataNode, NodeManager, JournalNode, QuorumPeerMain
new07     192.168.108.207  jdk, hadoop, zookeeper   DataNode, NodeManager, JournalNode, QuorumPeerMain
Notes
1. In Hadoop 2.0, HDFS HA typically runs two NameNodes, one active and one standby. The active NameNode serves all client requests; the standby serves none and only mirrors the active's state, so it can take over quickly if the active fails.
2. Hadoop 2.0 offers two official HDFS HA solutions, NFS and QJM. We use the simpler QJM here: the two NameNodes share edit-log metadata through a group of JournalNodes, and a write counts as successful once a majority of JournalNodes have accepted it, so an odd number of JournalNodes is usually configured.
3. A ZooKeeper cluster is also configured for ZKFC (DFSZKFailoverController) failover: when the active NameNode dies, the standby NameNode is automatically promoted to active.
4. In hadoop-2.2.0 the ResourceManager was still a single point of failure. hadoop-2.7.1 fixes this with two ResourceManagers, one active and one standby, whose state is coordinated through ZooKeeper.

2. Install the ZooKeeper Cluster First

1. Extract

[root@new05 ~]# tar -zxvf zookeeper-3.4.5.tar.gz -C /utils/app/

2. Edit the configuration

[root@new05 ~]# cd /utils/app/zookeeper-3.4.5/conf/
[root@new05 conf]# cp zoo_sample.cfg zoo.cfg
[root@new05 conf]# vim zoo.cfg
// change:
dataDir=/utils/app/zookeeper-3.4.5/tmp

// append at the end:
server.1=new05:2888:3888
server.2=new06:2888:3888
server.3=new07:2888:3888
// save and exit
// then create the tmp directory
[root@new05 zookeeper-3.4.5]# mkdir /utils/app/zookeeper-3.4.5/tmp
// and create an empty file in it
[root@new05 zookeeper-3.4.5]# touch /utils/app/zookeeper-3.4.5/tmp/myid
// finally write this node's ID into the file
[root@new05 zookeeper-3.4.5]# echo 1 > /utils/app/zookeeper-3.4.5/tmp/myid

3. Copy to the other nodes

Copy the configured ZooKeeper installation to the other nodes.

First create the target directory on new06 and new07:

mkdir /utils/app
[root@new05 ~]# scp -r /utils/app/zookeeper-3.4.5/ new06:/utils/app/
[root@new05 ~]# scp -r /utils/app/zookeeper-3.4.5/ new07:/utils/app/
			
// Note: update /utils/app/zookeeper-3.4.5/tmp/myid on new06 and new07 accordingly
// new06:
[root@new06 app]# echo 2 > /utils/app/zookeeper-3.4.5/tmp/myid
// new07:
[root@new07 app]# echo 3 > /utils/app/zookeeper-3.4.5/tmp/myid
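
A quick check that each node ended up with the right ID (a sketch, assuming passwordless SSH between the nodes):

for h in new05 new06 new07; do ssh $h cat /utils/app/zookeeper-3.4.5/tmp/myid; done
# expect 1, 2 and 3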

3. Then Install the Hadoop Cluster

1. Extract

[root@new01 app]# tar -zxvf hadoop-2.7.1.tar.gz -C /utils/app/

2. Add Hadoop to the environment variables in /etc/profile

[root@new01 ~]# vim /etc/profile
// add HADOOP_HOME
export JAVA_HOME=/usr/java/jdk1.7.0_79
export HADOOP_HOME=/utils/app/hadoop-2.7.1
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin

3. Configuration files

Configure HDFS.
In Hadoop 2.x all configuration files live under $HADOOP_HOME/etc/hadoop;
six of them need changes.

1. Edit hadoop-env.sh
[root@new01 hadoop]# vim hadoop-env.sh 

# The java implementation to use.
export JAVA_HOME=/usr/java/jdk1.7.0_79
2. Edit core-site.xml

[root@new01 hadoop]# vim core-site.xml

<configuration>
	<!-- Set the HDFS nameservice to ns1 -->
	<property>
		<name>fs.defaultFS</name>
		<value>hdfs://ns1</value>
	</property>
	<!-- Hadoop temporary directory -->
	<property>
		<name>hadoop.tmp.dir</name>
		<value>/utils/app/hadoop-2.7.1/tmp</value>
	</property>
	<!-- ZooKeeper quorum addresses -->
	<property>
		<name>ha.zookeeper.quorum</name>
		<value>new05:2181,new06:2181,new07:2181</value>
	</property>
</configuration>
3. Edit hdfs-site.xml

[root@new01 hadoop]# vim hdfs-site.xml

<configuration>
	<!-- Set the nameservice to ns1; must match core-site.xml -->
	<property>
		<name>dfs.nameservices</name>
		<value>ns1</value>
	</property>
	<!-- ns1 has two NameNodes: nn1 and nn2 -->
	<property>
		<name>dfs.ha.namenodes.ns1</name>
		<value>nn1,nn2</value>
	</property>
	<!-- RPC address of nn1 -->
	<property>
		<name>dfs.namenode.rpc-address.ns1.nn1</name>
		<value>new01:9000</value>
	</property>
	<!-- HTTP address of nn1 -->
	<property>
		<name>dfs.namenode.http-address.ns1.nn1</name>
		<value>new01:50070</value>
	</property>
	<!-- RPC address of nn2 -->
	<property>
		<name>dfs.namenode.rpc-address.ns1.nn2</name>
		<value>new02:9000</value>
	</property>
	<!-- HTTP address of nn2 -->
	<property>
		<name>dfs.namenode.http-address.ns1.nn2</name>
		<value>new02:50070</value>
	</property>
	<!-- Where the NameNodes' shared edit log is stored on the JournalNodes -->
	<property>
		<name>dfs.namenode.shared.edits.dir</name>
		<value>qjournal://new05:8485;new06:8485;new07:8485/ns1</value>
	</property>
	<!-- Where each JournalNode keeps its data on local disk -->
	<property>
		<name>dfs.journalnode.edits.dir</name>
		<value>/utils/app/hadoop-2.7.1/journal</value>
	</property>
	<!-- Enable automatic failover when a NameNode fails -->
	<property>
		<name>dfs.ha.automatic-failover.enabled</name>
		<value>true</value>
	</property>
	<!-- Failover proxy provider used by HDFS clients -->
	<property>
		<name>dfs.client.failover.proxy.provider.ns1</name>
		<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
	</property>
	<!-- Fencing methods, separated by newlines: one method per line -->
	<!-- sshfence: covers a NameNode whose process has failed but is still running -->
	<!-- shell(/bin/true): covers a NameNode whose host has lost power -->
	<property>
		<name>dfs.ha.fencing.methods</name>
		<value>
			sshfence
			shell(/bin/true)
		</value>
	</property>
	<!-- sshfence requires passwordless SSH; path to the private key -->
	<property>
		<name>dfs.ha.fencing.ssh.private-key-files</name>
		<value>/root/.ssh/id_rsa</value>
	</property>
	<!-- sshfence connection timeout (milliseconds) -->
	<property>
		<name>dfs.ha.fencing.ssh.connect-timeout</name>
		<value>30000</value>
	</property>
</configuration>
4. Edit mapred-site.xml

[root@new01 hadoop]# vim mapred-site.xml

<configuration>
	<!-- Run MapReduce on YARN -->
	<property>
		<name>mapreduce.framework.name</name>
		<value>yarn</value>
	</property>
</configuration>
5. Edit yarn-site.xml

[root@new01 hadoop]# vim yarn-site.xml

<configuration>
	<!-- ResourceManager address -->
	<property>
		<name>yarn.resourcemanager.hostname</name>
		<value>new03</value>
	</property>
	<!-- Have the NodeManagers run the mapreduce_shuffle auxiliary service -->
	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
	</property>
</configuration>

The following configures ResourceManager HA (two RMs) instead:

<configuration>
	<!-- Enable ResourceManager HA -->
	<property>
		<name>yarn.resourcemanager.ha.enabled</name>
		<value>true</value>
	</property>
	<!-- Cluster ID for the RM pair -->
	<property>
		<name>yarn.resourcemanager.cluster-id</name>
		<value>yrc</value>
	</property>
	<!-- Logical IDs of the RMs -->
	<property>
		<name>yarn.resourcemanager.ha.rm-ids</name>
		<value>rm1,rm2</value>
	</property>
	<!-- Hostname of each RM -->
	<property>
		<name>yarn.resourcemanager.hostname.rm1</name>
		<value>new03</value>
	</property>
	<property>
		<name>yarn.resourcemanager.hostname.rm2</name>
		<value>new04</value>
	</property>
	<!-- ZooKeeper ensemble address -->
	<property>
		<name>yarn.resourcemanager.zk-address</name>
		<value>new05:2181,new06:2181,new07:2181</value>
	</property>
	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
	</property>
</configuration>
6. Edit slaves

The slaves file lists the worker nodes.
Because HDFS will be started on new01 and YARN on new03:

  • the slaves file on new01 determines where the DataNodes start,
  • the slaves file on new03 determines where the NodeManagers start
[root@new01 hadoop]# vim slaves 
new05
new06
new07

4. Configure Passwordless Login

# First configure passwordless login from new01 to new02, new03, new04, new05, new06 and new07
# Generate a key pair on new01
ssh-keygen -t rsa
# Copy the public key to every node, including new01 itself
ssh-copy-id new01
ssh-copy-id new02
ssh-copy-id new03
ssh-copy-id new04
ssh-copy-id new05
ssh-copy-id new06
ssh-copy-id new07
# Then configure passwordless login from new03 to new04, new05, new06 and new07
# Generate a key pair on new03
ssh-keygen -t rsa
# Copy the public key to the other nodes
ssh-copy-id new04
ssh-copy-id new05
ssh-copy-id new06
ssh-copy-id new07
# Note: the two NameNodes need passwordless SSH between them, so don't forget new02 -> new01
# Generate a key pair on new02
ssh-keygen -t rsa
ssh-copy-id -i new01
Copy the configured Hadoop to the other nodes:
[root@new01 hadoop-2.7.1]# pwd
/utils/app/hadoop-2.7.1
// create /utils on new02 and new03 first if it does not exist (mkdir /utils)
[root@new01 hadoop-2.7.1]# scp -r /utils/app/ root@new02:/utils/
[root@new01 hadoop-2.7.1]# scp -r /utils/app/ root@new03:/utils/
[root@new01 hadoop-2.7.1]# scp -r /utils/app/hadoop-2.7.1/ root@new04:/utils/app/
[root@new01 hadoop-2.7.1]# scp -r /utils/app/hadoop-2.7.1/ root@new05:/utils/app/
[root@new01 hadoop-2.7.1]# scp -r /utils/app/hadoop-2.7.1/ root@new06:/utils/app/
[root@new01 hadoop-2.7.1]# scp -r /utils/app/hadoop-2.7.1/ root@new07:/utils/app/
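
The four per-node copies can also be written as a loop (a convenience sketch, assuming the passwordless SSH configured above):

for h in new04 new05 new06 new07; do scp -r /utils/app/hadoop-2.7.1/ root@$h:/utils/app/; done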

5. Initialize the Cluster (follow the steps below strictly)

1. First make sure /utils/app/hadoop-2.7.1/tmp is empty

Delete anything it contains:

[root@new01 tmp]# pwd
/utils/app/hadoop-2.7.1/tmp

Also make sure ZooKeeper has no [hadoop-ha] znode; if it exists, remove it with rmr /hadoop-ha in the ZooKeeper client.

2. Start ZooKeeper and the JournalNodes

Start the ZooKeeper cluster on new05, new06 and new07:

[root@new05 bin]# pwd
/utils/app/zookeeper-3.4.5/bin
[root@new05 bin]# ./zkServer.sh start
[root@new06 bin]# ./zkServer.sh start
[root@new07 bin]# ./zkServer.sh start
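
Confirm the ensemble has formed; one node should report leader and the other two follower:

[root@new05 bin]# ./zkServer.sh status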

Start the JournalNodes (run on new05, new06 and new07):

[root@new05 sbin]# pwd
/utils/app/hadoop-2.7.1/sbin
[root@new05 sbin]# hadoop-daemon.sh start journalnode
[root@new06 sbin]# hadoop-daemon.sh start journalnode
[root@new07 sbin]# hadoop-daemon.sh start journalnode
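
jps on each of new05-new07 should now show both daemons:

[root@new05 sbin]# jps
# expect QuorumPeerMain and JournalNode (plus Jps itself)
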
3. Format HDFS

Delete any existing tmp directory before formatting:

[root@new01 bin]# pwd
/utils/app/hadoop-2.7.1/bin
[root@new01 bin]# hdfs namenode -format

Formatting generates files in the directory set by hadoop.tmp.dir in core-site.xml,
here /utils/app/hadoop-2.7.1/tmp;
copy that tmp directory to /utils/app/hadoop-2.7.1/ on new02:

[root@new01 hadoop-2.7.1]# scp -r tmp/ new02:/utils/app/hadoop-2.7.1/
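
Alternatively, instead of copying tmp by hand, the standby NameNode can pull the metadata itself, the standard Hadoop 2.x HA approach (run on new02 once new01's NameNode has been started and is reachable):

[root@new02 bin]# hdfs namenode -bootstrapStandby
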
4. Format ZK

Run on new01:

[root@new01 hadoop-2.7.1]# hdfs zkfc -formatZK

This creates the [hadoop-ha] znode in ZooKeeper:

[zk: localhost:2181(CONNECTED) 0] ls /
[hadoop-ha, zookeeper]
5. Start HDFS
[root@new01 sbin]# ./start-dfs.sh 
Starting namenodes on [new01 new02]
new01: starting namenode, logging to /utils/app/hadoop-2.7.1/logs/hadoop-root-namenode-new01.out
new02: starting namenode, logging to /utils/app/hadoop-2.7.1/logs/hadoop-root-namenode-new02.out
new06: starting datanode, logging to /utils/app/hadoop-2.7.1/logs/hadoop-root-datanode-new06.out
new05: starting datanode, logging to /utils/app/hadoop-2.7.1/logs/hadoop-root-datanode-new05.out
new07: starting datanode, logging to /utils/app/hadoop-2.7.1/logs/hadoop-root-datanode-new07.out
Starting journal nodes [new05 new06 new07]
new05: starting journalnode, logging to /utils/app/hadoop-2.7.1/logs/hadoop-root-journalnode-new05.out
new06: starting journalnode, logging to /utils/app/hadoop-2.7.1/logs/hadoop-root-journalnode-new06.out
new07: starting journalnode, logging to /utils/app/hadoop-2.7.1/logs/hadoop-root-journalnode-new07.out
Starting ZK Failover Controllers on NN hosts [new01 new02]
new01: starting zkfc, logging to /utils/app/hadoop-2.7.1/logs/hadoop-root-zkfc-new01.out
new02: starting zkfc, logging to /utils/app/hadoop-2.7.1/logs/hadoop-root-zkfc-new02.out
[root@new01 sbin]# jps
6875 Jps
6536 NameNode
6827 DFSZKFailoverController
[root@new01 sbin]# 
6. Start YARN
[root@new03 sbin]# ./start-yarn.sh 
starting yarn daemons
starting resourcemanager, logging to /utils/app/hadoop-2.7.1/logs/yarn-root-resourcemanager-new03.out
new06: starting nodemanager, logging to /utils/app/hadoop-2.7.1/logs/yarn-root-nodemanager-new06.out
new07: starting nodemanager, logging to /utils/app/hadoop-2.7.1/logs/yarn-root-nodemanager-new07.out
new05: starting nodemanager, logging to /utils/app/hadoop-2.7.1/logs/yarn-root-nodemanager-new05.out
[root@new03 sbin]# jps
2336 ResourceManager
2407 Jps
[root@new03 sbin]# 
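
start-yarn.sh starts a ResourceManager only on the node it is run from; with RM HA enabled, the standby RM on new04 must be started by hand, after which the RM states can be queried:

[root@new04 sbin]# ./yarn-daemon.sh start resourcemanager
[root@new03 sbin]# yarn rmadmin -getServiceState rm1
[root@new03 sbin]# yarn rmadmin -getServiceState rm2
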
7. Restart the NameNode on new02
// start the NameNode first
[root@new02 sbin]# ./hadoop-daemon.sh start namenode
// then start zkfc
[root@new02 sbin]# ./hadoop-daemon.sh start zkfc
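
To confirm which NameNode is active, and to exercise failover (killing the active NameNode's process should promote the standby within seconds):

[root@new01 sbin]# hdfs haadmin -getServiceState nn1
[root@new01 sbin]# hdfs haadmin -getServiceState nn2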

6. Verify Access

HDFS web UI, active NameNode: 192.168.108.201:50070
[screenshots: active NameNode page and DataNode list]

HDFS web UI, standby NameNode: 192.168.108.202:50070
[screenshot: standby NameNode page]

YARN web UI: 192.168.108.203:8088
[screenshot: YARN Apps page]

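Finally, an end-to-end MapReduce smoke test (the input and output paths are only illustrative; the examples jar ships with Hadoop 2.7.1):

hadoop fs -put /etc/profile /profile
hadoop jar /utils/app/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /profile /out
hadoop fs -cat /out/part-r-00000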