Hadoop 2.6.2 HA + Federation: Building a Production Cluster, with Notes on Compiling from Source

Original post, 2016-06-01 04:30:05

Hadoop HA + Federation: Production Cluster Setup

core-site.xml

---

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Site-specific configuration for this cluster. Entries here override -->
<!-- the defaults shipped in core-default.xml.                            -->

<configuration>

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://HTY-1:8020</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>

</configuration>

---
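A caveat on `fs.defaultFS`: the value above pins clients to a single NameNode host. In an HA deployment, clients normally address the logical nameservice instead, so they can follow a failover. A hedged sketch, not part of the original article, assuming the `hadoop-cluster1` nameservice configured later in hdfs-site.xml and the stock `ConfiguredFailoverProxyProvider`:

```xml
<!-- Sketch: address the nameservice, not a fixed host, so clients
     transparently follow NameNode failover. -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hadoop-cluster1</value>
</property>
<!-- Companion entry, placed in hdfs-site.xml on client machines: -->
<property>
  <name>dfs.client.failover.proxy.provider.hadoop-cluster1</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```

With `hdfs://HTY-1:8020` as configured above, clients keep working only while HTY-1 is the active NameNode.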


fairscheduler.xml

---
<?xml version="1.0"?>
<allocations>

  <queue name="infrastructure">
    <minResources>102400 mb, 50 vcores</minResources>
    <maxResources>153600 mb, 100 vcores</maxResources>
    <maxRunningApps>200</maxRunningApps>
    <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
    <weight>1.0</weight>
    <aclSubmitApps>root,yarn,search,hdfs</aclSubmitApps>
  </queue>

   <queue name="tool">
      <minResources>102400 mb, 30 vcores</minResources>
      <maxResources>153600 mb, 50 vcores</maxResources>
   </queue>

   <queue name="sentiment">
      <minResources>102400 mb, 30 vcores</minResources>
      <maxResources>153600 mb, 50 vcores</maxResources>
   </queue>

</allocations>

---
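After editing the allocation file it is easy to lose track of which queues are actually declared (the FairScheduler re-reads this file periodically, so no restart is needed after edits). A small, hypothetical shell helper — the function name `list_queues` is my own, not from the article — that pulls the queue names out of a file shaped like the one above:

```shell
#!/bin/sh
# List the names of all <queue name="..."> elements in a fair-scheduler
# allocation file. Assumes each queue declaration sits on its own line,
# as in the sample above; a real XML parser would be more robust.
list_queues() {
  grep -o '<queue name="[^"]*"' "$1" | sed 's/.*name="//; s/"$//'
}
```

Running `list_queues etc/hadoop/fairscheduler.xml` against the file above should print `infrastructure`, `tool` and `sentiment`, one per line.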
hadoop-env.sh // add the following line:

---

export JAVA_HOME=/root/hadoop/jdk1.7.0_67


---
hdfs-site.xml
---

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Site-specific configuration for this cluster. Entries here override -->
<!-- the defaults shipped in hdfs-default.xml.                            -->

<configuration>

<property>
  <name>dfs.nameservices</name>
  <value>hadoop-cluster1,hadoop-cluster2</value>
  <description>
    Comma-separated list of nameservices.
  </description>
</property>

<!-- 
   hadoop cluster1
-->

<property>
  <name>dfs.ha.namenodes.hadoop-cluster1</name>
  <value>nn1,nn2</value>
  <description>
    The prefix for a given nameservice, contains a comma-separated
    list of namenodes for a given nameservice (eg EXAMPLENAMESERVICE).
  </description>
</property>

<property>
  <name>dfs.namenode.rpc-address.hadoop-cluster1.nn1</name>
  <value>HTY-1:8020</value>
  <description>
    RPC address for namenode nn1 of hadoop-cluster1
  </description>
</property>

<property>
  <name>dfs.namenode.rpc-address.hadoop-cluster1.nn2</name>
  <value>HTY-2:8020</value>
  <description>
    RPC address for namenode nn2 of hadoop-cluster1
  </description>
</property>

<property>
  <name>dfs.namenode.http-address.hadoop-cluster1.nn1</name>
  <value>HTY-1:50070</value>
  <description>
    The address and base port on which the web UI of namenode nn1 listens.
  </description>
</property>

<property>
  <name>dfs.namenode.http-address.hadoop-cluster1.nn2</name>
  <value>HTY-2:50070</value>
  <description>
    The address and base port on which the web UI of namenode nn2 listens.
  </description>
</property>

<!-- 
   hadoop cluster2
-->
<property>
  <name>dfs.ha.namenodes.hadoop-cluster2</name>
  <value>nn3,nn4</value>
  <description>
    The prefix for a given nameservice, contains a comma-separated
    list of namenodes for a given nameservice (eg EXAMPLENAMESERVICE).
  </description>
</property>

<property>
  <name>dfs.namenode.rpc-address.hadoop-cluster2.nn3</name>
  <value>HTY-3:8020</value>
  <description>
    RPC address for namenode nn3 of hadoop-cluster2
  </description>
</property>

<property>
  <name>dfs.namenode.rpc-address.hadoop-cluster2.nn4</name>
  <value>HTY-4:8020</value>
  <description>
    RPC address for namenode nn4 of hadoop-cluster2
  </description>
</property>

<property>
  <name>dfs.namenode.http-address.hadoop-cluster2.nn3</name>
  <value>HTY-3:50070</value>
  <description>
    The address and base port on which the web UI of namenode nn3 listens.
  </description>
</property>

<property>
  <name>dfs.namenode.http-address.hadoop-cluster2.nn4</name>
  <value>HTY-4:50070</value>
  <description>
    The address and base port on which the web UI of namenode nn4 listens.
  </description>
</property>

<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///root/hadoop/hadoop-2.6.2/hdfs/name</value>
  <description>Determines where on the local filesystem the DFS name node
      should store the name table(fsimage).  If this is a comma-delimited list
      of directories then the name table is replicated in all of the
      directories, for redundancy. </description>
</property>

<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://HTY-2:8485;HTY-3:8485;HTY-4:8485/hadoop-cluster2</value>
  <description>A directory on shared storage between the multiple namenodes
  in an HA cluster. This directory will be written by the active and read
  by the standby in order to keep the namespaces synchronized. This directory
  does not need to be listed in dfs.namenode.edits.dir above. It should be
  left empty in a non-HA cluster.
  </description>
</property>

<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///root/hadoop/hadoop-2.6.2/hdfs/data</value>
  <description>Determines where on the local filesystem an DFS data node
  should store its blocks.  If this is a comma-delimited
  list of directories, then data will be stored in all named
  directories, typically on different devices.
  Directories that do not exist are ignored.
  </description>
</property>

<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>false</value>
  <description>
    Whether automatic failover is enabled. See the HDFS High
    Availability documentation for details on automatic HA
    configuration.
  </description>
</property>

<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/root/hadoop/hadoop-2.6.2/hdfs/journal/</value>
</property>

</configuration>

---
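The configuration above only describes the cluster; a fresh HA + Federation cluster must be brought up in a specific first-time order. A sketch of the usual sequence (hostnames follow this article; the `-clusterId` value is an arbitrary label of your choosing — it just has to be the same for both nameservices — and since `dfs.ha.automatic-failover.enabled` is false here, the active NameNodes are promoted manually):

```shell
# 1. Start the JournalNodes first, or shared edits cannot be written.
hadoop-daemon.sh start journalnode        # run on HTY-2, HTY-3 and HTY-4

# 2. Format one NameNode per nameservice; the SAME clusterId ties both
#    federated nameservices into a single cluster.
hdfs namenode -format -clusterId hadoop-cluster   # on HTY-1 (nn1) and HTY-3 (nn3)

# 3. Start the freshly formatted NameNodes.
hadoop-daemon.sh start namenode           # on HTY-1 and HTY-3

# 4. Sync each standby from its partner, then start it.
hdfs namenode -bootstrapStandby           # on HTY-2 (nn2) and HTY-4 (nn4)
hadoop-daemon.sh start namenode           # on HTY-2 and HTY-4

# 5. With automatic failover disabled, promote one active per nameservice.
hdfs haadmin -ns hadoop-cluster1 -transitionToActive nn1
hdfs haadmin -ns hadoop-cluster2 -transitionToActive nn3

# 6. Finally start the DataNodes.
hadoop-daemon.sh start datanode           # on each slave node
```

These commands obviously require the live cluster, so treat the sequence as a checklist rather than a script.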


mapred-site.xml


---
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Site-specific configuration for this cluster. Entries here override -->
<!-- the defaults shipped in mapred-default.xml.                          -->

<configuration>

<!-- MR YARN Application properties -->

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
  <description>The runtime framework for executing MapReduce jobs.
  Can be one of local, classic or yarn.
  </description>
</property>

<!-- jobhistory properties -->
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>HTY-2:10020</value>
  <description>MapReduce JobHistory Server IPC host:port</description>
</property>

<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>HTY-2:19888</value>
  <description>MapReduce JobHistory Server Web UI host:port</description>
</property>

</configuration>


---
slaves // the three non-master nodes
---
HTY-2
HTY-3
HTY-4


---
yarn-site.xml
---
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Site-specific configuration for this cluster. Entries here override -->
<!-- the defaults shipped in yarn-default.xml.                            -->

<configuration>

  <!-- Resource Manager Configs -->
  <property>
    <description>The hostname of the RM.</description>
    <name>yarn.resourcemanager.hostname</name>
    <value>HTY-1</value>
  </property>    

  <property>
    <description>The address of the applications manager interface in the RM.</description>
    <name>yarn.resourcemanager.address</name>
    <value>${yarn.resourcemanager.hostname}:8032</value>
  </property>

  <property>
    <description>The address of the scheduler interface.</description>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>${yarn.resourcemanager.hostname}:8030</value>
  </property>

  <property>
    <description>The http address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>${yarn.resourcemanager.hostname}:8088</value>
  </property>

  <property>
    <description>The https address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.https.address</name>
    <value>${yarn.resourcemanager.hostname}:8090</value>
  </property>

  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>${yarn.resourcemanager.hostname}:8031</value>
  </property>

  <property>
    <description>The address of the RM admin interface.</description>
    <name>yarn.resourcemanager.admin.address</name>
    <value>${yarn.resourcemanager.hostname}:8033</value>
  </property>

  <property>
    <description>The class to use as the resource scheduler.</description>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>

  <property>
    <description>fair-scheduler conf location</description>
    <name>yarn.scheduler.fair.allocation.file</name>
    <value>${yarn.home.dir}/etc/hadoop/fairscheduler.xml</value>
  </property>

  <property>
    <description>List of directories to store localized files in. An 
      application's localized file directory will be found in:
      ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}.
      Individual containers' work directories, called container_${contid}, will
      be subdirectories of this.
   </description>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/root/hadoop/hadoop-2.6.2/yarn/local</value>
  </property>

  <property>
    <description>Whether to enable log aggregation</description>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>

  <property>
    <description>Where to aggregate logs to.</description>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/tmp/logs</value>
  </property>

  <property>
    <description>Amount of physical memory, in MB, that can be allocated 
    for containers.</description>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>30720</value>
  </property>

  <property>
    <description>Number of CPU cores that can be allocated 
    for containers.</description>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>12</value>
  </property>

  <property>
    <description>the valid service name should only contain a-zA-Z0-9_ and can not start with numbers</description>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>

</configuration>
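Once HDFS is up, YARN and the JobHistory server are started from the configured hosts, and a small job confirms the FairScheduler queues work end to end. A sketch, with paths relative to the Hadoop install directory (the queue name must match one declared in fairscheduler.xml above):

```shell
# On HTY-1: start the ResourceManager and the NodeManagers listed in slaves.
sbin/start-yarn.sh

# On HTY-2: start the JobHistory server configured in mapred-site.xml.
sbin/mr-jobhistory-daemon.sh start historyserver

# Smoke test: submit the bundled pi example into one of the
# fair-scheduler queues defined earlier.
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.2.jar \
  pi -Dmapreduce.job.queuename=root.tool 10 100
```

If the job finishes, check the RM web UI on HTY-1:8088 to confirm it ran in the intended queue.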
  • Download and install JDK 7 // required on every node
    1. Download the 64-bit Linux JDK 7 package
    2. Edit the environment variables: vim /etc/profile
    3. export JAVA_HOME=/root/hadoop/jdk1.7.0_67
    4. export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
    5. Apply the changes: source /etc/profile
  • Configure SSH (passwordless login; without it, every daemon started during cluster startup will prompt for a password)
Setting up SSH trust: assume three hosts, host1, host2 and host3.
(1) Disable the firewall and SELinux
# /sbin/service iptables stop
This stops the firewall, but it comes back after a reboot; run the following so it stays off across reboots:
# chkconfig --level 35 iptables off
Disable SELinux:
# vim /etc/selinux/config
Set SELINUX=disabled, then save and exit.
(To apply immediately without rebooting, also run setenforce 0.)
Run the commands above on host1, host2 and host3.
(2) Edit the SSH daemon configuration
# vim /etc/ssh/sshd_config
Find the following lines and remove the leading # to uncomment them:
 RSAAuthentication yes // allow RSA authentication
 PubkeyAuthentication yes // allow public-key authentication
 AuthorizedKeysFile .ssh/authorized_keys // public keys live in .ssh/authorized_keys
Save and exit, then restart sshd for the change to take effect:
# /etc/init.d/sshd restart // or: service sshd restart
Run the commands above on host1, host2 and host3.
(3) Generate a key pair
$ ssh-keygen -t rsa
Press Enter through the prompts; this creates the private key id_rsa and the public key id_rsa.pub under the default path ~/.ssh/.
Run this on host1, host2 and host3.
(4) Build authorized_keys
Copy the public keys of host2 and host3 over to host1.
On host2:
$ scp /home/hadoop/.ssh/id_rsa.pub hadoop@host1:~/.ssh/id_rsa.pub.host2
On host3:
$ scp /home/hadoop/.ssh/id_rsa.pub hadoop@host1:~/.ssh/id_rsa.pub.host3
(At this point these commands still prompt for the remote user's password.)
On host1:
$ cd ~/.ssh/
$ ls
Check that id_rsa.pub.host2 and id_rsa.pub.host3 have arrived, then concatenate all three public keys:
$ cat id_rsa.pub >> authorized_keys
$ cat id_rsa.pub.host2 >> authorized_keys
$ cat id_rsa.pub.host3 >> authorized_keys
This produces authorized_keys. Set its permissions (sshd ignores the file if it is group- or world-writable):
# chmod 644 authorized_keys
Then push the file to .ssh/ on host2 and host3 with scp:
# scp authorized_keys hadoop@host2:~/.ssh/
# scp authorized_keys hadoop@host3:~/.ssh/
(5) Verify
On host1:
$ ssh host2
$ exit
$ ssh host3
$ exit
None of these should ask for a password; all three hosts should now reach each other password-free.
The procedure above uses three machines, but it scales to any number of hosts the same way.
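The manual scp-and-cat recipe above is the classic approach; on systems that ship ssh-copy-id, the same result takes one loop per host. A sketch, with hostnames as in the example (each run still prompts once for the target user's password):

```shell
# Run once on every host: append this host's public key to the
# authorized_keys of all three machines (including itself).
for h in host1 host2 host3; do
  ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@"$h"
done
```

After the loop completes on all hosts, `ssh host2` from any of them should log straight in without a password.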
Copyright notice: this is the blogger's original work; reproduction without permission is prohibited.
