Hadoop Cluster Setup: Fully Distributed Cluster Configuration
1. Preparation

Prepare four virtual machines with CentOS 7 installed and the relevant system settings applied.

For the system configuration, see the article Hadoop Cluster Setup: CentOS7 System Configuration.

Component version plan:

Component | Version
hadoop    | 3.3.1
hive      | 3.1.2
hbase     | 2.4.8
2. Cluster Planning

Hostname | IP Address     | User   | HDFS                        | YARN
hadoop01 | 192.168.21.211 | hadoop | NameNode, DataNode          | ResourceManager, NodeManager
hadoop02 | 192.168.21.212 | hadoop | DataNode, SecondaryNameNode | NodeManager
hadoop03 | 192.168.21.213 | hadoop | DataNode                    | NodeManager
hadoop04 | 192.168.21.214 | hadoop | DataNode                    | NodeManager
3. Hadoop Installation
1. Directory layout
Installation path: /opt/modules
Software (tarball) path: /opt/software
2. Upload the tarball
1. Upload the tarball to [/opt/software] and extract it to [/opt/modules]
2. Create a symlink
	ln -s hadoop-3.3.1 hadoop
3. Edit [/home/hadoop/.bash_profile] and append the following:
	HADOOP_HOME=/opt/modules/hadoop
	PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
	export HADOOP_HOME PATH
4. Run [source ~/.bash_profile] to apply the changes (see the combined sketch below)
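
A minimal end-to-end sketch of steps 1-4 (the tarball name hadoop-3.3.1.tar.gz is an assumption):

	tar -zxvf /opt/software/hadoop-3.3.1.tar.gz -C /opt/modules
	cd /opt/modules && ln -s hadoop-3.3.1 hadoop
	# append the three environment lines from step 3 to /home/hadoop/.bash_profile, then:
	source ~/.bash_profile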
3. Cluster configuration

Configuration file directory: [/opt/modules/hadoop/etc/hadoop/]

  • hadoop-env.sh

Set the JAVA_HOME path on line 25 to [/opt/modules/jdk]
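
    For example, the edited line would read (assuming the JDK is installed at /opt/modules/jdk, per the system setup article):

    export JAVA_HOME=/opt/modules/jdk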
    
  • core-site.xml

    <configuration>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/opt/modules/hadoop/tmp</value>
        </property>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://hadoop01:8020</value>
        </property>
    </configuration>
    
  • hdfs-site.xml

    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>3</value>
        </property>
        <property>
            <name>dfs.blocksize</name>
            <value>134217728</value>
        </property>
        <property>
            <name>dfs.namenode.secondary.http-address</name>
            <value>hadoop02:50090</value>
        </property>
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>/opt/modules/hadoop/tmp/namenode</value>
        </property>
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>/opt/modules/hadoop/tmp/datanode</value>
        </property>
    </configuration>
    
  • mapred-site.xml

    In Hadoop 2.x you had to copy [mapred-site.xml.template] to [mapred-site.xml]; in Hadoop 3.x (used here) [mapred-site.xml] already exists, so just edit it:

    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
        <property>
            <name>mapreduce.jobhistory.address</name>
            <value>hadoop01:10020</value>
        </property>
        <property>
            <name>mapreduce.jobhistory.webapp.address</name>
            <value>hadoop01:19888</value>
        </property>
    </configuration>
    
  • yarn-site.xml

    <configuration>
        <property>
            <name>yarn.resourcemanager.hostname</name>
            <value>hadoop01</value>
        </property>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <property>
            <name>yarn.log-aggregation-enable</name>
            <value>true</value>
        </property>
        <property>
            <name>yarn.log.server.url</name>
            <value>http://hadoop01:19888/jobhistory/logs</value>
        </property>
    </configuration>
    
  • workers (this file was called slaves in Hadoop 2.x)

    Per the cluster plan, list all four DataNode hosts here:

    hadoop01
    hadoop02
    hadoop03
    hadoop04
    

Use [scp -r /opt/modules/hadoop hadoop@hostname:/opt/modules] to copy hadoop to the other three servers, e.g. as sketched below.
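
A sketch of the distribution loop (assumes passwordless SSH for the hadoop user is already configured; scp dereferences the symlink, so this copies the real directory and recreates the link):

    for host in hadoop02 hadoop03 hadoop04; do
        scp -r /opt/modules/hadoop-3.3.1 hadoop@$host:/opt/modules
        ssh hadoop@$host 'ln -s /opt/modules/hadoop-3.3.1 /opt/modules/hadoop'
    done

Remember to add the same HADOOP_HOME environment variables on each server.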

4. Hadoop Cluster Testing
1. Format the cluster

Format the cluster with [hdfs namenode -format] (the older form [hadoop namenode -format] still works but is deprecated in Hadoop 3.x). Formatting generates the cluster ID, block pool ID, and related metadata; see the check below.
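
To confirm, inspect the NameNode's VERSION file (the path follows the dfs.namenode.name.dir setting above):

    cat /opt/modules/hadoop/tmp/namenode/current/VERSION
    # expect lines such as clusterID=CID-... and blockpoolID=BP-...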

2. Start the cluster
The official recommendation is to start HDFS and YARN separately with [start-dfs.sh] and [start-yarn.sh]; [start-all.sh] also works but prints a deprecation warning. Verify with the [jps] sketch below.
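
After startup, [jps] on each host should roughly match the cluster plan in section 2 (a sketch):

    # hadoop01
    jps   # NameNode, DataNode, ResourceManager, NodeManager
    # hadoop02
    jps   # DataNode, SecondaryNameNode, NodeManager
    # hadoop03 / hadoop04
    jps   # DataNode, NodeManager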
3. Test the cluster
1. Visit [hadoop01:9870] to view the HDFS web UI (Hadoop 3.x moved the NameNode UI from port 50070 to 9870)
2. Visit [hadoop01:8088] to view the YARN web UI (a wordcount smoke test follows)
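
A quick smoke test with the bundled wordcount example (the jar name is assumed to match Hadoop 3.3.1):

    hdfs dfs -mkdir -p /user/hadoop/input
    hdfs dfs -put $HADOOP_HOME/etc/hadoop/*.xml /user/hadoop/input
    hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar wordcount /user/hadoop/input /user/hadoop/output
    hdfs dfs -cat /user/hadoop/output/part-r-00000 | head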
5. Hadoop High Availability Cluster
1. Cluster planning

Hostname | IP Address     | User   | HDFS                            | YARN                         | ZK             | ZKFC
hadoop01 | 192.168.21.211 | hadoop | NameNode, DataNode              | ResourceManager, NodeManager |                | DFSZKFailoverController
hadoop02 | 192.168.21.212 | hadoop | NameNode, DataNode, JournalNode | ResourceManager, NodeManager | QuorumPeerMain | DFSZKFailoverController
hadoop03 | 192.168.21.213 | hadoop | DataNode, JournalNode           | NodeManager                  | QuorumPeerMain |
hadoop04 | 192.168.21.214 | hadoop | DataNode, JournalNode           | NodeManager                  | QuorumPeerMain |
2. Zookeeper Installation

A high-availability cluster depends on Zookeeper, so install the Zookeeper cluster first.

  • Upload the tarball and configure environment variables (see the sketch after the variables below)

    ZOOKEEPER_HOME=/opt/modules/zookeeper
    PATH=$ZOOKEEPER_HOME/bin:$PATH
    export ZOOKEEPER_HOME PATH
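
    A minimal sketch of the upload/extract step (the tarball name and extracted directory are assumptions based on the Zookeeper 3.6.1 paths used below):

    tar -zxvf /opt/software/apache-zookeeper-3.6.1-bin.tar.gz -C /opt/modules
    mv /opt/modules/apache-zookeeper-3.6.1-bin /opt/modules/zookeeper-3.6.1
    ln -s /opt/modules/zookeeper-3.6.1 /opt/modules/zookeeper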
    
  • Edit the Zookeeper configuration files

    • zoo.cfg

      The [conf] directory ships only [zoo_sample.cfg], so copy it and rename the copy to [zoo.cfg]

      # The number of milliseconds of each tick
      tickTime=2000
      # The number of ticks that the initial
      # synchronization phase can take
      initLimit=10
      # The number of ticks that can pass between
      # sending a request and getting an acknowledgement
      syncLimit=5
      # the directory where the snapshot is stored.
      # do not use /tmp for storage, /tmp here is just
      # example sakes.
      dataDir=/opt/modules/zookeeper-3.6.1/data
      # the port at which the clients will connect
      clientPort=2181
      # the maximum number of client connections.
      # increase this if you need to handle more clients
      #maxClientCnxns=60
      #
      # Be sure to read the maintenance section of the
      # administrator guide before turning on autopurge.
      #
      # http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
      #
      # The number of snapshots to retain in dataDir
      #autopurge.snapRetainCount=3
      # Purge task interval in hours
      # Set to "0" to disable auto purge feature
      #autopurge.purgeInterval=1
      
      ## Metrics Providers
      #
      # https://prometheus.io Metrics Exporter
      #metricsProvider.className=org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider
      #metricsProvider.httpPort=7000
      #metricsProvider.exportJvmInfo=true
      
      server.1=hadoop02:2888:3888
      server.2=hadoop03:2888:3888
      server.3=hadoop04:2888:3888
      
    • myid

      Create a directory [data] and, inside it, a file [myid] containing the [id] from the [server.id] line that matches this host's address (hadoop02 gets 1; set the others accordingly).

      1
      
  • Distribute zookeeper to the other servers and update each [myid] accordingly (a sketch follows)
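
    A sketch of the whole myid setup, run from hadoop02 (assumes passwordless SSH):

    mkdir -p /opt/modules/zookeeper-3.6.1/data
    echo 1 > /opt/modules/zookeeper-3.6.1/data/myid
    for host in hadoop03 hadoop04; do
        scp -r /opt/modules/zookeeper-3.6.1 hadoop@$host:/opt/modules
    done
    ssh hadoop@hadoop03 'echo 2 > /opt/modules/zookeeper-3.6.1/data/myid'
    ssh hadoop@hadoop04 'echo 3 > /opt/modules/zookeeper-3.6.1/data/myid'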

  • Start zookeeper on all three servers with [zkServer.sh start]; a [QuorumPeerMain] process in [jps] means zookeeper started successfully
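
    To verify the ensemble once all three are up:

    zkServer.sh status   # one node reports Mode: leader, the other two Mode: follower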

3. HA Cluster Configuration

Configuration file directory: [/opt/modules/hadoop/etc/hadoop/]

  • hadoop-env.sh

    Set the JAVA_HOME path on line 25 to [/opt/modules/jdk], same as in the non-HA setup
    
  • core-site.xml

    <configuration>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/opt/modules/hadoop/data</value>
        </property>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://supercluster</value>
        </property>
        <property>
            <name>ha.zookeeper.quorum</name>
            <value>hadoop02:2181,hadoop03:2181,hadoop04:2181</value>
        </property>
        <property>
            <name>hadoop.proxyuser.hadoop.hosts</name>
            <value>*</value>
        </property>
        <property>
            <name>hadoop.proxyuser.hadoop.groups</name>
            <value>*</value>
        </property>
    </configuration>
    
  • hdfs-site.xml

    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>3</value>
        </property>
        <property>
            <name>dfs.blocksize</name>
            <value>134217728</value>
        </property>
        <property>
            <name>dfs.nameservices</name>
            <value>supercluster</value>
        </property>
        <property>
            <name>dfs.ha.namenodes.supercluster</name>
            <value>nn1,nn2</value>
        </property>
        <property>
            <name>dfs.namenode.rpc-address.supercluster.nn1</name>
            <value>hadoop01:8020</value>
        </property>
        <property>
            <name>dfs.namenode.rpc-address.supercluster.nn2</name>
            <value>hadoop02:8020</value>
        </property>
        <property>
            <name>dfs.namenode.http-address.supercluster.nn1</name>
            <value>hadoop01:50070</value>
        </property>
        <property>
            <name>dfs.namenode.http-address.supercluster.nn2</name>
            <value>hadoop02:50070</value>
        </property>
        <property>
            <name>dfs.namenode.shared.edits.dir</name>
            <value>qjournal://hadoop02:8485;hadoop03:8485;hadoop04:8485/supercluster</value>
        </property>
        <property>
            <name>dfs.journalnode.edits.dir</name>
            <value>/opt/modules/hadoop/data/journaldata</value>
        </property>
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>/opt/modules/hadoop/data/namenode</value>
        </property>
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>/opt/modules/hadoop/data/datanode</value>
        </property>
        <property>
            <name>dfs.ha.automatic-failover.enabled</name>
            <value>true</value>
        </property>
        <property>
            <name>dfs.client.failover.proxy.provider.supercluster</name>
            <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
        </property>
        <property>
            <name>dfs.ha.fencing.methods</name>
            <value>sshfence</value>
        </property>
        <property>
            <name>dfs.ha.fencing.ssh.private-key-files</name>
            <value>/home/hadoop/.ssh/id_rsa</value>
        </property>
        <property>
            <name>dfs.ha.fencing.ssh.connect-timeout</name>
            <value>30000</value>
        </property>
    </configuration>
    
  • mapred-site.xml

    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
        <property>
            <name>mapreduce.jobhistory.address</name>
            <value>hadoop01:10020</value>
        </property>
        <property>
            <name>mapreduce.jobhistory.webapp.address</name>
            <value>hadoop01:19888</value>
        </property>
    </configuration>
    
  • yarn-site.xml

    <configuration>
        <property>
            <name>yarn.resourcemanager.ha.enabled</name>
            <value>true</value>
        </property>
        <property>
            <name>yarn.resourcemanager.cluster-id</name>
            <value>yarncluster</value>
        </property>
        <property>
            <name>yarn.resourcemanager.ha.rm-ids</name>
            <value>rm1,rm2</value>
        </property>
        <property>
            <name>yarn.resourcemanager.hostname.rm1</name>
            <value>hadoop01</value>
        </property>
        <property>
            <name>yarn.resourcemanager.hostname.rm2</name>
            <value>hadoop02</value>
        </property>
        <property>
            <name>yarn.resourcemanager.zk-address</name>
            <value>hadoop02:2181,hadoop03:2181,hadoop04:2181</value>
        </property>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <property>
            <name>yarn.log-aggregation-enable</name>
            <value>true</value>
        </property>
        <property>
            <name>yarn.log.server.url</name>
            <value>http://hadoop01:19888/jobhistory/logs</value>
        </property>
        <property>
            <name>yarn.resourcemanager.address.rm1</name>
            <value>hadoop01:8032</value>
        </property>
        <property>
            <name>yarn.resourcemanager.scheduler.address.rm1</name>
            <value>hadoop01:8030</value>
        </property>
        <property>
            <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
            <value>hadoop01:8031</value>
        </property>
        <property>
            <name>yarn.resourcemanager.address.rm2</name>
            <value>hadoop02:8032</value>
        </property>
        <property>
            <name>yarn.resourcemanager.scheduler.address.rm2</name>
            <value>hadoop02:8030</value>
        </property>
        <property>
            <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
            <value>hadoop02:8031</value>
        </property>
    </configuration>
    
4. Start the HA Cluster
  • First stop the existing cluster, then start [journalnode] on all three JournalNode hosts (hadoop02-04)

    hdfs --daemon start journalnode
    
  • Start the [namenode] on the original node (hadoop01)

    hdfs --daemon start namenode
    
  • On the new [namenode] (hadoop02), pull the cluster image files

    hdfs namenode -bootstrapStandby
    
  • Stop the original cluster's [namenode]

    hdfs --daemon stop namenode
    
  • Format the ZKFC state in Zookeeper

    hdfs zkfc -formatZK
    
  • Start the HA cluster

    start-dfs.sh
    start-yarn.sh
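
    After startup, [jps] on each host should match the HA cluster plan table (a sketch):

    # hadoop01: NameNode, DataNode, ResourceManager, NodeManager, DFSZKFailoverController
    # hadoop02: NameNode, DataNode, JournalNode, ResourceManager, NodeManager, QuorumPeerMain, DFSZKFailoverController
    # hadoop03 / hadoop04: DataNode, JournalNode, NodeManager, QuorumPeerMain
    jps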
    
5. Test HA Automatic Failover

[kill] the [active] [namenode] directly; if the [standby] [namenode] switches to [active], automatic failover is working (see the sketch below)
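
A hedged test sequence (nn1 and nn2 are the NameNode IDs configured in hdfs-site.xml above):

    hdfs haadmin -getServiceState nn1   # e.g. active
    hdfs haadmin -getServiceState nn2   # e.g. standby
    # on the active NameNode's host, kill the process:
    jps | grep NameNode
    kill -9 <NameNode-pid>
    # a few seconds later the standby should have taken over:
    hdfs haadmin -getServiceState nn2   # expect: active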

6. YARN History Server

The history server's configuration properties are already included above; to start it, run:

mapred --daemon start historyserver
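
To verify (a quick check):

    jps   # expect a JobHistoryServer process
    # web UI: http://hadoop01:19888 (per mapreduce.jobhistory.webapp.address above)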