Hadoop 3.2.2 Cluster Setup

Note: this article sets up a Hadoop 3.2.2 cluster using HDFS High Availability with the Quorum Journal Manager (QJM).

Environment

  • CentOS 7.5.1804, JDK 1.8.0_181, ZooKeeper 3.6.2, Hadoop 3.2.2
  • Five CentOS 7 virtual machines, as follows:

hostname   IP               roles
node-1     192.168.56.129   QuorumPeerMain, NameNode, JournalNode, DFSZKFailoverController, ResourceManager
node-2     192.168.56.130   QuorumPeerMain, NameNode, JournalNode, DFSZKFailoverController, ResourceManager
node-3     192.168.56.131   QuorumPeerMain, NameNode, JournalNode, DFSZKFailoverController, DataNode, NodeManager, JobHistoryServer
node-4     192.168.56.132   DataNode, NodeManager
node-5     192.168.56.133   DataNode, NodeManager
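Every configuration file below refers to the VMs by hostname (node-1 … node-5), so each VM must be able to resolve those names. The original does not say how; a common approach, assumed here, is to add the table above to /etc/hosts on every VM. A small sketch that prints those entries and reports which hostnames a hosts file is still missing (the variable and function names are hypothetical):

```shell
#!/usr/bin/env bash
# Sketch, assuming name resolution through /etc/hosts (not shown in the original).
# IP/hostname pairs copied from the table above.
CLUSTER_HOSTS='192.168.56.129 node-1
192.168.56.130 node-2
192.168.56.131 node-3
192.168.56.132 node-4
192.168.56.133 node-5'

# Print the lines to append to /etc/hosts on every VM.
print_hosts_entries() {
  printf '%s\n' "$CLUSTER_HOSTS"
}

# Print each cluster hostname that has no entry in the given hosts file
# (default /etc/hosts); prints nothing when all five are present.
missing_hosts() {
  local file=${1:-/etc/hosts} name
  while read -r _ name; do
    # -w matches whole words only, so "node-1" does not match "node-10"
    grep -qw -- "$name" "$file" || echo "$name"
  done <<< "$CLUSTER_HOSTS"
}
```

Usage on each VM (as root): `print_hosts_entries >> /etc/hosts`, then `missing_hosts` should print nothing.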

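Note: disable the firewall before installing by running the following commands:

systemctl stop firewalld      # stop firewalld now
systemctl disable firewalld   # keep it from starting at boot
systemctl status firewalld    # confirm it is inactive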

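  • Install the JDK (installation steps omitted).

  • Configure passwordless SSH login between the five VMs (details omitted).

    The following commands can be used to set up passwordless login:
    ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
    ssh-copy-id -p <port> -i ~/.ssh/id_rsa.pub "<user>@<host>"   # fill in the port, user and host for your environment
    chmod 0600 ~/.ssh/authorized_keys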
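Since start-dfs.sh, start-yarn.sh and sshfence all log in to other nodes over SSH, it is convenient to push the key to every VM in one go. A minimal sketch, assuming the same user and SSH port on all five VMs; it only prints the ssh-copy-id commands (the function name is hypothetical), so you can review them before running:

```shell
#!/usr/bin/env bash
# Sketch: print one ssh-copy-id command per cluster node (this VM included).
# Assumptions: same user and SSH port on every VM, hostnames already resolvable.
NODES=(node-1 node-2 node-3 node-4 node-5)

copy_key_cmds() {
  local user=${1:?usage: copy_key_cmds USER [PORT]} port=${2:-22} host
  for host in "${NODES[@]}"; do
    # "~" stays literal here and is expanded by the shell that runs the command
    printf 'ssh-copy-id -p %s -i ~/.ssh/id_rsa.pub %s@%s\n' "$port" "$user" "$host"
  done
}
```

For example, `copy_key_cmds root | bash` runs the five commands; repeat it on each VM that needs to open SSH sessions to the others.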

Setting up the ZooKeeper cluster

  • Upload ZooKeeper to the VMs, extract it, and edit conf/zoo.cfg as follows:

    tickTime=3000
    initLimit=10
    syncLimit=5
    dataDir=/opt/env/zookeeper-3.6.2/data/data
    dataLogDir=/opt/env/zookeeper-3.6.2/data/logs
    clientPort=2181
    autopurge.snapRetainCount=3
    autopurge.purgeInterval=1
    server.1=node-1:2888:3888
    server.2=node-2:2888:3888
    server.3=node-3:2888:3888

In the directory set by dataDir, create a file named myid containing this VM's id, i.e. the N of its server.N line above. Each VM stores only its own id.
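Looking the id up from zoo.cfg avoids typing the wrong number on a VM. A hedged sketch (hypothetical function names) that finds the server.N line whose host matches this VM's hostname and writes N to $dataDir/myid; it assumes hostnames without dots, as in this cluster, and is run from the ZooKeeper installation directory:

```shell
#!/usr/bin/env bash
# Print N from the "server.N=HOST:2888:3888" line in zoo.cfg ($1) whose HOST equals $2.
# Splitting on ".", "=" and ":" gives: server | N | HOST | 2888 | 3888.
zk_id_for_host() {
  local cfg=$1 host=$2
  awk -F'[.=:]' -v h="$host" '$1 == "server" && $3 == h { print $2 }' "$cfg"
}

# Write this VM's id into $dataDir/myid (dataDir is read from the same zoo.cfg).
write_myid() {
  local cfg=${1:-conf/zoo.cfg} host=${2:-$(hostname -s)} data_dir id
  data_dir=$(awk -F= '$1 == "dataDir" { print $2 }' "$cfg")
  id=$(zk_id_for_host "$cfg" "$host")
  if [ -z "$data_dir" ] || [ -z "$id" ]; then
    echo "no dataDir or server.N entry for $host in $cfg" >&2
    return 1
  fi
  mkdir -p "$data_dir" && echo "$id" > "$data_dir/myid"
}
```

On node-2, for example, `write_myid` should leave `2` in /opt/env/zookeeper-3.6.2/data/data/myid.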

Start ZooKeeper on each of the three nodes: bin/zkServer.sh start.
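Once all three servers are up, `bin/zkServer.sh status` should report `Mode: leader` on one node and `Mode: follower` on the other two. A sketch that checks this over SSH; the install path /opt/env/zookeeper-3.6.2 is inferred from dataDir in zoo.cfg, and it assumes passwordless SSH and a `java` reachable from non-interactive SSH sessions (function names are hypothetical):

```shell
#!/usr/bin/env bash
ZK_HOME=/opt/env/zookeeper-3.6.2   # assumption: inferred from dataDir in zoo.cfg

# Read "zkServer.sh status" output on stdin and print the mode (leader/follower/standalone).
zk_mode() {
  sed -n 's/^Mode: //p'
}

# Print each node's mode; succeed only if exactly one node is the leader.
check_ensemble() {
  local host mode leaders=0
  for host in node-1 node-2 node-3; do
    mode=$(ssh "$host" "$ZK_HOME/bin/zkServer.sh status" 2>&1 | zk_mode)
    echo "$host: ${mode:-unknown}"
    [ "$mode" = leader ] && leaders=$((leaders + 1))
  done
  [ "$leaders" -eq 1 ]
}
```

An "unknown" mode usually means that node's server is not running or cannot reach the others; its ZooKeeper log is the place to look.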

Setting up Hadoop

  • Edit the configuration files under …/hadoop-3.2.2/etc/hadoop:

  • hdfs-site.xml:

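    <configuration>
      <property><name>dfs.nameservices</name><value>vmcluster</value></property>
      <property><name>dfs.ha.namenodes.vmcluster</name><value>nn1,nn2,nn3</value></property>
      <property><name>dfs.namenode.rpc-address.vmcluster.nn1</name><value>node-1:8020</value></property>
      <property><name>dfs.namenode.rpc-address.vmcluster.nn2</name><value>node-2:8020</value></property>
      <property><name>dfs.namenode.rpc-address.vmcluster.nn3</name><value>node-3:8020</value></property>
      <property><name>dfs.namenode.http-address.vmcluster.nn1</name><value>node-1:9870</value></property>
      <property><name>dfs.namenode.http-address.vmcluster.nn2</name><value>node-2:9870</value></property>
      <property><name>dfs.namenode.http-address.vmcluster.nn3</name><value>node-3:9870</value></property>
      <property><name>dfs.namenode.shared.edits.dir</name><value>qjournal://node-1:8485;node-2:8485;node-3:8485/vmcluster</value></property>
      <property><name>dfs.client.failover.proxy.provider.vmcluster</name><value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value></property>
      <property><name>dfs.ha.fencing.methods</name><value>sshfence</value></property>
      <property><name>dfs.ha.fencing.ssh.private-key-files</name><value>/home/bigdata/.ssh/id_rsa</value></property>
      <property><name>dfs.ha.fencing.ssh.connect-timeout</name><value>30000</value></property>
      <property><name>dfs.namenode.handler.count</name><value>100</value></property>
      <property><name>dfs.safemode.threshold.pct</name><value>1</value></property>
      <property><name>dfs.journalnode.edits.dir</name><value>/opt/env/hadoop-3.2.2/data/jn</value></property>
      <property><name>dfs.ha.automatic-failover.enabled</name><value>true</value></property>
      <property><name>dfs.namenode.name.dir</name><value>file://${hadoop.tmp.dir}/dfs/nn</value></property>
      <property><name>dfs.datanode.data.dir</name><value>file://${hadoop.tmp.dir}/dfs/dn</value></property>
      <property><name>dfs.replication</name><value>3</value></property>
      <property><name>dfs.permissions.enabled</name><value>false</value></property>
      <property><name>dfs.blocksize</name><value>67108864</value></property> <!-- 64 MB -->
    </configuration>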
  • core-site.xml:

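    <configuration>
      <property><name>fs.defaultFS</name><value>hdfs://vmcluster</value></property>
      <property><name>ha.zookeeper.quorum</name><value>node-1:2181,node-2:2181,node-3:2181</value></property>
      <property><name>hadoop.http.staticuser.user</name><value>bigdata</value></property>
      <property><name>hadoop.tmp.dir</name><value>/opt/env/hadoop-3.2.2/data</value></property>
    </configuration>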
  • yarn-site.xml:

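    <configuration>
      <property><name>yarn.resourcemanager.ha.enabled</name><value>true</value></property>
      <property><name>yarn.resourcemanager.cluster-id</name><value>yarnCluster</value></property>
      <property><name>yarn.resourcemanager.ha.automatic-failover.enabled</name><value>true</value></property>
      <property><name>yarn.resourcemanager.ha.automatic-failover.embedded</name><value>true</value></property>
      <property><name>yarn.resourcemanager.connect.retry-interval.ms</name><value>2000</value></property>
      <property><name>yarn.resourcemanager.ha.rm-ids</name><value>rm1,rm2</value></property>
      <property><name>yarn.resourcemanager.hostname.rm1</name><value>node-1</value></property>
      <property><name>yarn.resourcemanager.hostname.rm2</name><value>node-2</value></property>
      <property><name>yarn.resourcemanager.webapp.address.rm1</name><value>node-1:8088</value></property>
      <property><name>yarn.resourcemanager.webapp.address.rm2</name><value>node-2:8088</value></property>
      <property><name>yarn.resourcemanager.address.rm1</name><value>node-1:8032</value></property>
      <property><name>yarn.resourcemanager.address.rm2</name><value>node-2:8032</value></property>
      <property><name>yarn.resourcemanager.scheduler.address.rm1</name><value>node-1:8030</value></property>
      <property><name>yarn.resourcemanager.scheduler.address.rm2</name><value>node-2:8030</value></property>
      <property><name>yarn.resourcemanager.zk-address</name><value>node-1:2181,node-2:2181,node-3:2181</value></property>
      <property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
      <property><name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name><value>org.apache.hadoop.mapred.ShuffleHandler</value></property>
      <property><name>yarn.nodemanager.env-whitelist</name><value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value></property>
      <property><name>yarn.log-aggregation-enable</name><value>true</value></property>
      <property><name>yarn.log.server.url</name><value>http://node-3:19888/jobhistory/logs</value></property>
      <property><name>yarn.log-aggregation.retain-seconds</name><value>604800</value></property> <!-- 7 days -->
      <property><name>yarn.resourcemanager.scheduler.class</name><value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value></property>
      <property><name>yarn.resourcemanager.recovery.enabled</name><value>true</value></property>
      <property><name>yarn.resourcemanager.store.class</name><value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value></property>
      <property><name>yarn.resourcemanager.zk.state-store.address</name><value>node-1:2181,node-2:2181,node-3:2181</value></property>
      <property><name>yarn.application.classpath</name><value>$HADOOP_CONF_DIR, $HADOOP_COMMON_HOME/share/hadoop/common/*, $HADOOP_COMMON_HOME/share/hadoop/common/lib/*, $HADOOP_HDFS_HOME/share/hadoop/hdfs/*, $HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*, $HADOOP_YARN_HOME/share/hadoop/yarn/*, $HADOOP_YARN_HOME/share/hadoop/yarn/lib/*</value></property>
      <property><name>yarn.nodemanager.pmem-check-enabled</name><value>false</value></property>
      <property><name>yarn.nodemanager.vmem-check-enabled</name><value>false</value></property>
    </configuration>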
  • mapred-site.xml:

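    <configuration>
      <property><name>mapreduce.framework.name</name><value>yarn</value></property>
      <property><name>mapreduce.jobhistory.address</name><value>node-3:10020</value></property>
      <property><name>mapreduce.jobhistory.webapp.address</name><value>node-3:19888</value></property>
      <property><name>yarn.app.mapreduce.am.env</name><value>HADOOP_MAPRED_HOME=/opt/env/hadoop-3.2.2</value></property>
      <property><name>mapreduce.map.env</name><value>HADOOP_MAPRED_HOME=/opt/env/hadoop-3.2.2</value></property>
      <property><name>mapreduce.reduce.env</name><value>HADOOP_MAPRED_HOME=/opt/env/hadoop-3.2.2</value></property>
      <property><name>mapreduce.application.classpath</name><value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value></property>
    </configuration>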
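Typing these `<property>` elements by hand is easy to get wrong. One option, an assumption rather than the author's workflow, is to keep each file's settings as `name=value` lines and render the XML with a small helper. Only the first `=` separates name from value, so values such as `HADOOP_MAPRED_HOME=/opt/env/hadoop-3.2.2` survive intact, and `&`, `<`, `>` are escaped:

```shell
#!/usr/bin/env bash
# Sketch: render "name=value" lines from stdin as a Hadoop *-site.xml file.
# Blank lines and lines starting with "#" are skipped.
to_hadoop_xml() {
  local line name value
  echo '<?xml version="1.0" encoding="UTF-8"?>'
  echo '<configuration>'
  while IFS= read -r line || [ -n "$line" ]; do
    [[ -z $line || $line == \#* ]] && continue
    name=${line%%=*}    # text before the first "="
    value=${line#*=}    # everything after the first "="
    # escape XML special characters ("&" first so the others are not double-escaped)
    value=$(printf '%s' "$value" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g')
    printf '  <property><name>%s</name><value>%s</value></property>\n' "$name" "$value"
  done
  echo '</configuration>'
}
```

For example, `to_hadoop_xml < core-site.props > core-site.xml`, where core-site.props is a hypothetical file holding lines like `fs.defaultFS=hdfs://vmcluster`.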

The workers file lists the DataNode/NodeManager hosts:

    node-3
    node-4
    node-5
  • hadoop-env.sh:

    export JAVA_HOME=${JAVA_HOME}   # if daemons started over SSH report that JAVA_HOME is not set, use the absolute JDK path here
    export HDFS_NAMENODE_USER=root
    export HDFS_DATANODE_USER=root
    export HDFS_JOURNALNODE_USER=root
    export HDFS_ZKFC_USER=root
    export YARN_RESOURCEMANAGER_USER=root
    export YARN_NODEMANAGER_USER=root
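All five VMs need identical copies of these files. A hedged sketch for copying the edited configuration directory from the VM where you made the changes to the other four; it assumes Hadoop lives at /opt/env/hadoop-3.2.2 on every VM (the path used in the configs above) and that passwordless SSH works. It prints `scp` commands (hypothetical function name) rather than running them:

```shell
#!/usr/bin/env bash
HADOOP_CONF=/opt/env/hadoop-3.2.2/etc/hadoop   # assumption: same path on every VM

# Print one scp command per target VM, skipping the VM given as $1 (the source).
sync_conf_cmds() {
  local self=${1:?usage: sync_conf_cmds THIS_HOST} host
  for host in node-1 node-2 node-3 node-4 node-5; do
    [ "$host" = "$self" ] && continue
    # "*" is left unexpanded here and expands on the shell that runs the command
    printf 'scp -r %s/* %s:%s/\n' "$HADOOP_CONF" "$host" "$HADOOP_CONF"
  done
}
```

Run on node-1, `sync_conf_cmds node-1 | bash` copies the directory to node-2 through node-5.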

Initialization and startup

Once all five VMs are configured, run the following commands in order:

->  ${HADOOP_HOME}/bin/hdfs --daemon start journalnode  # run on every JournalNode host (node-1, node-2, node-3)
Afterwards, check the JournalNode log in $HADOOP_HOME/logs to confirm it started normally.


->  ${HADOOP_HOME}/bin/hdfs namenode -format  # format; run on one NameNode VM only
->  ${HADOOP_HOME}/bin/hdfs --daemon start namenode  # start the NameNode on that same VM
Afterwards, check the NameNode log in $HADOOP_HOME/logs to confirm it started normally.


->  ${HADOOP_HOME}/bin/hdfs namenode -bootstrapStandby  # copy the formatted metadata from the first NameNode
Run this on each of the remaining NameNode VMs.

->  ${HADOOP_HOME}/bin/hdfs zkfc -formatZK  # initialize the ZKFC state; run on one NameNode VM only
This creates a znode in ZooKeeper in which the automatic failover system stores its data.


->  ${HADOOP_HOME}/sbin/stop-dfs.sh

->  ${HADOOP_HOME}/sbin/start-dfs.sh

->  ${HADOOP_HOME}/sbin/start-yarn.sh

->  ${HADOOP_HOME}/bin/mapred --daemon start historyserver
Run this on the VM configured as the history server (node-3).
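After start-dfs.sh and start-yarn.sh, exactly one of nn1/nn2/nn3 and one of rm1/rm2 should be active, with the rest on standby. A minimal sketch using `hdfs haadmin -getServiceState` and `yarn rmadmin -getServiceState`, assuming it runs on a cluster node with HADOOP_HOME set; the service ids come from hdfs-site.xml and yarn-site.xml above, and the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Count lines that are exactly "active" on stdin (prints 0 when there are none).
count_active() {
  grep -cx 'active' || true
}

# Query every NameNode and ResourceManager; succeed only if one of each is active.
check_ha() {
  local id hdfs_active yarn_active
  hdfs_active=$(for id in nn1 nn2 nn3; do "$HADOOP_HOME/bin/hdfs" haadmin -getServiceState "$id"; done 2>/dev/null | count_active)
  yarn_active=$(for id in rm1 rm2; do "$HADOOP_HOME/bin/yarn" rmadmin -getServiceState "$id"; done 2>/dev/null | count_active)
  echo "active NameNodes: $hdfs_active, active ResourceManagers: $yarn_active"
  [ "$hdfs_active" -eq 1 ] && [ "$yarn_active" -eq 1 ]
}
```

Running `check_ha` again after stopping the active NameNode is a quick way to see whether automatic failover promoted a standby.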

Run one of the bundled MapReduce examples as a test:

hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.2.jar wordcount /input/README.txt /output/
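The command above expects /input/README.txt to already exist in HDFS and /output not to exist (the job refuses to overwrite an existing output directory). A sketch of the surrounding steps, assuming it is run from $HADOOP_HOME (the jar path above is relative) and uses the README.txt shipped in the Hadoop distribution; function names are hypothetical:

```shell
#!/usr/bin/env bash
# Upload the input file and remove any previous output before running wordcount.
prepare_wordcount() {
  bin/hdfs dfs -mkdir -p /input
  bin/hdfs dfs -put -f README.txt /input/
  bin/hdfs dfs -rm -r -f /output
}

# Read wordcount results ("word<TAB>count" lines) on stdin and print the N most
# frequent words (default 10), ties broken alphabetically.
top_words() {
  sort -t "$(printf '\t')" -k2,2nr -k1,1 | head -n "${1:-10}"
}
```

After the job finishes, `bin/hdfs dfs -cat '/output/part-r-*' | top_words 5` shows the five most frequent words.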

Then check the finished job in the history server web UI at http://node-3:19888.

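Note: a minimal CentOS 7 installation (the default choice) does not include the psmisc package. Without it, Hadoop HA cannot fail over correctly, because the sshfence method configured in hdfs-site.xml uses fuser (part of psmisc) to kill the previously active NameNode. Install it with yum install psmisc -y, then restart.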

For reference, psmisc provides pstree, killall and fuser:

pstree: displays processes as a tree;

killall: kills processes by name;

fuser: shows which processes are using the specified files, file systems or sockets.
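Before testing failover it is worth confirming the psmisc tools are present on every NameNode host. A small sketch (hypothetical function name) that reports any missing command and returns non-zero:

```shell
#!/usr/bin/env bash
# Report each command in "$@" that is not available; return 1 if any are missing.
require_cmds() {
  local cmd missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, on node-1, node-2 and node-3: `require_cmds fuser killall pstree || yum install psmisc -y`.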
