Big Data Technology - Installation

I. System Preparation

Set up three virtual machines and install the operating system on each. The hosts, and the processes each one is expected to run once the services are installed, are listed in the tables below.

Hostname    IP               User    Distribution
master      192.168.1.151    root    CentOS Linux release 7.9.2009
tony        192.168.1.152    root    CentOS Linux release 7.9.2009
tim         192.168.1.153    root    CentOS Linux release 7.9.2009

Service      Process                     master   tony   tim
zookeeper    QuorumPeerMain              Y        Y      Y
hdfs         JournalNode                 Y        Y      Y
hdfs         NameNode                    Y        Y      N
hdfs         DataNode                    N        Y      Y
hdfs         DFSZKFailoverController     Y        Y      N
yarn         ResourceManager             Y        Y      N
yarn         NodeManager                 N        Y      Y

1. System Configuration

# Configure the hosts file (on every node)
vi /etc/hosts

192.168.1.151 master
192.168.1.152 tony
192.168.1.153 tim

----------------------------------------------------------

# Configure the firewall (on every node)
firewall-cmd --zone=public --add-port=9820/tcp --permanent
firewall-cmd --zone=public --add-port=9870/tcp --permanent
firewall-cmd --zone=public --add-port=8032/tcp --permanent
firewall-cmd --zone=public --add-port=8088/tcp --permanent
firewall-cmd --zone=public --add-port=8034/tcp --permanent
firewall-cmd --zone=public --add-port=8031/tcp --permanent
firewall-cmd --zone=public --add-port=2181/tcp --permanent
firewall-cmd --zone=public --add-port=7000/tcp --permanent
firewall-cmd --zone=public --add-port=2888/tcp --permanent
firewall-cmd --zone=public --add-port=3888/tcp --permanent
firewall-cmd --reload

# Note: 8031 is the YARN ResourceManager resource-tracker port
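
The HA setup below also relies on ports that are not in the list above: the JournalNode RPC port (8485, used by the qjournal:// shared edits URI) and the DataNode ports. If firewalld stays enabled, a sketch of the extra rules, assuming the Hadoop 3.x default ports:

firewall-cmd --zone=public --add-port=8485/tcp --permanent   # JournalNode RPC
firewall-cmd --zone=public --add-port=8480/tcp --permanent   # JournalNode HTTP
firewall-cmd --zone=public --add-port=9866/tcp --permanent   # DataNode data transfer
firewall-cmd --zone=public --add-port=9867/tcp --permanent   # DataNode IPC
firewall-cmd --zone=public --add-port=9864/tcp --permanent   # DataNode HTTP
firewall-cmd --reload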
----------------------------------------------------------

# Install ntp
yum install -y ntp

# Install psmisc (provides the fuser command needed by sshfence)
yum install -y psmisc

----------------------------------------------------------

# Set up passwordless login (generate a key on each node, then copy it to all three hosts)
ssh-keygen -t rsa
ssh-copy-id -i ~/.ssh/id_rsa.pub root@master
ssh-copy-id -i ~/.ssh/id_rsa.pub root@tony
ssh-copy-id -i ~/.ssh/id_rsa.pub root@tim
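
A quick check that passwordless login works from the current node (each command should print the remote hostname without prompting for a password):

for h in master tony tim; do ssh root@$h hostname; done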

# Allow root login over SSH
vi /etc/ssh/sshd_config

PermitRootLogin yes
systemctl restart sshd

------------------------------------------------------------

2. Software Installation

Software     Package                              Install location
JDK          jdk-11.0.12_linux-x64_bin.tar.gz     /opt/jdk-11.0.12
Zookeeper    apache-zookeeper-3.6.3-bin.tar.gz    /opt/zookeeper3
Hadoop       hadoop-3.3.3.tar.gz                  /opt/hadoop-3.3.3
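
A minimal sketch of unpacking the archives into the locations above, assuming the tarballs were downloaded to /root (the extracted directory names are the upstream defaults); the same /opt layout is needed on all three nodes:

tar -zxvf /root/jdk-11.0.12_linux-x64_bin.tar.gz -C /opt/
tar -zxvf /root/hadoop-3.3.3.tar.gz -C /opt/
tar -zxvf /root/apache-zookeeper-3.6.3-bin.tar.gz -C /opt/
mv /opt/apache-zookeeper-3.6.3-bin /opt/zookeeper3

# Distribute the same directories to the other nodes (after passwordless SSH is set up)
scp -r /opt/jdk-11.0.12 /opt/zookeeper3 /opt/hadoop-3.3.3 root@tony:/opt/
scp -r /opt/jdk-11.0.12 /opt/zookeeper3 /opt/hadoop-3.3.3 root@tim:/opt/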

2.1 Configure Environment Variables

# Configure environment variables (on every node)
vi /etc/profile

export JAVA_HOME=/opt/jdk-11.0.12
export CLASSPATH=.:${JAVA_HOME}/lib
export HADOOP_HOME=/opt/hadoop-3.3.3
export ZOOKEEPER_HOME=/opt/zookeeper3
export PATH=$PATH:${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${ZOOKEEPER_HOME}/bin
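
Reload the profile and spot-check the versions (a quick sanity check):

source /etc/profile
java -version      # should report 11.0.12
hadoop version     # should report 3.3.3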

2.2 Install ZooKeeper

# Create the data and log directories (on every node)
mkdir /opt/zookeeper3/zkdata
mkdir /opt/zookeeper3/zklog

--------------------------------------------

# On master, write the id file
echo 1 > /opt/zookeeper3/zkdata/myid

# On tony, write the id file
echo 2 > /opt/zookeeper3/zkdata/myid

# On tim, write the id file
echo 3 > /opt/zookeeper3/zkdata/myid
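
Alternatively, a convenience sketch for doing all three from master (assuming passwordless SSH is already in place):

echo 1 > /opt/zookeeper3/zkdata/myid
ssh root@tony 'echo 2 > /opt/zookeeper3/zkdata/myid'
ssh root@tim 'echo 3 > /opt/zookeeper3/zkdata/myid'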

--------------------------------------------

# Create and edit the configuration file
cp /opt/zookeeper3/conf/zoo_sample.cfg /opt/zookeeper3/conf/zoo.cfg
vi /opt/zookeeper3/conf/zoo.cfg

dataDir=/opt/zookeeper3/zkdata
dataLogDir=/opt/zookeeper3/zklog
# clientPort must match the 2181 used in ha.zookeeper.quorum and yarn.resourcemanager.zk-address below
clientPort=2181
server.1=master:2888:3888
server.2=tony:2888:3888
server.3=tim:2888:3888

Start ZooKeeper: zkServer.sh start

Check its status: zkServer.sh status

Two of the servers should show Mode: follower and one should show Mode: leader.
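
Optionally, confirm that a client can connect to the ensemble with the bundled CLI (any of the three hosts will do):

zkCli.sh -server master:2181
# inside the shell
ls /
quit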

2.3 Install Hadoop

# Configure core-site.xml

vi /opt/hadoop-3.3.3/etc/hadoop/core-site.xml

<configuration>
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>hadoop</value>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://mycluster</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop-3.3.3/data/tmp</value>
    </property>
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>master:2181,tony:2181,tim:2181</value>
    </property>
</configuration>
# Configure hdfs-site.xml

vi /opt/hadoop-3.3.3/etc/hadoop/hdfs-site.xml

<configuration>
    <property>
      <name>dfs.nameservices</name>
      <value>mycluster</value>
    </property>

    <property>
      <name>dfs.permissions.enabled</name>
      <value>false</value>
    </property>

    <property>
      <name>dfs.ha.namenodes.mycluster</name>
      <value>nn1,nn2</value>
    </property>

    <property>
      <name>dfs.namenode.rpc-address.mycluster.nn1</name>
      <value>master:9820</value>
    </property>

    <property>
      <name>dfs.namenode.http-address.mycluster.nn1</name>
      <value>master:9870</value>
    </property>

    <property>
      <name>dfs.namenode.rpc-address.mycluster.nn2</name>
      <value>tony:9820</value>
    </property>

    <property>
      <name>dfs.namenode.http-address.mycluster.nn2</name>
      <value>tony:9870</value>
    </property>

    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://master:8485;tony:8485;tim:8485/mycluster</value>
    </property>
    
    <property>
      <name>dfs.client.failover.proxy.provider.mycluster</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

    <property>
      <name>dfs.ha.fencing.methods</name>
      <value>sshfence</value>
    </property>

    <property>
      <name>dfs.ha.fencing.ssh.private-key-files</name>
      <value>/root/.ssh/id_rsa</value>
    </property>

    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/opt/hadoop-3.3.3/data/hadoop_repo/journalnode</value>
    </property>

    <property>
       <name>dfs.ha.automatic-failover.enabled</name>
       <value>true</value>
     </property>

    <!-- IPC client retry settings (these normally belong in core-site.xml) to avoid
         ConnectException when connecting to the JournalNode service -->
    <property>
        <name>ipc.client.connect.max.retries</name>
        <value>100</value>
    </property>
    <property>
        <name>ipc.client.connect.retry.interval</name>
        <value>10000</value>
    </property>

</configuration>
# Configure the workers file (the DataNode/NodeManager hosts)
vi /opt/hadoop-3.3.3/etc/hadoop/workers

tony
tim
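
If the daemons later fail with a message like "JAVA_HOME is not set and could not be found" (the Hadoop scripts do not always pick it up from /etc/profile over SSH), set it explicitly in hadoop-env.sh:

vi /opt/hadoop-3.3.3/etc/hadoop/hadoop-env.sh

export JAVA_HOME=/opt/jdk-11.0.12
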
# On master, edit start-dfs.sh and stop-dfs.sh (add the following near the top)
vi /opt/hadoop-3.3.3/sbin/start-dfs.sh

HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=root
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
HDFS_JOURNALNODE_USER=root
HDFS_ZKFC_USER=root

vi /opt/hadoop-3.3.3/sbin/stop-dfs.sh

HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=root
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
HDFS_JOURNALNODE_USER=root
HDFS_ZKFC_USER=root
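
So far core-site.xml, hdfs-site.xml and workers have only been edited on master; tony and tim need the same files before anything is formatted or started. A sketch, following the same scp pattern used later for the YARN files:

cd /opt/hadoop-3.3.3/etc/hadoop
scp core-site.xml hdfs-site.xml workers hadoop-env.sh root@tony:/opt/hadoop-3.3.3/etc/hadoop/
scp core-site.xml hdfs-site.xml workers hadoop-env.sh root@tim:/opt/hadoop-3.3.3/etc/hadoop/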

NameNode formatting

        On master, tony and tim, run: hdfs --daemon start journalnode

        On master, run: hdfs namenode -format

        On master, start the NameNode in the foreground so the standby can copy its metadata: hdfs namenode

        On tony (the second NameNode; tim does not run one), run: hdfs namenode -bootstrapStandby

        On master, initialize the HA state in ZooKeeper for automatic failover: hdfs zkfc -formatZK

        On master, stop the foreground NameNode with Ctrl+C

        On master, tony and tim, run: hdfs --daemon stop journalnode

Start the Hadoop services

        On master, run: start-dfs.sh (it lives in /opt/hadoop-3.3.3/sbin)
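
A quick check that HDFS HA came up as expected; one NameNode should be active and the other standby, and the NameNode web UIs are on port 9870 as configured above:

hdfs haadmin -getServiceState nn1    # active or standby
hdfs haadmin -getServiceState nn2
hdfs dfsadmin -report                # should list the DataNodes on tony and tim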

2.4 Install YARN

# Configure mapred-site.xml
vi /opt/hadoop-3.3.3/etc/hadoop/mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

    <property>
        <name>mapreduce.application.classpath</name>
        <value>
            /opt/hadoop-3.3.3/etc/hadoop,
            /opt/hadoop-3.3.3/share/hadoop/common/*,
            /opt/hadoop-3.3.3/share/hadoop/common/lib/*,
            /opt/hadoop-3.3.3/share/hadoop/hdfs/*,
            /opt/hadoop-3.3.3/share/hadoop/hdfs/lib/*,
            /opt/hadoop-3.3.3/share/hadoop/mapreduce/*,
            /opt/hadoop-3.3.3/share/hadoop/mapreduce/lib/*,
            /opt/hadoop-3.3.3/share/hadoop/yarn/*,
            /opt/hadoop-3.3.3/share/hadoop/yarn/lib/*
        </value>
    </property>
</configuration>
# Configure yarn-site.xml
vi /opt/hadoop-3.3.3/etc/hadoop/yarn-site.xml

<configuration>

    <property>
        <name>yarn.resourcemanager.connect.retry-interval.ms</name>
        <value>2000</value>
    </property>

    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>

    <property>
        <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>

    <property>
        <name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
        <value>true</value>
    </property>

    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>yarn-rm-cluster</value>
    </property>

    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>

    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>master</value>
    </property>

    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>tony</value>
    </property>

    <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
    </property>

    <property>
       <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
     </property>

    <property>
        <name>yarn.resourcemanager.zk.state-store.address</name>
        <value>master:2181,tony:2181,tim:2181</value>
    </property>

    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>master:2181,tony:2181,tim:2181</value>
    </property>

    <property>
        <name>yarn.resourcemanager.address.rm1</name>
        <value>master:8032</value>
    </property>

    <property>
        <name>yarn.resourcemanager.scheduler.address.rm1</name>
        <value>master:8034</value>
    </property>

    <property>
        <name>yarn.resourcemanager.webapp.address.rm1</name>
        <value>master:8088</value>
    </property>

    <property>
        <name>yarn.resourcemanager.address.rm2</name>
        <value>tony:8032</value>
    </property>

    <property>
        <name>yarn.resourcemanager.scheduler.address.rm2</name>
        <value>tony:8034</value>
    </property>

    <property>
        <name>yarn.resourcemanager.webapp.address.rm2</name>
        <value>tony:8088</value>
    </property>

    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>

</configuration>

Copy the two configuration files to tony and tim

scp mapred-site.xml yarn-site.xml root@tony:/opt/hadoop-3.3.3/etc/hadoop/

scp mapred-site.xml yarn-site.xml root@tim:/opt/hadoop-3.3.3/etc/hadoop/

Start a ResourceManager on the master and tony nodes

        yarn --daemon start resourcemanager

Start a NodeManager on the tony and tim nodes

        yarn --daemon start nodemanager

Check the ResourceManager state from master

        yarn rmadmin -getServiceState rm1    # should be active

        yarn rmadmin -getServiceState rm2    # should be standby

Web UI: http://master:8088/cluster
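
To confirm that YARN can run jobs end to end, submit one of the bundled MapReduce examples (a sanity check; the pi estimate itself does not matter):

hadoop jar /opt/hadoop-3.3.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.3.jar pi 2 10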

Check the processes on each server with jps

master:
    19009 NameNode
    10483 ResourceManager
    14292 DFSZKFailoverController
    20809 QuorumPeerMain
    28651 JournalNode

tony:
    20320 QuorumPeerMain
    32688 NameNode
    32054 NodeManager
    23784 JournalNode
    23673 DataNode
    23881 DFSZKFailoverController
    26621 ResourceManager

tim:
    25697 JournalNode
    2919 NodeManager
    24490 QuorumPeerMain
    25596 DataNode

3. Troubleshooting Notes

If you are converting an existing non-HA cluster to HA, initialize the shared edits directory:
hdfs namenode -initializeSharedEdits

If starting the NameNode fails, check its logs; the NameNode hosts need psmisc (for the fuser command used by sshfence):
yum install -y psmisc
