Hadoop 3.2.0, ZooKeeper 3.6.3, and Spark 3.2.0 Installation

  • 1. Install the JDK

1.1 Download JDK 1.8 (jdk-8u311) and copy it to /usr/local

1.2 Extract the archive: tar -zvxf jdk-8u311-linux-x64.tar.gz

1.3 Create /usr/local/java and move the extracted JDK folder into it

cd /usr/local

mkdir java

mv  jdk1.8.0_311/  java/

1.4 Update the system environment variables

vi /etc/profile   // open the file and append the lines below to the end of profile, then run source /etc/profile to apply them immediately

export JAVA_HOME=/usr/local/java/jdk1.8.0_311

export JRE_HOME=/usr/local/java/jdk1.8.0_311/jre

export CLASSPATH=$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/jre/lib/rt.jar:$JRE_HOME/lib

export PATH=$JAVA_HOME/bin:$PATH

1.5 Verify that Java is installed correctly
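A quick check, for example (the exact build string may differ):

java -version   // should report java version "1.8.0_311"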

  • 2. Set up passwordless SSH login

rpm -qa | grep ssh   // check whether SSH is installed; if packages are listed, it is already installed

If it is not installed, run:

yum -y install openssh openssh-server openssh-clients

2.1 Generate a key pair on the master machine

ssh-keygen -t rsa   // press Enter at each prompt

cd  ~/.ssh

cat id_rsa.pub >>authorized_keys

chmod  600  authorized_keys

2.2 Configure SSH by editing /etc/ssh/sshd_config

vi /etc/ssh/sshd_config

Re-enable (uncomment) the following lines:

RSAAuthentication yes

PubkeyAuthentication yes

AuthorizedKeysFile      .ssh/authorized_keys   // authorized_keys matches the file name created above

After changing the configuration, restart SSH: service sshd restart

Then test passwordless login: ssh localhost

2.3 Copy the file to the slave machines

scp  authorized_keys  root@storeip62:/root/.ssh/   

scp  authorized_keys  root@storeip63:/root/.ssh/

Then repeat the /etc/ssh/sshd_config changes and the sshd restart on each slave; with that, the master can log in to the other slave nodes without a password.

2.4 Next, give the slave machines passwordless access to the master by repeating the steps below

ssh-keygen -t rsa   // press Enter at each prompt

cd  ~/.ssh

cat id_rsa.pub >> authorized_keys   // append Slave1's public key to the file copied over from the master

chmod  600  authorized_keys

scp  authorized_keys  root@storeip63:/root/.ssh/

scp  authorized_keys  root@master:/root/.ssh/

Repeat the steps above on every node; this gives each server in the cluster passwordless access to the others.

  • 3. Install ZooKeeper

tar -zvxf apache-zookeeper-3.6.3-bin.tar.gz

mv apache-zookeeper-3.6.3-bin/ zookeeper-3.6.3

cd zookeeper-3.6.3/conf

cp  zoo_sample.cfg  zoo.cfg

vi  zoo.cfg

dataDir=/usr/local/zookeeper-3.6.3/data

dataLogDir=/usr/local/zookeeper-3.6.3/logs

server.1=master:2888:3888

server.2=storeip62:2888:3888

server.3=storeip63:2888:3888

Create the data directory under zookeeper-3.6.3 and a myid file inside it, then write the node's ID into myid (the ID matches the x in the server.x entries above).

mkdir /usr/local/zookeeper-3.6.3/data

vi /usr/local/zookeeper-3.6.3/data/myid
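Each node's myid holds only its own ID, matching the server.x lines above; for example:

echo 1 > /usr/local/zookeeper-3.6.3/data/myid   // on master (server.1); use 2 on storeip62, 3 on storeip63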

vi /etc/profile

Add the environment variables:

export ZOOKEEPER_HOME=/usr/local/zookeeper-3.6.3

export PATH=$PATH:$ZOOKEEPER_HOME/bin

scp -r /usr/local/zookeeper-3.6.3/  root@storeip62:/usr/local/
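The notes show only the copy to storeip62; presumably the same is done for storeip63 (remembering to set each node's myid as above):

scp -r /usr/local/zookeeper-3.6.3/  root@storeip63:/usr/local/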

If startup fails, look in /usr/local/zookeeper-3.6.3/data: a new version-2 folder will have been created there; change the permissions of all the files inside it:

cd  version-2/

chmod 777 acceptedEpoch

chmod 777 currentEpoch

chmod 777 snapshot.0

./bin/zkServer.sh start

./bin/zkServer.sh stop
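To confirm that the ensemble has formed once all three nodes are started, check each node's role:

./bin/zkServer.sh status   // reports leader or follower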

  • 4. Install Hadoop 3.2.0 with ZooKeeper 3.6.3 (HA)

Before installing, stop the firewall: service iptables stop (check with service iptables status)

4.1 Three servers

192.168.7.61 (master)

192.168.7.62 (storeip62)

192.168.7.63 (storeip63)

Process layout across the nodes (as configured in the files below):

QuorumPeerMain           master, storeip62, storeip63

NameNode                 master, storeip62

DFSZKFailoverController  master, storeip62

JournalNode              master, storeip62

DataNode                 storeip62, storeip63

ResourceManager          storeip62, storeip63

NodeManager              storeip62, storeip63

4.2 Extract the archive

tar -zvxf hadoop-3.2.0.tar.gz

The installation directory is /usr/local/hadoop-3.2.0.

4.3 Update the environment variables (append to /etc/profile):

export HADOOP_HOME=/usr/local/hadoop-3.2.0

export PATH=$PATH:$HADOOP_HOME/bin

export PATH=$PATH:$HADOOP_HOME/sbin

4.4 Edit core-site.xml and add the following:

<property>

        <name>fs.defaultFS</name>

        <value>hdfs://cluster</value>

    </property>

    <property>

        <name>dfs.nameservices</name>

        <value>cluster</value>

    </property>

    <property>

        <name>hadoop.tmp.dir</name>

        <value>/home/chudu/tmp</value>

    </property>

    <!-- ZooKeeper connection info -->

    <property>

        <name>ha.zookeeper.quorum</name>

        <value>master:2181,storeip62:2181,storeip63:2181</value>

    </property>

    <!-- Timeout for Hadoop's connection to ZooKeeper -->

    <property>

        <name>ha.zookeeper.session-timeout.ms</name>

        <value>3000</value>

    </property>

4.5 Edit hdfs-site.xml and add the following:

<property>

        <name>dfs.nameservices</name>

        <value>cluster</value>

    </property>

    <property>

          <name>dfs.namenode.name.dir</name>

          <value>/home/chudu/hdfs/name</value>

    </property>

    <property>

          <name>dfs.datanode.data.dir</name>

          <value>/home/chudu/hdfs/data</value>

    </property>

    <property>

          <name>dfs.permissions.enabled</name>

          <value>false</value>

    </property>

    <property>

          <name>dfs.ha.namenodes.cluster</name>

          <value>master,storeip62</value>

    </property>

    <property>

          <name>dfs.namenode.rpc-address.cluster.master</name>

          <value>master:9820</value>

    </property>

    <property>

          <name>dfs.namenode.rpc-address.cluster.storeip62</name>

          <value>storeip62:9820</value>

    </property>

    <property>

          <name>dfs.namenode.http-address.cluster.master</name>

          <value>master:9870</value>

    </property>

    <property>

          <name>dfs.namenode.http-address.cluster.storeip62</name>

          <value>storeip62:9870</value>

    </property>

    <property>

          <name>dfs.ha.automatic-failover.enabled</name>

          <value>true</value>

    </property>

    <property>

          <name>dfs.namenode.shared.edits.dir</name>

          <value>qjournal://master:8485;storeip62:8485/cluster</value>

    </property>

    <property>

          <name>dfs.journalnode.edits.dir</name>

          <value>/home/chudu/hdfs/journal</value>

    </property>

    <property>

          <name>dfs.client.failover.proxy.provider.cluster</name>

<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>

    </property>

    <property>

          <name>dfs.ha.fencing.methods</name>

          <value>sshfence
shell(/bin/true)</value>

    </property>

<property>

      <name>dfs.ha.fencing.ssh.private-key-files</name>

      <value>/root/.ssh/id_rsa</value>

    </property>

    <property>

      <name>dfs.ha.fencing.ssh.connect-timeout</name>

      <value>30000</value>

    </property>

    <property>

      <name>dfs.namenode.handler.count</name>

      <value>100</value>

    </property>

    <property>

            <name>dfs.replication</name>

            <value>2</value>

    </property>

4.6 Edit mapred-site.xml and add the following:

<property>

            <name>mapreduce.framework.name</name>

            <value>yarn</value>

    </property>

    <property>

        <name>mapreduce.application.classpath</name>

        <value>

          /usr/local/hadoop-3.2.0/etc/hadoop,

          /usr/local/hadoop-3.2.0/share/hadoop/common/*,

          /usr/local/hadoop-3.2.0/share/hadoop/common/lib/*,

          /usr/local/hadoop-3.2.0/share/hadoop/hdfs/*,

          /usr/local/hadoop-3.2.0/share/hadoop/hdfs/lib/*,

          /usr/local/hadoop-3.2.0/share/hadoop/mapreduce/*,

          /usr/local/hadoop-3.2.0/share/hadoop/mapreduce/lib/*,

          /usr/local/hadoop-3.2.0/share/hadoop/yarn/*,

          /usr/local/hadoop-3.2.0/share/hadoop/yarn/lib/*

        </value>

    </property>

    <property>

        <name>yarn.app.mapreduce.am.staging-dir</name>

        <value>/home/chudu/hdfs/staging</value>

    </property>

    <property>

        <name>mapreduce.jobhistory.address</name>

        <value>master:10020</value>

    </property>

    <property>

        <name>mapreduce.jobhistory.webapp.address</name>

        <value>master:19888</value>

    </property>

    <property>

        <name>mapreduce.jobhistory.joblist.cache.size</name>

        <value>15000</value>

    </property>

4.7 Edit yarn-site.xml and add the following:

<!-- Enable ResourceManager HA (disabled by default) -->

   <property>

      <name>yarn.resourcemanager.ha.enabled</name>

      <value>true</value>

   </property>

   <!-- Specify the RM cluster id -->

   <property>

      <name>yarn.resourcemanager.cluster-id</name>

      <value>yarn-cluster</value>

   </property>

   <!-- All RMs participating in the cluster -->

   <property>

      <name>yarn.resourcemanager.ha.rm-ids</name>

      <value>yn111,yn112</value>

   </property>

<property>

   <name>yarn.resourcemanager.hostname.yn111</name>

   <value>storeip62</value>

 </property>

 <property>

   <name>yarn.resourcemanager.hostname.yn112</name>

   <value>storeip63</value>

 </property>

<!-- The yarn id configured on each host is different, so change it after copying to the other machines; on storeip63 set it to yn112 -->

   <property>

        <name>yarn.resourcemanager.ha.id</name>

        <value>yn111</value>

   </property>

   <!-- ZooKeeper addresses -->

   <property>

        <name>yarn.resourcemanager.zk-address</name>

        <value>master:2181,storeip62:2181,storeip63:2181</value>

   </property>

<!-- Enable automatic recovery -->

   <property>

       <name>yarn.resourcemanager.recovery.enabled</name>

       <value>true</value>

   </property>

   <!-- ====================== Failure handling ====================== -->

   <!-- Reconnection interval after losing contact with the RM -->

   <property>

       <name>yarn.resourcemanager.connect.retry-interval.ms</name>

       <value>2000</value>

   </property>

   <property>

<name>yarn.client.failover-proxy-provider</name>

        <value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value>

   </property>

 <!-- ================== Configure each RM's addresses separately ================== -->

    <!-- scheduler -->

    <property>

        <name>yarn.resourcemanager.scheduler.class</name>

<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>

    </property>

    <!-- Wait interval before reconnecting when the scheduler is unreachable -->

    <property>

        <name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>

        <value>5000</value>

    </property>

    <property>

        <name>yarn.resourcemanager.address.yn111</name>

        <value>storeip62:8032</value>

    </property>

    <property>

        <name>yarn.resourcemanager.webapp.address.yn111</name>

        <value>storeip62:8088</value>

    </property>

    <property>

        <name>yarn.resourcemanager.scheduler.address.yn111</name>

        <value>storeip62:8030</value>

    </property>

    <property>

        <name>yarn.resourcemanager.resource-tracker.address.yn111</name>

        <value>storeip62:8031</value>

    </property>

    <property>

        <name>yarn.resourcemanager.admin.address.yn111</name>

        <value>storeip62:8033</value>

    </property>

    <property>

        <name>yarn.resourcemanager.ha.admin.address.yn111</name>

        <value>storeip62:23142</value>

    </property>

<property>

        <name>yarn.resourcemanager.webapp.address.yn112</name>

        <value>storeip63:8088</value>

    </property>

    <property>

        <name>yarn.resourcemanager.scheduler.address.yn112</name>

        <value>storeip63:8030</value>

    </property>

    <property>

        <name>yarn.resourcemanager.resource-tracker.address.yn112</name>

        <value>storeip63:8031</value>

    </property>

    <property>

        <name>yarn.resourcemanager.admin.address.yn112</name>

        <value>storeip63:8033</value>

    </property>

    <property>

        <name>yarn.resourcemanager.ha.admin.address.yn112</name>

        <value>storeip63:23142</value>

    </property>

<!-- ================================ NodeManager configuration -->

    <property>

        <name>yarn.nodemanager.aux-services</name>

        <value>mapreduce_shuffle</value>

    </property>

    <property>

       <name>yarn.nodemanager.local-dirs</name>

       <value>/home/chudu/hdfs/nodemanager/local</value>

    </property>

    <property>

            <name>yarn.nodemanager.remote-app-log-dir</name>

            <value>/home/chudu/hdfs/nodemanager/remote-app-logs</value>

    </property>

    <property>

            <name>yarn.nodemanager.log-dirs</name>

            <value>/home/chudu/hdfs/nodemanager/logs</value>

    </property>

     <property>

            <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>

            <value>org.apache.hadoop.mapred.ShuffleHandler</value>

     </property>

     <!-- Log configuration -->

     <property>

            <name>yarn.log-aggregation-enable</name>

            <value>true</value>

     </property>

     <property>

            <name>yarn.log-aggregation.retain-seconds</name>

            <value>864000</value>

     </property>

     <property>

            <name>yarn.log-aggregation.retain-check-interval-seconds</name>

            <value>86400</value>

     </property>

4.8 Edit hadoop-env.sh and add the following:

export JAVA_HOME=/usr/local/java/jdk1.8.0_311

export HDFS_NAMENODE_USER=root

export HDFS_DATANODE_USER=root

export HDFS_JOURNALNODE_USER=root

export HDFS_ZKFC_USER=root

export YARN_RESOURCEMANAGER_USER=root

export YARN_NODEMANAGER_USER=root

export HDFS_SECONDARYNAMENODE_USER=root

export HADOOP_SHELL_EXECNAME=root

4.9 Edit mapred-env.sh and add the following:

export JAVA_HOME=/usr/local/java/jdk1.8.0_311

4.10 Edit yarn-env.sh and add the following:

export JAVA_HOME=/usr/local/java/jdk1.8.0_311

4.11 Edit workers and add the DataNode servers, as shown below
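Given the process layout in 4.1, the workers file would presumably contain the two DataNode hosts:

storeip62

storeip63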

4.12 Start ZooKeeper

cd /usr/local/zookeeper-3.6.3

./bin/zkServer.sh start

4.13 Format the ZKFC state in ZooKeeper

hdfs zkfc -formatZK

4.14 Format the NameNode

sbin/hadoop-daemon.sh start journalnode   // the JournalNodes must be started before formatting

hadoop namenode -format

4.15 After formatting, copy hadoop-3.2.0 to the other servers and update their environment variables

scp -r /usr/local/hadoop-3.2.0/  root@storeip62:/usr/local/

scp -r /usr/local/hadoop-3.2.0/  root@storeip63:/usr/local/
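A standard HDFS HA step not covered in these notes: once the active NameNode is up, the second NameNode (storeip62) is usually synced with:

hdfs namenode -bootstrapStandby   // run on storeip62; pulls the formatted metadata from the active NameNode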

4.16 Start Hadoop

cd /usr/local/hadoop-3.2.0

./sbin/start-all.sh   // starts all related daemons

./sbin/stop-all.sh   // stops them all

sbin/hadoop-daemon.sh start journalnode   // start a JournalNode on its own

yarn-daemon.sh start resourcemanager   // start a ResourceManager on its own
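After start-all.sh, running jps on each node should show the processes listed in 4.1; for example, on master:

jps   // expect NameNode, DFSZKFailoverController, JournalNode and QuorumPeerMain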

http://192.168.7.61:9870   // NameNode web UI

https://blog.csdn.net/weixin_45025143/article/details/121757627

Transformations operators

Actions operators

http://192.168.7.62:8088   // YARN ResourceManager web UI

scp -r /usr/local/hadoop-3.2.0/etc/hadoop/hdfs-site.xml  root@storeip62:/usr/local/hadoop-3.2.0/etc/hadoop/

ntpdate pool.ntp.org   // sync the system clock

lsof -i   // check whether port 9820 is in use

cd /usr/local/

./zookeeper-3.6.3/bin/zkServer.sh start

scp -r /usr/local/spark/conf/spark-env.sh  root@storeip63:/usr/local/spark/conf/

hadoop jar /usr/local/hadoop-3.2.0/share/hadoop/mapreduce/demo.jar Vodplay.Poms 20211105 20211111 大连 按周

cat /etc/centos-release

5. Install Spark

  5.1 Extract spark-3.2.0-bin-without-hadoop.tgz

tar -zvxf spark-3.2.0-bin-without-hadoop.tgz

mv spark-3.2.0-bin-without-hadoop spark

cd /usr/local/spark/conf

 mv spark-env.sh.template spark-env.sh

mv workers.template workers

Add the Spark worker servers to workers:

storeip62

storeip63

Add the following lines to spark-env.sh:

export JAVA_HOME=/usr/local/java/jdk1.8.0_311

export SPARK_MASTER_IP=master

export SPARK_MASTER_PORT=7077

export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop-3.2.0/bin/hadoop classpath)
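Note: SPARK_MASTER_IP is a legacy variable name; recent Spark releases prefer SPARK_MASTER_HOST, which could be used instead:

export SPARK_MASTER_HOST=master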

Add the environment variables: append the following three lines to /etc/profile

export SPARK_HOME=/usr/local/spark

export PATH=$PATH:$SPARK_HOME/bin

export PATH=$PATH:$SPARK_HOME/sbin

Start Spark

./sbin/start-all.sh

The master node now shows an additional Master process.

The worker nodes show a Worker process.
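This can be confirmed with jps on each node, for example:

jps   // Master on the master node; Worker on storeip62 and storeip63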

Test

bin/run-example SparkPi 2>&1 | grep "Pi is"

Open /etc/rc.local with vi to configure services to start at boot:

export JAVA_HOME=/usr/local/java/jdk1.8.0_311

/usr/local/zookeeper-3.6.3/bin/zkServer.sh start

/usr/local/hadoop-3.2.0/sbin/start-all.sh

/usr/local/spark/sbin/start-all.sh

Installing IDEA on Linux

The system must be running in desktop (GUI) mode, not server mode.

Edit /etc/inittab and change id:3:initdefault: to id:5:initdefault:   // switches the text console to the graphical interface

Download an older release; the latest 2021 version cannot be installed because it is incompatible with this OS version.

tar -zvxf ideaIC-2018.3.6.tar.gz

cd idea-IC-183.6156.11

./idea.sh   // run the installer; a setup dialog will appear

./bin/spark-submit  --class SecondScala --master spark://master:7077 /root/IdeaProjects/SecondSpark/out/artifacts/SecondSpark_jar/SecondSpark.jar

./bin/spark-submit  --class SecondScala --master yarn  /root/IdeaProjects/SecondSpark/out/artifacts/SecondSpark_jar/SecondSpark.jar 

hadoop fsck / -delete   // run a filesystem health check and delete corrupted blocks
