Setting Up a Spark Distributed Environment


Prerequisites

  1. Two Huawei Cloud servers (one Master, one Worker)
  2. Ubuntu 20.04
  3. Software:
    jdk1.8
    hadoop-2.7.3.tar
    scala-2.13.0.tgz
    spark-2.1.0-bin-hadoop2.7.tgz

Network Configuration

Edit /etc/hosts with vim to configure host aliases:

192.168.0.237 Master
192.168.0.14 Worker1
Configure passwordless SSH login to the worker

To avoid interference from stale keys, first delete the .ssh directories on both Master and Worker1:

rm -rf ~/.ssh
ssh worker1
# now on worker1
rm -rf ~/.ssh

Regenerate the .ssh directory on each server with ssh-keygen -t rsa.
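
A minimal sketch of this step (run on Master and again on Worker1; press Enter at each prompt to accept the default path and an empty passphrase):

ssh-keygen -t rsa  # recreates ~/.ssh and generates id_rsa / id_rsa.pub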

# On Master: copy id_rsa.pub to the worker host, renaming it id_rsa.pub.master
scp /root/.ssh/id_rsa.pub root@worker1:/root/id_rsa.pub.master
# On Master: also copy the hosts file to the worker
scp /etc/hosts root@worker1:/etc/hosts
# On worker1: append the Master public key to authorized_keys
cat /root/id_rsa.pub.master >> /root/.ssh/authorized_keys
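
Back on Master, a quick sanity check that passwordless login now works (it should print the worker's hostname without asking for a password):

ssh worker1 hostname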

Installing the Base Environment

All of my software is installed under /usr/local/; after unpacking (see the sketch below), the key step is configuring /etc/profile.
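
As a reference, a sketch of unpacking the tarballs listed earlier; the renamed directory names are my own convention, chosen to match the paths used in /etc/profile:

tar -xf hadoop-2.7.3.tar -C /usr/local/ && mv /usr/local/hadoop-2.7.3 /usr/local/hadoop
tar -xf scala-2.13.0.tgz -C /usr/local/ && mv /usr/local/scala-2.13.0 /usr/local/scala
tar -xf spark-2.1.0-bin-hadoop2.7.tgz -C /usr/local/ && mv /usr/local/spark-2.1.0-bin-hadoop2.7 /usr/local/spark
# unpack your jdk1.8 archive the same way and rename the result to /usr/local/jdk1.8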


export JAVA_HOME=/usr/local/jdk1.8
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=.:${JAVA_HOME}/bin:$PATH


export SCALA_HOME=/usr/local/scala
export PATH=$PATH:$SCALA_HOME/bin


export HADOOP_HOME=/usr/local/hadoop
export PATH="$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH"
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop

export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SPARK_HOME/bin

Run source /etc/profile to make it take effect, then scp the file to the worker and source it there so the worker node's configuration takes effect as well.

After the configuration is done, you can use scp to copy the configured software from Master to the Worker:

scp -r <software dir> <user>@<host>:<target path>
scp -r /usr/local/hadoop root@Worker1:/usr/local/
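
For completeness, a sketch of copying the remaining directories and the profile in the same way (the hostname follows the /etc/hosts entry above; Spark is copied later, after its own configuration is finished):

scp -r /usr/local/jdk1.8 root@Worker1:/usr/local/
scp -r /usr/local/scala root@Worker1:/usr/local/
scp /etc/profile root@Worker1:/etc/profile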

Configuration Files

1. Hadoop Configuration

1. $HADOOP_HOME/etc/hadoop/hadoop-env.sh

Set the JAVA_HOME variable:

export JAVA_HOME=/usr/local/jdk1.8
2. $HADOOP_HOME/etc/hadoop/slaves
worker1
3. $HADOOP_HOME/etc/hadoop/core-site.xml
<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://master:9000</value>
        </property>
        <property>
                <name>io.file.buffer.size</name>
                <value>131072</value>
        </property>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/usr/local/hadoop/tmp</value>
        </property>
</configuration>
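
As an optional sanity check once Hadoop is on the PATH, the effective value can be read back (it should print hdfs://master:9000):

hdfs getconf -confKey fs.defaultFS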
                         
4. $HADOOP_HOME/etc/hadoop/hdfs-site.xml
<configuration>
    <property>
      <name>dfs.namenode.secondary.http-address</name>
      <value>master:50090</value>
    </property>
    <property>
      <name>dfs.replication</name>
      <value>2</value>
    </property>
    <property>
      <name>dfs.namenode.name.dir</name>
      <value>file:/usr/local/hadoop/hdfs/name</value>
    </property>
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>file:/usr/local/hadoop/hdfs/data</value>
    </property>
</configuration>
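
The directories referenced above can be created up front on the respective nodes (Hadoop will normally create them itself, so this is optional):

mkdir -p /usr/local/hadoop/hdfs/name /usr/local/hadoop/hdfs/data /usr/local/hadoop/tmp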
                 
5. $HADOOP_HOME/etc/hadoop/mapred-site.xml

Copy the template to create the xml file:
cp mapred-site.xml.template mapred-site.xml

<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
        <property>
                <name>mapreduce.jobhistory.address</name>
                <value>master:10020</value>
        </property>
        <property>
                <name>mapreduce.jobhistory.webapp.address</name>
                <value>master:19888</value>
        </property>
</configuration>
                       
6. $HADOOP_HOME/etc/hadoop/yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
        <property>
                <name>yarn.resourcemanager.address</name>
                <value>master:8032</value>
        </property>
        <property>
                <name>yarn.resourcemanager.scheduler.address</name>
                <value>master:8030</value>
        </property>
        <property>
                <name>yarn.resourcemanager.resource-tracker.address</name>
                <value>master:8031</value>
        </property>
        <property>
                <name>yarn.resourcemanager.admin.address</name>
                <value>master:8033</value>
        </property>
        <property>
                <name>yarn.resourcemanager.webapp.address</name>
                <value>master:8088</value>
        </property>
</configuration>
                           

At this point the Hadoop setup on the master node is complete. Before starting it, we need to format the NameNode:

hadoop namenode -format

2. Spark

1. $SPARK_HOME/conf/spark-env.sh
cp spark-env.sh.template spark-env.sh
export SCALA_HOME=/usr/local/scala
export JAVA_HOME=/usr/local/jdk1.8/
export SPARK_MASTER_IP=master
export SPARK_WORKER_MEMORY=1g
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
2. $SPARK_HOME/conf/slaves
cp slaves.template slaves
master
worker1

Then scp the configured Spark directory to the worker1 node, as sketched below.
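
A sketch of that copy (hostname per /etc/hosts):

scp -r /usr/local/spark root@Worker1:/usr/local/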

Starting the Services

Wrap the startup commands into a script:

#!/bin/bash
echo -e "\033[31m ========Start The Cluster======== \033[0m"
echo -e "\033[31m Starting Hadoop Now !!! \033[0m"
/usr/local/hadoop/sbin/start-all.sh
echo -e "\033[31m Starting Spark Now !!! \033[0m"
/usr/local/spark/sbin/start-all.sh
echo -e "\033[31m The Result Of The Command \"jps\" :  \033[0m"
jps  # print the running Java processes
echo -e "\033[31m ========END======== \033[0m"
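
Assuming the script is saved as start-cluster.sh (the filename is my own choice), make it executable and run it:

chmod +x start-cluster.sh
./start-cluster.sh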

Open http://ip:8080 to reach the Spark web UI.
Open http://ip:8088 to reach the Hadoop (YARN) web UI.

3. Testing

Upload a test file to HDFS. When creating the directory you may get a NameNode safe-mode error; if so, run hadoop dfsadmin -safemode leave.

hadoop fs -mkdir -p /Hadoop/Input
hadoop fs -put <test file> /Hadoop/Input
Test Hadoop
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /Hadoop/Input /Hadoop/Output
hadoop fs -cat /Hadoop/Output/*  # view the output
Test Spark

Run spark-shell to open an interactive shell:

val file=sc.textFile("hdfs://master:9000/Hadoop/Input/<test file>")
val rdd = file.flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey(_+_)
rdd.collect()
rdd.foreach(println)

Check the WordCount results.

Shutdown script

#!/bin/bash
echo -e "\033[31m ===== Stopping The Cluster ====== \033[0m"
echo -e "\033[31m Stopping Spark Now !!! \033[0m"
/usr/local/spark/sbin/stop-all.sh
echo -e "\033[31m Stopping Hadoop Now !!! \033[0m"
/usr/local/hadoop/sbin/stop-all.sh
echo -e "\033[31m The Result Of The Command \"jps\" :  \033[0m"
jps
echo -e "\033[31m ======END======== \033[0m"