Spark Cluster Setup

1. Purpose

The purpose of setting up Spark is to run offline (batch) computation.

2. Prerequisites

Prerequisite for the Spark setup: a Hadoop cluster has already been built successfully.

For example, my Hadoop cluster runs under the work user:

/home/work/hadoop-2.9.2   // Hadoop directory

/home/work/jdk1.8.0_171   // Java directory

Scala package: scala-2.10.4.tgz

Spark package: spark-2.4.0-bin-hadoop2.7.tgz

(wget http://mirrors.shu.edu.cn/apache/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz)
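Before moving on, it is worth confirming that the Hadoop cluster is actually up. A quick check on the master (a sketch, assuming HDFS and YARN were started with the usual start scripts):

[work@hserver1 ~]# jps     // the master should list NameNode and ResourceManager, among others

[work@hserver1 ~]# /home/work/hadoop-2.9.2/bin/hdfs dfsadmin -report     // should report the live DataNodes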

3. Install Scala

Spark is written in Scala, so install Scala first. The steps:

Move scala-2.10.4.tgz to /home/work/

Move spark-2.4.0-bin-hadoop2.7.tgz to /home/work/

Then extract it and set the environment variables:

[work@hserver1 ~]# tar -zxvf scala-2.10.4.tgz

[work@hserver1 ~]# vim .bashrc

export SCALA_HOME=/home/work/scala-2.10.4

export PATH=$PATH:$SCALA_HOME/bin



[work@hserver1 ~]# source ~/.bashrc

Verify:

[work@hserver1 ~]# scala -version
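If the environment variables took effect, the output should look roughly like this (the exact copyright line may differ):

Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL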

4. Install Spark

Extract Spark:

[work@hserver1 ~]# tar -zxvf spark-2.4.0-bin-hadoop2.7.tgz

[work@hserver1 ~]# cp -rf /home/work/spark-2.4.0-bin-hadoop2.7/conf/spark-env.sh.template  /home/work/spark-2.4.0-bin-hadoop2.7/conf/spark-env.sh

[work@hserver1 ~]# vim /home/work/spark-2.4.0-bin-hadoop2.7/conf/spark-env.sh

export SCALA_HOME=/home/work/scala-2.10.4

export JAVA_HOME=/home/work/jdk1.8.0_171

export HADOOP_HOME=/home/work/hadoop-2.9.2

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

SPARK_MASTER_IP=hserver1

SPARK_LOCAL_DIRS=/home/work/spark-2.4.0-bin-hadoop2.7

SPARK_DRIVER_MEMORY=1G
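For reference: SCALA_HOME / JAVA_HOME / HADOOP_HOME point Spark at the local installations, HADOOP_CONF_DIR lets Spark find the HDFS/YARN configuration, SPARK_MASTER_IP names the standalone master, and SPARK_DRIVER_MEMORY caps the driver at 1 GB. The Spark 2.x startup scripts prefer SPARK_MASTER_HOST over the older SPARK_MASTER_IP, and each worker's resources can also be bounded; a sketch, assuming 1 GB and 1 core per worker is enough for a test cluster:

export SPARK_MASTER_HOST=hserver1     # newer name for SPARK_MASTER_IP

export SPARK_WORKER_MEMORY=1g     # memory each worker may offer to executors (assumed value)

export SPARK_WORKER_CORES=1     # cores each worker may offer to executors (assumed value)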



[work@hserver1 ~]# cp /home/work/spark-2.4.0-bin-hadoop2.7/conf/slaves.template /home/work/spark-2.4.0-bin-hadoop2.7/conf/slaves

[work@hserver1 ~]# vim /home/work/spark-2.4.0-bin-hadoop2.7/conf/slaves

hserver2

hserver3

 

5. Distribute the configured Spark and Scala to every machine in the cluster

Because passwordless SSH was already set up when building the Hadoop cluster, no password is needed when copying files to the remote machines.

[work@hserver1 ~]# scp -r /home/work/scala-2.10.4 work@hserver2:/home/work

[work@hserver1 ~]# scp -r /home/work/scala-2.10.4 work@hserver3:/home/work

[work@hserver1 ~]# scp -r /home/work/spark-2.4.0-bin-hadoop2.7 work@hserver2:/home/work

[work@hserver1 ~]# scp -r /home/work/spark-2.4.0-bin-hadoop2.7 work@hserver3:/home/work

[work@hserver1 ~]# scp -r ~/.bashrc work@hserver3:/home/work

[work@hserver1 ~]# scp -r ~/.bashrc work@hserver2:/home/work

[work@hserver1 ~]# ssh work@hserver3 "source ~/.bashrc"     // load the Scala environment variables on each machine

[work@hserver1 ~]# ssh work@hserver2 "source ~/.bashrc"
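At this point the standalone cluster can be started from the master node; a minimal sketch (start-all.sh ships with the Spark distribution under sbin, and the master Web UI listens on port 8080 by default):

[work@hserver1 ~]# /home/work/spark-2.4.0-bin-hadoop2.7/sbin/start-all.sh

[work@hserver1 ~]# jps     // the master should now show a Master process; hserver2 and hserver3 should each show a Worker

You can also open http://hserver1:8080 in a browser to confirm that both workers have registered.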

6. Verify

[work@hserver1 ~]# cd /home/work/spark-2.4.0-bin-hadoop2.7

[work@hserver1 spark-2.4.0-bin-hadoop2.7]# bin/spark-submit --master yarn --class org.apache.spark.examples.SparkPi examples/jars/spark-examples_2.11-2.4.0.jar 10

Result:

2019-01-22 14:40:24 INFO DAGScheduler:54 - Job 0 finished: reduce at SparkPi.scala:38, took 1.095592 s

Pi is roughly 3.1414551414551415   // the computed value of Pi is approximately 3.14
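The run above goes through YARN. If the standalone master was started as sketched earlier, the same example can also be submitted directly to it (assuming the default master port 7077):

[work@hserver1 spark-2.4.0-bin-hadoop2.7]# bin/spark-submit --master spark://hserver1:7077 --class org.apache.spark.examples.SparkPi examples/jars/spark-examples_2.11-2.4.0.jar 10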

 

 
