Installing & Building Spark Environments

Setup steps:

//2017-08-10, Thursday afternoon: installing and deploying the Spark environment
*******************************************************************************************
0. Continue the setup on the existing Hadoop 2 YARN cluster
• # wget http://mirror.bit.edu.cn/apache/spark/spark-1.6.0/spark-1.6.0-bin-hadoop2.6.tgz
• After extracting, cd into the conf directory (extraction sketch below)
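A minimal sketch of the extract step, using the install path and package name that the rest of these notes assume:
# tar -zxvf spark-1.6.0-bin-hadoop2.6.tgz -C /usr/local/src/
# cd /usr/local/src/spark-1.6.0-bin-hadoop2.6/conf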


1.vim ~/.bashrc
# .bashrc

# User specific aliases and functions

alias rm='rm -i'
alias cp='cp -i'
alias mv='mv -i'

# Source global definitions
if [ -f /etc/bashrc ]; then
        . /etc/bashrc
fi

#java configuration
export JAVA_HOME=/usr/local/src/jdk1.7.0_45
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib

#scala configuration
export SCALA_HOME=/usr/local/src/scala-2.11.4

#hadoop configuration
export HADOOP_HOME=/usr/local/src/hadoop-2.6.1
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

#spark configuration
export SPARK_HOME=/usr/local/src/spark-1.6.0-bin-hadoop2.6
export PATH=$PATH:$SPARK_HOME/bin

#path configuration
export PATH=$PATH:$JAVA_HOME/bin:$SCALA_HOME/bin:$HADOOP_HOME/bin
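After editing, apply the changes to the current shell (a new login shell would also pick them up; see the note on source near the end of these notes):
# source ~/.bashrc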

2. In the conf directory:
• # cp spark-env.sh.template spark-env.sh
• # cp slaves.template slaves

3.vim spark-env.sh
#configuration
export SCALA_HOME=/usr/local/src/scala-2.11.4
export JAVA_HOME=/usr/local/src/jdk1.7.0_45
export HADOOP_HOME=/usr/local/src/hadoop-2.6.1
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
SPARK_MASTER_IP=master
SPARK_LOCAL_DIRS=/usr/local/src/spark-1.6.0-bin-hadoop2.6
SPARK_DRIVER_MEMORY=1G

4.vim slaves
# A Spark Worker will be started on each of the machines listed below.
slave1
slave2

5. Finally, distribute the configured Spark installation directory to the slave1/slave2 nodes, as sketched below.
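A minimal sketch of the distribution step, assuming every node uses the same install path and that passwordless SSH from master to the slaves is already set up (start-all.sh relies on it as well):
# scp -r /usr/local/src/spark-1.6.0-bin-hadoop2.6 root@slave1:/usr/local/src/
# scp -r /usr/local/src/spark-1.6.0-bin-hadoop2.6 root@slave2:/usr/local/src/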

6. Start Spark
# ./sbin/start-all.sh
[root@master spark-1.6.0-bin-hadoop2.6]# cd sbin
[root@master sbin]# ls
slaves.sh         start-history-server.sh         start-slave.sh          stop-master.sh                 stop-slaves.sh
spark-config.sh   start-master.sh                 start-slaves.sh         stop-mesos-dispatcher.sh       stop-thriftserver.sh
spark-daemon.sh   start-mesos-dispatcher.sh       start-thriftserver.sh   stop-mesos-shuffle-service.sh
spark-daemons.sh  start-mesos-shuffle-service.sh  stop-all.sh             stop-shuffle-service.sh
start-all.sh      start-shuffle-service.sh        stop-history-server.sh  stop-slave.sh
[root@master sbin]#  ./start-all.sh     
starting org.apache.spark.deploy.master.Master, logging to /usr/local/src/spark-1.6.0-bin-hadoop2.6/logs/spark-badou-org.apache.spark.deploy.master.Master-1-master.out
slave1: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/src/spark-1.6.0-bin-hadoop2.6/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-slave1.out
slave2: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/src/spark-1.6.0-bin-hadoop2.6/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-slave2.out
slave1: failed to launch org.apache.spark.deploy.worker.Worker:
slave1: full log in /usr/local/src/spark-1.6.0-bin-hadoop2.6/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-slave1.out
slave2: failed to launch org.apache.spark.deploy.worker.Worker:
slave2: full log in /usr/local/src/spark-1.6.0-bin-hadoop2.6/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-slave2.out
[root@master sbin]# jps
22484 ResourceManager
25597 Master
25653 Jps
13685 NameNode
13858 SecondaryNameNode
[root@master sbin]# 
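If the workers report "failed to launch" as in the output above, a reasonable first step is to read the full worker log on the slave nodes and check which daemons are actually running there (the log path is copied from the message above; passwordless SSH assumed):
# ssh slave1 "tail -n 50 /usr/local/src/spark-1.6.0-bin-hadoop2.6/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-slave1.out"
# ssh slave1 jps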

7. Verifying Spark
• Local mode:
– # ./bin/run-example SparkPi 10 --master local[2]
• Cluster mode (Spark Standalone):
– # ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://master:7077 lib/spark-examples-1.6.0-hadoop2.6.0.jar 100
• Cluster mode (Spark on YARN, yarn-cluster):
– # ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster lib/spark-examples-1.6.0-hadoop2.6.0.jar 10
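Each submission should print a line like "Pi is roughly 3.14..."; in yarn-cluster mode it lands in the driver container's log on YARN rather than in the local console. A quick local check (illustrative only), alongside confirming on the standalone master web UI at http://master:8080 that slave1/slave2 show up as ALIVE workers:
# ./bin/run-example SparkPi 10 2>&1 | grep "Pi is roughly"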


Spark standalone vs. Spark on Yarn

• Spark standalone: standalone mode, similar to the model used by MapReduce 1.0; fault tolerance and resource management are implemented entirely inside Spark itself
• Spark on Yarn: runs Spark on top of a general-purpose resource management system, so resources can be shared with other computing frameworks

Yarn Client vs. Yarn Cluster

• Yarn Client: suited to interactive use and debugging
– The driver runs on the machine from which the job is submitted
– The ApplicationMaster only requests the resources the executors need from the ResourceManager
– When running on YARN, spark-shell and pyspark must use yarn-client mode
• Yarn Cluster: suited to production environments

Differences between Yarn Cluster and Yarn Client:

The essential difference is the role of the AM (ApplicationMaster) process. In cluster mode the driver runs inside the AM, which requests resources from YARN and monitors the job; once the job has been submitted, the client can be closed and the job keeps running on YARN, which is why cluster mode is unsuitable for interactive jobs. In client mode the AM only requests executors from YARN, and the client itself communicates with the allocated containers to schedule tasks, so the client cannot go away while the job runs. A sketch of the two submission modes follows.
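A minimal sketch of the two modes, reusing the SparkPi example jar from step 7 (in Spark 1.6 the yarn-client/yarn-cluster master strings are accepted directly):
• yarn-client (the driver runs in the submitting shell, so the "Pi is roughly ..." line appears in the local console):
– # ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client lib/spark-examples-1.6.0-hadoop2.6.0.jar 10
• yarn-cluster (the driver runs inside the ApplicationMaster; the result ends up in the driver container's log, viewable e.g. via yarn logs -applicationId <appId>):
– # ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster lib/spark-examples-1.6.0-hadoop2.6.0.jar 10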

source /etc/profile

This makes the settings in /etc/profile take effect immediately in the current shell. For example, if you add the Java environment variables to /etc/profile, they will not be visible in the current session until you run source (or start a new login shell).
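A minimal illustration of the pattern, using the JDK path from step 1 (purely an example; append to a system-wide file like /etc/profile with care):
# echo 'export JAVA_HOME=/usr/local/src/jdk1.7.0_45' >> /etc/profile
# source /etc/profile
# echo $JAVA_HOME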

/etc/profile vs. ~/.bashrc

http://blog.csdn.net/qiao1245/article/details/44650929
