Installing Spark 2.2 on CentOS 7.0

Environment

CentOS 7.0

hadoop 2.7.3 (cluster set up as in "CentOS 7.0 + hadoop 2.7 cluster setup")

scala 2.12.4

spark 2.2.0

Download and Install

scala

Download: http://www.scala-lang.org/download/

Baidu Cloud: https://pan.baidu.com/s/1kVFyb3p password: nffb

After downloading, upload the archive to /data/software on the cluster, then install:

mkdir /opt/scala
cd /data/software
tar -zxvf scala-2.12.4.tgz -C /opt/scala/

spark

Download: http://spark.apache.org/downloads.html

Baidu Cloud: https://pan.baidu.com/s/1pL0wEa3 password: zyvw

Or fetch it directly with wget:

wget http://mirrors.hust.edu.cn/apache/spark/spark-2.2.0/spark-2.2.0-bin-hadoop2.7.tgz

After downloading to /data/software, install:

mkdir /opt/spark
cd /data/software
tar -zxvf spark-2.2.0-bin-hadoop2.7.tgz -C /opt/spark/
cd /opt/spark
mv spark-2.2.0-bin-hadoop2.7/ spark-2.2.0

Configuration

scala

  • Edit the environment variables

    vi /etc/profile
    
    # add
    export SCALA_HOME=/opt/scala/scala-2.12.4
    export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HIVE_HOME/bin:$ZK_HOME/bin:$HBASE_HOME/bin:$FLUME_HOME/bin:$KAFKA_HOME/bin:$SCALA_HOME/bin
    
    # apply
    source /etc/profile
    
    # verify
    scala -version
    
    # output
    Scala code runner version 2.12.4 -- Copyright 2002-2017, LAMP/EPFL and Lightbend, Inc.
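Since /etc/profile gets the same kind of hand edit in several of these steps, a small helper that appends an export line only if it is not already present can avoid duplicate entries on repeated runs. This is just a convenience sketch, not part of the original procedure; the temp file stands in for /etc/profile:

```shell
# append a line to a file only if the file does not already contain it verbatim
add_env_line() {
    local line="$1" file="$2"
    grep -qxF "$line" "$file" 2>/dev/null || echo "$line" >> "$file"
}

# demonstrate against a scratch file (use /etc/profile on the real machine)
profile=$(mktemp)
add_env_line 'export SCALA_HOME=/opt/scala/scala-2.12.4' "$profile"
add_env_line 'export SCALA_HOME=/opt/scala/scala-2.12.4' "$profile"  # no-op the second time
```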

spark

  • Edit the environment variables

    vi /etc/profile
    
    # add
    export SPARK_HOME=/opt/spark/spark-2.2.0
    export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HIVE_HOME/bin:$ZK_HOME/bin:$HBASE_HOME/bin:$FLUME_HOME/bin:$KAFKA_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin
    
    # apply
    source /etc/profile
  • Edit spark-env.sh

    cd /opt/spark/spark-2.2.0/conf
    cp spark-env.sh.template spark-env.sh
    vi spark-env.sh
    
    # add
    export SCALA_HOME=/opt/scala/scala-2.12.4
    export JAVA_HOME=/opt/java/jdk1.8.0_60
    export HADOOP_HOME=/opt/hadoop/hadoop-2.7.3
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export SPARK_HOME=/opt/spark/spark-2.2.0
    export SPARK_MASTER_IP=master
    export SPARK_EXECUTOR_MEMORY=2G
  • Edit slaves

    cp slaves.template slaves
    vi slaves
    
    # remove localhost and add the following
    slave1
    slave2
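The unpacked Scala and Spark directories, along with the /etc/profile changes, must also be present on slave1 and slave2 before starting the cluster. A dry-run sketch of the sync (hostnames and paths assumed from this guide; the commands are only printed for review here):

```shell
# build the sync commands for each worker node and print them as a dry run;
# once they look right, execute each line by hand (or pipe to sh)
hosts="slave1 slave2"
cmds=""
for host in $hosts; do
    cmds="$cmds
scp -r /opt/scala /opt/spark root@$host:/opt/
scp /etc/profile root@$host:/etc/profile"
done
printf '%s\n' "$cmds"
```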

Start

cd /opt/spark/spark-2.2.0/sbin

./start-all.sh

# output
starting org.apache.spark.deploy.master.Master, logging to /opt/spark/spark-2.2.0/logs/spark-root-org.apache.spark.deploy.master.Master-1-slave1.out
slave1: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark/spark-2.2.0/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-slave1.out
slave2: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark/spark-2.2.0/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-slave2.out
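A quick way to confirm the daemons after start-all.sh is jps: the master node should show a Master process and each worker a Worker. A small checker that works on captured jps output (the sample text below is illustrative, not a real capture):

```shell
# return success if the given jps output lists the expected Spark daemon
has_daemon() {
    echo "$1" | grep -qw "$2"
}

# illustrative sample of what jps might print on a worker node
sample_jps="2301 Worker
1980 DataNode
2455 Jps"

has_daemon "$sample_jps" Worker && echo "Worker is running"
```

On the real cluster, `has_daemon "$(jps)" Master` on the master (and `Worker` on each slave) performs the same check.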

Test

Open 192.168.122.128:8080 in a browser; the Spark status page should appear.

Note: if tomcat was installed earlier and its default port was never changed, tomcat will occupy port 8080 and conflict with spark. To run tomcat and spark side by side, stop both first (if running) and apply one of the two fixes below (pick either):

  • Change tomcat's default port

    cd /opt/tomcat/apache-tomcat-8.5.24/conf
    vi server.xml
    
    # search for the 8080 connector (/8080) and find the lines below,
    # checking whether they are wrapped in a comment <!--  -->
    #
    # <Connector port="8080" protocol="HTTP/1.1"
    #            connectionTimeout="20000"
    #            redirectPort="8443" />
    #
    # change the port to 8081

    Start tomcat and spark, then open 192.168.122.128:8081 and 192.168.122.128:8080; the tomcat (port 8081) and spark (port 8080) status pages should both load.

  • Change spark's default port

    cd /opt/spark/spark-2.2.0/sbin/
    vi start-master.sh
    
    # search for the 8080 port (/8080) and find
    #
    # if [ "$SPARK_MASTER_WEBUI_PORT" = "" ]; then
    #   SPARK_MASTER_WEBUI_PORT=8080
    #
    # change it to 8081
    
    ./start-master.sh

    Start tomcat and spark, then open 192.168.122.128:8080 and 192.168.122.128:8081; the tomcat (port 8080) and spark (port 8081) status pages should both load.
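Whichever port layout is chosen, the ports can be checked from the shell before opening a browser. This sketch uses bash's /dev/tcp redirection (bash-only; the address and ports are the ones assumed throughout this guide):

```shell
# return success if something accepts TCP connections on host:port
# (relies on bash's /dev/tcp pseudo-device; not available in plain sh)
port_open() {
    (echo > "/dev/tcp/$1/$2") 2>/dev/null
}

# on the cluster: port_open 192.168.122.128 8080 && echo "spark web UI is up"
port_open 127.0.0.1 1 || echo "nothing listening on port 1"
```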

Run the Spark Pi Example

cd /opt/spark/spark-2.2.0/
# local mode
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master local ./examples/jars/spark-examples_2.11-2.2.0.jar
# yarn-client
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client ./examples/jars/spark-examples_2.11-2.2.0.jar
# yarn-cluster
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster ./examples/jars/spark-examples_2.11-2.2.0.jar

All three runs should succeed and print something like Pi is roughly 3.1348556742783713:

...
...
17/12/11 15:11:41 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 4044 ms on slave2 (executor 2) (1/2)
17/12/11 15:11:41 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 227 ms on slave2 (executor 2) (2/2)
17/12/11 15:11:41 INFO scheduler.DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 6.993 s
17/12/11 15:11:41 INFO cluster.YarnScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool 
17/12/11 15:11:41 INFO scheduler.DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 7.857732 s
Pi is roughly 3.1348556742783713
17/12/11 15:11:41 INFO server.AbstractConnector: Stopped Spark@1a15b789{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
17/12/11 15:11:41 INFO ui.SparkUI: Stopped Spark web UI at http://192.168.122.128:4040
17/12/11 15:11:41 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread
17/12/11 15:11:41 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
17/12/11 15:11:41 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
17/12/11 15:11:41 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices
...
...

Open 192.168.122.128:8088 and click the applications tab to see the details of the Spark job just submitted.
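When scripting these submissions, the result line can be pulled out of the spark-submit output with grep. A sketch, run here against a shortened copy of the log shown above:

```shell
# extract the "Pi is roughly ..." result line from spark-submit output on stdin
extract_pi() {
    grep -o 'Pi is roughly [0-9.]*'
}

# shortened sample of the log output above
sample_log="17/12/11 15:11:41 INFO scheduler.DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 7.857732 s
Pi is roughly 3.1348556742783713
17/12/11 15:11:41 INFO ui.SparkUI: Stopped Spark web UI at http://192.168.122.128:4040"

echo "$sample_log" | extract_pi
```

In real use this would be `./bin/spark-submit ... 2>&1 | extract_pi`.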

Errors Encountered

Running in yarn mode reported a memory error:

...
...
Warning: Master yarn-cluster is deprecated since 2.0. Please use master "yarn" with specified deploy mode instead.
17/12/11 10:47:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/12/11 10:47:08 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.122.128:8032
17/12/11 10:47:08 INFO yarn.Client: Requesting a new application from cluster with 2 NodeManagers
17/12/11 10:47:09 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (2048 MB per container)
Exception in thread "main" java.lang.IllegalArgumentException: Required executor memory (2048+384 MB) is above the max threshold (2048 MB) of this cluster! Please check the values of 'yarn.scheduler.maximum-allocation-mb' and/or 'yarn.nodemanager.resource.memory-mb'.
        at org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:302)
        at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:166)
        at org.apache.spark.deploy.yarn.Client.run(Client.scala:1091)
        at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1150)
        at org.apache.spark.deploy.yarn.Client.main(Client.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Following the error message, edit the configuration to increase yarn's yarn.scheduler.maximum-allocation-mb and yarn.nodemanager.resource.memory-mb values.

Steps:

  • Stop spark and hadoop and make sure every process has fully exited

  • Edit hadoop's yarn configuration

    vi /opt/hadoop/hadoop-2.7.3/etc/hadoop/yarn-site.xml
    
    # set yarn.scheduler.maximum-allocation-mb and yarn.nodemanager.resource.memory-mb
    # to 4096 (adjust the exact value according to the error message)
    #
    # <property>
    #     <name>yarn.scheduler.maximum-allocation-mb</name>
    #     <value>4096</value>
    #     <description>memory available per node, in MB; default 8192 MB</description>
    # </property>
    # <property>
    #     <name>yarn.nodemanager.vmem-pmem-ratio</name>
    #     <value>2.1</value>
    # </property>
    # <property>
    #     <name>yarn.nodemanager.resource.memory-mb</name>
    #     <value>4096</value>
    # </property>
  • Start hadoop

  • Start spark

  • Test

    cd /opt/spark/spark-2.2.0
    
    # yarn-client
    ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client ./examples/jars/spark-examples_2.11-2.2.0.jar
    
    # yarn-cluster
    ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster ./examples/jars/spark-examples_2.11-2.2.0.jar

    Both now print the value of Pi successfully.


References:

Installing a Spark cluster on Linux (CentOS 7 + Spark 2.1.1 + Hadoop 2.8.0)

Running the official Pi demo under various modes in Spark 2.1.1

Changing the spark or hadoop master web UI port

Installing Tomcat on CentOS and configuring environment variables (changing the default port 8080 to 8081)

Spark yarn-cluster vs. yarn-client
