Spark 1.6.3 Configuration

鹭岛猥琐男

Category: Big Data. Tags: spark

Article link: https://blog.csdn.net/zzw0221/article/details/85237951

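1. I originally meant to write about installing Spark 2.3, but my Hadoop cluster was set up with JDK 1.7, and Spark 2.3 requires JDK 1.8 or later. If Spark and Hadoop run on different JDK versions, Spark jobs on YARN fail with errors. This post therefore covers installing Spark 1.x instead.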

2. One note: do not use the CDH build of Spark, because some JAR files cannot be found in it. Use the Apache release directly.
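
The JDK mismatch described in step 1 is easy to miss when nodes were set up at different times. As a minimal sketch, the helper below extracts the major Java version from the first line of `java -version` output so that nodes can be compared. The function name `java_major` is my own and not part of Spark or Hadoop, and the parsing assumes the usual `version "x.y.z"` banner format.

```shell
# Hypothetical helper: print the major Java version found in a
# `java -version` banner line passed as the first argument.
# Old-style versions ("1.7.0_67") report the second field,
# new-style versions ("11.0.2") report the first field.
java_major() {
    ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    case "$ver" in
        1.*) printf '%s\n' "$ver" | cut -d. -f2 ;;
        *)   printf '%s\n' "$ver" | cut -d. -f1 ;;
    esac
}

# Demo with a literal banner line (no JDK needed to run this).
java_major 'java version "1.7.0_67"'
```

On a real node the banner goes to standard error, so a call would look like `java_major "$(java -version 2>&1)"`, and the same command can be run over ssh on each host to compare the results.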

3. After extraction, the Spark directory looks like this:

[zuowei.zhang@master spark-1.6.3]$ ll
total 1380
drwxr-xr-x 2 zuowei.zhang zuowei.zhang    4096 Nov  3  2016 bin
-rw-r--r-- 1 zuowei.zhang zuowei.zhang 1343562 Nov  3  2016 CHANGES.txt
drwxr-xr-x 2 zuowei.zhang zuowei.zhang     212 Dec 24 08:35 conf
drwxr-xr-x 3 zuowei.zhang zuowei.zhang      19 Nov  3  2016 data
drwxr-xr-x 3 zuowei.zhang zuowei.zhang      79 Nov  3  2016 ec2
drwxr-xr-x 3 zuowei.zhang zuowei.zhang      17 Nov  3  2016 examples
drwxr-xr-x 2 zuowei.zhang zuowei.zhang     237 Nov  3  2016 lib
-rw-r--r-- 1 zuowei.zhang zuowei.zhang   17352 Nov  3  2016 LICENSE
drwxr-xr-x 2 zuowei.zhang zuowei.zhang    4096 Nov  3  2016 licenses
-rw-r--r-- 1 zuowei.zhang zuowei.zhang   23529 Nov  3  2016 NOTICE
drwxr-xr-x 6 zuowei.zhang zuowei.zhang     119 Nov  3  2016 python
drwxr-xr-x 3 zuowei.zhang zuowei.zhang      17 Nov  3  2016 R
-rw-r--r-- 1 zuowei.zhang zuowei.zhang    3359 Nov  3  2016 README.md
-rw-r--r-- 1 zuowei.zhang zuowei.zhang     120 Nov  3  2016 RELEASE
drwxr-xr-x 2 zuowei.zhang zuowei.zhang    4096 Nov  3  2016 sbin

4. Configure the slaves file and the spark-env.sh file in the conf directory.

The slaves file lists the worker nodes:

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# A Spark Worker will be started on each of the machines listed below.
master.cn
slave1.cn
slave2.cn
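
Since the slaves file mixes a license header, comments, and hostnames, it can help to see exactly which hosts will be treated as workers. The sketch below prints only the hostnames. The function name `list_workers` is hypothetical, and it assumes one hostname per line with `#` starting a comment, as in the file above.

```shell
# Hypothetical helper: print the worker hostnames from a slaves file,
# dropping comments, surrounding whitespace, and blank lines.
list_workers() {
    sed -e 's/#.*//' \
        -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]]*$//' \
        -e '/^$/d' "$1"
}
```

Run as `list_workers conf/slaves` from the Spark directory; with the file above it should list master.cn, slave1.cn, and slave2.cn.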

The spark-env.sh file is configured as follows:

export JAVA_HOME=/opt/java/jdk1.7.0_67
export SPARK_MASTER_IP=master.cn
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_CORES=1
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_MEMORY=1g
#spark on yarn
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_HOME=/opt/cdh5.15.0/spark-1.6.3
export SPARK_JAR=/opt/cdh5.15.0/spark-1.6.3/lib/spark-assembly-1.6.3-hadoop2.6.0.jar
export PATH=$SPARK_HOME/bin:$PATH
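
Note that the settings above expand $HADOOP_HOME, which spark-env.sh itself does not define. If that variable is empty in the environment, HADOOP_CONF_DIR silently becomes /etc/hadoop. As a rough sketch, the check below reports which of a list of variables are unset or empty. The name `missing_vars` is my own invention, not a Spark utility.

```shell
# Hypothetical helper: print the name of every listed variable
# that is unset or empty in the current shell.
missing_vars() {
    for name in "$@"; do
        # Indirect lookup of the variable whose name is in $name.
        eval "value=\${$name:-}"
        [ -n "$value" ] || printf '%s\n' "$name"
    done
}
```

For example, `missing_vars JAVA_HOME HADOOP_HOME SPARK_HOME` prints nothing when all three are set, and prints the offending names otherwise.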

5. Distribute the Spark directory to every node:

scp -r /opt/cdh5.15.0/spark-1.6.3/ slave1.cn:/opt/cdh5.15.0/
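
The command above copies to slave1.cn only, while the slaves file also lists slave2.cn, so the copy has to be repeated per host. A minimal sketch of that loop follows, written as a dry run that prints the commands rather than executing them. The function name `scp_commands` is hypothetical, and passwordless ssh between the nodes is assumed for the real copy.

```shell
# Hypothetical helper: print one scp command per target host.
# Usage: scp_commands SOURCE_DIR DEST_DIR HOST...
# This is a dry run; nothing is copied.
scp_commands() {
    src=$1
    dest=$2
    shift 2
    for host in "$@"; do
        printf 'scp -r %s %s:%s\n' "$src" "$host" "$dest"
    done
}

scp_commands /opt/cdh5.15.0/spark-1.6.3/ /opt/cdh5.15.0/ slave1.cn slave2.cn
```

After reviewing the printed commands, piping the output to `sh` would run them.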

6. Examples of running Spark on YARN:

Client mode: the result is printed in the terminal (Xshell in my case).

bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client --executor-memory 1G --num-executors 1 lib/spark-examples-1.6.3-hadoop2.6.0.jar 100

Cluster mode: the result is visible in the YARN web UI on port 8088.

bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --executor-memory 1G --num-executors 1 lib/spark-examples-1.6.3-hadoop2.6.0.jar 100
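
In cluster mode the driver runs inside a YARN container, so its output ends up in the container logs rather than in the submitting terminal. To look them up you need the application ID, which has the form application_<timestamp>_<sequence>. The sketch below pulls the first such ID out of text on standard input. The function name `extract_app_id` is my own, and the sample line in the demo is made up for illustration, not real Spark output.

```shell
# Hypothetical helper: print the first YARN application ID
# found in the text read from standard input.
extract_app_id() {
    grep -o 'application_[0-9][0-9]*_[0-9][0-9]*' | head -n 1
}

# Demo on an invented sample line.
printf 'submitted as application_1545612345678_0003\n' | extract_app_id
```

With the ID in hand, `yarn logs -applicationId <id>` shows the container logs after the job finishes, assuming log aggregation is enabled on the cluster.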

7. Configure the Spark HistoryServer, and distribute the following configuration to every node.

Edit spark-defaults.conf in the conf directory:

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Default system properties included when running spark-submit.
# This is useful for setting default environmental settings.

# Example:
# spark.master                     spark://master:7077
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://master.cn:8020/sparklog
# spark.serializer                 org.apache.spark.serializer.KryoSerializer
# spark.driver.memory              5g
# spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"

Add the following to spark-env.sh:

#HistoryServer
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=30 -Dspark.history.fs.logDirectory=hdfs://master.cn:8020/sparklog"
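
The event log directory now appears in two places: spark.eventLog.dir in spark-defaults.conf and spark.history.fs.logDirectory in SPARK_HISTORY_OPTS. The HistoryServer only shows applications if both point to the same location. The sketch below reads each value so they can be compared. Both function names are hypothetical, and `conf_value` assumes the whitespace-separated key/value layout used in the file above.

```shell
# Hypothetical helper: print the value of a key in a
# spark-defaults.conf style file (whitespace-separated, '#' comments).
conf_value() {
    awk -v key="$1" '$1 == key { print $2; exit }' "$2"
}

# Hypothetical helper: print the value of one -Dname=value option
# from a space-separated options string.
opt_value() {
    printf '%s\n' "$2" | tr ' ' '\n' |
        awk -v p="-D$1=" 'index($0, p) == 1 { print substr($0, length(p) + 1); exit }'
}
```

A comparison could then be written as `[ "$(conf_value spark.eventLog.dir conf/spark-defaults.conf)" = "$(opt_value spark.history.fs.logDirectory "$SPARK_HISTORY_OPTS")" ]`. As far as I know, the directory also has to exist in HDFS before the first job runs, for example by creating it with `hdfs dfs -mkdir -p /sparklog`.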

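8. Run sbin/start-history-server.sh on any one node. The history UI is then available on port 18080 of that node, for example http://master.cn:18080/ if it was started on master.cn.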