Spark Programming Guide, Part 5: Spark Cluster Configuration Parameters

Spark Configuration Precedence

Precedence, from highest to lowest: settings made in code -> flags passed at submit time via spark-submit -> the cluster configuration file spark-defaults.conf.
The spark-defaults.conf file ships with the following commented examples:

# Example:
# spark.master                     spark://master:7077
# spark.eventLog.enabled           true
# spark.eventLog.dir               hdfs://namenode:8021/directory
# spark.serializer                 org.apache.spark.serializer.KryoSerializer
# spark.driver.memory              5g
# spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"

The sections below walk through these layers in order of precedence, from lowest to highest.
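As a concrete illustration of the precedence chain, suppose the same property is set in all three places (the values below are hypothetical):

```
# spark-defaults.conf (lowest precedence)
spark.executor.memory  512m

# spark-submit flag (overrides the file)
--executor-memory 1g

# application code (highest precedence; this value wins)
conf.set("spark.executor.memory", "2g")
```

With all three present, each executor would be launched with 2g, because values set on a SparkConf in code override everything else.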

Cluster Configuration Files

spark-env.sh

spark-env.sh holds per-machine settings for Workers and Executors.
The shipped file contains the following commented options (excerpt):

# Options read in YARN client/cluster mode
# - SPARK_CONF_DIR, Alternate conf dir. (Default: ${SPARK_HOME}/conf)
# - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
# - YARN_CONF_DIR, to point Spark towards YARN configuration files when you use YARN
# - SPARK_EXECUTOR_CORES, Number of cores for the executors (Default: 1).
# - SPARK_EXECUTOR_MEMORY, Memory per Executor (e.g. 1000M, 2G) (Default: 1G)
# - SPARK_DRIVER_MEMORY, Memory for Driver (e.g. 1000M, 2G) (Default: 1G)
# Options for the daemons used in the standalone deploy mode
# - SPARK_MASTER_HOST, to bind the master to a different IP address or hostname
# - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports for the master
# - SPARK_MASTER_OPTS, to set config properties only for the master (e.g. "-Dx=y")
# - SPARK_WORKER_CORES, to set the number of cores to use on this machine
# - SPARK_WORKER_MEMORY, to set how much total memory workers have to give executors (e.g. 1000m, 2g)

For example, you could set:

SPARK_WORKER_CORES=1
SPARK_WORKER_MEMORY=800m
SPARK_WORKER_INSTANCES=2

Finally, note that the Java VM does not always behave well with more than 200 GB of RAM. If you purchase machines with more RAM than this, you can run multiple worker JVMs per node. In Spark's standalone mode, you can set the number of workers per node with the SPARK_WORKER_INSTANCES variable in conf/spark-env.sh, and the number of cores per worker with SPARK_WORKER_CORES.
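For instance, on a hypothetical node with 256 GB of RAM and 16 cores, the resources could be split across two worker JVMs in conf/spark-env.sh so that each JVM heap stays well under 200 GB (the values are illustrative, not a recommendation):

```
SPARK_WORKER_INSTANCES=2
SPARK_WORKER_CORES=8
SPARK_WORKER_MEMORY=120g
```

Leaving some memory unassigned for the OS and other daemons is deliberate here.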
Memory strings accept the following unit suffixes:

1b (bytes)
1k or 1kb (kibibytes = 1024 bytes)
1m or 1mb (mebibytes = 1024 kibibytes)
1g or 1gb (gibibytes = 1024 mebibytes)
1t or 1tb (tebibytes = 1024 gibibytes)
1p or 1pb (pebibytes = 1024 tebibytes)

Note: after editing the file, copy it to every other node in the cluster.

spark-defaults.conf

spark.executor.cores        number of cores per executor
spark.executor.memory       memory per executor
spark.default.parallelism   default parallelism for operations without an explicit partition count
spark.broadcast.blockSize   size of each block of a broadcast variable, default 4m
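Put together, a minimal spark-defaults.conf using these properties might look like this (the values are illustrative):

```
spark.executor.cores        1
spark.executor.memory       800m
spark.default.parallelism   8
spark.broadcast.blockSize   4m
```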

spark-submit Options

Options such as executor-memory and executor-cores can be set at submit time.
Each option name is prefixed with two dashes; alternatively, any property can be passed with --conf:
--executor-memory 1200m
or, equivalently,
--conf spark.executor.memory=1200m
For example:

./bin/spark-submit --master local[2] --driver-memory 1g --executor-memory 1g --executor-cores 1 --num-executors 3 --class org.apache.spark.examples.JavaSparkPi  examples/jars/spark-examples_2.11-2.1.3.jar
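The same submission can be written with --conf pairs instead of the shorthand flags; both forms set the same underlying properties (note that --num-executors corresponds to spark.executor.instances, which only takes effect on cluster managers that support it, such as YARN):

```
./bin/spark-submit --master local[2] \
  --conf spark.driver.memory=1g \
  --conf spark.executor.memory=1g \
  --conf spark.executor.cores=1 \
  --conf spark.executor.instances=3 \
  --class org.apache.spark.examples.JavaSparkPi \
  examples/jars/spark-examples_2.11-2.1.3.jar
```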

Configuration in Code

import org.apache.spark.SparkConf

val conf = new SparkConf()
conf.setAppName("example")
conf.set("spark.executor.cores", "1")
conf.set("spark.executor.memory", "800m")

References

http://spark.apache.org/docs/latest/configuration.html
http://spark.apache.org/docs/latest/hardware-provisioning.html
