The relationship between Spark and Python: how a Spark program's parameters fit together

What is spark.python.worker.memory?

Spark on YARN resource manager: Relation between YARN Containers and Spark Executors?

When running Spark on YARN, each Spark executor runs inside a YARN container.

Therefore: --executor-memory <= yarn.scheduler.maximum-allocation-mb (the maximum size of a single container).

At the same time:

yarn.scheduler.maximum-allocation-mb <= yarn.nodemanager.resource.memory-mb (the upper limit on memory YARN may use on each node)

So the final relationship between the three is:

--executor-memory <= yarn.scheduler.maximum-allocation-mb (maximum size of a single container) <= yarn.nodemanager.resource.memory-mb (upper limit on memory YARN may use on each node)
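As a minimal sketch, with hypothetical megabyte values standing in for the real settings from spark-submit and yarn-site.xml, the constraint chain can be checked like this:

```python
# Hypothetical values; substitute the actual settings from spark-submit and yarn-site.xml.
executor_memory_mb = 4096        # --executor-memory (requested per executor)
yarn_max_allocation_mb = 8192    # yarn.scheduler.maximum-allocation-mb (largest single container)
yarn_node_memory_mb = 32768      # yarn.nodemanager.resource.memory-mb (per-node memory YARN may use)

# --executor-memory <= yarn.scheduler.maximum-allocation-mb <= yarn.nodemanager.resource.memory-mb
assert executor_memory_mb <= yarn_max_allocation_mb <= yarn_node_memory_mb, \
    "YARN would reject or never schedule this executor request"
```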

Number of executors that can be launched:

- executorNum = spark.cores.max / spark.executor.cores

How many tasks can run concurrently on each executor:

- taskNum = spark.executor.cores / spark.task.cpus
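
A quick worked sketch of both formulas, using hypothetical configuration values:

```python
# Hypothetical configuration values for illustration only.
spark_cores_max = 48        # spark.cores.max      (total cores granted to the application)
spark_executor_cores = 4    # spark.executor.cores (cores per executor)
spark_task_cpus = 1         # spark.task.cpus      (cores reserved per task)

executor_num = spark_cores_max // spark_executor_cores  # 48 / 4 = 12 executors
task_num = spark_executor_cores // spark_task_cpus      # 4 / 1  = 4 concurrent tasks per executor

print(executor_num, task_num)  # 12 4
```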

spark.python.worker.memory is a subset of the memory from spark.executor.memory

spark.python.worker.memory is used by the Python workers inside each executor.

spark.python.worker.memory <= spark.executor.memory (--executor-memory)

Because of the GIL, PySpark uses multiple Python processes in each executor, one per task.

spark.python.worker.memory tells the Python worker when to spill data to disk.

If the executor has enough memory, increasing spark.python.worker.memory lets the Python workers use more memory during shuffles, which improves performance.
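
For example, one way to set both values together (the sizes below are hypothetical, chosen so that spark.python.worker.memory stays well below spark.executor.memory):

```python
from pyspark.sql import SparkSession

# Hypothetical sizes: keep spark.python.worker.memory comfortably below spark.executor.memory,
# since the Python workers' memory comes out of the same container as the executor's JVM heap.
spark = (
    SparkSession.builder
    .appName("pyspark-memory-example")
    .config("spark.executor.memory", "4g")       # JVM heap per executor (--executor-memory)
    .config("spark.python.worker.memory", "1g")  # per Python worker before spilling to disk
                                                 # (raised above the 512m default to reduce spilling)
    .getOrCreate()
)
```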

In summary, at runtime PySpark starts multiple Python task processes inside each executor, and spark.python.worker.memory controls how much memory each task's Python worker may use before spilling.

So which parameter controls how many Python tasks an executor starts? Is setting spark.python.worker.memory enough, and how do executors map to Python workers?

However many tasks run concurrently on an executor, there will be that many corresponding pyspark.worker processes.

Where does the memory for spark.yarn.executor.memoryoverhead come from? What is its relationship to spark.executor.memory and the YARN container?
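
A sketch of the usual relationship, with hypothetical numbers: the overhead is off-heap memory requested on top of spark.executor.memory, and YARN sizes the container to hold both (in older Spark-on-YARN releases the overhead defaults to max(384 MB, 10% of executor memory)).

```python
# Hypothetical numbers illustrating how the YARN container request is sized.
executor_memory_mb = 4096                                      # spark.executor.memory (JVM heap)
memory_overhead_mb = max(384, int(0.10 * executor_memory_mb))  # spark.yarn.executor.memoryoverhead default

container_request_mb = executor_memory_mb + memory_overhead_mb
print(container_request_mb)  # 4505 -- this total, not just the heap, must fit in one YARN container
```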
