Parallelism settings
spark.default.parallelism
Default number of partitions in RDDs returned by transformations like join, reduceByKey, and parallelize when not set by the user.
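As a minimal sketch of how this property might be set from application code, the PySpark snippet below configures it on a SparkConf before the context is created. It assumes PySpark is installed; the master local[4], the app name, and the value 8 are illustrative choices, not recommendations from the source.

```python
from pyspark import SparkConf, SparkContext

# Assumption: PySpark is installed; local[4] and the value 8 are illustrative only.
conf = (
    SparkConf()
    .setMaster("local[4]")
    .setAppName("parallelism-demo")  # hypothetical app name
    .set("spark.default.parallelism", "8")
)
sc = SparkContext(conf=conf)

# parallelize has no parent RDD, so it uses the default parallelism.
rdd = sc.parallelize(range(100))
print(sc.defaultParallelism)     # default parallelism seen by this context
print(rdd.getNumPartitions())    # partitions actually used for this RDD

# An explicit numSlices argument overrides the default for this one call.
print(sc.parallelize(range(100), 3).getNumPartitions())

sc.stop()
```

The property has to be set before the SparkContext is created; changing the conf object afterwards does not affect a running context.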
Default value
For distributed shuffle operations like reduceByKey and join, the largest number of partitions in a parent RDD. For operations like parallelize with no parent RDDs, it depends on the cluster manager:
- Local mode: number of cores on the local machine
- Mesos fine grained mode: 8
- Others: total number of cores on all executor nodes or 2, whichever is larger
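The fallback rules above can be restated as a small pure-Python model. This is only a sketch of the documented rules, not Spark's actual implementation; the function names and the mode strings ("local", "mesos-fine-grained", "other") are hypothetical labels introduced for illustration.

```python
def default_for_shuffle(parent_partition_counts):
    """Shuffle operations (reduceByKey, join): largest parent partition count."""
    return max(parent_partition_counts)


def default_without_parent(mode, local_cores=1, total_executor_cores=0):
    """Operations with no parent RDD (parallelize): depends on the cluster manager.

    `mode` is a hypothetical label, not a real Spark master URL.
    """
    if mode == "local":
        return local_cores              # number of cores on the local machine
    if mode == "mesos-fine-grained":
        return 8                        # fixed value in this mode
    # All other cluster managers: total executor cores, but never below 2.
    return max(total_executor_cores, 2)


# A join of RDDs with 4 and 16 partitions defaults to 16 partitions.
print(default_for_shuffle([4, 16]))
# A cluster with a single executor core still gets 2 partitions.
print(default_without_parent("other", total_executor_cores=1))
```

The floor of 2 in the last rule means that even a very small cluster yields at least two partitions for parallelize.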
See the official documentation: