Get partitions and cores
Use the rdd method to get the number of DataFrame partitions
df = spark.read.parquet(eventsPath)
df.rdd.getNumPartitions()
Access SparkContext through SparkSession to get the number of cores or slots
SparkContext is also provided in Databricks notebooks as the variable sc
print(spark.sparkContext.defaultParallelism)
# print(sc.defaultParallelism)
# e.g. returns 8 on a cluster with 8 cores
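To make the relationship between rows, partitions, and cores concrete, here is a Spark-free sketch in plain Python. The helper name split_into_partitions is hypothetical; it only models how rows are distributed across a fixed number of partitions, the way df.repartition(n) distributes a DataFrame.

```python
import multiprocessing

def split_into_partitions(rows, num_partitions):
    """Round-robin rows into num_partitions chunks (illustrative model only)."""
    partitions = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        partitions[i % num_partitions].append(row)
    return partitions

# multiprocessing.cpu_count() stands in for sc.defaultParallelism here
cores = multiprocessing.cpu_count()
parts = split_into_partitions(list(range(100)), 8)
print(len(parts))        # 8 partitions
print(len(parts[0]))     # first partition holds 13 of the 100 rows
```

In real Spark the distribution strategy and row placement are handled by the engine; this only illustrates the counting.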
Repartition DataFrame
repartition
Returns a new DataFrame that has exactly n partitions.
repartitionedDF = df.repartition(8)
repartitionedDF.rdd.getNumPartitions()
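Conceptually, repartition performs a full shuffle: every row is reassigned to a target partition (typically by hash), regardless of where it currently lives. A hedged pure-Python sketch of that idea, with a hypothetical repartition function standing in for Spark's distributed implementation:

```python
def repartition(partitions, n):
    """Model a full shuffle: every row may move to a new partition."""
    new_parts = [[] for _ in range(n)]
    for part in partitions:
        for row in part:
            new_parts[hash(row) % n].append(row)  # hash decides the target
    return new_parts

old_parts = [[1, 2, 3], [4, 5, 6, 7]]
shuffled = repartition(old_parts, 8)
print(len(shuffled))  # exactly 8 partitions, as requested
```

Unlike coalesce below, this can both increase and decrease the partition count, at the cost of moving data.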
coalesce
Returns a new DataFrame that has exactly n partitions when fewer partitions are requested.
If a larger number of partitions is requested, the DataFrame stays at its current number of partitions; coalesce only combines existing partitions and avoids a full shuffle.
coalesceDF = df.coalesce(8)
coalesceDF.rdd.getNumPartitions()
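The contrast with repartition can be sketched in plain Python: coalesce merges whole existing partitions rather than reassigning individual rows, and it never increases the partition count. The coalesce function here is a hypothetical model, not Spark's implementation.

```python
def coalesce(partitions, n):
    """Model coalesce: merge whole partitions, never split or shuffle rows."""
    if n >= len(partitions):
        return partitions  # larger request: stays at the current count
    merged = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        merged[i % n].extend(part)  # whole partitions are combined intact
    return merged

parts = [[1], [2], [3], [4]]
print(len(coalesce(parts, 2)))  # 2, as requested
print(len(coalesce(parts, 8)))  # 4, unchanged: coalesce cannot grow the count
```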
Configure default shuffle partitions
Use the SparkSession's conf attribute (its runtime configuration) to access the configuration parameter for the default number of shuffle partitions
spark.conf.get("spark.sql.shuffle.partitions")
Configure default shuffle partitions to match the number of cores
spark.conf.set("spark.sql.shuffle.partitions", "8")
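A hedged sketch combining the two ideas above, assuming a live SparkSession named spark: read the core count from the context and set the shuffle-partition count to match, rather than hard-coding 8.

```python
# Illustrative config fragment only; requires a running SparkSession `spark`.
cores = spark.sparkContext.defaultParallelism
spark.conf.set("spark.sql.shuffle.partitions", str(cores))
print(spark.conf.get("spark.sql.shuffle.partitions"))
```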
Adaptive Query Execution
Spark SQL uses spark.sql.adaptive.enabled
to control whether AQE is turned on/off (disabled by default in Spark 3.0 and 3.1, enabled by default since Spark 3.2)
spark.conf.get("spark.sql.adaptive.enabled")
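Since the setting is a runtime configuration, it can be toggled the same way as the shuffle-partition count. A minimal config fragment, assuming a live SparkSession named spark:

```python
# Illustrative config fragment only; requires a running SparkSession `spark`.
spark.conf.set("spark.sql.adaptive.enabled", "true")
print(spark.conf.get("spark.sql.adaptive.enabled"))
```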