Spark SQL解析过程

1.Spark SQL中Join的分类

当前SparkSQL支持三种Join算法-shuffle hash join、broadcast hash join以及sort merge join。其中前两者归根到底都属于hash join,只不过在hash join之前需要先shuffle还是先broadcast。对于broadcast join模式,会将小于spark.sql.autoBroadcastJoinThreshold值(默认为10M)的表广播到其他计算节点,不走shuffle过程,所以会更加高效。

2.Using Catalyst in Spark SQL
Catalyst 优化器执行过程
We use Catalyst’s general tree transformation framework in four phases, as shown below:
(1) analyzing a logical plan to resolve references,
(2) logical plan optimization,
(3) physical planning
(4) code generation to compile parts of the query to Java bytecode. In the physical planning phase, Catalyst may generate multiple plans and compare them based on cost. All other phases are purely rule-based. Each phase uses different types of tree nodes; Catalyst includes libraries of nodes for expressions, data types, and logical and physical operators. We now describe each of these phases.

Spark SQL解析过程

http://hbasefly.com/2017/03/01/sparksql-catalyst/
https://spark-packages.org/package/neo4j-contrib/neo4j-spark-connector
http://blog.csdn.net/qq_21050291/article/details/78037883?locationNum=10&fps=1

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值