Spark Join: Filtering in WHERE vs. Filtering in the JOIN ON Condition

芹菜学长

Category: Spark programs

Article link: https://blog.csdn.net/OldDirverHelpMe/article/details/113533002

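Problem background

I have recently been using Spark SQL for data processing. While joining two tables, I wanted to find out whether the way the data is filtered during the join affects execution speed.

Investigation

Data preparation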

create table stu(
    id int,          -- unique id
    name string,     -- student name
    subject string,  -- subject
    score int        -- score
)

Insert the following data.

stu table:
+---+----+-------+-----+
| id|name|subject|score|
+---+----+-------+-----+
|  1|张三|   math|   50|
|  2|张三|English|   70|
|  3|张三|Chinese|   80|
|  4|李四|   math|   80|
|  5|李四|English|   40|
|  6|李四|Chinese|   60|
+---+----+-------+-----+
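
The post shows the rows but not the statement that loaded them. A minimal sketch of one way to load them in Spark SQL, assuming the stu table above already exists (the multi-row INSERT INTO ... VALUES form is an assumption, not necessarily what the author ran):

```sql
-- Load the six stu rows shown in the table above.
-- Column order follows the create table statement: id, name, subject, score.
INSERT INTO stu VALUES
    (1, '张三', 'math',    50),
    (2, '张三', 'English', 70),
    (3, '张三', 'Chinese', 80),
    (4, '李四', 'math',    80),
    (5, '李四', 'English', 40),
    (6, '李四', 'Chinese', 60);
```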

Prepare the teacher table:

create table teacher(
    id int,         -- unique id
    name string,    -- teacher name
    subject string  -- subject taught
)

Insert the following data:

+---+--------+-------+
| id|    name|subject|
+---+--------+-------+
|  1|    jack|   math|
|  2|    tony|English|
|  3|ZhaoGang|Chinese|
+---+--------+-------+
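
As with stu, the loading statement is not shown in the post. A sketch under the same assumption (the teacher table already exists, and a multi-row INSERT INTO ... VALUES is used):

```sql
-- Load the three teacher rows shown in the table above.
-- Column order follows the create table statement: id, name, subject.
INSERT INTO teacher VALUES
    (1, 'jack',     'math'),
    (2, 'tony',     'English'),
    (3, 'ZhaoGang', 'Chinese');
```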

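Posing the question

How can we implement the following: for each student, select the teacher of each of their subjects?
The answer to this question is: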

    select s.id,
          s.name,
          s.subject,
          s.score,
          t.name
     from stu s
left join teacher t
       on s.subject=t.subject

The result is as follows:

+---+----+-------+-----+--------+
| id|name|subject|score|    name|
+---+----+-------+-----+--------+
|  3|张三|Chinese|   80|ZhaoGang|
|  5|李四|English|   40|    tony|
|  2|张三|English|   70|    tony|
|  6|李四|Chinese|   60|ZhaoGang|
|  4|李四|   math|   80|    jack|
|  1|张三|   math|   50|    jack|
+---+----+-------+-----+--------+

To inspect the execution plan of this SQL, we use explain (the same applies to the queries that follow):

== Physical Plan ==
*(2) Project [id#1173, name#1174, subject#1175, score#1176, name#1310]
+- *(2) BroadcastHashJoin [subject#1175], [subject#1311], LeftOuter, BuildRight
  :- *(2) ColumnarToRow
  :  +- FileScan parquet default.stu[id#1173,name#1174,subject#1175,s
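
The plan output above is cut off in the source. As a sketch of how such a plan can be requested, Spark SQL accepts an EXPLAIN prefix on a query; the first statement below wraps the same left join. The two variants after it illustrate the two filter placements named in the title. The predicate t.subject = 'math' is a hypothetical filter chosen for illustration and is not taken from the post.

```sql
-- Request the physical plan for the left join shown earlier.
EXPLAIN
SELECT s.id, s.name, s.subject, s.score, t.name
FROM stu s
LEFT JOIN teacher t
  ON s.subject = t.subject;

-- Variant A (hypothetical filter): predicate placed in the ON clause.
-- Every stu row is kept; teacher columns are NULL where no teacher row matches.
EXPLAIN
SELECT s.id, s.name, s.subject, s.score, t.name
FROM stu s
LEFT JOIN teacher t
  ON s.subject = t.subject
 AND t.subject = 'math';

-- Variant B (hypothetical filter): predicate placed in the WHERE clause.
-- Rows whose teacher columns are NULL fail the predicate and are dropped.
EXPLAIN
SELECT s.id, s.name, s.subject, s.score, t.name
FROM stu s
LEFT JOIN teacher t
  ON s.subject = t.subject
WHERE t.subject = 'math';
```

Because the join is a left outer join, the two variants are not interchangeable: they can return different rows, so any speed comparison should first confirm that both forms answer the same question.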
