数据表:
表名:default.class_info,分别是班级,姓名,成绩 3列
单列Explode:
需求:将student这一列中数据由一行变为多行(使用split和explode,并结合lateral view函数实现)
select
class,student_name
from
default.class_info
lateral view explode(split(student,',')) t as student_name
结果如下:
单列Posexplode
需求:想要给每班的每个同学按照顺序来一个编号(使用posexplode函数)
select
class,student_index + 1 as student_index,student_name
from
default.class_info
lateral view posexplode(split(student,',')) t as student_index,student_name;
注意:student_index+1 是因为index是从0开始的
结果如下:
多列Explode
需求:基于学生姓名和分数使其两两匹配,期望得到如下效果。
尝试: 先对两列进行explode
select
class,student_name,student_score
from
default.class_info
lateral view explode(split(student,',')) sn as student_name
lateral view explode(split(score,',')) sc as student_score
结果如下:
不太符合预期,如果对两列都进行explode的话,假设每列都有3个值,最终会变为3*3=9行,但我们实际只想要3行
解决办法:
我们进行两次posexplode,姓名和成绩都保留对应的序号,即便是变成了9行,通过where筛选只保留行号相同的index即可。
select
class,student_name,student_score
from
default.class_info
lateral view posexplode(split(student,',')) sn as student_index_sn,student_name
lateral view posexplode(split(score,',')) sc as student_index_sc,student_score
where
student_index_sn = student_index_sc;
结果如下:
需求:假设我们又想对同学的成绩进行一下排名(借助rank( ) 函数 )
select
class,
student_name,
student_score,
rank() over(partition by class order by student_score desc) as student_rank
from
default.class_info
lateral view posexplode(split(student,',')) sn as student_index_sn,student_name
lateral view posexplode(split(score,',')) sc as student_index_sc,student_score
where
student_index_sn = student_index_sc
order by class,student_rank;
结果如下:
补充:
若没有spilt函数,可能会用到array()
lateral view
posexplode (array()) as as f_keys,f_values
user_id | bus_ | bike_ | taxi_ | train_ |
1001 | 503 | 89 | 708 | 2054 |
1002 | 24 | 17 | 1008 | 500 |
1003 | 80 | 50 | 500 | 400 |
user_id | pay_type | Amount |
1001 | 1 | 503 |
1001 | 2 | 89 |
1001 | 3 | 708 |
1001 | 4 | 2054 |
select
user_id,
index_+1 as pay_type,
value as amount
from pay_record_log
lateral view
posexplode(ARRAY(bus_,bike_,taxi_,train_))t as index_,value