Functions Used in DataWorks

Latest recommended article published on 2023-03-09 17:29:49

唐三水

Tags: big data

Article link: https://blog.csdn.net/Yuli_li/article/details/118381427

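Common functions:
coalesce(expr1, expr2, ...): returns the first non-NULL value in the argument list.
concat(string A, string B, ...): concatenates all argument strings. Returns NULL if there are no arguments or if any argument is NULL.
cast(expr as <type>): converts the result of the expression to the target type.
ROUND(column_name, decimals): rounds the numeric field (column_name) to the specified number of decimal places (decimals).
trans_cols(num_keys, key1, key2, ..., col1, col2): turns one row into multiple rows by transposing columns into rows. It is a UDTF (one input row, multiple output rows). For example, to turn the row A B C D into the two rows A B C and A B D, write trans_cols(2, A, B, C, D) as (idx, A, B, key).
ascii(expr): returns the ASCII code of the first character of the argument string; ascii('') = 0.
decode(expr1, expr2, expr3[, expr4, expr5, ...][, expr6]): replaces an if / else if / else chain: if expr1 = expr2 then expr3, else if expr1 = expr4 then expr5, ..., else expr6.
rpad(string str, int len, string pad): returns a string of the specified length (len). When str is shorter than len, it is padded on the right with pad. For example, rpad('12345', 6, '0') returns '123450'.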
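The behaviour of several of these functions is easier to check with a small model. The following is a rough Python sketch of the semantics as described in the list above, not MaxCompute code: None stands in for NULL, the helper names simply mirror the SQL names, and details such as idx starting at 1 and rpad truncating an over-long string are assumptions.

```python
# Pure-Python models of the semantics described above (illustrative only).
# None plays the role of SQL NULL.

def coalesce(*exprs):
    """Return the first argument that is not None, or None if all are."""
    return next((e for e in exprs if e is not None), None)

def concat(*strings):
    """Join all strings; None if there are no arguments or any is None."""
    if not strings or any(s is None for s in strings):
        return None
    return "".join(strings)

def rpad(s, length, pad):
    """Pad s on the right with pad up to length (assumed: truncate if longer)."""
    length = max(length, 0)
    return (s + pad * length)[:length]

def decode(expr, *args):
    """decode(expr, search1, result1, ..., [default]) as an if/else-if chain."""
    pairs, default = args, None
    if len(args) % 2 == 1:          # an odd count means a trailing default
        pairs, default = args[:-1], args[-1]
    for search, result in zip(pairs[0::2], pairs[1::2]):
        if expr == search:
            return result
    return default

def trans_cols(num_keys, *values):
    """One input row becomes one output row per non-key column."""
    keys, cols = values[:num_keys], values[num_keys:]
    return [(idx, *keys, col) for idx, col in enumerate(cols, start=1)]

print(rpad("12345", 6, "0"))                 # 123450
print(concat("a", None))                     # None
print(decode(2, 1, "one", 2, "two", "other"))  # two
print(trans_cols(2, "A", "B", "C", "D"))     # [(1, 'A', 'B', 'C'), (2, 'A', 'B', 'D')]
```

The rpad call shows why the padded result ends in 0: the pad string is appended, it does not continue the digits of the input.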

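Common keywords:
having: in MaxCompute SQL, the WHERE keyword cannot be used with aggregate functions; use a HAVING clause instead. For example:
SELECT Customer, SUM(OrderPrice) FROM Orders
GROUP BY Customer
HAVING SUM(OrderPrice) < 2000
Left Outer Join: left join. Returns all records from the left table, even when the right table has no matching record.
Right Outer Join: right join. Returns all records from the right table, even when the left table has no matching record.
Full Outer Join: full join. Returns all records from both the left and the right table.
Inner Join: inner join; the keyword inner can be omitted. Returns a row only when there is at least one match in both tables.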
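delete: removes the table's rows. The table structure remains and the space is not released; in databases that support transactions the deletion can be rolled back.
drop: removes both the table's contents and its structure and releases the space. Use with caution if the table has not been backed up.
truncate: removes the table's contents but keeps its structure, and releases the space. Use with caution if the table has not been backed up. In terms of execution speed, generally drop > truncate > delete.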
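To see HAVING and the join types in action without a MaxCompute project, the sketch below runs the same kind of statements against an in-memory SQLite database from Python. SQLite is only a stand-in here, and the Regions table and all row values are made up for illustration.

```python
import sqlite3

# SQLite is a local stand-in for MaxCompute; all table contents are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (Customer TEXT, OrderPrice INTEGER);
    INSERT INTO Orders VALUES
        ('Bush', 1000), ('Carter', 1600), ('Bush', 700),
        ('Bush', 250), ('Adams', 2000), ('Carter', 100);
    CREATE TABLE Regions (Customer TEXT, Region TEXT);
    INSERT INTO Regions VALUES ('Bush', 'East'), ('Adams', 'West');
""")

# HAVING filters on the aggregate after GROUP BY, which WHERE cannot do.
having_rows = conn.execute("""
    SELECT Customer, SUM(OrderPrice) FROM Orders
    GROUP BY Customer
    HAVING SUM(OrderPrice) < 2000
    ORDER BY Customer
""").fetchall()

# LEFT OUTER JOIN keeps every left-table customer; a missing match gives NULL.
left_rows = conn.execute("""
    SELECT DISTINCT o.Customer, r.Region
    FROM Orders o LEFT OUTER JOIN Regions r ON o.Customer = r.Customer
    ORDER BY o.Customer
""").fetchall()

# INNER JOIN keeps only customers that appear in both tables.
inner_rows = conn.execute("""
    SELECT DISTINCT o.Customer, r.Region
    FROM Orders o JOIN Regions r ON o.Customer = r.Customer
    ORDER BY o.Customer
""").fetchall()

print(having_rows)  # [('Bush', 1950), ('Carter', 1700)]
print(left_rows)    # [('Adams', 'West'), ('Bush', 'East'), ('Carter', None)]
print(inner_rows)   # [('Adams', 'West'), ('Bush', 'East')]
```

Adams is dropped by HAVING because the total of 2000 is not below 2000, and Carter is dropped by the inner join because there is no matching row in Regions.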

Original link: https://blog.csdn.net/yuuuu1214/article/details/105082242

Later additions:

PARTITION: the partition clause, naming the target partition of the write.

e.g.:

insert OVERWRITE table xx_portrait PARTITION(type="goods",entity_portrait="sex",data_type=0)
select
    lg.id,lg.entity_portrait_id,dic.remark as entity_portrait_name,lg.entity_portrait_num,lgt.entity_portrait_total,su.portrait_num,sut.portrait_total
    ,round((lg.entity_portrait_num/lgt.entity_portrait_total)/(su.portrait_num/sut.portrait_total),1) as tgi
    ,round((lg.entity_portrait_num/lgt.entity_portrait_total)*100,1) as percentage
    ,${time_type} as time_type
from

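WITH AS is roughly equivalent to creating a temporary table.

-- roughly equivalent to creating a temporary table e
with e as (select * from scott.emp e where e.empno=7499)

The WITH AS phrase, also called subquery factoring, defines a SQL fragment that the rest of the SQL statement can then refer to.
A WITH query starts with the keyword WITH rather than SELECT. It can be thought of as building a temporary table before the real query runs, which can then be used multiple times for further analysis and processing.

Advantages of the WITH approach
1) It makes the SQL easier to read; when several subqueries are involved, the structure is clearer. More importantly, it allows "analyze once, use many times", which is where the performance gain comes from: less data has to be read.
2) In the comparison this passage refers to, the plain-subquery version scanned the table twice, while the WITH version scanned it only once, which can greatly improve the efficiency of analysis and queries. In addition, in the execution plan of the WITH clause version (an Oracle plan), "SYS_TEMP_XXXX" is the temporary table of intermediate results built at run time.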

Usage example:
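A minimal sketch of "define once, use several times", run from Python against an in-memory SQLite database as a stand-in engine. The emp table, its rows, and the dept_sal name are made up for illustration. The sketch only shows the readability and reuse aspect; whether an engine actually scans the table once depends on that engine's optimizer.

```python
import sqlite3

# SQLite is a local stand-in; the emp rows below are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (empno INTEGER, deptno INTEGER, sal INTEGER);
    INSERT INTO emp VALUES
        (7369, 20, 800), (7499, 30, 1600), (7521, 30, 1250), (7782, 10, 2450);
""")

rows = conn.execute("""
    WITH dept_sal AS (                                -- defined once
        SELECT deptno, SUM(sal) AS total FROM emp GROUP BY deptno
    )
    SELECT deptno, total
    FROM dept_sal                                     -- used here
    WHERE total > (SELECT AVG(total) FROM dept_sal)   -- and used again here
    ORDER BY deptno
""").fetchall()

print(rows)  # [(10, 2450), (30, 2850)]
```

Without WITH, the GROUP BY subquery would have to be written out twice, once in FROM and once inside the AVG comparison.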

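Explode function:

explode splits a complex array or map structure in one Hive row into multiple rows.

In this case every product has its own ingredient_ids and claim_ids, but the data has to be summarized by brand. One brand_id covers several goods_id values, so after aggregation the merged ingredient_ids and claim_ids contain repeated ingredient_id and claim_id values, and the merged data is duplicated. The explode function is used to remove these duplicates from the merged arrays: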

SELECT goods_temp.user_id,
       concat_ws(',',collect_set(tag)) as ingredient_ids,
       concat_ws(',',collect_set(tag1)) as claim_ids
FROM (
    select
        COALESCE(goods_day.user_id,0) user_id, -- aggregate
        COALESCE(bebd_goods.brand_id,0) brand_id, -- aggregate
        COALESCE(bebd_goods.category2_id,0) category2_id, -- concatenate
        COALESCE(bebd_goods.category3_id,0) category3_id,
        COALESCE(bebd_goods.claim_ids,"") claim_ids, -- concatenate
        COALESCE(bebd_goods.ingredient_ids,"") ingredient_ids, -- concatenate
        goods_day.sum_hit_count as good_sum_count,
        COALESCE(bebd_goods.sda_approval_valid,0) sda_approval_valid, -- aggregate
        goods_day.visit_week -- aggregate
    from (
        select id,user_id,visit_week,sum(hit_count) sum_hit_count
        from (
            select id,
                   user_id,
                   CONCAT(year(DATEADD(to_date(next_day(TO_DATE(pt,'yyyymmdd'),'MO'),'yyyy-mm-dd'),-4,'dd')) ,if(weekofyear(to_date(pt, "yyyymmdd"))<10,concat('0',weekofyear(to_date(pt, "yyyymmdd"))),weekofyear(to_date(pt, "yyyymmdd")))) visit_week,
                   hit_count
            from bebd_goods_focus_user_day
            where CAST(SUBSTR(pt,1,6) as int) = 202108 and user_id=1707073
        ) temp_day
        GROUP by id,user_id,visit_week
    ) goods_day
    left join
    (select id,brand_id,category2_id,category3_id,claim_ids,ingredient_ids,flag,sda_approval_valid ,COALESCE(name_ch,name_en) goods_name
     from bebd_goods where bebd_goods.pt=${bdp.system.bizdate} and brand_id =6856
     and bebd_goods.flag=0
    ) bebd_goods on goods_day.id = bebd_goods.id
) goods_temp
-- each lateral view gets its own table alias
LATERAL VIEW OUTER EXPLODE(split(goods_temp.ingredient_ids,",")) t1 AS tag
LATERAL VIEW OUTER EXPLODE(split(goods_temp.claim_ids,",")) t2 AS tag1
group by user_id,brand_id;

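lateral view is used together with split and with UDTFs such as explode. It can split one row of data into multiple rows, and the split rows can then be aggregated. lateral view first calls the UDTF for every row of the source table; the UDTF splits the row into one or more rows, and lateral view then combines the results into a virtual table that supports a table alias.

As this shows, lateral view and UDTFs such as explode are a natural pair: explode splits a complex structure in one row into multiple rows, and lateral view makes those rows available for all kinds of aggregation.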
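The deduplication idea behind explode plus collect_set plus concat_ws can be modelled in a few lines of Python. This is an illustrative sketch only: the rows are made up, and the result is sorted just to make the output deterministic (collect_set itself does not promise an order).

```python
from collections import defaultdict

# Invented rows: (brand_id, goods_id, ingredient_ids as a comma-separated string).
goods = [
    (6856, 1, "11,12"),
    (6856, 2, "12,13"),
]

# Naive merge: simply joining the per-product strings keeps the duplicate 12.
naive = ",".join(ids for _, _, ids in goods)

tags_by_brand = defaultdict(set)
for brand_id, _goods_id, ingredient_ids in goods:
    for tag in ingredient_ids.split(","):    # explode: one row per tag
        tags_by_brand[brand_id].add(tag)     # collect_set: keep each tag once

# concat_ws(',', ...): join the distinct tags back into one string per brand.
merged = {
    brand: ",".join(sorted(tags, key=int))
    for brand, tags in tags_by_brand.items()
}

print(naive)   # 11,12,12,13
print(merged)  # {6856: '11,12,13'}
```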
