hive的udtf函数:inline+实现动态分区+小文件处理+杂记

1、inline 函数说明:
在这里插入图片描述
翻译:将结构数组分解为多行。返回一个包含N列的行集(N=结构中顶级元素的数目),数组中每个结构一行。
场景:源表粒度为 班级,目标表粒度为 班级+老师。
在这里插入图片描述

with t as (
    select 'class1' as class_id, '物联网工程' as class_name,'Anla' as chinese_teacher,200 as chinese_teacher_donate_amount
        ,'Satellite' as math_teacher,500 as math_teacher_donate_amount,'Patty' as english_teacher,288 as english_teacher_donate_amount
    union all
    select 'class2' as class_id, '软件工程' as class_name,'Aron' as chinese_teacher,500 as chinese_teacher_donate_amount
        ,'Lisa' as math_teacher,300 as math_teacher_donate_amount,'Yoga' as english_teacher,168 as english_teacher_donate_amount
)
select
    t.class_id
    ,t.class_name
    ,t_inline.teacher
    ,t_inline.donate_amount
from t 
lateral view inline(array(struct(chinese_teacher,chinese_teacher_donate_amount),
                        struct(math_teacher,math_teacher_donate_amount),
                        struct(english_teacher,english_teacher_donate_amount)
)) t_inline as teacher,donate_amount;

在这里插入图片描述
2、动态分区如何实现:

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict; 
set hive.exec.max.dynamic.partitions=100000; --可选
set hive.exec.max.dynamic.partitions.pernode=100000; --可选
insert overwrite table table_name partition(rand_num)
select field1,...,fieldn,cast(cast(rand() * 10 as int) as string) as rand_num
from t 

这样就实现动态分区了。
注意:
1、distribute by解决的是小文件的问题,而不是实现动态分区,这点很容易混淆。
2、如果分区数超过限定值,会报错:Fatal error occured when node tried to create too many dynamic partitions.
报错则考虑是否设置参数: hive.exec.max.dynamic.partitions、hive.exec.max.dynamic.partitions.pernode

3、小文件如何处理:
3.1、distribute by

insert overwrite table table_name partition(statdate)
distribute by cast(rand() * 10 as int)

这种方法会在分区文件夹statdate=‘xxxx’ 下生成至多10个文件夹。
3.2、repartition,常使用在给一个分区插入数据上

select /*+ repartition(1) */

4、杂记
4.1、sqoop导出新经验:指定分区导出,分区字段得是 string 类型,不然会报错。
4.2、查询字段小数位长度: length(substring_index(cast(field as string),'.',-1))
效果:1.614->3; 12.8292->4

  • 0
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值