Merging Small Files in Hive
1. Create a Hive partitioned table, partitioned by day
create external table table_partition_day(
c_id int,
c_name string,
c_gender string,
c_birthday string,
c_flag int
)
partitioned by (day string)
row format delimited fields terminated by ','
stored as parquet;
2. Insert data
insert into table_partition_day partition(day="20211007") values(100,"zhangsan","male","1999-09-09",1);
insert into table_partition_day partition(day="20211007") values(101,"lisi","male","2000-09-09",1);
insert into table_partition_day partition(day="20211007") values(102,"wangwu","female","2001-09-09",1);
Each single-row insert statement produces one small file in the partition directory.
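One way to check this is to list the distinct files that back the partition. The sketch below relies on Hive's INPUT__FILE__NAME virtual column and assumes the three inserts above have already run. The file names and the exact count depend on your Hive version and execution engine.

```sql
-- List the data files behind the 20211007 partition.
-- INPUT__FILE__NAME is a Hive virtual column holding the source file path of each row.
select distinct INPUT__FILE__NAME
from table_partition_day
where day = "20211007";
```

Running the same query again after the merge in step 3 is a simple way to compare the number of files before and after.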
3. Merge the small files of one day partition
set hive.exec.reducers.max=1;
set hive.merge.mapredfiles=true;
set hive.exec.reducers.bytes.per.reducer=128000000;
insert overwrite table table_partition_day partition(day="20211007")
select c_id,
c_name,
c_gender,
c_birthday,
c_flag
from table_partition_day
where day = "20211007";
The small files in the partition are now merged into a single file.
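The merge step is also influenced by a few other Hive settings that the procedure above does not set. The following is only a sketch of the commonly tuned ones. The values are illustrative assumptions, not part of the original procedure, and the defaults differ between Hive versions, so check them for your cluster first.

```sql
-- Merge small output files at the end of a map-only job.
set hive.merge.mapfiles=true;
-- Merge small output files at the end of a map-reduce job.
set hive.merge.mapredfiles=true;
-- Target size in bytes of each merged file (illustrative value).
set hive.merge.size.per.task=256000000;
-- Start the extra merge job when the average output file size
-- is below this many bytes (illustrative value).
set hive.merge.smallfiles.avgsize=128000000;
```

If the cluster runs Hive on Tez rather than MapReduce, the corresponding switch is hive.merge.tezfiles.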



