Merging a Table's Small Files with Hive CONCATENATE

Hive's CONCATENATE command merges a table's small files. The syntax is:

ALTER TABLE table_name [PARTITION (partition_spec)] CONCATENATE;
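For a partitioned table, the partition to merge must be named explicitly. A minimal sketch (the `logs` table and its `dt` partition column are illustrative, not from the test below):

```sql
-- Merge the small files of an unpartitioned table.
ALTER TABLE t7 CONCATENATE;

-- Merge the small files of one partition of a partitioned table
-- (table name `logs` and partition column `dt` are hypothetical).
ALTER TABLE logs PARTITION (dt = '2022-03-08') CONCATENATE;
```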

Transactional Tables

For a transactional table, the command triggers a major compaction and waits for that compaction to finish. Test as follows:
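Since CONCATENATE on an ACID table is just a major compaction, the queued and running compactions can be monitored, and the same compaction can also be requested directly. A sketch using the test table below:

```sql
-- Explicitly request a major compaction (same effect for an ACID table).
ALTER TABLE t7 COMPACT 'major';

-- List compaction requests and their state (initiated / working / succeeded ...).
SHOW COMPACTIONS;
```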

  • Create the table and insert data
create table t7 (c1 int) stored as orc tblproperties('transactional'='true');
insert into t7 values(1);
insert into t7 values(2);
insert into t7 values(3);
  • Inspect the files: each INSERT produced its own delta directory
drwxr-xr-x   - houzhizhen supergroup          0 2022-03-08 17:10 hdfs://localhost:9000/user/hive/warehouse/test.db/t7/delta_0000001_0000001_0000
drwxr-xr-x   - houzhizhen supergroup          0 2022-03-08 17:11 hdfs://localhost:9000/user/hive/warehouse/test.db/t7/delta_0000002_0000002_0000
drwxr-xr-x   - houzhizhen supergroup          0 2022-03-08 17:11 hdfs://localhost:9000/user/hive/warehouse/test.db/t7/delta_0000003_0000003_0000
  • Run concatenate
    A compaction is enqueued, and the statement waits for it to finish.
hive> alter table t7 concatenate;
Compaction enqueued with id 803

.
Compaction with id 803 finished with status: succeeded
OK
Time taken: 14.086 seconds
  • Files after the compaction
    Only one data file (bucket_00000) remains, in a new base directory; the other two files are ACID metadata.
Found 3 items
-rw-r--r--   1 houzhizhen supergroup         48 2022-03-08 17:12 hdfs://localhost:9000/user/hive/warehouse/test.db/t7/base_0000003/_metadata_acid
-rw-r--r--   1 houzhizhen supergroup          1 2022-03-08 17:12 hdfs://localhost:9000/user/hive/warehouse/test.db/t7/base_0000003/_orc_acid_version
-rw-r--r--   1 houzhizhen supergroup        616 2022-03-08 17:12 hdfs://localhost:9000/user/hive/warehouse/test.db/t7/base_0000003/bucket_00000

Non-Transactional Tables

For a non-transactional table, the command submits a file-merge job to the compute engine (Tez on YARN in this test) and waits for it to finish. Test as follows:

  • Create the table and insert data
create table t7 (c1 int) stored as orc;
insert into t7 values(1);
insert into t7 values(2);
insert into t7 values(3);
  • Inspect the files: each INSERT produced its own data file
-rw-r--r--   1 houzhizhen supergroup        188 2022-03-08 17:14 hdfs://localhost:9000/user/hive/warehouse/test.db/t7/000000_0
-rw-r--r--   1 houzhizhen supergroup        188 2022-03-08 17:14 hdfs://localhost:9000/user/hive/warehouse/test.db/t7/000000_0_copy_1
-rw-r--r--   1 houzhizhen supergroup        188 2022-03-08 17:14 hdfs://localhost:9000/user/hive/warehouse/test.db/t7/000000_0_copy_2
  • Run concatenate
    A merge job (a Tez DAG with a single File Merge vertex) is launched.
hive> alter table t7 concatenate;
2022-03-08 17:15:06 Running Dag: dag_1646730822691_0001_4
2022-03-08 17:15:06 Running Dag: dag_1646730822691_0001_4
Status: Running (Executing on YARN cluster with App id application_1646730822691_0001)

----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED  
----------------------------------------------------------------------------------------------
File Merge       container  INITIALIZING     -1          0        0       -1       0       0  
----------------------------------------------------------------------------------------------
VERTICES: 00/01  [>>--------------------------] 0%    ELAPSED TIME: 0.00 s     
----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED  
----------------------------------------------------------------------------------------------
File Merge ..... container     SUCCEEDED      1          1        0        0       0       0  
----------------------------------------------------------------------------------------------
VERTICES: 01/01  [==========================>>] 100%  ELAPSED TIME: 0.24 s     
----------------------------------------------------------------------------------------------
Status: DAG finished successfully in 0.24 seconds

Query Execution Summary
----------------------------------------------------------------------------------------------
OPERATION                            DURATION
----------------------------------------------------------------------------------------------
Compile Query                           0.07s
Prepare Plan                            0.03s
Get Query Coordinator (AM)              0.00s
Submit Plan                             0.05s
Start DAG                               0.03s
Run DAG                                 0.24s
----------------------------------------------------------------------------------------------

Task Execution Summary
----------------------------------------------------------------------------------------------
  VERTICES      DURATION(ms)   CPU_TIME(ms)    GC_TIME(ms)   INPUT_RECORDS   OUTPUT_RECORDS
----------------------------------------------------------------------------------------------
File Merge              0.00              0              0               0                0
----------------------------------------------------------------------------------------------

Loading data to table test.t7
Table test.t7 stats: [numFiles=1, numRows=3, totalSize=361, rawDataSize=12]
OK
Time taken: 0.672 seconds
  • Files after the merge
    Only one data file remains.
-rw-r--r--   1 houzhizhen supergroup        361 2022-03-08 17:15 hdfs://localhost:9000/user/hive/warehouse/test.db/t7/000000_0
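The merge only rewrites the file layout; the rows themselves are unchanged (the table stats above report numRows=3 both before and after). A quick sanity check, assuming the three single-row inserts from the test:

```sql
-- All three rows are still present after concatenation.
SELECT count(*) FROM t7;
-- expected: 3
```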