Practicing Data Lake Iceberg, Lesson 9: Merging Small Files


Series Contents

Practicing Data Lake Iceberg, Lesson 1: Getting started
Practicing Data Lake Iceberg, Lesson 2: Iceberg's underlying data layout on Hadoop
Practicing Data Lake Iceberg, Lesson 3: Reading from Kafka into Iceberg with SQL in sql-client
Practicing Data Lake Iceberg, Lesson 4: Reading from Kafka into Iceberg with SQL in sql-client (upgraded to Flink 1.12.7)
Practicing Data Lake Iceberg, Lesson 5: Characteristics of the Hive catalog
Practicing Data Lake Iceberg, Lesson 6: Fixing failed writes from Kafka into Iceberg
Practicing Data Lake Iceberg, Lesson 7: Real-time writes into Iceberg
Practicing Data Lake Iceberg, Lesson 8: Integrating Hive with Iceberg
Practicing Data Lake Iceberg, Lesson 9: Merging small files
Practicing Data Lake Iceberg, Lesson 10: Snapshot deletion



Preface

Our Flink job checkpoints every minute, which means it flushes data to the underlying file system every minute. A day has 24 * 60 = 1440 minutes, so the job produces at least 1440 data files per day (at parallelism 1).

The only real way to deal with these small files is to merge them. This article walks through the author's compaction attempts and the pitfalls along the way.
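The arithmetic above can be sketched as a quick helper; this is a toy illustration (class and method names are made up, and it assumes one data file per checkpoint per writer subtask):

```java
// Rough estimate of small files produced per day by a checkpoint-driven sink,
// assuming each checkpoint makes every writer subtask roll one data file.
public class SmallFileEstimate {
    static long filesPerDay(int checkpointIntervalMinutes, int parallelism) {
        long checkpointsPerDay = (24L * 60) / checkpointIntervalMinutes;
        return checkpointsPerDay * parallelism;
    }

    public static void main(String[] args) {
        // 1-minute checkpoints, parallelism 1: 1440 files per day
        System.out.println(filesPerDay(1, 1));
        // raising parallelism multiplies the file count accordingly
        System.out.println(filesPerDay(1, 4));
    }
}
```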


1. The compaction code from the official docs

See the official docs: https://iceberg.apache.org/#flink/#_top

import org.apache.iceberg.flink.actions.Actions;

TableLoader tableLoader = TableLoader.fromHadoopTable("hdfs://nn:8020/warehouse/path");
Table table = tableLoader.loadTable();
RewriteDataFilesActionResult result = Actions.forTable(table)
        .rewriteDataFiles()
        .execute();

Gripe: in my testing, this code doesn't work as-is. A table lives in either a hiveCatalog or a hadoopCatalog, and the example above covers only hadoopCatalog; the official docs say nothing at all about hiveCatalog. Maddening, right?
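For the hiveCatalog case the docs leave out, the following sketch is one way to load and compact the table via `CatalogLoader.hive` and `TableLoader.fromCatalog`. Treat it as an assumption-laden sketch, not official guidance: the metastore URI and warehouse path are placeholders for your environment, and the catalog/db/table names are taken from the logs later in this article.

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.Table;
import org.apache.iceberg.actions.RewriteDataFilesActionResult;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.flink.CatalogLoader;
import org.apache.iceberg.flink.TableLoader;
import org.apache.iceberg.flink.actions.Actions;

public class HiveCatalogCompact {
    public static void main(String[] args) {
        // Catalog properties: point at your Hive metastore and warehouse (placeholders).
        Map<String, String> props = new HashMap<>();
        props.put("uri", "thrift://metastore-host:9083");
        props.put("warehouse", "hdfs://nn:8020/user/hive/warehouse");

        CatalogLoader catalogLoader =
                CatalogLoader.hive("hive_catalog6", new Configuration(), props);
        TableLoader tableLoader = TableLoader.fromCatalog(
                catalogLoader, TableIdentifier.of("iceberg_db6", "behavior_log_ib6"));
        tableLoader.open();
        Table table = tableLoader.loadTable();

        // Same rewrite action as the hadoopCatalog example above.
        RewriteDataFilesActionResult result = Actions.forTable(table)
                .rewriteDataFiles()
                .execute();
        System.out.println("rewritten (deleted) data files: "
                + result.deletedDataFiles().size());
    }
}
```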

2. The pitfalls I hit

2.1 PartitionExpressionForMetastore class not found

The error looks like this:

22/01/26 15:11:07 ERROR metastore.RetryingHMSHandler: java.lang.RuntimeException: Error loading PartitionExpressionProxy: org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore class not found
	at org.apache.hadoop.hive.metastore.ObjectStore.createExpressionProxy(ObjectStore.java:434)
	at org.apache.hadoop.hive.metastore.ObjectStore.initializeHelper(ObjectStore.java:408)
	at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:342)
	at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:303)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
	at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:58)
	at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStoreForConf(HiveMetaStore.java:628)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:594)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:588)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:655)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:431)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:79)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:92)
	at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6902)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:164)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:129)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.iceberg.common.DynConstructors$Ctor.newInstanceChecked(DynConstructors.java:60)
	at org.apache.iceberg.common.DynConstructors$Ctor.newInstance(DynConstructors.java:73)
	at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:53)
	at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:32)
	at org.apache.iceberg.ClientPoolImpl.get(ClientPoolImpl.java:118)
	at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:49)
	at org.apache.iceberg.hive.CachedClientPool.run(CachedClientPool.java:76)
	at org.apache.iceberg.hive.HiveTableOperations.doRefresh(HiveTableOperations.java:181)
	at org.apache.iceberg.BaseMetastoreTableOperations.refresh(BaseMetastoreTableOperations.java:94)
	at org.apache.iceberg.BaseMetastoreTableOperations.current(BaseMetastoreTableOperations.java:77)
	at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:93)
	at org.apache.iceberg.flink.TableLoader$CatalogTableLoader.loadTable(TableLoader.java:113)
	at org.example.FlinkDataStreamSmallFileCompactTest$.main(FlinkDataStreamSmallFileCompactTest.scala:59)
	at org.example.FlinkDataStreamSmallFileCompactTest.main(FlinkDataStreamSmallFileCompactTest.scala)

Importing the Hive source code to check which module the class lives in shows it comes from hive-exec.

Fix: add this to the pom:

      <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>2.3.6</version>
        </dependency>

Then a new problem appeared:

java.lang.ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at javax.jdo.JDOHelper$18.run(JDOHelper.java:2018)
	at javax.jdo.JDOHelper$18.run(JDOHelper.java:2016)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.jdo.JDOHelper.forName(JDOHelper.java:2015)
	at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1162)
	at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
	at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
	at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:521)
	at org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:550)
	at org.apache.hadoop.hive.metastore.ObjectStore.initializeHelper(ObjectStore.java:405)
	at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:342)
	at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:303)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
	at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:58)
	at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStoreForConf(HiveMetaStore.java:628)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:594)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:588)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:655)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:431)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:79)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:92)
	at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6902)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:164)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:129)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.iceberg.common.DynConstructors$Ctor.newInstanceChecked(DynConstructors.java:60)
	at org.apache.iceberg.common.DynConstructors$Ctor.newInstance(DynConstructors.java:73)
	at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:53)
	at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:32)
	at org.apache.iceberg.ClientPoolImpl.get(ClientPoolImpl.java:118)
	at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:49)
	at org.apache.iceberg.hive.CachedClientPool.run(CachedClientPool.java:76)
	at org.apache.iceberg.hive.HiveTableOperations.doRefresh(HiveTableOperations.java:181)
	at org.apache.iceberg.BaseMetastoreTableOperations.refresh(BaseMetastoreTableOperations.java:94)
	at org.apache.iceberg.BaseMetastoreTableOperations.current(BaseMetastoreTableOperations.java:77)
	at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:93)
	at org.apache.iceberg.flink.TableLoader$CatalogTableLoader.loadTable(TableLoader.java:113)
	at org.example.FlinkDataStreamSmallFileCompactTest$.main(FlinkDataStreamSmallFileCompactTest.scala:59)
	at org.example.FlinkDataStreamSmallFileCompactTest.main(FlinkDataStreamSmallFileCompactTest.scala)
22/01/26 15:32:18 INFO metastore.HiveMetaStore: 0: Opening raw store with implementation class:org.apache.hadoop.hive.metastore.ObjectStore

A check in IDEA showed this class is also pulled in via hive-exec; the Hive dependencies now look like:

        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>2.3.6</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-metastore</artifactId>
            <version>2.3.6</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive</artifactId>
            <version>2.3.6</version>
        </dependency>

Re-run:

22/01/26 15:39:07 WARN DataNucleus.Query: Query for candidates of org.apache.hadoop.hive.metastore.model.MPartitionColumnStatistics and subclasses resulted in no possible candidates
Required table missing : "CDS" in Catalog "" Schema "". DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable "datanucleus.schema.autoCreateTables"
org.datanucleus.store.rdbms.exceptions.MissingTableException: Required table missing : "CDS" in Catalog "" Schema "". DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable "datanucleus.schema.autoCreateTables"
	at org.datanucleus.store.rdbms.table.AbstractTable.exists(AbstractTable.java:606)
	at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.performTablesValidation(RDBMSStoreManager.java:3385)
	at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2896)
	at org.datanucleus.store.rdbms.AbstractSchemaTransaction.execute(AbstractSchemaTransaction.java:119)
	at org.datanucleus.store.rdbms.RDBMSStoreManager.manageClasses(RDBMSStoreManager.java:1627)
	at org.datanucleus.store.rdbms.RDBMSStoreManager.getDatastoreClass(RDBMSStoreManager.java:672)
	at org.datanucleus.store.rdbms.query.RDBMSQueryUtils.getStatementForCandidates(RDBMSQueryUtils.java:425)
	at org.datanucleus.store.rdbms.query.JDOQLQuery.compileQueryFull(JDOQLQuery.java:865)
	at org.datanucleus.store.rdbms.query.JDOQLQuery.compileInternal(JDOQLQuery.java:347)
	at org.datanucleus.store.query.Query.executeQuery(Query.java:1816)
	at org.datanucleus.store.query.Query.executeWithArray(Query.java:1744)
	at org.datanucleus.store.query.Query.execute(Query.java:1726)
	at org.datanucleus.api.jdo.JDOQuery.executeInternal(JDOQuery.java:374)
	at org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:216)
	at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.ensureDbInit(MetaStoreDirectSql.java:187)
	at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.<init>(MetaStoreDirectSql.java:144)
	at org.apache.hadoop.hive.metastore.ObjectStore.initializeHelper(ObjectStore.java:410)
	at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:342)
	at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:303)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
	at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:58)
	at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStoreForConf(HiveMetaStore.java:628)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:594)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:588)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:655)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:431)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:79)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:92)
	at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6902)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:164)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:129)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.iceberg.common.DynConstructors$Ctor.newInstanceChecked(DynConstructors.java:60)
	at org.apache.iceberg.common.DynConstructors$Ctor.newInstance(DynConstructors.java:73)
	at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:53)
	at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:32)
	at org.apache.iceberg.ClientPoolImpl.get(ClientPoolImpl.java:118)
	at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:49)
	at org.apache.iceberg.hive.CachedClientPool.run(CachedClientPool.java:76)
	at org.apache.iceberg.hive.HiveTableOperations.doRefresh(HiveTableOperations.java:181)
	at org.apache.iceberg.BaseMetastoreTableOperations.refresh(BaseMetastoreTableOperations.java:94)
	at org.apache.iceberg.BaseMetastoreTableOperations.current(BaseMetastoreTableOperations.java:77)
	at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:93)
	at org.apache.iceberg.flink.TableLoader$CatalogTableLoader.loadTable(TableLoader.java:113)
	at org.example.FlinkDataStreamSmallFileCompactTest$.main(FlinkDataStreamSmallFileCompactTest.scala:59)
	at org.example.FlinkDataStreamSmallFileCompactTest.main(FlinkDataStreamSmallFileCompactTest.scala)

2.2 Cannot connect to the metastore

22/01/26 16:02:01 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
22/01/26 16:02:01 WARN DataNucleus.Query: Query for candidates of org.apache.hadoop.hive.metastore.model.MDatabase and subclasses resulted in no possible candidates
Required table missing : "DBS" in Catalog "" Schema "". DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable "datanucleus.schema.autoCreateTables"
org.datanucleus.store.rdbms.exceptions.MissingTableException: Required table missing : "DBS" in Catalog "" Schema "". DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable "datanucleus.schema.autoCreateTables"
	at org.datanucleus.store.rdbms.table.AbstractTable.exists(AbstractTable.java:606)
	at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.performTablesValidation(RDBMSStoreManager.java:3385)
	at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2896)
	at org.datanucleus.store.rdbms.AbstractSchemaTransaction.execute(AbstractSchemaTransaction.java:119)
	at org.datanucleus.store.rdbms.RDBMSStoreManager.manageClasses(RDBMSStoreManager.java:1627)
	at org.datanucleus.store.rdbms.RDBMSStoreManager.getDatastoreClass(RDBMSStoreManager.java:672)
	at org.datanucleus.store.rdbms.query.RDBMSQueryUtils.getStatementForCandidates(RDBMSQueryUtils.java:425)
	at org.datanucleus.store.rdbms.query.JDOQLQuery.compileQueryFull(JDOQLQuery.java:865)
	at org.datanucleus.store.rdbms.query.JDOQLQuery.compileInternal(JDOQLQuery.java:347)
	at org.datanucleus.store.query.Query.executeQuery(Query.java:1816)
	at org.datanucleus.store.query.Query.executeWithArray(Query.java:1744)
	at org.datanucleus.store.query.Query.execute(Query.java:1726)
	at org.datanucleus.api.jdo.JDOQuery.executeInternal(JDOQuery.java:374)
	at org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:216)
	at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.ensureDbInit(MetaStoreDirectSql.java:181)
	at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.<init>(MetaStoreDirectSql.java:144)
	at org.apache.hadoop.hive.metastore.ObjectStore.initializeHelper(ObjectStore.java:410)
	at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:342)
	at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:303)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
	at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:58)
	at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStoreForConf(HiveMetaStore.java:628)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:594)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:588)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:655)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:431)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:79)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:92)
	at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6902)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:164)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:129)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.iceberg.common.DynConstructors$Ctor.newInstanceChecked(DynConstructors.java:60)
	at org.apache.iceberg.common.DynConstructors$Ctor.newInstance(DynConstructors.java:73)
	at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:53)
	at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:32)
	at org.apache.iceberg.ClientPoolImpl.get(ClientPoolImpl.java:118)
	at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:49)
	at org.apache.iceberg.hive.CachedClientPool.run(CachedClientPool.java:76)
	at org.apache.iceberg.hive.HiveTableOperations.doRefresh(HiveTableOperations.java:181)
	at org.apache.iceberg.BaseMetastoreTableOperations.refresh(BaseMetastoreTableOperations.java:94)
	at org.apache.iceberg.BaseMetastoreTableOperations.current(BaseMetastoreTableOperations.java:77)
	at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:93)
	at org.apache.iceberg.flink.TableLoader$CatalogTableLoader.loadTable(TableLoader.java:113)
	at org.example.FlinkDataStreamSmallFileCompactTest$.main(FlinkDataStreamSmallFileCompactTest.scala:59)
	at org.example.FlinkDataStreamSmallFileCompactTest.main(FlinkDataStreamSmallFileCompactTest.scala)
22/01/26 16:02:01 WARN DataNucleus.Query: Query for candidates of org.apache.hadoop.hive.metastore.model.MTableColumnStatistics and subclasses resulted in no possible candidates

Checking the backing database to see whether the table exists: the table is definitely there.

So the problem is at the layer where the client-side metastore connects to the DB.
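One common cause (an assumption on my part, not something the logs above confirm): when `hive.metastore.uris` is absent from the client's configuration, the Hive client falls back to an embedded metastore and tries to talk to the backing database directly, which is exactly a client-to-DB-layer failure. A client-side hive-site.xml pointing at the remote metastore typically contains (hostname and port are placeholders):

```xml
<property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore-host:9083</value>
</property>
```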

2.3 The 5.1.5-jhyde artifact cannot be downloaded

Could not find artifact org.pentaho:pentaho-aggdesigner-algorithm:pom:5.1.5-jhyde in nexus-aliyun (http://maven.aliyun.com/nexus/content/groups/public)

No matter what I tried, this artifact would not download. I switched Maven repositories several times, Aliyun's and the default, with no luck. A quick search shows plenty of people hit the same problem; the jar is even being sold on CSDN.

How did I finally solve it? I dug the jar out of the local Maven repository on my old laptop (I only thought of that the next workday; the day before I was probably too fried to think of it).
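If you can get the jar from another machine's local repository the same way, you can register it in your own repository by hand; a possible command (the file path is a placeholder for wherever you copied the jar):

```shell
# Install a manually obtained jar into the local Maven repository.
mvn install:install-file \
  -DgroupId=org.pentaho \
  -DartifactId=pentaho-aggdesigner-algorithm \
  -Dversion=5.1.5-jhyde \
  -Dpackaging=jar \
  -Dfile=/path/to/pentaho-aggdesigner-algorithm-5.1.5-jhyde.jar
```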


2.4 Yet another dependency problem (org/apache/avro/Conversion)

22/01/27 11:23:30 INFO iceberg.BaseMetastoreTableOperations: Refreshing table metadata from new version: hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08706-9c3378aa-21cb-48bf-be52-70b25ea59308.metadata.json
22/01/27 11:23:37 INFO iceberg.BaseMetastoreCatalog: Table loaded by catalog: hive_catalog6.iceberg_db6.behavior_log_ib6
22/01/27 11:23:37 INFO iceberg.BaseTableScan: Scanning table hive_catalog6.iceberg_db6.behavior_log_ib6 snapshot 5656651741571290188 created at 2022-01-26 11:35:33.592 with filter true
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/avro/Conversion
	at org.apache.iceberg.ManifestLists.read(ManifestLists.java:37)
	at org.apache.iceberg.BaseSnapshot.cacheManifests(BaseSnapshot.java:137)
	at org.apache.iceberg.BaseSnapshot.dataManifests(BaseSnapshot.java:159)
	at org.apache.iceberg.DataTableScan.planFiles(DataTableScan.java:74)
	at org.apache.iceberg.BaseTableScan.planFiles(BaseTableScan.java:208)
	at org.apache.iceberg.DataTableScan.planFiles(DataTableScan.java:28)
	at org.apache.iceberg.actions.BaseRewriteDataFilesAction.execute(BaseRewriteDataFilesAction.java:211)
	at org.example.FlinkDataStreamSmallFileCompactTest$.main(FlinkDataStreamSmallFileCompactTest.scala:64)
	at org.example.FlinkDataStreamSmallFileCompactTest.main(FlinkDataStreamSmallFileCompactTest.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.avro.Conversion
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 9 more

3. Running successfully

3.1 Success after updating the pom

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.example</groupId>
    <artifactId>iceberg-learning</artifactId>
    <version>1.0-SNAPSHOT</version>


    <properties>
        <!-- project compiler -->
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
        <!-- maven compiler-->
        <scala.maven.plugin.version>3.2.2</scala.maven.plugin.version>
        <maven.compiler.plugin.version>3.8.1</maven.compiler.plugin.version>
        <maven.assembly.plugin.version>3.1.1</maven.assembly.plugin.version>
        <!-- sdk -->
        <java.version>1.8</java.version>
        <scala.version>2.12.12</scala.version>
        <scala.binary.version>2.12</scala.binary.version>
        <!-- engine-->
        <hadoop.version>2.7.2</hadoop.version>
        <flink.version>1.12.7</flink.version>
        <flink.cdc.version>2.0.2</flink.cdc.version>
        <iceberg.version>0.12.1</iceberg.version>
        <hive.version>2.3.6</hive.version>
        <!-- <scope.type>provided</scope.type>-->
        <scope.type>compile</scope.type>
    </properties>
    <dependencies>
    <!-- scala -->
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>
        <scope>${scope.type}</scope>
    </dependency>
    <!-- flink Dependency -->
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-runtime-web_${scala.binary.version}</artifactId>
        <version>${flink.version}</version>
        <scope>${scope.type}</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-core</artifactId>
        <version>${flink.version}</version>
        <scope>${scope.type}</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-scala_${scala.binary.version}</artifactId>
        <version>${flink.version}</version>
        <scope>${scope.type}</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-table-common</artifactId>
        <version>${flink.version}</version>
        <scope>${scope.type}</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-table-api-scala-bridge_${scala.binary.version}</artifactId>
        <version>${flink.version}</version>
        <scope>${scope.type}</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-streaming-scala_${scala.binary.version}</artifactId>
        <version>${flink.version}</version>
        <scope>${scope.type}</scope>
    </dependency>
    <!-- <= 1.13 -->
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-table-planner-blink_${scala.binary.version}</artifactId>
        <version>${flink.version}</version>
        <scope>${scope.type}</scope>
    </dependency>
    <!-- 1.14 -->
    <!-- <dependency>-->
    <!-- <groupId>org.apache.flink</groupId>-->
    <!-- <artifactId>flink-table-planner_${scala.binary.version}</artifactId>-->
    <!-- <version>${flink.version}</version>-->
    <!-- <scope>${scope.type}</scope>-->
    <!-- </dependency>-->
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-clients_${scala.binary.version}</artifactId>
        <version>${flink.version}</version>
        <scope>${scope.type}</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-csv</artifactId>
        <version>${flink.version}</version>
        <scope>${scope.type}</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-json</artifactId>
        <version>${flink.version}</version>
        <scope>${scope.type}</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-orc_${scala.binary.version}</artifactId>
        <version>${flink.version}</version>
        <scope>${scope.type}</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-statebackend-rocksdb_${scala.binary.version}</artifactId>
        <version>${flink.version}</version>
        <scope>${scope.type}</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-sql-connector-kafka_${scala.binary.version}</artifactId>
        <version>${flink.version}</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-hive_${scala.binary.version}</artifactId>
        <version>${flink.version}</version>
        <scope>${scope.type}</scope>
    </dependency>
    <dependency>
        <groupId>com.ververica</groupId>
        <artifactId>flink-sql-connector-mysql-cdc</artifactId>
        <version>${flink.cdc.version}</version>
        <scope>${scope.type}</scope>
    </dependency>
    <!-- iceberg Dependency -->
    <dependency>
        <groupId>org.apache.iceberg</groupId>
        <artifactId>iceberg-flink-runtime</artifactId>
        <version>${iceberg.version}</version>
        <scope>${scope.type}</scope>
    </dependency>
    <!-- hadoop Dependency-->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>${hadoop.version}</version>
        <scope>${scope.type}</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>${hadoop.version}</version>
        <scope>${scope.type}</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>${hadoop.version}</version>
        <scope>${scope.type}</scope>
    </dependency>
    <!-- hive Dependency-->
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-exec</artifactId>
        <version>${hive.version}</version>
        <scope>${scope.type}</scope>
        <exclusions>
            <exclusion>
                <groupId>org.apache.logging.log4j</groupId>
                <artifactId>log4j-slf4j-impl</artifactId>
            </exclusion>
            <exclusion>
                <groupId>org.apache.hive</groupId>
                <artifactId>hive-llap-tez</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>org.antlr</groupId>
        <artifactId>antlr-runtime</artifactId>
        <version>3.5.2</version>
    </dependency>
    </dependencies>
    <build>
        <pluginManagement><!-- lock down plugins versions to avoid using Maven defaults (may be moved to parent pom) -->
            <plugins>
                <!-- clean lifecycle, see https://maven.apache.org/ref/current/maven-core/lifecycles.html#clean_Lifecycle -->
                <plugin>
                    <artifactId>maven-clean-plugin</artifactId>
                    <version>3.1.0</version>
                </plugin>
                <!-- default lifecycle, jar packaging: see https://maven.apache.org/ref/current/maven-core/default-bindings.html#Plugin_bindings_for_jar_packaging -->
                <plugin>
                    <artifactId>maven-resources-plugin</artifactId>
                    <version>3.0.2</version>
                </plugin>
                <plugin>
                    <artifactId>maven-compiler-plugin</artifactId>
                    <version>3.8.0</version>
                </plugin>
                <plugin>
                    <artifactId>maven-surefire-plugin</artifactId>
                    <version>2.22.1</version>
                </plugin>
                <plugin>
                    <artifactId>maven-jar-plugin</artifactId>
                    <version>3.0.2</version>
                    <configuration>
                        <archive>
                            <manifest>
                                <mainClass>org.example.GenerateLog</mainClass>
                            </manifest>
                        </archive>
                    </configuration>
                </plugin>
                <plugin>
                    <artifactId>maven-install-plugin</artifactId>
                    <version>2.5.2</version>
                </plugin>
                <plugin>
                    <artifactId>maven-deploy-plugin</artifactId>
                    <version>2.8.2</version>
                </plugin>
                <!-- site lifecycle, see https://maven.apache.org/ref/current/maven-core/lifecycles.html#site_Lifecycle -->
                <plugin>
                    <artifactId>maven-site-plugin</artifactId>
                    <version>3.7.1</version>
                </plugin>
                <plugin>
                    <artifactId>maven-project-info-reports-plugin</artifactId>
                    <version>3.0.0</version>
                </plugin>
            </plugins>
        </pluginManagement>
    </build>
</project>

3.2 Run the rewrite code and compare file counts before and after

3.2.1 Data file count before compaction:

The data directory holds the data files; the metadata directory holds the table metadata.

Data directory:
[root@hadoop103 hadoop]# hadoop fs -ls hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data|wc
  17960  143675 3573859
Metadata directory:
[root@hadoop103 hadoop]# hadoop fs -ls hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata |wc
  26216  209723 5267149

3.2.2 After compaction, the number of data files increased. Why?

Did the compaction error out? No: the job completed successfully (see the log below), yet the file counts went up.

Data directory:
[root@hadoop103 hadoop]# hadoop fs -ls hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data |wc
  17977  143811 3577242
Metadata directory:
[root@hadoop103 hadoop]# hadoop fs -ls hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata |wc
  26220  209755 5267941
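The `wc` output above makes the regression concrete (the first column of `wc` is the line count, which is roughly the file count). A quick sanity check on the deltas, using the numbers read from the listings:

```python
# File counts taken from the `hadoop fs -ls ... | wc` output above.
data_before, data_after = 17960, 17977
meta_before, meta_after = 26216, 26220

# The rewrite ADDED files instead of shrinking the directories:
print(data_after - data_before)  # 17 more data files
print(meta_after - meta_before)  # 4 more metadata files
```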

3.2.3 Analyzing the data files after compaction

A check of the normal files shows one file generated per minute (I had already stopped the ingestion job on 2022-01-26, before running the compaction):

-rw-r--r--   2 root supergroup        685 2022-01-25 14:42 hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00001-0-e9e8a782-fa82-4c4d-9786-c05b8aab251a-07453.parquet
-rw-r--r--   2 root supergroup        683 2022-01-25 14:43 hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00001-0-e9e8a782-fa82-4c4d-9786-c05b8aab251a-07454.parquet
-rw-r--r--   2 root supergroup        686 2022-01-25 14:44 hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00001-0-e9e8a782-fa82-4c4d-9786-c05b8aab251a-07455.parquet
-rw-r--r--   2 root supergroup        682 2022-01-25 14:45 hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00001-0-e9e8a782-fa82-4c4d-9786-c05b8aab251a-07456.parquet
-rw-r--r--   2 root supergroup        686 2022-01-25 14:46 hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00001-0-e9e8a782-fa82-4c4d-9786-c05b8aab251a-07457.parquet
-rw-r--r--   2 root supergroup        684 2022-01-25 14:47 hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00001-0-e9e8a782-fa82-4c4d-9786-c05b8aab251a-07458.parquet
-rw-r--r--   2 root supergroup        683 2022-01-25 14:48 hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00001-0-e9e8a782-fa82-4c4d-9786-c05b8aab251a-07459.parquet
-rw-r--r--   2 root supergroup        685 2022-01-25 14:49 hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00001-0-e9e8a782-fa82-4c4d-9786-c05b8aab251a-07460.parquet
-rw-r--r--   2 root supergroup        688 2022-01-25 14:50 hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00001-0-e9e8a782-fa82-4c4d-9786-c05b8aab251a-07461.parquet
-rw-r--r--   2 root supergroup        682 2022-01-25 14:51 hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00001-0-e9e8a782-fa82-4c4d-9786-c05b8aab251a-07462.parquet
-rw-r--r--   2 root supergroup        687 2022-01-25 14:52 hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00001-0-e9e8a782-fa82-4c4d-9786-c05b8aab251a-07463.parquet
-rw-r--r--   2 root supergroup        687 2022-01-25 14:53 hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00001-0-e9e8a782-fa82-4c4d-9786-c05b8aab251a-07464.parquet
-rw-r--r--   2 root supergroup        688 2022-01-25 14:54 hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00001-0-e9e8a782-fa82-4c4d-9786-c05b8aab251a-07465.parquet
-rw-r--r--   2 root supergroup        686 2022-01-25 14:55 hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00001-0-e9e8a782-fa82-4c4d-9786-c05b8aab251a-07466.parquet
-rw-r--r--   2 root supergroup        682 2022-01-25 14:56 hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00001-0-e9e8a782-fa82-4c4d-9786-c05b8aab251a-07467.parquet
-rw-r--r--   2 root supergroup        684 2022-01-25 14:57 hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00001-0-e9e8a782-fa82-4c4d-9786-c05b8aab251a-07468.parquet
-rw-r--r--   2 root supergroup        683 2022-01-25 14:58 hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00001-0-e9e8a782-fa82-4c4d-9786-c05b8aab251a-07469.parquet
-rw-r--r--   2 root supergroup        686 2022-01-25 14:59 hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00001-0-e9e8a782-fa82-4c4d-9786-c05b8aab251a-07470.parquet
-rw-r--r--   2 root supergroup        689 2022-01-25 15:00 hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00001-0-e9e8a782-fa82-4c4d-9786-c05b8aab251a-07471.parquet
-rw-r--r--   2 root supergroup        688 2022-01-25 15:01 hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00001-0-e9e8a782-fa82-4c4d-9786-c05b8aab251a-07472.parquet
-rw-r--r--   2 root supergroup        683 2022-01-25 15:02 hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00001-0-e9e8a782-fa82-4c4d-9786-c05b8aab251a-07473.parquet
-rw-r--r--   2 root supergroup        686 2022-01-25 15:03 hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00001-0-e9e8a782-fa82-4c4d-9786-c05b8aab251a-07474.parquet
-rw-r--r--   2 root supergroup        685 2022-01-25 15:04 hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00001-0-e9e8a782-fa82-4c4d-9786-c05b8aab251a-07475.parquet
-rw-r--r--   2 root supergroup        686 2022-01-25 15:05 hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00001-0-e9e8a782-fa82-4c4d-9786-c05b8aab251a-07476.parquet
-rw-r--r--   2 root supergroup        686 2022-01-25 15:06 hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00001-0-e9e8a782-fa82-4c4d-9786-c05b8aab251a-07477.parquet
-rw-r--r--   2 root supergroup        687 2022-01-25 15:07 hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00001-0-e9e8a782-fa82-4c4d-9786-c05b8aab251a-07478.parquet
-rw-r--r--   2 root supergroup        685 2022-01-25 15:08 hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00001-0-e9e8a782-fa82-4c4d-9786-c05b8aab251a-07479.parquet
-rw-r--r--   2 root supergroup        680 2022-01-25 15:09 hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00001-0-e9e8a782-fa82-4c4d-9786-c05b8aab251a-07480.parquet
-rw-r--r--   2 root supergroup        678 2022-01-25 15:10 hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00001-0-e9e8a782-fa82-4c4d-9786-c05b8aab251a-07481.parquet
-rw-r--r--   2 root supergroup        681 2022-01-25 15:11 hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00001-0-e9e8a782-fa82-4c4d-9786-c05b8aab251a-07482.parquet
-rw-r--r--   2 root supergroup        687 2022-01-25 15:12 hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00001-0-e9e8a782-fa82-4c4d-9786-c05b8aab251a-07483.parquet
-rw-r--r--   2 root supergroup        682 2022-01-25 15:13 hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00001-0-e9e8a782-fa82-4c4d-9786-c05b8aab251a-07484.parquet
-rw-r--r--   2 root supergroup        682 2022-01-25 15:14 hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00001-0-e9e8a782-fa82-4c4d-9786-c05b8aab251a-07485.parque
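The sequential file suffixes in the listing line up exactly with the 1-minute checkpoint interval mentioned in the preface. Checking the numbers (values read off the listing above):

```python
# Suffixes read from the listing: 07453 at 14:42 through 07485 at 15:14.
first_suffix, last_suffix = 7453, 7485
print(last_suffix - first_suffix + 1)  # 33 files for the 33 minutes 14:42..15:14

# So with a 1-minute checkpoint interval, a single sink subtask alone emits:
print(60 * 24)  # 1440 small files per day
```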

After compaction, the files are as follows.
The files written at 2022-01-27 14:16 are the compacted output (that is when I ran the job).
The small files were NOT deleted!

-rw-r--r--   2 root supergroup       5907 2022-01-27 11:52 hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00004-0-08406eff-a110-4de2-8130-030679a8f4f5-00106.parquet
-rw-r--r--   2 root supergroup       5907 2022-01-27 11:52 hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00004-0-08406eff-a110-4de2-8130-030679a8f4f5-00107.parquet
-rw-r--r--   2 root supergroup     373141 2022-01-27 11:52 hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00004-0-08406eff-a110-4de2-8130-030679a8f4f5-00108.parquet
-rw-r--r--   2 root supergroup       1128 2022-01-27 11:52 hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00004-0-08406eff-a110-4de2-8130-030679a8f4f5-00109.parquet
-rw-r--r--   2 root supergroup     540168 2022-01-27 14:16 hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00004-0-fea6f5d5-759f-4769-9ced-b3ecca214e36-00001.parquet
-rw-r--r--   2 root supergroup     173059 2022-01-27 14:16 hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00004-0-fea6f5d5-759f-4769-9ced-b3ecca214e36-00002.parquet
-rw-r--r--   2 root supergroup     172866 2022-01-27 14:16 hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00004-0-fea6f5d5-759f-4769-9ced-b3ecca214e36-00003.parquet
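This is expected Iceberg behavior rather than a failed run: rewriteDataFiles only commits a new snapshot that references the compacted files, while the old small files stay on disk because earlier snapshots still reference them; they are physically removed only when those snapshots are expired (the topic of the next lesson). A toy Python model of that bookkeeping (not the Iceberg API, just an illustration of the snapshot-reachability idea):

```python
# One snapshot referencing two small files, as before compaction.
snapshots = [{"id": 1, "files": {"small-a.parquet", "small-b.parquet"}}]

# rewriteDataFiles: read the small files, write one compacted file,
# and commit a NEW snapshot that references only the compacted file.
snapshots.append({"id": 2, "files": {"compacted.parquet"}})

# Everything referenced by ANY snapshot must stay on disk:
on_disk = set().union(*(s["files"] for s in snapshots))
print(sorted(on_disk))         # all three files are still present

# expireSnapshots: drop old snapshots, then delete any file that no
# surviving snapshot references.
snapshots = [s for s in snapshots if s["id"] == 2]
live = set().union(*(s["files"] for s in snapshots))
print(sorted(on_disk - live))  # only now do the small files become deletable
```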

3.2.4 Log output from running the job in IDEA:

22/01/27 14:16:10 INFO conf.HiveConf: Found configuration file file:/E:/workspace/jt_workspace/iceberg-learning/target/classes/hive-site.xml
22/01/27 14:16:10 WARN conf.HiveConf: HiveConf of name hive.metastore.event.db.notification.api.auth does not exist
22/01/27 14:16:11 INFO security.JniBasedUnixGroupsMapping: Error getting groups for root: Unknown error.
22/01/27 14:16:11 WARN security.UserGroupInformation: No groups available for user root
22/01/27 14:16:11 INFO iceberg.BaseMetastoreTableOperations: Refreshing table metadata from new version: hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08707-afd79c3c-e280-45c4-9797-2fa9a4fa27f4.metadata.json
22/01/27 14:16:18 INFO iceberg.BaseMetastoreCatalog: Table loaded by catalog: hive_catalog6.iceberg_db6.behavior_log_ib6
22/01/27 14:16:18 INFO iceberg.BaseTableScan: Scanning table hive_catalog6.iceberg_db6.behavior_log_ib6 snapshot 6770306142276909677 created at 2022-01-27 11:52:45.274 with filter true
22/01/27 14:16:19 INFO typeutils.TypeExtractor: class org.apache.iceberg.BaseCombinedScanTask does not contain a getter for field tasks
22/01/27 14:16:19 INFO typeutils.TypeExtractor: class org.apache.iceberg.BaseCombinedScanTask does not contain a setter for field tasks
22/01/27 14:16:19 INFO typeutils.TypeExtractor: Class class org.apache.iceberg.BaseCombinedScanTask cannot be used as a POJO type because not all fields are valid POJO fields, and must be processed as GenericType. Please read the Flink documentation on "Data Types & Serialization" for details of the effect on performance.
22/01/27 14:16:20 INFO taskexecutor.TaskExecutorResourceUtils: The configuration option taskmanager.cpu.cores required for local execution is not set, setting it to the maximal possible value.
22/01/27 14:16:20 INFO taskexecutor.TaskExecutorResourceUtils: The configuration option taskmanager.memory.task.heap.size required for local execution is not set, setting it to the maximal possible value.
22/01/27 14:16:20 INFO taskexecutor.TaskExecutorResourceUtils: The configuration option taskmanager.memory.task.off-heap.size required for local execution is not set, setting it to the maximal possible value.
22/01/27 14:16:20 INFO taskexecutor.TaskExecutorResourceUtils: The configuration option taskmanager.memory.network.min required for local execution is not set, setting it to its default value 64 mb.
22/01/27 14:16:20 INFO taskexecutor.TaskExecutorResourceUtils: The configuration option taskmanager.memory.network.max required for local execution is not set, setting it to its default value 64 mb.
22/01/27 14:16:20 INFO taskexecutor.TaskExecutorResourceUtils: The configuration option taskmanager.memory.managed.size required for local execution is not set, setting it to its default value 128 mb.
22/01/27 14:16:20 INFO minicluster.MiniCluster: Starting Flink Mini Cluster
22/01/27 14:16:20 INFO minicluster.MiniCluster: Starting Metrics Registry
22/01/27 14:16:20 INFO metrics.MetricRegistryImpl: No metrics reporter configured, no metrics will be exposed/reported.
22/01/27 14:16:20 INFO minicluster.MiniCluster: Starting RPC Service(s)
22/01/27 14:16:20 INFO akka.AkkaRpcServiceUtils: Trying to start local actor system
22/01/27 14:16:20 INFO akka.AkkaRpcServiceUtils: Actor system started at akka://flink
22/01/27 14:16:20 INFO akka.AkkaRpcServiceUtils: Trying to start local actor system
22/01/27 14:16:21 INFO akka.AkkaRpcServiceUtils: Actor system started at akka://flink-metrics
22/01/27 14:16:21 INFO akka.AkkaRpcService: Starting RPC endpoint for org.apache.flink.runtime.metrics.dump.MetricQueryService at akka://flink-metrics/user/rpc/MetricQueryService .
22/01/27 14:16:21 INFO minicluster.MiniCluster: Starting high-availability services
22/01/27 14:16:21 INFO blob.BlobServer: Created BLOB server storage directory C:\Users\Administrator\AppData\Local\Temp\blobStore-01b74d06-fdf5-4a63-acba-a45949c31adb
22/01/27 14:16:21 INFO blob.BlobServer: Started BLOB server at 0.0.0.0:62600 - max concurrent requests: 50 - max backlog: 1000
22/01/27 14:16:21 INFO blob.PermanentBlobCache: Created BLOB cache storage directory C:\Users\Administrator\AppData\Local\Temp\blobStore-857c985a-74c5-4ae5-9013-0050c9c4a466
22/01/27 14:16:21 INFO blob.TransientBlobCache: Created BLOB cache storage directory C:\Users\Administrator\AppData\Local\Temp\blobStore-a3f32175-7425-4200-9281-607b50fe2515
22/01/27 14:16:21 INFO minicluster.MiniCluster: Starting 1 TaskManger(s)
22/01/27 14:16:21 INFO taskexecutor.TaskManagerRunner: Starting TaskManager with ResourceID: 5e923a69-2d6f-4a7c-82ad-2a264b55ec66
22/01/27 14:16:21 INFO taskexecutor.TaskManagerServices: Temporary file directory 'C:\Users\Administrator\AppData\Local\Temp': total 180 GB, usable 113 GB (62.78% usable)
22/01/27 14:16:21 INFO disk.FileChannelManagerImpl: FileChannelManager uses directory C:\Users\Administrator\AppData\Local\Temp\flink-io-ffab8076-73ad-4315-b887-c085c53ccc60 for spill files.
22/01/27 14:16:21 INFO disk.FileChannelManagerImpl: FileChannelManager uses directory C:\Users\Administrator\AppData\Local\Temp\flink-netty-shuffle-06e9023c-7dee-45ce-835c-435a1d3da9a9 for spill files.
22/01/27 14:16:21 INFO buffer.NetworkBufferPool: Allocated 64 MB for network buffer pool (number of memory segments: 2048, bytes per segment: 32768).
22/01/27 14:16:21 INFO network.NettyShuffleEnvironment: Starting the network environment and its components.
22/01/27 14:16:21 INFO taskexecutor.KvStateService: Starting the kvState service and its components.
22/01/27 14:16:21 INFO akka.AkkaRpcService: Starting RPC endpoint for org.apache.flink.runtime.taskexecutor.TaskExecutor at akka://flink/user/rpc/taskmanager_0 .
22/01/27 14:16:21 INFO taskexecutor.DefaultJobLeaderService: Start job leader service.
22/01/27 14:16:21 INFO filecache.FileCache: User file cache uses directory C:\Users\Administrator\AppData\Local\Temp\flink-dist-cache-11b91016-ba97-47d5-bfbb-db6e628cc831
22/01/27 14:16:21 INFO dispatcher.DispatcherRestEndpoint: Starting rest endpoint.
22/01/27 14:16:21 WARN webmonitor.WebMonitorUtils: Log file environment variable 'log.file' is not set.
22/01/27 14:16:21 WARN webmonitor.WebMonitorUtils: JobManager log files are unavailable in the web dashboard. Log file location not found in environment variable 'log.file' or configuration key 'web.log.path'.
22/01/27 14:16:21 INFO dispatcher.DispatcherRestEndpoint: Rest endpoint listening at localhost:62637
22/01/27 14:16:21 INFO embedded.EmbeddedLeaderService: Proposing leadership to contender http://localhost:62637
22/01/27 14:16:21 INFO dispatcher.DispatcherRestEndpoint: Web frontend listening at http://localhost:62637.
22/01/27 14:16:21 INFO dispatcher.DispatcherRestEndpoint: http://localhost:62637 was granted leadership with leaderSessionID=3f9dae08-d917-4aa9-9efe-26dcec84417b
22/01/27 14:16:21 INFO embedded.EmbeddedLeaderService: Received confirmation of leadership for leader http://localhost:62637 , session=3f9dae08-d917-4aa9-9efe-26dcec84417b
22/01/27 14:16:21 INFO akka.AkkaRpcService: Starting RPC endpoint for org.apache.flink.runtime.resourcemanager.StandaloneResourceManager at akka://flink/user/rpc/resourcemanager_1 .
22/01/27 14:16:21 INFO embedded.EmbeddedLeaderService: Proposing leadership to contender LeaderContender: DefaultDispatcherRunner
22/01/27 14:16:21 INFO embedded.EmbeddedLeaderService: Proposing leadership to contender LeaderContender: StandaloneResourceManager
22/01/27 14:16:21 INFO resourcemanager.StandaloneResourceManager: ResourceManager akka://flink/user/rpc/resourcemanager_1 was granted leadership with fencing token 874c6d550ae35e6c60c250b556dc4d1d
22/01/27 14:16:21 INFO minicluster.MiniCluster: Flink Mini Cluster started successfully
22/01/27 14:16:21 INFO runner.SessionDispatcherLeaderProcess: Start SessionDispatcherLeaderProcess.
22/01/27 14:16:21 INFO slotmanager.SlotManagerImpl: Starting the SlotManager.
22/01/27 14:16:21 INFO runner.SessionDispatcherLeaderProcess: Recover all persisted job graphs.
22/01/27 14:16:21 INFO runner.SessionDispatcherLeaderProcess: Successfully recovered 0 persisted job graphs.
22/01/27 14:16:21 INFO embedded.EmbeddedLeaderService: Received confirmation of leadership for leader akka://flink/user/rpc/resourcemanager_1 , session=60c250b5-56dc-4d1d-874c-6d550ae35e6c
22/01/27 14:16:21 INFO taskexecutor.TaskExecutor: Connecting to ResourceManager akka://flink/user/rpc/resourcemanager_1(874c6d550ae35e6c60c250b556dc4d1d).
22/01/27 14:16:21 INFO akka.AkkaRpcService: Starting RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher at akka://flink/user/rpc/dispatcher_2 .
22/01/27 14:16:21 INFO embedded.EmbeddedLeaderService: Received confirmation of leadership for leader akka://flink/user/rpc/dispatcher_2 , session=3d1322d4-c700-4049-b776-4011a2465b64
22/01/27 14:16:21 INFO taskexecutor.TaskExecutor: Resolved ResourceManager address, beginning registration
22/01/27 14:16:21 INFO resourcemanager.StandaloneResourceManager: Registering TaskManager with ResourceID 5e923a69-2d6f-4a7c-82ad-2a264b55ec66 (akka://flink/user/rpc/taskmanager_0) at ResourceManager
22/01/27 14:16:21 INFO taskexecutor.TaskExecutor: Successful registration at resource manager akka://flink/user/rpc/resourcemanager_1 under registration id a5bfb270e78ea56916f295a783ee3848.
22/01/27 14:16:21 INFO dispatcher.StandaloneDispatcher: Received JobGraph submission 192865ab3c206d9ad46887ca4033853c (Rewrite table :hive_catalog6.iceberg_db6.behavior_log_ib6).
22/01/27 14:16:21 INFO dispatcher.StandaloneDispatcher: Submitting job 192865ab3c206d9ad46887ca4033853c (Rewrite table :hive_catalog6.iceberg_db6.behavior_log_ib6).
22/01/27 14:16:21 INFO embedded.EmbeddedLeaderService: Proposing leadership to contender LeaderContender: JobManagerRunnerImpl
22/01/27 14:16:21 INFO akka.AkkaRpcService: Starting RPC endpoint for org.apache.flink.runtime.jobmaster.JobMaster at akka://flink/user/rpc/jobmanager_3 .
22/01/27 14:16:21 INFO jobmaster.JobMaster: Initializing job Rewrite table :hive_catalog6.iceberg_db6.behavior_log_ib6 (192865ab3c206d9ad46887ca4033853c).
22/01/27 14:16:21 INFO jobmaster.JobMaster: Using restart back off time strategy NoRestartBackoffTimeStrategy for Rewrite table :hive_catalog6.iceberg_db6.behavior_log_ib6 (192865ab3c206d9ad46887ca4033853c).
22/01/27 14:16:21 INFO jobmaster.JobMaster: Running initialization on master for job Rewrite table :hive_catalog6.iceberg_db6.behavior_log_ib6 (192865ab3c206d9ad46887ca4033853c).
22/01/27 14:16:21 INFO jobmaster.JobMaster: Successfully ran initialization on master in 0 ms.
22/01/27 14:16:21 INFO adapter.DefaultExecutionTopology: Built 1 pipelined regions in 0 ms
22/01/27 14:16:21 INFO jobmaster.JobMaster: No state backend has been configured, using default (Memory / JobManager) MemoryStateBackend (data in heap memory / checkpoints to JobManager) (checkpoints: 'null', savepoints: 'null', asynchronous: TRUE, maxStateSize: 5242880)
22/01/27 14:16:21 INFO checkpoint.CheckpointCoordinator: No checkpoint found during restore.
22/01/27 14:16:21 INFO jobmaster.JobMaster: Using failover strategy org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy@180a28d1 for Rewrite table :hive_catalog6.iceberg_db6.behavior_log_ib6 (192865ab3c206d9ad46887ca4033853c).
22/01/27 14:16:21 INFO jobmaster.JobManagerRunnerImpl: JobManager runner for job Rewrite table :hive_catalog6.iceberg_db6.behavior_log_ib6 (192865ab3c206d9ad46887ca4033853c) was granted leadership with session id 42eed335-9570-4466-9512-824db5053b50 at akka://flink/user/rpc/jobmanager_3.
22/01/27 14:16:21 INFO jobmaster.JobMaster: Starting execution of job Rewrite table :hive_catalog6.iceberg_db6.behavior_log_ib6 (192865ab3c206d9ad46887ca4033853c) under job master id 9512824db5053b5042eed33595704466.
22/01/27 14:16:21 INFO jobmaster.JobMaster: Starting scheduling with scheduling strategy [org.apache.flink.runtime.scheduler.strategy.PipelinedRegionSchedulingStrategy]
22/01/27 14:16:21 INFO executiongraph.ExecutionGraph: Job Rewrite table :hive_catalog6.iceberg_db6.behavior_log_ib6 (192865ab3c206d9ad46887ca4033853c) switched from state CREATED to RUNNING.
22/01/27 14:16:21 INFO executiongraph.ExecutionGraph: Source: Collection Source (1/1) (bc41d3bd8a13912bb07a08df1932eccd) switched from CREATED to SCHEDULED.
22/01/27 14:16:21 INFO executiongraph.ExecutionGraph: Map (1/5) (0745e63c429e3f3f5277957c5a9327a4) switched from CREATED to SCHEDULED.
22/01/27 14:16:21 INFO executiongraph.ExecutionGraph: Map (2/5) (fce69b618fdc7ee44325859084aefc03) switched from CREATED to SCHEDULED.
22/01/27 14:16:21 INFO executiongraph.ExecutionGraph: Map (3/5) (332baa5f82c0031c6c2f4ef5a1c2640b) switched from CREATED to SCHEDULED.
22/01/27 14:16:21 INFO executiongraph.ExecutionGraph: Map (4/5) (a2b8480bd9ef290ca8078401a00082d2) switched from CREATED to SCHEDULED.
22/01/27 14:16:21 INFO executiongraph.ExecutionGraph: Map (5/5) (12cf712a4901aeefe410f17514203677) switched from CREATED to SCHEDULED.
22/01/27 14:16:21 INFO executiongraph.ExecutionGraph: Sink: Data stream collect sink (1/1) (3cdc14eb995d664cc6a13696b6573777) switched from CREATED to SCHEDULED.
22/01/27 14:16:21 INFO slotpool.SlotPoolImpl: Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{5af2e3ba8bef556b8dd23ba04e17593d}]
22/01/27 14:16:21 INFO slotpool.SlotPoolImpl: Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{6fdbc63735d0fc056b8a2a5a7a06bfff}]
22/01/27 14:16:21 INFO slotpool.SlotPoolImpl: Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{e38d407cf6b0af3a8f77cd03dfe51fc7}]
22/01/27 14:16:21 INFO slotpool.SlotPoolImpl: Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{5ce445a6c9cf40c7c33b7779909d58cf}]
22/01/27 14:16:21 INFO slotpool.SlotPoolImpl: Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{25073f97b0d46577c5d2de9220e40512}]
22/01/27 14:16:21 INFO embedded.EmbeddedLeaderService: Received confirmation of leadership for leader akka://flink/user/rpc/jobmanager_3 , session=42eed335-9570-4466-9512-824db5053b50
22/01/27 14:16:21 INFO jobmaster.JobMaster: Connecting to ResourceManager akka://flink/user/rpc/resourcemanager_1(874c6d550ae35e6c60c250b556dc4d1d)
22/01/27 14:16:21 INFO jobmaster.JobMaster: Resolved ResourceManager address, beginning registration
22/01/27 14:16:21 INFO resourcemanager.StandaloneResourceManager: Registering job manager 9512824db5053b5042eed33595704466@akka://flink/user/rpc/jobmanager_3 for job 192865ab3c206d9ad46887ca4033853c.
22/01/27 14:16:21 INFO resourcemanager.StandaloneResourceManager: Registered job manager 9512824db5053b5042eed33595704466@akka://flink/user/rpc/jobmanager_3 for job 192865ab3c206d9ad46887ca4033853c.
22/01/27 14:16:21 INFO jobmaster.JobMaster: JobManager successfully registered at ResourceManager, leader id: 874c6d550ae35e6c60c250b556dc4d1d.
22/01/27 14:16:21 INFO slotpool.SlotPoolImpl: Requesting new slot [SlotRequestId{5af2e3ba8bef556b8dd23ba04e17593d}] and profile ResourceProfile{UNKNOWN} with allocation id dce4d8b9e2ef7cdb75772dcd9f663ed5 from resource manager.
22/01/27 14:16:21 INFO resourcemanager.StandaloneResourceManager: Request slot with profile ResourceProfile{UNKNOWN} for job 192865ab3c206d9ad46887ca4033853c with allocation id dce4d8b9e2ef7cdb75772dcd9f663ed5.
22/01/27 14:16:21 INFO slotpool.SlotPoolImpl: Requesting new slot [SlotRequestId{6fdbc63735d0fc056b8a2a5a7a06bfff}] and profile ResourceProfile{UNKNOWN} with allocation id 519f346059a1ff6ddf6b8ea6664338ad from resource manager.
22/01/27 14:16:21 INFO slotpool.SlotPoolImpl: Requesting new slot [SlotRequestId{e38d407cf6b0af3a8f77cd03dfe51fc7}] and profile ResourceProfile{UNKNOWN} with allocation id 5163ceecd85688cc864c61e4a490b2fe from resource manager.
22/01/27 14:16:21 INFO slotpool.SlotPoolImpl: Requesting new slot [SlotRequestId{5ce445a6c9cf40c7c33b7779909d58cf}] and profile ResourceProfile{UNKNOWN} with allocation id 3144b0b9f45ef6d32c73fe979e50bcb9 from resource manager.
22/01/27 14:16:21 INFO slotpool.SlotPoolImpl: Requesting new slot [SlotRequestId{25073f97b0d46577c5d2de9220e40512}] and profile ResourceProfile{UNKNOWN} with allocation id 91f39190bc584d5d5e2e3cfaff8bf2fe from resource manager.
22/01/27 14:16:21 INFO taskexecutor.TaskExecutor: Receive slot request dce4d8b9e2ef7cdb75772dcd9f663ed5 for job 192865ab3c206d9ad46887ca4033853c from resource manager with leader id 874c6d550ae35e6c60c250b556dc4d1d.
22/01/27 14:16:21 INFO resourcemanager.StandaloneResourceManager: Request slot with profile ResourceProfile{UNKNOWN} for job 192865ab3c206d9ad46887ca4033853c with allocation id 519f346059a1ff6ddf6b8ea6664338ad.
22/01/27 14:16:21 INFO resourcemanager.StandaloneResourceManager: Request slot with profile ResourceProfile{UNKNOWN} for job 192865ab3c206d9ad46887ca4033853c with allocation id 5163ceecd85688cc864c61e4a490b2fe.
22/01/27 14:16:21 INFO resourcemanager.StandaloneResourceManager: Request slot with profile ResourceProfile{UNKNOWN} for job 192865ab3c206d9ad46887ca4033853c with allocation id 3144b0b9f45ef6d32c73fe979e50bcb9.
22/01/27 14:16:21 INFO resourcemanager.StandaloneResourceManager: Request slot with profile ResourceProfile{UNKNOWN} for job 192865ab3c206d9ad46887ca4033853c with allocation id 91f39190bc584d5d5e2e3cfaff8bf2fe.
22/01/27 14:16:21 INFO taskexecutor.TaskExecutor: Allocated slot for dce4d8b9e2ef7cdb75772dcd9f663ed5.
22/01/27 14:16:21 INFO taskexecutor.DefaultJobLeaderService: Add job 192865ab3c206d9ad46887ca4033853c for job leader monitoring.
22/01/27 14:16:21 INFO taskexecutor.DefaultJobLeaderService: Try to register at job manager akka://flink/user/rpc/jobmanager_3 with leader id 42eed335-9570-4466-9512-824db5053b50.
22/01/27 14:16:21 INFO taskexecutor.TaskExecutor: Receive slot request 519f346059a1ff6ddf6b8ea6664338ad for job 192865ab3c206d9ad46887ca4033853c from resource manager with leader id 874c6d550ae35e6c60c250b556dc4d1d.
22/01/27 14:16:21 INFO taskexecutor.TaskExecutor: Allocated slot for 519f346059a1ff6ddf6b8ea6664338ad.
22/01/27 14:16:21 INFO taskexecutor.TaskExecutor: Receive slot request 5163ceecd85688cc864c61e4a490b2fe for job 192865ab3c206d9ad46887ca4033853c from resource manager with leader id 874c6d550ae35e6c60c250b556dc4d1d.
22/01/27 14:16:21 INFO taskexecutor.TaskExecutor: Allocated slot for 5163ceecd85688cc864c61e4a490b2fe.
22/01/27 14:16:21 INFO taskexecutor.TaskExecutor: Receive slot request 3144b0b9f45ef6d32c73fe979e50bcb9 for job 192865ab3c206d9ad46887ca4033853c from resource manager with leader id 874c6d550ae35e6c60c250b556dc4d1d.
22/01/27 14:16:21 INFO taskexecutor.TaskExecutor: Allocated slot for 3144b0b9f45ef6d32c73fe979e50bcb9.
22/01/27 14:16:21 INFO taskexecutor.DefaultJobLeaderService: Resolved JobManager address, beginning registration
22/01/27 14:16:21 INFO taskexecutor.TaskExecutor: Receive slot request 91f39190bc584d5d5e2e3cfaff8bf2fe for job 192865ab3c206d9ad46887ca4033853c from resource manager with leader id 874c6d550ae35e6c60c250b556dc4d1d.
22/01/27 14:16:21 INFO taskexecutor.TaskExecutor: Allocated slot for 91f39190bc584d5d5e2e3cfaff8bf2fe.
22/01/27 14:16:21 INFO taskexecutor.DefaultJobLeaderService: Successful registration at job manager akka://flink/user/rpc/jobmanager_3 for job 192865ab3c206d9ad46887ca4033853c.
22/01/27 14:16:21 INFO taskexecutor.TaskExecutor: Establish JobManager connection for job 192865ab3c206d9ad46887ca4033853c.
22/01/27 14:16:21 INFO taskexecutor.TaskExecutor: Offer reserved slots to the leader of job 192865ab3c206d9ad46887ca4033853c.
22/01/27 14:16:21 INFO executiongraph.ExecutionGraph: Source: Collection Source (1/1) (bc41d3bd8a13912bb07a08df1932eccd) switched from SCHEDULED to DEPLOYING.
22/01/27 14:16:21 INFO executiongraph.ExecutionGraph: Deploying Source: Collection Source (1/1) (attempt #0) with attempt id bc41d3bd8a13912bb07a08df1932eccd to 5e923a69-2d6f-4a7c-82ad-2a264b55ec66 @ 127.0.0.1 (dataPort=-1) with allocation id 3144b0b9f45ef6d32c73fe979e50bcb9
22/01/27 14:16:21 INFO executiongraph.ExecutionGraph: Map (1/5) (0745e63c429e3f3f5277957c5a9327a4) switched from SCHEDULED to DEPLOYING.
22/01/27 14:16:21 INFO executiongraph.ExecutionGraph: Deploying Map (1/5) (attempt #0) with attempt id 0745e63c429e3f3f5277957c5a9327a4 to 5e923a69-2d6f-4a7c-82ad-2a264b55ec66 @ 127.0.0.1 (dataPort=-1) with allocation id 3144b0b9f45ef6d32c73fe979e50bcb9
22/01/27 14:16:21 INFO slot.TaskSlotTableImpl: Activate slot 3144b0b9f45ef6d32c73fe979e50bcb9.
22/01/27 14:16:21 INFO executiongraph.ExecutionGraph: Map (2/5) (fce69b618fdc7ee44325859084aefc03) switched from SCHEDULED to DEPLOYING.
22/01/27 14:16:21 INFO executiongraph.ExecutionGraph: Deploying Map (2/5) (attempt #0) with attempt id fce69b618fdc7ee44325859084aefc03 to 5e923a69-2d6f-4a7c-82ad-2a264b55ec66 @ 127.0.0.1 (dataPort=-1) with allocation id dce4d8b9e2ef7cdb75772dcd9f663ed5
22/01/27 14:16:21 INFO executiongraph.ExecutionGraph: Map (3/5) (332baa5f82c0031c6c2f4ef5a1c2640b) switched from SCHEDULED to DEPLOYING.
22/01/27 14:16:21 INFO executiongraph.ExecutionGraph: Deploying Map (3/5) (attempt #0) with attempt id 332baa5f82c0031c6c2f4ef5a1c2640b to 5e923a69-2d6f-4a7c-82ad-2a264b55ec66 @ 127.0.0.1 (dataPort=-1) with allocation id 519f346059a1ff6ddf6b8ea6664338ad
22/01/27 14:16:21 INFO executiongraph.ExecutionGraph: Map (4/5) (a2b8480bd9ef290ca8078401a00082d2) switched from SCHEDULED to DEPLOYING.
22/01/27 14:16:21 INFO executiongraph.ExecutionGraph: Deploying Map (4/5) (attempt #0) with attempt id a2b8480bd9ef290ca8078401a00082d2 to 5e923a69-2d6f-4a7c-82ad-2a264b55ec66 @ 127.0.0.1 (dataPort=-1) with allocation id 5163ceecd85688cc864c61e4a490b2fe
22/01/27 14:16:21 INFO executiongraph.ExecutionGraph: Map (5/5) (12cf712a4901aeefe410f17514203677) switched from SCHEDULED to DEPLOYING.
22/01/27 14:16:21 INFO executiongraph.ExecutionGraph: Deploying Map (5/5) (attempt #0) with attempt id 12cf712a4901aeefe410f17514203677 to 5e923a69-2d6f-4a7c-82ad-2a264b55ec66 @ 127.0.0.1 (dataPort=-1) with allocation id 91f39190bc584d5d5e2e3cfaff8bf2fe
22/01/27 14:16:21 INFO executiongraph.ExecutionGraph: Sink: Data stream collect sink (1/1) (3cdc14eb995d664cc6a13696b6573777) switched from SCHEDULED to DEPLOYING.
22/01/27 14:16:21 INFO executiongraph.ExecutionGraph: Deploying Sink: Data stream collect sink (1/1) (attempt #0) with attempt id 3cdc14eb995d664cc6a13696b6573777 to 5e923a69-2d6f-4a7c-82ad-2a264b55ec66 @ 127.0.0.1 (dataPort=-1) with allocation id 3144b0b9f45ef6d32c73fe979e50bcb9
22/01/27 14:16:21 INFO taskexecutor.TaskExecutor: Received task Source: Collection Source (1/1)#0 (bc41d3bd8a13912bb07a08df1932eccd), deploy into slot with allocation id 3144b0b9f45ef6d32c73fe979e50bcb9.
22/01/27 14:16:21 INFO taskmanager.Task: Source: Collection Source (1/1)#0 (bc41d3bd8a13912bb07a08df1932eccd) switched from CREATED to DEPLOYING.
22/01/27 14:16:21 INFO slot.TaskSlotTableImpl: Activate slot 3144b0b9f45ef6d32c73fe979e50bcb9.
22/01/27 14:16:21 INFO taskmanager.Task: Loading JAR files for task Source: Collection Source (1/1)#0 (bc41d3bd8a13912bb07a08df1932eccd) [DEPLOYING].
22/01/27 14:16:21 INFO taskmanager.Task: Registering task at network: Source: Collection Source (1/1)#0 (bc41d3bd8a13912bb07a08df1932eccd) [DEPLOYING].
22/01/27 14:16:21 INFO taskexecutor.TaskExecutor: Received task Map (1/5)#0 (0745e63c429e3f3f5277957c5a9327a4), deploy into slot with allocation id 3144b0b9f45ef6d32c73fe979e50bcb9.
22/01/27 14:16:21 INFO slot.TaskSlotTableImpl: Activate slot dce4d8b9e2ef7cdb75772dcd9f663ed5.
22/01/27 14:16:21 INFO taskmanager.Task: Map (1/5)#0 (0745e63c429e3f3f5277957c5a9327a4) switched from CREATED to DEPLOYING.
22/01/27 14:16:21 INFO taskmanager.Task: Loading JAR files for task Map (1/5)#0 (0745e63c429e3f3f5277957c5a9327a4) [DEPLOYING].
22/01/27 14:16:21 INFO taskmanager.Task: Registering task at network: Map (1/5)#0 (0745e63c429e3f3f5277957c5a9327a4) [DEPLOYING].
22/01/27 14:16:21 INFO taskexecutor.TaskExecutor: Received task Map (2/5)#0 (fce69b618fdc7ee44325859084aefc03), deploy into slot with allocation id dce4d8b9e2ef7cdb75772dcd9f663ed5.
22/01/27 14:16:21 INFO slot.TaskSlotTableImpl: Activate slot 519f346059a1ff6ddf6b8ea6664338ad.
22/01/27 14:16:21 INFO taskmanager.Task: Map (2/5)#0 (fce69b618fdc7ee44325859084aefc03) switched from CREATED to DEPLOYING.
22/01/27 14:16:21 INFO taskmanager.Task: Loading JAR files for task Map (2/5)#0 (fce69b618fdc7ee44325859084aefc03) [DEPLOYING].
22/01/27 14:16:21 INFO taskmanager.Task: Registering task at network: Map (2/5)#0 (fce69b618fdc7ee44325859084aefc03) [DEPLOYING].
22/01/27 14:16:21 INFO taskexecutor.TaskExecutor: Received task Map (3/5)#0 (332baa5f82c0031c6c2f4ef5a1c2640b), deploy into slot with allocation id 519f346059a1ff6ddf6b8ea6664338ad.
22/01/27 14:16:21 INFO taskmanager.Task: Map (3/5)#0 (332baa5f82c0031c6c2f4ef5a1c2640b) switched from CREATED to DEPLOYING.
22/01/27 14:16:21 INFO slot.TaskSlotTableImpl: Activate slot 5163ceecd85688cc864c61e4a490b2fe.
22/01/27 14:16:21 INFO taskmanager.Task: Loading JAR files for task Map (3/5)#0 (332baa5f82c0031c6c2f4ef5a1c2640b) [DEPLOYING].
22/01/27 14:16:21 INFO taskmanager.Task: Registering task at network: Map (3/5)#0 (332baa5f82c0031c6c2f4ef5a1c2640b) [DEPLOYING].
22/01/27 14:16:22 INFO taskexecutor.TaskExecutor: Received task Map (4/5)#0 (a2b8480bd9ef290ca8078401a00082d2), deploy into slot with allocation id 5163ceecd85688cc864c61e4a490b2fe.
22/01/27 14:16:22 INFO taskmanager.Task: Map (4/5)#0 (a2b8480bd9ef290ca8078401a00082d2) switched from CREATED to DEPLOYING.
22/01/27 14:16:22 INFO slot.TaskSlotTableImpl: Activate slot 91f39190bc584d5d5e2e3cfaff8bf2fe.
22/01/27 14:16:22 INFO taskmanager.Task: Loading JAR files for task Map (4/5)#0 (a2b8480bd9ef290ca8078401a00082d2) [DEPLOYING].
22/01/27 14:16:22 INFO taskmanager.Task: Registering task at network: Map (4/5)#0 (a2b8480bd9ef290ca8078401a00082d2) [DEPLOYING].
22/01/27 14:16:22 INFO taskexecutor.TaskExecutor: Received task Map (5/5)#0 (12cf712a4901aeefe410f17514203677), deploy into slot with allocation id 91f39190bc584d5d5e2e3cfaff8bf2fe.
22/01/27 14:16:22 INFO taskmanager.Task: Map (5/5)#0 (12cf712a4901aeefe410f17514203677) switched from CREATED to DEPLOYING.
22/01/27 14:16:22 INFO taskmanager.Task: Loading JAR files for task Map (5/5)#0 (12cf712a4901aeefe410f17514203677) [DEPLOYING].
22/01/27 14:16:22 INFO slot.TaskSlotTableImpl: Activate slot dce4d8b9e2ef7cdb75772dcd9f663ed5.
22/01/27 14:16:22 INFO slot.TaskSlotTableImpl: Activate slot 519f346059a1ff6ddf6b8ea6664338ad.
22/01/27 14:16:22 INFO slot.TaskSlotTableImpl: Activate slot 5163ceecd85688cc864c61e4a490b2fe.
22/01/27 14:16:22 INFO slot.TaskSlotTableImpl: Activate slot 3144b0b9f45ef6d32c73fe979e50bcb9.
22/01/27 14:16:22 INFO slot.TaskSlotTableImpl: Activate slot 91f39190bc584d5d5e2e3cfaff8bf2fe.
22/01/27 14:16:22 INFO slot.TaskSlotTableImpl: Activate slot 3144b0b9f45ef6d32c73fe979e50bcb9.
22/01/27 14:16:22 INFO taskmanager.Task: Registering task at network: Map (5/5)#0 (12cf712a4901aeefe410f17514203677) [DEPLOYING].
22/01/27 14:16:22 INFO taskexecutor.TaskExecutor: Received task Sink: Data stream collect sink (1/1)#0 (3cdc14eb995d664cc6a13696b6573777), deploy into slot with allocation id 3144b0b9f45ef6d32c73fe979e50bcb9.
22/01/27 14:16:22 INFO taskmanager.Task: Sink: Data stream collect sink (1/1)#0 (3cdc14eb995d664cc6a13696b6573777) switched from CREATED to DEPLOYING.
22/01/27 14:16:22 INFO taskmanager.Task: Loading JAR files for task Sink: Data stream collect sink (1/1)#0 (3cdc14eb995d664cc6a13696b6573777) [DEPLOYING].
22/01/27 14:16:22 INFO taskmanager.Task: Registering task at network: Sink: Data stream collect sink (1/1)#0 (3cdc14eb995d664cc6a13696b6573777) [DEPLOYING].
22/01/27 14:16:22 INFO tasks.StreamTask: No state backend has been configured, using default (Memory / JobManager) MemoryStateBackend (data in heap memory / checkpoints to JobManager) (checkpoints: 'null', savepoints: 'null', asynchronous: TRUE, maxStateSize: 5242880)
22/01/27 14:16:22 INFO tasks.StreamTask: No state backend has been configured, using default (Memory / JobManager) MemoryStateBackend (data in heap memory / checkpoints to JobManager) (checkpoints: 'null', savepoints: 'null', asynchronous: TRUE, maxStateSize: 5242880)
22/01/27 14:16:22 INFO tasks.StreamTask: No state backend has been configured, using default (Memory / JobManager) MemoryStateBackend (data in heap memory / checkpoints to JobManager) (checkpoints: 'null', savepoints: 'null', asynchronous: TRUE, maxStateSize: 5242880)
22/01/27 14:16:22 INFO tasks.StreamTask: No state backend has been configured, using default (Memory / JobManager) MemoryStateBackend (data in heap memory / checkpoints to JobManager) (checkpoints: 'null', savepoints: 'null', asynchronous: TRUE, maxStateSize: 5242880)
22/01/27 14:16:22 INFO tasks.StreamTask: No state backend has been configured, using default (Memory / JobManager) MemoryStateBackend (data in heap memory / checkpoints to JobManager) (checkpoints: 'null', savepoints: 'null', asynchronous: TRUE, maxStateSize: 5242880)
22/01/27 14:16:22 INFO tasks.StreamTask: No state backend has been configured, using default (Memory / JobManager) MemoryStateBackend (data in heap memory / checkpoints to JobManager) (checkpoints: 'null', savepoints: 'null', asynchronous: TRUE, maxStateSize: 5242880)
22/01/27 14:16:22 INFO tasks.StreamTask: No state backend has been configured, using default (Memory / JobManager) MemoryStateBackend (data in heap memory / checkpoints to JobManager) (checkpoints: 'null', savepoints: 'null', asynchronous: TRUE, maxStateSize: 5242880)
22/01/27 14:16:22 INFO taskmanager.Task: Map (3/5)#0 (332baa5f82c0031c6c2f4ef5a1c2640b) switched from DEPLOYING to RUNNING.
22/01/27 14:16:22 INFO taskmanager.Task: Map (1/5)#0 (0745e63c429e3f3f5277957c5a9327a4) switched from DEPLOYING to RUNNING.
22/01/27 14:16:22 INFO taskmanager.Task: Source: Collection Source (1/1)#0 (bc41d3bd8a13912bb07a08df1932eccd) switched from DEPLOYING to RUNNING.
22/01/27 14:16:22 INFO taskmanager.Task: Sink: Data stream collect sink (1/1)#0 (3cdc14eb995d664cc6a13696b6573777) switched from DEPLOYING to RUNNING.
22/01/27 14:16:22 INFO taskmanager.Task: Map (4/5)#0 (a2b8480bd9ef290ca8078401a00082d2) switched from DEPLOYING to RUNNING.
22/01/27 14:16:22 INFO taskmanager.Task: Map (5/5)#0 (12cf712a4901aeefe410f17514203677) switched from DEPLOYING to RUNNING.
22/01/27 14:16:22 INFO taskmanager.Task: Map (2/5)#0 (fce69b618fdc7ee44325859084aefc03) switched from DEPLOYING to RUNNING.
22/01/27 14:16:22 INFO executiongraph.ExecutionGraph: Map (3/5) (332baa5f82c0031c6c2f4ef5a1c2640b) switched from DEPLOYING to RUNNING.
22/01/27 14:16:22 INFO executiongraph.ExecutionGraph: Map (1/5) (0745e63c429e3f3f5277957c5a9327a4) switched from DEPLOYING to RUNNING.
22/01/27 14:16:22 INFO executiongraph.ExecutionGraph: Source: Collection Source (1/1) (bc41d3bd8a13912bb07a08df1932eccd) switched from DEPLOYING to RUNNING.
22/01/27 14:16:22 INFO executiongraph.ExecutionGraph: Sink: Data stream collect sink (1/1) (3cdc14eb995d664cc6a13696b6573777) switched from DEPLOYING to RUNNING.
22/01/27 14:16:22 INFO executiongraph.ExecutionGraph: Map (4/5) (a2b8480bd9ef290ca8078401a00082d2) switched from DEPLOYING to RUNNING.
22/01/27 14:16:22 INFO executiongraph.ExecutionGraph: Map (5/5) (12cf712a4901aeefe410f17514203677) switched from DEPLOYING to RUNNING.
22/01/27 14:16:22 INFO executiongraph.ExecutionGraph: Map (2/5) (fce69b618fdc7ee44325859084aefc03) switched from DEPLOYING to RUNNING.
22/01/27 14:16:22 INFO collect.CollectSinkFunction: Initializing collect sink state with offset = 0, buffered results bytes = 0
22/01/27 14:16:22 INFO collect.CollectSinkFunction: Collect sink server established, address = localhost/127.0.0.1:62640
22/01/27 14:16:22 INFO collect.CollectSinkOperatorCoordinator: Received sink socket server address: localhost/127.0.0.1:62640
22/01/27 14:16:22 INFO consumer.SingleInputGate: Converting recovered input channels (5 channels)
22/01/27 14:16:22 INFO consumer.SingleInputGate: Converting recovered input channels (1 channels)
22/01/27 14:16:22 INFO consumer.SingleInputGate: Converting recovered input channels (1 channels)
22/01/27 14:16:22 INFO consumer.SingleInputGate: Converting recovered input channels (1 channels)
22/01/27 14:16:22 INFO consumer.SingleInputGate: Converting recovered input channels (1 channels)
22/01/27 14:16:22 INFO consumer.SingleInputGate: Converting recovered input channels (1 channels)
22/01/27 14:16:22 INFO collect.CollectSinkOperatorCoordinator: Sink connection established
22/01/27 14:16:22 INFO collect.CollectSinkFunction: Coordinator connection received
22/01/27 14:16:22 INFO collect.CollectSinkFunction: Invalid request. Received version = , offset = 0, while expected version = 57cecdd1-f215-4ed7-a434-a0333e53bd30, offset = 0
22/01/27 14:16:22 INFO taskmanager.Task: Source: Collection Source (1/1)#0 (bc41d3bd8a13912bb07a08df1932eccd) switched from RUNNING to FINISHED.
22/01/27 14:16:22 INFO taskmanager.Task: Freeing task resources for Source: Collection Source (1/1)#0 (bc41d3bd8a13912bb07a08df1932eccd).
22/01/27 14:16:22 INFO taskexecutor.TaskExecutor: Un-registering task and sending final execution state FINISHED to JobManager for task Source: Collection Source (1/1)#0 bc41d3bd8a13912bb07a08df1932eccd.
22/01/27 14:16:22 INFO executiongraph.ExecutionGraph: Source: Collection Source (1/1) (bc41d3bd8a13912bb07a08df1932eccd) switched from RUNNING to FINISHED.
22/01/27 14:16:22 WARN zlib.ZlibFactory: Failed to load/initialize native-zlib library
22/01/27 14:16:22 INFO compress.CodecPool: Got brand-new compressor [.gz]
22/01/27 14:16:22 INFO compress.CodecPool: Got brand-new compressor [.gz]
22/01/27 14:16:22 INFO compress.CodecPool: Got brand-new compressor [.gz]
22/01/27 14:16:22 INFO compress.CodecPool: Got brand-new compressor [.gz]
... (repeated lines omitted) ...
22/01/27 14:16:32 INFO compress.CodecPool: Got brand-new decompressor [.gz]
22/01/27 14:16:32 INFO compress.CodecPool: Got brand-new decompressor [.gz]
22/01/27 14:16:32 INFO compress.CodecPool: Got brand-new decompressor [.gz]
22/01/27 14:16:32 INFO compress.CodecPool: Got brand-new decompressor [.gz]
22/01/27 14:16:32 INFO compress.CodecPool: Got brand-new decompressor [.gz]
22/01/27 14:16:32 INFO taskmanager.Task: Map (1/5)#0 (0745e63c429e3f3f5277957c5a9327a4) switched from RUNNING to FINISHED.
22/01/27 14:16:32 INFO taskmanager.Task: Freeing task resources for Map (1/5)#0 (0745e63c429e3f3f5277957c5a9327a4).
22/01/27 14:16:32 INFO taskexecutor.TaskExecutor: Un-registering task and sending final execution state FINISHED to JobManager for task Map (1/5)#0 0745e63c429e3f3f5277957c5a9327a4.
22/01/27 14:16:32 INFO executiongraph.ExecutionGraph: Map (1/5) (0745e63c429e3f3f5277957c5a9327a4) switched from RUNNING to FINISHED.
22/01/27 14:16:32 INFO compress.CodecPool: Got brand-new compressor [.gz]
22/01/27 14:16:32 INFO compress.CodecPool: Got brand-new compressor [.gz]
22/01/27 14:16:32 INFO taskmanager.Task: Map (5/5)#0 (12cf712a4901aeefe410f17514203677) switched from RUNNING to FINISHED.
22/01/27 14:16:32 INFO taskmanager.Task: Freeing task resources for Map (5/5)#0 (12cf712a4901aeefe410f17514203677).
22/01/27 14:16:32 INFO taskexecutor.TaskExecutor: Un-registering task and sending final execution state FINISHED to JobManager for task Map (5/5)#0 12cf712a4901aeefe410f17514203677.
22/01/27 14:16:32 INFO executiongraph.ExecutionGraph: Map (5/5) (12cf712a4901aeefe410f17514203677) switched from RUNNING to FINISHED.
22/01/27 14:16:32 INFO compress.CodecPool: Got brand-new decompressor [.gz]
22/01/27 14:16:32 INFO compress.CodecPool: Got brand-new decompressor [.gz]
22/01/27 14:16:32 INFO taskmanager.Task: Map (2/5)#0 (fce69b618fdc7ee44325859084aefc03) switched from RUNNING to FINISHED.
22/01/27 14:16:32 INFO taskmanager.Task: Freeing task resources for Map (2/5)#0 (fce69b618fdc7ee44325859084aefc03).
22/01/27 14:16:32 INFO taskexecutor.TaskExecutor: Un-registering task and sending final execution state FINISHED to JobManager for task Map (2/5)#0 fce69b618fdc7ee44325859084aefc03.
22/01/27 14:16:32 INFO executiongraph.ExecutionGraph: Map (2/5) (fce69b618fdc7ee44325859084aefc03) switched from RUNNING to FINISHED.
22/01/27 14:16:32 INFO compress.CodecPool: Got brand-new decompressor [.gz]
22/01/27 14:16:32 INFO compress.CodecPool: Got brand-new decompressor [.gz]
22/01/27 14:16:33 INFO compress.CodecPool: Got brand-new decompressor [.gz]
22/01/27 14:16:33 INFO compress.CodecPool: Got brand-new decompressor [.gz]

22/01/27 14:16:35 INFO compress.CodecPool: Got brand-new decompressor [.gz]
22/01/27 14:16:35 INFO compress.CodecPool: Got brand-new decompressor [.gz]
22/01/27 14:16:35 INFO compress.CodecPool: Got brand-new decompressor [.gz]
22/01/27 14:16:35 INFO taskmanager.Task: Map (4/5)#0 (a2b8480bd9ef290ca8078401a00082d2) switched from RUNNING to FINISHED.
22/01/27 14:16:35 INFO taskmanager.Task: Freeing task resources for Map (4/5)#0 (a2b8480bd9ef290ca8078401a00082d2).
22/01/27 14:16:35 INFO taskexecutor.TaskExecutor: Un-registering task and sending final execution state FINISHED to JobManager for task Map (4/5)#0 a2b8480bd9ef290ca8078401a00082d2.
22/01/27 14:16:35 INFO executiongraph.ExecutionGraph: Map (4/5) (a2b8480bd9ef290ca8078401a00082d2) switched from RUNNING to FINISHED.
22/01/27 14:16:36 INFO taskmanager.Task: Map (3/5)#0 (332baa5f82c0031c6c2f4ef5a1c2640b) switched from RUNNING to FINISHED.
22/01/27 14:16:36 INFO taskmanager.Task: Freeing task resources for Map (3/5)#0 (332baa5f82c0031c6c2f4ef5a1c2640b).
22/01/27 14:16:36 INFO taskexecutor.TaskExecutor: Un-registering task and sending final execution state FINISHED to JobManager for task Map (3/5)#0 332baa5f82c0031c6c2f4ef5a1c2640b.
22/01/27 14:16:36 INFO executiongraph.ExecutionGraph: Map (3/5) (332baa5f82c0031c6c2f4ef5a1c2640b) switched from RUNNING to FINISHED.
22/01/27 14:16:36 INFO taskmanager.Task: Sink: Data stream collect sink (1/1)#0 (3cdc14eb995d664cc6a13696b6573777) switched from RUNNING to FINISHED.
22/01/27 14:16:36 INFO taskmanager.Task: Freeing task resources for Sink: Data stream collect sink (1/1)#0 (3cdc14eb995d664cc6a13696b6573777).
22/01/27 14:16:36 INFO taskexecutor.TaskExecutor: Un-registering task and sending final execution state FINISHED to JobManager for task Sink: Data stream collect sink (1/1)#0 3cdc14eb995d664cc6a13696b6573777.
22/01/27 14:16:36 INFO executiongraph.ExecutionGraph: Sink: Data stream collect sink (1/1) (3cdc14eb995d664cc6a13696b6573777) switched from RUNNING to FINISHED.
22/01/27 14:16:36 INFO executiongraph.ExecutionGraph: Job Rewrite table :hive_catalog6.iceberg_db6.behavior_log_ib6 (192865ab3c206d9ad46887ca4033853c) switched from state RUNNING to FINISHED.
22/01/27 14:16:36 INFO checkpoint.CheckpointCoordinator: Stopping checkpoint coordinator for job 192865ab3c206d9ad46887ca4033853c.
22/01/27 14:16:36 INFO checkpoint.StandaloneCompletedCheckpointStore: Shutting down
22/01/27 14:16:36 INFO minicluster.MiniCluster: Shutting down Flink Mini Cluster
22/01/27 14:16:36 INFO dispatcher.StandaloneDispatcher: Job 192865ab3c206d9ad46887ca4033853c reached terminal state FINISHED.
22/01/27 14:16:36 INFO taskexecutor.TaskExecutor: Stopping TaskExecutor akka://flink/user/rpc/taskmanager_0.
22/01/27 14:16:36 INFO taskexecutor.TaskExecutor: Close ResourceManager connection a55ac0aa14085354b81ce468182a5def.
22/01/27 14:16:36 INFO dispatcher.DispatcherRestEndpoint: Shutting down rest endpoint.
22/01/27 14:16:36 INFO taskexecutor.TaskExecutor: Close JobManager connection for job 192865ab3c206d9ad46887ca4033853c.
22/01/27 14:16:36 INFO resourcemanager.StandaloneResourceManager: Closing TaskExecutor connection 5e923a69-2d6f-4a7c-82ad-2a264b55ec66 because: The TaskExecutor is shutting down.
22/01/27 14:16:36 INFO jobmaster.JobMaster: Stopping the JobMaster for job Rewrite table :hive_catalog6.iceberg_db6.behavior_log_ib6(192865ab3c206d9ad46887ca4033853c).
22/01/27 14:16:36 INFO slot.TaskSlotTableImpl: Free slot TaskSlot(index:0, state:ALLOCATED, resource profile: ResourceProfile{taskHeapMemory=204.800gb (219902325555 bytes), taskOffHeapMemory=204.800gb (219902325555 bytes), managedMemory=25.600mb (26843545 bytes), networkMemory=12.800mb (13421772 bytes)}, allocationId: dce4d8b9e2ef7cdb75772dcd9f663ed5, jobId: 192865ab3c206d9ad46887ca4033853c).
22/01/27 14:16:36 INFO slotpool.SlotPoolImpl: Suspending SlotPool.
22/01/27 14:16:36 INFO jobmaster.JobMaster: Close ResourceManager connection a55ac0aa14085354b81ce468182a5def: Stopping JobMaster for job Rewrite table :hive_catalog6.iceberg_db6.behavior_log_ib6(192865ab3c206d9ad46887ca4033853c)..
22/01/27 14:16:36 INFO slotpool.SlotPoolImpl: Stopping SlotPool.
22/01/27 14:16:36 INFO slot.TaskSlotTableImpl: Free slot TaskSlot(index:1, state:ALLOCATED, resource profile: ResourceProfile{taskHeapMemory=204.800gb (219902325555 bytes), taskOffHeapMemory=204.800gb (219902325555 bytes), managedMemory=25.600mb (26843545 bytes), networkMemory=12.800mb (13421772 bytes)}, allocationId: 519f346059a1ff6ddf6b8ea6664338ad, jobId: 192865ab3c206d9ad46887ca4033853c).
22/01/27 14:16:36 INFO resourcemanager.StandaloneResourceManager: Disconnect job manager 9512824db5053b5042eed33595704466@akka://flink/user/rpc/jobmanager_3 for job 192865ab3c206d9ad46887ca4033853c from the resource manager.
22/01/27 14:16:36 INFO slot.TaskSlotTableImpl: Free slot TaskSlot(index:2, state:ALLOCATED, resource profile: ResourceProfile{taskHeapMemory=204.800gb (219902325555 bytes), taskOffHeapMemory=204.800gb (219902325555 bytes), managedMemory=25.600mb (26843545 bytes), networkMemory=12.800mb (13421772 bytes)}, allocationId: 5163ceecd85688cc864c61e4a490b2fe, jobId: 192865ab3c206d9ad46887ca4033853c).
22/01/27 14:16:36 INFO slot.TaskSlotTableImpl: Free slot TaskSlot(index:3, state:ALLOCATED, resource profile: ResourceProfile{taskHeapMemory=204.800gb (219902325555 bytes), taskOffHeapMemory=204.800gb (219902325555 bytes), managedMemory=25.600mb (26843545 bytes), networkMemory=12.800mb (13421772 bytes)}, allocationId: 3144b0b9f45ef6d32c73fe979e50bcb9, jobId: 192865ab3c206d9ad46887ca4033853c).
22/01/27 14:16:36 INFO slot.TaskSlotTableImpl: Free slot TaskSlot(index:4, state:ALLOCATED, resource profile: ResourceProfile{taskHeapMemory=204.800gb (219902325555 bytes), taskOffHeapMemory=204.800gb (219902325555 bytes), managedMemory=25.600mb (26843545 bytes), networkMemory=12.800mb (13421772 bytes)}, allocationId: 91f39190bc584d5d5e2e3cfaff8bf2fe, jobId: 192865ab3c206d9ad46887ca4033853c).
22/01/27 14:16:36 INFO dispatcher.DispatcherRestEndpoint: Removing cache directory C:\Users\Administrator\AppData\Local\Temp\flink-web-ui
22/01/27 14:16:36 INFO dispatcher.DispatcherRestEndpoint: Shut down complete.
22/01/27 14:16:36 INFO taskexecutor.TaskExecutor: JobManager for job 192865ab3c206d9ad46887ca4033853c with leader id 9512824db5053b5042eed33595704466 lost leadership.
22/01/27 14:16:36 INFO resourcemanager.StandaloneResourceManager: Shut down cluster because application is in CANCELED, diagnostics DispatcherResourceManagerComponent has been closed..
22/01/27 14:16:36 INFO component.DispatcherResourceManagerComponent: Closing components.
22/01/27 14:16:36 INFO taskexecutor.DefaultJobLeaderService: Stop job leader service.
22/01/27 14:16:36 INFO state.TaskExecutorLocalStateStoresManager: Shutting down TaskExecutorLocalStateStoresManager.
22/01/27 14:16:36 INFO runner.SessionDispatcherLeaderProcess: Stopping SessionDispatcherLeaderProcess.
22/01/27 14:16:36 INFO dispatcher.StandaloneDispatcher: Stopping dispatcher akka://flink/user/rpc/dispatcher_2.
22/01/27 14:16:36 INFO dispatcher.StandaloneDispatcher: Stopping all currently running jobs of dispatcher akka://flink/user/rpc/dispatcher_2.
22/01/27 14:16:36 INFO slotmanager.SlotManagerImpl: Closing the SlotManager.
22/01/27 14:16:36 INFO slotmanager.SlotManagerImpl: Suspending the SlotManager.
22/01/27 14:16:36 INFO backpressure.BackPressureRequestCoordinator: Shutting down back pressure request coordinator.
22/01/27 14:16:36 INFO dispatcher.StandaloneDispatcher: Stopped dispatcher akka://flink/user/rpc/dispatcher_2.
22/01/27 14:16:36 INFO disk.FileChannelManagerImpl: FileChannelManager removed spill file directory C:\Users\Administrator\AppData\Local\Temp\flink-io-ffab8076-73ad-4315-b887-c085c53ccc60
22/01/27 14:16:36 INFO network.NettyShuffleEnvironment: Shutting down the network environment and its components.
22/01/27 14:16:36 INFO disk.FileChannelManagerImpl: FileChannelManager removed spill file directory C:\Users\Administrator\AppData\Local\Temp\flink-netty-shuffle-06e9023c-7dee-45ce-835c-435a1d3da9a9
22/01/27 14:16:36 INFO taskexecutor.KvStateService: Shutting down the kvState service and its components.
22/01/27 14:16:36 INFO taskexecutor.DefaultJobLeaderService: Stop job leader service.
22/01/27 14:16:36 INFO filecache.FileCache: removed file cache directory C:\Users\Administrator\AppData\Local\Temp\flink-dist-cache-11b91016-ba97-47d5-bfbb-db6e628cc831
22/01/27 14:16:36 INFO taskexecutor.TaskExecutor: Stopped TaskExecutor akka://flink/user/rpc/taskmanager_0.
22/01/27 14:16:36 INFO akka.AkkaRpcService: Stopping Akka RPC service.
22/01/27 14:16:36 WARN collect.CollectResultFetcher: Failed to get job status so we assume that the job has terminated. Some data might be lost.
java.lang.IllegalStateException: MiniCluster is not yet running or has already been shut down.
	at org.apache.flink.util.Preconditions.checkState(Preconditions.java:193)
	at org.apache.flink.runtime.minicluster.MiniCluster.getDispatcherGatewayFuture(MiniCluster.java:850)
	at org.apache.flink.runtime.minicluster.MiniCluster.runDispatcherCommand(MiniCluster.java:750)
	at org.apache.flink.runtime.minicluster.MiniCluster.getJobStatus(MiniCluster.java:703)
	at org.apache.flink.runtime.minicluster.MiniClusterJobClient.getJobStatus(MiniClusterJobClient.java:88)
	at org.apache.flink.streaming.api.operators.collect.CollectResultFetcher.isJobTerminated(CollectResultFetcher.java:195)
	at org.apache.flink.streaming.api.operators.collect.CollectResultFetcher.next(CollectResultFetcher.java:115)
	at org.apache.flink.streaming.api.operators.collect.CollectResultIterator.nextResultFromFetcher(CollectResultIterator.java:106)
	at org.apache.flink.streaming.api.operators.collect.CollectResultIterator.hasNext(CollectResultIterator.java:80)
	at org.apache.iceberg.relocated.com.google.common.collect.Iterators.addAll(Iterators.java:355)
	at org.apache.iceberg.relocated.com.google.common.collect.Lists.newArrayList(Lists.java:143)
	at org.apache.iceberg.flink.source.RowDataRewriter.rewriteDataForTasks(RowDataRewriter.java:86)
	at org.apache.iceberg.flink.actions.RewriteDataFilesAction.rewriteDataForTasks(RewriteDataFilesAction.java:56)
	at org.apache.iceberg.actions.BaseRewriteDataFilesAction.execute(BaseRewriteDataFilesAction.java:246)
	at org.example.FlinkDataStreamSmallFileCompactTest$.main(FlinkDataStreamSmallFileCompactTest.scala:65)
	at org.example.FlinkDataStreamSmallFileCompactTest.main(FlinkDataStreamSmallFileCompactTest.scala)
22/01/27 14:16:36 INFO akka.AkkaRpcService: Stopping Akka RPC service.
22/01/27 14:16:36 INFO akka.AkkaRpcService: Stopped Akka RPC service.
22/01/27 14:16:36 INFO blob.PermanentBlobCache: Shutting down BLOB cache
22/01/27 14:16:36 INFO blob.TransientBlobCache: Shutting down BLOB cache
22/01/27 14:16:36 INFO blob.BlobServer: Stopped BLOB server at 0.0.0.0:62600
22/01/27 14:16:36 INFO akka.AkkaRpcService: Stopped Akka RPC service.
22/01/27 14:16:40 INFO iceberg.BaseMetastoreTableOperations: Successfully committed to table hive_catalog6.iceberg_db6.behavior_log_ib6 in 3143 ms
22/01/27 14:16:40 INFO iceberg.SnapshotProducer: Committed snapshot 3755008657917548011 (BaseRewriteFiles)
22/01/27 14:16:40 INFO iceberg.BaseMetastoreTableOperations: Refreshing table metadata from new version: hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08708-75efd8f6-ba3f-47dc-8b89-b3177c477a62.metadata.json

Process finished with exit code 0

The last three lines of the log show that the rewrite committed a new snapshot and generated a new metadata file:

22/01/27 14:16:40 INFO iceberg.BaseMetastoreTableOperations: Successfully committed to table hive_catalog6.iceberg_db6.behavior_log_ib6 in 3143 ms
22/01/27 14:16:40 INFO iceberg.SnapshotProducer: Committed snapshot 3755008657917548011 (BaseRewriteFiles)
22/01/27 14:16:40 INFO iceberg.BaseMetastoreTableOperations: Refreshing table metadata from new version: hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08708-75efd8f6-ba3f-47dc-8b89-b3177c477a62.metadata.json
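
The same check can be done programmatically instead of reading logs. A minimal sketch (assuming the `table` handle from the compaction code in section 4.1; the summary keys are standard Iceberg snapshot summary fields):

```scala
// Reload table metadata so we see the snapshot just committed by the rewrite.
table.refresh()
val snapshot = table.currentSnapshot()
println(s"snapshot id   = ${snapshot.snapshotId}")
println(s"operation     = ${snapshot.operation}")
// Snapshot summary records how many data files the commit added/deleted.
println(s"added files   = ${snapshot.summary.get("added-data-files")}")
println(s"deleted files = ${snapshot.summary.get("deleted-data-files")}")
```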

3.3 Run the compaction again

Run another compaction and compare the file counts:

3.3.1 File counts before the run:

[root@hadoop103 hadoop]#  hadoop fs -ls hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data |wc
  17977  143811 3577242
[root@hadoop103 hadoop]#  hadoop fs -ls hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata |wc
  26220  209755 5267941

3.3.2 File counts after the run

[root@hadoop103 hadoop]#   hadoop fs -ls hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data |wc
  17978  143819 3577441
[root@hadoop103 hadoop]#   hadoop fs -ls hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata |wc
  26225  209795 5268922

3.3.3 Analyzing the difference

There is one more data file and four more metadata files. This is expected: the rewrite writes one new compacted data file and commits a new snapshot, which adds metadata files (a new metadata.json plus the manifest list and manifests for the new snapshot). The replaced small files are not physically deleted, because older snapshots still reference them, so the total file count does not shrink.
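
To actually reclaim the small files, the old snapshots that still reference them must be expired (the topic of the next lesson in this series). A hedged sketch using the core table API, assuming the same `table` handle as in section 4.1; the one-hour retention threshold is just an example value:

```scala
// Expire snapshots older than one hour. Data files that are no longer
// referenced by any remaining snapshot are then deleted from the file system,
// which is when the compacted-away small files physically disappear.
val oneHourAgoMs = System.currentTimeMillis() - 60 * 60 * 1000L
table.expireSnapshots()
  .expireOlderThan(oneHourAgoMs)
  .commit()
```

Note that expiring snapshots gives up the ability to time-travel to them, so pick a retention threshold that matches your rollback requirements.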

4. The working code

4.1 Small-file compaction code

    import java.util

    import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
    import org.apache.hadoop.conf.Configuration
    import org.apache.iceberg.catalog.{Namespace, TableIdentifier}
    import org.apache.iceberg.flink.{CatalogLoader, TableLoader}
    import org.apache.iceberg.flink.actions.Actions

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    System.setProperty("HADOOP_USER_NAME", "root") // write to HDFS as the root user
    val map = new util.HashMap[String, String]()
    map.put("type", "iceberg")
    map.put("catalog-type", "hive")
    map.put("property-version", "2")
    map.put("warehouse", "/user/hive/warehouse")
//    map.put("datanucleus.schema.autoCreateTables", "true")
//    compact small files
//    snapshot expiration handling
    map.put("uri", "thrift://hadoop101:9083")
    val iceberg_catalog = CatalogLoader.hive(
      "hive_catalog6", // catalog name
      new Configuration(),
      map              // pass the properties map built above, not an empty one
    )
    val identifier = TableIdentifier.of(
      Namespace.of("iceberg_db6"), // database name
      "behavior_log_ib6")          // table name
    val loader = TableLoader.fromCatalog(iceberg_catalog, identifier)
    loader.open()
    val table = loader.loadTable()
    Actions.forTable(env, table)
      .rewriteDataFiles()
      .maxParallelism(5)
      .targetSizeInBytes(128 * 1024 * 1024)
      .execute()
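
The `execute()` call returns a result object describing what the rewrite did, which is an easier way to see the effect of the compaction than counting files on HDFS. A sketch, assuming the Iceberg 0.12-era Flink `Actions` API used above:

```scala
// execute() reports which data files the rewrite replaced and which it created;
// logging the counts makes the compaction result visible without hadoop fs -ls.
val result = Actions.forTable(env, table)
  .rewriteDataFiles()
  .maxParallelism(5)
  .targetSizeInBytes(128 * 1024 * 1024)
  .execute()
println(s"rewritten (old) data files: ${result.deletedDataFiles().size()}")
println(s"new data files:             ${result.addedDataFiles().size()}")
```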

Execution log:

22/01/27 14:37:52 INFO conf.HiveConf: Found configuration file file:/E:/workspace/jt_workspace/iceberg-learning/target/classes/hive-site.xml
22/01/27 14:37:52 WARN conf.HiveConf: HiveConf of name hive.metastore.event.db.notification.api.auth does not exist
22/01/27 14:37:52 INFO security.JniBasedUnixGroupsMapping: Error getting groups for root: Unknown error.
22/01/27 14:37:52 WARN security.UserGroupInformation: No groups available for user root
22/01/27 14:37:52 INFO iceberg.BaseMetastoreTableOperations: Refreshing table metadata from new version: hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08708-75efd8f6-ba3f-47dc-8b89-b3177c477a62.metadata.json
22/01/27 14:37:59 INFO iceberg.BaseMetastoreCatalog: Table loaded by catalog: hive_catalog6.iceberg_db6.behavior_log_ib6
22/01/27 14:37:59 INFO iceberg.BaseTableScan: Scanning table hive_catalog6.iceberg_db6.behavior_log_ib6 snapshot 3755008657917548011 created at 2022-01-27 14:16:37.204 with filter true
22/01/27 14:38:01 INFO typeutils.TypeExtractor: class org.apache.iceberg.BaseCombinedScanTask does not contain a getter for field tasks
22/01/27 14:38:01 INFO typeutils.TypeExtractor: class org.apache.iceberg.BaseCombinedScanTask does not contain a setter for field tasks
22/01/27 14:38:01 INFO typeutils.TypeExtractor: Class class org.apache.iceberg.BaseCombinedScanTask cannot be used as a POJO type because not all fields are valid POJO fields, and must be processed as GenericType. Please read the Flink documentation on "Data Types & Serialization" for details of the effect on performance.
22/01/27 14:38:01 INFO taskexecutor.TaskExecutorResourceUtils: The configuration option taskmanager.cpu.cores required for local execution is not set, setting it to the maximal possible value.
22/01/27 14:38:01 INFO taskexecutor.TaskExecutorResourceUtils: The configuration option taskmanager.memory.task.heap.size required for local execution is not set, setting it to the maximal possible value.
22/01/27 14:38:01 INFO taskexecutor.TaskExecutorResourceUtils: The configuration option taskmanager.memory.task.off-heap.size required for local execution is not set, setting it to the maximal possible value.
22/01/27 14:38:01 INFO taskexecutor.TaskExecutorResourceUtils: The configuration option taskmanager.memory.network.min required for local execution is not set, setting it to its default value 64 mb.
22/01/27 14:38:01 INFO taskexecutor.TaskExecutorResourceUtils: The configuration option taskmanager.memory.network.max required for local execution is not set, setting it to its default value 64 mb.
22/01/27 14:38:01 INFO taskexecutor.TaskExecutorResourceUtils: The configuration option taskmanager.memory.managed.size required for local execution is not set, setting it to its default value 128 mb.
22/01/27 14:38:01 INFO minicluster.MiniCluster: Starting Flink Mini Cluster
22/01/27 14:38:01 INFO minicluster.MiniCluster: Starting Metrics Registry
22/01/27 14:38:01 INFO metrics.MetricRegistryImpl: No metrics reporter configured, no metrics will be exposed/reported.
22/01/27 14:38:01 INFO minicluster.MiniCluster: Starting RPC Service(s)
22/01/27 14:38:01 INFO akka.AkkaRpcServiceUtils: Trying to start local actor system
22/01/27 14:38:02 INFO akka.AkkaRpcServiceUtils: Actor system started at akka://flink
22/01/27 14:38:02 INFO akka.AkkaRpcServiceUtils: Trying to start local actor system
22/01/27 14:38:02 INFO akka.AkkaRpcServiceUtils: Actor system started at akka://flink-metrics
22/01/27 14:38:02 INFO akka.AkkaRpcService: Starting RPC endpoint for org.apache.flink.runtime.metrics.dump.MetricQueryService at akka://flink-metrics/user/rpc/MetricQueryService .
22/01/27 14:38:02 INFO minicluster.MiniCluster: Starting high-availability services
22/01/27 14:38:02 INFO blob.BlobServer: Created BLOB server storage directory C:\Users\Administrator\AppData\Local\Temp\blobStore-b8344438-f061-4331-914c-c4e2cfae7d6f
22/01/27 14:38:02 INFO blob.BlobServer: Started BLOB server at 0.0.0.0:60731 - max concurrent requests: 50 - max backlog: 1000
22/01/27 14:38:02 INFO blob.PermanentBlobCache: Created BLOB cache storage directory C:\Users\Administrator\AppData\Local\Temp\blobStore-9bb037b6-38bd-45db-8ecd-37fac58a3f19
22/01/27 14:38:02 INFO blob.TransientBlobCache: Created BLOB cache storage directory C:\Users\Administrator\AppData\Local\Temp\blobStore-fd98d426-a515-4230-be4d-a9ed2e0a6604
22/01/27 14:38:02 INFO minicluster.MiniCluster: Starting 1 TaskManger(s)
22/01/27 14:38:02 INFO taskexecutor.TaskManagerRunner: Starting TaskManager with ResourceID: 2591245a-420a-4ee9-b827-c7a59c701176
22/01/27 14:38:02 INFO taskexecutor.TaskManagerServices: Temporary file directory 'C:\Users\Administrator\AppData\Local\Temp': total 180 GB, usable 113 GB (62.78% usable)
22/01/27 14:38:02 INFO disk.FileChannelManagerImpl: FileChannelManager uses directory C:\Users\Administrator\AppData\Local\Temp\flink-io-dd637dbf-fee2-4d2d-b0ff-855ea7e7b4a2 for spill files.
22/01/27 14:38:02 INFO disk.FileChannelManagerImpl: FileChannelManager uses directory C:\Users\Administrator\AppData\Local\Temp\flink-netty-shuffle-6df7818a-c10c-4b76-acd4-16d41c25e48e for spill files.
22/01/27 14:38:02 INFO buffer.NetworkBufferPool: Allocated 64 MB for network buffer pool (number of memory segments: 2048, bytes per segment: 32768).
22/01/27 14:38:02 INFO network.NettyShuffleEnvironment: Starting the network environment and its components.
22/01/27 14:38:02 INFO taskexecutor.KvStateService: Starting the kvState service and its components.
22/01/27 14:38:02 INFO akka.AkkaRpcService: Starting RPC endpoint for org.apache.flink.runtime.taskexecutor.TaskExecutor at akka://flink/user/rpc/taskmanager_0 .
22/01/27 14:38:02 INFO taskexecutor.DefaultJobLeaderService: Start job leader service.
22/01/27 14:38:02 INFO filecache.FileCache: User file cache uses directory C:\Users\Administrator\AppData\Local\Temp\flink-dist-cache-52714df9-cf7f-4499-8456-82bd55e34aa9
22/01/27 14:38:02 INFO dispatcher.DispatcherRestEndpoint: Starting rest endpoint.
22/01/27 14:38:02 WARN webmonitor.WebMonitorUtils: Log file environment variable 'log.file' is not set.
22/01/27 14:38:02 WARN webmonitor.WebMonitorUtils: JobManager log files are unavailable in the web dashboard. Log file location not found in environment variable 'log.file' or configuration key 'web.log.path'.
22/01/27 14:38:03 INFO dispatcher.DispatcherRestEndpoint: Rest endpoint listening at localhost:60767
22/01/27 14:38:03 INFO embedded.EmbeddedLeaderService: Proposing leadership to contender http://localhost:60767
22/01/27 14:38:03 INFO dispatcher.DispatcherRestEndpoint: Web frontend listening at http://localhost:60767.
22/01/27 14:38:03 INFO dispatcher.DispatcherRestEndpoint: http://localhost:60767 was granted leadership with leaderSessionID=59fe1c92-00c7-4ed9-8ed3-3d18d9fabb74
22/01/27 14:38:03 INFO embedded.EmbeddedLeaderService: Received confirmation of leadership for leader http://localhost:60767 , session=59fe1c92-00c7-4ed9-8ed3-3d18d9fabb74
22/01/27 14:38:03 INFO akka.AkkaRpcService: Starting RPC endpoint for org.apache.flink.runtime.resourcemanager.StandaloneResourceManager at akka://flink/user/rpc/resourcemanager_1 .
22/01/27 14:38:03 INFO embedded.EmbeddedLeaderService: Proposing leadership to contender LeaderContender: DefaultDispatcherRunner
22/01/27 14:38:03 INFO embedded.EmbeddedLeaderService: Proposing leadership to contender LeaderContender: StandaloneResourceManager
22/01/27 14:38:03 INFO resourcemanager.StandaloneResourceManager: ResourceManager akka://flink/user/rpc/resourcemanager_1 was granted leadership with fencing token be03704155ef33eb576388b7a9ba40f4
22/01/27 14:38:03 INFO minicluster.MiniCluster: Flink Mini Cluster started successfully
22/01/27 14:38:03 INFO slotmanager.SlotManagerImpl: Starting the SlotManager.
22/01/27 14:38:03 INFO runner.SessionDispatcherLeaderProcess: Start SessionDispatcherLeaderProcess.
22/01/27 14:38:03 INFO runner.SessionDispatcherLeaderProcess: Recover all persisted job graphs.
22/01/27 14:38:03 INFO runner.SessionDispatcherLeaderProcess: Successfully recovered 0 persisted job graphs.
22/01/27 14:38:03 INFO embedded.EmbeddedLeaderService: Received confirmation of leadership for leader akka://flink/user/rpc/resourcemanager_1 , session=576388b7-a9ba-40f4-be03-704155ef33eb
22/01/27 14:38:03 INFO taskexecutor.TaskExecutor: Connecting to ResourceManager akka://flink/user/rpc/resourcemanager_1(be03704155ef33eb576388b7a9ba40f4).
22/01/27 14:38:03 INFO akka.AkkaRpcService: Starting RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher at akka://flink/user/rpc/dispatcher_2 .
22/01/27 14:38:03 INFO embedded.EmbeddedLeaderService: Received confirmation of leadership for leader akka://flink/user/rpc/dispatcher_2 , session=6f4da6b4-8d37-48dc-b0c7-fffd4df03d26
22/01/27 14:38:03 INFO taskexecutor.TaskExecutor: Resolved ResourceManager address, beginning registration
22/01/27 14:38:03 INFO resourcemanager.StandaloneResourceManager: Registering TaskManager with ResourceID 2591245a-420a-4ee9-b827-c7a59c701176 (akka://flink/user/rpc/taskmanager_0) at ResourceManager
22/01/27 14:38:03 INFO taskexecutor.TaskExecutor: Successful registration at resource manager akka://flink/user/rpc/resourcemanager_1 under registration id 393e714296231d5c37661994b8a3b12c.
22/01/27 14:38:03 INFO dispatcher.StandaloneDispatcher: Received JobGraph submission 30878dc6c49074a250f4b561c074e00b (Rewrite table :hive_catalog6.iceberg_db6.behavior_log_ib6).
22/01/27 14:38:03 INFO dispatcher.StandaloneDispatcher: Submitting job 30878dc6c49074a250f4b561c074e00b (Rewrite table :hive_catalog6.iceberg_db6.behavior_log_ib6).
22/01/27 14:38:03 INFO embedded.EmbeddedLeaderService: Proposing leadership to contender LeaderContender: JobManagerRunnerImpl
22/01/27 14:38:03 INFO akka.AkkaRpcService: Starting RPC endpoint for org.apache.flink.runtime.jobmaster.JobMaster at akka://flink/user/rpc/jobmanager_3 .
22/01/27 14:38:03 INFO jobmaster.JobMaster: Initializing job Rewrite table :hive_catalog6.iceberg_db6.behavior_log_ib6 (30878dc6c49074a250f4b561c074e00b).
22/01/27 14:38:03 INFO jobmaster.JobMaster: Using restart back off time strategy NoRestartBackoffTimeStrategy for Rewrite table :hive_catalog6.iceberg_db6.behavior_log_ib6 (30878dc6c49074a250f4b561c074e00b).
22/01/27 14:38:03 INFO jobmaster.JobMaster: Running initialization on master for job Rewrite table :hive_catalog6.iceberg_db6.behavior_log_ib6 (30878dc6c49074a250f4b561c074e00b).
22/01/27 14:38:03 INFO jobmaster.JobMaster: Successfully ran initialization on master in 0 ms.
22/01/27 14:38:03 INFO adapter.DefaultExecutionTopology: Built 1 pipelined regions in 0 ms
22/01/27 14:38:03 INFO jobmaster.JobMaster: No state backend has been configured, using default (Memory / JobManager) MemoryStateBackend (data in heap memory / checkpoints to JobManager) (checkpoints: 'null', savepoints: 'null', asynchronous: TRUE, maxStateSize: 5242880)
22/01/27 14:38:03 INFO checkpoint.CheckpointCoordinator: No checkpoint found during restore.
22/01/27 14:38:03 INFO jobmaster.JobMaster: Using failover strategy org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy@493345a5 for Rewrite table :hive_catalog6.iceberg_db6.behavior_log_ib6 (30878dc6c49074a250f4b561c074e00b).
22/01/27 14:38:03 INFO jobmaster.JobManagerRunnerImpl: JobManager runner for job Rewrite table :hive_catalog6.iceberg_db6.behavior_log_ib6 (30878dc6c49074a250f4b561c074e00b) was granted leadership with session id 5e0be95b-935b-4825-a010-3fe0e31dede3 at akka://flink/user/rpc/jobmanager_3.
22/01/27 14:38:03 INFO jobmaster.JobMaster: Starting execution of job Rewrite table :hive_catalog6.iceberg_db6.behavior_log_ib6 (30878dc6c49074a250f4b561c074e00b) under job master id a0103fe0e31dede35e0be95b935b4825.
22/01/27 14:38:03 INFO jobmaster.JobMaster: Starting scheduling with scheduling strategy [org.apache.flink.runtime.scheduler.strategy.PipelinedRegionSchedulingStrategy]
22/01/27 14:38:03 INFO executiongraph.ExecutionGraph: Job Rewrite table :hive_catalog6.iceberg_db6.behavior_log_ib6 (30878dc6c49074a250f4b561c074e00b) switched from state CREATED to RUNNING.
22/01/27 14:38:03 INFO executiongraph.ExecutionGraph: Source: Collection Source -> Map -> Sink: Data stream collect sink (1/1) (d84d4270791f5ca1636abaf04cc0d88c) switched from CREATED to SCHEDULED.
22/01/27 14:38:03 INFO slotpool.SlotPoolImpl: Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{40ad13783d1195d51569b63d74312680}]
22/01/27 14:38:03 INFO embedded.EmbeddedLeaderService: Received confirmation of leadership for leader akka://flink/user/rpc/jobmanager_3 , session=5e0be95b-935b-4825-a010-3fe0e31dede3
22/01/27 14:38:03 INFO jobmaster.JobMaster: Connecting to ResourceManager akka://flink/user/rpc/resourcemanager_1(be03704155ef33eb576388b7a9ba40f4)
22/01/27 14:38:03 INFO jobmaster.JobMaster: Resolved ResourceManager address, beginning registration
22/01/27 14:38:03 INFO resourcemanager.StandaloneResourceManager: Registering job manager a0103fe0e31dede35e0be95b935b4825@akka://flink/user/rpc/jobmanager_3 for job 30878dc6c49074a250f4b561c074e00b.
22/01/27 14:38:03 INFO resourcemanager.StandaloneResourceManager: Registered job manager a0103fe0e31dede35e0be95b935b4825@akka://flink/user/rpc/jobmanager_3 for job 30878dc6c49074a250f4b561c074e00b.
22/01/27 14:38:03 INFO jobmaster.JobMaster: JobManager successfully registered at ResourceManager, leader id: be03704155ef33eb576388b7a9ba40f4.
22/01/27 14:38:03 INFO slotpool.SlotPoolImpl: Requesting new slot [SlotRequestId{40ad13783d1195d51569b63d74312680}] and profile ResourceProfile{UNKNOWN} with allocation id bfd0a6be24d74c510c6c058d7580262a from resource manager.
22/01/27 14:38:03 INFO resourcemanager.StandaloneResourceManager: Request slot with profile ResourceProfile{UNKNOWN} for job 30878dc6c49074a250f4b561c074e00b with allocation id bfd0a6be24d74c510c6c058d7580262a.
22/01/27 14:38:03 INFO taskexecutor.TaskExecutor: Receive slot request bfd0a6be24d74c510c6c058d7580262a for job 30878dc6c49074a250f4b561c074e00b from resource manager with leader id be03704155ef33eb576388b7a9ba40f4.
22/01/27 14:38:03 INFO taskexecutor.TaskExecutor: Allocated slot for bfd0a6be24d74c510c6c058d7580262a.
22/01/27 14:38:03 INFO taskexecutor.DefaultJobLeaderService: Add job 30878dc6c49074a250f4b561c074e00b for job leader monitoring.
22/01/27 14:38:03 INFO taskexecutor.DefaultJobLeaderService: Try to register at job manager akka://flink/user/rpc/jobmanager_3 with leader id 5e0be95b-935b-4825-a010-3fe0e31dede3.
22/01/27 14:38:03 INFO taskexecutor.DefaultJobLeaderService: Resolved JobManager address, beginning registration
22/01/27 14:38:03 INFO taskexecutor.DefaultJobLeaderService: Successful registration at job manager akka://flink/user/rpc/jobmanager_3 for job 30878dc6c49074a250f4b561c074e00b.
22/01/27 14:38:03 INFO taskexecutor.TaskExecutor: Establish JobManager connection for job 30878dc6c49074a250f4b561c074e00b.
22/01/27 14:38:03 INFO taskexecutor.TaskExecutor: Offer reserved slots to the leader of job 30878dc6c49074a250f4b561c074e00b.
22/01/27 14:38:03 INFO executiongraph.ExecutionGraph: Source: Collection Source -> Map -> Sink: Data stream collect sink (1/1) (d84d4270791f5ca1636abaf04cc0d88c) switched from SCHEDULED to DEPLOYING.
22/01/27 14:38:03 INFO executiongraph.ExecutionGraph: Deploying Source: Collection Source -> Map -> Sink: Data stream collect sink (1/1) (attempt #0) with attempt id d84d4270791f5ca1636abaf04cc0d88c to 2591245a-420a-4ee9-b827-c7a59c701176 @ 127.0.0.1 (dataPort=-1) with allocation id bfd0a6be24d74c510c6c058d7580262a
22/01/27 14:38:03 INFO slot.TaskSlotTableImpl: Activate slot bfd0a6be24d74c510c6c058d7580262a.
22/01/27 14:38:03 INFO taskexecutor.TaskExecutor: Received task Source: Collection Source -> Map -> Sink: Data stream collect sink (1/1)#0 (d84d4270791f5ca1636abaf04cc0d88c), deploy into slot with allocation id bfd0a6be24d74c510c6c058d7580262a.
22/01/27 14:38:03 INFO taskmanager.Task: Source: Collection Source -> Map -> Sink: Data stream collect sink (1/1)#0 (d84d4270791f5ca1636abaf04cc0d88c) switched from CREATED to DEPLOYING.
22/01/27 14:38:03 INFO slot.TaskSlotTableImpl: Activate slot bfd0a6be24d74c510c6c058d7580262a.
22/01/27 14:38:03 INFO taskmanager.Task: Loading JAR files for task Source: Collection Source -> Map -> Sink: Data stream collect sink (1/1)#0 (d84d4270791f5ca1636abaf04cc0d88c) [DEPLOYING].
22/01/27 14:38:03 INFO taskmanager.Task: Registering task at network: Source: Collection Source -> Map -> Sink: Data stream collect sink (1/1)#0 (d84d4270791f5ca1636abaf04cc0d88c) [DEPLOYING].
22/01/27 14:38:03 INFO tasks.StreamTask: No state backend has been configured, using default (Memory / JobManager) MemoryStateBackend (data in heap memory / checkpoints to JobManager) (checkpoints: 'null', savepoints: 'null', asynchronous: TRUE, maxStateSize: 5242880)
22/01/27 14:38:03 INFO taskmanager.Task: Source: Collection Source -> Map -> Sink: Data stream collect sink (1/1)#0 (d84d4270791f5ca1636abaf04cc0d88c) switched from DEPLOYING to RUNNING.
22/01/27 14:38:03 INFO executiongraph.ExecutionGraph: Source: Collection Source -> Map -> Sink: Data stream collect sink (1/1) (d84d4270791f5ca1636abaf04cc0d88c) switched from DEPLOYING to RUNNING.
22/01/27 14:38:03 INFO collect.CollectSinkFunction: Initializing collect sink state with offset = 0, buffered results bytes = 0
22/01/27 14:38:03 INFO collect.CollectSinkFunction: Collect sink server established, address = localhost/127.0.0.1:60769
22/01/27 14:38:03 INFO collect.CollectSinkOperatorCoordinator: Received sink socket server address: localhost/127.0.0.1:60769
22/01/27 14:38:03 INFO collect.CollectSinkOperatorCoordinator: Sink connection established
22/01/27 14:38:03 INFO collect.CollectSinkFunction: Coordinator connection received
22/01/27 14:38:03 INFO collect.CollectSinkFunction: Invalid request. Received version = , offset = 0, while expected version = 42ad1eb7-0144-4a8e-8a12-5c2c0524a630, offset = 0
22/01/27 14:38:03 WARN zlib.ZlibFactory: Failed to load/initialize native-zlib library
22/01/27 14:38:03 INFO compress.CodecPool: Got brand-new compressor [.gz]
22/01/27 14:38:04 INFO compress.CodecPool: Got brand-new decompressor [.gz]
22/01/27 14:38:04 INFO compress.CodecPool: Got brand-new decompressor [.gz]
22/01/27 14:38:04 INFO compress.CodecPool: Got brand-new decompressor [.gz]
22/01/27 14:38:04 INFO compress.CodecPool: Got brand-new decompressor [.gz]
22/01/27 14:38:05 INFO compress.CodecPool: Got brand-new decompressor [.gz]
22/01/27 14:38:05 INFO compress.CodecPool: Got brand-new decompressor [.gz]
22/01/27 14:38:05 INFO compress.CodecPool: Got brand-new decompressor [.gz]
22/01/27 14:38:05 INFO compress.CodecPool: Got brand-new decompressor [.gz]
22/01/27 14:38:06 INFO compress.CodecPool: Got brand-new decompressor [.gz]
22/01/27 14:38:06 INFO compress.CodecPool: Got brand-new decompressor [.gz]
22/01/27 14:38:06 INFO compress.CodecPool: Got brand-new decompressor [.gz]
22/01/27 14:38:06 INFO compress.CodecPool: Got brand-new decompressor [.gz]
22/01/27 14:38:07 INFO compress.CodecPool: Got brand-new decompressor [.gz]
22/01/27 14:38:07 INFO compress.CodecPool: Got brand-new decompressor [.gz]
22/01/27 14:38:07 INFO compress.CodecPool: Got brand-new decompressor [.gz]
22/01/27 14:38:07 INFO compress.CodecPool: Got brand-new decompressor [.gz]
22/01/27 14:38:07 INFO compress.CodecPool: Got brand-new decompressor [.gz]
22/01/27 14:38:08 INFO compress.CodecPool: Got brand-new decompressor [.gz]
22/01/27 14:38:09 INFO taskmanager.Task: Source: Collection Source -> Map -> Sink: Data stream collect sink (1/1)#0 (d84d4270791f5ca1636abaf04cc0d88c) switched from RUNNING to FINISHED.
22/01/27 14:38:09 INFO taskmanager.Task: Freeing task resources for Source: Collection Source -> Map -> Sink: Data stream collect sink (1/1)#0 (d84d4270791f5ca1636abaf04cc0d88c).
22/01/27 14:38:09 INFO taskexecutor.TaskExecutor: Un-registering task and sending final execution state FINISHED to JobManager for task Source: Collection Source -> Map -> Sink: Data stream collect sink (1/1)#0 d84d4270791f5ca1636abaf04cc0d88c.
22/01/27 14:38:09 INFO executiongraph.ExecutionGraph: Source: Collection Source -> Map -> Sink: Data stream collect sink (1/1) (d84d4270791f5ca1636abaf04cc0d88c) switched from RUNNING to FINISHED.
22/01/27 14:38:09 INFO executiongraph.ExecutionGraph: Job Rewrite table :hive_catalog6.iceberg_db6.behavior_log_ib6 (30878dc6c49074a250f4b561c074e00b) switched from state RUNNING to FINISHED.
22/01/27 14:38:09 INFO checkpoint.CheckpointCoordinator: Stopping checkpoint coordinator for job 30878dc6c49074a250f4b561c074e00b.
22/01/27 14:38:09 INFO checkpoint.StandaloneCompletedCheckpointStore: Shutting down
22/01/27 14:38:09 INFO minicluster.MiniCluster: Shutting down Flink Mini Cluster
22/01/27 14:38:09 INFO dispatcher.StandaloneDispatcher: Job 30878dc6c49074a250f4b561c074e00b reached terminal state FINISHED.
22/01/27 14:38:09 INFO taskexecutor.TaskExecutor: Stopping TaskExecutor akka://flink/user/rpc/taskmanager_0.
22/01/27 14:38:09 INFO taskexecutor.TaskExecutor: Close ResourceManager connection 16aae5434315b506142180930f0de03b.
22/01/27 14:38:09 INFO dispatcher.DispatcherRestEndpoint: Shutting down rest endpoint.
22/01/27 14:38:09 INFO resourcemanager.StandaloneResourceManager: Closing TaskExecutor connection 2591245a-420a-4ee9-b827-c7a59c701176 because: The TaskExecutor is shutting down.
22/01/27 14:38:09 INFO taskexecutor.TaskExecutor: Close JobManager connection for job 30878dc6c49074a250f4b561c074e00b.
22/01/27 14:38:09 INFO slot.TaskSlotTableImpl: Free slot TaskSlot(index:0, state:ALLOCATED, resource profile: ResourceProfile{taskHeapMemory=1024.000gb (1099511627776 bytes), taskOffHeapMemory=1024.000gb (1099511627776 bytes), managedMemory=128.000mb (134217728 bytes), networkMemory=64.000mb (67108864 bytes)}, allocationId: bfd0a6be24d74c510c6c058d7580262a, jobId: 30878dc6c49074a250f4b561c074e00b).
22/01/27 14:38:09 INFO jobmaster.JobMaster: Stopping the JobMaster for job Rewrite table :hive_catalog6.iceberg_db6.behavior_log_ib6(30878dc6c49074a250f4b561c074e00b).
22/01/27 14:38:09 INFO slotpool.SlotPoolImpl: Suspending SlotPool.
22/01/27 14:38:09 INFO jobmaster.JobMaster: Close ResourceManager connection 16aae5434315b506142180930f0de03b: Stopping JobMaster for job Rewrite table :hive_catalog6.iceberg_db6.behavior_log_ib6(30878dc6c49074a250f4b561c074e00b)..
22/01/27 14:38:09 INFO slotpool.SlotPoolImpl: Stopping SlotPool.
22/01/27 14:38:09 INFO resourcemanager.StandaloneResourceManager: Disconnect job manager a0103fe0e31dede35e0be95b935b4825@akka://flink/user/rpc/jobmanager_3 for job 30878dc6c49074a250f4b561c074e00b from the resource manager.
22/01/27 14:38:09 INFO taskexecutor.DefaultJobLeaderService: Stop job leader service.
22/01/27 14:38:09 INFO state.TaskExecutorLocalStateStoresManager: Shutting down TaskExecutorLocalStateStoresManager.
22/01/27 14:38:09 INFO dispatcher.DispatcherRestEndpoint: Removing cache directory C:\Users\Administrator\AppData\Local\Temp\flink-web-ui
22/01/27 14:38:09 INFO dispatcher.DispatcherRestEndpoint: Shut down complete.
22/01/27 14:38:09 INFO disk.FileChannelManagerImpl: FileChannelManager removed spill file directory C:\Users\Administrator\AppData\Local\Temp\flink-io-dd637dbf-fee2-4d2d-b0ff-855ea7e7b4a2
22/01/27 14:38:09 INFO network.NettyShuffleEnvironment: Shutting down the network environment and its components.
22/01/27 14:38:09 INFO disk.FileChannelManagerImpl: FileChannelManager removed spill file directory C:\Users\Administrator\AppData\Local\Temp\flink-netty-shuffle-6df7818a-c10c-4b76-acd4-16d41c25e48e
22/01/27 14:38:09 INFO resourcemanager.StandaloneResourceManager: Shut down cluster because application is in CANCELED, diagnostics DispatcherResourceManagerComponent has been closed..
22/01/27 14:38:09 INFO taskexecutor.KvStateService: Shutting down the kvState service and its components.
22/01/27 14:38:09 INFO taskexecutor.DefaultJobLeaderService: Stop job leader service.
22/01/27 14:38:09 INFO component.DispatcherResourceManagerComponent: Closing components.
22/01/27 14:38:09 INFO runner.SessionDispatcherLeaderProcess: Stopping SessionDispatcherLeaderProcess.
22/01/27 14:38:09 INFO filecache.FileCache: removed file cache directory C:\Users\Administrator\AppData\Local\Temp\flink-dist-cache-52714df9-cf7f-4499-8456-82bd55e34aa9
22/01/27 14:38:09 INFO dispatcher.StandaloneDispatcher: Stopping dispatcher akka://flink/user/rpc/dispatcher_2.
22/01/27 14:38:09 INFO dispatcher.StandaloneDispatcher: Stopping all currently running jobs of dispatcher akka://flink/user/rpc/dispatcher_2.
22/01/27 14:38:09 INFO taskexecutor.TaskExecutor: Stopped TaskExecutor akka://flink/user/rpc/taskmanager_0.
22/01/27 14:38:09 INFO slotmanager.SlotManagerImpl: Closing the SlotManager.
22/01/27 14:38:09 INFO slotmanager.SlotManagerImpl: Suspending the SlotManager.
22/01/27 14:38:09 INFO backpressure.BackPressureRequestCoordinator: Shutting down back pressure request coordinator.
22/01/27 14:38:09 INFO dispatcher.StandaloneDispatcher: Stopped dispatcher akka://flink/user/rpc/dispatcher_2.
22/01/27 14:38:09 INFO akka.AkkaRpcService: Stopping Akka RPC service.
22/01/27 14:38:09 INFO akka.AkkaRpcService: Stopping Akka RPC service.
22/01/27 14:38:09 INFO akka.AkkaRpcService: Stopped Akka RPC service.
22/01/27 14:38:09 INFO blob.PermanentBlobCache: Shutting down BLOB cache
22/01/27 14:38:09 INFO blob.TransientBlobCache: Shutting down BLOB cache
22/01/27 14:38:09 INFO blob.BlobServer: Stopped BLOB server at 0.0.0.0:60731
22/01/27 14:38:09 INFO akka.AkkaRpcService: Stopped Akka RPC service.
22/01/27 14:38:13 INFO iceberg.BaseMetastoreTableOperations: Successfully committed to table hive_catalog6.iceberg_db6.behavior_log_ib6 in 3247 ms
22/01/27 14:38:13 INFO iceberg.SnapshotProducer: Committed snapshot 7762404597294868190 (BaseRewriteFiles)
22/01/27 14:38:13 INFO iceberg.BaseMetastoreTableOperations: Refreshing table metadata from new version: hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08709-78209251-777c-4a4f-9292-64cf3f2190ae.metadata.json

Process finished with exit code 0


Summary

Compacting small files simply commits a new snapshot whose data files are the merged, larger ones; the original small files are not deleted from storage. So how do we actually remove those small files? See the next lesson (snapshot expiration).
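Putting the workaround together, here is a minimal sketch of the compaction job for a HiveCatalog table (the case the official docs leave out), matching the table names seen in the log above. The metastore URI and warehouse path below are placeholders for my environment; substitute your own. This assumes the Iceberg Flink runtime module (`CatalogLoader`, `TableLoader`, `Actions`) is on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.Table;
import org.apache.iceberg.actions.RewriteDataFilesActionResult;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.flink.CatalogLoader;
import org.apache.iceberg.flink.TableLoader;
import org.apache.iceberg.flink.actions.Actions;

public class RewriteHiveCatalogTable {
    public static void main(String[] args) {
        // Hive catalog properties -- the URI and warehouse below are
        // placeholders; replace with your metastore address and warehouse dir.
        Map<String, String> props = new HashMap<>();
        props.put("uri", "thrift://metastore-host:9083");
        props.put("warehouse", "hdfs://ns/user/hive/warehouse");

        CatalogLoader catalogLoader =
                CatalogLoader.hive("hive_catalog6", new Configuration(), props);
        TableLoader tableLoader = TableLoader.fromCatalog(
                catalogLoader,
                TableIdentifier.of("iceberg_db6", "behavior_log_ib6"));
        tableLoader.open();
        Table table = tableLoader.loadTable();

        // Rewrite (compact) the data files. This commits a NEW snapshot with
        // the merged files; the original small files stay on HDFS until
        // old snapshots are expired (next lesson).
        RewriteDataFilesActionResult result = Actions.forTable(table)
                .rewriteDataFiles()
                .targetSizeInBytes(128 * 1024 * 1024L)  // aim for 128 MB files
                .execute();

        System.out.println("Added data files: " + result.addedDataFiles().size());
        System.out.println("Rewritten (logically removed) data files: "
                + result.deletedDataFiles().size());
    }
}
```

Note that `deletedDataFiles()` only reports files dropped from the new snapshot's metadata; as the summary says, the physical files are untouched until snapshot expiration runs.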