1. Version notes: Flink 1.12.x, Hudi 0.8.x, Scala 2.11;
2. Create a Flink project (see the referenced article);
3. Add the Table and SQL support dependencies to the pom file, as follows:
<!-- table start -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-api-java-bridge_${scala.binary.version}</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-planner-blink_${scala.binary.version}</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-scala_${scala.binary.version}</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-common</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>
<!-- table end -->
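This section and the Hudi section below both use the ${scala.binary.version} and ${flink.version} placeholders, so the pom needs to define them. A minimal properties block could look like the sketch below; the exact Flink patch version (1.12.2 here) is an assumption, any 1.12.x release should line up with the Hudi 0.8 bundle:
<properties>
    <!-- Assumed values: Scala 2.11 per step 1; 1.12.2 is one example of a compatible Flink 1.12.x release -->
    <scala.binary.version>2.11</scala.binary.version>
    <flink.version>1.12.2</flink.version>
</properties>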
4. Add the Hudi support dependencies to the pom file, as follows:
<!-- hudi start -->
<dependency>
    <groupId>org.apache.hudi</groupId>
    <artifactId>hudi-flink-bundle_${scala.binary.version}</artifactId>
    <version>0.8.0</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.hudi</groupId>
    <artifactId>hudi-flink_${scala.binary.version}</artifactId>
    <version>0.8.0</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.hudi</groupId>
    <artifactId>hudi-flink-client</artifactId>
    <version>0.8.0</version>
    <scope>provided</scope>
</dependency>
<!-- hudi end -->
5. Download Hadoop and the matching version of winutils; after extracting both, copy the files from the winutils bin directory into the Hadoop bin directory;
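A quick way to sanity-check this step before running the job is a small standalone snippet like the one below. The Hadoop path is the same one the sample code in step 6 sets for hadoop.home.dir; adjust it to wherever you extracted Hadoop:
import java.io.File;

public class WinutilsCheck {
    public static void main(String[] args) {
        // Same directory that step 6 points hadoop.home.dir at
        String hadoopHome = "E:/software/hadoop-3.3.0";
        File winutils = new File(hadoopHome, "bin/winutils.exe");
        System.out.println(winutils.exists()
                ? "winutils.exe found at " + winutils.getAbsolutePath()
                : "winutils.exe missing, copy the winutils files into " + hadoopHome + "/bin");
    }
}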
6. Run the following code:
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class HudiTable {
    public static void main(String[] args) throws Exception {
        // Hadoop home prepared in step 5 (needed for winutils on Windows)
        System.setProperty("hadoop.home.dir", "E:/software/hadoop-3.3.0");
        EnvironmentSettings settings = EnvironmentSettings.newInstance().inBatchMode().build();
        TableEnvironment env = TableEnvironment.create(settings);
        // Create a Hudi COPY_ON_WRITE table backed by a local path
        env.executeSql("CREATE TABLE t1("
                + "uuid VARCHAR(20),"
                + "name VARCHAR(10),"
                + "age INT,"
                + "ts TIMESTAMP(3),"
                + "`partition` VARCHAR(20)"
                + ") PARTITIONED BY (`partition`) WITH ("
                + "'connector' = 'hudi',"
                + "'path' = 'e:/hudi',"
                + "'write.tasks' = '1',"
                + "'compaction.tasks' = '1',"
                + "'table.type' = 'COPY_ON_WRITE')");
        // Insert one row
        env.executeSql("INSERT INTO t1 VALUES('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1')")
                .print();
        env.sqlQuery("SELECT * FROM t1") // Result ①
                .execute()
                .print();
        // Insert again with the same key to update the row
        env.executeSql("INSERT INTO t1 VALUES('id1','Danny',24,TIMESTAMP '1970-01-01 00:00:01','par1')")
                .print();
        env.sqlQuery("SELECT * FROM t1") // Result ②
                .execute()
                .print();
    }
}
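After the run, the table contents live under the path given in the DDL ('path' = 'e:/hudi'). A minimal sketch to list what was written; the directory layout (a .hoodie metadata folder plus partition folders containing parquet files) is produced by Hudi, not by this snippet:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class ListHudiFiles {
    public static void main(String[] args) throws IOException {
        // Table path from the DDL above; adjust if you changed it
        Path tablePath = Paths.get("e:/hudi");
        try (Stream<Path> files = Files.walk(tablePath)) {
            // Expect a .hoodie metadata directory plus the par1 partition folder with parquet data files
            files.forEach(System.out::println);
        }
    }
}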
7. The second insert uses the same key id1, so it updates the existing row rather than adding a new one (the Hudi Flink connector uses the uuid column as the record key by default). The two query results are:
Result ①:
+--------------------------------+--------------------------------+-------------+-------------------------+--------------------------------+
| uuid | name | age | ts | partition |
+--------------------------------+--------------------------------+-------------+-------------------------+--------------------------------+
| id1 | Danny | 23 | 1970-01-01T00:00:01 | par1 |
+--------------------------------+--------------------------------+-------------+-------------------------+--------------------------------+
Result ②:
+--------------------------------+--------------------------------+-------------+-------------------------+--------------------------------+
| uuid | name | age | ts | partition |
+--------------------------------+--------------------------------+-------------+-------------------------+--------------------------------+
| id1 | Danny | 24 | 1970-01-01T00:00:01 | par1 |
+--------------------------------+--------------------------------+-------------+-------------------------+--------------------------------+
8. Reference: http://hudi.apache.org/docs/flink-quick-start-guide.html