Setting Up a Local Flink and Hudi Development Environment

1. Versions: Flink 1.12.x, Hudi 0.8.x, Scala 2.11 (the pom properties ${flink.version} and ${scala.binary.version} used below should match these);

2. Create a Flink project (see the linked article for this step);

3. Add the Table and SQL support dependencies to the pom file:

<!-- table start -->
<dependency>
	<groupId>org.apache.flink</groupId>
	<artifactId>flink-table-api-java-bridge_${scala.binary.version}</artifactId>
	<version>${flink.version}</version>
	<scope>provided</scope>
</dependency>
<dependency>
	<groupId>org.apache.flink</groupId>
	<artifactId>flink-table-planner-blink_${scala.binary.version}</artifactId>
	<version>${flink.version}</version>
	<scope>provided</scope>
</dependency>
<dependency>
	<groupId>org.apache.flink</groupId>
	<artifactId>flink-streaming-scala_${scala.binary.version}</artifactId>
	<version>${flink.version}</version>
	<scope>provided</scope>
</dependency>
<dependency>
	<groupId>org.apache.flink</groupId>
	<artifactId>flink-table-common</artifactId>
	<version>${flink.version}</version>
	<scope>provided</scope>
</dependency>
<!-- table end -->

4. Add the Hudi support dependencies to the pom file:

<!-- hudi start -->
<dependency>
	<groupId>org.apache.hudi</groupId>
	<artifactId>hudi-flink-bundle_${scala.binary.version}</artifactId>
	<version>0.8.0</version>
	<scope>provided</scope>
</dependency>
<dependency>
	<groupId>org.apache.hudi</groupId>
	<artifactId>hudi-flink_${scala.binary.version}</artifactId>
	<version>0.8.0</version>
	<scope>provided</scope>
</dependency>
<dependency>
	<groupId>org.apache.hudi</groupId>
	<artifactId>hudi-flink-client</artifactId>
	<version>0.8.0</version>
	<scope>provided</scope>
</dependency>
<!-- hudi end -->

5. Download the winutils build that matches your Hadoop version. After extracting both archives, copy the files from the winutils bin directory into Hadoop's bin directory;
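
Before running the job it can be useful to confirm this setup. Below is a minimal sketch, assuming Hadoop was extracted to E:/software/hadoop-3.3.0 (the path used in step 6); it only checks that winutils.exe is in place.

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class WinutilsCheck {

	public static void main(String[] args) {
		// Assumed local Hadoop directory; adjust to your own extraction path.
		String hadoopHome = "E:/software/hadoop-3.3.0";
		System.setProperty("hadoop.home.dir", hadoopHome);
		Path winutils = Paths.get(hadoopHome, "bin", "winutils.exe");
		if (Files.exists(winutils)) {
			System.out.println("winutils.exe found: " + winutils);
		} else {
			System.out.println("winutils.exe missing, copy it into: " + winutils.getParent());
		}
	}

}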

6. Run the following code:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class HudiTable {

	public static void main(String[] args) throws Exception {
		// Point Hadoop at the local installation prepared in step 5.
		System.setProperty("hadoop.home.dir", "E:/software/hadoop-3.3.0");
		EnvironmentSettings settings = EnvironmentSettings.newInstance().inBatchMode().build();
		TableEnvironment env = TableEnvironment.create(settings);
		// Create a COPY_ON_WRITE Hudi table backed by the local path e:/hudi.
		env.executeSql("CREATE TABLE t1(uuid VARCHAR(20),name VARCHAR(10),age INT,ts TIMESTAMP(3),`partition` VARCHAR(20)) PARTITIONED BY (`partition`) WITH ('connector' ='hudi','path' = 'e:/hudi','write.tasks' = '1', 'compaction.tasks' = '1', 'table.type' = 'COPY_ON_WRITE')");
		
		// Insert one row.
		env.executeSql("INSERT INTO t1 VALUES('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1')")
			.print();
		env.sqlQuery("SELECT * FROM t1") // Result ①
			.execute()
			.print();
		
		// Insert the same key with a new age, which updates the existing row.
		env.executeSql("INSERT INTO t1 VALUES('id1','Danny',24,TIMESTAMP '1970-01-01 00:00:01','par1')")
			.print();
		env.sqlQuery("SELECT * FROM t1") // Result ②
			.execute()
			.print();
	}

}
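
After the program finishes, the table contents live under the path given in the CREATE TABLE statement. The following sketch simply lists what was written to disk, assuming the e:/hudi path from the code above; expect a .hoodie metadata directory plus parquet files under the par1 partition.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class HudiTableFiles {

	public static void main(String[] args) throws IOException {
		// Walk the local Hudi table path used in the CREATE TABLE statement above.
		Files.walk(Paths.get("e:/hudi"))
			.forEach(System.out::println);
	}

}

Each successful write adds commit metadata under the .hoodie directory, which is how Hudi keeps track of the two writes below.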

7. Because the second INSERT uses the same key id1, Hudi updates the existing row instead of adding a new one. The two query results are as follows:

Result ①:

+--------------------------------+--------------------------------+-------------+-------------------------+--------------------------------+
|                           uuid |                           name |         age |                      ts |                      partition |
+--------------------------------+--------------------------------+-------------+-------------------------+--------------------------------+
|                            id1 |                          Danny |          23 |     1970-01-01T00:00:01 |                           par1 |
+--------------------------------+--------------------------------+-------------+-------------------------+--------------------------------+

Result ②:

+--------------------------------+--------------------------------+-------------+-------------------------+--------------------------------+
|                           uuid |                           name |         age |                      ts |                      partition |
+--------------------------------+--------------------------------+-------------+-------------------------+--------------------------------+
|                            id1 |                          Danny |          24 |     1970-01-01T00:00:01 |                           par1 |
+--------------------------------+--------------------------------+-------------+-------------------------+--------------------------------+
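
The update in step 7 happens because Hudi upserts on the record key, which the Flink connector takes from the uuid column by default. The sketch below spells the key out explicitly in the table definition; the 'hoodie.datasource.write.recordkey.field' option name is an assumption taken from the Hudi Flink configuration and should be verified against the 0.8.x docs.

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class HudiTableExplicitKey {

	public static void main(String[] args) {
		System.setProperty("hadoop.home.dir", "E:/software/hadoop-3.3.0");
		TableEnvironment env = TableEnvironment.create(
				EnvironmentSettings.newInstance().inBatchMode().build());
		// Same table as in step 6, but with the record key column named explicitly.
		env.executeSql("CREATE TABLE t1(uuid VARCHAR(20),name VARCHAR(10),age INT,ts TIMESTAMP(3),`partition` VARCHAR(20)) "
				+ "PARTITIONED BY (`partition`) WITH ("
				+ "'connector' = 'hudi',"
				+ "'path' = 'e:/hudi',"
				+ "'hoodie.datasource.write.recordkey.field' = 'uuid',"
				+ "'write.tasks' = '1',"
				+ "'compaction.tasks' = '1',"
				+ "'table.type' = 'COPY_ON_WRITE')");
	}

}

Rows that share the same record key within a partition are merged into one, which is exactly the behavior shown in Result ②.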

8. Reference: http://hudi.apache.org/docs/flink-quick-start-guide.html
