Usage Supported by Each Layer of the Flink API

程序了个猴

Article link: https://blog.csdn.net/yym373872996/article/details/105682078

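This article covers Flink's DataSet, DataStream, Table, and SQL APIs, including data sources, data sinks, transformations, and the available connectors. For each API it lists the specific Sources, Sinks, and Transformations, and it notes connector support for systems such as Kafka, Cassandra, and Elasticsearch.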

DataSet API

Source - data sources

InputFormat

CRowValuesInputFormat
CollectionInputFormat
CsvInputFormat
IteratorInputFormat
JDBCInputFormat
ParallelIteratorInputFormat
PojoCsvInputFormat
PrimitiveInputFormat
ReplicatingInputFormat
RowCsvInputFormat
SerializedInputFormat
TextInputFormat
TextValueInputFormat
TupleCsvInputFormat
TypeSerializerInputFormat
ValuesInputFormat

Sink - data sinks

OutputFormat

BlockingShuffleOutputFormat
CsvOutputFormat
DiscardingOutputFormat
JDBCOutputFormat
JDBCUpsertOutputFormat
LocalCollectionOutputFormat
PrintingOutputFormat
ScalaCsvOutputFormat
SerializedOutputFormat
TextOutputFormat
TypeSerializerOutputFormat
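
To show how the formats listed above plug into a job, here is a minimal sketch, assuming the Flink 1.x DataSet API and a local, single-JVM execution environment. `fromCollection` is backed by CollectionInputFormat, and the result is written through LocalCollectionOutputFormat. The class name `DataSetFormatsSketch` and the sample values are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.io.LocalCollectionOutputFormat;

// Hypothetical example class; not part of the original article.
public class DataSetFormatsSketch {

    public static List<String> run() throws Exception {
        // Outside a cluster this returns a local environment.
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Source: fromCollection reads the list through CollectionInputFormat.
        DataSet<String> words = env.fromCollection(Arrays.asList("flink", "dataset", "api"));

        // Sink: LocalCollectionOutputFormat copies the records into a local
        // collection. It only works when the job runs in the same JVM.
        List<String> collected = new ArrayList<>();
        words.output(new LocalCollectionOutputFormat<>(collected));

        // DataSet programs are lazy; nothing runs until execute() is called.
        env.execute("dataset-formats-sketch");
        return collected;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run());
    }
}
```

The order of the collected records is not guaranteed when the job runs with a parallelism greater than one.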

Transformation - data transformations

DataStream API

Source - data sources

File-based

readTextFile
readFile

Socket-based

socketTextStream

Collection-based

fromCollection
fromElements
fromParallelCollection
generateSequence

Custom

addSource - pass an implementation of SourceFunction to define a custom data source

Sink - data sinks

Print

print
printToErr

File-based

writeAsText
writeAsCsv

Socket-based

writeToSocket

Custom

addSink - pass an implementation of SinkFunction to define a custom data sink
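
As a rough illustration of the custom source and sink hooks, the sketch below wires a hand-written SourceFunction to a hand-written SinkFunction. It assumes the Flink 1.x DataStream API and local execution. `CountingSource`, `CollectingSink`, and the static `VALUES` list are hypothetical names, and the static list only works because source, sink, and caller share one JVM.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

// Hypothetical example class; not part of the original article.
public class CustomSourceSinkSketch {

    /** Emits 0, 1, ..., limit - 1 and then finishes. */
    public static class CountingSource implements SourceFunction<Long> {
        private final long limit;
        private volatile boolean running = true;

        public CountingSource(long limit) {
            this.limit = limit;
        }

        @Override
        public void run(SourceContext<Long> ctx) {
            for (long i = 0; running && i < limit; i++) {
                // Emit under the checkpoint lock, as the SourceFunction contract asks.
                synchronized (ctx.getCheckpointLock()) {
                    ctx.collect(i);
                }
            }
        }

        @Override
        public void cancel() {
            running = false;
        }
    }

    /** Appends every record to a static list (local execution only). */
    public static class CollectingSink implements SinkFunction<Long> {
        public static final List<Long> VALUES =
            Collections.synchronizedList(new ArrayList<>());

        @Override
        public void invoke(Long value, Context context) {
            VALUES.add(value);
        }
    }

    public static List<Long> run(long limit) throws Exception {
        CollectingSink.VALUES.clear();
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1); // one sink instance, so arrival order matches emit order

        env.addSource(new CountingSource(limit)).addSink(new CollectingSink());

        env.execute("custom-source-sink-sketch");
        return new ArrayList<>(CollectingSink.VALUES);
    }
}
```

Calling `CustomSourceSinkSketch.run(5)` should return the values 0 through 4 in order. On a real cluster the sink would write to an external system instead of a static field.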

Connector - 连接器

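Beyond the basic Source and Sink programming interfaces, Flink provides dedicated integrations for a range of third-party systems, known as connectors. A connector can provide source functionality, sink functionality, or both.
The following systems are currently supported: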

Apache Kafka (source/sink)
Apache Cassandra (sink)
Amazon Kinesis Streams (source/sink)
Elasticsearch (sink)
Hadoop FileSystem (sink)
RabbitMQ (source/sink)
Apache NiFi (source/sink)
Twitter Streaming API (source)
Google PubSub (source/sink)
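
Taking Kafka as an example of a connector that acts as both source and sink, the sketch below only wires up a pipeline and does not run it. It assumes the universal Kafka connector (`flink-connector-kafka`) from the Flink 1.x releases of this period, where the classes are `FlinkKafkaConsumer` and `FlinkKafkaProducer`; newer releases replace them with `KafkaSource` and `KafkaSink`. The class name, group id, and method parameters are placeholders chosen for illustration.

```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;

// Hypothetical example class; not part of the original article.
public class KafkaConnectorSketch {

    /** Builds a topic-to-topic copy job; the caller decides when to execute it. */
    public static StreamExecutionEnvironment build(
            String brokers, String inputTopic, String outputTopic) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Consumer settings; the group id is an arbitrary placeholder.
        Properties consumerProps = new Properties();
        consumerProps.setProperty("bootstrap.servers", brokers);
        consumerProps.setProperty("group.id", "connector-sketch");

        // Source side of the connector: read each record value as a String.
        DataStream<String> lines = env.addSource(
            new FlinkKafkaConsumer<String>(inputTopic, new SimpleStringSchema(), consumerProps));

        // Sink side of the connector: write each String back to another topic.
        Properties producerProps = new Properties();
        producerProps.setProperty("bootstrap.servers", brokers);
        lines.addSink(
            new FlinkKafkaProducer<String>(outputTopic, new SimpleStringSchema(), producerProps));

        return env;
    }
}
```

Calling `execute()` on the returned environment would start the job, which requires a reachable Kafka broker.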

Additional connectors are released through Apache Bahir, including:

Apache ActiveMQ (source/sink)
Apache Flume (sink)
Redis (sink)
Akka (sink)
Netty (source)

Transformation - data transformations

Table API

Source - data sources

TableSource

CsvTableSource
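
A CsvTableSource is assembled through a builder. The following is a minimal sketch, assuming the legacy `org.apache.flink.table.sources.CsvTableSource` class from the Flink 1.x Table API and its `TypeInformation`-based `field` method. The column names `name` and `age` and the class name `CsvTableSourceSketch` are invented for illustration.

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.table.sources.CsvTableSource;

// Hypothetical example class; not part of the original article.
public class CsvTableSourceSketch {

    public static CsvTableSource build(String path) {
        return CsvTableSource.builder()
            // location of the CSV file; nothing is read until a query runs
            .path(path)
            // columns in file order (names and types are assumptions)
            .field("name", Types.STRING)
            .field("age", Types.INT)
            // comma-separated values, with a header row to skip
            .fieldDelimiter(",")
            .ignoreFirstLine()
            .build();
    }
}
```

The resulting source can then be registered with a table environment and queried like any other table.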

OrcTableSource

import org.apache.flink.orc.OrcTableSource;
import org.apache.hadoop.conf.Configuration;

// create Hadoop Configuration
Configuration config = new Configuration();

OrcTableSource orcTableSource = OrcTableSource.builder()
  // path to ORC file(s). NOTE: By default, directories are recursively scanned.
  .path("file:///path/to/data")
  // schema of ORC files
  .forOrcSchema("struct<name:string,addresses:array<struct<street:string,zip:smallint>>>")
  // Hadoop configuration
  .withConfiguration(config)
  // build OrcTableSource
  .build();