1. Reading data
1.1 Reading from memory
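// env is a StreamExecutionEnvironment, e.g. StreamExecutionEnvironment.getExecutionEnvironment()
// Create a stream from literal elements, or from an existing collection
DataStreamSource<Integer> ds = env.fromElements(1, 2, 3, 4);
DataStreamSource<Integer> source = env.fromCollection(Arrays.asList(1, 2, 3));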
1.2 Reading from a file
Reading from a file requires the following POM dependency:
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-files</artifactId>
<version>1.17.0</version>
</dependency>
// Read the file as text; each line becomes one String record
FileSource<String> fileSource = FileSource
        .forRecordStreamFormat(new TextLineInputFormat(), new Path("input/word.txt"))
        .build();
env.fromSource(fileSource, WatermarkStrategy.noWatermarks(), "filesource").print();
1.3 Reading from Kafka
Reading from Kafka requires the following POM dependency:
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-kafka</artifactId>
<version>1.17.0</version>
</dependency>
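KafkaSource<String> dataSource = KafkaSource.<String>builder()
        // Broker addresses as host:port pairs (9092 is assumed here; use your cluster's port)
        .setBootstrapServers("hadoop1:9092,hadoop2:9092")
        // Consumer group ID (placeholder)
        .setGroupId("consumer-group")
        // Topic to subscribe to (placeholder)
        .setTopics("topic")
        // Deserialize only the record value, as a String
        .setValueOnlyDeserializer(new SimpleStringSchema())
        // Start consuming from the latest offsets
        .setStartingOffsets(OffsetsInitializer.latest())
        .build();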
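Kafka clients expect every entry in the bootstrap server list to be a host:port pair, separated by commas. The plain-Java sketch below (no Flink or Kafka classes) shows one way to assemble such a string from a list of host names. The class name BootstrapServers, the method join, and the port 9092 are illustrative assumptions, not part of the connector API.

```java
import java.util.List;
import java.util.stream.Collectors;

public class BootstrapServers {

    // Hypothetical helper: turns ["hadoop1", "hadoop2"] and a port into "hadoop1:9092,hadoop2:9092"
    static String join(List<String> hosts, int port) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("at least one broker host is required");
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return hosts.stream()
                .map(host -> host + ":" + port)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // 9092 is an assumed port; replace it with the one your brokers listen on
        System.out.println(join(List.of("hadoop1", "hadoop2"), 9092));
    }
}
```

The resulting string can be passed to setBootstrapServers as-is.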
1.4 Generating data with datagen
Flink provides an official data generator connector. It requires the following POM dependency:
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-datagen</artifactId>
<version>1.17.0</version>
</dependency>
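DataGeneratorSource<String> dataGeneratorSource = new DataGeneratorSource<>(
        // Generator function: maps each sequence number to a record
        aLong -> "Number:" + aLong,
        // Total number of records to generate
        10,
        // Emission rate: one record per second
        RateLimiterStrategy.perSecond(1),
        // Type information of the generated records
        Types.STRING);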
env.fromSource(dataGeneratorSource, WatermarkStrategy.noWatermarks(), "data-generator").print();
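To make the generator's output concrete, here is a minimal plain-Java sketch (no Flink classes) of what the configuration above should produce. It assumes the source feeds the generator function the sequence numbers 0 through count - 1, which is my understanding of the connector. Rate limiting and parallel splitting are deliberately left out, and the class name DatagenSketch and method generate are hypothetical.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

public class DatagenSketch {

    // Applies the generator function to the sequence numbers 0 .. count - 1 (assumed behavior)
    static List<String> generate(Function<Long, String> generator, long count) {
        return LongStream.range(0, count)
                .boxed()
                .map(generator)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Same function and count as the DataGeneratorSource above
        generate(n -> "Number:" + n, 10).forEach(System.out::println);
    }
}
```

Under that assumption the records run from Number:0 to Number:9. In the real job, RateLimiterStrategy.perSecond(1) additionally spaces them about one second apart.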