Flink Environment
- getExecutionEnvironment()

  Returns the execution environment appropriate for the current context. If no parallelism is set explicitly, the parallelism configured in flink-conf.yaml is used (default: 1).

  ```java
  StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
  ```
- createLocalEnvironment()

  Creates a local environment. The parallelism defaults to the number of CPU cores and can also be set via a constructor argument.

  ```java
  LocalStreamEnvironment localEnvironment = StreamExecutionEnvironment.createLocalEnvironment();
  ```
- createRemoteEnvironment()

  Creates a remote environment; the jar is submitted to the remote cluster for execution.

  ```java
  StreamExecutionEnvironment remoteEnvironment = StreamExecutionEnvironment.createRemoteEnvironment("localhost", 7777, "/home/WordCount.jar");
  ```
Flink Input Sources
- Using a collection as the source

  ```java
  env.fromCollection(new ArrayList<>());
  env.fromElements(1, 2, 3);
  ```
- Using a file as the source

  ```java
  env.readTextFile("/home/test.txt");
  ```
- Using a message queue as the source

  For example, to use Kafka as a source, first add the connector dependency:

  ```xml
  <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-connector-kafka-0.11_2.12</artifactId>
      <version>1.10.1</version>
  </dependency>
  ```

  ```java
  env.addSource(new FlinkKafkaConsumer011<String>("sensor", new SimpleStringSchema(), properties));
  ```
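The `properties` object passed to the Kafka consumer holds its connection configuration. A minimal sketch of building it is below; the broker address, group id, and deserializer choices are illustrative assumptions, not values from the original notes:

```java
import java.util.Properties;

public class KafkaConsumerProps {

    // Builds the Properties object handed to FlinkKafkaConsumer011.
    // All values here are placeholders; adjust for your cluster.
    public static Properties build() {
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers", "localhost:9092"); // assumed broker address
        properties.setProperty("group.id", "consumer-group");          // assumed consumer group
        properties.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        properties.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        properties.setProperty("auto.offset.reset", "latest");         // start from the newest offsets
        return properties;
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```

`build()` can then be passed directly as the third argument of `FlinkKafkaConsumer011`.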
- User-defined source (implement the SourceFunction interface)

  Mainly used for testing, e.g. to generate mock data. Example code:
import com.regotto.entity.SensorReading;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.SourceFunction;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;
import java.util.Arrays;
import java.util.Properties;
import java.util.Random;
/**
* @author regotto
*/
public class SourceTest {

    private static StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    private static void readFromCollectionAndElement() {
        /* Read from a collection; SensorReading is a custom entity (String id, Long timestamp, Double temperature) */
        DataStream<SensorReading> dataStream = env.fromCollection(Arrays.asList(new SensorReading("1", 1111L, 35.1)