Flink in Java (5): Sources

1. Source from a collection

1.1 The data source class

public class SensorReading {
    // sensor id
    private String id;
    // reading timestamp
    private Long timestamp;
    // temperature reading
    private Double temperature;

    public SensorReading() {
    }

    public SensorReading(String id, Long timestamp, Double temperature) {
        this.id = id;
        this.timestamp = timestamp;
        this.temperature = temperature;
    }

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public Long getTimestamp() {
        return timestamp;
    }

    public void setTimestamp(Long timestamp) {
        this.timestamp = timestamp;
    }

    public Double getTemperature() {
        return temperature;
    }

    public void setTemperature(Double temperature) {
        this.temperature = temperature;
    }

    @Override
    public String toString() {
        return "SensorReading{" +
                "id='" + id + '\'' +
                ", timestamp=" + timestamp +
                ", temperature=" + temperature +
                '}';
    }
}

1.2 Reading the data

import com.tan.flink.bean.SensorReading;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import java.util.Arrays;

public class SourceFromCollection {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<SensorReading> inputDataStream = env.fromCollection(Arrays.asList(
                new SensorReading("sensor_1", 1547718199L, 35.8),
                new SensorReading("sensor_6", 1547718201L, 15.4),
                new SensorReading("sensor_7", 1547718202L, 6.7),
                new SensorReading("sensor_10", 1547718205L, 38.1)
        ));
        inputDataStream.print();
        env.execute();
    }
}

2. Source from a file

Reading a text file is a one-liner; each line of the file arrives as a String record:

env.readTextFile(path);
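The one-liner above can be fleshed out into a complete job. A minimal sketch, reusing the SensorReading bean from section 1.1; the file name sensor.txt and the id,timestamp,temperature line format are assumptions for illustration:

```java
import com.tan.flink.bean.SensorReading;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SourceFromFile {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // read the file line by line; each line becomes one String record
        DataStream<String> lines = env.readTextFile("sensor.txt");

        // parse "id,timestamp,temperature" lines into SensorReading beans
        DataStream<SensorReading> readings = lines.map(line -> {
            String[] fields = line.split(",");
            return new SensorReading(fields[0].trim(),
                    Long.parseLong(fields[1].trim()),
                    Double.parseDouble(fields[2].trim()));
        });

        readings.print();
        env.execute();
    }
}
```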

3. Source from Kafka

3.1 POM dependency

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka-0.11_2.12</artifactId>
    <version>1.10.1</version>
</dependency>

3.2 Code

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;
import java.util.Properties;

public class SourceFromKafka {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Kafka configuration
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers", "192.168.200.102:9092,192.168.200.103:9092,192.168.200.104:9092");
        properties.setProperty("group.id", "flink-kafka");
        properties.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        properties.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        properties.setProperty("auto.offset.reset", "latest");

        DataStreamSource<String> inputDataStream = env.addSource(new FlinkKafkaConsumer011<String>(
                "sensor",
                new SimpleStringSchema(),
                properties
        ));

        inputDataStream.print();
        env.execute();
    }
}
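One detail worth knowing: instead of relying on the auto.offset.reset property, the start position can be set explicitly on the consumer instance before it is added as a source; the explicit setting takes precedence. A sketch (single broker address and topic name as in the example above):

```java
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;
import java.util.Properties;

public class SourceFromKafkaExplicitOffsets {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers", "192.168.200.102:9092");
        properties.setProperty("group.id", "flink-kafka");

        FlinkKafkaConsumer011<String> consumer = new FlinkKafkaConsumer011<>(
                "sensor",
                new SimpleStringSchema(),
                properties
        );
        // explicit start position; alternatives are setStartFromEarliest()
        // and setStartFromGroupOffsets() (the default behavior)
        consumer.setStartFromLatest();

        DataStreamSource<String> inputDataStream = env.addSource(consumer);
        inputDataStream.print();
        env.execute();
    }
}
```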

4. Custom source

A custom source must either implement the SourceFunction interface or extend its rich variant, RichSourceFunction, which additionally provides open()/close() lifecycle methods and access to the runtime context.

import com.tan.flink.bean.SensorReading;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.SourceFunction;
import java.util.Random;
import java.util.UUID;

public class SourceFromCustom {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStreamSource<SensorReading> inputDataStream = env.addSource(new CustomSource());
        inputDataStream.print();
        env.execute();
    }

    public static class CustomSource implements SourceFunction<SensorReading> {
        // volatile, because cancel() is called from a different thread than run()
        private volatile boolean running = true;
        @Override
        public void run(SourceContext<SensorReading> sourceContext) throws Exception {

            Random random = new Random();
            while (running) {
                // emit a burst of 5 readings, one every 100 ms
                for (int i = 0; i < 5; i++) {
                    String id = UUID.randomUUID().toString().substring(0, 8);
                    long timestamp = System.currentTimeMillis();
                    double temperature = 60 + random.nextGaussian() * 20;
                    sourceContext.collect(new SensorReading(id, timestamp, temperature));

                    Thread.sleep(100L);
                }

                Thread.sleep(1000L);
            }
        }

        @Override
        public void cancel() {
            running = false;
        }
    }
}
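For comparison with the plain SourceFunction above, a minimal RichSourceFunction sketch. The rich variant adds open()/close() lifecycle hooks, the natural place to acquire and release external resources such as connections; here only a Random instance stands in for such a resource:

```java
import com.tan.flink.bean.SensorReading;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;
import java.util.Random;
import java.util.UUID;

public class CustomRichSource extends RichSourceFunction<SensorReading> {
    private volatile boolean running = true;
    private Random random;

    @Override
    public void open(Configuration parameters) throws Exception {
        // called once per parallel instance before run();
        // open connections, clients, files, etc. here
        random = new Random();
    }

    @Override
    public void run(SourceContext<SensorReading> ctx) throws Exception {
        while (running) {
            String id = UUID.randomUUID().toString().substring(0, 8);
            ctx.collect(new SensorReading(id, System.currentTimeMillis(),
                    60 + random.nextGaussian() * 20));
            Thread.sleep(1000L);
        }
    }

    @Override
    public void cancel() {
        running = false;
    }

    @Override
    public void close() throws Exception {
        // called on shutdown; release whatever open() acquired
    }
}
```

It is used exactly like the plain SourceFunction: env.addSource(new CustomRichSource()).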