Flink DataStream API: Data Sources

1. Overview

File-based

   readTextFile(path) reads a text file according to the TextInputFormat specification, line by line, and returns each line as an element of the stream.

Socket-based

  socketTextStream reads data from a socket; elements can be separated by a delimiter.

Collection-based

  fromCollection(Collection) creates a data stream from a Java Collection; all elements in the collection must be of the same type.

Custom input

addSource lets you read data from a third-party data source.

In addition, Flink ships with a set of built-in connectors that provide source support for external systems (e.g. Kafka).
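The three basic source types above can be sketched in one program (a minimal sketch; the file path, host, and port are placeholders you would replace with real values):

```java
import java.util.Arrays;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class BasicSources {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // File-based: read a text file line by line (placeholder path)
        DataStream<String> fileStream = env.readTextFile("/tmp/input.txt");

        // Socket-based: read from a socket, splitting elements on '\n' (placeholder host/port)
        DataStream<String> socketStream = env.socketTextStream("localhost", 9999, "\n");

        // Collection-based: all elements must be of the same type
        DataStream<Integer> collectionStream = env.fromCollection(Arrays.asList(1, 2, 3));

        collectionStream.print();
        env.execute("basic-sources");
    }
}
```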

2. Built-in Connectors

Apache Kafka (source/sink)

Apache Cassandra (sink)

Elasticsearch (sink)

Hadoop FileSystem (sink)

RabbitMQ (source/sink)

Apache ActiveMQ (source/sink)

Redis (sink)
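As an illustration of how a connector source is wired up, a Kafka consumer might look like the following (a sketch assuming the flink-connector-kafka dependency is on the classpath; the broker address, group id, and topic name are placeholders, and the exact consumer class varies by Flink version):

```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class KafkaSourceExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.setProperty("group.id", "flink-demo");              // placeholder group id

        // The Kafka source stores consumed offsets in Flink checkpoints,
        // which is how it achieves the exactly-once guarantee noted below.
        DataStream<String> stream = env.addSource(
                new FlinkKafkaConsumer<>("my-topic", new SimpleStringSchema(), props));

        stream.print();
        env.execute("kafka-source");
    }
}
```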

3. Source Fault-Tolerance Guarantees

Source        Guarantee       Notes
Kafka         exactly once    Kafka 0.10 or later is recommended
Collections   exactly once
Files         exactly once
Sockets       at most once

4. Custom Sources

A custom source with parallelism 1 (implement SourceFunction)

package com.flink.learn.custom;

import org.apache.flink.streaming.api.functions.source.SourceFunction;

public class NoParalleSource implements SourceFunction<Long> {
    private long count = 1;
    // volatile: cancel() is called from a different thread than run()
    private volatile boolean isRun = true;

    @Override
    public void run(SourceContext<Long> ctx) throws Exception {
        // Emit an increasing counter once per second until cancelled
        while (isRun) {
            ctx.collect(count++);
            Thread.sleep(1000);
        }
    }

    @Override
    public void cancel() {
        // Called by the framework on shutdown; stops the emit loop
        isRun = false;
    }
}

A parallel custom source (implement ParallelSourceFunction)

import org.apache.flink.streaming.api.functions.source.ParallelSourceFunction;

public class ParalleSource implements ParallelSourceFunction<Long> {
    private long count = 1;
    private boolean isRun = true;

    @Override
    public void run(SourceContext<Long> ctx) throws Exception {
        while (isRun) {
            ctx.collect(count++);
            Thread.sleep(1000);
        }
    }

    @Override
    public void cancel() {
        isRun = false;
    }
}

A parallel custom source (extend RichParallelSourceFunction; the open method is called once per parallel instance during initialization)

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction;

public class RichParalleSource extends RichParallelSourceFunction<Long> {
    private long count = 1;
    private boolean isRun = true;

    @Override
    public void run(SourceContext<Long> ctx) throws Exception {
        while (isRun) {
            ctx.collect(count++);
            Thread.sleep(1000);
        }
    }

    @Override
    public void cancel() {
        isRun = false;
    }

    @Override
    public void open(Configuration parameters) throws Exception {
        System.out.println("open.............");
        super.open(parameters);
    }

    @Override
    public void close() throws Exception {
        super.close();
    }
}

Testing the custom sources

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;

public class TestCustomSource {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
         DataStream<Long> text = env.addSource(new NoParalleSource());
        // DataStream<Long> text = env.addSource(new ParalleSource()).setParallelism(2);
        // DataStream<Long> text = env.addSource(new RichParalleSource()).setParallelism(2);
        SingleOutputStreamOperator<Long> num = text.map(new MapFunction<Long, Long>() {
            @Override
            public Long map(Long value) throws Exception {
                System.out.println("this out:" + value);
                return value;
            }
        });
        SingleOutputStreamOperator<Long> sum = num.timeWindowAll(Time.seconds(2)).sum(0);
        sum.print();
        String name = TestCustomSource.class.getSimpleName();
        env.execute(name);
    }
}
