Flink Notes

Four Steps for Using Flink

Create the Execution Environment

  • Flink supports both batch processing and stream processing, and the two use different APIs to create an execution environment. The code to create a batch env is as follows:
val env = ExecutionEnvironment.getExecutionEnvironment

The code to create a streaming env is as follows:

val env = StreamExecutionEnvironment.getExecutionEnvironment
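
Both environments can be configured after creation, and a streaming job only actually runs once execute is called at the end of the program. A minimal sketch (the parallelism value and job name are illustrative):

// Default parallelism for all operators in this job (value is illustrative)
env.setParallelism(4)
// ... define sources, transformations, and sinks ...
env.execute("myJob")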

Add a Source

  • Collection-based source
val stream = env.fromCollection(List(
     TemperatureRecord("d1", 25.5, 100),
     TemperatureRecord("d2", 25.5, 100),
     TemperatureRecord("d3", 25.5, 100),
     TemperatureRecord("d4", 25.5, 100),
     TemperatureRecord("d5", 25.5, 100),
     TemperatureRecord("d6", 25.5, 100),
     TemperatureRecord("d7", 25.5, 100)
   ))
  • File-based source
val stream = env.readTextFile("/source.txt")
  • Socket-based source
val dataStream = env.socketTextStream("192.168.1.101", 7777)
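The socket source is convenient for local testing: start a netcat listener on the target host with nc -lk 7777 and type lines into it; each line becomes one element of the stream.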
  • Custom source
    Kafka source
val properties = new Properties()
properties.setProperty("bootstrap.servers", "192.168.1.101:9092")
// zookeeper.connect is only used by the legacy 0.8 consumer; the 0.11 consumer
// discovers brokers via bootstrap.servers, so this line can be dropped
properties.setProperty("zookeeper.connect", "192.168.1.101:2181")
properties.setProperty("group.id", "kafkaStreamTest")

val kafka11 = new FlinkKafkaConsumer011[String]("kafkaStreamTest", new SimpleStringSchema(), properties)

val stream = env.addSource(kafka11)
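
FlinkKafkaConsumer011 lives in the Kafka connector module, so the matching dependency must be on the classpath (the version here assumes Flink 1.7.2, in line with the Elasticsearch example later):

<dependency>
   <groupId>org.apache.flink</groupId>
   <artifactId>flink-connector-kafka-0.11_2.11</artifactId>
   <version>1.7.2</version>
</dependency>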

A custom test source:

import org.apache.flink.streaming.api.functions.source.SourceFunction

import scala.util.Random

// Emits one string per second, e.g. "1 v_3", until the job is cancelled
class TestSourceFunction extends SourceFunction[String] {
  // volatile so the flag set by cancel() is visible to the run() thread
  @volatile var running = true

  override def run(sourceContext: SourceFunction.SourceContext[String]): Unit = {
    var i = 0
    while (running) {
      i += 1
      sourceContext.collect(i + " v_" + Random.nextInt(5))
      Thread.sleep(1000)
    }
  }

  override def cancel(): Unit = {
    running = false
  }
}
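
A minimal sketch of wiring this source into a job (the job name is arbitrary):

val env = StreamExecutionEnvironment.getExecutionEnvironment
val stream = env.addSource(new TestSourceFunction)
stream.print()
env.execute("testSourceDemo")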

Transform

Transformations in Flink can be compared to transformations in Spark: both are APIs for converting data, though the details differ.

| Transformation | Signature | Description |
| --- | --- | --- |
| map | DataStream → DataStream | Takes one element and produces one element |
| flatMap | DataStream → DataStream | Takes one element and produces zero or more elements |
| filter | DataStream → DataStream | Filters the data stream |
| keyBy | DataStream → KeyedStream | Similar to GROUP BY in a database |
| reduce | KeyedStream → DataStream | Has a single type parameter T: takes two arguments of type T and returns a T. The return value becomes the first argument of the next call; the second argument is the next element from the stream |
| fold | KeyedStream → DataStream | Deprecated. Like reduce, but with two type parameters, so the input and output types can differ |
| Aggregations | KeyedStream → DataStream | Aggregates a KeyedStream; includes sum, min, max, minBy, maxBy |
| union | DataStream* → DataStream | Unions multiple data streams of the same type |
| connect | DataStream, DataStream → ConnectedStreams | Connects two data streams, whose types may differ |
| split | DataStream → SplitStream | Splits one stream into two or more streams according to some criterion |
| select | SplitStream → DataStream | Selects one or more streams from a split stream |
| window | KeyedStream → WindowedStream | Defines windows on an already-partitioned KeyedStream |
| timeWindow | KeyedStream → WindowedStream | Defines time windows on an already-partitioned KeyedStream |
| countWindow | KeyedStream → WindowedStream | Defines count windows on an already-partitioned KeyedStream |
| windowAll | DataStream → WindowedStream | Defines windows on a regular (non-keyed) DataStream |

Window API: to be covered in a later update.
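
Until then, a minimal sketch of a keyed tumbling window over the TestSourceFunction output above (the 10-second size and the key extraction are illustrative; the default time characteristic is processing time):

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

// Count occurrences of each "v_<k>" key in 10-second tumbling windows
val counts = env.addSource(new TestSourceFunction)
  .map(s => (s.split(" ")(1), 1))
  .keyBy(0)
  .timeWindow(Time.seconds(10))
  .sum(1)
counts.print()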

All of the operations above also accept a custom function class, such as MapFunction or RichMapFunction (the Rich variants add open/close lifecycle methods and access to the runtime context).
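
For example, a map can be written with an explicit MapFunction instead of a lambda (a sketch, assuming stream is a DataStream[String] such as the socket source above):

import org.apache.flink.api.common.functions.MapFunction

// Equivalent to stream.map(_.toUpperCase), written as a function class
val upper = stream.map(new MapFunction[String, String] {
  override def map(value: String): String = value.toUpperCase
})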

Sink

A sink in Flink is comparable to an action in Spark; it is mainly used for data output.
Common sinks:

  • Kafka sink
    First, add the connector dependency matching the 0.11 producer used below:
<dependency>
   <groupId>org.apache.flink</groupId>
   <artifactId>flink-connector-kafka-0.11_2.11</artifactId>
   <version>1.7.2</version>
</dependency>

Main program:

val env = StreamExecutionEnvironment.getExecutionEnvironment
// TestObjectSourceFunction is a custom test source (sketched after this example)
val stream = env.addSource(new TestObjectSourceFunction)
val producer = new FlinkKafkaProducer011[String]("localhost:9092", "test", new SimpleStringSchema())
stream.map(_.id)
  .addSink(producer)
env.execute("kafkaSinkTest")
  • Elasticsearch sink
    First, add the dependency:
 <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-elasticsearch6_2.11</artifactId>
    <version>1.7.2</version>
</dependency>

Main program:

val env = StreamExecutionEnvironment.getExecutionEnvironment
// TestObjectSourceFunction is the custom test source sketched above
val stream = env.addSource(new TestObjectSourceFunction)

val httpHosts = new util.ArrayList[HttpHost]
httpHosts.add(new HttpHost("localhost", 9200))

val esSink = new ElasticsearchSink.Builder[ApplyInfo](httpHosts, new ElasticsearchSinkFunction[ApplyInfo] {
  // Build one index request per element
  override def process(item: ApplyInfo, ctx: RuntimeContext, requestIndexer: RequestIndexer): Unit = {
    val esSource = new util.HashMap[String, String]()
    esSource.put("id", item.id)
    esSource.put("areaCode", item.areaCode)
    val req = Requests.indexRequest("applyInfo").`type`("applyInfo").source(esSource)
    requestIndexer.add(req)
  }
}).build()

stream.addSink(esSink)
env.execute("esSinkTest")
  • Custom sink
    Implement a custom SinkFunction. Extending RichSinkFunction is usually preferable: it adds the open/close lifecycle methods (each called once per parallel subtask) and access to the runtime context.
val env = StreamExecutionEnvironment.getExecutionEnvironment
// TestObjectSourceFunction is the custom test source sketched above
val stream = env.addSource(new TestObjectSourceFunction)
stream.addSink(new RichSinkFunction[ApplyInfo] {
  var out: OutputStream = _

  // Called once per parallel subtask before any elements arrive
  override def open(parameters: Configuration): Unit = {
    out = new FileOutputStream("/tmp/customSink.txt")
  }

  // Called for every element of the stream
  override def invoke(value: ApplyInfo, context: SinkFunction.Context[_]): Unit = {
    out.write((value.id + "," + value.areaCode + "\r\n").getBytes(StandardCharsets.UTF_8))
  }

  // Called when the job finishes or is cancelled
  override def close(): Unit = {
    out.close()
  }
})
env.execute("customSinkTest")