Flink Code Architecture

Quick Start

  • Maven dependencies
<dependencies>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-clients_2.11</artifactId>
        <version>1.8.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-core</artifactId>
        <version>1.8.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-streaming-scala_2.11</artifactId>
        <version>1.8.1</version>
    </dependency>
</dependencies>
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
            </configuration>
        </plugin>
        <plugin>
            <groupId>net.alchim31.maven</groupId>
            <artifactId>scala-maven-plugin</artifactId>
            <version>4.0.1</version>
            <executions>
                <execution>
                    <id>scala-compile-first</id>
                    <phase>process-resources</phase>
                    <goals>
                        <goal>add-source</goal>
                        <goal>compile</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
        <plugin>
            <!-- Maven assembly plugin: builds a fat jar that bundles third-party dependencies -->
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <archive>
                    <manifest>
                        <mainClass></mainClass>
                    </manifest>
                    <manifestEntries>
                        <Class-Path>.</Class-Path>
                    </manifestEntries>
                </archive>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id> <!-- this is used for inheritance merges -->
                    <phase>package</phase> <!-- run the fat-jar assembly during the package phase -->
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
  • Example code
import org.apache.flink.streaming.api.scala._

//1. Create the StreamExecutionEnvironment
val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment

//2. Specify the input Source (detailed below)
fsEnv.socketTextStream("HadoopNode00",9999)

//3. Common Flink operators (detailed below)
.flatMap(line=>line.split("\\s+"))
.map(word=>(word,1))
.keyBy(0)
.sum(1)

//4. Specify the output Sink (detailed below)
.print()

//5. Execute the streaming job
fsEnv.execute("FlinkWordCounts")

StreamExecutionEnvironment

//For local testing only; the user must specify the parallelism explicitly
val fsEnv = StreamExecutionEnvironment.createLocalEnvironment(3)
//Remote submission from the IDE, useful for cross-environment testing
var jarFiles="flink\\target\\flink-1.0-SNAPSHOT.jar"
val fsEnv = StreamExecutionEnvironment.createRemoteEnvironment("HadoopNode00",8081,3,jarFiles )
//This mode automatically detects whether it is running in production or in an IDE
//In an IDE it uses the number of CPU cores as the default parallelism for testing
//In production the parallelism must be specified with --parallelism on submission
val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment

DataSource

A DataSource is where the program reads its input data. A source is usually registered by calling fsEnv#addSource. Flink ships with a number of built-in SourceFunctions for testing, and also allows user-defined sources: implement SourceFunction for a non-parallel source, or ParallelSourceFunction / RichParallelSourceFunction for a parallel source.

File-based
readTextFile(path) - Reads text files, i.e. files that respect the TextInputFormat specification, line-by-line and returns them as Strings. Similar to batch processing: the file is processed only once.

val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
fsEnv.readTextFile("file:///D:/demo")
.flatMap(line=>line.split("\\s+"))
.map(word=>(word,1))
.keyBy(0)
.sum(1)
.print()

fsEnv.execute("FlinkWordCountsProductEnv")
  • readFile(fileInputFormat, path) - Reads (once) files as dictated by the specified file input format.
val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
var inputFormat= new TextInputFormat(null)
var filePath="file:///D:/demo"
fsEnv.readFile(inputFormat,filePath)
.flatMap(line=>line.split("\\s+"))
.map(word=>(word,1))
.keyBy(0)
.sum(1)
.print()

fsEnv.execute("FlinkWordCountsReadFile")
  • readFile(fileInputFormat, path, watchType, interval, pathFilter) - This is the method called internally by the two previous ones. It reads files in the path based on the given fileInputFormat. Depending on the provided watchType, this source may periodically monitor (every interval ms) the path for new data (FileProcessingMode.PROCESS_CONTINUOUSLY), or process once the data currently in the path and exit (FileProcessingMode.PROCESS_ONCE). Using the pathFilter, the user can further exclude files from being processed.
val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
var inputFormat= new TextInputFormat(null)
var filePath="file:///D:/demo"
inputFormat.setFilesFilter(new FilePathFilter {
    override def filterPath(filePath: Path): Boolean = {
        filePath.getName.endsWith(".txt") //returning true excludes the path, so .txt files are skipped
    }
})
fsEnv.readFile(inputFormat,filePath,FileProcessingMode.PROCESS_CONTINUOUSLY,1000)
.flatMap(line=>line.split("\\s+"))
.map(word=>(word,1))
.keyBy(0)
.sum(1)
.print()

fsEnv.execute("FlinkWordCountsReadFile")

If Flink detects that a monitored file's content has changed, it reprocesses the entire file, which produces duplicate results.

Socket-based

val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment

fsEnv.addSource[String](new SocketTextStreamFunction("HadoopNode00",9999,"\n",3))
.flatMap(line=>line.split("\\s+"))
.map(word=>(word,1))
.keyBy(0)
.sum(1)
.print()

fsEnv.execute("FlinkWordCountsSocketBased")

Collection-based

val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment

fsEnv.fromCollection(Array("this is a demo","my name is jimi"))
.flatMap(line=>line.split("\\s+"))
.map(word=>(word,1))
.keyBy(0)
.sum(1)
.print()

fsEnv.execute("FlinkWordCountsCollection")

UserDefineSource

object FlinkWordCountsSourceFunction {
    def main(args: Array[String]): Unit = {
        val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment

        fsEnv.addSource[String](new UserDefineSourceFunction)
        .flatMap(line=>line.split("\\s+"))
        .map(word=>(word,1))
        .keyBy(0)
        .sum(1)
        .print()
        
        fsEnv.execute("FlinkWordCountsUserDefineSource")
    }
}
class UserDefineSourceFunction extends SourceFunction[String]{
    @volatile //ensure every thread reads the latest value instead of a locally cached copy
    private var isRunning = true
    var messages=Array("this is a demo","hello world")
    override def run(ctx: SourceFunction.SourceContext[String]): Unit = {
        while(isRunning){
            Thread.sleep(1000)
            val randomIndex = new Random().nextInt(messages.length)
            ctx.collect(messages(randomIndex)) //emit the record downstream
        }
    }

    override def cancel(): Unit = {
        println("===cancel===")
        isRunning=false
    }
}
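
The DataSource introduction above also mentions parallel sources (ParallelSourceFunction / RichParallelSourceFunction), which are not shown elsewhere in this article. Below is a minimal sketch, not part of the original code: the class name and message format are made up for illustration, and each parallel subtask emits its own stream of messages.

class UserDefineParallelSourceFunction extends RichParallelSourceFunction[String]{
    @volatile //ensure every subtask thread reads the latest value
    private var isRunning = true

    override def run(ctx: SourceFunction.SourceContext[String]): Unit = {
        //index of this parallel instance; available because the function is "Rich"
        val subtask = getRuntimeContext.getIndexOfThisSubtask
        var i = 0L
        while(isRunning){
            Thread.sleep(1000)
            ctx.collect(s"subtask-$subtask message-$i") //emit the record downstream
            i += 1
        }
    }

    override def cancel(): Unit = isRunning = false
}

It is attached with fsEnv.addSource(new UserDefineParallelSourceFunction).setParallelism(3); unlike the non-parallel SourceFunction above, which always runs with parallelism 1, each of the three subtasks runs its own copy of run().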

Kafka

Kafka 0.11.0.0: FlinkKafkaConsumer011 / FlinkKafkaProducer011

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka-0.11_2.11</artifactId>
    <version>1.8.1</version>
</dependency>
val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment

val props = new Properties()
props.setProperty("bootstrap.servers", "hadoopnode00:9092")
props.setProperty("group.id", "flink")
fsEnv.addSource[String](new FlinkKafkaConsumer011("flink_kafka",new SimpleStringSchema(),props))

    .flatMap(line=>line.split("\\s+"))
    .map(word=>(word,1))
    .keyBy(0)
    .sum(1)
    .print()

fsEnv.execute("FlinkWordCountsKafkaConsumer011")

Kafka 1.0.0+: FlinkKafkaConsumer / FlinkKafkaProducer

<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-connector-kafka_2.11</artifactId>
  <version>1.8.1</version>
</dependency>
val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment

val props = new Properties()
props.setProperty("bootstrap.servers", "hadoopnode00:9092")
props.setProperty("group.id", "flink")
fsEnv.addSource[String](new FlinkKafkaConsumer("flink_kafka",new SimpleStringSchema(),props))

.flatMap(line=>line.split("\\s+"))
.map(word=>(word,1))
.keyBy(0)
.sum(1)
.print()

fsEnv.execute("FlinkWordCountsKafkaConsumer")
  • Retrieving the record key and metadata
class UserDefineKeyedDeserializationSchema extends KafkaDeserializationSchema[(String,String,Int,Long,Long)]{

  override def isEndOfStream(t: (String, String, Int, Long, Long)): Boolean = false

  override def deserialize(consumerRecord: ConsumerRecord[Array[Byte], Array[Byte]]): (String, String, Int, Long, Long) = {
    var value:String=null
    var key:String=null
    if(consumerRecord.value()!=null){
      value=new String(consumerRecord.value())
    }
    if(consumerRecord.key()!=null){
      key=new String(consumerRecord.key())
    }
    var partition=consumerRecord.partition()
    var offset=consumerRecord.offset()
    var timestamp=consumerRecord.timestamp()
    (value,key,partition,offset,timestamp)
  }

  override def getProducedType: TypeInformation[(String, String, Int, Long, Long)] = {
    createTypeInformation[(String, String, Int, Long, Long)]
  }
}
val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment

val props = new Properties()
props.setProperty("bootstrap.servers", "hadoopnode00:9092")
props.setProperty("group.id", "flink")
fsEnv.addSource[(String,String,Int,Long,Long)](new FlinkKafkaConsumer("flink_kafka",new UserDefineKeyedDeserializationSchema(),props))

.print()

fsEnv.execute("FlinkWordCountsKafkaConsumer02")
  • Parsing JSON data from Kafka
val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment

val props = new Properties()
props.setProperty("bootstrap.servers", "hadoopnode00:9092")
props.setProperty("group.id", "flink")
fsEnv.addSource[ObjectNode](new FlinkKafkaConsumer011("flink_kafka",new JSONKeyValueDeserializationSchema(true),props))
.map(on=>(on.get("value").get("name").asText(),on.get("value").get("id").asInt(),on.get("metadata")))
.print()

fsEnv.execute("FlinkWordCountsKafkaConsumer04")

For more connectors, see: https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/connectors/

If no suitable connector is available, consider implementing a custom Source.

DataSink

A DataSink consumes the records of a DataStream and writes them to external systems such as networks, message queues, databases, or files. Flink predefines a number of common DataSinks, and users can also define their own by implementing SinkFunction or RichSinkFunction.
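
As a minimal illustration of the simpler, non-rich variant (the class name here is made up; a RichSinkFunction with open/close lifecycle methods is shown in the UserDefineSink section below):

class UserDefinePrintSinkFunction extends SinkFunction[(String,Int)]{
    //invoke is called once per record; this sink just writes each record to standard output
    override def invoke(value: (String, Int), context: SinkFunction.Context[_]): Unit = {
        println(s"sink> ${value._1} -> ${value._2}")
    }
}

It is attached like any other sink: stream.addSink(new UserDefinePrintSinkFunction).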

write
writeAsText() / writeAsCsv() / writeUsingOutputFormat() / writeToSocket() - not covered in detail here

val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment

val props = new Properties()
props.setProperty("bootstrap.servers", "hadoopnode00:9092")
props.setProperty("group.id", "flink")
fsEnv.addSource[String](new FlinkKafkaConsumer011("flink_kafka",new SimpleStringSchema(),props))

.flatMap(line=>line.split("\\s+"))
.map(word=>(word,1))
.keyBy(0)
.sum(1)
.writeAsText("file:///D:/flink/results",WriteMode.OVERWRITE)

fsEnv.execute("FlinkWordCountsWriteAsText")

Output to the target file system may be delayed; these write* methods are generally used only for testing.

BucketingSink

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-filesystem_2.11</artifactId>
    <version>1.8.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.6.0</version>
</dependency>
val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
var inputFormat= new TextInputFormat(null)
var filePath="file:///D:/demo"

val bucketingSink = new BucketingSink[(String,Int)]("hdfs://HadoopNode00:9000/BucketingSink")
bucketingSink.setBucketer(new DateTimeBucketer[(String,Int)]("yyyy-MM-dd--HHmm", ZoneId.of("Asia/Shanghai")))
bucketingSink.setBatchSize(1024 * 1024 * 128) // roll a new part file every 128 MB
bucketingSink.setBatchRolloverInterval(20 * 60 * 1000) // or at latest every 20 minutes
bucketingSink.setWriter(new StringWriter[(String,Int)]())
fsEnv.readFile(inputFormat,filePath)
.flatMap(line=>line.split("\\s+"))
.map(word=>(word,1))
.keyBy(0)
.sum(1)
.addSink(bucketingSink)

fsEnv.execute("FlinkWordCountsBucketingSink")

UserDefineSink

class UserDefineSinkFunction(additionalKey:String) extends RichSinkFunction[(String,Int)]{
  var pool:JedisPool=_
  var resource:Jedis = _
  override def open(parameters: Configuration): Unit = {
    pool=new JedisPool("HadoopNode00",6379)
    resource=pool.getResource
  }

  override def invoke(value: (String, Int), context: SinkFunction.Context[_]): Unit = {

    resource.hset(additionalKey,value._1,value._2.toString)

  }

  override def close(): Unit = {
    resource.close()
    pool.close()
  }
}
val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
var inputFormat= new TextInputFormat(null)
var filePath="file:///D:/demo"

fsEnv.readFile(inputFormat,filePath)
.flatMap(line=>line.split("\\s+"))
.map(word=>(word,1))
.keyBy(0)
.sum(1)
.addSink(new UserDefineSinkFunction("wordcounts"))

fsEnv.execute("FlinkWordCountsUserDefineSink")

print/printToErr

val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
fsEnv.socketTextStream("HadoopNode00",9999)
.flatMap(line=>line.split("\\s+"))
.map(word=>(word,1))
.keyBy(0)
.sum(1)
.printToErr("test")

fsEnv.execute("FlinkWordCounts")

RedisSink

Reference: https://bahir.apache.org/docs/flink/current/flink-streaming-redis/

<dependency>
    <groupId>org.apache.bahir</groupId>
    <artifactId>flink-connector-redis_2.11</artifactId>
    <version>1.0</version>
</dependency>
class UserDefineRedisMapper(additionalKey:String) extends RedisMapper[(String,Int)]{
  override def getCommandDescription: RedisCommandDescription = {
    new RedisCommandDescription(RedisCommand.HSET,additionalKey)
  }

  override def getKeyFromData(t: (String, Int)): String = {
    t._1
  }

  override def getValueFromData(t: (String, Int)): String = {
    t._2.toString
  }
}
val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment

val props = new Properties()
props.setProperty("bootstrap.servers", "hadoopnode00:9092")
props.setProperty("group.id", "flink")

val jedisConfig = new FlinkJedisPoolConfig.Builder().setHost("hadoopnode00").setPort(6379).build

fsEnv.addSource[String](new FlinkKafkaConsumer011("flink_kafka",new SimpleStringSchema(),props))

.flatMap(line=>line.split("\\s+"))
.map(word=>(word,1))
.keyBy(0)
.sum(1)
.addSink(new RedisSink[(String,Int)](jedisConfig,new UserDefineRedisMapper("wc_redis")))

fsEnv.execute("FlinkWordCountsRedisSink")

KafkaSink

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka-0.11_2.11</artifactId>
    <version>1.8.1</version>
</dependency>
class UserDefineKeyedSerializationSchema extends KeyedSerializationSchema[(String,Int)]{
  override def serializeKey(element: (String, Int)): Array[Byte] = {
    element._1.getBytes()
  }

  override def serializeValue(element: (String, Int)): Array[Byte] = {
    element._2.toString.getBytes()
  }

  override def getTargetTopic(element: (String, Int)): String = {
    "count_topic"
  }
}
val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment

val props1 = new Properties()
props1.setProperty("bootstrap.servers", "hadoopnode00:9092")
props1.setProperty("group.id", "flink")

val props2 = new Properties()
//No key/value serializer settings needed; the serialization schema passed to the producer handles it
props2.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "hadoopnode00:9092")
props2.setProperty(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG,"true")
props2.setProperty(ProducerConfig.ACKS_CONFIG,"all")
props2.setProperty(ProducerConfig.BATCH_SIZE_CONFIG,"1024")
props2.setProperty(ProducerConfig.LINGER_MS_CONFIG,"500")

fsEnv.addSource[String](new FlinkKafkaConsumer011("flink_kafka",new SimpleStringSchema(),props1))

.flatMap(line=>line.split("\\s+"))
.map(word=>(word,1))
.keyBy(0)
.sum(1)
.addSink(new FlinkKafkaProducer011[(String, Int)]("defaultTopic",new UserDefineKeyedSerializationSchema,props2) )

fsEnv.execute("FlinkWordCountsKafkaSink")

Flink Operators

Reference: https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/stream/operators/

DataStream → DataStream

  • Map
    Takes one element and produces one element. A map function that doubles the values of the input stream:
dataStream.map { x => x * 2 }
  • FlatMap
    Takes one element and produces zero, one, or more elements. A flatmap function that splits sentences to words:
dataStream.flatMap { str => str.split("\\s+") }
  • Filter
    Evaluates a boolean function for each element and retains those for which the function returns true. A filter that filters out zero values:
dataStream.filter { _ != 0 }
val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
fsEnv.socketTextStream("HadoopNode00",9999)
    .flatMap(line=>line.split("\\s+"))
    .map(word=>(word,1))
    .filter(!_._1.equals("error")) //drop tuples whose word is "error"
    .print()

fsEnv.execute("FlinkDatastream2Datastream")

DataStream → KeyedStream

  • KeyBy
    Logically partitions a stream into disjoint partitions, each partition containing elements of the same key. Internally, this is implemented with hash partitioning.
dataStream.keyBy("field name") // Key by field "someKey"
dataStream.keyBy(position) // Key by the first element of a Tuple
  • Reduce
    A “rolling” reduce on a keyed data stream. Combines the current element with the last reduced value and emits the new value (a concrete reduce sketch appears at the end of this section).
keyedStream.reduce(_ + _)
  • Fold
    A “rolling” fold on a keyed data stream with an initial value. Combines the current element with the last folded value and emits the new value.
    A fold function that, when applied on the sequence (1,2,3,4,5), emits the sequence “start-1”, “start-1-2”, “start-1-2-3”, …
val result: DataStream[String] =
keyedStream.fold("start")((str, i) => { str + "-" + i })
  • Aggregations
    Rolling aggregations on a keyed data stream. The difference between min and minBy is that min returns the minimum value, whereas minBy returns the element that has the minimum value in this field (same for max and maxBy).
keyedStream.sum(0)
keyedStream.sum("key")
keyedStream.min(0)
keyedStream.min("key")
keyedStream.max(0)
keyedStream.max("key")
keyedStream.minBy(0)
keyedStream.minBy("key")
keyedStream.maxBy(0)
keyedStream.maxBy("key")
val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
//Sample input:
//001 zhangsan R&D 25000 25
//002 lisi R&D 30000 30
//003 wangwu Product 35000 35
fsEnv.socketTextStream("HadoopNode00",9999)
.map(line=>line.split("\\s+"))
.map(tokens=>(tokens(0),tokens(1),tokens(2),tokens(3).toDouble,tokens(4).toInt))
.keyBy(2)
.minBy(4) //minBy returns the whole record holding the minimum value, whereas min only updates the minimum field and leaves the others unchanged
.print()

fsEnv.execute("FlinkDatastream2KeyedStream03")
case class Employee(id:String,name:String,dept:String,salary:Double,age:Int)
object FlinkDatastream2KeyedStream01 {
  def main(args: Array[String]): Unit = {
    val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
    //Sample input:
    //001 zhangsan R&D 25000 25
    //002 lisi R&D 30000 30
    //003 wangwu Product 35000 35

    fsEnv.socketTextStream("HadoopNode00",9999)
        .map(line=>line.split("\\s+"))
        .map(tokens=>Employee(tokens(0),tokens(1),tokens(2),tokens(3).toDouble,tokens(4).toInt))
        .keyBy("dept")
        .sum("salary")
        .print()

    fsEnv.execute("FlinkDatastream2KeyedStream01")
  }
}
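
The reduce snippet in the list above is only the placeholder from the Flink documentation. As a concrete sketch (assuming the same socket source as the earlier word-count examples), reduce can replace sum by combining two (word, count) tuples per key:

val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
fsEnv.socketTextStream("HadoopNode00",9999)
.flatMap(line=>line.split("\\s+"))
.map(word=>(word,1))
.keyBy(0)
.reduce((a,b)=>(a._1,a._2+b._2)) //keep the word, add the running counts
.print()

fsEnv.execute("FlinkWordCountsReduce")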

DataStream* → DataStream

  • Union
    Union of two or more data streams creating a new stream containing all the elements from all the streams. Note: If you union a data stream with itself you will get each element twice in the resulting stream.
dataStream.union(otherStream1, otherStream2, ...)
val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
//Sample input:
//001 zhangsan R&D 25000 25
//002 lisi R&D 30000 30
//003 wangwu Product 35000 35
var dataStream1:DataStream[String]= fsEnv.socketTextStream("HadoopNode00",9999)
var dataStream2:DataStream[String]= fsEnv.socketTextStream("HadoopNode00",8888)
var dataStream3:DataStream[String]= fsEnv.socketTextStream("HadoopNode00",7777)
dataStream1.union(dataStream2,dataStream3)
.map(line=>line.split("\\s+"))
.map(tokens=>(tokens(2),tokens(3).toDouble))
.keyBy(0)
.fold(("",0.0,0))((init,value)=>( value._1,init._2+value._2,init._3+1))
.map(v=>(v._1,v._2,v._2/v._3))
.print()

fsEnv.execute("FlinkManyDatastream2OneDatastream")

DataStream,DataStream → ConnectedStreams

  • Connect
    “Connects” two data streams retaining their types, allowing for shared state between the two streams.
someStream : DataStream[Int] = ...
otherStream : DataStream[String] = ...
val connectedStreams = someStream.connect(otherStream)
  • CoMap, CoFlatMap
    Similar to map and flatMap on a connected data stream
connectedStreams.map(
    (_ : Int) => true,
    (_ : String) => false
)
connectedStreams.flatMap(
    (_ : Int) => true,
    (_ : String) => false
)
case class User(id:Int,name:String,salary:Double)

val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment

val props = new Properties()
props.setProperty("bootstrap.servers", "hadoopnode00:9092")
props.setProperty("group.id", "flink")
//1 zhangsan 18000
var dataStream1:DataStream[String]= fsEnv.socketTextStream("HadoopNode00",9999)
//{"id":1,"name":"zhangsan","salary":18000}
var dataStream2:DataStream[ObjectNode]= fsEnv.addSource[ObjectNode](new FlinkKafkaConsumer011[ObjectNode]("flink_kafka",new JSONKeyValueDeserializationSchema(true),props))

dataStream1.connect(dataStream2)
.map(new CoMapFunction[String,ObjectNode,User] {

    override def map1(value: String): User = {
        val tokens = value.split("\\s+")
        User(tokens(0).toInt,tokens(1),tokens(2).toDouble)
    }
    override def map2(value: ObjectNode): User = {
        val id = value.get("value").get("id").asInt()
        val name = value.get("value").get("name").asText()
        val salary = value.get("value").get("salary").asDouble()
        User(id,name,salary)
    }
})
.keyBy("id","name")
.sum("salary")
.print()

fsEnv.execute("FlinkConnectedStreams")

DataStream → SplitStream

  • Split
    Split the stream into two or more streams according to some criterion.
val split = someDataStream.split(
  (num: Int) =>
    (num % 2) match {
      case 0 => List("even")
      case 1 => List("odd")
    }
)
  • Select
    Select one or more streams from a split stream.
val even = split.select("even")
val odd = split select "odd"
val all = split.select("even","odd")         
val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment

val splitStream: SplitStream[String] = fsEnv.socketTextStream("HadoopNode00", 9999)
.split(value=>{
    if (value.contains("error")) {
        List("error")
    } else {
        List("info")
    }
})
splitStream.select("error").printToErr("错误")
splitStream.select("info").print("信息")
splitStream.select("error","info").print("所有信息")

fsEnv.execute("FlinkConnectStream2Datastream02")

An alternative approach: side outputs

val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment

val outputTag = new OutputTag[String]("error")
val dataStream: DataStream[String] = fsEnv.socketTextStream("HadoopNode00", 9999)
.process(new ProcessFunction[String, String] {
    override def processElement(value: String,
                                ctx: ProcessFunction[String, String]#Context,
                                out: Collector[String]): Unit = {
        if (value.contains("error")) {
            ctx.output(outputTag, value)
        } else {
            out.collect(value)
        }
    }
})
dataStream.print("normal")
dataStream.getSideOutput(outputTag).printToErr("error")

fsEnv.execute("FlinkSideOutputStream")