Flink Broadcast Stream: A Worked Example

沧海笑007

Article link: https://blog.csdn.net/ZLZ2017/article/details/86444657

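Preface

Following the previous post, which introduced broadcast variables, this post uses an alarm-rule scenario to show how they are used in practice.
The scenario is as follows:
1. The data source produces two kinds of messages: Route Msg and Alarm Msg.
2. A Route Msg has two key fields, resultType and resultMark. resultType must correspond to an alarm rule, and resultMark marks the message as valid or invalid.
3. Each Alarm Msg is checked against the alarm rules and, depending on the match result, is routed to a different Kafka topic.
This is a multi-message alarm scenario: each kind of Alarm/Route message corresponds to at least one alarm rule, and a rule only takes effect while the corresponding Route message is valid.
Alarm messages need low-latency processing, which makes stream processing a natural fit. We use Kafka + Flink to do the rule matching: Alarm and Route messages are written to separate Kafka topics for Flink to consume, while the alarm rules are configured dynamically through a configuration file. The scenario thus becomes a joint computation between a low-throughput real-time "alarm rule" stream and a main real-time stream, which is the case Flink's broadcast stream is meant for. We normalize the Route messages and the dynamic alarm rules into a single stream, broadcast it, and then connect the Alarm messages with the broadcast stream to match rules in real time.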
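To make the data flow concrete, here is a minimal plain-Java simulation of the pattern. It is not Flink API code: the class and method names (`BroadcastPatternSketch`, `onBroadcast`, `onAlarm`) and the output labels "matched" / "unmatched" are hypothetical, and treating `resultMark == 1` as "valid" is an assumption based on the sample Route message in the configuration file comments. In a real job the two maps would live in Flink broadcast state rather than in ordinary fields.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java simulation of the broadcast pattern; not Flink API code. */
final class BroadcastPatternSketch {
    // resultType -> alarm type the rule watches (stands in for broadcast state)
    private final Map<Integer, Integer> rules = new HashMap<>();
    // resultType -> whether the latest Route message marked it valid
    private final Map<Integer, Boolean> routeValid = new HashMap<>();

    /** Low-throughput side: a rule update (tag 1) or a Route message (tag 2). */
    void onBroadcast(int tag, int resultType, int value) {
        if (tag == 1) {
            rules.put(resultType, value);            // value = alarm type to match
        } else {
            routeValid.put(resultType, value == 1);  // value = resultMark (assumed 1 = valid)
        }
    }

    /** Main side: decides where one alarm of the given type should go. */
    String onAlarm(int alarmType) {
        for (Map.Entry<Integer, Integer> rule : rules.entrySet()) {
            boolean active = routeValid.getOrDefault(rule.getKey(), false);
            if (active && rule.getValue() == alarmType) {
                return "matched";
            }
        }
        return "unmatched";
    }
}
```

The point of the sketch is the asymmetry: the broadcast side only writes shared state, and the alarm side only reads it, so every parallel instance sees the same rules.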

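Configuration file

Monitoring

Monitoring the configuration file is easy, because Flink itself supports watching a file periodically and re-reading it when its content changes. The API is:
readFile(fileInputFormat, path, watchType, interval, pathFilter, typeInfo)
A few notes:
1. Flink splits file reading into two sub-tasks, directory monitoring and data reading, each implemented by a separate entity. Monitoring is done by a single non-parallel task (parallelism = 1), while reading is done by multiple tasks running in parallel, whose parallelism equals the job parallelism.
2. If watchType is FileProcessingMode.PROCESS_CONTINUOUSLY, the file's content is re-processed in full whenever the file is modified. If it is FileProcessingMode.PROCESS_ONCE, the file is read once and the source then exits.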

    // path: an org.apache.flink.core.fs.Path pointing at the same file as filePath
    MyInputFormat myInputFormat = new MyInputFormat(path);
    myInputFormat.setCharsetName("UTF-8");
    // Watch the file continuously, re-scanning every 1000 ms
    DataStream<Map<String, String>> fileInfo =
            env.readFile(myInputFormat,
                    filePath,
                    FileProcessingMode.PROCESS_CONTINUOUSLY,
                    1000,
                    TypeInformation.of(new TypeHint<Map<String, String>>() {}));
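The "re-process everything on modification" behaviour of PROCESS_CONTINUOUSLY can be imitated in a few lines of plain Java. This is only a sketch of the observable behaviour, not how Flink implements it; `FileWatchSketch`, `poll` and `demo` are hypothetical names, and the temporary file stands in for the rule file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Plain-Java imitation of PROCESS_CONTINUOUSLY; not Flink's implementation. */
final class FileWatchSketch {
    private final Path file;
    private FileTime lastSeen; // null until the first poll

    FileWatchSketch(Path file) {
        this.file = file;
    }

    /** One monitoring tick: re-reads the whole file if its modification time changed. */
    List<String> poll() {
        try {
            FileTime modified = Files.getLastModifiedTime(file);
            if (modified.equals(lastSeen)) {
                return Collections.emptyList();
            }
            lastSeen = modified;
            return Files.readAllLines(file, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Demo: returns the number of lines emitted by three consecutive polls. */
    static int[] demo() {
        try {
            Path p = Files.createTempFile("rules", ".ini");
            Files.write(p, Arrays.asList("[rule1]", "valid=1"), StandardCharsets.UTF_8);
            FileWatchSketch watcher = new FileWatchSketch(p);
            int first = watcher.poll().size();   // initial read
            int second = watcher.poll().size();  // file unchanged
            Files.write(p, Arrays.asList("[rule1]", "valid=0", "duration=1"), StandardCharsets.UTF_8);
            // Push the timestamp forward so the change is visible on coarse filesystem clocks
            Files.setLastModifiedTime(p, FileTime.fromMillis(System.currentTimeMillis() + 5000));
            int third = watcher.poll().size();   // full re-read after the edit
            Files.delete(p);
            return new int[] {first, second, third};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note the consequence for rule handling: after any edit, every line of the file is emitted again, so whatever consumes the stream has to treat rule messages as idempotent updates.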

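Merging Route messages with the file stream

Both kinds of messages are merged into one stream that serves as the input of the broadcast stream. Because the broadcast side still has to tell the two kinds apart, we preprocess both streams first, as follows: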

    // Route messages
    DataStream<String> routeMsg = env.addSource(new FlinkKafkaConsumer010<>(ROUTE_TOPIC, new SimpleStringSchema(), properties));

    // Merge the rules and the Route messages (routeMsg) into one stream
    DataStream<Tuple2<Integer, String>> ruleInfo = fileInfo.filter(
            new FilterFunction<Map<String, String>>() {
                @Override
                public boolean filter(Map<String, String> event) throws Exception {
                    // Drop the empty maps emitted for comments, blank lines and section headers
                    return !event.isEmpty();
                }
            }
    ).map(new MapFunction<Map<String, String>, String>() {
        @Override
        public String map(Map<String, String> rule) throws Exception {
            // Each record holds exactly one (section, line) pair, and the filter above
            // guarantees it is present, so no null ever needs to be returned
            Map.Entry<String, String> entry = rule.entrySet().iterator().next();
            return "ruleMsg#" + entry.getKey() + "#" + entry.getValue(); // rule configuration message
        }
    }).union(routeMsg).map(new MapFunction<String, Tuple2<Integer, String>>() {
        @Override
        public Tuple2<Integer, String> map(String msg) throws Exception {
            // startsWith avoids misclassifying a Route message that merely contains "ruleMsg"
            if (msg.startsWith("ruleMsg#")) {
                return Tuple2.of(1, msg.substring("ruleMsg#".length())); // rule message
            } else {
                return Tuple2.of(2, msg); // Route message
            }
        }
    });
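On the receiving side, each rule payload arrives as `section#line`, for example `rule1#valid=1`, and has to be split back apart. The following is a minimal sketch of that decoding step in plain Java; `RulePayloadDecoder` is a hypothetical helper, not part of the original job. The detail worth noticing is the split limit of 2: the formate value itself contains `=` characters, so an unlimited split would truncate it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that groups "section#key=value" payloads by section. */
final class RulePayloadDecoder {
    // section name -> (key -> value), in arrival order
    private final Map<String, Map<String, String>> sections = new LinkedHashMap<>();

    /** Accepts one payload such as "rule1#valid=1"; returns false if it is malformed. */
    boolean accept(String payload) {
        String[] sectionAndLine = payload.split("#", 2);
        if (sectionAndLine.length != 2) {
            return false;
        }
        // Limit 2 keeps later '=' characters inside the value, e.g. "... && B.c=1"
        String[] keyAndValue = sectionAndLine[1].split("=", 2);
        if (keyAndValue.length != 2) {
            return false;
        }
        sections.computeIfAbsent(sectionAndLine[0], s -> new LinkedHashMap<>())
                .put(keyAndValue[0].trim(), keyAndValue[1].trim());
        return true;
    }

    /** Returns the keys collected so far for one section (empty if unknown). */
    Map<String, String> section(String name) {
        return sections.getOrDefault(name, new LinkedHashMap<>());
    }
}
```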

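Configuration file format

From the scenario above we know that each kind of Alarm/Route message corresponds to at least one alarm rule. In the configuration file, each rule can therefore be its own section, and each section needs keys that tie it to the Alarm message and to the Route message, plus the rule itself. Since every alarm rule has a validity period, a time field is also required.
For saving and changing the state of dynamic rules in this kind of scenario, the Flink website has a good example. Here I describe another approach, which uses no other kind of state, only broadcast state. I added one more key to each section, valid, which marks whether the rule is in effect (0 = invalid, 1 = valid). To delete a rule, you must first set valid to 0 and only then remove it. This also prepares the ground for a future UI for configuring dynamic rules.

File

The configuration file looks like this: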

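#valid: whether the rule is in effect, 0 = invalid, 1 = valid. To delete a rule, first set it to 0, then remove it.
#resultType: the Route message type this rule corresponds to
#duration: how long the rule lasts, in minutes; 0 means always valid
#rule: the alarm message type to match; the first column is the field, the second column is its value
#formate: the rule expression
#Note: the Route message must be written as A and the alarm message as B. A first-level node is written A.age; a second-level node (an array) is written A.data[0].da.
#Strings are compared with equals; other (integer) fields use relational expressions.
#e.g. Alarm Msg: {a:1,xx:"ee",c:1,type:1,m:[{data:1, yy:2}]}    RouteMsg: {a:1,xx:"ee",c:1,resultType:1,resultMark:1,m:[{data:1, yy:2}]}
[rule1]
valid=1
resultType=1
duration=1
rule=type,1
formate=A.xx.equals(B.xx) && B.m[0].data=1
[rule2]
valid=1
resultType=2
duration=0
rule=type,2
formate=A.xx.equals(B.xx) && B.c=1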
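Once the five keys of a section have been collected, it is convenient to turn them into a typed object before matching. The sketch below is one way to do that in plain Java; `AlarmRule` and its field names are hypothetical, and it assumes all five keys are present in every section. The formate expression is kept as raw text because its evaluation is outside the scope of this post.

```java
import java.util.Map;

/** Typed view of one rule section; a sketch, not the author's class. */
final class AlarmRule {
    final boolean valid;        // valid=1 means the rule is in effect
    final int resultType;       // Route message type this rule belongs to
    final int durationMinutes;  // 0 means always valid
    final String alarmField;    // first column of "rule"
    final String alarmValue;    // second column of "rule"
    final String formate;       // match expression, kept as raw text

    private AlarmRule(boolean valid, int resultType, int durationMinutes,
                      String alarmField, String alarmValue, String formate) {
        this.valid = valid;
        this.resultType = resultType;
        this.durationMinutes = durationMinutes;
        this.alarmField = alarmField;
        this.alarmValue = alarmValue;
        this.formate = formate;
    }

    /** Builds a rule from the key/value pairs of one section. */
    static AlarmRule fromSection(Map<String, String> props) {
        String[] rule = required(props, "rule").split(",", 2);
        if (rule.length != 2) {
            throw new IllegalArgumentException("rule must be <field>,<value>");
        }
        return new AlarmRule(
                "1".equals(required(props, "valid")),
                Integer.parseInt(required(props, "resultType")),
                Integer.parseInt(required(props, "duration")),
                rule[0].trim(),
                rule[1].trim(),
                required(props, "formate"));
    }

    // Values are trimmed because hand-edited files often carry stray spaces
    private static String required(Map<String, String> props, String key) {
        String value = props.get(key);
        if (value == null) {
            throw new IllegalArgumentException("missing key: " + key);
        }
        return value.trim();
    }
}
```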

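Custom file format

Flink ships with several default file input formats. Here we implement our own, partly to suit our downstream processing and partly to show how a custom format is written. Reading through the Flink source, in SplitReader, an inner class of ContinuousFileReaderOperator, the reader thread contains this code: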

.......
while (!this.format.reachedEnd()) {
    Object var3 = this.checkpointLock;
    synchronized (this.checkpointLock) {
        e1 = this.format.nextRecord(e1);
        if (e1 == null) {
            break;
        }

        this.readerContext.collect(e1);
    }
}

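So to implement a custom FileInputFormat, the simple route is to override nextRecord. Since nextRecord in turn calls readRecord, we can override nextRecord and readRecord to build our own FileInputFormat. The class below extends DelimitedInputFormat and overrides only readRecord, which is called once per delimited line.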

import java.io.IOException;
import java.nio.charset.Charset;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.flink.api.common.io.DelimitedInputFormat;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.core.fs.Path;

// Custom file format: emits one {section -> "key=value"} map per configuration line
public class MyInputFormat extends DelimitedInputFormat<Map<String, String>> {
    private static final long serialVersionUID = 1L;
    private String charsetName = "UTF-8";
    // Section used for lines that appear before any [section] header.
    // This field carries state from line to line, so the file is assumed to be read sequentially.
    private String currentSection = "global";

    public MyInputFormat(Path filePath) {
        super(filePath, (Configuration) null);
    }

    public String getCharsetName() {
        return this.charsetName;
    }

    public void setCharsetName(String charsetName) {
        if (charsetName == null) {
            throw new IllegalArgumentException("Charset must not be null.");
        }
        this.charsetName = charsetName;
    }

    @Override
    public void configure(Configuration parameters) {
        super.configure(parameters);
        if (this.charsetName == null || !Charset.isSupported(this.charsetName)) {
            throw new RuntimeException("Unsupported charset: " + this.charsetName);
        }
    }

    @Override
    public Map<String, String> readRecord(Map<String, String> reusable, byte[] bytes, int offset, int numBytes) throws IOException {
        // With a '\n' delimiter, drop a trailing '\r' so that Windows line endings are handled
        if (this.getDelimiter() != null && this.getDelimiter().length == 1 && this.getDelimiter()[0] == '\n'
                && offset + numBytes >= 1 && bytes[offset + numBytes - 1] == '\r') {
            --numBytes;
        }

        // Strip the trailing comment and surrounding whitespace
        String str = removeIniComments(new String(bytes, offset, numBytes, this.charsetName));
        if (str.isEmpty()) {
            return Collections.emptyMap();
        }
        // A section header: remember the new section name and emit nothing
        if (str.startsWith("[")) {
            if (str.endsWith("]")) {
                currentSection = str.substring(1, str.length() - 1).trim();
            }
            return Collections.emptyMap();
        }
        // An ordinary line: emit it under the current section
        Map<String, String> result = new LinkedHashMap<>();
        result.put(currentSection, str);
        return result;
    }

    // Removes everything from the first '#' onwards, then trims
    private String removeIniComments(String source) {
        int hash = source.indexOf('#');
        String result = hash >= 0 ? source.substring(0, hash) : source;
        return result.trim();
    }

    @Override
    public String toString() {
        return "MyFileInputFormat (" + this.getFilePath() + ") - " + this.charsetName;
    }
}
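The line-level logic of readRecord can be exercised without a Flink cluster by replaying it over a list of strings. The sketch below mirrors the same three cases (comment or blank line, section header, ordinary line); `IniLineSketch` is a hypothetical test helper, and unlike the input format it simply skips lines instead of emitting empty maps.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Replays the readRecord line logic over plain strings; a hypothetical test helper. */
final class IniLineSketch {
    /** Returns one (section, line) pair per non-empty, non-header line. */
    static List<Map.Entry<String, String>> parse(List<String> lines) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        String section = "global"; // default for lines before any header
        for (String raw : lines) {
            int hash = raw.indexOf('#');
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            if (line.isEmpty()) {
                continue; // comment-only or blank line
            }
            if (line.startsWith("[")) {
                if (line.endsWith("]")) {
                    section = line.substring(1, line.length() - 1).trim();
                }
                continue; // headers produce no record
            }
            out.add(new SimpleEntry<>(section, line));
        }
        return out;
    }
}
```

Because everything after the first `#` is treated as a comment, a rule value can never contain a `#` character with this format.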

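Conclusion

This post briefly introduced a use case for Flink broadcast streams and walked through parts of the implementation with code. Storage of the broadcast stream and the alarm-rule matching itself are left out for reasons of space; they will be published on GitHub later for readers to study. The main features are:
1. Broadcast stream storage: storing, deleting, enabling and disabling rules
2. Alarm rule matching: validity-period checks, rule evaluation, etc.
3. formate conversion and parsing: string parsing, JSON parsing, etc.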
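As a small illustration of the validity-period check, the sketch below combines the valid flag with the duration field. It rests on an assumption the post does not spell out: that duration is counted from the moment the rule was activated. `RuleValidity`, `isActive` and `activatedAt` are hypothetical names.

```java
import java.time.Duration;
import java.time.Instant;

/** Sketch of the validity check; the activation-time semantics are an assumption. */
final class RuleValidity {
    /**
     * @param valid           the section's valid flag (1 = in effect)
     * @param durationMinutes the section's duration; 0 means always valid
     * @param activatedAt     when the rule took effect (assumed start of the period)
     * @param now             the time of the alarm being checked
     */
    static boolean isActive(int valid, int durationMinutes, Instant activatedAt, Instant now) {
        if (valid != 1) {
            return false; // disabled rules never match
        }
        if (durationMinutes == 0) {
            return true;  // 0 means the rule never expires
        }
        return now.isBefore(activatedAt.plus(Duration.ofMinutes(durationMinutes)));
    }
}
```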
