Real-Time Statistics on Apache Doris Audit Logs with Flink 1.17 Tumbling Windows

Background and requirements:

Our production data warehouse (Apache Doris) is now open to other business teams, so we need to count each team's access frequency in real time, identify users issuing abnormal requests, and achieve real-time monitoring with early warning. At the same time, slow-query logs must be retained for root-cause analysis, to reduce the negative impact of slow queries.

Concrete requirement: count each user's accesses to the warehouse within every 30-second window, and at the same time filter out slow queries and emit them separately for later analysis.

Approach:
1. Analyzing the Doris audit log:

As the samples below show, each log line consists of KV fields separated by "|" (one caveat: if a query in Doris is a slow query, two log lines are produced, one in the normal query-log format and one in the slow-query format). We can split each line on "|" and define an entity class to receive the fields.

# Log produced by a normal query
2023-04-21 18:07:56,218 [query] |Client=127.0.0.1:45716|User=default_cluster:bi_team|Db=default_cluster:information_schema|State=EOF|Time=326|ScanBytes=35123|ScanRows=1188636|ReturnRows=41|StmtId=20149317|QueryId=25bec2d69ce44452-8a63da4232e9a64d|IsQuery=true|feIp=172.22.197.240|Stmt=SELECT * FROM  base.test1 where `_is_delete`='1' LIMIT 5000|CpuTimeMS=12685|SqlHash=0841a9b7ad2049c4346f77fdca0129b1

# Log produced by a slow query:
2023-04-21 18:02:18,306 [slow_query] |Client=127.0.0.1:46080|User=default_cluster:datacenter|Db=default_cluster:base|State=EOF|Time=6924|ScanBytes=141439314|ScanRows=26983397|ReturnRows=1|StmtId=20148539|QueryId=b540f4d08ad64b8e-a9e8b090208fd3d7|IsQuery=true|feIp=172.22.197.240|Stmt=select count(*) ct from base.test2 where _is_delete = 1|CpuTimeMS=25330|SqlHash=bb0c5f8d00f311c556db053c439c59c0
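To make the format concrete, here is a minimal standalone sketch of splitting one raw line into its KV fields (the sample line is truncated for brevity; this mirrors the deserializer defined in section 4):

// Minimal sketch: split one audit-log line on "|" and pull out individual fields.
String line = "2023-04-21 18:07:56,218 [query] |Client=127.0.0.1:45716|User=default_cluster:bi_team";
String[] parts = line.split("\\|");
// parts[0] carries the timestamp and, in brackets, the log type.
String logType = parts[0].substring(parts[0].indexOf('[') + 1, parts[0].indexOf(']')); // "query"
// Every other element is "Key=Value"; the value is everything after the first '='.
String client = parts[1].substring(parts[1].indexOf('=') + 1); // "127.0.0.1:45716"
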
2. Importing dependencies:

This implementation uses the latest Flink 1.17 dependencies; the imports differ slightly from earlier versions.

 <dependencies>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-java</artifactId>
            <version>1.17.0</version>
            <scope>provided</scope>
        </dependency>


        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-clients</artifactId>
            <version>1.17.0</version>
            <scope>provided</scope>
        </dependency>


        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java</artifactId>
            <version>1.17.0</version>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-kafka</artifactId>
            <version>1.17.0</version>
        </dependency>


        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-statebackend-rocksdb</artifactId>
            <version>1.17.0</version>
            <scope>provided</scope>
        </dependency>


        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-runtime-web</artifactId>
            <version>1.17.0</version>
            <scope>test</scope>
        </dependency>


        <dependency>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-lang3</artifactId>
            <version>3.9</version>
        </dependency>
        <dependency>
            <groupId>commons-io</groupId>
            <artifactId>commons-io</artifactId>
            <version>2.4</version>
        </dependency>

        <!-- log -->
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
            <version>1.7.7</version>
        </dependency>
        <dependency>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
            <version>1.2.17</version>
        </dependency>

        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <version>1.18.12</version>
        </dependency>


        <!-- Flink's own logs only show up with the two dependencies below -->
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
            <version>1.7.25</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-simple</artifactId>
            <version>1.7.25</version>
        </dependency>

    </dependencies>
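
One note on the logging dependencies: slf4j-log4j12 and slf4j-simple are both SLF4J bindings, so having both on the classpath triggers SLF4J's "multiple bindings" warning and only one of them is actually used; for a cleaner setup, keep just one binding.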
3. Defining the Doris log entity:
package com.bigdata.entity;

import lombok.Data;

/**
 * Entity holding one parsed Doris audit-log record.
 *
 * @Date: 2023/04/20/14:00
 */
@Data
public class AuditQueryLogEntity {

    private long logDate;
    private String logType;
    private String client;
    private String user;
    private String db;
    private String status;
    private long time;
    private long scanBytes;
    private long scanRows;
    private long returnRows;
    private String stmtId;
    private String queryId;
    private String isQuery;
    private String feIp;
    private String stmt;
    private long cpuTimeMS;
    private String sqlHash;


    public AuditQueryLogEntity(long logDate, String logType, String client, String user, String db, String status, long time, long scanBytes, long scanRows, long returnRows, String stmtId, String queryId, String isQuery, String feIp, String stmt, long cpuTimeMS, String sqlHash) {
        this.logDate = logDate;
        this.logType = logType;
        this.client = client;
        this.user = user;
        this.db = db;
        this.status = status;
        this.time = time;
        this.scanBytes = scanBytes;
        this.scanRows = scanRows;
        this.returnRows = returnRows;
        this.stmtId = stmtId;
        this.queryId = queryId;
        this.isQuery = isQuery;
        this.feIp = feIp;
        this.stmt = stmt;
        this.cpuTimeMS = cpuTimeMS;
        this.sqlHash = sqlHash;
    }


    public static AuditQueryLogBuild builder() {
        return new AuditQueryLogBuild();
    }


    public static class AuditQueryLogBuild {
        private long logDate;
        private String logType;
        private String client;
        private String user;
        private String db;
        private String status;
        private long time;
        private long scanBytes;
        private long scanRows;
        private long returnRows;
        private String stmtId;
        private String queryId;
        private String isQuery;
        private String feIp;
        private String stmt;
        private long cpuTimeMS;
        private String sqlHash;


        public AuditQueryLogBuild logDate(long logDate) {
            this.logDate = logDate;
            return this;
        }

        public AuditQueryLogBuild logType(String logType) {
            this.logType = logType;
            return this;
        }

        public AuditQueryLogBuild client(String client) {
            this.client = client;
            return this;
        }

        public AuditQueryLogBuild user(String user) {
            this.user = user;
            return this;
        }

        public AuditQueryLogBuild db(String db) {
            this.db = db;
            return this;
        }

        public AuditQueryLogBuild status(String status) {
            this.status = status;
            return this;
        }

        public AuditQueryLogBuild time(long time) {
            this.time = time;
            return this;
        }

        public AuditQueryLogBuild scanBytes(long scanBytes) {
            this.scanBytes = scanBytes;
            return this;
        }

        public AuditQueryLogBuild scanRows(long scanRows) {
            this.scanRows = scanRows;
            return this;
        }

        public AuditQueryLogBuild returnRows(long returnRows) {
            this.returnRows = returnRows;
            return this;
        }

        public AuditQueryLogBuild stmtId(String stmtId) {
            this.stmtId = stmtId;
            return this;
        }

        public AuditQueryLogBuild queryId(String queryId) {
            this.queryId = queryId;
            return this;
        }

        public AuditQueryLogBuild isQuery(String isQuery) {
            this.isQuery = isQuery;
            return this;
        }

        public AuditQueryLogBuild feIp(String feIp) {
            this.feIp = feIp;
            return this;
        }

        public AuditQueryLogBuild stmt(String stmt) {
            this.stmt = stmt;
            return this;
        }

        public AuditQueryLogBuild cpuTimeMS(long cpuTimeMS) {
            this.cpuTimeMS = cpuTimeMS;
            return this;
        }

        public AuditQueryLogBuild sqlHash(String sqlHash) {
            this.sqlHash = sqlHash;
            return this;
        }

        public AuditQueryLogEntity build() {
            return new AuditQueryLogEntity(this.logDate,
                    this.logType,
                    this.client,
                    this.user,
                    this.db,
                    this.status,
                    this.time,
                    this.scanBytes,
                    this.scanRows,
                    this.returnRows,
                    this.stmtId,
                    this.queryId,
                    this.isQuery,
                    this.feIp,
                    this.stmt,
                    this.cpuTimeMS,
                    this.sqlHash);
        }
    }

}
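
With the builder in place, constructing an entity by hand (e.g., in a quick unit test) looks like the following minimal sketch; all values here are made up:

AuditQueryLogEntity entity = AuditQueryLogEntity.builder()
        .logType("query")
        .user("default_cluster:bi_team")
        .time(326L)
        .stmt("SELECT 1")
        .build();
System.out.println(entity.getUser() + " took " + entity.getTime() + " ms");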

4. Defining the Kafka deserialization schema:

By implementing DeserializationSchema, each raw message is parsed and wrapped into an AuditQueryLogEntity.

package com.bigdata.deserializationSchema;

import com.bigdata.entity.AuditQueryLogEntity;
import com.bigdata.utils.DateToTimeStampUtils;
import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.typeinfo.TypeInformation;

import java.io.IOException;
import java.nio.charset.StandardCharsets;

/**
 * Deserializes one raw Kafka message (an audit-log line) into an AuditQueryLogEntity.
 *
 * @Date: 2023/04/20/11:13
 */
public class AuditQueryLogDeSerializer implements DeserializationSchema<AuditQueryLogEntity> {

    private static final long serialVersionUID = 1L;

    @Override
    public AuditQueryLogEntity deserialize(byte[] message) throws IOException {
        String auditLog = new String(message, StandardCharsets.UTF_8);
        String[] logArray = auditLog.split("\\|");
        // A well-formed query/slow_query line splits into exactly 16 "|"-separated fields.
        if (logArray.length != 16) {
            return null;
        }
        // logArray[0] looks like "2023-04-21 18:07:56,218 [query] ": the log type
        // sits between the brackets, the timestamp before the comma.
        String logType = logArray[0].substring(logArray[0].indexOf("[") + 1, logArray[0].indexOf("]"));
        String logDate = logArray[0].substring(0, logArray[0].indexOf(","));

        try {
            return AuditQueryLogEntity
                    .builder()
                    .logDate(DateToTimeStampUtils.toTimeStamp(logDate))
                    .logType(logType)
                    .client(sub(logArray[1]))
                    .user(sub(logArray[2]))
                    .db(sub(logArray[3]))
                    .status(sub(logArray[4]))
                    .time(Long.parseLong(sub(logArray[5])))
                    .scanBytes(Long.parseLong(sub(logArray[6])))
                    .scanRows(Long.parseLong(sub(logArray[7])))
                    .returnRows(Long.parseLong(sub(logArray[8])))
                    .stmtId(sub(logArray[9]))
                    .queryId(sub(logArray[10]))
                    .isQuery(sub(logArray[11]))
                    .feIp(sub(logArray[12]))
                    .stmt(sub(logArray[13]))
                    .cpuTimeMS(Long.parseLong(sub(logArray[14])))
                    .sqlHash(sub(logArray[15])).build();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public boolean isEndOfStream(AuditQueryLogEntity nextElement) {
        return false;
    }

    @Override
    public TypeInformation<AuditQueryLogEntity> getProducedType() {
        return TypeInformation.of(AuditQueryLogEntity.class);
    }


    // Returns the value part of a "Key=Value" field.
    private String sub(String str) {
        return str.substring(str.indexOf("=") + 1);
    }

}
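
The DateToTimeStampUtils helper is referenced here and in the main class but was not shown in the post; a minimal sketch consistent with how it is called (assuming the pattern yyyy-MM-dd HH:mm:ss and the system default time zone) might look like:

package com.bigdata.utils;

import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class DateToTimeStampUtils {

    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // "2023-04-21 18:07:56" -> epoch milliseconds
    public static long toTimeStamp(String dateTime) {
        return LocalDateTime.parse(dateTime, FMT).atZone(ZoneId.systemDefault()).toInstant().toEpochMilli();
    }

    // epoch milliseconds -> "2023-04-21 18:25:00"
    public static String getDateTime(long timestamp) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(timestamp), ZoneId.systemDefault()).format(FMT);
    }
}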

5. The real-time aggregation main class:

Based on event time, count each user's queries within 30-second windows, and emit slow-query logs separately via a side output.

package com.bigdata;

import com.bigdata.deserializationSchema.AuditQueryLogDeSerializer;
import com.bigdata.entity.AuditQueryLogEntity;
import com.bigdata.utils.DateToTimeStampUtils;
import lombok.Data;
import org.apache.flink.api.common.eventtime.SerializableTimestampAssigner;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.RestOptions;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.datastream.WindowedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

import java.time.Duration;
import java.util.Objects;

/**
 * Counts each user's queries within 30-second windows, and emits slow-query
 * logs separately via a side output.
 *
 * @Date: 2023/04/20/15:13
 */
public class TumblingTimeWindow3 {


    @Data
    static class Event {
        long logTime;
        String user;
        int selectCount;
        String startTime;
        String endTime;


        public Event(long logTime, String user, int selectCount, String startTime, String endTime) {
            this.logTime = logTime;
            this.user = user;
            this.selectCount = selectCount;
            this.startTime = startTime;
            this.endTime = endTime;
        }
    }


    public static void main(String[] args) throws Exception {

        Configuration conf = new Configuration();
        conf.setInteger(RestOptions.PORT, 8081);

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(conf);
        // Set the checkpoint interval (ms)
        env.enableCheckpointing(5000);
        // Set the checkpoint storage path
        env.getCheckpointConfig().setCheckpointStorage("file:///D:\\test\\flink-demo\\checkpoint");
        // Define the Kafka source
        KafkaSource<AuditQueryLogEntity> kafkaSource = KafkaSource.<AuditQueryLogEntity>builder()
                .setTopics("test_topic")
                .setValueOnlyDeserializer(new AuditQueryLogDeSerializer())
                .setBootstrapServers("127.0.0.1:9092")
                .setStartingOffsets(OffsetsInitializer.latest())
                .setGroupId("test_v1").build();


        SingleOutputStreamOperator<AuditQueryLogEntity> dataStreamSource = env
                .fromSource(kafkaSource, WatermarkStrategy.noWatermarks(), "doris-fe-audit-source")
                .returns(AuditQueryLogEntity.class)
                .setParallelism(5);

        // Define the side-output tag for slow queries
        OutputTag<AuditQueryLogEntity> slowQueryOut = new OutputTag<AuditQueryLogEntity>("slow-query") {
        };

        // Route slow-query logs to the side output; normal query logs stay on the main stream
        SingleOutputStreamOperator<AuditQueryLogEntity> outputStreamOperator = dataStreamSource
                .filter(Objects::nonNull).process(new ProcessFunction<AuditQueryLogEntity, AuditQueryLogEntity>() {
                    @Override
                    public void processElement(AuditQueryLogEntity value, Context ctx, Collector<AuditQueryLogEntity> out) throws Exception {
                        if (Objects.equals(value.getLogType(), "query")) {
                            out.collect(value);
                        } else {
                            ctx.output(slowQueryOut, value);
                        }
                    }
                });

        // Print the slow-query side output
        outputStreamOperator.getSideOutput(slowQueryOut).print("slow-query : ");

        // Use a 30s tumbling event-time window to count each user's queries
        WindowedStream<Event, String, TimeWindow> window = outputStreamOperator
                .map(log -> new Event(log.getLogDate(), log.getUser(), 1, null, null))
                .assignTimestampsAndWatermarks(WatermarkStrategy
                        .<Event>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                        .withTimestampAssigner(new SerializableTimestampAssigner<Event>() {
                            @Override
                            public long extractTimestamp(Event element, long recordTimestamp) {
                                return element.getLogTime();
                            }
                        }))
                .keyBy(e -> e.user)
                .window(TumblingEventTimeWindows.of(Time.seconds(30)));

        window.aggregate(new MyLoadFunc(), new MyLoadAggResult()).print();

        env.execute("sid-out-job-test");

    }


    static class MyLoadFunc implements AggregateFunction<Event, Integer, Integer> {
        @Override
        public Integer createAccumulator() {
            return 0;
        }

        @Override
        public Integer add(Event loadResult, Integer accumulator) {
            return accumulator + 1;
        }

        @Override
        public Integer getResult(Integer accumulator) {
            return accumulator;
        }

        @Override
        public Integer merge(Integer acc, Integer acc1) {
            return acc + acc1;
        }
    }


    // Attaches the window's start/end time to the aggregated count.
    static class MyLoadAggResult extends ProcessWindowFunction<Integer, Event, String, TimeWindow> {
        @Override
        public void process(String key, Context context, Iterable<Integer> iterable, Collector<Event> out) throws Exception {
            String start = DateToTimeStampUtils.getDateTime(context.window().getStart());
            String end = DateToTimeStampUtils.getDateTime(context.window().getEnd());
            out.collect(new Event(0L, key, iterable.iterator().next(), start, end));
        }
    }
}
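
Printing the slow-query side output is fine for a demo; for the stated goal of retaining slow queries for later analysis, the side output would more likely go to a sink. A minimal sketch using Flink 1.17's KafkaSink (the topic name slow_query_topic is hypothetical; requires org.apache.flink.connector.kafka.sink.KafkaSink, KafkaRecordSerializationSchema, and org.apache.flink.api.common.serialization.SimpleStringSchema):

// Hypothetical sink for the slow-query side output; the topic name is made up.
KafkaSink<String> slowQuerySink = KafkaSink.<String>builder()
        .setBootstrapServers("127.0.0.1:9092")
        .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                .setTopic("slow_query_topic")
                .setValueSerializationSchema(new SimpleStringSchema())
                .build())
        .build();

outputStreamOperator.getSideOutput(slowQueryOut)
        .map(AuditQueryLogEntity::toString)  // Lombok @Data provides toString()
        .sinkTo(slowQuerySink);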
6. Results:

As the output shows, within each 30-second window the per-user access counts are produced, and one slow-query record was emitted via the side output.

3> TumblingTimeWindow3.Event(logTime=0, user=default_cluster:user1, selectCount=94, startTime=2023-04-21 18:25:00, endTime=2023-04-21 18:25:30)
6> TumblingTimeWindow3.Event(logTime=0, user=default_cluster:user2, selectCount=28, startTime=2023-04-21 18:25:00, endTime=2023-04-21 18:25:30)
2> TumblingTimeWindow3.Event(logTime=0, user=default_cluster:user3, selectCount=1, startTime=2023-04-21 18:25:00, endTime=2023-04-21 18:25:30)
2> TumblingTimeWindow3.Event(logTime=0, user=default_cluster:user4, selectCount=5, startTime=2023-04-21 18:25:00, endTime=2023-04-21 18:25:30)
slow-query : :6> AuditQueryLogEntity(logDate=1682072744000, logType=slow_query, client=127.0.0.1:49920, user=default_cluster:user3, db=default_cluster:base, status=OK, time=5135, scanBytes=0, scanRows=0, returnRows=0, stmtId=20151608, queryId=2830914d231a4a0e-b528d4c1a5b5848c, isQuery=false, feIp=172.2.11.5, stmt= select * from test , cpuTimeMS=0, sqlHash=4d8733b6018f9f46cefc6906d7061c97)
2> TumblingTimeWindow3.Event(logTime=0, user=default_cluster:user2, selectCount=2, startTime=2023-04-21 18:25:30, endTime=2023-04-21 18:26:00)
4> TumblingTimeWindow3.Event(logTime=0, user=default_cluster:user3, selectCount=2, startTime=2023-04-21 18:25:30, endTime=2023-04-21 18:26:00)
Finally:

This is just demo code and there is still plenty of room for optimization; corrections are welcome.
