Dinky: Data Development Source Code Explained

This article walks through the main functional modules of the Dinky platform, including Data Development, the Operations Center, and the Metadata Center, with a focus on how FlinkSqlEnv is used and how jobs are classified (FlinkSQL, common SQL, and JAR jobs). At the source level, it shows how different operations are performed depending on the job type, such as the execution flow for FlinkSQL and JAR jobs.

After using Dinky recently, I was impressed by how powerful it is. Platform development has become very popular in the last few years, and as an engineer who has never done platform work I always felt I was missing something. Having looked at Dinky, it is on par with (or even better than) the real-time platform built at a friend's company, so I decided to spend some time studying its features and code.

Let's first look at Dinky's UI pages:

Data Development: the interactive page for creating jobs; it also shows the lineage within a job

Operations Center: manages bringing jobs online and offline, and shows job details

Metadata Center: manages the tables and metadata of the various databases

Registration Center: manages Flink clusters, JAR packages, data sources, alerting, and so on

Authentication Center: manages roles and permissions

These are Dinky's most important functional modules and what they do.

1. Dinky Job Types

Dinky's job types are shown below.

Many of them are databases we already know. There is also a FlinkSqlEnv type, which is mainly used to initialize databases and tables: if there are databases and tables you use frequently, you can put their definitions in a FlinkSqlEnv, and when writing a Flink SQL job you simply select the corresponding environment to have those tables initialized.

Here I created a FlinkSqlEnv (named 11) that contains just the initialization SQL for a single source table.

Dinky jobs also support Scala, Java, and Python; these three are mainly used for writing UDFs and UDTFs.
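Since these job types mostly exist for UDFs, here is a minimal sketch of what such a UDF could look like in Java (the class name and logic are made up for illustration; it only assumes the Flink Table API is on the classpath):

import org.apache.flink.table.functions.ScalarFunction;

// Illustrative only: a trivial scalar UDF that upper-cases a string column.
public class ToUpperCase extends ScalarFunction {
    // Flink calls eval once per input value.
    public String eval(String s) {
        return s == null ? null : s.toUpperCase();
    }
}

Once registered, a function like this can be called from a Flink SQL job just like any built-in function.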

2. Job Type Classification and Job Submission in the Dinky Source Code

In the previous part we took a rough look at the job submission source code; in this section we go through it in detail, starting from the TaskServiceImpl#submitTask method.

 @Override
    public JobResult submitTask(Integer id) {
        // Fetch the task by taskId
        Task task = this.getTaskInfoById(id);
        Asserts.checkNull(task, Tips.TASK_NOT_EXIST);
        // If the task's dialect is not Flink SQL, run it through executeCommonSql instead
        if (Dialect.notFlinkSql(task.getDialect())) {
            return executeCommonSql(SqlDTO.build(task.getStatement(),
                    task.getDatabaseId(), null));
        }
        ProcessEntity process = null;
        // Obtain the process instance (ProcessEntity)
        if (StpUtil.isLogin()) {
            process = ProcessContextHolder.registerProcess(
                    ProcessEntity.init(ProcessType.FLINKSUBMIT, StpUtil.getLoginIdAsInt()));
        } else {
            process = ProcessEntity.NULL_PROCESS;
        }
        process.info("Initializing Flink job config...");
        JobConfig config = buildJobConfig(task);

        // If the gateway type is Kubernetes application mode, load the Docker image
        if (GatewayType.KUBERNETES_APPLICATION.equalsValue(config.getType())) {
            loadDocker(id, config.getClusterConfigurationId(), config.getGatewayConfig());
        }

        // Build the JobManager
        JobManager jobManager = JobManager.build(config);
        process.start();
        // If the task is a jar task, call jobManager.executeJar(); otherwise call jobManager.executeSql()
        if (!config.isJarTask()) {

            JobResult jobResult = jobManager.executeSql(task.getStatement());
            process.finish("Submit Flink SQL finished.");
            return jobResult;
        } else {
            JobResult jobResult = jobManager.executeJar();
            process.finish("Submit Flink Jar finished.");
            return jobResult;
        }
    }

Let's look at the first part of this code: it fetches the task by taskId and then uses the task's dialect to decide whether it is Flink SQL. If it is not, it calls executeCommonSql. So Dinky SQL jobs fall into only two categories, Flink SQL and common SQL; Hive, StarRocks, ClickHouse, and the rest all count as common SQL.
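For reference, what a check like Dialect.notFlinkSql amounts to is roughly the following (a hedged sketch only; the real Dialect enum in Dinky has more values and the exact implementation may differ):

// Sketch only: anything that is not a Flink SQL dialect is treated as common SQL.
public static boolean notFlinkSql(String dialect) {
    return !"FlinkSql".equalsIgnoreCase(dialect)
            && !"FlinkSqlEnv".equalsIgnoreCase(dialect);
}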

Next, let's see how it fetches the task information by taskId.

 @Override
    public Task getTaskInfoById(Integer id) {
        // Fetch the task entity
        Task task = this.getById(id);
        if (task != null) {
            task.parseConfig();
            // Fetch the statement; it stores the SQL we wrote
            Statement statement = statementService.getById(id);
            if (task.getClusterId() != null) {
                Cluster cluster = clusterService.getById(task.getClusterId());
                if (cluster != null) {
                    task.setClusterName(cluster.getAlias());
                }
            }
            if (statement != null) {
                task.setStatement(statement.getStatement());
            }
            JobInstance jobInstance = jobInstanceService.getJobInstanceByTaskId(id);
            if (Asserts.isNotNull(jobInstance) && !JobStatus.isDone(jobInstance.getStatus())) {
                task.setJobInstanceId(jobInstance.getId());
            } else {
                task.setJobInstanceId(0);
            }
        }
        return task;
    }

The getById method calls straight into baomidou (MyBatis-Plus) code, which looks the record up in the corresponding dlink_task table.

You can see that the dialect column of the dlink_task table holds exactly the job types we saw on the front-end page.

statementService.getById() then looks up the corresponding statement record in dlink_task_statement.

The statement column stores the SQL we wrote. Now that we have both the dialect and the SQL, the code can use the dialect to filter out non-Flink-SQL jobs and call the corresponding execution method.
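For readers unfamiliar with baomidou (MyBatis-Plus), the lookup relies on its standard ServiceImpl / BaseMapper pattern. A minimal sketch of the idea (the entity fields and mapper shown here are illustrative, not copied from Dinky's source):

import com.baomidou.mybatisplus.annotation.TableName;
import com.baomidou.mybatisplus.core.mapper.BaseMapper;
import com.baomidou.mybatisplus.extension.service.impl.ServiceImpl;

// Entity mapped to the dlink_task table; only a couple of fields shown.
@TableName("dlink_task")
class Task {
    private Integer id;
    private String dialect;
    // getters/setters omitted
}

interface TaskMapper extends BaseMapper<Task> {
}

// getById(id) is inherited from ServiceImpl and issues roughly:
//   SELECT * FROM dlink_task WHERE id = ?
class TaskServiceImpl extends ServiceImpl<TaskMapper, Task> {
}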

private JobResult executeCommonSql(SqlDTO sqlDTO) {
        JobResult result = new JobResult();
        result.setStatement(sqlDTO.getStatement());
        result.setStartTime(LocalDateTime.now());
        if (Asserts.isNull(sqlDTO.getDatabaseId())) {
            result.setSuccess(false);
            result.setError("请指定数据源");
            result.setEndTime(LocalDateTime.now());
            return result;
        } else {
            // 1. Fetch the data source
            DataBase dataBase = dataBaseService.getById(sqlDTO.getDatabaseId());
            if (Asserts.isNull(dataBase)) {
                result.setSuccess(false);
                result.setError("数据源不存在");
                result.setEndTime(LocalDateTime.now());
                return result;
            }
            // 2. Build the driver from the data source's driver config
            Driver driver = Driver.build(dataBase.getDriverConfig());
            // 3. Execute the SQL and collect the result
            JdbcSelectResult selectResult = driver.executeSql(sqlDTO.getStatement(), sqlDTO.getMaxRowNum());
            driver.close();
            result.setResult(selectResult);
            if (selectResult.isSuccess()) {
                result.setSuccess(true);
            } else {
                result.setSuccess(false);
                result.setError(selectResult.getError());
            }
            result.setEndTime(LocalDateTime.now());
            return result;
        }
    }

This method is simple, just three steps:

1. Fetch the data source

2. Build the driver

3. Execute the SQL

Note that executeSql has three implementing classes; you can explore them yourself.
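The implementations differ per database, but the core of driver.executeSql follows the classic JDBC pattern. A rough, simplified sketch (not Dinky's actual implementation; the class and method here are made up):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SimpleJdbcRunner {
    // Run a query and collect at most maxRowNum rows, one Map per row.
    List<Map<String, Object>> executeSql(Connection conn, String sql, int maxRowNum) throws SQLException {
        List<Map<String, Object>> rows = new ArrayList<>();
        try (Statement stmt = conn.createStatement(); ResultSet rs = stmt.executeQuery(sql)) {
            ResultSetMetaData meta = rs.getMetaData();
            while (rs.next() && rows.size() < maxRowNum) {
                Map<String, Object> row = new LinkedHashMap<>();
                for (int i = 1; i <= meta.getColumnCount(); i++) {
                    row.put(meta.getColumnLabel(i), rs.getObject(i));
                }
                rows.add(row);
            }
        }
        return rows;
    }
}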

Now let's look at the second part: executing Flink SQL and JAR jobs.

Let's start with Flink SQL. Part of this was covered earlier, so I won't repeat it and will only go over what is new.

public JobResult executeSql(String statement) {
        initClassLoader(config);
        // Obtain the process instance (ProcessEntity)
        ProcessEntity process = ProcessContextHolder.getProcess();
        // Initialize the job
        Job job = Job.init(runMode, config, executorSetting, executor, statement, useGateway);
        if (!useGateway) {
            job.setJobManagerAddress(environmentSetting.getAddress());
        }
        JobContextHolder.setJob(job);
        ready();
        String currentSql = "";
        // Classify the statements by SQL type into separate lists and wrap them in a JobParam
        JobParam jobParam = Explainer.build(executor, useStatementSet, sqlSeparator)
                .pretreatStatements(SqlUtil.getStatements(statement, sqlSeparator));
        try {
            // Initialize UDFs
            initUDF(jobParam.getUdfList(), runMode, config.getTaskId());

            // Execute the DDL statements
            for (StatementParam item : jobParam.getDdl()) {
                currentSql = item.getValue();
                executor.executeSql(item.getValue());
            }
            // There is at least one insert-type statement
            if (jobParam.getTrans().size() > 0) {
                // Use statement set or gateway only submit inserts.
                // Case 1: use statement set and gateway
                if (useStatementSet && useGateway) {
                    List<String> inserts = new ArrayList<>();
                    for (StatementParam item : jobParam.getTrans()) {
                        inserts.add(item.getValue());
                    }

                    // Use statement set need to merge all insert sql into a sql.
                    currentSql = String.join(sqlSeparator, inserts);
                    // Submit the SQL via the gateway
                    GatewayResult gatewayResult = submitByGateway(inserts);
                    // Use statement set only has one jid.
                    job.setResult(InsertResult.success(gatewayResult.getAppId()));
                    job.setJobId(gatewayResult.getAppId());
                    job.setJids(gatewayResult.getJids());
                    job.setJobManagerAddress(formatAddress(gatewayResult.getWebURL()));
                    if (gatewayResult.isSucess()) {
                        job.setStatus(Job.JobStatus.SUCCESS);
                    } else {
                        job.setStatus(Job.JobStatus.FAILED);
                        job.setError(gatewayResult.getError());
                    }
                    // Case 2: use statement set without gateway
                } else if (useStatementSet && !useGateway) {
                    List<String> inserts = new ArrayList<>();
                    for (StatementParam item : jobParam.getTrans()) {
                        if (item.getType().isInsert()) {
                            inserts.add(item.getValue());
                        }
                    }
                    if (inserts.size() > 0) {
                        currentSql = String.join(sqlSeparator, inserts);
                        // Remote mode can get the table result.
                        // Submit the statement set via executor.executeStatementSet
                        TableResult tableResult = executor.executeStatementSet(inserts);
                        if (tableResult.getJobClient().isPresent()) {
                            job.setJobId(tableResult.getJobClient().get().getJobID().toHexString());
                            job.setJids(new ArrayList<String>() {

                                {
                                    add(job.getJobId());
                                }
                            });
                        }
                        if (config.isUseResult()) {
                            // Build insert result.
                            IResult result = ResultBuilder
                                    .build(SqlType.INSERT, config.getMaxRowNum(), config.isUseChangeLog(),
                                            config.isUseAutoCancel(), executor.getTimeZone())
                                    .getResult(tableResult);
                            job.setResult(result);
                        }
                    }
                    // Case 3: use gateway without statement set
                } else if (!useStatementSet && useGateway) {
                    List<String> inserts = new ArrayList<>();
                    for (StatementParam item : jobParam.getTrans()) {
                        inserts.add(item.getValue());
                        // Only can submit the first of insert sql, when not use statement set.
                        break;
                    }
                    currentSql = String.join(sqlSeparator, inserts);
                    // Submit the SQL via submitByGateway
                    GatewayResult gatewayResult = submitByGateway(inserts);
                    job.setResult(InsertResult.success(gatewayResult.getAppId()));
                    job.setJobId(gatewayResult.getAppId());
                    job.setJids(gatewayResult.getJids());
                    job.setJobManagerAddress(formatAddress(gatewayResult.getWebURL()));
                    if (gatewayResult.isSucess()) {
                        job.setStatus(Job.JobStatus.SUCCESS);
                    } else {
                        job.setStatus(Job.JobStatus.FAILED);
                        job.setError(gatewayResult.getError());
                    }
                } else {
                    // Case 4: otherwise, submit each SQL through FlinkInterceptor
                    for (StatementParam item : jobParam.getTrans()) {
                        currentSql = item.getValue();
                        FlinkInterceptorResult flinkInterceptorResult = FlinkInterceptor.build(executor,
                                item.getValue());
                        if (Asserts.isNotNull(flinkInterceptorResult.getTableResult())) {
                            if (config.isUseResult()) {
                                IResult result = ResultBuilder
                                        .build(item.getType(), config.getMaxRowNum(), config.isUseChangeLog(),
                                                config.isUseAutoCancel(), executor.getTimeZone())
                                        .getResult(flinkInterceptorResult.getTableResult());
                                job.setResult(result);
                            }
                        } else {
                            if (!flinkInterceptorResult.isNoExecute()) {
                                TableResult tableResult = executor.executeSql(item.getValue());
                                if (tableResult.getJobClient().isPresent()) {
                                    job.setJobId(tableResult.getJobClient().get().getJobID().toHexString());
                                    job.setJids(new ArrayList<String>() {

                                        {
                                            add(job.getJobId());
                                        }
                                    });
                                }
                                if (config.isUseResult()) {
                                    IResult result = ResultBuilder.build(item.getType(), config.getMaxRowNum(),
                                            config.isUseChangeLog(), config.isUseAutoCancel(),
                                            executor.getTimeZone()).getResult(tableResult);
                                    job.setResult(result);
                                }
                            }
                        }
                        // Only can submit the first of insert sql, when not use statement set.
                        break;
                    }
                }
            }
            if (jobParam.getExecute().size() > 0) {
                if (useGateway) {
                    for (StatementParam item : jobParam.getExecute()) {
                        executor.executeSql(item.getValue());
                        if (!useStatementSet) {
                            break;
                        }
                    }
                    GatewayResult gatewayResult = null;
                    config.addGatewayConfig(executor.getSetConfig());
                    if (runMode.isApplicationMode()) {
                        gatewayResult = Gateway.build(config.getGatewayConfig()).submitJar();
                    } else {
                        StreamGraph streamGraph = executor.getStreamGraph();
                        streamGraph.setJobName(config.getJobName());
                        JobGraph jobGraph = streamGraph.getJobGraph();
                        if (Asserts.isNotNullString(config.getSavePointPath())) {
                            jobGraph.setSavepointRestoreSettings(
                                    SavepointRestoreSettings.forPath(config.getSavePointPath(), true));
                        }
                        gatewayResult = Gateway.build(config.getGatewayConfig()).submitJobGraph(jobGraph);
                    }
                    job.setResult(InsertResult.success(gatewayResult.getAppId()));
                    job.setJobId(gatewayResult.getAppId());
                    job.setJids(gatewayResult.getJids());
                    job.setJobManagerAddress(formatAddress(gatewayResult.getWebURL()));
                    if (gatewayResult.isSucess()) {
                        job.setStatus(Job.JobStatus.SUCCESS);
                    } else {
                        job.setStatus(Job.JobStatus.FAILED);
                        job.setError(gatewayResult.getError());
                    }
                } else {
                    for (StatementParam item : jobParam.getExecute()) {
                        executor.executeSql(item.getValue());
                        if (!useStatementSet) {
                            break;
                        }
                    }
                    JobClient jobClient = executor.executeAsync(config.getJobName());
                    if (Asserts.isNotNull(jobClient)) {
                        job.setJobId(jobClient.getJobID().toHexString());
                        job.setJids(new ArrayList<String>() {

                            {
                                add(job.getJobId());
                            }
                        });
                    }
                    if (config.isUseResult()) {
                        IResult result = ResultBuilder
                                .build(SqlType.EXECUTE, config.getMaxRowNum(), config.isUseChangeLog(),
                                        config.isUseAutoCancel(), executor.getTimeZone())
                                .getResult(null);
                        job.setResult(result);
                    }
                }
            }
            job.setEndTime(LocalDateTime.now());
            if (job.isFailed()) {
                failed();
            } else {
                job.setStatus(Job.JobStatus.SUCCESS);
                success();
            }
        } catch (Exception e) {
            String error = LogUtil.getError("Exception in executing FlinkSQL:\n" + currentSql, e);
            job.setEndTime(LocalDateTime.now());
            job.setStatus(Job.JobStatus.FAILED);
            job.setError(error);
            process.error(error);
            failed();
        } finally {
            close();
        }
        return job.getJobResult();
    }

Let's look at this part: here the SQL statements are classified by type. How are they classified, and into how many categories? Let's take a closer look.

Four lists are created up front: ddl, trans, execute, and statementList. Let's see how they differ.

public JobParam pretreatStatements(String[] statements) {
        List<StatementParam> ddl = new ArrayList<>();
        List<StatementParam> trans = new ArrayList<>();
        List<StatementParam> execute = new ArrayList<>();
        List<String> statementList = new ArrayList<>();
        List<UDF> udfList = new ArrayList<>();
        for (String item : statements) {
            String statement = executor.pretreatStatement(item);
            if (statement.isEmpty()) {
                continue;
            }
            SqlType operationType = Operations.getOperationType(statement);
            if (operationType.equals(SqlType.ADD)) {
                AddJarSqlParser.getAllFilePath(statement).forEach(JarPathContextHolder::addOtherPlugins);
                DinkyClassLoaderContextHolder.get()
                        .addURL(URLUtils.getURLs(JarPathContextHolder.getOtherPluginsFiles()));
            } else if (operationType.equals(SqlType.INSERT)
                    || operationType.equals(SqlType.SELECT)
                    || operationType.equals(SqlType.WITH)
                    || operationType.equals(SqlType.SHOW)
                    || operationType.equals(SqlType.DESCRIBE)
                    || operationType.equals(SqlType.DESC)) {
                trans.add(new StatementParam(statement, operationType));
                statementList.add(statement);
                if (!useStatementSet) {
                    break;
                }
            } else if (operationType.equals(SqlType.EXECUTE)) {
                execute.add(new StatementParam(statement, operationType));
            } else {
                UDF udf = UDFUtil.toUDF(statement);
                if (Asserts.isNotNull(udf)) {
                    udfList.add(UDFUtil.toUDF(statement));
                }
                ddl.add(new StatementParam(statement, operationType));
                statementList.add(statement);
            }
        }
        return new JobParam(statementList, ddl, trans, execute, CollUtil.removeNull(udfList));
    }

After the statements are obtained, the code loops over them and handles each one differently depending on its SqlType. SqlType is an enum with many values:

1. SqlType.ADD can be understood as ADD JAR style statements

2. INSERT, SELECT, WITH, SHOW, DESCRIBE, and DESC statements are all added to the trans list

3. EXECUTE statements go into the execute list

4. Everything else is checked for being a UDF (added to udfList if so) and then added to the ddl and statementList lists (see the illustration below)
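To make the bucketing concrete, here is a rough illustration with made-up statements of how a typical script would be split:

// Illustrative input, already split on the SQL separator into individual statements.
String[] statements = {
    "CREATE TABLE src (id INT) WITH ('connector' = 'datagen')",
    "CREATE TABLE snk (id INT) WITH ('connector' = 'print')",
    "INSERT INTO snk SELECT id FROM src"
};
// After pretreatStatements(statements):
//   ddl     -> the two CREATE TABLE statements (default branch above)
//   trans   -> the INSERT statement
//   execute -> empty (no EXECUTE statements in this script)
//   udfList -> empty (no UDF definitions in this script)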

Later, depending on whether the gateway and statement set are used, the submitByGateway(inserts) method is called; the inserts here are the trans list from above.

submitByGateway decides, based on whether the job runs in application mode, whether to call submitJar or submitJobGraph to submit the SQL job.
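A condensed sketch of what submitByGateway presumably does, based only on the gateway calls visible in executeSql above (the actual method body in Dinky may differ):

private GatewayResult submitByGateway(List<String> inserts) {
    // The insert statements have already been applied to the executor at this point.
    // Carry over any SET configuration collected by the executor.
    config.addGatewayConfig(executor.getSetConfig());
    if (runMode.isApplicationMode()) {
        // Application mode: ship the user jar and let the cluster build the job graph.
        return Gateway.build(config.getGatewayConfig()).submitJar();
    }
    // Per-job / session gateway mode: build the JobGraph locally and submit it.
    StreamGraph streamGraph = executor.getStreamGraph();
    streamGraph.setJobName(config.getJobName());
    return Gateway.build(config.getGatewayConfig()).submitJobGraph(streamGraph.getJobGraph());
}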

Finally, for JAR jobs the submission simply calls submitJar to submit the corresponding jar package.

3. Summary

The Dinky source code classifies jobs into three types: common SQL, Flink SQL, and Flink JAR, and then calls a different method to submit each of them.
