After using Dinky recently, I've found it to be a genuinely powerful tool. Platform development has become very popular over the last few years, and as an engineer who has never built a platform myself, I always felt I was missing something. Looking at Dinky, though, it is hardly any weaker than the real-time platform at a friend's company, and in some respects it is even better, so I decided to spend some time studying its features and code.

Let's first take a look at Dinky's pages:
Data Development: the interactive page for creating jobs; it also shows the lineage within a job

Operations Center: manages taking jobs online and offline and shows job details

Metadata Center: manages the tables and metadata of the various databases

Registration Center: manages Flink clusters, jar packages, data sources, alerts, and so on

Authentication Center: role and permission management
These are the more important functional modules in Dinky and what each of them does.
1. Dinky Job Types
The job types available in Dinky are shown below.


Many of them are databases we use every day. One entry, FlinkSqlEnv, is special: it is used to initialize databases and tables. If there are databases and tables you use all the time, put their DDL into a FlinkSqlEnv; when writing a Flink SQL job, simply select that environment and the corresponding tables are initialized for you.

Here I created a FlinkSqlEnv named "11" that contains just the initialization SQL for one source table.

Dinky also supports Scala, Java, and Python jobs; these three are mainly used for writing UDFs and UDTFs.
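As a quick illustration of what such a UDF job contains, here is a minimal Java scalar function sketch. Only the Flink ScalarFunction API is real; the class name and logic are made up for demonstration.

import org.apache.flink.table.functions.ScalarFunction;

// Minimal scalar UDF sketch: trims the input and upper-cases it.
// Once packaged and registered (e.g. via CREATE FUNCTION), it can be
// called from a Flink SQL job like any built-in function.
public class MyUpperFunction extends ScalarFunction {
    public String eval(String input) {
        if (input == null) {
            return null;
        }
        return input.trim().toUpperCase();
    }
}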
2. How the Dinky Source Classifies Jobs and a Deep Dive into Job Submission
In the previous part we took a rough look at the job-submission code; this time let's go through it in detail, starting from the TaskServiceImpl#submitTask method.
@Override
public JobResult submitTask(Integer id) {
// Fetch the task info by taskId
Task task = this.getTaskInfoById(id);
Asserts.checkNull(task, Tips.TASK_NOT_EXIST);
// If the task's dialect is not Flink SQL, execute the SQL via executeCommonSql instead
if (Dialect.notFlinkSql(task.getDialect())) {
return executeCommonSql(SqlDTO.build(task.getStatement(),
task.getDatabaseId(), null));
}
ProcessEntity process = null;
// Get the ProcessEntity instance
if (StpUtil.isLogin()) {
process = ProcessContextHolder.registerProcess(
ProcessEntity.init(ProcessType.FLINKSUBMIT, StpUtil.getLoginIdAsInt()));
} else {
process = ProcessEntity.NULL_PROCESS;
}
process.info("Initializing Flink job config...");
JobConfig config = buildJobConfig(task);
// If the GatewayType is kubernetes application, load the Docker container
if (GatewayType.KUBERNETES_APPLICATION.equalsValue(config.getType())) {
loadDocker(id, config.getClusterConfigurationId(), config.getGatewayConfig());
}
// Build the JobManager
JobManager jobManager = JobManager.build(config);
process.start();
// If it is configured as a jar task call jobManager.executeJar(), otherwise call jobManager.executeSql()
if (!config.isJarTask()) {
JobResult jobResult = jobManager.executeSql(task.getStatement());
process.finish("Submit Flink SQL finished.");
return jobResult;
} else {
JobResult jobResult = jobManager.executeJar();
process.finish("Submit Flink Jar finished.");
return jobResult;
}
}
Let's look at the first part of the code: it fetches the task by taskId, then checks the task's dialect to decide whether it is Flink SQL; if not, executeCommonSql is called. So Dinky SQL jobs end up in just two categories, Flink SQL and common SQL, and Hive, StarRocks, and ClickHouse all count as common SQL.
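To make that two-way split concrete, here is an illustrative sketch of the dispatch. The enum values are assumptions for demonstration only; the real values live in Dinky's Dialect enum.

// Illustrative only: mirrors the split in submitTask, not Dinky's actual Dialect enum.
enum JobDialect {
    FLINK_SQL, FLINK_JAR,                 // handled by JobManager (the Flink path)
    MYSQL, HIVE, CLICKHOUSE, STARROCKS;   // handled as common SQL via a JDBC-style driver

    boolean isCommonSql() {
        // everything that is not Flink SQL / Flink JAR falls back to executeCommonSql
        return this != FLINK_SQL && this != FLINK_JAR;
    }
}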

Next, let's see how the task is looked up by taskId.
@Override
public Task getTaskInfoById(Integer id) {
// Load the task
Task task = this.getById(id);
if (task != null) {
task.parseConfig();
// Load the statement record, which stores the SQL we wrote
Statement statement = statementService.getById(id);
if (task.getClusterId() != null) {
Cluster cluster = clusterService.getById(task.getClusterId());
if (cluster != null) {
task.setClusterName(cluster.getAlias());
}
}
if (statement != null) {
task.setStatement(statement.getStatement());
}
JobInstance jobInstance = jobInstanceService.getJobInstanceByTaskId(id);
if (Asserts.isNotNull(jobInstance) && !JobStatus.isDone(jobInstance.getStatus())) {
task.setJobInstanceId(jobInstance.getId());
} else {
task.setJobInstanceId(0);
}
}
return task;
}
The getById method calls straight into MyBatis-Plus (the baomidou library), which looks the record up in the corresponding dlink_task table.
You can see that the dialect column of dlink_task is exactly the set of job types we saw on the front-end page.
statementService.getById() then looks up the corresponding statement record in dlink_task_statement.

You can see that the statement column stores the SQL we wrote. Now that we have both the dialect and the SQL, the non-Flink-SQL jobs can be filtered out by dialect and handed to the corresponding execution method.
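A quick aside for anyone who has not used MyBatis-Plus: getById works because the entity class is mapped to its table and ServiceImpl supplies the generic CRUD methods. The sketch below shows that pattern with only the columns discussed here; the class names and annotations are a simplified assumption, not a copy of Dinky's actual entities.

import com.baomidou.mybatisplus.annotation.TableName;
import com.baomidou.mybatisplus.core.mapper.BaseMapper;
import com.baomidou.mybatisplus.extension.service.impl.ServiceImpl;

// Entity mapped to dlink_task; only the columns mentioned above are shown.
@TableName("dlink_task")
class TaskRecord {
    private Integer id;
    private String dialect;     // the job type shown on the Data Development page
    private Integer clusterId;
    // getters/setters omitted
}

// In a real app the mapper must be registered with MyBatis (e.g. via @MapperScan).
interface TaskRecordMapper extends BaseMapper<TaskRecord> {
}

class TaskRecordService extends ServiceImpl<TaskRecordMapper, TaskRecord> {
    TaskRecord load(Integer id) {
        return this.getById(id);   // roughly "SELECT ... FROM dlink_task WHERE id = ?"
    }
}

With that aside done, here is executeCommonSql, the method that handles those non-Flink-SQL jobs: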
private JobResult executeCommonSql(SqlDTO sqlDTO) {
JobResult result = new JobResult();
result.setStatement(sqlDTO.getStatement());
result.setStartTime(LocalDateTime.now());
if (Asserts.isNull(sqlDTO.getDatabaseId())) {
result.setSuccess(false);
result.setError("请指定数据源");
result.setEndTime(LocalDateTime.now());
return result;
} else {
// 1. Get the data source
DataBase dataBase = dataBaseService.getById(sqlDTO.getDatabaseId());
if (Asserts.isNull(dataBase)) {
result.setSuccess(false);
result.setError("数据源不存在");
result.setEndTime(LocalDateTime.now());
return result;
}
// 2. Build the driver
Driver driver = Driver.build(dataBase.getDriverConfig());
// 3. Execute the SQL
JdbcSelectResult selectResult = driver.executeSql(sqlDTO.getStatement(), sqlDTO.getMaxRowNum());
driver.close();
result.setResult(selectResult);
if (selectResult.isSuccess()) {
result.setSuccess(true);
} else {
result.setSuccess(false);
result.setError(selectResult.getError());
}
result.setEndTime(LocalDateTime.now());
return result;
}
}
This method is simple, just three steps:
1. Get the data source
2. Build the driver
3. Execute the SQL
Note that executeSql has three implementation classes; they are worth exploring on your own.
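To give a feel for what one of those implementations does under the hood, here is a minimal, self-contained JDBC sketch. It is not Dinky's Driver interface; the class name and return type are made up, and real implementations also handle non-query statements, connection management, and per-database dialect quirks.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Rough shape of a common-SQL driver: open a JDBC connection,
// run the statement, and keep at most maxRowNum rows of the result.
class SimpleJdbcDriver implements AutoCloseable {
    private final Connection connection;

    SimpleJdbcDriver(String url, String user, String password) throws SQLException {
        this.connection = DriverManager.getConnection(url, user, password);
    }

    List<Map<String, Object>> executeSql(String sql, int maxRowNum) throws SQLException {
        List<Map<String, Object>> rows = new ArrayList<>();
        try (Statement stmt = connection.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            ResultSetMetaData meta = rs.getMetaData();
            while (rs.next() && rows.size() < maxRowNum) {
                Map<String, Object> row = new LinkedHashMap<>();
                for (int i = 1; i <= meta.getColumnCount(); i++) {
                    row.put(meta.getColumnLabel(i), rs.getObject(i));
                }
                rows.add(row);
            }
        }
        return rows;
    }

    @Override
    public void close() throws SQLException {
        connection.close();
    }
}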

Next, let's look at the second part: executing Flink SQL and JAR jobs.

Flink SQL first. Part of this was already covered earlier, so I won't repeat it and will only go through what hasn't been discussed yet.
public JobResult executeSql(String statement) {
initClassLoader(config);
// Get the ProcessEntity instance
ProcessEntity process = ProcessContextHolder.getProcess();
// Initialize the Job
Job job = Job.init(runMode, config, executorSetting, executor, statement, useGateway);
if (!useGateway) {
job.setJobManagerAddress(environmentSetting.getAddress());
}
JobContextHolder.setJob(job);
ready();
String currentSql = "";
// Classify the statements by SQL type into different lists and wrap them as a JobParam
JobParam jobParam = Explainer.build(executor, useStatementSet, sqlSeparator)
.pretreatStatements(SqlUtil.getStatements(statement, sqlSeparator));
try {
// Initialize the UDFs
initUDF(jobParam.getUdfList(), runMode, config.getTaskId());
// Execute the DDL statements
for (StatementParam item : jobParam.getDdl()) {
currentSql = item.getValue();
executor.executeSql(item.getValue());
}
// The list of INSERT-type statements is non-empty
if (jobParam.getTrans().size() > 0) {
// Use statement set or gateway only submit inserts.
// Using statement set with gateway
if (useStatementSet && useGateway) {
List<String> inserts = new ArrayList<>();
for (StatementParam item : jobParam.getTrans()) {
inserts.add(item.getValue());
}
// Use statement set need to merge all insert sql into a sql.
currentSql = String.join(sqlSeparator, inserts);
// Submit the SQL through the gateway
GatewayResult gatewayResult = submitByGateway(inserts);
// Use statement set only has one jid.
job.setResult(InsertResult.success(gatewayResult.getAppId()));
job.setJobId(gatewayResult.getAppId());
job.setJids(gatewayResult.getJids());
job.setJobManagerAddress(formatAddress(gatewayResult.getWebURL()));
if (gatewayResult.isSucess()) {
job.setStatus(Job.JobStatus.SUCCESS);
} else {
job.setStatus(Job.JobStatus.FAILED);
job.setError(gatewayResult.getError());
}
// Using statement set without gateway
} else if (useStatementSet && !useGateway) {
List<String> inserts = new ArrayList<>();
for (StatementParam item : jobParam.getTrans()) {
if (item.getType().isInsert()) {
inserts.add(item.getValue());
}
}
if (inserts.size() > 0) {
currentSql = String.join(sqlSeparator, inserts);
// Remote mode can get the table result.
// Submit the statement set via executor.executeStatementSet
TableResult tableResult = executor.executeStatementSet(inserts);
if (tableResult.getJobClient().isPresent()) {
job.setJobId(tableResult.getJobClient().get().getJobID().toHexString());
job.setJids(new ArrayList<String>() {
{
add(job.getJobId());
}
});
}
if (config.isUseResult()) {
// Build insert result.
IResult result = ResultBuilder
.build(SqlType.INSERT, config.getMaxRowNum(), config.isUseChangeLog(),
config.isUseAutoCancel(), executor.getTimeZone())
.getResult(tableResult);
job.setResult(result);
}
}
// Using gateway without statement set
} else if (!useStatementSet && useGateway) {
List<String> inserts = new ArrayList<>();
for (StatementParam item : jobParam.getTrans()) {
inserts.add(item.getValue());
// Only can submit the first of insert sql, when not use statement set.
break;
}
currentSql = String.join(sqlSeparator, inserts);
// Submit the SQL via submitByGateway
GatewayResult gatewayResult = submitByGateway(inserts);
job.setResult(InsertResult.success(gatewayResult.getAppId()));
job.setJobId(gatewayResult.getAppId());
job.setJids(gatewayResult.getJids());
job.setJobManagerAddress(formatAddress(gatewayResult.getWebURL()));
if (gatewayResult.isSucess()) {
job.setStatus(Job.JobStatus.SUCCESS);
} else {
job.setStatus(Job.JobStatus.FAILED);
job.setError(gatewayResult.getError());
}
} else {
// Otherwise, submit the SQL through FlinkInterceptor
for (StatementParam item : jobParam.getTrans()) {
currentSql = item.getValue();
FlinkInterceptorResult flinkInterceptorResult = FlinkInterceptor.build(executor,
item.getValue());
if (Asserts.isNotNull(flinkInterceptorResult.getTableResult())) {
if (config.isUseResult()) {
IResult result = ResultBuilder
.build(item.getType(), config.getMaxRowNum(), config.isUseChangeLog(),
config.isUseAutoCancel(), executor.getTimeZone())
.getResult(flinkInterceptorResult.getTableResult());
job.setResult(result);
}
} else {
if (!flinkInterceptorResult.isNoExecute()) {
TableResult tableResult = executor.executeSql(item.getValue());
if (tableResult.getJobClient().isPresent()) {
job.setJobId(tableResult.getJobClient().get().getJobID().toHexString());
job.setJids(new ArrayList<String>() {
{
add(job.getJobId());
}
});
}
if (config.isUseResult()) {
IResult result = ResultBuilder.build(item.getType(), config.getMaxRowNum(),
config.isUseChangeLog(), config.isUseAutoCancel(),
executor.getTimeZone()).getResult(tableResult);
job.setResult(result);
}
}
}
// Only can submit the first of insert sql, when not use statement set.
break;
}
}
}
if (jobParam.getExecute().size() > 0) {
if (useGateway) {
for (StatementParam item : jobParam.getExecute()) {
executor.executeSql(item.getValue());
if (!useStatementSet) {
break;
}
}
GatewayResult gatewayResult = null;
config.addGatewayConfig(executor.getSetConfig());
if (runMode.isApplicationMode()) {
gatewayResult = Gateway.build(config.getGatewayConfig()).submitJar();
} else {
StreamGraph streamGraph = executor.getStreamGraph();
streamGraph.setJobName(config.getJobName());
JobGraph jobGraph = streamGraph.getJobGraph();
if (Asserts.isNotNullString(config.getSavePointPath())) {
jobGraph.setSavepointRestoreSettings(
SavepointRestoreSettings.forPath(config.getSavePointPath(), true));
}
gatewayResult = Gateway.build(config.getGatewayConfig()).submitJobGraph(jobGraph);
}
job.setResult(InsertResult.success(gatewayResult.getAppId()));
job.setJobId(gatewayResult.getAppId());
job.setJids(gatewayResult.getJids());
job.setJobManagerAddress(formatAddress(gatewayResult.getWebURL()));
if (gatewayResult.isSucess()) {
job.setStatus(Job.JobStatus.SUCCESS);
} else {
job.setStatus(Job.JobStatus.FAILED);
job.setError(gatewayResult.getError());
}
} else {
for (StatementParam item : jobParam.getExecute()) {
executor.executeSql(item.getValue());
if (!useStatementSet) {
break;
}
}
JobClient jobClient = executor.executeAsync(config.getJobName());
if (Asserts.isNotNull(jobClient)) {
job.setJobId(jobClient.getJobID().toHexString());
job.setJids(new ArrayList<String>() {
{
add(job.getJobId());
}
});
}
if (config.isUseResult()) {
IResult result = ResultBuilder
.build(SqlType.EXECUTE, config.getMaxRowNum(), config.isUseChangeLog(),
config.isUseAutoCancel(), executor.getTimeZone())
.getResult(null);
job.setResult(result);
}
}
}
job.setEndTime(LocalDateTime.now());
if (job.isFailed()) {
failed();
} else {
job.setStatus(Job.JobStatus.SUCCESS);
success();
}
} catch (Exception e) {
String error = LogUtil.getError("Exception in executing FlinkSQL:\n" + currentSql, e);
job.setEndTime(LocalDateTime.now());
job.setStatus(Job.JobStatus.FAILED);
job.setError(error);
process.error(error);
failed();
} finally {
close();
}
return job.getJobResult();
}
Look at this part: the SQL statements are classified by type. How exactly are they classified, and into how many categories? Let's take a closer look.
Four lists are created up front: ddl, trans, execute, and statementList (plus a udfList for UDF definitions). Let's see how they differ.
public JobParam pretreatStatements(String[] statements) {
List<StatementParam> ddl = new ArrayList<>();
List<StatementParam> trans = new ArrayList<>();
List<StatementParam> execute = new ArrayList<>();
List<String> statementList = new ArrayList<>();
List<UDF> udfList = new ArrayList<>();
for (String item : statements) {
String statement = executor.pretreatStatement(item);
if (statement.isEmpty()) {
continue;
}
SqlType operationType = Operations.getOperationType(statement);
if (operationType.equals(SqlType.ADD)) {
AddJarSqlParser.getAllFilePath(statement).forEach(JarPathContextHolder::addOtherPlugins);
DinkyClassLoaderContextHolder.get()
.addURL(URLUtils.getURLs(JarPathContextHolder.getOtherPluginsFiles()));
} else if (operationType.equals(SqlType.INSERT)
|| operationType.equals(SqlType.SELECT)
|| operationType.equals(SqlType.WITH)
|| operationType.equals(SqlType.SHOW)
|| operationType.equals(SqlType.DESCRIBE)
|| operationType.equals(SqlType.DESC)) {
trans.add(new StatementParam(statement, operationType));
statementList.add(statement);
if (!useStatementSet) {
break;
}
} else if (operationType.equals(SqlType.EXECUTE)) {
execute.add(new StatementParam(statement, operationType));
} else {
UDF udf = UDFUtil.toUDF(statement);
if (Asserts.isNotNull(udf)) {
udfList.add(UDFUtil.toUDF(statement));
}
ddl.add(new StatementParam(statement, operationType));
statementList.add(statement);
}
}
return new JobParam(statementList, ddl, trans, execute, CollUtil.removeNull(udfList));
}
After obtaining the statements, the code loops over them and performs a different action depending on the SqlType. SqlType is an enum with quite a few values.


1. SqlType.ADD can be understood as "ADD JAR"-style statements
2. INSERT, SELECT, WITH, SHOW, DESCRIBE, and DESC statements are all added to the trans list
3. EXECUTE statements are added to the execute list
4. Everything else is checked for being a UDF definition (added to udfList if so) and goes into the ddl and statementList lists; a concrete example follows below
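As a concrete example, here is how a small made-up script would be routed by pretreatStatements; the statements themselves are illustrative, only the routing follows the code above.

// Illustrative only -- the statements are invented; the comments show
// which list each one ends up in according to pretreatStatements.
public class ClassificationExample {
    public static void main(String[] args) {
        String[] statements = {
            "ADD JAR 'hdfs:///udf/my-udf.jar'",                                  // SqlType.ADD: jar put on the classloader
            "CREATE TABLE source_tbl (id INT) WITH ('connector' = 'datagen')",   // ddl + statementList
            "CREATE TABLE sink_tbl (id INT) WITH ('connector' = 'print')",       // ddl + statementList
            "INSERT INTO sink_tbl SELECT id FROM source_tbl",                    // trans + statementList (what actually gets submitted)
            "EXECUTE CDCSOURCE demo_sync WITH ('connector' = 'mysql-cdc')"       // execute list
        };
        for (String statement : statements) {
            System.out.println(statement);
        }
    }
}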
Afterwards, depending on whether the gateway and statement set are used, the submitByGateway(inserts) method is called; the inserts here are exactly the trans list we just built.

submitByGateway then checks whether the run mode is application mode and submits the SQL by calling either submitJar or submitJobGraph.

Finally, the jar-type job is submitted directly by calling submitJar with the corresponding jar package.
3. Summary
The Dinky source code divides jobs into three categories, common SQL, Flink SQL, and Flink JAR, and submits each through a different method.
This post covered Dinky's main functional modules, including Data Development, the Operations Center, and the Metadata Center, looked at how FlinkSqlEnv is used, and examined how jobs are classified (Flink SQL, common SQL, and jar jobs). At the source level it showed how each job type is executed, namely the flows for Flink SQL and JAR jobs.