Creating a MySQL-to-Hive SeaTunnel Task on DolphinScheduler

To create a MySQL-to-Hive SeaTunnel task on DolphinScheduler, follow these steps:

1. First obtain as many taskCodes as you need.

Long taskCode = getClient().opsForProcess().generateTaskCode(projectCode, 1).get(0);

2. Create a MySqlSource object. result_table_name, url, user, password, and query are required.

public class MySqlSource extends Source {
  private String url;
  private String driver = "com.mysql.cj.jdbc.Driver";
  private String user;
  private String password;
  private Integer connection_check_timeout_sec = 100;
  private String query;
  private String partition_column;
  private Integer partition_num;

  /** @param result_table_name name of the temporary result table */
  public MySqlSource(String result_table_name) {
    super(result_table_name);
  }
}
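
Since five of the source fields are required, it can help to check them before generating a script. The following is a minimal sketch of such a check. `MySqlSourceCheck` and `missingRequired` are hypothetical names that are not part of the SDK, and the fields are passed as a plain map so the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of the SDK: reports which required source fields are unset. */
final class MySqlSourceCheck {

  /** Returns the names of required fields that are null or blank, in a fixed order. */
  static List<String> missingRequired(Map<String, String> fields) {
    String[] required = {"result_table_name", "url", "user", "password", "query"};
    List<String> missing = new ArrayList<>();
    for (String name : required) {
      String value = fields.get(name);
      if (value == null || value.trim().isEmpty()) {
        missing.add(name);
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    Map<String, String> fields = new LinkedHashMap<>();
    fields.put("result_table_name", "fake");
    fields.put("url", "jdbc:mysql://192.168.79.100:3306/test");
    fields.put("user", "root");
    // password and query are deliberately left unset
    System.out.println(missingRequired(fields)); // prints [password, query]
  }
}
```

An empty list means every required field has a value.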

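3. Create a TransformParam as needed. A TransformParam can hold multiple Transforms. Each Transform processes a source table and produces a result table for a later transform or for the sink. A transform's source_table_name is the result table of the previous stage, and its result_table_name is the transform's own result table.

  CopyTransform: copies specified columns of the source to produce additional columns

  FieldMapperTransform: field mapping

  FilterRowKindTransform: filters rows by row kind, such as INSERT or UPDATE_BEFORE

  FilterTransform: field filtering

  ReplaceTransform: replaces matching values in a field of the source

  SplitTransform: splits a field of the source by a separator into new columns

  SQLTransform: runs SQL on the source table; complex SQL such as join, aggregation, and like is not supported

  Supported functions: see "SQL Functions" in the Apache SeaTunnel documentation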
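
The naming rule for chained transforms is easy to get wrong, so here is a minimal sketch that walks a chain and checks that each step reads the table produced by the stage before it. `TransformStep` and `TransformChain` are hypothetical stand-ins, not SDK classes, and the table names are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for one transform step; not an SDK class. */
final class TransformStep {
  final String plugin;
  final String sourceTable;
  final String resultTable;

  TransformStep(String plugin, String sourceTable, String resultTable) {
    this.plugin = plugin;
    this.sourceTable = sourceTable;
    this.resultTable = resultTable;
  }
}

final class TransformChain {

  /**
   * Checks that every step consumes the table produced by the previous stage and
   * returns the last result table, which the sink should use as its source_table_name.
   */
  static String finalTable(String sourceResultTable, List<TransformStep> steps) {
    String current = sourceResultTable;
    for (TransformStep step : steps) {
      if (!current.equals(step.sourceTable)) {
        throw new IllegalArgumentException(
            step.plugin + " reads '" + step.sourceTable
                + "' but the previous stage produced '" + current + "'");
      }
      current = step.resultTable;
    }
    return current;
  }

  public static void main(String[] args) {
    List<TransformStep> steps = Arrays.asList(
        new TransformStep("Copy", "fake", "fake_copied"),
        new TransformStep("Sql", "fake_copied", "fake_final"));
    System.out.println(finalTable("fake", steps)); // prints fake_final
  }
}
```

With no transforms at all, the method returns the source's own result table, which matches the case where the sink reads directly from the source.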

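4. Create a HiveSink object. All four parameters are required: dbName is the database that contains the target table, tableName is the target table, metastoreUri is the address of the Hive metastore server, and source_table_name is the final result table produced by the preceding stages.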

public class HiveSink extends Sink {
  private String tableName;
  private String dbName;
  /** For example thrift://192.168.79.51:9083 */
  private String metastoreUri;

  /** @param source_table_name name of the source table */
  public HiveSink(String source_table_name) {
    super(source_table_name);
  }
}
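
To make the four parameters concrete, the sketch below renders a sink block of the kind a SeaTunnel job config contains. The option names `table_name` (in `db.table` form) and `metastore_uri` follow the SeaTunnel Hive sink connector as I understand it; whether the SDK's generator emits exactly this text is an assumption, and `HiveSinkBlock` is a hypothetical helper.

```java
/** Hypothetical helper, not part of the SDK: renders a Hive sink block as text. */
final class HiveSinkBlock {

  /** Joins dbName and tableName into the db.table form and renders the sink block. */
  static String render(
      String dbName, String tableName, String metastoreUri, String sourceTableName) {
    return String.join(
        "\n",
        "sink {",
        "  Hive {",
        "    source_table_name = \"" + sourceTableName + "\"",
        "    table_name = \"" + dbName + "." + tableName + "\"",
        "    metastore_uri = \"" + metastoreUri + "\"",
        "  }",
        "}");
  }

  public static void main(String[] args) {
    // Values taken from the task creation example in this post.
    System.out.println(render("test", "test158", "thrift://192.168.79.51:9083", "fake"));
  }
}
```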

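5. Create a SeaTunnelTaskEnvParam object if needed. jobMode defaults to batch mode and must be set to streaming mode for CDC; parallelism is the number of parallel task instances; checkpointInterval is the checkpoint interval in ms.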

public class SeaTunnelTaskEnvParam {
  /** Job mode; only BATCH and STREAMING are supported, see {@link SeaTunnelJobModConst} */
  private String jobMode = SeaTunnelJobModConst.BATCH;

  private Integer parallelism = 2;
  /** Checkpoint interval in ms */
  private Integer checkpointInterval = 5000;
}
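
As a rough illustration of how these three settings could end up in a SeaTunnel job config, the sketch below renders an env block and falls back to the defaults listed above when a value is null. The keys `job.mode`, `parallelism`, and `checkpoint.interval` are SeaTunnel env options as I understand them; the string value "BATCH" for the batch constant and the `EnvBlock` helper are assumptions.

```java
/** Hypothetical helper, not part of the SDK: renders an env block with fallback defaults. */
final class EnvBlock {

  /** Null arguments fall back to BATCH, parallelism 2, and a 5000 ms checkpoint interval. */
  static String render(String jobMode, Integer parallelism, Integer checkpointIntervalMs) {
    String mode = jobMode == null ? "BATCH" : jobMode; // assumed value of the BATCH constant
    int par = parallelism == null ? 2 : parallelism;
    int interval = checkpointIntervalMs == null ? 5000 : checkpointIntervalMs;
    return String.join(
        "\n",
        "env {",
        "  job.mode = \"" + mode + "\"",
        "  parallelism = " + par,
        "  checkpoint.interval = " + interval,
        "}");
  }

  public static void main(String[] args) {
    System.out.println(render(null, null, null));       // all defaults
    System.out.println(render("STREAMING", 4, 10000));  // a CDC-style job
  }
}
```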

6. Call SeaTunnelScriptGenerator.generateMysql2Hive(MySqlSource mySqlSource, HiveSink hiveSink, SeaTunnelTaskEnvParam envParam, TransformParam transformParam) to generate the rawScript. If envParam or transformParam is null, a default object is generated in its place.

7. Call SeaTunnelTaskGenerator.generateSeaTunnelTask(String rawScript, String startupScript, String deployMode, String jobId) to generate the SeaTunnelTask object.

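* @param rawScript the script configuration to execute
* @param startupScript the startup script, see SeaTunnelStartupScriptConst
* @param deployMode local or cluster
* @param jobId unique ID of the job, used to save checkpoints and to resume the job after an interruption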
@Data
@Accessors(chain = true)
public class SeaTunnelTask extends AbstractTask {
  private String rawScript;
  /**
   * Startup script
   * {@link SeaTunnelStartupScriptConst}
   */
  private String startupScript = SeaTunnelStartupScriptConst.SEATUNNEL;
  private boolean useCustom = true;
  /**
   * SeaTunnel deployment mode
   * {@link SeaTunnelDeployModeConst}
   */
  private String deployMode = SeaTunnelDeployModeConst.CLUSTER;
  private final List<String> localParams = new LinkedList<>();
  private String others;

  @Override
  public String getTaskType() {
    return "SEATUNNEL";
  }
}

8. Generate a default TaskDefinition object, or build a custom one.

TaskDefinitionUtils.createDefaultTaskDefinition(taskCode, seaTunnelTask)

9. Create the workflow. On success the workflow information is returned as a ProcessDefineResp; on failure a DolphinException is thrown.

Workflow creation example:

/** Creates a single-task workflow and returns the created workflow's information. */
public ProcessDefineResp submit(
    Long taskCode, TaskDefinition taskDefinition, String processName, String description) {
  ProcessDefineParam pcr = new ProcessDefineParam();
  pcr.setName(processName)
      .setLocations(TaskLocationUtils.verticalLocation(taskCode))
      .setDescription(description)
      .setTenantCode(tenantCode)
      .setTimeout("0")
      .setExecutionType(ProcessDefineParam.EXECUTION_TYPE_PARALLEL)
      .setTaskDefinitionJson(Collections.singletonList(taskDefinition))
      .setTaskRelationJson(TaskRelationUtils.oneLineRelation(taskCode))
      .setGlobalParams(null);

  // Throws DolphinException if the workflow cannot be created.
  return getClient().opsForProcess().create(projectCode, pcr);
}
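
The workflow above contains a single task, so its relation list has one entry. For readers who want to see what a straight-line relation looks like with several tasks, here is a minimal sketch. It assumes the DolphinScheduler convention that a preTaskCode of 0 marks a task with no upstream; `LinearRelations` is a hypothetical helper and does not claim to reproduce what TaskRelationUtils.oneLineRelation does internally.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of the SDK: links task codes into one straight line. */
final class LinearRelations {

  /** Returns {preTaskCode, postTaskCode} pairs; preTaskCode 0 means "no upstream task". */
  static List<long[]> chain(long... taskCodes) {
    List<long[]> pairs = new ArrayList<>();
    long previous = 0L;
    for (long code : taskCodes) {
      pairs.add(new long[] {previous, code});
      previous = code;
    }
    return pairs;
  }

  public static void main(String[] args) {
    // The task codes here are made-up values for illustration.
    for (long[] pair : chain(11L, 22L, 33L)) {
      System.out.println(pair[0] + " -> " + pair[1]);
    }
    // prints:
    // 0 -> 11
    // 11 -> 22
    // 22 -> 33
  }
}
```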

Task creation example:

Long taskCode = getClient().opsForProcess().generateTaskCode(projectCode, 1).get(0);
MySqlSource mySqlSource = new MySqlSource("fake");
mySqlSource.setUrl("jdbc:mysql://192.168.79.100:3306/test?serverTimezone=GMT%2b8")
        .setUser("root")
        .setPassword("root")
        .setQuery("select * from test.test_table")
        .setConnection_check_timeout_sec(100);

HiveSink hiveSink = new HiveSink("fake");
hiveSink.setTableName("test158")
        .setDbName("test")
        .setMetastoreUri("thrift://192.168.79.51:9083");
String mysql2Hive =
        SeaTunnelScriptGenerator.generateMysql2Hive(mySqlSource, hiveSink, null, null);
SeaTunnelTask seaTunnelTask =
        SeaTunnelTaskGenerator.generateSeaTunnelTask(
                mysql2Hive,
                SeaTunnelStartupScriptConst.SEATUNNEL,
                SeaTunnelDeployModeConst.CLUSTER,
                "1145141919840911");

TaskDefinition taskDefinition =
        TaskDefinitionUtils.createDefaultTaskDefinition(taskCode, seaTunnelTask);
submit(taskCode, taskDefinition, "test-seatunnel-mysql2hive-task-dag1", "test-seatunnel-mysql2hive-task1");

Schedule example:

public void testCreate() {
  ScheduleDefineParam scheduleDefineParam = new ScheduleDefineParam();
  scheduleDefineParam
      .setProcessDefinitionCode(WORKFLOW_CODE)
      .setSchedule(
          new ScheduleDefineParam.Schedule()
              .setStartTime("2022-09-18 00:00:00")
              .setEndTime("2023-09-20 00:00:00")
              .setCrontab("0 0 * * * ? *"));
  ScheduleInfoResp scheduleInfoResp =
      getClient().opsForSchedule().create(projectCode, scheduleDefineParam);
}
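
The crontab string in the schedule example has seven space-separated fields, which matches the Quartz-style layout (second, minute, hour, day of month, month, day of week, year). The sketch below only labels the fields so the expression is easier to read; `CronFields` is a hypothetical helper and does no validation beyond counting fields.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of the SDK: labels the fields of a 7-field cron string. */
final class CronFields {
  private static final String[] NAMES = {
    "second", "minute", "hour", "dayOfMonth", "month", "dayOfWeek", "year"
  };

  /** Splits the expression on whitespace and maps each field name to its raw value. */
  static Map<String, String> label(String crontab) {
    String[] parts = crontab.trim().split("\\s+");
    if (parts.length != NAMES.length) {
      throw new IllegalArgumentException("expected 7 fields but got " + parts.length);
    }
    Map<String, String> labelled = new LinkedHashMap<>();
    for (int i = 0; i < NAMES.length; i++) {
      labelled.put(NAMES[i], parts[i]);
    }
    return labelled;
  }

  public static void main(String[] args) {
    System.out.println(label("0 0 * * * ? *"));
    // prints {second=0, minute=0, hour=*, dayOfMonth=*, month=*, dayOfWeek=?, year=*}
  }
}
```

Read this way, the expression fires at second 0 of minute 0 of every hour, that is, once per hour on the hour.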