Scheduling Tools: DolphinScheduler


Preface

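As the number of programs in development grows, scheduling tasks and managing the dependencies between them becomes a real headache. A small number of tasks can be run on a timer with the crontab that ships with Linux, but its drawbacks are obvious: it is not visual, and edits are tedious and error-prone. This is where a scheduling tool helps. You may have used several already; I have worked with Airflow, Oozie, and Kyligence. The one I want to recommend today is DolphinScheduler, and this post gives a brief introduction to it, starting with installation and deployment.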


I. What is DolphinScheduler?

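DolphinScheduler is an open-source scheduling tool that originated in China and fits the habits of Chinese users well. It supports a wide range of task types, including the common Spark, Flink, SQL, Shell, Python, DataX, Sqoop, SeaTunnel, and Dinky, so its coverage is fairly comprehensive. Beyond task scheduling, it also offers resource management, multi-tenancy, and other features, which is enough for most small and medium-sized companies.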

II. Installation and Deployment

1. Prepare the environment

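DolphinScheduler uses ZooKeeper as its registry center, so ZooKeeper must be installed before deploying DolphinScheduler. A JDK is also required on the installation hosts. The installation steps for both are covered in my earlier posts, so I will not repeat them here.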
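Before going further, it can help to confirm the two prerequisites from the command line. The following is a minimal sketch, not part of DolphinScheduler itself: the function names are made up for illustration, the host list reuses the ds1..ds3 example from the configuration later in this post, and the port check relies on bash's /dev/tcp pseudo-device and the coreutils timeout command being available.

```shell
#!/usr/bin/env bash
# Pre-flight sketch: confirm a JDK is on PATH and that each ZooKeeper node
# accepts TCP connections. All function names here are illustrative.

# Split "host1:2181,host2:2181" into one "host port" pair per line.
# Assumes every entry carries an explicit port.
split_zk_nodes() {
  local connect_string="$1"
  local node
  local IFS=','
  for node in $connect_string; do
    echo "${node%%:*} ${node##*:}"
  done
}

# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
port_open() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Report the state of the JDK and of every ZooKeeper node.
check_prerequisites() {
  local connect_string="$1"
  local host port
  command -v java >/dev/null 2>&1 || { echo "java not found on PATH"; return 1; }
  while read -r host port; do
    if port_open "$host" "$port"; then
      echo "zookeeper $host:$port reachable"
    else
      echo "zookeeper $host:$port NOT reachable"
    fi
  done < <(split_zk_nodes "$connect_string")
}

# Example call (hosts are the ones assumed in this post):
# check_prerequisites "ds1:2181,ds2:2181,ds3:2181"
```

An open port only shows that something is listening; it does not prove the ZooKeeper ensemble is healthy, so treat this as a quick sanity check.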

2. Download the installation package

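Go to the DolphinScheduler download page at https://dlcdn.apache.org/dolphinscheduler/, pick a version, and click apache-dolphinscheduler-xxx-bin.tar.gz to download it. The latest version at the time of writing is 3.2.0, but I still recommend 3.1.8, so the rest of this post is based on 3.1.8.
After downloading the package, run the following commands to extract it and rename the directory: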

tar -zxvf apache-dolphinscheduler-3.1.8-bin.tar.gz
mv apache-dolphinscheduler-3.1.8-bin dolphinscheduler-3.1.8
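The download can also be scripted. The sketch below assumes the mirror lays releases out as <base>/<version>/apache-dolphinscheduler-<version>-bin.tar.gz and publishes a matching .sha512 file next to each tarball; both are assumptions about the mirror layout, and older releases may have moved to the Apache archive. The function names are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: download a DolphinScheduler binary release and compare its SHA-512
# hash with the published checksum file. The URL layout is an assumption.

# Build the tarball URL for a version under a mirror base URL.
release_url() {
  local base="$1" version="$2"
  echo "${base%/}/${version}/apache-dolphinscheduler-${version}-bin.tar.gz"
}

# Download the tarball and its .sha512 file, then check that the locally
# computed hash appears somewhere in the checksum file.
download_and_verify() {
  local base="$1" version="$2"
  local url file hash
  url="$(release_url "$base" "$version")"
  file="${url##*/}"
  wget -q "$url" "${url}.sha512" || return 1
  hash="$(sha512sum "$file" | cut -d' ' -f1)"
  if grep -qi "$hash" "${file}.sha512"; then
    echo "checksum OK for $file"
  else
    echo "checksum MISMATCH for $file" >&2
    return 1
  fi
}

# Example call:
# download_and_verify "https://dlcdn.apache.org/dolphinscheduler/" "3.1.8"
```

Searching for the hash inside the checksum file, rather than running sha512sum -c, avoids depending on the exact line format of the published file.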

3. Modify the configuration

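Go to the dolphinscheduler-3.1.8/bin/env directory of the extracted package and run vim dolphinscheduler_env.sh to configure DolphinScheduler's data source, the ZooKeeper connection information, and the installation directories of Spark, Flink, DataX, and SeaTunnel.
Note: adjust the values to match your own environment.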

export JAVA_HOME=${JAVA_HOME:-"/usr/java/jdk1.8.0_181-cloudera"}

# Database related configuration, set database type, username and password
export DATABASE=${DATABASE:-"mysql"}
export SPRING_PROFILES_ACTIVE=${DATABASE}
export SPRING_DATASOURCE_URL=${SPRING_DATASOURCE_URL:-"jdbc:mysql://ds1:3306/dolphinscheduler?useSSL=false"}
export SPRING_DATASOURCE_USERNAME=${SPRING_DATASOURCE_USERNAME:-"root"}
export SPRING_DATASOURCE_PASSWORD=${SPRING_DATASOURCE_PASSWORD:-"*****"}

# DolphinScheduler server related configuration
export SPRING_CACHE_TYPE=${SPRING_CACHE_TYPE:-none}
export SPRING_JACKSON_TIME_ZONE=${SPRING_JACKSON_TIME_ZONE:-"Asia/Shanghai"}
export MASTER_FETCH_COMMAND_NUM=${MASTER_FETCH_COMMAND_NUM:-10}

# Registry center configuration, determines the type and link of the registry center
export REGISTRY_TYPE=${REGISTRY_TYPE:-zookeeper}
export REGISTRY_ZOOKEEPER_CONNECT_STRING=${REGISTRY_ZOOKEEPER_CONNECT_STRING:-ds1:2181,ds2:2181,ds3:2181}

# Tasks related configurations, need to change the configuration if you use the related tasks.
export HADOOP_HOME=${HADOOP_HOME:-"/application/hadoop"}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/application/hadoop/etc/hadoop"}
export SPARK_HOME2=${SPARK_HOME2:-"/application/spark"}
export PYTHON_HOME=${PYTHON_HOME:-"/usr/bin/python"}
export HIVE_HOME=${HIVE_HOME:-"/application/hive"}
export FLINK_HOME=${FLINK_HOME:-"/application/flink"}
export DATAX_HOME=${DATAX_HOME:-"/opt/soft/datax"}
export SEATUNNEL_HOME=${SEATUNNEL_HOME:-"/application/seatunnel"}
export CHUNJUN_HOME=${CHUNJUN_HOME:-/opt/soft/chunjun}

export PATH=$HADOOP_HOME/bin:$SPARK_HOME2/bin:$PYTHON_HOME/bin:$JAVA_HOME/bin:$HIVE_HOME/bin:$FLINK_HOME/bin:$DATAX_HOME/bin:$SEATUNNEL_HOME/bin:$CHUNJUN_HOME/bin:$PATH
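Every export in this file uses the shell's ${VAR:-default} expansion, which means a value already present in the environment wins and the quoted default applies only when the variable is unset or empty. The short demonstration below uses a helper function invented for this post to show that behavior in isolation.

```shell
#!/usr/bin/env bash
# Demonstrates the ${VAR:-default} pattern used throughout dolphinscheduler_env.sh.
# resolve_database is an illustrative helper, not part of DolphinScheduler.

resolve_database() {
  # Falls back to "mysql" only when DATABASE is unset or empty.
  echo "${DATABASE:-mysql}"
}

unset DATABASE
resolve_database                      # prints: mysql
DATABASE=postgresql resolve_database  # prints: postgresql
DATABASE="" resolve_database          # prints: mysql (":-" treats empty like unset)
```

In practice this means you can either edit the defaults in the file, as done above, or export the variables before starting the services.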

Next, run vim install_env.sh to set the hosts for the master, worker, API server, and alert server, the installation path on each server, the deployment user, and the ZooKeeper registry root path.

ips=${ips:-"ds1,ds2,ds3"}

# Port of SSH protocol, default value is 22. For now we only support same port in all `ips` machine
# modify it if you use different ssh port
sshPort=${sshPort:-"22"}

# A comma separated list of machine hostname or IP would be installed Master server, it
# must be a subset of configuration `ips`.
# Example for hostnames: masters="ds1,ds2", Example for IPs: masters="192.168.8.1,192.168.8.2"
masters=${masters:-"ds1,ds2,ds3"}

# A comma separated list of machine <hostname>:<workerGroup> or <IP>:<workerGroup>.All hostname or IP must be a
# subset of configuration `ips`, And workerGroup have default value as `default`, but we recommend you declare behind the hosts
# Example for hostnames: workers="ds1:default,ds2:default,ds3:default", Example for IPs: workers="192.168.8.1:default,192.168.8.2:default,192.168.8.3:default"
workers=${workers:-"ds1:default,ds2:default,ds3:default"}

# A comma separated list of machine hostname or IP would be installed Alert server, it
# must be a subset of configuration `ips`.
# Example for hostname: alertServer="ds3", Example for IP: alertServer="192.168.8.3"
alertServer=${alertServer:-"ds3"}

# A comma separated list of machine hostname or IP would be installed API server, it
# must be a subset of configuration `ips`.
# Example for hostname: apiServers="ds1", Example for IP: apiServers="192.168.8.1"
apiServers=${apiServers:-"ds2"}

# The directory to install DolphinScheduler for all machine we config above. It will automatically be created by `install.sh` script if not exists.
# Do not set this configuration same as the current path (pwd). Do not add quotes to it if you using related path.
installPath=${installPath:-"/application/dolphinscheduler"}

# The user to deploy DolphinScheduler for all machine we config above. For now user must create by yourself before running `install.sh`
# script. The user needs to have sudo privileges and permissions to operate hdfs. If hdfs is enabled than the root directory needs
# to be created by this user
deployUser=${deployUser:-"root"}

# The root of zookeeper, for now DolphinScheduler default registry server is zookeeper.
zkRoot=${zkRoot:-"/dolphinscheduler"}
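The comments in install_env.sh require every role list to be a subset of ips. A small check like the following can catch a typo before install.sh runs. It is a sketch with an invented function name, and it assumes hostnames or IPv4 addresses, with worker entries optionally carrying a :workerGroup suffix.

```shell
#!/usr/bin/env bash
# Sketch: verify that every host in a role list also appears in `ips`.
# missing_hosts is an illustrative helper, not part of DolphinScheduler.

# Print each host from a comma-separated role list that is missing from ips.
# Worker entries such as "ds1:default" have the ":group" suffix stripped first.
missing_hosts() {
  local ips="$1" role_list="$2"
  local entry host
  local IFS=','
  for entry in $role_list; do
    host="${entry%%:*}"
    case ",$ips," in
      *",$host,"*) ;;          # host is listed in ips
      *) echo "$host" ;;       # host is not listed: report it
    esac
  done
}

# Example usage after loading the real file:
# source bin/env/install_env.sh
# for role in "$masters" "$workers" "$alertServer" "$apiServers"; do
#   missing_hosts "$ips" "$role"
# done
```

An empty result means all role lists are consistent with ips.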

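Go to dolphinscheduler-3.1.8/api-server/conf in the extracted package and run vim common.properties to configure the resource storage settings. The common.properties file under dolphinscheduler-3.1.8/worker-server/conf needs the same configuration.
Note: this configures the resource center used for uploaded files and jar packages. The settings to pay attention to are data.basedir.path, resource.storage.type, resource.storage.upload.base.path, resource.hdfs.root.user, and resource.hdfs.fs.defaultFS; the rest can be configured as needed or left at their defaults.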

data.basedir.path=/application/data

# resource view suffixs
#resource.view.suffixs=txt,log,sh,bat,conf,cfg,py,java,sql,xml,hql,properties,json,yml,yaml,ini,js

# resource storage type: HDFS, S3, OSS, NONE
resource.storage.type=HDFS
# resource store on HDFS/S3 path, resource file will store to this base path, self configuration, please make sure the directory exists on hdfs and have read write permissions. "/dolphinscheduler" is recommended
resource.storage.upload.base.path=/dolphinscheduler

# The AWS access key. if resource.storage.type=S3 or use EMR-Task, This configuration is required
resource.aws.access.key.id=minioadmin
# The AWS secret access key. if resource.storage.type=S3 or use EMR-Task, This configuration is required
resource.aws.secret.access.key=minioadmin
# The AWS Region to use. if resource.storage.type=S3 or use EMR-Task, This configuration is required
resource.aws.region=cn-north-1
# The name of the bucket. You need to create them by yourself. Otherwise, the system cannot start. All buckets in Amazon S3 share a single namespace; ensure the bucket is given a unique name.
resource.aws.s3.bucket.name=dolphinscheduler
# You need to set this parameter when private cloud s3. If S3 uses public cloud, you only need to set resource.aws.region or set to the endpoint of a public cloud such as S3.cn-north-1.amazonaws.com.cn
resource.aws.s3.endpoint=http://localhost:9000

# alibaba cloud access key id, required if you set resource.storage.type=OSS
resource.alibaba.cloud.access.key.id=<your-access-key-id>
# alibaba cloud access key secret, required if you set resource.storage.type=OSS
resource.alibaba.cloud.access.key.secret=<your-access-key-secret>
# alibaba cloud region, required if you set resource.storage.type=OSS
resource.alibaba.cloud.region=cn-hangzhou
# oss bucket name, required if you set resource.storage.type=OSS
resource.alibaba.cloud.oss.bucket.name=dolphinscheduler
# oss bucket endpoint, required if you set resource.storage.type=OSS
resource.alibaba.cloud.oss.endpoint=https://oss-cn-hangzhou.aliyuncs.com
# if resource.storage.type=HDFS, the user must have the permission to create directories under the HDFS root path
resource.hdfs.root.user=root
# if resource.storage.type=S3, the value like: s3a://dolphinscheduler; if resource.storage.type=HDFS and namenode HA is enabled, you need to copy core-site.xml and hdfs-site.xml to conf dir
resource.hdfs.fs.defaultFS=hdfs://ds1:8020

# whether to startup kerberos
hadoop.security.authentication.startup.state=false

# java.security.krb5.conf path
java.security.krb5.conf.path=/opt/krb5.conf

# login user from keytab username
login.user.keytab.username=hdfs-mycluster@ESZ.COM

# login user from keytab path
login.user.keytab.path=/opt/hdfs.headless.keytab

# kerberos expire time, the unit is hour
kerberos.expire.time=2


# resourcemanager port, the default value is 8088 if not specified
resource.manager.httpaddress.port=8088
# if resourcemanager HA is enabled, please set the HA IPs; if resourcemanager is single, keep this value empty
yarn.resourcemanager.ha.rm.ids=ds1
# if resourcemanager HA is enabled or not use resourcemanager, please keep the default value; If resourcemanager is single, you only need to replace ds1 to actual resourcemanager hostname
yarn.application.status.address=http://ds1:%s/ws/v1/cluster/apps/%s
# job history status url when application number threshold is reached(default 10000, maybe it was set to 1000)
yarn.job.history.status.address=http://ds1:19888/ws/v1/history/mapreduce/jobs/%s
# datasource encryption enable
datasource.encryption.enable=false

# datasource encryption salt
datasource.encryption.salt=!@#$%^&*

# data quality option
data-quality.jar.name=dolphinscheduler-data-quality-dev-SNAPSHOT.jar

#data-quality.error.output.path=/tmp/data-quality-error-data

# Network IP gets priority, default inner outer

# Whether hive SQL is executed in the same session
support.hive.oneSession=true

# use sudo or not, if set true, executing user is tenant user and deploy user needs sudo permissions; if set false, executing user is the deploy user and doesn't need sudo permissions
sudo.enable=true
setTaskDirToTenant.enable=false
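Since the same settings have to be present in both the api-server and worker-server copies of common.properties, it is easy for the two files to drift apart. The sketch below reads a key from a properties file and compares the two copies. The helper names are made up for this post, and the parser handles only plain key=value lines without surrounding whitespace.

```shell
#!/usr/bin/env bash
# Sketch: read a key from a .properties file and confirm two copies agree.
# get_prop and props_match are illustrative helpers.

# Print the value of the last uncommented "key=value" line for the given key.
get_prop() {
  local file="$1" key="$2"
  awk -F'=' -v k="$key" '
    $1 == k { sub(/^[^=]*=/, ""); v = $0 }
    END { if (v != "") print v }
  ' "$file"
}

# Return 0 when both files hold the same value for the key.
props_match() {
  local key="$1" file_a="$2" file_b="$3"
  [ "$(get_prop "$file_a" "$key")" = "$(get_prop "$file_b" "$key")" ]
}

# Example usage from the dolphinscheduler-3.1.8 directory:
# for key in data.basedir.path resource.storage.type \
#            resource.storage.upload.base.path resource.hdfs.root.user \
#            resource.hdfs.fs.defaultFS; do
#   props_match "$key" api-server/conf/common.properties \
#                      worker-server/conf/common.properties \
#     || echo "mismatch: $key"
# done
```

Simply copying the edited file from one conf directory to the other achieves the same result; the check is useful mainly when the files were edited separately.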

4. Initialize the metadata

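Because I configured MySQL as the metadata store, the MySQL driver jar must first be copied into the libs directory of every DolphinScheduler module: api-server/libs, alert-server/libs, master-server/libs, worker-server/libs, and tools/libs.
The dolphinscheduler database must then be created in MySQL. If you want a dedicated user, grant it privileges as well. The commands are as follows.
Note: the syntax differs between MySQL 5 and MySQL 8, so adapt it to your version. The example below is for MySQL 8.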

CREATE DATABASE dolphinscheduler DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
CREATE USER '{user}'@'%' IDENTIFIED BY '{password}';
GRANT ALL PRIVILEGES ON dolphinscheduler.* TO '{user}'@'%';
CREATE USER '{user}'@'localhost' IDENTIFIED BY '{password}';
GRANT ALL PRIVILEGES ON dolphinscheduler.* TO '{user}'@'localhost';
FLUSH PRIVILEGES;
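The driver-copy step described earlier in this section is repetitive, so a loop is less error-prone than five manual cp commands. This is a sketch with an invented function name; the jar file name in the example call is a placeholder, so substitute the driver you actually downloaded.

```shell
#!/usr/bin/env bash
# Sketch: copy the MySQL JDBC driver jar into every module's libs directory.
# copy_driver is an illustrative helper, not part of DolphinScheduler.

copy_driver() {
  local driver_jar="$1" ds_home="$2"
  local module
  for module in api-server alert-server master-server worker-server tools; do
    if [ -d "$ds_home/$module/libs" ]; then
      cp "$driver_jar" "$ds_home/$module/libs/" && echo "copied to $module/libs"
    else
      # Report modules whose libs directory is missing instead of failing.
      echo "skipped $module (no libs directory)" >&2
    fi
  done
}

# Example call (the jar name is a placeholder):
# copy_driver /path/to/mysql-connector.jar dolphinscheduler-3.1.8
```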

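Go to the dolphinscheduler-3.1.8 directory and run bash tools/bin/upgrade-schema.sh to initialize the schema, watching the output for errors. Once initialization succeeds, run bin/install.sh. DolphinScheduler installs itself to the path set in the configuration file and copies the installed services to the configured worker-server nodes. Then open http://ds2:12345/dolphinscheduler/ui to check that you can log in (ds2 is the API server defined in the configuration file). If the login page appears, the services started successfully. The initial username and password are admin/dolphinscheduler123.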
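The services can take a little while to come up after install.sh finishes, so instead of refreshing the browser you can poll the UI address from the shell. The sketch below assumes curl is installed and uses the port and path quoted above; the function names and the retry interval are choices made for this example.

```shell
#!/usr/bin/env bash
# Sketch: poll the DolphinScheduler web UI until it answers.
# ui_url and wait_for_ui are illustrative helpers.

# Build the UI address from the API server host (port defaults to 12345).
ui_url() {
  local api_host="$1" port="${2:-12345}"
  echo "http://${api_host}:${port}/dolphinscheduler/ui"
}

# Try the URL up to `attempts` times, sleeping 5 seconds between tries.
wait_for_ui() {
  local url="$1" attempts="${2:-30}" i
  for ((i = 1; i <= attempts; i++)); do
    if curl -fsS -o /dev/null "$url"; then
      echo "UI is up at $url"
      return 0
    fi
    sleep 5
  done
  echo "UI did not respond at $url" >&2
  return 1
}

# Example call (ds2 is the API server in this post's configuration):
# wait_for_ui "$(ui_url ds2)"
```

If the UI never responds, the api-server log on the API server host is the first place to look.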
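After logging in, first create a project.
Once the project is created, click its name to enter the workflow definition page.
Click Workflow Definition and create a workflow. The list on the left contains the task types you can drag onto the canvas. Taking a Shell task as an example, enter the node name and the script, then click Save.
After saving, a dialog asks for the workflow name, tenant, execution strategy, and so on. Click Confirm to finish defining the workflow.
The buttons to the right of the workflow are Edit, Run, Timing, Online, Copy, Timing Management, Tree View, Export, and Version Info. The workflow must be brought online before it can be run.
When you click Run, a dialog appears with options such as notification strategy, process priority, worker group, and environment name, which you can set as needed. Click Confirm to run the workflow.
The run status of the workflow can be viewed under Workflow Instances.
The logs of the individual tasks in the workflow can be viewed under Task Instances.
After the task runs successfully, use the Timing function in the workflow definition to give the workflow an automatic run time and frequency. After clicking Confirm, you still need to open Timing Management in the workflow definition and bring the new schedule online; only then is the workflow's schedule in effect.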
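For the Shell task used in the walkthrough above, the script entered in the node can be as simple as a single command. The following is a hypothetical script body, not taken from the original post: it wraps each step with timestamps so the task log is easier to read, and it relies on the assumption that a non-zero exit status marks the task as failed.

```shell
#!/usr/bin/env bash
# Hypothetical body for a Shell task node: stop at the first failing command
# and log when each step starts and finishes.
set -euo pipefail

# Run a named step, printing timestamps before and after it.
run_step() {
  local name="$1"
  shift
  echo "[$(date '+%F %T')] start: $name"
  "$@"
  echo "[$(date '+%F %T')] done:  $name"
}

run_step "say-hello" echo "hello from dolphinscheduler"
```

The output of the script shows up in the task log that can be viewed under Task Instances.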


Summary

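I have been using DolphinScheduler for a while, from the earlier 2.7 to the current 3.x, and the deployment method has changed somewhat. In 2.x all modules were bundled together; from 3.0 onward, api-server, worker-server, master-server, and alert-server are separate. With a scheduling platform in place, deploying the Spark and Flink jobs you write becomes much more visual, and you no longer need to check tasks one by one on the servers. For reasons of space, I have not covered resource management (uploading scripts, program jar packages, and so on), data source configuration, or data quality. For the details, download and deploy it and try the features yourself.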
