1. Introduction to DS
Apache DolphinScheduler is a distributed, decentralized, easily extensible visual DAG workflow task-scheduling system/framework/component.
Official site: https://dolphinscheduler.apache.org/zh-cn/
- What a scheduler is for:
  - When a project has many scheduled tasks of different kinds, a scheduling tool handles orchestrating, planning, and periodically running them.
  - For the user-profile project, the main tasks to schedule are importing data from Hive into ES and running the tag-update code.
2. Basic Usage of DS
- Tenant
  - A tenant corresponds to a Linux user: it is the user that a script is executed as. When DS executes a file, the tenant must match the file's owner; if they differ, DS cannot execute the script file.
  - A script is an executable file on Linux. Linux files carry per-user permissions, so executing a file requires specifying which user runs it.
  - The tenant must be created as a user on Linux.
  - Note: the tenant code and tenant name must be a Linux username. Whichever Linux user a scheduled task should run as, use that username as the tenant code/tenant name.
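The ownership rule above can be sketched as a quick shell check. This is a minimal sketch, not DS's actual code path: the tenant name is stood in for by the current user so it runs anywhere, and the file name is hypothetical.

```shell
# Hedged sketch: DS runs a task's script as the Linux user named by the
# tenant, so the script file's owner must match that user.
TENANT="$(id -un)"                       # in practice: the tenant configured in DS
SCRIPT="$(mktemp /tmp/demo_task.XXXXXX)" # hypothetical task script
echo 'echo hello from task' > "$SCRIPT"
# Align ownership with the tenant (a no-op when they already match):
chown "$TENANT" "$SCRIPT"
OWNER="$(stat -c %U "$SCRIPT")"
if [ "$OWNER" = "$TENANT" ]; then
    echo "owner matches tenant: $OWNER"
else
    echo "mismatch: owner=$OWNER tenant=$TENANT"
fi
rm -f "$SCRIPT"
```

If the owner and tenant differ on a real deployment, `chown <tenant> <script>` (or recreating the tenant with the owning user's name) resolves the failure.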
- User
  - The account used to log in to DS.
  - There is a default user: admin.
  - In practice, delete the extra users and assign the root tenant to the admin user.
- Alert group management
  - Alerting (e.g. email alerts) requires a mail server; none is configured here.
  - Purpose: when a task fails during execution, the users in the alert group are notified (e.g. by email).
- Worker group management
  - The worker services can be divided into groups; when running a task, you can specify which group's workers execute it.
  - Worker groups cannot be created in the web UI; they must be specified in the configuration file.
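In DS 1.3.x the grouping is expressed in `install_config.conf` via the `workers` entry, the same file edited in section 3.7 below. A hypothetical fragment (the group name `bigdata` is illustrative, `default` is the built-in group name):

```shell
# install_config.conf fragment — each worker host is tagged with a group:
# node1 and node2 serve the "default" group, node3 a separate "bigdata" group.
workers="node1:default,node2:default,node3:bigdata"
```

A task that selects the `bigdata` worker group in the UI will then only run on node3.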
3. Installing DS
3.1 Download the DS package from the official site and upload it to the target server
3.2 Extract the archive to the installation directory
cd /root/insurance/4_sofrware
tar zxf apache-dolphinscheduler-incubating-1.3.5-dolphinscheduler-bin.tar.gz -C /export/server/
cd /export/server
3.3 Add DS's extracted location to the environment variables in /etc/profile
export DOLPHINSCHEDULER_HOME=/export/server/dolphinscheduler
Remember to reload the file afterwards: source /etc/profile
3.4 Create DS's metadata database in MySQL
Task information is stored in MySQL.
- Log in to MySQL
mysql -uroot -p123456
3.5 Create the database at the MySQL prompt
CREATE DATABASE dolphinscheduler DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL PRIVILEGES ON dolphinscheduler.* TO 'root'@'%' IDENTIFIED BY '123456';
flush privileges;
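A quick way to confirm the statements above took effect, assuming the same root credentials (the guard lets the sketch exit quietly on a machine without a mysql client):

```shell
# Hedged check: confirm the dolphinscheduler database exists and shows
# the utf8 character set configured above.
command -v mysql >/dev/null 2>&1 || exit 0   # skip where no client is installed
mysql -uroot -p123456 -e "SHOW CREATE DATABASE dolphinscheduler;" \
    || echo "database check failed"
```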
3.6 Update DS's database connection settings
vim /export/server/dolphinscheduler/conf/org/datasource.properties
spring.datasource.driver-class-name=com.mysql.jdbc.Driver
spring.datasource.url=jdbc:mysql://192.168.88.100:3306/dolphinscheduler?characterEncoding=UTF-8&allowMultiQueries=true
spring.datasource.username=root
spring.datasource.password=123456
- DS needs the MySQL JDBC driver to connect to MySQL
- Copy the driver jar into DS's lib directory
cp /export/server/hive/lib/mysql-connector-java-5.1.32.jar /export/server/dolphinscheduler/lib/
- Initialize the database schema
cd /export/server/dolphinscheduler/script
sh create-dolphinscheduler.sh
3.7 Edit DS's configuration files (environment-variable file first)
vim /export/server/dolphinscheduler/conf/env/dolphinscheduler_env.sh
export HADOOP_HOME=/export/server/hadoop
export HADOOP_CONF_DIR=/export/server/hadoop/etc/hadoop
export SPARK_HOME=/export/server/spark
export PYTHON_HOME=/root/anaconda3/envs/pyspark_env
export JAVA_HOME=/export/server/jdk1.8.0_241
export HIVE_HOME=/export/server/hive
export PATH=$HADOOP_HOME/bin:$PYTHON_HOME/bin:$JAVA_HOME/bin:$HIVE_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH
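The PATH line above prepends each `*_HOME/bin` directory so those tool versions shadow any system-wide copies, because earlier PATH entries win. A self-contained sketch of that precedence rule (`demo-tool` and the temp directory are hypothetical stand-ins):

```shell
# Sketch: a tool installed under a prepended *_HOME/bin shadows other copies.
DEMO_HOME="$(mktemp -d)"                 # stands in for e.g. $SPARK_HOME
mkdir -p "$DEMO_HOME/bin"
printf '#!/bin/sh\necho demo-tool from DEMO_HOME\n' > "$DEMO_HOME/bin/demo-tool"
chmod +x "$DEMO_HOME/bin/demo-tool"
export PATH="$DEMO_HOME/bin:$PATH"       # prepend, as in the config above
command -v demo-tool                     # resolves inside $DEMO_HOME/bin
demo-tool
rm -rf "$DEMO_HOME"
```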
vim /export/server/dolphinscheduler/conf/config/install_config.conf
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# NOTICE : If the following config has special characters in the variable `.*[]^${}\+?|()@#&`, Please escape, for example, `[` escape to `\[`
# postgresql or mysql
# DS stores its metadata in MySQL
dbtype="mysql"
# db config
# db address and port
# database host address and port
dbhost="192.168.88.100:3306"
# db username
# database user
username="root"
# database name
# database name
dbname="dolphinscheduler"
# db password
# NOTICE: if there are special characters, please use the \ to escape, for example, `[` escape to `\[`
# database password
password="123456"
# zk cluster
# ZooKeeper service addresses
zkQuorum="192.168.88.100:2181,192.168.88.101:2181,192.168.88.102:2181"
# Note: the target installation path for dolphinscheduler, please not config as the same as the current path (pwd)
# installation directory used on every node in the cluster
installPath="/export/server/dolphinscheduler_install"
# deployment user
# Note: the deployment user needs to have sudo privileges and permissions to operate hdfs. If hdfs is enabled, the root directory needs to be created by itself
# user DS uses to connect (via ssh) to the other servers
deployUser="root"
# alert config (notification service)
# mail server host; email alerting requires a mail server, which is not configured here
# mailServerHost="smtp.exmail.qq.com"
# mail server port
# note: Different protocols and encryption methods correspond to different ports, when SSL/TLS is enabled, make sure the port is correct.
# mailServerPort="25"
# sender
# mailSender="xxxxxxxxxx"
# user
# mailUser="xxxxxxxxxx"
# sender password
# note: The mail.passwd is email service authorization code, not the email login password.
# mailPassword="xxxxxxxxxx"
# TLS mail protocol support
# starttlsEnable="true"
# SSL mail protocol support
# only one of TLS and SSL can be in the true state.
# sslEnable="false"
#note: sslTrust is the same as mailServerHost
# sslTrust="smtp.exmail.qq.com"
# resource storage type:HDFS,S3,NONE
# resource storage service; script files need to be kept on HDFS
resourceStorageType="HDFS"
# if resourceStorageType is HDFS,defaultFS write namenode address,HA you need to put core-site.xml and hdfs-site.xml in the conf directory.
# if S3,write S3 address,HA,for example :s3a://dolphinscheduler,
# Note,s3 be sure to create the root directory /dolphinscheduler
# address of the storage service
defaultFS="hdfs://node1:8020"
# if resourceStorageType is S3, the following three configuration is required, otherwise please ignore
# s3Endpoint="http://192.168.xx.xx:9010"
# s3AccessKey="xxxxxxxxxx"
# s3SecretKey="xxxxxxxxxx"
# if resourcemanager HA enable, please type the HA ips ; if resourcemanager is single, make this value empty
# YARN addresses for the HA case, where multiple ResourceManagers run on different hosts
# yarnHaIps="192.168.xx.xx,192.168.xx.xx"
# if resourcemanager HA enable or not use resourcemanager, please skip this value setting; If resourcemanager is single, you only need to replace yarnIp1 to actual resourcemanager hostname.
# YARN ResourceManager address (non-HA)
singleYarnIp="node1"
# resource store on HDFS/S3 path, resource file will store to this hadoop hdfs path, self configuration, please make sure the directory exists on hdfs and have read write permissions。/dolphinscheduler is recommended
# directory on HDFS where resource files are stored
resourceUploadPath="/dolphinscheduler"
# who have permissions to create directory under HDFS/S3 root path
# Note: if kerberos is enabled, please config hdfsRootUser=
# hdfsRootUser="hdfs"
# kerberos config
# whether kerberos starts, if kerberos starts, following four items need to config, otherwise please ignore
# Kerberos authentication service
kerberosStartUp="false"
# kdc krb5 config file path
krb5ConfPath="$installPath/conf/krb5.conf"
# keytab username
keytabUserName="hdfs-mycluster@ESZ.COM"
# username keytab path
keytabPath="$installPath/conf/hdfs.headless.keytab"
# api server port
# port the API server binds to
apiServerPort="12345"
# install hosts
# Note: install the scheduled hostname list. If it is pseudo-distributed, just write a pseudo-distributed hostname
# servers DS will be installed on; the installPath directory is created on each of them
ips="node1,node2,node3"
# ssh port, default 22
# Note: if ssh port is not default, modify here
# ssh connection port
sshPort="22"
# run master machine
# Note: list of hosts hostname for deploying master
# hosts running the master service
masters="node1,node2"
# run worker machine
# note: need to write the worker group name of each worker, the default value is "default"
# workers="ds1:default,ds2:default,ds3:default,ds4:default,ds5:default"
# hosts running the worker service
workers="node1,node2,node3"
# run alert machine
# note: list of machine hostnames for deploying alert server
# host running the alert service
alertServer="node3"
# run api machine
# note: list of machine hostnames for deploying api server
# host running the API service
apiServers="node1"
vim /export/server/dolphinscheduler/conf/common.properties
fs.defaultFS=hdfs://node1:8020
yarn.application.status.address=http://node1:8088/ws/v1/cluster/apps
3.8 Starting the services
3.8.1 Start the ZooKeeper service (on all three nodes) and the Hadoop service first
Start ZooKeeper:
# start on node1
/export/server/zookeeper/bin/zkServer.sh start
# from node1, start node2 and node3 remotely over ssh
ssh node2 /export/server/zookeeper/bin/zkServer.sh start
ssh node3 /export/server/zookeeper/bin/zkServer.sh start
Start Hadoop:
start-all.sh
3.8.2 First start of the DS services
The first start also performs the cluster installation and deployment:
1. The prepared installation is synced to the other servers in the cluster.
2. The services are then started.
After the first successful start, do not use this command for subsequent starts.
cd /export/server/dolphinscheduler
sh install.sh
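For later restarts, DS 1.3.x ships start/stop scripts inside the install directory (the `installPath` configured above), which is what should be used instead of install.sh. A guarded sketch (the guard lets it exit cleanly on machines where DS is not deployed):

```shell
# Routine restart after the initial install.sh deployment:
cd /export/server/dolphinscheduler_install 2>/dev/null || exit 0
sh ./bin/stop-all.sh     # stop master/worker/api/alert services cluster-wide
sh ./bin/start-all.sh    # start them again
```

Individual daemons can also be managed per host with `bin/dolphinscheduler-daemon.sh` if finer control is needed.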
3.8.3 Access the web UI
http://192.168.88.100:12345/dolphinscheduler
Summary
DS is a distributed, decentralized, easily extensible visual DAG workflow task-scheduling system used to schedule periodic tasks. It provides a visual operations platform that simplifies the work: workflows are created with simple drag-and-drop and clicks.