[Boxuegu Study Notes] A Thorough Summary, Shared with Care | DolphinScheduler (Wild Big Data course): Study Notes


泽月贝

Tags: learning, big data

Article link: https://blog.csdn.net/qq_41151516/article/details/131900174

Contents

1. Introduction to DS
2. Basic Usage of DS
3. Installing DS
Summary

1. Introduction to DS

Apache DolphinScheduler is a distributed, decentralized, easily extensible visual DAG workflow task scheduling system (also described as a framework or component).

Official site: https://dolphinscheduler.apache.org/zh-cn/

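What scheduling is for:
- When a project has many scheduled jobs of different kinds, a scheduling tool takes care of orchestrating, planning, and periodically running them.
- In the user-profile project, the main scheduled tasks are importing data from Hive into ES and running the tag-update code.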

2. Basic Usage of DS

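Tenants
- A tenant maps to a Linux user. When DS runs a file, the tenant must be the same user that owns the file; if they differ, DS cannot execute the script.
- In other words, the tenant is the user that runs the script.
  - A script is just an executable file on Linux, and Linux files carry user permissions, so running one means specifying which user runs it.
- The tenant must exist as a user on Linux.
- Note that the tenant code and tenant name must both be the Linux user name: whichever Linux user should run the scheduled tasks, use that user name as the tenant code and tenant name.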
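The ownership rule above can be checked before a script is scheduled. A minimal sketch, assuming bash and GNU `stat` (for `-c %U`); `tenant_can_run` is a hypothetical helper name and not part of DolphinScheduler:

```shell
# Hypothetical helper: succeed only if the tenant is an existing Linux user
# and that same user owns the given script file.
tenant_can_run() {
  local tenant=$1 script=$2 owner
  # The tenant code must name an existing Linux account.
  id -u "$tenant" >/dev/null 2>&1 || { echo "no such Linux user: $tenant"; return 1; }
  # GNU stat prints the owner's user name with -c %U.
  owner=$(stat -c %U "$script") || return 1
  [ "$owner" = "$tenant" ] || { echo "owner is $owner, tenant is $tenant"; return 1; }
}
```

For example, `tenant_can_run root /path/to/job.sh` (the path is a placeholder) returns success only when root exists and owns the file.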
Users
- The accounts used to log in to DS.
- There is a default user, admin.
- In practice, delete the extra users and assign the root tenant to the admin user.
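Alert group management
- If an alert service (email alerts) is configured, it needs a mail server. None is configured here.
- Purpose: when a task fails during execution, the users in the alert group are notified by email.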
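Worker group management
- Worker services are divided into groups; when running a task, you can choose which group's workers execute it.
- Worker groups cannot be created from the UI; they have to be set in the configuration file.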
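In the install_config.conf used in section 3.7, the commented sample for `workers` takes `host:group` pairs. A sketch that lists which host lands in which group, assuming bash; the `etl` group name in the usage below is made up for illustration, and treating a host without a `:group` suffix as `default` is an assumption of this sketch:

```shell
# Print "group host" pairs from a workers string such as
# "node1:default,node2:etl". A host written without ":group" is reported
# under "default" (an assumption of this sketch).
list_worker_groups() {
  local workers=$1 entry host group
  local IFS=','
  for entry in $workers; do
    host=${entry%%:*}
    group=default
    case $entry in *:*) group=${entry#*:} ;; esac
    echo "$group $host"
  done
}
```

Calling `list_worker_groups "node1:default,node2:etl,node3"` prints one `group host` line per worker, which makes it easy to review the grouping before deploying.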

3. Installing DS

3.1 Download the DS package from the official site and upload it to the target server

3.2 Extract the archive into the installation directory

cd /root/insurance/4_sofrware
tar zxf apache-dolphinscheduler-incubating-1.3.5-dolphinscheduler-bin.tar.gz -C /export/server/

cd /export/server

3.3 Add the DS extraction path to the environment variables in /etc/profile

export DOLPHINSCHEDULER_HOME=/export/server/dolphinscheduler

Remember to reload it: source /etc/profile
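One detail the steps above leave implicit: the archive normally unpacks into a directory named after the tarball, while every later path uses /export/server/dolphinscheduler. A minimal sketch, assuming a symlink (or rename) bridges the two and that the export line should not be appended twice; `add_export_once` is a hypothetical helper name:

```shell
# Append a line to a profile file only if that exact line is not there yet.
add_export_once() {
  local line=$1 profile=${2:-/etc/profile}
  grep -qxF "$line" "$profile" 2>/dev/null || echo "$line" >> "$profile"
}
```

Typical use would be `add_export_once 'export DOLPHINSCHEDULER_HOME=/export/server/dolphinscheduler'`, preceded if needed by something like `ln -s /export/server/apache-dolphinscheduler-incubating-1.3.5-dolphinscheduler-bin /export/server/dolphinscheduler`. The unpacked directory name here is assumed from the archive name, so check it with `ls /export/server` first.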

3.4 Create the DS metadata database in MySQL

Task information has to be stored in MySQL.

Log in to MySQL:

mysql -uroot -p123456

3.5 Set up the database from the mysql prompt

CREATE DATABASE dolphinscheduler DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;


GRANT ALL PRIVILEGES ON dolphinscheduler.* TO 'root'@'%' IDENTIFIED BY '123456';

flush privileges;
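The `GRANT ... IDENTIFIED BY` form above works on MySQL 5.x. MySQL 8.0 removed it, so there the account has to be created before privileges are granted. A sketch that prints equivalent statements for 8.0, reusing the article's database name and credentials; `ds_grant_sql` is a hypothetical helper name:

```shell
# Print MySQL 8.0-compatible statements: create the database and the user
# first, then grant privileges without an IDENTIFIED BY clause.
ds_grant_sql() {
  local user=$1 pass=$2 db=${3:-dolphinscheduler}
  cat <<SQL
CREATE DATABASE IF NOT EXISTS ${db} DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
CREATE USER IF NOT EXISTS '${user}'@'%' IDENTIFIED BY '${pass}';
GRANT ALL PRIVILEGES ON ${db}.* TO '${user}'@'%';
FLUSH PRIVILEGES;
SQL
}
```

The output can be piped straight into the client, for example `ds_grant_sql root 123456 | mysql -uroot -p123456`.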

3.6 Edit the DS database configuration

vim /export/server/dolphinscheduler/conf/org/datasource.properties

spring.datasource.driver-class-name=com.mysql.jdbc.Driver
spring.datasource.url=jdbc:mysql://192.168.88.100:3306/dolphinscheduler?characterEncoding=UTF-8&allowMultiQueries=true
spring.datasource.username=root
spring.datasource.password=123456
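Editing these four keys by hand in vim is easy to get wrong, especially the URL with its `&`. A sketch of a scripted alternative, assuming bash and a POSIX awk; `set_prop` is a hypothetical helper that replaces a key if present and appends it otherwise:

```shell
# Set key=value in a .properties file: rewrite the line if the key exists,
# append it if not. Values are passed through the environment so that
# characters such as & and / need no escaping.
set_prop() {
  local key=$1 value=$2 file=$3 tmp
  tmp=$(mktemp) || return 1
  KEY="$key" VALUE="$value" awk -F= '
    BEGIN { k = ENVIRON["KEY"]; v = ENVIRON["VALUE"]; done = 0 }
    $1 == k { print k "=" v; done = 1; next }
    { print }
    END { if (!done) print k "=" v }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

For example, `set_prop spring.datasource.username root <path to datasource.properties>`. Commented-out lines such as `#spring.datasource.url=...` are left alone because their key does not match exactly.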

DS needs the MySQL driver to connect to MySQL
- Copy the MySQL connector jar:

cp /export/server/hive/lib/mysql-connector-java-5.1.32.jar /export/server/dolphinscheduler/lib/

Initialize the database schema:

cd /export/server/dolphinscheduler/script
sh create-dolphinscheduler.sh

3.7 Edit the DS configuration files (starting with the environment variable file)

vim /export/server/dolphinscheduler/conf/env/dolphinscheduler_env.sh

export HADOOP_HOME=/export/server/hadoop
export HADOOP_CONF_DIR=/export/server/hadoop/etc/hadoop
export SPARK_HOME=/export/server/spark
export PYTHON_HOME=/root/anaconda3/envs/pyspark_env
export JAVA_HOME=/export/server/jdk1.8.0_241
export HIVE_HOME=/export/server/hive

export PATH=$HADOOP_HOME/bin:$PYTHON_HOME/bin:$JAVA_HOME/bin:$HIVE_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH
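A wrong path in this file tends to surface only later, when a task fails on a worker. A small sketch, assuming bash (it relies on indirect expansion), that checks each `*_HOME` variable from the file above against the filesystem; `check_homes` is a hypothetical helper name:

```shell
# Report every *_HOME variable that is unset or does not point at an
# existing directory. Returns non-zero if at least one is missing.
check_homes() {
  local name dir rc=0
  for name in HADOOP_HOME SPARK_HOME PYTHON_HOME JAVA_HOME HIVE_HOME; do
    dir=${!name:-}          # bash indirect expansion: value of the named variable
    if [ -z "$dir" ] || [ ! -d "$dir" ]; then
      echo "missing: $name=${dir:-<unset>}"
      rc=1
    fi
  done
  return $rc
}
```

Run it after loading the file, for example `source /export/server/dolphinscheduler/conf/env/dolphinscheduler_env.sh && check_homes`.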

vim /export/server/dolphinscheduler/conf/config/install_config.conf

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#


# NOTICE :  If the following config has special characters in the variable `.*[]^${}\+?|()@#&`, Please escape, for example, `[` escape to `\[`
# postgresql or mysql
# Use MySQL as the database that stores DS data
dbtype="mysql"

# db config
# db address and port
# IP address and port of the database
dbhost="192.168.88.100:3306"

# db username
# Database user
username="root"

# database name
# Database name
dbname="dolphinscheduler"

# db password
# NOTICE: if there are special characters, please use the \ to escape, for example, `[` escape to `\[`
# Database password
password="123456"

# zk cluster
# ZooKeeper service addresses
zkQuorum="192.168.88.100:2181,192.168.88.101:2181,192.168.88.102:2181"

# Note: the target installation path for dolphinscheduler, please not config as the same as the current path (pwd)
# Cluster installation directory
installPath="/export/server/dolphinscheduler_install"

# deployment user
# Note: the deployment user needs to have sudo privileges and permissions to operate hdfs. If hdfs is enabled, the root directory needs to be created by itself
# User used to connect to the other servers
deployUser="root"


# alert config (alert service settings)
# mail server host: requires a mail server; can be left unset if there is none
# mailServerHost="smtp.exmail.qq.com"

# mail server port
# note: Different protocols and encryption methods correspond to different ports, when SSL/TLS is enabled, make sure the port is correct.
# mailServerPort="25"

# sender
# mailSender="xxxxxxxxxx"

# user
# mailUser="xxxxxxxxxx"

# sender password
# note: The mail.passwd is email service authorization code, not the email login password.
# mailPassword="xxxxxxxxxx"

# TLS mail protocol support
# starttlsEnable="true"

# SSL mail protocol support
# only one of TLS and SSL can be in the true state.
# sslEnable="false"

#note: sslTrust is the same as mailServerHost
# sslTrust="smtp.exmail.qq.com"


# resource storage type：HDFS,S3,NONE
# Storage service for resource files; script files are kept on HDFS
resourceStorageType="HDFS"

# if resourceStorageType is HDFS，defaultFS write namenode address，HA you need to put core-site.xml and hdfs-site.xml in the conf directory.
# if S3，write S3 address，HA，for example ：s3a://dolphinscheduler，
# Note，s3 be sure to create the root directory /dolphinscheduler
# Address of the storage service
defaultFS="hdfs://node1:8020"

# if resourceStorageType is S3, the following three configuration is required, otherwise please ignore
# s3Endpoint="http://192.168.xx.xx:9010"
# s3AccessKey="xxxxxxxxxx"
# s3SecretKey="xxxxxxxxxx"

# if resourcemanager HA enable, please type the HA ips ; if resourcemanager is single, make this value empty
# YARN cluster hosts, used with YARN HA, where several ResourceManagers run on different servers
# yarnHaIps="192.168.xx.xx,192.168.xx.xx"

# if resourcemanager HA enable or not use resourcemanager, please skip this value setting; If resourcemanager is single, you only need to replace yarnIp1 to actual resourcemanager hostname.
# YARN address (non-HA)
singleYarnIp="node1"

# resource store on HDFS/S3 path, resource file will store to this hadoop hdfs path, self configuration, please make sure the directory exists on hdfs and have read write permissions。/dolphinscheduler is recommended
# Directory on HDFS where resource files are stored
resourceUploadPath="/dolphinscheduler"

# who have permissions to create directory under HDFS/S3 root path
# Note: if kerberos is enabled, please config hdfsRootUser=
# hdfsRootUser="hdfs"

# kerberos config
# whether kerberos starts, if kerberos starts, following four items need to config, otherwise please ignore
# Login authentication service (Kerberos)
kerberosStartUp="false"
# kdc krb5 config file path
krb5ConfPath="$installPath/conf/krb5.conf"
# keytab username
keytabUserName="hdfs-mycluster@ESZ.COM"
# username keytab path
keytabPath="$installPath/conf/hdfs.headless.keytab"


# api server port
# Port the API service binds to
apiServerPort="12345"


# install hosts
# Note: install the scheduled hostname list. If it is pseudo-distributed, just write a pseudo-distributed hostname
# Servers DS is installed on; each one gets the installation directory given by installPath
ips="node1,node2,node3"

# ssh port, default 22
# Note: if ssh port is not default, modify here
# SSH port
sshPort="22"

# run master machine
# Note: list of hosts hostname for deploying master
# Servers that run the master service
masters="node1,node2"

# run worker machine
# note: need to write the worker group name of each worker, the default value is "default"
# workers="ds1:default,ds2:default,ds3:default,ds4:default,ds5:default"
# Servers that run the worker service
workers="node1,node2,node3"

# run alert machine
# note: list of machine hostnames for deploying alert server
# Server that runs the alert service
alertServer="node3"

# run api machine
# note: list of machine hostnames for deploying api server
# Server that runs the API service
apiServers="node1"
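Because install_config.conf is written as plain shell assignments, it can be sourced and sanity-checked before deployment. A sketch under that assumption, which verifies that every host listed in `masters`, `workers`, `alertServer`, and `apiServers` also appears in `ips`; `check_install_hosts` is a hypothetical helper name:

```shell
# Source an install_config.conf-style file in a subshell (so its variables
# do not leak) and check that every role host is also listed in ips.
check_install_hosts() (
  conf=$1
  . "$conf" || exit 2
  rc=0
  for entry in $(echo "$masters,$workers,$alertServer,$apiServers" | tr ',' ' '); do
    host=${entry%%:*}        # drop an optional ":group" suffix on workers
    case ",$ips," in
      *",$host,"*) ;;
      *) echo "host not in ips: $host"; rc=1 ;;
    esac
  done
  exit $rc
)
```

For example, `check_install_hosts /export/server/dolphinscheduler/conf/config/install_config.conf` prints nothing and returns success for the configuration shown above.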

vim /export/server/dolphinscheduler/conf/common.properties

fs.defaultFS=hdfs://node1:8020
yarn.application.status.address=http://node1:8088/ws/v1/cluster/apps

3.8 Starting the services

3.8.1 Start ZooKeeper (on all three nodes) and Hadoop first

Start ZooKeeper:

/export/server/zookeeper/bin/zkServer.sh start

# Start on node1
/export/server/zookeeper/bin/zkServer.sh  start

# From node1, start node2 and node3 remotely
ssh node2 /export/server/zookeeper/bin/zkServer.sh  start
ssh node3 /export/server/zookeeper/bin/zkServer.sh  start
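The three start commands can be folded into one loop that also works for `status` and `stop`. A sketch assuming bash and passwordless ssh from node1 to every node, including node1 itself; `zk_all` and the `DRY_RUN` switch are hypothetical names introduced here:

```shell
ZK_BIN=/export/server/zookeeper/bin/zkServer.sh   # path used in the article

# Run a zkServer.sh action (start, status, stop) on every listed host over ssh.
# Set DRY_RUN=1 to print the commands instead of executing them.
zk_all() {
  local action=$1 host
  shift
  for host in "$@"; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "ssh $host $ZK_BIN $action"
    else
      ssh "$host" "$ZK_BIN" "$action"
    fi
  done
}
```

After `zk_all start node1 node2 node3`, running `zk_all status node1 node2 node3` should show one node in leader mode and the others as followers once the ensemble has formed.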

Start Hadoop:

start-all.sh

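3.8.2 Starting DS for the first time

The first start also installs and deploys the cluster services:

1. First, the configured installation is synced to the other servers in the cluster.

2. Then the services are started.

Once the first start has succeeded, do not use this command again for later starts of DS.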

cd /export/server/dolphinscheduler
sh install.sh
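For later starts, the 1.3.x releases ship `bin/start-all.sh` and `bin/stop-all.sh` in the installed tree. A sketch, assuming that layout and the `installPath` from install_config.conf, that prints which command applies; `ds_start_cmd` is a hypothetical helper name:

```shell
INSTALL_PATH=/export/server/dolphinscheduler_install   # installPath from install_config.conf

# Print the command to use: install.sh for the first deployment,
# bin/start-all.sh once the installed tree already exists.
ds_start_cmd() {
  local install_path=${1:-$INSTALL_PATH}
  if [ -f "$install_path/bin/start-all.sh" ]; then
    echo "sh $install_path/bin/start-all.sh"
  else
    echo "cd /export/server/dolphinscheduler && sh install.sh"
  fi
}
```

Printing the command rather than running it keeps the choice visible; confirm that the script is present in your install directory before relying on it.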

3.8.3 Access the web UI

http://192.168.88.100:12345/dolphinscheduler

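Summary

DS is a distributed, decentralized, easily extensible visual DAG workflow task scheduling system used to schedule timed jobs. It comes with a visual console that simplifies operation: a workflow can be created with simple drag-and-drop and clicks.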
