Azkaban Installation and Usage

I. Installation

1. Preparation

1.1 Download

Download the Azkaban source code:
https://github.com/azkaban/azkaban
or
https://github.com/azkaban/azkaban/releases

Download the plugin source code:
https://github.com/azkaban/azkaban-plugins

If you download the .zip packages, unzip must be installed as root on the target machine to extract them: yum -y install unzip

This article uses azkaban-3.90.0.tar.gz as the software package and azkaban-plugins-3.0.0.zip as the plugin package. (During this installation the plugin package turned out to be problematic, so it is better not to use it at all: the pieces needed can be taken directly from the main source package, and the plugin download can be skipped.)

Install git and a C++ compiler as root: yum install -y git gcc-c++. Without them, compiling the source package later fails with:

Failed to apply plugin [id 'com.cinnober.gradle.semver-git']
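Before running the build, it may help to confirm the prerequisite tools are on PATH; a small sketch (tool names taken from the yum packages above):

```shell
#!/bin/sh
# Check that the build prerequisites are available before running gradlew.
# git and g++ come from `yum install -y git gcc-c++`; unzip is only needed
# if the .zip archives were downloaded.
report=$(
  for tool in git g++ unzip; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: ok"
    else
      echo "$tool: MISSING (yum install -y git gcc-c++ unzip)"
    fi
  done
)
echo "$report"
```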

1.2 Environment requirements

Hadoop, Hive, MySQL, etc. are already installed. In this article the Hadoop and Hive deployments are integrated with Kerberos (i.e. a secured cluster).

2. Configuration

2.1 Software setup

Extract the source package: tar -zxvf azkaban-3.90.0.tar.gz

### Enter the extracted root directory (here /usr/local/software/azkaban-3.90.0/)
###
[rd@hadoop-server-002 /]$ cd /usr/local/software/azkaban-3.90.0/
### Build with the following command: ./gradlew build installDist -x test
###
[rd@hadoop-server-002 azkaban-3.90.0]$ ./gradlew build installDist -x test

Each component's package is built inside its own directory: the build output sits under build/distributions in each one, in both tar and zip form.

###  Under /usr/local/software/azkaban-3.90.0/, go into
###  azkaban-exec-server/build/distributions/,
###  azkaban-web-server/build/distributions/ and
###  azkaban-db/build/distributions/ in turn,
###  and copy the tar packages from each into one directory (here /usr/local/apps/azkaban/)
###
cp /usr/local/software/azkaban-3.90.0/azkaban-exec-server/build/distributions/azkaban-exec-server-0.1.0-SNAPSHOT.tar.gz /usr/local/apps/azkaban/
cp /usr/local/software/azkaban-3.90.0/azkaban-web-server/build/distributions/azkaban-web-server-0.1.0-SNAPSHOT.tar.gz /usr/local/apps/azkaban/
cp /usr/local/software/azkaban-3.90.0/azkaban-db/build/distributions/azkaban-db-0.1.0-SNAPSHOT.tar.gz /usr/local/apps/azkaban/


###  Then extract the copied archives:
###  azkaban-exec-server-0.1.0-SNAPSHOT.tar.gz,
###  azkaban-db-0.1.0-SNAPSHOT.tar.gz and
###  azkaban-web-server-0.1.0-SNAPSHOT.tar.gz
###
cd /usr/local/apps/azkaban/
tar -zxvf azkaban-exec-server-0.1.0-SNAPSHOT.tar.gz
tar -zxvf azkaban-db-0.1.0-SNAPSHOT.tar.gz
tar -zxvf azkaban-web-server-0.1.0-SNAPSHOT.tar.gz

### And rename each extracted directory ###
mv azkaban-web-server-0.1.0-SNAPSHOT azkaban-web-server
mv azkaban-exec-server-0.1.0-SNAPSHOT azkaban-exec-server
mv azkaban-db-0.1.0-SNAPSHOT azkaban-db

Note: do not delete the compiled source tree; it is still needed below.
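If you prefer, the copy, extract, and rename steps can be collapsed into one loop. The sketch below creates a throwaway demo tree (mktemp stand-ins) so it can be tried anywhere; on the build host, set SRC and DEST to the real directories used in this article instead.

```shell
#!/bin/sh
# Copy each component's tarball, extract it, and strip the version suffix.
SRC=$(mktemp -d)   # stand-in for /usr/local/software/azkaban-3.90.0
DEST=$(mktemp -d)  # stand-in for /usr/local/apps/azkaban
VER=0.1.0-SNAPSHOT

for comp in azkaban-exec-server azkaban-web-server azkaban-db; do
  # demo fixture only: fake the gradle build output so the loop is runnable
  mkdir -p "$SRC/$comp/build/distributions/$comp-$VER"
  (cd "$SRC/$comp/build/distributions" && tar -czf "$comp-$VER.tar.gz" "$comp-$VER")

  # the actual steps from this section: copy, extract, rename
  cp "$SRC/$comp/build/distributions/$comp-$VER.tar.gz" "$DEST/"
  tar -zxf "$DEST/$comp-$VER.tar.gz" -C "$DEST"
  mv "$DEST/$comp-$VER" "$DEST/$comp"
done

ls "$DEST"
```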

2.1.1 Database preparation

Inside /usr/local/apps/azkaban/azkaban-db there is an SQL script named create-all-sql-0.1.0-SNAPSHOT.sql. Run it from the mysql client:

# Create the azkaban database (the name is arbitrary, but it must match the Azkaban configuration edited later)
mysql> CREATE DATABASE azkaban;
mysql> use azkaban;
mysql> source /usr/local/apps/azkaban/azkaban-db/create-all-sql-0.1.0-SNAPSHOT.sql

Afterwards, check in mysql that the schema tables were created.

2.1.2 exec-server configuration
2.1.2.1 azkaban.properties for the exec-server

Go into /usr/local/apps/azkaban/azkaban-exec-server/conf and edit azkaban.properties:

### Set the time zone,
### and comment out the block below:
### it contains the web-server's personalization settings, which the exec-server does not use
###
# Azkaban Personalization Settings
#azkaban.name=Test
#azkaban.label=My Local Azkaban
#azkaban.color=#FF3601
#azkaban.default.servlet.path=/index
#web.resource.dir=web/
default.timezone.id=Asia/Shanghai

### Comment out the block below: it is the web-server's user-manager configuration, which the exec-server does not use
###
# Azkaban UserManager class
#user.manager.class=azkaban.user.XmlUserManager
#user.manager.xml.file=conf/azkaban-users.xml

### Global properties file path and project directory: leave at the defaults
###
# Loader for projects
executor.global.properties=conf/global.properties
azkaban.project.dir=projects

### Leave at the default
###
# Velocity dev mode
velocity.dev.mode=false

### Comment out the block below:
### it contains the web-server's Jetty server settings, which the exec-server does not use
###
# Azkaban Jetty server properties.
#jetty.use.ssl=false
#jetty.maxThreads=25
#jetty.port=8081

### Change to your own host; the default port is fine
### (this port must match jetty.port configured on the web-server)
###
# Where the Azkaban web server is located
azkaban.webserver.url=http://hadoop-server-002:8081

# mail settings
mail.sender=
mail.host=

# User facing web server configurations used to construct the user facing server URLs. They are useful when there is a reverse proxy between Azkaban web servers and users.
# enduser -> myazkabanhost:443 -> proxy -> localhost:8081
# when this parameters set then these parameters are used to generate email links.
# if these parameters are not set then jetty.hostname, and jetty.port(if ssl configured jetty.ssl.port) are used.
# azkaban.webserver.external_hostname=myazkabanhost.com
# azkaban.webserver.external_ssl_port=443
# azkaban.webserver.external_port=8081
job.failure.email=
job.success.email=
lockdown.create.projects=false
cache.directory=cache

### Defaults are fine
###
# JMX stats
jetty.connector.stats=true
executor.connector.stats=true

### jobtype plugin directory
###
# Azkaban plugin settings
azkaban.jobtype.plugin.dir=plugins/jobtypes

### MySQL settings: change to your own; mysql.database must match the database created in MySQL earlier
###
# Azkaban mysql settings by default. Users should configure their own username and password.
database.type=mysql
mysql.port=3306
mysql.host=hadoop-server-002
mysql.database=azkaban
mysql.user=root
mysql.password=awifi@123
mysql.numconnections=100

### Executor settings
### Add the executor port setting executor.port;
### the executor.port configured on the web-server must match this port
###
# Azkaban Executor settings
executor.maxThreads=50
executor.flow.threads=30
executor.port=12321
2.1.2.2 Compiling execute-as-user.c
### Go to this directory inside the source tree:
###
cd /usr/local/software/azkaban-3.90.0/az-exec-util/src/main/c

### Copy the execute-as-user.c file into the Azkaban install directory
### (any directory works, but the configuration later must point at it)
###
cp execute-as-user.c /usr/local/apps/azkaban/

### Then go to the Azkaban install directory
###
cd /usr/local/apps/azkaban


### and compile it with the following command
###
gcc execute-as-user.c -o execute-as-user

Listing the directory afterwards shows that compilation produced a new execute-as-user binary.

2.1.2.3 jobtype configuration
### Go into the source tree
###
cd /usr/local/software/azkaban-3.90.0/az-hadoop-jobtype-plugin/src/jobtypes

It contains one subdirectory per job type (including spark/) alongside shared files such as commonprivate.properties and common.properties.

### Copy everything except spark into plugins/jobtypes under the azkaban-exec-server install directory
### (spark is excluded because Spark is not installed here; if Spark is installed, copy everything)
###
cp -r * /usr/local/apps/azkaban/azkaban-exec-server/plugins/jobtypes/
cd /usr/local/apps/azkaban/azkaban-exec-server/plugins/jobtypes/
rm -rf spark/

Edit the commonprivate.properties file under /usr/local/apps/azkaban/azkaban-exec-server/plugins/jobtypes/:

### Hadoop security-manager class: use HadoopSecurityManager_H_1_0 for Hadoop 1.x
### and HadoopSecurityManager_H_2_0 for Hadoop 2.x and above
###
## hadoop security manager setting common to all hadoop jobs
#hadoop.security.manager.class=azkaban.security.HadoopSecurityManager_H_1_0
hadoop.security.manager.class=azkaban.security.HadoopSecurityManager_H_2_0

### Kerberos settings: change to your own keytab file and principal
###
## hadoop security related settings
# proxy.keytab.location=
# proxy.user=
proxy.keytab.location=/srv/kerberos/keytab/rd.keytab
proxy.user=rd/hadoop-server-003@HADOOP.COM

### Uncomment the following
###
# whether to enable user proxying
azkaban.should.proxy=true
# whether to obtain a binary token; must be enabled when Hadoop is a secured cluster
obtain.binary.token=true
# whether to obtain a NameNode token
obtain.namenode.token=true
# whether to obtain a JobTracker token
obtain.jobtracker.token=true

# global classpath items for all jobs. e.g. hadoop-core jar, hadoop conf
#jobtype.global.classpath=${hadoop.home}/*,${hadoop.home}/conf

# global jvm args for all jobs. e.g. java.io.temp.dir, java.library.path
#jobtype.global.jvm.args=

### Change hadoop.home, hive.home, spark.home and jobtype.global.classpath to your own directories,
### and add hadoop.conf.dir; note that this setting is required
### (adding HADOOP_CONF_DIR to the environment variables might also work, but that was not verified here)
###
# hadoop
hadoop.home=/usr/local/apps/hadoop
hadoop.conf.dir=/usr/local/apps/hadoop/etc/hadoop
jobtype.global.classpath=${hadoop.conf.dir},${hadoop.home}/share/hadoop/common/*,${hadoop.home}/share/hadoop/common/lib/*,${hadoop.home}/share/hadoop/mapreduce/*,${hadoop.home}/share/hadoop/mapreduce/lib/*,${hadoop.home}/share/hadoop/yarn/*,${hadoop.home}/share/hadoop/yarn/lib/*,${hadoop.home}/share/hadoop/hdfs/*,${hadoop.home}/share/hadoop/hdfs/lib/*
#pig.home=
#hive.home=
hive.home=/usr/local/apps/hive
#spark.home=

# configs for jobtype security settings

### Set to false
###
# set execute-as-user
execute.as.user=false

### Point at the directory containing the compiled execute-as-user binary
### 
#azkaban.native.lib=
azkaban.native.lib=/usr/local/apps/azkaban

Edit the common.properties file, changing hadoop.home etc. to your own directories and adding the hadoop.conf.dir setting:

## everything that the user job can know

hadoop.home=/usr/local/apps/hadoop
hadoop.conf.dir=/usr/local/apps/hadoop/etc/hadoop
hive.home=/usr/local/apps/hive
#pig.home=
#spark.home=

#azkaban.should.proxy=
2.1.3 web-server configuration
2.1.3.1 azkaban.properties for the web-server

Go into /usr/local/apps/azkaban/azkaban-web-server/conf and edit azkaban.properties:

### Set the time zone,
### change web.resource.dir to an absolute path pointing at the web/ directory under azkaban-web-server;
### the rest can be personalized
###
# Azkaban Personalization Settings
azkaban.name=Test
azkaban.label=My Local Azkaban
azkaban.color=#FF3601
azkaban.default.servlet.path=/index
web.resource.dir=/usr/local/apps/azkaban/azkaban-web-server/web
default.timezone.id=Asia/Shanghai

### Keep the defaults: the web UI's user management; login users can be edited in azkaban-users.xml
###
# Azkaban UserManager class
user.manager.class=azkaban.user.XmlUserManager
user.manager.xml.file=conf/azkaban-users.xml

### Global properties file path and project directory: leave at the defaults
###
# Loader for projects
executor.global.properties=conf/global.properties
azkaban.project.dir=projects

### Leave at the default
# Velocity dev mode
velocity.dev.mode=false

### Jetty settings
### jetty.port is the web UI port; it must match the port in azkaban.webserver.url configured on the exec-server
###
# Azkaban Jetty server properties.
jetty.use.ssl=false
jetty.maxThreads=25
jetty.port=8081

### Executor port; must match the executor.port configured on the exec-server
###
# Azkaban Executor settings
executor.port=12321

### Keep the defaults
# mail settings
mail.sender=
mail.host=

### Keep the defaults
###
# User facing web server configurations used to construct the user facing server URLs. They are useful when there is a reverse proxy between Azkaban web servers and users.
# enduser -> myazkabanhost:443 -> proxy -> localhost:8081
# when this parameters set then these parameters are used to generate email links.
# if these parameters are not set then jetty.hostname, and jetty.port(if ssl configured jetty.ssl.port) are used.
# azkaban.webserver.external_hostname=myazkabanhost.com
# azkaban.webserver.external_ssl_port=443
# azkaban.webserver.external_port=8081
job.failure.email=
job.success.email=
lockdown.create.projects=false
cache.directory=cache

### Keep the defaults
###
# JMX stats
jetty.connector.stats=true
executor.connector.stats=true

### Change to match the MySQL settings configured on the exec-server
###
# Azkaban mysql settings by default. Users should configure their own username and password.
database.type=mysql
mysql.port=3306
mysql.host=hadoop-server-002
mysql.database=azkaban
mysql.user=root
mysql.password=awifi@123
mysql.numconnections=100

### The following settings are required when running multiple executors; also remove MinimumFreeMemory from filters and leave the rest at the defaults.
### On each job dispatch, executors are first filtered by these conditions, and the survivors are then compared and ranked.
### For example, the MinimumFreeMemory filter checks whether an executor has more than 6 GB of free memory; if not, the web-server will not dispatch jobs to it (see the Azkaban source on GitHub).
### The CpuStatus filter checks whether an executor's CPU usage has reached 95%; if it has, the web-server likewise will not dispatch jobs to it (see the Azkaban source on GitHub).
### Parameter meanings are described in the official documentation:
### http://azkaban.github.io/azkaban/docs/latest/#configuration
###
#Multiple Executor
azkaban.use.multiple.executors=true
#azkaban.executorselector.filters=StaticRemainingFlowSize,MinimumFreeMemory,CpuStatus
azkaban.executorselector.filters=StaticRemainingFlowSize,CpuStatus
# running on VMs here, so no memory filtering is needed; comparing is enough
# weight for the number of flows already assigned to the executor
azkaban.executorselector.comparator.NumberOfAssignedFlowComparator=1
# weight for remaining memory
azkaban.executorselector.comparator.Memory=1
# weight for how recently the executor was last dispatched to
azkaban.executorselector.comparator.LastDispatched=1
# weight for CPU usage
azkaban.executorselector.comparator.CpuUsage=1
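Three values must agree across the two azkaban.properties files: executor.port on both sides, and the port in the exec-server's azkaban.webserver.url versus the web-server's jetty.port. A quick sed-based cross-check can catch mismatches. The sketch below writes two stand-in files so it runs anywhere; point EXEC_CONF and WEB_CONF at the real conf/azkaban.properties paths on your host.

```shell
#!/bin/sh
# Cross-check the ports shared between the exec-server and web-server configs.
EXEC_CONF=$(mktemp)  # stand-in for azkaban-exec-server/conf/azkaban.properties
WEB_CONF=$(mktemp)   # stand-in for azkaban-web-server/conf/azkaban.properties
printf 'executor.port=12321\nazkaban.webserver.url=http://hadoop-server-002:8081\n' > "$EXEC_CONF"
printf 'executor.port=12321\njetty.port=8081\n' > "$WEB_CONF"

exec_port=$(sed -n 's/^executor\.port=//p' "$EXEC_CONF")
web_exec_port=$(sed -n 's/^executor\.port=//p' "$WEB_CONF")
jetty_port=$(sed -n 's/^jetty\.port=//p' "$WEB_CONF")
url_port=$(sed -n 's#^azkaban\.webserver\.url=.*:\([0-9]*\)$#\1#p' "$EXEC_CONF")

[ "$exec_port" = "$web_exec_port" ] && echo "executor.port matches ($exec_port)" || echo "executor.port MISMATCH"
[ "$jetty_port" = "$url_port" ] && echo "jetty.port matches ($jetty_port)" || echo "jetty.port MISMATCH"
```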

3. Startup

3.1 Edit the startup scripts

3.1.1 exec-server startup script

vi /usr/local/apps/azkaban/azkaban-exec-server/bin/internal/internal-start-executor.sh
Find the section below and change the CLASSPATH to match your own Hadoop environment:

if [ "$HADOOP_HOME" != "" ]; then
        echo "Using Hadoop from $HADOOP_HOME"
        CLASSPATH=$CLASSPATH:$HADOOP_HOME/etc/hadoop:$HADOOP_HOME/*:$HADOOP_HOME/share/hadoop/common/*:$HADOOP_HOME/share/hadoop/common/lib/*:$HADOOP_HOME/share/hadoop/mapreduce/*:$HADOOP_HOME/share/hadoop/mapreduce/lib/*:$HADOOP_HOME/share/hadoop/yarn/*:$HADOOP_HOME/share/hadoop/yarn/lib/*:$HADOOP_HOME/share/hadoop/hdfs/*:$HADOOP_HOME/share/hadoop/hdfs/lib/*
        JAVA_LIB_PATH="-Djava.library.path=$HADOOP_HOME/lib/native/Linux-amd64-64"
else
        echo "Error: HADOOP_HOME is not set. Hadoop job types will not run properly."
fi
3.1.2 web-server startup script

Similarly:
vi /usr/local/apps/azkaban/azkaban-web-server/bin/internal/internal-start-web.sh
Find the section below and change the CLASSPATH to match your own Hadoop environment:

if [ "$HADOOP_HOME" != "" ]; then
        echo "Using Hadoop from $HADOOP_HOME"
        CLASSPATH=$CLASSPATH:$HADOOP_HOME/etc/hadoop:$HADOOP_HOME/*:$HADOOP_HOME/share/hadoop/common/*:$HADOOP_HOME/share/hadoop/common/lib/*:$HADOOP_HOME/share/hadoop/mapreduce/*:$HADOOP_HOME/share/hadoop/mapreduce/lib/*:$HADOOP_HOME/share/hadoop/yarn/*:$HADOOP_HOME/share/hadoop/yarn/lib/*:$HADOOP_HOME/share/hadoop/hdfs/*:$HADOOP_HOME/share/hadoop/hdfs/lib/*
        JAVA_LIB_PATH="-Djava.library.path=$HADOOP_HOME/lib/native/Linux-amd64-64"
else
        echo "Error: HADOOP_HOME is not set. Hadoop job types will not run properly."
fi

3.2 Starting and stopping

Start the exec-server first, then the web-server; the start commands must be run from each server's root directory.

For a Multiple Executor deployment, distribute the configured azkaban-exec-server to the other nodes, then start and activate an exec-server on every machine.

Start the exec-server, then activate it. The activation command is:
curl -G "localhost:$(<./executor.port)/executor?action=activate" && echo

### Start the executor
###
[rd@hadoop-server-002 aa]$ cd /usr/local/apps/azkaban/azkaban-exec-server/
[rd@hadoop-server-002 azkaban-exec-server]$ ./bin/start-exec.sh
###
### Activate the executor (this must also be run from the server root)
###
[rd@hadoop-server-002 azkaban-exec-server]$ curl -G "localhost:$(<./executor.port)/executor?action=activate" && echo

After startup, the directory contains a new executor.port file, which the activation command above reads.
Then start the web-server:

[rd@hadoop-server-002 azkaban-exec-server]$ cd /usr/local/apps/azkaban/azkaban-web-server/
[rd@hadoop-server-002 azkaban-web-server]$ ./bin/start-web.sh


Stop commands (also run from each server's root directory):

### Stop the web-server
###
[rd@hadoop-server-002 azkaban-web-server]$ ./bin/shutdown-web.sh
### Stop the exec-server
###
[rd@hadoop-server-002 azkaban-exec-server]$ ./bin/shutdown-exec.sh

Note: every time the exec-server is restarted it must be re-activated, and the web-server must then be restarted as well.
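The restart rule can be captured in a small script. The sketch below only prints the sequence as a checklist (AZ_HOME is this article's install root); run the printed lines yourself, or remove the cat wrapper, to execute them for real.

```shell
#!/bin/sh
# Full restart sequence: stop both servers, start the executor, re-activate
# it, then start the web server. Printed as a checklist rather than executed.
AZ_HOME=/usr/local/apps/azkaban
plan=$(cat <<EOF
cd $AZ_HOME/azkaban-web-server && ./bin/shutdown-web.sh
cd $AZ_HOME/azkaban-exec-server && ./bin/shutdown-exec.sh
cd $AZ_HOME/azkaban-exec-server && ./bin/start-exec.sh
cd $AZ_HOME/azkaban-exec-server && curl -G "localhost:\$(<./executor.port)/executor?action=activate"
cd $AZ_HOME/azkaban-web-server && ./bin/start-web.sh
EOF
)
echo "$plan"
```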

Log in to the web UI at: http://hadoop-server-002:8081/
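Once logged in, the setup can be exercised end to end with a minimal job. The file below is a hypothetical example (the file name test.job and the echoed string are made up); type=command is Azkaban's built-in job type for running a shell command. Zip the file, upload the zip through a newly created project in the web UI, and execute the flow.

```properties
# test.job -- minimal command-type job to verify the installation
type=command
command=echo "hello azkaban"
```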
