Azkaban is a batch workflow scheduler made up of three parts: the executor server, the web server, and MySQL.
The executor server and web server exchange data through MySQL and can be deployed independently.
Azkaban organizes work into three levels: project, flow, and job. Uploading a project whose flow and job names match existing ones overwrites them; a project cannot contain two flows with the same name, and a flow cannot contain two jobs with the same name.
Features
1、 Compatible with any version of Hadoop: Azkaban can run any command the host server itself supports.
2、 Easy-to-use web UI: create projects, upload zip packages, run flows, configure schedules and parameters, monitor flow execution, view job logs, and more.
3、 Simple workflow upload.
4、 Easy to define relationships between jobs.
5、 Workflow scheduling: flows can be run immediately or on an automatic schedule.
6、 Modular, pluggable plugin mechanism.
7、 Authentication/authorization (permission control).
8、 Workflows can be killed and restarted.
9、 Email alerts on failure and success: configure the sender's address and mail server in the server config file, and the recipients' addresses in the workflow.
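As a sketch of point 9, the sender side lives in azkaban.properties on the web server (standard Azkaban property names; all values here are placeholders), while the recipients go into the flow/job config:

```properties
# azkaban.properties on the web server (placeholder values)
mail.sender=azkaban@foo.com
mail.host=smtp.foo.com
mail.user=azkaban
mail.password=********

# and in a job/flow definition, the recipients:
failure.emails=noreply@foo.com
success.emails=noreply@foo.com
```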
Two syntaxes for the files inside the zip package
1、 Flow 1.0: .job files
e.g. a zip package containing the files below (in Flow 1.0 the flow takes its name from the terminal job, i.e. the job that no other job depends on).
A .job file's name is the job's name; dependencies between jobs are declared with the dependencies keyword.
e.g. first.job:
    type=command
    command=echo first
    dependencies=start
start.job:
    type=noop
2、 Flow 2.0: .flow YAML files
basic.flow contents:
config:
  failure.emails: noreply@foo.com

nodes:
  - name: jobC
    type: noop
    # jobC depends on jobA and jobB
    dependsOn:
      - jobA
      - jobB

  - name: jobA
    type: command
    config:
      command: echo "This is an echoed text."

  - name: jobB
    type: command
    config:
      command: pwd
flow20.project contents (the version marker lives in a .project file, not a .flow file):
azkaban-flow-version: 2.0
Dependencies between jobs
When each job depends on the previous one, execution is serial: A ---> B ---> C ---> D.
With fan-out/fan-in, execution goes: A runs first ---> B, B1, and B2 run concurrently ---> once B finishes, C and C1 run (B1 and B2 may still be running) ---> once both C and C1 have finished, D runs.
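The fan-out/fan-in graph above could be declared with Flow 1.0 .job files like these (a sketch; job names and commands are illustrative):

```properties
# a.job
type=command
command=echo A

# b.job (b1.job and b2.job are identical except for the echo)
type=command
command=echo B
dependencies=a

# c.job (c1.job analogous) - both depend only on b
type=command
command=echo C
dependencies=b

# d.job - runs only after both c and c1 finish
type=command
command=echo D
dependencies=c,c1
```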
Two main values for type in a job
One is the command type, the other the javaprocess type.
command type: can run shell, Hive, Spark SQL, Presto, Hadoop commands, etc., but only commands the host supports; e.g. to run hadoop commands, the executor server must be installed on a machine with the Hadoop client.
javaprocess type: runs a Java main class via the java command, so a JVM must be installed on the server.
There is also type=noop, which is simply a no-op placeholder.
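A minimal javaprocess job might look like this (the class name and memory settings are illustrative; java.class, classpath, Xms, and Xmx are the standard javaprocess job properties):

```properties
type=javaprocess
java.class=com.example.HelloAzkaban
classpath=./lib/*
Xms=64M
Xmx=128M
```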
Operating Azkaban from Java code
Azkaban is written in Java; a look at the source shows its HTTP services are implemented with plain servlets, so the URLs generally take the ?action=xxx form.
Based on the first syntax above, the overall flow is: login ---> create project ---> generate zip file ---> upload zip file ---> execute flow.
Log in with username and password to obtain a session:
url: https://ip:port?action=login
request type: post
parameters: username, password
public Session login(String user, String env) throws ApplicationException {
    Session session;
    try {
        Map<String, Object> dataMap = new HashMap<>();
        dataMap.put(AzkabanConstant.ACTION, ActionConstant.LOGIN_ACTION);
        dataMap.put(AzkabanConstant.AZKABAN_LOGIN_USERNAME, azkabanConfig.getAzkUsername(env));
        dataMap.put(AzkabanConstant.AZKABAN_LOGIN_PWD, azkabanConfig.getAzkPassword(env));
        String result = restTemplateService.post(azkabanConfig.getAzkUrl(env), dataMap);
        Object obj = JsonUtils.fromValueByKey(result, AzkabanConstant.AZKABAN_REQUEST_STATUS);
        if (!AzkabanConstant.AZK_SUCCESS.equals(obj)) {
            LOGGER.error("Azkaban login failed! The error message returned is: " + result);
            throw new ApplicationException(BDPResponseCode.AZKABAN_LOGIN_FILED.getCode(), "Azkaban login failed!");
        }
        String sessionId = JsonUtils.getJSONValueByKey(result, AzkabanConstant.AZKABAN_SESSION_ID);
        session = new AzkabanSession();
        ((AzkabanSession) session).setSessionId(sessionId);
        ((AzkabanSession) session).setUser(azkabanConfig.getAzkUsername(env));
    } catch (ResourceAccessException e) {
        LOGGER.error("Failed to get Azkaban session:", e);
        if (e.getCause() instanceof ConnectTimeoutException || e.getCause() instanceof SocketTimeoutException) {
            throw new ApplicationException(ExceptionInfo.FAIL_CODE, ExceptionInfo.CONNECTION_TIME_OUT);
        }
        throw new ApplicationException(BDPResponseCode.GET_AZKABAN_SESSION_FAILED.getCode(),
                BDPResponseCode.GET_AZKABAN_SESSION_FAILED.getMessage() + ": " + e.getMessage());
    } catch (Exception e) {
        LOGGER.error("Failed to get Azkaban session:", e);
        throw new ApplicationException(BDPResponseCode.GET_AZKABAN_SESSION_FAILED.getCode(),
                BDPResponseCode.GET_AZKABAN_SESSION_FAILED.getMessage() + ": " + e.getMessage());
    }
    return session;
}
Create a project:
url: https://ip:port/manager?action=create
request type: post
parameters: name (project name), description (project description)
public Project createProject(Project project, String env) throws ApplicationException {
    Map<String, Object> params = new HashMap<>();
    BdpProject bdpProject = (BdpProject) project;
    String projectCode = bdpProject.getCode();
    String flowName = bdpProject.getLastFlow().getName();
    log.info("[azk trace] env {} createProject project {} flowName {}", env, projectCode, flowName);
    String azkProjectName = AzkabanUtils.generateProjectName(projectCode, flowName);
    Session session = securityService.login(azkConfig.getAzkUsername(env), env);
    params.put(AzkabanConstant.AZKABAN_SESSION_ID, session.getSessionId());
    params.put(AzkabanConstant.ACTION, ActionConstant.CREATE_PROJECT_ACTION);
    params.put(AzkabanConstant.AZKABAN_PROJECT_NAME, azkProjectName);
    params.put(AzkabanConstant.AZKABAN_PROJECT_DESC, project.getDescription());
    try {
        String azkUrl = azkConfig.getAzkUrl(env) + "/manager";
        log.info("[azk trace] env {} createProject {} azkUrl {}", env, projectCode, azkUrl);
        String result = restTemplateService.post(azkUrl, params);
        log.info("[azk trace] env {} createProject {} result {}", env, projectCode, result);
    } catch (Exception e) {
        log.error("[azk trace] env {} createProject {} error", env, projectCode, e);
        throw new ApplicationException(90009, e.getMessage(), e);
    }
    return null;
}
Generate the zip file:
This step creates the .job files, shell scripts, and config files.
The .job and config files are key-value pairs; once generated, the zip package is written to a directory on the server and kept there, bound to the project.
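GenerateFileUtils above is project-internal, but the zip step itself is just packaging the generated key=value files. A minimal sketch using only java.util.zip could look like this (file names and job contents are illustrative):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class JobZipWriter {

    /** Writes each entry (file name -> key=value content) into a zip and returns the zip's path. */
    public static Path writeZip(Path dir, String zipName, Map<String, String> jobFiles) throws IOException {
        Path zipPath = dir.resolve(zipName);
        try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(zipPath))) {
            for (Map.Entry<String, String> e : jobFiles.entrySet()) {
                zos.putNextEntry(new ZipEntry(e.getKey()));
                zos.write(e.getValue().getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
        }
        return zipPath;
    }

    public static void main(String[] args) throws IOException {
        // the two-job example from the Flow 1.0 section above
        Map<String, String> jobs = new LinkedHashMap<>();
        jobs.put("start.job", "type=noop\n");
        jobs.put("first.job", "type=command\ncommand=echo first\ndependencies=start\n");
        Path zip = writeZip(Files.createTempDirectory("azk"), "demo.zip", jobs);
        System.out.println(Files.exists(zip));
    }
}
```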
public String zipProject(Project project, String env) throws ApplicationException {
    AzkabanSchedulerProject publishProject = (AzkabanSchedulerProject) project;
    String projectPath = publishProject.getStorePath();
    // build the Azkaban objects
    String projectName = project.getName();
    // generate the files and store them locally and on HDFS
    List<AzkabanJobZipEntity> azkabanList = null;
    List<String> delFlowList = null;
    String statusCode = null;
    String zipPath;
    BdpAzkabanGlobalPropertiesBO bdpAzkabanGlobalPropertiesBO = new BdpAzkabanGlobalPropertiesBO();
    try {
        zipPath = GenerateFileUtils.generateMultipleZip(projectPath, azkabanList, null, null, true, bdpAzkabanGlobalPropertiesBO);
    } catch (Exception e) {
        log.error("[azk trace] env {} zipProject {} failed {}", env, projectName, e);
        throw new ApplicationException(BDPResponseCode.ZIP_AZKABAN_PROJECT_FAILED);
    }
    return zipPath;
}
Upload the zip package:
url: https://ip:port/manager?ajax=upload
request type: post (multipart/form-data)
parameters: project (project name), file (the project zip file itself, uploaded as multipart)
private String uploadProject(String filePath, Project project, String env) throws ApplicationException {
    FileSystemResource resource = new FileSystemResource(new File(filePath));
    Map<String, Object> dataMap = new HashMap<>();
    String projectName = project.getName();
    Session session = securityService.login(azkConfig.getAzkUsername(env), env);
    dataMap.put(AzkabanConstant.AZKABAN_SESSION_ID, session.getSessionId());
    dataMap.put(AzkabanConstant.AJAX, ActionConstant.UPLOAD_PROJECT_ACTION);
    dataMap.put(AzkabanConstant.AZKABAN_PROJECT, projectName);
    dataMap.put(AzkabanConstant.AZKABAN_PROJECT_FILE, resource);
    try {
        String url = azkConfig.getAzkUrl(env) + "/manager";
        log.info("[azk trace] env {} uploadProject {} url {}", env, projectName, url);
        String result = restTemplateService.postForObject(url, dataMap);
        Object obj = JsonUtils.fromValueByKey(result, "projectId");
        if (StringUtils.isEmpty(String.valueOf(obj))) {
            log.error("[azk trace] env {} uploadProject {} failed, filePath {}, result {}",
                    env, projectName, filePath, result);
            throw new ApplicationException(90013, "release project failed");
        }
        log.info("[azk trace] env {} uploadProject {} success, result {}", env, projectName, result);
        return String.valueOf(JsonUtils.fromValueByKey(result, "projectId"));
    } catch (Exception e) {
        log.error("[azk trace] env {} uploadProject {} failed, error {}", env, projectName, e);
        throw new ApplicationException(90014, e.getMessage(), e);
    }
}
Execute a flow:
url: https://ip:port/executor?ajax=executeFlow&session.id={sessionId}&project={projectName}&flow={flowName}
request type: get
session.id: the session id returned by login
project: project name
flow: flow name
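Since executeFlow is a plain GET, the URL can be assembled with standard library calls; a minimal sketch (the host, session id, and names below are placeholders):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class ExecuteFlowUrl {

    /** Builds the executor GET url for executeFlow; the caller supplies a live session id. */
    public static String build(String baseUrl, String sessionId, String project, String flow)
            throws UnsupportedEncodingException {
        String enc = "UTF-8";
        return baseUrl + "/executor?ajax=executeFlow"
                + "&session.id=" + URLEncoder.encode(sessionId, enc)
                + "&project=" + URLEncoder.encode(project, enc)
                + "&flow=" + URLEncoder.encode(flow, enc);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        System.out.println(build("https://ip:port", "abc-123", "demoProject", "first"));
        // → https://ip:port/executor?ajax=executeFlow&session.id=abc-123&project=demoProject&flow=first
    }
}
```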
@Override
public JSONObject executeFlow(BdpScheduleFlowBO bdpScheduleFlowBO) throws ApplicationException {
    BdpScheduleAzkFlowBO azkFlowBO = (BdpScheduleAzkFlowBO) bdpScheduleFlowBO;
    String env = azkFlowBO.getEnv();
    String projectCode = azkFlowBO.getProjectCode();
    String flowName = azkFlowBO.getFlowName();
    String azkProjectName = AzkabanUtils.generateProjectName(projectCode, flowName);
    Session session = azkSecurityService.login(azkConfig.getAzkUsername(env), env);
    String sessionId = session.getSessionId();
    List<String> disabledJobs = azkFlowBO.getDisabledJobs();
    String params;
    if (disabledJobs != null && !disabledJobs.isEmpty()) {
        JSONArray disableJobsArray = new JSONArray();
        disableJobsArray.addAll(disabledJobs);
        params = "?ajax=executeFlow&session.id=" + sessionId
                + "&project=" + azkProjectName + "&flow=" + flowName + "&disabled=" + disableJobsArray.toJSONString();
    } else {
        params = "?ajax=executeFlow&session.id=" + sessionId + "&project=" + azkProjectName + "&flow=" + flowName;
    }
    // must not be null
    String flowParametersStr = azkFlowBO.getSingleExecFlowParamsStr() == null ? StrUtil.EMPTY : azkFlowBO.getSingleExecFlowParamsStr();
    params += flowParametersStr;
    String azkUrl = azkConfig.getAzkUrl(env) + "/executor" + params;
    logger.info("[azk trace] env {} project {} executeFlow azkUrl {}", env, azkProjectName, azkUrl);
    String response = restTemplate.getForObject(azkUrl, String.class);
    logger.info("[azk trace] env {} project {} executeFlow result {}", env, azkProjectName, response);
    JSONObject result;
    if (StringUtils.isNotBlank(response)) {
        result = JSON.parseObject(response);
    } else {
        throw new ApplicationException(BDPResponseCode.AZKABAN_FLOW_ERROR.getCode(), "Start flow - Azkaban call failed");
    }
    return result;
}
There are of course other common HTTP endpoints as well:
e.g. fetching flow executions, fetching job executions, cancelling a running flow, fetching schedule info, fetching logs, and so on.