Purpose
An Azkaban flow coordinates a group of scripts (jobs) so that they run together as one workflow.
Getting started
First, create a file with the .project extension. Its content declares that the project uses Flow 2.0:
azkaban-flow-version: 2.0
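Then create a file with the .flow extension, with content like the following.
Each job (node) in a flow is described by name, type and config. name and type identify the job and its kind; config holds the job's parameters and can be left out when a job needs none (the noop job in the dependency example further down has no config).
The .flow file is written in YAML: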
nodes:
  - name: jobA
    type: command
    config:
      command: echo "This is an echoed text."
If YAML is unfamiliar, compare it with the equivalent JSON:
{
  "nodes": [
    {
      "name": "jobA",
      "type": "command",
      "config": {
        "command": "echo \"This is an echoed text.\""
      }
    }
  ]
}
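To make the name/type/config structure concrete, here is a minimal Python sketch that loads the JSON form above with the standard json module and checks each node. The check_nodes function and its rules (name and type present, command jobs carry config.command) are assumptions for illustration; this is not Azkaban's own validation.

```python
import json

# The JSON form of the basic flow shown above.
FLOW_JSON = '''
{
  "nodes": [
    {
      "name": "jobA",
      "type": "command",
      "config": {"command": "echo \\"This is an echoed text.\\""}
    }
  ]
}
'''


def check_nodes(flow: dict) -> list[str]:
    """Hypothetical checker: return a list of problems found in flow["nodes"]."""
    problems = []
    for i, node in enumerate(flow.get("nodes", [])):
        # Assumed rule: every node needs a name and a type.
        for key in ("name", "type"):
            if key not in node:
                problems.append(f"node {i} is missing '{key}'")
        # Assumed rule: a command job must say which command to run.
        if node.get("type") == "command" and "command" not in node.get("config", {}):
            problems.append(f"node {i} ({node.get('name')}) has no config.command")
    return problems


flow = json.loads(FLOW_JSON)
print(check_nodes(flow))  # []
```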
Finally, zip the two files together and upload the archive to Azkaban; the flow can then be executed.
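A minimal Python sketch of the packaging step, using only the standard zipfile module. The file names basic.project, basic.flow and basic.zip are hypothetical; what matters is the .project and .flow extensions. The upload itself happens in Azkaban and is not shown.

```python
import io
import zipfile

# Contents of the two files described above.
PROJECT_TEXT = "azkaban-flow-version: 2.0\n"
FLOW_TEXT = (
    "nodes:\n"
    "  - name: jobA\n"
    "    type: command\n"
    "    config:\n"
    "      command: echo \"This is an echoed text.\"\n"
)


def project_zip_bytes(project_name: str = "basic", flow_name: str = "basic") -> bytes:
    """Build an in-memory zip holding <project_name>.project and <flow_name>.flow."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(f"{project_name}.project", PROJECT_TEXT)
        zf.writestr(f"{flow_name}.flow", FLOW_TEXT)
    return buf.getvalue()


if __name__ == "__main__":
    # Write the archive to disk; this is the file that would be uploaded to Azkaban.
    with open("basic.zip", "wb") as f:
        f.write(project_zip_bytes())
```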
Jobs can depend on each other
List a job's parent jobs under dependsOn, for example:
nodes:
  - name: jobC
    type: noop
    # jobC depends on jobA and jobB
    dependsOn:
      - jobA
      - jobB
  - name: jobA
    type: command
    config:
      command: echo "This is an echoed text."
  - name: jobB
    type: command
    config:
      command: pwd
The equivalent JSON:
{
  "nodes": [
    {
      "name": "jobC",
      "type": "noop",
      "dependsOn": [
        "jobA",
        "jobB"
      ]
    },
    {
      "name": "jobA",
      "type": "command",
      "config": {
        "command": "echo \"This is an echoed text.\""
      }
    },
    {
      "name": "jobB",
      "type": "command",
      "config": {
        "command": "pwd"
      }
    }
  ]
}
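dependsOn turns the jobs into a dependency graph: jobC is listed first but can only run after jobA and jobB. The Python sketch below uses the standard graphlib module (Python 3.9+) to group the example jobs into waves, where each job depends only on jobs in earlier waves. The execution_waves helper is hypothetical; it models only the ordering that dependsOn implies, not how Azkaban's executor actually schedules ready jobs.

```python
from graphlib import TopologicalSorter

# Nodes from the dependency example above.
NODES = [
    {"name": "jobC", "type": "noop", "dependsOn": ["jobA", "jobB"]},
    {"name": "jobA", "type": "command", "config": {"command": 'echo "This is an echoed text."'}},
    {"name": "jobB", "type": "command", "config": {"command": "pwd"}},
]


def execution_waves(nodes: list[dict]) -> list[list[str]]:
    """Group jobs into waves; every job depends only on jobs in earlier waves."""
    # graphlib expects: node -> iterable of its predecessors (its dependsOn list).
    graph = {n["name"]: n.get("dependsOn", []) for n in nodes}
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if dependsOn forms a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


print(execution_waves(NODES))  # [['jobA', 'jobB'], ['jobC']]
```

The order of the nodes in the list does not change the result, which matches the example listing jobC before its parents.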
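Jobs are not limited to shell commands; other job types can be used too, for example a Pig job (job types beyond the built-in ones, such as pig, come from job type plugins installed on the Azkaban server):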
nodes:
  - name: pigJob
    type: pig
    config:
      pig.script: sql/pig/script.pig
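Global configuration
Settings common to several jobs can be defined once in a flow-level config section and shared by the jobs in the flow: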
config:
  user.to.proxy: foo
  failure.emails: noreply@foo.com
nodes:
  - name: jobA
    type: command
    config:
      command: echo "This is an echoed text."
The equivalent JSON:
{
  "config": {
    "user.to.proxy": "foo",
    "failure.emails": "noreply@foo.com"
  },
  "nodes": [
    {
      "name": "jobA",
      "type": "command",
      "config": {
        "command": "echo \"This is an echoed text.\""
      }
    }
  ]
}
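How the shared settings combine with a job's own settings can be sketched in Python. The assumption here, not confirmed by the text above, is that a job sees the flow-level config plus its own config, with the job's value winning when both define the same key. effective_config is a hypothetical helper, not part of Azkaban.

```python
# Flow-level and node-level config from the example above.
FLOW = {
    "config": {"user.to.proxy": "foo", "failure.emails": "noreply@foo.com"},
    "nodes": [
        {"name": "jobA", "type": "command",
         "config": {"command": 'echo "This is an echoed text."'}},
    ],
}


def effective_config(flow: dict, node_name: str) -> dict:
    """Assumed semantics: flow-level config first, node config on top (node wins)."""
    node = next(n for n in flow["nodes"] if n["name"] == node_name)
    merged = dict(flow.get("config", {}))  # copy so the flow's dict is not mutated
    merged.update(node.get("config", {}))
    return merged


print(effective_config(FLOW, "jobA"))
```

Copying the flow-level dict before updating keeps one job's settings from leaking into the shared section.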
Flows can be nested
A node of type flow embeds a subflow with its own config and nodes:
nodes:
  - name: embedded_flow
    type: flow
    config:
      prop: value
    nodes:
      - name: jobB
        type: noop
        dependsOn:
          - jobA
      - name: jobA
        type: command
        config:
          command: pwd
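A nested flow is a tree of nodes. The Python sketch below walks that tree and lists every node with its type, descending into nodes of type flow. The walk function and the slash-separated paths it prints are hypothetical illustration, not the job IDs Azkaban displays. In this example, jobB's dependsOn refers to jobA inside the same embedded flow.

```python
# The nested flow from the example above, as Python data.
FLOW = {
    "nodes": [
        {
            "name": "embedded_flow",
            "type": "flow",
            "config": {"prop": "value"},
            "nodes": [
                {"name": "jobB", "type": "noop", "dependsOn": ["jobA"]},
                {"name": "jobA", "type": "command", "config": {"command": "pwd"}},
            ],
        }
    ]
}


def walk(nodes: list[dict], prefix: str = "") -> list[tuple[str, str]]:
    """Return (path, type) for every node, descending into nodes of type 'flow'."""
    result = []
    for node in nodes:
        path = f"{prefix}{node['name']}"  # hypothetical "parent/child" path format
        result.append((path, node["type"]))
        if node["type"] == "flow":
            result.extend(walk(node.get("nodes", []), prefix=path + "/"))
    return result


for path, node_type in walk(FLOW["nodes"]):
    print(path, node_type)
```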
Summary
- To write Azkaban Flow 2.0 flows, learn how to write YAML configuration.
- Understand what name, type and config mean.