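Oozie triggers a job remotely; when that job completes, control returns to Oozie, which then runs the next one.
Oozie executes a workflow as a DAG: a node runs only after the node before it has finished.
A workflow definition can be parameterized with variables such as ${inputDir}; values for these parameters must be supplied when the job is submitted.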
Oozie workflows contain control flow nodes and action nodes.
Control flow nodes define the start, the end, and the failure handling of a workflow.
Action nodes define how tasks are triggered and executed.
Workflow Diagram:
hPDL Workflow Definition:
<workflow-app name='wordcount-wf' xmlns="uri:oozie:workflow:0.1">
    <start to='wordcount'/>
    <action name='wordcount'>
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.mapper.class</name>
                    <value>org.myorg.WordCount.Map</value>
                </property>
                <property>
                    <name>mapred.reducer.class</name>
                    <value>org.myorg.WordCount.Reduce</value>
                </property>
                <property>
                    <name>mapred.input.dir</name>
                    <value>${inputDir}</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>${outputDir}</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to='end'/>
        <!-- Route failures to the kill node; with to='end' the kill node below would never be reached. -->
        <error to='kill'/>
    </action>
    <kill name='kill'>
        <message>Something went wrong: ${wf:errorCode('wordcount')}</message>
    </kill>
    <end name='end'/>
</workflow-app>
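The wordcount workflow above leaves ${jobTracker}, ${nameNode}, ${inputDir} and ${outputDir} unresolved, so their values have to come from the job configuration supplied at submission time. The following is a minimal sketch of such a configuration in Hadoop XML form. All host names, ports and paths are hypothetical placeholders, and it assumes the workflow application has been uploaded to the HDFS directory named in oozie.wf.application.path.

```xml
<!-- Sketch of a job configuration for the wordcount workflow.
     Every host name, port and path below is a made-up placeholder. -->
<configuration>
  <!-- HDFS directory that holds workflow.xml (assumed location) -->
  <property>
    <name>oozie.wf.application.path</name>
    <value>hdfs://namenode.example.com:8020/user/someuser/wordcount-wf</value>
  </property>
  <!-- Values substituted for ${jobTracker} and ${nameNode} -->
  <property>
    <name>jobTracker</name>
    <value>jobtracker.example.com:8021</value>
  </property>
  <property>
    <name>nameNode</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
  <!-- Values substituted for ${inputDir} and ${outputDir} -->
  <property>
    <name>inputDir</name>
    <value>/user/someuser/wordcount/input</value>
  </property>
  <property>
    <name>outputDir</name>
    <value>/user/someuser/wordcount/output</value>
  </property>
</configuration>
```

A file like this would typically be passed to the Oozie command-line client with the -config option when the job is submitted; the same keys can also be written as key=value lines in a job.properties file.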
Flow control within a workflow is implemented using decision, fork and join nodes. Cycles in workflows are not supported.
Possible states for a workflow job are: PREP, RUNNING, SUSPENDED, SUCCEEDED, KILLED and FAILED.
Oozie can make HTTP callback notifications on action start/end/failure events and on workflow end/failure events.
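As a sketch of how these callbacks might be requested, the job configuration can carry notification URLs. The property names and the $jobId, $nodeName and $status tokens below are given as I recall them from the Oozie workflow specification and should be checked against the Oozie version in use. The callback host and URL paths are hypothetical.

```xml
<!-- Sketch: job configuration entries requesting HTTP callbacks.
     callback.example.com and the URL paths are made-up placeholders. -->
<configuration>
  <!-- Called when the workflow job changes status -->
  <property>
    <name>oozie.wf.workflow.notification.url</name>
    <value>http://callback.example.com/wf?jobId=$jobId&amp;status=$status</value>
  </property>
  <!-- Called when an action node starts or ends -->
  <property>
    <name>oozie.wf.action.notification.url</name>
    <value>http://callback.example.com/action?jobId=$jobId&amp;node=$nodeName&amp;status=$status</value>
  </property>
</configuration>
```

Note that the ampersands in the URLs are written as &amp;amp; because the values live inside an XML file.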
Workflow Definition
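A workflow definition is a DAG made up of control flow nodes (start, end, decision, fork, join, kill) and action nodes (map-reduce, pig, etc.).
The control flow nodes are what implement the control flow of the DAG.
A workflow must not contain cycles; if it does, deployment fails.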
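To make the no-cycles rule concrete, here is a deliberately invalid sketch in which the second action transitions back to the first. The node names, the fs actions and the paths are hypothetical and only serve to give the two nodes something to do; a definition shaped like this is expected to be rejected at deployment.

```xml
<!-- Hypothetical, intentionally INVALID workflow: stepB transitions back to stepA. -->
<workflow-app name="cycle-demo-wf" xmlns="uri:oozie:workflow:0.1">
  <start to="stepA"/>
  <action name="stepA">
    <fs>
      <mkdir path="${nameNode}/tmp/cycle-demo/a"/>
    </fs>
    <ok to="stepB"/>
    <error to="fail"/>
  </action>
  <action name="stepB">
    <fs>
      <mkdir path="${nameNode}/tmp/cycle-demo/b"/>
    </fs>
    <!-- This transition closes the loop stepA -> stepB -> stepA -->
    <ok to="stepA"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>cycle-demo failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

Changing the ok transition of stepB to point at end instead of stepA would turn this back into a strict DAG.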
Workflow Nodes
- Control flow nodes: nodes that control the start and end of the workflow and the workflow job execution path.
- Action nodes: nodes that trigger the execution of a computation/processing task.
Node names and transitions must conform to the pattern [a-zA-Z][\-_a-zA-Z0-9]* and be at most 20 characters long.
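The naming rule can be written down as an XML Schema simple type. The fragment below is a hypothetical sketch, not a copy of Oozie's own schema; the type name NodeName is invented for illustration, and the sketch assumes the digit range in the pattern is meant to be 0-9.

```xml
<!-- Hypothetical XSD fragment expressing the node-name rule;
     this is not taken from the schema shipped with Oozie. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="NodeName">
    <xs:restriction base="xs:string">
      <!-- First character is a letter; the rest are letters, digits, '-' or '_' -->
      <xs:pattern value="[a-zA-Z][\-_a-zA-Z0-9]*"/>
      <!-- At most 20 characters in total -->
      <xs:maxLength value="20"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```

Under this rule a name such as firstHadoopJob is acceptable, while a name that starts with a digit or contains a space is not.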
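Start Control Node
The start node is the entry point of a workflow job; when the workflow starts, Oozie automatically transitions to the node named in the start node.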
Syntax:
<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">
    ...
    <start to="[NODE-NAME]"/>
    ...
</workflow-app>
The to attribute is the name of the first workflow node to execute.
Example:
<workflow-app name="foo-wf" xmlns="uri:oozie:workflow:0.1">
    ...
    <start to="firstHadoopJob"/>
    ...
</workflow-app>
End Control Node
The end node is the end of a workflow job; it indicates that the workflow job has completed successfully.
When a workflow job reaches the end node, it finishes successfully (SUCCEEDED).
If one or more actions started by the workflow job are still executing when the end node is reached, those actions are killed. In this scenario the workflow job is still considered to have run successfully.
A workflow definition must have one end node.
Syntax:
<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">
    ...
    <end name="[NODE-NAME]"/>
    ...
</workflow-app>
The name attribute is the name of the transition to take to end the workflow job.
Example:
<workflow-app name="foo-wf" xmlns="uri:oozie:workflow:0.1">
    ...
    <end name="end"/>
</workflow-app>
Kill Control Node
The kill node allows a workflow job to kill itself.
When a workflow job reaches a kill node, it finishes in error (KILLED).
If one or more actions started by the workflow job are executing when the kill node is reached, those actions are killed.
A workflow definition may have zero or more kill nodes.
A kill node can carry an error message, which is written to the log.
Syntax:
<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">
    ...
    <kill name="[NODE-NAME]">
        <message>[MESSAGE-TO-LOG]</message>
    </kill>
    ...
</workflow-app>
The name attribute in the kill node is the name of the Kill action node.
The content of the message element will be logged as the kill reason for the workflow job.
A kill node does not have transition elements because it ends the workflow job, as KILLED.
Example:
<workflow-app name="foo-wf" xmlns="uri:oozie:workflow:0.1">
    ...
    <kill name="killBecauseNoInput">
        <message>Input unavailable</message>
    </kill>
    ...
</workflow-app>
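Because the message element is logged as the kill reason, it is often more useful to build it from the failing node than to hard-code it. The sketch below assumes the workflow EL functions wf:lastErrorNode() and wf:errorMessage() are available in the Oozie version in use. The workflow name, the node names, the fs action and the path are hypothetical.

```xml
<!-- Sketch: a kill node whose message reports which node failed and why.
     Workflow name, node names and the path are made-up placeholders. -->
<workflow-app name="kill-message-demo-wf" xmlns="uri:oozie:workflow:0.1">
  <start to="prepare"/>
  <action name="prepare">
    <fs>
      <mkdir path="${nameNode}/tmp/kill-message-demo"/>
    </fs>
    <ok to="end"/>
    <!-- Any failure in this action is routed to the kill node -->
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Failed at node [${wf:lastErrorNode()}]: ${wf:errorMessage(wf:lastErrorNode())}</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

With a single shared kill node like this, every action can use the same error transition and the log still records where the failure happened.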
Decision Control Node
A decision node enables a workflow to make a selection on the execution path to follow.
The behavior of a decision node can be seen as a switch-case statement.
A decision node consists of a list of predicate-transition pairs plus a default transition. Predicates are evaluated in order of appearance until one of them evaluates to true, and the corresponding transition is taken. If none of the predicates evaluates to true, the default transition is taken.
Predicates are JSP Expression Language (EL) expressions (refer to section 4.2 of the Oozie workflow specification) that resolve to a boolean value, true or false. For example:
${fs:fileSize('/usr/foo/myinputdir') gt 10 * GB}
Syntax:
<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">
    ...
    <decision name="[NODE-NAME]">
        <switch>
            <case to="[NODE_NAME]">[PREDICATE]</case>
            ...
            <case to="[NODE_NAME]">[PREDICATE]</case>
            <default to="[NODE_NAME]"/>
        </switch>
    </decision>
    ...
</workflow-app>
The name attribute in the decision node is the name of the decision node.
Each case element contains a predicate and a transition name. The predicate ELs are evaluated in order until one returns true, and the corresponding transition is taken.
The default element indicates the transition to take if none of the predicates evaluates to true.
All decision nodes must have a default element to avoid bringing the workflow into an error state if none of the predicates evaluates to true.
Example:
<workflow-app name="foo-wf" xmlns="uri:oozie:workflow:0.1">
    ...
    <decision name="mydecision">
        <switch>
            <case to="reconsolidatejob">
                ${fs:fileSize(secondjobOutputDir) gt 10 * GB}
            </case>
            <case to="rexpandjob">
                ${fs:fileSize(secondjobOutputDir) lt 100 * MB}
            </case>
            <case to="recomputejob">
                ${ hadoop:counters('secondjob')[RECORDS][REDUCE_OUT] lt 1000000 }
            </case>
            <default to="end"/>
        </switch>
    </decision>
    ...
</workflow-app>
Fork and Join Control Nodes
A fork node splits one path of execution into multiple concurrent paths of execution.
A join node waits until every concurrent execution path of a previous fork node arrives at it.
The fork and join nodes must be used in pairs. The join node assumes that the concurrent execution paths are children of the same fork node.
Syntax:
<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">
    ...
    <fork name="[FORK-NODE-NAME]">
        <path start="[NODE-NAME]" />
        ...
        <path start="[NODE-NAME]" />
    </fork>
    ...
    <join name="[JOIN-NODE-NAME]" to="[NODE-NAME]" />
    ...
</workflow-app>
The name attribute in the fork node is the name of the workflow fork node. The start attribute in each path element of the fork node indicates the name of a workflow node that will be part of the concurrent execution paths.
The name attribute in the join node is the name of the workflow join node. The to attribute in the join node indicates the name of the workflow node that will be executed after all concurrent execution paths of the corresponding fork arrive at the join node.
Example:
<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:0.1">
    ...
    <fork name="forking">
        <path start="firstparalleljob"/>
        <path start="secondparalleljob"/>
    </fork>
    <!-- The action name must match the path start attribute in the fork above. -->
    <action name="firstparalleljob">
        <map-reduce>
            <job-tracker>foo:8021</job-tracker>
            <name-node>bar:8020</name-node>
            <job-xml>job1.xml</job-xml>
        </map-reduce>
        <ok to="joining"/>
        <error to="kill"/>
    </action>
    <action name="secondparalleljob">
        <map-reduce>
            <job-tracker>foo:8021</job-tracker>
            <name-node>bar:8020</name-node>
            <job-xml>job2.xml</job-xml>
        </map-reduce>
        <ok to="joining"/>
        <error to="kill"/>
    </action>
    <join name="joining" to="nextaction"/>
    ...
</workflow-app>
By default, Oozie performs some validation that any forking in a workflow is valid and won't lead to any incorrect behavior or instability. However, if Oozie is preventing a workflow from being submitted and you are very certain that it should work, you can disable fork/join validation so that Oozie will accept the workflow. To disable this validation just for a specific workflow, set oozie.wf.validate.ForkJoin to false in the job.properties file. To disable this validation for all workflows, set oozie.validate.ForkJoin to false in the oozie-site.xml file. Whether validation runs is determined by the AND of both of these properties, so it is disabled if either or both are set to false and enabled only if both are set to true (or not specified).
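For the server-wide switch, the entry in oozie-site.xml would look roughly like the sketch below. Only the property name and value come from the text above; the surrounding configuration element follows the usual Hadoop-style layout and is an assumption about how the rest of the file is organized.

```xml
<!-- Sketch of an oozie-site.xml entry that turns fork/join validation off
     for all workflows. Other properties in the file are omitted here. -->
<configuration>
  <property>
    <name>oozie.validate.ForkJoin</name>
    <value>false</value>
  </property>
</configuration>
```

The per-workflow equivalent is the single line oozie.wf.validate.ForkJoin=false in job.properties. Since the two settings are combined with AND, setting either one to false is enough to skip validation for that workflow.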