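Oozie (1): Basic Concepts and an Example of Writing HBase Table Data into Hive
I. Introduction to Oozie
Oozie is a top-level project of the Apache Software Foundation.
Oozie is the job-scheduling member of the four supporting frameworks commonly used alongside Hadoop; the other three are Sqoop (data transfer tool), Flume (log collection framework), and Hue (web UI for big data).
It schedules and coordinates Hadoop MapReduce jobs, Spark (Streaming) jobs, Hive jobs, and similar tasks, and manages them as a directed acyclic graph (DAG) of actions.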
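II. The Three Functional Modules of Oozie
1. Workflow: defines the jobs to execute and the order in which they run.
2. Coordinator: triggers a workflow on a schedule so that it runs periodically.
3. Bundle: groups multiple coordinators so that all of them can be submitted or triggered together.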
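To make the Coordinator idea concrete, here is a minimal sketch of a coordinator definition that would run a workflow once a day. The heredoc only writes the file locally. The application name hive_oozie_demo_coord and the parameters workflowAppUri, start, and end are assumptions made for illustration; they would have to be supplied in the properties file used at submit time, together with nameNode, jobTracker, and queueName.

```shell
# Sketch only: the app name and the ${...} parameters are placeholders
# that must be defined in the properties file passed at submit time.
# The quoted 'EOF' keeps the shell from expanding the ${...} expressions.
cat > coordinator.xml <<'EOF'
<coordinator-app name="hive_oozie_demo_coord"
                 frequency="${coord:days(1)}"
                 start="${start}" end="${end}" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
  <action>
    <workflow>
      <!-- HDFS directory that contains workflow.xml -->
      <app-path>${workflowAppUri}</app-path>
      <!-- Pass the job parameters on to the workflow -->
      <configuration>
        <property>
          <name>nameNode</name>
          <value>${nameNode}</value>
        </property>
        <property>
          <name>jobTracker</name>
          <value>${jobTracker}</value>
        </property>
        <property>
          <name>queueName</name>
          <value>${queueName}</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>
EOF
```

A coordinator is submitted like a workflow, except that the properties file sets oozie.coord.application.path instead of oozie.wf.application.path.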
Oozie defines control flow nodes and action nodes. Control flow nodes mark the start and end of a workflow and control its execution path: start, kill, end, fork, join, and decision. Action nodes do the actual work: map-reduce, pig, hive, ssh, java, email, sub-workflow, and others.
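As an illustration of how control flow nodes and action nodes fit together, the following sketch writes a small workflow that forks into two parallel actions and joins them again. It is not part of the example in this article: the workflow name, the node names, and the /tmp/fork_join_sketch paths are made up, and the fs action is used only because it needs no extra configuration.

```shell
# Sketch only: all names and paths in this workflow are hypothetical.
# The quoted 'EOF' keeps the shell from expanding the ${...} expressions.
cat > fork_join_workflow.xml <<'EOF'
<workflow-app name="fork_join_sketch" xmlns="uri:oozie:workflow:0.5">
  <start to="split"/>
  <!-- fork: start both actions in parallel -->
  <fork name="split">
    <path start="make_dir_a"/>
    <path start="make_dir_b"/>
  </fork>
  <action name="make_dir_a">
    <fs><mkdir path="${nameNode}/tmp/fork_join_sketch/a"/></fs>
    <ok to="merge"/>
    <error to="fail"/>
  </action>
  <action name="make_dir_b">
    <fs><mkdir path="${nameNode}/tmp/fork_join_sketch/b"/></fs>
    <ok to="merge"/>
    <error to="fail"/>
  </action>
  <!-- join: wait for both paths before continuing -->
  <join name="merge" to="end"/>
  <kill name="fail">
    <message>Failed at [${wf:lastErrorNode()}]</message>
  </kill>
  <end name="end"/>
</workflow-app>
EOF
```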
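At its core, Oozie is a job coordination tool. Under the hood, it turns each action in the XML definition into a launcher MapReduce job that runs only a map task, so no shuffle phase is involved.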
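III. Workflow Configuration Files
1. job.properties: defines the job's properties and parameters.
2. workflow.xml: defines the control flow nodes and action nodes.
3. lib: holds the resource files the job needs at run time (JAR files).
IV. Example: Writing HBase Table Data into Hive
4.1 Preparing the files
1. job.properties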
nameNode=hdfs://sandbox-hdp.hortonworks.com:8020
jobTracker=sandbox-hdp.hortonworks.com:8032
queueName=default
#Set Oozie environment
oozie.wf.application.path=${nameNode}/events/demo
oozie.use.system.libpath=true
2. workflow.xml
<workflow-app name="hive_oozie_demo" xmlns="uri:oozie:workflow:0.5">
    <start to="run"/>
    <action name="run">
        <hive2 xmlns="uri:oozie:hive2-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <jdbc-url>jdbc:hive2://localhost:10000/default</jdbc-url>
            <script>hql/demo.hql</script>
        </hive2>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>ETL task(d) failed,The error message is [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
3. demo.hql
create database if not exists demo;
use demo;
drop table if exists employee;
create external table employee(account string,firstName string,lastName string,department string,emailAddress string,phone string)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties('hbase.columns.mapping'=':key,profile:firstName,profile:lastName,department:name,contact:emailAddress,contact:phone')
tblproperties('hbase.table.name'='employee');
drop table if exists users;
create table users as
select * from employee;
4.2 Running the Oozie example
1. Start the HBase shell
hbase shell
2. Create the employee table in HBase and insert one row
create 'employee','profile','department','contact'
put 'employee','nml','profile:firstName','ml'
put 'employee','nml','profile:lastName','nie'
put 'employee','nml','department:name','bigdata'
put 'employee','nml','contact:emailAddress','ml.nie@bigdata.com'
put 'employee','nml','contact:phone','168-666-2786'
3. Create the application directory in HDFS and upload the files
hdfs dfs -mkdir -p /events/demo/hql
hdfs dfs -put demo.hql /events/demo/hql
hdfs dfs -put workflow.xml /events/demo
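Before submitting, it can be worth confirming that both files landed where job.properties and workflow.xml expect them. The helper below is a hypothetical convenience function, not part of Hadoop; it relies only on hdfs dfs -test -e and assumes the /events/demo directory used above.

```shell
# Application directory, as set in oozie.wf.application.path.
APP_DIR="/events/demo"

# Hypothetical helper: report whether each expected file exists in HDFS.
# Returns non-zero if any file is missing.
check_app_files() {
  status=0
  for f in workflow.xml hql/demo.hql; do
    if hdfs dfs -test -e "$APP_DIR/$f"; then
      echo "ok: $APP_DIR/$f"
    else
      echo "missing: $APP_DIR/$f"
      status=1
    fi
  done
  return "$status"
}

# Usage (on a machine with the hdfs client configured):
#   check_app_files
```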
4. Submit the Oozie job
oozie job --oozie http://sandbox-hdp.hortonworks.com:11000/oozie --config ./job.properties -run
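The submit command prints a line of the form "job: <job-id>". As a rough sketch of how one might follow the job afterwards, the helpers below wrap the -info and -log subcommands of the oozie CLI. The function names are hypothetical, and the server URL is assumed to be the same one used in the submit command.

```shell
# Assumption: same Oozie server as in the submit command above.
# The oozie CLI also reads OOZIE_URL when -oozie is omitted.
OOZIE_URL="http://sandbox-hdp.hortonworks.com:11000/oozie"
export OOZIE_URL

# Hypothetical helper: pull the id out of the "job: <id>" line on stdin.
extract_job_id() {
  sed -n 's/^job: *//p'
}

# Show the job status and the state of each action.
job_info() {
  oozie job -oozie "$OOZIE_URL" -info "$1"
}

# Print the job log, useful when an action ends in the kill node.
job_log() {
  oozie job -oozie "$OOZIE_URL" -log "$1"
}

# Usage (on a machine with the oozie client installed):
#   job_id=$(oozie job -oozie "$OOZIE_URL" -config ./job.properties -run | extract_job_id)
#   job_info "$job_id"
```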
5. In Hive, query the employee table to check the imported data
0: jdbc:hive2://localhost:10000> select * from employee;
+-------------------+---------------------+--------------------+----------------------+------------------------+-----------------+--+
| employee.account | employee.firstname | employee.lastname | employee.department | employee.emailaddress | employee.phone |
+-------------------+---------------------+--------------------+----------------------+------------------------+-----------------+--+
| nml | ml | nie | bigdata | ml.nie@bigdata.com | 168-666-2786 |
+-------------------+---------------------+--------------------+----------------------+------------------------+-----------------+--+
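The employee table queried above is the external table backed by HBase; demo.hql also copies its rows into a native Hive table named users in the demo database. A possible way to check that copy from the command line is sketched below. The function name is hypothetical, and the JDBC URL assumes the same HiveServer2 endpoint as in workflow.xml with no extra authentication options.

```shell
# Assumption: HiveServer2 at localhost:10000, database demo (created by demo.hql).
HIVE_JDBC_URL="jdbc:hive2://localhost:10000/demo"

# Hypothetical helper: list the rows copied into the Hive-managed users table.
show_users() {
  beeline -u "$HIVE_JDBC_URL" -e "select * from users;"
}

# Usage (on a machine with beeline installed):
#   show_users
```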