I. Running a shell script with Oozie (launches an MR job that merges incremental data)
Reference: http://gethue.com/use-the-shell-action-in-oozie/
1. Click create and drag the Shell action onto the workflow.
2. Add the command: bash (any other executable Linux command works too).
3. Add the argument. Note: the argument is the full name of the shell script (run-mr-compact.sh).
4. Add the xxx.sh, xxx.jar, and xxx.properties files (note: these files must all live in the same HDFS directory, otherwise the job fails!).
The recommended order for adding the files is: sh, jar, properties.
File contents:
(1) /user/greatgas/oozie/everyday/shell/run-mr-compact.sh:
#!/bin/bash
hadoop jar mr-compact.jar <fully-qualified main class> <path of the properties file relative to the jar>
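A slightly fuller sketch of the wrapper script, shown as a dry run. The main class name `com.example.CompactDriver` is a hypothetical placeholder; the real fully-qualified class name comes from mr-compact.jar:

```shell
# Sketch of the wrapper the Oozie shell action invokes. The jar and the
# properties file are localized into the action's working directory, so
# relative names work. The main class below is a hypothetical placeholder.
JAR=mr-compact.jar
MAIN_CLASS=com.example.CompactDriver
PROPS=conf.properties
CMD="hadoop jar $JAR $MAIN_CLASS $PROPS"
echo "$CMD"   # printed as a dry run; the real script would execute $CMD
```

In the real script the final line would run the command (e.g. `$CMD || exit 1`) instead of echoing it.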
(2) /user/greatgas/oozie/everyday/MR/mr-compact.jar:
The MR merge program.
(3) /user/greatgas/oozie/everyday/MR/conf.properties:
jobName=t_test
# baseDir = the database path on HDFS
baseDir=/user/hive/warehouse/origin_ennenergy_test.db/
# the Hive data paths on HDFS are looked up by table name
# tableName = original table, incremental table, staging table
tableName=s_t_test,incr_t_test,out_t_test
keyIndex=0
# field index of the timestamp, used to decide which record is the newest
timeStampIndex=24
reduceNum=2
5. Click the red box to add properties.
6. Add the properties.
Note: be sure to add "HADOOP_USER_NAME=<Hue user name>", otherwise the job has no permission on the HDFS directories!
7. Click the top-right corner shown in the step 6 screenshot to exit, then save.
8. Submit the job.
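The compaction the MR job performs (per key, keep only the record with the newest timestamp, as selected by keyIndex/timeStampIndex in conf.properties) can be sketched with coreutils. The sample rows below are made up, with the key in field 1 and the timestamp in field 2:

```shell
# Per key (field 1), keep only the row with the largest timestamp (field 2).
# Sample data is invented; real records are tab-separated Hive rows.
RESULT=$(printf 'k1\t10\tv_old\nk1\t20\tv_new\nk2\t5\tv_only\n' |
  sort -t$'\t' -k1,1 -k2,2nr |   # group by key, newest timestamp first
  awk -F'\t' '!seen[$1]++')      # first row per key == newest record
echo "$RESULT"                   # keeps k1's newest row and k2's only row
```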
II. Running a MapReduce job with Oozie (merges incremental data)
Two files need to be uploaded to HDFS (when running in cluster mode):
/user/e_test/workflow/mr/mr-compact.jar (the jar, in an HDFS directory)
/user/e_test/workflow/mr/conf.properties (the properties file, in an HDFS directory)
Job properties:
mapred.output.dir  /user/hive/warehouse/origin_ennenergy_onecard.db/m (output path)
mapred.input.dir   /user/hive/warehouse/origin_ennenergy_onecard.db/t (input path)
delete             /user/hive/warehouse/origin_ennenergy_onecard.db/m (deleted before the job runs)
After editing, just click Submit!
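Under the hood, the settings above correspond roughly to an Oozie map-reduce action like the following. This is a sketch: the action and transition names are made up, while the prepare/delete step and the two directory properties come from the list above:

```xml
<action name="compact-mr">
    <map-reduce>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <prepare>
            <delete path="${nameNode}/user/hive/warehouse/origin_ennenergy_onecard.db/m"/>
        </prepare>
        <configuration>
            <property>
                <name>mapred.input.dir</name>
                <value>/user/hive/warehouse/origin_ennenergy_onecard.db/t</value>
            </property>
            <property>
                <name>mapred.output.dir</name>
                <value>/user/hive/warehouse/origin_ennenergy_onecard.db/m</value>
            </property>
        </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="kill"/>
</action>
```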
III. Running a Spark job with Oozie (receives data from a Kafka message queue in real time)
1. Create the Oozie project.
2. Add the arguments:
yarn-cluster (equivalent to setMaster("yarn-cluster"))
cluster (client or cluster mode; cluster is used in most cases)
MySpark (job name; any value works)
hdfs://master-28.dev.cluster.enn.cn:8020/user/e_liuy/testspark/spark-test.jar (HDFS path)
or
hdfs://nameservice/user/e_liuy/testspark/spark-test.jar (note: the cluster nameservice can replace host + port)
enn.action.ConsumerRealDataKafka (fully-qualified name of the class to run)
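For reference, the fields above map onto a spark-submit invocation roughly like this (a dry-run sketch; the flags follow the standard spark-submit CLI):

```shell
# Dry-run sketch: echo the spark-submit command the Oozie Spark action
# assembles from the fields above.
CMD="spark-submit --master yarn --deploy-mode cluster --name MySpark \
--class enn.action.ConsumerRealDataKafka \
hdfs://nameservice/user/e_liuy/testspark/spark-test.jar"
echo "$CMD"
```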
IV. Running a Hive job with Oozie (creates a table M_BD_CD_CarInfo_H in the Hive database origin_ennenergy_onecard)
1. Create the Oozie Hive project.
2. Add the query script and the Hive XML file.
File contents:
(1) onecardtomodel.q:
use origin_ennenergy_onecard;
drop table if exists M_BD_CD_CarInfo_H;
create table M_BD_CD_CarInfo_H
(
FGUID STRING,
FStationNo STRING,
FCOMPANYID STRING,
FCARNO STRING,
FCARER STRING,
FTEL STRING,
time_stamp STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
(2) hive-site.xml
Contents (one name = value pair per property; in the actual file each pair is a <property><name>…</name><value>…</value></property> element):
hive.metastore.uris = thrift://yours_hosts:9083
hive.metastore.client.socket.timeout = 300
hive.metastore.warehouse.dir = /user/hive/warehouse (Hive's HDFS directory)
hive.warehouse.subdir.inherit.perms = true
hive.enable.spark.execution.engine = false
hive.conf.restricted.list = hive.enable.spark.execution.engine
hive.auto.convert.join = true
hive.auto.convert.join.noconditionaltask.size = 20971520
hive.optimize.bucketmapjoin.sortedmerge = false
hive.smbjoin.cache.rows = 10000
mapred.reduce.tasks = -1
hive.exec.reducers.bytes.per.reducer = 67108864
hive.exec.copyfile.maxsize = 33554432
hive.vectorized.groupby.checkinterval = 4096
hive.vectorized.groupby.flush.percent = 0.1
hive.compute.query.using.stats = false
hive.vectorized.execution.enabled = true
hive.vectorized.execution.reduce.enabled = false
hive.merge.mapfiles = true
hive.merge.mapredfiles = false
hive.cbo.enable = false
hive.fetch.task.conversion = minimal
hive.fetch.task.conversion.threshold = 268435456
hive.limit.pushdown.memory.usage = 0.1
hive.merge.sparkfiles = true
hive.merge.smallfiles.avgsize = 16777216
hive.merge.size.per.task = 268435456
hive.optimize.reducededuplication = true
hive.optimize.reducededuplication.min.reducer = 4
hive.map.aggr = true
hive.map.aggr.hash.percentmemory = 0.5
hive.optimize.sort.dynamic.partition = false
spark.executor.memory = 268435456
spark.driver.memory = 268435456
spark.executor.cores = 1
spark.yarn.driver.memoryOverhead = 26
spark.yarn.executor.memoryOverhead = 26
spark.dynamicAllocation.enabled = true
spark.dynamicAllocation.initialExecutors = 1
spark.dynamicAllocation.minExecutors = 1
spark.dynamicAllocation.maxExecutors = 2147483647
hive.metastore.execute.setugi = true
hive.support.concurrency = true
hive.zookeeper.quorum = slave-29.dev.cluster.enn.cn,slave-30.dev.cluster.enn.cn,slave-31.dev.cluster.enn.cn
hive.zookeeper.client.port = 2181
hive.zookeeper.namespace = hive_zookeeper_namespace_hive2
hbase.zookeeper.quorum = <your ZooKeeper host list (node1,node2,node3, …)>
hbase.zookeeper.property.clientPort = 2181
hive.cluster.delegation.token.store.class = org.apache.hadoop.hive.thrift.MemoryTokenStore
hive.server2.enable.doAs = true
hive.metastore.sasl.enabled = true
hive.server2.authentication = kerberos
hive.metastore.kerberos.principal = hive/_HOST@ENN.CN
hive.server2.authentication.kerberos.principal = hive/_HOST@ENN.CN
spark.shuffle.service.enabled = true
hive.cli.print.current.db = true
hive.exec.reducers.max = 32
3. After saving the project, just submit the job.
V. Running a Sqoop job with Oozie (exports Hive data to MySQL)
1. Create the Oozie Sqoop project.
2. Enter the sqoop command.
Command contents:
export --connect "jdbc:mysql://10.37.149.183:3306/enn_application?characterEncoding=utf8" --username root --password secret_password --table total_info --export-dir /user/hive/warehouse/origin_ennenergy_onecard.db/total_info/* --input-fields-terminated-by "\t" --update-mode allowinsert --update-key fstationno,fusercardno
3. Save and run.
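In the Hue Sqoop action the leading `sqoop` keyword is omitted; run from a shell, the same export starts with `sqoop export`. A dry-run sketch (connection string, credentials, and paths taken from the example above):

```shell
# Dry-run sketch of the equivalent command line (note the "sqoop export"
# prefix that the Hue action supplies for you). Quoting the JDBC URL keeps
# "?" and "&" from being interpreted by the shell.
CMD='sqoop export --connect "jdbc:mysql://10.37.149.183:3306/enn_application?characterEncoding=utf8" --username root --password secret_password --table total_info --export-dir /user/hive/warehouse/origin_ennenergy_onecard.db/total_info/* --input-fields-terminated-by "\t" --update-mode allowinsert --update-key fstationno,fusercardno'
echo "$CMD"
```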
VI. Adding a Java program
1. Upload the jar to the corresponding Oozie directory path on Linux.
2. Then operate in Hue:
All of the above has been tested and passed; adjust as required and it can be used directly!
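A minimal Java action, roughly as the Hue editor would generate it. This is a sketch: the action name, main class, and jar path are hypothetical placeholders:

```xml
<action name="java-example">
    <java>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <!-- hypothetical class and jar path; substitute your own -->
        <main-class>com.example.Main</main-class>
        <file>/user/e_test/workflow/java/example.jar#example.jar</file>
    </java>
    <ok to="end"/>
    <error to="kill"/>
</action>
```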