I. Running a shell script with Oozie (runs an MR job that merges incremental data)
Reference: http://gethue.com/use-the-shell-action-in-oozie/
1. In the Hue workflow editor, click Create and drag the Shell action onto the workflow.
2. Add the command: bash (any other executable Linux command works as well).
3. Add the argument. Note: the argument is the full name of the shell script (run-mr-compact.sh).
4. Add the xxx.sh, xxx.jar and xxx.properties files. (Note: these files must all sit in the same HDFS directory, otherwise the job fails!)
The recommended order for adding the files is: sh, jar, properties.
Contents of the files:
(1) /user/greatgas/oozie/everyday/shell/run-mr-compact.sh:
#!/bin/bash
hadoop jar mr-compact.jar <fully qualified main class> <path of the properties file, relative to the jar>
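For illustration, assuming a hypothetical driver class enn.mr.CompactDriver (the real class name ships inside mr-compact.jar and is not given in the original) and conf.properties sitting next to the jar, the script body would read:

#!/bin/bash
# launch the compaction MR job; the class name below is a made-up placeholder
hadoop jar mr-compact.jar enn.mr.CompactDriver conf.properties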
(2) /user/greatgas/oozie/everyday/MR/mr-compact.jar:
the MR program that merges the incremental data
(3) /user/greatgas/oozie/everyday/MR/conf.properties:
jobName=t_test
# baseDir: path of the Hive database on HDFS
baseDir=/user/hive/warehouse/origin_ennenergy_test.db/
# the Hive data paths on HDFS are located by table name
# tableName = original table, incremental table, staging table
tableName=s_t_test,incr_t_test,out_t_test
# index of the key field
keyIndex=0
# index of the timestamp field, used to decide which record is the newest
timeStampIndex=24
reduceNum=2
5. Click the area marked by the red box (in the original screenshot) to add properties.
6. Add the properties.
Note: you must add "HADOOP_USER_NAME=<your Hue user name>", otherwise the job has no permission on the HDFS directories!
7. Click the top-right corner shown in the step-6 screenshot to exit, then save.
8. Submit the job.
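Behind the scenes, Hue generates a workflow definition for this shell action. A minimal sketch, following the Oozie shell-action schema, with the paths from the steps above (the exact XML Hue emits may differ slightly, and the action name is arbitrary):

<workflow-app name="shell-mr-compact" xmlns="uri:oozie:workflow:0.4">
    <start to="shell-node"/>
    <action name="shell-node">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>bash</exec>
            <argument>run-mr-compact.sh</argument>
            <!-- without this, the action has no permission on the HDFS directories (step 6) -->
            <env-var>HADOOP_USER_NAME=${wf:user()}</env-var>
            <!-- the sh, jar and properties files added in step 4 -->
            <file>/user/greatgas/oozie/everyday/shell/run-mr-compact.sh#run-mr-compact.sh</file>
            <file>/user/greatgas/oozie/everyday/MR/mr-compact.jar#mr-compact.jar</file>
            <file>/user/greatgas/oozie/everyday/MR/conf.properties#conf.properties</file>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail"><message>Shell action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message></kill>
    <end name="end"/>
</workflow-app>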
II. Running a MapReduce job with Oozie (merges incremental data)
Two files need to be uploaded to HDFS (when running in cluster mode):
/user/e_test/workflow/mr/mr-compact.jar (path of the jar on HDFS)
/user/e_test/workflow/mr/conf.properties (path of the properties file on HDFS)
Then set the job properties:
mapred.output.dir = /user/hive/warehouse/origin_ennenergy_onecard.db/m (output path)
mapred.input.dir = /user/hive/warehouse/origin_ennenergy_onecard.db/t (input path)
delete /user/hive/warehouse/origin_ennenergy_onecard.db/m (the output path is deleted before the job runs)
After editing, just click Submit! The resulting action looks roughly like the sketch below.
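A minimal sketch of the corresponding map-reduce action. The prepare/delete step clears the output path first, since MapReduce fails if the output directory already exists; the mapper/reducer classes from mr-compact.jar are not named in the original, so they are only hinted at in a comment:

<action name="mr-node">
    <map-reduce>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <prepare>
            <delete path="${nameNode}/user/hive/warehouse/origin_ennenergy_onecard.db/m"/>
        </prepare>
        <configuration>
            <property><name>mapred.input.dir</name><value>/user/hive/warehouse/origin_ennenergy_onecard.db/t</value></property>
            <property><name>mapred.output.dir</name><value>/user/hive/warehouse/origin_ennenergy_onecard.db/m</value></property>
            <!-- the job's mapper/reducer would also be set here, e.g. via mapred.mapper.class / mapred.reducer.class -->
        </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
</action>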
III. Running a Spark job with Oozie (consumes data from a Kafka message queue in real time)
1. Create the Oozie project.
2. Add the parameters (they map onto the Oozie spark action as sketched after this list):
yarn-cluster (equivalent to setMaster("yarn-cluster"))
cluster (the deploy mode, client or cluster; cluster is what you normally use)
MySpark (the job name; any name works)
hdfs://master-28.dev.cluster.enn.cn:8020/user/e_liuy/testspark/spark-test.jar (HDFS path of the jar)
or
hdfs://nameservice/user/e_liuy/testspark/spark-test.jar (note: the HA nameservice name can replace host + port)
enn.action.ConsumerRealDataKafka (fully qualified name of the class to run)
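A minimal sketch of the corresponding spark action with the parameters above filled in (element names follow the Oozie spark-action schema; the action name is arbitrary):

<action name="spark-node">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <master>yarn-cluster</master>
        <mode>cluster</mode>
        <name>MySpark</name>
        <class>enn.action.ConsumerRealDataKafka</class>
        <jar>hdfs://nameservice/user/e_liuy/testspark/spark-test.jar</jar>
    </spark>
    <ok to="end"/>
    <error to="fail"/>
</action>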
IV. Running a Hive job with Oozie (creates a table named M_BD_CD_CarInfo_H in the Hive database origin_ennenergy_onecard)
1. Create the Oozie Hive project.
2. Add the Hive script (.q) and the hive-site.xml file.
File contents:
(1) onecardtomodel.q:
use origin_ennenergy_onecard;
drop table if exists M_BD_CD_CarInfo_H;
create table M_BD_CD_CarInfo_H
(
FGUID STRING,
FStationNo STRING,
FCOMPANYID STRING,
FCARNO STRING,
FCARER STRING,
FTEL STRING,
time_stamp STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
(2) hive-site.xml
Contents:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property><name>hive.metastore.uris</name><value>thrift://yours_hosts:9083</value></property>
    <property><name>hive.metastore.client.socket.timeout</name><value>300</value></property>
    <!-- Hive's warehouse directory on HDFS -->
    <property><name>hive.metastore.warehouse.dir</name><value>/user/hive/warehouse</value></property>
    <property><name>hive.warehouse.subdir.inherit.perms</name><value>true</value></property>
    <property><name>hive.enable.spark.execution.engine</name><value>false</value></property>
    <property><name>hive.conf.restricted.list</name><value>hive.enable.spark.execution.engine</value></property>
    <property><name>hive.auto.convert.join</name><value>true</value></property>
    <property><name>hive.auto.convert.join.noconditionaltask.size</name><value>20971520</value></property>
    <property><name>hive.optimize.bucketmapjoin.sortedmerge</name><value>false</value></property>
    <property><name>hive.smbjoin.cache.rows</name><value>10000</value></property>
    <property><name>mapred.reduce.tasks</name><value>-1</value></property>
    <property><name>hive.exec.reducers.bytes.per.reducer</name><value>67108864</value></property>
    <property><name>hive.exec.copyfile.maxsize</name><value>33554432</value></property>
    <property><name>hive.vectorized.groupby.checkinterval</name><value>4096</value></property>
    <property><name>hive.vectorized.groupby.flush.percent</name><value>0.1</value></property>
    <property><name>hive.compute.query.using.stats</name><value>false</value></property>
    <property><name>hive.vectorized.execution.enabled</name><value>true</value></property>
    <property><name>hive.vectorized.execution.reduce.enabled</name><value>false</value></property>
    <property><name>hive.merge.mapfiles</name><value>true</value></property>
    <property><name>hive.merge.mapredfiles</name><value>false</value></property>
    <property><name>hive.cbo.enable</name><value>false</value></property>
    <property><name>hive.fetch.task.conversion</name><value>minimal</value></property>
    <property><name>hive.fetch.task.conversion.threshold</name><value>268435456</value></property>
    <property><name>hive.limit.pushdown.memory.usage</name><value>0.1</value></property>
    <property><name>hive.merge.sparkfiles</name><value>true</value></property>
    <property><name>hive.merge.smallfiles.avgsize</name><value>16777216</value></property>
    <property><name>hive.merge.size.per.task</name><value>268435456</value></property>
    <property><name>hive.optimize.reducededuplication</name><value>true</value></property>
    <property><name>hive.optimize.reducededuplication.min.reducer</name><value>4</value></property>
    <property><name>hive.map.aggr</name><value>true</value></property>
    <property><name>hive.map.aggr.hash.percentmemory</name><value>0.5</value></property>
    <property><name>hive.optimize.sort.dynamic.partition</name><value>false</value></property>
    <property><name>spark.executor.memory</name><value>268435456</value></property>
    <property><name>spark.driver.memory</name><value>268435456</value></property>
    <property><name>spark.executor.cores</name><value>1</value></property>
    <property><name>spark.yarn.driver.memoryOverhead</name><value>26</value></property>
    <property><name>spark.yarn.executor.memoryOverhead</name><value>26</value></property>
    <property><name>spark.dynamicAllocation.enabled</name><value>true</value></property>
    <property><name>spark.dynamicAllocation.initialExecutors</name><value>1</value></property>
    <property><name>spark.dynamicAllocation.minExecutors</name><value>1</value></property>
    <property><name>spark.dynamicAllocation.maxExecutors</name><value>2147483647</value></property>
    <property><name>hive.metastore.execute.setugi</name><value>true</value></property>
    <property><name>hive.support.concurrency</name><value>true</value></property>
    <property><name>hive.zookeeper.quorum</name><value>slave-29.dev.cluster.enn.cn,slave-30.dev.cluster.enn.cn,slave-31.dev.cluster.enn.cn</value></property>
    <property><name>hive.zookeeper.client.port</name><value>2181</value></property>
    <property><name>hive.zookeeper.namespace</name><value>hive_zookeeper_namespace_hive2</value></property>
    <!-- your list of ZooKeeper hosts (node1,node2,node3,...) -->
    <property><name>hbase.zookeeper.quorum</name><value>node1,node2,node3</value></property>
    <property><name>hbase.zookeeper.property.clientPort</name><value>2181</value></property>
    <property><name>hive.cluster.delegation.token.store.class</name><value>org.apache.hadoop.hive.thrift.MemoryTokenStore</value></property>
    <property><name>hive.server2.enable.doAs</name><value>true</value></property>
    <property><name>hive.metastore.sasl.enabled</name><value>true</value></property>
    <property><name>hive.server2.authentication</name><value>kerberos</value></property>
    <property><name>hive.metastore.kerberos.principal</name><value>hive/_HOST@ENN.CN</value></property>
    <property><name>hive.server2.authentication.kerberos.principal</name><value>hive/_HOST@ENN.CN</value></property>
    <property><name>spark.shuffle.service.enabled</name><value>true</value></property>
    <property><name>hive.cli.print.current.db</name><value>true</value></property>
    <property><name>hive.exec.reducers.max</name><value>32</value></property>
</configuration>
3. After saving the project, just submit the job.
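A minimal sketch of the hive action that ties the script and the config file together, assuming both files were added to the workflow (element names follow the Oozie hive-action schema):

<action name="hive-node">
    <hive xmlns="uri:oozie:hive-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <!-- the hive-site.xml added in step 2 -->
        <job-xml>hive-site.xml</job-xml>
        <script>onecardtomodel.q</script>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
</action>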
V. Running a Sqoop job with Oozie (exports Hive data to MySQL)
1. Create the Oozie Sqoop project.
2. Enter the Sqoop command.
Command:
export --connect jdbc:mysql://10.37.149.183:3306/enn_application?characterEncoding=utf8 --username root --password secret_password --table total_info --export-dir /user/hive/warehouse/origin_ennenergy_onecard.db/total_info/* --input-fields-terminated-by "\t" --update-mode allowinsert --update-key fstationno,fusercardno
3. Save and run.
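A minimal sketch of the corresponding sqoop action. Note that in an Oozie sqoop action the <command> element holds the command without the leading "sqoop":

<action name="sqoop-node">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <!-- if the quoted "\t" causes parsing trouble, each token can instead go into its own <arg> element -->
        <command>export --connect jdbc:mysql://10.37.149.183:3306/enn_application?characterEncoding=utf8 --username root --password secret_password --table total_info --export-dir /user/hive/warehouse/origin_ennenergy_onecard.db/total_info/* --input-fields-terminated-by "\t" --update-mode allowinsert --update-key fstationno,fusercardno</command>
    </sqoop>
    <ok to="end"/>
    <error to="fail"/>
</action>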
VI. Adding a Java program
1. Upload the jar to the corresponding Oozie directory path on Linux.
2. Then configure the action in Hue; a sketch of the resulting java action follows.
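A minimal sketch, with a made-up main class name since the original does not show one:

<action name="java-node">
    <java>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <!-- placeholder: the fully qualified main class inside your jar -->
        <main-class>enn.action.YourMainClass</main-class>
    </java>
    <ok to="end"/>
    <error to="fail"/>
</action>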
All of the above has been tested and works; adjust it to your needs and use it directly!