Using Oozie with Hive and Sqoop to Periodically Export Hive Table Data to an RDB


1. Overall Flow

Oozie launches a Hive job on a schedule, which exports the Hive table data (as required by the business logic) to HDFS; Oozie then launches Sqoop, which loads that HDFS data into the target RDB table.

2. HQL Export Statement (adjust to your specific requirements)

Skipped here.
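For completeness, a hypothetical sketch of what such a script could look like, staging aggregated rows into the temp table that the later Sqoop step exports. Only the table name t_mart_airqualitystatisticstemp comes from the workflow further down; the source table and columns are invented for illustration:

```shell
# Hypothetical contents of adhoc.txt (the script the Hive action runs).
# Only t_mart_airqualitystatisticstemp is taken from the workflow file;
# the source table t_airquality_raw and its columns are made up.
cat > adhoc.txt <<'EOF'
INSERT OVERWRITE TABLE t_mart_airqualitystatisticstemp
SELECT city, avg(aqi)
FROM t_airquality_raw
GROUP BY city;
EOF
```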

3. Loading Data into the RDB with Sqoop

sqoop export --connect jdbc:mysql://moma03:3306/test --username hive --password password \
  --table table --export-dir dir --input-fields-terminated-by '\001'

Because the Hive version in use is too old to let you specify a column delimiter for exported data, the export can only use Hive's default delimiter \x01. When exporting with Sqoop, you therefore must pass --input-fields-terminated-by '\001'.

Also note: when using LOAD to load data into a Hive table that uses the default delimiter, specify the delimiter as \x01, not \001.
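\x01 (hex) and \001 (octal) are two spellings of the same byte, 0x01 (Ctrl-A). A quick way to see how a Hive-exported text row is laid out and how that byte splits it (the file name and field values below are made up for illustration):

```shell
# Write one row the way Hive's default text layout does:
# fields joined by the single byte 0x01 (octal \001).
printf 'beijing\00142\001pm25\n' > part-00000

# Split the row on that byte; "\001" inside the awk program is the same 0x01.
awk 'BEGIN { FS = "\001" } { print $2 }' part-00000   # prints: 42
```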

4. Caveats

The first prerequisite is migrating the Hive metastore from Derby to MySQL. While doing so, make sure every host address in the configuration is a remote (cluster-resolvable) address rather than localhost: Oozie may launch the Hive task on any node, and with a wrong host address the job will succeed or fail seemingly at random depending on which node it lands on.
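The relevant settings live in hive-site.xml. A minimal sketch, assuming the MySQL host is moma03 (taken from this post); the database name, driver, and user below are standard Hive metastore properties with placeholder values:

```xml
<!-- hive-site.xml: use a remote host name, never localhost -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://moma03:3306/hivemeta?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
```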

The command for Oozie's Sqoop action can be written in several ways; the two most common are passing the whole command line through a <command> element, or passing it token by token through <arg> elements. The <arg> form is needed whenever an individual argument itself contains whitespace, since the contents of <command> are split on spaces:

     <command>import --connect jdbc:hsqldb:file:db.hsqldb --table TT --target-dir hdfs://localhost:8020/user/tucu/foo -m 1</command>

     <arg>import</arg>
     <arg>--connect</arg>
     <arg>jdbc:hsqldb:file:db.hsqldb</arg>
     <arg>--table</arg>
     <arg>TT</arg>
     <arg>--target-dir</arg>
     <arg>hdfs://localhost:8020/user/tucu/foo</arg>
     <arg>-m</arg>
     <arg>1</arg>

5. XML Files

workflow.xml

<workflow-app xmlns="uri:oozie:workflow:0.2" name="exportDataTRDB">
	<start to="importDefaultSegs" />
	<action name="importDefaultSegs">
		<hive xmlns="uri:oozie:hive-action:0.2">
			<job-tracker>${jobTracker}</job-tracker>
			<name-node>${nameNode}</name-node>
			<job-xml>/biginsights/oozie/sharedLibraries/hive/conf/hive-site.xml</job-xml>
			<configuration>
				<property>
					<name>mapred.compress.map.output</name>
					<value>true</value>
				</property>
				<property>
					<name>mapred.job.queue.name</name>
					<value>${queueName}</value>
				</property>
				<property>
					<name>oozie.hive.defaults</name>
					<value>/biginsights/oozie/sharedLibraries/hive/hive-default.xml
					</value>
				</property>
			</configuration>
			<script>adhoc.txt</script>
			<param>SCRIPT=${script1}</param>
		</hive>
		<ok to="cleanupTempBuffer" />
		<error to="fail" />
	</action>
	<action name="cleanupTempBuffer">
		 <fs>
            <delete path="hdfs://moma03:9000/data/contextualtrajsegmentstemp/"/>
            <mkdir path="hdfs://moma03:9000/data/contextualtrajsegmentstemp/"/>
        </fs>
		<ok to="segsPivot" />
		<error to="fail" />
	</action>
	<action name="segsPivot">
		<hive xmlns="uri:oozie:hive-action:0.2">
			<job-tracker>${jobTracker}</job-tracker>
			<name-node>${nameNode}</name-node>
			<job-xml>/biginsights/oozie/sharedLibraries/hive/conf/hive-site.xml</job-xml>
			<configuration>
				<property>
					<name>mapred.compress.map.output</name>
					<value>true</value>
				</property>
				<property>
					<name>mapred.job.queue.name</name>
					<value>${queueName}</value>
				</property>
				<property>
					<name>oozie.hive.defaults</name>
					<value>/biginsights/oozie/sharedLibraries/hive/hive-default.xml
					</value>
				</property>
			</configuration>
			<script>adhoc.txt</script>
			<param>SCRIPT=${script}</param>
		</hive>
		<ok to="export2RDB" />
		<error to="fail" />
	</action>
	<action name="export2RDB">
		<sqoop xmlns="uri:oozie:sqoop-action:0.2">
			<job-tracker>${jobTracker}</job-tracker>
			<name-node>${nameNode}</name-node>
			
			<configuration>
				<property>
					<name>mapred.compress.map.output</name>
					<value>true</value>
				</property>
				<property>
					<name>mapred.job.queue.name</name>
					<value>${queueName}</value>
				</property>
				<property>
					<name>oozie.hive.defaults</name>
					<value>/biginsights/oozie/sharedLibraries/hive/hive-default.xml
					</value>
				</property>
			</configuration>
			<arg>export</arg>
			<arg>--connect</arg>
			<arg>${connect}</arg>
			<arg>--username</arg>
			<arg>${username}</arg>
			<arg>--password</arg>
			<arg>${password}</arg>
			<arg>--input-fields-terminated-by</arg>
			<arg>\001</arg>
			<arg>--table</arg> 
			<arg>${table}</arg>
			<arg>--export-dir</arg> 
			<arg>/user/hive/warehouse/moma.db/t_mart_airqualitystatisticstemp</arg>
		</sqoop>
		<ok to="end" />
		<error to="fail" />
	</action>
	<kill name="fail">
		<message>hive failed, error
			message[${wf:errorMessage(wf:lastErrorNode())}]</message>
	</kill>
	<end name='end' />
</workflow-app>


 



coordinator.xml

<!--
  Licensed to the Apache Software Foundation (ASF) under one
  or more contributor license agreements.  See the NOTICE file
  distributed with this work for additional information
  regarding copyright ownership.  The ASF licenses this file
  to you under the Apache License, Version 2.0 (the
  "License"); you may not use this file except in compliance
  with the License.  You may obtain a copy of the License at
  
       http://www.apache.org/licenses/LICENSE-2.0
  
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
-->
<coordinator-app name="exportDateToRDB" frequency="${coord:days(1)}" start="${start}" end="${end}" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.2">
    <action>
        <workflow>
            <app-path>${workflowAppUri}</app-path>
            <configuration>
                <property>
                    <name>jobTracker</name>
                    <value>${jobTracker}</value>
                </property>
                <property>
                    <name>nameNode</name>
                    <value>${nameNode}</value>
                </property>
                <property>
                    <name>queueName</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>
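The ${...} variables in both files are supplied through a job.properties file when the coordinator is submitted. A minimal sketch: the HDFS URI matches the hdfs://moma03:9000 paths used in the workflow above, while the job-tracker port, dates, and application path are placeholder assumptions:

```properties
nameNode=hdfs://moma03:9000
jobTracker=moma03:9001
queueName=default
start=2014-01-01T00:00Z
end=2014-12-31T00:00Z
workflowAppUri=${nameNode}/user/oozie/apps/exportDataTRDB
oozie.coord.application.path=${nameNode}/user/oozie/apps/exportDataTRDB
```

Submit it with the Oozie CLI, e.g. oozie job -config job.properties -run (adding -oozie with your server URL as needed).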


 
