Incrementally importing Oracle data into Hive with Sqoop

BabyFish13

Categories: BigData, HadoopTools. Tags: hadoop, oracle, spark, linux, shell

Article link: https://blog.csdn.net/BabyFish13/article/details/51792441

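I came across this shell script online. It uses Sqoop to sync Oracle data into Hive incrementally once a day. It looks like it will be useful later, so I am keeping it here for reference.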
[spark@store ~]$ cat oracle2hive_imcrement.sh

#!/bin/bash


#Please set the synchronization interval, in hours.
update_interval=24

#Please set the RDBMS connection params
rdbms_connstr="jdbc:oracle:thin:@WIN-A1UAC36B1UC:1521:orcl"
rdbms_username="cfa"
rdbms_pwd="cfa"
rdbms_table="AIX_REPORT_DATA"
rdbms_columns="reportno,rowno,rowname,col2value,create_time"

#Please set the hive params
hive_increment_table="aix_report_data_increment"
hive_full_table="aix_report_data"

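#---------------------------------------------------------
#Import increment data in RDBMS into Hive

#Time window: the last ${update_interval} hours (requires GNU date for -d).
#The date format matches the to_date() mask used in --where below.
enddate=$(date '+%Y-%m-%d %H:%M:%S')
startdate=$(date '+%Y-%m-%d %H:%M:%S' -d "-${update_interval} hours")

#The increment table is overwritten on every run, so it only holds the latest window.
#The window is half-open [startdate, enddate) so a row stamped exactly on startdate is not skipped.
$SQOOP_HOME/bin/sqoop import \
    --connect "${rdbms_connstr}" \
    --username "${rdbms_username}" \
    --password "${rdbms_pwd}" \
    --table "${rdbms_table}" \
    --columns "${rdbms_columns}" \
    --where "CREATE_TIME >= to_date('${startdate}','yyyy-mm-dd hh24:mi:ss') and CREATE_TIME < to_date('${enddate}','yyyy-mm-dd hh24:mi:ss')" \
    --hive-import --hive-overwrite \
    --hive-table "${hive_increment_table}"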


#---------------------------------------------------------
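#Update the old full data table to the latest status.
#Rows in the increment table replace full-table rows with the same (reportno, rowno);
#full-table rows without a match in the increment table are kept unchanged.
$HIVE_HOME/bin/hive -e "
insert overwrite table ${hive_full_table}
select * from ${hive_increment_table}
union all
select a.* from ${hive_full_table} a
left outer join ${hive_increment_table} b
  on a.reportno = b.reportno and a.rowno = b.rowno
where b.reportno is null;"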
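
The time-window step of the script can be tried out on its own, without Sqoop or Oracle. The following is a minimal sketch, assuming GNU date is available. The function names `sync_window` and `window_predicate` are hypothetical helpers that do not appear in the original script. They compute both ends of the window from a single reference time and print the predicate that would be passed to `--where`.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script).
# Prints "<start>|<end>" for a window of $2 hours ending at reference time $1.
# Requires GNU date (-d and @epoch support).
sync_window() {
    local ref="$1" hours="$2"
    local fmt='+%Y-%m-%d %H:%M:%S'
    local end_epoch start_epoch
    end_epoch=$(date -d "${ref}" +%s) || return 1
    # Work in epoch seconds so both ends derive from the same instant.
    start_epoch=$(( end_epoch - hours * 3600 ))
    printf '%s|%s\n' \
        "$(date -d "@${start_epoch}" "${fmt}")" \
        "$(date -d "@${end_epoch}" "${fmt}")"
}

# Hypothetical helper: builds the Oracle predicate for Sqoop's --where option.
# The column name CREATE_TIME and the to_date() mask come from the script.
window_predicate() {
    local window start end
    window=$(sync_window "$1" "$2") || return 1
    start=${window%%|*}   # text before the separator
    end=${window##*|}     # text after the separator
    printf "CREATE_TIME >= to_date('%s','yyyy-mm-dd hh24:mi:ss') and CREATE_TIME < to_date('%s','yyyy-mm-dd hh24:mi:ss')\n" \
        "${start}" "${end}"
}

# Usage: print the predicate for the 24 hours ending now.
window_predicate "$(date '+%Y-%m-%d %H:%M:%S')" 24
```

Deriving both timestamps from one reference time avoids the small drift that comes from calling `date` twice, and the dash-separated format keeps the strings aligned with the `to_date` mask.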

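The main things I learned from this script: how to use shell variables as parameters, another way to merge incremental data in Hive, and how to filter rows with --where in Sqoop.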
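
The Hive merge step (all increment rows, plus every full-table row whose key is absent from the increment) can be illustrated on plain files. This is only a sketch of the logic, not Hive itself. It assumes comma-separated files whose first two fields stand in for `reportno` and `rowno`, and the function name `merge_full_with_increment` is made up for the illustration.

```shell
#!/bin/bash
# Hypothetical illustration of the "union all + left outer join ... is null" merge.
# Usage: merge_full_with_increment FULL_FILE INCREMENT_FILE
# Both files are comma-separated; fields 1 and 2 form the key (reportno, rowno).
merge_full_with_increment() {
    local full="$1" inc="$2"
    awk -F',' '
        { key = $1 FS $2 }
        # First file (increment): emit every row and remember its key.
        NR == FNR { seen[key] = 1; print; next }
        # Second file (full table): emit only rows whose key was not in the increment.
        !(key in seen)
    ' "${inc}" "${full}"
}

# Example call (file names are placeholders):
#   merge_full_with_increment full.csv increment.csv > merged.csv
```

With a full file containing keys (1,1), (1,2), (2,1) and an increment file containing (1,2), (3,1), the result holds the two increment rows followed by the untouched full rows (1,1) and (2,1). The old (1,2) row is dropped, which is the same effect the `where b.reportno is null` filter has in the Hive statement.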

Also, an example of running SQL from spark-shell:
sqlContext.sql("select * from 99_dorm limit 100").show