1. The implementation has two steps.
Step 1: incrementally import the MySQL data into a temporary Hive table, using a conditional query to pick up only the new rows.
Step 2: load the data from the temporary table into the final result table via dynamic partitioning.
Incremental import into the Hive temporary table (the table does not have to be partitioned; the resource queue must be set):
sqoop import \
  -D mapred.job.queue.name=root.zm_yarn_pool.production \
  -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
  -m 8 \
  --connect "jdbc:mysql://${mysql_ip}/${db_name}?tinyInt1isBit=false" \
  --username "******" \
  --password "*********" \
  --target-dir /user/admin/rank/ods_call_logs_from_tinet \
  --mapreduce-job-name sqoop_task_daily \
  --hive-import \
  --hive-overwrite \
  --hive-database "ods" \
  --hive-table "ods_call_logs_from_tinet" \
  --hive-drop-import-delims \
  --split-by 'created_at' \
  --hive-partition-key 'pt' \
  --hive-partition-value "2019-05-27" \
  --query 'select * from call_logs_from_tinet where created_at > "2019-05-26" and $CONDITIONS' \
  --delete-target-dir \
  --null-string '\N' \
  --null-non-string '\N' \
  -z
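The partition value and the incremental lower bound above are hard-coded; in a scheduled daily job they would normally be derived from the run date. A minimal sketch, assuming GNU date is available (the variable names are illustrative, not part of the original setup):

pt_date=$(date +%F)                     # today's partition value, e.g. 2019-05-27
since_date=$(date -d '1 day ago' +%F)   # incremental lower bound, e.g. 2019-05-26

With these variables, pass --hive-partition-value "${pt_date}", and double-quote the query so the variable expands, escaping the token Sqoop needs to see literally: --query "select * from call_logs_from_tinet where created_at > '${since_date}' and \$CONDITIONS".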
Load the temporary table into the final dynamically partitioned result table (again with the resource queue set).
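The INSERT below assumes the final table already exists with matching columns and the three partition keys. A minimal DDL sketch; every column type and the storage format here are assumptions inferred from the column names, not taken from the real schema:

CREATE TABLE IF NOT EXISTS test.ods_call_logs_from_tinet (
  id               BIGINT,   -- types are guesses; adjust to the source schema
  client_no        STRING,
  client_mobile    STRING,
  status           STRING,
  bridge_duration  INT,
  start_time       STRING,
  end_time         STRING,
  created_at       STRING,
  updated_at       STRING,
  origin           STRING,
  custom_mobile    STRING,
  record_url       STRING,
  cust_callee_clid STRING,
  stu_user_id      BIGINT,
  sel_user_id      BIGINT,
  sel_team_id      BIGINT,
  stu_recycle      STRING,
  unique_id        STRING
)
PARTITIONED BY (pk_year STRING, pk_month STRING, pk_day STRING)
STORED AS ORC;  -- storage format is an assumption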
Run the following statements in Hive:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.merge.mapredfiles=true;
set mapreduce.job.queuename=root.zm_yarn_pool.production;
use test;
INSERT OVERWRITE TABLE test.ods_call_logs_from_tinet PARTITION(pk_year,pk_month,pk_day)
select id,client_no,client_mobile,status,bridge_duration,start_time,end_time,created_at,updated_at,origin,custom_mobile,record_url,cust_callee_clid,stu_user_id,sel_user_id,sel_team_id,stu_recycle,unique_id,
  substr(created_at, 1, 4)  as pk_year,
  substr(created_at, 1, 7)  as pk_month,
  substr(created_at, 1, 10) as pk_day
from ods.ods_call_logs_from_tinet where pt = '2019-05-27';
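To confirm the dynamic partitions were actually created, a quick check with standard Hive statements:

SHOW PARTITIONS test.ods_call_logs_from_tinet;
-- spot-check the row count for one day
select count(*) from test.ods_call_logs_from_tinet
where pk_year = '2019' and pk_month = '2019-05' and pk_day = '2019-05-27';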