To make things easier to manage, I wrapped the sqoop command in a shell script: pass in a date such as 2017-03-06 and it exports that day's data from HDFS into Oracle.
The shell script is as follows:
----------------------------------------------------------
#!/bin/sh
source ~/.bashrc
# default to yesterday's date when no argument is given
if [ -z "$1" ]; then
input_time=$(date +%Y-%m-%d -d '-1 day')
else
input_time=$1
fi
echo "current_time:=>"`date '+%Y-%m-%d %H:%M:%S'`
url="jdbc:oracle:thin:@192.168.1.1:1521:demodb"
user_name="orcl"
password="orcl"
batch_nums=10000
map_nums=10
echo ""
echo "--------programme parameters--------"
echo "load_data_date:"${input_time}
echo "url:"${url}
echo "user_name:"${user_name}
echo "password:"${password}
echo "batch_nums:"${batch_nums}
echo "map_nums:"${map_nums}
echo "--------programme parameters end--------"
echo ""
echo `date '+%Y-%m-%d %H:%M:%S'`":=> tpl_mro_adjtdl_day start!"
sqoop export \
-Dsqoop.export.records.per.statement=${batch_nums} \
--map-column-java Scan_start_time=java.sql.Timestamp,Scan_stop_time=java.sql.Timestamp,insert_time=java.sql.Timestamp \
--map-column-hive Scan_start_time=TIMESTAMP,Scan_stop_time=TIMESTAMP,insert_time=TIMESTAMP \
--connect ${url} \
--username ${user_name} \
--password ${password} \
--table mruser.tpl_mro_adjtdl_day \
-m ${map_nums} \
--export-dir /user/hive/warehouse/tpl_mro_adjtdl_day/omcpart=*/datepart=${input_time} \
--input-fields-terminated-by '|' --input-lines-terminated-by '\n' \
--columns int_id,omc_id,vendor_id,m_name,m_int_id,m_celllocalid,enodeb_name,related_enodeb,n_name,n_int_id,n_celllocalid,enb_id,n_enb_id,Scan_start_time,Scan_stop_time,insert_time,ne_type,sum_level,distince,defined_ncell,error_flag,diff0,diff1,diff2,diff3,diff4,diff5,diff6,diff7,diff8,diff9,diff10,total_num,nc_total_num,nc_sc_overf6,nc_sc_overf3,nc_sc_overf12,nc_sc_overf6_rate,ncrsrp0,ncrsrp1,ncrsrp2,ncrsrp3,ncrsrp4,ncrsrp5,ncrsrp6,ncrsrp7,ncrsrp8,ncrsrp9,ncrsrp10 \
--batch ;
echo `date '+%Y-%m-%d %H:%M:%S'`":=> tpl_mro_adjtdl_day end!"
----------------------------------------
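The date-defaulting logic at the top of the script can be exercised on its own. A minimal sketch (GNU `date` assumed, function name `get_load_date` is mine, not from the script):

```shell
#!/bin/sh
# Sketch of the script's date handling: use the argument if given,
# otherwise fall back to yesterday in YYYY-MM-DD form (GNU date assumed).
get_load_date() {
    if [ -z "$1" ]; then
        date +%Y-%m-%d -d '-1 day'
    else
        echo "$1"
    fi
}

get_load_date "2017-03-06"   # prints 2017-03-06
get_load_date                # prints yesterday's date
```

This is why the script can run both interactively (with an explicit date for backfills) and from a daily cron job with no argument at all.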
Notes:
--map-column-java Scan_start_time=java.sql.Timestamp,Scan_stop_time=java.sql.Timestamp,insert_time=java.sql.Timestamp \
--map-column-hive Scan_start_time=TIMESTAMP,Scan_stop_time=TIMESTAMP,insert_time=TIMESTAMP \
These two arguments are needed because these three columns are stored as string in HDFS, while the corresponding Oracle columns are of type DATE. Without explicitly forcing the Hive and Java column types, the export fails with:
Caused by: java.lang.RuntimeException: Can't parse input data: '2016-03-06 00:00:00'
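Before launching the export, it can help to sanity-check that the timestamp fields in the HDFS files really are in the `YYYY-MM-DD HH:MM:SS` form that `java.sql.Timestamp` parses. A sketch below checks fields 14–16 (the positions of Scan_start_time, Scan_stop_time, and insert_time in the `--columns` list above); the sample record is made up for illustration — in practice you would pipe a few lines from `hdfs dfs -cat` instead:

```shell
#!/bin/sh
# Check that the three timestamp fields (positions 14-16 in the
# '|'-delimited layout) match 'YYYY-MM-DD HH:MM:SS'.
# The sample line is fabricated for illustration only.
sample='1|omc1|v1|m|10|1|enb|rel|n|20|2|100|200|2016-03-06 00:00:00|2016-03-06 00:15:00|2016-03-06 01:00:00|eNodeB'

ts_pattern='^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}$'
for field in 14 15 16; do
    value=$(echo "$sample" | cut -d'|' -f"$field")
    if echo "$value" | grep -Eq "$ts_pattern"; then
        echo "field $field OK: $value"
    else
        echo "field $field BAD: $value"
    fi
done
```

If any field fails this pattern, sqoop will throw the same "Can't parse input data" error even with the `--map-column-java` mapping in place.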