(1) Using text format
sqoop import --connect jdbc:oracle:thin:@//IP:1521/ASMP2 --username --password --query "SELECT * FROM SBPOPT.TT_MAINTENANCE_TIMES_CORRECT where \$CONDITIONS" --fields-terminated-by '\t' --delete-target-dir --target-dir /user/asmp/hive/asmp/tt_maintenance_times_correct -m 1
The result imported into HDFS is as follows:
Fields that are NULL in Oracle were converted to the literal string "null". I only discovered this when
NVL(c.business_correct_times,c.sys_definition_times) in SQL had no effect, because the string "null" is not a real NULL.
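If text format must be kept, Sqoop's `--null-string` and `--null-non-string` options control how NULL is written out; setting both to `\N` (Hive's default NULL marker in text files) lets Hive read the values back as real NULLs. A sketch based on the command above (connection details elided as in the original):

```shell
# Same import as above, but with explicit NULL encoding:
#   --null-string       how NULL is written for string columns
#   --null-non-string   how NULL is written for non-string columns
# '\N' is Hive's default serialization of NULL in text files.
sqoop import --connect jdbc:oracle:thin:@//IP:1521/ASMP2 --username --password \
  --query "SELECT * FROM SBPOPT.TT_MAINTENANCE_TIMES_CORRECT where \$CONDITIONS" \
  --null-string '\\N' --null-non-string '\\N' \
  --fields-terminated-by '\t' --delete-target-dir \
  --target-dir /user/asmp/hive/asmp/tt_maintenance_times_correct -m 1
```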
(2) Switching to Parquet format instead
sqoop import --connect jdbc:oracle:thin:@//IP:1521/ASMP2 --username --password --query "SELECT * FROM SBPOPT.TT_MAINTENANCE_TIMES_CORRECT where \$CONDITIONS" --as-parquetfile --delete-target-dir --target-dir /user/asmp/hive/asmp/tt_maintenance_times_correct -m 1
The result imported into HDFS is as follows:
Hive table creation statement (the column types must match the file Sqoop produced):
drop table asmp.tt_maintenance_times_correct;
create external table if not exists asmp.tt_maintenance_times_correct
(
id string,
product_code string,
product_name string,
first_billing_date bigint,
last_billing_date bigint,
sale_amount string,
sys_definition_times string,
business_correct_times string,
correct_status string,
correct_date bigint,
correct_people string,
create_by string,
create_date bigint,
update_by string,
update_date bigint
) COMMENT 'asmp temporary table'
STORED AS parquet
location '/user/asmp/hive/asmp/tt_maintenance_times_correct';
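A note on the `bigint` columns: when importing with `--as-parquetfile`, Sqoop stores Oracle DATE values as epoch milliseconds, which is why `first_billing_date` and the other date columns are declared `bigint` above. In Hive they can be converted back with `from_unixtime(cast(first_billing_date / 1000 as bigint))`. The same conversion, sketched in shell (the millisecond value is a made-up example):

```shell
# Hypothetical epoch-milliseconds value, as Sqoop's parquet import
# would store an Oracle DATE
ms=1262304000000
# Divide by 1000 to get seconds, then format (GNU date), mirroring Hive's
#   from_unixtime(cast(first_billing_date / 1000 as bigint))
date -u -d "@$((ms / 1000))" '+%Y-%m-%d'   # prints 2010-01-01
```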
If a field in the Oracle table contains newline characters, the data will gain rows after landing in Hive (each newline splits a record in two), so these special characters need to be handled. Two options:
# Replace newlines and other Hive delimiter characters with " "
--hive-delims-replacement " "
# Drop newlines and other Hive delimiter characters
--hive-drop-import-delims
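For example, adding the replacement option to the earlier text-format import (same connection details as before):

```shell
# Replace \n, \r and \01 inside field values with a space,
# so Hive's row count matches the Oracle source
sqoop import --connect jdbc:oracle:thin:@//IP:1521/ASMP2 --username --password \
  --query "SELECT * FROM SBPOPT.TT_MAINTENANCE_TIMES_CORRECT where \$CONDITIONS" \
  --hive-delims-replacement " " \
  --fields-terminated-by '\t' --delete-target-dir \
  --target-dir /user/asmp/hive/asmp/tt_maintenance_times_correct -m 1
```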