环境:
sqoop版本:1.4.6
hive版本:2.1.1
hadoop版本:2.7.3
起因:
orc表导出到mysql有时碰到空字符串的问题一直没有时间整理,这次测试一下
结论:
hive中为string类型的空字符串‘’数据导出到mysql中bigint或者int类型字段时会报错。
空值null没问题。
过程:
1、创建临时库分区orc表
CREATE TABLE temp.js_pianyuan_orc(
dates string,
city_id string,
starttime string,
endtime string,
programid string,
device_id string,
ip string)
PARTITIONED BY (
curdate string)
stored as orc;
2、创建mysql测试表
注意hive中的表字段和mysql中的表字段名字必须一致
CREATE TABLE js_pianyuan(
dates varchar(64) DEFAULT NULL,
city_id varchar(64) DEFAULT NULL,
starttime varchar(64) DEFAULT NULL,
endtime varchar(64) DEFAULT NULL,
programid varchar(64) DEFAULT NULL,
device_id varchar(64) DEFAULT NULL,
ip varchar(64) DEFAULT NULL)
ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT;
3、数据检查
select count(1) from dm_bas.js_pianyuan where curdate = '20180122' and (dates is null or trim(dates) = '');
select count(1) from dm_bas.js_pianyuan where curdate = '20180122' and (city_id is null or trim(city_id) = '');
select count(1) from dm_bas.js_pianyuan where curdate = '20180122' and (starttime is null or trim(starttime) = '');
5040
select count(1) from dm_bas.js_pianyuan where curdate = '20180122' and (endtime is null or trim(endtime) = '');
5040
select count(1) from dm_bas.js_pianyuan where curdate = '20180122' and (programid is null or trim(programid) = '');
5144
select count(1) from dm_bas.js_pianyuan where curdate = '20180122' and (device_id is null or trim(device_id) = '');
select count(1) from dm_bas.js_pianyuan where curdate = '20180122' and (ip is null or trim(ip) = '');
最后确认starttime、endtime和programid 有空字符串数据
4、22号日期原数据测试导出
insert overwrite table temp.js_pianyuan_orc partition(curdate)
select * from dm_bas.js_pianyuan where curdate = '20180122';
sqoop export \
--hcatalog-database temp \
--hcatalog-table js_pianyuan_orc \
--hcatalog-partition-keys curdate \
--hcatalog-partition-values 20180122 \
--connect jdbc:mysql://ip:3306/test \
--username username \
--password passwd --m 10\
--table js_pianyuan
结果没有异常
18/01/24 10:34:58 INFO mapreduce.ExportJobBase: Transferred 178.8203 MB in 1,341.2754 seconds (136.5208 KB/sec)
18/01/24 10:34:58 INFO mapreduce.ExportJobBase: Exported 15558677 records.
5、空字符串改成字符串null测试导出
22号原始数据插入测试表23号
insert overwrite table temp.js_pianyuan_orc partition(curdate='20180123')
select
case when trim(dates) = '' then 'null' else dates end,
case when trim(city_id) = '' then 'null' else city_id end,
case when trim(starttime) = '' then 'null' else starttime end,
case when trim(endtime) = '' then 'null' else endtime end,
case when trim(programid) = '' then 'null' else programid end,
case when trim(device_id) = '' then 'null' else device_id end,
case when trim(ip) = '' then 'null' else ip end
from dm_bas.js_pianyuan where curdate = '20180122';
结果没有异常
18/01/24 10:48:23 INFO mapreduce.ExportJobBase: Transferred 178.8589 MB in 714.5818 seconds (256.3058 KB/sec)
18/01/24 10:48:23 INFO mapreduce.ExportJobBase: Exported 15558677 records.
6、空字符串改成null测试
22号原始数据插入测试表24号
insert overwrite table temp.js_pianyuan_orc partition(curdate='20180124')
select
case when trim(dates) = '' then null else dates end,
case when trim(city_id) = '' then null else city_id end,
case when trim(starttime) = '' then null else starttime end,
case when trim(endtime) = '' then null else endtime end,
case when trim(programid) = '' then null else programid end,
case when trim(device_id) = '' then null else device_id end,
case when trim(ip) = '' then null else ip end
from dm_bas.js_pianyuan where curdate = '20180122';
结果没有异常
18/01/24 11:25:53 INFO mapreduce.ExportJobBase: Transferred 178.8346 MB in 1,275.8381 seconds (143.5344 KB/sec)
18/01/24 11:25:53 INFO mapreduce.ExportJobBase: Exported 15558677 records.
7、更改mysql字段类型为bigint测试导出22号含有空字符串原数据
alter table js_pianyuan change programid `programid` bigint(64);
空字符串异常
结果:
hive中的string类型字段有空字符串数据的导出到mysql端varchar字段类型没问题,但是导出mysql端int或者bigint类型会报异常。这种数据要单独处理。
8、更改mysql字段类型为bigint测试导出24号含有空值null的原数据
18/01/24 12:35:00 INFO mapreduce.ExportJobBase: Transferred 178.8346 MB in 1,100.6016 seconds (166.3878 KB/sec)
18/01/24 12:35:00 INFO mapreduce.ExportJobBase: Exported 15558677 records.
结果没有异常。