sqoop导出 建表_hive中orc表sqoop导出到mysql

环境:

sqoop版本:1.4.6

hive版本:2.1.1

hadoop版本:2.7.3

起因:

orc表导出到mysql有时碰到空字符串的问题一直没有时间整理,这次测试一下

结论:

hive中为string类型的空字符串‘’数据导出到mysql中bigint或者int类型字段时会报错。

空值null没问题。

过程:

1、创建临时库分区orc表

CREATE TABLE temp.js_pianyuan_orc(

dates string,

city_id string,

starttime string,

endtime string,

programid string,

device_id string,

ip string)

PARTITIONED BY (

curdate string)

stored as orc;

2、创建mysql测试表

注意hive中的表字段和mysql中的表字段名字必须一致

CREATE TABLE js_pianyuan(

dates varchar(64) DEFAULT NULL,

city_id varchar(64) DEFAULT NULL,

starttime varchar(64) DEFAULT NULL,

endtime varchar(64) DEFAULT NULL,

programid varchar(64) DEFAULT NULL,

device_id varchar(64) DEFAULT NULL,

ip varchar(64) DEFAULT NULL)

ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT;

3、数据检查

select count(1) from dm_bas.js_pianyuan where curdate = '20180122' and (dates is null or trim(dates) = '');

select count(1) from dm_bas.js_pianyuan where curdate = '20180122' and (city_id is null or trim(city_id) = '');

select count(1) from dm_bas.js_pianyuan where curdate = '20180122' and (starttime is null or trim(starttime) = '');

5040

select count(1) from dm_bas.js_pianyuan where curdate = '20180122' and (endtime is null or trim(endtime) = '');

5040

select count(1) from dm_bas.js_pianyuan where curdate = '20180122' and (programid is null or trim(programid) = '');

5144

select count(1) from dm_bas.js_pianyuan where curdate = '20180122' and (device_id is null or trim(device_id) = '');

select count(1) from dm_bas.js_pianyuan where curdate = '20180122' and (ip is null or trim(ip) = '');

最后确认starttime、endtime和programid 有空字符串数据

4、22号日期原数据测试导出

insert overwrite table temp.js_pianyuan_orc partition(curdate)

select * from dm_bas.js_pianyuan where curdate = '20180122';

sqoop export \

--hcatalog-database temp \

--hcatalog-table js_pianyuan_orc \

--hcatalog-partition-keys curdate \

--hcatalog-partition-values 20180122 \

--connect jdbc:mysql://ip:3306/test \

--username username \

--password passwd --m 10\

--table js_pianyuan

结果没有异常

18/01/24 10:34:58 INFO mapreduce.ExportJobBase: Transferred 178.8203 MB in 1,341.2754 seconds (136.5208 KB/sec)

18/01/24 10:34:58 INFO mapreduce.ExportJobBase: Exported 15558677 records.

5、空字符串改成字符串null测试导出

22号原始数据插入测试表23号

insert overwrite table temp.js_pianyuan_orc partition(curdate='20180123')

select

case when trim(dates) = '' then 'null' else dates end,

case when trim(city_id) = '' then 'null' else city_id end,

case when trim(starttime) = '' then 'null' else starttime end,

case when trim(endtime) = '' then 'null' else endtime end,

case when trim(programid) = '' then 'null' else programid end,

case when trim(device_id) = '' then 'null' else device_id end,

case when trim(ip) = '' then 'null' else ip end

from dm_bas.js_pianyuan where curdate = '20180122';

结果没有异常

18/01/24 10:48:23 INFO mapreduce.ExportJobBase: Transferred 178.8589 MB in 714.5818 seconds (256.3058 KB/sec)

18/01/24 10:48:23 INFO mapreduce.ExportJobBase: Exported 15558677 records.

6、空字符串改成null测试

22号原始数据插入测试表24号

insert overwrite table temp.js_pianyuan_orc partition(curdate='20180124')

select

case when trim(dates) = '' then null else dates end,

case when trim(city_id) = '' then null else city_id end,

case when trim(starttime) = '' then null else starttime end,

case when trim(endtime) = '' then null else endtime end,

case when trim(programid) = '' then null else programid end,

case when trim(device_id) = '' then null else device_id end,

case when trim(ip) = '' then null else ip end

from dm_bas.js_pianyuan where curdate = '20180122';

结果没有异常

18/01/24 11:25:53 INFO mapreduce.ExportJobBase: Transferred 178.8346 MB in 1,275.8381 seconds (143.5344 KB/sec)

18/01/24 11:25:53 INFO mapreduce.ExportJobBase: Exported 15558677 records.

7、更改mysql字段类型为bigint测试导出22号含有空字符串原数据

alter table js_pianyuan change programid `programid` bigint(64);

空字符串异常

结果:

hive中的string类型字段有空字符串数据的导出到mysql端varchar字段类型没问题,但是导出mysql端int或者bigint类型会报异常。这种数据要单独处理。

8、更改mysql字段类型为bigint测试导出24号含有空值null的原数据

18/01/24 12:35:00 INFO mapreduce.ExportJobBase: Transferred 178.8346 MB in 1,100.6016 seconds (166.3878 KB/sec)

18/01/24 12:35:00 INFO mapreduce.ExportJobBase: Exported 15558677 records.

结果没有异常。

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值