sqoop mysql garbled characters: garbled text when importing from MySQL into HDFS with Sqoop

王诗沐

Tags: sqoop mysql garbled characters

Article link: https://blog.csdn.net/weixin_29156401/article/details/113543099

Recently I imported snapshot data from the Hive metastore database into HDFS so that historical metadata could be queried.

The command was:

sqoop import \
  -D mapred.job.queue.name=do.production \
  --connect 'jdbc:mysql://qqq:3306/hive?defaultFetchSize=1000&useUnicode=true&characterEncoding=UTF-8&characterSetResults=utf8&characterSetConnection=utf8&characterSetClient=utf8' \
  --username aaa \
  --password bbb \
  --target-dir hdfs://xxxxxx/hive/warehouse/hive_metadata.db/nucleus_tables/daykey=20171024/1508813696533_22e17035-894f-41e4-869c-fb154673da9e \
  --null-string '\\N' \
  --split-by CLASS_NAME \
  --hive-drop-import-delims \
  --delete-target-dir \
  --hive-overwrite \
  --null-non-string '\\N' \
  --lines-terminated-by "\n" \
  --fields-terminated-by '\001' \
  --query 'select *, 20171024 from hive.NUCLEUS_TABLES where $CONDITIONS'

However, the import failed with this error:

Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'org.apache.hadoop.hive.metastore.model.MQä�§è�°è�³è�ªnì�£è�˜' )' at line 1

Sqoop builds the following SQL from the --split-by column, then uses the resulting range of CLASS_NAME values to divide the work among the MapReduce tasks:

SELECT MIN(CLASS_NAME), MAX(CLASS_NAME) FROM (select *, 20171024 from hive.NUCLEUS_TABLES where (1 = 1) ) AS t1
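
To make the relationship between the --query argument and the boundary query concrete, here is a minimal sketch that reproduces the query text shown above. It only imitates the string that appears in this post; the function name `build_boundary_query` is hypothetical and this is not Sqoop's actual code.

```python
def build_boundary_query(query: str, split_by: str) -> str:
    """Rebuild the MIN/MAX boundary query text seen in this post.

    Hypothetical helper: it mimics the output above, not Sqoop's source.
    """
    # The $CONDITIONS placeholder shows up as "(1 = 1)" in the boundary query.
    inner = query.replace("$CONDITIONS", "(1 = 1)")
    # The free-form query is wrapped as a subquery aliased t1.
    return f"SELECT MIN({split_by}), MAX({split_by}) FROM ({inner} ) AS t1"


user_query = "select *, 20171024 from hive.NUCLEUS_TABLES where $CONDITIONS"
boundary_query = build_boundary_query(user_query, "CLASS_NAME")
print(boundary_query)
```

The MIN and MAX returned by this query are the two endpoints between which the split points for the individual map tasks are then chosen.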

Seeing garbled characters, I added useUnicode=true&characterEncoding=UTF-8 directly to the MySQL connection string, but that did not help.

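After a whole morning of trying, I noticed that every column in NUCLEUS_TABLES is a varchar, and fairly long ones at that. I wondered whether --split-by either does not support varchar columns or imposes a length limit on them.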

So I changed the command, applying a substring function in --split-by:

sqoop import \
  -D mapred.job.queue.name=do.production \
  --connect 'jdbc:mysql://qqq:3306/hive?defaultFetchSize=1000&useUnicode=true&characterEncoding=UTF-8&characterSetResults=utf8&characterSetConnection=utf8&characterSetClient=utf8' \
  --username aaa \
  --password bbb \
  --target-dir hdfs://xxxxxx/hive/warehouse/hive_metadata.db/nucleus_tables/daykey=20171024/1508813696533_22e17035-894f-41e4-869c-fb154673da9e \
  --null-string '\\N' \
  --split-by 'substring(CLASS_NAME, 40, 1)' \
  --hive-drop-import-delims \
  --delete-target-dir \
  --hive-overwrite \
  --null-non-string '\\N' \
  --lines-terminated-by "\n" \
  --fields-terminated-by '\001' \
  --query 'select *, 20171024 from hive.NUCLEUS_TABLES where $CONDITIONS'

And it worked.
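
To see what substring(CLASS_NAME, 40, 1) actually evaluates to, the sketch below emulates MySQL's 1-based SUBSTRING in Python. The sample class names are an assumption (typical Hive metastore model classes), not rows taken from the table in this post, and `mysql_substring` is a hypothetical helper.

```python
def mysql_substring(s: str, pos: int, length: int) -> str:
    """Emulate MySQL SUBSTRING(s, pos, length) for pos >= 1 (1-based index)."""
    start = pos - 1
    return s[start:start + length]


# Assumed sample values; the real table contents are not shown in the post.
sample_class_names = [
    "org.apache.hadoop.hive.metastore.model.MDatabase",
    "org.apache.hadoop.hive.metastore.model.MTable",
    "org.apache.hadoop.hive.metastore.model.MPartition",
]

# The split key each row would get from substring(CLASS_NAME, 40, 1).
split_values = {mysql_substring(name, 40, 1) for name in sample_class_names}
print(split_values)
```

With values like these, every row gets the same split key, so MIN and MAX of the split expression coincide. If the real data looks the same, the import probably ran as a single range with nothing to subdivide, although the post does not verify this.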

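In the error above, the string org.apache.hadoop.hive.metastore.model.MQ is already wrong by its final Q: no such record exists in MySQL. Counting the characters up to and including the M gives exactly 40. I suspect that --split-by on a column is limited to 40 characters and breaks beyond that.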
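
The count of 40 can be checked directly. It also points to another possible reading, offered here as an assumption rather than something this post establishes. As far as I know, when the split column is text, Sqoop strips the prefix shared by the MIN and MAX values and interpolates split points over the remaining characters, treating them as numbers. Interpolated code points need not be printable ASCII, which would explain a valid 40-character prefix followed by garbage. The sketch below is a simplified illustration of that idea, not Sqoop's implementation; the base of 65536 and the two boundary strings are assumptions.

```python
from fractions import Fraction
from os.path import commonprefix

BASE = 65536  # assumption: one "digit" per 16-bit code unit


def to_fraction(s: str) -> Fraction:
    """Read a string as a fraction in [0, 1), one base-65536 digit per char."""
    return sum(Fraction(ord(c), BASE ** (i + 1)) for i, c in enumerate(s))


def to_string(x: Fraction, length: int) -> str:
    """Inverse of to_fraction, keeping the first `length` digits."""
    chars = []
    for _ in range(length):
        x *= BASE
        digit = int(x)  # x is non-negative, so this is the floor
        chars.append(chr(digit))
        x -= digit
    return "".join(chars)


# Assumed MIN and MAX boundary values, shortened for the example.
lo = "org.apache.hadoop.hive.metastore.model.MDa"
hi = "org.apache.hadoop.hive.metastore.model.MVe"

prefix = commonprefix([lo, hi])
print(len(prefix))  # 40

# Interpolate one quarter of the way between the two suffixes.
a = to_fraction(lo[len(prefix):])
b = to_fraction(hi[len(prefix):])
split_point = prefix + to_string(a + (b - a) / 4, 2)
print(ascii(split_point))
```

In this example the first interpolated character is an ordinary letter between D and V, but the second falls far outside ASCII, much like the MQ followed by garbage in the error message. If this is what happened, the 40 is the length of the shared prefix of the data rather than a limit in --split-by.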
