Here is the schema of my Hive table:
+------------+------------+----------+--+
| col_name | data_type | comment |
+------------+------------+----------+--+
| ipcount | bigint | |
| pv | bigint | |
| jump_num | bigint | |
| jump_rate | double | |
| reg_num | bigint | |
+------------+------------+----------+--+
And here is the schema of the MySQL table I created:
+-----------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+-------+
| ipcount | varchar(20) | YES | | NULL | |
| pv | varchar(20) | YES | | NULL | |
| jump_num | varchar(20) | YES | | NULL | |
| jump_rate | varchar(20) | YES | | NULL | |
| reg_num | varchar(20) | YES | | NULL | |
+-----------+-------------+------+-----+---------+-------+
This is my Sqoop command:
bin/sqoop export \
--connect jdbc:mysql://Maricle05:3306/weblog \
--username root \
--password 123456 \
--table weblog_anlay \
--num-mappers 1 \
--input-fields-terminated-by "\t" \
--export-dir /user/hive/warehouse/weblog.db/weblog_anlay
And this is the error output:
Error: java.io.IOException: Can't export data, please check failed map task logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.RuntimeException: Can't parse input data: '317301948789105300.005403355622389083335'
at weblog_anlay.__loadFromFields(weblog_anlay.java:378)
at weblog_anlay.parse(weblog_anlay.java:306)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
... 10 more
Caused by: java.util.NoSuchElementException
at java.util.ArrayList$Itr.next(ArrayList.java:834)
at weblog_anlay.__loadFromFields(weblog_anlay.java:358)
... 12 more
19/08/07 19:05:36 INFO mapreduce.Job: Task Id : attempt_1565174983974_0002_m_000000_1, Status : FAILED
19/08/07 19:05:40 INFO mapreduce.Job: Task Id : attempt_1565174983974_0002_m_000000_2, Status : FAILED
(each retried attempt prints the identical stack trace, trimmed here)
This is clearly a parse error, as if the two tables' fields did not match. After carefully confirming that both tables have identical columns, I re-read the error and noticed this line:

Can't parse input data: '317301948789105300.005403355622389083335'

Note that the "field" Sqoop failed to parse is actually all five column values of one row run together with no visible separator, which means Sqoop was splitting the row on the wrong delimiter and ran out of fields (hence the java.util.NoSuchElementException).
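The "one giant field" symptom can be reproduced locally. A minimal sketch — the sample values below are made up for illustration, not my real data:

```shell
# A fake single row in Hive's default text format: five values
# joined by the \001 (Ctrl-A) byte. (Sample values are invented.)
printf '3173019\00148789\001105\0013.14\001500' > /tmp/weblog_row

# Split on \t, as my original Sqoop command did: the row collapses
# into ONE field -- the run-together value the exception shows.
awk -F'\t' '{print NF}' /tmp/weblog_row        # prints 1

# Split on \001: the five columns reappear.
awk -F"$(printf '\001')" '{print NF}' /tmp/weblog_row   # prints 5
```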
So I changed the field delimiter in my Sqoop command:
bin/sqoop export \
--connect jdbc:mysql://Maricle05:3306/weblog \
--username root \
--password 123456 \
--table weblog_anlay \
--num-mappers 1 \
--input-fields-terminated-by "\001" \
--export-dir /user/hive/warehouse/weblog.db/weblog_anlay
The export now succeeds.

Root cause: I created the Hive table with a CREATE TABLE tablename AS SELECT … statement and never specified a field delimiter, so Hive used its default delimiter, \001 (Ctrl-A). When Sqoop split each row on \t it found the wrong number of fields and the export failed.

To repeat: Hive's default field delimiter is \001.
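Alternatively, the problem can be avoided at table-creation time by giving the CTAS table an explicit delimiter, so no special Sqoop flag is needed later. A sketch only — the source query below is a placeholder, not my actual statement:

```sql
-- Hypothetical CTAS: ROW FORMAT pins the delimiter to \t, so a later
-- sqoop export with --input-fields-terminated-by "\t" parses correctly.
CREATE TABLE weblog_anlay
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
AS
SELECT ipcount, pv, jump_num, jump_rate, reg_num
FROM some_source_table;   -- placeholder source
```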