Purpose:
To preserve historical data in the ads (application) layer, the ads-layer tables were created as partitioned tables.
The problem this caused:
A sqoop export whose column list excludes the partition field works fine;
a sqoop export whose column list includes the partition field fails with the error below:
Can't parse input data: '0'
2022-01-14 09:35:33,618 INFO mapreduce.Job: Running job: job_1641534259005_0140
2022-01-14 09:35:38,669 INFO mapreduce.Job: Job job_1641534259005_0140 running in uber mode : false
2022-01-14 09:35:38,670 INFO mapreduce.Job: map 0% reduce 0%
2022-01-14 09:35:41,702 INFO mapreduce.Job: Task Id : attempt_1641534259005_0140_m_000000_0, Status : FAILED
Error: java.io.IOException: Can't export data, please check failed map task logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
Caused by: java.lang.RuntimeException: Can't parse input data: '0'
at ads_visit_stats.__loadFromFields(ads_visit_stats.java:378)
at ads_visit_stats.parse(ads_visit_stats.java:306)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
... 10 more
Caused by: java.util.NoSuchElementException
at java.util.ArrayList$Itr.next(ArrayList.java:862)
at ads_visit_stats.__loadFromFields(ads_visit_stats.java:373)
... 12 more
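The stack trace points at the likely root cause: Hive stores a partition column's value in the HDFS directory path (e.g. .../dt=2022-01-14/), not inside the data files themselves. So when --columns includes the partition field, Sqoop's generated parser expects one more tab-separated field than each line actually contains, runs out of tokens, and throws java.util.NoSuchElementException ("Can't parse input data"). A minimal sketch of the mismatch, with a hypothetical 4-field data line:

```shell
# Hypothetical data line from ads_visit_stats: the partition value (dt) is NOT
# in the file, it lives only in the HDFS path .../dt=2022-01-14/.
# If --columns lists 5 fields but each line holds only 4, the generated parser
# runs out of tokens, which surfaces as "Can't parse input data: '0'".
line="$(printf '0\t10\t3\t5')"   # tab-separated, 4 fields; first field is '0'
printf '%s\n' "$line" | awk -F'\t' '{print NF " fields"}'   # → 4 fields
```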
Analysis:
Testing showed that for a partitioned Hive table, the export succeeds as long as --columns does not include the partition field.
I searched extensively online and read the official docs, but could not solve the problem that way.
I kept asking myself: "Is it that the fields of the Hive table on HDFS cannot be matched up with the MySQL columns? If they lined up, would the export succeed? In that case, wouldn't adding an extra date column to the Hive table fix it?"
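To make the two cases concrete, here is a hedged sketch of the export command; the connection string, table, and column names are assumptions for illustration, not taken from the source:

```shell
# Works: --columns lists only fields that physically exist in the data files
# (the partition field dt is excluded).
sqoop export \
  --connect jdbc:mysql://hadoop102:3306/report \
  --username root --password '***' \
  --table ads_visit_stats \
  --columns "visit_count,bounce_count,avg_duration" \
  --export-dir /warehouse/ads/ads_visit_stats/dt=2022-01-14 \
  --input-fields-terminated-by '\t'

# Fails: adding the partition field dt to --columns asks the parser for a
# field that is absent from every data line, triggering the error above.
#   --columns "dt,visit_count,bounce_count,avg_duration"
```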
Solution:
Testing confirmed that this approach works: add a redundant "date" field that duplicates the partition value, and export that field as the date column in MySQL.
Done. It is a crude workaround, but it solves the problem!
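The workaround above can be sketched as follows; column and table names are hypothetical, and since the source calls the new field "date" (a keyword in HiveQL and MySQL), it is quoted with backticks here:

```shell
# 1) Add a regular column that will duplicate the partition value
#    (existing partitions keep NULL here until they are rewritten).
hive -e "ALTER TABLE ads_visit_stats ADD COLUMNS (\`date\` STRING COMMENT 'copy of partition dt');"

# 2) When loading a partition, write the same value into the new column,
#    so it is physically present in every data line.
hive -e "INSERT OVERWRITE TABLE ads_visit_stats PARTITION (dt='2022-01-14')
         SELECT visit_count, bounce_count, avg_duration, '2022-01-14'
         FROM tmp_visit_stats;"

# 3) Export the redundant column instead of the partition field.
sqoop export \
  --connect jdbc:mysql://hadoop102:3306/report \
  --username root --password '***' \
  --table ads_visit_stats \
  --columns "visit_count,bounce_count,avg_duration,date" \
  --export-dir /warehouse/ads/ads_visit_stats/dt=2022-01-14 \
  --input-fields-terminated-by '\t'
```

Since the new column exists in the data files, the field count now matches --columns and the parse error disappears.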