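While syncing Hive data between two clusters I ran into a really nasty problem. Table X on cluster A is stored as RCFILE, table Y on cluster B is stored as ORCFILE, and I needed to sync X's historical data into Y. I naively assumed that a plain export and import would do the job and completely overlooked the difference in storage formats. Once the sync finished, the first query threw an error right in my face. Part of the error output: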
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:251)
... 11 more
Caused by: java.io.IOException: Malformed ORC file hdfs://tongjihadoop151:8020/user/hive/warehouse/client_uv_daily_test/dt=2015-07-23/client_uv_2015-07-23. Invalid postscript.
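
The "Malformed ORC file ... Invalid postscript" message is what you get when the ORC reader is pointed at a file that was not written as ORC, which is exactly the case here: the files copied into Y's directory are still RCFile data. One quick way to confirm this is to look at the first few bytes of a data file, since ORC files begin with the magic bytes `ORC` and RCFile files begin with `RCF`. Below is a minimal sketch of such a check, not a definitive tool. It assumes the file has already been copied to local disk (for example with `hdfs dfs -get`), and the function name `sniff_storage_format` and the sample file names are made up for illustration.

```python
import os
import tempfile


def sniff_storage_format(path):
    """Guess a Hive storage format from the first 3 bytes of a local file.

    Hypothetical helper for illustration only. It inspects the leading
    magic bytes and does not validate the rest of the file.
    """
    with open(path, "rb") as f:
        head = f.read(3)
    if head == b"ORC":
        return "ORC"
    if head == b"RCF":
        return "RCFile"
    if head == b"SEQ":
        return "SequenceFile"
    return "unknown"


if __name__ == "__main__":
    # Fake files that only carry the magic bytes, standing in for real
    # data files fetched from HDFS (assumption: real files are local).
    workdir = tempfile.mkdtemp()
    samples = {
        "looks_like_orc": b"ORC" + b"\x00" * 16,
        "looks_like_rcfile": b"RCF\x01" + b"\x00" * 16,
    }
    for name, payload in samples.items():
        path = os.path.join(workdir, name)
        with open(path, "wb") as f:
            f.write(payload)
        print(name, "->", sniff_storage_format(path))
```

If a file sitting under an ORC table's partition directory comes back as `RCFile`, the table definition and the data on disk disagree, and the files need to be rewritten in the target format rather than just copied over.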