Problem
Our own code, packaged with maven-shade-plugin, is executed via spark-sql and ultimately writes data to Hudi. When the job runs in the cluster environment it fails with java.lang.NoSuchMethodError: com.esotericsoftware.kryo.Kryo$DefaultInstantiatorStrategy.&lt;init&gt;(Lorg/objenesis/strategy/InstantiatorStrategy;)V, which is a classic symptom of a jar conflict:
Exception in thread "stream execution thread for hudi_stream_1 [id = 685574bf-78c3-48f1-8ffb-57775f6b49f1, runId = 613225c1-f455-42ff-8b80-b6824c9bd7e1]" java.lang.NoSuchMethodError: com.esotericsoftware.kryo.Kryo$DefaultInstantiatorStrategy.<init>(Lorg/objenesis/strategy/InstantiatorStrategy;)V
at org.apache.hudi.common.util.SerializationUtils$KryoInstantiator.newKryo(SerializationUtils.java:120)
at org.apache.hudi.common.util.SerializationUtils$KryoSerializerInstance.<init>(SerializationUtils.java:89)
at java.lang.ThreadLocal$SuppliedThreadLocal.initialValue(ThreadLocal.java:305)
at java.lang.ThreadLocal.setInitialValue(ThreadLocal.java:195)
at java.lang.ThreadLocal.get(ThreadLocal.java:172)
at org.apache.hudi.common.util.SerializationUtils.serialize(SerializationUtils.java:52)
at org.apache.hudi.common.table.log.block.HoodieDeleteBlock.getContentBytes(HoodieDeleteBlock.java:71)
at org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlocks(HoodieLogFormatWriter.java:158)
at org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlock(HoodieLogFormatWriter.java:135)
at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.initializeFileGroups(HoodieBackedTableMetadataWriter.java:710)
at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.initializeEnabledFileGroups(HoodieBackedTableMetadataWriter.java:665)
at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.initializeFromFilesystem(HoodieBackedTableMetadataWriter.java:546)
at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.initializeIfNeeded(HoodieBackedTableMetadataWriter.java:380)
at org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter.initialize(SparkHoodieBackedTableMetadataWriter.java:120)
at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.<init>(HoodieBackedTableMetadataWriter.java:170)
at org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter.<init>(SparkHoodieBackedTableMetadataWriter.java:89)
at org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter.create(SparkHoodieBackedTableMetadataWriter.java:75)
at org.apache.hudi.client.SparkRDDWriteClient.initializeMetadataTable(SparkRDDWriteClient.java:446)
at org.apache.hudi.client.SparkRDDWriteClient.doInitTable(SparkRDDWriteClient.java:431)
at org.apache.hudi.client.BaseHoodieWriteClient.initTable(BaseHoodieWriteClient.java:1458)
at org.apache.hudi.client.BaseHoodieWriteClient.initTable(BaseHoodieWriteClient.java:1490)
at org.apache.hudi.client.SparkRDDWriteClient.insert(SparkRDDWriteClient.java:177)
at org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:231)
at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:307)
at org.apache.spark.sql.hudi.command.MergeIntoHoodieTableCommand.executeUpsert(MergeIntoHoodieTableCommand.scala:294)
at org.apache.spark.sql.hudi.command.MergeIntoHoodieTableCommand.run(MergeIntoHoodieTableCommand.scala:158)
at org.apache.spark.sql.emr.commands.merge.MergeIntoHudiOperator.doMergeInto(MergeIntoHudiOperator.scala:48)
at org.apache.spark.sql.emr.commands.CreateStreamCommand$$anonfun$run$3$$anonfun$apply$2.apply(stream.scala:219)
at org.apache.spark.sql.emr.commands.CreateStreamCommand$$anonfun$run$3$$anonfun$apply$2.apply(stream.scala:175)
at scala.collection.immutable.List.foreach(List.scala:392)
at org.apache.spark.sql.emr.commands.CreateStreamCommand$$anonfun$run$3.apply(stream.scala:175)
at org.apache.spark.sql.emr.commands.CreateStreamCommand$$anonfun$run$3.apply(stream.scala:157)
at org.apache.spark.sql.execution.streaming.sources.ForeachBatchSink.addBatch(ForeachBatchSink.scala:35)
How to find the jar causing the conflict
Method 1
In the directory containing the application's runtime classpath jars, run the command below to find the jar(s) that ship a DefaultInstantiatorStrategy class:
for jar in *.jar; do echo "$jar"; jar -vtf "$jar" | grep DefaultInstantiatorStrategy; done
The output looks like this:
emr-datasources_shaded_2.11-2.3.0.jar
1739 Sun May 04 15:02:20 CST 2014 com/esotericsoftware/kryo/Kryo$DefaultInstantiatorStrategy$1.class
1730 Sun May 04 15:02:20 CST 2014 com/esotericsoftware/kryo/Kryo$DefaultInstantiatorStrategy$2.class
3111 Sun May 04 15:02:20 CST 2014 com/esotericsoftware/kryo/Kryo$DefaultInstantiatorStrategy.class
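If the JDK `jar` tool is not available on the machine, the same scan can be done with a short script. Below is a minimal sketch using Python's standard `zipfile` module (jar files are ordinary zip archives); the function name and the use of the current directory are illustrative assumptions, not part of the original commands.

```python
# Sketch: find which jars in a directory contain a given class name.
# Jar files are plain zip archives, so zipfile can list their entries.
import zipfile
from pathlib import Path

def find_class_in_jars(classpath_dir, needle):
    """Return {jar_name: [matching entries]} for jars whose entries contain `needle`."""
    hits = {}
    for jar in sorted(Path(classpath_dir).glob("*.jar")):
        with zipfile.ZipFile(jar) as zf:
            matches = [n for n in zf.namelist() if needle in n]
        if matches:
            hits[jar.name] = matches
    return hits

if __name__ == "__main__":
    # Assumes it is run from the classpath directory, like the shell loop above.
    for jar, entries in find_class_in_jars(".", "DefaultInstantiatorStrategy").items():
        print(jar)
        for entry in entries:
            print("    ", entry)
```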
Method 2
Add the JVM flag -verbose:class to the spark-sql command (spark-sql --driver-java-options "-verbose:class"). The class-loading log then shows which jar DefaultInstantiatorStrategy was loaded from, and the jar it reports is the same one found by method 1.
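On Java 8, -verbose:class prints lines of the form `[Loaded <class> from <location>]`. A small sketch for pulling the offending class out of a saved driver log follows; the helper name and the exact sample line are assumptions for illustration.

```python
# Sketch: extract the jar a class was loaded from, given Java 8
# -verbose:class output lines such as:
#   [Loaded com.foo.Bar from file:/classpath/some.jar]
import re

LOADED_RE = re.compile(r"\[Loaded (\S+) from (\S+)\]")

def loaded_from(log_lines, class_name):
    """Yield (class, source) pairs for loaded classes whose name contains class_name."""
    for line in log_lines:
        m = LOADED_RE.search(line)
        if m and class_name in m.group(1):
            yield m.group(1), m.group(2)
```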
Solution
Relocation
Add a relocation rule to the maven-shade-plugin configuration:
<relocations>
  <relocation>
    <pattern>com.esotericsoftware.kryo</pattern>
    <shadedPattern>${emr.shade.packageName}.kryo</shadedPattern>
  </relocation>
</relocations>
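After rebuilding, it is worth checking that the relocation actually took effect. A minimal sketch, again using `zipfile`: the function name and the `shaded/kryo/` prefix are assumptions, so substitute the path form of the ${emr.shade.packageName} value used in your build.

```python
# Sketch: confirm that kryo classes in the rebuilt shaded jar live under the
# relocated package and no longer under com/esotericsoftware/kryo.
import zipfile

def relocation_ok(jar_path,
                  old_prefix="com/esotericsoftware/kryo/",
                  new_prefix="shaded/kryo/"):  # new_prefix is an assumed example
    with zipfile.ZipFile(jar_path) as zf:
        names = zf.namelist()
    leaked = [n for n in names if n.startswith(old_prefix)]  # should be empty
    moved = [n for n in names if n.startswith(new_prefix)]   # should be non-empty
    return not leaked and bool(moved)
```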
Reflections
The fix itself is simple; what actually consumed the time was working out where the conflicting jar came from. It took so long because the offending classes were not brought in by an ordinary dependency but were bundled inside a shaded jar, so Maven's dependency analysis (e.g. mvn dependency:tree) could not reveal the jar's version at all. The source was eventually located as follows:
- Decompile emr-datasources_shaded_2.11-2.3.0.jar in IDEA and read the com.esotericsoftware.kryo version recorded under its META-INF
- In the project that builds emr-datasources_shaded_2.11-2.3.0.jar, run a global search for groupId=com.esotericsoftware.kryo; it turned out the dependency was pulled in by the holo-client shaded jar