1. Problem
Method not found: org.apache.hadoop.hbase.client.Put.add([B[B[B)Lorg/apache/hadoop/hbase/client/Put;
java.lang.NoSuchMethodError: org.apache.hadoop.hbase.client.Put.add([B[B[B)Lorg/apache/hadoop/hbase/client/Put;
at org.apache.spark.examples.pythonconverters.StringListToPutConverter.convert(HBaseConverters.scala:81)
at org.apache.spark.examples.pythonconverters.StringListToPutConverter.convert(HBaseConverters.scala:77)
at org.apache.spark.api.python.PythonHadoopUtil$$anonfun$convertRDD$1.apply(PythonHadoopUtil.scala:181)
at org.apache.spark.api.python.PythonHadoopUtil$$anonfun$convertRDD$1.apply(PythonHadoopUtil.scala:181)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$4.apply(SparkHadoopWriter.scala:129)
at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$4.apply(SparkHadoopWriter.scala:127)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394)
at org.apache.spark.internal.io.SparkHadoopWriter$.org$apache$spark$internal$io$SparkHadoopWriter$$executeTask(SparkHadoopWriter.scala:139)
at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:83)
at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:78)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
21/04/07 16:58:58 ERROR Utils: Aborting task
2. Analysis
The StringListToPutConverter class in spark-examples calls the Put.add method from hbase-client. After HBase was upgraded to 2.*, the hbase-client Put API changed: the overload Put.add(byte[], byte[], byte[]) was removed and replaced by Put.addColumn(byte[], byte[], byte[]).
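To see the change in isolation, here is a minimal sketch, runnable in spark-shell with hbase-client on the classpath (the row key, family, qualifier, and value are placeholder strings):

import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.util.Bytes

val put = new Put(Bytes.toBytes("row1"))
// hbase-client 1.x style; this overload was removed in 2.*:
// put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"))
// hbase-client 2.* replacement:
put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"))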
3. Solution
3.1 Fix the code
Download the latest Spark source, modify the StringListToPutConverter class, and repackage spark-examples. The original StringListToPutConverter source:
import scala.collection.JavaConverters._
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.api.python.Converter

class StringListToPutConverter extends Converter[Any, Put] {
  override def convert(obj: Any): Put = {
    // obj is a java.util.ArrayList[String] of (rowkey, family, qualifier, value)
    val output = obj.asInstanceOf[java.util.ArrayList[String]].asScala.map(Bytes.toBytes).toArray
    val put = new Put(output(0))
    put.add(output(1), output(2), output(3)) // this overload no longer exists in hbase-client 2.*
  }
}
The corrected code:
import scala.collection.JavaConverters._
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.api.python.Converter

class StringListToPutConverter extends Converter[Any, Put] {
  override def convert(obj: Any): Put = {
    val output = obj.asInstanceOf[java.util.ArrayList[String]].asScala.map(Bytes.toBytes).toArray
    val put = new Put(output(0))
    put.addColumn(output(1), output(2), output(3)) // hbase-client 2.* replacement for the removed add overload
  }
}
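As a quick sanity check, the converter can be exercised directly; a minimal sketch, where the four strings stand in for the rowkey, family, qualifier, and value list that PySpark passes in:

val converter = new StringListToPutConverter
val row = new java.util.ArrayList[String](java.util.Arrays.asList("row1", "cf", "q", "v"))
val put = converter.convert(row) // builds a Put for rowkey "row1" setting cf:q = "v"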
3.2 Repackage
Repackage with Maven. The command used is:
mvn clean install -e -X -pl :spark-examples_2.11_hardfixed
Note: change the corresponding artifactId in the spark-examples submodule's pom.xml (here spark-examples_2.11_hardfixed, matching the -pl selector above).
3.3 Replace the jar
The build produces original-spark-examples_2.11_hardfixed-2.4.3.jar under the target directory, now compatible with the new API. Use it to replace spark-examples_2.11-1.6.0-typesafe-001.jar under /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/spark/jars/hbase/, after which the write finally succeeds. The jar I patched is attached; download it, extract it, and place it under that path.
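To verify which Put API the deployed classpath actually exposes, a small reflection check can be run in spark-shell on the cluster (a sketch, not specific to this job):

import org.apache.hadoop.hbase.client.Put
// true only if the old Put.add(byte[], byte[], byte[]) overload is still present
val hasOldAdd = classOf[Put].getMethods.exists { m =>
  m.getName == "add" && m.getParameterTypes.toSeq == Seq.fill(3)(classOf[Array[Byte]])
}
// addColumn(byte[], byte[], byte[]) is the replacement available on hbase-client 2.*
val hasAddColumn = classOf[Put].getMethods.exists { m =>
  m.getName == "addColumn" && m.getParameterTypes.toSeq == Seq.fill(3)(classOf[Array[Byte]])
}
println(s"add: $hasOldAdd, addColumn: $hasAddColumn")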
Attachment: original-spark-examples_2.11_hardfixed-2.4.3.jar
4. Problems you may run into while packaging:
- Compiler plugin errors: build with a Java 1.8 JDK. Compiling under Java 12 makes the compiler plugin fail.
- Checker errors: for scalastyle and Maven checker failures, there are two options: apply the fixes a web search for the specific message turns up, or comment out the entire plugin in the parent pom.