Spark throws java.lang.IllegalArgumentException: Can not create a Path from a null string when writing to HBase

The following error occurs when writing data to HBase with saveAsNewAPIHadoopDataset:

java.lang.IllegalArgumentException: Can not create a Path from a null string
        at org.apache.hadoop.fs.Path.checkPathArg(Path.java:123)
        at org.apache.hadoop.fs.Path.<init>(Path.java:135)
        at org.apache.hadoop.fs.Path.<init>(Path.java:89)
        at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.absPathStagingDir(HadoopMapReduceCommitProtocol.scala:58)
        at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitJob(HadoopMapReduceCommitProtocol.scala:132)
        at org.apache.spark.internal.io.SparkHadoopMapReduceWriter$.write(SparkHadoopMapReduceWriter.scala:101)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1085)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1085)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1085)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
        at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:1084)

The source code that writes to HBase (Scala, Spark 2.2):

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.{SparkConf, SparkContext}

/**
  * Description: put data into HBase via a MapReduce job.
  *
  * Author : Adore Chen
  * Created: 2017-12-22
  */
object SparkMapJob {

    /**
      * Inserting 100,000 rows took 21035 ms.
      *
      * @param args
      */
    def main(args: Array[String]): Unit = {
      val conf = new SparkConf().setAppName("SparkPutByMap")
      val context = new SparkContext(conf)

      val hbaseConf = HBaseConfiguration.create()
      hbaseConf.set(TableOutputFormat.OUTPUT_TABLE, "test_table")
      // IMPORTANT: this property must be set, or the job fails with
      // "Can not create a Path from a null string" (see SPARK-21549)
      hbaseConf.set("mapreduce.output.fileoutputformat.outputdir", "/tmp")

      val job = Job.getInstance(hbaseConf)
      job.setOutputFormatClass(classOf[TableOutputFormat[ImmutableBytesWritable]])
      job.setOutputKeyClass(classOf[ImmutableBytesWritable])
      job.setOutputValueClass(classOf[Put])

      try{
        val rdd = context.makeRDD(1 to 100000)

        // column family
        val family = Bytes.toBytes("cf")
        // column counter --> ctr
        val column = Bytes.toBytes("ctr")

        rdd.map(value => {
          val put = new Put(Bytes.toBytes(value))
          put.addImmutable(family, column, Bytes.toBytes(value))
          (new ImmutableBytesWritable(), put)
        })
          .saveAsNewAPIHadoopDataset(job.getConfiguration)
      }finally{
        context.stop()
      }
    }

}
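
To sanity-check the write, the rows can be read back with the plain HBase client API. A minimal sketch (the object name VerifyPut is hypothetical; the table and column names are the ones used above, and the calls are standard HBase 1.x client API):

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Get}
import org.apache.hadoop.hbase.util.Bytes

object VerifyPut {

    def main(args: Array[String]): Unit = {
      val conf = HBaseConfiguration.create()
      val connection = ConnectionFactory.createConnection(conf)
      try {
        val table = connection.getTable(TableName.valueOf("test_table"))
        // Row keys above are Bytes.toBytes(value) for an Int, so fetch row 42.
        val result = table.get(new Get(Bytes.toBytes(42)))
        val ctr = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("ctr"))
        println(s"cf:ctr for row 42 = ${Bytes.toInt(ctr)}")
        table.close()
      } finally {
        connection.close()
      }
    }
}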

This is a Spark bug: during job commit, HadoopMapReduceCommitProtocol builds its staging directory from the mapreduce.output.fileoutputformat.outputdir property, which TableOutputFormat (writing to HBase, not to files) never sets, so Path is constructed from a null string. Details:

https://issues.apache.org/jira/browse/SPARK-21549

Solution:

// IMPORTANT: this property must be set, or the job fails with
// "Can not create a Path from a null string"
hbaseConf.set("mapreduce.output.fileoutputformat.outputdir", "/tmp")
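
The same property can also be set through the standard Hadoop helper instead of its raw key; a sketch, assuming the job instance from the example above (it must run after Job.getInstance, since it writes to the job's configuration):

import org.apache.hadoop.fs.Path
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

// Sets mapreduce.output.fileoutputformat.outputdir on the job's configuration.
// Any writable path works: TableOutputFormat writes to HBase, and the directory
// is only used by the commit protocol for staging.
FileOutputFormat.setOutputPath(job, new Path("/tmp"))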

Reference:

https://github.com/hortonworks-spark/shc/issues/15
