Problem: Spark job reading HBase data and writing to ES fails with an error

Business workflow

First, a quick overview of our pipeline: the data lives in HBase and is exposed through an associated Hive table. A Spark job reads the Hive table into a DataFrame and calls the saveToEs method provided by the ES connector to write the DataFrame into Elasticsearch, as sketched below.
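A minimal sketch of this flow (assuming Spark 1.x with HiveContext and the elasticsearch-spark / ES-Hadoop connector; the Hive table name, ES index name, and ES address are placeholders, not values from the original post):

	import org.apache.spark.{SparkConf, SparkContext}
	import org.apache.spark.sql.hive.HiveContext
	import org.elasticsearch.spark.sql._   // adds saveToEs to DataFrame

	val conf = new SparkConf()
	  .setAppName("hive-to-es")
	  .set("es.nodes", "es-host")          // placeholder ES address
	  .set("es.port", "9200")
	val sc = new SparkContext(conf)
	val hiveContext = new HiveContext(sc)

	// The Hive table is associated with the HBase table, so each HBase
	// region generally becomes one input split, i.e. one Spark partition.
	val df = hiveContext.sql("SELECT * FROM some_db.hive_hbase_table")

	// Write the DataFrame to Elasticsearch through the connector.
	df.saveToEs("some_index/some_type")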

Error log
Job aborted due to stage failure: Task 2 in stage 0.0 failed 4 times, most recent failure: Lost task 2.3 in stage 0.0 (TID 5, hdh12, executor 8): java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
	at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:869)
	at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:127)
	at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:115)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1251)
	at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:129)
	at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:136)
	at org.apache.spark.storage.BlockManager.doGetLocal(BlockManager.scala:506)
	at org.apache.spark.storage.BlockManager.getLocal(BlockManager.scala:423)
	at org.apache.spark.storage.BlockManager.get(BlockManager.scala:658)
	at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:44)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:229)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

The error above indicates that a single Spark partition holds more than 2 GB of data. Spark memory-maps each cached block into a ByteBuffer, whose size is capped at Integer.MAX_VALUE bytes, so the task fails as soon as one partition crosses that limit.
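Besides the region-split fix described below, a complementary mitigation on the Spark side (not part of the original post) is to repartition the DataFrame before writing, so that no single partition block reaches the 2 GB limit:

	// Illustrative sketch: 200 is a placeholder; choose a partition count
	// that keeps each partition comfortably below 2 GB.
	val repartitioned = df.repartition(200)
	repartitioned.saveToEs("some_index/some_type")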

Solution

First, open the HBase UI and check the region distribution of the HBase table being read; it is obvious that the table has very few regions.
(Figure: region distribution of the table)
Manually split the regions by running the following in the hbase shell:
split 'name_table'
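After the split, each new region generally maps to its own input split, so the Spark job reads more and smaller partitions. A quick way to confirm this on the Spark side (a sketch reusing the df from above):

	// Each HBase region of the underlying table usually becomes one Spark
	// partition; after the split this count should increase.
	println(df.rdd.partitions.length)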
