Writing data from Spark into Kudu fails with: Caused by: java.lang.RuntimeException: PendingErrors overflowed. Failed to write at..

The error is as follows:

2021-02-01 17:11:13 ERROR TaskSetManager:73 - Task 0 in stage 4.0 failed 1 times; aborting job
2021-02-01 17:11:13 INFO  TaskSchedulerImpl:57 - Removed TaskSet 4.0, whose tasks have all completed, from pool 
2021-02-01 17:11:13 INFO  TaskSchedulerImpl:57 - Cancelling stage 4
2021-02-01 17:11:13 INFO  TaskSchedulerImpl:57 - Killing all running tasks in stage 4: Stage cancelled
2021-02-01 17:11:13 INFO  DAGScheduler:57 - ResultStage 4 (foreachPartition at KuduContext.scala:350) failed in 141.587 s due to Job aborted due to stage failure: Task 0 in stage 4.0 failed 1 times, most recent failure: Lost task 0.0 in stage 4.0 (TID 4, localhost, executor driver): java.lang.RuntimeException: PendingErrors overflowed. Failed to write at least 1000 rows to Kudu; Sample errors: Timed out: cannot complete before timeout: Batch{operations=256, tablet="4a02e65bac264694b14faeee409987d1" [0x00000002, 0x00000003), ignoredErrors=[], rpc=KuduRpc(method=Write, tablet=4a02e65bac264694b14faeee409987d1, attempt=23, TimeoutTracker(timeout=30000, elapsed=28936), Trace Summary(28936 ms): Sent(23), Received(23), Delayed(23), MasterRefresh(0), AuthRefresh(0), Truncated: false
 Sent: (49d6bbea2c404778a65f3738a72c6ae9, [ Write, 23 ])
 Received: (49d6bbea2c404778a65f3738a72c6ae9, [ SERVICE_UNAVAILABLE, 23 ])
 Delayed: (UNKNOWN, [ Write, 23 ]))}Timed out: cannot complete before timeout: Batch{operations=256, tablet="4a02e65bac264694b14faeee409987d1" [0x00000002, 0x00000003), ignoredErrors=[], rpc=KuduRpc(method=Write, tablet=4a02e65bac264694b14faeee409987d1, attempt=23, TimeoutTracker(timeout=30000, elapsed=28936), Trace Summary(28936 ms): Sent(23), Received(23), Delayed(23), MasterRefresh(0), AuthRefresh(0), Truncated: false
 Sent: (49d6bbea2c404778a65f3738a72c6ae9, [ Write, 23 ])
 Received: (49d6bbea2c404778a65f3738a72c6ae9, [ SERVICE_UNAVAILABLE, 23 ])
 Delayed: (UNKNOWN, [ Write, 23 ]))}Timed out: cannot complete before timeout: Batch{operations=256, tablet="4a02e65bac264694b14faeee409987d1" [0x00000002, 0x00000003), ignoredErrors=[], rpc=KuduRpc(method=Write, tablet=4a02e65bac264694b14faeee409987d1, attempt=23, TimeoutTracker(timeout=30000, elapsed=28936), Trace Summary(28936 ms): Sent(23), Received(23), Delayed(23), MasterRefresh(0), AuthRefresh(0), Truncated: false
 Sent: (49d6bbea2c404778a65f3738a72c6ae9, [ Write, 23 ])
 Received: (49d6bbea2c404778a65f3738a72c6ae9, [ SERVICE_UNAVAILABLE, 23 ])
 Delayed: (UNKNOWN, [ Write, 23 ]))}Timed out: cannot complete before timeout: Batch{operations=256, tablet="4a02e65bac264694b14faeee409987d1" [0x00000002, 0x00000003), ignoredErrors=[], rpc=KuduRpc(method=Write, tablet=4a02e65bac264694b14faeee409987d1, attempt=23, TimeoutTracker(timeout=30000, elapsed=28936), Trace Summary(28936 ms): Sent(23), Received(23), Delayed(23), MasterRefresh(0), AuthRefresh(0), Truncated: false
 Sent: (49d6bbea2c404778a65f3738a72c6ae9, [ Write, 23 ])
 Received: (49d6bbea2c404778a65f3738a72c6ae9, [ SERVICE_UNAVAILABLE, 23 ])
 Delayed: (UNKNOWN, [ Write, 23 ]))}Timed out: cannot complete before timeout: Batch{operations=256, tablet="4a02e65bac264694b14faeee409987d1" [0x00000002, 0x00000003), ignoredErrors=[], rpc=KuduRpc(method=Write, tablet=4a02e65bac264694b14faeee409987d1, attempt=23, TimeoutTracker(timeout=30000, elapsed=28936), Trace Summary(28936 ms): Sent(23), Received(23), Delayed(23), MasterRefresh(0), AuthRefresh(0), Truncated: false
 Sent: (49d6bbea2c404778a65f3738a72c6ae9, [ Write, 23 ])
 Received: (49d6bbea2c404778a65f3738a72c6ae9, [ SERVICE_UNAVAILABLE, 23 ])
 Delayed: (UNKNOWN, [ Write, 23 ]))}
	at org.apache.kudu.spark.kudu.KuduContext$$anonfun$writeRows$1.apply(KuduContext.scala:362)
	at org.apache.kudu.spark.kudu.KuduContext$$anonfun$writeRows$1.apply(KuduContext.scala:350)
	at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
	at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2121)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2121)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:121)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$11.apply(Executor.scala:407)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1408)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:413)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
2021-02-01 17:11:13 INFO  DAGScheduler:57 - Job 3 failed: foreachPartition at KuduContext.scala:350, took 141.590192 s
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 4.0 failed 1 times, most recent failure: Lost task 0.0 in stage 4.0 (TID 4, localhost, executor driver): java.lang.RuntimeException: PendingErrors overflowed. Failed to write at least 1000 rows to Kudu; Sample errors: Timed out: cannot complete before timeout: Batch{operations=256, tablet="4a02e65bac264694b14faeee409987d1" [0x00000002, 0x00000003), ignoredErrors=[], rpc=KuduRpc(method=Write, tablet=4a02e65bac264694b14faeee409987d1, attempt=23, TimeoutTracker(timeout=30000, elapsed=28936), Trace Summary(28936 ms): Sent(23), Received(23), Delayed(23), MasterRefresh(0), AuthRefresh(0), Truncated: false
 Sent: (49d6bbea2c404778a65f3738a72c6ae9, [ Write, 23 ])
 Received: (49d6bbea2c404778a65f3738a72c6ae9, [ SERVICE_UNAVAILABLE, 23 ])
 Delayed: (UNKNOWN, [ Write, 23 ]))}Timed out: cannot complete before timeout: Batch{operations=256, tablet="4a02e65bac264694b14faeee409987d1" [0x00000002, 0x00000003), ignoredErrors=[], rpc=KuduRpc(method=Write, tablet=4a02e65bac264694b14faeee409987d1, attempt=23, TimeoutTracker(timeout=30000, elapsed=28936), Trace Summary(28936 ms): Sent(23), Received(23), Delayed(23), MasterRefresh(0), AuthRefresh(0), Truncated: false
 Sent: (49d6bbea2c404778a65f3738a72c6ae9, [ Write, 23 ])
 Received: (49d6bbea2c404778a65f3738a72c6ae9, [ SERVICE_UNAVAILABLE, 23 ])
 Delayed: (UNKNOWN, [ Write, 23 ]))}Timed out: cannot complete before timeout: Batch{operations=256, tablet="4a02e65bac264694b14faeee409987d1" [0x00000002, 0x00000003), ignoredErrors=[], rpc=KuduRpc(method=Write, tablet=4a02e65bac264694b14faeee409987d1, attempt=23, TimeoutTracker(timeout=30000, elapsed=28936), Trace Summary(28936 ms): Sent(23), Received(23), Delayed(23), MasterRefresh(0), AuthRefresh(0), Truncated: false
 Sent: (49d6bbea2c404778a65f3738a72c6ae9, [ Write, 23 ])
 Received: (49d6bbea2c404778a65f3738a72c6ae9, [ SERVICE_UNAVAILABLE, 23 ])
 Delayed: (UNKNOWN, [ Write, 23 ]))}Timed out: cannot complete before timeout: Batch{operations=256, tablet="4a02e65bac264694b14faeee409987d1" [0x00000002, 0x00000003), ignoredErrors=[], rpc=KuduRpc(method=Write, tablet=4a02e65bac264694b14faeee409987d1, attempt=23, TimeoutTracker(timeout=30000, elapsed=28936), Trace Summary(28936 ms): Sent(23), Received(23), Delayed(23), MasterRefresh(0), AuthRefresh(0), Truncated: false
 Sent: (49d6bbea2c404778a65f3738a72c6ae9, [ Write, 23 ])
 Received: (49d6bbea2c404778a65f3738a72c6ae9, [ SERVICE_UNAVAILABLE, 23 ])
 Delayed: (UNKNOWN, [ Write, 23 ]))}Timed out: cannot complete before timeout: Batch{operations=256, tablet="4a02e65bac264694b14faeee409987d1" [0x00000002, 0x00000003), ignoredErrors=[], rpc=KuduRpc(method=Write, tablet=4a02e65bac264694b14faeee409987d1, attempt=23, TimeoutTracker(timeout=30000, elapsed=28936), Trace Summary(28936 ms): Sent(23), Received(23), Delayed(23), MasterRefresh(0), AuthRefresh(0), Truncated: false
 Sent: (49d6bbea2c404778a65f3738a72c6ae9, [ Write, 23 ])
 Received: (49d6bbea2c404778a65f3738a72c6ae9, [ SERVICE_UNAVAILABLE, 23 ])
 Delayed: (UNKNOWN, [ Write, 23 ]))}
	at org.apache.kudu.spark.kudu.KuduContext$$anonfun$writeRows$1.apply(KuduContext.scala:362)
	at org.apache.kudu.spark.kudu.KuduContext$$anonfun$writeRows$1.apply(KuduContext.scala:350)
	at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
	at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2121)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2121)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:121)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$11.apply(Executor.scala:407)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1408)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:413)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1890)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1878)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1877)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1877)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:929)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:929)
	at scala.Option.foreach(Option.scala:257)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:929)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2111)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2060)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2049)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:740)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2081)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2102)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2121)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2146)
	at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:935)
	at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:933)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
	at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:933)
	at org.apache.kudu.spark.kudu.KuduContext.writeRows(KuduContext.scala:350)
	at org.apache.kudu.spark.kudu.KuduContext.upsertRows(KuduContext.scala:291)
	at com.sparkel.rw.writer.kudu.KuduWriter.writerDatas(KuduWriter.java:100)
	at SparkEL.syncData(SparkEL.java:219)
	at SparkEL.main(SparkEL.java:255)
Caused by: java.lang.RuntimeException: PendingErrors overflowed. Failed to write at least 1000 rows to Kudu; Sample errors: Timed out: cannot complete before timeout: Batch{operations=256, tablet="4a02e65bac264694b14faeee409987d1" [0x00000002, 0x00000003), ignoredErrors=[], rpc=KuduRpc(method=Write, tablet=4a02e65bac264694b14faeee409987d1, attempt=23, TimeoutTracker(timeout=30000, elapsed=28936), Trace Summary(28936 ms): Sent(23), Received(23), Delayed(23), MasterRefresh(0), AuthRefresh(0), Truncated: false
 Sent: (49d6bbea2c404778a65f3738a72c6ae9, [ Write, 23 ])
 Received: (49d6bbea2c404778a65f3738a72c6ae9, [ SERVICE_UNAVAILABLE, 23 ])
 Delayed: (UNKNOWN, [ Write, 23 ]))}Timed out: cannot complete before timeout: Batch{operations=256, tablet="4a02e65bac264694b14faeee409987d1" [0x00000002, 0x00000003), ignoredErrors=[], rpc=KuduRpc(method=Write, tablet=4a02e65bac264694b14faeee409987d1, attempt=23, TimeoutTracker(timeout=30000, elapsed=28936), Trace Summary(28936 ms): Sent(23), Received(23), Delayed(23), MasterRefresh(0), AuthRefresh(0), Truncated: false
 Sent: (49d6bbea2c404778a65f3738a72c6ae9, [ Write, 23 ])
 Received: (49d6bbea2c404778a65f3738a72c6ae9, [ SERVICE_UNAVAILABLE, 23 ])
 Delayed: (UNKNOWN, [ Write, 23 ]))}Timed out: cannot complete before timeout: Batch{operations=256, tablet="4a02e65bac264694b14faeee409987d1" [0x00000002, 0x00000003), ignoredErrors=[], rpc=KuduRpc(method=Write, tablet=4a02e65bac264694b14faeee409987d1, attempt=23, TimeoutTracker(timeout=30000, elapsed=28936), Trace Summary(28936 ms): Sent(23), Received(23), Delayed(23), MasterRefresh(0), AuthRefresh(0), Truncated: false
 Sent: (49d6bbea2c404778a65f3738a72c6ae9, [ Write, 23 ])
 Received: (49d6bbea2c404778a65f3738a72c6ae9, [ SERVICE_UNAVAILABLE, 23 ])
 Delayed: (UNKNOWN, [ Write, 23 ]))}Timed out: cannot complete before timeout: Batch{operations=256, tablet="4a02e65bac264694b14faeee409987d1" [0x00000002, 0x00000003), ignoredErrors=[], rpc=KuduRpc(method=Write, tablet=4a02e65bac264694b14faeee409987d1, attempt=23, TimeoutTracker(timeout=30000, elapsed=28936), Trace Summary(28936 ms): Sent(23), Received(23), Delayed(23), MasterRefresh(0), AuthRefresh(0), Truncated: false
 Sent: (49d6bbea2c404778a65f3738a72c6ae9, [ Write, 23 ])
 Received: (49d6bbea2c404778a65f3738a72c6ae9, [ SERVICE_UNAVAILABLE, 23 ])
 Delayed: (UNKNOWN, [ Write, 23 ]))}Timed out: cannot complete before timeout: Batch{operations=256, tablet="4a02e65bac264694b14faeee409987d1" [0x00000002, 0x00000003), ignoredErrors=[], rpc=KuduRpc(method=Write, tablet=4a02e65bac264694b14faeee409987d1, attempt=23, TimeoutTracker(timeout=30000, elapsed=28936), Trace Summary(28936 ms): Sent(23), Received(23), Delayed(23), MasterRefresh(0), AuthRefresh(0), Truncated: false
 Sent: (49d6bbea2c404778a65f3738a72c6ae9, [ Write, 23 ])
 Received: (49d6bbea2c404778a65f3738a72c6ae9, [ SERVICE_UNAVAILABLE, 23 ])
 Delayed: (UNKNOWN, [ Write, 23 ]))}
	at org.apache.kudu.spark.kudu.KuduContext$$anonfun$writeRows$1.apply(KuduContext.scala:362)
	at org.apache.kudu.spark.kudu.KuduContext$$anonfun$writeRows$1.apply(KuduContext.scala:350)
	at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
	at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2121)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2121)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:121)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$11.apply(Executor.scala:407)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1408)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:413)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

The error message above contains several key clues: the "tablet" ID, the "SERVICE_UNAVAILABLE" status, and the string "49d6bbea2c404778a65f3738a72c6ae9".
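Those three clues can be pulled out of the wall of trace text mechanically. The following is a small sketch (the class and regexes are illustrative helpers, not part of any Kudu API) that extracts the tablet ID quoted after `tablet="…"` and the peer UUID plus status from a `Received: (uuid, [ STATUS, n ])` trace line:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KuduErrorClues {
    // Matches the tablet id quoted in the batch description, e.g. tablet="4a02e..."
    static final Pattern TABLET = Pattern.compile("tablet=\"([0-9a-f]+)\"");
    // Matches a trace line like: Received: (49d6..., [ SERVICE_UNAVAILABLE, 23 ])
    static final Pattern RECEIVED = Pattern.compile("Received: \\(([0-9a-f]+), \\[ ([A-Z_]+),");

    public static String tabletId(String trace) {
        Matcher m = TABLET.matcher(trace);
        return m.find() ? m.group(1) : null;
    }

    // Returns { peerUuid, status } from the first Received line, or null.
    public static String[] peerAndStatus(String trace) {
        Matcher m = RECEIVED.matcher(trace);
        return m.find() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String trace = "Batch{operations=256, tablet=\"4a02e65bac264694b14faeee409987d1\" "
                + "... Received: (49d6bbea2c404778a65f3738a72c6ae9, [ SERVICE_UNAVAILABLE, 23 ])";
        System.out.println(tabletId(trace));        // 4a02e65bac264694b14faeee409987d1
        String[] ps = peerAndStatus(trace);
        System.out.println(ps[0] + " -> " + ps[1]); // 49d6... -> SERVICE_UNAVAILABLE
    }
}
```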

That raised the question: is this UUID one of the table's tablet IDs? So I looked up the table's description in Kudu. (Note that in the trace, `tablet="4a02e65b…"` is the tablet ID itself, while `49d6bbea…` in the Sent/Received lines is the UUID of the tablet server that kept answering.)

From the above, one of the nodes hosting a replica of this Kudu table was in trouble: it kept responding SERVICE_UNAVAILABLE on all 23 retries until the 30-second operation timeout expired. It is enough to wait for the node to recover, or alternatively restart the affected Kudu node.

Rerunning the job afterwards, the data was written successfully:

2021-02-03 09:19:05 INFO  KuduContext:458 - applied 60036 upserts to table 'db.data_20210201' in 5332ms
2021-02-03 09:19:05 INFO  Executor:57 - Finished task 0.0 in stage 4.0 (TID 4). 37312 bytes result sent to driver
2021-02-03 09:19:05 INFO  TaskSetManager:57 - Finished task 0.0 in stage 4.0 (TID 4) in 7801 ms on localhost (executor driver) (1/1)
2021-02-03 09:19:05 INFO  TaskSchedulerImpl:57 - Removed TaskSet 4.0, whose tasks have all completed, from pool 
2021-02-03 09:19:05 INFO  DAGScheduler:57 - ResultStage 4 (foreachPartition at KuduContext.scala:350) finished in 7.807 s
2021-02-03 09:19:05 INFO  DAGScheduler:57 - Job 3 finished: foreachPartition at KuduContext.scala:350, took 7.808699 s
2021-02-03 09:19:05 INFO  KuduContext:371 - completed upsert ops: duration histogram: 5332ms
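In this case a manual rerun was enough, but since SERVICE_UNAVAILABLE is a transient condition, a production job can retry the write itself with backoff instead of failing outright. Below is a minimal sketch; `withRetries` and its parameters are illustrative, not part of the kudu-spark API, and `action` stands in for whatever performs the upsert (e.g. a call into `KuduContext.upsertRows`):

```java
import java.util.concurrent.Callable;

public class RetryingWrite {
    /**
     * Runs a write action, retrying with exponential backoff when it throws
     * (e.g. on PendingErrors overflowed / SERVICE_UNAVAILABLE timeouts).
     * Rethrows the last failure once maxAttempts is exhausted.
     */
    public static <T> T withRetries(Callable<T> action, int maxAttempts, long baseSleepMs)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (RuntimeException e) {
                last = e;
                // Back off 1x, 2x, 4x, ... to give the tablet server time to recover.
                Thread.sleep(baseSleepMs * (1L << (attempt - 1)));
            }
        }
        throw last;
    }
}
```

On the actual failure above the write was stuck for ~140 s before aborting, so a few widely spaced attempts (minutes apart) are more realistic than tight retries.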
