The problem
Colleagues in another department built a Spark Streaming application that consumes data from Kafka. After running for over a month, it stopped consuming data, yet the application stayed in the RUNNING state on YARN.
Opening the Spark UI via the ApplicationMaster revealed something odd: on the Streaming page, batches that had been processed every 4 seconds simply stopped executing at some point.
The stderr log showed:
Exception in thread "pool-23-thread-1" java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:717)
at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
at java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1025)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Exception in thread "JobGenerator" java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:717)
at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1367)
at org.apache.spark.streaming.CheckpointWriter.write(Checkpoint.scala:290)
at org.apache.spark.streaming.scheduler.JobGenerator.doCheckpoint(JobGenerator.scala:297)
at org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:186)
at org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:89)
at org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:88)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
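This flavor of OutOfMemoryError usually means the OS refused to create another native thread (a per-user process/thread limit was hit, or native memory was exhausted), not that the Java heap was full. On Linux, the relevant limits can be inspected like this; the `<pid>` in the commented-out command is a placeholder, not a value from this incident:

```shell
# Per-user limit on processes/threads; a low value here is a common culprit
ulimit -u

# System-wide ceiling on threads (Linux-specific)
cat /proc/sys/kernel/threads-max

# Number of native threads a running JVM currently holds
# (replace <pid> with the driver or executor process id):
# ps -o nlwp= -p <pid>
```

Comparing the JVM's live thread count against these limits over time can confirm whether a thread leak, rather than heap pressure, is what triggers the error.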
Strangely, the OutOfMemoryError on the executor did not fail the job as a whole, so the application that monitors job status never noticed the job had effectively died, and no alert was raised.
Job resource configuration
Executors: 1; memory per executor: 1 GB
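For reference, this sizing corresponds to the following spark-submit flags; the class name and jar are hypothetical placeholders, not the actual job's:

```shell
# Original (OOM-prone) sizing: a single executor with 1 GB of memory
# spark-submit --master yarn --deploy-mode cluster \
#   --num-executors 1 --executor-memory 1g \
#   --class com.example.StreamingJob app.jar

# Revised sizing per the fix below: raise executor memory to 2 GB
spark-submit --master yarn --deploy-mode cluster \
  --num-executors 1 --executor-memory 2g \
  --class com.example.StreamingJob app.jar
```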
Solution
1. Increase executor memory to 2 GB and resubmit the job.
2. Ask the code's developers to review it and figure out why the executor process died while the driver process did not.
My guess is that either the code kept the Error thrown by the executor from reaching the driver, or the driver received the executor's Error but did not terminate abnormally. In the other Spark Streaming jobs this team runs, exceptions thrown by executors at runtime have always caused the driver to fail.
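A minimal Java sketch (not the actual job's code) illustrating the second half of that guess: an Error thrown inside a thread-pool worker kills only that worker thread. The default uncaught-exception handler prints the stack trace to stderr, the pool replaces the dead worker, and the rest of the JVM keeps running, which is exactly the "stderr shows OutOfMemoryError but the process stays RUNNING" pattern seen here.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PoolErrorDemo {

    // Returns true if the calling ("driver") thread survives an Error
    // thrown inside a pool worker thread.
    public static boolean survivesPoolError() {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        // The Error kills only the worker thread; it is never rethrown
        // in the thread that called execute().
        pool.execute(() -> {
            throw new OutOfMemoryError("simulated: unable to create new native thread");
        });
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // Reached only because the Error did not propagate to this thread.
        return true;
    }

    public static void main(String[] args) {
        System.out.println("driver still alive: " + survivesPoolError());
    }
}
```

If the job's threads behave like this, one mitigation (an assumption, not something the source confirms) is a driver-side watchdog that checks batch progress and calls `StreamingContext.stop()` when batches stall, so the YARN state reflects reality.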