Notes on a Spark-Kafka connection error: org.apache.spark.SparkException: Couldn't find leaders for Set([bat_model_task,0])

The job had been running normally until a server fault took one of the Kafka brokers offline; after restarting, the job threw the following exception:

18/10/22 23:24:41 INFO YarnClientSchedulerBackend: Application application_1536983779148_0365 has started running.
18/10/22 23:24:41 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 44301.
18/10/22 23:24:41 INFO NettyBlockTransferService: Server created on 10.50.141.169:44301
18/10/22 23:24:41 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
18/10/22 23:24:41 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.50.141.169, 44301, None)
18/10/22 23:24:41 INFO BlockManagerMasterEndpoint: Registering block manager 10.50.141.169:44301 with 366.3 MB RAM, BlockManagerId(driver, 10.50.141.169, 44301, None)
18/10/22 23:24:41 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.50.141.169, 44301, None)
18/10/22 23:24:41 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.50.141.169, 44301, None)
18/10/22 23:24:41 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@6d294ddc{/metrics/json,null,AVAILABLE,@Spark}
18/10/22 23:24:44 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.50.141.229:50708) with ID 1
18/10/22 23:24:44 INFO BlockManagerMasterEndpoint: Registering block manager jiadun.slave02.com:42115 with 413.9 MB RAM, BlockManagerId(1, jiadun.slave02.com, 42115, None)
18/10/22 23:24:50 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.50.141.194:56506) with ID 2
18/10/22 23:24:50 INFO YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
18/10/22 23:24:50 INFO BlockManagerMasterEndpoint: Registering block manager jiadun.slave01.com:37485 with 413.9 MB RAM, BlockManagerId(2, jiadun.slave01.com, 37485, None)
18/10/22 23:24:50 INFO VerifiableProperties: Verifying properties
18/10/22 23:24:50 INFO VerifiableProperties: Property auto.offset.reset is overridden to largest
18/10/22 23:24:50 INFO VerifiableProperties: Property group.id is overridden to datainit_security_001
18/10/22 23:24:50 INFO VerifiableProperties: Property zookeeper.connect is overridden to 
Exception in thread "main" org.apache.spark.SparkException: org.apache.spark.SparkException: Couldn't find leaders for Set([bat_model_task,0])
	at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$checkErrors$1.apply(KafkaCluster.scala:385)
	at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$checkErrors$1.apply(KafkaCluster.scala:385)
	at scala.util.Either.fold(Either.scala:98)
	at org.apache.spark.streaming.kafka.KafkaCluster$.checkErrors(KafkaCluster.scala:384)
	at org.apache.spark.streaming.kafka.KafkaUtils$.getFromOffsets(KafkaUtils.scala:222)
	at org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:484)
	at com.jiadun.handler.SecurityModelMonitoring$.main(SecurityModelMonitoring.scala:52)
	at com.jiadun.handler.SecurityModelMonitoring.main(SecurityModelMonitoring.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:782)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
18/10/22 23:24:50 INFO SparkContext: Invoking stop() from shutdown hook
18/10/22 23:24:50 INFO AbstractConnector: Stopped Spark@40e4ea87{HTTP/1.1,[http/1.1]}{0.0.0.0:4044}
18/10/22 23:24:50 INFO SparkUI: Stopped Spark web UI at http://10.50.141.169:4044
18/10/22 23:24:50 INFO YarnClientSchedulerBackend: Interrupting monitor thread
18/10/22 23:24:50 INFO YarnClientSchedulerBackend: Shutting down all executors
18/10/22 23:24:50 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
18/10/22 23:24:50 INFO SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
 services=List(),
 started=false)
18/10/22 23:24:50 INFO YarnClientSchedulerBackend: Stopped
18/10/22 23:24:50 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/10/22 23:24:50 INFO MemoryStore: MemoryStore cleared
18/10/22 23:24:50 INFO BlockManager: BlockManager stopped
18/10/22 23:24:50 INFO BlockManagerMaster: BlockManagerMaster stopped
18/10/22 23:24:50 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/10/22 23:24:50 INFO SparkContext: Successfully stopped SparkContext
18/10/22 23:24:50 INFO ShutdownHookManager: Shutdown hook called
18/10/22 23:24:50 INFO ShutdownHookManager: Deleting directory /tmp/spark-4684bd19-c498-4fff

The exception means Spark could not find a leader for the partition. Checking the monitoring showed that a broker had gone down at exactly the time the exception occurred. But the topic's replication factor was 2, so even with one broker down, a replica should have taken over as leader. It turned out that some partition replicas had not kept up with their leader's updates — they had "fallen behind" (dropped out of the in-sync replica set), and an out-of-sync replica is not eligible for leader election. To check whether a topic has replicas that have fallen behind:


kafka-topics.sh --describe --zookeeper XXX --topic XXX


Compare the Replicas and Isr columns in the output: if Isr lists fewer brokers than Replicas, that partition has replicas that have fallen behind.
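The topic tool can also filter for lagging partitions directly. A minimal check, assuming the same ZooKeeper connect string as above (the `--under-replicated-partitions` flag is part of the stock `kafka-topics.sh` tool):

```shell
# Print only the partitions whose Isr is smaller than Replicas,
# i.e. partitions with replicas that have fallen behind the leader.
kafka-topics.sh --describe --zookeeper XXX --under-replicated-partitions
```

If this prints nothing, every replica is in sync.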

Solution:
Increase num.replica.fetchers. This broker-side parameter sets the number of threads a broker uses to fetch data from partition leaders; the default is 1, so raising it increases replication I/O parallelism. After testing with a larger value, replicas no longer fell behind. I set it to the number of brokers in the cluster.
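As a sketch, the change goes into each broker's `server.properties`, followed by a rolling restart of the brokers (the value 3 here is an assumption standing in for "number of brokers"; use your own cluster size):

```properties
# server.properties on every broker
# Number of fetcher threads a broker uses to replicate messages from
# partition leaders. Default is 1; raising it increases replication
# throughput so followers can keep up with the leader.
num.replica.fetchers=3
```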

Verifying the fix:
Start the Spark Streaming job that hit the problem, and once it is processing batches normally, kill any one broker and observe. Before increasing the fetcher thread count, killing a broker reproduced the same exception; after the change, the job kept running normally — problem solved.
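The kill test above can be sketched roughly as follows (host names and script paths are assumptions; adjust for your deployment):

```shell
# 1. Submit the streaming job and wait until batches complete normally.
# 2. Stop one broker, e.g. on an arbitrary broker host:
ssh broker01 '/opt/kafka/bin/kafka-server-stop.sh'
# 3. Watch the driver log / Streaming UI:
#    - before the fix: the job dies with "Couldn't find leaders for Set([...])"
#    - after raising num.replica.fetchers: batches keep completing
# 4. Restart the broker and confirm the Isr catches back up:
kafka-topics.sh --describe --zookeeper XXX --topic bat_model_task
```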
 
