Scenario: Kafka version kafka_2.10-0.8.1.1. A Spark streaming job runs normally at first, but after a while it starts reporting the error below. Data correctness is not affected, but the real-time job is severely slowed down.
[Stage 46825:=========================================> (3 + 1) / 4]17/11/04 23:14:23 WARN TaskSetManager: Lost task 0.0 in stage 46825.0 (TID 1631417, hadoopslave1): java.nio.channels.ClosedChannelException
        at kafka.network.BlockingChannel.send(BlockingChannel.scala:100)
        at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:78)
        at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:68)
        at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SimpleConsumer.scala:112)
        at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:112)
        at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:112)
        at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
        at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply$mcV$sp(SimpleConsumer.scala:111)
        at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:111)
        at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:111)
Analysis:
Looking at the Kafka source, in kafka.network.BlockingChannel, we can see that at the point where this exception is thrown nothing has been done yet, so it only slows the real-time job down and does not affect data correctness:
def send(request: RequestOrResponse): Int = {
  if (!connected)
    throw new ClosedChannelException()
  val send = new BoundedByteBufferSend(request)
  send.writeCompletely(writeChannel)
}
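The reasoning above can be sketched as a small retry example. This is not Kafka or Spark code; RetryDemo and retryOnClosedChannel are hypothetical names for illustration. Because send throws before writing anything, a caller can safely retry the whole request without duplicating data, but every retry adds latency, which matches the observed slowdown:

```java
import java.nio.channels.ClosedChannelException;
import java.util.concurrent.Callable;

public class RetryDemo {

    // Hypothetical helper (not part of Kafka or Spark): retry an operation
    // that may fail with ClosedChannelException. Retrying is safe in this
    // scenario precisely because BlockingChannel.send throws *before*
    // writing anything, so a repeated attempt never duplicates a request.
    static <T> T retryOnClosedChannel(int maxAttempts, long backoffMs, Callable<T> op) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (ClosedChannelException e) {
                if (attempt >= maxAttempts) throw e;
                Thread.sleep(backoffMs); // each retry adds latency: the observed slowdown
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulated fetch that fails twice before succeeding.
        int[] calls = {0};
        String result = retryOnClosedChannel(5, 10, () -> {
            if (++calls[0] < 3) throw new ClosedChannelException();
            return "fetched";
        });
        System.out.println(result + " after " + calls[0] + " attempts"); // fetched after 3 attempts
    }
}
```

This is also why the task eventually succeeds with correct data: the failed attempt is re-run from scratch rather than resumed mid-write.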
So what is causing the broken connections to Kafka? From the available material, the following parameters in Kafka's server.properties can be responsible:

num.network.threads=2                    # number of threads handling network requests
zookeeper.connection.timeout.ms=1000000  # timeout for Kafka's connection to ZooKeeper; if it is too large, one failed connection attempt blocks for a very long time
Change them to:

num.network.threads=3
zookeeper.connection.timeout.ms=6000
After restarting Kafka, the error no longer appears, and each real-time batch now takes about 1/10 of the time it used to, roughly a 10x performance improvement.
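For reference, the config edit can be scripted; a minimal sketch, using a temporary copy of server.properties so the snippet is self-contained. In practice, point CONF at your real broker config (for example /opt/kafka/config/server.properties) and restart the broker afterwards:

```shell
# Sketch: apply the two settings with sed. A fabricated temporary copy is
# used here for illustration only.
CONF=/tmp/server.properties
printf 'num.network.threads=2\nzookeeper.connection.timeout.ms=1000000\n' > "$CONF"

sed -i 's/^num\.network\.threads=.*/num.network.threads=3/' "$CONF"
sed -i 's/^zookeeper\.connection\.timeout\.ms=.*/zookeeper.connection.timeout.ms=6000/' "$CONF"
cat "$CONF"

# Then restart the broker with the standard scripts shipped with Kafka, e.g.:
#   bin/kafka-server-stop.sh
#   bin/kafka-server-start.sh config/server.properties &
```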