How to stop a Spark stream when the data source runs out


I have a Spark Streaming job that reads from Kafka every 5 seconds, does some transformations on the incoming data, and writes to the file system.

This doesn't really need to be a streaming job; really, I just want to run it once a day to drain the messages onto the file system. But I'm not sure how to stop the job.

If I pass a timeout to streamingContext.awaitTermination, it doesn't stop the process; all it does is cause the process to throw errors when it comes time to iterate the stream (see the error below).
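Concretely, the failing attempt looked roughly like this (a minimal sketch reconstructed from the description above; the 10-second timeout is the value used later in the post):

ssc.start()
# awaitTermination(10) returns after about 10 seconds, but the receivers
# and the job generator keep running, so the process does not actually stop.
ssc.awaitTermination(10)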

What is the best way to do what I'm trying to do?

This is for Spark 1.6 on Python.

EDIT:

Thanks to @marios, the solution was:

ssc.start()
ssc.awaitTermination(10)
ssc.stop()

which runs the script for 10 seconds before stopping it.
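Putting it together, a minimal sketch of the shutdown sequence (the stopSparkContext and stopGraceFully arguments are my addition, not part of the original answer; stopGraceFully=True asks Spark to finish processing the data it has already received before shutting down):

ssc.start()
ssc.awaitTermination(10)  # let the stream run for ~10 seconds
# Stop the StreamingContext; also stop the underlying SparkContext,
# and wait for already-received data to be processed first.
ssc.stop(stopSparkContext=True, stopGraceFully=True)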

Simplified code:

from pyspark import SparkConf, SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

conf = SparkConf().setAppName("Vehicle Data Consolidator").set('spark.files.overwrite', 'true')
sc = SparkContext(conf=conf)
ssc = StreamingContext(sc, 5)  # 5-second batch interval

# kafkaParams and topicPartitions are defined elsewhere in the real job
stream = KafkaUtils.createStream(
    ssc,
    kafkaParams["zookeeper.connect"],
    "vehicle-data-importer",
    topicPartitions,
    kafkaParams)

stream.saveAsTextFiles('stream-output/kafka-vehicle-data')

ssc.start()
ssc.awaitTermination(10)

Error:

16/01/29 15:05:44 INFO BlockManagerInfo: Added input-0-1454097944200 in memory on localhost:58960 (size: 3.0 MB, free: 48.1 MB)
16/01/29 15:05:44 WARN BlockManager: Block input-0-1454097944200 replicated to only 0 peer(s) instead of 1 peers
16/01/29 15:05:44 INFO BlockGenerator: Pushed block input-0-1454097944200
16/01/29 15:05:45 ERROR JobScheduler: Error generating jobs for time 1454097945000 ms
py4j.Py4JException: Cannot obtain a new communication channel
    at py4j.CallbackClient.sendCommand(CallbackClient.java:232)
    at py4j.reflection.PythonProxyHandler.invoke(PythonProxyHandler.java:111)
    at com.sun.proxy.$Proxy14.call(Unknown Source)
    at org.apache.spark.streaming.api.python.TransformFunction.callPythonTransformFunction(PythonDStream.scala:92)
    at org.apache.spark.streaming.api.python.TransformFunction.apply(PythonDStream.scala:78)
    at org.apache.spark.streaming.api.python.PythonTransformedDStream.compute(PythonDStream.scala:230)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:352)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:352)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:351)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:351)
    at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:426)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:346)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:344)
    at scala.Option.orElse(Option.scala:257)
    at org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:341)
    at org.apache.spark.streaming.dstream.ForEachDStream.generateJob(ForEachDStream.scala:47)
    at org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:115)
    at org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:114)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
    at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
    at org.apache.spark.streaming.DStreamGraph.generateJobs(DStreamGraph.scala:114)
    at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$3.apply(JobGenerator.scala:248)
    at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$3.apply(JobGenerator.scala:246)
    at scala.util.Try$.apply(Try.scala:161)
    at org.apache.spark.streaming.scheduler.JobGenerator.generateJobs(JobGenerator.scala:246)
    at org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:181)
    at org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:87)
    at org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:86)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
16/01/29 15:05:45 INFO MemoryStore: Block input-0-1454097944800 stored as bytes in memory (estimated size 3.0 MB, free 466.1 MB)
16/01/29 15:05:45 INFO BlockManagerInfo: Added input-0-1454097944800 in memory on localhost:58960 (size: 3.0 MB, free: 45.1 MB)
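As a side note, PySpark also offers awaitTerminationOrTimeout, which returns False when the timeout elapses while the stream is still running; this makes the "run for a fixed window, then stop" intent explicit (a sketch assuming the same ssc as above; the graceful-stop arguments are again my assumption):

ssc.start()
# Block for up to 10 seconds; returns False if the stream is still running.
if not ssc.awaitTerminationOrTimeout(10):
    # Timeout hit: shut down the stream (and the SparkContext) gracefully.
    ssc.stop(stopSparkContext=True, stopGraceFully=True)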
