


package SparkStream

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

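/**
  * Created by admin on 2019/3/21.
  * Purpose: demonstrate ordinary use of Spark Streaming.
  */
object SparkStreaming {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("stop spark streaming").setMaster("local[2]")
    // One batch every 5 seconds.
    val ssc = new StreamingContext(conf, Seconds(5))
    // textFileStream monitors a directory for newly created files.
    val file = ssc.textFileStream("C://test//stu1.txt")
    val res = file.map { line =>
      val arr = line.split("\\|")
      //arr(0) + "888888" + arr(2)
      // ASSUMPTION: the rest of this listing is missing from the source. The key/value
      // pair, the shuffle and the output prefix below are an assumed minimal completion.
      (arr(0), 1)
    }.reduceByKey(_ + _)
    res.saveAsTextFiles("c://test//result")
    ssc.start()
    ssc.awaitTermination()
  }
}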


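How can a Spark Streaming job be stopped gracefully? That is the focus of this post. The rough idea: instead of blocking forever, the driver waits with awaitTerminationOrTimeout, checks after each wait whether a marker path exists, and calls ssc.stop(true, true) once the marker appears.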


package SparkStream

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

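/**
  * Created by admin on 2019/3/21.
  * Purpose: demonstrate how to stop Spark Streaming gracefully.
  */
object StopSparkStreaming {
  // Path whose existence signals that the job should shut down.
  val shutdownMarker = "c://test//source1"
  var stopFlag: Boolean = false

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("stop spark streaming").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))
    // textFileStream monitors a directory for newly created files.
    val file = ssc.textFileStream("C://test//stu1.txt")
    val res = file.map { line =>
      val arr = line.split("\\|")
      //arr(0) + "888888" + arr(2)
      // ASSUMPTION: this part of the listing is missing from the source. The key/value
      // pair and the shuffle are an assumed minimal completion; the log below shows a
      // shuffle stage followed by saveAsTextFiles writing to c:/test/result-<batch time>.
      (arr(0), 1)
    }.reduceByKey(_ + _)
    res.saveAsTextFiles("c://test//result")
    ssc.start()

    val checkIntervalMillis = 10000
    var isStopped = false
    while (!isStopped) {
      println("calling awaitTerminationOrTimeout")
      // Returns true once the context has stopped, false if the timeout elapsed first.
      isStopped = ssc.awaitTerminationOrTimeout(checkIntervalMillis)
      if (isStopped) {
        println("confirmed! The streaming context is stopped. Exiting application...")
      } else {
        println("Streaming App is still running. Timeout...")
      }
      checkShutdownMarker()
      if (!isStopped && stopFlag) {
        println("stopping ssc right now")
        // First argument: also stop the SparkContext. Second argument: stop gracefully.
        ssc.stop(true, true)
        println("ssc is stopped!!!!!!!")
      }
    }
  }

  // Looks for the marker path until it has been seen once.
  def checkShutdownMarker(): Unit = {
    if (!stopFlag) {
      val fs = FileSystem.get(new Configuration())
      stopFlag = fs.exists(new Path(shutdownMarker))
    }
  }
}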
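
The listing only reads the marker; something outside the job has to create it to request a shutdown. The following is a minimal sketch of that check-and-trigger step, not the author's code. It assumes the local filesystem and uses java.nio in place of Hadoop's FileSystem so that it runs without Spark or Hadoop on the classpath. The object name MarkerDemo and the temporary directory are hypothetical.

```scala
import java.nio.file.{Files, Path}

// Hypothetical stand-in for the marker check, using java.nio instead of Hadoop's
// FileSystem so that it runs without any Spark or Hadoop dependency.
object MarkerDemo {
  // Once the flag is true, the filesystem is no longer consulted.
  @volatile var stopFlag: Boolean = false

  def checkShutdownMarker(marker: Path): Unit =
    if (!stopFlag) stopFlag = Files.exists(marker)

  def main(args: Array[String]): Unit = {
    // A temporary directory keeps the demo away from real paths such as c://test.
    val dir = Files.createTempDirectory("stop-demo")
    val marker = dir.resolve("source1")

    checkShutdownMarker(marker)
    println(s"before creating marker: stopFlag = $stopFlag")

    // This is the step an operator would perform by hand to request a shutdown.
    Files.createDirectory(marker)
    checkShutdownMarker(marker)
    println(s"after creating marker: stopFlag = $stopFlag")
  }
}
```

Because the flag is only ever set and never cleared, deleting the marker afterwards does not cancel a shutdown that has already been requested.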


Streaming App is still running. Timeout...
calling awaitTerminationOrTimeout
19/03/21 15:48:40 INFO InputInfoTracker: remove old batch metadata: 
Streaming App is still running. Timeout...
calling awaitTerminationOrTimeout
19/03/21 15:48:50 INFO InputInfoTracker: remove old batch metadata: 
Streaming App is still running. Timeout...
calling awaitTerminationOrTimeout
19/03/21 15:49:00 INFO InputInfoTracker: remove old batch metadata: 
Streaming App is still running. Timeout...
calling awaitTerminationOrTimeout
19/03/21 15:49:10 INFO InputInfoTracker: remove old batch metadata: 
Streaming App is still running. Timeout...
calling awaitTerminationOrTimeout
19/03/21 15:49:20 INFO InputInfoTracker: remove old batch metadata: 
Streaming App is still running. Timeout...
calling awaitTerminationOrTimeout
19/03/21 15:49:30 INFO InputInfoTracker: remove old batch metadata: 1553154505000 ms
Streaming App is still running. Timeout...
calling awaitTerminationOrTimeout
19/03/21 15:49:40 INFO InputInfoTracker: remove old batch metadata: 1553154515000 ms
Streaming App is still running. Timeout...
calling awaitTerminationOrTimeout
19/03/21 15:49:50 INFO InputInfoTracker: remove old batch metadata: 1553154525000 ms
Streaming App is still running. Timeout...
calling awaitTerminationOrTimeout
19/03/21 15:50:00 INFO InputInfoTracker: remove old batch metadata: 1553154535000 ms
19/03/21 15:50:00 INFO BlockManager: Removing RDD 35
Streaming App is still running. Timeout...
calling awaitTerminationOrTimeout
19/03/21 15:50:10 INFO InputInfoTracker: remove old batch metadata: 1553154545000 ms
Streaming App is still running. Timeout...
calling awaitTerminationOrTimeout
19/03/21 15:50:20 INFO InputInfoTracker: remove old batch metadata: 1553154555000 ms
Streaming App is still running. Timeout...
calling awaitTerminationOrTimeout
19/03/21 15:50:30 INFO InputInfoTracker: remove old batch metadata: 1553154565000 ms
Streaming App is still running. Timeout...
stopping ssc right now
19/03/21 15:50:32 INFO JobGenerator: Stopping JobGenerator gracefully
19/03/21 15:50:32 INFO JobGenerator: Waiting for all received blocks to be consumed for job generation
19/03/21 15:50:32 INFO JobGenerator: Waited for all received blocks to be consumed for job generation
19/03/21 15:50:35 INFO FileInputDStream: Finding new files took 2 ms
19/03/21 15:50:35 INFO FileInputDStream: New files at time 1553154635000 ms:

19/03/21 15:50:35 INFO JobScheduler: Added jobs for time 1553154635000 ms
19/03/21 15:50:35 INFO JobScheduler: Starting job streaming job 1553154635000 ms.0 from job set of time 1553154635000 ms
19/03/21 15:50:35 INFO SparkContext: Starting job: saveAsTextFiles at StopSparkStreaming.scala:28
19/03/21 15:50:35 INFO DAGScheduler: Registering RDD 132 (map at StopSparkStreaming.scala:23)
19/03/21 15:50:35 INFO DAGScheduler: Got job 26 (saveAsTextFiles at StopSparkStreaming.scala:28) with 2 output partitions
19/03/21 15:50:35 INFO DAGScheduler: Final stage: ResultStage 53 (saveAsTextFiles at StopSparkStreaming.scala:28)
19/03/21 15:50:35 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 52)
19/03/21 15:50:35 INFO DAGScheduler: Missing parents: List()
19/03/21 15:50:35 INFO DAGScheduler: Submitting ResultStage 53 (MapPartitionsRDD[134] at saveAsTextFiles at StopSparkStreaming.scala:28), which has no missing parents
19/03/21 15:50:35 INFO MemoryStore: Block broadcast_26 stored as values in memory (estimated size 64.5 KB, free 324.6 KB)
19/03/21 15:50:35 INFO MemoryStore: Block broadcast_26_piece0 stored as bytes in memory (estimated size 22.2 KB, free 346.8 KB)
19/03/21 15:50:35 INFO BlockManagerInfo: Added broadcast_26_piece0 in memory on localhost:50448 (size: 22.2 KB, free: 1121.9 MB)
19/03/21 15:50:35 INFO SparkContext: Created broadcast 26 from broadcast at DAGScheduler.scala:1006
19/03/21 15:50:35 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 53 (MapPartitionsRDD[134] at saveAsTextFiles at StopSparkStreaming.scala:28)
19/03/21 15:50:35 INFO TaskSchedulerImpl: Adding task set 53.0 with 2 tasks
19/03/21 15:50:35 INFO TaskSetManager: Starting task 0.0 in stage 53.0 (TID 52, localhost, partition 0,PROCESS_LOCAL, 1894 bytes)
19/03/21 15:50:35 INFO TaskSetManager: Starting task 1.0 in stage 53.0 (TID 53, localhost, partition 1,PROCESS_LOCAL, 1894 bytes)
19/03/21 15:50:35 INFO Executor: Running task 0.0 in stage 53.0 (TID 52)
19/03/21 15:50:35 INFO Executor: Running task 1.0 in stage 53.0 (TID 53)
19/03/21 15:50:35 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 0 blocks
19/03/21 15:50:35 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
19/03/21 15:50:35 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 0 blocks
19/03/21 15:50:35 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
19/03/21 15:50:35 INFO FileOutputCommitter: Saved output of task 'attempt_201903211550_0053_m_000001_53' to file:/c:/test/result-1553154635000/_temporary/0/task_201903211550_0053_m_000001
19/03/21 15:50:35 INFO SparkHadoopMapRedUtil: attempt_201903211550_0053_m_000001_53: Committed
19/03/21 15:50:35 INFO Executor: Finished task 1.0 in stage 53.0 (TID 53). 2080 bytes result sent to driver
19/03/21 15:50:35 INFO TaskSetManager: Finished task 1.0 in stage 53.0 (TID 53) in 277 ms on localhost (1/2)
19/03/21 15:50:35 INFO FileOutputCommitter: Saved output of task 'attempt_201903211550_0053_m_000000_52' to file:/c:/test/result-1553154635000/_temporary/0/task_201903211550_0053_m_000000
19/03/21 15:50:35 INFO SparkHadoopMapRedUtil: attempt_201903211550_0053_m_000000_52: Committed
19/03/21 15:50:35 INFO Executor: Finished task 0.0 in stage 53.0 (TID 52). 2080 bytes result sent to driver
19/03/21 15:50:35 INFO TaskSetManager: Finished task 0.0 in stage 53.0 (TID 52) in 335 ms on localhost (2/2)
19/03/21 15:50:35 INFO TaskSchedulerImpl: Removed TaskSet 53.0, whose tasks have all completed, from pool 
19/03/21 15:50:35 INFO DAGScheduler: ResultStage 53 (saveAsTextFiles at StopSparkStreaming.scala:28) finished in 0.335 s
19/03/21 15:50:35 INFO DAGScheduler: Job 26 finished: saveAsTextFiles at StopSparkStreaming.scala:28, took 0.351543 s
19/03/21 15:50:35 INFO JobScheduler: Finished job streaming job 1553154635000 ms.0 from job set of time 1553154635000 ms
19/03/21 15:50:35 INFO JobScheduler: Total delay: 0.680 s for time 1553154635000 ms (execution: 0.630 s)
19/03/21 15:50:35 INFO ShuffledRDD: Removing RDD 128 from persistence list
19/03/21 15:50:35 INFO BlockManager: Removing RDD 128
19/03/21 15:50:35 INFO MapPartitionsRDD: Removing RDD 127 from persistence list
19/03/21 15:50:35 INFO BlockManager: Removing RDD 127
19/03/21 15:50:35 INFO MapPartitionsRDD: Removing RDD 126 from persistence list
19/03/21 15:50:35 INFO BlockManager: Removing RDD 126
19/03/21 15:50:35 INFO UnionRDD: Removing RDD 70 from persistence list
19/03/21 15:50:35 INFO BlockManager: Removing RDD 70
19/03/21 15:50:35 INFO FileInputDStream: Cleared 1 old files that were older than 1553154575000 ms: 1553154570000 ms
19/03/21 15:50:35 INFO ReceivedBlockTracker: Deleting batches ArrayBuffer()
19/03/21 15:50:35 INFO InputInfoTracker: remove old batch metadata: 1553154570000 ms
19/03/21 15:50:40 INFO RecurringTimer: Stopped timer for JobGenerator after time 1553154640000
19/03/21 15:50:40 INFO FileInputDStream: Finding new files took 5 ms
19/03/21 15:50:40 INFO FileInputDStream: New files at time 1553154640000 ms:

19/03/21 15:50:40 INFO JobScheduler: Added jobs for time 1553154640000 ms
19/03/21 15:50:40 INFO JobScheduler: Starting job streaming job 1553154640000 ms.0 from job set of time 1553154640000 ms
19/03/21 15:50:40 INFO JobGenerator: Stopped generation timer
19/03/21 15:50:40 INFO JobGenerator: Waiting for jobs to be processed and checkpoints to be written
19/03/21 15:50:40 INFO SparkContext: Starting job: saveAsTextFiles at StopSparkStreaming.scala:28
19/03/21 15:50:40 INFO DAGScheduler: Registering RDD 137 (map at StopSparkStreaming.scala:23)
19/03/21 15:50:40 INFO DAGScheduler: Got job 27 (saveAsTextFiles at StopSparkStreaming.scala:28) with 2 output partitions
19/03/21 15:50:40 INFO DAGScheduler: Final stage: ResultStage 55 (saveAsTextFiles at StopSparkStreaming.scala:28)
19/03/21 15:50:40 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 54)
19/03/21 15:50:40 INFO DAGScheduler: Missing parents: List()
19/03/21 15:50:40 INFO DAGScheduler: Submitting ResultStage 55 (MapPartitionsRDD[139] at saveAsTextFiles at StopSparkStreaming.scala:28), which has no missing parents
19/03/21 15:50:40 INFO MemoryStore: Block broadcast_27 stored as values in memory (estimated size 64.5 KB, free 411.2 KB)
19/03/21 15:50:40 INFO MemoryStore: Block broadcast_27_piece0 stored as bytes in memory (estimated size 22.2 KB, free 433.4 KB)
19/03/21 15:50:40 INFO BlockManagerInfo: Added broadcast_27_piece0 in memory on localhost:50448 (size: 22.2 KB, free: 1121.9 MB)
19/03/21 15:50:40 INFO SparkContext: Created broadcast 27 from broadcast at DAGScheduler.scala:1006
19/03/21 15:50:40 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 55 (MapPartitionsRDD[139] at saveAsTextFiles at StopSparkStreaming.scala:28)
19/03/21 15:50:40 INFO TaskSchedulerImpl: Adding task set 55.0 with 2 tasks
19/03/21 15:50:40 INFO TaskSetManager: Starting task 0.0 in stage 55.0 (TID 54, localhost, partition 0,PROCESS_LOCAL, 1894 bytes)
19/03/21 15:50:40 INFO TaskSetManager: Starting task 1.0 in stage 55.0 (TID 55, localhost, partition 1,PROCESS_LOCAL, 1894 bytes)
19/03/21 15:50:40 INFO Executor: Running task 1.0 in stage 55.0 (TID 55)
19/03/21 15:50:40 INFO Executor: Running task 0.0 in stage 55.0 (TID 54)
19/03/21 15:50:40 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 0 blocks
19/03/21 15:50:40 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
19/03/21 15:50:40 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 0 blocks
19/03/21 15:50:40 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
19/03/21 15:50:40 INFO FileOutputCommitter: Saved output of task 'attempt_201903211550_0055_m_000001_55' to file:/c:/test/result-1553154640000/_temporary/0/task_201903211550_0055_m_000001
19/03/21 15:50:40 INFO SparkHadoopMapRedUtil: attempt_201903211550_0055_m_000001_55: Committed
19/03/21 15:50:40 INFO Executor: Finished task 1.0 in stage 55.0 (TID 55). 2080 bytes result sent to driver
19/03/21 15:50:40 INFO TaskSetManager: Finished task 1.0 in stage 55.0 (TID 55) in 303 ms on localhost (1/2)
19/03/21 15:50:40 INFO FileOutputCommitter: Saved output of task 'attempt_201903211550_0055_m_000000_54' to file:/c:/test/result-1553154640000/_temporary/0/task_201903211550_0055_m_000000
19/03/21 15:50:40 INFO SparkHadoopMapRedUtil: attempt_201903211550_0055_m_000000_54: Committed
19/03/21 15:50:40 INFO Executor: Finished task 0.0 in stage 55.0 (TID 54). 2080 bytes result sent to driver
19/03/21 15:50:40 INFO TaskSetManager: Finished task 0.0 in stage 55.0 (TID 54) in 359 ms on localhost (2/2)
19/03/21 15:50:40 INFO TaskSchedulerImpl: Removed TaskSet 55.0, whose tasks have all completed, from pool 
19/03/21 15:50:40 INFO DAGScheduler: ResultStage 55 (saveAsTextFiles at StopSparkStreaming.scala:28) finished in 0.361 s
19/03/21 15:50:40 INFO DAGScheduler: Job 27 finished: saveAsTextFiles at StopSparkStreaming.scala:28, took 0.378762 s
19/03/21 15:50:40 INFO JobScheduler: Finished job streaming job 1553154640000 ms.0 from job set of time 1553154640000 ms
19/03/21 15:50:40 INFO JobScheduler: Total delay: 0.772 s for time 1553154640000 ms (execution: 0.741 s)
19/03/21 15:50:40 INFO ShuffledRDD: Removing RDD 133 from persistence list
19/03/21 15:50:40 INFO MapPartitionsRDD: Removing RDD 132 from persistence list
19/03/21 15:50:40 INFO BlockManager: Removing RDD 132
19/03/21 15:50:40 INFO BlockManager: Removing RDD 133
19/03/21 15:50:40 INFO MapPartitionsRDD: Removing RDD 131 from persistence list
19/03/21 15:50:40 INFO BlockManager: Removing RDD 131
19/03/21 15:50:40 INFO UnionRDD: Removing RDD 75 from persistence list
19/03/21 15:50:40 INFO FileInputDStream: Cleared 1 old files that were older than 1553154580000 ms: 1553154575000 ms
19/03/21 15:50:40 INFO ReceivedBlockTracker: Deleting batches ArrayBuffer()
19/03/21 15:50:40 INFO InputInfoTracker: remove old batch metadata: 1553154575000 ms
19/03/21 15:50:40 INFO BlockManager: Removing RDD 75
19/03/21 15:50:40 INFO JobGenerator: Waited for jobs to be processed and checkpoints to be written
19/03/21 15:50:40 INFO JobGenerator: Stopped JobGenerator
19/03/21 15:50:40 INFO JobScheduler: Stopped JobScheduler
19/03/21 15:50:40 INFO StreamingContext: StreamingContext stopped successfully
19/03/21 15:50:40 INFO SparkUI: Stopped Spark web UI at
19/03/21 15:50:40 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
19/03/21 15:50:41 INFO MemoryStore: MemoryStore cleared
19/03/21 15:50:41 INFO BlockManager: BlockManager stopped
19/03/21 15:50:41 INFO BlockManagerMaster: BlockManagerMaster stopped
19/03/21 15:50:41 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
19/03/21 15:50:41 INFO SparkContext: Successfully stopped SparkContext
ssc is stopped!!!!!!!
calling awaitTerminationOrTimeout
confirmed! The streaming context is stopped. Exiting application...
19/03/21 15:50:41 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
19/03/21 15:50:41 INFO ShutdownHookManager: Shutdown hook called
19/03/21 15:50:41 INFO ShutdownHookManager: Deleting directory C:\Users\admin\AppData\Local\Temp\spark-ec9347c3-5dbd-4f27-a52f-40e71d565521
19/03/21 15:50:41 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.

Process finished with exit code 0

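As the log shows, from "19/03/21 15:49:10" to "19/03/21 15:50:30" the driver checked every 10 seconds whether the marker path held in shutdownMarker existed. While the path was absent, each awaitTerminationOrTimeout call timed out and the streaming job kept running. Once the path was detected, the driver called ssc.stop(true, true), which also stops the SparkContext and waits for the pending batches to finish. The next awaitTerminationOrTimeout call then returned true, which ended the while loop.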
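
The control flow described above can be exercised without Spark. The sketch below is an illustration under stated assumptions, not the Spark API: FakeContext is a hypothetical stand-in that mimics only the return-value behaviour of awaitTerminationOrTimeout (true once stopped, false on timeout) with a CountDownLatch, and markerExists replaces the filesystem check.

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}

// Hypothetical stand-in for StreamingContext: only the two calls the loop relies on.
class FakeContext {
  private val stopped = new CountDownLatch(1)

  // Returns true if the context has stopped, false if the timeout elapsed first.
  def awaitTerminationOrTimeout(millis: Long): Boolean =
    stopped.await(millis, TimeUnit.MILLISECONDS)

  def stop(): Unit = stopped.countDown()
}

object PollLoopDemo {
  // Runs the polling loop and returns how many times it waited.
  // markerExists replaces the filesystem check and receives the current poll count.
  def run(ctx: FakeContext, checkIntervalMillis: Long, markerExists: Int => Boolean): Int = {
    var polls = 0
    var isStopped = false
    var stopFlag = false
    while (!isStopped) {
      isStopped = ctx.awaitTerminationOrTimeout(checkIntervalMillis)
      polls += 1
      if (!stopFlag) stopFlag = markerExists(polls)
      if (!isStopped && stopFlag) ctx.stop()
    }
    polls
  }

  def main(args: Array[String]): Unit = {
    // The marker "appears" on the third poll; one more wait confirms termination.
    val polls = run(new FakeContext, 10L, n => n >= 3)
    println(s"loop exited after $polls polls")
  }
}
```

The loop needs one wait more than the poll on which the marker is seen, which matches the log: "ssc is stopped!!!!!!!" is followed by one more "calling awaitTerminationOrTimeout" before the confirmation message.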



