Spark Streaming Notes: Receiver

1. The Spark Streaming receiver startup flow

First comes the StreamingContext's start method.
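For orientation, this is the kind of driver program that triggers the whole flow (standard Streaming API usage; the host and port are placeholders):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("ReceiverDemo")
val ssc = new StreamingContext(conf, Seconds(1))
val lines = ssc.socketTextStream("localhost", 9999) // creates a ReceiverInputDStream
lines.print()
ssc.start()            // the startup flow traced below begins here
ssc.awaitTermination()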

From there, control reaches JobScheduler's start method; by this point the ReceiverTracker object has already been created.

ReceiverTracker's start method is then invoked. Its core code is as follows:

if (!receiverInputStreams.isEmpty) {
  endpoint = ssc.env.rpcEnv.setupEndpoint(
    "ReceiverTracker", new ReceiverTrackerEndpoint(ssc.env.rpcEnv))
  if (!skipReceiverLaunch) launchReceivers()
  logInfo("ReceiverTracker started")
  trackerState = Started
}
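launchReceivers is what actually kicks things off: it collects one Receiver per input stream and sends them all to the endpoint as a StartAllReceivers message (condensed from the Spark source; the dummy job just waits until executors have registered):

private def launchReceivers(): Unit = {
  val receivers = receiverInputStreams.map { nis =>
    val rcvr = nis.getReceiver()
    rcvr.setReceiverId(nis.id)
    rcvr
  }
  runDummySparkJob() // ensure executors are up before scheduling receivers
  logInfo("Starting " + receivers.length + " receivers")
  endpoint.send(StartAllReceivers(receivers))
}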

The ReceiverTrackerEndpoint class has one very important method, receive, shown below:

 
override def receive: PartialFunction[Any, Unit] = {

  case StartAllReceivers(receivers) =>
    val scheduledLocations = schedulingPolicy.scheduleReceivers(receivers, getExecutors)
    for (receiver <- receivers) {
      val executors = scheduledLocations(receiver.streamId)
      updateReceiverScheduledExecutors(receiver.streamId, executors)
      receiverPreferredLocations(receiver.streamId) = receiver.preferredLocation
      // start a Receiver
      startReceiver(receiver, executors)
    }
  case RestartReceiver(receiver) =>
    // Old scheduled executors minus the ones that are not active any more
    val oldScheduledExecutors = getStoredScheduledExecutors(receiver.streamId)
    val scheduledLocations = if (oldScheduledExecutors.nonEmpty) {
        // Try global scheduling again
        oldScheduledExecutors
      } else {
        val oldReceiverInfo = receiverTrackingInfos(receiver.streamId)
        // Clear "scheduledLocations" to indicate we are going to do local scheduling
        val newReceiverInfo = oldReceiverInfo.copy(
          state = ReceiverState.INACTIVE, scheduledLocations = None)
        receiverTrackingInfos(receiver.streamId) = newReceiverInfo
        schedulingPolicy.rescheduleReceiver(
          receiver.streamId,
          receiver.preferredLocation,
          receiverTrackingInfos,
          getExecutors)
      }
    // Assume there is one receiver restarting at one time, so we don't need to update
    // receiverTrackingInfos
    startReceiver(receiver, scheduledLocations)
  case c: CleanupOldBlocks =>
    receiverTrackingInfos.values.flatMap(_.endpoint).foreach(_.send(c))
  case UpdateReceiverRateLimit(streamUID, newRate) =>
    for (info <- receiverTrackingInfos.get(streamUID); eP <- info.endpoint) {
      eP.send(UpdateRateLimit(newRate))
    }
  // Remote messages
  case ReportError(streamId, message, error) =>
    reportError(streamId, message, error)
}
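The placement work happens in ReceiverSchedulingPolicy, which balances receivers across executors while honoring each receiver's preferredLocation. As a deliberately simplified, hypothetical sketch of the idea (the real policy also weights executors by how many receivers they already host):

// Simplified illustration only, not the real ReceiverSchedulingPolicy
def roundRobinSchedule(
    receiverIds: Seq[Int],
    executors: Seq[String]): Map[Int, Seq[String]] =
  receiverIds.zipWithIndex.map { case (id, i) =>
    // assign each receiver to one executor, cycling through the list
    id -> Seq(executors(i % executors.length))
  }.toMap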

The startReceiver method:
private def startReceiver(
    receiver: Receiver[_],
    scheduledLocations: Seq[TaskLocation]): Unit = {
  def shouldStartReceiver: Boolean = {
    // It's okay to start when trackerState is Initialized or Started
    !(isTrackerStopping || isTrackerStopped)
  }

  val receiverId = receiver.streamId
  if (!shouldStartReceiver) {
    onReceiverJobFinish(receiverId)
    return
  }

  val checkpointDirOption = Option(ssc.checkpointDir)
  val serializableHadoopConf =
    new SerializableConfiguration(ssc.sparkContext.hadoopConfiguration)

  // Function to start the receiver on the worker node
  val startReceiverFunc: Iterator[Receiver[_]] => Unit =
    (iterator: Iterator[Receiver[_]]) => {
      if (!iterator.hasNext) {
        throw new SparkException(
          "Could not start receiver as object not found.")
      }
      if (TaskContext.get().attemptNumber() == 0) {
        val receiver = iterator.next()
        assert(iterator.hasNext == false)
        val supervisor = new ReceiverSupervisorImpl(
          receiver, SparkEnv.get, serializableHadoopConf.value, checkpointDirOption)
        // start the supervisor (and with it, the receiver)
        supervisor.start()
        supervisor.awaitTermination()
      } else {
        // It's restarted by TaskScheduler, but we want to reschedule it again. So exit it.
      }
    }
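The method does not end with that function: startReceiverFunc is shipped to an executor by wrapping the receiver in a single-partition RDD (with the scheduled locations as preferred locations) and submitting it as a long-running job (condensed from the Spark source):

  // Create the RDD using the scheduledLocations to run the receiver in a Spark job
  val receiverRDD: RDD[Receiver[_]] =
    if (scheduledLocations.isEmpty) {
      ssc.sc.makeRDD(Seq(receiver), 1)
    } else {
      val preferredLocations = scheduledLocations.map(_.toString).distinct
      ssc.sc.makeRDD(Seq(receiver -> preferredLocations))
    }
  receiverRDD.setName(s"Receiver $receiverId")
  val future = ssc.sparkContext.submitJob[Receiver[_], Unit, Unit](
    receiverRDD, startReceiverFunc, Seq(0), (_, _) => (), ())
  // ... a completion handler on `future` then restarts or finishes the receiver job
}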
The start method of ReceiverSupervisor:
/** Start the supervisor */
def start() {
  onStart()
  startReceiver()
}
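onStart is implemented in ReceiverSupervisorImpl; among other things it starts every registered BlockGenerator, which is what buffers incoming records into blocks (condensed from the Spark source; details vary slightly between versions):

override protected def onStart() {
  // start every BlockGenerator so incoming data can be buffered into blocks
  registeredBlockGenerators.asScala.foreach { _.start() }
}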

The startReceiver method of ReceiverSupervisor:

/** Start receiver */
def startReceiver(): Unit = synchronized {
  try {
    if (onReceiverStart()) {
      logInfo(s"Starting receiver $streamId")
      receiverState = Started
      receiver.onStart()
      logInfo(s"Called receiver $streamId onStart")
    } else {
      // The driver refused us
      stop("Registered unsuccessfully because Driver refused to start receiver " + streamId, None)
    }
  } catch {
    case NonFatal(t) =>
      stop("Error starting receiver " + streamId, Some(t))
  }
}

Finally, control reaches the Receiver's onStart method.
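To make that concrete, here is a minimal custom receiver (a hypothetical example, not part of the code path above, following the standard custom-receiver pattern): onStart must set up its work on a separate thread and return quickly, and data enters Spark via store:

import java.net.Socket
import scala.io.Source
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

// Hypothetical example: a receiver that reads lines from a socket
class LineReceiver(host: String, port: Int)
  extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  def onStart(): Unit = {
    // onStart must not block: do the work on a daemon thread
    new Thread("LineReceiver") {
      setDaemon(true)
      override def run(): Unit = receive()
    }.start()
  }

  def onStop(): Unit = { /* the receiving thread checks isStopped() and exits */ }

  private def receive(): Unit = {
    try {
      val socket = new Socket(host, port)
      val lines = Source.fromInputStream(socket.getInputStream, "UTF-8").getLines()
      while (!isStopped && lines.hasNext) {
        store(lines.next())   // hands the record to the supervisor (traced below)
      }
      socket.close()
      restart("Trying to connect again")
    } catch {
      case t: Throwable => restart("Error receiving data", t)
    }
  }
}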

If the code above feels complicated, let me tell it as a story.

Someone wants to spy on the contents of 8 rooms. The strategist says: first you must pass through the main gate, StreamingContext. Naturally you plan your escape route in advance (ShutdownHookManager), then budget the time, headcount and other resources (metricsSystem), and don't forget to keep the surveillance line open (uiTab); only then do you step through the gate.

Inside, you find the patrolling warden, JobScheduler, and tell him where the goods need to be fetched from; I have already arranged that with new ReceiverTracker(ssc).

The warden, rather flustered, has no choice but to phone the head warden, and starts reading the room numbers from his notebook over the line, one by one:

endpoint = ssc.env.rpcEnv.setupEndpoint("ReceiverTracker", new ReceiverTrackerEndpoint(ssc.env.rpcEnv))

for (receiver <- receivers) {
  val executors = scheduledLocations(receiver.streamId)
  updateReceiverScheduledExecutors(receiver.streamId, executors)
  receiverPreferredLocations(receiver.streamId) = receiver.preferredLocation
  startReceiver(receiver, executors)
}

The classified call over the internal network has a bumpy journey, but it finally gets through:

val supervisor = new ReceiverSupervisorImpl(
  receiver, SparkEnv.get, serializableHadoopConf.value, checkpointDirOption)
supervisor.start()
The onReceiverStart method of ReceiverSupervisorImpl:
 override protected def onReceiverStart(): Boolean = {
  val msg = RegisterReceiver(
    streamId, receiver.getClass.getSimpleName, host, executorId, endpoint)
  trackerEndpoint.askWithRetry[Boolean](msg)
}
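On the other end of that call, the tracker endpoint's receiveAndReply handles RegisterReceiver and replies with whether the receiver is allowed to run there (condensed from the Spark source):

case RegisterReceiver(streamId, typ, host, executorId, receiverEndpoint) =>
  val successful = registerReceiver(
    streamId, typ, host, executorId, receiverEndpoint, context.senderAddress)
  context.reply(successful)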

The head warden gives the go-ahead:

/** Start the supervisor */
def start() {
  onStart()
  startReceiver()
}

Then, room by room, the goods start coming in:

def startReceiver(): Unit = synchronized {
  try {
    if (onReceiverStart()) {
      logInfo(s"Starting receiver $streamId")
      receiverState = Started
      receiver.onStart()
      logInfo(s"Called receiver $streamId onStart")
    } else {
      // The driver refused us
      stop("Registered unsuccessfully because Driver refused to start receiver " + streamId, None)
    }
  } catch {
    case NonFatal(t) =>
      stop("Error starting receiver " + streamId, Some(t))
  }
}

2. How the Receiver stores data

The onStart method of the Kafka receiver (defined in KafkaInputDStream.scala):

  def onStart() {

    logInfo("Starting Kafka Consumer Stream with group: " + kafkaParams("group.id"))

    // Kafka connection properties
    val props = new Properties()
    kafkaParams.foreach(param => props.put(param._1, param._2))

    val zkConnect = kafkaParams("zookeeper.connect")
    // Create the connection to the cluster
    logInfo("Connecting to Zookeeper: " + zkConnect)
    val consumerConfig = new ConsumerConfig(props)
    consumerConnector = Consumer.create(consumerConfig)
    logInfo("Connected to " + zkConnect)

    val keyDecoder = classTag[U].runtimeClass.getConstructor(classOf[VerifiableProperties])
      .newInstance(consumerConfig.props)
      .asInstanceOf[Decoder[K]]
    val valueDecoder = classTag[T].runtimeClass.getConstructor(classOf[VerifiableProperties])
      .newInstance(consumerConfig.props)
      .asInstanceOf[Decoder[V]]

    // Create threads for each topic/message Stream we are listening
    val topicMessageStreams = consumerConnector.createMessageStreams(
      topics, keyDecoder, valueDecoder)

    val executorPool =
      ThreadUtils.newDaemonFixedThreadPool(topics.values.sum, "KafkaMessageHandler")
    try {
      // Start a message handler for each partition: one per KafkaStream object in the list
      topicMessageStreams.values.foreach { streams =>
        // hand each stream to the thread pool for message handling
        streams.foreach { stream => executorPool.submit(new MessageHandler(stream)) }
      }
    } finally {
      executorPool.shutdown() // Just causes threads to terminate after work is done
    }
  }
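This receiver is what the old receiver-based Kafka API hands you. A typical usage sketch (the ZooKeeper quorum, group id, and topic map below are placeholders; the per-topic thread counts are what topics.values.sum adds up above):

import org.apache.spark.streaming.kafka.KafkaUtils

// topic -> number of consumer threads
val kafkaStream = KafkaUtils.createStream(
  ssc, "zk1:2181,zk2:2181", "my-consumer-group", Map("events" -> 2))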

The MessageHandler class:

private class MessageHandler(stream: KafkaStream[K, V])
  extends Runnable {
  def run() {
    logInfo("Starting MessageHandler.")
    try {
      val streamIterator = stream.iterator()
      while (streamIterator.hasNext()) {
        val msgAndMetadata = streamIterator.next()
        store((msgAndMetadata.key, msgAndMetadata.message))
      }
    } catch {
      case e: Throwable => reportError("Error handling message; exiting", e)
    }
  }
}
The Receiver's store method hands each record to its supervisor:

def store(dataItem: T) {
  supervisor.pushSingle(dataItem)
}

ReceiverSupervisorImpl's pushSingle then forwards it to the default BlockGenerator:

def pushSingle(data: Any) {
  defaultBlockGenerator.addData(data)
}

The addData method of BlockGenerator:
def addData(data: Any): Unit = {
  if (state == Active) {
    waitToPush()  // rate-limiting hook (from RateLimiter) that enforces the max ingest rate
    synchronized {
      if (state == Active) {
        currentBuffer += data
      } else {
        throw new SparkException(
          "Cannot add data as BlockGenerator has not been started or has been stopped")
      }
    }
  } else {
    throw new SparkException(
      "Cannot add data as BlockGenerator has not been started or has been stopped")
  }
}
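currentBuffer is not the final stop. Inside BlockGenerator a recurring timer fires every spark.streaming.blockInterval (200 ms by default), swaps the buffer out as a block, and queues it for a pushing thread that ultimately stores it via the BlockManager (condensed from the Spark source):

private def updateCurrentBuffer(time: Long): Unit = {
  try {
    var newBlock: Block = null
    synchronized {
      if (currentBuffer.nonEmpty) {
        // swap in a fresh buffer and wrap the old one as a block
        val newBlockBuffer = currentBuffer
        currentBuffer = new ArrayBuffer[Any]
        val blockId = StreamBlockId(receiverId, time - blockIntervalMs)
        listener.onGenerateBlock(blockId)
        newBlock = new Block(blockId, newBlockBuffer)
      }
    }
    if (newBlock != null) {
      blocksForPushing.put(newBlock) // drained by a separate block-pushing thread
    }
  } catch {
    case t: Throwable => reportError("Error in block updating thread", t)
  }
}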

