RpcEndpointRef
Purpose
RpcEndpointRef is a reference to an RpcEndpoint; a look at its methods shows that it is the abstract template through which a node sends messages to an endpoint.
It provides an asynchronous request (ask), a blocking request (askSync), and a one-way notification (send), among other methods.
private[spark] abstract class RpcEndpointRef(conf: SparkConf)
  extends Serializable with Logging {

  private[this] val maxRetries = RpcUtils.numRetries(conf)
  private[this] val retryWaitMs = RpcUtils.retryWaitMs(conf)
  private[this] val defaultAskTimeout = RpcUtils.askRpcTimeout(conf)

  /**
   * return the address for the [[RpcEndpointRef]]
   */
  def address: RpcAddress

  def name: String

  /**
   * Sends a one-way asynchronous message. Fire-and-forget semantics.
   */
  def send(message: Any): Unit

  /**
   * Send a message to the corresponding [[RpcEndpoint.receiveAndReply]] and return an
   * [[AbortableRpcFuture]] to receive the reply within the specified timeout.
   * The [[AbortableRpcFuture]] instance wraps [[Future]] with additional `abort` method.
   *
   * This method only sends the message once and never retries.
   */
  def askAbortable[T: ClassTag](message: Any, timeout: RpcTimeout): AbortableRpcFuture[T] = {
    throw new UnsupportedOperationException()
  }

  /**
   * Send a message to the corresponding [[RpcEndpoint.receiveAndReply]] and return a [[Future]] to
   * receive the reply within the specified timeout.
   *
   * This method only sends the message once and never retries.
   */
  def ask[T: ClassTag](message: Any, timeout: RpcTimeout): Future[T]

  /**
   * Send a message to the corresponding [[RpcEndpoint.receiveAndReply]] and return a [[Future]] to
   * receive the reply within a default timeout.
   *
   * This method only sends the message once and never retries.
   */
  def ask[T: ClassTag](message: Any): Future[T] = ask(message, defaultAskTimeout)

  /**
   * Send a message to the corresponding [[RpcEndpoint.receiveAndReply]] and get its result within a
   * default timeout, throw an exception if this fails.
   *
   * Note: this is a blocking action which may cost a lot of time, so don't call it in a message
   * loop of [[RpcEndpoint]].
   *
   * @param message the message to send
   * @tparam T type of the reply message
   * @return the reply message from the corresponding [[RpcEndpoint]]
   */
  def askSync[T: ClassTag](message: Any): T = askSync(message, defaultAskTimeout)

  /**
   * Send a message to the corresponding [[RpcEndpoint.receiveAndReply]] and get its result within a
   * specified timeout, throw an exception if this fails.
   *
   * Note: this is a blocking action which may cost a lot of time, so don't call it in a message
   * loop of [[RpcEndpoint]].
   *
   * @param message the message to send
   * @param timeout the timeout duration
   * @tparam T type of the reply message
   * @return the reply message from the corresponding [[RpcEndpoint]]
   */
  def askSync[T: ClassTag](message: Any, timeout: RpcTimeout): T = {
    val future = ask[T](message, timeout)
    timeout.awaitResult(future)
  }
}
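As a quick illustration, here is a hedged sketch of how such a reference is used from caller code. The Greet/Greeting messages and the helloRef parameter are made up for this example, and since org.apache.spark.rpc is private[spark], this is the shape of calls made inside Spark itself rather than a public API:

import scala.concurrent.Future
import scala.concurrent.duration._
import org.apache.spark.rpc.{RpcEndpointRef, RpcTimeout}

// Hypothetical messages, for illustration only.
case class Greet(name: String)
case class Greeting(text: String)

def demo(helloRef: RpcEndpointRef): Unit = {
  // Fire-and-forget: delivered to the endpoint's `receive`.
  helloRef.send(Greet("spark"))

  // Asynchronous request: delivered to `receiveAndReply`, returns a Future.
  val replyF: Future[Greeting] = helloRef.ask[Greeting](Greet("spark"))

  // Blocking request with an explicit timeout: internally just ask + awaitResult.
  val reply: Greeting =
    helloRef.askSync[Greeting](Greet("spark"), new RpcTimeout(10.seconds, "spark.rpc.askTimeout"))
  println(reply.text)
}

Note that send ends up in receive, while ask and askSync end up in receiveAndReply; askSync simply blocks on the returned Future, which is why the doc comment warns against calling it inside an endpoint's own message loop.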
NettyRpcEndpointRef
Looking at the source, dispatcher.registerRpcEndpoint() is the entry point where an RpcEndpointRef is created. The concrete implementation class is NettyRpcEndpointRef, which contains the actual request logic.
def registerRpcEndpoint(name: String, endpoint: RpcEndpoint): NettyRpcEndpointRef = {
  val addr = RpcEndpointAddress(nettyEnv.address, name)
  // The ref is created up front from the local RpcEnv address and the endpoint name.
  val endpointRef = new NettyRpcEndpointRef(nettyEnv.conf, addr, nettyEnv)
  synchronized {
    if (stopped) {
      throw new IllegalStateException("RpcEnv has been stopped")
    }
    if (endpoints.containsKey(name)) {
      throw new IllegalArgumentException(s"There is already an RpcEndpoint called $name")
    }
    endpointRefs.put(endpoint, endpointRef)

    var messageLoop: MessageLoop = null
    try {
      messageLoop = endpoint match {
        // An IsolatedRpcEndpoint gets its own dedicated message loop;
        // all other endpoints share a single loop.
        case e: IsolatedRpcEndpoint =>
          new DedicatedMessageLoop(name, e, this)
        case _ =>
          sharedLoop.register(name, endpoint)
          sharedLoop
      }
      endpoints.put(name, messageLoop)
    } catch {
      case NonFatal(e) =>
        endpointRefs.remove(endpoint)
        throw e
    }
  }
  endpointRef
}
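For context, the public entry point is RpcEnv.setupEndpoint, which in NettyRpcEnv simply delegates to the dispatcher method above; the returned NettyRpcEndpointRef then turns send/ask calls into RequestMessage objects handed back to the NettyRpcEnv for local or remote delivery. A simplified sketch; the exact shape differs slightly across Spark versions:

// NettyRpcEnv (simplified): registering an endpoint yields its ref.
override def setupEndpoint(name: String, endpoint: RpcEndpoint): RpcEndpointRef = {
  dispatcher.registerRpcEndpoint(name, endpoint)
}

// NettyRpcEndpointRef (simplified): a send wraps the payload in a RequestMessage
// and lets the NettyRpcEnv route it, either to a local endpoint or over the network;
// ask follows the same path but registers a callback for the reply.
override def send(message: Any): Unit = {
  require(message != null, "Message is null")
  nettyEnv.send(new RequestMessage(nettyEnv.address, this, message))
}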
RpcEndpoint
Purpose
An RpcEndpoint can be understood as a station that receives messages; it processes incoming messages mainly through the two methods receive and receiveAndReply.
An RpcEndpoint can also obtain its own RpcEndpointRef (self, i.e. rpcEnv.endpointRef(this)) and use it to send messages.
/**
 * An end point for the RPC that defines what functions to trigger given a message.
 *
 * It is guaranteed that `onStart`, `receive` and `onStop` will be called in sequence.
 *
 * The life-cycle of an endpoint is:
 *
 * {@code constructor -> onStart -> receive* -> onStop}
 *
 * Note: `receive` can be called concurrently. If you want `receive` to be thread-safe, please use
 * [[ThreadSafeRpcEndpoint]]
 *
 * If any error is thrown from one of [[RpcEndpoint]] methods except `onError`, `onError` will be
 * invoked with the cause. If `onError` throws an error, [[RpcEnv]] will ignore it.
 */
private[spark] trait RpcEndpoint {

  /**
   * The [[RpcEnv]] that this [[RpcEndpoint]] is registered to.
   */
  val rpcEnv: RpcEnv

  /**
   * The [[RpcEndpointRef]] of this [[RpcEndpoint]]. `self` will become valid when `onStart` is
   * called. And `self` will become `null` when `onStop` is called.
   *
   * Note: Because before `onStart`, [[RpcEndpoint]] has not yet been registered and there is no
   * valid [[RpcEndpointRef]] for it. So don't call `self` before `onStart` is called.
   */
  final def self: RpcEndpointRef = {
    require(rpcEnv != null, "rpcEnv has not been initialized")
    rpcEnv.endpointRef(this)
  }

  /**
   * Process messages from `RpcEndpointRef.send` or `RpcCallContext.reply`. If receiving an
   * unmatched message, `SparkException` will be thrown and sent to `onError`.
   */
  def receive: PartialFunction[Any, Unit] = {
    case _ => throw new SparkException(self + " does not implement 'receive'")
  }

  /**
   * Process messages from `RpcEndpointRef.ask`. If receiving an unmatched message,
   * `SparkException` will be thrown and sent to `onError`.
   */
  def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
    case _ => context.sendFailure(new SparkException(self + " won't reply anything"))
  }

  /**
   * Invoked when any exception is thrown during handling messages.
   */
  def onError(cause: Throwable): Unit = {
    // By default, throw e and let RpcEnv handle it
    throw cause
  }

  /**
   * Invoked when `remoteAddress` is connected to the current node.
   */
  def onConnected(remoteAddress: RpcAddress): Unit = {
    // By default, do nothing.
  }

  /**
   * Invoked when `remoteAddress` is lost.
   */
  def onDisconnected(remoteAddress: RpcAddress): Unit = {
    // By default, do nothing.
  }

  /**
   * Invoked when some network error happens in the connection between the current node and
   * `remoteAddress`.
   */
  def onNetworkError(cause: Throwable, remoteAddress: RpcAddress): Unit = {
    // By default, do nothing.
  }

  /**
   * Invoked before [[RpcEndpoint]] starts to handle any message.
   */
  def onStart(): Unit = {
    // By default, do nothing.
  }

  /**
   * Invoked when [[RpcEndpoint]] is stopping. `self` will be `null` in this method and you cannot
   * use it to send or ask messages.
   */
  def onStop(): Unit = {
    // By default, do nothing.
  }

  /**
   * A convenient method to stop [[RpcEndpoint]].
   */
  final def stop(): Unit = {
    val _self = self
    if (_self != null) {
      rpcEnv.stop(_self)
    }
  }
}
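To make the trait concrete, here is a hedged, minimal sketch of a custom endpoint. EchoEndpoint, Ping and Pong are made-up names, and because the rpc package is private[spark] this mirrors how Spark's own components are written rather than user-facing API:

import org.apache.spark.rpc.{RpcCallContext, RpcEndpoint, RpcEnv}

// Hypothetical messages, for illustration only.
case class Ping(id: Int)
case class Pong(id: Int)

class EchoEndpoint(override val rpcEnv: RpcEnv) extends RpcEndpoint {

  override def onStart(): Unit = {
    // `self` is valid from this point on.
    println(s"EchoEndpoint started at ${self.address}")
  }

  // Handles one-way messages coming from RpcEndpointRef.send.
  override def receive: PartialFunction[Any, Unit] = {
    case Ping(id) => println(s"got ping $id, no reply expected")
  }

  // Handles RpcEndpointRef.ask / askSync; the reply goes back through the context.
  override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
    case Ping(id) => context.reply(Pong(id))
  }
}

// Registration goes through RpcEnv.setupEndpoint, which for the Netty implementation
// ends up in the dispatcher.registerRpcEndpoint method shown earlier:
//   val ref = rpcEnv.setupEndpoint("echo", new EchoEndpoint(rpcEnv))
//   val pong = ref.askSync[Pong](Ping(1))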
Implementation classes
So every communicating component in the cluster extends this RpcEndpoint trait, including Master, Worker, Driver, Executor, and so on. The main implementations, grouped by the trait they extend, are listed below (a short sketch follows the list).
RpcEndpoint
- ThreadSafeRpcEndpoint (trait)
- IsolatedRpcEndpoint (trait)
- RpcEndpointVerifier (verifies whether an endpoint exists)
- MapOutputTrackerMasterEndpoint (endpoint that handles requests for MapOutputTrackerMaster)
- OutputCommitCoordinatorEndpoint (communication endpoint used when writing files to Hadoop)
- WorkerWatcher (watches the Worker process)
ThreadSafeRpcEndpoint
- Master (the long-running Master daemon)
- Worker (the long-running Worker daemon)
- StateStoreCoordinator (streaming endpoint: state store checks)
- ContinuousRecordEndpoint (streaming endpoint: continuously fetches records from the driver)
- EpochCoordinator (streaming endpoint)
- HeartbeatReceiver (lives on the driver and receives heartbeats from Executors)
- BarrierCoordinator (handles the barrier() calls from all tasks in a barrier stage)
- BlockManagerMasterHeartbeatEndpoint (heartbeat endpoint of the BlockManagerMaster)
- ClientEndpoint (client endpoint used when submitting to the standalone cluster)
- LocalEndpoint (endpoint used by the local-mode scheduler backend)
IsolatedRpcEndpoint
- BlockManagerMasterEndpoint (tracks all BlockManagers and manages block metadata)
- BlockManagerStorageEndpoint (handles removing RDDs, shuffle data, blocks, etc.)
- CoarseGrainedExecutorBackend (Executor)
- DriverEndpoint
- PluginEndpoint (plugin endpoint)
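The two marker traits mainly change how the dispatcher schedules an endpoint: a ThreadSafeRpcEndpoint is guaranteed to have its messages processed one at a time, while an IsolatedRpcEndpoint gets its own DedicatedMessageLoop instead of the shared loop, as seen in registerRpcEndpoint above. A minimal sketch with a hypothetical endpoint:

import org.apache.spark.rpc.{RpcCallContext, RpcEnv, ThreadSafeRpcEndpoint}

// Hypothetical endpoint: because it extends ThreadSafeRpcEndpoint, messages are
// delivered sequentially, so this mutable counter needs no extra synchronization.
class HeartbeatLike(override val rpcEnv: RpcEnv) extends ThreadSafeRpcEndpoint {
  private var beats = 0L

  override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
    case "beat" =>
      beats += 1
      context.reply(beats)
  }
}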