继4-SparkEnv 之后,我们再来详细研究下SparkEnv中出现的一个核心RpcEnv。
首先看下RpcEnv.scala源文件:
主要有RpcEnv伴生对象和伴生类,伴生类为抽象类,主要是一些Rpc框架的接口函数的申明。伴生对象提供静态的create接口,调用具体的RpcEnv工厂(NettyRpcEnvFactory)创建RpcEnv。目前Spark(我使用的是2.1.0版本)的Rpc框架的实现使用的是Netty,已经没有Akka的实现了。
package org.apache.spark.rpc
......
/**
* A RpcEnv implementation must have a [[RpcEnvFactory]] implementation with an empty constructor
* so that it can be created via Reflection.
*/
private[spark] object RpcEnv {
def create(
name: String,
host: String,
port: Int,
conf: SparkConf,
securityManager: SecurityManager,
clientMode: Boolean = false): RpcEnv = {
create(name, host, host, port, conf, securityManager, clientMode)
}
def create(
name: String,
bindAddress: String,
advertiseAddress: String,
port: Int,
conf: SparkConf,
securityManager: SecurityManager,
clientMode: Boolean): RpcEnv = {
val config = RpcEnvConfig(conf, name, bindAddress, advertiseAddress, port, securityManager,
clientMode)
new NettyRpcEnvFactory().create(config)
}
}
/**
* An RPC environment. [[RpcEndpoint]]s need to register itself with a name to [[RpcEnv]] to
* receives messages. Then [[RpcEnv]] will process messages sent from [[RpcEndpointRef]] or remote
* nodes, and deliver them to corresponding [[RpcEndpoint]]s. For uncaught exceptions caught by
* [[RpcEnv]], [[RpcEnv]] will use [[RpcCallContext.sendFailure]] to send exceptions back to the
* sender, or logging them if no such sender or `NotSerializableException`.
*
* [[RpcEnv]] also provides some methods to retrieve [[RpcEndpointRef]]s given name or uri.
*/
private[spark] abstract class RpcEnv(conf: SparkConf) {
private[spark] val defaultLookupTimeout = RpcUtils.lookupRpcTimeout(conf)
/**
* Return RpcEndpointRef of the registered [[RpcEndpoint]]. Will be used to implement
* [[RpcEndpoint.self]]. Return `null` if the corresponding [[RpcEndpointRef]] does not exist.
*/
private[rpc] def endpointRef(endpoint: RpcEndpoint): RpcEndpointRef
/**
* Return the address that [[RpcEnv]] is listening to.
*/
def address: RpcAddress
/**
* Register a [[RpcEndpoint]] with a name and return its [[RpcEndpointRef]]. [[RpcEnv]] does not
* guarantee thread-safety.
*/
def setupEndpoint(name: String, endpoint: RpcEndpoint): RpcEndpointRef
/**
* Retrieve the [[RpcEndpointRef]] represented by `uri` asynchronously.
*/
def asyncSetupEndpointRefByURI(uri: String): Future[RpcEndpointRef]
/**
* Retrieve the [[RpcEndpointRef]] represented by `uri`. This is a blocking action.
*/
def setupEndpointRefByURI(uri: String): RpcEndpointRef = {
defaultLookupTimeout.awaitResult(asyncSetupEndpointRefByURI(uri))
}
/**
* Retrieve the [[RpcEndpointRef]] represented by `address` and `endpointName`.
* This is a blocking action.
*/
def setupEndpointRef(address: RpcAddress, endpointName: String): RpcEndpointRef = {
setupEndpointRefByURI(RpcEndpointAddress(address, endpointName).toString)
}
/**
* Stop [[RpcEndpoint]] specified by `endpoint`.
*/
def stop(endpoint: RpcEndpointRef): Unit
/**
* Shutdown this [[RpcEnv]] asynchronously. If need to make sure [[RpcEnv]] exits successfully,
* call [[awaitTermination()]] straight after [[shutdown()]].
*/
def shutdown(): Unit
/**
* Wait until [[RpcEnv]] exits.
*
* TODO do we need a timeout parameter?
*/
def awaitTermination(): Unit
/**
* [[RpcEndpointRef]] cannot be deserialized without [[RpcEnv]]. So when deserializing any object
* that contains [[RpcEndpointRef]]s, the deserialization codes should be wrapped by this method.
*/
def deserialize[T](deserializationAction: () => T): T
/**
* Return the instance of the file server used to serve files. This may be `null` if the
* RpcEnv is not operating in server mode.
*/
def fileServer: RpcEnvFileServer
/**
* Open a channel to download a file from the given URI. If the URIs returned by the
* RpcEnvFileServer use the "spark" scheme, this method will be called by the Utils class to
* retrieve the files.
*
* @param uri URI with location of the file.
*/
def openChannel(uri: String): ReadableByteChannel
}
在RpcEnv.scala源文件中除了RpcEnv的伴生对象和伴生类外,还有一个样例类RpcEnvConfig和一个trait RpcEnvFileServer。RpcEnvConfig的实例作为参数传递给具体的RpcEnvFactory的create函数创建RpcEnv实例:
private[spark] case class RpcEnvConfig(
conf: SparkConf,
name: String,
bindAddress: String,
advertiseAddress: String,
port: Int,
securityManager: SecurityManager,
clientMode: Boolean)
/**
* A server used by the RpcEnv to server files to other processes owned by the application.
*
* The file server can return URIs handled by common libraries (such as "http" or "hdfs"), or
* it can return "spark" URIs which will be handled by `RpcEnv#fetchFile`.
*/
private[spark] trait RpcEnvFileServer {
/**
* Adds a file to be served by this RpcEnv. This is used to serve files from the driver
* to executors when they're stored on the driver's local file system.
......
*/
def addFile(file: File): String
/**
* Adds a jar to be served by this RpcEnv.
......
*/
def addJar(file: File): String
/**
* Adds a local directory to be served via this file server.
* ......
*/
def addDirectory(baseUri: String, path: File): String
/** Validates and normalizes the base URI for directories. */
protected def validateDirectoryUri(baseUri: String): String = {
val fixedBaseUri = "/" + baseUri.stripPrefix("/").stripSuffix("/")
require(fixedBaseUri != "/files" && fixedBaseUri != "/jars",
"Directory URI cannot be /files nor /jars.")
fixedBaseUri
}
}
RpcEnv.scala文件中已涉及类NettyRpcEnvFactory, RpcEndpoint, RpcEndpointRef。除了NettyRpcEnvFactory是Rpc框架实现层的类外,RpcEndpoint,RpcEndpointRef以及RpcEnv都是Rpc框架抽象层的类,这一节我们主要了解抽象层的类。
RpcEndpoint所在文件为RpcEndpoint.scala,该文件包含trait RpcEnvFactory的定义:
/**
* A factory class to create the [[RpcEnv]]. It must have an empty constructor so that it can be
* created using Reflection.
*/
private[spark] trait RpcEnvFactory {
def create(config: RpcEnvConfig): RpcEnv
}
然后主要是trait RpcEndpoint的定义,RpcEndpoint是Rpc通讯的端点,其trait包含启动/接受消息/停止/连接状态变化的处理等函数的default实现:
/**
* An end point for the RPC that defines what functions to trigger given a message.
*
* It is guaranteed that `onStart`, `receive` and `onStop` will be called in sequence.
*
* The life-cycle of an endpoint is:
*
* constructor -> onStart -> receive* -> onStop
*
* Note: `receive` can be called concurrently. If you want `receive` to be thread-safe, please use
* [[ThreadSafeRpcEndpoint]]
*
* If any error is thrown from one of [[RpcEndpoint]] methods except `onError`, `onError` will be
* invoked with the cause. If `onError` throws an error, [[RpcEnv]] will ignore it.
*/
private[spark] trait RpcEndpoint {
/**
* The [[RpcEnv]] that this [[RpcEndpoint]] is registered to.
*/
val rpcEnv: RpcEnv
/**
* The [[RpcEndpointRef]] of this [[RpcEndpoint]]. `self` will become valid when `onStart` is
* called. And `self` will become `null` when `onStop` is called.
*
* Note: Because before `onStart`, [[RpcEndpoint]] has not yet been registered and there is not
* valid [[RpcEndpointRef]] for it. So don't call `self` before `onStart` is called.
*/
final def self: RpcEndpointRef = {
require(rpcEnv != null, "rpcEnv has not been initialized")
rpcEnv.endpointRef(this)
}
/**
* Process messages from [[RpcEndpointRef.send]] or [[RpcCallContext.reply)]]. If receiving a
* unmatched message, [[SparkException]] will be thrown and sent to `onError`.
*/
def receive: PartialFunction[Any, Unit] = {
case _ => throw new SparkException(self + " does not implement 'receive'")
}
/**
* Process messages from [[RpcEndpointRef.ask]]. If receiving a unmatched message,
* [[SparkException]] will be thrown and sent to `onError`.
*/
def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
case _ => context.sendFailure(new SparkException(self + " won't reply anything"))
}
/**
* Invoked when any exception is thrown during handling messages.
*/
def onError(cause: Throwable): Unit = {
// By default, throw e and let RpcEnv handle it
throw cause
}
/**
* Invoked when `remoteAddress` is connected to the current node.
*/
def onConnected(remoteAddress: RpcAddress): Unit = {
// By default, do nothing.
}
/**
* Invoked when `remoteAddress` is lost.
*/
def onDisconnected(remoteAddress: RpcAddress): Unit = {
// By default, do nothing.
}
/**
* Invoked when some network error happens in the connection between the current node and
* `remoteAddress`.
*/
def onNetworkError(cause: Throwable, remoteAddress: RpcAddress): Unit = {
// By default, do nothing.
}
/**
* Invoked before [[RpcEndpoint]] starts to handle any message.
*/
def onStart(): Unit = {
// By default, do nothing.
}
/**
* Invoked when [[RpcEndpoint]] is stopping. `self` will be `null` in this method and you cannot
* use it to send or ask messages.
*/
def onStop(): Unit = {
// By default, do nothing.
}
/**
* A convenient method to stop [[RpcEndpoint]].
*/
final def stop(): Unit = {
val _self = self
if (_self != null) {
rpcEnv.stop(_self)
}
}
}
/**
* A trait that requires RpcEnv thread-safely sending messages to it.
*
......
*/
private[spark] trait ThreadSafeRpcEndpoint extends RpcEndpoint
RpcEndpointRef所在文件为RpcEndpointRef.scala, 该文件仅包含RpcEndpointRef这个类,该类是一个抽象类,它是对一个远程RpcEndpoint的引用,通过一些接口与远端RpcEndpoint通讯。它与RpcEndpoint一样也是Rpc框架抽象层的类,定义如下:
package org.apache.spark.rpc
......
/**
* A reference for a remote [[RpcEndpoint]]. [[RpcEndpointRef]] is thread-safe.
*/
private[spark] abstract class RpcEndpointRef(conf: SparkConf)
extends Serializable with Logging {
private[this] val maxRetries = RpcUtils.numRetries(conf)
private[this] val retryWaitMs = RpcUtils.retryWaitMs(conf)
private[this] val defaultAskTimeout = RpcUtils.askRpcTimeout(conf)
/**
* return the address for the [[RpcEndpointRef]]
*/
def address: RpcAddress
def name: String
/**
* Sends a one-way asynchronous message. Fire-and-forget semantics.
*/
def send(message: Any): Unit
/**
* Send a message to the corresponding [[RpcEndpoint.receiveAndReply)]] and return a [[Future]] to
* receive the reply within the specified timeout.
*
* This method only sends the message once and never retries.
*/
def ask[T: ClassTag](message: Any, timeout: RpcTimeout): Future[T]
/**
* Send a message to the corresponding [[RpcEndpoint.receiveAndReply)]] and return a [[Future]] to
* receive the reply within a default timeout.
*
* This method only sends the message once and never retries.
*/
def ask[T: ClassTag](message: Any): Future[T] = ask(message, defaultAskTimeout)
/**
* Send a message to the corresponding [[RpcEndpoint]] and get its result within a default
* timeout, or throw a SparkException if this fails even after the default number of retries.
* The default `timeout` will be used in every trial of calling `sendWithReply`. Because this
* method retries, the message handling in the receiver side should be idempotent.
*
* Note: this is a blocking action which may cost a lot of time, so don't call it in a message
* loop of [[RpcEndpoint]].
*
* @param message the message to send
* @tparam T type of the reply message
* @return the reply message from the corresponding [[RpcEndpoint]]
*/
def askWithRetry[T: ClassTag](message: Any): T = askWithRetry(message, defaultAskTimeout)
/**
* Send a message to the corresponding [[RpcEndpoint.receive]] and get its result within a
* specified timeout, throw a SparkException if this fails even after the specified number of
* retries. `timeout` will be used in every trial of calling `sendWithReply`. Because this method
* retries, the message handling in the receiver side should be idempotent.
*
* Note: this is a blocking action which may cost a lot of time, so don't call it in a message
* loop of [[RpcEndpoint]].
*
* @param message the message to send
* @param timeout the timeout duration
* @tparam T type of the reply message
* @return the reply message from the corresponding [[RpcEndpoint]]
*/
def askWithRetry[T: ClassTag](message: Any, timeout: RpcTimeout): T = {
// TODO: Consider removing multiple attempts
var attempts = 0
var lastException: Exception = null
while (attempts < maxRetries) {
attempts += 1
try {
val future = ask[T](message, timeout)
val result = timeout.awaitResult(future)
if (result == null) {
throw new SparkException("RpcEndpoint returned null")
}
return result
} catch {
case ie: InterruptedException => throw ie
case e: Exception =>
lastException = e
logWarning(s"Error sending message [message = $message] in $attempts attempts", e)
}
if (attempts < maxRetries) {
Thread.sleep(retryWaitMs)
}
}
throw new SparkException(
s"Error sending message [message = $message]", lastException)
}
}