class TestMathEndpoint(override val rpcEnv: RpcEnv) extends ThreadSafeRpcEndpoint {
  // Handles messages sent with ask/askWithRetry; the answer goes back through RpcCallContext.reply.
  override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
    case TestAdd(a: Int, b: Int) =>
      println("receive TestAdd:" + a + ", b :" + b)
      context.reply(a + b)
    case TestSub(a: Int, b: Int) =>
      println("receive TestSub:" + a + ", b :" + b)
      context.reply(a - b)
  }
}
2. Use case classes as the messages exchanged between endpoints
case class TestAdd(a: Int, b: Int)
case class TestSub(a: Int, b: Int)
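To make the message-passing idea concrete without depending on Spark (whose RPC classes are private[spark]), here is a minimal sketch of the same pattern: case classes as messages, a PartialFunction as the handler, and a reply callback standing in for RpcCallContext.reply. The names CaseClassDispatchSketch and ask are hypothetical and exist only for this illustration.

```scala
// Hypothetical stand-in for an endpoint; nothing here is Spark API.
object CaseClassDispatchSketch {
  // Messages are plain case classes, as in the example above.
  final case class TestAdd(a: Int, b: Int)
  final case class TestSub(a: Int, b: Int)

  // The handler is a PartialFunction over Any; `reply` plays the role of RpcCallContext.reply.
  def receiveAndReply(reply: Any => Unit): PartialFunction[Any, Unit] = {
    case TestAdd(a, b) => reply(a + b)
    case TestSub(a, b) => reply(a - b)
  }

  // Delivers one message and returns the reply, or None when no case matches.
  def ask(message: Any): Option[Any] = {
    var answer: Option[Any] = None
    val handler = receiveAndReply(result => answer = Some(result))
    if (handler.isDefinedAt(message)) handler(message)
    answer
  }

  def main(args: Array[String]): Unit = {
    println(ask(TestAdd(3, 4))) // Some(7)
    println(ask(TestSub(3, 4))) // Some(-1)
    println(ask("unknown"))     // None
  }
}
```

Because the handler is partial, an unknown message is simply not defined for it; the Spark base trait shown later turns that case into a SparkException instead of returning None.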
import org.apache.spark.rpc._
import org.apache.spark.util.{RpcUtils, ThreadUtils}

// Wraps an RpcEndpointRef: the business methods live here, and each call is forwarded
// to the endpoint behind the ref, whose reply becomes the method's result.
class MathMaster(var driverEndpoint: RpcEndpointRef) {
  def testAdd(a: Int, b: Int): Int = {
    driverEndpoint.askWithRetry[Int](TestAdd(a, b))
  }

  def testSub(a: Int, b: Int): Int = {
    driverEndpoint.askWithRetry[Int](TestSub(a, b))
  }
}

object MathMaster {
  val DRIVER_ENDPOINT_NAME = "TestMathMaster"
}
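MathMaster relies on askWithRetry, which blocks until a reply arrives and tries again when an attempt fails. The following is a rough, Spark-free sketch of that idea, not Spark's actual implementation: the object name AskWithRetrySketch, the parameter names and the flaky test function are all assumptions made for illustration. In Spark the retry count, wait time and timeout come from the configuration (see the RpcUtils calls in RpcEndpointRef further down).

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.util.{Failure, Success, Try}

// Hypothetical helper; it only illustrates "ask, block, retry on failure".
object AskWithRetrySketch {
  // Blocks on `ask` for up to `timeout`. On failure it waits `retryWait` and asks again,
  // making at most `attemptsLeft` attempts; the last error is rethrown if all of them fail.
  @annotation.tailrec
  def askWithRetry[T](attemptsLeft: Int, retryWait: FiniteDuration, timeout: FiniteDuration)(
      ask: () => Future[T]): T =
    Try(Await.result(ask(), timeout)) match {
      case Success(value) => value
      case Failure(e) if attemptsLeft <= 1 => throw e
      case Failure(_) =>
        Thread.sleep(retryWait.toMillis)
        askWithRetry[T](attemptsLeft - 1, retryWait, timeout)(ask)
    }

  def main(args: Array[String]): Unit = {
    var calls = 0
    // A flaky "endpoint": the first two asks fail, the third one answers.
    val flakyAdd: () => Future[Int] = () => {
      calls += 1
      if (calls < 3) Future.failed(new RuntimeException(s"attempt $calls lost"))
      else Future.successful(3 + 4)
    }
    val result = askWithRetry[Int](3, 10.millis, 1.second)(flakyAdd)
    println(s"result = $result after $calls calls") // result = 7 after 3 calls
  }
}
```

Blocking like this is convenient for a small demo such as MathMaster; code that must not block a thread would work with the Future returned by ask instead.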
import org.apache.spark.rpc._
import org.apache.spark._
import org.apache.spark.util.{RpcUtils, ThreadUtils}

/**
 * 1. Every RpcEndpoint must be registered with an RpcEnv instance. A name is given at registration,
 *    so a client can look up the RpcEndpointRef by that name and communicate through it.
 *    Once the RpcEndpoint receives a message, its handler method processes it: when an answer is expected,
 *    the reply goes back to the sender through RpcCallContext.reply; when no reply is needed,
 *    the message is handled by receive.
 */
object MathServer {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
    val systemName = "sparkDriver"
    val hostname = ""
    val port = 4040
    val securityManager = new SecurityManager(conf)
    val rpcEnv = RpcEnv.create(systemName, hostname, port, conf, securityManager, false)
    val testMathMaster = new MathMaster(
      rpcEnv.setupEndpoint(MathMaster.DRIVER_ENDPOINT_NAME, new TestMathEndpoint(rpcEnv)))
    // The result is obtained through RpcEndpointRef.askWithRetry[Int](case class)
    val result = testMathMaster.testAdd(3, 4)
    println("the result of test: " + result)
    // Keep the server running so that MathClient can connect
    // (the server log below shows it answering the client's request).
    rpcEnv.awaitTermination()
  }
}
import org.apache.spark.rpc._
import org.apache.spark._

object MathClient {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
    val systemName = "sparkDriver"
    val hostname = ""
    val port = 4040
    val securityManager = new SecurityManager(conf)
    val rpcEnv = RpcEnv.create(systemName, hostname, port, conf, securityManager, clientMode = true)
    // The client obtains the RpcEndpointRef from its RpcEnv:
    // rpcEnv.setupEndpointRef(system name, RpcAddress (host + port of the remote endpoint), endpoint name)
    val testMathMaster = new MathMaster(
      rpcEnv.setupEndpointRef(systemName, RpcAddress(hostname, port), MathMaster.DRIVER_ENDPOINT_NAME))
    val result = testMathMaster.testAdd(1, 2)
    println("the result of test: " + result)
  }
}
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
18/04/16 19:53:19 INFO SecurityManager: Changing view acls to: luyl
18/04/16 19:53:19 INFO SecurityManager: Changing modify acls to: luyl
18/04/16 19:53:19 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(luyl); users with modify permissions: Set(luyl)
18/04/16 19:53:20 INFO Utils: Successfully started service 'sparkDriver' on port 4040.
receive TestAdd:3, b :4
the result of test: 7
receive TestAdd:1, b :2
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
18/04/16 19:53:26 INFO SecurityManager: Changing view acls to: luyl
18/04/16 19:53:26 INFO SecurityManager: Changing modify acls to: luyl
18/04/16 19:53:26 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(luyl); users with modify permissions: Set(luyl)
the result of test: 3
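/**
 * A RpcEnv implementation must have a [[RpcEnvFactory]] implementation with an empty constructor
 * so that it can be created via Reflection.
 *
 * Correspondence with Akka:
 *   RpcEndpoint    => Actor (handles messages)
 *   RpcEndpointRef => ActorRef
 *   RpcEnv         => ActorSystem
 *
 * 1. RpcEnv manages the whole life cycle of RpcEndpoints (like ActorSystem in Akka);
 *    it registers and maintains RpcEndpoints and RpcEndpointRefs.
 * 2. Every RpcEndpoint must be registered with an RpcEnv instance; after an RpcEndpoint
 *    receives a message, its receive method processes it.
 *
 * Note: concrete RPC objects in Spark: a worker node is an endpoint, the master node is also an endpoint,
 * and so are the driver, the application and the executor backend.
 */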
private[spark] object RpcEnv {

  private def getRpcEnvFactory(conf: SparkConf): RpcEnvFactory = {
    val rpcEnvNames = Map(
      "akka" -> "org.apache.spark.rpc.akka.AkkaRpcEnvFactory",
      "netty" -> "org.apache.spark.rpc.netty.NettyRpcEnvFactory")
    val rpcEnvName = conf.get("spark.rpc", "netty") // defaults to netty
    // An unknown name is treated as a fully qualified factory class name.
    val rpcEnvFactoryClassName = rpcEnvNames.getOrElse(rpcEnvName.toLowerCase, rpcEnvName)
    Utils.classForName(rpcEnvFactoryClassName).newInstance().asInstanceOf[RpcEnvFactory]
  }

  def create(
      name: String,
      host: String,
      port: Int,
      conf: SparkConf,
      securityManager: SecurityManager,
      clientMode: Boolean = false): RpcEnv = {
    // Using Reflection to create the RpcEnv to avoid depending on Akka directly
    // The mapping between RpcEndpoint and RpcEndpointRef is kept in member variables of the dispatcher
    val config = RpcEnvConfig(conf, name, host, port, securityManager, clientMode)
    getRpcEnvFactory(conf).create(config)
  }
}
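The factory lookup above (a short name mapped to a class name, then instantiated reflectively through an empty constructor) can be tried out in isolation. The sketch below is only an illustration under assumptions: EnvFactory, NettyLikeFactory and AkkaLikeFactory are hypothetical stand-ins for RpcEnvFactory and its implementations, and create returns a String instead of a real RpcEnv.

```scala
// Hypothetical stand-ins; none of these types are Spark classes.
object RpcEnvFactorySketch {
  trait EnvFactory { def create(name: String): String }

  // Both implementations have an empty constructor so they can be created reflectively.
  class NettyLikeFactory extends EnvFactory {
    def create(name: String): String = s"netty-like env '$name'"
  }
  class AkkaLikeFactory extends EnvFactory {
    def create(name: String): String = s"akka-like env '$name'"
  }

  // Short names map to fully qualified class names.
  val factoryNames: Map[String, String] = Map(
    "netty" -> classOf[NettyLikeFactory].getName,
    "akka" -> classOf[AkkaLikeFactory].getName)

  // Unknown names are treated as class names, mirroring rpcEnvNames.getOrElse above.
  def factoryFor(configured: String): EnvFactory = {
    val className = factoryNames.getOrElse(configured.toLowerCase, configured)
    Class.forName(className).getDeclaredConstructor().newInstance().asInstanceOf[EnvFactory]
  }

  def main(args: Array[String]): Unit = {
    println(factoryFor("netty").create("sparkDriver")) // netty-like env 'sparkDriver'
    // A fully qualified class name works as well.
    println(factoryFor(classOf[AkkaLikeFactory].getName).create("sparkDriver"))
  }
}
```

Looking the class up by name keeps the calling code free of a compile-time dependency on any particular implementation, which is the reason given in the comment inside create.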
private[spark] abstract class RpcEnv(conf: SparkConf) {

  private[spark] val defaultLookupTimeout = RpcUtils.lookupRpcTimeout(conf)

  /**
   * Return RpcEndpointRef of the registered [[RpcEndpoint]]. Will be used to implement
   * [[RpcEndpoint.self]]. Return `null` if the corresponding [[RpcEndpointRef]] does not exist.
   */
  private[rpc] def endpointRef(endpoint: RpcEndpoint): RpcEndpointRef

  /**
   * Return the address that [[RpcEnv]] is listening to.
   */
  def address: RpcAddress

  /**
   * Register a [[RpcEndpoint]] with a name and return its [[RpcEndpointRef]]. [[RpcEnv]] does not
   * guarantee thread-safety.
   *
   * Both the Akka and the Netty implementation use this method to register an RPC endpoint with the RPC environment.
   * 1. In usage it corresponds to ActorSystem.actorOf in the Akka example; internally a Dispatcher
   *    maintains the registered RpcEndpoints. RpcEnv also offers several ways to obtain an RpcEndpointRef
   *    (asyncSetupEndpointRefByURI, setupEndpointRefByURI and setupEndpointRef), the stop method to remove
   *    an RpcEndpoint and the shutdown method to close the RpcEnv. It also maintains an RpcEnvFileServer,
   *    used to serve jars and files.
   */
  def setupEndpoint(name: String, endpoint: RpcEndpoint): RpcEndpointRef

  /**
   * Retrieve the [[RpcEndpointRef]] represented by `uri` asynchronously.
   */
  def asyncSetupEndpointRefByURI(uri: String): Future[RpcEndpointRef]

  /**
   * Retrieve the [[RpcEndpointRef]] represented by `uri`. This is a blocking action.
   */
  def setupEndpointRefByURI(uri: String): RpcEndpointRef = {
    defaultLookupTimeout.awaitResult(asyncSetupEndpointRefByURI(uri))
  }

  /**
   * Retrieve the [[RpcEndpointRef]] represented by `systemName`, `address` and `endpointName`.
   * This is a blocking action.
   */
  def setupEndpointRef(
      systemName: String, address: RpcAddress, endpointName: String): RpcEndpointRef = {
    setupEndpointRefByURI(uriOf(systemName, address, endpointName))
  }

  /**
   * Stop [[RpcEndpoint]] specified by `endpoint`,
   * that is, the RpcEndpoint that the given RpcEndpointRef points to.
   */
  def stop(endpoint: RpcEndpointRef): Unit

  /**
   * Shutdown this [[RpcEnv]] asynchronously. If need to make sure [[RpcEnv]] exits successfully,
   * call [[awaitTermination()]] straight after [[shutdown()]].
   */
  def shutdown(): Unit

  /**
   * Wait until [[RpcEnv]] exits.
   *
   * TODO do we need a timeout parameter?
   */
  def awaitTermination(): Unit

  /**
   * [[RpcEndpointRef]] cannot be deserialized without [[RpcEnv]]. So when deserializing any object
   * that contains [[RpcEndpointRef]]s, the deserialization codes should be wrapped by this method.
   */
  def deserialize[T](deserializationAction: () => T): T

  /**
   * Return the instance of the file server used to serve files. This may be `null` if the
   * RpcEnv is not operating in server mode.
   *
   * Related note (it describes opening a download channel rather than this accessor): a channel can be
   * opened to download a file from a given URI; if the URIs returned by the RpcEnvFileServer use the
   * "spark" scheme, the Utils class calls that method to retrieve the files.
   */
  def fileServer: RpcEnvFileServer
}
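The register-by-name and look-up-by-name contract of setupEndpoint, setupEndpointRef and stop can be sketched with an in-memory registry. This is a single-process illustration under assumptions, not how NettyRpcEnv works internally: Registry, Endpoint, EndpointRef and deliver are hypothetical names, and there is no network, URI or serialization involved.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical in-memory model of "register an endpoint under a name, talk to it through a ref".
object EndpointRegistrySketch {
  trait Endpoint { def receive(message: Any): Any }

  // A ref only knows the endpoint's name and the registry that can reach it.
  final class EndpointRef(val name: String, registry: Registry) {
    def ask(message: Any): Any = registry.deliver(name, message)
  }

  final class Registry {
    private val endpoints = new ConcurrentHashMap[String, Endpoint]()

    // Registers `endpoint` under `name`; each name can be used only once.
    def setupEndpoint(name: String, endpoint: Endpoint): EndpointRef = {
      if (endpoints.putIfAbsent(name, endpoint) != null)
        throw new IllegalArgumentException(s"There is already an endpoint called $name")
      new EndpointRef(name, this)
    }

    // Looks a ref up by name, the way a client would.
    def setupEndpointRef(name: String): Option[EndpointRef] =
      if (endpoints.containsKey(name)) Some(new EndpointRef(name, this)) else None

    // Removes the endpoint that `ref` points to.
    def stop(ref: EndpointRef): Unit = { endpoints.remove(ref.name); () }

    def deliver(name: String, message: Any): Any =
      Option(endpoints.get(name)) match {
        case Some(endpoint) => endpoint.receive(message)
        case None => throw new IllegalStateException(s"Endpoint $name is not registered")
      }
  }

  def main(args: Array[String]): Unit = {
    val registry = new Registry
    registry.setupEndpoint("TestMathMaster", new Endpoint {
      def receive(message: Any): Any = message match { case (a: Int, b: Int) => a + b }
    })
    val ref = registry.setupEndpointRef("TestMathMaster").get
    println(ref.ask((3, 4)))                                       // 7
    registry.stop(ref)
    println(registry.setupEndpointRef("TestMathMaster").isDefined) // false
  }
}
```

The caller never holds the endpoint itself, only a ref; that indirection is what lets a real RpcEnv put a network hop between the two.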

/**
 * A server used by the RpcEnv to serve files to other processes owned by the application.
 *
 * The file server can return URIs handled by common libraries (such as "http" or "hdfs"),
 * or it can return "spark" URIs which will be handled by `RpcEnv#fetchFile`.
 */
private[spark] trait RpcEnvFileServer {

  /**
   * Adds a file to be served by this RpcEnv. This is used to serve files from the driver
   * to executors when they're stored on the driver's local file system.
   *
   * In other words, the jar files submitted with spark-submit stay on the driver node;
   * a Jetty server merely provides a file service from which the executors fetch them.
   *
   * @param file Local file to serve.
   * @return A URI for the location of the file.
   */
  def addFile(file: File): String

  /**
   * Adds a jar to be served by this RpcEnv. Similar to `addFile` but for jars added using
   * `SparkContext.addJar`.
   *
   * @param file Local file to serve.
   * @return A URI for the location of the file.
   */
  def addJar(file: File): String
}

/**
 * An end point for the RPC that defines what functions to trigger given a message.
 *
 * It is guaranteed that `onStart`, `receive` and `onStop` will be called in sequence.
 *
 * The life-cycle of an endpoint is:
 *
 * constructor -> onStart -> receive* -> onStop
 *
 * Note: `receive` can be called concurrently. If you want `receive` to be thread-safe, please use
 * [[ThreadSafeRpcEndpoint]]
 *
 * If any error is thrown from one of [[RpcEndpoint]] methods except `onError`, `onError` will be
 * invoked with the cause. If `onError` throws an error, [[RpcEnv]] will ignore it.
 *
 * 1. Spark RPC is modeled on actors:
 *      RpcEndpoint    => Actor (handles messages)
 *      RpcEndpointRef => ActorRef
 *      RpcEnv         => ActorSystem
 * 2. RpcEndpoint: represents each party that needs to communicate (such as master, worker, driver);
 *    it acts on the messages it receives.
 * 3. Life cycle of an RpcEndpoint: construction -> onStart -> receive -> onStop.
 * 4. onStart is called before any message is handled; receive and receiveAndReply handle the messages
 *    that another RpcEndpoint (or the endpoint itself) delivers with send and ask respectively.
 */
private[spark] trait RpcEndpoint {

  /**
   * The [[RpcEnv]] that this [[RpcEndpoint]] is registered to.
   */
  val rpcEnv: RpcEnv

  /**
   * `self` becomes valid when `onStart` is called and becomes `null` when `onStop` is called.
   */
  final def self: RpcEndpointRef = {
    require(rpcEnv != null, "rpcEnv has not been initialized")
    rpcEnv.endpointRef(this)
  }

  /**
   * Process messages from [[RpcEndpointRef.send]] or [[RpcCallContext.reply]]. If receiving an
   * unmatched message, [[SparkException]] will be thrown and sent to `onError`.
   * That is, this handles the messages a ref delivers with send or reply.
   */
  def receive: PartialFunction[Any, Unit] = {
    case _ => throw new SparkException(self + " does not implement 'receive'")
  }

  /**
   * Process messages from [[RpcEndpointRef.ask]]. If receiving an unmatched message,
   * [[SparkException]] will be thrown and sent to `onError`.
   * This covers messages sent with both RpcEndpointRef.ask and RpcEndpointRef.askWithRetry.
   */
  def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
    case _ => context.sendFailure(new SparkException(self + " won't reply anything"))
  }

  /**
   * Invoked when any exception is thrown during handling messages.
   */
  def onError(cause: Throwable): Unit = {
    // By default, throw e and let RpcEnv handle it
    throw cause
  }

  /**
   * Invoked when `remoteAddress` is connected to the current node.
   */
  def onConnected(remoteAddress: RpcAddress): Unit = {
    // By default, do nothing.
  }

  /**
   * Invoked when `remoteAddress` is lost.
   */
  def onDisconnected(remoteAddress: RpcAddress): Unit = {
    // By default, do nothing.
  }

  /**
   * Invoked when some network error happens in the connection between the current node and
   * `remoteAddress`.
   */
  def onNetworkError(cause: Throwable, remoteAddress: RpcAddress): Unit = {
    // By default, do nothing.
  }

  /**
   * Invoked before [[RpcEndpoint]] starts to handle any message.
   */
  def onStart(): Unit = {
    // By default, do nothing.
  }

  /**
   * Invoked when [[RpcEndpoint]] is stopping. `self` will be `null` in this method and you cannot
   * use it to send or ask messages.
   */
  def onStop(): Unit = {
    // By default, do nothing.
  }
}
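The life cycle described for RpcEndpoint (onStart, then zero or more receive calls, then onStop, with handler errors routed to onError and errors from onError ignored) can be walked through with a small single-threaded driver. This is a sketch under assumptions and not Spark's Dispatcher/Inbox machinery: Endpoint, run and demo are hypothetical names, and every call happens on the calling thread.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.control.NonFatal

// Hypothetical single-threaded model of the endpoint life cycle.
object EndpointLifecycleSketch {
  trait Endpoint {
    def onStart(): Unit = ()
    def receive: PartialFunction[Any, Unit]
    def onError(cause: Throwable): Unit = throw cause
    def onStop(): Unit = ()
  }

  // Drives one endpoint through onStart -> receive* -> onStop.
  // Errors raised by a callback go to onError; errors thrown by onError itself are ignored.
  def run(endpoint: Endpoint, messages: Seq[Any]): Unit = {
    def guarded(action: => Unit): Unit =
      try action
      catch {
        case NonFatal(e) =>
          try endpoint.onError(e)
          catch { case NonFatal(_) => () }
      }

    guarded(endpoint.onStart())
    messages.foreach { m =>
      guarded {
        val handler = endpoint.receive
        if (handler.isDefinedAt(m)) handler(m)
        else throw new IllegalArgumentException(s"Unsupported message: $m")
      }
    }
    guarded(endpoint.onStop())
  }

  // Records the order in which the callbacks fire for three messages, one of them unsupported.
  def demo(): List[String] = {
    val events = ArrayBuffer.empty[String]
    val endpoint = new Endpoint {
      override def onStart(): Unit = events += "onStart"
      def receive: PartialFunction[Any, Unit] = { case s: String => events += s"receive($s)" }
      override def onError(cause: Throwable): Unit = events += s"onError(${cause.getMessage})"
      override def onStop(): Unit = events += "onStop"
    }
    run(endpoint, Seq("a", 42, "b"))
    events.toList
  }

  def main(args: Array[String]): Unit =
    // onStart, receive(a), onError(Unsupported message: 42), receive(b), onStop
    demo().foreach(println)
}
```

An unsupported message does not end the life cycle here: it is reported through onError and the next message is still delivered.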

/**
 * A reference for a remote [[RpcEndpoint]]. [[RpcEndpointRef]] is thread-safe.
 *
 *   RpcEndpoint    => Actor (handles messages)
 *   RpcEndpointRef => ActorRef
 *   RpcEnv         => ActorSystem
 *
 * RpcEndpointRef corresponds to ActorRef in the Akka example: it sends messages to the RpcEndpoint it refers to.
 */
private[spark] abstract class RpcEndpointRef(conf: SparkConf)
extends Serializable with Logging {
private[this] val maxRetries = RpcUtils.numRetries(conf)
private[this] val retryWaitMs = RpcUtils.retryWaitMs(conf)
private[this] val defaultAskTimeout = RpcUtils.askRpcTimeout(conf)
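
  /**
   * return the address for the [[RpcEndpointRef]]
   *
   * address and name identify the RpcEndpoint that this RpcEndpointRef belongs to.
   * NettyRpcEndpointRef is the concrete implementation; internally it sends messages through
   * components such as Dispatcher, Inbox and Outbox.
   */
  def address: RpcAddress

  def name: String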

  /**
   * Sends a one-way asynchronous message. Fire-and-forget semantics.
   * It only sends the message, like `!` in Akka.
   */
  def send(message: Any): Unit

  /**
   * Send a message to the corresponding [[RpcEndpoint.receiveAndReply]] and return a [[Future]] to
   * receive the reply within the specified timeout.
   *
   * This method only sends the message once and never retries.
   */
  def ask[T: ClassTag](message: Any, timeout: RpcTimeout): Future[T]

  /**
   * Send a message to the corresponding [[RpcEndpoint.receiveAndReply]] and return a [[Future]] to
   * receive the reply within a default timeout.
   *
   * This method only sends the message once and never retries.
   */
  def ask[T: ClassTag](message: Any): Future[T] = ask(message, defaultAskTimeout)

  def askWithRetry[T: ClassTag](message: Any): T = askWithRetry(message, defaultAskTimeout)