Under the hood, RpcEnv is implemented on top of Netty (the default RPC backend in this Spark version).
I. Simulating Master-Worker communication with a custom RpcEndpoint
1. Define a custom endpoint by implementing RpcEndpoint (here, ThreadSafeRpcEndpoint)
import org.apache.spark.rpc._

// A custom endpoint. Extending ThreadSafeRpcEndpoint marks it as thread-safe:
// messages are delivered to it one at a time.
class TestMathEndpoint(override val rpcEnv: RpcEnv) extends ThreadSafeRpcEndpoint {
  // For one-way messages (no reply), override RpcEndpoint.receive.
  // To reply to the sender, override RpcEndpoint.receiveAndReply, which is passed an
  // RpcCallContext referencing the sender; reply via RpcCallContext.reply(..).
  // The body is a partial function of case patterns: the first type parameter (Any) is
  // the message type matched by the cases, the second (Unit) is the cases' return type.
  override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
    case TestAdd(a: Int, b: Int) =>
      println("receive TestAdd: " + a + ", b: " + b)
      context.reply(a + b)
    case TestSub(a: Int, b: Int) =>
      println("receive TestSub: " + a + ", b: " + b)
      context.reply(a - b)
  }
}
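For comparison, here is a minimal sketch of a one-way endpoint that only overrides receive, using a hypothetical TestLog message (not part of the original example):

// Hypothetical one-way message and endpoint; a sketch only.
case class TestLog(msg: String)

class TestLogEndpoint(override val rpcEnv: RpcEnv) extends ThreadSafeRpcEndpoint {
  // receive handles messages delivered with RpcEndpointRef.send; nothing is sent back.
  override def receive: PartialFunction[Any, Unit] = {
    case TestLog(msg) => println("got log message: " + msg)
  }
}
// A ref obtained for this endpoint would use fire-and-forget delivery:
//   logRef.send(TestLog("hello"))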
2. Exchange messages as case classes
case class TestAdd(a: Int, b: Int)
case class TestSub(a: Int, b: Int)
3. Wrap message sending in a business class
import org.apache.spark.rpc._
import org.apache.spark.util.{RpcUtils, ThreadUtils}

// MathMaster wraps the RpcEndpointRef with business logic, much like a Spring service:
// it holds the RpcEndpointRef, implements the business methods, and returns the
// endpoint's answers to the caller.
class MathMaster(var driverEndpoint: RpcEndpointRef) {
  def testAdd(a: Int, b: Int): Int = {
    // Ask the endpoint and block until the reply arrives
    driverEndpoint.askWithRetry[Int](TestAdd(a, b))
  }
  def testSub(a: Int, b: Int): Int = {
    driverEndpoint.askWithRetry[Int](TestSub(a, b))
  }
}
object MathMaster {
  val DRIVER_ENDPOINT_NAME = "TestMathMaster"
}
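askWithRetry blocks and retries internally (it was removed in later Spark versions); the plain ask is the non-blocking alternative and returns a Future. A minimal sketch using the same TestAdd message:

import scala.concurrent.Await
import scala.concurrent.duration._

// Sketch: the one-shot, Future-based variant of the same call.
def testAddAsync(ref: RpcEndpointRef, a: Int, b: Int): Int = {
  val future = ref.ask[Int](TestAdd(a, b)) // sends once, never retries
  Await.result(future, 10.seconds)         // block here only for the demo
}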
4. Create a server: build an RpcEnv, register TestMathEndpoint in it, then call the business class
import org.apache.spark.rpc._
import org.apache.spark._
import org.apache.spark.util.{RpcUtils, ThreadUtils}

/**
 * 1. Every RpcEndpoint must be registered with an RpcEnv instance under a name, so that
 *    clients can look up the endpoint's RpcEndpointRef by that name and communicate with it.
 *    When the RpcEndpoint receives a message, its receive method processes it;
 * 2. if the message requires a reply, it is dispatched to receiveAndReply instead
 *    (the reply goes back to the sender via RpcCallContext.reply); messages needing
 *    no reply go to receive.
 */
object MathServer {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
    val systemName = "sparkDriver"
    val hostname = "127.0.0.1"
    val port = 4040
    // An RpcEnv container is obtained via RpcEnv.create; the last argument is clientMode,
    // false meaning this is the server side
    val securityManager = new SecurityManager(conf)
    val rpcEnv = RpcEnv.create(systemName, hostname, port, conf, securityManager, clientMode = false)
    // Endpoints are registered into the RpcEnv container:
    // rpcEnv.setupEndpoint("endpoint name", ThreadSafeRpcEndpoint instance) returns the RpcEndpointRef
    val testMathMaster = new MathMaster(rpcEnv.setupEndpoint(MathMaster.DRIVER_ENDPOINT_NAME, new TestMathEndpoint(rpcEnv)))
    // The result comes back through RpcEndpointRef.askWithRetry[Int](message case class)
    val result = testMathMaster.testAdd(3, 4)
    println("the result of test: " + result)
    rpcEnv.awaitTermination()
  }
}
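Note that port 4040 also happens to be the default Spark UI port; as the source comment later shows, passing 0 lets the RpcEnv allocate a free port, which can then be read back from rpcEnv.address. A minimal sketch, reusing the conf and securityManager from the server above:

// Sketch: bind to an ephemeral port and discover what was allocated.
val dynEnv = RpcEnv.create("sparkDriver", "127.0.0.1", 0, conf, securityManager, clientMode = false)
println("bound to " + dynEnv.address) // address.port now holds the dynamically chosen port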
5. Create a client that talks to the server
import org.apache.spark.rpc._
import org.apache.spark._

object MathClient {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
    val systemName = "sparkDriver"
    val hostname = "127.0.0.1"
    val port = 4040
    // On the client side, pass clientMode = true to RpcEnv.create; no server socket is bound
    val securityManager = new SecurityManager(conf)
    val rpcEnv = RpcEnv.create(systemName, hostname, port, conf, securityManager, clientMode = true)
    // The client fetches the remote RpcEndpointRef from the RpcEnv container:
    // rpcEnv.setupEndpointRef("system name", RpcAddress (the remote endpoint's host + port), "ref name")
    val testMathMaster = new MathMaster(rpcEnv.setupEndpointRef(systemName, RpcAddress(hostname, 4040), MathMaster.DRIVER_ENDPOINT_NAME))
    val result = testMathMaster.testAdd(1, 2)
    println("the result of test: " + result)
    rpcEnv.awaitTermination()
  }
}
=====》Server console output:
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
18/04/16 19:53:19 INFO SecurityManager: Changing view acls to: luyl
18/04/16 19:53:19 INFO SecurityManager: Changing modify acls to: luyl
18/04/16 19:53:19 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(luyl); users with modify permissions: Set(luyl)
18/04/16 19:53:20 INFO Utils: Successfully started service 'sparkDriver' on port 4040.
receive TestAdd: 3, b: 4
the result of test: 7
receive TestAdd: 1, b: 2
=====》Client console output:
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
18/04/16 19:53:26 INFO SecurityManager: Changing view acls to: luyl
18/04/16 19:53:26 INFO SecurityManager: Changing modify acls to: luyl
18/04/16 19:53:26 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(luyl); users with modify permissions: Set(luyl)
the result of test: 3
II. RpcEnv source analysis: RpcEnv is the analogue of ActorSystem
/**
 * A RpcEnv implementation must have a [[RpcEnvFactory]] implementation with an empty constructor
 * so that it can be created via Reflection.
 *
 * Akka analogy:
 *   RpcEndpoint    => Actor      (processes messages)
 *   RpcEndpointRef => ActorRef
 *   RpcEnv         => ActorSystem
 *
 * 1. RpcEnv manages the whole life cycle of RpcEndpoints (like Akka's ActorSystem): it registers
 *    and maintains RpcEndpoints and RpcEndpointRefs, covering endpoint registration (via
 *    setupEndpoint), message routing between endpoints, and stopping endpoints.
 * 2. Every RpcEndpoint must be registered with an RpcEnv instance; incoming messages are handled
 *    by its receive method, or by receiveAndReply when a reply is expected
 *    (the reply is sent back to the sender via RpcCallContext.reply).
 *
 * Note: the concrete RPC participants in Spark are all endpoints: a worker node is an endpoint,
 * the master node is an endpoint, and so are the driver, application, and executor backend.
 */
private[spark] object RpcEnv {
  // Instantiate the concrete RpcEnvFactory via reflection; there are two implementations,
  // akka and netty. If spark.rpc is not set, the Netty-based RPC is used.
  private def getRpcEnvFactory(conf: SparkConf): RpcEnvFactory = {
    val rpcEnvNames = Map(
      "akka" -> "org.apache.spark.rpc.akka.AkkaRpcEnvFactory",
      "netty" -> "org.apache.spark.rpc.netty.NettyRpcEnvFactory")
    val rpcEnvName = conf.get("spark.rpc", "netty") // defaults to netty
    val rpcEnvFactoryClassName = rpcEnvNames.getOrElse(rpcEnvName.toLowerCase, rpcEnvName)
    Utils.classForName(rpcEnvFactoryClassName).newInstance().asInstanceOf[RpcEnvFactory]
  }

  // If port is 0, a nonzero port is dynamically allocated and exposed as RpcEnv.address.port
  def create(
      name: String,
      host: String,
      port: Int,
      conf: SparkConf,
      securityManager: SecurityManager,
      clientMode: Boolean = false): RpcEnv = {
    // Using Reflection to create the RpcEnv to avoid depending on Akka directly
    // 1. The RpcEnv is produced by an RpcEnvFactory; NettyRpcEnvFactory builds a NettyRpcEnv.
    // 2. When RpcEnv.setupEndpoint registers an endpoint, NettyRpcEnv records the mapping from
    //    the endpoint's name to the endpoint itself, and from the RpcEndpoint to its
    //    RpcEndpointRef, in the corresponding member fields of its Dispatcher.
    val config = RpcEnvConfig(conf, name, host, port, securityManager, clientMode)
    // RpcEnvFactory is responsible for creating the RpcEnv via its create method; Netty by default
    getRpcEnvFactory(conf).create(config)
  }
}
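So in this Spark version the backend can be switched through the spark.rpc key; a minimal sketch:

// Sketch: selecting the RPC backend explicitly (netty is already the default).
val conf = new SparkConf().set("spark.rpc", "netty")
// or, to fall back to the Akka-based implementation:
// val conf = new SparkConf().set("spark.rpc", "akka")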
...
private[spark] abstract class RpcEnv(conf: SparkConf) {
  private[spark] val defaultLookupTimeout = RpcUtils.lookupRpcTimeout(conf)

  /**
   * Return RpcEndpointRef of the registered [[RpcEndpoint]]. Will be used to implement
   * [[RpcEndpoint.self]]. Return `null` if the corresponding [[RpcEndpointRef]] does not exist.
   */
  private[rpc] def endpointRef(endpoint: RpcEndpoint): RpcEndpointRef
  /**
   * Return the address that [[RpcEnv]] is listening to.
   */
  def address: RpcAddress

  /**
   * Register a [[RpcEndpoint]] with a name and return its [[RpcEndpointRef]]. [[RpcEnv]] does not
   * guarantee thread-safety.
   *
   * Both the akka and netty implementations use this method to register a remote-callable object
   * into the RPC container. It corresponds to ActorSystem.actorOf in the Akka example; internally
   * a Dispatcher maintains the registered RpcEndpoints. RpcEnv also offers several ways to obtain
   * an RpcEndpointRef (asyncSetupEndpointRefByURI, setupEndpointRefByURI and setupEndpointRef),
   * a stop method to remove an RpcEndpoint, a shutdown method to close the RpcEnv, and it
   * maintains an RpcEnvFileServer for uploading jars and files.
   */
  def setupEndpoint(name: String, endpoint: RpcEndpoint): RpcEndpointRef
  /**
   * Retrieve the [[RpcEndpointRef]] represented by `uri` asynchronously.
   */
  def asyncSetupEndpointRefByURI(uri: String): Future[RpcEndpointRef]

  /**
   * Retrieve the [[RpcEndpointRef]] represented by `uri`. This is a blocking action.
   */
  def setupEndpointRefByURI(uri: String): RpcEndpointRef = {
    defaultLookupTimeout.awaitResult(asyncSetupEndpointRefByURI(uri))
  }

  /**
   * Retrieve the [[RpcEndpointRef]] represented by `systemName`, `address` and `endpointName`.
   * This is a blocking action.
   */
  def setupEndpointRef(
      systemName: String, address: RpcAddress, endpointName: String): RpcEndpointRef = {
    setupEndpointRefByURI(uriOf(systemName, address, endpointName))
  }
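setupEndpointRef simply builds a URI via uriOf and delegates to the URI-based lookup; assuming this version's Netty URI format of spark://name@host:port, the client lookup from part I could equally be written as:

// Sketch: looking the endpoint up by URI instead of (systemName, address, endpointName).
val ref = rpcEnv.setupEndpointRefByURI("spark://TestMathMaster@127.0.0.1:4040")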
  /**
   * Stop [[RpcEndpoint]] specified by `endpoint`: stops the RpcEndpoint behind the given
   * RpcEndpointRef.
   */
  def stop(endpoint: RpcEndpointRef): Unit

  /**
   * Shutdown this [[RpcEnv]] asynchronously. If need to make sure [[RpcEnv]] exits successfully,
   * call [[awaitTermination()]] straight after [[shutdown()]].
   */
  def shutdown(): Unit

  /**
   * Wait until [[RpcEnv]] exits.
   *
   * TODO do we need a timeout parameter?
   */
  def awaitTermination(): Unit
...
  /**
   * [[RpcEndpointRef]] cannot be deserialized without [[RpcEnv]]. So when deserializing any object
   * that contains [[RpcEndpointRef]]s, the deserialization codes should be wrapped by this method.
   */
  def deserialize[T](deserializationAction: () => T): T

  /**
   * Return the instance of the file server used to serve files. This may be `null` if the
   * RpcEnv is not operating in server mode.
   * (The related openChannel method opens a channel to download a file from a given URI; the
   * Utils class calls it to retrieve files whose URIs, returned by RpcEnvFileServer, use the
   * "spark" scheme.)
   */
  def fileServer: RpcEnvFileServer
...
}
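The deserialize hook matters because a NettyRpcEndpointRef needs access to the current RpcEnv while it is being read back. A minimal sketch of how a caller might wrap deserialization, assuming a SerializerInstance (e.g. from JavaSerializer) is at hand; the helper name is hypothetical:

import java.nio.ByteBuffer
import scala.reflect.ClassTag
import org.apache.spark.serializer.SerializerInstance

// Sketch: any payload that may contain RpcEndpointRefs is deserialized inside env.deserialize.
def readPayload[T: ClassTag](env: RpcEnv, si: SerializerInstance, bytes: ByteBuffer): T = {
  env.deserialize { () =>
    si.deserialize[T](bytes) // refs inside are re-bound to `env` during this call
  }
}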
/**
 * A server used by the RpcEnv to serve files to other processes owned by the application.
 * The file server can return URIs handled by common libraries (such as "http" or "hdfs"), or it
 * can return "spark" URIs which will be handled by `RpcEnv#fetchFile`.
 */
private[spark] trait RpcEnvFileServer {

  /**
   * Adds a file to be served by this RpcEnv. This is used to serve files from the driver
   * to executors when they're stored on the driver's local file system.
   * In other words, a jar submitted via spark-submit stays on the driver node; the file server
   * (a Jetty server here) merely provides a download service over it for the executors.
   *
   * @param file Local file to serve.
   * @return A URI for the location of the file.
   */
  def addFile(file: File): String

  /**
   * Adds a jar to be served by this RpcEnv. Similar to `addFile` but for jars added using
   * `SparkContext.addJar`.
   *
   * @param file Local file to serve.
   * @return A URI for the location of the file.
   */
  def addJar(file: File): String
}
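These hooks are what SparkContext.addFile / addJar ultimately feed into; a minimal usage sketch (the paths are illustrative):

// Sketch: files/jars registered on the driver are served to executors by the RpcEnv file server.
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("file-server-demo"))
sc.addFile("/tmp/lookup.txt") // illustrative path; internally reaches RpcEnvFileServer.addFile
sc.addJar("/tmp/my-udfs.jar") // illustrative path; internally reaches RpcEnvFileServer.addJar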
...
III. RpcEndpoint source analysis: the analogue of Actor
/**
 * An end point for the RPC that defines what functions to trigger given a message.
 *
 * It is guaranteed that `onStart`, `receive` and `onStop` will be called in sequence.
 *
 * The life-cycle of an endpoint is:
 *
 * constructor -> onStart -> receive* -> onStop
 *
 * Note: `receive` can be called concurrently. If you want `receive` to be thread-safe, please use
 * [[ThreadSafeRpcEndpoint]]
 *
 * If any error is thrown from one of [[RpcEndpoint]] methods except `onError`, `onError` will be
 * invoked with the cause. If `onError` throws an error, [[RpcEnv]] will ignore it.
 *
 * Spark RPC is modeled on actors:
 *   RpcEndpoint    => Actor      (processes messages)
 *   RpcEndpointRef => ActorRef
 *   RpcEnv         => ActorSystem
 *
 * 1. Extending RpcEndpoint means the endpoint may be invoked concurrently; extending
 *    ThreadSafeRpcEndpoint means concurrent delivery to it is not allowed.
 * 2. An RpcEndpoint represents a communicating party (e.g. master, worker, driver) that reacts
 *    to the messages it receives.
 * 3. The life cycle of an RpcEndpoint is: constructor -> onStart -> receive -> onStop.
 * 4. onStart is called before any message is handled; receive and receiveAndReply handle
 *    messages that another RpcEndpoint (possibly itself) delivered via send and ask respectively.
 */
private[spark] trait RpcEndpoint {

  /**
   * The [[RpcEnv]] that this [[RpcEndpoint]] is registered to.
   */
  val rpcEnv: RpcEnv

  /**
   * `self` becomes valid once `onStart` is called, and becomes `null` once `onStop` is called.
   */
  final def self: RpcEndpointRef = {
    require(rpcEnv != null, "rpcEnv has not been initialized")
    rpcEnv.endpointRef(this)
  }

  /**
   * Process messages from [[RpcEndpointRef.send]] or [[RpcCallContext.reply]]. If receiving an
   * unmatched message, [[SparkException]] will be thrown and sent to `onError`.
   *
   * That is: handles messages a ref delivered via send or reply; an unmatched message raises
   * a SparkException.
   */
  def receive: PartialFunction[Any, Unit] = {
    case _ => throw new SparkException(self + " does not implement 'receive'")
  }
  /**
   * Process messages from [[RpcEndpointRef.ask]]. If receiving an unmatched message,
   * [[SparkException]] will be thrown and sent to `onError`.
   * Handles messages coming from RpcEndpointRef.ask and RpcEndpointRef.askWithRetry.
   */
  def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
    case _ => context.sendFailure(new SparkException(self + " won't reply anything"))
  }
  /**
   * Invoked when any exception is thrown during handling messages.
   */
  def onError(cause: Throwable): Unit = {
    // By default, throw e and let RpcEnv handle it
    throw cause
  }

  /**
   * Invoked when `remoteAddress` is connected to the current node.
   */
  def onConnected(remoteAddress: RpcAddress): Unit = {
    // By default, do nothing.
  }

  /**
   * Invoked when `remoteAddress` is lost, i.e. the remote connection is dropped.
   */
  def onDisconnected(remoteAddress: RpcAddress): Unit = {
    // By default, do nothing.
  }

  /**
   * Invoked when some network error happens in the connection between the current node and
   * `remoteAddress`.
   */
  def onNetworkError(cause: Throwable, remoteAddress: RpcAddress): Unit = {
    // By default, do nothing.
  }

  /**
   * Invoked before [[RpcEndpoint]] starts to handle any message, i.e. right after the
   * constructor runs.
   */
  def onStart(): Unit = {
    // By default, do nothing.
  }

  /**
   * Invoked when [[RpcEndpoint]] is stopping. `self` will be `null` in this method and you cannot
   * use it to send or ask messages.
   */
  def onStop(): Unit = {
    // By default, do nothing.
  }
  ...
}
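A minimal sketch of an endpoint exercising these life-cycle hooks, using a hypothetical heartbeat receiver (not from the original example):

// Hypothetical endpoint illustrating constructor -> onStart -> receive* -> onStop.
case class Heartbeat(workerId: String)

class HeartbeatEndpoint(override val rpcEnv: RpcEnv) extends ThreadSafeRpcEndpoint {
  override def onStart(): Unit =
    println("ready; self ref is now valid: " + self)

  override def receive: PartialFunction[Any, Unit] = {
    case Heartbeat(id) => println("heartbeat from " + id)
  }

  override def onDisconnected(remoteAddress: RpcAddress): Unit =
    println("lost connection to " + remoteAddress)

  override def onStop(): Unit =
    println("stopping; self is null here")
}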
IV. RpcEndpointRef source analysis: the analogue of ActorRef
/**
 * A reference for a remote [[RpcEndpoint]]. [[RpcEndpointRef]] is thread-safe.
 *
 * RpcEndpointRef corresponds to ActorRef in the actor example: it sends messages to its
 * RpcEndpoint (RpcEndpoint => Actor, RpcEndpointRef => ActorRef, RpcEnv => ActorSystem).
 */
private[spark] abstract class RpcEndpointRef(conf: SparkConf)
  extends Serializable with Logging {

  private[this] val maxRetries = RpcUtils.numRetries(conf)
  private[this] val retryWaitMs = RpcUtils.retryWaitMs(conf)
  private[this] val defaultAskTimeout = RpcUtils.askRpcTimeout(conf)

  /**
   * return the address for the [[RpcEndpointRef]]
   * address and name identify the RpcEndpoint this RpcEndpointRef belongs to.
   * NettyRpcEndpointRef is the concrete implementation; internally it sends messages through
   * the Dispatcher, Inbox and Outbox components.
   */
  def address: RpcAddress

  def name: String

  /**
   * Sends a one-way asynchronous message. Fire-and-forget semantics, like Akka's `!` (tell).
   */
  def send(message: Any): Unit

  /**
   * Send a message to the corresponding [[RpcEndpoint.receiveAndReply]] and return a [[Future]] to
   * receive the reply within the specified timeout.
   *
   * This method only sends the message once and never retries.
   */
  def ask[T: ClassTag](message: Any, timeout: RpcTimeout): Future[T]

  /**
   * Send a message to the corresponding [[RpcEndpoint.receiveAndReply]] and return a [[Future]] to
   * receive the reply within a default timeout.
   *
   * This method only sends the message once and never retries.
   */
  def ask[T: ClassTag](message: Any): Future[T] = ask(message, defaultAskTimeout)

  def askWithRetry[T: ClassTag](message: Any): T = askWithRetry(message, defaultAskTimeout)
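To summarize the three delivery styles on a ref, a sketch against a hypothetical ref whose endpoint implements both receive and receiveAndReply (TestLog is the hypothetical message sketched in part I):

// Sketch: the three ways to talk to an endpoint through its ref.
ref.send(TestLog("fire and forget"))                 // one-way, no reply expected
val f: Future[Int] = ref.ask[Int](TestAdd(1, 2))     // async, one attempt, reply as a Future
val sum: Int = ref.askWithRetry[Int](TestAdd(1, 2))  // blocking, retried up to maxRetries times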