spark-core_17: Simulating Master-Worker communication with a custom RpcEnv, plus source analysis of RpcEnv, RpcEndpoint and RpcEndpointRef


Under the hood, RpcEnv is implemented on Netty (the default; an Akka-based factory also exists, as the source analysis below shows).

I. Simulating Master-Worker communication with a custom RpcEnv

1. Define the custom endpoint, which needs to implement RpcEndpoint

// A custom RPC endpoint. It extends ThreadSafeRpcEndpoint, which marks it as thread-safe
// (messages are delivered to it one at a time).
class TestMathEndpoint(override val rpcEnv: RpcEnv) extends ThreadSafeRpcEndpoint {

  // If no reply is needed, override the parent RpcEndpoint's receive method.
  // To reply to the sender, override receiveAndReply instead: it is passed the sender's
  // handle, an RpcCallContext, and you reply via RpcCallContext.reply(..).
  // The partial function holds the patterns directly; its first type parameter is what
  // the case clauses receive, the second is the case clauses' return type.
  override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
    case TestAdd(a: Int, b: Int) =>
      println("receive TestAdd: " + a + ", b: " + b)
      context.reply(a + b)
    case TestSub(a: Int, b: Int) =>
      println("receive TestSub: " + a + ", b: " + b)
      context.reply(a - b)
  }
}
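For contrast with receiveAndReply, a one-way message (delivered with RpcEndpointRef.send) is handled by receive, which gets no RpcCallContext because nothing is sent back. A minimal sketch, where TestLog is a made-up message type rather than part of the example above:

import org.apache.spark.rpc._

// Hypothetical no-reply message
case class TestLog(msg: String)

// Sketch: an endpoint that only handles fire-and-forget messages
class TestLogEndpoint(override val rpcEnv: RpcEnv) extends ThreadSafeRpcEndpoint {
  override def receive: PartialFunction[Any, Unit] = {
    case TestLog(msg) => println("receive TestLog: " + msg) // no reply is sent
  }
}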

2. Use case classes as the messages exchanged (they pattern-match cleanly and are serializable by default)

case class TestAdd(a: Int, b: Int)

case class TestSub(a: Int, b: Int)

3. Wrap message sending in a business-logic class

import org.apache.spark.rpc._
import org.apache.spark.util.{RpcUtils, ThreadUtils}

// MathMaster is the business-logic class wrapped around the RpcEndpointRef, much like a
// Spring service: it holds the RpcEndpointRef, implements the business methods, and hands
// each request to the RpcEndpointRef.
class MathMaster(var driverEndpoint: RpcEndpointRef) {

  def testAdd(a: Int, b: Int): Int = {
    // Send the message and block for the endpoint's reply
    driverEndpoint.askWithRetry[Int](TestAdd(a, b))
  }

  def testSub(a: Int, b: Int): Int = {
    driverEndpoint.askWithRetry[Int](TestSub(a, b))
  }
}

object MathMaster {
  val DRIVER_ENDPOINT_NAME = "TestMathMaster"
}
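Note that askWithRetry blocks and retries internally (governed by spark.rpc.numRetries); it belongs to the older RPC API this post targets and was removed in later Spark versions in favor of plain ask. As a rough sketch under those assumptions, the same round trip without retries could be written as:

import org.apache.spark.SparkConf
import org.apache.spark.rpc.RpcEndpointRef
import org.apache.spark.util.RpcUtils

// Sketch: a non-retrying testAdd built on ask, which sends once and returns a Future.
def testAddOnce(ref: RpcEndpointRef, conf: SparkConf, a: Int, b: Int): Int = {
  val timeout = RpcUtils.askRpcTimeout(conf)        // the default ask timeout from the conf
  val future = ref.ask[Int](TestAdd(a, b), timeout) // sends exactly once, unlike askWithRetry
  timeout.awaitResult(future)                       // block until the reply arrives (or times out)
}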

4. Create a server: build an RpcEnv, register the TestMathEndpoint in it, then call the business class

import org.apache.spark.rpc._
import org.apache.spark._
import org.apache.spark.util.{RpcUtils, ThreadUtils}

/**
 * 1. Every RpcEndpoint must be registered with an RpcEnv instance (registration assigns a
 *    name, so a client can look up the endpoint's RpcEndpointRef by that name and talk to
 *    it); when the RpcEndpoint receives a message, its receive method handles it.
 *
 * 2. If the RpcEndpoint receives a message that requires a reply, it is handed to its own
 *    receiveAndReply method (the reply is sent back to the sender through the reply method
 *    of RpcCallContext); messages that need no reply go to receive.
 */

object MathServer {

  def main(args: Array[String]) = {
    val conf = new SparkConf()
    val systemName = "sparkDriver"
    val hostname = "127.0.0.1"
    val port = 4040
    // An RpcEnv container is obtained through RpcEnv.create; the last argument says whether
    // this is the client side, and false means we are on the server side.
    val securityManager = new SecurityManager(conf)
    val rpcEnv = RpcEnv.create(systemName, hostname, port, conf, securityManager, clientMode = false)

    // The endpoint lives inside the RpcEnv container. Register it with
    // rpcEnv.setupEndpoint("endpoint name", ThreadSafeRpcEndpoint instance), which returns its RpcEndpointRef:
    val testMathMaster = new MathMaster(rpcEnv.setupEndpoint(MathMaster.DRIVER_ENDPOINT_NAME, new TestMathEndpoint(rpcEnv)))

    // The result comes back through RpcEndpointRef.askWithRetry[Int](case class)
    val result = testMathMaster.testAdd(3, 4)
    println("the result of test: " + result)
    rpcEnv.awaitTermination()
  }
}
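awaitTermination blocks the main thread until the RpcEnv exits, which is why the server keeps serving after printing its own result. To stop it cleanly you would pair shutdown with awaitTermination; a minimal sketch (the shutdown hook is an addition of mine, not part of the original example):

import org.apache.spark.rpc.RpcEnv

// Sketch: shut the RpcEnv down when the JVM exits, so awaitTermination can return.
def serveUntilShutdown(rpcEnv: RpcEnv): Unit = {
  sys.addShutdownHook {
    rpcEnv.shutdown()        // asynchronous: starts tearing the RpcEnv down
  }
  rpcEnv.awaitTermination()  // blocks until the RpcEnv has fully exited
}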

5. Create a client that interacts with the server

import org.apache.spark.rpc._
import org.apache.spark._

object MathClient {

  def main(args: Array[String]) = {
    val conf = new SparkConf()
    val systemName = "sparkDriver"
    val hostname = "127.0.0.1"
    val port = 4040
    // Again obtain an RpcEnv container via RpcEnv.create; here the last argument is
    // clientMode = true, so no server is started on this side.
    val securityManager = new SecurityManager(conf)
    val rpcEnv = RpcEnv.create(systemName, hostname, port, conf, securityManager, clientMode = true)

    // The client fetches the RpcEndpointRef from the RpcEnv container with
    // rpcEnv.setupEndpointRef("system name", RpcAddress (the remote endpoint's host + port), "ref name"):
    val testMathMaster = new MathMaster(rpcEnv.setupEndpointRef(systemName, RpcAddress(hostname, port), MathMaster.DRIVER_ENDPOINT_NAME))

    val result = testMathMaster.testAdd(1, 2)
    println("the result of test: " + result)
    rpcEnv.awaitTermination()
  }
}

=====> Output printed when starting the server:

log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
18/04/16 19:53:19 INFO SecurityManager: Changing view acls to: luyl
18/04/16 19:53:19 INFO SecurityManager: Changing modify acls to: luyl
18/04/16 19:53:19 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(luyl); users with modify permissions: Set(luyl)
18/04/16 19:53:20 INFO Utils: Successfully started service 'sparkDriver' on port 4040.
receive TestAdd: 3, b: 4
the result of test: 7
receive TestAdd: 1, b: 2

=====> Output printed by the client:

log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
18/04/16 19:53:26 INFO SecurityManager: Changing view acls to: luyl
18/04/16 19:53:26 INFO SecurityManager: Changing modify acls to: luyl
18/04/16 19:53:26 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(luyl); users with modify permissions: Set(luyl)
the result of test: 3

 

II. Source analysis of RpcEnv, the counterpart of an ActorSystem

/**
 * A RpcEnv implementation must have a [[RpcEnvFactory]] implementation with an empty
 * constructor so that it can be created via Reflection.
 *
 *   RpcEndpoint    => Actor        (handles messages)
 *   RpcEndpointRef => ActorRef
 *   RpcEnv         => ActorSystem
 *
 * 1. RpcEnv manages the whole life cycle of RpcEndpoints (much like Akka's ActorSystem):
 *    it registers and maintains RpcEndpoints and RpcEndpointRefs, covering endpoint
 *    registration (implementing and calling setupEndpoint), message routing between
 *    endpoints, and stopping endpoints.
 * 2. Every RpcEndpoint must be registered with an RpcEnv instance; when a message arrives,
 *    the endpoint's receive method handles it. If a reply is required, receiveAndReply is
 *    used instead (the reply goes back to the sender via RpcCallContext.reply).
 *
 * Note: concrete rpc objects in Spark include a worker node (an endpoint), the master node
 * (also an endpoint), as well as the driver, application and executor backend.
 */

private[spark] object RpcEnv {

  // Reflectively instantiate the concrete RpcEnvFactory; there are two, akka and netty.
  // Without a spark.rpc setting, the netty RPC is used.
  private def getRpcEnvFactory(conf: SparkConf): RpcEnvFactory = {
    val rpcEnvNames = Map(
      "akka" -> "org.apache.spark.rpc.akka.AkkaRpcEnvFactory",
      "netty" -> "org.apache.spark.rpc.netty.NettyRpcEnvFactory")
    val rpcEnvName = conf.get("spark.rpc", "netty") // netty is the default
    val rpcEnvFactoryClassName = rpcEnvNames.getOrElse(rpcEnvName.toLowerCase, rpcEnvName)
    Utils.classForName(rpcEnvFactoryClassName).newInstance().asInstanceOf[RpcEnvFactory]
  }

  // If port is 0, RpcEnv.address.port is dynamically assigned a non-zero port.
  def create(
      name: String,
      host: String,
      port: Int,
      conf: SparkConf,
      securityManager: SecurityManager,
      clientMode: Boolean = false): RpcEnv = {
    // Using Reflection to create the RpcEnv to avoid depending on Akka directly.
    // 1. The RpcEnvFactory produces the RpcEnv; NettyRpcEnvFactory produces a NettyRpcEnv.
    // 2. When setupEndpoint registers an endpoint with the RpcEnv, NettyRpcEnv internally
    //    stores the endpoint name-to-endpoint mapping and the rpcEndpoint-to-rpcEndpointRef
    //    mapping in member fields of its dispatcher.
    val config = RpcEnvConfig(conf, name, host, port, securityManager, clientMode)
    // RpcEnvFactory is responsible for creating the RpcEnv via its create method; netty by default
    getRpcEnvFactory(conf).create(config)
  }
}
...
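As the comment on create notes, passing port 0 makes the RpcEnv bind a dynamically assigned, non-zero port. A minimal sketch of that behavior (the object and names are mine):

import org.apache.spark.{SecurityManager, SparkConf}
import org.apache.spark.rpc.RpcEnv

// Sketch: let the RpcEnv pick a free port by passing port = 0.
object DynamicPortExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
    val rpcEnv = RpcEnv.create("demo", "127.0.0.1", 0, conf, new SecurityManager(conf), clientMode = false)
    println("bound to: " + rpcEnv.address) // address.port now holds the actual non-zero port
    rpcEnv.shutdown()
  }
}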

private[spark] abstract class RpcEnv(conf: SparkConf) {

  private[spark] val defaultLookupTimeout = RpcUtils.lookupRpcTimeout(conf)

  /**
   * Return RpcEndpointRef of the registered [[RpcEndpoint]]. Will be used to implement
   * [[RpcEndpoint.self]]. Return `null` if the corresponding [[RpcEndpointRef]] does not exist.
   *
   * That is: given an RpcEndpoint, return its RpcEndpointRef, or null if there is none.
   */
  private[rpc] def endpointRef(endpoint: RpcEndpoint): RpcEndpointRef

  /**
   * Return the address that [[RpcEnv]] is listening to.
   */
  def address: RpcAddress

  /**
   * Register a [[RpcEndpoint]] with a name and return its [[RpcEndpointRef]]. [[RpcEnv]] does not
   * guarantee thread-safety.
   *
   * Both the akka and netty implementations use this method to register a remote Rpc object
   * into the Rpc container. In usage it corresponds to ActorSystem's actorOf in the actor
   * example; internally a Dispatcher maintains the registered RpcEndpoints. RpcEnv also
   * provides several ways to obtain an RpcEndpointRef (asyncSetupEndpointRefByURI,
   * setupEndpointRefByURI and setupEndpointRef), a stop method for removing an RpcEndpoint,
   * a shutdown method for closing the RpcEnv, and it maintains an RpcEnvFileServer for
   * uploading jars and files.
   */
  def setupEndpoint(name: String, endpoint: RpcEndpoint): RpcEndpointRef

  /**
   * Retrieve the [[RpcEndpointRef]] represented by `uri` asynchronously.
   */
  def asyncSetupEndpointRefByURI(uri: String): Future[RpcEndpointRef]

  /**
   * Retrieve the [[RpcEndpointRef]] represented by `uri`. This is a blocking action.
   */
  def setupEndpointRefByURI(uri: String): RpcEndpointRef = {
    defaultLookupTimeout.awaitResult(asyncSetupEndpointRefByURI(uri))
  }

  /**
   * Retrieve the [[RpcEndpointRef]] represented by `systemName`, `address` and `endpointName`.
   * This is a blocking action.
   */
  def setupEndpointRef(
      systemName: String, address: RpcAddress, endpointName: String): RpcEndpointRef = {
    setupEndpointRefByURI(uriOf(systemName, address, endpointName))
  }

  /**
   * Stop [[RpcEndpoint]] specified by `endpoint`.
   *
   * That is: stop an RpcEndpoint given its RpcEndpointRef.
   */
  def stop(endpoint: RpcEndpointRef): Unit

  /**
   * Shutdown this [[RpcEnv]] asynchronously. If need to make sure [[RpcEnv]] exits successfully,
   * call [[awaitTermination()]] straight after [[shutdown()]].
   */
  def shutdown(): Unit

  /**
   * Wait until [[RpcEnv]] exits.
   *
   * TODO do we need a timeout parameter?
   */
  def awaitTermination(): Unit

  ...

  /**
   * [[RpcEndpointRef]] cannot be deserialized without [[RpcEnv]]. So when deserializing any object
   * that contains [[RpcEndpointRef]]s, the deserialization codes should be wrapped by this method.
   *
   * In other words, an RpcEndpointRef can only be deserialized with an RpcEnv at hand, so
   * code that deserializes objects containing RpcEndpointRefs must be wrapped in this method.
   */
  def deserialize[T](deserializationAction: () => T): T

  /**
   * Return the instance of the file server used to serve files. This may be `null` if the
   * RpcEnv is not operating in server mode.
   *
   * (The related openChannel method opens a channel for downloading a file from a given URI;
   * when a URI returned by the RpcEnvFileServer uses the "spark" scheme, the Utils class
   * calls it to retrieve the file.)
   */
  def fileServer: RpcEnvFileServer

  ...
}
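setupEndpointRef simply builds a URI with uriOf and delegates to setupEndpointRefByURI. With the netty backend that URI takes, as far as I know, the form spark://endpointName@host:port (the akka backend uses an akka.tcp://... form), so the lookup in the client from Part I could also be sketched as:

// Sketch, assuming the netty URI format "spark://endpointName@host:port":
val ref = rpcEnv.setupEndpointRefByURI("spark://TestMathMaster@127.0.0.1:4040")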

/**
 * A server used by the RpcEnv to serve files to other processes owned by the application.
 * The file server can return URIs handled by common libraries (such as "http" or "hdfs"),
 * or it can return "spark" URIs which will be handled by `RpcEnv#fetchFile`.
 */
private[spark] trait RpcEnvFileServer {

  /**
   * Adds a file to be served by this RpcEnv. This is used to serve files from the driver
   * to executors when they're stored on the driver's local file system.
   *
   * In other words, a jar submitted with spark-submit stays on the driver node; the driver
   * merely runs a file service (e.g. a jetty server) from which the executors fetch it.
   *
   * @param file Local file to serve.
   * @return A URI for the location of the file.
   */
  def addFile(file: File): String

  /**
   * Adds a jar to be served by this RpcEnv. Similar to `addFile` but for jars added using
   * `SparkContext.addJar`.
   *
   * @param file Local file to serve.
   * @return A URI for the location of the file.
   */
  def addJar(file: File): String
}

...
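To see where this trait fits in, here is a rough sketch of serving a driver-local file through the RpcEnv's file server (the path is illustrative, and fileServer is null when the RpcEnv is not in server mode):

import java.io.File
import org.apache.spark.rpc.RpcEnv

// Sketch: expose a driver-local file through the RpcEnv's file server.
def serveLocalFile(rpcEnv: RpcEnv): Unit = {
  val server = rpcEnv.fileServer
  if (server != null) {                                      // null when not in server mode
    val uri = server.addFile(new File("/tmp/app-data.txt"))  // hypothetical local file
    println("executors can fetch this file at: " + uri)      // e.g. a "spark" or "http"/"hdfs" URI
  }
}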

III. Source analysis of RpcEndpoint, the counterpart of an Actor

/**
 * An end point for the RPC that defines what functions to trigger given a message.
 *
 * It is guaranteed that `onStart`, `receive` and `onStop` will be called in sequence.
 *
 * The life-cycle of an endpoint is:
 *
 * constructor -> onStart -> receive* -> onStop
 *
 * Note: `receive` can be called concurrently. If you want `receive` to be thread-safe, please use
 * [[ThreadSafeRpcEndpoint]]
 *
 * If any error is thrown from one of [[RpcEndpoint]] methods except `onError`, `onError` will be
 * invoked with the cause. If `onError` throws an error, [[RpcEnv]] will ignore it.
 *
 * Spark RPC is modelled on actors:
 *   RpcEndpoint    => Actor        (handles messages)
 *   RpcEndpointRef => ActorRef
 *   RpcEnv         => ActorSystem
 *
 * 1. Extending RpcEndpoint means the service may be invoked concurrently; extending
 *    ThreadSafeRpcEndpoint means the endpoint does not allow concurrent calls.
 * 2. An RpcEndpoint models an individual that needs to communicate (master, worker,
 *    driver, ...) and reacts to the messages it receives.
 * 3. The life cycle of an RpcEndpoint is: constructor -> onStart -> receive* -> onStop.
 * 4. onStart is invoked before any message is handled; receive and receiveAndReply handle
 *    messages that another RpcEndpoint (possibly itself) delivered with send and ask
 *    respectively.
 */

private[spark] trait RpcEndpoint {

  /**
   * The [[RpcEnv]] that this [[RpcEndpoint]] is registered to.
   */
  val rpcEnv: RpcEnv

  /**
   * `self` becomes valid once `onStart` is called, and becomes `null` once `onStop` is called.
   */
  final def self: RpcEndpointRef = {
    require(rpcEnv != null, "rpcEnv has not been initialized")
    rpcEnv.endpointRef(this)
  }

  /**
   * Process messages from [[RpcEndpointRef.send]] or [[RpcCallContext.reply]]. If receiving an
   * unmatched message, [[SparkException]] will be thrown and sent to `onError`.
   *
   * That is: handle a message some ref delivered via send or reply.
   */
  def receive: PartialFunction[Any, Unit] = {
    case _ => throw new SparkException(self + " does not implement 'receive'")
  }

  /**
   * Process messages from [[RpcEndpointRef.ask]]. If receiving an unmatched message,
   * [[SparkException]] will be thrown and sent to `onError`.
   *
   * Handles messages arriving from RpcEndpointRef.ask and RpcEndpointRef.askWithRetry.
   */
  def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
    case _ => context.sendFailure(new SparkException(self + " won't reply anything"))
  }

  /**
   * Invoked when any exception is thrown during handling messages.
   */
  def onError(cause: Throwable): Unit = {
    // By default, throw e and let RpcEnv handle it
    throw cause
  }

  /**
   * Invoked when `remoteAddress` is connected to the current node.
   */
  def onConnected(remoteAddress: RpcAddress): Unit = {
    // By default, do nothing.
  }

  /**
   * Invoked when `remoteAddress` is lost.
   */
  def onDisconnected(remoteAddress: RpcAddress): Unit = {
    // By default, do nothing.
  }

  /**
   * Invoked when some network error happens in the connection between the current node and
   * `remoteAddress`.
   */
  def onNetworkError(cause: Throwable, remoteAddress: RpcAddress): Unit = {
    // By default, do nothing.
  }

  /**
   * Invoked before [[RpcEndpoint]] starts to handle any message.
   * (Called after the constructor has run.)
   */
  def onStart(): Unit = {
    // By default, do nothing.
  }

  /**
   * Invoked when [[RpcEndpoint]] is stopping. `self` will be `null` in this method and you cannot
   * use it to send or ask messages.
   */
  def onStop(): Unit = {
    // By default, do nothing.
  }
}
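To make the life cycle concrete, here is a small hypothetical endpoint (the names are mine) whose hooks fire in the documented order, constructor -> onStart -> receive* -> onStop:

import org.apache.spark.rpc.{RpcEnv, ThreadSafeRpcEndpoint}

// Hypothetical one-way message
case class Ping(id: Int)

// Sketch: an endpoint that logs each life-cycle hook as it fires.
class LifecycleEndpoint(override val rpcEnv: RpcEnv) extends ThreadSafeRpcEndpoint {
  println("constructor")                           // 1. runs first, at construction time

  override def onStart(): Unit =
    println("onStart, self = " + self)             // 2. self is valid from here on

  override def receive: PartialFunction[Any, Unit] = {
    case Ping(id) => println("receive Ping " + id) // 3. zero or more messages
  }

  override def onStop(): Unit =
    println("onStop")                              // 4. self is null by now
}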

IV. Source analysis of RpcEndpointRef, the counterpart of an ActorRef

/**
 * A reference for a remote [[RpcEndpoint]]. [[RpcEndpointRef]] is thread-safe.
 *
 *   RpcEndpoint    => Actor        (handles messages)
 *   RpcEndpointRef => ActorRef
 *   RpcEnv         => ActorSystem
 *
 * RpcEndpointRef corresponds to the ActorRef in the actor example: it sends messages
 * to its RpcEndpoint.
 */

private[spark] abstract class RpcEndpointRef(conf: SparkConf)
  extends Serializable with Logging {

  private[this] val maxRetries = RpcUtils.numRetries(conf)
  private[this] val retryWaitMs = RpcUtils.retryWaitMs(conf)
  private[this] val defaultAskTimeout = RpcUtils.askRpcTimeout(conf)

  /**
   * return the address for the [[RpcEndpointRef]]
   *
   * address and name identify the RpcEndpoint this RpcEndpointRef belongs to.
   * NettyRpcEndpointRef is the concrete implementation; internally it sends messages
   * through the Dispatcher, Inbox and Outbox components.
   */
  def address: RpcAddress

  def name: String

  /**
   * Sends a one-way asynchronous message. Fire-and-forget semantics.
   * Send-only, like akka's `!`.
   */
  def send(message: Any): Unit

  /**
   * Send a message to the corresponding [[RpcEndpoint.receiveAndReply]] and return a [[Future]] to
   * receive the reply within the specified timeout.
   *
   * This method only sends the message once and never retries.
   * It sends the message and also collects the reply.
   */
  def ask[T: ClassTag](message: Any, timeout: RpcTimeout): Future[T]

  /**
   * Send a message to the corresponding [[RpcEndpoint.receiveAndReply]] and return a [[Future]] to
   * receive the reply within a default timeout.
   *
   * This method only sends the message once and never retries.
   */
  def ask[T: ClassTag](message: Any): Future[T] = ask(message, defaultAskTimeout)

  def askWithRetry[T: ClassTag](message: Any): T = askWithRetry(message, defaultAskTimeout)

  ...
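askWithRetry ultimately loops over ask (up to maxRetries attempts, waiting retryWaitMs between them) and blocks for each reply. As a sketch of what a single blocking round trip looks like from the caller's side, reusing the TestAdd message from Part I and assuming a driverEndpoint ref is in scope:

import scala.concurrent.Await
import scala.concurrent.duration._
import org.apache.spark.rpc.RpcEndpointRef

// Sketch: one blocking round trip over ask, without askWithRetry's retry loop.
def addOnce(driverEndpoint: RpcEndpointRef): Int = {
  val future = driverEndpoint.ask[Int](TestAdd(3, 4)) // sends once, returns a Future[Int]
  Await.result(future, 10.seconds)                    // block for the reply; the timeout is illustrative
}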

