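This article is based on Spark 2.0.0-SNAPSHOT. In the latest Spark source code, RPC uniformly uses Netty as its transport framework. The RPC layer consists mainly of RpcEnv, RpcEndpoint, and RpcEndpointRef, which relate to one another as follows: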
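An RpcEnv must be created by an RpcEnvFactory. RpcEnv can be thought of as a container: every RpcEndpoint must register with the RpcEnv and obtain a corresponding RpcEndpointRef. Messages can then be sent through the RpcEndpointRef to the RpcEnv, which finds the matching RpcEndpoint to process the message and reply. The three classes are introduced briefly below.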
RpcEnv
The class comment for RpcEnv reads:
An RPC environment. RpcEndpoints need to register itself with a name to RpcEnv to receives messages. Then RpcEnv will process messages sent from RpcEndpointRef or remote nodes, and deliver them to corresponding RpcEndpoints. For uncaught exceptions caught by RpcEnv, RpcEnv will use RpcCallContext.sendFailure to send exceptions back to the sender, or logging them if no such sender or NotSerializableException.
RpcEnv also provides some methods to retrieve RpcEndpointRefs given name or uri.
This says roughly what I described above. When an uncaught error occurs, RpcEnv either sends it back to the sender or logs it.
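The error routing described in that comment can be sketched roughly as follows. This is a toy model and not Spark's code: ToyCallContext, ErrorRoutingEnv, and deliver are hypothetical names invented for illustration. Only the idea (report the failure to the sender if there is one, otherwise log it) comes from the class comment.

```scala
import scala.collection.mutable.ListBuffer
import scala.util.control.NonFatal

// Hypothetical stand-in for a reply channel back to the sender.
class ToyCallContext {
  var failure: Option[Throwable] = None
  def sendFailure(e: Throwable): Unit = { failure = Some(e) }
}

// Hypothetical environment that routes uncaught handler exceptions.
class ErrorRoutingEnv {
  // Stands in for a real logger so the behavior is easy to inspect.
  val logged: ListBuffer[String] = ListBuffer.empty[String]

  def deliver(handler: Any => Unit, message: Any, sender: Option[ToyCallContext]): Unit =
    try handler(message)
    catch {
      case NonFatal(e) =>
        sender match {
          // A sender exists: send the exception back to it.
          case Some(ctx) => ctx.sendFailure(e)
          // No sender: all that can be done is to log the error.
          case None => logged += s"uncaught: ${e.getMessage}"; ()
        }
    }
}
```

The environment, not the handler, owns the try/catch here, which is why an endpoint does not need to know whether anyone is waiting for a reply.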
The registration method:
/**
 * Register a RpcEndpoint with a name and return its RpcEndpointRef. RpcEnv does not
 * guarantee thread-safety.
 */
def setupEndpoint(name: String, endpoint: RpcEndpoint): RpcEndpointRef
It registers an RpcEndpoint with the RpcEnv and returns the corresponding RpcEndpointRef.
The class also has several methods for obtaining the RpcEndpointRef of an endpoint.
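To make the register-then-route flow concrete, here is a minimal sketch. It is self-contained and does not use Spark's classes: ToyEndpoint, ToyEndpointRef, ToyRpcEnv, endpointRef, and dispatch are hypothetical names, and everything runs in one process with no networking. Only the shape of setupEndpoint follows the signature quoted above.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical, simplified counterpart of RpcEndpoint.
trait ToyEndpoint {
  def receive(message: Any): Unit
}

// A ref only knows the endpoint's name and the env to route through.
class ToyEndpointRef(val name: String, env: ToyRpcEnv) {
  def send(message: Any): Unit = env.dispatch(name, message)
}

// Hypothetical container holding all registered endpoints by name.
class ToyRpcEnv {
  private val endpoints = new ConcurrentHashMap[String, ToyEndpoint]()

  // Same shape as setupEndpoint: register by name, hand back a ref.
  def setupEndpoint(name: String, endpoint: ToyEndpoint): ToyEndpointRef = {
    if (endpoints.putIfAbsent(name, endpoint) != null)
      throw new IllegalArgumentException(s"There is already an endpoint called $name")
    new ToyEndpointRef(name, this)
  }

  // Hypothetical lookup helper: get a ref for an already registered endpoint.
  def endpointRef(name: String): Option[ToyEndpointRef] =
    if (endpoints.containsKey(name)) Some(new ToyEndpointRef(name, this)) else None

  // Find the endpoint a message is addressed to and let it handle the message.
  def dispatch(name: String, message: Any): Unit =
    Option(endpoints.get(name)) match {
      case Some(endpoint) => endpoint.receive(message)
      case None => throw new NoSuchElementException(s"No endpoint named $name")
    }
}
```

Callers never touch the endpoint directly; they hold a ref and the env does the lookup, which is the container role described above.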
RpcEndpoint
The class comment for RpcEndpoint:
An end point for the RPC that defines what functions to trigger given a message. It is guaranteed that onStart, receive and onStop will be called in sequence. The life-cycle of an endpoint is: constructor -> onStart -> receive -> onStop
Note: receive can be called concurrently. If you want receive to be thread-safe, please use ThreadSafeRpcEndpoint. If any error is thrown from one of RpcEndpoint methods except onError, onError will be invoked with the cause. If onError throws an error, RpcEnv will ignore it.
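RpcEndpoint defines a set of methods that react to received messages. Its internal life cycle is constructor -> onStart -> receive -> onStop. The main processing logic lives in receive, which dispatches each incoming message to the appropriate handler.
Because receive can be called concurrently, Spark provides ThreadSafeRpcEndpoint as a thread-safe version.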
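The life cycle and the thread-safe variant can be illustrated with a small sketch. LifecycleEndpoint and SerialInbox are hypothetical names, and the lock-based serialization is an assumption made for brevity; it only shows the guarantee (one receive at a time, errors routed to onError) and says nothing about how Spark implements it.

```scala
import scala.util.control.NonFatal

// Hypothetical endpoint trait with the life-cycle hooks named in the comment.
trait LifecycleEndpoint {
  def onStart(): Unit = ()
  def receive(message: Any): Unit
  def onStop(): Unit = ()
  def onError(cause: Throwable): Unit = ()
}

// Hypothetical driver enforcing onStart -> receive -> onStop. Holding the lock
// while calling the endpoint means receive never runs concurrently.
class SerialInbox(endpoint: LifecycleEndpoint) {
  private var state = 0 // 0 = new, 1 = started, 2 = stopped

  // Route an exception from any hook to onError; ignore errors from onError itself.
  private def guarded(body: => Unit): Unit =
    try body
    catch {
      case NonFatal(e) =>
        try endpoint.onError(e)
        catch { case NonFatal(_) => () }
    }

  def start(): Unit = synchronized {
    if (state == 0) { state = 1; guarded(endpoint.onStart()) }
  }

  // Messages posted before start() or after stop() are dropped in this sketch.
  def post(message: Any): Unit = synchronized {
    if (state == 1) guarded(endpoint.receive(message))
  }

  def stop(): Unit = synchronized {
    if (state == 1) { state = 2; guarded(endpoint.onStop()) }
  }
}
```

An endpoint driven this way can keep plain mutable fields without extra locking, which is the convenience a thread-safe endpoint is meant to offer.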
RpcEndpointRef
The class comment for RpcEndpointRef:
A reference for a remote RpcEndpoint. RpcEndpointRef is thread-safe.
It can be understood as a remote reference to an RpcEndpoint; its methods are mainly for sending messages.
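Message-sending methods on a reference typically come in two flavors: one-way sends and request/response calls. The sketch below illustrates both under stated assumptions. ReplyingEndpoint, ToyRef, and the reply callback are hypothetical names, the call is made in-process rather than over the network, and the signatures are simplified and should not be read as Spark's API.

```scala
import scala.concurrent.{Future, Promise}
import scala.util.control.NonFatal

// Hypothetical endpoint that can answer a request through a reply callback.
trait ReplyingEndpoint {
  def receive(message: Any): Unit
  def receiveAndReply(message: Any, reply: Any => Unit): Unit
}

// Hypothetical ref: callers hold this handle instead of the endpoint itself.
class ToyRef(endpoint: ReplyingEndpoint) {
  // One-way message: nothing comes back to the caller.
  def send(message: Any): Unit = endpoint.receive(message)

  // Request/response: the Future completes when the endpoint replies,
  // or fails if the endpoint throws while handling the request.
  def ask(message: Any): Future[Any] = {
    val promise = Promise[Any]()
    try endpoint.receiveAndReply(message, answer => { promise.trySuccess(answer); () })
    catch { case NonFatal(e) => promise.tryFailure(e); () }
    promise.future
  }
}
```

Returning a Future keeps the caller from blocking on the reply, which matters once the endpoint lives on another node.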