2021SC@SDUSC
I. Introduction to RPC
As a distributed system, HBase follows a typical master-slave architecture. There are three main roles in HBase: Master, RegionServer, and Client, and RPC is the communication channel that ties the three together.
Client
There are many kinds of clients, such as the hbase shell and the Java client API. A client does not provide any RPC service itself; it only calls the services exposed by the RegionServer or the Master.
Master
The Master mainly implements the MasterService and RegionServerStatus protocols, which are called by the Client and the RegionServer respectively.
MasterService
MasterService mainly defines services for retrieving the cluster status and table metadata, adding/deleting columns, assigning regions, enabling/disabling tables, load balancing, and other administrative (DDL-style) operations. The Master implements these services for clients to call. For example, when we run enable/disable table commands in the hbase shell, the client first sends the corresponding RPC request to the Master.
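For example, here is a minimal client-side sketch using the standard HBase client API (the table name "t1" is just an illustrative assumption): enabling or disabling a table goes through Admin, whose requests are sent to the Master's MasterService.
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class DisableEnableTableExample {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = conn.getAdmin()) {
      TableName tn = TableName.valueOf("t1");   // illustrative table name
      admin.disableTable(tn);                   // request goes to the Master (MasterService)
      admin.enableTable(tn);                    // request goes to the Master (MasterService)
    }
  }
}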
RegionServerStatus
RegionServerStatus mainly defines the services a RegionServer uses to report its status to the Master, for example the RPC requests sent when a RegionServer starts up. From these RPC requests the Master learns the state of every RegionServer in the cluster.
RegionServer
The RegionServer mainly implements the AdminService and ClientService protocols for clients to call. At the same time, the RegionServer itself calls the RegionServerStatus service to report its information to the Master.
AdminService
AdminService mainly defines services for retrieving table and region information and for operating on regions (Open, Flush, Split, Compact, Merge, etc.).
ClientService
ClientService mainly defines services for reading data, updating and inserting data, scanning, and so on.
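Similarly, a small sketch with the standard HBase client API (table, row and column names are illustrative) shows the calls that end up as ClientService requests on a RegionServer:
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class PutGetExample {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("t1"))) {        // illustrative table name
      Put put = new Put(Bytes.toBytes("row1"));
      put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
      table.put(put);                                                   // write, served by ClientService
      Result result = table.get(new Get(Bytes.toBytes("row1")));        // read, also via ClientService
      System.out.println(Bytes.toString(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("q"))));
    }
  }
}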
II. Overview of RPC in HBase
RPC (remote procedure call) means calling a procedure remotely. With a local call, once a function is defined, other parts of the program simply call it and get the result back. The only difference with RPC is that the function definition and the function call usually live on different machines; because different machines are involved, RPC adds a communication layer on top of a plain local call, and involves two roles: the caller (the client side) and the function implementation (the server side).
Server-side RPC implementation
1. RPC initialization
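To make the two roles concrete, the following self-contained sketch (plain Java, not HBase code; EchoService and its in-process "transport" are invented for illustration) shows the essence of RPC: the client calls an ordinary interface, a stub intercepts the call, and in a real framework the stub would serialize the method name and arguments and send them over the network to the server holding the implementation.
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

public class RpcSketch {
  // The "protocol": defined once, implemented on the server, called on the client.
  interface EchoService {
    String echo(String msg);
  }

  // Server side: the real implementation of the service.
  static class EchoServiceImpl implements EchoService {
    public String echo(String msg) { return "echo: " + msg; }
  }

  public static void main(String[] args) {
    EchoService serverImpl = new EchoServiceImpl();

    // Client side: a stub that looks like EchoService. In a real RPC framework the handler
    // would serialize (method, args), send them over the network, and wait for the response;
    // here it simply forwards in-process so the sketch stays runnable.
    InvocationHandler handler = (proxy, method, methodArgs) -> {
      // <-- the network boundary would sit here
      return method.invoke(serverImpl, methodArgs);
    };
    EchoService clientStub = (EchoService) Proxy.newProxyInstance(
        RpcSketch.class.getClassLoader(), new Class<?>[] { EchoService.class }, handler);

    System.out.println(clientStub.echo("hello"));   // looks exactly like a local call
  }
}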
public HRegionServer(final Configuration conf) throws IOException {
super("RegionServer"); // thread name
TraceUtil.initTracer(conf);
//...
rpcServices = createRpcServices();
//...
The code above, from the HRegionServer startup class, initializes the RPC server (rpcServices).
private void preRegistrationInitialization() {
//...
this.rpcClient = RpcClientFactory.createClient(conf, clusterId, new InetSocketAddress(
this.rpcServices.isa.getAddress(), 0), clusterConnection.getConnectionMetrics());
//...
}
The code above, also from the HRegionServer startup class, initializes the RpcClient.
public MasterRpcServices(HMaster m) throws IOException {
super(m);
master = m;
}
When an HMaster is created, the constructor of MasterRpcServices calls the constructor of its parent class, RSRpcServices:
RSRpcServices(final HRegionServer rs, final LogDelegate ld) throws IOException {
final Configuration conf = rs.getConfiguration();
this.ld = ld;
regionServer = rs;
//...
final RpcSchedulerFactory rpcSchedulerFactory;
try {
rpcSchedulerFactory = getRpcSchedulerFactoryClass().asSubclass(RpcSchedulerFactory.class)
.getDeclaredConstructor().newInstance();
} catch (NoSuchMethodException | InstantiationException |
IllegalAccessException | InvocationTargetException e) {
throw new IllegalArgumentException(e);
}
//...
// Creation of a HSA will force a resolve.
initialIsa = new InetSocketAddress(hostname, port);
bindAddress =
new InetSocketAddress(conf.get("hbase.regionserver.ipc.address", hostname), port);
//...
priority = createPriority();
//...
ConnectionUtils.setServerSideHConnectionRetriesConfig(conf, name, LOG);
rpcServer = createRpcServer(rs, rpcSchedulerFactory, bindAddress, name);
rpcServer.setRsRpcServices(this);
//...
regionServer = rs stores the HRegionServer reference. Next, rpcSchedulerFactory is initialized by reflectively instantiating the class named by hbase.region.server.rpc.scheduler.factory.class, falling back to SimpleRpcSchedulerFactory by default. priority = createPriority() creates the priority function used when scheduling requests, and ConnectionUtils.setServerSideHConnectionRetriesConfig sets the retry count used for server-side connections.
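As an illustration (a sketch; the value shown is simply the default factory mentioned above, and its fully qualified name is an assumption), the scheduler factory can be selected through configuration:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class SchedulerFactoryConfigExample {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Key read reflectively by RSRpcServices; when unset, SimpleRpcSchedulerFactory is the default.
    conf.set("hbase.region.server.rpc.scheduler.factory.class",
        "org.apache.hadoop.hbase.regionserver.SimpleRpcSchedulerFactory");
    System.out.println(conf.get("hbase.region.server.rpc.scheduler.factory.class"));
  }
}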
public SimpleRpcServer(final Server server, final String name,
final List<BlockingServiceAndInterface> services,
final InetSocketAddress bindAddress, Configuration conf,
RpcScheduler scheduler, boolean reservoirEnabled) throws IOException {
super(server, name, services, bindAddress, conf, scheduler, reservoirEnabled);
this.socketSendBufferSize = 0;
this.readThreads = conf.getInt("hbase.ipc.server.read.threadpool.size", 10);
this.purgeTimeout = conf.getLong("hbase.ipc.client.call.purge.timeout",
2 * HConstants.DEFAULT_HBASE_RPC_TIMEOUT);
// Start the listener here and let it bind to the port
listener = new Listener(name);
this.port = listener.getAddress().getPort();
// Create the responder here
responder = new SimpleRpcServerResponder(this);
connectionManager = new ConnectionManager();
initReconfigurable(conf);
this.scheduler.init(new RpcSchedulerContext(this));
}
This is the constructor of SimpleRpcServer, an implementation of RpcServer (which in turn implements the RpcServerInterface). It creates the Listener and the Responder, and installs the Scheduler that the caller built through the RpcSchedulerFactory.
2. Listener
The Listener is responsible for accepting incoming connections; the requests arriving on those connections are then read by the Readers.
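For instance (a sketch based only on the configuration keys read in this constructor; the chosen values are arbitrary), the number of Reader threads and the purge timeout can be tuned via the Configuration handed to the server:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class SimpleRpcServerTuningExample {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    conf.setInt("hbase.ipc.server.read.threadpool.size", 20);      // number of Reader threads, default 10 above
    conf.setLong("hbase.ipc.client.call.purge.timeout", 120000L);  // purge timeout in milliseconds
    System.out.println(conf.getInt("hbase.ipc.server.read.threadpool.size", 10));
  }
}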
private class Listener extends Thread {
private ServerSocketChannel acceptChannel = null; //the accept channel
private Selector selector = null; //the selector that we use for the server
private Reader[] readers = null;
private int currentReader = 0;
private final int readerPendingConnectionQueueLength;
private ExecutorService readPool;
public Listener(final String name) throws IOException {
super(name);
//...
acceptChannel = ServerSocketChannel.open();
acceptChannel.configureBlocking(false);
// Bind the server socket to the binding address (can be different from the default interface)
bind(acceptChannel.socket(), bindAddress, backlogLength);
port = acceptChannel.socket().getLocalPort(); //Could be an ephemeral port
address = (InetSocketAddress)acceptChannel.socket().getLocalSocketAddress();
// create a selector;
selector = Selector.open();
readers = new Reader[readThreads];
readPool = Executors.newFixedThreadPool(readThreads,
new ThreadFactoryBuilder().setNameFormat(
"Reader=%d,bindAddress=" + bindAddress.getHostName() +
",port=" + port).setDaemon(true)
.setUncaughtExceptionHandler(Threads.LOGGING_EXCEPTION_HANDLER).build());
for (int i = 0; i < readThreads; ++i) {
Reader reader = new Reader();
readers[i] = reader;
readPool.execute(reader);
}
//...
acceptChannel.register(selector, SelectionKey.OP_ACCEPT);
this.setName("Listener,port=" + port);
this.setDaemon(true);
}
acceptChannel = ServerSocketChannel.open() creates a non-blocking ServerSocketChannel, and bind(acceptChannel.socket(), bindAddress, backlogLength) binds the socket to the RpcServer's bindAddress. A selector is then created, readers = new Reader[readThreads] initializes the Reader thread pool, and the accept channel is registered with the selector for OP_ACCEPT.
When the Listener sees an OP_ACCEPT event, its doAccept method picks a Reader, registers the accepted channel for OP_READ, and constructs a Connection object for that Reader to read from.
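The accept-then-hand-off pattern can be illustrated with a small self-contained NIO sketch (not HBase code; for brevity a single "reader" selector is driven from the same loop): the accept selector owns OP_ACCEPT, and every accepted channel is registered with a read selector for OP_READ.
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class AcceptReadSketch {
  public static void main(String[] args) throws Exception {
    ServerSocketChannel acceptChannel = ServerSocketChannel.open();
    acceptChannel.configureBlocking(false);
    acceptChannel.bind(new InetSocketAddress(0));          // ephemeral port, as in the Listener
    Selector acceptSelector = Selector.open();             // watches OP_ACCEPT only
    acceptChannel.register(acceptSelector, SelectionKey.OP_ACCEPT);
    Selector readSelector = Selector.open();               // the single "reader" of this sketch

    while (true) {
      acceptSelector.select(1000);
      Iterator<SelectionKey> it = acceptSelector.selectedKeys().iterator();
      while (it.hasNext()) {
        SelectionKey key = it.next();
        it.remove();
        if (key.isAcceptable()) {                          // like doAccept(): hand the channel to a reader
          SocketChannel ch = acceptChannel.accept();
          if (ch != null) {
            ch.configureBlocking(false);
            ch.register(readSelector, SelectionKey.OP_READ);
          }
        }
      }
      readSelector.selectNow();                            // the "reader" loop, inlined for brevity
      Iterator<SelectionKey> rit = readSelector.selectedKeys().iterator();
      while (rit.hasNext()) {
        SelectionKey key = rit.next();
        rit.remove();
        if (key.isValid() && key.isReadable()) {           // like doRead(): pull bytes off the connection
          ((SocketChannel) key.channel()).read(ByteBuffer.allocate(1024));
        }
      }
    }
  }
}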
3. Reader
The request-handling logic lives in the Reader: it builds a Call object and hands it to the RpcScheduler for dispatch.
private class Reader implements Runnable {
final private LinkedBlockingQueue<SimpleServerRpcConnection> pendingConnections;
private final Selector readSelector;
public void run() {
try {
doRunLoop();
} finally {
try {
readSelector.close();
} catch (IOException ioe) {
LOG.error(getName() + ": error closing read selector in " + getName(), ioe);
}
}
}
private synchronized void doRunLoop() {
while (running) {
try {
int size = pendingConnections.size();
for (int i=size; i>0; i--) {
SimpleServerRpcConnection conn = pendingConnections.take();
conn.channel.register(readSelector, SelectionKey.OP_READ, conn);
}
readSelector.select();
Iterator<SelectionKey> iter = readSelector.selectedKeys().iterator();
while (iter.hasNext()) {
SelectionKey key = iter.next();
iter.remove();
if (key.isValid()) {
if (key.isReadable()) {
doRead(key);
}
}
key = null;
}
} catch (InterruptedException e) {
if (running) { // unexpected -- log it
LOG.info(Thread.currentThread().getName() + " unexpectedly interrupted", e);
}
} catch (IOException ex) {
LOG.error(getName() + ": error in Reader", ex);
}
}
}
Inside the synchronized doRunLoop, the thread first registers any newly accepted connections with readSelector and then blocks in readSelector.select() until a read event arrives; for every valid, readable key it calls doRead(key) for further processing.
void doRead(SelectionKey key) throws InterruptedException {
int count;
SimpleServerRpcConnection c = (SimpleServerRpcConnection) key.attachment();
try {
count = c.readAndProcess();
} catch (InterruptedException ieo) {
LOG.info(Thread.currentThread().getName() + ": readAndProcess caught InterruptedException", ieo);
throw ieo;
} catch (Exception e) {
//...
}
}
if (!this.rpcServer.scheduler.dispatch(new CallRunner(this.rpcServer, call))) {
this.rpcServer.callQueueSizeInBytes.add(-1 * call.getSize());
//...
call.sendResponseIfReady();
}
The Reader listens for OP_READ events. doRead() (defined alongside the Listener) lets the Connection object handle the data: it reads the byte stream from the connection, parses the request (RequestHeader, method, parameters) into a Call, wraps it in a CallRunner, and hands it to the scheduler for dispatch, as the snippet above (taken from the connection's request-processing code) shows.
4. Scheduler
The Scheduler is a producer/consumer model: an internal queue buffers incoming requests, and a set of threads pulls requests off the queue and dispatches them.
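The producer/consumer structure can be sketched in a few self-contained lines (illustration only, not the HBase classes): dispatch() is the producer side used by the Readers, and a pool of handler threads is the consumer side.
import java.util.concurrent.LinkedBlockingQueue;

public class DispatchSketch {
  // Stand-in for the call queue of CallRunner objects handed to the scheduler.
  private final LinkedBlockingQueue<Runnable> callQueue = new LinkedBlockingQueue<>(100);

  // Producer side: the Reader calls dispatch(); returns false when the queue is full,
  // mirroring the boolean return of scheduler.dispatch() in the snippet above.
  public boolean dispatch(Runnable callRunner) {
    return callQueue.offer(callRunner);
  }

  // Consumer side: handler threads pull call runners off the queue and execute them.
  public void startHandlers(int handlerCount) {
    for (int i = 0; i < handlerCount; i++) {
      Thread handler = new Thread(() -> {
        try {
          while (true) {
            callQueue.take().run();
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }, "handler-" + i);
      handler.setDaemon(true);
      handler.start();
    }
  }

  public static void main(String[] args) throws Exception {
    DispatchSketch scheduler = new DispatchSketch();
    scheduler.startHandlers(3);
    scheduler.dispatch(() -> System.out.println("handled by " + Thread.currentThread().getName()));
    Thread.sleep(100);   // give a handler time to run before the JVM exits
  }
}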
public class SimpleRpcScheduler extends RpcScheduler implements ConfigurationObserver {
private int port;
private final PriorityFunction priority;
private final RpcExecutor callExecutor;
private final RpcExecutor priorityExecutor;
private final RpcExecutor replicationExecutor;
On the server side, HBase RPC ships two schedulers: FifoRpcScheduler and SimpleRpcScheduler. FifoRpcScheduler simply puts each CallRunner into a thread pool for execution. SimpleRpcScheduler, the default, contains three RpcExecutors (callExecutor, priorityExecutor, replicationExecutor) and routes different kinds of requests to different executors.
if (null != callExecutor) {
queueName = "Call Queue";
callQueueInfo.setCallMethodCount(queueName, callExecutor.getCallQueueCountsSummary());
callQueueInfo.setCallMethodSize(queueName, callExecutor.getCallQueueSizeSummary());
}
if (null != priorityExecutor) {
queueName = "Priority Queue";
callQueueInfo.setCallMethodCount(queueName, priorityExecutor.getCallQueueCountsSummary());
callQueueInfo.setCallMethodSize(queueName, priorityExecutor.getCallQueueSizeSummary());
}
if (null != replicationExecutor) {
queueName = "Replication Queue";
callQueueInfo.setCallMethodCount(queueName, replicationExecutor.getCallQueueCountsSummary());
callQueueInfo.setCallMethodSize(queueName, replicationExecutor.getCallQueueSizeSummary());
}
A request is thus dispatched to one of the executors above; the vast majority of ordinary client requests are executed by callExecutor.
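A simplified sketch of that routing decision follows (illustration only; the threshold constants and the way the priority value is obtained are simplified stand-ins for SimpleRpcScheduler's PriorityFunction logic):
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ExecutorRoutingSketch {
  private final ExecutorService callExecutor = Executors.newFixedThreadPool(4);
  private final ExecutorService priorityExecutor = Executors.newFixedThreadPool(2);
  private final ExecutorService replicationExecutor = Executors.newFixedThreadPool(1);

  private static final int HIGH_QOS = 200;        // simplified stand-in for the priority threshold
  private static final int REPLICATION_QOS = 5;   // simplified stand-in

  // Route a request to an executor based on its priority, in the spirit of SimpleRpcScheduler.
  public void dispatch(Runnable call, int priority) {
    if (priority >= HIGH_QOS) {
      priorityExecutor.execute(call);       // e.g. meta/admin requests
    } else if (priority == REPLICATION_QOS) {
      replicationExecutor.execute(call);    // replication traffic
    } else {
      callExecutor.execute(call);           // the bulk of ordinary client requests
    }
  }

  public static void main(String[] args) {
    ExecutorRoutingSketch s = new ExecutorRoutingSketch();
    s.dispatch(() -> System.out.println("ordinary request"), 0);
    s.dispatch(() -> System.out.println("high-priority request"), 300);
    s.callExecutor.shutdown();
    s.priorityExecutor.shutdown();
    s.replicationExecutor.shutdown();
  }
}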
protected List<BlockingQueue<CallRunner>> getQueues() {
return queues;
}
protected void startHandlers(final int port) {
List<BlockingQueue<CallRunner>> callQueues = getQueues();
startHandlers(null, handlerCount, callQueues, 0, callQueues.size(), port, activeHandlerCount);
}
RWQueueRpcExecutor, one implementation of RpcExecutor, buffers requests in blocking queues. After the scheduler has dispatched a CallRunner, CallRunner#run invokes RpcServer#call, which uses callBlockingMethod to invoke the server-side implementation of the service and returns the result.
5. Responder
The Responder is responsible for sending RPC results back to the client. After the Scheduler has dispatched a request and it has been executed, the result is queued on the connection's response queue through doRespond().
void doRespond(SimpleServerRpcConnection conn, RpcResponse resp) throws IOException {
boolean added = false;
if (conn.responseQueue.isEmpty() && conn.responseWriteLock.tryLock()) {
try {
if (conn.responseQueue.isEmpty()) {
// If we're alone, we can try to do a direct call to the socket. It's
// an optimization to save on context switches and data transfer between cores..
if (processResponse(conn, resp)) {
return; // we're done.
}
// Too big to fit, putting ahead.
conn.responseQueue.addFirst(resp);
added = true; // We will register to the selector later, outside of the lock.
}
} finally {
conn.responseWriteLock.unlock();
}
}
if (!added) {
conn.responseQueue.addLast(resp);
}
registerForWrite(conn);
}
}
After RpcServer#call returns, the result is wrapped into a response and doRespond() pushes it onto the connection's responseQueue; if the write cannot be completed immediately, the connection is added to the Responder's writingCons set so that its channel can later be registered for writing.
private void registerWrites() {
Iterator<SimpleServerRpcConnection> it = writingCons.iterator();
//...
private void doRunLoop() {
while (this.simpleRpcServer.running) {
try {
registerWrites();
int keyCt = writeSelector.select(this.simpleRpcServer.purgeTimeout);
if (keyCt == 0) {
continue;
}
Set<SelectionKey> keys = writeSelector.selectedKeys();
Iterator<SelectionKey> iter = keys.iterator();
while (iter.hasNext()) {
SelectionKey key = iter.next();
iter.remove();
try {
if (key.isValid() && key.isWritable()) {
doAsyncWrite(key);
}
} catch (IOException e) {
SimpleRpcServer.LOG.debug(getName() + ": asyncWrite", e);
}
}
//...
private boolean processAllResponses(final Connection connection) throws IOException {
connection.responseWriteLock.lock();
try {
for (int i = 0; i < 20; i++) {
Call call = connection.responseQueue.pollFirst();
if (call == null) {
return true; // queue fully drained
}
if (!processResponse(call)) {
connection.responseQueue.addFirst(call);
return false;
}
}
} finally {
connection.responseWriteLock.unlock();
}
return connection.responseQueue.isEmpty();
}
}
If doRespond() cannot complete the write, the Call's connection is registered with the write selector, and the Responder thread finishes the write asynchronously.
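The underlying pattern, try a direct write first and fall back to a queue drained asynchronously, can be shown with a small self-contained sketch (illustration only; strings stand in for responses and the "socket write" is faked):
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.locks.ReentrantLock;

public class DeferredWriteSketch {
  // Stand-ins for a connection's response queue and write lock.
  private final ConcurrentLinkedDeque<String> responseQueue = new ConcurrentLinkedDeque<>();
  private final ReentrantLock responseWriteLock = new ReentrantLock();

  // Pretend "socket write": returns true if the whole response was written inline.
  private boolean processResponse(String resp) {
    return resp.length() < 16;   // small responses "fit", large ones are deferred
  }

  // Mirrors the shape of doRespond(): handlers try a direct write first, and only fall
  // back to the queue (drained by a responder thread) when the write cannot complete.
  public void doRespond(String resp) {
    boolean added = false;
    if (responseQueue.isEmpty() && responseWriteLock.tryLock()) {
      try {
        if (responseQueue.isEmpty()) {
          if (processResponse(resp)) {
            return;                        // written inline, nothing left to do
          }
          responseQueue.addFirst(resp);    // could not finish: keep ordering, defer the rest
          added = true;
        }
      } finally {
        responseWriteLock.unlock();
      }
    }
    if (!added) {
      responseQueue.addLast(resp);
    }
    // here the real server registers the connection for OP_WRITE with the responder's selector
  }

  public static void main(String[] args) {
    DeferredWriteSketch s = new DeferredWriteSketch();
    s.doRespond("small");
    s.doRespond("a response too large to write inline");
    System.out.println("deferred responses: " + s.responseQueue.size());
  }
}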
Summary
Steps of the server-side RPC implementation:
1. The server-side logic is encapsulated in the RpcServer object. It contains a Listener that watches for connection requests; when a connection arrives, the Listener picks a Reader, registers an OP_READ event on the new connection, and wraps it in a Connection object.
2. The Reader reads data from the connection and eventually builds a CallRunner, which it hands to the scheduler for dispatch. An RpcServer usually has several Reader objects.
3. The scheduler picks a CallRunner, and the chain CallRunner#run -> RpcServer#call invokes the concrete implementation of the requested function.
4. Once the function returns, the result is wrapped into a response object; doRespond() pushes the Call onto the responseQueue and adds the connection to writingCons so the Responder will write it out.
5. The Responder registers OP_WRITE for the connections in writingCons, takes calls off the responseQueue, processes them, and sends the resulting byte stream back to the client.