RPC is, without question, essential to any distributed system: it is what lets machines talk to each other, and inside Hadoop the number of RPC calls is enormous. This post walks through a small Hadoop IPC example that you can study on your own.
(1) An interface extending VersionedProtocol
import org.apache.hadoop.ipc.VersionedProtocol;

public interface IPCQueryStatus extends VersionedProtocol {
    IPCFileStatus getFileStatus(String fileName);
}
In Hadoop, every remotely callable interface must extend this VersionedProtocol interface. Its source, and the single method it declares, is very simple:
/**
 * Superclass of all protocols that use Hadoop RPC.
 * Subclasses of this interface are also supposed to have
 * a static final long versionID field.
 */
public interface VersionedProtocol {

    /**
     * Return protocol version corresponding to protocol interface.
     * @param protocol The classname of the protocol interface
     * @param clientVersion The version of the protocol that the client speaks
     * @return the version that the server will speak
     */
    public long getProtocolVersion(String protocol,
                                   long clientVersion) throws IOException;
}
It declares a single method, getProtocolVersion, with two parameters: the class name of the protocol interface and the version of the protocol the client speaks. The return value is the version the server speaks. This design exists so that a mismatch between the client's and the server's protocol versions can be detected up front.
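The javadoc above also says that every protocol interface is expected to carry a static final long versionID field. Below is a minimal sketch of that convention; ExampleProtocol and echo are made-up names used only for illustration, and the example in this post keeps its version constant in IPCQueryServer.IPV_VER instead:

import java.io.IOException;
import org.apache.hadoop.ipc.VersionedProtocol;

/** Hypothetical protocol interface showing the versionID convention. */
public interface ExampleProtocol extends VersionedProtocol {
    // Clients and servers built against the same copy of the interface share this value.
    public static final long versionID = 1L;

    String echo(String message) throws IOException;
}

A server implementing such an interface would simply return versionID from getProtocolVersion, so a client compiled against an older copy of the interface fails fast with a version mismatch instead of mis-decoding responses.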
(2) The implementation class
import java.io.IOException;

public class IPCQueryStatusImpl implements IPCQueryStatus {

    @Override
    public long getProtocolVersion(String protocol, long clientVersion)
            throws IOException {
        // Log what the client reports, then answer with the server-side version.
        System.out.println("protocol:" + protocol);
        System.out.println("clientVersion:" + clientVersion);
        return IPCQueryServer.IPV_VER;
    }

    @Override
    public IPCFileStatus getFileStatus(String fileName) {
        // In a real service this would look the file up; here we simply wrap the name.
        return new IPCFileStatus(fileName);
    }
}
Naturally, an implementation class is required before anything can actually run: it implements getProtocolVersion from VersionedProtocol as well as the method declared in our own interface.
(3) The server
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.ipc.RPC;
import org.apache.hadoop.ipc.Server;

public class IPCQueryServer {
    public static final int IPC_PORT = 32121;
    public static final long IPV_VER = 5473L;

    public static void main(String[] args) throws IOException {
        IPCQueryStatusImpl query = new IPCQueryStatusImpl();
        // Expose the implementation instance on all interfaces at IPC_PORT.
        Server server = RPC.getServer(query, "0.0.0.0", IPC_PORT,
                new Configuration());
        server.start();
    }
}
Looking at the server code, we can see that it relies on the RPC class, which is the crucial piece. Its source is quite long, so only the key parts are excerpted below:
/**
 * Construct a server for a protocol implementation instance listening on a
 * port and address.
 */
public static Server getServer(final Object instance,
                               final String bindAddress, final int port,
                               Configuration conf) throws IOException {
    return getServer(instance, bindAddress, port, 1, false, conf);
}
This is the overload our server code calls. It simply delegates to another getServer, and if we keep following the call chain we end up in this constructor:
@SuppressWarnings("unchecked")
protected Server(String bindAddress, int port,
                 Class<? extends Writable> paramClass, int handlerCount,
                 Configuration conf, String serverName,
                 SecretManager<? extends TokenIdentifier> secretManager)
        throws IOException {
    this.bindAddress = bindAddress;
    this.conf = conf;
    this.port = port;
    this.paramClass = paramClass;
    this.handlerCount = handlerCount;
    this.socketSendBufferSize = 0;
    this.maxQueueSize = handlerCount * conf.getInt(
            IPC_SERVER_HANDLER_QUEUE_SIZE_KEY,
            IPC_SERVER_HANDLER_QUEUE_SIZE_DEFAULT);
    this.maxRespSize = conf.getInt(IPC_SERVER_RPC_MAX_RESPONSE_SIZE_KEY,
            IPC_SERVER_RPC_MAX_RESPONSE_SIZE_DEFAULT);
    this.readThreads = conf.getInt(
            IPC_SERVER_RPC_READ_THREADS_KEY,
            IPC_SERVER_RPC_READ_THREADS_DEFAULT);
    this.callQueue = new LinkedBlockingQueue<Call>(maxQueueSize);
    this.maxIdleTime = 2 * conf.getInt("ipc.client.connection.maxidletime", 1000);
    this.maxConnectionsToNuke = conf.getInt("ipc.client.kill.max", 10);
    this.thresholdIdleConnections = conf.getInt("ipc.client.idlethreshold", 4000);
    this.secretManager = (SecretManager<TokenIdentifier>) secretManager;
    this.authorize =
            conf.getBoolean(HADOOP_SECURITY_AUTHORIZATION, false);
    this.isSecurityEnabled = UserGroupInformation.isSecurityEnabled();

    // Start the listener here and let it bind to the port
    listener = new Listener();
    this.port = listener.getAddress().getPort();
    this.rpcMetrics = RpcInstrumentation.create(serverName, this.port);
    this.tcpNoDelay = conf.getBoolean("ipc.server.tcpnodelay", false);

    // Create the responder here
    responder = new Responder();

    if (isSecurityEnabled) {
        SaslRpcServer.init(conf);
    }
}
Using the address and port we passed in, a new server is constructed. Most of the assignments are straightforward: they pull tuning parameters out of the Configuration object (a small sketch of how those can be set follows below). The two less familiar pieces are the Listener and the Responder, so let's look at their source.
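A few of the literal configuration keys read by this constructor (and by the Listener shown next) can be set on the Configuration before the server is created. A minimal, hedged sketch of a fragment that could replace the new Configuration() in IPCQueryServer; the numeric values are arbitrary examples, not recommendations:

Configuration conf = new Configuration();
// Accept-queue length used when the Listener binds its server socket.
conf.setInt("ipc.server.listen.queue.size", 256);
// Disable Nagle's algorithm on server-side connections.
conf.setBoolean("ipc.server.tcpnodelay", true);
// Idle-connection housekeeping thresholds read in the constructor above.
conf.setInt("ipc.client.connection.maxidletime", 2000);
conf.setInt("ipc.client.idlethreshold", 4000);
conf.setInt("ipc.client.kill.max", 10);

Server server = RPC.getServer(new IPCQueryStatusImpl(), "0.0.0.0",
        IPCQueryServer.IPC_PORT, conf);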
First, the Listener source:
/** Listens on the socket. Creates jobs for the handler threads */
private class Listener extends Thread {

    private ServerSocketChannel acceptChannel = null; // the accept channel
    private Selector selector = null;                 // the selector that we use for the server
    private Reader[] readers = null;
    private int currentReader = 0;
    private InetSocketAddress address;                // the address we bind at
    private Random rand = new Random();
    private long lastCleanupRunTime = 0;              // the last time when a cleanup connection
                                                      // (for idle connections) ran
    private long cleanupInterval = 10000;             // the minimum interval between two cleanup runs
    private int backlogLength = conf.getInt("ipc.server.listen.queue.size", 128);
    private ExecutorService readPool;

    public Listener() throws IOException {
        address = new InetSocketAddress(bindAddress, port);
        // Create a new server socket and set to non blocking mode
        acceptChannel = ServerSocketChannel.open();
        acceptChannel.configureBlocking(false);

        // Bind the server socket to the local host and port
        bind(acceptChannel.socket(), address, backlogLength);
        port = acceptChannel.socket().getLocalPort(); // Could be an ephemeral port

        // create a selector;
        selector = Selector.open();
        readers = new Reader[readThreads];
        readPool = Executors.newFixedThreadPool(readThreads);
        for (int i = 0; i < readThreads; i++) {
            Selector readSelector = Selector.open();
            Reader reader = new Reader(readSelector);
            readers[i] = reader;
            readPool.execute(reader);
        }

        // Register accepts on the server socket with the selector.
        acceptChannel.register(selector, SelectionKey.OP_ACCEPT);
        this.setName("IPC Server listener on " + port);
        this.setDaemon(true);
    }
The server uses this Listener to accept client connections. As you can see, it is built on Java NIO: a non-blocking ServerSocketChannel registered with a Selector for OP_ACCEPT, with accepted connections handed to an internal pool of Reader threads. The code is easy to follow once you are comfortable with Java NIO; if you are not, the stripped-down sketch below may help.
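The following hypothetical sketch shows the same "accept thread plus reader pool" idea outside of Hadoop: one thread accepts connections on a non-blocking ServerSocketChannel and hands each accepted channel to a small pool that reads from it. It is meant only to illustrate the NIO mechanics, not Hadoop's actual implementation; the class name, port, and pool size are arbitrary.

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class MiniNioServer {
    public static void main(String[] args) throws IOException {
        ExecutorService readPool = Executors.newFixedThreadPool(2);

        // Non-blocking server socket registered for OP_ACCEPT, as in the Listener.
        Selector acceptSelector = Selector.open();
        ServerSocketChannel acceptChannel = ServerSocketChannel.open();
        acceptChannel.configureBlocking(false);
        acceptChannel.socket().bind(new InetSocketAddress(9000), 128);
        acceptChannel.register(acceptSelector, SelectionKey.OP_ACCEPT);

        while (true) {
            acceptSelector.select();
            Iterator<SelectionKey> it = acceptSelector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isValid() && key.isAcceptable()) {
                    SocketChannel channel = ((ServerSocketChannel) key.channel()).accept();
                    if (channel != null) {
                        // Hand the connection to a reader thread, mirroring the
                        // accept-thread/reader-pool split in the Listener.
                        readPool.execute(() -> readOnce(channel));
                    }
                }
            }
        }
    }

    private static void readOnce(SocketChannel channel) {
        // Accepted channels are in blocking mode by default, so a plain read is fine here.
        ByteBuffer buf = ByteBuffer.allocate(1024);
        try (SocketChannel ch = channel) {
            int n = ch.read(buf);
            System.out.println("read " + n + " bytes");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}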
Next, the Responder source:
private class Responder extends Thread {

    private Selector writeSelector;
    private int pending;                       // connections waiting to register
    final static int PURGE_INTERVAL = 900000;  // 15mins

    Responder() throws IOException {
        this.setName("IPC Server Responder");
        this.setDaemon(true);
        writeSelector = Selector.open();       // create a selector
        pending = 0;
    }

    @Override
    public void run() {
        LOG.info(getName() + ": starting");
        SERVER.set(Server.this);
        long lastPurgeTime = 0;                // last check for old calls.
        while (running) {
            try {
                waitPending();                 // If a channel is being registered, wait.
                writeSelector.select(PURGE_INTERVAL);
                Iterator<SelectionKey> iter = writeSelector.selectedKeys().iterator();
                while (iter.hasNext()) {
                    SelectionKey key = iter.next();
                    iter.remove();
                    try {
                        if (key.isValid() && key.isWritable()) {
                            doAsyncWrite(key);
                        }
                    } catch (IOException e) {
                        LOG.info(getName() + ": doAsyncWrite threw exception " + e);
                    }
                }

                long now = System.currentTimeMillis();
                if (now < lastPurgeTime + PURGE_INTERVAL) {
                    continue;
                }
                lastPurgeTime = now;

                //
                // If there were some calls that have not been sent out for a
                // long time, discard them.
                //
                LOG.debug("Checking for old call responses.");
                ArrayList<Call> calls;

                // get the list of channels from list of keys.
                synchronized (writeSelector.keys()) {
                    calls = new ArrayList<Call>(writeSelector.keys().size());
                    iter = writeSelector.keys().iterator();
                    while (iter.hasNext()) {
                        SelectionKey key = iter.next();
                        Call call = (Call) key.attachment();
                        if (call != null && key.channel() == call.connection.channel) {
                            calls.add(call);
                        }
                    }
                }

                for (Call call : calls) {
                    try {
                        doPurge(call, now);
                    } catch (IOException e) {
                        LOG.warn("Error in purging old calls " + e);
                    }
                }
            } catch (OutOfMemoryError e) {
                //
                // we can run out of memory if we have too many threads
                // log the event and sleep for a minute and give
                // some thread(s) a chance to finish
                //
                LOG.warn("Out of Memory in server select", e);
                try { Thread.sleep(60000); } catch (Exception ie) {}
            } catch (Exception e) {
                LOG.warn("Exception in Responder " +
                        StringUtils.stringifyException(e));
            }
        }
        LOG.info("Stopping " + this.getName());
    }
For everyday use, passing an address and a port is all it takes to stand up a server. Still, it is worth digging into these details: the Responder's write-side selector and purge loop are a good way to deepen your understanding of how NIO is used in practice.
(4) The client
With the server in place, we build a client to connect to it:
import java.net.InetSocketAddress;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.ipc.RPC;

public class IPCQueryClient {
    public static void main(String[] args) {
        try {
            // The socket address of the server we want to talk to.
            InetSocketAddress addr = new InetSocketAddress("localhost",
                    IPCQueryServer.IPC_PORT);
            IPCQueryStatus query = (IPCQueryStatus) RPC.getProxy(
                    IPCQueryStatus.class, IPCQueryServer.IPV_VER, addr,
                    new Configuration());
            System.out.println(query.getFileStatus("/test").getFilename());
            RPC.stopProxy(query);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
As you can see, the client also goes through RPC to obtain the remote service, so for Hadoop's remote invocation the essence of the code really does live inside the RPC class. Let's look at getProxy:
/**
 * Construct a client-side proxy object that implements the named protocol,
 * talking to a server at the named address.
 */
public static VersionedProtocol getProxy(
        Class<? extends VersionedProtocol> protocol, long clientVersion,
        InetSocketAddress addr, UserGroupInformation ticket,
        Configuration conf, SocketFactory factory, int rpcTimeout)
        throws IOException {

    if (UserGroupInformation.isSecurityEnabled()) {
        SaslRpcServer.init(conf);
    }
    VersionedProtocol proxy = (VersionedProtocol) Proxy.newProxyInstance(
            protocol.getClassLoader(), new Class[] { protocol },
            new Invoker(protocol, addr, ticket, conf, factory, rpcTimeout));
    long serverVersion = proxy.getProtocolVersion(protocol.getName(),
            clientVersion);
    if (serverVersion == clientVersion) {
        return proxy;
    } else {
        throw new VersionMismatch(protocol.getName(), clientVersion,
                serverVersion);
    }
}
Following the call chain down to this code, we can see the check that compares the client's protocol version with the server's; if they differ, a VersionMismatch exception is thrown.
Notice also the UserGroupInformation ticket that is passed in: it carries the caller's user and group information, and when authorization is enabled the server uses it to decide whether the caller is allowed to invoke this protocol.
Finally, the proxy that is returned is built entirely with Java dynamic proxies: Proxy.newProxyInstance wraps an Invoker, and it is that InvocationHandler which serializes each method call and ships it over the wire.
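For readers who have not used java.lang.reflect.Proxy before, here is a tiny, hypothetical sketch of the same technique: every call on the proxy object is routed to an InvocationHandler, which is exactly where Hadoop's Invoker would serialize the method name and arguments and perform the remote call. The Greeting interface and the canned reply below are made up purely for illustration; the handler just logs and answers locally.

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

public class ProxyDemo {
    // A hypothetical protocol-like interface used only for this sketch.
    interface Greeting {
        String greet(String name);
    }

    public static void main(String[] args) {
        InvocationHandler handler = new InvocationHandler() {
            @Override
            public Object invoke(Object proxy, Method method, Object[] params) {
                // Hadoop's Invoker would serialize method/params here and send
                // them to the server; we simply fake a local answer.
                System.out.println("intercepted call to " + method.getName());
                return "hello, " + params[0];
            }
        };

        Greeting greeting = (Greeting) Proxy.newProxyInstance(
                Greeting.class.getClassLoader(),
                new Class<?>[] { Greeting.class },
                handler);

        System.out.println(greeting.greet("hadoop"));
    }
}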
To sum up: to really understand Hadoop's IPC mechanism you need a solid grasp of Java NIO and Java dynamic proxies; those two are its heart.
(5) One class is still missing from the code above, IPCFileStatus:
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

public class IPCFileStatus implements Writable {

    private String filename;

    /** A no-argument constructor is required so the IPC layer can instantiate it. */
    public IPCFileStatus() {
    }

    public IPCFileStatus(String filename) {
        this.filename = filename;
    }

    public String getFilename() {
        return filename;
    }

    public void setFilename(String filename) {
        this.filename = filename;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        // Serialize the only field.
        Text.writeString(out, filename);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Deserialize in the same order as write().
        this.filename = Text.readString(in);
    }

    @Override
    public String toString() {
        return filename;
    }
}
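IPCFileStatus implements Writable because every parameter and return value crossing a Hadoop IPC connection is serialized through that interface. A quick way to convince yourself that the write/readFields pair is symmetric is a local round trip through Hadoop's DataOutputBuffer and DataInputBuffer; the sketch below assumes the IPCFileStatus class above is on the classpath and is only a local test, not part of the RPC example.

import org.apache.hadoop.io.DataInputBuffer;
import org.apache.hadoop.io.DataOutputBuffer;

public class WritableRoundTrip {
    public static void main(String[] args) throws Exception {
        IPCFileStatus original = new IPCFileStatus("/test");

        // Serialize using the write() method shown above.
        DataOutputBuffer out = new DataOutputBuffer();
        original.write(out);

        // Deserialize into a fresh instance using readFields().
        DataInputBuffer in = new DataInputBuffer();
        in.reset(out.getData(), out.getLength());
        IPCFileStatus copy = new IPCFileStatus();
        copy.readFields(in);

        System.out.println(copy.getFilename()); // prints /test
    }
}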
For everyday use, this arrangement is all you need; but to truly understand Hadoop's IPC mechanism, take the time to study NIO and dynamic proxies in depth.