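When a DataNode starts, it creates a proxy for the NameNode so that it can make RPC calls to it. The proxy is built with Java's dynamic proxy mechanism, and communication with the NameNode uses NIO. The whole creation flow is somewhat involved, so I am recording it separately here.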
The call flow is as follows:
void startDataNode(Configuration conf,
                   AbstractList<File> dataDirs, SecureResources resources
                   ) throws IOException {
  .......
  // Start creating the proxy; note that the protocol type is DatanodeProtocol
  this.namenode = (DatanodeProtocol)
    RPC.waitForProxy(DatanodeProtocol.class,
                     DatanodeProtocol.versionID,
                     nameNodeAddr,
                     conf);
  .......
}
RPC.waitForProxy obtains the proxy by calling RPC.getProxy.
Note that the return type here has become VersionedProtocol, the parent interface of DatanodeProtocol.
public static VersionedProtocol getProxy(
    Class<? extends VersionedProtocol> protocol,
    long clientVersion, InetSocketAddress addr, UserGroupInformation ticket,
    Configuration conf, SocketFactory factory, int rpcTimeout) throws IOException {
  if (UserGroupInformation.isSecurityEnabled()) {
    SaslRpcServer.init(conf);
  }
  // Create the instance through a dynamic proxy; here protocol is of type
  // org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.
  // Invoker contains the actual call flow for the proxied methods; the class is listed separately below
  VersionedProtocol proxy =
      (VersionedProtocol) Proxy.newProxyInstance(
          protocol.getClassLoader(), new Class[] { protocol },
          new Invoker(protocol, addr, ticket, conf, factory, rpcTimeout));
  // Fetch the server-side version for comparison
  long serverVersion = proxy.getProtocolVersion(protocol.getName(),
                                                clientVersion);
  // Return the proxy only if the versions match
  if (serverVersion == clientVersion) {
    return proxy;
  } else {
    throw new VersionMismatch(protocol.getName(), clientVersion,
                              serverVersion);
  }
}
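To make the pattern in getProxy easier to see in isolation, here is a minimal sketch of the same two steps: build a proxy with `Proxy.newProxyInstance`, then call a version method through it before handing it back. The `Versioned` and `Heartbeat` interfaces, the `FakeInvoker` handler and the version number 25 are hypothetical stand-ins, not Hadoop types, and the handler fakes the server's answers locally instead of going over the network.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

public class ProxySketch {
    // Hypothetical stand-in for VersionedProtocol.
    interface Versioned {
        long getProtocolVersion(String protocol, long clientVersion);
    }

    // Hypothetical stand-in for DatanodeProtocol.
    interface Heartbeat extends Versioned {
        String sendHeartbeat(String nodeName);
    }

    // Stand-in for Invoker: answers locally instead of calling a remote server.
    static class FakeInvoker implements InvocationHandler {
        private final long serverVersion;

        FakeInvoker(long serverVersion) {
            this.serverVersion = serverVersion;
        }

        @Override
        public Object invoke(Object proxy, Method method, Object[] args) {
            if (method.getName().equals("getProtocolVersion")) {
                return serverVersion;            // boxed to Long, unboxed by the proxy
            }
            if (method.getName().equals("sendHeartbeat")) {
                return "ack:" + args[0];
            }
            throw new UnsupportedOperationException(method.getName());
        }
    }

    // Same shape as getProxy: create the proxy, compare versions, then return it.
    static <T extends Versioned> T getProxy(Class<T> protocol,
                                            long clientVersion,
                                            long serverVersion) {
        T proxy = protocol.cast(Proxy.newProxyInstance(
            protocol.getClassLoader(),
            new Class<?>[] { protocol },
            new FakeInvoker(serverVersion)));
        long reported = proxy.getProtocolVersion(protocol.getName(), clientVersion);
        if (reported != clientVersion) {
            throw new IllegalStateException("version mismatch: client="
                + clientVersion + " server=" + reported);
        }
        return proxy;
    }

    public static void main(String[] args) {
        Heartbeat namenode = getProxy(Heartbeat.class, 25L, 25L);
        System.out.println(namenode.sendHeartbeat("datanode-1"));
    }
}
```

The point of the sketch is that the caller only ever sees the interface; every method call, including the version check itself, is routed through the handler's `invoke`.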
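Now look at the Invoker class. It wraps the method calls made on the proxied interface and adds a little extra around them; here the only addition is recording how long each call takes.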
private static class Invoker implements InvocationHandler {
  private Client.ConnectionId remoteId;
  private Client client;
  private boolean isClosed = false;

  public Invoker(Class<? extends VersionedProtocol> protocol,
                 InetSocketAddress address, UserGroupInformation ticket,
                 Configuration conf, SocketFactory factory,
                 int rpcTimeout) throws IOException {
    // Save the connection id
    this.remoteId = Client.ConnectionId.getConnectionId(address, protocol,
        ticket, rpcTimeout, conf);
    // Save the client
    this.client = CLIENTS.getClient(conf, factory);
  }

  // When debug logging is enabled, record how long the method call takes
  public Object invoke(Object proxy, Method method, Object[] args)
      throws Throwable {
    final boolean logDebug = LOG.isDebugEnabled();
    long startTime = 0;
    if (logDebug) {
      // Record the start time
      startTime = System.currentTimeMillis();
    }
    // The actual method call
    ObjectWritable value = (ObjectWritable)
        client.call(new Invocation(method, args), remoteId);
    if (logDebug) {
      // Compute the call duration
      long callTime = System.currentTimeMillis() - startTime;
      LOG.debug("Call: " + method.getName() + " " + callTime);
    }
    // Return the result of the call
    return value.get();
  }

  /* close the IPC client that's responsible for this invoker's RPCs */
  synchronized private void close() {
    if (!isClosed) {
      isClosed = true;
      CLIENTS.stopClient(client);
    }
  }
}
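The `new Invocation(method, args)` built inside `invoke` has to be turned into bytes before it can travel to the server. As a rough illustration only, the sketch below writes a method name and its arguments to a `DataOutputStream` and reads them back. The wire layout (a UTF string, an int count, then string-only arguments) and the class name `InvocationSketch` are simplifying assumptions of this example, not Hadoop's actual format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class InvocationSketch {
    // Serialize "method name + arguments" in a made-up, simplified layout.
    static byte[] write(String methodName, String[] params) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeUTF(methodName);        // which method to call
        out.writeInt(params.length);     // how many arguments follow
        for (String p : params) {
            out.writeUTF(p);             // each argument, strings only in this sketch
        }
        out.flush();
        return buf.toByteArray();
    }

    // Deserialize the same layout, as a server would before dispatching the call.
    static String read(byte[] bytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        StringBuilder sb = new StringBuilder(in.readUTF()).append('(');
        int count = in.readInt();
        for (int i = 0; i < count; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(in.readUTF());
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = write("getProtocolVersion",
                            new String[] { "DatanodeProtocol", "25" });
        System.out.println(read(wire));
    }
}
```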
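Once the proxy has been created, one of its methods is called to fetch the server-side version, which goes through Invoker's invoke method.
Looking only at the core code:
ObjectWritable value = (ObjectWritable)
    client.call(new Invocation(method, args), remoteId);
The Invocation class passed to call implements the Writable interface. This call establishes the network connection, sends the call object to the RPC server, and waits for the reply, which completes one invocation. On the surface it all looks neat and simple, but that impression is wrong. Here is the code: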
public Writable call(Writable param, ConnectionId remoteId)
    throws InterruptedException, IOException {
  // Create the call object
  Call call = new Call(param);
  // Get a connection
  Connection connection = getConnection(remoteId, call);
  // Send the call
  connection.sendParam(call);                 // send the parameter
  boolean interrupted = false;
  synchronized (call) {
    // Start waiting...
    while (!call.done) {
      try {
        call.wait();                           // wait for the result
      } catch (InterruptedException ie) {
        // save the fact that we were interrupted
        interrupted = true;
      }
    }
    if (interrupted) {
      // set the interrupt flag now that we are done waiting
      Thread.currentThread().interrupt();
    }
    // What happens if the call failed?
    if (call.error != null) {
      if (call.error instanceof RemoteException) {
        call.error.fillInStackTrace();
        throw call.error;
      } else { // local exception
        // use the connection because it will reflect an ip change, unlike
        // the remoteId
        throw wrapException(connection.getRemoteAddress(), call.error);
      }
    } else {
      // Return the value of the call
      return call.value;
    }
  }
}
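The key questions are: how is the connection established? Where does the communication with the server happen? Where is the return value obtained? And why is there still a wait here?
To sort these out, we have to keep tracing.
First, how the connection is established: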
private Connection getConnection(ConnectionId remoteId,
                                 Call call)
                                 throws IOException, InterruptedException {
  if (!running.get()) {
    // the client is stopped
    throw new IOException("The client is stopped");
  }
  Connection connection;
  /* we could avoid this allocation for each RPC by having a
   * connectionsId object and with set() method. We need to manage the
   * refs for keys in HashMap properly. For now its ok.
   */
  do {
    synchronized (connections) {
      // Look the connection up in the connection pool
      connection = connections.get(remoteId);
      if (connection == null) {
        // If there is none, create a new connection directly. This only initializes
        // some member fields; nothing is actually connected to the NameNode yet
        connection = new Connection(remoteId);
        // Put it into the connection pool
        connections.put(remoteId, connection);
      }
    }
  } while (!connection.addCall(call));
  //we don't invoke the method below inside "synchronized (connections)"
  //block above. The reason for that is if the server happens to be slow,
  //it will take longer to establish a connection and that will slow the
  //entire system down.
  // The code above only created the connection object without really connecting;
  // the call below is what actually establishes the socket connection
  connection.setupIOstreams();
  return connection;
}
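The shape of getConnection (look up under a lock, create lazily, register the call, then do the slow connect outside the lock) can be sketched on its own. In the sketch below, `LazyConnection`, the string ids and the `closed` flag are hypothetical simplifications, and "setting up" a connection just flips a boolean rather than opening a socket.

```java
import java.util.HashMap;
import java.util.Map;

public class ConnectionCacheSketch {
    // Hypothetical stand-in for Client.Connection.
    static class LazyConnection {
        private boolean open = false;
        private boolean closed = false;   // never set in this sketch
        private int pendingCalls = 0;

        // Register a call; refuse if the connection is already shutting down.
        synchronized boolean addCall() {
            if (closed) {
                return false;
            }
            pendingCalls++;
            return true;
        }

        // The slow part (socket setup in the real client); done at most once.
        synchronized void setup() {
            if (!open) {
                open = true;
            }
        }

        synchronized boolean isOpen() {
            return open;
        }

        synchronized int pendingCalls() {
            return pendingCalls;
        }
    }

    private final Map<String, LazyConnection> connections = new HashMap<>();

    LazyConnection getConnection(String remoteId) {
        LazyConnection connection;
        do {
            synchronized (connections) {
                connection = connections.get(remoteId);
                if (connection == null) {
                    // Only an object is created here; nothing is connected yet.
                    connection = new LazyConnection();
                    connections.put(remoteId, connection);
                }
            }
        } while (!connection.addCall());
        // Outside the map lock, so a slow server cannot stall other lookups.
        connection.setup();
        return connection;
    }

    public static void main(String[] args) {
        ConnectionCacheSketch client = new ConnectionCacheSketch();
        LazyConnection a = client.getConnection("namenode:8020");
        LazyConnection b = client.getConnection("namenode:8020");
        System.out.println(a == b);             // the pooled object is reused
        System.out.println(a.pendingCalls());   // both calls were registered on it
    }
}
```

Keeping the setup call outside `synchronized (connections)` mirrors the comment in the original code: the map lock is held only for the cheap lookup.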
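Next comes the part that really establishes the connection, connection.setupIOstreams(): it obtains the input and output streams, sends the header information, and starts the receiver thread.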
private synchronized void setupIOstreams() throws InterruptedException {
  if (socket != null || shouldCloseConnection.get()) {
    return;
  }
  try {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Connecting to "+server);
    }
    short numRetries = 0;
    final short maxRetries = 15;
    Random rand = null;
    while (true) {
      // Establish the connection
      setupConnection();
      // Get the network IO streams
      InputStream inStream = NetUtils.getInputStream(socket);
      OutputStream outStream = NetUtils.getOutputStream(socket);
      // Send the RPC header
      writeRpcHeader(outStream);
      if (useSasl) {
        final InputStream in2 = inStream;
        final OutputStream out2 = outStream;
        UserGroupInformation ticket = remoteId.getTicket();
        if (authMethod == AuthMethod.KERBEROS) {
          if (ticket.getRealUser() != null) {
            ticket = ticket.getRealUser();
          }
        }
        boolean continueSasl = false;
        try {
          continueSasl =
            ticket.doAs(new PrivilegedExceptionAction<Boolean>() {
              @Override
              public Boolean run() throws IOException {
                return setupSaslConnection(in2, out2);
              }
            });
        } catch (Exception ex) {
          if (rand == null) {
            rand = new Random();
          }
          handleSaslConnectionFailure(numRetries++, maxRetries, ex, rand,
              ticket);
          continue;
        }
        if (continueSasl) {
          // Sasl connect is successful. Let's set up Sasl i/o streams.
          inStream = saslRpcClient.getInputStream(inStream);
          outStream = saslRpcClient.getOutputStream(outStream);
        } else {
          // fall back to simple auth because server told us so.
          authMethod = AuthMethod.SIMPLE;
          header = new ConnectionHeader(header.getProtocol(),
              header.getUgi(), authMethod);
          useSasl = false;
        }
      }
      // Update the connection's stream fields
      this.in = new DataInputStream(new BufferedInputStream
          (new PingInputStream(inStream)));
      this.out = new DataOutputStream
          (new BufferedOutputStream(outStream));
      // Send the protocol header; what was sent earlier was the RPC header.
      // The protocol here is DatanodeProtocol
      writeHeader();
      // update last activity time
      touch();
      // Start the thread that receives return values
      start();
      return;
    }
  } catch (Throwable t) {
    if (t instanceof IOException) {
      markClosed((IOException)t);
    } else {
      markClosed(new IOException("Couldn't set up IO streams", t));
    }
    close();
  }
}
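The final `start()` call hints at why `Client.call` blocks in `call.wait()`: the reply is read on a separate thread, which presumably marks the call as done and wakes the waiting caller. Below is a minimal sketch of that hand-off using plain `wait`/`notify`. The `Call` class here is a hypothetical reduction with a string value, and the "receiver" thread fabricates an echo reply instead of reading from a socket.

```java
public class CallWaitSketch {
    // Hypothetical reduction of Client.Call: a result slot plus a done flag.
    static class Call {
        private String value;
        private boolean done = false;

        // Called by the receiver thread once the reply has arrived.
        synchronized void complete(String result) {
            value = result;
            done = true;
            notify();                    // wake up the caller blocked in wait()
        }
    }

    static String call(String param) {
        final Call call = new Call();

        // Stand-in for the connection's receiver thread started by start().
        Thread receiver = new Thread(() -> call.complete("echo:" + param));
        receiver.setDaemon(true);
        receiver.start();

        boolean interrupted = false;
        synchronized (call) {
            // Loop guards against spurious wakeups, as in the original code.
            while (!call.done) {
                try {
                    call.wait();
                } catch (InterruptedException ie) {
                    interrupted = true;  // remember it, keep waiting for the reply
                }
            }
            if (interrupted) {
                Thread.currentThread().interrupt();
            }
            return call.value;
        }
    }

    public static void main(String[] args) {
        System.out.println(call("getProtocolVersion"));
    }
}
```

Because both `complete` and the waiting loop synchronize on the same `Call` object, the caller cannot miss the notification even if the receiver finishes before the caller starts waiting: the `done` flag is checked first.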
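By now the overall flow is mostly clear, but we still have not seen how the socket itself is created, and I only feel sure once I have seen it. So keep tracing down into setupConnection, the function that establishes the connection.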
private synchronized void setupConnection() throws IOException {
  short ioFailures = 0;
  short timeoutFailures = 0;
  while (true) {
    try {
      // Finally, the socket: it is obtained from a factory class
      this.socket = socketFactory.createSocket();
      this.socket.setTcpNoDelay(tcpNoDelay);
      /*
       * Bind the socket to the host specified in the principal name of the
       * client, to ensure Server matching address of the client connection
       * to host name in principal passed.
       */
      if (UserGroupInformation.isSecurityEnabled()) {
        KerberosInfo krbInfo =
          remoteId.getProtocol().getAnnotation(KerberosInfo.class);
        if (krbInfo != null && krbInfo.clientPrincipal() != null) {
          String host =
            SecurityUtil.getHostFromPrincipal(remoteId.getTicket().getUserName());
          // If host name is a valid local address then bind socket to it
          InetAddress localAddr = NetUtils.getLocalInetAddress(host);
          if (localAddr != null) {
            this.socket.bind(new InetSocketAddress(localAddr, 0));
          }
        }
      }
      // The connection is only really established here; the timeout is
      // 20 seconds, which is fairly generous
      NetUtils.connect(this.socket, server, 20000);
      if (rpcTimeout > 0) {
        pingInterval = rpcTimeout; // rpcTimeout overwrites pingInterval
      }
      this.socket.setSoTimeout(pingInterval);
      return;
    } catch (SocketTimeoutException toe) {
      /* Check for an address change and update the local reference.
       * Reset the failure counter if the address was changed
       */
      if (updateAddress()) {
        timeoutFailures = ioFailures = 0;
      }
      /* The max number of retries is 45,
       * which amounts to 20s*45 = 15 minutes retries.
       */
      handleConnectionFailure(timeoutFailures++, 45, toe);
    } catch (IOException ie) {
      if (updateAddress()) {
        timeoutFailures = ioFailures = 0;
      }
      handleConnectionFailure(ioFailures++, maxRetries, ie);
    }
  }
}
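Stripped of the Kerberos binding and address-change handling, setupConnection is a connect-with-timeout inside a retry loop. A minimal sketch with plain `java.net.Socket` follows. The class name `ConnectSketch`, the single failure counter (the original keeps separate counters for timeouts and other I/O failures) and the fixed 100 ms pause between attempts are assumptions made for this example.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ConnectSketch {
    static Socket connect(InetSocketAddress server,
                          int connectTimeoutMs,
                          int readTimeoutMs,
                          int maxRetries) throws IOException {
        int failures = 0;
        while (true) {
            Socket socket = new Socket();          // created unconnected, like createSocket()
            try {
                socket.setTcpNoDelay(true);
                // The connection is actually opened here, bounded by the timeout.
                socket.connect(server, connectTimeoutMs);
                // Reads on this socket will time out after readTimeoutMs.
                socket.setSoTimeout(readTimeoutMs);
                return socket;
            } catch (IOException e) {
                socket.close();
                if (++failures > maxRetries) {
                    throw e;                        // give up after maxRetries retries
                }
                try {
                    Thread.sleep(100);              // assumed pause before retrying
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IOException("interrupted while retrying", ie);
                }
            }
        }
    }
}
```

A caller would use it roughly as `Socket s = ConnectSketch.connect(new InetSocketAddress(host, port), 20000, pingIntervalMs, 45);`, where `host`, `port` and `pingIntervalMs` are whatever the caller has at hand.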
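Once the connection succeeds, the RPC header is sent first. If you are tracing on the server side at this point, you will see the breakpoint land in the run method of the RPC server's Listener.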
/* Write the RPC header */
private void writeRpcHeader(OutputStream outStream) throws IOException {
  DataOutputStream out = new DataOutputStream(new BufferedOutputStream(outStream));
  // Write out the header, version and authentication method
  out.write(Server.HEADER.array());
  out.write(Server.CURRENT_VERSION);
  authMethod.write(out);
  out.flush();
}
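To get a feel for what ends up on the wire, the sketch below writes the same three pieces (magic bytes, version, authentication method) into an in-memory buffer. The concrete values are assumptions for illustration: "hrpc" for the magic, 4 for the version and 80 for simple authentication are what I believe Hadoop 1.x uses, but they should be checked against `Server.HEADER`, `Server.CURRENT_VERSION` and `AuthMethod` in the version being read.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class RpcHeaderSketch {
    // All three constants are assumed values, not read from Hadoop itself.
    static final byte[] MAGIC = "hrpc".getBytes(StandardCharsets.US_ASCII);
    static final byte VERSION = 4;
    static final byte AUTH_SIMPLE = 80;

    static byte[] header() throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(new BufferedOutputStream(sink));
        out.write(MAGIC);          // 4 bytes identifying the protocol
        out.write(VERSION);        // 1 byte protocol version
        out.write(AUTH_SIMPLE);    // 1 byte authentication method code
        out.flush();               // without this the bytes would stay in the buffer
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] h = header();
        System.out.println(h.length + " bytes, magic="
            + new String(h, 0, 4, StandardCharsets.US_ASCII));
    }
}
```

The explicit `flush()` matters for the same reason as in writeRpcHeader: the stream is buffered, and the server cannot start validating the header until the bytes actually leave the client.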
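At this point the connection has been established and has passed verification. A later post will analyze how the call object is sent and how the return value is obtained.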