Regarding the java.io.IOException: Call to master/10.200.187.77:8020 failed on local exception: java.nio.channels.ClosedByInterruptException problem:
I recently switched Hadoop's logging to DEBUG level and went through the output carefully. An excerpt of the relevant log lines follows.
Note the "disconnecting client" line below (shown in red in the original post): that is the event that closed the connection from 10.200.187.77:40956, which is why the write operations after it throw exceptions.
2012-06-28 14:12:20,433 DEBUG org.apache.hadoop.ipc.Server: IPC Server handler 289 on 8020: has #4952 from 10.200.187.77:40956
2012-06-28 14:12:20,434 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_-6167110925703343645 is added to invalidSet of 10.200.187.78:50010
2012-06-28 14:12:20,434 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_-6167110925703343645 is added to invalidSet of 10.200.187.80:50010
2012-06-28 14:12:20,434 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_-6167110925703343645 is added to invalidSet of 10.200.187.79:50010
2012-06-28 14:12:20,435 DEBUG org.apache.hadoop.ipc.Server: got #4905
2012-06-28 14:12:20,435 DEBUG org.apache.hadoop.ipc.Server: IPC Server listener on 8020: disconnecting client 10.200.187.77:40956. Number of active connections: 105
2012-06-28 14:12:20,435 DEBUG org.apache.hadoop.ipc.Server: IPC Server handler 321 on 8020: has #4905 from 10.200.187.77:40956
2012-06-28 14:12:20,435 DEBUG org.apache.hadoop.ipc.Server: Served: getProtocolVersion queueTime= 0 procesingTime= 0
2012-06-28 14:12:20,435 DEBUG org.apache.hadoop.ipc.Server: IPC Server Responder: responding to #4905 from 10.200.187.77:40956
2012-06-28 14:12:20,435 WARN org.apache.hadoop.ipc.Server: IPC Server Responder, call getProtocolVersion(org.apache.hadoop.hdfs.protocol.ClientProtocol, 61) from 10.200.187.77:40956: output error
2012-06-28 14:12:20,435 INFO org.apache.hadoop.ipc.Server: IPC Server handler 321 on 8020 caught: java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:325)
at org.apache.hadoop.ipc.Server.channelWrite(Server.java:1700)
at org.apache.hadoop.ipc.Server.access$2000(Server.java:94)
at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:741)
at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:804)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1416)
2012-06-28 14:12:20,445 DEBUG org.apache.hadoop.ipc.Server: Served: setReplication queueTime= 0 procesingTime= 13
2012-06-28 14:12:20,445 DEBUG org.apache.hadoop.ipc.Server: IPC Server Responder: responding to #4950 from 10.200.187.77:40956
2012-06-28 14:12:20,445 WARN org.apache.hadoop.ipc.Server: IPC Server Responder, call setReplication(/tmp/hive-hadoop/hive_2012-06-28_14-09-10_170_1995016130043493038/-mr-10069/6e309161-eb89-4dbe-a6cb-ea3cb0167ee0, 10) from 10.200.187.77:40956: output error
2012-06-28 14:12:20,445 INFO org.apache.hadoop.ipc.Server: IPC Server handler 328 on 8020 caught: java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:325)
at org.apache.hadoop.ipc.Server.channelWrite(Server.java:1700)
at org.apache.hadoop.ipc.Server.access$2000(Server.java:94)
at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:741)
at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:804)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1416)
2012-06-28 14:12:20,477 DEBUG org.apache.hadoop.ipc.Server: Served: delete queueTime= 0 procesingTime= 44
2012-06-28 14:12:20,477 DEBUG org.apache.hadoop.ipc.Server: IPC Server Responder: responding to #4952 from 10.200.187.77:40956
2012-06-28 14:12:20,477 WARN org.apache.hadoop.ipc.Server: IPC Server Responder, call delete(/tmp/hive-hadoop/hive_2012-06-28_14-09-10_170_1995016130043493038/-mr-10057/e2239dfc-0947-4dc6-adc0-50e28f8e5c71, true) from 10.200.187.77:40956: output error
2012-06-28 14:12:20,477 INFO org.apache.hadoop.ipc.Server: IPC Server handler 289 on 8020 caught: java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:325)
at org.apache.hadoop.ipc.Server.channelWrite(Server.java:1700)
at org.apache.hadoop.ipc.Server.access$2000(Server.java:94)
at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:741)
at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:804)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1416)
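The ClosedChannelException in these stack traces is the expected symptom of the disconnect rather than its cause: the Listener thread has already closed the socket, so when the Responder later tries to write the queued replies for calls #4905, #4950, and #4952, the writes fail. A minimal standalone sketch (plain NIO, not Hadoop code) reproducing that behavior:

    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.ClosedChannelException;
    import java.nio.channels.ServerSocketChannel;
    import java.nio.channels.SocketChannel;

    // Once a channel has been closed locally, every subsequent write()
    // fails with ClosedChannelException -- the same exception the
    // Responder threads log above after the Listener has already
    // disconnected 10.200.187.77:40956.
    public class ClosedChannelDemo {
        public static void main(String[] args) throws Exception {
            try (ServerSocketChannel server = ServerSocketChannel.open()) {
                server.bind(new InetSocketAddress("127.0.0.1", 0));
                SocketChannel client = SocketChannel.open(server.getLocalAddress());
                SocketChannel accepted = server.accept();

                accepted.close();  // what the Listener does in closeConnection(c)
                try {
                    // what the Responder attempts afterwards in doRespond()
                    accepted.write(ByteBuffer.wrap(new byte[]{1}));
                } catch (ClosedChannelException e) {
                    System.out.println("caught: " + e);
                }
                client.close();
            }
        }
    }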
In the Server class:
void doRead(SelectionKey key) throws InterruptedException {
  int count = 0;
  Connection c = (Connection) key.attachment();
  if (c == null) {
    return;
  }
  c.setLastContact(System.currentTimeMillis());

  try {
    // Read the RPC request bytes off the channel and dispatch them.
    count = c.readAndProcess();
  } catch (InterruptedException ieo) {
    LOG.info(getName() + ": readAndProcess caught InterruptedException", ieo);
    throw ieo;
  } catch (Exception e) {
    LOG.info(getName() + ": readAndProcess threw exception " + e +
             ". Count of bytes read: " + count, e);
    count = -1; // so that the (count < 0) block is executed
  }

  if (count < 0) {
    // EOF or a read error: this is where the "disconnecting client" DEBUG
    // line in the log above is emitted, after which the Responder's writes
    // fail with ClosedChannelException.
    if (LOG.isDebugEnabled())
      LOG.debug(getName() + ": disconnecting client " +
                c + ". Number of active connections: " +
                numConnections);
    closeConnection(c);
    c = null;
  } else {
    c.setLastContact(System.currentTimeMillis());
  }
}
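For reference, the client-side ClosedByInterruptException in the original error message is standard NIO behavior: SocketChannel is an InterruptibleChannel, so interrupting a thread that is blocked in read() or write() closes the channel and raises ClosedByInterruptException, which the Hadoop client then wraps into the "failed on local exception" IOException. A minimal sketch (again plain NIO, not Hadoop code) demonstrating this:

    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.ClosedByInterruptException;
    import java.nio.channels.ServerSocketChannel;
    import java.nio.channels.SocketChannel;

    // Interrupting a thread blocked in SocketChannel.read() closes the
    // channel and throws ClosedByInterruptException -- the "local
    // exception" in the error message at the top of this post.
    public class InterruptDemo {
        public static void main(String[] args) throws Exception {
            try (ServerSocketChannel server = ServerSocketChannel.open()) {
                server.bind(new InetSocketAddress("127.0.0.1", 0));

                Thread reader = new Thread(() -> {
                    try (SocketChannel ch = SocketChannel.open(server.getLocalAddress())) {
                        ch.read(ByteBuffer.allocate(16)); // blocks: the peer never writes
                    } catch (ClosedByInterruptException e) {
                        System.out.println("caught: " + e);
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                });
                reader.start();
                SocketChannel peer = server.accept(); // accept, then send nothing
                Thread.sleep(500);                    // let the reader block in read()
                reader.interrupt();                   // closes the channel under the reader
                reader.join();
                peer.close();
            }
        }
    }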
Digging into how the call path works in the source, I found that every remote RPC call transmits the method name, the number of parameters, and each parameter's type and value over the wire. Even so, when the server reads from the channel, readAndProcess() can still end up with count below zero, which sends the connection down the disconnect path shown above.
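As a rough illustration of that wire format, here is a simplified encoder modeled loosely on Hadoop 1.x's RPC$Invocation.write(); the real implementation serializes parameter values with ObjectWritable rather than strings, so treat this purely as a sketch:

    import java.io.ByteArrayOutputStream;
    import java.io.DataOutput;
    import java.io.DataOutputStream;
    import java.io.IOException;

    // Simplified sketch of what travels over the wire per call: the method
    // name, the parameter count, and each parameter's declared type plus
    // its value. The string-based value encoding here is a placeholder.
    public class InvocationSketch {
        static byte[] encodeCall(String methodName, Class<?>[] types, String[] values)
                throws IOException {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutput out = new DataOutputStream(buf);
            out.writeUTF(methodName);             // e.g. "delete" or "setReplication"
            out.writeInt(types.length);           // parameter count
            for (int i = 0; i < types.length; i++) {
                out.writeUTF(types[i].getName()); // declared parameter type
                out.writeUTF(values[i]);          // value (placeholder encoding)
            }
            return buf.toByteArray();
        }

        public static void main(String[] args) throws IOException {
            byte[] frame = encodeCall("setReplication",
                    new Class<?>[]{String.class, short.class},
                    new String[]{"/tmp/hive-hadoop/...", "10"});
            System.out.println("frame length: " + frame.length);
        }
    }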
I plan to modify the source and then retest; the results will be published soon.
Update: after three weeks of continuous testing and tuning, the root cause has been found; a write-up is coming shortly.
Reference: 《hadoop和hive结合使用总结(关于一个稳定性的问题)》 (a summary of using Hadoop together with Hive, concerning a stability issue).