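TCP is a "stream protocol": a stream is a sequence of bytes with no message boundaries. TCP knows nothing about the application data above it and segments the stream according to the state of its own buffers. As a result, one complete application message may be split across several TCP segments (packet splitting), and several application messages may be coalesced into a single segment (packet sticking). The receiver cannot recover message boundaries from TCP alone, so an application protocol on top of TCP has to define them.
The solutions used by mainstream protocols fall into four categories:
1. Fixed-length messages. For example, every message is exactly 200 bytes, and shorter messages are padded;
2. A delimiter such as CRLF appended to the end of each message, as in the FTP protocol;
3. A message header plus a message body, where the header contains a field giving the total message length (or the body length). A common design makes the first header field an int32 that holds the total length;
4. More complex designs.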
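Option 3 can be illustrated with a minimal sketch. The class and method names below (LengthPrefixFraming, encode, decode) are made up for this illustration, and the sketch assumes the whole byte stream is already in memory; a real receiver also has to cope with frames that arrive in pieces.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class LengthPrefixFraming {
    /** Encodes one message as [int32 body length][body bytes]. */
    static byte[] encode(String message) {
        byte[] body = message.getBytes(StandardCharsets.UTF_8);
        ByteBuffer frame = ByteBuffer.allocate(4 + body.length);
        frame.putInt(body.length); // ByteBuffer is big-endian by default
        frame.put(body);
        return frame.array();
    }

    /** Splits a byte stream back into messages; stops at an incomplete frame. */
    static List<String> decode(byte[] stream) {
        List<String> messages = new ArrayList<>();
        ByteBuffer in = ByteBuffer.wrap(stream);
        while (in.remaining() >= 4) {
            int len = in.getInt();
            if (len < 0 || in.remaining() < len) {
                break; // invalid length or body not fully received yet
            }
            byte[] body = new byte[len];
            in.get(body);
            messages.add(new String(body, StandardCharsets.UTF_8));
        }
        return messages;
    }

    public static void main(String[] args) {
        // Two messages written back to back, as if TCP had coalesced them.
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        byte[] first = encode("hello");
        byte[] second = encode("zk");
        wire.write(first, 0, first.length);
        wire.write(second, 0, second.length);
        System.out.println(decode(wire.toByteArray()));
    }
}
```

Because every frame announces its own length, the receiver can find the boundary between the two messages even though they arrive as one contiguous run of bytes.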
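In ZooKeeper, the client and server communicate in one of two ways: 1. the native Java NIO API; 2. the Netty library. Netty wraps Java NIO and makes network programming in Java more convenient. When ZooKeeper uses the native Java NIO API, it has to handle TCP packet sticking and splitting itself. ZooKeeper serializes and deserializes its application-level packets with jute, and it solves sticking and splitting with option 3: it adds a message header that contains only a field for the length of the message body. The structure exchanged between client and server is shown in the following code: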
/**
 * This class allows us to pass the headers and the relevant records around.
 */
static class Packet {
    RequestHeader requestHeader;
    ReplyHeader replyHeader;
    Record request;
    Record response;
    ByteBuffer bb;
    /** Client's view of the path (may differ due to chroot) **/
    String clientPath;
    /** Server's view of the path (may differ due to chroot) **/
    String serverPath;
    boolean finished;
    AsyncCallback cb;
    Object ctx;
    WatchRegistration watchRegistration;
    public boolean readOnly;

    /** Convenience ctor */
    Packet(RequestHeader requestHeader, ReplyHeader replyHeader,
           Record request, Record response,
           WatchRegistration watchRegistration) {
        this(requestHeader, replyHeader, request, response,
             watchRegistration, false);
    }

    Packet(RequestHeader requestHeader, ReplyHeader replyHeader,
           Record request, Record response,
           WatchRegistration watchRegistration, boolean readOnly) {
        this.requestHeader = requestHeader;
        this.replyHeader = replyHeader;
        this.request = request;
        this.response = response;
        this.readOnly = readOnly;
        this.watchRegistration = watchRegistration;
    }

    public void createBB() {
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            BinaryOutputArchive boa = BinaryOutputArchive.getArchive(baos);
            boa.writeInt(-1, "len"); // We'll fill this in later
            if (requestHeader != null) {
                // Write the serialized ZooKeeper request header to boa
                requestHeader.serialize(boa, "header");
            }
            // Write the serialized ZooKeeper request body to boa
            if (request instanceof ConnectRequest) {
                request.serialize(boa, "connect");
                // append "am-I-allowed-to-be-readonly" flag
                boa.writeBool(readOnly, "readOnly");
            } else if (request != null) {
                request.serialize(boa, "request");
            }
            baos.close();
            // Wrap the serialized byte array in a ByteBuffer
            this.bb = ByteBuffer.wrap(baos.toByteArray());
            // Overwrite the placeholder at the start of the buffer with the length of
            // request header + request body. This length prefix is what solves
            // TCP packet sticking and splitting.
            this.bb.putInt(this.bb.capacity() - 4);
            // Reset position to 0
            this.bb.rewind();
        } catch (IOException e) {
            LOG.warn("Ignoring unexpected exception", e);
        }
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        sb.append("clientPath:" + clientPath);
        sb.append(" serverPath:" + serverPath);
        sb.append(" finished:" + finished);
        sb.append(" header:: " + requestHeader);
        sb.append(" replyHeader:: " + replyHeader);
        sb.append(" request:: " + request);
        sb.append(" response:: " + response);
        // jute toString is horrible, remove unnecessary newlines
        return sb.toString().replaceAll("\r*\n+", " ");
    }
}
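The createBB() method serializes the ZooKeeper request header and request body, wraps the result in a ByteBuffer, and then writes the serialized request length into the reserved placeholder. Once the socketChannel is writable, the buffer can be sent directly with the socketChannel's write method.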
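The placeholder-then-backfill technique in createBB() can be reproduced with JDK classes alone. The following is a minimal sketch, not ZooKeeper code: BackfillLengthDemo and frame are hypothetical names, and an int plus a UTF-8 string stand in for the jute-serialized header and body.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class BackfillLengthDemo {
    /** Builds [len][xid][payload]; len is written last, once the size is known. */
    static ByteBuffer frame(int xid, String payload) {
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(baos);
            out.writeInt(-1);   // 4-byte placeholder for the length
            out.writeInt(xid);  // stand-in for the serialized request header
            out.write(payload.getBytes(StandardCharsets.UTF_8)); // stand-in body
            out.flush();
            ByteBuffer bb = ByteBuffer.wrap(baos.toByteArray());
            // Relative put at position 0 overwrites the placeholder.
            // The length excludes the 4-byte prefix itself.
            bb.putInt(bb.capacity() - 4);
            bb.rewind();        // position back to 0 so a write starts at the prefix
            return bb;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ByteBuffer bb = frame(42, "abc");
        System.out.println("length prefix = " + bb.getInt(0)
                + ", total bytes = " + bb.remaining());
    }
}
```

Writing the placeholder first lets the serializer stream its output without knowing the final size in advance; the length is only filled in after everything else has been written.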
The server handles an incoming request with the following code:
if (k.isReadable()) {
    // Read into incomingBuffer, which is either lenBuffer or a
    // ByteBuffer allocated for the length of the request content.
    int rc = sock.read(incomingBuffer);
    if (rc < 0) {
        throw new EndOfStreamException(
                "Unable to read additional data from client sessionid 0x"
                + Long.toHexString(sessionId)
                + ", likely client has closed socket");
    }
    // The current buffer is full
    if (incomingBuffer.remaining() == 0) {
        boolean isPayload;
        // Key point: if incomingBuffer and lenBuffer refer to the same ByteBuffer,
        // this is the first read of a new request and the bytes read are its length.
        if (incomingBuffer == lenBuffer) { // start of next request
            incomingBuffer.flip();
            // Get the length of the request content
            isPayload = readLength(k);
            incomingBuffer.clear();
        } else {
            // continuation: TCP split the packet, so an earlier read left the
            // payload buffer partly filled and this read has completed it.
            isPayload = true;
        }
        // Go on to read the request content
        if (isPayload) { // not the case for 4letterword
            readPayload();
        }
        else {
            // four letter words take care
            // need not do anything else
            return;
        }
    }
}
The member variables lenBuffer and incomingBuffer are defined as:
ByteBuffer lenBuffer = ByteBuffer.allocate(4);
ByteBuffer incomingBuffer = lenBuffer;
readLength() reads the length of the request content:
/** Reads the first 4 bytes of lenBuffer, which could be true length or
 *  four letter word.
 *
 * @param k selection key
 * @return true if length read, otw false (wasn't really the length)
 * @throws IOException if buffer size exceeds maxBuffer size
 */
private boolean readLength(SelectionKey k) throws IOException {
    // Read the length, now get the buffer
    int len = lenBuffer.getInt();
    if (!initialized && checkFourLetterWord(sk, len)) {
        return false;
    }
    if (len < 0 || len > BinaryInputArchive.maxBuffer) {
        throw new IOException("Len error " + len);
    }
    if (zkServer == null) {
        throw new IOException("ZooKeeperServer not running");
    }
    // Allocate a ByteBuffer sized to the length of the request content
    incomingBuffer = ByteBuffer.allocate(len);
    return true;
}
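The Javadoc above says the first 4 bytes may be a four letter word rather than a true length. The sketch below only illustrates why the two cases can share the same 4-byte read: four ASCII characters reinterpreted as a big-endian int produce a very large number. FourBytesAsInt, asInt, plausibleLength and the MAX_BUFFER value are assumptions made for this illustration; they are not ZooKeeper's checkFourLetterWord or its configured BinaryInputArchive.maxBuffer.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class FourBytesAsInt {
    /** Assumed limit for illustration only (about 1 MB). */
    static final int MAX_BUFFER = 0xfffff;

    /** Interprets the first four ASCII bytes of s as a big-endian int. */
    static int asInt(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.US_ASCII)).getInt();
    }

    /** Same range check as in readLength(), against the assumed limit. */
    static boolean plausibleLength(int len) {
        return len >= 0 && len <= MAX_BUFFER;
    }

    public static void main(String[] args) {
        int ruok = asInt("ruok"); // "ruok" is one of ZooKeeper's four letter words
        System.out.println("ruok as int = 0x" + Integer.toHexString(ruok));
        System.out.println("plausible length? " + plausibleLength(ruok));
    }
}
```

Under these assumptions, a command word that was not recognized first would fail the range check instead of triggering an allocation of that many bytes.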
readPayload() reads the request content:
/** Read the request payload (everything following the length prefix) */
private void readPayload() throws IOException, InterruptedException {
    // If the buffer is not full yet, keep reading. The payload may arrive
    // across several reads when TCP splits the packet.
    if (incomingBuffer.remaining() != 0) { // have we read length bytes?
        int rc = sock.read(incomingBuffer); // sock is non-blocking, so ok
        if (rc < 0) {
            throw new EndOfStreamException(
                    "Unable to read additional data from client sessionid 0x"
                    + Long.toHexString(sessionId)
                    + ", likely client has closed socket");
        }
    }
    // No space left: the request content has been read completely.
    if (incomingBuffer.remaining() == 0) { // have we read length bytes?
        // A full request packet has been received
        packetReceived();
        incomingBuffer.flip();
        if (!initialized) {
            readConnectRequest();
        } else {
            readRequest();
        }
        // Clear the length buffer
        lenBuffer.clear();
        // Back to the "next request not yet read" state
        incomingBuffer = lenBuffer;
    }
}
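ZooKeeper uses a fixed 4-byte lenBuffer to hold the length of the request content, and incomingBuffer to hold the content itself. If incomingBuffer == lenBuffer, that is, both references point to the preallocated 4-byte ByteBuffer instance, the connection is reading the length of a new request; otherwise it is reading the request content. The design is simple and practical, so I am summarizing and recording it here.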
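The two-buffer approach described above can be tried out without a socket. The following is a minimal sketch under several assumptions: TwoBufferDecoder, feed, frame and MAX_LEN are hypothetical names, chunks are passed in as byte arrays instead of being read from a non-blocking SocketChannel, and payloads are decoded as UTF-8 strings instead of jute records. Unlike the server code, it loops until every byte of a chunk has been consumed.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class TwoBufferDecoder {
    static final int MAX_LEN = 0xfffff; // assumed upper bound for this sketch

    private final ByteBuffer lenBuffer = ByteBuffer.allocate(4);
    private ByteBuffer incomingBuffer = lenBuffer; // same instance: expecting a length
    final List<String> messages = new ArrayList<>();

    /** Feeds one chunk, as it might arrive from a single socket read. */
    void feed(byte[] chunk) {
        ByteBuffer src = ByteBuffer.wrap(chunk);
        while (true) {
            // Copy as many bytes as the current target buffer can take.
            while (src.hasRemaining() && incomingBuffer.hasRemaining()) {
                incomingBuffer.put(src.get());
            }
            if (incomingBuffer.hasRemaining()) {
                return; // length or payload incomplete: wait for the next chunk
            }
            if (incomingBuffer == lenBuffer) {
                // The 4-byte prefix is complete: switch to a payload-sized buffer.
                lenBuffer.flip();
                int len = lenBuffer.getInt();
                if (len < 0 || len > MAX_LEN) {
                    throw new IllegalStateException("Len error " + len);
                }
                incomingBuffer = ByteBuffer.allocate(len);
            } else {
                // The payload is complete: deliver it and expect a new length.
                incomingBuffer.flip();
                messages.add(StandardCharsets.UTF_8.decode(incomingBuffer).toString());
                lenBuffer.clear();
                incomingBuffer = lenBuffer;
            }
        }
    }

    /** Helper that builds [int32 length][UTF-8 body] for the demo. */
    static byte[] frame(String s) {
        byte[] body = s.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + body.length).putInt(body.length).put(body).array();
    }

    public static void main(String[] args) {
        TwoBufferDecoder decoder = new TwoBufferDecoder();
        byte[] a = frame("hello");
        byte[] b = frame("zk");
        // First chunk ends in the middle of "hello"; second carries the rest plus "zk".
        decoder.feed(java.util.Arrays.copyOfRange(a, 0, 6));
        byte[] rest = new byte[a.length - 6 + b.length];
        System.arraycopy(a, 6, rest, 0, a.length - 6);
        System.arraycopy(b, 0, rest, a.length - 6, b.length);
        decoder.feed(rest);
        System.out.println(decoder.messages);
    }
}
```

The reference comparison incomingBuffer == lenBuffer is the entire state machine: no separate flag is needed to record whether the connection is waiting for a length or for a payload.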
References:
1. 《Netty权威指南 第二版》 (Netty: The Definitive Guide, 2nd Edition)
2. ZooKeeper 3.4.6 source code