Preface
During ZooKeeper server startup, the ServerCnxnFactory start method is called; this is where the communication channel between client and server is opened. Let's analyze the Netty-based implementation (NettyServerCnxnFactory):
public void start() {
LOG.info("binding to port " + localAddress);
parentChannel = bootstrap.bind(localAddress);
}
Once the server has executed the bind operation, it is listening on the port and can accept client requests. This localAddress is assigned in QuorumPeerMain's runFromConfig method, i.e. it comes from the clientPort configuration item:
cnxnFactory.configure(config.getClientPortAddress(), config.getMaxClientCnxns());
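The bind-and-listen step can be sketched with plain JDK NIO, as a simplified stand-in for Netty's ServerBootstrap. The port-0 ephemeral bind here is purely for illustration; ZooKeeper binds the configured clientPort address.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;

// Minimal sketch of the bind step using plain JDK NIO. The real code
// delegates this to Netty's ServerBootstrap.bind(localAddress).
class BindSketch {
    // Binds an ephemeral port (port 0) so the sketch can run anywhere;
    // ZooKeeper would bind config.getClientPortAddress() instead.
    static int bindEphemeral() {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.socket().setReuseAddress(true); // mirrors the "reuseAddress" option
            server.bind(new InetSocketAddress(0));
            return server.socket().getLocalPort();
        } catch (IOException e) {
            return -1;
        }
    }
}
```

After bind returns, the OS accepts incoming TCP connections on that port; in the Netty version, each accepted channel is then handed to the pipeline's handler.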
Now let's look at the NettyServerCnxnFactory class itself:
NettyServerCnxnFactory() {
bootstrap = new ServerBootstrap(
new NioServerSocketChannelFactory(
Executors.newCachedThreadPool(),
Executors.newCachedThreadPool()));
// parent channel
bootstrap.setOption("reuseAddress", true);
// child channels
bootstrap.setOption("child.tcpNoDelay", true);
/* set socket linger to off, so that socket close does not block */
bootstrap.setOption("child.soLinger", -1);
bootstrap.getPipeline().addLast("servercnxnfactory", channelHandler);
}
The constructor adds a handler whose implementation class is CnxnChannelHandler.
Next, let's see how this class handles requests.
I. CnxnChannelHandler
It defines handlers for Channel events.
- Handling client connections
public void channelConnected(ChannelHandlerContext ctx,
ChannelStateEvent e) throws Exception
{
if (LOG.isTraceEnabled()) {
LOG.trace("Channel connected " + e);
}
allChannels.add(ctx.getChannel());
NettyServerCnxn cnxn = new NettyServerCnxn(ctx.getChannel(),
zkServer, NettyServerCnxnFactory.this);
ctx.setAttachment(cnxn);
addCnxn(cnxn);
}
When a client connects, a NettyServerCnxn object is created; as you can see, this object holds the connection channel (Channel) between the client and the server.
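The connect-time pattern here, creating per-connection state once and looking it up on every later message, can be sketched in plain Java. The integer channel id and the String state are hypothetical stand-ins for Netty's Channel and for NettyServerCnxn:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of per-connection state management: create state on connect,
// retrieve it on each message. Stands in for ctx.setAttachment/getAttachment
// plus the factory's addCnxn bookkeeping.
class ConnectionRegistry {
    private final Map<Integer, String> cnxns = new ConcurrentHashMap<>();

    // Called on "channelConnected": create the connection object and store it.
    String onConnect(int channelId) {
        String cnxn = "cnxn-" + channelId; // stands in for new NettyServerCnxn(...)
        cnxns.put(channelId, cnxn);
        return cnxn;
    }

    // Called on "messageReceived": look up the state attached at connect time.
    String onMessage(int channelId) {
        return cnxns.get(channelId);
    }
}
```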
- Handling client messages
public void messageReceived(ChannelHandlerContext ctx, MessageEvent e)
throws Exception
{
if (LOG.isTraceEnabled()) {
LOG.trace("message received called " + e.getMessage());
}
try {
if (LOG.isDebugEnabled()) {
LOG.debug("New message " + e.toString()
+ " from " + ctx.getChannel());
}
/**
 * The NettyServerCnxn object was already created during connection
 * setup and stored in the ctx attachment
 */
NettyServerCnxn cnxn = (NettyServerCnxn)ctx.getAttachment();
synchronized(cnxn) {
processMessage(e, cnxn);
}
} catch(Exception ex) {
LOG.error("Unexpected exception in receive", ex);
throw ex;
}
}
private void processMessage(MessageEvent e, NettyServerCnxn cnxn) {
if (LOG.isDebugEnabled()) {
LOG.debug(Long.toHexString(cnxn.sessionId) + " queuedBuffer: "
+ cnxn.queuedBuffer);
}
if (e instanceof NettyServerCnxn.ResumeMessageEvent) {
LOG.debug("Received ResumeMessageEvent");
if (cnxn.queuedBuffer != null) {
if (LOG.isTraceEnabled()) {
LOG.trace("processing queue "
+ Long.toHexString(cnxn.sessionId)
+ " queuedBuffer 0x"
+ ChannelBuffers.hexDump(cnxn.queuedBuffer));
}
cnxn.receiveMessage(cnxn.queuedBuffer);
if (!cnxn.queuedBuffer.readable()) {
LOG.debug("Processed queue - no bytes remaining");
cnxn.queuedBuffer = null;
} else {
LOG.debug("Processed queue - bytes remaining");
}
} else {
LOG.debug("queue empty");
}
cnxn.channel.setReadable(true);
} else {
ChannelBuffer buf = (ChannelBuffer)e.getMessage();
if (LOG.isTraceEnabled()) {
LOG.trace(Long.toHexString(cnxn.sessionId)
+ " buf 0x"
+ ChannelBuffers.hexDump(buf));
}
if (cnxn.throttled) {
LOG.debug("Received message while throttled");
// we are throttled, so we need to queue
if (cnxn.queuedBuffer == null) {
LOG.debug("allocating queue");
cnxn.queuedBuffer = dynamicBuffer(buf.readableBytes());
}
cnxn.queuedBuffer.writeBytes(buf);
if (LOG.isTraceEnabled()) {
LOG.trace(Long.toHexString(cnxn.sessionId)
+ " queuedBuffer 0x"
+ ChannelBuffers.hexDump(cnxn.queuedBuffer));
}
} else {
LOG.debug("not throttled");
if (cnxn.queuedBuffer != null) {
if (LOG.isTraceEnabled()) {
LOG.trace(Long.toHexString(cnxn.sessionId)
+ " queuedBuffer 0x"
+ ChannelBuffers.hexDump(cnxn.queuedBuffer));
}
cnxn.queuedBuffer.writeBytes(buf);
if (LOG.isTraceEnabled()) {
LOG.trace(Long.toHexString(cnxn.sessionId)
+ " queuedBuffer 0x"
+ ChannelBuffers.hexDump(cnxn.queuedBuffer));
}
cnxn.receiveMessage(cnxn.queuedBuffer);
if (!cnxn.queuedBuffer.readable()) {
LOG.debug("Processed queue - no bytes remaining");
cnxn.queuedBuffer = null;
} else {
LOG.debug("Processed queue - bytes remaining");
}
} else {
cnxn.receiveMessage(buf);
if (buf.readable()) {
if (LOG.isTraceEnabled()) {
LOG.trace("Before copy " + buf);
}
cnxn.queuedBuffer = dynamicBuffer(buf.readableBytes());
cnxn.queuedBuffer.writeBytes(buf);
if (LOG.isTraceEnabled()) {
LOG.trace("Copy is " + cnxn.queuedBuffer);
LOG.trace(Long.toHexString(cnxn.sessionId)
+ " queuedBuffer 0x"
+ ChannelBuffers.hexDump(cnxn.queuedBuffer));
}
}
}
}
}
}
The logic that ultimately handles client requests is implemented in NettyServerCnxn's receiveMessage method.
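The throttle-and-queue behaviour of processMessage above can be sketched with byte arrays in place of ChannelBuffers (a simplified model, not the real API): while the connection is throttled, incoming bytes accumulate in queuedBuffer, and on resume the queue is drained in order.

```java
import java.io.ByteArrayOutputStream;

// Sketch of processMessage's throttling logic: bytes arriving while the
// connection is throttled are queued, and drained in order on resume.
class ThrottleQueue {
    private boolean throttled = false;
    private ByteArrayOutputStream queued;  // stands in for cnxn.queuedBuffer
    private final ByteArrayOutputStream processed = new ByteArrayOutputStream();

    void setThrottled(boolean t) { throttled = t; }

    // Mirrors messageReceived: queue while throttled, otherwise process now.
    void onMessage(byte[] buf) {
        if (throttled) {
            if (queued == null) {
                queued = new ByteArrayOutputStream(); // lazy allocation, as in the original
            }
            queued.write(buf, 0, buf.length);
        } else {
            processed.write(buf, 0, buf.length); // stands in for cnxn.receiveMessage(buf)
        }
    }

    // Mirrors the ResumeMessageEvent branch: drain the queue, re-enable reads.
    void resume() {
        throttled = false;
        if (queued != null) {
            byte[] pending = queued.toByteArray();
            processed.write(pending, 0, pending.length);
            queued = null;
        }
    }

    byte[] processedBytes() { return processed.toByteArray(); }
}
```

The sketch omits the real code's extra wrinkle that new bytes arriving after un-throttling are appended to a still-nonempty queue first, which is what preserves byte order.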
II. NettyServerCnxn
- receiveMessage
public void receiveMessage(ChannelBuffer message) {
try {
while(message.readable() && !throttled) {
if (bb != null) {
if (LOG.isTraceEnabled()) {
LOG.trace("message readable " + message.readableBytes()
+ " bb len " + bb.remaining() + " " + bb);
ByteBuffer dat = bb.duplicate();
dat.flip();
LOG.trace(Long.toHexString(sessionId)
+ " bb 0x"
+ ChannelBuffers.hexDump(
ChannelBuffers.copiedBuffer(dat)));
}
if (bb.remaining() > message.readableBytes()) {
int newLimit = bb.position() + message.readableBytes();
bb.limit(newLimit);
}
message.readBytes(bb);
bb.limit(bb.capacity());
if (LOG.isTraceEnabled()) {
LOG.trace("after readBytes message readable "
+ message.readableBytes()
+ " bb len " + bb.remaining() + " " + bb);
ByteBuffer dat = bb.duplicate();
dat.flip();
LOG.trace("after readbytes "
+ Long.toHexString(sessionId)
+ " bb 0x"
+ ChannelBuffers.hexDump(
ChannelBuffers.copiedBuffer(dat)));
}
// remaining() returns the number of elements left between the current position and the limit
if (bb.remaining() == 0) {
packetReceived();
bb.flip();
/**
 * Depending on its role after the election, zkServer can be a
 * LeaderZooKeeperServer, FollowerZooKeeperServer, or ObserverZooKeeperServer
 */
ZooKeeperServer zks = this.zkServer;
/**
 * The ZK server has not started yet
 */
if (zks == null || !zks.isRunning()) {
throw new IOException("ZK down");
}
if (initialized) {
zks.processPacket(this, bb);
if (zks.shouldThrottle(outstandingCount.incrementAndGet())) {
disableRecvNoWait();
}
} else {
LOG.debug("got conn req request from "
+ getRemoteSocketAddress());
zks.processConnectRequest(this, bb);
initialized = true;
}
bb = null;
}
} else {
if (LOG.isTraceEnabled()) {
LOG.trace("message readable "
+ message.readableBytes()
+ " bblenrem " + bbLen.remaining());
ByteBuffer dat = bbLen.duplicate();
dat.flip();
LOG.trace(Long.toHexString(sessionId)
+ " bbLen 0x"
+ ChannelBuffers.hexDump(
ChannelBuffers.copiedBuffer(dat)));
}
if (message.readableBytes() < bbLen.remaining()) {
bbLen.limit(bbLen.position() + message.readableBytes());
}
message.readBytes(bbLen);
bbLen.limit(bbLen.capacity());
if (bbLen.remaining() == 0) {
bbLen.flip();
if (LOG.isTraceEnabled()) {
LOG.trace(Long.toHexString(sessionId)
+ " bbLen 0x"
+ ChannelBuffers.hexDump(
ChannelBuffers.copiedBuffer(bbLen)));
}
int len = bbLen.getInt();
if (LOG.isTraceEnabled()) {
LOG.trace(Long.toHexString(sessionId)
+ " bbLen len is " + len);
}
bbLen.clear();
if (!initialized) {
if (checkFourLetterWord(channel, message, len)) {
return;
}
}
if (len < 0 || len > BinaryInputArchive.maxBuffer) {
throw new IOException("Len error " + len);
}
bb = ByteBuffer.allocate(len);
}
}
}
} catch(IOException e) {
LOG.warn("Closing connection to " + getRemoteSocketAddress(), e);
close();
}
}
After initialization, this calls the processPacket method of the ZooKeeperServer held inside NettyServerCnxn (a reference that is initially unset) to process the packet. The ZooKeeperServer reference is set once leader election has finished.
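The framing that receiveMessage implements, a 4-byte big-endian length prefix (bbLen) followed by a body of exactly that many bytes (bb), possibly split across network reads, can be sketched with plain java.nio. This is a simplified model; the real code also handles four-letter words and hands completed frames to processPacket or processConnectRequest.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Sketch of receiveMessage's framing: a 4-byte length prefix, then a body
// of exactly that many bytes, with chunks arriving in arbitrary sizes.
class FrameDecoder {
    private final ByteBuffer bbLen = ByteBuffer.allocate(4); // length prefix, like bbLen
    private ByteBuffer bb;                                   // current body, like bb
    private final List<byte[]> frames = new ArrayList<>();

    // Feed one chunk of bytes, as the network would deliver it.
    void feed(ByteBuffer chunk) {
        while (chunk.hasRemaining()) {
            if (bb != null) {
                // Fill the body buffer with at most bb.remaining() bytes.
                while (bb.hasRemaining() && chunk.hasRemaining()) {
                    bb.put(chunk.get());
                }
                if (!bb.hasRemaining()) {  // body complete: one full frame
                    bb.flip();
                    byte[] frame = new byte[bb.remaining()];
                    bb.get(frame);
                    frames.add(frame);     // stands in for zks.processPacket(this, bb)
                    bb = null;             // next bytes start a new length prefix
                }
            } else {
                // Fill the 4-byte length prefix first.
                while (bbLen.hasRemaining() && chunk.hasRemaining()) {
                    bbLen.put(chunk.get());
                }
                if (!bbLen.hasRemaining()) {
                    bbLen.flip();
                    int len = bbLen.getInt(); // big-endian, like the real protocol
                    bbLen.clear();
                    bb = ByteBuffer.allocate(len);
                }
            }
        }
    }

    List<byte[]> frames() { return frames; }
}
```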
III. ZooKeeperServer processPacket
public void processPacket(ServerCnxn cnxn, ByteBuffer incomingBuffer) throws IOException {
// We have the request, now process and setup for next
InputStream bais = new ByteBufferInputStream(incomingBuffer);
BinaryInputArchive bia = BinaryInputArchive.getArchive(bais);
/**
* int xid;
* int type;
*/
RequestHeader h = new RequestHeader();
h.deserialize(bia, "header");
// Through the magic of byte buffers, txn will not be
// pointing
// to the start of the txn
incomingBuffer = incomingBuffer.slice();
if (h.getType() == OpCode.auth) {
LOG.info("got auth packet " + cnxn.getRemoteSocketAddress());
/**
* int type;
* ustring scheme;
* buffer auth;
*/
AuthPacket authPacket = new AuthPacket();
ByteBufferInputStream.byteBuffer2Record(incomingBuffer, authPacket);
String scheme = authPacket.getScheme();
AuthenticationProvider ap = ProviderRegistry.getProvider(scheme);
Code authReturn = KeeperException.Code.AUTHFAILED;
if(ap != null) {
try {
authReturn = ap.handleAuthentication(cnxn, authPacket.getAuth());
} catch(RuntimeException e) {
LOG.warn("Caught runtime exception from AuthenticationProvider: " + scheme + " due to " + e);
authReturn = KeeperException.Code.AUTHFAILED;
}
}
if (authReturn!= KeeperException.Code.OK) {
if (ap == null) {
LOG.warn("No authentication provider for scheme: "
+ scheme + " has "
+ ProviderRegistry.listProviders());
} else {
LOG.warn("Authentication failed for scheme: " + scheme);
}
// send a response...
ReplyHeader rh = new ReplyHeader(h.getXid(), 0,
KeeperException.Code.AUTHFAILED.intValue());
cnxn.sendResponse(rh, null, null);
// ... and close connection
cnxn.sendBuffer(ServerCnxnFactory.closeConn);
cnxn.disableRecv();
} else {
if (LOG.isDebugEnabled()) {
LOG.debug("Authentication succeeded for scheme: "
+ scheme);
}
LOG.info("auth success " + cnxn.getRemoteSocketAddress());
ReplyHeader rh = new ReplyHeader(h.getXid(), 0,
KeeperException.Code.OK.intValue());
cnxn.sendResponse(rh, null, null);
}
return;
} else {
if (h.getType() == OpCode.sasl) {
Record rsp = processSasl(incomingBuffer,cnxn);
ReplyHeader rh = new ReplyHeader(h.getXid(), 0, KeeperException.Code.OK.intValue());
cnxn.sendResponse(rh,rsp, "response"); // not sure about 3rd arg..what is it?
return;
}
else {
/**
 * Create the request object
 */
Request si = new Request(cnxn, cnxn.getSessionId(), h.getXid(),
h.getType(), incomingBuffer, cnxn.getAuthInfo());
si.setOwner(ServerCnxn.me);
/**
 * Submit for processing: the RequestProcessor chain handles the request.
 * OpCode is the operation code.
 */
submitRequest(si);
}
}
cnxn.incrOutstandingRequests(h);
}
The logic that handles our regular commands is in the final else branch above, where the packet is wrapped in a Request and handed to submitRequest:
// ZooKeeperServer
public void submitRequest(Request si) {
if (firstProcessor == null) {
synchronized (this) {
try {
// Since all requests are passed to the request
// processor it should wait for setting up the request
// processor chain. The state will be updated to RUNNING
// after the setup.
while (state == State.INITIAL) {
wait(1000);
}
} catch (InterruptedException e) {
LOG.warn("Unexpected interruption", e);
}
if (firstProcessor == null || state != State.RUNNING) {
throw new RuntimeException("Not started");
}
}
}
try {
touch(si.cnxn);
boolean validpacket = Request.isValid(si.type);
if (validpacket) {
/**
 * firstProcessor is a thread object that has already been started
 */
firstProcessor.processRequest(si);
if (si.cnxn != null) {
incInProcess();
}
} else {
LOG.warn("Received packet at server of unknown type " + si.type);
new UnimplementedRequestProcessor().processRequest(si);
}
} catch (MissingSessionException e) {
if (LOG.isDebugEnabled()) {
LOG.debug("Dropping request: " + e.getMessage());
}
} catch (RequestProcessorException e) {
LOG.error("Unable to process request:" + e.getMessage(), e);
}
}
This invokes the processRequest method of the ZooKeeperServer's RequestProcessor to handle the submitted request.
ZK implements several different kinds of RequestProcessor. Each concrete implementation holds a reference to the next RequestProcessor, so they form a linked chain; whatever the server type, the last step of the processing chain is always FinalRequestProcessor.
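The linked-chain structure described above is a classic chain of responsibility. A minimal sketch, using a plain String in place of the real Request type and hypothetical processor names:

```java
// Sketch of the RequestProcessor chain of responsibility: each processor
// does its own work, then forwards to the next; the last one has no next.
class ProcessorChain {
    interface RequestProcessor {
        void processRequest(String request);
    }

    // A pass-through stage, standing in for e.g. PrepRequestProcessor.
    static class LoggingProcessor implements RequestProcessor {
        private final RequestProcessor next;
        private final StringBuilder trace;
        LoggingProcessor(RequestProcessor next, StringBuilder trace) {
            this.next = next;
            this.trace = trace;
        }
        public void processRequest(String request) {
            trace.append("log;");          // this stage's own work
            next.processRequest(request);  // forward down the chain
        }
    }

    // The terminal stage, standing in for FinalRequestProcessor.
    static class FinalProcessor implements RequestProcessor {
        private final StringBuilder trace;
        FinalProcessor(StringBuilder trace) { this.trace = trace; }
        public void processRequest(String request) {
            trace.append("final:").append(request); // build the response
        }
    }

    // The chain is wired back to front; callers only ever see firstProcessor.
    static String run(String request) {
        StringBuilder trace = new StringBuilder();
        RequestProcessor first = new LoggingProcessor(new FinalProcessor(trace), trace);
        first.processRequest(request);
        return trace.toString();
    }
}
```

ZooKeeper's setupRequestProcessors methods build their chains the same way: the last processor is constructed first and passed into the constructor of the one before it.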
IV. How each cluster node creates its RequestProcessor chain
We know that after the election, a ZK node is one of Follower, Leader, or Observer, and every node holds a ZooKeeperServer object:
Node type | ZooKeeperServer type |
---|---|
Follower | FollowerZooKeeperServer |
Leader | LeaderZooKeeperServer |
Observer | ObserverZooKeeperServer |
protected Follower makeFollower(FileTxnSnapLog logFactory) throws IOException {
return new Follower(this, new FollowerZooKeeperServer(logFactory,
this,new ZooKeeperServer.BasicDataTreeBuilder(), this.zkDb));
}
protected Leader makeLeader(FileTxnSnapLog logFactory) throws IOException {
return new Leader(this, new LeaderZooKeeperServer(logFactory,
this,new ZooKeeperServer.BasicDataTreeBuilder(), this.zkDb));
}
protected Observer makeObserver(FileTxnSnapLog logFactory) throws IOException {
return new Observer(this, new ObserverZooKeeperServer(logFactory,
this, new ZooKeeperServer.BasicDataTreeBuilder(), this.zkDb));
}
Each ZooKeeperServer object holds a RequestProcessor chain internally. When is it set up? Let's take a closer look.
1. Leader
- After the election finishes, leader.lead() is called
- lead in turn calls startZkServer
- which calls ZooKeeperServer's setupRequestProcessors method (overridden by the LeaderZooKeeperServer subclass)
setupRequestProcessors starts PrepRequestProcessor and CommitProcessor, and sets firstProcessor to PrepRequestProcessor.
Leader processor chain:
PrepRequestProcessor -> ProposalRequestProcessor -> CommitProcessor -> ToBeAppliedRequestProcessor -> FinalRequestProcessor
2. Observer
- After the election finishes, observer.observeLeader() is called
- During synchronization, syncWithLeader calls zk.startup()
- which calls ZooKeeperServer's setupRequestProcessors method (overridden by the ObserverZooKeeperServer subclass)
setupRequestProcessors starts ObserverRequestProcessor and CommitProcessor, and sets firstProcessor to ObserverRequestProcessor.
Observer processor chain:
ObserverRequestProcessor -> CommitProcessor -> FinalRequestProcessor
3. Follower
- After the election finishes, follower.followLeader() is called
- During synchronization, syncWithLeader calls zk.startup()
- which calls ZooKeeperServer's setupRequestProcessors method (overridden by the FollowerZooKeeperServer subclass)
setupRequestProcessors starts SyncRequestProcessor, FollowerRequestProcessor, and CommitProcessor, and sets firstProcessor to FollowerRequestProcessor.
Follower processor chain:
FollowerRequestProcessor -> CommitProcessor -> FinalRequestProcessor
A SyncRequestProcessor is also created alongside this chain.
4. What each Processor does
Processor | Role |
---|---|
PrepRequestProcessor | The Leader's request pre-processor and its first processor. Performs pre-processing such as session checks and ACL checks, and determines whether a request is transactional |
ProposalRequestProcessor | The Leader's proposal processor. Non-transactional requests are forwarded straight to CommitProcessor; for transactional requests it also creates a Proposal for all Follower servers to vote on |
SyncRequestProcessor | Transaction-log processor: writes requests to the transaction log and takes snapshots |
CommitProcessor | Commit processor: non-transactional requests are passed straight to the next processor |
ToBeAppliedRequestProcessor | The processor after the commit step on the Leader; stores Proposals that CommitProcessor has processed and that are ready to be applied |
FinalRequestProcessor | The last processor on every server type; builds the client response |
ObserverRequestProcessor | The Observer's first processor; transactional requests (requests that may change server data) are forwarded to the Leader via its request method |
FollowerRequestProcessor | The Follower's first processor; transactional requests (requests that may change server data) are forwarded to the Leader via its request method |
SendAckRequestProcessor | The processor after SyncRequestProcessor on a Follower; sends acknowledgements for Proposals sent by the Leader |