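Integrating ZooKeeper service registration and discovery into the RPC framework
Previous post: integrating with the Spring framework
This post uses ZooKeeper 3.5.8.
ZooKeeper client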
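The client connects to ZooKeeper (zk) and watches a node for changes. When the node changes, the local service cache must be refreshed; a refresh re-reads all services from zk and caches them in a map.
The listener calls the refresh method whenever a node changes. However, Spring already performs a refresh when zk is first loaded, so the events fired while the listeners are first being registered should not trigger another one. To achieve this, the map starts out as null and is only initialized after the listeners have been registered. While the listeners are being registered the map is still null, so no refresh can happen.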
public class ZookeeperClient implements Watcher, ServletContextListener {
private ZooKeeper zk;
private CuratorFramework zkClient;
private static final int SESSION_TIMEOUT = 15000;
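// Latch: the zk connection is established asynchronously, so block until it succeeds before doing anything else
// Note: a CountDownLatch cannot be reset, so after the first connection await() returns immediately on a reconnect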
private static final CountDownLatch countDownLatch = new CountDownLatch(1);
private NettyChannelManager manager;
private static final String REGISTRY_PATH = "/registry";
private String zkAddress;
// Initially null; assigned once zookeeper has finished loading
private ExpiringMap<String, String> expiringMap = null;
public ZookeeperClient() {
}
public ZookeeperClient(String zkAddress) {
connect(zkAddress);
}
/**
* @Description: The session must be re-established after it expires, so connect was extracted from the constructor
*/
private void connect(String zkAddress) {
try {
expiringMap = null;
// This class itself is the watcher
zk = new ZooKeeper(zkAddress, SESSION_TIMEOUT, this);
this.zkAddress = zkAddress;
countDownLatch.await();
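// While the listeners are first being registered the map is null, so no refresh is triggered
// Initialize a map whose entries expire
// Each server ip is a key; the value is not used
// The 1-second delay lets all services of that server finish registering, which prevents repeated refreshes
// So one change in a server's state causes only one refresh
// After one second the entry expires, and the expiration listener reacts by calling refresh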
expiringMap = ExpiringMap.builder()
.maxSize(100)
.expiration(1L, TimeUnit.SECONDS)
.expirationPolicy(ExpirationPolicy.ACCESSED)
.variableExpiration()
.expirationListener((key, value) -> {
log.info("all services under ip:{} are connected", key);
manager.refresh();
})
.build();
// Refresh after a reconnect
if(null != manager) {
manager.refresh();
}
} catch (Exception ex) {
log.error("failed to connect to zookeeper", ex);
}
}
/**
* @Description: ExpiringMap has no public constructor, so it cannot be subclassed; wrap put with a default value instead
*/
private void addToMap(String key) {
String DEFAULT_VALUE = "DEFAULT_VALUE";
if(null != expiringMap) {
expiringMap.put(key, DEFAULT_VALUE);
}
}
public void setManager(NettyChannelManager manager) {
this.manager = manager;
}
/**
* Get all child nodes under the registry path
*
* @return the names of the child nodes
* @throws KeeperException
* @throws InterruptedException
*/
public List<String> getAllService() throws KeeperException, InterruptedException {
List<String> children = zk.getChildren(REGISTRY_PATH, false);
return children;
}
public List<String> getServiceByName(String serviceName) throws KeeperException, InterruptedException {
String path = REGISTRY_PATH + "/" + serviceName;
List<String> children = zk.getChildren(path, false);
return children;
}
public String getData(String nodeName) throws KeeperException, InterruptedException {
String path = REGISTRY_PATH + "/" + nodeName;
byte[] data = zk.getData(path, false, null);
if (data == null) {
return "";
}
return new String(data);
}
private void refreshManager(String ip) {
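// A null map means the listeners are being registered for the first time
if (null != expiringMap) {
// Putting the same key again resets its expiration; refresh runs once the entry expires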
addToMap(ip);
}
}
private void addRootListener(String root, CuratorFramework zkClient) throws Exception {
PathChildrenCache cache = new PathChildrenCache(zkClient, root, true);
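// Start listening at initialization
cache.start(PathChildrenCache.StartMode.POST_INITIALIZED_EVENT);
// We want the listener's code to finish before our business logic runs, so the work is submitted to a thread pool whose state we can monitor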
cache.getListenable().addListener((client, event) ->
NettyThreadPool.getInstance().submit(() -> {
switch (event.getType()) {
case CHILD_ADDED:
String serverPath = event.getData().getPath();
log.info("node added: " + serverPath);
// Listen on the server node; refresh happens when its concrete services show up
try {
addServerListener(serverPath, client);
} catch (Exception e) {
e.printStackTrace();
}
break;
case CHILD_UPDATED:
log.info("node: " + event.getData().getPath() + ", data changed to: " + new String(event.getData().getData()));
break;
case CHILD_REMOVED:
String serverPath2 = event.getData().getPath();
log.info("node: " + serverPath2 + " was removed");
break;
default:
break;
}
}));
}
private void addServerListener(String path, CuratorFramework zkClient) throws Exception {
PathChildrenCache cache = new PathChildrenCache(zkClient, path, true);
// Start listening at initialization
cache.start(PathChildrenCache.StartMode.POST_INITIALIZED_EVENT);
cache.getListenable().addListener((client, event) -> NettyThreadPool.getInstance().submit(() -> {
switch (event.getType()) {
case CHILD_ADDED:
byte[] childData = event.getData().getData();
String ip = new String(childData);
log.info("child node added: " + event.getData().getPath() + ", data: " + ip);
// Call the refresh helper; the decision logic lives inside it
refreshManager(ip);
break;
case CHILD_UPDATED:
log.info("child node: " + event.getData().getPath() + ", data changed to: " + new String(event.getData().getData()));
break;
case CHILD_REMOVED:
byte[] childData2 = event.getData().getData();
String ip2 = new String(childData2);
log.info("child node: " + event.getData().getPath() + " was removed");
refreshManager(ip2);
break;
default:
break;
}
}));
}
@Override
public void process(WatchedEvent watchedEvent) {
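// Get the event state
KeeperState keeperState = watchedEvent.getState();
// Get the event type
EventType eventType = watchedEvent.getType();
// Check whether the connection has been established
if (KeeperState.SyncConnected == keeperState) {
// Connected at this point. States: SyncConnected: connected, AuthFailed: authentication failed, Expired: session expired,
// ConnectedReadOnly: connected read-only, Disconnected: disconnected
if (EventType.None == eventType) {
// Start the listener service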
try {
RetryPolicy retryPolicy = new ExponentialBackoffRetry(1000, 10);
this.zkClient = CuratorFrameworkFactory.builder()
.connectString(zkAddress)
.sessionTimeoutMs(SESSION_TIMEOUT)
.retryPolicy(retryPolicy).build();
this.zkClient.start();
addRootListener(REGISTRY_PATH, zkClient);
// Make sure the tasks have been added to the thread pool
Thread.sleep(500);
} catch (Exception e) {
e.printStackTrace();
} finally {
// Use the thread pool's active task count to decide whether all listeners have finished,
// i.e. the node scan is complete at this point
for(;;) {
int activeCount = NettyThreadPool.getInstance().getActiveCount();
if(activeCount == 0) {
break;
}
}
// Release the latch
countDownLatch.countDown();
}
log.info("zookeeper connected, address:{}", zkAddress);
}
} else if (KeeperState.Expired == keeperState) {
connect(zkAddress);
}
}
}
In short, the client fetches the nodes from zk, caches them locally, and watches them for changes.
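The ExpiringMap trick in the client is essentially a debounce: every node event resets a one-second timer, and refresh only runs once the events stop arriving. As a rough illustration using only the JDK, the same idea can be sketched with a ScheduledExecutorService. RefreshDebouncer, trigger and quietMillis are hypothetical names that do not exist in the project, and this is not how the project itself implements it.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: runs an action once after a quiet period, however often trigger() is called. */
class RefreshDebouncer {
    // Single daemon thread, so a forgotten debouncer does not keep the JVM alive
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "refresh-debouncer");
                t.setDaemon(true);
                return t;
            });
    private final Runnable action;
    private final long quietMillis;
    private ScheduledFuture<?> pending; // guarded by "this"

    RefreshDebouncer(Runnable action, long quietMillis) {
        this.action = action;
        this.quietMillis = quietMillis;
    }

    /** Called on every event; cancels the pending run and schedules a new one. */
    synchronized void trigger() {
        if (pending != null) {
            pending.cancel(false);
        }
        pending = scheduler.schedule(action, quietMillis, TimeUnit.MILLISECONDS);
    }

    void shutdown() {
        scheduler.shutdownNow();
    }
}
```

With something like this, each node event would call trigger() where the client calls addToMap(ip), and the action would be manager::refresh. A burst of registrations from one server then collapses into a single refresh.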
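ZooKeeper server-side registration
Servers register under the /registry node. Since each service may be deployed at several addresses, every service-name child node holds sequential child nodes, and the data of each of those nodes is one deployment address of the service.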
public class ServiceRegistry implements Watcher {
private ZooKeeper zk;
private static final int SESSION_TIMEOUT = 15000;
private static final CountDownLatch countDownLatch = new CountDownLatch(1);
private String zkAddress;
public ServiceRegistry() {
}
private Stat exists(String node, boolean needWatch) {
try {
return zk.exists(node, needWatch);
} catch (Exception e) {
e.printStackTrace();
return null;
}
}
public ServiceRegistry(String zkAddress) {
connect(zkAddress);
}
private void connect(String zkAddress) {
try {
zk = new ZooKeeper(zkAddress, SESSION_TIMEOUT, this);
countDownLatch.await();
this.zkAddress = zkAddress;
log.debug("zookeeper connected");
} catch (Exception ex) {
log.error("failed to connect to zookeeper", ex);
}
}
private static final String REGISTRY_PATH = "/registry";
public void register(String serviceName, String serviceAddress) {
try {
String registryPath = REGISTRY_PATH;
if (exists(registryPath, false) == null) {
zk.create(registryPath, null, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
log.debug("created node: {}", registryPath);
}
// Create the service node (persistent)
String servicePath = registryPath + "/" + serviceName;
if (exists(servicePath, false) == null) {
zk.create(servicePath, null, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
log.debug("created node: {}", servicePath);
}
// Create the address node (ephemeral, sequential)
String addressPath = servicePath + "/address-";
String addressNode = zk.create(addressPath, serviceAddress.getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
log.debug("created node: {} => {}", addressNode, serviceAddress);
} catch (Exception e) {
log.error("failed to create node", e);
}
}
@Override
public void process(WatchedEvent watchedEvent) {
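// Get the event state
Event.KeeperState keeperState = watchedEvent.getState();
// Get the event type
Event.EventType eventType = watchedEvent.getType();
// Check whether the connection has been established
if (Event.KeeperState.SyncConnected == keeperState) {
// Connected at this point. States: SyncConnected: connected, AuthFailed: authentication failed, Expired: session expired,
// ConnectedReadOnly: connected read-only, Disconnected: disconnected
if (Event.EventType.None == eventType) {
// The connection is established, so let the program continue
log.info("zk connection established!");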
countDownLatch.countDown();
}
} else if (Event.KeeperState.Expired == keeperState) {
connect(zkAddress);
}
}
}
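To make the node layout concrete: the registration above produces paths of the form /registry/<serviceName>/address-<sequence>, where ZooKeeper appends a zero-padded sequence number, and each address node stores host:port as its data. A minimal JDK-only sketch of building those paths and parsing the data back is shown here. RegistryPaths and its methods are hypothetical helpers, not part of the project, and createUnresolved is used only to keep the example free of DNS lookups.

```java
import java.net.InetSocketAddress;
import java.util.Optional;

/** Hypothetical helper illustrating the registry layout used in this post. */
class RegistryPaths {
    static final String REGISTRY_PATH = "/registry";

    /** Persistent node for one service, e.g. /registry/HelloService. */
    static String servicePath(String serviceName) {
        return REGISTRY_PATH + "/" + serviceName;
    }

    /** Prefix for the ephemeral sequential address nodes of a service. */
    static String addressPrefix(String serviceName) {
        return servicePath(serviceName) + "/address-";
    }

    /** Parses "host:port" node data; returns empty for anything malformed. */
    static Optional<InetSocketAddress> parseAddress(String data) {
        if (data == null) {
            return Optional.empty();
        }
        String[] hostAndPort = data.trim().split(":");
        if (hostAndPort.length != 2 || hostAndPort[0].isEmpty()) {
            return Optional.empty();
        }
        try {
            int port = Integer.parseInt(hostAndPort[1]);
            if (port < 0 || port > 65535) {
                return Optional.empty();
            }
            // Assumption: no DNS resolution is wanted in this sketch
            return Optional.of(InetSocketAddress.createUnresolved(hostAndPort[0], port));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }
}
```

Returning an empty Optional for malformed data mirrors what the manager does when it silently skips entries whose data does not split into exactly two parts.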
manager
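The manager is arguably the core of the whole thing. It fetches services from zk, caches the mapping from service name to address list, and caches the mapping from address to channel.
Every channel has an idle timeout: a channel that sees no traffic for more than 10 minutes is closed automatically, and using it within that window refreshes its timestamp.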
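Before the full class, the idle-timeout rule itself can be stated in a few lines of plain Java. This is only an illustration of the rule: IdleTracker is a hypothetical class, and timestamps are passed in explicitly so the behaviour is easy to check. The project itself relies on ExpiringMap with ExpirationPolicy.ACCESSED, as the class that follows shows.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: tracks last-use timestamps and reports keys that have been idle too long. */
class IdleTracker<K> {
    private final Map<K, Long> lastUsedMillis = new ConcurrentHashMap<>();
    private final long timeoutMillis;

    IdleTracker(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Records a use of the key at the given time, refreshing its timestamp. */
    void touch(K key, long nowMillis) {
        lastUsedMillis.put(key, nowMillis);
    }

    /** True if the key is unknown or was last used at least timeoutMillis ago. */
    boolean isExpired(K key, long nowMillis) {
        Long last = lastUsedMillis.get(key);
        return last == null || nowMillis - last >= timeoutMillis;
    }
}
```

In this picture, take would correspond to touch on every successful lookup, and the expiration listener would correspond to closing any channel for which isExpired returns true.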
public class NettyChannelManager {
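// A service may be deployed on several machines, so this caches service name -> all addresses of that service
private final Map<String, List<SocketAddress>> serviceAddressMap;
// Callers only know the service name; the list above is polled round-robin by service name,
// and the resulting address is used to look up the channel here
private final ExpiringMap<SocketAddress, Channel> channels;
// The round-robin state mentioned above is kept here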
private final Map<String, AtomicInteger> roundRobinMap;
private final RequestSender sender;
private final ZookeeperClient zkClient;
public NettyChannelManager(RequestSender sender, ZookeeperClient zkClient) {
this.serviceAddressMap = new ConcurrentHashMap<>();
this.roundRobinMap = new ConcurrentHashMap<>();
this.sender = sender;
this.zkClient = zkClient;
// Hand the manager to the zk client so its event listeners can call refresh
zkClient.setManager(this);
// A map with expiring entries is used here
channels = ExpiringMap.builder()
.maxSize(100)
.expiration(10, TimeUnit.MINUTES) // close the channel after ten minutes without traffic
.expirationPolicy(ExpirationPolicy.ACCESSED)
.variableExpiration()
// Listener for expiration events
.expirationListener(this::removeExpiredChannel)
.build();
NettyInterfaceInvoker.setSender(this.sender);
}
// Only closes the channel's connection; the key is not removed
private void removeExpiredChannel(SocketAddress key, Channel channel) {
if (channel != null) {
channel.close();
}
// Put it back so the key stays in the map
channels.put(key, channel);
}
/**
* @Description: Pick a channel by round-robin
*/
public synchronized Channel take(String serviceName) {
List<SocketAddress> socketAddressList = serviceAddressMap.get(serviceName);
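// Return null if there is no server connection under this service name
if (socketAddressList != null && socketAddressList.size() > 0) {
// The index is used on this service's own address list, so the modulus must be that list's size, not channels.size()
int size = socketAddressList.size();
AtomicInteger roundRobin = roundRobinMap.get(serviceName);
// floorMod keeps the index non-negative even after the counter overflows
int index = Math.floorMod(roundRobin.getAndAdd(1), size);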
SocketAddress socketAddress = socketAddressList.get(index);
Channel channel = channels.get(socketAddress);
if (channel != null && channel.isOpen()) {
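// Reset the expiration time (put might be better here: in an extreme case the key has already expired and cannot be reset),
// but reset matches the intent better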
channels.resetExpiration(socketAddress);
return channel;
} else if(channel != null && !channel.isRegistered()) {
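// The channel was closed after timing out, so the connection must be re-established here
try {
// Reconnect to the server
channel = connectServerNode(socketAddress);
} catch(IllegalArgumentException e) {
// A failure here means the server closed the connection itself, or the server's network is down
log.error("client failed to obtain a channel; retrying after removing the stale channel");
// Recurse: with the stale channel removed, the next call can return a valid channel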
return take(serviceName);
}
return channel;
}
}
return null;
}
public synchronized void refresh() {
try {
// refresh re-pulls the latest services from zookeeper, so the existing service map is cleared first
serviceAddressMap.clear();
List<String> serviceList = zkClient.getAllService();
// If zookeeper has no services at all, simply clear the channels
if (serviceList == null || serviceList.size() == 0) {
for (final Channel channel : channels.values()) {
channel.close();
}
channels.clear();
log.info("no services found on zookeeper");
return;
}
// Fetch the latest services
refreshService(serviceList);
// Check for and remove channels that are unused or stale
checkAndRemoveChannel();
log.info("service refresh succeeded");
} catch (KeeperException | InterruptedException e) {
e.printStackTrace();
}
}
/**
* @Description: Fetch the latest services from zookeeper
*/
private void refreshService(List<String> serviceList) throws KeeperException, InterruptedException {
// Iterate over the service list
for (String serviceName : serviceList) {
// Get all address nodes under the service
List<String> addressList = zkClient.getServiceByName(serviceName);
// Iterate over the address nodes to obtain the real addresses
List<SocketAddress> socketAddressList = new ArrayList<>();
serviceAddressMap.put(serviceName, socketAddressList);
roundRobinMap.put(serviceName, new AtomicInteger(0));
for (String address : addressList) {
String path = serviceName + "/" + address;
String[] hostAndPort = zkClient.getData(path).split(":");
if (hostAndPort.length == 2) {
String host = hostAndPort[0];
int port = Integer.parseInt(hostAndPort[1]);
final SocketAddress remotePeer = new InetSocketAddress(host, port);
socketAddressList.add(remotePeer);
checkAndBuildChannel(remotePeer);
}
}
}
}
/**
* @Description: Check for and remove channels
*/
private void checkAndRemoveChannel() {
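// List of addresses to remove
List<SocketAddress> removeList = new ArrayList<>();
// Full set of all connection addresses
Set<SocketAddress> allAddress = new HashSet<>();
// Collect every service's addresses; these are already up to date
for (List<SocketAddress> socketAddressSet : serviceAddressMap.values()) {
allAddress.addAll(socketAddressSet);
}
// Get the addresses of all channels
// Iterating ExpiringMap's keySet looped forever for me (a bug, as far as I can tell), so the counter below is a workaround
// I have filed an issue; whether it gets fixed depends on the maintainer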
int count = 0;
for (SocketAddress socketAddress : channels.keySet()) {
count ++;
Channel channel = channels.get(socketAddress);
// The address is non-null and no service lists it
if (socketAddress != null && !allAddress.contains(socketAddress)) {
// Close the channel
if (channel != null) {
channel.close();
}
channel = null; // help GC
// Removing entries during a foreach iteration would fail;
// an iterator could be used instead
removeList.add(socketAddress);
}
if(count == channels.size()) {
break;
}
}
for (SocketAddress socketAddress : removeList) {
channels.remove(socketAddress);
}
}
/**
* @Description: Check for and build a channel
*/
private void checkAndBuildChannel(SocketAddress serverNodeAddress) {
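// One address maps to one channel; if the channel exists, do not reconnect
Channel channel = channels.get(serverNodeAddress);
if (channel != null && channel.isOpen()) {
log.info("server node already exists, no need to reconnect. {}", serverNodeAddress);
} else {
// The channel is already closed here; remove it and build a new one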
if (channel != null) {
channel = null; // help GC
channels.remove(serverNodeAddress);
}
try {
connectServerNode(serverNodeAddress);
} catch(IllegalArgumentException e) {
log.error("channel was already closed during refresh", e);
}
}
}
/**
* @Description: Rebuild the channel during refresh and put it into the map
*/
private Channel connectServerNode(SocketAddress address) throws IllegalArgumentException {
try {
Channel channel = sender.connect(address);
// A null channel means the server closed the connection itself, so this endpoint must be removed
if(null == channel) {
remove(address);
// Throw after removing
throw new IllegalArgumentException();
}
channels.put(address, channel);
return channel;
} catch (InterruptedException e) {
e.printStackTrace();
log.info("could not connect to the server. {}", address);
}
return null;
}
public synchronized void remove(SocketAddress address) {
channels.remove(address);
for (List<SocketAddress> list : serviceAddressMap.values()) {
list.remove(address);
}
}
}
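The round-robin step inside take can be isolated into a few lines. The sketch below is a hypothetical standalone helper (RoundRobinPicker is not a class in the project). It takes the modulus over the candidate list itself and uses Math.floorMod, so the index stays valid even after the counter overflows.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper: cycles through a list of candidates in order. */
class RoundRobinPicker<T> {
    private final AtomicInteger counter = new AtomicInteger();

    /** Returns the next candidate, or null when there are none. */
    T pick(List<T> candidates) {
        if (candidates == null || candidates.isEmpty()) {
            return null;
        }
        // floorMod never returns a negative index, unlike % on a negative counter
        int index = Math.floorMod(counter.getAndIncrement(), candidates.size());
        return candidates.get(index);
    }
}
```

In the manager, one such counter per service name (the role roundRobinMap plays) would pick the address, and the channel would then be looked up by that address.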
Flow chart of the whole RPC
To be continued…
Afterword
I uploaded this project to Gitee, because GitHub returns 404 for me.
pp-netty-rpc: https://gitee.com/bigzibo/pp-netty-rpc
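After finishing this RPC practice project, I understand Netty much better and know how code gets hooked into the Spring lifecycle. This was also my first time using ZooKeeper, and I ran into plenty of pitfalls. For example, a watcher only fires once, so that approach could not be used to keep listening on nodes, because node registration does not happen on the client.
I hope we all keep improving together!!!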