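Deployed on the machine:
gateway (spring-cloud-starter-gateway)
Alibaba Sentinel (com.alibaba.csp:sentinel-core)
one business service
Symptoms:
After the server with ip 10.0.0.7 was upgraded and rebooted,
requests to the gateway on this server failed with: Unable to find instance for xxx
None of the business services on this server were registered with Sentinel
Fix:
Running hostname in the terminal showed that the hostname was VM_0_7_centos
However, the hosts file did not match what hostname returned (the hosts file spelled the hostname with hyphens):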
127.0.0.1 VM-0-7-centos VM-0-7-centos
127.0.0.1 localhost.localdomain localhost
127.0.0.1 localhost4.localdomain4 localhost4
::1 VM-0-7-centos VM-0-7-centos
::1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
So I edited the hosts file so that the hostnames match. After restarting the services, the system returned to normal:
127.0.0.1 VM_0_7_centos VM_0_7_centos
127.0.0.1 localhost.localdomain localhost
127.0.0.1 localhost4.localdomain4 localhost4
::1 VM_0_7_centos VM_0_7_centos
::1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
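The mismatch can also be checked mechanically. The following is a minimal sketch, not part of the original fix: the class name HostsFileCheck and the method listsHostname are made up for illustration, and the sample lines are copied from the two hosts listings above. It only tests whether a name appears among the hosts entries; it does not perform real name resolution.

```java
import java.util.Arrays;
import java.util.List;

class HostsFileCheck {

    /** Returns true if any non-comment hosts line lists the given hostname (case-insensitive). */
    static boolean listsHostname(List<String> hostsLines, String hostname) {
        for (String raw : hostsLines) {
            // Drop trailing comments, then surrounding whitespace.
            int hash = raw.indexOf('#');
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            if (line.isEmpty()) {
                continue;
            }
            // fields[0] is the address; the remaining fields are names for it.
            String[] fields = line.split("\\s+");
            for (int i = 1; i < fields.length; i++) {
                if (fields[i].equalsIgnoreCase(hostname)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> before = Arrays.asList(
                "127.0.0.1 VM-0-7-centos VM-0-7-centos",
                "::1 VM-0-7-centos VM-0-7-centos");
        List<String> after = Arrays.asList(
                "127.0.0.1 VM_0_7_centos VM_0_7_centos",
                "::1 VM_0_7_centos VM_0_7_centos");
        String hostname = "VM_0_7_centos"; // value reported by the hostname command

        System.out.println(listsHostname(before, hostname)); // prints false
        System.out.println(listsHostname(after, hostname));  // prints true
    }
}
```

On a real machine the lines would come from /etc/hosts and the name from the hostname command; they are hard-coded here so the example gives the same result everywhere.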
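Troubleshooting process:
The usual steps (restarting the services, rebooting the server, checking the configuration) had no effect.
So I captured the TCP traffic that Sentinel exchanges with the other, healthy services, analyzed it, and found the anomaly.
Parameters carried in a normal request: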
app=xxx
&hostname=VM_0_8_centos
&app_type=0
&port=8719
&v=1.6.3
&ip=10.0.0.8
&version=1620718245065
The server responds with success=true
Abnormal request:
app=xbcx-partner
&app_type=0
&port=8722
&v=1.6.3
&version=1620719056313
{"success":false,"code":-1,"msg":"ip.can't.be.null","data":null}
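As you can see, the request contains neither the hostname nor the ip.
A search on Baidu showed that the Sentinel client uses the HostNameUtil class to obtain the ip, so I went to read the source: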
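The comparison between the two captures can be sketched in code. This is an illustrative helper, not part of Sentinel: the class HeartbeatParamCheck, and the assumption that every parameter seen in the healthy capture is required, are made up for this example. The two request bodies are the ones shown above, joined onto one line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class HeartbeatParamCheck {

    // Parameter names seen in the healthy capture; treating all of them as required is an assumption.
    static final List<String> EXPECTED =
            Arrays.asList("app", "hostname", "app_type", "port", "v", "ip", "version");

    /** Splits a form-style body (k=v&k=v) into an ordered map; no URL decoding is attempted. */
    static Map<String, String> parse(String body) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : body.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(key, value);
        }
        return params;
    }

    /** Returns the expected parameter names that are absent or empty in the body. */
    static List<String> missing(String body) {
        Map<String, String> params = parse(body);
        List<String> result = new ArrayList<>();
        for (String name : EXPECTED) {
            String value = params.get(name);
            if (value == null || value.isEmpty()) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String healthy = "app=xxx&hostname=VM_0_8_centos&app_type=0&port=8719"
                + "&v=1.6.3&ip=10.0.0.8&version=1620718245065";
        String broken = "app=xbcx-partner&app_type=0&port=8722&v=1.6.3&version=1620719056313";

        System.out.println(missing(healthy)); // prints []
        System.out.println(missing(broken));  // prints [hostname, ip]
    }
}
```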
static {
    try {
        // Init the host information.
        resolveHost();
    } catch (Exception e) {
        RecordLog.info("Failed to get local host", e);
    }
}

private static void resolveHost() throws Exception {
    // Judging from the symptoms, and after reading the getIp method in the same class and its call chain,
    // I concluded that the next line threw an exception (UnknownHostException); for some unknown reason the error was never printed.
    InetAddress addr = InetAddress.getLocalHost();
    hostName = addr.getHostName();
    ip = addr.getHostAddress();
    if (addr.isLoopbackAddress()) {
        // find the first IPv4 Address that not loopback
        Enumeration<NetworkInterface> interfaces = NetworkInterface.getNetworkInterfaces();
        while (interfaces.hasMoreElements()) {
            NetworkInterface in = interfaces.nextElement();
            Enumeration<InetAddress> addrs = in.getInetAddresses();
            while (addrs.hasMoreElements()) {
                InetAddress address = addrs.nextElement();
                if (!address.isLoopbackAddress() && address instanceof Inet4Address) {
                    ip = address.getHostAddress();
                }
            }
        }
    }
}
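To confirm the suspicion on an affected machine without going through Sentinel, the same lookup can be run in isolation. The sketch below is a hypothetical standalone probe (the class LocalHostProbe is not part of Sentinel). It calls InetAddress.getLocalHost() and, unlike the excerpt above, returns the UnknownHostException message to the caller instead of only logging it, then falls back to the first non-loopback IPv4 address it finds. The output depends on the machine, so none is shown.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;
import java.util.Enumeration;

class LocalHostProbe {

    /** Returns the first non-loopback IPv4 address on any interface, or null if there is none. */
    static String firstNonLoopbackIpv4() throws SocketException {
        Enumeration<NetworkInterface> interfaces = NetworkInterface.getNetworkInterfaces();
        while (interfaces != null && interfaces.hasMoreElements()) {
            Enumeration<InetAddress> addrs = interfaces.nextElement().getInetAddresses();
            while (addrs.hasMoreElements()) {
                InetAddress address = addrs.nextElement();
                if (!address.isLoopbackAddress() && address instanceof Inet4Address) {
                    return address.getHostAddress();
                }
            }
        }
        return null;
    }

    /** Describes whether the local hostname resolves; never throws. */
    static String probe() {
        try {
            InetAddress addr = InetAddress.getLocalHost();
            return "resolved: hostName=" + addr.getHostName() + ", ip=" + addr.getHostAddress();
        } catch (UnknownHostException e) {
            // This is the failure mode suspected in the post: the hostname has no matching hosts/DNS entry.
            String fallback;
            try {
                fallback = firstNonLoopbackIpv4();
            } catch (SocketException se) {
                fallback = null;
            }
            return "unresolved: " + e.getMessage() + ", fallback ip=" + fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(probe());
    }
}
```

If the probe reports "unresolved" before the hosts file is fixed and "resolved" afterwards, that is consistent with the diagnosis above.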
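So I searched Baidu for information on UnknownHostException and found the solution: https://blog.csdn.net/yfusu/article/details/103660975
P.S. Luckily this was a mixed deployment and other servers were still carrying the load. I nearly had a heart attack %>_<%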