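Today, when starting a Kafka consumer, I ran into a strange problem: it threw an UnknownHostException for no obvious reason. At a colleague's suggestion, I added a mapping for the VM's hostname to /etc/hosts, which solved the problem. The original logs are gone, but StackOverflow already has a question about this [1].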
java.net.InetAddress.getLocalHost
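The root cause is that Kafka's ZookeeperConsumerConnector calls java.net.InetAddress.getLocalHost(), probably to generate some default consumer settings (one explanation is that it is used to build the default consumer id). Kafka is written in Scala, so the IDE can show decompiled Java code, but it is barely readable. Since the exception is thrown directly by getLocalHost(), let's look at that method instead. Some security checks and caching code are omitted below.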
```java
public static InetAddress getLocalHost() throws UnknownHostException {
    try {
        // Eventually invokes the native function
        // Java_java_net_Inet6AddressImpl_getLocalHostName
        String local = impl.getLocalHostName();
        if (local.equals("localhost")) {
            // Returns the IPv4 or IPv6 loopback address
            return impl.loopbackAddress();
        }
        InetAddress ret = null;
        synchronized (cacheLock) {
            long now = System.currentTimeMillis();
            // (omitted: reuse cachedLocalHost if it is still fresh)
            if (ret == null) {
                InetAddress[] localAddrs;
                try {
                    // Resolves the hostname through the configured name service.
                    // By default this is the JDK's built-in provider, which calls the
                    // native lookupAllHostAddr, i.e. the operating system's resolver;
                    // sun.net.spi.nameservice.dns.DNSNameService.lookupAllHostAddr is
                    // used only if selected via sun.net.spi.nameservice.provider.<n>
                    localAddrs = InetAddress.getAddressesFromNameService(local, null);
                } catch (UnknownHostException uhe) {
                    // Rethrow with a more informative error message.
                    UnknownHostException uhe2 = new UnknownHostException(local + ": " + uhe.getMessage());
                    uhe2.initCause(uhe);
                    throw uhe2;
                }
                cachedLocalHost = localAddrs[0];
                cacheTime = now;
                ret = localAddrs[0];
            }
        }
        return ret;
    } catch (java.lang.SecurityException e) {
        return impl.loopbackAddress();
    }
}
```
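The control flow of the excerpt above can be reproduced in a small, testable sketch. The Resolver interface and the LocalHostFlow class are hypothetical stand-ins (not JDK types) for the name-service lookup, which lets us simulate a hostname that does not resolve and see how the hostname gets prefixed to the error message. Caching and security checks are left out, as in the excerpt.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class LocalHostFlow {
    // Hypothetical stand-in for the name-service lookup; not a JDK type
    public interface Resolver {
        InetAddress[] lookup(String host) throws UnknownHostException;
    }

    // Mirrors the excerpt: "localhost" short-circuits to the loopback address;
    // any other name must resolve, and failures are rethrown as "<hostname>: <message>"
    public static InetAddress localHost(String hostname, Resolver resolver) throws UnknownHostException {
        if (hostname.equals("localhost")) {
            return InetAddress.getLoopbackAddress();
        }
        try {
            return resolver.lookup(hostname)[0];
        } catch (UnknownHostException uhe) {
            UnknownHostException wrapped = new UnknownHostException(hostname + ": " + uhe.getMessage());
            wrapped.initCause(uhe);
            throw wrapped;
        }
    }

    public static void main(String[] args) {
        // Simulate a resolver with no entry for the VM's hostname
        Resolver failing = host -> { throw new UnknownHostException("Name or service not known"); };
        try {
            localHost("docker100113", failing);
        } catch (UnknownHostException e) {
            System.out.println(e.getMessage()); // prints "docker100113: Name or service not known"
        }
    }
}
```

Injecting the resolver keeps the sketch deterministic: the same code path that fails on the misconfigured VM can be exercised without touching the real network configuration.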
Java_java_net_Inet6AddressImpl_getLocalHostName
Java_java_net_Inet6AddressImpl_getLocalHostName follows the JNI naming convention for native methods, which I touched on in an earlier article. Its source can be found in OpenJDK [2]:
```c
JNIEXPORT jstring JNICALL
Java_java_net_Inet6AddressImpl_getLocalHostName(JNIEnv *env, jobject this) {
    char hostname[NI_MAXHOST+1];

    hostname[0] = '\0';
    if (JVM_GetHostName(hostname, sizeof(hostname))) {
        /* Something went wrong, maybe networking is not setup? */
        strcpy(hostname, "localhost");
    } else {
        // ensure null-terminated
        hostname[NI_MAXHOST] = '\0';
#if defined(__linux__) || defined(_ALLBSD_SOURCE)
        // do nothing
#else
#ifdef AF_INET6
        // ...
#endif /* AF_INET6 */
#endif /* __linux__ || _ALLBSD_SOURCE */
    }
    return (*env)->NewStringUTF(env, hostname);
}
```
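So JVM_GetHostName should be the entry point for retrieving the hostname. Here is its definition, written with HotSpot's JVM_LEAF/JVM_END macros [3]: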
```cpp
JVM_LEAF(int, JVM_GetHostName(char* name, int namelen))
  JVMWrapper("JVM_GetHostName");
  return os::get_host_name(name, namelen);
JVM_END
```
and the os::get_host_name it calls [4]:
```cpp
inline int os::get_host_name(char* name, int namelen) {
  return ::gethostname(name, namelen);
}
```
We have now reached the function that does the actual work: gethostname, which on Linux is declared in the header <unistd.h>.
gethostname
On Linux, man describes gethostname as follows:
- NAME
  gethostname, sethostname - get/set hostname
- DESCRIPTION
  These system calls are used to access or to change the hostname of the current processor.
  sethostname() sets the hostname to the value given in the character array name. The len argument specifies the number of bytes in name. (Thus, name does not require a terminating null byte.)
  gethostname() returns the null-terminated hostname in the character array name, which has a length of len bytes. If the null-terminated hostname is too large to fit, then the name is truncated, and no error is returned (but see NOTES below). POSIX.1-2001 says that if such truncation occurs, then it is unspecified whether the returned buffer includes a terminating null byte.
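The man page covers the format of the returned name and the error codes, but says nothing about how the hostname itself is determined. Rather than digging further into the source, let's just look at what it returns.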
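```cpp
#include <unistd.h>
#include <cstdio>
#include <iostream>

int main() {
    // 64 bytes is HOST_NAME_MAX on Linux; one extra byte for the terminating '\0'
    char name[65];
    if (gethostname(name, sizeof(name)) != 0) {
        std::perror("gethostname");
        return 1;
    }
    // POSIX leaves null termination unspecified on truncation, so force it
    name[sizeof(name) - 1] = '\0';
    std::cout << name << std::endl;
    return 0;
}
```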
On my VM, the program above prints "docker100113", which matches the hostname in the UnknownHostException message I had seen.
The hostname man page explains when the hostname is actually set (at system startup):
- NAME
  hostname - show or set the system's host name
- DESCRIPTION
  GET NAME
  hostname will print the name of the system as returned by the gethostname(2) function.
  SET NAME
  The host name is usually set once at system startup in /etc/rc.d/rc.inet1 or /etc/init.d/boot (normally by reading the contents of a file which contains the host name, e.g. /etc/hostname).
And gethostname indeed returns the same value that the hostname command prints:

```shell
[dev@docker100113 ~]$ hostname
docker100113
```
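To look at both sides from Java, the sketch below prints the name stored in /etc/hostname (the file the man page mentions) next to /proc/sys/kernel/hostname, which on Linux holds the kernel's current hostname, the same value gethostname() returns. The class HostnameSources and its firstLine helper are illustrative names; on systems without these files the program simply reports them as unreadable.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class HostnameSources {
    // Returns the first non-blank line of a file, trimmed, or null if there is none
    public static String firstLine(Path file) throws IOException {
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                return line.trim();
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // File the boot scripts typically read the hostname from (per hostname(1))
        Path configured = Paths.get("/etc/hostname");
        // Hostname the kernel currently reports, i.e. what gethostname() returns on Linux
        Path kernel = Paths.get("/proc/sys/kernel/hostname");
        for (Path p : new Path[] {configured, kernel}) {
            System.out.println(p + " -> " + (Files.isReadable(p) ? firstLine(p) : "(not readable)"));
        }
    }
}
```

If the two values differ, the hostname was changed at runtime without updating the file (or vice versa); either way, it is the kernel value that ends up in the UnknownHostException message.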
lookupAllHostAddr
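Back in java.net.InetAddress.getLocalHost(): after a series of checks, the hostname is passed to a name service's lookupAllHostAddr. Unless a provider is configured through the sun.net.spi.nameservice.provider.<n> system properties (for example dns,sun, which selects DNSNameService from the sun.net.spi.nameservice.dns package), the JDK uses its built-in default name service, which delegates to the native lookupAllHostAddr and thus to the operating system's resolver. On Linux that resolver consults /etc/hosts and DNS in the order given by /etc/nsswitch.conf. In other words, getLocalHost ultimately comes down to host name resolution, which is why editing the hosts file fixes the exception. Name resolution maps a host name to IP addresses, so what getLocalHost really wants is the IP address of the local hostname; if it cannot get one, it throws UnknownHostException. By default, the hosts file only maps localhost (or localhost.localdomain) to the loopback address, and the remote DNS server knows nothing about this machine's hostname either, so the lookup comes up empty.

A copy of the DNSNameService source, including lookupAllHostAddr, is listed in the references for anyone interested.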
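The failure mode can be illustrated with a deliberately simplified hosts-file lookup. HostsLookup is a hypothetical helper that only understands the basic "IP name aliases..." line format with # comments; real resolution goes through the system resolver and /etc/nsswitch.conf and may fall through to DNS. The address 10.0.0.5 is a placeholder for the VM's actual IP.

```java
import java.util.Optional;

public class HostsLookup {
    // Simplified /etc/hosts lookup (hypothetical helper, "files" source only):
    // each line is "IP name [aliases...]", and '#' starts a comment
    public static Optional<String> lookup(String hostsContent, String host) {
        for (String rawLine : hostsContent.split("\n")) {
            int hash = rawLine.indexOf('#');
            String line = (hash >= 0 ? rawLine.substring(0, hash) : rawLine).trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] fields = line.split("\\s+");
            // fields[0] is the address; the remaining fields are names for it
            for (int i = 1; i < fields.length; i++) {
                if (fields[i].equalsIgnoreCase(host)) {
                    return Optional.of(fields[0]);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Default-style hosts file: only loopback mappings
        String defaults = "127.0.0.1 localhost localhost.localdomain\n::1 localhost\n";
        System.out.println(lookup(defaults, "docker100113")); // Optional.empty
        // After adding a mapping for the VM's hostname (placeholder IP)
        String fixed = defaults + "10.0.0.5 docker100113\n";
        System.out.println(lookup(fixed, "docker100113"));    // Optional[10.0.0.5]
    }
}
```

The first lookup corresponds to the failing getLocalHost call; the second corresponds to the state after the /etc/hosts fix.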
Summary
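Looking back, this was not really Kafka's fault. The root cause is that java.net implements getLocalHost in a way that depends closely on the machine's own configuration: host name resolution (/etc/hosts and DNS) determines how the function ultimately behaves.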
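Because the outcome depends on each machine's configuration, a quick check on a new host before starting a consumer can save some head-scratching. The class LocalHostCheck below is a minimal sketch, assuming that calling InetAddress.getLocalHost() directly is a good enough proxy for what the consumer does at startup.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class LocalHostCheck {
    // Returns a one-line report: the resolved local host, or the exception message
    public static String describe() {
        try {
            InetAddress local = InetAddress.getLocalHost();
            return "OK: " + local.getHostName() + " -> " + local.getHostAddress();
        } catch (UnknownHostException e) {
            // Usually means the hostname has no entry in /etc/hosts or DNS
            return "FAIL: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```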
References
- Getting the host IP address in Java (for Linux and Windows)
- Unix/Linux System Call gethostname()
- UnknownHostException kafka - StackOverflow
- DNSNameService.java
- [1] StackOverflow discussion
- [2] openjdk/jdk/src/solaris/native/java/net/Inet6AddressImpl.c
- [3] openjdk/hotspot/src/share/vm/prims/jvm.cpp
- [4] openjdk/hotspot/src/os/linux/vm/os_linux.inline.hpp