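Locating the problem
While running Spark code, an operation that reads a file of only a few lines took more than 20 seconds, which is clearly abnormal. Profiling with YourKit showed that all of the time was being spent in DNS resolution.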
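Analyzing the problem
Testing the HDFS connection with a simple test program showed that, when the socket is created, the first host looked up is the virtual address of the HDFS HA cluster. Resolving this address takes too long and then returns a failure, and only after that does the client go on to look up the real host address. The lookup goes through the following method: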
public static InetSocketAddress createSocketAddrForHost(String host, int port) {
    String staticHost = getStaticResolution(host);
    String resolveHost = (staticHost != null) ? staticHost : host;
    InetSocketAddress addr;
    try {
        InetAddress iaddr = SecurityUtil.getByName(resolveHost);
        // if there is a static entry for the host, make the returned
        // address look like the original given host
        if (staticHost != null) {
            iaddr = InetAddress.getByAddress(host, iaddr.getAddress());
        }
        addr = new InetSocketAddress(iaddr, port);
    } catch (UnknownHostException e) {
        // resolution failed: fall back to an unresolved address
        addr = InetSocketAddress.createUnresolved(host, port);
    }
    return addr;
}
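
To see how much of the delay comes from the failed lookup itself, the same resolve-then-fall-back pattern can be timed outside of Hadoop. The following is only a minimal sketch: the class name DnsTiming, the method resolveTimed, the default host name "nameservice1" and the port 8020 are all hypothetical placeholders, and it calls the JDK's InetAddress.getByName directly instead of Hadoop's SecurityUtil.getByName, so the numbers may differ from what the real client sees.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

// Hypothetical helper, not part of Hadoop: times a single host lookup.
class DnsTiming {

    // Resolve the host, fall back to an unresolved address on failure
    // (as the method above does), and print how long the attempt took.
    static InetSocketAddress resolveTimed(String host, int port) {
        long start = System.nanoTime();
        InetSocketAddress addr;
        try {
            InetAddress iaddr = InetAddress.getByName(host);
            addr = new InetSocketAddress(iaddr, port);
        } catch (UnknownHostException e) {
            addr = InetSocketAddress.createUnresolved(host, port);
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(host + ": unresolved=" + addr.isUnresolved()
                + ", took " + elapsedMs + " ms");
        return addr;
    }

    public static void main(String[] args) {
        // "nameservice1" is an assumed HA virtual name; pass your own as an argument.
        String host = args.length > 0 ? args[0] : "nameservice1";
        resolveTimed(host, 8020); // 8020 is an assumed NameNode RPC port
    }
}
```

Running it once with the HA virtual name and once with a real NameNode host name should show whether the slow, failing lookup of the virtual name accounts for the time seen in the profiler.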