Intermittent name resolution failures on Linux: the rotate option in resolv.conf

b.example.com: Name or service not known

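An application normally resolves a hostname according to the server's configuration: it reads a few configuration files (such as /etc/nsswitch.conf, /etc/hosts and /etc/resolv.conf) to decide which nameserver to use. See "How the system handles name resolution" in the references at the end.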

A log alert came in: `a.example.com: Name or service not known`. Testing on the server showed that resolution fails some of the time:
[root@v_yunweikaifa246 ~]# ping  a.example.com -c 2 -w 0.1
ping: a.example.com: Name or service not known
[root@v_yunweikaifa246 ~]# ping  a.example.com -c 2 -w 0.1
ping: a.example.com: Name or service not known
[root@v_yunweikaifa246 ~]# ping  a.example.com -c 2 -w 0.1
PING a.example.com (10.0.0.4) 56(84) bytes of data.
64 bytes from bogon (10.0.0.4): icmp_seq=1 ttl=64 time=0.291 ms
64 bytes from bogon (10.0.0.4): icmp_seq=2 ttl=64 time=0.402 ms

--- a.example.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 0.291/0.346/0.402/0.058 ms
Likely cause: the rotate option

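Tracing the lookup several times with strace -e trace=connect,write getent hosts a.example.com showed that the name resolves only when the query goes to nameserver 10.128.2.130, yet the server appears to choose the nameserver at random.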

[root@v_yunweikaifa246 ~]# strace -e trace=connect,write  getent hosts a.example.com
connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("219.141.140.10")}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("219.141.136.10")}, 16) = 0
+++ exited with 2 +++

[root@v_yunweikaifa246 ~]# strace -e trace=connect,write  getent hosts a.example.com
connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("219.141.136.10")}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.128.2.130")}, 16) = 0
write(1, "10.0.0.4     a.example."..., 3610.0.0.4     a.example.com
) = 36
+++ exited with 0 +++

[root@v_yunweikaifa246 ~]# strace -e trace=connect,write  getent hosts a.example.com
connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("219.141.140.10")}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("219.141.136.10")}, 16) = 0
+++ exited with 2 +++

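**Note:** tools such as dig, host and nslookup do not go through the system resolver library. They read /etc/resolv.conf themselves and start with the first nameserver entry, so they cannot be used to test the rotate option.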

Check the /etc/resolv.conf configuration:

[root@v_yunweikaifa141 ~]# cat /etc/resolv.conf 
options timeout:1 attempts:1 rotate
nameserver 10.128.2.130
nameserver 219.141.140.10       
nameserver 219.141.136.10		
nameserver 202.106.0.20		

According to man resolv.conf, options timeout:1 attempts:1 rotate means a 1-second timeout, a single attempt per query, and rotate mode. The man page describes rotate as follows:

	sets RES_ROTATE in _res.options, which causes round-robin selection of nameservers from among those listed. This has the effect of spreading the query load among all listed servers, rather than having all clients try the first listed server first every time.

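In other words, the resolver cycles through the configured nameservers instead of always trying the first one. When a nameserver that does not hold the record is the one that responds, the resolver accepts that answer and the lookup fails.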
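
The effect of round-robin selection can be sketched with a small simulation. This is a simplified model, not the resolver's actual code: it assumes each query goes first to the next server in the list, starting at the first entry, and that whatever that server answers is final. The helper name `first_server_for_queries` is made up for illustration, and the assumption that only 10.128.2.130 knows the internal name comes from the strace runs above.

```python
from itertools import cycle, islice

# First three nameserver entries from the article's /etc/resolv.conf.
NAMESERVERS = ["10.128.2.130", "219.141.140.10", "219.141.136.10"]
# Assumption: only the internal server has a record for a.example.com.
KNOWS_NAME = {"10.128.2.130"}


def first_server_for_queries(nameservers, count):
    """Return the server tried first for each of `count` successive queries,
    assuming plain round-robin that starts at the first entry."""
    return list(islice(cycle(nameservers), count))


if __name__ == "__main__":
    for server in first_server_for_queries(NAMESERVERS, 6):
        outcome = "resolved" if server in KNOWS_NAME else "Name or service not known"
        print(f"{server:<16} {outcome}")
```

In this model only one query in three reaches a server that can answer, which is the kind of intermittent failure seen with ping above.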

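Fix:
  1. rotate exists to balance load across servers, so when the nameservers are not equivalent, remove rotate and go back to querying them in the default order.

  2. List the self-hosted servers first and second, and put the public server third.

    Both options increase the load on the first nameserver.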
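
Fix 1 amounts to dropping the rotate token from the options line. Below is a minimal sketch of that edit, working on the file's text rather than on the file itself. `drop_rotate` is a hypothetical helper name; writing the result back to /etc/resolv.conf, or to whatever tool manages that file on a given system, is deliberately left out.

```python
def drop_rotate(resolv_conf_text):
    """Return resolv.conf text with the `rotate` token removed from options lines."""
    kept_lines = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "options":
            remaining = [f for f in fields[1:] if f != "rotate"]
            if remaining:
                kept_lines.append("options " + " ".join(remaining))
            # An options line that only contained `rotate` is dropped entirely.
        else:
            kept_lines.append(line)
    return "\n".join(kept_lines) + "\n"


if __name__ == "__main__":
    before = (
        "options timeout:1 attempts:1 rotate\n"
        "nameserver 10.128.2.130\n"
        "nameserver 219.141.140.10\n"
    )
    print(drop_rotate(before), end="")
```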

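Follow-up: why does the rotate option in resolv.conf keep picking the second nameserver first?

Repeated tests show that the second nameserver is the one most likely to receive the first query, because the resolver has already rotated the list once before sending the request: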

        /*
         * Some resolvers want to even out the load on their nameservers.
         * Note that RES_BLAST overrides RES_ROTATE.
         */
        if ((statp->options & RES_ROTATE) != 0 &&
            (statp->options & RES_BLAST) == 0) {
                struct sockaddr_in6 *ina;
                unsigned int map;

                n = 0;
                while (n < MAXNS && EXT(statp).nsmap[n] == MAXNS)
                        n++;
                if (n < MAXNS) {
                        ina = EXT(statp).nsaddrs[n];
                        map = EXT(statp).nsmap[n];
                        for (;;) {
                                ns = n + 1;
                                while (ns < MAXNS
                                       && EXT(statp).nsmap[ns] == MAXNS)
                                        ns++;
                                if (ns == MAXNS)
                                        break;
                                EXT(statp).nsaddrs[n] = EXT(statp).nsaddrs[ns]; /* move the second address into the first slot */
                                EXT(statp).nsmap[n] = EXT(statp).nsmap[ns];
                                n = ns;
                        }
                        EXT(statp).nsaddrs[n] = ina;
                        EXT(statp).nsmap[n] = map;
                }
        }
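
The excerpt above shifts every used slot one position to the left and puts the old first entry at the end. A small Python model of that step is sketched here. It is an illustration of the excerpt only: `rotate_once` is a made-up name, and slots that the C code skips (those where nsmap[n] == MAXNS) are represented as None.

```python
def rotate_once(slots):
    """Rotate the used slots left by one; unused (None) slots stay where they are."""
    used = [i for i, server in enumerate(slots) if server is not None]
    if len(used) < 2:
        return list(slots)  # nothing to rotate with zero or one server
    rotated = list(slots)
    first = slots[used[0]]
    # Each used slot takes the value of the next used slot...
    for dst, src in zip(used, used[1:]):
        rotated[dst] = slots[src]
    # ...and the last used slot receives the old first entry.
    rotated[used[-1]] = first
    return rotated


if __name__ == "__main__":
    servers = ["10.128.2.130", "219.141.140.10", "219.141.136.10"]
    print(rotate_once(servers))
```

After a single rotation the entry that was second in resolv.conf sits in the first slot, which matches the observation that the second nameserver is usually queried first.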

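Python test script

import socket

# Resolve the name five times through the system resolver.
for _ in range(5):
    try:
        print(socket.getaddrinfo('a.example.com', 80))
    except socket.gaierror as exc:
        # Report the failure instead of silently ignoring it.
        print('lookup failed:', exc)
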
Follow-up: why do only three nameserver entries in /etc/resolv.conf take effect?

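By default only three nameserver entries in /etc/resolv.conf are used, and any further entries are never queried, because MAXNS is defined as 3. The value can be changed by recompiling, but this is not officially recommended.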

/usr/include/resolv.h
 nameserver Name server IP address
      Internet address (in dot notation) of a name server that the resolver should query.  Up to MAXNS (currently 3, see <resolv.h>) name servers  may  be  listed,
      one per keyword.  If there are multiple servers, the resolver library queries them in the order listed.  If no nameserver entries are present, the default is
      to use the name server on the local machine.  (The algorithm used is to try a name server, and if the query times out,  try  the  next,  until  out  of  name
      servers, then repeat trying all the name servers until a maximum number of retries are made.)   
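
To see which entries of a given file actually count, the limit can be applied by hand. The sketch below is illustrative only: `split_nameservers` is a hypothetical helper, MAXNS = 3 is the value quoted from the man page above, and the parsing is deliberately simple (it skips blank lines and lines starting with # or ;).

```python
MAXNS = 3  # value quoted by man resolv.conf ("currently 3, see <resolv.h>")

# The configuration shown earlier in this article.
RESOLV_CONF = """\
options timeout:1 attempts:1 rotate
nameserver 10.128.2.130
nameserver 219.141.140.10
nameserver 219.141.136.10
nameserver 202.106.0.20
"""


def split_nameservers(resolv_conf_text, maxns=MAXNS):
    """Return (used, ignored) nameserver addresses, keeping only the first `maxns`."""
    servers = []
    for line in resolv_conf_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped[0] in "#;":
            continue  # blank line or comment
        fields = stripped.split()
        if fields[0] == "nameserver" and len(fields) >= 2:
            servers.append(fields[1])
    return servers[:maxns], servers[maxns:]


if __name__ == "__main__":
    used, ignored = split_nameservers(RESOLV_CONF)
    print("used:   ", used)
    print("ignored:", ignored)
```

For the configuration in this article, 202.106.0.20 falls outside the limit and is never queried.
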
Follow-up: no failover when a nameserver cannot resolve the host

For example:

nameserver 10.0.0.1  # handles queries for some internal zones
nameserver 10.0.0.2  # handles queries for zones that .1 nameserver doesn't know about
nameserver 10.0.0.3  # handles queries out to the global internet

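The reason: the first nameserver is tried first. If it is down and does not respond within the configurable timeout, the resolver moves on to the next nameserver, and then the next. If the first DNS server is up and responds, the resolver never goes on to try the second or third nameserver.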

Here the first server tried is unreachable (connect returns -1 ENETUNREACH), so the resolver goes on to query the next one:

[root@v_yunweikaifa141 ~]# strace -e trace=connect  ping a.example.com -c 1 -w 1
...
PING a.example.com (10.0.0.4) 56(84) bytes of data.
connect(4, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("219.141.140.10")}, 16) = -1 ENETUNREACH (Network is unreachable)
connect(4, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.128.2.130")}, 16) = 0
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
64 bytes from 10.0.0.4: icmp_seq=1 ttl=64 time=0.610 ms
--- a.example.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.610/0.610/0.610/0.000 ms
+++ exited with 0 +++

Here the first server tried responds but has no record for the name, so the resolver stops there and the lookup fails:

[root@v_yunweikaifa246 ~]# strace -e trace=connect  ping a.example.com -c 1 -w 1
connect(4, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("219.141.140.10")}, 16) = 0
ping: a.example.com: Name or service not known

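
The two traces above follow one rule: the resolver moves on only when a server gives no response at all. The sketch below models that rule. It is a simplified illustration rather than the resolver's real logic; the `resolve` function, the outcome labels and the `behaviour` mapping are all made up for this example.

```python
UNREACHABLE = "unreachable"  # connect fails or the query times out
NO_RECORD = "no record"      # the server answers but does not know the name


def resolve(nameservers, behaviour):
    """Try servers in order and return (server_used, address_or_None)."""
    for server in nameservers:
        outcome = behaviour[server]
        if outcome == UNREACHABLE:
            continue             # no response: fail over to the next server
        if outcome == NO_RECORD:
            return server, None  # a response, even a negative one, is final
        return server, outcome   # positive answer
    return None, None            # every server was unreachable


if __name__ == "__main__":
    order = ["219.141.140.10", "10.128.2.130"]
    # First trace: the public server is unreachable, so the internal one is asked.
    print(resolve(order, {"219.141.140.10": UNREACHABLE, "10.128.2.130": "10.0.0.4"}))
    # Second trace: the public server answers without a record, so lookup stops.
    print(resolve(order, {"219.141.140.10": NO_RECORD, "10.128.2.130": "10.0.0.4"}))
```
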
Takeaways:

The tests showed how the system resolves names, and how resolv.conf is configured and used.

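See also:

How the system handles name resolution

Maximum number of DNS nameservers that can be set in resolv.conf

No failover when a nameserver does not know the host

Why the rotate option in resolv.conf always picks the second nameserver first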
