Reposted from: https://blog.csdn.net/prog_6103/article/details/78569510
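Embedded development throws up all sorts of strange problems. ping www.baidu.com worked, yet nslookup kept reporting errors. I finally solved the nslookup failure by following this blogger's article.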
Cause: missing libraries
libdl.so
libnss.so
For the detailed analysis, see the article below.
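Busybox is an essential toolkit for a home-made minimal system: it bundles coreutils and many other small system utilities. To run busybox on a self-built kernel, it has to be compiled statically. For a long time I had the problem that, after static compilation, nslookup, ping and wget on my custom system could not resolve domain names, but I was too lazy to fix it. This time I hit the problem again, so I spent some time solving it.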
nslookup in a statically compiled busybox
After installing glibc-static, I compiled busybox: make menuconfig, tick Static Link, then make. Naturally I paid no attention to the build log, and happily ran busybox nslookup blog.csdn.net, which printed Can't resolve "blog.csdn.net".
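The failure can be reproduced outside busybox with any program that resolves names through the C library's getaddrinfo(), the same call busybox uses. Below is a minimal sketch (resolve_first_ipv4 is a hypothetical helper, not busybox code). When it is linked with gcc -static, glibc normally emits a linker warning saying that using 'getaddrinfo' in statically linked applications requires, at runtime, the shared libraries from the glibc version used for linking. That is the kind of warning that is easy to miss in the build log. A dotted-quad address such as 127.0.0.1 is normally parsed without any lookup, so only real host names exercise the NSS path that breaks on the target machine.

```c
#include <stdio.h>
#include <string.h>
#include <netdb.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Resolve `host` to its first IPv4 address via the C library's getaddrinfo().
 * In glibc, host names go through NSS (nsswitch.conf plus the libnss_* shared
 * modules), which is what goes missing on a bare target machine. */
int resolve_first_ipv4(const char *host, char *out, size_t outlen)
{
    struct addrinfo hints, *res;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;        /* IPv4 only, as in the busybox case */
    hints.ai_socktype = SOCK_STREAM;  /* one entry per address instead of one per socket type */

    int rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc != 0) {
        fprintf(stderr, "getaddrinfo(%s): %s\n", host, gai_strerror(rc));
        return -1;
    }

    const struct sockaddr_in *sin = (const struct sockaddr_in *)res->ai_addr;
    const char *ok = inet_ntop(AF_INET, &sin->sin_addr, out, (socklen_t)outlen);
    freeaddrinfo(res);
    return ok ? 0 : -1;
}
```

Calling it from a small main() with "blog.csdn.net", built once dynamically and once with -static, and then running both on the target, should show the same split as busybox: the dynamic build fails for lack of libraries and the static build fails at lookup time.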
This problem had bothered me for a long time, and today I finally had time to find out what was going on.
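Could the system's network configuration be wrong? ping 172.16.65.1 showed that the gateway was reachable. I started fumbling around blindly: running busybox dnsd, compiling ip_relay to map 172.16.65.1:53 to the local 127.0.0.1:53. None of it had any effect. Googling the problem mostly turned up questions about busybox in docker. Then I ran busybox nslookup blog.csdn.net on the Linux machine where I had compiled Busybox, and it actually worked!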
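I downloaded the strace source code and compiled it statically. I first ran strace busybox nslookup blog.csdn.net on the build machine and read through the log: it first read /etc/resolv.conf, then /etc/nsswitch.conf, and also /etc/host.conf. I recreated all these files on the target machine exactly as they were. The server that nslookup displayed now followed the nameserver in resolv.conf, but no matter which server I set, domain names could not be resolved, not even with 114.114.114.114.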
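Going back to the strace log, I noticed that it also loaded many libraries, such as libdl.so and libnss.so. So what does busybox do on the target machine? I simply copied the statically compiled strace over and ran strace busybox nslookup blog.csdn.net. There was the problem: on the target machine it also tries to load libdl.so and the other libraries, every attempt fails with No such file, and so it finally reports that the domain name cannot be resolved.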
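Searching online for what kind of library libnss is, I found that it is designed to be a dynamic library and static linking is not recommended. To link it statically, you first have to rebuild glibc with --enable-libnss-static. That is rather annoying: the whole point of busybox is that it has few dependencies, glibc brings a pile of dependencies of its own, and above all the result takes up quite a lot of space. Is there another way?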
DNS resolution
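Since I had also compiled node, I wanted to see whether there was a pure JavaScript package that does DNS resolution. There was none, and I also found that npm uses getaddrinfo to resolve domain names. In glibc this function relies on libnss (the Name Service Switch). Can it be bypassed? The network is reachable, so why not send packets straight to the DNS server? I downloaded Wireshark to capture packets, looked up descriptions of the DNS protocol online, and found: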
http://www.binarytides.com/dns-query-code-in-c-with-linux-sockets/
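This article sends a DNS request by hand in C and gets the IP address back. I first rewrote it in JavaScript, planning to build it into node to get npm working, but the statically compiled node segfaults when running npm, so I set that aside for now and went back to busybox's name resolution.
grep getaddrinfo shows three files using it: nslookup.c, xconnect.c and inet_common.c. inet_common.c uses it to resolve IPv6, which I left alone for now. For nslookup and xconnect (ping and wget both go through it), overriding the system's getaddrinfo should be enough, so I prepared a getaddrinfo.h that rewrites getaddrinfo and freeaddrinfo to send UDP packets directly. Then I ran make to compile busybox and, on the target machine, ran busybox nslookup blog.csdn.net, which now resolves.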
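The core of such a replacement is building the query packet by hand, as the binarytides article does. Below is a minimal sketch in that spirit. It is not the article's code, and build_dns_query is a hypothetical name. It encodes an A-record query for one host name following RFC 1035: a 12-byte header with the RD (recursion desired) flag and QDCOUNT = 1, then the name as length-prefixed labels ending in a zero byte, then QTYPE = A and QCLASS = IN.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Build a standard DNS query for the A record of `name` into `buf`.
 * Returns the packet length, or -1 if `buf` is too small or a label is
 * empty or longer than 63 bytes. */
int build_dns_query(uint16_t id, const char *name, unsigned char *buf, size_t buflen)
{
    size_t pos = 12;                        /* the header is 12 bytes */
    if (buflen < 12) return -1;

    memset(buf, 0, 12);
    buf[0] = id >> 8;                       /* transaction ID, big-endian */
    buf[1] = id & 0xff;
    buf[2] = 0x01;                          /* flags: RD set, everything else 0 */
    buf[5] = 1;                             /* QDCOUNT = 1; AN/NS/AR counts stay 0 */

    /* QNAME: "blog.csdn.net" becomes 4 'blog' 4 'csdn' 3 'net' 0 */
    const char *p = name;
    while (*p) {
        const char *dot = strchr(p, '.');
        size_t len = dot ? (size_t)(dot - p) : strlen(p);
        if (len == 0 || len > 63 || pos + 1 + len >= buflen) return -1;
        buf[pos++] = (unsigned char)len;
        memcpy(buf + pos, p, len);
        pos += len;
        p += len;
        if (*p == '.') p++;                 /* skip the separator (a trailing dot is fine) */
    }

    if (pos + 5 > buflen) return -1;
    buf[pos++] = 0;                         /* root label terminates QNAME */
    buf[pos++] = 0; buf[pos++] = 1;         /* QTYPE  = 1 (A)  */
    buf[pos++] = 0; buf[pos++] = 1;         /* QCLASS = 1 (IN) */
    return (int)pos;
}
```

For "blog.csdn.net" this yields a 31-byte packet: 12 header bytes, 15 bytes of QNAME and 4 bytes of type and class, which is what a Wireshark capture of an nslookup query should show.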
DNS resolution is actually a simple process: send a query to the DNS server, and the server returns an answer packet. This article explains it in detail: http://www.firewall.cx/networking-topics/protocols/domain-name-system-dns/161-protocols-dns-response.html
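The answer packet repeats the question and then lists resource records. Names inside the records are usually compressed into 2-byte pointers whose top two bits are set. Below is a minimal sketch of pulling the first A record out of a reply. first_a_record and skip_name are hypothetical helpers. The sketch handles only IPv4 answers and steps over other record types such as a CNAME.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Skip an encoded name starting at `pos`; return the offset just past it,
 * or -1 if malformed. A compression pointer (top bits 11) is 2 bytes and ends the name. */
static long skip_name(const unsigned char *msg, size_t len, size_t pos)
{
    while (pos < len) {
        unsigned char c = msg[pos];
        if (c == 0) return (long)pos + 1;                   /* root label */
        if ((c & 0xC0) == 0xC0) return pos + 2 <= len ? (long)pos + 2 : -1;
        if (c & 0xC0) return -1;                            /* reserved label types */
        pos += 1 + (size_t)c;                               /* ordinary label */
    }
    return -1;
}

/* Write the first A/IN record of a DNS response to `out` as a dotted quad.
 * Returns 0 on success, -1 if the reply is an error, malformed, or has no A record. */
int first_a_record(const unsigned char *msg, size_t len, char *out, size_t outlen)
{
    if (len < 12 || !(msg[2] & 0x80)) return -1;            /* too short, or QR bit not set */
    if ((msg[3] & 0x0F) != 0) return -1;                   /* RCODE != 0, e.g. NXDOMAIN */
    unsigned qd = ((unsigned)msg[4] << 8) | msg[5];
    unsigned an = ((unsigned)msg[6] << 8) | msg[7];
    long pos = 12;

    for (unsigned i = 0; i < qd; i++) {                     /* skip the echoed question */
        pos = skip_name(msg, len, (size_t)pos);
        if (pos < 0 || (size_t)pos + 4 > len) return -1;
        pos += 4;                                           /* QTYPE + QCLASS */
    }
    for (unsigned i = 0; i < an; i++) {                     /* walk the answer records */
        pos = skip_name(msg, len, (size_t)pos);
        if (pos < 0 || (size_t)pos + 10 > len) return -1;
        unsigned type  = ((unsigned)msg[pos] << 8) | msg[pos + 1];
        unsigned klass = ((unsigned)msg[pos + 2] << 8) | msg[pos + 3];
        unsigned rdlen = ((unsigned)msg[pos + 8] << 8) | msg[pos + 9];
        pos += 10;                                          /* TYPE, CLASS, TTL(4), RDLENGTH */
        if ((size_t)pos + rdlen > len) return -1;
        if (type == 1 && klass == 1 && rdlen == 4) {        /* A record: 4-byte IPv4 address */
            snprintf(out, outlen, "%u.%u.%u.%u",
                     msg[pos], msg[pos + 1], msg[pos + 2], msg[pos + 3]);
            return 0;
        }
        pos += rdlen;                                       /* e.g. skip a CNAME */
    }
    return -1;
}
```

The RCODE check matters in practice: a reply with RCODE 3 (NXDOMAIN) is a valid packet and should turn into a "Can't resolve" error, not a parse attempt.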
OK, the build is done. While I'm at it, let me also fix the DNS problem in the fully static node build :)
Translation:
Busybox is a powerful tool set that includes coreutils, and it is very useful for a self-compiled Linux distribution. To use it on a Linux kernel without glibc support, Busybox has to be statically linked. One issue that always occurs is that, after static linking, the nslookup, ping and wget applets do not work for URLs whose host is a name rather than an IP address. This time my build got stuck on the problem again, and this time I decided to resolve it for good.
nslookup in statically linked Busybox
After installing the glibc static support packages, it is easy to compile Busybox with the static link option checked: make menuconfig; make. There were actually some static link warnings in the build process, which I only noticed after resolving the problem. After compiling, I happily ran busybox nslookup www.google.com; however, the output was sad: Can't resolve "www.google.com". What a bug!
Is there a problem with the system network configuration? Let me check: ping 172.16.65.1 shows the gateway is reachable, and my target machine has the IP 172.16.65.101. I was lost at sea and resorted to trial and error: ran busybox dnsd; compiled ip_relay and mapped 127.0.0.1:53 to 172.16.65.1:53; and so on. No luck at all. I also Googled the problem and found no exact answer; most people reporting busybox nslookup problems were running busybox in a docker container. Wait! Let me try the statically linked Busybox on the build machine: busybox nslookup www.google.com. It works!!!
I figured the program triggered something in the background to reach the DNS server. I downloaded strace and compiled it with static linking (so that it could easily be moved to my target machine later) to look inside. Reading the trace log, I found that the program touched files such as /etc/resolv.conf, /etc/nsswitch.conf and /etc/host.conf. I set them up on the target machine the same way as on the build machine. nslookup changed its DNS server whenever resolv.conf changed, but it still could not resolve domain names, even with 8.8.8.8.
Back in the strace log, I found many shared libraries being loaded, for example libdl.so and libnss.so. What does Busybox do internally on the target machine? Since strace is statically linked, it is easy to move it to the target machine and run: strace busybox nslookup www.google.com. Good, that hits the problem: many shared libraries need to be loaded, but they do not exist on the target machine, so the log shows many No such file errors. Naturally, Busybox prints Can't resolve "xxx".
Searching for libnss on the Internet, I learned that it is designed as a set of shared libraries and static linking is not recommended. If you really want it, recompile glibc with --enable-libnss-static. How terrible! Lazy people want to sleep, and I do not want to rebuild glibc. Any ideas?
DNS resolver
Actually, I had compiled NodeJS with full static linking and wanted a pure JavaScript package to handle DNS resolution. I found nothing, and ran into another problem: npm install uses getaddrinfo to resolve domain names, and that function depends on the dynamically loaded libnss. Let's find a way to bypass the dynamic linking. Then an idea occurred to me: since the network is reachable, why not send DNS requests to the server directly? To understand the DNS exchange on the wire, I downloaded and installed Wireshark to sniff network traffic, and read a nice introduction at:
http://www.binarytides.com/dns-query-code-in-c-with-linux-sockets/
It is an article with C code that sends a DNS request straight to a server to get an IP address. At first I wanted to use this method to make node's npm work. After writing and testing a simple JavaScript DNS module, I found that npm install always seems to segfault in the fully static build. Forget it and focus on Busybox.
grep getaddrinfo finds 3 files in the Busybox source code: nslookup.c, xconnect.c (used by ping and wget) and inet_common.c. inet_common.c uses getaddrinfo to resolve IPv6, and I did not want to make things complex at the beginning. So I wrote new getaddrinfo and freeaddrinfo functions in getaddrinfo.h for nslookup.c and xconnect.c to make DNS resolution work, ran make to compile Busybox, and ran busybox nslookup blog.csdn.net on the target machine. Wow, it finally works!
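Below is a minimal sketch of what such a replacement pair could look like. It assumes IPv4 only and numeric ports, and the post's own getaddrinfo.h is not reproduced here. The names static_getaddrinfo, static_freeaddrinfo and the dns_resolver_hook function pointer are hypothetical. Dotted-quad hosts are parsed locally. Any other name goes to a pluggable resolver, which would build a query, send it over UDP and parse the first A record, so NSS is never consulted.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <netdb.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical hook: a resolver that talks to the DNS server directly. */
typedef int (*ipv4_resolver_fn)(const char *name, struct in_addr *out);
static ipv4_resolver_fn dns_resolver_hook = NULL;

/* One allocation holds the addrinfo and its sockaddr_in, so one free() releases both. */
struct ai_block { struct addrinfo ai; struct sockaddr_in sin; };

int static_getaddrinfo(const char *node, const char *service,
                       const struct addrinfo *hints, struct addrinfo **res)
{
    struct in_addr addr;
    int port = 0;

    if (hints && hints->ai_family != AF_UNSPEC && hints->ai_family != AF_INET)
        return EAI_FAMILY;                            /* IPv4 only in this sketch */
    if (service) {                                    /* numeric ports only */
        char *end;
        long v = strtol(service, &end, 10);
        if (*service == '\0' || *end != '\0' || v < 0 || v > 65535) return EAI_SERVICE;
        port = (int)v;
    }
    if (node == NULL) {                               /* passive (bind) or loopback address */
        addr.s_addr = htonl((hints && (hints->ai_flags & AI_PASSIVE)) ? INADDR_ANY
                                                                     : INADDR_LOOPBACK);
    } else if (inet_pton(AF_INET, node, &addr) != 1) {
        /* Not a dotted quad: ask the direct-DNS resolver instead of NSS. */
        if (!dns_resolver_hook || dns_resolver_hook(node, &addr) != 0)
            return EAI_NONAME;
    }

    struct ai_block *b = calloc(1, sizeof *b);
    if (!b) return EAI_MEMORY;
    b->sin.sin_family = AF_INET;
    b->sin.sin_port = htons((uint16_t)port);
    b->sin.sin_addr = addr;
    b->ai.ai_family = AF_INET;
    b->ai.ai_socktype = hints ? hints->ai_socktype : 0;
    b->ai.ai_protocol = hints ? hints->ai_protocol : 0;
    b->ai.ai_addrlen = sizeof b->sin;
    b->ai.ai_addr = (struct sockaddr *)&b->sin;
    *res = &b->ai;                                    /* single result, ai_next stays NULL */
    return 0;
}

void static_freeaddrinfo(struct addrinfo *res)
{
    free(res);  /* ai is the first member of ai_block, so this frees the whole block */
}
```

The distinct names avoid clashing with the prototypes in <netdb.h>. How to wire them into nslookup.c and xconnect.c (for example with a #define after the system headers) is an assumption about integration, not something the post describes.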
The DNS resolution process is not complicated: send a request to the DNS server and it returns answers. If you are interested, you can read more about it here: http://www.firewall.cx/networking-topics/protocols/domain-name-system-dns/161-protocols-dns-response.html
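The "send a request, get answers" step is a single UDP round trip to port 53. Below is a minimal sketch. It assumes an IPv4 nameserver (for example, the one listed in /etc/resolv.conf), a query already encoded in wire format, and a fixed receive timeout. dns_udp_exchange is a hypothetical name.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Send one DNS query over UDP to `server` (dotted IPv4, port 53) and wait up
 * to `timeout_sec` seconds for the reply. Returns the reply length, or -1 on
 * error, timeout, or a reply whose transaction ID does not match the query. */
ssize_t dns_udp_exchange(const char *server, const unsigned char *query, size_t qlen,
                         unsigned char *reply, size_t rlen, int timeout_sec)
{
    struct sockaddr_in sa;

    if (qlen < 12) return -1;                         /* not even a DNS header */
    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_port = htons(53);
    if (inet_pton(AF_INET, server, &sa.sin_addr) != 1) return -1;

    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) return -1;

    struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };
    setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);  /* don't block forever */

    ssize_t n = -1;
    /* connect() on a UDP socket fixes the peer, so recv() ignores datagrams from other hosts. */
    if (connect(fd, (struct sockaddr *)&sa, sizeof sa) == 0 &&
        send(fd, query, qlen, 0) == (ssize_t)qlen)
        n = recv(fd, reply, rlen, 0);
    close(fd);

    /* The reply must echo the query's 16-bit transaction ID. */
    if (n >= 0 && (n < 12 || reply[0] != query[0] || reply[1] != query[1]))
        n = -1;
    return n;
}
```

A full resolver would also retry, move on to the next nameserver, and switch to TCP when the reply has the TC (truncated) bit set. This sketch leaves those out.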
Cool, mission complete. Next up is the DNS resolution problem in the fully static NodeJS build :) The way to fix it is clear now as well.