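A learning detour sparked by two Python scripts: inspecting a program's library dependencies
I came across two Python scripts written by Song Baohua that can be used to inspect a program's library dependencies. The GitHub repository is: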
https://github.com/21cnbao/libdep/
Library dependencies of Linux programs
symbol-dep.py
:
https://cloud.tencent.com/developer/article/1518254
How it works
The command nm -D --undefined-only lists the library functions that a program needs to resolve through dynamic linking, for example:
➜ nm -D --undefined-only a.out
w __gmon_start__
U __libc_start_main
U puts
a.out is an ELF executable compiled from a trivial C program, hello.c:
#include <stdio.h>
int main(int argc, char *argv[])
{
printf("hello world!\n");
return 0;
}
gcc hello.c
The command nm -D --defined-only lists the functions that a dynamic library exports to its users, for example:
➜ nm -D --defined-only /lib/x86_64-linux-gnu/libc-2.19.so | more
0000000000046d30 T a64l
0000000000039f90 T abort
00000000003bfe00 B __abort_msg
000000000003c920 T abs
...
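For ./symbol-dep.py -s a -d b.so, it is enough to intersect the functions that a depends on with the functions that b.so provides. The result tells us which functions of b.so the program a can call, without needing any source code at all.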
Implementation
#!/usr/bin/python3
import sys, getopt, os
def main(argv):
    srcfile = ''
    dstfile = ''
    neededsymbols = []
    exportedsymbols = []
    try:
        opts, args = getopt.getopt(argv, "hs:d:", ["sfile=", "dfile="])
    except getopt.GetoptError:
        print ('symbol-dep.py -s <srcfile> -d <dstfile>')
        sys.exit(2)
    for opt, arg in opts:
        if opt == '-h':
            print ('symbol-dep.py -s <srcfile> -d <dstfile>')
            sys.exit()
        elif opt in ("-s", "--sfile"):
            srcfile = arg
        elif opt in ("-d", "--dfile"):
            dstfile = arg
    # get the symbols srcfile depends on
    src=os.popen("nm -D --undefined-only "+srcfile)
    srclist=src.read().splitlines()
    for sline in srclist:
        neededsymbols.append(sline.split()[-1])
    # get the symbols dstfile exports
    dst=os.popen("nm -D --defined-only "+dstfile)
    dstlist=dst.read().splitlines()
    for dline in dstlist:
        exportedsymbols.append(dline.split()[-1])
    # intersection of src and dest
    for symbol in neededsymbols:
        if symbol in exportedsymbols:
            print(symbol)
if __name__ == "__main__":
    main(sys.argv[1:])
Usage on an Ubuntu virtual machine:
./symbol-dep.py -s a.out -d /lib/x86_64-linux-gnu/libc-2.19.so
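The same intersection can also be expressed with Python sets. The following is a minimal sketch, not the original author's code: the function names parse_nm_symbols, run_nm and shared_symbols are made up for illustration, and stripping everything after "@" is a defensive assumption for nm builds that print versioned names such as puts@GLIBC_2.2.5.

```python
import subprocess


def parse_nm_symbols(nm_output):
    """Return the set of symbol names found in `nm -D` output."""
    symbols = set()
    for line in nm_output.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines instead of raising IndexError
        # The symbol name is the last field; drop an optional @VERSION suffix.
        symbols.add(fields[-1].split("@")[0])
    return symbols


def run_nm(flag, path):
    """Run `nm -D <flag> <path>` without going through a shell."""
    result = subprocess.run(["nm", "-D", flag, path],
                            capture_output=True, text=True, check=True)
    return result.stdout


def shared_symbols(needed_output, exported_output):
    """Symbols present in both listings, sorted for stable output."""
    return sorted(parse_nm_symbols(needed_output)
                  & parse_nm_symbols(exported_output))
```

On a real system the two listings would come from run_nm("--undefined-only", "a.out") and run_nm("--defined-only", "/lib/x86_64-linux-gnu/libc-2.19.so"). Passing an argument list to subprocess.run avoids the quoting problems that string concatenation with os.popen can run into when a path contains spaces.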
Drawing a dependency graph of Linux programs and libraries
libdep-pic.py
:
https://cloud.tencent.com/developer/article/1518085
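How it works
The dot drawing tool
Installing the dot drawing tool on Ubuntu (the dot command itself ships in the graphviz package, which the xdot package depends on):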
sudo apt-get install xdot
Write the data file test.dot:
digraph graphname {
a -> b -> c;
b -> d;
}
Run the following command to generate the image test.png:
dot -Tpng -o test.png test.dot
[Figure: test.png, the graph rendered from test.dot]
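A dependency tool has to produce this kind of DOT text from a list of edges. Here is a small sketch of that step; edges_to_dot is a hypothetical helper name, and it writes one edge per line rather than the chained a -> b -> c form used in test.dot.

```python
def edges_to_dot(edges, name="graphname"):
    """Render (source, target) pairs as the text of a DOT digraph."""
    lines = ["digraph %s {" % name]
    for src, dst in edges:
        # Quote node names: names such as libc.so.6 contain dots and
        # are not valid as bare DOT identifiers.
        lines.append('"%s" -> "%s";' % (src, dst))
    lines.append("}")
    return "\n".join(lines) + "\n"
```

Calling edges_to_dot([("a", "b"), ("b", "c"), ("b", "d")]) describes the same graph as test.dot, and the returned string can be written to a file and passed to dot -Tpng.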
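The ldd tool
ldd is an ordinary shell script. It lists the .so files that an ELF file depends on, as well as the .so files that those libraries depend on in turn.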
➜ which ldd
/usr/bin/ldd
➜ file /usr/bin/ldd
/usr/bin/ldd: Bourne-Again shell script, ASCII text executable
Usage:
➜ ldd /usr/lib/firefox/firefox
linux-vdso.so.1 => (0x00007fffef7e9000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f3b7f724000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f3b7f520000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f3b7f21c000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f3b7ef16000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f3b7eb51000)
/lib64/ld-linux-x86-64.so.2 (0x00007f3b7fb5e000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f3b7e93b000)
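In linux-vdso.so.1, vdso stands for Virtual Dynamic Shared Object. It is a virtual dynamic library that does not exist under /lib or /usr/lib.
firefox depends on libm.so.6 and others. If we run ldd again on libm.so.6, we can uncover the next level of dependencies, so building the complete dependency graph relies on recursion.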
➜ ldd /lib/x86_64-linux-gnu/libm.so.6
linux-vdso.so.1 => (0x00007ffe37f60000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fdffe5c1000)
/lib64/ld-linux-x86-64.so.2 (0x00007fdffec8c000)
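The recursion can be sketched in two small pieces: parsing one ldd listing, and walking the graph while remembering what has already been visited. This is an illustrative sketch under stated assumptions; parse_ldd and collect_edges are made-up names, and the lookup function is passed in as a parameter so that the walk can be tried on a fake graph without calling ldd.

```python
import re

# Matches "name => /absolute/path (0xaddr)". Lines without a path, such as
# the linux-vdso entry, and lines without "=>" do not match.
LDD_LINE = re.compile(r"=>\s*(/\S+)\s+\(")


def parse_ldd(output):
    """Return the resolved library paths from one ldd listing."""
    paths = []
    for line in output.splitlines():
        match = LDD_LINE.search(line)
        if match:
            paths.append(match.group(1))
    return paths


def collect_edges(root, deps_of):
    """Depth-first walk; deps_of(node) returns the direct dependencies."""
    seen = set()
    edges = []

    def visit(node):
        if node in seen:
            return  # a library shared by several users is analyzed once
        seen.add(node)
        for dep in deps_of(node):
            edges.append((node, dep))
            visit(dep)

    visit(root)
    return edges
```

In a real run, deps_of could be a function that executes ldd on the path and feeds the output to parse_ldd. Using a set for the visited nodes keeps the membership test cheap even for programs with many libraries.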
Implementation
#!/usr/bin/python3
# python3 is required: open() only accepts encoding= on Python 3
import sys, os, re
analyzedlist = []
# get the libs prog depends on and write the results into opened file f
def dep(f, prog):
    # one lib may be used by several users
    if prog in analyzedlist:
        return
    else:
        analyzedlist.append(prog)
    pname = prog.split('/')[-1]
    needed=os.popen("ldd "+prog)
    neededso=re.findall(r'[>](.*?)[(]', needed.read())
    for so in neededso:
        # strip the blanks the regex captures around the path
        so = so.strip()
        if(len(so) > 0):
            f.write('"' + pname + '" -> "' + so.split('/')[-1] + '";\n')
            dep(f, so)
def main(argv):
    f = open('/tmp/libdep.dot','w',encoding='utf-8')
    f.write('digraph graphname {\n')
    dep(f, argv)
    f.write('}\n')
    f.close()
    os.popen("dot -Tpng -o ./libdep.png /tmp/libdep.dot")
if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print ("usage: libdep-pic.py [program]")
Usage on an Ubuntu virtual machine:
./libdep-pic.py /usr/lib/firefox/firefox
[Figure: libdep.png, the dependency graph generated for firefox]
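The ldd tool
Introduction:
1) ldd is not a binary executable but just a shell script. It displays the dependencies of an executable module by setting a series of environment variables: LD_TRACE_LOADED_OBJECTS, LD_WARN, LD_BIND_NOW, LD_LIBRARY_VERSION, LD_VERBOSE and so on. When LD_TRACE_LOADED_OBJECTS is non-empty, any dynamically linked program that is started only prints its module dependencies and is not actually executed. You can try this in a shell: run export LD_TRACE_LOADED_OBJECTS=1, then run any program, such as ls, and look at the output.
2) The dependency listing that ldd shows is in fact produced by ld-linux.so, the loader for ELF dynamic libraries. ld-linux.so runs before the executable module and takes control first, so when the environment variables above are set, ld-linux.so prints the module's dependencies instead. ld-linux.so can also be executed directly, for example: /lib/ld-linux.so.2 --list program (this is equivalent to ldd program).
Source of the ldd script: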
#! /bin/bash
# Copyright (C) 1996-2014 Free Software Foundation, Inc.
# This file is part of the GNU C Library.
# The GNU C Library is free software; you can redistribute it and/or
# modify it under the terms of the GNU Lesser General Public
# License as published by the Free Software Foundation; either
# version 2.1 of the License, or (at your option) any later version.
# The GNU C Library is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# Lesser General Public License for more details.
# You should have received a copy of the GNU Lesser General Public
# License along with the GNU C Library; if not, see
# <http://www.gnu.org/licenses/>.
# This is the `ldd' command, which lists what shared libraries are
# used by given dynamically-linked executables. It works by invoking the
# run-time dynamic linker as a command and setting the environment
# variable LD_TRACE_LOADED_OBJECTS to a non-empty value.
# We should be able to find the translation right at the beginning.
TEXTDOMAIN=libc
TEXTDOMAINDIR=/usr/share/locale
RTLDLIST="/lib/ld-linux.so.2 /lib64/ld-linux-x86-64.so.2 /libx32/ld-linux-x32.so.2"
warn=
bind_now=
verbose=
while test $# -gt 0; do
case "$1" in
--vers | --versi | --versio | --version)
echo 'ldd (Ubuntu EGLIBC 2.19-0ubuntu6.6) 2.19'
printf $"Copyright (C) %s Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
" "2014"
printf $"Written by %s and %s.
" "Roland McGrath" "Ulrich Drepper"
exit 0
;;
--h | --he | --hel | --help)
echo $"Usage: ldd [OPTION]... FILE...
--help print this help and exit
--version print version information and exit
-d, --data-relocs process data relocations
-r, --function-relocs process data and function relocations
-u, --unused print unused direct dependencies
-v, --verbose print all information
"
printf $"For bug reporting instructions, please see:\\n%s.\\n" \
"<https://bugs.launchpad.net/ubuntu/+source/eglibc/+bugs>"
exit 0
;;
-d | --d | --da | --dat | --data | --data- | --data-r | --data-re | \
--data-rel | --data-relo | --data-reloc | --data-relocs)
warn=yes
shift
;;
-r | --f | --fu | --fun | --func | --funct | --functi | --functio | \
--function | --function- | --function-r | --function-re | --function-rel | \
--function-relo | --function-reloc | --function-relocs)
warn=yes
bind_now=yes
shift
;;
-v | --verb | --verbo | --verbos | --verbose)
verbose=yes
shift
;;
-u | --u | --un | --unu | --unus | --unuse | --unused)
unused=yes
shift
;;
--v | --ve | --ver)
echo >&2 $"ldd: option \`$1' is ambiguous"
exit 1
;;
--) # Stop option processing.
shift; break
;;
-*)
echo >&2 'ldd:' $"unrecognized option" "\`$1'"
echo >&2 $"Try \`ldd --help' for more information."
exit 1
;;
*)
break
;;
esac
done
nonelf ()
{
# Maybe extra code for non-ELF binaries.
return 1;
}
add_env="LD_TRACE_LOADED_OBJECTS=1 LD_WARN=$warn LD_BIND_NOW=$bind_now"
add_env="$add_env LD_LIBRARY_VERSION=\$verify_out"
add_env="$add_env LD_VERBOSE=$verbose"
if test "$unused" = yes; then
add_env="$add_env LD_DEBUG=\"$LD_DEBUG${LD_DEBUG:+,}unused\""
fi
# The following command substitution is needed to make ldd work in SELinux
# environments where the RTLD might not have permission to write to the
# terminal. The extra "x" character prevents the shell from trimming trailing
# newlines from command substitution results. This function is defined as a
# subshell compound list (using "(...)") to prevent parameter assignments from
# affecting the calling shell execution environment.
try_trace() (
output=$(eval $add_env '"$@"' 2>&1; rc=$?; printf 'x'; exit $rc)
rc=$?
printf '%s' "${output%x}"
return $rc
)
case $# in
0)
echo >&2 'ldd:' $"missing file arguments"
echo >&2 $"Try \`ldd --help' for more information."
exit 1
;;
1)
single_file=t
;;
*)
single_file=f
;;
esac
result=0
for file do
# We don't list the file name when there is only one.
test $single_file = t || echo "${file}:"
case $file in
*/*) :
;;
*) file=./$file
;;
esac
if test ! -e "$file"; then
echo "ldd: ${file}:" $"No such file or directory" >&2
result=1
elif test ! -f "$file"; then
echo "ldd: ${file}:" $"not regular file" >&2
result=1
elif test -r "$file"; then
RTLD=
ret=1
for rtld in ${RTLDLIST}; do
if test -x $rtld; then
dummy=`$rtld 2>&1`
if test $? = 127; then
verify_out=`${rtld} --verify "$file"`
ret=$?
case $ret in
[02]) RTLD=${rtld}; break;;
esac
fi
fi
done
case $ret in
0|2)
try_trace "$RTLD" "$file" || result=1
;;
1)
# This can be a non-ELF binary or no binary at all.
nonelf "$file" || {
echo $" not a dynamic executable"
result=1
}
;;
*)
echo 'ldd:' ${RTLD} $"exited with unknown exit code" "($ret)" >&2
exit 1
;;
esac
else
echo 'ldd:' $"error: you do not have read permission for" "\`$file'" >&2
result=1
fi
done
exit $result
# Local Variables:
# mode:ksh
# End:
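The key parts of the script above appear to be the try_trace() function and the add_env variable. The script first sets the environment variable LD_TRACE_LOADED_OBJECTS=1 and sets the other environment variables according to the command-line options. It then searches RTLDLIST for an ld-linux.so loader that exists in the current environment and accepts the file. Once one is found, the loader and the file to inspect are passed to try_trace(), which produces the output.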
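The loader selection loop can be restated in Python as follows. This is a simplified sketch, not a port of the script: pick_loader and its helper functions are hypothetical names, the check that the loader exits with status 127 when run without arguments is left out, and only the rule that a --verify exit status of 0 or 2 selects the loader is kept.

```python
import os
import subprocess

RTLDLIST = ["/lib/ld-linux.so.2",
            "/lib64/ld-linux-x86-64.so.2",
            "/libx32/ld-linux-x32.so.2"]


def _is_executable(path):
    """Mirror of `test -x` for a regular file."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


def _verify(rtld, target):
    """Exit status of `<rtld> --verify <target>`, as used by the ldd script."""
    completed = subprocess.run([rtld, "--verify", target],
                               stdout=subprocess.DEVNULL,
                               stderr=subprocess.DEVNULL)
    return completed.returncode


def pick_loader(candidates, target,
                is_executable=_is_executable, verify=_verify):
    """Return the first loader that exists and accepts target, else None."""
    for rtld in candidates:
        if is_executable(rtld) and verify(rtld, target) in (0, 2):
            return rtld
    return None
```

Calling pick_loader(RTLDLIST, "./a.out") on the machine used in this post would be expected to return the x86-64 loader, but that depends on which loaders are installed. The two checks are parameters so that the selection logic can be exercised without touching the file system.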
Simplified, the command that ldd runs is equivalent to the following:
➜ export LD_TRACE_LOADED_OBJECTS=1
➜ /lib64/ld-linux-x86-64.so.2 ./a.out
linux-vdso.so.1 => (0x00007ffe89dcb000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fb168b4b000)
/lib64/ld-linux-x86-64.so.2 (0x00007fb168f10000)
Once the variable is set, even an ordinary ls command prints its library dependencies:
➜ export LD_TRACE_LOADED_OBJECTS=1
➜ ls
linux-vdso.so.1 => (0x00007ffc2b7bb000)
libselinux.so.1 => /lib/x86_64-linux-gnu/libselinux.so.1 (0x00007f7675ee4000)
libacl.so.1 => /lib/x86_64-linux-gnu/libacl.so.1 (0x00007f7675cdc000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f7675917000)
libpcre.so.3 => /lib/x86_64-linux-gnu/libpcre.so.3 (0x00007f76756d9000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f76754d5000)
/lib64/ld-linux-x86-64.so.2 (0x00007f7676107000)
libattr.so.1 => /lib/x86_64-linux-gnu/libattr.so.1 (0x00007f76752d0000)
➜ unset LD_TRACE_LOADED_OBJECTS
After the LD_TRACE_LOADED_OBJECTS environment variable is unset, everything returns to normal.
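From a script, the variable can be set for a single child process only, so nothing has to be exported and unset afterwards. A minimal sketch, with trace_env and list_loaded_objects as made-up helper names:

```python
import os
import subprocess


def trace_env(base_env=None):
    """Return a copy of the environment with tracing switched on."""
    env = dict(os.environ if base_env is None else base_env)
    env["LD_TRACE_LOADED_OBJECTS"] = "1"
    return env


def list_loaded_objects(argv):
    """Start argv with tracing enabled and return what the loader prints."""
    result = subprocess.run(argv, env=trace_env(),
                            capture_output=True, text=True)
    return result.stdout
```

For example, list_loaded_objects(["ls"]) should return a listing like the one above on a glibc-based system. Because the modified environment is passed only to the child, the calling process and the surrounding shell are unaffected.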
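In some circumstances (for example, when the program specifies an ELF interpreter other than ld-linux.so), some versions of ldd may try to obtain the dependency information by executing the program directly. This can lead to execution of whatever code is defined in the program's ELF interpreter, and perhaps of the program itself. (For example, the upstream ldd implementation did this in glibc versions before 2.27, although most distributions shipped a modified version that did not.)
An alternative that does not run the program can therefore be used: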
➜ objdump -p ./a.out | grep NEEDED
NEEDED libc.so.6
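The NEEDED entries can be extracted in Python as well. The sketch below assumes objdump is installed; parse_needed and direct_dependencies are hypothetical names.

```python
import subprocess


def parse_needed(objdump_output):
    """Return the library names from the NEEDED lines of `objdump -p`."""
    needed = []
    for line in objdump_output.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "NEEDED":
            needed.append(fields[1])
    return needed


def direct_dependencies(path):
    """Direct dependencies of an ELF file, read without executing it."""
    result = subprocess.run(["objdump", "-p", path],
                            capture_output=True, text=True, check=True)
    return parse_needed(result.stdout)
```

Unlike ldd, this reports only the direct dependencies recorded in the file itself, as library names rather than resolved paths, so a recursive tool would still have to locate each library on disk.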
What really does the work behind the ldd script is the dynamic linker ld-linux.so.
ld-linux.so.X
See the relevant man pages:
- man ldd(http://www.kernel.org/doc/man-pages/online/pages/man1/ldd.1.html)
- man ld.so(http://www.kernel.org/doc/man-pages/online/pages/man8/ld.so.8.html)
- man ldconfig(http://www.kernel.org/doc/man-pages/online/pages/man8/ldconfig.8.html)
PDF copies of these web pages have been generated.
Related reference articles:
- Anatomy of Linux dynamic libraries (http://www.ibm.com/developerworks/cn/linux/l-dynamic-libraries/)
- Dissecting shared libraries (http://www.ibm.com/developerworks/cn/linux/l-shlibs.html)
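From here on, it comes down to the code of the dynamic linker itself. Understanding it properly will clearly take a lot more time and effort.
From material online I learned that ld.so is part of glibc. I downloaded the glibc-2.30.tar.gz source and found that I could not understand it at all. Never mind, my current skills are not yet up to reading this code, haha.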