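Has anyone else run into a crash caused by symbol interposition?

I had been in my new department for only a few days, and had barely downloaded the subsystem code, when a coredump issue landed on my desk.

Some non-essential details that might reveal product information have been redacted.

【Problem Report】

A report came in that the xyzd process had crashed. The tester's first look suggested it was related to a component owned by our team.

【Log Collection】

Connect to the environment and grab everything, quick and dirty: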

journalctl >xyzd_crash.log

[robot@xx-0:/var/lib/systemd/coredump]

$ ls

core.xyzd.760.23726ab5789e4c08a46dfaa540e075b8.29810.1597131685000000000000.lz4

After decompressing the core file, gdb shows the following call stack:

(gdb) bt

#0  0x00007ffff7f9139b in avahi_entry_group_free () from /lib64/libavahi-core.so.7

#1  0x000055555555cdf2 in xxoo::AvahiUserdata::~AvahiUserdata (this=0x7ffff645eb40, __in_chrg=<optimized out>)

    at common/avahi/AvahiManager.cpp:470

#2  0x000055555555d3af in xxoo::AvahiManager::ThreadFunc (this=0x55555558b3a0, avahi_mode=xxoo::AvahiMode::PUBLISH)

    at /usr/include/bits/syslog.h:31

#3  0x00007ffff79ac354 in execute_native_thread_routine () from /lib64/libstdc++.so.6

#4  0x00007ffff7ad74df in start_thread () from /lib64/libpthread.so.0

#5  0x00007ffff77f36a3 in clone () from /lib64/libc.so.6

【Analysis】

Stage 1:

The code I found on master did not line up with the line numbers in the backtrace:

#0  0x00007ffff7f9139b in avahi_entry_group_free () from /lib64/libavahi-core.so.7

#1  0x000055555555cdf2 in xyz::AvahiUserdata::~AvahiUserdata (this=0x7ffff645eb40, __in_chrg=<optimized out>)

    at common/avahi/AvahiManager.cpp:470

#2  0x000055555555d3af in xyz::AvahiManager::ThreadFunc (this=0x55555558b3a0, avahi_mode=xyz::AvahiMode::PUBLISH)

I downloaded the avahi source from the official site. The corresponding group structure is:

struct AvahiEntryGroup {
    char *path;
    AvahiEntryGroupState state;
    int state_valid;
    AvahiClient *client;
    AvahiEntryGroupCallback callback;
    void *userdata;
    AVAHI_LLIST_FIELDS(AvahiEntryGroup, groups);
};

The corresponding memory:

(gdb) p this->group_

$1 = (AvahiEntryGroup *) 0x7fffe80045d0

(gdb) x/100xw 0x7fffe80045d0

0x7fffe80045d0: 0xe8001280      0x00007fff        0x7fffe8001280  --path

                0x00000002        --state, i.e. AVAHI_ENTRY_GROUP_ESTABLISHED

                0x00000001    --state_valid, i.e. the state is valid

0x7fffe80045e0: 0xe8000c60      0x00007fff        0x7fffe8000c60     --client

                0x5555d130      0x00005555        0x55555555d130  --callback

0x7fffe80045f0: 0xf645eb40      0x00007fff        0x7ffff645eb40 --userdata

                0x00000000      0x00000000        --next --prev   both NULL, so this is the only node in the list
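The third column in the annotated dump is simply the two 32-bit words of each row glued back together. x86-64 is little-endian, so `x/xw` prints the low half of a pointer first. A minimal C sketch of that step, where the helper name `join_words` is made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Rebuild a 64-bit value from two consecutive 32-bit words as printed by
 * gdb's `x/xw` on a little-endian machine: the first word is the low half.
 * `join_words` is a hypothetical helper, not part of gdb or avahi. */
static uint64_t join_words(uint32_t first, uint32_t second)
{
    return ((uint64_t)second << 32) | first;
}

/* The group->path field from the dump above. */
static uint64_t example_path_pointer(void)
{
    uint64_t path = join_words(0xe8001280u, 0x00007fffu);
    assert(path == 0x7fffe8001280ull);
    return path;
}
```

Inside gdb, `x/gx` prints 8-byte words directly and avoids the manual step.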

The path contents are: /Client13/EntryGroup1

0x7fffe8001280: 47 '/'  67 'C'  108 'l' 105 'i' 101 'e' 110 'n' 116 't' 49 '1'

0x7fffe8001288: 51 '3'  47 '/'  69 'E'  110 'n' 116 't' 114 'r' 121 'y' 71 'G'

0x7fffe8001290: 114 'r' 111 'o' 117 'u' 112 'p' 49 '1'  0 '\000'
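As a quick sanity check on the path, the byte values from the dump can be read back as a C string. This is only a sketch: the array and the helper `decoded_path` are illustrative names, and the bytes are copied by hand from the gdb output.

```c
#include <assert.h>
#include <string.h>

/* Bytes copied from the dump, starting at the address held in group->path. */
static const unsigned char path_bytes[] = {
    47, 67, 108, 105, 101, 110, 116, 49,   /* "/Client1" */
    51, 47, 69, 110, 116, 114, 121, 71,    /* "3/EntryG" */
    114, 111, 117, 112, 49, 0              /* "roup1" plus the NUL terminator */
};

/* Returns the dump bytes viewed as a NUL-terminated C string. */
static const char *decoded_path(void)
{
    return (const char *)path_bytes;
}
```

In gdb, `x/s 0x7fffe8001280` prints the same string without the manual decoding.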

Client

struct AvahiClient {
    const AvahiPoll *poll_api;
    DBusConnection *bus;
    int error;
    AvahiClientState state;
    AvahiClientFlags flags;

    /* Cache for some seldom changing server data */
    char *version_string, *host_name, *host_name_fqdn, *domain_name;
    uint32_t local_service_cookie;
    int local_service_cookie_valid;

    AvahiClientCallback callback;
    void *userdata;

    AVAHI_LLIST_HEAD(AvahiEntryGroup, groups);
    AVAHI_LLIST_HEAD(AvahiDomainBrowser, domain_browsers);
    AVAHI_LLIST_HEAD(AvahiServiceBrowser, service_browsers);
    AVAHI_LLIST_HEAD(AvahiServiceTypeBrowser, service_type_browsers);
    AVAHI_LLIST_HEAD(AvahiServiceResolver, service_resolvers);
    AVAHI_LLIST_HEAD(AvahiHostNameResolver, host_name_resolvers);
    AVAHI_LLIST_HEAD(AvahiAddressResolver, address_resolvers);
    AVAHI_LLIST_HEAD(AvahiRecordBrowser, record_browsers);
};
 

(gdb) x/100xw 0x7fffe8000c60

0x7fffe8000c60: 0xe8000b60      0x00007fff       poll_api

                               0x00000000      0x00000000        bus

0x7fffe8000c70: 0xffffffe9         0x00000064                                    error

                              0x00000002                       state

                             0x00000000                      flags

                             0x00000000        0x00000000                           version_string

                              0x7fffe8000c80:    host_name

                              0x00000000     0x00000000                         host_name_fqdn

                             0x00000000      0x00000000     domain_name

                             0x00000000                            local_service_cookie

0x7fffe8000c90:      0x00000000      0x00000000 local_service_cookie_valid

                             0x5555d920      0x00005555    callback

0x7fffe8000cb0: 0xf645eb40      0x00007fff      userdata

                               0xe80045d0      0x00007fff       groups

Nothing looks wrong in the structures, so let's disassemble. According to the backtrace, the crash is at 0x00007ffff7f9139b.

int avahi_entry_group_free(AvahiEntryGroup *group) {

    AvahiClient *client = group->client;

    int r = AVAHI_OK;



    assert(group);



    if (group->path && avahi_client_is_connected(client))

        r = entry_group_simple_method_call(group, "Free");



    AVAHI_LLIST_REMOVE(AvahiEntryGroup, groups, client->groups, group);



    avahi_free(group->path);

    avahi_free(group);



    return r;

}

(gdb) disas avahi_entry_group_free

Dump of assembler code for function avahi_entry_group_free:

   0x00007ffff7f91380 <+0>:     push   %rbp

   0x00007ffff7f91381 <+1>:     push   %rbx

   0x00007ffff7f91382 <+2>:     sub    $0x8,%rsp

   0x00007ffff7f91386 <+6>:     test   %rdi,%rdi

   0x00007ffff7f91389 <+9>:     je     0x7ffff7f91464 <avahi_entry_group_free+228>

   0x00007ffff7f9138f <+15>:    mov    %rsi,%rbp

   0x00007ffff7f91392 <+18>:    test   %rsi,%rsi

   0x00007ffff7f91395 <+21>:    je     0x7ffff7f91445 <avahi_entry_group_free+197>

=> 0x00007ffff7f9139b <+27>:    mov    0x60(%rsi),%rsi

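At this point the analysis was actually very close to the answer.

There were two clues I did not notice at first.

Clue 1:  #0  0x00007ffff7f9139b in avahi_entry_group_free () from /lib64/libavahi-core.so.7

    Note that the crash is inside libavahi-core.so. I missed this clue because I did not know the avahi background: in fact, a client program like ours is expected to go through libavahi-client, not libavahi-core.

Clue 2:  I should have kept paging through the output of disas avahi_entry_group_free to get the complete listing. Without the knowledge there was no intuition, so I missed the answer again that day.

Stage 2:

Checking the source code once more, I finally found that avahi_entry_group_free has two implementations, one in libavahi-core and one in libavahi-client. Judging from our code, the one we should be calling is the libavahi-client version.

To confirm this from the disassembly, start with the registers: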

(gdb) info r

rax            0x0                 0

rbx            0x7ffff645eb40      140737325165376

rcx            0x7fffe80008d0      140737085704400

rdx            0x7fffe8001910      140737085708560

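rsi            0x7                 7    <- clearly not a valid pointer: the instruction at 0x00007ffff7f9139b reads from 0x60(%rsi) = 0x67, an unmapped address in the first page of the address space, which triggers the segmentation fault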

rdi            0x7fffe8004600      140737085720064

rbp            0x7                 0x7

rsp            0x7ffff645eaf0      0x7ffff645eaf0

r8             0x7fffe80008d8      140737085704408

r9             0x6                 6

r10            0x4000              16384

r11            0x0                 0

r12            0x55555558b3a0      93824992457632

r13            0x0                 0

r14            0x7ffff645eb78      140737325165432

r15            0x7ffff645eb2c      140737325165356

rip            0x7ffff7f9139b      0x7ffff7f9139b <avahi_entry_group_free+27>

eflags         0x10202             [ IF RF ]

cs             0x33                51

ss             0x2b                43

ds             0x0                 0

es             0x0                 0

fs             0x0                 0

gs             0x0                 0
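To make the fault concrete: the crashing instruction is `mov 0x60(%rsi),%rsi`, and `rsi` holds 0x7, so the CPU tries to read from address 0x67. The sketch below only redoes that arithmetic. The 4 KiB page size is an assumption (the usual x86-64 default), and both helper names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_ASSUMED 4096u  /* assumption: 4 KiB pages */

/* Effective address of an AT&T-syntax memory operand `disp(%base)`. */
static uint64_t effective_address(uint64_t base, uint64_t disp)
{
    return base + disp;
}

/* True when the address lies in the first page of the address space,
 * which Linux normally leaves unmapped so that stray near-NULL
 * pointers fault instead of reading data. */
static int in_first_page(uint64_t addr)
{
    return addr < PAGE_SIZE_ASSUMED;
}
```

With `rsi = 0x7`, `effective_address(0x7, 0x60)` is 0x67 and `in_first_page` reports true, whereas a heap pointer such as the value in `rdi` does not.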

The complete disassembly is as follows:

(gdb) disas avahi_entry_group_free

Dump of assembler code for function avahi_entry_group_free:

   0x00007ffff7f91380 <+0>:     push   %rbp

   0x00007ffff7f91381 <+1>:     push   %rbx

   0x00007ffff7f91382 <+2>:     sub    $0x8,%rsp

   0x00007ffff7f91386 <+6>:     test   %rdi,%rdi

   0x00007ffff7f91389 <+9>:     je     0x7ffff7f91464 <avahi_entry_group_free+228>

   0x00007ffff7f9138f <+15>:    mov    %rsi,%rbp

   0x00007ffff7f91392 <+18>:    test   %rsi,%rsi

   0x00007ffff7f91395 <+21>:    je     0x7ffff7f91445 <avahi_entry_group_free+197>

=> 0x00007ffff7f9139b <+27>:    mov    0x60(%rsi),%rsi

   0x00007ffff7f9139f <+31>:    mov    %rdi,%rbx

   0x00007ffff7f913a2 <+34>:    test   %rsi,%rsi

   0x00007ffff7f913a5 <+37>:    je     0x7ffff7f913c1 <avahi_entry_group_free+65>

   0x00007ffff7f913a7 <+39>:    nopw   0x0(%rax,%rax,1)

   0x00007ffff7f913b0 <+48>:    mov    %rbx,%rdi

   0x00007ffff7f913b3 <+51>:    callq  0x7ffff7f88c70 <avahi_entry_free@plt>

   0x00007ffff7f913b8 <+56>:    mov    0x60(%rbp),%rsi

   0x00007ffff7f913bc <+60>:    test   %rsi,%rsi

   0x00007ffff7f913bf <+63>:    jne    0x7ffff7f913b0 <avahi_entry_group_free+48>

   0x00007ffff7f913c1 <+65>:    mov    0x38(%rbp),%rdi

   0x00007ffff7f913c5 <+69>:    test   %rdi,%rdi

   0x00007ffff7f913c8 <+72>:    je     0x7ffff7f913cf <avahi_entry_group_free+79>

   0x00007ffff7f913ca <+74>:    callq  0x7ffff7f89170 <avahi_time_event_free@plt>   // with my half-baked knowledge, this looks like the key point

   0x00007ffff7f913cf <+79>:    mov    0x50(%rbp),%rdx

   0x00007ffff7f913d3 <+83>:    mov    0x58(%rbp),%rax

   0x00007ffff7f913d7 <+87>:    test   %rdx,%rdx

   0x00007ffff7f913da <+90>:    je     0x7ffff7f913e4 <avahi_entry_group_free+100>

   0x00007ffff7f913dc <+92>:    mov    %rax,0x58(%rdx)

   0x00007ffff7f913e0 <+96>:    mov    0x58(%rbp),%rax

   0x00007ffff7f913e4 <+100>:   test   %rax,%rax

   0x00007ffff7f913e7 <+103>:   je     0x7ffff7f91410 <avahi_entry_group_free+144>

   0x00007ffff7f913e9 <+105>:   mov    0x50(%rbp),%rdx

   0x00007ffff7f913ed <+109>:   mov    %rdx,0x50(%rax)

   0x00007ffff7f913f1 <+113>:   movq   $0x0,0x58(%rbp)

   0x00007ffff7f913f9 <+121>:   mov    %rbp,%rdi

   0x00007ffff7f913fc <+124>:   movq   $0x0,0x50(%rbp)

   0x00007ffff7f91404 <+132>:   add    $0x8,%rsp

   0x00007ffff7f91408 <+136>:   pop    %rbx

   0x00007ffff7f91409 <+137>:   pop    %rbp

   0x00007ffff7f9140a <+138>:   jmpq   0x7ffff7f893f0 <avahi_free@plt>

   0x00007ffff7f9140f <+143>:   nop

   0x00007ffff7f91410 <+144>:   cmp    0xf8(%rbx),%rbp

   0x00007ffff7f91417 <+151>:   jne    0x7ffff7f91426 <avahi_entry_group_free+166>

   0x00007ffff7f91419 <+153>:   mov    0x50(%rbp),%rax

   0x00007ffff7f9141d <+157>:   mov    %rax,0xf8(%rbx)

   0x00007ffff7f91424 <+164>:   jmp    0x7ffff7f913f1 <avahi_entry_group_free+113>

   0x00007ffff7f91426 <+166>:   lea    0x18c93(%rip),%rcx        # 0x7ffff7faa0c0

   0x00007ffff7f9142d <+173>:   mov    $0x6d,%edx

   0x00007ffff7f91432 <+178>:   lea    0x1862a(%rip),%rsi        # 0x7ffff7fa9a63

   0x00007ffff7f91439 <+185>:   lea    0x16dde(%rip),%rdi        # 0x7ffff7fa821e

   0x00007ffff7f91440 <+192>:   callq  0x7ffff7f88770 <__assert_fail@plt>

   0x00007ffff7f91445 <+197>:   lea    0x18c74(%rip),%rcx        # 0x7ffff7faa0c0

   0x00007ffff7f9144c <+204>:   mov    $0x65,%edx

   0x00007ffff7f91451 <+209>:   lea    0x1860b(%rip),%rsi        # 0x7ffff7fa9a63

   0x00007ffff7f91458 <+216>:   lea    0x19d3a(%rip),%rdi        # 0x7ffff7fab199

   0x00007ffff7f9145f <+223>:   callq  0x7ffff7f88770 <__assert_fail@plt>

   0x00007ffff7f91464 <+228>:   lea    0x18c55(%rip),%rcx        # 0x7ffff7faa0c0

   0x00007ffff7f9146b <+235>:   mov    $0x64,%edx

   0x00007ffff7f91470 <+240>:   lea    0x185ec(%rip),%rsi        # 0x7ffff7fa9a63

   0x00007ffff7f91477 <+247>:   lea    0x1b11c(%rip),%rdi        # 0x7ffff7fac59a

   0x00007ffff7f9147e <+254>:   callq  0x7ffff7f88770 <__assert_fail@plt>

Our own code expects to call the following function (the symbol lives in /usr/lib64/libavahi-client.so):

int avahi_entry_group_free(AvahiEntryGroup *group) {

    AvahiClient *client = group->client;

    int r = AVAHI_OK;



    assert(group);



    if (group->path && avahi_client_is_connected(client))

        r = entry_group_simple_method_call(group, "Free");



    AVAHI_LLIST_REMOVE(AvahiEntryGroup, groups, client->groups, group);



    avahi_free(group->path);

    avahi_free(group);



    return r;

}

But what actually got called is this one (the symbol lives in /usr/lib64/libavahi-core.so):

void avahi_entry_group_free(AvahiServer *s, AvahiSEntryGroup *g) {

    assert(s);

    assert(g);



    while (g->entries)

        avahi_entry_free(s, g->entries);



    if (g->register_time_event)

        avahi_time_event_free(g->register_time_event); // the call to this symbol in the disassembly is what confirmed that this is the function actually being executed



    AVAHI_LLIST_REMOVE(AvahiSEntryGroup, groups, s->groups, g);

    avahi_free(g);

}

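Working backwards: neither of the two arguments this function receives is what it expects, so any time execution reaches this path it is bound to fail.

Stage 3:

Only after asking a senior engineer on the team did I learn the rest.

configure.ac declares the dependencies, and the avahi libraries it checks for are: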

PKG_CHECK_MODULES([LIBAVAHI], [avahi-core avahi-client])

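avahi-core is not actually needed, as the include directories confirm, so it can be dropped from this list.

avahi is most likely built as a C library rather than a C++ one, so its exported symbols carry no parameter types in their names, and two functions with the same name but different signatures look identical to the linker. When our subsystem was linked, avahi_entry_group_free resolved to the libavahi-core function, which caused the crash. (I recall that the book Expert C Programming describes interposition.) I had previously seen global variables that accidentally shared a name across components and got mixed up, but a mix-up between functions with the same name and different parameters was a first in my career.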
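The effect of the library order can be illustrated with a toy model. This is not the real dynamic linker: it assumes only that a symbol lookup walks the libraries in link order and takes the first definition it finds, which is the behavior interposition relies on. The struct, the function `toy_resolve`, and the shortened export lists are all invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model: one shared object and a few of the names it exports. */
struct toy_lib {
    const char *name;
    const char *exports[4];   /* NULL-terminated */
};

/* Return the name of the first library, in search order, that exports
 * `symbol`, or NULL if none does. */
static const char *toy_resolve(const struct toy_lib *libs, size_t n,
                               const char *symbol)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < 4 && libs[i].exports[j] != NULL; j++)
            if (strcmp(libs[i].exports[j], symbol) == 0)
                return libs[i].name;
    return NULL;
}

/* Search order modelled on "avahi-core avahi-client". */
static const struct toy_lib core_first[] = {
    { "libavahi-core",   { "avahi_entry_group_free", "avahi_entry_free", NULL } },
    { "libavahi-client", { "avahi_entry_group_free", "avahi_client_new", NULL } },
};

/* Search order once avahi-core is dropped from the dependency list. */
static const struct toy_lib client_only[] = {
    { "libavahi-client", { "avahi_entry_group_free", "avahi_client_new", NULL } },
};
```

In this model, `toy_resolve(core_first, 2, "avahi_entry_group_free")` picks libavahi-core even though the caller was compiled against the client header, while the same lookup against `client_only` lands in libavahi-client.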

Some extra background

PKG_CONFIG_PATH=':/opt/xyz/lib64/pkgconfig'

pkg-config looks for the corresponding avahi-client.pc under this path:

[i9527@897a602fb1f3:/opt/xyz/lib64/pkgconfig]

$ cat avahi-client.pc

prefix=/usr

exec_prefix=${prefix}

libdir=/usr/lib64

includedir=${prefix}/include

Name: avahi-client

Description: Avahi Multicast DNS Responder (Client Support)

Version: 0.7

Libs: -L${libdir} -lavahi-common -lavahi-client

Cflags: -D_REENTRANT -I${includedir}

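The last two lines above (Libs and Cflags) are what pkg-config hands to the build.

 -L adds the /usr/lib64 directory to the library search path, and the -l options link the two libraries avahi-common and avahi-client.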
