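Analysis of the intermittent 4026 error on ARM

Problem description

The case pl_dbt2_mysql.test intermittently fails with error 4026 on ARM, but never on x86. The failing statement is: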

call order_status(0, 1, 5, 'ESEBARABLE');

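Inspecting the case shows that 'ESEBARABLE' does not exist in the customer table at all, so the select in this case is certain to raise a 4026 exception. However, order_status defines an exception handler that catches error 4026, so under normal conditions the exception is caught and no 4026 error is reported. A 4026 error on ARM therefore means the exception was not caught, i.e. the exception mechanism is misbehaving on ARM.

The error is not deterministic: roughly 6 out of 10 runs hit it. Once it has appeared, re-running call order_status(0, 1, 5, 'ESEBARABLE'); fails every time; once a run has succeeded, re-running the same statement succeeds every time, until the next reboot.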

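Initial analysis

  • ACTION:

Trace the personality function to see why ObPLEH::match_action_value found no matching condition to handle this error code.

  • Result:

A breakpoint on ObPLEH::eh_personality is never hit, so the personality routine is never entered at all.

  • Conclusion:

_Unwind_RaiseException fails before it reaches the personality routine. We need to find out how it fails.

  • ACTION:

Break inside _Unwind_RaiseException to see where it goes wrong.

  • Result:

When the breakpoint hits there are no symbols: this _Unwind_RaiseException comes from a release build without debug information. Reading the _Unwind_RaiseException source shows two places that can return early before the personality routine is called.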

101       if (code == _URC_END_OF_STACK)
102         /* Hit end of stack with no handler found.  */
103         return _URC_END_OF_STACK;
104
105       if (code != _URC_NO_REASON)
106         /* Some error encountered.  Usually the unwinder doesn't
107            diagnose these and merely crashes.  */
108         return _URC_FATAL_PHASE1_ERROR;

Adding logging in LLVM to print the return value of _Unwind_RaiseException shows that it is _URC_END_OF_STACK. The underlying cause is still unknown.

  • Conclusion:

We need a debug build of _Unwind_RaiseException.
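
The logging step above can be approximated with a small helper that turns the return value into a readable name. This is only a sketch: urc_name is a hypothetical helper, not part of libgcc and not the actual change made in LLVM. It assumes the enumerators declared in <unwind.h> as shipped with GCC and Clang for x86-64 and AArch64 Linux.

```c
#include <unwind.h>

/* Hypothetical helper (not part of libgcc): map an _Unwind_Reason_Code
   to its enumerator name so that it can be written to a log.  */
const char *
urc_name (_Unwind_Reason_Code code)
{
  switch (code)
    {
    case _URC_NO_REASON:          return "_URC_NO_REASON";
    case _URC_FATAL_PHASE2_ERROR: return "_URC_FATAL_PHASE2_ERROR";
    case _URC_FATAL_PHASE1_ERROR: return "_URC_FATAL_PHASE1_ERROR";
    case _URC_END_OF_STACK:       return "_URC_END_OF_STACK";
    case _URC_HANDLER_FOUND:      return "_URC_HANDLER_FOUND";
    case _URC_INSTALL_CONTEXT:    return "_URC_INSTALL_CONTEXT";
    case _URC_CONTINUE_UNWIND:    return "_URC_CONTINUE_UNWIND";
    default:                      return "(other)";
    }
}
```

Printing urc_name (code) right after the call that raises the exception is enough to tell the two early exits apart: the first yields _URC_END_OF_STACK, the second _URC_FATAL_PHASE1_ERROR.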

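Where does _Unwind_RaiseException live?

Running nm on observer shows _Unwind_RaiseException with type t, i.e. a symbol defined in the text section (lowercase because it is local to the binary). So the definition and implementation of _Unwind_RaiseException are contained in observer itself.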

$nm observer | grep _Unwind_RaiseException
00000000129ce940 t _Unwind_RaiseException
00000000129ce340 t _Unwind_RaiseException_Phase2

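Who brings _Unwind_RaiseException into observer?

Since _Unwind_RaiseException is compiled into observer, it must have come in through static linking. Running nm over every .a file under rpm/.dep_create shows that _Unwind_RaiseException is implemented in libgcc_eh.a.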

$find -name "*.a" | xargs nm --print-file-name | grep _Unwind_RaiseException
./var/usr/local/gcc-5.2.0/lib/gcc/x86_64-unknown-linux-gnu/5.2.0/libgcc_eh.a:unwind-dw2.o:00000000000025b0 T _Unwind_RaiseException
./var/usr/local/gcc-5.2.0/lib/gcc/x86_64-unknown-linux-gnu/5.2.0/libgcc_eh.a:unwind-dw2.o:0000000000001fb0 t _Unwind_RaiseException_Phase2

After deleting observer and running make VERBOSE=1, the link command for observer becomes visible. libgcc_eh.a is not among the static archives listed on it.

[100%] Linking CXX executable observer
cd /data/2/ryan.ly/oceanbase/build_debug/src/observer && /data/2/ryan.ly/oceanbase/rpm/.dep_create/var/usr/bin/cmake -E cmake_link_script CMakeFiles/observer.dir/link.txt --verbose=1
/data/2/ryan.ly/oceanbase/rpm/.dep_create/var/usr/bin/clang++  --gcc-toolchain=/data/2/ryan.ly/oceanbase/rpm/.dep_create/var/usr/local/gcc-5.2.0 -fdebug-prefix-map=/data/2/ryan.ly/oceanbase=. -fcolor-diagnostics -g -fuse-ld=/data/2/ryan.ly/oceanbase/rpm/.dep_create/var/usr/bin/ld.lld --gcc-toolchain=/data/2/ryan.ly/oceanbase/rpm/.dep_create/var/usr/local/gcc-5.2.0 -fdebug-prefix-map=/data/2/ryan.ly/oceanbase=. -Wl,-z,noexecstack CMakeFiles/observer.dir/main.cpp.o  -o observer  -Wl,--start-group liboceanbase_static.a ../sql/libob_sql_static.a -Wl,--end-group -static-libgcc -static-libstdc++ ../sql/parser/libob_sql_server_parser_static.a ../../../rpm/.dep_create/lib/libbfd.a ../../../rpm/.dep_create/lib/libiberty.a ../objit/src/libobjit.a ../../deps/oblib/src/liboblib.a ../../../rpm/.dep_create/var/u01/mongodb_aliws/lib/libAliWS.a ../../../rpm/.dep_create/lib/liboss_c_sdk_static.a ../../../rpm/.dep_create/lib/libaprutil-1.a ../../../rpm/.dep_create/lib/libapr-1.a ../../../rpm/.dep_create/lib/libmxml.a ../../../rpm/.dep_create/var/usr/local/lib/libvsclient.a ../../../rpm/.dep_create/var/usr/lib/libeasy.a ../../../rpm/.dep_create/var/usr/lib64/libisal.a ../../../rpm/.dep_create/var/u01/mysql_current/lib/libmysqlclient.a -L/data/2/ryan.ly/oceanbase/rpm/.dep_create/var/usr/lib64/ -L/data/2/ryan.ly/oceanbase/rpm/.dep_create/var/usr/lib -lexpat -laio -lcurl -lssl -lcrypto -l:libboost_system.a -l:libboost_thread.a -latomic ../../../rpm/.dep_create/var/usr/lib/libLLVMIRReader.a ../../../rpm/.dep_create/var/usr/lib/libLLVMAsmParser.a ../../../rpm/.dep_create/var/usr/lib/libLLVMOrcJIT.a ../../../rpm/.dep_create/var/usr/lib/libLLVMMCJIT.a ../../../rpm/.dep_create/var/usr/lib/libLLVMExecutionEngine.a ../../../rpm/.dep_create/var/usr/lib/libLLVMAArch64CodeGen.a ../../../rpm/.dep_create/var/usr/lib/libLLVMAsmPrinter.a ../../../rpm/.dep_create/var/usr/lib/libLLVMGlobalISel.a ../../../rpm/.dep_create/var/usr/lib/libLLVMSelectionDAG.a 
../../../rpm/.dep_create/var/usr/lib/libLLVMCodeGen.a ../../../rpm/.dep_create/var/usr/lib/libLLVMScalarOpts.a ../../../rpm/.dep_create/var/usr/lib/libLLVMInstCombine.a ../../../rpm/.dep_create/var/usr/lib/libLLVMTransformUtils.a ../../../rpm/.dep_create/var/usr/lib/libLLVMAArch64AsmParser.a ../../../rpm/.dep_create/var/usr/lib/libLLVMAArch64Desc.a ../../../rpm/.dep_create/var/usr/lib/libLLVMAArch64AsmPrinter.a ../../../rpm/.dep_create/var/usr/lib/libLLVMAArch64Info.a ../../../rpm/.dep_create/var/usr/lib/libLLVMAArch64Utils.a ../../../rpm/.dep_create/var/usr/lib/libLLVMRuntimeDyld.a ../../../rpm/.dep_create/var/usr/lib/libLLVMBitWriter.a ../../../rpm/.dep_create/var/usr/lib/libLLVMObjectYAML.a ../../../rpm/.dep_create/var/usr/lib/libLLVMDebugInfoCodeView.a ../../../rpm/.dep_create/var/usr/lib/libLLVMDebugInfoMSF.a ../../../rpm/.dep_create/var/usr/lib/libLLVMTarget.a ../../../rpm/.dep_create/var/usr/lib/libLLVMAnalysis.a ../../../rpm/.dep_create/var/usr/lib/libLLVMObject.a ../../../rpm/.dep_create/var/usr/lib/libLLVMBitReader.a ../../../rpm/.dep_create/var/usr/lib/libLLVMMCParser.a ../../../rpm/.dep_create/var/usr/lib/libLLVMMC.a ../../../rpm/.dep_create/var/usr/lib/libLLVMProfileData.a ../../../rpm/.dep_create/var/usr/lib/libLLVMCore.a ../../../rpm/.dep_create/var/usr/lib/libLLVMBinaryFormat.a ../../../rpm/.dep_create/var/usr/lib/libLLVMSupport.a -lz -lrt -ldl -ltinfo -lm ../../../rpm/.dep_create/var/usr/lib/libLLVMDemangle.a -lrt -ldl -ltinfo -lpthread -lz -lm

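Hypothesis:

-static-libgcc links in both static archives, libgcc and libgcc_eh.

Verification:

Change -static-libgcc to -shared-libgcc and rebuild observer. nm observer now shows _Unwind_RaiseException with type U, meaning it is only referenced and not defined in the binary. This demonstrates two things:

  • Symbols that observer merely imports from a .so are not listed as T/t by nm; they show up as U
  • -static-libgcc brings in both static archives, libgcc and libgcc_eh

$nm observer | grep _Unwind_RaiseException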
                 U _Unwind_RaiseException

Building a DEBUG version of LIBGCC

See: https://yuque.antfin-inc.com/ob/plsql/dfgbrx

Analyzing _Unwind_RaiseException

_Unwind_RaiseException is implemented in libgcc/unwind.inc, starting at line 82.

 82 _Unwind_RaiseException(struct _Unwind_Exception *exc)
 83 {
 84   struct _Unwind_Context this_context, cur_context;
 85   _Unwind_Reason_Code code;
 86
 87   /* Set up this_context to describe the current stack frame.  */
 88   uw_init_context (&this_context);
 89   cur_context = this_context;
 90
 91   /* Phase 1: Search.  Unwind the stack, calling the personality routine
 92      with the _UA_SEARCH_PHASE flag set.  Do not modify the stack yet.  */
 93   while (1)
 94     {
 95       _Unwind_FrameState fs;
 96
 97       /* Set up fs to describe the FDE for the caller of cur_context.  The
 98          first time through the loop, that means __cxa_throw.  */
 99       code = uw_frame_state_for (&cur_context, &fs);
100
101       if (code == _URC_END_OF_STACK)
102         /* Hit end of stack with no handler found.  */
103         return _URC_END_OF_STACK;
104
105       if (code != _URC_NO_REASON)
106         /* Some error encountered.  Usually the unwinder doesn't
107            diagnose these and merely crashes.  */
108         return _URC_FATAL_PHASE1_ERROR;
109
110       /* Unwind successful.  Run the personality routine, if any.  */
111       if (fs.personality)
112         {
113           code = (*fs.personality) (1, _UA_SEARCH_PHASE, exc->exception_class,
114                                     exc, &cur_context);
115           if (code == _URC_HANDLER_FOUND)
116             break;
117           else if (code != _URC_CONTINUE_UNWIND)
118             return _URC_FATAL_PHASE1_ERROR;
119         }
120
121       /* Update cur_context to describe the same frame as fs.  */
122       uw_update_context (&cur_context, &fs);
123     }
124
125   /* Indicate to _Unwind_Resume and associated subroutines that this
126      is not a forced unwind.  Further, note where we found a handler.  */
127   exc->private_1 = 0;
128   exc->private_2 = uw_identify_context (&cur_context);
129
130   cur_context = this_context;
131   code = _Unwind_RaiseException_Phase2 (exc, &cur_context);
132   if (code != _URC_INSTALL_CONTEXT)
133     return code;
134
135   uw_install_context (&this_context, &cur_context);
136 }

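Stepping with n shows that the function returns at line 103, which matches the earlier finding. Next we need to understand why uw_frame_state_for returns an unexpected code.

Analyzing uw_frame_state_for

The code is in libgcc/unwind-dw2.c.

Stepping with n shows that the failure happens because the _Unwind_Find_FDE call at line 1249 returns NULL for fde.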

1235 static _Unwind_Reason_Code
1236 uw_frame_state_for (struct _Unwind_Context *context, _Unwind_FrameState *fs)
1237 {
1238   const struct dwarf_fde *fde;
1239   const struct dwarf_cie *cie;
1240   const unsigned char *aug, *insn, *end;
1241
1242   memset (fs, 0, sizeof (*fs));
1243   context->args_size = 0;
1244   context->lsda = 0;
1245
1246   if (context->ra == 0)
1247     return _URC_END_OF_STACK;
1248
1249   fde = _Unwind_Find_FDE (context->ra + _Unwind_IsSignalFrame (context) - 1,
1250                           &context->bases);
1251   if (fde == NULL)
1252     {
1253 #ifdef MD_FALLBACK_FRAME_STATE_FOR
1254       /* Couldn't find frame unwind info for this function.  Try a
1255          target-specific fallback mechanism.  This will necessarily
1256          not provide a personality routine or LSDA.  */
1257       return MD_FALLBACK_FRAME_STATE_FOR (context, fs);
1258 #else
1259       return _URC_END_OF_STACK;
1260 #endif
1261     }
1262
1263   fs->pc = context->bases.func;
1264
1265   cie = get_cie (fde);
1266   insn = extract_cie_info (cie, context, fs);
1267   if (insn == NULL)
1268     /* CIE contained unknown augmentation.  */
1269     return _URC_FATAL_PHASE1_ERROR;
1270
1271   /* First decode all the insns in the CIE.  */
1272   end = (const unsigned char *) next_fde ((const struct dwarf_fde *) cie);
1273   execute_cfa_program (insn, end, context, fs);
1274
1275   /* Locate augmentation for the fde.  */
1276   aug = (const unsigned char *) fde + sizeof (*fde);
1277   aug += 2 * size_of_encoded_value (fs->fde_encoding);
1278   insn = NULL;
1279   if (fs->saw_z)
1280     {
1281       _uleb128_t i;
1282       aug = read_uleb128 (aug, &i);
1283       insn = aug + i;
1284     }
1285   if (fs->lsda_encoding != DW_EH_PE_omit)
1286     {
1287       _Unwind_Ptr lsda;
1288
1289       aug = read_encoded_value (context, fs->lsda_encoding, aug, &lsda);
1290       context->lsda = (void *) lsda;
1291     }
1292
1293   /* Then the insns in the FDE up to our target PC.  */
1294   if (insn == NULL)
1295     insn = aug;
1296   end = (const unsigned char *) next_fde (fde);
1297   execute_cfa_program (insn, end, context, fs);
1298
1299   return _URC_NO_REASON;
1300 }

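Analyzing _Unwind_Find_FDE

It is implemented at line 452 of libgcc/unwind-dw2-fde-dip.c. The code first calls _Unwind_Find_registered_FDE and only falls back to dl_iterate_phdr when that returns NULL.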

452 const fde *
453 _Unwind_Find_FDE (void *pc, struct dwarf_eh_bases *bases)
454 {
455   struct unw_eh_callback_data data;
456   const fde *ret;
457
458   ret = _Unwind_Find_registered_FDE (pc, bases);
459   if (ret != NULL)
460     return ret;
461
462   data.pc = (_Unwind_Ptr) pc;
463   data.tbase = NULL;
464   data.dbase = NULL;
465   data.func = NULL;
466   data.ret = NULL;
467   data.check_cache = 1;
468
469   if (dl_iterate_phdr (_Unwind_IteratePhdrCallback, &data) < 0)
470     return NULL;
471
472   if (data.ret)
473     {
474       bases->tbase = data.tbase;
475       bases->dbase = data.dbase;
476       bases->func = data.func;
477     }
478   return data.ret;
479 }

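Analyzing _Unwind_Find_registered_FDE

It is defined at line 1026 of unwind-dw2-fde.c. In the source the function is spelled _Unwind_Find_FDE; it becomes _Unwind_Find_registered_FDE because unwind-dw2-fde-dip.c defines a renaming macro before including that file.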

libgcc/unwind-dw2-fde-dip.c

95 #define _Unwind_Find_FDE _Unwind_Find_registered_FDE
96 #include "unwind-dw2-fde.c"
97 #undef _Unwind_Find_FDE

libgcc/unwind-dw2-fde.c

1026 const fde *
1027 _Unwind_Find_FDE (void *pc, struct dwarf_eh_bases *bases)
1028 {
1029   struct object *ob;
1030   const fde *f = NULL;
1031
1032 #ifdef ATOMIC_FDE_FAST_PATH
1033   /* For targets where unwind info is usually not registered through these
1034      APIs anymore, avoid taking a global lock.
1035      Use relaxed MO here, it is up to the app to ensure that the library
1036      loading/initialization happens-before using that library in other
1037      threads (in particular unwinding with that library's functions
1038      appearing in the backtraces).  Calling that library's functions
1039      without waiting for the library to initialize would be racy.  */
1040   if (__builtin_expect (!__atomic_load_n (&any_objects_registered,
1041                                           __ATOMIC_RELAXED), 1))
1042     return NULL;
1043 #endif
1044
1045   init_object_mutex_once ();
1046   __gthread_mutex_lock (&object_mutex);
1047
1048   /* Linear search through the classified objects, to find the one
1049      containing the pc.  Note that pc_begin is sorted descending, and
1050      we expect objects to be non-overlapping.  */
1051   for (ob = seen_objects; ob; ob = ob->next)
1052     if (pc >= ob->pc_begin)
1053       {
1054         f = search_object (ob, pc);
1055         if (f)
1056           goto fini;
1057         break;
1058       }
1059
1060   /* Classify and search the objects we've not yet processed.  */
1061   while ((ob = unseen_objects))
1062     {
1063       struct object **p;
1064
1065       unseen_objects = ob->next;
1066       f = search_object (ob, pc);
1067
1068       /* Insert the object into the classified list.  */
1069       for (p = &seen_objects; *p ; p = &(*p)->next)
1070         if ((*p)->pc_begin < ob->pc_begin)
1071           break;
1072       ob->next = *p;
1073       *p = ob;
1074
1075       if (f)
1076         goto fini;
1077     }
1078
1079  fini:
1080   __gthread_mutex_unlock (&object_mutex);
1081
1082   if (f)
1083     {
1084       int encoding;
1085       _Unwind_Ptr func;
1086
1087       bases->tbase = ob->tbase;
1088       bases->dbase = ob->dbase;
1089
1090       encoding = ob->s.b.encoding;
1091       if (ob->s.b.mixed_encoding)
1092         encoding = get_fde_encoding (f);
1093       read_encoded_value_with_base (encoding, base_from_object (encoding, ob),
1094                                     f->pc_begin, &func);
1095       bases->func = (void *) func;
1096     }
1097
1098   return f;
1099 }

seen_objects has two entries and unseen_objects is NULL. seen_objects is a linked list chained through the next pointers.

(gdb) p seen_objects
$2 = (struct object *) 0x7f0695cc9760
(gdb) p *seen_objects
$3 = {pc_begin = 0x7f059a032000, tbase = 0x0, dbase = 0x0, u = {single = 0x7f0695f81de0, array = 0x7f0695f81de0, sort = 0x7f0695f81de0}, s = {b = {sorted = 1, from_array = 0, mixed_encoding = 0, encoding = 28, count = 1}, i = 2273}, next = 0x7f0695527320}
(gdb) p *seen_objects->next
$4 = {pc_begin = 0x7f04175fb000, tbase = 0x0, dbase = 0x0, u = {single = 0x7f0695489110, array = 0x7f0695489110, sort = 0x7f0695489110}, s = {b = {sorted = 1, from_array = 0, mixed_encoding = 0, encoding = 28, count = 1}, i = 2273}, next = 0x0}

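Stepping with n shows that when the failure occurs on ARM, no entry in the for loop at line 1051 satisfies pc >= ob->pc_begin. On x86 the condition is satisfied and search_object is entered.

We know that pc refers to the current instruction. Our guess was that pc_begin is the address of the first instruction of some frame, which would make seen_objects the call stack at the moment the exception is thrown. Under that guess it is hard to explain why there are only two frames.

Conclusion:

At this point we have a preliminary conclusion: the lookup of the frame by pc goes wrong on ARM. There are two possibilities:

  • pc is wrong
  • pc_begin in seen_objects is wrong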

First suspect: pc is wrong

From the call to _Unwind_Find_FDE in uw_frame_state_for we can see that its first argument, pc, is computed as context->ra + _Unwind_IsSignalFrame (context) - 1. On both x86 and ARM, p _Unwind_IsSignalFrame (context) prints 0, so pc is simply context->ra - 1.
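
The reason for the "- 1" is that a return address points at the instruction after the call. Subtracting one byte keeps the lookup address inside the calling instruction, and therefore inside the caller's FDE range even when the call is the last instruction of the function. The arithmetic can be sketched as follows; lookup_pc is a hypothetical name, and the sketch assumes a 64-bit target.

```c
#include <stdint.h>

/* Sketch of the address arithmetic only; lookup_pc is a hypothetical
   name.  For a signal frame the saved address is the interrupted
   instruction itself, so the adjustment cancels out.  */
uintptr_t
lookup_pc (uintptr_t ra, int is_signal_frame)
{
  return ra + (is_signal_frame ? 1 : 0) - 1;
}
```

For example, a return address of 0x7f032da9d82b in an ordinary frame gives a lookup address of 0x7f032da9d82a.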

How is ra computed?

Grepping under libgcc shows that ra is used in only a few places, all of them in unwind-dw2.c.

$find -name "*.c" | xargs grep "\->ra"
./unwind-dw2.c:  return (_Unwind_Ptr) context->ra;
./unwind-dw2.c:  return (_Unwind_Ptr) context->ra;
./unwind-dw2.c:  context->ra = (void *) val;
./unwind-dw2.c:	 && fs->pc < context->ra + _Unwind_IsSignalFrame (context))
./unwind-dw2.c:  if (context->ra == 0)
./unwind-dw2.c:  fde = _Unwind_Find_FDE (context->ra + _Unwind_IsSignalFrame (context) - 1,
./unwind-dw2.c:    /* uw_frame_state_for uses context->ra == 0 check to find outermost
./unwind-dw2.c:    context->ra = 0;
./unwind-dw2.c:    context->ra = __builtin_extract_return_addr
./unwind-dw2.c:  context->ra = ra;
./unwind-dw2.c:  context->ra = __builtin_extract_return_addr (outer_ra);
./unwind-dw2.c:      void *handler = __builtin_frob_return_addr ((TARGET)->ra);	\

Four of these are assignments: lines 1514 and 1518 in uw_update_context, and lines 1558 and 1588 in uw_init_context_1.

1499 static void
1500 uw_update_context (struct _Unwind_Context *context, _Unwind_FrameState *fs)
1501 {
1502   uw_update_context_1 (context, fs);
1503
1504   /* In general this unwinder doesn't make any distinction between
1505      undefined and same_value rule.  Call-saved registers are assumed
1506      to have same_value rule by default and explicit undefined
1507      rule is handled like same_value.  The only exception is
1508      DW_CFA_undefined on retaddr_column which is supposed to
1509      mark outermost frame in DWARF 3.  */
1510   if (fs->regs.reg[DWARF_REG_TO_UNWIND_COLUMN (fs->retaddr_column)].how
1511       == REG_UNDEFINED)
1512     /* uw_frame_state_for uses context->ra == 0 check to find outermost
1513        stack frame.  */
1514     context->ra = 0;
1515   else
1516     /* Compute the return address now, since the return address column
1517        can change from frame to frame.  */
1518     context->ra = __builtin_extract_return_addr
1519       (_Unwind_GetPtr (context, fs->retaddr_column));
1520 }

1548 static void __attribute__((noinline))
1549 uw_init_context_1 (struct _Unwind_Context *context,
1550                    void *outer_cfa, void *outer_ra)
1551 {
1552   void *ra = __builtin_extract_return_addr (__builtin_return_address (0));
1553   _Unwind_FrameState fs;
1554   _Unwind_SpTmp sp_slot;
1555   _Unwind_Reason_Code code;
1556
1557   memset (context, 0, sizeof (struct _Unwind_Context));
1558   context->ra = ra;
1559   if (!ASSUME_EXTENDED_UNWIND_CONTEXT)
1560     context->flags = EXTENDED_CONTEXT_BIT;
1561
1562   code = uw_frame_state_for (context, &fs);
1563   gcc_assert (code == _URC_NO_REASON);
1564
1565 #if __GTHREADS
1566   {
1567     static __gthread_once_t once_regsizes = __GTHREAD_ONCE_INIT;
1568     if (__gthread_once (&once_regsizes, init_dwarf_reg_size_table) != 0
1569         && dwarf_reg_size_table[0] == 0)
1570       init_dwarf_reg_size_table ();
1571   }
1572 #else
1573   if (dwarf_reg_size_table[0] == 0)
1574     init_dwarf_reg_size_table ();
1575 #endif
1576
1577   /* Force the frame state to use the known cfa value.  */
1578   _Unwind_SetSpColumn (context, outer_cfa, &sp_slot);
1579   fs.regs.cfa_how = CFA_REG_OFFSET;
1580   fs.regs.cfa_reg = __builtin_dwarf_sp_column ();
1581   fs.regs.cfa_offset = 0;
1582
1583   uw_update_context_1 (context, &fs);
1584
1585   /* If the return address column was saved in a register in the
1586      initialization context, then we can't see it in the given
1587      call frame data.  So have the initialization context tell us.  */
1588   context->ra = __builtin_extract_return_addr (outer_ra);
1589 }

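Set breakpoints on both uw_update_context and uw_init_context_1 and run the test program: it stops in uw_init_context_1 right away. Looking back at _Unwind_RaiseException, the call on line 88 is what invokes uw_init_context_1, so context->ra is assigned in uw_init_context_1.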

(gdb) bt
#0  uw_init_context_1 (context=0x7f01afdb7f10, outer_cfa=0x7f01afdb8070, outer_ra=0x7f01a8ad682b) at ../.././libgcc/unwind-dw2.c:1565
#1  0x000000000d1b51a1 in _Unwind_RaiseException (exc=0x7f01d09b1a20) at ../.././libgcc/unwind.inc:88

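A Google search for __builtin_extract_return_addr turned up this page: Jiong Wang - [5/5][AArch64, libgcc] Runtime support for AArch64 DWARF operations. It looks as if libgcc gained additional AArch64 support starting from some version.

Looking for a newer libgcc

We currently use 5.2.0. I downloaded the latest release of each major series (5.5, 6.5, 7.5, 8.4, 9.3) and grepped for __builtin_aarch64_xpaclri. The 7.5 sources contain that definition. On closer inspection, the 7.5 implementation of uw_init_context_1 adds MD_POST_EXTRACT_ROOT_ADDR, which applies a further correction to the computed ra on AArch64. This seemed close to the truth: most likely a bug in gcc 5.2 that makes the pc computation go wrong.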

1561 static void __attribute__((noinline))
1562 uw_init_context_1 (struct _Unwind_Context *context,
1563                    void *outer_cfa, void *outer_ra)
1564 {
1565   void *ra = __builtin_extract_return_addr (__builtin_return_address (0));
1566 #ifdef MD_POST_EXTRACT_ROOT_ADDR
1567   ra = MD_POST_EXTRACT_ROOT_ADDR (ra);
1568 #endif
1569   _Unwind_FrameState fs;
1570   _Unwind_SpTmp sp_slot;
1571   _Unwind_Reason_Code code;
1572
1573   memset (context, 0, sizeof (struct _Unwind_Context));
1574   context->ra = ra;
1575   if (!ASSUME_EXTENDED_UNWIND_CONTEXT)
1576     context->flags = EXTENDED_CONTEXT_BIT;
1577
1578   code = uw_frame_state_for (context, &fs);
1579   gcc_assert (code == _URC_NO_REASON);
1580
1581 #if __GTHREADS
1582   {
1583     static __gthread_once_t once_regsizes = __GTHREAD_ONCE_INIT;
1584     if (__gthread_once (&once_regsizes, init_dwarf_reg_size_table) != 0
1585         && dwarf_reg_size_table[0] == 0)
1586       init_dwarf_reg_size_table ();
1587   }
1588 #else
1589   if (dwarf_reg_size_table[0] == 0)
1590     init_dwarf_reg_size_table ();
1591 #endif
1592
1593   /* Force the frame state to use the known cfa value.  */
1594   _Unwind_SetSpColumn (context, outer_cfa, &sp_slot);
1595   fs.regs.cfa_how = CFA_REG_OFFSET;
1596   fs.regs.cfa_reg = __builtin_dwarf_sp_column ();
1597   fs.regs.cfa_offset = 0;
1598
1599   uw_update_context_1 (context, &fs);
1600
1601   /* If the return address column was saved in a register in the
1602      initialization context, then we can't see it in the given
1603      call frame data.  So have the initialization context tell us.  */
1604   context->ra = __builtin_extract_return_addr (outer_ra);
1605 #ifdef MD_POST_EXTRACT_ROOT_ADDR
1606   context->ra = MD_POST_EXTRACT_ROOT_ADDR (context->ra);
1607 #endif
1608 }

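So it really is a libgcc bug?

Verifying this requires building libgcc 7.5. Using the same build procedure as before, we first checked x86, where everything still worked. After building on ARM, the case still failed.

Back to breakpoints: break in uw_init_context_1 and single-step to check whether ra is correct after the corrections on lines 1567 and 1606. It turned out that these two lines are never reached. Evidently the MD_POST_EXTRACT_ROOT_ADDR macro was not defined; further reading showed that the target has to be specified when building libgcc.

--target=aarch64-linux

After rebuilding and setting the breakpoints again, lines 1567 and 1606 are executed, but ra does not change at all across them. In other words, AArch64 applies no correction to ra here.

Should we keep trying newer versions? Or should we consider that pc is correct and that pc_begin in seen_objects is wrong?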

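Revisiting the possibility that pc_begin in seen_objects is wrong

Our guess is that pc_begin is the address of the first instruction of a frame. To test this on x86, first break inside pl and look at the function pointer of the LLVM-compiled code: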

Thread 496 "TNT1002" hit Breakpoint 2, oceanbase::pl::ObPLExecState::execute (this=0x7f01afa34a18) at ./src/pl/ob_pl.cpp:2102
2102	./src/pl/ob_pl.cpp: No such file or directory.
(gdb) p fp
$2 = (int (*)(oceanbase::pl::ObPLExecCtx *, int64_t, int64_t *)) 0x7f032da96000

Then look at pc_begin in seen_objects:

Thread 496 "TNT1002" hit Breakpoint 3, _Unwind_Find_registered_FDE (pc=0x7f032da9d82a, bases=0x7f01afa2fec8) at ../.././libgcc/unwind-dw2-fde.c:1051
1051	in ../.././libgcc/unwind-dw2-fde.c
(gdb) p *seen_objects->next
$6 = {pc_begin = 0x7f01a8acf000, tbase = 0x0, dbase = 0x0, u = {single = 0x7f04296b42e0, array = 0x7f04296b42e0, sort = 0x7f04296b42e0}, s = {b = {sorted = 1, from_array = 0, mixed_encoding = 0, encoding = 28, count = 1}, i = 2273}, next = 0x0}
(gdb) p *seen_objects
$7 = {pc_begin = 0x7f032da96000, tbase = 0x0, dbase = 0x0, u = {single = 0x54e6c360, array = 0x54e6c360, sort = 0x54e6c360}, s = {b = {sorted = 1, from_array = 0, mixed_encoding = 0, encoding = 28, count = 1}, i = 2273}, next = 0x7f042982a1b0}
(gdb) p pc
$8 = (void *) 0x7f032da9d82a
(gdb) p pc-seen_objects->pc_begin
$9 = 30762

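One of the pc_begin values in seen_objects is indeed identical to the function pointer of our LLVM-compiled code, and pc is an address slightly above that pc_begin. This shows that pc really is the instruction in our LLVM-generated function that calls _Unwind_RaiseException.

On ARM, none of the pc_begin values in seen_objects equals the LLVM function pointer.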

Breakpoint 1, oceanbase::pl::ObPLExecState::execute (this=0xfffd205e2760) at ./src/pl/ob_pl.cpp:1878
1878	./src/pl/ob_pl.cpp: No such file or directory.
(gdb) p fp
$5 = (int (*)(oceanbase::pl::ObPLExecCtx *, int64_t, int64_t *)) 0xfffe96a07000
Breakpoint 2, _Unwind_Find_registered_FDE (pc=0xfffe96a14187, bases=0xfffd205ddc60) at ../.././libgcc/unwind-dw2-fde.c:1051
1051	  for (ob = seen_objects; ob; ob = ob->next)
(gdb) p *seen_objects
$9 = {pc_begin = 0xffff1beb9000, tbase = 0x0, dbase = 0x0, u = {single = 0xffff90291520, array = 0xffff90291520, sort = 0xffff90291520}, s = {b = {sorted = 1, from_array = 0, mixed_encoding = 0, encoding = 27, count = 1}, i = 2265}, next = 0xffff90d22540}
(gdb) p *seen_objects->next
$10 = {pc_begin = 0xffff96a07000, tbase = 0x0, dbase = 0x0, u = {single = 0xffff908c8d00, array = 0xffff908c8d00, sort = 0xffff908c8d00}, s = {b = {sorted = 1, from_array = 0, mixed_encoding = 0, encoding = 27, count = 1}, i = 2265}, next = 0x0}

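Comparing pc_begin with fp on ARM, the top 16 bits of pc_begin (0xffff) are larger by 1 than those of fp (0xfffe), so the two addresses differ by exactly 0x100000000. Repeated runs show the same pattern. At this point we conclude that pc_begin in seen_objects is wrong.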
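
The effect of the wrong pc_begin on the lookup can be reproduced with a simplified model of the first loop in _Unwind_Find_FDE. This is a sketch under stated assumptions: first_candidate and the two arrays are hypothetical names, the real code walks a linked list of struct object rather than an array, and the addresses are copied from the gdb sessions above.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified model of the loop at line 1051 of unwind-dw2-fde.c: the
   first entry with pc >= pc_begin is the only object that gets searched.
   Returns its index, or -1 when pc lies below every pc_begin.  */
int
first_candidate (uint64_t pc, const uint64_t *pc_begin, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (pc >= pc_begin[i])
      return (int) i;
  return -1;
}

/* pc_begin values of the two seen_objects entries, in list order.  */
const uint64_t x86_pc_begin[2] = { 0x7f032da96000ULL, 0x7f01a8acf000ULL };
const uint64_t arm_pc_begin[2] = { 0xffff1beb9000ULL, 0xffff96a07000ULL };
```

With the x86 pc 0x7f032da9d82a the first entry matches, whereas the ARM pc 0xfffe96a14187 is below both entries, so no object is searched and the lookup returns NULL.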

How is pc_begin computed?

Grepping for pc_begin shows that, apart from the initialisation to (void *)-1, ob->pc_begin is assigned in classify_object_over_fdes in unwind-dw2-fde.c.

$find -name "*.c" | xargs grep "pc_begin"
./unwind-dw2-fde-dip.c:					&f->pc_begin[f_enc_size], &range);
./unwind-dw2-fde-dip.c:  ob.pc_begin = NULL;
./unwind-dw2-fde-dip.c:				    data->ret->pc_begin, &func);
./unwind-dw2-fde.c:   its pc_begin and count fields initialized at minimum, and is sorted
./unwind-dw2-fde.c:   by decreasing value of pc_begin.  */
./unwind-dw2-fde.c:  ob->pc_begin = (void *)-1;
./unwind-dw2-fde.c:  ob->pc_begin = (void *)-1;
./unwind-dw2-fde.c:  memcpy (&x_ptr, x->pc_begin, sizeof (_Unwind_Ptr));
./unwind-dw2-fde.c:  memcpy (&y_ptr, y->pc_begin, sizeof (_Unwind_Ptr));
./unwind-dw2-fde.c:  read_encoded_value_with_base (ob->s.b.encoding, base, x->pc_begin, &x_ptr);
./unwind-dw2-fde.c:  read_encoded_value_with_base (ob->s.b.encoding, base, y->pc_begin, &y_ptr);
./unwind-dw2-fde.c:				x->pc_begin, &x_ptr);
./unwind-dw2-fde.c:				y->pc_begin, &y_ptr);
./unwind-dw2-fde.c:/* Update encoding, mixed_encoding, and pc_begin for OB for the
./unwind-dw2-fde.c:      _Unwind_Ptr mask, pc_begin;
./unwind-dw2-fde.c:      read_encoded_value_with_base (encoding, base, this_fde->pc_begin,
./unwind-dw2-fde.c:				    &pc_begin);
./unwind-dw2-fde.c:      if ((pc_begin & mask) == 0)
./unwind-dw2-fde.c:      if ((void *) pc_begin < ob->pc_begin)
./unwind-dw2-fde.c:	ob->pc_begin = (void *) pc_begin;
./unwind-dw2-fde.c:	  memcpy (&ptr, this_fde->pc_begin, sizeof (_Unwind_Ptr));
./unwind-dw2-fde.c:	  _Unwind_Ptr pc_begin, mask;
./unwind-dw2-fde.c:	  read_encoded_value_with_base (encoding, base, this_fde->pc_begin,
./unwind-dw2-fde.c:					&pc_begin);
./unwind-dw2-fde.c:	  if ((pc_begin & mask) == 0)
./unwind-dw2-fde.c:      _Unwind_Ptr pc_begin, pc_range;
./unwind-dw2-fde.c:	  const _Unwind_Ptr *pc_array = (const _Unwind_Ptr *) this_fde->pc_begin;
./unwind-dw2-fde.c:	  pc_begin = pc_array[0];
./unwind-dw2-fde.c:	  if (pc_begin == 0)
./unwind-dw2-fde.c:					    this_fde->pc_begin, &pc_begin);
./unwind-dw2-fde.c:	  if ((pc_begin & mask) == 0)
./unwind-dw2-fde.c:      if ((_Unwind_Ptr) pc - pc_begin < pc_range)
./unwind-dw2-fde.c:      void *pc_begin;
./unwind-dw2-fde.c:      memcpy (&pc_begin, (const void * const *) f->pc_begin, sizeof (void *));
./unwind-dw2-fde.c:      memcpy (&pc_range, (const uaddr *) f->pc_begin + 1, sizeof (uaddr));
./unwind-dw2-fde.c:      if (pc < pc_begin)
./unwind-dw2-fde.c:      else if (pc >= pc_begin + pc_range)
./unwind-dw2-fde.c:      _Unwind_Ptr pc_begin, pc_range;
./unwind-dw2-fde.c:      p = read_encoded_value_with_base (encoding, base, f->pc_begin,
./unwind-dw2-fde.c:					&pc_begin);
./unwind-dw2-fde.c:      if ((_Unwind_Ptr) pc < pc_begin)
./unwind-dw2-fde.c:      else if ((_Unwind_Ptr) pc >= pc_begin + pc_range)
./unwind-dw2-fde.c:      _Unwind_Ptr pc_begin, pc_range;
./unwind-dw2-fde.c:					f->pc_begin, &pc_begin);
./unwind-dw2-fde.c:      if ((_Unwind_Ptr) pc < pc_begin)
./unwind-dw2-fde.c:      else if ((_Unwind_Ptr) pc >= pc_begin + pc_range)
./unwind-dw2-fde.c:      if (pc < ob->pc_begin)
./unwind-dw2-fde.c:     containing the pc.  Note that pc_begin is sorted descending, and
./unwind-dw2-fde.c:    if (pc >= ob->pc_begin)
./unwind-dw2-fde.c:	if ((*p)->pc_begin < ob->pc_begin)
./unwind-dw2-fde.c:				    f->pc_begin, &func);
./config/unwind-dw2-fde-darwin.c:	    ob->pc_begin = (void *)-1;
./config/unwind-dw2-fde-darwin.c:		  if ((*p)->pc_begin < ob->pc_begin)
./config/unwind-dw2-fde-darwin.c:					      result->pc_begin, &func);

Analyzing classify_object_over_fdes

The code of classify_object_over_fdes shows that pc_begin is decoded by read_encoded_value_with_base and then assigned.

 633 static size_t
 634 classify_object_over_fdes (struct object *ob, const fde *this_fde)
 635 {
 636   const struct dwarf_cie *last_cie = 0;
 637   size_t count = 0;
 638   int encoding = DW_EH_PE_absptr;
 639   _Unwind_Ptr base = 0;
 640
 641   for (; ! last_fde (ob, this_fde); this_fde = next_fde (this_fde))
 642     {
 643       const struct dwarf_cie *this_cie;
 644       _Unwind_Ptr mask, pc_begin;
 645
 646       /* Skip CIEs.  */
 647       if (this_fde->CIE_delta == 0)
 648         continue;
 649
 650       /* Determine the encoding for this FDE.  Note mixed encoded
 651          objects for later.  */
 652       this_cie = get_cie (this_fde);
 653       if (this_cie != last_cie)
 654         {
 655           last_cie = this_cie;
 656           encoding = get_cie_encoding (this_cie);
 657           if (encoding == DW_EH_PE_omit)
 658             return -1;
 659           base = base_from_object (encoding, ob);
 660           if (ob->s.b.encoding == DW_EH_PE_omit)
 661             ob->s.b.encoding = encoding;
 662           else if (ob->s.b.encoding != encoding)
 663             ob->s.b.mixed_encoding = 1;
 664         }
 665
 666       read_encoded_value_with_base (encoding, base, this_fde->pc_begin,
 667                                     &pc_begin);
 668
 669       /* Take care to ignore link-once functions that were removed.
 670          In these cases, the function address will be NULL, but if
 671          the encoding is smaller than a pointer a true NULL may not
 672          be representable.  Assume 0 in the representable bits is NULL.  */
 673       mask = size_of_encoded_value (encoding);
 674       if (mask < sizeof (void *))
 675         mask = (((_Unwind_Ptr) 1) << (mask << 3)) - 1;
 676       else
 677         mask = -1;
 678
 679       if ((pc_begin & mask) == 0)
 680         continue;
 681
 682       count += 1;
 683       if ((void *) pc_begin < ob->pc_begin)
 684         ob->pc_begin = (void *) pc_begin;
 685     }
 686
 687   return count;
 688 } 
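
What the loop above does to ob->pc_begin can be summarised as keeping the lowest non-zero function start address among the object's FDEs. The sketch below models just that part. It is an illustration under assumptions: object_pc_begin and fde_start are hypothetical names, the input array stands in for the values decoded at lines 666-667, and the mask handling for encodings narrower than a pointer is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified model of lines 679-684: start from the all-ones sentinel
   (ob->pc_begin is initialised to (void *) -1 at registration time) and
   keep the lowest non-zero start address.  */
uintptr_t
object_pc_begin (const uintptr_t *fde_start, size_t n)
{
  uintptr_t lowest = UINTPTR_MAX;
  for (size_t i = 0; i < n; i++)
    {
      if (fde_start[i] == 0)        /* removed link-once function */
        continue;
      if (fde_start[i] < lowest)
        lowest = fde_start[i];
    }
  return lowest;
}
```

So pc_begin describes one registered object (a block of unwind information), not one stack frame; if a single decoded start address is wrong, the object's pc_begin can end up wrong as well.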

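Analyzing read_encoded_value_with_base

read_encoded_value_with_base is implemented at line 180 of unwind-pe.h. Debugging the function shows that encoding is 28 on x86 and 27 on ARM, while base is 0 on both platforms.

On neither platform does the test on line 198 succeed, so both enter the switch on line 207. x86 takes line 252 (DW_EH_PE_sdata8) and ARM takes line 248 (DW_EH_PE_sdata4).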
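
The two encoding bytes can be decoded by hand from the DW_EH_PE_* constants. The helpers below are a sketch for doing that: pe_format_name, pe_fixed_size and pe_is_pcrel are hypothetical names, and the constants are the values defined in libgcc's unwind-pe.h.

```c
/* DW_EH_PE_* values, copied from libgcc's unwind-pe.h.  */
enum
{
  DW_EH_PE_absptr  = 0x00,
  DW_EH_PE_uleb128 = 0x01,
  DW_EH_PE_udata2  = 0x02,
  DW_EH_PE_udata4  = 0x03,
  DW_EH_PE_udata8  = 0x04,
  DW_EH_PE_sleb128 = 0x09,
  DW_EH_PE_sdata2  = 0x0a,
  DW_EH_PE_sdata4  = 0x0b,
  DW_EH_PE_sdata8  = 0x0c,
  DW_EH_PE_pcrel   = 0x10
};

/* Hypothetical helper: name of the value format held in the low 4 bits.  */
const char *
pe_format_name (unsigned char encoding)
{
  switch (encoding & 0x0f)
    {
    case DW_EH_PE_absptr:  return "absptr";
    case DW_EH_PE_uleb128: return "uleb128";
    case DW_EH_PE_udata2:  return "udata2";
    case DW_EH_PE_udata4:  return "udata4";
    case DW_EH_PE_udata8:  return "udata8";
    case DW_EH_PE_sleb128: return "sleb128";
    case DW_EH_PE_sdata2:  return "sdata2";
    case DW_EH_PE_sdata4:  return "sdata4";
    case DW_EH_PE_sdata8:  return "sdata8";
    default:               return "unknown";
    }
}

/* Hypothetical helper: width in bytes of a fixed-size format, or 0 for
   the variable-length LEB128 formats and unknown values.  */
int
pe_fixed_size (unsigned char encoding)
{
  switch (encoding & 0x0f)
    {
    case DW_EH_PE_absptr:  return (int) sizeof (void *);
    case DW_EH_PE_udata2:
    case DW_EH_PE_sdata2:  return 2;
    case DW_EH_PE_udata4:
    case DW_EH_PE_sdata4:  return 4;
    case DW_EH_PE_udata8:
    case DW_EH_PE_sdata8:  return 8;
    default:               return 0;
    }
}

/* Non-zero when the value is relative to the address of the encoded
   field itself (the test used on line 262).  */
int
pe_is_pcrel (unsigned char encoding)
{
  return (encoding & 0x70) == DW_EH_PE_pcrel;
}
```

For the values seen in gdb, 28 (0x1c) decodes to pcrel + sdata8 and 27 (0x1b) to pcrel + sdata4. A pcrel sdata4 value is a signed 32-bit offset, so it can only express targets within roughly 2 GiB of the field that stores it.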

180 static const unsigned char *
181 read_encoded_value_with_base (unsigned char encoding, _Unwind_Ptr base,
182                               const unsigned char *p, _Unwind_Ptr *val)
183 {
184   union unaligned
185     {
186       void *ptr;
187       unsigned u2 __attribute__ ((mode (HI)));
188       unsigned u4 __attribute__ ((mode (SI)));
189       unsigned u8 __attribute__ ((mode (DI)));
190       signed s2 __attribute__ ((mode (HI)));
191       signed s4 __attribute__ ((mode (SI)));
192       signed s8 __attribute__ ((mode (DI)));
193     } __attribute__((__packed__));
194
195   const union unaligned *u = (const union unaligned *) p;
196   _Unwind_Internal_Ptr result;
197
198   if (encoding == DW_EH_PE_aligned)
199     {
200       _Unwind_Internal_Ptr a = (_Unwind_Internal_Ptr) p;
201       a = (a + sizeof (void *) - 1) & - sizeof(void *);
202       result = *(_Unwind_Internal_Ptr *) a;
203       p = (const unsigned char *) (_Unwind_Internal_Ptr) (a + sizeof (void *));
204     }
205   else
206     {
207       switch (encoding & 0x0f)
208         {
209         case DW_EH_PE_absptr:
210           result = (_Unwind_Internal_Ptr) u->ptr;
211           p += sizeof (void *);
212           break;
213
214         case DW_EH_PE_uleb128:
215           {
216             _uleb128_t tmp;
217             p = read_uleb128 (p, &tmp);
218             result = (_Unwind_Internal_Ptr) tmp;
219           }
220           break;
221
222         case DW_EH_PE_sleb128:
223           {
224             _sleb128_t tmp;
225             p = read_sleb128 (p, &tmp);
226             result = (_Unwind_Internal_Ptr) tmp;
227           }
228           break;
229
230         case DW_EH_PE_udata2:
231           result = u->u2;
232           p += 2;
233           break;
234         case DW_EH_PE_udata4:
235           result = u->u4;
236           p += 4;
237           break;
238         case DW_EH_PE_udata8:
239           result = u->u8;
240           p += 8;
241           break;
242
243         case DW_EH_PE_sdata2:
244           result = u->s2;
245           p += 2;
246           break;
247         case DW_EH_PE_sdata4:
248           result = u->s4;
249           p += 4;
250           break;
251         case DW_EH_PE_sdata8:
252           result = u->s8;
253           p += 8;
254           break;
255
256         default:
257           __gxx_abort ();
258         }
259
260       if (result != 0)
261         {
262           result += ((encoding & 0x70) == DW_EH_PE_pcrel
263                      ? (_Unwind_Internal_Ptr) u : base);
264           if (encoding & DW_EH_PE_indirect)
265             result = *(_Unwind_Internal_Ptr *) result;
266         }
267     }
268
269   *val = result;
270   return p;
271 }
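
The two encoding values seen in the debugger can be decoded by hand. The sketch below splits an encoding byte into its value-format nibble (the switch at line 207) and its application bits (the test at line 262). The constants are the standard DW_EH_PE_* values; the helper names ehpe_format and ehpe_application are made up for this illustration and are not part of libgcc.

```c
#include <stdint.h>

/* Standard DW_EH_PE_* pointer-encoding constants (subset). */
enum {
  DW_EH_PE_absptr = 0x00,
  DW_EH_PE_udata4 = 0x03,
  DW_EH_PE_sdata4 = 0x0b,
  DW_EH_PE_sdata8 = 0x0c,
  DW_EH_PE_pcrel  = 0x10
};

/* Hypothetical helper: the value format, i.e. what `switch (encoding & 0x0f)`
   dispatches on. */
static inline unsigned ehpe_format(uint8_t encoding)
{
  return encoding & 0x0f;
}

/* Hypothetical helper: how the value is applied, i.e. what
   `(encoding & 0x70) == DW_EH_PE_pcrel` tests. */
static inline unsigned ehpe_application(uint8_t encoding)
{
  return encoding & 0x70;
}
```

With these helpers, 28 (0x1c) decodes to pcrel plus sdata8 and 27 (0x1b) to pcrel plus sdata4, which is consistent with x86 reaching line 252 and ARM reaching line 248.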

Set a conditional breakpoint with b unwind-pe.h:269 if encoding==28 on x86 and with b unwind-pe.h:269 if encoding==27 on ARM, then inspect the relevant variables.

X86:

Breakpoint 1, oceanbase::pl::ObPLExecState::execute (this=0xfffd205e2760) at ./src/pl/ob_pl.cpp:1878
1878	./src/pl/ob_pl.cpp: No such file or directory.
(gdb) p fp
$5 = (int (*)(oceanbase::pl::ObPLExecCtx *, int64_t, int64_t *)) 0xfffe96a07000
(gdb) p u
$10 = (const union unaligned *) 0x7f032db3d02c
(gdb) p result
$11 = 139651627704320
(gdb) p /x result
$12 = 0x7f032da96000
(gdb) p *u
$13 = {ptr = 0xfffffffffff58fd4, u2 = 36820, u4 = 4294283220, u8 = 18446744073708867540, s2 = -28716, s4 = -684076, s8 = -684076}
(gdb) p /x (int64_t)u+u->s8
$14 = 0x7f032da96000

ARM:

Breakpoint 2, oceanbase::pl::ObPLExecState::execute (this=0xfffd0fd40760) at ./src/pl/ob_pl.cpp:1878
1878	./src/pl/ob_pl.cpp: No such file or directory.
(gdb) p fp
$1 = (int (*)(oceanbase::pl::ObPLExecCtx *, int64_t, int64_t *)) 0xfffd0db45000
Breakpoint 1, read_encoded_value_with_base (encoding=27 '\033', base=0, p=0xffff84002824 "\354\276\002", val=0xfffd0fd3a928) at ../.././libgcc/unwind-pe.h:269
269	  *val = result;
(gdb) p result
$2 = 281470911664128
(gdb) p /x result
$3 = 0xffff0db45000
(gdb) p u
$4 = (const union unaligned *) 0xffff84002820
(gdb) p *u
$5 = {ptr = 0x2beec89b427e0, u2 = 10208, u4 = 2310285280, u8 = 772873085265888, s2 = 10208, s4 = -1984682016, s8 = 772873085265888}
(gdb) p /x (int64_t)u+u->s4
$6 = 0xffff0db45000

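On x86, the pc_begin computed in read_encoded_value_with_base matches fp, as expected. On ARM, the computed pc_begin (0xffff0db45000) and fp (0xfffd0db45000) agree in the low 32 bits but differ in the bits above them (0xffff versus 0xfffd).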

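The guess is that pc_begin is located by adding an integer offset, stored at the address u points to, to the address of u itself. On x86 the upper 16 bits of u's address are correct, whereas on ARM u's address already looks wrong, so the first suspect is the address of u itself. That address comes from the parameter p of read_encoded_value_with_base.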
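
The guess can be checked against the ARM numbers from the gdb session. This is a minimal sketch, assuming the DW_EH_PE_pcrel | DW_EH_PE_sdata4 path (lines 248 and 262): the result is the address of the encoded field plus the sign-extended 32-bit value stored there. pcrel_sdata4 and fits_in_sdata4 are hypothetical helper names, not libgcc functions.

```c
#include <stdint.h>

/* Hypothetical helper mirroring lines 248 and 262-263: address of the
   encoded field plus the signed 32-bit displacement stored in it. */
static inline uint64_t pcrel_sdata4(uint64_t field_addr, uint32_t raw)
{
  /* Sign-extend the raw 4-byte value (u4) to 64 bits (s4). */
  int64_t disp = (raw & 0x80000000u) ? (int64_t)raw - 4294967296LL
                                     : (int64_t)raw;
  /* Unsigned addition wraps, which gives the intended address. */
  return field_addr + (uint64_t)disp;
}

/* Hypothetical helper: can the distance from field_addr to target be
   stored in a signed 4-byte field at all? */
static inline int fits_in_sdata4(uint64_t field_addr, uint64_t target)
{
  if (target >= field_addr)
    return target - field_addr <= 0x7fffffffu;   /* up to +2 GiB - 1 */
  return field_addr - target <= 0x80000000u;     /* down to -2 GiB   */
}
```

With u = 0xffff84002820 and u4 = 2310285280, pcrel_sdata4 gives 0xffff0db45000, the same value gdb prints as $3 and $6. The distance from that field to fp (0xfffd0db45000) is more than 2 GiB, so fits_in_sdata4 returns 0 for that pair. Whether this is connected to the discrepancy is not established at this point in the analysis.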

Tracing the source of parameter p of read_encoded_value_with_base

Print the backtrace and walk up it frame by frame:

(gdb) bt
#0  read_encoded_value_with_base (encoding=27 '\033', base=0, p=0xffff84002824 "\354\276\002", val=0xfffd0fd3a928) at ../.././libgcc/unwind-pe.h:269
#1  0x00000000118cbf70 in classify_object_over_fdes (ob=0x4e8d9f40, this_fde=0xffff84002818) at ../.././libgcc/unwind-dw2-fde.c:666
#2  0x00000000118cc2a0 in init_object (ob=0x4e8d9f40) at ../.././libgcc/unwind-dw2-fde.c:777
#3  0x00000000118cc988 in search_object (ob=0x4e8d9f40, pc=0x118ca10b <_Unwind_RaiseException+75>) at ../.././libgcc/unwind-dw2-fde.c:989
#4  0x00000000118ccb94 in _Unwind_Find_registered_FDE (pc=0x118ca10b <_Unwind_RaiseException+75>, bases=0xfffd0fd3c020) at ../.././libgcc/unwind-dw2-fde.c:1066
#5  0x00000000118cd688 in _Unwind_Find_FDE (pc=0x118ca10b <_Unwind_RaiseException+75>, bases=0xfffd0fd3c020) at ../.././libgcc/unwind-dw2-fde-dip.c:458
#6  0x00000000118c901c in uw_frame_state_for (context=0xfffd0fd3bcf8, fs=0xfffd0fd3ab50) at ../.././libgcc/unwind-dw2.c:1249
#7  0x00000000118c9ca4 in uw_init_context_1 (context=0xfffd0fd3bcf8, outer_cfa=0xfffd0fd3c0d0, outer_ra=0xfffd0db52188) at ../.././libgcc/unwind-dw2.c:1578
#8  0x00000000118ca10c in _Unwind_RaiseException (exc=0xfffd28f3e6a0) at ../.././libgcc/unwind.inc:88
#9  0x0000fffd0db52188 in ?? ()
#10 0x0000000000000004 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

p is this_fde->pc_begin, where this_fde is the second parameter of classify_object_over_fdes (line 666).


 633 static size_t
 634 classify_object_over_fdes (struct object *ob, const fde *this_fde)
 635 {
 636   const struct dwarf_cie *last_cie = 0;
 637   size_t count = 0;
 638   int encoding = DW_EH_PE_absptr;
 639   _Unwind_Ptr base = 0;
 640
 641   for (; ! last_fde (ob, this_fde); this_fde = next_fde (this_fde))
 642     {
 643       const struct dwarf_cie *this_cie;
 644       _Unwind_Ptr mask, pc_begin;
 645
 646       /* Skip CIEs.  */
 647       if (this_fde->CIE_delta == 0)
 648         continue;
 649
 650       /* Determine the encoding for this FDE.  Note mixed encoded
 651          objects for later.  */
 652       this_cie = get_cie (this_fde);
 653       if (this_cie != last_cie)
 654         {
 655           last_cie = this_cie;
 656           encoding = get_cie_encoding (this_cie);
 657           if (encoding == DW_EH_PE_omit)
 658             return -1;
 659           base = base_from_object (encoding, ob);
 660           if (ob->s.b.encoding == DW_EH_PE_omit)
 661             ob->s.b.encoding = encoding;
 662           else if (ob->s.b.encoding != encoding)
 663             ob->s.b.mixed_encoding = 1;
 664         }
 665
 666       read_encoded_value_with_base (encoding, base, this_fde->pc_begin,
 667                                     &pc_begin);
 668
 669       /* Take care to ignore link-once functions that were removed.
 670          In these cases, the function address will be NULL, but if
 671          the encoding is smaller than a pointer a true NULL may not
 672          be representable.  Assume 0 in the representable bits is NULL.  */
 673       mask = size_of_encoded_value (encoding);
 674       if (mask < sizeof (void *))
 675         mask = (((_Unwind_Ptr) 1) << (mask << 3)) - 1;
 676       else
 677         mask = -1;
 678
 679       if ((pc_begin & mask) == 0)
 680         continue;
 681
 682       count += 1;
 683       if ((void *) pc_begin < ob->pc_begin)
 684         ob->pc_begin = (void *) pc_begin;
 685     }
 686
 687   return count;
 688 }

this_fde comes from ob->u.single in init_object (line 777).

 755 static inline void
 756 init_object (struct object* ob)
 757 {
 758   struct fde_accumulator accu;
 759   size_t count;
 760
 761   count = ob->s.b.count;
 762   if (count == 0)
 763     {
 764       if (ob->s.b.from_array)
 765         {
 766           fde **p = ob->u.array;
 767           for (count = 0; *p; ++p)
 768             {
 769               size_t cur_count = classify_object_over_fdes (ob, *p);
 770               if (cur_count == (size_t) -1)
 771                 goto unhandled_fdes;
 772               count += cur_count;
 773             }
 774         }
 775       else
 776         {
 777           count = classify_object_over_fdes (ob, ob->u.single);
 778           if (count == (size_t) -1)
 779             {
 780               static const fde terminator;
 781             unhandled_fdes:
 782               ob->s.i = 0;
 783               ob->s.b.encoding = DW_EH_PE_omit;
 784               ob->u.single = &terminator;
 785               return;
 786             }
 787         }
 788
 789       /* The count field we have in the main struct object is somewhat
 790          limited, but should suffice for virtually all cases.  If the
 791          counted value doesn't fit, re-write a zero.  The worst that
 792          happens is that we re-count next time -- admittedly non-trivial
 793          in that this implies some 2M fdes, but at least we function.  */
 794       ob->s.b.count = count;
 795       if (ob->s.b.count != count)
 796         ob->s.b.count = 0;
 797     }
 798
 799   if (!start_fde_sort (&accu, count))
 800     return;
 801
 802   if (ob->s.b.from_array)
 803     {
 804       fde **p;
 805       for (p = ob->u.array; *p; ++p)
 806         add_fdes (ob, &accu, *p);
 807     }
 808   else
 809     add_fdes (ob, &accu, ob->u.single);
 810
 811   end_fde_sort (ob, &accu, count);
 812
 813   /* Save the original fde pointer, since this is the key by which the
 814      DSO will deregister the object.  */
 815   accu.linear->orig_data = ob->u.single;
 816   ob->u.sort = accu.linear;
 817
 818   ob->s.b.sorted = 1;
 819 }

ob comes from the ob parameter of search_object.

 982 static const fde *
 983 search_object (struct object* ob, void *pc)
 984 {
 985   /* If the data hasn't been sorted, try to do this now.  We may have
 986      more memory available than last time we tried.  */
 987   if (! ob->s.b.sorted)
 988     {
 989       init_object (ob);
 990
 991       /* Despite the above comment, the normal reason to get here is
 992          that we've not processed this object before.  A quick range
 993          check is in order.  */
 994       if (pc < ob->pc_begin)
 995         return NULL;
 996     }
 997
 998   if (ob->s.b.sorted)
 999     {
1000       if (ob->s.b.mixed_encoding)
1001         return binary_search_mixed_encoding_fdes (ob, pc);
1002       else if (ob->s.b.encoding == DW_EH_PE_absptr)
1003         return binary_search_unencoded_fdes (ob, pc);
1004       else
1005         return binary_search_single_encoding_fdes (ob, pc);
1006     }
1007   else
1008     {
1009       /* Long slow laborious linear search, cos we've no memory.  */
1010       if (ob->s.b.from_array)
1011         {
1012           fde **p;
1013           for (p = ob->u.array; *p ; p++)
1014             {
1015               const fde *f = linear_search_fdes (ob, *p, pc);
1016               if (f)
1017                 return f;
1018             }
1019           return NULL;
1020         }
1021       else
1022         return linear_search_fdes (ob, ob->u.single, pc);
1023     }
1024 }

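That ob ultimately comes from unseen_objects in _Unwind_Find_FDE. Judging from the backtrace, seen_objects is initially empty while unseen_objects holds entries whose pc_begin is not yet valid; the call chain in this backtrace computes pc_begin and then moves the object onto seen_objects.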
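
The hand-over between the two lists can be modelled roughly as follows. This is a simplified sketch, not the libgcc code: struct obj keeps only the two fields needed here, promote_first is a hypothetical helper, and keeping the seen list sorted by descending pc_begin is stated as an assumption about how the classified list is ordered.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for libgcc's struct object (assumption: only the
   fields needed to show the list handling). */
struct obj {
  uintptr_t pc_begin;   /* lowest code address covered; unknown until classified */
  struct obj *next;
};

/* Hypothetical helper: pop the head of *unseen, record the pc_begin that
   classification computed for it, and insert it into *seen, which is kept
   in descending pc_begin order (assumption). */
static struct obj *promote_first(struct obj **unseen, struct obj **seen,
                                 uintptr_t computed_pc_begin)
{
  struct obj *ob = *unseen;
  struct obj **p;

  if (ob == NULL)
    return NULL;
  *unseen = ob->next;
  ob->pc_begin = computed_pc_begin;

  /* Find the first seen object that starts below the new one. */
  for (p = seen; *p != NULL; p = &(*p)->next)
    if ((*p)->pc_begin < ob->pc_begin)
      break;
  ob->next = *p;
  *p = ob;
  return ob;
}
```

In this model, an object registered with a placeholder pc_begin only gets its real value when it is first searched, so a wrong computed value stays attached to the object on the seen list afterwards.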

What is u.single?

The two frames at lines 777 and 666 of unwind-dw2-fde.c show that ob->u.single is actually an fde, with the following contents:

(gdb) p *this_fde
$8 = {length = 52, CIE_delta = 36, pc_begin = 0xffff84002820 "\340'\264\211\354\276\002"}

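Computing the pc_begin stored in seen_objects by hand from this_fde->pc_begin gives the expected result, the same value that read_encoded_value_with_base produced: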

(gdb) p (const union unaligned *)(this_fde->pc_begin)
$9 = (const union unaligned *) 0xffff84002820
(gdb) p *(const union unaligned *)(this_fde->pc_begin)
$10 = {p = 0x2beec89b427e0, u2 = 10208, u4 = 2310285280, u8 = 772873085265888, s2 = 10208, s4 = -1984682016, s8 = 772873085265888}
(gdb) p /x (int64_t)(this_fde->pc_begin)+((const union unaligned *)(this_fde->pc_begin))->s4
$11 = 0xffff0db45000

What is an fde?

Looking again at line 641 of classify_object_over_fdes, it is a loop that iterates using last_fde (ob, this_fde) and next_fde (this_fde). Both are implemented in ./unwind-dw2-fde.h:

134 struct dwarf_cie
135 {
136   uword length;
137   sword CIE_id;
138   ubyte version;
139   unsigned char augmentation[];
140 } __attribute__ ((packed, aligned (__alignof__ (void *))));
141
142 /* The first few fields of an FDE.  */
143 struct dwarf_fde
144 {
145   uword length;
146   sword CIE_delta;
147   unsigned char pc_begin[];
148 } __attribute__ ((packed, aligned (__alignof__ (void *))));
149
150 typedef struct dwarf_fde fde;
151
152 /* Locate the CIE for a given FDE.  */
153
154 static inline const struct dwarf_cie *
155 get_cie (const struct dwarf_fde *f)
156 {
157   return (const void *)&f->CIE_delta - f->CIE_delta;
158 }
159
160 static inline const fde *
161 next_fde (const fde *f)
162 {
163   return (const fde *) ((const char *) f + f->length + sizeof (f->length));
164 }
165
166 extern const fde * _Unwind_Find_FDE (void *, struct dwarf_eh_bases *);
167
168 static inline int
169 last_fde (struct object *obj __attribute__ ((__unused__)), const fde *f)
170 {
171 #ifdef DWARF2_OBJECT_END_PTR_EXTENSION
172   return f == (const fde *) obj->fde_end || f->length == 0;
173 #else
174   return f->length == 0;
175 #endif
176 }

Printing the fde gives:

(gdb) p this_fde
$2 = (const fde *) 0x7f87af8f3024
(gdb) p *this_fde
$3 = {length = 13928, CIE_delta = 40, pc_begin = 0x7f87af8f302c "\324\337\276\352\377\377\377\377\221\321\001"}
(gdb) p (const union unaligned *)(this_fde->pc_begin)
$4 = (const union unaligned *) 0x7f87af8f302c
(gdb) p *(const union unaligned *)(this_fde->pc_begin)
$5 = {p = 0xffffffffeabedfd4, u2 = 57300, u4 = 3938377684, u8 = 18446744073352962004, s2 = -8236, s4 = -356589612, s8 = -356589612}
(gdb) p /x (int64_t)(this_fde->pc_begin)+((const union unaligned *)(this_fde->pc_begin))->s8
$6 = 0x7f879a4e1000
(gdb) p next_fde(this_fde)
$7 = (const fde *) 0x7f87af8f6690
(gdb) p *next_fde(this_fde)
$8 = {length = 0, CIE_delta = 0, pc_begin = 0x7f87af8f6698 ""}

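Together with the Exception Frames document, this gives a rough picture of how the fde data is laid out. The guess is that a CIE is followed by a number of FDEs, and that the CIE_delta field of each FDE gives the position of its CIE.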
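
The layout guessed above can be sketched as a walk over a byte buffer. This is a minimal illustration under stated assumptions: records use the 4-byte length form only (the extended 64-bit length is not handled), fields are in host byte order, and rd32, count_fdes and cie_of are hypothetical helpers that mirror next_fde, last_fde and get_cie from the listing rather than reproduce them.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit field from a possibly unaligned position. */
static uint32_t rd32(const unsigned char *p)
{
  uint32_t v;
  memcpy(&v, p, sizeof v);
  return v;
}

/* Count FDEs in an .eh_frame-style buffer.  Each record is
   [length:4][id_or_delta:4][payload...]; a zero id_or_delta marks a CIE,
   and a record whose length is 0 terminates the list. */
static size_t count_fdes(const unsigned char *rec)
{
  size_t n = 0;
  while (rd32(rec) != 0) {          /* same stop condition as last_fde() */
    if (rd32(rec + 4) != 0)         /* non-zero CIE_delta: this is an FDE */
      n++;
    rec += rd32(rec) + 4;           /* same step as next_fde() */
  }
  return n;
}

/* Same arithmetic as get_cie(): the CIE starts CIE_delta bytes before the
   FDE's CIE_delta field. */
static const unsigned char *cie_of(const unsigned char *fde)
{
  int32_t delta;
  memcpy(&delta, fde + 4, sizeof delta);
  return fde + 4 - delta;
}
```

Note that the length field does not count its own 4 bytes, which is why the step adds sizeof (f->length) in next_fde.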

Tracing the source of u.single

grep for u.single to find where single is assigned. There are three places: line 435 of unwind-dw2-fde-dip.c, and lines 95 and 784 of unwind-dw2-fde.c.

$find -name "*.c" | xargs grep "u\.single"
./unwind-dw2-fde-dip.c:  ob.u.single = (fde *) eh_frame;
./unwind-dw2-fde.c:  ob->u.single = begin;
./unwind-dw2-fde.c:    if ((*p)->u.single == begin)
./unwind-dw2-fde.c:	if ((*p)->u.single == begin)
./unwind-dw2-fde.c:	  count = classify_object_over_fdes (ob, ob->u.single);
./unwind-dw2-fde.c:	      ob->u.single = &terminator;
./unwind-dw2-fde.c:    add_fdes (ob, &accu, ob->u.single);
./unwind-dw2-fde.c:  accu.linear->orig_data = ob->u.single;
./unwind-dw2-fde.c:	return linear_search_fdes (ob, ob->u.single, pc);
./config/unwind-dw2-fde-darwin.c:	    ob->u.single = (struct dwarf_fde *)real_fde;

$vi ./unwind-dw2-fde-dip.c

429   /* We have no sorted search table, so need to go the slow way.
430      As soon as GLIBC will provide API so to notify that a library has been
431      removed, we could cache this (and thus use search_object).  */
432   ob.pc_begin = NULL;
433   ob.tbase = data->tbase;
434   ob.dbase = data->dbase;
435   ob.u.single = (fde *) eh_frame;
436   ob.s.i = 0;
437   ob.s.b.mixed_encoding = 1;  /* Need to assume worst case.  */
438   data->ret = linear_search_fdes (&ob, (fde *) eh_frame, (void *) data->pc);
439   if (data->ret != NULL)
440     {
441       _Unwind_Ptr func;
442       unsigned int encoding = get_fde_encoding (data->ret);
443
444       read_encoded_value_with_base (encoding,
445                                     base_from_cb_data (encoding, data),
446                                     data->ret->pc_begin, &func);
447       data->func = (void *) func;
448     }
449   return 1;
450 }

$vi ./unwind-dw2-fde.c

  84 void
  85 __register_frame_info_bases (const void *begin, struct object *ob,
  86                              void *tbase, void *dbase)
  87 {
  88   /* If .eh_frame is empty, don't register at all.  */
  89   if ((const uword *) begin == 0 || *(const uword *) begin == 0)
  90     return;
  91
  92   ob->pc_begin = (void *)-1;
  93   ob->tbase = tbase;
  94   ob->dbase = dbase;
  95   ob->u.single = begin;
  96   ob->s.i = 0;
  97   ob->s.b.encoding = DW_EH_PE_omit;
  98 #ifdef DWARF2_OBJECT_END_PTR_EXTENSION
  99   ob->fde_end = NULL;
 100 #endif
 101
 102   init_object_mutex_once ();
 103   __gthread_mutex_lock (&object_mutex);
 104
 105   ob->next = unseen_objects;
 106   unseen_objects = ob;
 107 #ifdef ATOMIC_FDE_FAST_PATH
 108   /* Set flag that at least one library has registered FDEs.
 109      Use relaxed MO here, it is up to the app to ensure that the library
 110      loading/initialization happens-before using that library in other
 111      threads (in particular unwinding with that library's functions
 112      appearing in the backtraces).  Calling that library's functions
 113      without waiting for the library to initialize would be racy.  */
 114   if (!any_objects_registered)
 115     __atomic_store_n (&any_objects_registered, 1, __ATOMIC_RELAXED);
 116 #endif
 117
 118   __gthread_mutex_unlock (&object_mutex);
 119 }

Line 784 of unwind-dw2-fde.c is the ob->u.single = &terminator; assignment inside init_object, which is already listed in full above.

Set a breakpoint at each of them, restart, and run the statement again.

X86:

b unwind-dw2-fde.c:784
b unwind-dw2-fde.c:95
b unwind-dw2-fde-dip.c:435
b unwind-pe.h:269 if encoding==28
b unwind-dw2-fde.c:184

ARM:

b unwind-dw2-fde.c:784
b unwind-dw2-fde.c:95
b unwind-dw2-fde-dip.c:435
b unwind-pe.h:269 if encoding==27
b unwind-dw2-fde.c:184

The breakpoint at unwind-dw2-fde.c:95 is the one that is hit.

#0  __register_frame_info_bases (begin=0x7fd5b540c000, ob=0x5489b130, tbase=0x0, dbase=0x0) at ../.././libgcc/unwind-dw2-fde.c:95
#1  0x000000000d1b65a5 in __register_frame_info (begin=0x7fd5b540c000, ob=0x5489b130) at ../.././libgcc/unwind-dw2-fde.c:124
#2  0x000000000d1b65df in __register_frame (begin=0x7fd5b540c000) at ../.././libgcc/unwind-dw2-fde.c:137
#3  0x000000000cc30d1c in llvm::RTDyldMemoryManager::registerEHFrames(unsigned char*, unsigned long, unsigned long) ()
#4  0x000000000cc44d7a in llvm::RuntimeDyldELF::registerEHFrames() ()
#5  0x000000000cc34cf2 in llvm::RuntimeDyld::finalizeWithMemoryManagerLocking() ()
#6  0x000000000b1bba98 in llvm::orc::RTDyldObjectLinkingLayer::addObject(std::shared_ptr<llvm::object::OwningBinary<llvm::object::ObjectFile> >, std::shared_ptr<llvm::JITSymbolResolver>)::{lambda(std::_List_iterator<std::unique_ptr<llvm::orc::RTDyldObjectLinkingLayerBase::LinkedObject, std::default_delete<llvm::orc::RTDyldObjectLinkingLayerBase::LinkedObject> > >, llvm::RuntimeDyld&, std::shared_ptr<llvm::object::OwningBinary<llvm::object::ObjectFile> > const&, std::function<void ()>)#1}::operator()(std::_List_iterator<std::unique_ptr<llvm::orc::RTDyldObjectLinkingLayerBase::LinkedObject, std::default_delete<llvm::orc::RTDyldObjectLinkingLayerBase::LinkedObject> > >, llvm::RuntimeDyld&, std::shared_ptr<llvm::object::OwningBinary<llvm::object::ObjectFile> > const&, std::function<void ()>) const (this=0x54a7f0c0, H=..., RTDyld=..., ObjToLoad=..., LOSHandleLoad=...) at ./rpm/.dep_create/var/usr/include/llvm/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.h:274
#7  llvm::orc::RTDyldObjectLinkingLayer::ConcreteLinkedObject<std::shared_ptr<llvm::RuntimeDyld::MemoryManager>, std::shared_ptr<llvm::JITSymbolResolver>, llvm::orc::RTDyldObjectLinkingLayer::addObject(std::shared_ptr<llvm::object::OwningBinary<llvm::object::ObjectFile> >, std::shared_ptr<llvm::JITSymbolResolver>)::{lambda(std::_List_iterator<std::unique_ptr<llvm::orc::RTDyldObjectLinkingLayerBase::LinkedObject, std::default_delete<llvm::orc::RTDyldObjectLinkingLayerBase::LinkedObject> > >, llvm::RuntimeDyld&, std::shared_ptr<llvm::object::OwningBinary<llvm::object::ObjectFile> > const&, std::function<void ()>)#1}>::finalize() (this=0x54bb6150)
    at ./rpm/.dep_create/var/usr/include/llvm/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.h:144
#8  0x000000000b1bc3d6 in llvm::orc::RTDyldObjectLinkingLayer::ConcreteLinkedObject<std::shared_ptr<llvm::RuntimeDyld::MemoryManager>, std::shared_ptr<llvm::JITSymbolResolver>, llvm::orc::RTDyldObjectLinkingLayer::addObject(std::shared_ptr<llvm::object::OwningBinary<llvm::object::ObjectFile> >, std::shared_ptr<llvm::JITSymbolResolver>)::{lambda(std::_List_iterator<std::unique_ptr<llvm::orc::RTDyldObjectLinkingLayerBase::LinkedObject, std::default_delete<llvm::orc::RTDyldObjectLinkingLayerBase::LinkedObject> > >, llvm::RuntimeDyld&, std::shared_ptr<llvm::object::OwningBinary<llvm::object::ObjectFile> > const&, std::function<void ()>)#1}>::getSymbolMaterializer(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda()#1}::operator()() const (this=0x549b44f0) at ./rpm/.dep_create/var/usr/include/llvm/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.h:159
#9  0x000000000b1bc251 in std::_Function_handler<llvm::Expected<unsigned long> (), llvm::orc::RTDyldObjectLinkingLayer::ConcreteLinkedObject<std::shared_ptr<llvm::RuntimeDyld::MemoryManager>, std::shared_ptr<llvm::JITSymbolResolver>, llvm::orc::RTDyldObjectLinkingLayer::addObject(std::shared_ptr<llvm::object::OwningBinary<llvm::object::ObjectFile> >, std::shared_ptr<llvm::JITSymbolResolver>)::{lambda(std::_List_iterator<std::unique_ptr<llvm::orc::RTDyldObjectLinkingLayerBase::LinkedObject, std::default_delete<llvm::orc::RTDyldObjectLinkingLayerBase::LinkedObject> > >, llvm::RuntimeDyld&, std::shared_ptr<llvm::object::OwningBinary<llvm::object::ObjectFile> > const&, std::function<void ()>)#1}>::getSymbolMaterializer(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda()#1}>::_M_invoke(std::_Any_data const&) (__functor=...) at /usr/local/gcc-5.2.0/lib/gcc/x86_64-unknown-linux-gnu/5.2.0/../../../../include/c++/5.2.0/functional:1856
#10 0x000000000b1b8f0e in std::function<llvm::Expected<unsigned long> ()>::operator()() const (this=0x7fd436c157e0) at /usr/local/gcc-5.2.0/lib/gcc/x86_64-unknown-linux-gnu/5.2.0/../../../../include/c++/5.2.0/functional:2271
#11 llvm::JITSymbol::getAddress (this=0x7fd436c157e0) at ./rpm/.dep_create/var/usr/include/llvm/ExecutionEngine/JITSymbol.h:238
#12 0x000000000b1b7e00 in oceanbase::jit::core::ObOrcJit::get_function_address (this=0x7fd45bb14f98, name=...) at ./src/objit/src/core/ob_orc_jit.cpp:108
#13 0x000000000b1d230e in oceanbase::jit::ObLLVMHelper::get_function_address (this=<optimized out>, name=...) at ./src/objit/src/ob_llvm_helper.cpp:589
#14 0x0000000004299594 in oceanbase::pl::ObPLCodeGenerator::generate (this=0x7fd5b1b49f80, pl_func=...) at ./src/pl/ob_pl_code_generator.cpp:6952
#15 0x00000000041cea7a in oceanbase::pl::ObPLCompiler::compile (this=0x7fd436c16a78, id=1101710651032556, func=...) at ./src/pl/ob_pl_compile.cpp:340
#16 0x000000000418ae7b in oceanbase::pl::ObPL::generate_pl_function (this=0xdc84068 <oceanbase::observer::ObServer::get_instance()::THE_ONE+1114152>, ctx=..., proc_id=1101710651032556, routine=@0x7fd436c1ad98: 0x7fd45bb14630, debug_mode=false) at ./src/pl/ob_pl.cpp:1493
#17 0x0000000004188eef in oceanbase::pl::ObPL::get_pl_function (this=0xdc84068 <oceanbase::observer::ObServer::get_instance()::THE_ONE+1114152>, ctx=..., package_guard=..., package_id=-1, routine_id=1101710651032556, subprogram_path=..., debug_mode=false, routine=@0x7fd436c1ad98: 0x7fd45bb14630) at ./src/pl/ob_pl.cpp:1351
#18 0x0000000004177bf0 in oceanbase::pl::ObPL::execute (this=0xdc84068 <oceanbase::observer::ObServer::get_instance()::THE_ONE+1114152>, ctx=..., package_id=18446744073709551615, routine_id=1101710651032556, subprogram_path=..., params=..., nocopy_params=..., result=..., status=0x0, inner_call=false, in_function=false) at ./src/pl/ob_pl.cpp:1112
#19 0x0000000009e0387e in oceanbase::sql::ObCallProcedureExecutor::execute (this=0x7fd436c1d030, ctx=..., stmt=...) at ./src/sql/engine/cmd/ob_routine_executor.cpp:114
#20 0x0000000009da6a58 in oceanbase::sql::ObCmdExecutor::execute (ctx=..., cmd=...) at ./src/sql/executor/ob_cmd_executor.cpp:617
#21 0x0000000009d8fd91 in oceanbase::sql::ObResultSet::open_cmd (this=0x7fd5a8db2420) at ./src/sql/ob_result_set.cpp:79
#22 0x0000000009d92799 in oceanbase::sql::ObResultSet::execute (this=0x7fd5a8db2420) at ./src/sql/ob_result_set.cpp:162
#23 0x0000000009d907b5 in oceanbase::sql::ObResultSet::sync_open (this=0x7fd5a8db2420) at ./src/sql/ob_result_set.cpp:147
#24 0x00000000076f607f in oceanbase::observer::ObSyncCmdDriver::response_result (this=0x7fd436c1f058, result=...) at ./src/observer/mysql/ob_sync_cmd_driver.cpp:47
#25 0x0000000007d03ae0 in oceanbase::observer::ObMPQuery::response_result (this=0x7fd5b1b07130, query_ctx=..., force_sync_resp=false, async_resp_used=@0x7fd436c1f8e1: false) at ./src/observer/mysql/obmp_query.cpp:1261
#26 oceanbase::observer::ObMPQuery::do_process (this=0x7fd5b1b07130, session=..., has_more_result=false, force_sync_resp=false, async_resp_used=@0x7fd436c1f8e1: false, need_disconnect=@0x7fd436c1f8e2: true) at ./src/observer/mysql/obmp_query.cpp:731
#27 oceanbase::observer::ObMPQuery::process_single_stmt (this=0x7fd5b1b07130, multi_stmt_item=..., session=..., has_more_result=false, force_sync_resp=false, async_resp_used=@0x7fd436c1f8e1: false, need_disconnect=@0x7fd436c1f8e2: true) at ./src/observer/mysql/obmp_query.cpp:466
#28 0x0000000007cff3c0 in oceanbase::observer::ObMPQuery::process (this=0x7fd5b1b07130) at ./src/observer/mysql/obmp_query.cpp:258
#29 0x000000000b7c11ae in oceanbase::rpc::frame::ObReqProcessor::run (this=0x7fd5b1b07130) at ./deps/oblib/src/rpc/frame/ob_req_processor.cpp:42
#30 0x0000000007d722cb in oceanbase::omt::ObWorkerProcessor::process_one (this=0xe46e840 <oceanbase::observer::ObServer::get_instance()::THE_ONE+9414656>, req=..., process_ret=@0x7fd436c1fb48: 0) at ./src/observer/omt/ob_worker_processor.cpp:62
#31 0x0000000007d71a12 in oceanbase::omt::ObWorkerProcessor::process (this=0xe46e840 <oceanbase::observer::ObServer::get_instance()::THE_ONE+9414656>, req=...) at ./src/observer/omt/ob_worker_processor.cpp:120
#32 0x0000000007d6d6a2 in oceanbase::omt::ObThWorker::process_request (this=0x7fd6adec2aa0, req=...) at ./src/observer/omt/ob_th_worker.cpp:213
#33 0x0000000007d6ca48 in oceanbase::omt::ObThWorker::run (this=0x7fd6adec2aa0, idx=0) at ./src/observer/omt/ob_th_worker.cpp:343
#34 0x0000000003174038 in oceanbase::lib::CoKThreadTemp<oceanbase::lib::CoUserThreadTemp<oceanbase::lib::CoSetSched> >::start()::{lambda()#1}::operator()() const (this=0x5467ce28) at ./deps/oblib/src/lib/coro/co_user_thread.h:267
#35 0x0000000003173edd in std::_Function_handler<void (), oceanbase::lib::CoKThreadTemp<oceanbase::lib::CoUserThreadTemp<oceanbase::lib::CoSetSched> >::start()::{lambda()#1}>::_M_invoke(std::_Any_data const&) (__functor=...) at /usr/local/gcc-5.2.0/lib/gcc/x86_64-unknown-linux-gnu/5.2.0/../../../../include/c++/5.2.0/functional:1871
#36 0x0000000002bff4ae in std::function<void ()>::operator()() const (this=0x5467ce28) at /usr/local/gcc-5.2.0/lib/gcc/x86_64-unknown-linux-gnu/5.2.0/../../../../include/c++/5.2.0/functional:2271
#37 0x000000000b470aa4 in oceanbase::lib::CoSetSched::Worker::run (this=0x54678b88) at ./deps/oblib/src/lib/coro/co_set_sched.cpp:95
#38 0x000000000b46e6b5 in oceanbase::lib::CoRoutine::__start (from=...) at ./deps/oblib/src/lib/coro/co_routine.cpp:130
#39 0x000000000b46d6ef in make_fcontext () at /data/6/ryan.ly/fix_arm_exception/deps/oblib/src/lib/coro/context/asm/make_x86_64_sysv_elf_gas.S:71
#40 0x0000000000000000 in ?? ()

The backtrace shows that the call comes from LLVM's RuntimeDyld module. At this point the flow seems fairly clear:

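  • When PL is compiled, LLVM's RuntimeDyld calls libgcc's __register_frame_info_bases to register the frame; at that point the frame is placed on unseen_objects.
  • On _Unwind_RaiseException, uw_init_context_1 is called to initialize the unwind context; this is when pc_begin is computed from unseen_objects and the object is moved onto seen_objects.
  • When _Unwind_RaiseException calls uw_frame_state_for to find the frame for a pc, pc_begin is wrong and always greater than pc, so nothing is found.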

Conclusion: it therefore looks as if the begin argument that LLVM's RuntimeDyld passes to libgcc is wrong.
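
The last step of the flow can be illustrated with the addresses from the ARM session. This is a simplified model of the quick range check at line 994 of search_object, assuming that the pc being looked up is the JIT return address 0xfffd0db52188 seen in frame #9 of the ARM backtrace; passes_range_check is a hypothetical helper name.

```c
#include <stdint.h>

/* Simplified model of `if (pc < ob->pc_begin) return NULL;`: an object
   whose lowest covered address lies above pc is rejected before any of
   its FDEs is examined. */
static inline int passes_range_check(uint64_t pc, uint64_t ob_pc_begin)
{
  return pc >= ob_pc_begin;
}
```

With the computed pc_begin 0xffff0db45000 the check fails for that pc, whereas with a pc_begin equal to fp (0xfffd0db45000) it would pass.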
