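1. Programs often hit segmentation faults at run time. On Linux you can load the core dump file into gdb to locate the fault, as described in an earlier post. On other systems gdb is often unavailable, and you can fall back on disassembly to find where the fault happened. When a segmentation fault occurs, the system usually prints the faulting address along with the registers, for example: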
Oops: Data Abort caused by READ instruction!
Fault: Alignment fault
pc: 0021034c
r0: 20000053 r1: 00000001 r2: 00000000 r3: 20000053
r4: aaaaaaaa r5: 00256984 r6: aaaaaaaa r7: dddddddd
r8: aaaaaaaa r9: dddddddd r10: 00060000
fp: 00000000 ip: 00000000 sp: 00240bc0
SPSR: 600000d3
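In the dump above, pc is the program counter, i.e. the address of the instruction that was executing when the fault occurred. Next, disassemble the running program with objdump:
/usr/local/linaro-armv8l-eabi-2017.08-gcc7.1/bin/arm-eabi-objdump -D -S app_demo.elf > a.s
Then search a.s for the pc address to find the function that contains it. objdump normally prints instruction addresses without leading zeros, so search for 21034c: rather than 0021034c. Once the function is known, techniques such as adding log statements can narrow the problem down further.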
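The search in a.s can also be scripted. The following is a minimal sketch, assuming the usual objdump text layout in which each function starts with a header line of the form "address <symbol>:" and each instruction line starts with "address:". The function name find_function_for_pc is made up for this illustration.

```python
import re

# Function header lines look like "00000000000153e0 <symbol_name>:".
SYMBOL_RE = re.compile(r"^([0-9a-fA-F]+) <(.+)>:\s*$")
# Instruction lines look like "   15424:  91021005  add x5, x0, #0x84".
INSN_RE = re.compile(r"^\s*([0-9a-fA-F]+):\s")


def find_function_for_pc(listing, pc):
    """Return (symbol, offset_in_function, line) for the instruction at pc.

    listing is the text produced by objdump; pc is an integer address.
    Returns None when no instruction line carries that address.
    """
    symbol, start = None, None
    for line in listing.splitlines():
        header = SYMBOL_RE.match(line)
        if header:
            # Remember the most recent function header seen so far.
            start, symbol = int(header.group(1), 16), header.group(2)
            continue
        insn = INSN_RE.match(line)
        if insn and symbol is not None and int(insn.group(1), 16) == pc:
            return symbol, pc - start, line.strip()
    return None
```

For the dump above, the call would look like find_function_for_pc(open("a.s").read(), 0x0021034c). The result names the enclosing symbol and the offset of the faulting instruction within it.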
- When a segmentation fault occurs, the libunwind library can be used to print the call stack, for example:
[ ERROR][ 157: am_signal_unwind.cpp: void critical_error_handler()]: #0 : [Function: 0x0000007f938e26b0] sp=0x0000007f2cbf9460 + 0x0
[ ERROR][ 142: am_signal_unwind.cpp: void critical_error_handler()]: #1 : [Function: 0x0000007f9292b024] sp=0x0000007f2cbfa6c0 AMMuxerFmp4Builder::fill_hevc_decoder_configuration_record_box() !!!! BAD Address 0x7c
[ ERROR][ 157: am_signal_unwind.cpp: void critical_error_handler()]: #2 : [Function: 0x0000007f9292b024] sp=0x0000007f2cbfa6c0 AMMuxerFmp4Builder::fill_hevc_decoder_configuration_record_box() + 0x44
[ ERROR][ 157: am_signal_unwind.cpp: void critical_error_handler()]: #3 : [Function: 0x0000007f9292b5f4] sp=0x0000007f2cbfa6f0 AMMuxerFmp4Builder::fill_visual_sample_description_box() + 0x174
[ ERROR][ 157: am_signal_unwind.cpp: void critical_error_handler()]: #4 : [Function: 0x0000007f9292bb78] sp=0x0000007f2cbfa720 AMMuxerFmp4Builder::fill_video_sample_table_box() + 0x28
[ ERROR][ 157: am_signal_unwind.cpp: void critical_error_handler()]: #5 : [Function: 0x0000007f9292c058] sp=0x0000007f2cbfa740 AMMuxerFmp4Builder::fill_video_media_info_box() + 0x38
[ ERROR][ 157: am_signal_unwind.cpp: void critical_error_handler()]: #6 : [Function: 0x0000007f9292c224] sp=0x0000007f2cbfa760 AMMuxerFmp4Builder::fill_video_media_box() + 0x74
[ ERROR][ 157: am_signal_unwind.cpp: void critical_error_handler()]: #7 : [Function: 0x0000007f9292c3f0] sp=0x0000007f2cbfa780 AMMuxerFmp4Builder::fill_video_track_box() + 0x30
[ ERROR][ 157: am_signal_unwind.cpp: void critical_error_handler()]: #8 : [Function: 0x0000007f9292c6b0] sp=0x0000007f2cbfa7a0 AMMuxerFmp4Builder::fill_movie_box() + 0x30
[ ERROR][ 157: am_signal_unwind.cpp: void critical_error_handler()]: #9 : [Function: 0x0000007f9292fe94] sp=0x0000007f2cbfa7c0 AMMuxerFmp4Builder::end_current_fragment() + 0xb4
[ ERROR][ 157: am_signal_unwind.cpp: void critical_error_handler()]: #10: [Function: 0x0000007f92930230] sp=0x0000007f2cbfa7e0 AMMuxerFmp4Builder::end_file() + 0x120
[ ERROR][ 157: am_signal_unwind.cpp: void critical_error_handler()]: #11: [Function: 0x0000007f92854fd8] sp=0x0000007f2cbfa830 _ZN9__gnu_cxx12__to_xstringINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEcEET_PFiPT0_mPKS8_St9__va_listEmSB_z + 0x120
[ ERROR][ 157: am_signal_unwind.cpp: void critical_error_handler()]: #12: [Function: 0x0000007f92855740] sp=0x0000007f2cbfa890 _ZN9__gnu_cxx12__to_xstringINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEcEET_PFiPT0_mPKS8_St9__va_listEmSB_z + 0x120
[ ERROR][ 157: am_signal_unwind.cpp: void critical_error_handler()]: #13: [Function: 0x0000007f9387f64c] sp=0x0000007f2cbfa950 AMThread::static_entry(void*) + 0x6c
[ ERROR][ 157: am_signal_unwind.cpp: void critical_error_handler()]: #14: [Function: 0x0000007f9338c6fc] sp=0x0000007f2cbfa990 __libpthread_freeres + 0x13dc
[ ERROR][ 157: am_signal_unwind.cpp: void critical_error_handler()]: #15: [Function: 0x0000007f9347dfbc] sp=0x0000007f2cbfaaa0 clone + 0x5c
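The frames in a trace like this can be pulled apart mechanically before any manual analysis. The sketch below assumes the exact line layout shown above, which comes from this project's own handler and is not a libunwind standard. The dictionary keys (index, function_field, sp, symbol, offset, bad_address) are labels chosen for this illustration.

```python
import re

# "#2 : [Function: 0x...] sp=0x... <rest of line>"
FRAME_RE = re.compile(
    r"#(\d+)\s*: \[Function: (0x[0-9a-fA-F]+)\] sp=(0x[0-9a-fA-F]+)\s*(.*)$"
)
# "<symbol> + 0x44" (the symbol may be empty, as in frame #0)
OFFSET_RE = re.compile(r"^(.*?)\s*\+ (0x[0-9a-fA-F]+)\s*$")
# "<symbol> !!!! BAD Address 0x7c"
BAD_RE = re.compile(r"^(.*?)\s*!+ BAD Address (0x[0-9a-fA-F]+)\s*$")


def parse_frame(line):
    """Split one trace line into its fields; return None if it is not a frame."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    frame = {
        "index": int(m.group(1)),
        "function_field": int(m.group(2), 16),
        "sp": int(m.group(3), 16),
        "symbol": None,
        "offset": None,
        "bad_address": None,
    }
    tail = m.group(4)
    bad = BAD_RE.match(tail)
    plus = OFFSET_RE.match(tail)
    if bad:
        frame["symbol"] = bad.group(1) or None
        frame["bad_address"] = int(bad.group(2), 16)
    elif plus:
        frame["symbol"] = plus.group(1) or None
        frame["offset"] = int(plus.group(2), 16)
    return frame
```

Feeding each log line to parse_frame gives the symbol and offset pairs needed for the disassembly lookup, plus the bad address reported for frame #1.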
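First, the stack trace shows that the segmentation fault occurred inside fill_hevc_decoder_configuration_record_box(), but it does not tell us which statement is at fault. We can disassemble the function with objdump to get the assembly interleaved with the source, which helps identify the failing statement. The disassembly is as follows: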
AM_STATE AMMuxerFmp4Builder::fill_hevc_decoder_configuration_record_box()
{
  AM_STATE ret = AM_STATE_OK;
  do {
    HEVCDecoderConfigurationRecord& box = m_movie_box.m_video_track.\
        m_media.m_media_info.m_sample_table.m_sample_description.\
        m_visual_entry.m_HEVC_config;
    box.m_base_box.m_enable = true;
    box.m_base_box.m_type = DISOM_HEVC_DECODER_CONFIGURATION_RECORD_TAG;
    box.m_configuration_version = 1;
    box.m_general_profile_space = m_hevc_vps_struct->ptl.general_profile_space;
    box.m_general_tier_flag = m_hevc_vps_struct->ptl.general_tier_flag;
    box.m_general_profile_idc = m_hevc_vps_struct->ptl.general_profile_idc;
    for(uint32_t i = 0; i < 32; ++ i) {
      box.m_general_profile_compatibility_flags |=
          m_hevc_vps_struct->ptl.general_profile_compatibility_flag[i] << (31 - i);
    }
00000000000153e0 <_ZN18AMMuxerFmp4Builder42fill_hevc_decoder_configuration_record_boxEv>:
153e0: d503233f paciasp
153e4: a9bd7bfd stp x29, x30, [sp, #-48]!
153e8: 52800022 mov w2, #0x1 // #1
153ec: 910003fd mov x29, sp
153f0: a90153f3 stp x19, x20, [sp, #16]
153f4: aa0003f4 mov x20, x0
153f8: 528ecd00 mov w0, #0x7668 // #30312
153fc: 72a86c60 movk w0, #0x4363, lsl #16
15400: a9025bf5 stp x21, x22, [sp, #32]
15404: b94b7a83 ldr w3, [x20, #2936]
15408: 392e1a82 strb w2, [x20, #2950]
1540c: 912dc293 add x19, x20, #0xb70
15410: b9002a60 str w0, [x19, #40]
15414: d2800001 mov x1, #0x0 // #0
15418: f9403680 ldr x0, [x20, #104]
1541c: 3900b262 strb w2, [x19, #44]
15420: 528003e6 mov w6, #0x1f // #31
15424: 91021005 add x5, x0, #0x84
15428: b9407c02 ldr w2, [x0, #124]
1542c: 39005e62 strb w2, [x19, #23]
15430: 39457002 ldrb w2, [x0, #348]
15434: 39006262 strb w2, [x19, #24]
15438: b9408002 ldr w2, [x0, #128]
1543c: 39006662 strb w2, [x19, #25]
15440: b86178a2 ldr w2, [x5, x1, lsl #2]
15444: 4b0100c4 sub w4, w6, w1
15448: 91000421 add x1, x1, #0x1
1544c: f100803f cmp x1, #0x20
15450: 1ac42042 lsl w2, w2, w4
15454: 2a020063 orr w3, w3, w2
15458: b9000a63 str w3, [x19, #8]
1545c: 54ffff21 b.ne 15440 <_ZN18AMMuxerFmp4Builder42fill_hevc_decoder_configuration_record_boxEv+0x60> // b.any
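The frame sp=0x0000007f2cbfa6c0 AMMuxerFmp4Builder::fill_hevc_decoder_configuration_record_box() + 0x44 tells us that the problem lies at offset 0x44 from the start of this function. In the disassembly the function starts at 0x153e0, and 0x153e0 + 0x44 = 0x15424, which is the following instruction: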
15424: 91021005 add x5, x0, #0x84
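Combining the source with the member offsets visible in the disassembly, we can infer that m_hevc_vps_struct is most likely null. The add at 15424 does not access memory itself. Its operand x0 was loaded at 15418 (ldr x0, [x20, #104]) and presumably holds m_hevc_vps_struct, since the loads that feed the box fields all use it as their base. The very next instruction, ldr w2, [x0, #124] at 15428, reads from x0 + 0x7c, which is exactly the reported bad address 0x7c when x0 is 0.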
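The address arithmetic above (function start plus the offset from the trace) can be sketched as a small lookup over the objdump text. This is an illustration, not a finished tool: instructions_from, LISTING and SYMBOL are names made up here, and LISTING holds only a few lines copied from the disassembly above. Returning the following instruction as well is useful because the reported offset may not point exactly at the instruction that touched memory.

```python
import re

SYMBOL_RE = re.compile(r"^([0-9a-fA-F]+) <(.+)>:\s*$")
INSN_RE = re.compile(r"^\s*([0-9a-fA-F]+):\s+(.*)$")

SYMBOL = "_ZN18AMMuxerFmp4Builder42fill_hevc_decoder_configuration_record_boxEv"
# A few lines taken from the listing above; a real run would read a.s instead.
LISTING = """\
00000000000153e0 <_ZN18AMMuxerFmp4Builder42fill_hevc_decoder_configuration_record_boxEv>:
   15418: f9403680 ldr x0, [x20, #104]
   15420: 528003e6 mov w6, #0x1f
   15424: 91021005 add x5, x0, #0x84
   15428: b9407c02 ldr w2, [x0, #124]
"""


def instructions_from(listing, mangled_name, offset, count=2):
    """Return up to count (address, text) pairs starting at symbol start + offset."""
    start = None
    found = []
    for line in listing.splitlines():
        header = SYMBOL_RE.match(line)
        if header:
            # Track the start address only while inside the wanted function.
            start = int(header.group(1), 16) if header.group(2) == mangled_name else None
            continue
        insn = INSN_RE.match(line)
        if start is None or insn is None:
            continue
        address = int(insn.group(1), 16)
        if address >= start + offset:
            found.append((address, insn.group(2).strip()))
            if len(found) == count:
                break
    return found


for address, text in instructions_from(LISTING, SYMBOL, 0x44):
    print(hex(address), text)
```

With offset 0x44 the first entry is the add at 0x15424 and the second is the ldr at 0x15428 that reads [x0, #124].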
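The line sp=0x0000007f2cbfa6c0 AMMuxerFmp4Builder::fill_hevc_decoder_configuration_record_box() !!!! BAD Address 0x7c points the same way. 0x7c is far too small to be a valid pointer and looks like an offset: the code accesses a member at offset 0x7c, so it computes base address + 0x7c. Because the base address is 0, the result is the invalid address 0x7c. So check the source for the member whose offset within its struct is 0x7c; the pointer to that struct is most likely null.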
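The "null base plus member offset" reasoning can be made concrete with ctypes. The struct layout below is invented for illustration only: the real layout of the struct behind m_hevc_vps_struct is not shown in this post, so HypotheticalVps simply pads 0x7c bytes in front of one field to reproduce the observed offset.

```python
import ctypes


class HypotheticalVps(ctypes.Structure):
    # Assumed layout, NOT the real struct: 0x7c bytes of other members,
    # followed by the 32-bit field that the faulting load reads.
    _fields_ = [
        ("other_members", ctypes.c_uint8 * 0x7C),
        ("general_profile_space", ctypes.c_uint32),
    ]


def member_address(base, struct_type, member):
    """Address the CPU would access for struct_type.member at the given base."""
    return base + getattr(struct_type, member).offset


def members_at_offset(struct_type, offset):
    """Names of the members of struct_type that start at the given offset."""
    return [
        name
        for name, *_ in struct_type._fields_
        if getattr(struct_type, name).offset == offset
    ]


# With a null base pointer, the access lands on the member offset itself.
print(hex(member_address(0, HypotheticalVps, "general_profile_space")))
print(members_at_offset(HypotheticalVps, 0x7C))
```

Mirroring the real struct definition in the same way (or using offsetof in C or C++) shows which member sits at the offset reported as the bad address.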
Every piece of information printed in the stack trace is useful; combine several of them to work out the likely cause of the error.