How to Investigate the Exception "Killer" on Windows Embedded CE (1)

weixin_33967071


Dumbfounded? Another crime has been committed

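Whether you are on Windows Desktop, Windows Embedded CE, or Windows Mobile (Phone), and whether you are a developer, a tester, or a user, you are probably all too familiar with application crashes. The sight is a little too "beautiful" to bear.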

If you have a log output window, you will also see the following alongside that screen:

 654322 PID:c8a002a TID:c8b002a Exception 'Raised Exception' (-1): Thread-Id=0c8b002a(pth=8207bbcc), 
  Proc-Id=0c8a002a(pprc=9903e6f0) 'testos_5.exe', VM-active=0c8a002a(pprc=9903e6f0) 'testos_5.exe'
654322 PID:c8a002a TID:c8b002a PC=40042c58(coredll.dll+0x00022c58) RA=88046608(kernel.dll+0x00008608)
  SP=0002f770, BVA=ffffffff

and:

 656644 PID:400002 TID:2e80002   DwXfer!TransferDumpFile: Dump file transfered to local file system,
  Size=0x000419B0, Name=\Windows\System\DumpFiles\Ce053110-01\Ce053110-01.kdmp

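Now we have to catch the killer, but I am a "dummy detective" (a rookie programmer) who doesn't know any advanced techniques. What can I do? Fortunately, operating system technology has had a long time to mature, and postmortem debugging is now very well supported.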

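Within the Microsoft ecosystem the relevant tools are even more foolproof, and much of the Windows Embedded machinery is shared with Windows Desktop. Once the tools are configured with the right parameters, a few commands are often enough to show where the problem lies. If you only want the how-to, just follow the numbered steps in the "Step by step: solving the case" section. In this article, though, I try to broaden the problem and dig deeper, and I know that readers who would rather not explore internals will not be interested in that part. If, like me, you are fascinated by the deeper mechanisms, feel free to add me on QQ: 3423 67 776.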

A digression about Android

If you do deeper Android development, this part will probably make you break into a cold sweat:
How do I obtain crash-data from my Android application?
Capturing Android Exceptions remotely

That should improve soon, though: Android 2.2 already introduces a new Error Reports mechanism:
Android Application Error Reports

Step by step: solving the case

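1. You may get it over the network through the Error Report mechanism mentioned above, or pull it directly from the local device as shown in the figure below. Either way, what we ultimately need is the dump file (the crime scene).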

PS: On Windows (Desktop/Embedded), dump files generally come in three types:

Complete memory dump
Kernel memory dump
Small memory dump

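On CE, choose the Dump File Type that fits the situation. For example, a dump file uploaded to a server over the network must not be too large, whereas stress testing should generate Complete Dumps.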

2. There are many dump analysis tools. Here we download and install Debugging Tools for Windows (WinDbg): http://www.microsoft.com/whdc/devtools/debugging/installx86.mspx#b

3. In WinDbg's File menu, set the Symbol Search Path, the Executable Image Search Path, and the Source Search Path.

PS: PDB stands for Program Database. PDB files were developed by Microsoft to store the symbols used when debugging Windows native code.

As shown in the figure below, set the paths to the PDB files that correspond to each module (EXE, DLL).

In the Source Search Path, add all source code that might be relevant, for example by simply pointing it at "E:\WINCE600".

4. Open the crash dump (File > Open Crash Dump).

5. At this point, if the symbols and source code are reasonably complete, WinDbg will already have located the code that raised the exception.

Now enter the command:

!analyze -v

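The output shows STATUS_INVALID_PARAMETER, which, as the name suggests, means "invalid parameter". Many people get stuck here because they have no further information. They cannot find the root cause, meaning which call passed the invalid parameter, so the case is simply dropped.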

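With call stack information, or even the source code, the problem becomes much easier: we can combine forward and backward reasoning to track down the offending code. (Also, don't forget Platform Builder and its extensions such as CeDebugX, which are extremely powerful; I may cover them in detail in a separate article.) The Deep Dig below is my forward-reasoning process, and it also gives us a peek at the CRT's exception mechanism on CE.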

In fact, this is what I did in the application code:

fseek(0, 0, 0);

Nasty, right? In reality our software hides many such "killers"; they are just better concealed.

STATUS_INVALID_PARAMETER has the value 0xC000000D, and Error_Code shows the ID of the error.

For more STATUS_xxx values, see WINCE600\PUBLIC\COMMON\SDK\INC\ntstatus.h. That's right, it is reused from Windows NT and is identical to NT's version; you can even see that the file is dated 1989.

Deep Dig

Let's start with the fseek function.

WINCE600\private\winceos\coreos\core\corelibc\crtw32\stdio\fseek.c L99

int __cdecl fseek (
    FILEX *stream,
    long offset,
    int whence
    )
{
    int retval;

    _VALIDATE_RETURN((stream != NULL), EINVAL, -1); // parameter validation
    _VALIDATE_RETURN(((whence == SEEK_SET) ||
                      (whence == SEEK_CUR) ||
                      (whence == SEEK_END)), EINVAL, -1);
    /* ... rest of fseek omitted ... */
}

We'll skip over these annoying macros. In the end, the following function in WINCE600\private\winceos\coreos\core\corelibc\crtw32\misc\invarg.c L72 is called:

_CRTIMP void __cdecl _invalid_parameter(
    const wchar_t *pszExpression,
    const wchar_t *pszFunction,
    const wchar_t *pszFile,
    unsigned int nLine,
    uintptr_t pReserved
    )
{
    _invalid_parameter_handler pHandler = __pInvalidArgHandler;

    pszExpression;
    pszFunction;
    pszFile;

    pHandler = (_invalid_parameter_handler) _decode_pointer(pHandler);
    if (pHandler != NULL)
    {
        pHandler(pszExpression, pszFunction, pszFile, nLine, pReserved);
        return;
    }

    __crt_unrecoverable_error(pszExpression, pszFunction, pszFile, nLine, pReserved);// execution reaches here
}

Finally, here is the function you saw raising the exception in WinDbg (WINCE600\PRIVATE\WINCEOS\COREOS\CORE\DLL\crtsupp.cpp L43):

void
__cdecl
__crt_unrecoverable_error(
    const wchar_t *pszExpression,
    const wchar_t *pszFunction,
    const wchar_t *pszFile,
    unsigned int nLine,
    uintptr_t pReserved
    )
    {
    /* Fake an exception to call reportfault. */
    EXCEPTION_RECORD ExceptionRecord;
    CONTEXT ContextRecord;
    EXCEPTION_POINTERS ExceptionPointers;

    (pszExpression);
    (pszFunction);
    (pszFile);
    (nLine);
    (pReserved);

#if defined(_X86_)

    __asm
        {
        mov dword ptr [ContextRecord.Eax], eax
        mov dword ptr [ContextRecord.Ecx], ecx
        mov dword ptr [ContextRecord.Edx], edx
        mov dword ptr [ContextRecord.Ebx], ebx
        mov dword ptr [ContextRecord.Esi], esi
        mov dword ptr [ContextRecord.Edi], edi
        mov word ptr [ContextRecord.SegSs], ss
        mov word ptr [ContextRecord.SegCs], cs
        mov word ptr [ContextRecord.SegDs], ds
        mov word ptr [ContextRecord.SegEs], es
        mov word ptr [ContextRecord.SegFs], fs
        mov word ptr [ContextRecord.SegGs], gs
        pushfd
        pop [ContextRecord.EFlags]
        }

    ContextRecord.ContextFlags = CONTEXT_CONTROL;
#pragma warning(push)
#pragma warning(disable:4311)
    ContextRecord.Eip = (ULONG)_ReturnAddress();
    ContextRecord.Esp = (ULONG)_AddressOfReturnAddress();
#pragma warning(pop)
    ContextRecord.Ebp = *((ULONG *)_AddressOfReturnAddress()-1);

#elif defined(_ARM_) || defined(_MIPS_) || defined(_SHX_)

    _CRT_CAPTURE_CONTEXT(&ContextRecord);// on the emulator the exception is raised here; on a CEPC platform it goes through the x86 branch above

#else

    ZeroMemory(&ContextRecord, sizeof(ContextRecord));

#endif

    ZeroMemory(&ExceptionRecord, sizeof(ExceptionRecord));

    ExceptionRecord.ExceptionCode = STATUS_INVALID_PARAMETER;
    ExceptionRecord.ExceptionAddress = _ReturnAddress();

    ExceptionPointers.ExceptionRecord = &ExceptionRecord;
    ExceptionPointers.ContextRecord = &ContextRecord;

    ReportFault(&ExceptionPointers, 0);

    if (IsDebuggerPresent()) DebugBreak();

    ExitProcess(STATUS_INVALID_PARAMETER);
}

If you are curious how _CRT_CAPTURE_CONTEXT is implemented (in WINCE600\private\winceos\COREOS\core\inc\corecrt.h), it actually works by deliberately causing an access violation (AV) exception:

/*
 * This is a crude way to capture the current CPU context, but it works.
 * It causes an access violation exception and copies the CPU context
 * captured by the OS into the provided context structure.  Debugger
 * exception processing is disabled.  Note that it debugger notifications
 * are reenabled after this block.
 */
#define _CRT_CAPTURE_CONTEXT(pContextRecord)                           \
do {                                                                   \
    DWORD TlsKernBackup = UTlsPtr()[TLSSLOT_KERNEL];                   \
    UTlsPtr()[TLSSLOT_KERNEL] |= TLSKERN_NOFAULT | TLSKERN_NOFAULTMSG; \
    __try                                                              \
    {                                                                  \
        *(unsigned char volatile *)0 = 0;                              \
    }                                                                  \
    __except(memcpy(pContextRecord,                                    \
                    (GetExceptionInformation())->ContextRecord,        \
                    sizeof(*pContextRecord)),                          \
             EXCEPTION_EXECUTE_HANDLER)                                \
    {                                                                  \
    }                                                                  \
    UTlsPtr()[TLSSLOT_KERNEL] = TlsKernBackup;                         \
} while (0)