Classes, Methods and RTTI

 

Abstract

Microsoft Visual C++ is the most widely used compiler for Win32, so it is important for the Win32 reverser to be familiar with its inner workings. Being able to recognize the compiler-generated glue code helps to quickly concentrate on the actual code written by the programmer. It also helps in recovering the high-level structure of the program.

In part II of this 2-part article (see also: Part I: Exception Handling), I will cover how C++ machinery is implemented in MSVC, including class layout, virtual functions and RTTI. Familiarity with basic C++ and assembly language is assumed.

 

Basic Class Layout

To illustrate the following material, let's consider this simple example:

    class A
    {
      int a1;
    public:
      virtual int A_virt1();
      virtual int A_virt2();
      static void A_static1();
      void A_simple1();
    };
    class B
    {
      int b1;
      int b2;
    public:
      virtual int B_virt1();
      virtual int B_virt2();
    };
    class C: public A, public B
    {
      int c1;
    public:
      virtual int A_virt2();
      virtual int B_virt2();
    };

In most cases MSVC lays out classes in the following order:

  1. Pointer to the virtual function table (_vtable_ or _vftable_), added only when the class has virtual methods and no suitable table from a base class can be reused.
  2. Base classes
  3. Class members

Virtual function tables consist of the addresses of virtual methods in the order of their first appearance. Addresses of overridden functions replace the addresses of the corresponding functions from base classes.

Thus, the layouts for our three classes look like the following:

    class A size(8):
        +---
     0  | {vfptr}
     4  | a1
        +---
    A's vftable:
     0  | &A::A_virt1
     4  | &A::A_virt2
    class B size(12):
        +---
     0  | {vfptr}
     4  | b1
     8  | b2
        +---
    B's vftable:
     0  | &B::B_virt1
     4  | &B::B_virt2
    class C size(24):
        +---
        | +--- (base class A)
     0  | | {vfptr}
     4  | | a1
        | +---
        | +--- (base class B)
     8  | | {vfptr}
    12  | | b1
    16  | | b2
        | +---
    20  | c1
        +---
    C's vftable for A:
     0  | &A::A_virt1
     4  | &C::A_virt2
    C's vftable for B:
     0  | &B::B_virt1
     4  | &C::B_virt2

The above diagram was produced by the VC8 compiler using an undocumented switch. To see the class layouts produced by the compiler, use:

  • -d1reportSingleClassLayout, immediately followed by the class name, to see the layout of a single class
  • -d1reportAllClassLayout to see the layouts of all classes (including internal CRT classes)

The layouts are dumped to stdout.

As you can see, C has two vftables, since it inherits from two classes which both already have virtual functions. The address of C::A_virt2 replaces the address of A::A_virt2 in C's vftable for A, and C::B_virt2 replaces B::B_virt2 in the other table.
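
The slot replacement can also be observed from plain C++, without looking at the tables themselves. The following is a minimal sketch: the method bodies and their return values are made up for illustration (the example classes above only declare the methods), and the `call_*` helpers are hypothetical names.

```cpp
// The three example classes, with bodies that return a tag identifying
// which implementation ran. The tags are arbitrary illustration values.
class A {
    int a1 = 0;
public:
    virtual int A_virt1() { return 11; }
    virtual int A_virt2() { return 12; }
};

class B {
    int b1 = 0;
    int b2 = 0;
public:
    virtual int B_virt1() { return 21; }
    virtual int B_virt2() { return 22; }
};

class C : public A, public B {
    int c1 = 0;
public:
    int A_virt2() override { return 32; }  // replaces slot 1 in C's vftable for A
    int B_virt2() override { return 42; }  // replaces slot 1 in C's vftable for B
};

// Calls through a base pointer are dispatched via the vftable that belongs
// to that base subobject.
int call_A_virt1(A* p) { return p->A_virt1(); }
int call_A_virt2(A* p) { return p->A_virt2(); }
int call_B_virt1(B* p) { return p->B_virt1(); }
int call_B_virt2(B* p) { return p->B_virt2(); }
```

Passing the address of a C object to these helpers reaches A::A_virt1 and B::B_virt1 unchanged, while the two overridden slots lead to C's implementations.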

Calling Conventions and Class Methods 

By default, class methods in MSVC use the _thiscall_ convention. The class instance address (_this_ pointer) is passed as a hidden parameter in the ecx register. In the method body the compiler usually tucks it away immediately in some other register (e.g. esi or edi) and/or a stack variable. All further addressing of the class members is done through that register and/or variable. However, when implementing COM classes, the _stdcall_ convention is used. The following is an overview of the various class method types.

1) Static Methods
Static methods do not need a class instance, so they work the same way as common functions. No _this_ pointer is passed to them. Thus it's not possible to reliably distinguish static methods from simple functions. Example:

    A::A_static1();
    call    A::A_static1


2) Simple Methods
Simple methods need a class instance, so the _this_ pointer is passed to them as a hidden first parameter, usually using the _thiscall_ convention, i.e. in the _ecx_ register. When the base object is not situated at the beginning of the derived class, the _this_ pointer needs to be adjusted to point to the actual beginning of the base subobject before calling the function. Example:

    ;pC->A_simple1(1);
    ;esi = pC
    push    1
    mov ecx, esi
    call    A::A_simple1

    ;pC->B_simple1(2,3);
    ;esi = pC
    lea edi, [esi+8] ;adjust this
    push    3
    push    2
    mov ecx, edi
    call    B::B_simple1


As you see, _this_ pointer is adjusted to point to the B subobject before calling B's method.
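
The same adjustment is visible at the source level as a derived-to-base pointer conversion. Below is a small sketch that measures it; the classes are cut-down versions of the example classes and `base_offset` is a hypothetical helper. The concrete offset depends on the compiler and target (8 for the 32-bit MSVC layout shown earlier), so the code only relies on A being placed first, which is an assumption that holds for the common ABIs but is not guaranteed by the language.

```cpp
#include <cstddef>

struct A { int a1 = 0; virtual int A_virt1() { return 1; } };
struct B { int b1 = 0; int b2 = 0; virtual int B_virt1() { return 2; } };
struct C : A, B { int c1 = 0; };

// Byte distance from the start of the complete object to a base subobject.
template <class Base>
std::ptrdiff_t base_offset(C* pC) {
    Base* pBase = pC;  // implicit conversion; the compiler emits the "lea" here
    return reinterpret_cast<char*>(pBase) - reinterpret_cast<char*>(pC);
}
```

For a C object, `base_offset<A>` is expected to be zero and `base_offset<B>` non-zero, which matches the `lea edi, [esi+8]` seen in the listing.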

3) Virtual Methods
To call a virtual method, the compiler first needs to fetch the function address from the _vftable_ and then call the function at that address in the same way as a simple method (i.e. passing the _this_ pointer as an implicit parameter). Example:

    ;pC->A_virt2()
    ;esi = pC
    mov eax, [esi]  ;fetch virtual table pointer
    mov ecx, esi
    call [eax+4]  ;call second virtual method
    
    ;pC->B_virt1()
    ;esi = pC
    lea edi, [esi+8] ;adjust this pointer
    mov eax, [edi]   ;fetch virtual table pointer
    mov ecx, edi
    call [eax]       ;call first virtual method
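
To make the two steps (fetch the table pointer, call through a slot) explicit, here is a hand-written emulation of the dispatch in C++. It is only a sketch: `Obj`, `VFunc`, the `*_impl` functions and the table names are invented for this illustration and do not correspond to anything the compiler emits by name.

```cpp
struct Obj;
using VFunc = int (*)(Obj* self);  // 'this' is passed explicitly

struct Obj {
    const VFunc* vfptr;  // first field, like {vfptr} at offset 0
    int a1;
};

int A_virt1_impl(Obj*)   { return 1; }
int A_virt2_impl(Obj*)   { return 2; }
int C_A_virt2_impl(Obj*) { return 3; }

// One table per class; the derived table reuses slot 0 and replaces slot 1.
const VFunc A_vftable[]       = { A_virt1_impl, A_virt2_impl };
const VFunc C_vftable_for_A[] = { A_virt1_impl, C_A_virt2_impl };

// Rough equivalent of: mov eax,[esi] / mov ecx,esi / call [eax+slot*4]
int call_virtual(Obj* obj, int slot) {
    const VFunc* table = obj->vfptr;  // fetch virtual table pointer
    return table[slot](obj);          // call the slot, passing 'this'
}
```

Two objects of the same layout behave differently only because their first field points to different tables, which is why identifying the vftable written by a constructor is enough to identify the class.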


4) Constructors and Destructors
Constructors and destructors work similarly to a simple method: they get an implicit _this_ pointer as the first parameter (e.g. in ecx in the case of the _thiscall_ convention). A constructor returns the _this_ pointer in eax, even though formally it has no return value.

RTTI Implementation

RTTI (Run-Time Type Identification) is special compiler-generated information which is used to support C++ operators like dynamic_cast<> and typeid(), and also for C++ exceptions. Due to its nature, RTTI is only required (and generated) for polymorphic classes, i.e. classes with virtual functions.

The MSVC compiler puts a pointer to a structure called the "Complete Object Locator" (COL) just before the vftable. The structure is called so because it allows the compiler to find the location of the complete object from a specific vftable pointer (since a class can have several of them). The COL looks like the following:

struct RTTICompleteObjectLocator
{
    DWORD signature; //always zero ?
    DWORD offset;    //offset of this vtable in the complete class
    DWORD cdOffset;  //constructor displacement offset
    struct TypeDescriptor* pTypeDescriptor; //TypeDescriptor of the complete class
    struct RTTIClassHierarchyDescriptor* pClassDescriptor; //describes inheritance hierarchy
};
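
To show how the _offset_ field is used, the following sketch walks from a vfptr back to the start of the complete object. It runs on mock data rather than on real compiler output: `MockCOL`, `MockC`, the tables and `complete_object` are all hypothetical, the method slots are left empty, and the vfptr offsets are taken with `offsetof` so that the example does not depend on 4-byte pointers.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in for RTTICompleteObjectLocator with only the leading fields.
struct MockCOL {
    std::uint32_t signature;
    std::uint32_t offset;    // offset of this vfptr in the complete object
    std::uint32_t cdOffset;
};

// Mock of class C: one vfptr for the A part, one for the B part.
struct MockC {
    const void* const* vfptrA;
    int a1;
    const void* const* vfptrB;
    int b1, b2, c1;
};

const MockCOL colA = { 0, 0, 0 };
const MockCOL colB = { 0, static_cast<std::uint32_t>(offsetof(MockC, vfptrB)), 0 };

// Element 0 plays the role of the pointer "just before the vftable";
// the vfptr stored in the object points at element 1.
const void* const tableA[] = { &colA, nullptr, nullptr };
const void* const tableB[] = { &colB, nullptr, nullptr };

// Given the address of a subobject (which starts with its vfptr),
// recover the address of the complete object.
const void* complete_object(const void* subobject) {
    const void* const* vftable =
        *static_cast<const void* const* const*>(subobject);
    const MockCOL* col = static_cast<const MockCOL*>(vftable[-1]);
    return static_cast<const char*>(subobject) - col->offset;
}
```

Both subobjects of a `MockC` instance map back to the same address, which is the property the runtime relies on when it starts from an arbitrary base pointer.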


Class Hierarchy Descriptor describes the inheritance hierarchy of the class. It is shared by all COLs for a class.

struct RTTIClassHierarchyDescriptor
{
    DWORD signature;      //always zero?
    DWORD attributes;     //bit 0 set = multiple inheritance, bit 1 set = virtual inheritance
    DWORD numBaseClasses; //number of classes in pBaseClassArray
    struct RTTIBaseClassArray* pBaseClassArray;
};


The Base Class Array describes all base classes together with information which allows the compiler to cast the derived class to any of them during execution of the _dynamic_cast_ operator. Each entry (Base Class Descriptor) has the following structure:

//PMD is defined first because RTTIBaseClassDescriptor embeds it by value
struct PMD
{
    int mdisp;  //member displacement
    int pdisp;  //vbtable displacement
    int vdisp;  //displacement inside vbtable
};

struct RTTIBaseClassDescriptor
{
    struct TypeDescriptor* pTypeDescriptor; //type descriptor of the class
    DWORD numContainedBases; //number of nested classes following in the Base Class Array
    struct PMD where;        //pointer-to-member displacement info
    DWORD attributes;        //flags, usually 0
};
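
Because the Base Class Array is a flattened tree, _numContainedBases_ is what allows the nesting to be rebuilt. The sketch below walks such an array recursively and prints one indented line per class. It uses mock entries (`MockBCD`, with a plain name standing in for the TypeDescriptor), and the sample values for class C in the usage note are derived from the layout shown earlier, not dumped from a binary.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Mock Base Class Descriptor holding only the fields needed for the walk.
struct MockBCD {
    const char* name;            // stands in for pTypeDescriptor
    unsigned numContainedBases;  // entries nested under this one
    int mdisp;                   // where.mdisp
};

// Appends the entry at index i and everything nested under it.
// Returns the index of the first entry that is not part of this subtree.
std::size_t dump(const std::vector<MockBCD>& bca, std::size_t i,
                 int depth, std::string& out) {
    const MockBCD& e = bca[i];
    out += std::string(depth * 2, ' ') + e.name + " +" +
           std::to_string(e.mdisp) + "\n";
    std::size_t next = i + 1;
    const std::size_t end = i + 1 + e.numContainedBases;
    while (next < end)
        next = dump(bca, next, depth + 1, out);
    return end;
}

std::string dump_hierarchy(const std::vector<MockBCD>& bca) {
    std::string out;
    if (!bca.empty())
        dump(bca, 0, 0, out);
    return out;
}
```

For class C the array would hold three entries, `{"C", 2, 0}`, `{"A", 0, 0}` and `{"B", 0, 8}`, and the walk lists A and B indented under C.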


The PMD structure describes how a base class is placed inside the complete class. In the case of simple inheritance it is situated at a fixed offset from the start of object, and that value is the _mdisp_ field. If it's a virtual base, an additional offset needs to be fetched from the vbtable. Pseudo-code for adjusting _this_ pointer from derived class to a base class looks like the following:

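    //char* pThis; struct PMD pmd;
    if (pmd.pdisp!=-1)
    {
      pThis += pmd.pdisp;                  //pThis now points to the vbtable pointer
      char *vbtable = *(char**)pThis;      //load the vbtable address
      pThis += *(int*)(vbtable+pmd.vdisp); //add the virtual base offset from the vbtable
    }
    pThis+=pmd.mdisp;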
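
The adjustment can be written as a small runnable function and tried on mock memory. This is a sketch under stated assumptions: the vbtable pointer is stored inside the object at _pdisp_, vbtable entries are int displacements relative to the location of that pointer, and _mdisp_ is added last. `apply_pmd` is a hypothetical name, and `memcpy` is used for the loads only to keep the example free of alignment and aliasing concerns.

```cpp
#include <cstring>

struct PMD {
    int mdisp;  // member displacement
    int pdisp;  // vbtable displacement, -1 if the base is not virtual
    int vdisp;  // displacement inside vbtable
};

// Returns the address of the base subobject described by pmd,
// starting from the address of the complete object.
char* apply_pmd(char* pThis, const PMD& pmd) {
    if (pmd.pdisp != -1) {
        pThis += pmd.pdisp;                              // -> vbtable pointer
        const char* vbtable;
        std::memcpy(&vbtable, pThis, sizeof vbtable);    // load vbtable address
        int delta;
        std::memcpy(&delta, vbtable + pmd.vdisp, sizeof delta);
        pThis += delta;                                  // -> virtual base
    }
    return pThis + pmd.mdisp;
}
```

With a non-virtual base only _mdisp_ matters, so a PMD of {8, -1, 0} simply yields the object address plus 8, as for B inside C.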


For example, the RTTI hierarchy for our three classes looks like this:

    [Figure: RTTI hierarchy for classes A, B and C]

Extracting Information

1) RTTI
If present, RTTI is a valuable source of information for reversing. From RTTI it's possible to recover class names, the inheritance hierarchy, and in some cases parts of the class layout. My RTTI scanner script shows most of that information (see Appendix I).

2) Static and Global Initializers
Global and static objects need to be initialized before the main program starts. MSVC implements that by generating initializer funclets and putting their addresses in a table, which is processed during CRT startup by the _cinit function. The table usually resides at the beginning of the .data section. A typical initializer looks like the following:

    _init_gA1:
        mov     ecx, offset _gA1
        call    A::A()
        push    offset _term_gA1
        call    _atexit
        pop     ecx
        retn
    _term_gA1:
        mov     ecx, offset _gA1
        call    A::~A()
        retn


Thus, from this table we can find out:

  • Global/static objects addresses
  • Their constructors
  • Their destructors
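
At the source level, the initializer/terminator pair corresponds to something like the sketch below. `gA1` is an ordinary global whose funclets the compiler generates; `init_gA2` and `term_gA2` are a hand-written imitation of the same pattern for a second object, with hypothetical names and counters added purely so the effect can be observed.

```cpp
#include <cstdlib>
#include <new>

int g_constructed = 0;  // statically zero-initialized before any constructor runs
int g_destroyed = 0;

struct A {
    A()  { ++g_constructed; }
    ~A() { ++g_destroyed; }
};

A gA1;  // the compiler emits an initializer funclet for this object

// Hand-written counterpart of _init_gA1 / _term_gA1 for a second object
// that lives in raw static storage.
alignas(A) unsigned char gA2_storage[sizeof(A)];
A* gA2 = nullptr;

void term_gA2() { gA2->~A(); }      // mov ecx, offset obj / call A::~A()

void init_gA2() {
    gA2 = new (gA2_storage) A();    // mov ecx, offset obj / call A::A()
    std::atexit(term_gA2);          // push offset term / call _atexit
}
```

By the time `main` starts, `gA1` has already been constructed, while the second object only comes to life once `init_gA2` is called; both destructors run during normal process exit.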

See also MSVC _#pragma_ directive _init_seg_ [5].

3) Unwind Funclets
If any automatic objects are created in a function, the VC++ compiler automatically generates exception handling structures which ensure destruction of those objects in case an exception happens. See Part I for a detailed description of the C++ exception implementation. A typical unwind funclet destructs an object on the stack:

    unwind_1tobase:  ; state 1 -> -1
        lea     ecx, [ebp+a1]
        jmp     A::~A()


By finding the opposite state change inside the function body or just the first access to the same stack variable, we can also find the constructor:

    
    lea     ecx, [ebp+a1]
    call    A::A()
    mov     [ebp+__$EHRec$.state], 1


For objects constructed using _operator new_, the unwind funclet ensures deletion of the allocated memory in case the constructor fails:

    unwind_0tobase: ; state 0 -> -1
        mov     eax, [ebp+pA1]
        push    eax
        call    operator delete(void *)
        pop     ecx
        retn


In the function body:

    ;A* pA1 = new A();
        push    size    ;size of A in bytes
        call    operator new(uint)
        add     esp, 4
        mov     [ebp+pA1], eax
        test    eax, eax
        mov     [ebp+__$EHRec$.state], 0; state 0: memory allocated but object is not yet constructed
        jz      short @@new_failed
        mov     ecx, eax
        call    A::A()
        mov     esi, eax
        jmp     short @@constructed_ok
    @@new_failed:
        xor     esi, esi
    @@constructed_ok:
        mov     [esp+14h+__$EHRec$.state], -1
     ;state -1: either object was constructed successfully or memory allocation failed
     ;in both cases further memory management is done by the programmer


Another type of unwind funclet is used in constructors and destructors. It ensures destruction of the class members in case of an exception. In this case the funclets use the _this_ pointer, which is kept in a stack variable:

    unwind_2to1:
        mov     ecx, [ebp+_this] ; state 2 -> 1
        add     ecx, 4Ch
        jmp     B1::~B1


Here the funclet destructs a class member of type B1 at the offset 4Ch. Thus, from unwind funclets we can find out:

  • Stack variables representing C++ objects or pointers to objects allocated with _operator new_.
  • Their destructors
  • Their constructors
  • In the case of new'ed objects, their size
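
The behaviour that these funclets implement can be reproduced with a constructor that throws. In the sketch below, the counters, the `fail` parameter and `try_create` are invented for the demonstration; the class-specific `operator new` and `operator delete` exist only to make the automatic release of memory observable.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <stdexcept>

int g_member_dtors = 0;
int g_deletes = 0;

struct B1 {
    ~B1() { ++g_member_dtors; }
};

struct A {
    B1 member;  // fully constructed before the body of A::A runs

    explicit A(bool fail) {
        if (fail) throw std::runtime_error("constructor failed");
    }
    static void* operator new(std::size_t n) {
        if (void* p = std::malloc(n)) return p;
        throw std::bad_alloc();
    }
    static void operator delete(void* p) {
        ++g_deletes;
        std::free(p);
    }
};

// Returns true if construction failed. All cleanup on the failure path is
// done by compiler-generated unwind code, not by anything written here.
bool try_create(bool fail) {
    try {
        A* p = new A(fail);
        delete p;
        return false;
    } catch (const std::runtime_error&) {
        return true;
    }
}
```

On the failing path the member is destructed and the memory is released even though no `delete` statement is reached, which corresponds to the two kinds of unwind funclets shown above.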


4) Constructors / Destructors Recursion
This rule is simple: constructors call other constructors (of base classes and member variables) and destructors call other destructors. A typical constructor does the following:

  • Call constructors of the base classes.
  • Call constructors of complex class members.
  • Initialize vfptr(s) if the class has virtual functions
  • Execute the constructor body written by the programmer.

A typical destructor works almost in the reverse order:

  • Initialize vfptr if the class has virtual functions
  • Execute the destructor body written by the programmer.
  • Call destructors of complex class members
  • Call destructors of base classes

Another distinctive feature of destructors generated by MSVC is that their _state_ variable is usually initialized with the highest value and then gets decremented with each destructed subobject, which makes their identification easier. Be aware that simple constructors/destructors are often inlined by MSVC. That's why you can often see the vftable pointer repeatedly reloaded with different pointers in the same function.
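
The ordering rules, and the effect of re-initializing the vfptr at each level, can be checked with a small tracing example. This is a sketch with made-up class names; `who()` is a hypothetical virtual method used only to reveal which vftable is active while each constructor or destructor body runs.

```cpp
#include <string>

std::string g_log;

struct Member {
    Member()  { g_log += "M+ "; }
    ~Member() { g_log += "M- "; }
};

struct Base {
    Base()          { g_log += std::string("B+[") + who() + "] "; }
    virtual ~Base() { g_log += std::string("B-[") + who() + "] "; }
    virtual const char* who() { return "Base"; }
};

struct Derived : Base {
    Member m;
    Derived()           { g_log += std::string("D+[") + who() + "] "; }
    ~Derived() override { g_log += std::string("D-[") + who() + "] "; }
    const char* who() override { return "Derived"; }
};

// Constructs and destroys one Derived object and returns the event log.
std::string trace() {
    g_log.clear();
    { Derived d; }
    return g_log;
}
```

The log shows the base constructor running first with Base's table active, then the member, then the derived body; destruction mirrors it, and the base destructor again sees Base's table because the vfptr is rewritten before its body executes.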

5) Array Construction and Destruction
The MSVC compiler uses a helper function to construct and destroy an array of objects. Consider the following code:

    A* pA = new A[n];
    ...
    delete [] pA;



It is translated into the following pseudocode:

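    array = new char[sizeof(A)*n+sizeof(int)];
    if (array)
    {
      *(int*)array=n; //store array size in the beginning
      'eh vector constructor iterator'(array+sizeof(int),sizeof(A),n,&A::A,&A::~A);
      pA = (A*)(array+sizeof(int)); //pA points to the first element, just past the size
    }
    else
      pA = NULL;
    ...
    count = ((int*)pA)[-1]; //array size is stored just before the first element
    'eh vector destructor iterator'(pA,sizeof(A),count,&A::~A);
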


If A has a vftable, a 'vector deleting destructor' is invoked instead when deleting the array:

    ;pA->'vector deleting destructor'(3);
    mov ecx, pA
    push 3 ; flags: 0x2=deleting an array, 0x1=free the memory
    call A::'vector deleting destructor'


If A's destructor is virtual, it's invoked virtually:

    mov ecx, pA
    push 3
    mov eax, [ecx] ;fetch vtable pointer
    call [eax]     ;call deleting destructor


Consequently, from the vector constructor/destructor iterator calls we can determine:

  • addresses of arrays of objects
  • their constructors
  • their destructors
  • class sizes
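
The scheme described in this section (a count stored in front of the elements, followed by element-wise construction and reverse-order destruction) can be emulated by hand. The sketch below is not the CRT helper: `new_array`, `array_count`, `delete_array` and `kCookie` are hypothetical names, the count is a `size_t` in a slot padded to the maximum alignment (MSVC x86 uses a 4-byte count), and the constructor is assumed not to throw, so the exception cleanup that the 'eh' iterators perform is left out.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

int g_live = 0;

struct A {
    int v;
    A() : v(7) { ++g_live; }
    ~A() { --g_live; }
};

// Space reserved in front of the elements for the element count.
constexpr std::size_t kCookie = alignof(std::max_align_t);
static_assert(kCookie >= sizeof(std::size_t), "cookie must hold the count");

A* new_array(std::size_t n) {
    char* raw = static_cast<char*>(std::malloc(kCookie + sizeof(A) * n));
    if (!raw) return nullptr;
    char* elems = raw + kCookie;
    std::memcpy(elems - sizeof n, &n, sizeof n);  // count just before element 0
    for (std::size_t i = 0; i < n; ++i)           // "vector constructor iterator"
        new (elems + i * sizeof(A)) A();
    return reinterpret_cast<A*>(elems);
}

std::size_t array_count(const A* pA) {
    std::size_t n;
    std::memcpy(&n, reinterpret_cast<const char*>(pA) - sizeof n, sizeof n);
    return n;
}

void delete_array(A* pA) {
    if (!pA) return;
    const std::size_t n = array_count(pA);
    for (std::size_t i = n; i-- > 0; )            // "vector destructor iterator"
        pA[i].~A();                               // destroy in reverse order
    std::free(reinterpret_cast<char*>(pA) - kCookie);
}
```

The pointer handed to the caller is the address of the first element, not of the allocation, which is why the count has to be read at a negative offset when the array is destroyed.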


6) Deleting Destructors
When a class has a virtual destructor, the compiler generates a helper function, the deleting destructor. Its purpose is to make sure that the proper _operator delete_ gets called when destructing a class. Pseudo-code for a deleting destructor looks like the following:

    virtual void * A::'scalar deleting destructor'(uint flags)
    {
      this->~A();
      if (flags&1) A::operator delete(this);
      return this;
    };


The address of this function is placed into the vftable instead of the destructor's address. This way, if another class overrides the virtual destructor, _operator delete_ of that class will be called. In real code, however, _operator delete_ gets overridden quite rarely, so usually you see a call to the default delete(). Sometimes the compiler can also generate a vector deleting destructor. Its code looks like this:

    virtual void * A::'vector deleting destructor'(uint flags)
    {
      if (flags&2) //destructing a vector
      {
        array = ((int*)this)-1; //array size is stored just before the this pointer
        count = array[0];
        'eh vector destructor iterator'(this,sizeof(A),count,&A::~A);
        if (flags&1) A::operator delete(array);
      }
      else {
        this->~A();
        if (flags&1) A::operator delete(this);
      }
      return this;
    };
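
The reason for routing deletion through the vftable can be demonstrated in standard C++: when an object is deleted through a base pointer and the destructor is virtual, the _operator delete_ of the dynamic type is the one that runs. The following sketch uses made-up classes and counters; `destroy` is a hypothetical helper that only sees the base type.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

int g_base_deletes = 0;
int g_derived_deletes = 0;

struct Base {
    virtual ~Base() {}
    static void* operator new(std::size_t n) {
        if (void* p = std::malloc(n)) return p;
        throw std::bad_alloc();
    }
    static void operator delete(void* p) {
        ++g_base_deletes;
        std::free(p);
    }
};

struct Derived : Base {
    int payload[4] = {};
    // operator new is inherited from Base; only delete is replaced.
    static void operator delete(void* p) {
        ++g_derived_deletes;
        std::free(p);
    }
};

// The static type here is Base, yet the deallocation function is chosen
// according to the dynamic type of the object.
void destroy(Base* p) { delete p; }
```

In MSVC-generated code, the `delete p` inside `destroy` becomes a virtual call to the deleting destructor with flags set to 1, as described above.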


I skipped most of the details on the implementation of classes with virtual bases since they complicate things quite a bit and are rather rare in the real world. Please refer to the article by Jan Gray [1]. It's very detailed, if a bit heavy on Hungarian notation. The article [2] describes an example of the virtual inheritance implementation in MSVC. See also some of the MS patents [3] for more details.

 
