All code in this article was run on a 64-bit operating system and compiled with gcc version 4.8.5.
Preliminaries
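A virtual function table (vtable) is really just an array whose elements are each the size of a pointer. Besides pointers to virtual functions, it also stores some auxiliary information, such as top_offset (the offset from the subobject that owns this vptr back to the start of the complete object, used in multiple inheritance), vcall_offset, vbase_offset, and so on.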
No inheritance
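A class has only two kinds of data members, static and non-static, and only three kinds of member functions: static, non-static, and virtual. We use the Base class below as an example to study the object model of a class with no inheritance.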
#include <iostream>
using namespace std;

class Base {
public:
    Base(int i) :baseI(i) {}
    int getI() { return baseI; }
    static void countI() {}
    virtual void print() { cout << "Base::print()" << endl; }
    virtual ~Base(){}
    virtual void printBase() { cout << "Base::printBase()" << endl; }
private:
    int baseI;
    static int baseS;
};
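Now create a Base object b. On a 64-bit system b occupies 16 bytes: the vptr at offset 0, baseI at offset 8, and 4 bytes of padding at offset 12.
This can be verified as follows: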
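int main(int argc, char *argv[])
{
    typedef void (*Fun)();
    Base b(1000);
    cout << "size of b: " << sizeof(b) << endl;
    // vtable slot 0: Base::print(); calling through a plain function pointer passes
    // no this, which works here only because print() never touches this
    Fun fun1 = (Fun)*(long*)*(long*)(&b);
    fun1();
    // slots 1 and 2 hold the two destructors, which cannot be meaningfully called
    // this way, so skip to slot 3: Base::printBase()
    Fun fun2 = (Fun)*((long*)*(long*)(&b) + 3);
    fun2();
    cout << "baseI = " << *((int*)(&b) + 2) << endl;
    // padding bytes are indeterminate; the 0 below is incidental
    cout << "padding = " << *((int*)(&b) + 3) << endl;
    return 0;
}
Output:
size of b: 16
Base::print()
Base::printBase()
baseI = 1000
padding = 0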
Printing the vtable in gdb:
(gdb) set print asm-demangle on
(gdb) p /a (*(void***)&b)[-2]@7
$18 = {0x0, 0x400d60 <typeinfo for Base>, 0x400b82 <Base::print()>, 0x400bac <Base::~Base()>, 0x400bda <Base::~Base()>, 0x400c00 <Base::printBase()>, 0x6573614234}
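Conclusion: without inheritance, the object model consists of the vptr + the object's non-static data members + padding. Static data members and non-virtual or static member functions take no space in the object. Of course, the object contains a vptr only if the class defines or inherits at least one virtual function.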
Why does the vtable contain two destructors?
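One is the complete object destructor and the other is the deleting destructor: the first and second Base::~Base() entries in the gdb output above, respectively. The complete object destructor only runs the destruction and does not free memory. The deleting destructor frees the memory after destruction. When a derived object is destroyed with delete through a base pointer, the derived class's deleting destructor is called. It destroys the object, which runs the parent classes' destructors on their subobjects without freeing anything. It then frees the memory once at the end, so the memory is never freed twice.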
Single inheritance
Reusing the Base class above:
class Derive : public Base
{
public:
    Derive(int d) :Base(1000), DeriveI(d) {}
    // override the parent's virtual function
    virtual void print(void){ cout << "Derive::print()" << endl; }
    // new virtual function declared by Derive
    virtual void printDerive(){ cout << "Derive::printDerive()" << endl; }
    virtual ~Derive(){}
private:
    int DeriveI;
};
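Now create a Derive object d. d occupies 16 bytes: the vptr at offset 0 (shared with the Base subobject), baseI at offset 8, and DeriveI at offset 12.
This can be verified as follows: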
int main(int argc, char *argv[])
{
    typedef void (*Fun)();
    Derive d(2000);
    cout << "size of d: " << sizeof(d) << endl;
    Fun fun1 = (Fun)*(long*)*((long*)(&d));              // slot 0: Derive::print()
    fun1();
    Fun fun2 = (Fun)*((long*)*((long*)(&d)) + 3);        // slot 3: Base::printBase()
    fun2();
    Fun fun3 = (Fun)*((long*)*((long*)(&d)) + 4);        // slot 4: Derive::printDerive()
    fun3();
    cout << "baseI = " << *((int*)(&d) + 2) << endl;
    cout << "DeriveI = " << *((int*)(&d) + 3) << endl;
    return 0;
}
Output:
size of d: 16
Derive::print()
Base::printBase()
Derive::printDerive()
baseI = 1000
DeriveI = 2000
Printing the vtable in gdb:
(gdb) set print asm-demangle on
(gdb) p /a (*(void***)&d)[-2]@8
$2 = {0x0, 0x400f20 <typeinfo for Derive>, 0x400cec <Derive::print()>, 0x400d40 <Derive::~Derive()>, 0x400d7a <Derive::~Derive()>, 0x400c8a <Base::printBase()>, 0x400d16 <Derive::printDerive()>, 0x0}
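Conclusion: under single inheritance, the object model consists of the parent-class subobject + the derived class's own data members + padding. The derived class inherits the parent's vtable layout. If the derived class overrides a parent virtual function, the corresponding slot is overwritten with the derived implementation. In essence, the pointer in that slot is changed to point at the derived version, as slot 0 now holds Derive::print(). If the derived class adds new virtual functions, they are appended to the end of the table, as Derive::printDerive() is in slot 4.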
Questions to consider:
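1. Why is the parent-class subobject placed at the very beginning of the derived object?
Because only then, when you write Derive d; Base* b = &d;, does the compiler not need to adjust the value of b: b already points at an object of type Base, consistent with b's static type. It may not yet be clear why this matters; it will become clear in the multiple inheritance section.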
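2. Why does the derived class inherit the parent's vtable? Essentially, this keeps the parent's and the derived class's virtual functions in the same order in their vtables, which is the foundation of polymorphism. Consider the following function: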
void fun(Base* b) { b->print(); }
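This function relies on polymorphism: which print() gets called is known only at run time. So what can be done at compile time? b->print() is compiled into something like the pseudocode (*b->vptr[0])(b). That is, call the function in slot 0 of the vtable that b's vptr points to, passing b as this. Because print() sits in slot 0 of every vtable in the hierarchy, the same code works whatever the dynamic type of *b is.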
Multiple inheritance
Reusing the Base class above:
class Base2
{
public:
    Base2(int i) :base2I(i) {}
    int getI(){ return base2I; }
    static void countI() {}
    virtual void print(){ cout << "Base2::print()" << endl; }
    virtual ~Base2(){}
    virtual void printBase2(){ cout << "Base2::printBase2()" << endl; }
private:
    int base2I;
    static int base2S;
};

class Derive :public Base, public Base2
{
public:
    Derive(int d) :Base(1000), Base2(2000), deriveI(d) {}
    virtual void print(){ cout << "Derive::print()" << endl; }
    virtual void printDerive(){ cout << "Derive::printDerive()" << endl; }
private:
    int deriveI;
};
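Now create a Derive object d. d occupies 32 bytes, laid out as follows:
- the Base subobject: vptr at offset 0, baseI at offset 8, then 4 bytes of padding;
- the Base2 subobject: vptr at offset 16, base2I at offset 24;
- deriveI at offset 28.
This can be verified as follows: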
int main(int argc, char *argv[])
{
    typedef void (*Fun)();
    Derive d(3000);
    cout << "size of d: " << sizeof(d) << endl;
    Fun fun1 = (Fun)*(long*)*((long*)(&d));              // slot 0: Derive::print()
    fun1();
    Fun fun2 = (Fun)*((long*)*((long*)(&d)) + 3);        // slot 3: Base::printBase()
    fun2();
    Fun fun3 = (Fun)*((long*)*((long*)(&d)) + 4);        // slot 4: Derive::printDerive()
    fun3();
    cout << "baseI = " << *((int*)(&d) + 2) << endl;
    // baseI is followed by 4 bytes of padding (8-byte alignment),
    // so the Base2 vptr starts at byte 16, i.e. long index 2
    Fun fun4 = (Fun)*((long*)*((long*)(&d) + 2) + 3);    // Base2 vtable slot 3: Base2::printBase2()
    fun4();
    cout << "base2I = " << *((int*)(&d) + 6) << endl;
    cout << "deriveI = " << *((int*)(&d) + 7) << endl;
    return 0;
}
Output:
size of d: 32
Derive::print()
Base::printBase()
Derive::printDerive()
baseI = 1000
Base2::printBase2()
base2I = 2000
deriveI = 3000
Printing the vtable in gdb:
(gdb) set print asm-demangle on
(gdb) p /a (*(void***)&d)[-2]@14
$1 = {0x0, 0x401160 <typeinfo for Derive>, 0x400e20 <Derive::print()>, 0x400e7a <Derive::~Derive()>, 0x400eda <Derive::~Derive()>, 0x400cd4 <Base::printBase()>, 0x400e50 <Derive::printDerive()>, 0xfffffffffffffff0, 0x401160 <typeinfo for Derive>, 0x400e4a <non-virtual thunk to Derive::print()>, 0x400ed0 <non-virtual thunk to Derive::~Derive()>, 0x400f00 <non-virtual thunk to Derive::~Derive()>, 0x400d9e <Base2::printBase2()>, 0x0}
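Conclusion: under multiple inheritance, the object model consists of the leftmost parent subobject + the next parent subobject + ... + the derived class's own data members + padding. The derived class inherits the vtables of all its parents and overwrites the slots of the functions it overrides. New virtual functions, however, are appended only to the vtable inherited from the leftmost parent. In the vtables of the other parents, overridden virtual functions are reached through non-virtual thunks.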
"Leftmost" means the class listed first in the inheritance order.
What is a non-virtual thunk?
Let's disassemble a non-virtual thunk, for example 0x400e4a <non-virtual thunk to Derive::print()>:
(gdb) disassemble 0x400e4a
Dump of assembler code for function _ZThn16_N6Derive5printEv:
0x0000000000400e4a <+0>: sub $0x10,%rdi
0x0000000000400e4e <+4>: jmp 0x400e20 <Derive::print()>
End of assembler dump.
Before jumping to the real Derive::print(), the thunk offsets the this pointer by -16. The rdi register holds the this pointer, and -16 is exactly the top_offset stored in the vtable (0xfffffffffffffff0).
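Why is this needed? Take Derive d(3000); Base2* b2 = &d;. To make b2 point at a real Base2 object, the compiler adds +16 to the pointer, the offset of the Base2 subobject inside Derive. When you then call b2->print(), polymorphism means Derive::print() should run. So this must first be adjusted by -16 to point back at the Derive object.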
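So multiple inheritance is rather messy: a non-leftmost parent subobject cannot share a this pointer with the derived object, which is why all these adjustments are needed. Some languages avoid the problem altogether; Java, for example, does not support multiple inheritance of classes.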
A quick check:
int main(int argc, char *argv[])
{
    Derive d(3000);
    Base* b = &d;
    Base2* b2 = &d;
    cout << &d << endl;
    cout << b << endl;
    cout << b2 << endl;
    return 0;
}
Output:
0x7ffd49359c00
0x7ffd49359c00
0x7ffd49359c10
Virtual inheritance
Consider the following code:
#include <iostream>
using namespace std;

class Base
{
public:
    int base;
public:
    Base(int i = 1) :base(i) {}
    virtual void fun() { cout << "Base::fun()" << endl; }
    virtual void funBase() { cout << "Base::funBase()" << endl; }
};

class Derive: virtual public Base
{
public:
    int derive;
public:
    Derive(int i = 100) :derive(i) {}
    virtual void fun() { cout << "Derive::fun()" << endl; }
    virtual void funDerive() { cout << "Derive::funDerive()" << endl; }
};
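Now create a Derive object d. d occupies 32 bytes, laid out as follows:
- Derive's own vptr at offset 0;
- derive at offset 8, followed by 4 bytes of padding;
- the Base virtual base subobject: vptr at offset 16, base at offset 24, followed by 4 bytes of padding.
This can be verified as follows: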
int main(int argc, char *argv[])
{
    typedef void (*Fun)();
    Derive d;
    cout << "size of d: " << sizeof(d) << endl;
    Fun fun1 = (Fun)*(long*)*((long*)(&d));              // Derive vtable slot 0: Derive::fun()
    fun1();
    Fun fun2 = (Fun)*((long*)*((long*)(&d)) + 1);        // Derive vtable slot 1: Derive::funDerive()
    fun2();
    cout << "derive = " << *((int*)(&d) + 2) << endl;
    // Base subobject: vptr at byte 16 (long index 2)
    Fun fun3 = (Fun)*((long*)*((long*)(&d) + 2) + 1);    // Base vtable slot 1: Base::funBase()
    fun3();
    cout << "base = " << *((int*)(&d) + 6) << endl;
    return 0;
}
Output:
size of d: 32
Derive::fun()
Derive::funDerive()
derive = 100
Base::funBase()
base = 1
Printing the vtable in gdb:
(gdb) set print asm-demangle on
(gdb) p /a (*(void***)&d)[-3]@12
$3 = {0x10, 0x0, 0x400da0 <typeinfo for Derive>, 0x400b64 <Derive::fun()>, 0x400b98 <Derive::funDerive()>, 0x0, 0xfffffffffffffff0, 0xfffffffffffffff0, 0x400da0 <typeinfo for Derive>, 0x400b8e <virtual thunk to Derive::fun()>, 0x400aea <Base::funBase()>, 0x0}
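Conclusion: under virtual inheritance, the object model consists of the derived class's vptr + the derived class's data members + the virtual base subobject + padding. The compiler generates a new vptr for the derived class, placed at the start of the object. It points to the derived class's vtable, which contains only Derive's own virtual functions: Derive::fun() and Derive::funDerive(). The virtual base subobject is placed at the end of the object. The first value in the gdb dump, 0x10, is the vbase_offset: the distance from the start of d to that subobject. The virtual base's overridden virtual functions are reached through virtual thunks.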
What is the difference between a virtual thunk and a non-virtual thunk?
The previous section explained the non-virtual thunk; here is the virtual thunk. Disassemble 0x400b8e <virtual thunk to Derive::fun()> directly:
(gdb) disassemble 0x400b8e
Dump of assembler code for function _ZTv0_n24_N6Derive3funEv:
0x0000000000400b8e <+0>: mov (%rdi),%r10
0x0000000000400b91 <+3>: add -0x18(%r10),%rdi
0x0000000000400b95 <+7>: jmp 0x400b64 <Derive::fun()>
End of assembler dump.
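The first two instructions amount to this = this + this->vptr[-3] (-0x18 bytes is slot -3). This is still an adjustment of this. The difference is that the offset is read at run time from the vcall_offset stored in the vtable, instead of being hard-coded in the thunk as -16 was in the non-virtual thunk. This is needed because the position of a virtual base relative to the complete object can differ from one most-derived class to another.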
Diamond inheritance
Consider the following code:
#include <iostream>
using namespace std;

class Base
{
public:
    int base;
public:
    Base(int i = 1) :base(i) {}
    virtual void fun() { cout << "Base::fun()" << endl; }
    virtual void funBase() { cout << "Base::funBase()" << endl; }
};

class Derive1: virtual public Base
{
public:
    int derive1;
public:
    Derive1(int i = 100) :derive1(i) {}
    virtual void fun() { cout << "Derive1::fun()" << endl; }
    virtual void funDerive1() { cout << "Derive1::funDerive1()" << endl; }
};

class Derive2: virtual public Base
{
public:
    int derive2;
public:
    Derive2(int i = 200) :derive2(i) {}
    virtual void fun() { cout << "Derive2::fun()" << endl; }
    virtual void funDerive2() { cout << "Derive2::funDerive2()" << endl; }
};

class Son: public Derive1, public Derive2
{
public:
    int son;
public:
    Son(int i = 1000) :son(i) {}
    virtual void fun() { cout << "Son::fun()" << endl; }
    virtual void funSon() { cout << "Son::funSon()" << endl; }
};
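Now create a Son object s. s occupies 48 bytes, laid out as follows:
- the Derive1 subobject: vptr at offset 0, derive1 at offset 8, then 4 bytes of padding;
- the Derive2 subobject: vptr at offset 16, derive2 at offset 24;
- son at offset 28;
- the shared Base virtual base subobject: vptr at offset 32, base at offset 40, followed by 4 bytes of padding.
This can be verified as follows: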
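int main(int argc, char *argv[])
{
    typedef void (*Fun)();
    Son s;
    cout << "size of s: " << sizeof(s) << endl;
    // Derive1 subobject: vptr at byte 0 (long index 0); long is 8 bytes on 64-bit Linux,
    // the same size as a pointer, so vptrs and slots are read as long
    Fun fun1 = (Fun)*((long*)*(long*)&s);                // slot 0: Son::fun()
    fun1();
    Fun fun2 = (Fun)*((long*)*(long*)&s + 1);            // slot 1: Derive1::funDerive1()
    fun2();
    Fun fun3 = (Fun)*((long*)*(long*)&s + 2);            // slot 2: Son::funSon()
    fun3();
    cout << "derive1 = " << *((int*)&s + 2) << endl;
    // Derive2 subobject: vptr at byte 16 (long index 2)
    Fun fun4 = (Fun)*((long*)*((long*)&s + 2));          // slot 0: thunk to Son::fun()
    fun4();
    Fun fun5 = (Fun)*((long*)*((long*)&s + 2) + 1);      // slot 1: Derive2::funDerive2()
    fun5();
    cout << "derive2 = " << *((int*)&s + 6) << endl;
    cout << "son = " << *((int*)&s + 7) << endl;
    // Base virtual base subobject: vptr at byte 32 (long index 4)
    Fun fun6 = (Fun)*((long*)*((long*)&s + 4) + 1);      // slot 1: Base::funBase()
    fun6();
    cout << "base = " << *((int*)&s + 10) << endl;
    return 0;
}
Output:
size of s: 48
Son::fun()
Derive1::funDerive1()
Son::funSon()
derive1 = 100
Son::fun()
Derive2::funDerive2()
derive2 = 200
son = 1000
Base::funBase()
base = 1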
Printing the vtable in gdb:
(gdb) set print asm-demangle on
(gdb) p /a (*(void***)&s)[-3]@18
$1 = {0x20, 0x0, 0x401160 <typeinfo for Son>, 0x400da8 <Son::fun()>, 0x400c40 <Derive1::funDerive1()>, 0x400de2 <Son::funSon()>, 0x10, 0xfffffffffffffff0, 0x401160 <typeinfo for Son>, 0x400dd2 <non-virtual thunk to Son::fun()>, 0x400cea <Derive2::funDerive2()>, 0x0, 0xffffffffffffffe0, 0xffffffffffffffe0, 0x401160 <typeinfo for Son>, 0x400dd8 <virtual thunk to Son::fun()>, 0x400b96 <Base::funBase()>, 0x0}
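Conclusion: under diamond inheritance with virtual inheritance, the object model consists of the leftmost parent subobject + the next parent subobject + ... + the derived class's own data members + the virtual base subobject + padding. Each parent subobject holds only its non-virtual part. The single shared Base subobject comes last, after son, as the offsets above show.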
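If the diamond does not use virtual inheritance, its object model simply follows the single and multiple inheritance rules above. There is not much to add, except that the derived object then contains two copies of the topmost base class's data.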
Finally, a word on what vbase_offset and vcall_offset are for.
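vbase_offset is the offset from this to the virtual base subobject. It lets this be adjusted to the start of the virtual base subobject, so that virtual functions of the virtual base that were not overridden can be called. For example, for Son* p = &s; p->funBase();, the compiler computes this = this + vbase_offset to reach the virtual base subobject, and can then execute Base::funBase() correctly. The offset is stored in each vtable because the distance to the virtual base differs between subobjects: in s it is 0x20 from the Derive1 part and 0x10 from the Derive2 part.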
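vcall_offset is per function: each virtual function of the virtual base has its own vcall_offset, used to adjust this to the right place. For example, with Base* b = &s; b->fun();, b points at the start of the virtual base subobject inside s. Calling fun() through b should run Son::fun(). So the virtual thunk first computes this = this + vcall_offset (here -0x20), making this point at the start of s, and then calls Son::fun().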