How Polymorphism Works

A deep understanding of C makes it easier to see how higher-level languages work. One example is "object orientation", which we can also implement in C:

struct foo {
  int (*bar)(struct foo *this, int a, int b); // function pointer
};

void baz() {
  struct foo *ptr = fetch_object();
  ptr->bar(ptr, 3, 4);
  // equivalent to C++: ptr->bar(3, 4)
}

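Objects in C++ are indeed implemented in (roughly) this way. To implement dynamic binding (calling a subclass's method through a pointer to the parent class), we only need to put the entry points of the virtual functions into a table and look up the function's actual entry address in that table: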

struct object_header {
  void **vptr;
};

struct foo {
  struct object_header header;
  ...
};

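void baz() {
  struct foo *ptr = fetch_object();
  // ptr->bar(3, 4), dynamic binding
  // INDEX_OF_BAR is determined by the compiler at compile time
  // Cast the table entry to the function's type first, then call it
  ((int (*)(void *, int, int)) ptr->header.vptr[INDEX_OF_BAR])(ptr, 3, 4);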
}
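
A minimal sketch of that table lookup, filled in far enough to compile. The names generic_fn, foo_bar, foo_init, call_bar and the scale member are made up for illustration, and the table stores generic function pointers rather than void *, because ISO C does not guarantee that a void * can hold a function pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Generic function-pointer type for table slots (an assumption of this sketch). */
typedef void (*generic_fn)(void);

enum { INDEX_OF_BAR = 0 };   /* slot index; a compiler would assign this */

struct object_header {
    const generic_fn *vptr;  /* points at the table of the object's class */
};

struct foo {
    struct object_header header;
    int scale;               /* hypothetical data member */
};

/* The implementation of bar for struct foo. */
static int foo_bar(void *self, int a, int b)
{
    struct foo *obj = self;
    return obj->scale * (a + b);
}

/* One table per "class", shared by all of its instances. */
static const generic_fn foo_vtable[] = {
    (generic_fn)foo_bar,     /* slot INDEX_OF_BAR */
};

void foo_init(struct foo *f, int scale)
{
    assert(f != NULL);
    f->header.vptr = foo_vtable;
    f->scale = scale;
}

/* Equivalent of the C++ call ptr->bar(a, b) with dynamic binding. */
int call_bar(struct foo *ptr, int a, int b)
{
    /* Convert the slot back to the function's real type before calling. */
    int (*bar)(void *, int, int) =
        (int (*)(void *, int, int))ptr->header.vptr[INDEX_OF_BAR];
    return bar(ptr, a, b);
}
```

After foo_init(&f, 2), the call call_bar(&f, 3, 4) goes through the table and returns 2 * (3 + 4).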

Polymorphism: the core of object-oriented programming. Most modern languages have some concept of interfaces, virtual functions, and classes. Though each language differs in the details, and may have specialized concepts, the core idea remains the same. You define a base class with virtual functions; a derived class can override some, all, or none of those functions. Have you ever stopped to wonder how this works? What overhead, or cost, is involved in such object-oriented programming?

I’ll walk through how polymorphism works. Rather than just explain it, we’ll recreate polymorphism from the ground up. That is, using C, a language without built-in polymorphism, I’ll show how you can create it, and let you discover how languages implement this feature. I won’t jump directly to a full implementation. Instead we’ll go through a logical set of steps that eventually brings us to an implementation used in common compilers.

The Interface

Rather than start at a base class we’ll start with an interface. You’ll quickly see that every base class actually has an implicit interface, thus it seems like the reasonable place to start. Let’s define a basic interface using Java syntax — it is simple enough that anybody can understand it.

interface PolyFunc
{
  public void oneParam( int a );
  public int withReturn( );
}

A class that implements this interface must define both functions. These are the member functions that an implementation defines.

Now consider a function that takes a parameter of type PolyFunc. What is that function actually expecting? Step down to C at this point and consider a function with the signature void someFunc( PolyFunc * pf ). It is a function that takes a pointer to a PolyFunc interface. What is this function expecting behind that pointer? It is an interface with functions, so there must be some way to call those functions. Since a function can be referred to through a pointer, the simplest thing for the interface to be is a set of function pointers.

In C we define the interface as a struct with two members: one for each of the functions. If you haven’t done much C coding you’ll have to excuse the rather ugly syntax for declaring function pointers.

struct PolyFunc
{
  void (*oneParam)(int);
  int (*withReturn)();
};

Does that look right, or does something appear to be missing? In Java, when a member function is called it has access to a variable called this, which points to the current object. In C there are no implicit parameters to functions, so we’ll need an explicit way to communicate this to the function. What type is this? The only type we currently have is PolyFunc, so we’ll have to assume that is the type of the pointer. Let’s redefine our structure.

struct PolyFunc
{
  void (*oneParam)( struct PolyFunc *, int);
  int (*withReturn)( struct PolyFunc *);
};

Not seeing how such functions are invoked leaves this a bit unclear. So here is a function, called someFunc, which does exactly that.

void someFunc( struct PolyFunc * pf )
{
  int r = (pf->withReturn)( pf );
  (pf->oneParam)( pf, r );
}

In C the (pf->withReturn)( pf ); is saying to call the function pointed to by withReturn from the structure pf. It also says to call it with one parameter, the value of pf itself. The second call adds an additional parameter to show how this is just normal function call syntax.

There you have it. PolyFunc now completely defines the interface we need as a base of polymorphism. Our example shows exactly how one would call functions via this interface. We just need an implementation now.

The Base Class

An interface isn’t very useful if nobody implements it. We’ll define our class again in Java as it has a clear syntax for implementing interfaces. We won’t care too much about making a useful class, but rather just something that demonstrates how implementing an interface works.

class PolyBase implements PolyFunc {
  public int value;

  public void oneParam( int a ) { System.out.println( "" + value ); }
  public int withReturn( ) { return value; }
}

That has to be converted into C syntax that represents about the same thing. We know that we have the interface PolyFunc in there somewhere. We also have a variable value. Let’s just create a simple structure which contains both.

struct PolyBase {
  struct PolyFunc interfacePolyFunc;
  int value;
};

Recall that we’ll need a pointer to the PolyFunc interface. This structure has a PolyFunc of which we can take an address. It is also the first member of the structure; this gives us something extra. In C the first member will actually have the same pointer address as the structure itself. This may not seem important now, but remember it for later.
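
The address claim can be checked directly. This is a small sketch; the helper names from_interface and first_member_shares_address are invented for the example and are not part of the article's code.

```c
#include <assert.h>
#include <stddef.h>

struct PolyFunc {
    void (*oneParam)(struct PolyFunc *, int);
    int (*withReturn)(struct PolyFunc *);
};

struct PolyBase {
    struct PolyFunc interfacePolyFunc;  /* first member */
    int value;
};

/* Recover the object from a pointer to its interface. */
struct PolyBase *from_interface(struct PolyFunc *pf)
{
    assert(pf != NULL);
    return (struct PolyBase *)pf;  /* valid because pf points at the first member */
}

/* Returns 1 when the struct and its first member share one address. */
int first_member_shares_address(struct PolyBase *pb)
{
    return (void *)pb == (void *)&pb->interfacePolyFunc
        && offsetof(struct PolyBase, interfacePolyFunc) == 0;
}
```

C guarantees that a pointer to a structure, suitably converted, points to its first member and vice versa, which is why the cast in from_interface is safe only as long as interfacePolyFunc stays first.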

We have to initialize this structure. Before we can initialize it we need to declare each of our functions. We’ll do that all together in the code below. To indicate that a function belongs to our PolyBase class we’ll prefix each name. We won’t define the bodies of the functions just yet; this is known as a forward declaration.

void PolyBase_oneParam( struct PolyFunc * pf, int a );
int PolyBase_withReturn( struct PolyFunc * pf );

void init_PolyBase( struct PolyBase * pb) {
  pb->interfacePolyFunc.oneParam = &PolyBase_oneParam;
  pb->interfacePolyFunc.withReturn = &PolyBase_withReturn;
}

It may look complicated, but we actually haven’t done much. First we declared the functions that will be our member functions in the PolyBase class. The second step is far more important: it is the first part of the actual polymorphism. Here we have populated the PolyFunc struct in a PolyBase with pointers to our member functions. Now, if some code would like to instantiate a PolyBase object it could do so as follows.

struct PolyBase pb;
init_PolyBase( &pb );
pb.value= 123;
someFunc( &pb.interfacePolyFunc );

In C that declares an instance of PolyBase on the stack and calls the initializer. Once initialized we can pass the pointer to our previously defined someFunc function. We have explicitly passed a pointer to the interface variable to show what we are doing. The point was made previously however that &pb.interfacePolyFunc and &pb will actually be the same address. This is an extremely important point to consider. We have to look at the body of our member functions to understand why.

Consider the signature for our second member function, int PolyBase_withReturn( struct PolyFunc * pf ). The pf parameter is a pointer to PolyFunc and not to PolyBase. Yet we can foresee that our member functions, if they intend to access the value variable, will need a pointer to PolyBase. Let’s use what we know and define our function.

int PolyBase_withReturn( struct PolyFunc * pf) {
  struct PolyBase * this = (struct PolyBase*)pf;
  return this->value;
}

That was actually quite easy. Since we initialized the interface we know that callers to this function actually have a pointer to PolyBase.interfacePolyFunc. We also know from before that this will actually be the exact same pointer as the PolyBase object itself. Thus we can statically cast it to our type and we have our familiar this pointer.

That’s everything we need to define classes implementing the PolyFunc interface. You can see that we could easily define any number of PolyBase like classes, each with their own member functions. The next step is to show how we can derive a class from PolyBase.
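
To make the "any number of classes" point concrete, here is a sketch that puts the pieces so far into one file and adds a second, unrelated implementation. PolyCounter and its functions are hypothetical, and setting value to 0 in init_PolyBase is an addition the article does not make.

```c
#include <assert.h>
#include <stdio.h>

struct PolyFunc {
    void (*oneParam)(struct PolyFunc *, int);
    int (*withReturn)(struct PolyFunc *);
};

/* The caller from the article: it knows only the interface. */
void someFunc(struct PolyFunc *pf)
{
    int r = pf->withReturn(pf);
    pf->oneParam(pf, r);
}

/* PolyBase, as in the article. */
struct PolyBase {
    struct PolyFunc interfacePolyFunc;
    int value;
};

static void PolyBase_oneParam(struct PolyFunc *pf, int a)
{
    struct PolyBase *self = (struct PolyBase *)pf;
    (void)a;                        /* the Java version prints value, not a */
    printf("%d\n", self->value);
}

static int PolyBase_withReturn(struct PolyFunc *pf)
{
    struct PolyBase *self = (struct PolyBase *)pf;
    return self->value;
}

void init_PolyBase(struct PolyBase *pb)
{
    assert(pb != NULL);
    pb->interfacePolyFunc.oneParam = PolyBase_oneParam;
    pb->interfacePolyFunc.withReturn = PolyBase_withReturn;
    pb->value = 0;                  /* added here; not in the article */
}

/* PolyCounter: a hypothetical second class implementing PolyFunc. */
struct PolyCounter {
    struct PolyFunc interfacePolyFunc;
    int calls;
};

static void PolyCounter_oneParam(struct PolyFunc *pf, int a)
{
    (void)pf;
    printf("count=%d\n", a);
}

static int PolyCounter_withReturn(struct PolyFunc *pf)
{
    struct PolyCounter *self = (struct PolyCounter *)pf;
    return ++self->calls;           /* counts how often it was asked */
}

void init_PolyCounter(struct PolyCounter *pc)
{
    assert(pc != NULL);
    pc->interfacePolyFunc.oneParam = PolyCounter_oneParam;
    pc->interfacePolyFunc.withReturn = PolyCounter_withReturn;
    pc->calls = 0;
}
```

someFunc accepts &pb.interfacePolyFunc and &pc.interfacePolyFunc alike; which functions run depends only on the pointers stored in the object it receives.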

A Derived Class

One of the key features of polymorphism is virtual functions. A derived class should be able to override functions in a base class, specializing its behaviour. Continuing with our PolyBase example we’ll now create a class called PolyDerived. First we’ll start with what this looks like in Java.

class PolyDerived extends PolyBase {
  String prefix;

  PolyDerived() { prefix = "ABC:"; }
  public void oneParam( int a ) { System.out.println( prefix + a ); }
}

In this derived class oneParam is overridden to write the value with a prefix. Notice that a constructor has also been added in this class; when we convert to C you’ll see this isn’t really much of a change. Since we haven’t introduced any new functions, PolyDerived simply implements the PolyFunc interface. Converting to C we get the code below.

#include <string.h> /* for strcpy, used in init_PolyDerived */

struct PolyDerived {
  struct PolyBase base_PolyBase;
  char prefix[20];
};

void PolyDerived_oneParam( struct PolyFunc * pf, int a );

void init_PolyDerived( struct PolyDerived * pd ) {
  init_PolyBase( (struct PolyBase*)pd );
  pd->base_PolyBase.interfacePolyFunc.oneParam = &PolyDerived_oneParam;

  strcpy( pd->prefix, "ABC:" );
}

The derived structure PolyDerived simply contains the PolyBase structure as its first member. Recall that tip about the pointers in PolyBase? In this case we have three equivalent pointers: the pointer to PolyDerived is also a pointer to PolyBase, which is in turn a pointer to the interface PolyFunc. Notice how the derived init function simply calls the base class init function and overrides a single function pointer.

In addition to the base class we’ve also defined the additional member prefix. It is initialized in the init_PolyDerived function. The base class doesn’t initialize any members, though it could, and probably should, initialize the value member. In this regard init_PolyDerived has become the constructor for our class.

That completes the derived class. Quite easy, wasn’t it? Now we can simply use the same code we had before and substitute PolyDerived for PolyBase. someFunc will now call one function in the base class and one in the derived class, exactly as you’d expect virtual functions to work.

struct PolyDerived pd;
init_PolyDerived( &pd );
pd.base_PolyBase.value= 123;
someFunc( (struct PolyFunc*)&pd );
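
For readers who want to try this, the fragments above can be assembled into a single file roughly as follows. This is a sketch: the function bodies follow the Java versions, and nothing beyond the article's own names is introduced.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct PolyFunc {
    void (*oneParam)(struct PolyFunc *, int);
    int (*withReturn)(struct PolyFunc *);
};

struct PolyBase {
    struct PolyFunc interfacePolyFunc;
    int value;
};

struct PolyDerived {
    struct PolyBase base_PolyBase;   /* base class first */
    char prefix[20];
};

static void PolyBase_oneParam(struct PolyFunc *pf, int a)
{
    struct PolyBase *self = (struct PolyBase *)pf;
    (void)a;                         /* the Java version prints value, not a */
    printf("%d\n", self->value);
}

static int PolyBase_withReturn(struct PolyFunc *pf)
{
    struct PolyBase *self = (struct PolyBase *)pf;
    return self->value;
}

static void PolyDerived_oneParam(struct PolyFunc *pf, int a)
{
    struct PolyDerived *self = (struct PolyDerived *)pf;
    printf("%s%d\n", self->prefix, a);
}

void init_PolyBase(struct PolyBase *pb)
{
    assert(pb != NULL);
    pb->interfacePolyFunc.oneParam = PolyBase_oneParam;
    pb->interfacePolyFunc.withReturn = PolyBase_withReturn;
}

void init_PolyDerived(struct PolyDerived *pd)
{
    assert(pd != NULL);
    init_PolyBase(&pd->base_PolyBase);          /* base "constructor" first */
    pd->base_PolyBase.interfacePolyFunc.oneParam = PolyDerived_oneParam;
    strcpy(pd->prefix, "ABC:");                 /* fits in prefix[20] */
}

void someFunc(struct PolyFunc *pf)
{
    int r = pf->withReturn(pf);   /* resolves to PolyBase_withReturn */
    pf->oneParam(pf, r);          /* resolves to PolyDerived_oneParam */
}
```

With value set to 123, passing the derived object to someFunc prints ABC:123: the value comes from the inherited withReturn and the prefix from the overridden oneParam.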

First Review

In this article we’ve learned a way to implement simple polymorphism in C. The intent is to expose what is required to get virtual function calls working in a language like Java, or C++. I do not know of any compiler which actually implements virtual functions this way. We’ll get to the reason in the next article about virtual tables. Nonetheless the techniques you’ll encounter are quite similar to this. They simply improve on some aspects, and offer more features.

If you want to analyze the cost of virtual tables this is your starting point. Rather than making a simple function call you see that you first have to look up the value of the function pointer. You also need to pass a this pointer, but that isn’t really an additional cost: you would presume a non-virtual function version would also need a pointer to a data structure. In this simple case we’ve seen that conversion of the this pointer itself is zero-cost, so no overhead there. Our biggest overhead is the size of our structures, and that is what we’ll get to in the next article.

In the article on “How Polymorphism Works: Part 1” we learned how to create virtual functions. The method we chose has one significant problem: it requires a lot of memory and repeated initialization. In this article we will look at a way to improve on the situation and bring us closer to a modern vtable implementation.

Previous Problem

In our rather simple virtual function implementation we simply put a pointer to each virtual function inside the data for an instance of a class. This looked like the below C structures.

struct PolyFunc {
  void (*oneParam)(struct PolyFunc *,int);
  int (*withReturn)(struct PolyFunc *);
};
 
struct PolyBase {
  struct PolyFunc interfacePolyFunc;
  int value;
};

This means that for each instance of PolyBase we have a complete copy of the member function pointers. That seems like a lot of wasted memory, especially for small objects. It also seems unusual that every additional virtual function would increase the size of an object. Additionally, each time we initialize a new object we have to set all of those pointers. While it may not seem like much for a tiny class, this will add up and create a performance issue.
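
The growth is easy to measure. In this sketch WideFunc and WideBase are hypothetical: they are the same layout with two more virtual functions added, so the two object sizes can be compared.

```c
#include <assert.h>
#include <stddef.h>

/* The article's interface: two virtual functions. */
struct PolyFunc {
    void (*oneParam)(struct PolyFunc *, int);
    int (*withReturn)(struct PolyFunc *);
};

/* Hypothetical wider interface: four virtual functions. */
struct WideFunc {
    void (*oneParam)(struct WideFunc *, int);
    int (*withReturn)(struct WideFunc *);
    void (*extraA)(struct WideFunc *);
    void (*extraB)(struct WideFunc *);
};

struct PolyBase { struct PolyFunc interfacePolyFunc; int value; };
struct WideBase { struct WideFunc interfaceWideFunc; int value; };

/* Bytes that every single instance spends on function pointers alone. */
size_t PolyBase_pointer_bytes(void) { return sizeof(struct PolyFunc); }
size_t WideBase_pointer_bytes(void) { return sizeof(struct WideFunc); }

/* Each extra virtual function makes every instance larger. */
void check_growth(void)
{
    assert(sizeof(struct WideBase) > sizeof(struct PolyBase));
}
```

Both classes hold a single int of real data. With 8-byte pointers and a 4-byte int the two structures would typically come to 24 and 40 bytes, though the exact figures depend on the platform.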

A Virtual Table

Knowing that the pointers to virtual functions are the same for every instance of the class it seems like we should be able to create that list of pointers just once. That is, we should only ever need one instance of PolyFunc per implementing class. Easily done, we simply modify the definition of PolyBase.

struct PolyBase {
  struct PolyFunc * interfacePolyFunc;
  int value;
};

Now, instead of setting the pointers each time we initialize this class we can define a single global singleton and assign the pointer at construction time.

struct PolyFunc PolyBase_VTable = {
  &PolyBase_oneParam,
  &PolyBase_withReturn
};
 
void init_PolyBase( struct PolyBase * pb ) {
  pb->interfacePolyFunc = &PolyBase_VTable;
}

We’ve called this global instance a VTable, short for Virtual Table, and you’ll find it called that quite often. The C initialization syntax is used here to set each member to the address of the function. With this setup we can add any number of virtual functions and never increase the size of an object nor increase the initialization time.
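
A short sketch of what the table buys us. The member functions are reduced to placeholders here, because how they reach the object is the subject of the next section, and share_one_table is a helper invented for the example.

```c
#include <assert.h>
#include <stddef.h>

struct PolyFunc {
    void (*oneParam)(struct PolyFunc *, int);
    int (*withReturn)(struct PolyFunc *);
};

struct PolyBase {
    struct PolyFunc *interfacePolyFunc;  /* pointer to the shared table */
    int value;
};

/* Placeholder bodies: they deliberately do not touch the object. */
static void PolyBase_oneParam(struct PolyFunc *pf, int a) { (void)pf; (void)a; }
static int PolyBase_withReturn(struct PolyFunc *pf) { (void)pf; return 0; }

/* One table for the whole class. */
struct PolyFunc PolyBase_VTable = {
    PolyBase_oneParam,
    PolyBase_withReturn
};

/* Initialization is now a single pointer store. */
void init_PolyBase(struct PolyBase *pb)
{
    assert(pb != NULL);
    pb->interfacePolyFunc = &PolyBase_VTable;
}

/* Returns 1 when both instances refer to the same table. */
int share_one_table(const struct PolyBase *a, const struct PolyBase *b)
{
    return a->interfacePolyFunc == b->interfacePolyFunc;
}
```

However many objects are created, only one PolyBase_VTable exists, and adding a function to PolyFunc changes the size of the table but not of struct PolyBase.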

The Callee

We have a bit of a problem however. The function we are calling is expecting a pointer to the PolyFunc struct, but it is also expecting that pointer to be a pointer to the instance itself. Seen in isolation you may not notice a problem, so below it is shown alongside the calling code and the member function.

void someFunc( struct PolyFunc * pf) {
  int r = (pf->withReturn)( pf );
  (pf->oneParam)( pf, r );
}
 
int PolyBase_withReturn( struct PolyFunc * pf ) {
  struct PolyBase * this = (struct PolyBase*)pf;
  return this->value;
}
 
void calling() {
  struct PolyBase pb;
  init_PolyBase( &pb );
  someFunc( pb.interfacePolyFunc ); //No & here!
}

Look at calling first. Previously we took the address of interfacePolyFunc since that is what someFunc expects. Now, since we already have a pointer, we shouldn’t need to do that. This seems okay so far. Inside the someFunc code the lookup of the function pointer will actually work: the PolyFunc object we’ve provided is valid in this regard. The trouble starts with the parameter we pass to the function.

In PolyBase_withReturn the first line expects the PolyFunc pointer to be directly convertible to a PolyBase pointer. This is no longer valid and this function will now fail: it may cause the program to crash, or silently return an invalid value. The problem started with the call to someFunc. By passing the value of the virtual table directly we’ve completely lost the reference to the original object. We need a way to pass the original object and the virtual table.
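
The lost reference can be shown without calling anything. This sketch only compares addresses; passes_table_not_object is a made-up helper and the table entries are left empty because they are never used.

```c
#include <assert.h>
#include <stddef.h>

struct PolyFunc {
    void (*oneParam)(struct PolyFunc *, int);
    int (*withReturn)(struct PolyFunc *);
};

struct PolyBase {
    struct PolyFunc *interfacePolyFunc;
    int value;
};

/* Entries are irrelevant here; only the table's address matters. */
static struct PolyFunc PolyBase_VTable = { NULL, NULL };

/* Returns 1 when the pointer that calling() hands to someFunc refers to
   the shared table and not to the object it was read from. */
int passes_table_not_object(const struct PolyBase *pb)
{
    assert(pb != NULL);
    return (const void *)pb->interfacePolyFunc == (const void *)&PolyBase_VTable
        && (const void *)pb->interfacePolyFunc != (const void *)pb;
}
```

Every instance would hand over the same table address, so a member function that casts that pointer to struct PolyBase * ends up reading the table's bytes as if they were an object.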

Pointer to Pointer

We could simply pass two pointers, but this introduces a slight overhead in every function call that we make and we’d like to avoid that. It may not be a terrible idea however, as there are likely situations where it is the better thing to do. In fact there is a very common situation where something like this is done: with callback functions. Within many APIs you’ll see functions with signatures that include a callback function and a user-defined value to provide to that function.

void FuncWithCallback( void (*callback)( int ), int userValue ) {
  callback( userValue );
}

This is pretty much the standard C way of doing callback functions. You provide both an explicit function and the data to pass to that function. Here only a single function pointer is being passed rather than a table of functions, but that is rather a small detail. Since this method is used often you can’t completely discount the two parameter approach. In our general case however we will try to stick with one parameter.
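
As a usage sketch of the two-parameter style: FuncWithCallback is the article's function, while FuncWithContext, Accumulator, remember and add_ten are assumptions added here to show the frequently seen variant in which the user value is a void * pointing at caller-owned state.

```c
#include <assert.h>
#include <stddef.h>

/* As in the article: the API takes a function and a value to hand back to it. */
void FuncWithCallback(void (*callback)(int), int userValue)
{
    callback(userValue);
}

/* Variant (not from the article): the user value is a void *, so the
   callback can reach arbitrary state owned by the caller. */
void FuncWithContext(void (*callback)(void *), void *userData)
{
    assert(callback != NULL);
    callback(userData);
}

struct Accumulator { int total; };   /* hypothetical caller state */

static int last_seen = 0;

/* Callback for FuncWithCallback: records the value it was given. */
static void remember(int v)
{
    last_seen = v;
}

/* Callback for FuncWithContext: the void * plays the role of "this". */
static void add_ten(void *userData)
{
    struct Accumulator *acc = userData;
    acc->total += 10;
}
```

Calling FuncWithContext(add_ten, &acc) is the two-pointer approach in miniature: one pointer says what to run, the other says which data to run it on.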

In order to preserve the reference to the original object and provide a pointer to the virtual table someFunc is going to have to expect something slightly different as a parameter. Let’s first start by making it expect a pointer to the pointer.

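void someFunc( struct PolyFunc * * pf ) {
  // The member functions still take a struct PolyFunc *, so the
  // pointer-to-pointer is cast explicitly; it holds the object's address.
  int r = ((*pf)->withReturn)( (struct PolyFunc *)pf );
  ((*pf)->oneParam)( (struct PolyFunc *)pf, r );
}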
 
void calling() {
  struct PolyBase pb;
  init_PolyBase( &pb );
  someFunc( &pb.interfacePolyFunc );
}

This works! The calling code is exactly as we had in the last article and someFunc needs only to dereference the pointer. Again, since interfacePolyFunc is the first member in the PolyBase structure it actually has the same address as the instance itself. This allows the member functions to trivially cast this pointer back into their own type as they did before.
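
Here is one way the whole scheme might look with the types kept consistent. It assumes the function pointers in PolyFunc are redeclared to take struct PolyFunc **, a step the article does not spell out; callWithReturn is a helper added for illustration, as is setting value to 0 in init_PolyBase.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: member functions receive the pointer-to-pointer directly. */
struct PolyFunc {
    void (*oneParam)(struct PolyFunc **, int);
    int (*withReturn)(struct PolyFunc **);
};

struct PolyBase {
    struct PolyFunc *interfacePolyFunc;  /* vtable pointer, first member */
    int value;
};

static void PolyBase_oneParam(struct PolyFunc **pf, int a)
{
    /* pf is the address of the first member, hence of the object. */
    struct PolyBase *self = (struct PolyBase *)pf;
    (void)a;                             /* the Java version prints value, not a */
    printf("%d\n", self->value);
}

static int PolyBase_withReturn(struct PolyFunc **pf)
{
    struct PolyBase *self = (struct PolyBase *)pf;
    return self->value;
}

/* One table for the whole class. */
static struct PolyFunc PolyBase_VTable = {
    PolyBase_oneParam,
    PolyBase_withReturn
};

void init_PolyBase(struct PolyBase *pb)
{
    assert(pb != NULL);
    pb->interfacePolyFunc = &PolyBase_VTable;
    pb->value = 0;                       /* added here; not in the article */
}

/* One dereference finds the table, then the call proceeds as before. */
void someFunc(struct PolyFunc **pf)
{
    int r = (*pf)->withReturn(pf);
    (*pf)->oneParam(pf, r);
}

/* Hypothetical helper: a single virtual call through the table. */
int callWithReturn(struct PolyFunc **pf)
{
    return (*pf)->withReturn(pf);
}
```

The caller passes &pb.interfacePolyFunc, exactly as in calling() above; the callee gets both the table (by dereferencing) and the object (by casting) from that one pointer.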

The use of a pointer to a pointer may not look so clean however. Here C provides many options to make this cleaner. We could simply make a typedef of this type, or we could instead make a new structure which contains a pointer as its only member. Whichever way we choose won’t alter the actual semantics of the code, so we won’t go over it here.

The Common VTable

This implementation of a virtual table is a rather common one. What essentially defines it is that the first member of every object instance is a pointer to its virtual table. It reduces the overhead of instantiating classes and reduces the amount of memory used. The tradeoff is that callers need one additional dereference before they can call the function. The cost of this dereference is low, generally a single memory load that is cheap when the table is already in cache. You’d generally find this cost to be insignificant in most programs.

GCC, MSVC, and many other compilers use this setup for C++ virtual tables. In fact Microsoft defined this as the binary standard for C++ interfaces on Windows at one point. I’d suspect it is still the standard in use today. This doesn’t mean a compiler has to use this method (well, on Windows it kind of does). Ultimately any compiler will need to maintain a list of function tables somewhere. Knowing how this works for C++ will be a good step to understanding whatever other method you may encounter.
