How Polymorphism Works

Latest recommended article published on 2024-09-03 18:59:13

IAMZWH

Article link: https://blog.csdn.net/weixin_41365236/article/details/116720719

A deep understanding of C helps reveal how higher-level languages work. One example is object orientation, which we can also implement in C:

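struct foo {
  int (*bar)(struct foo *this, int a, int b); // function pointer
};

struct foo *fetch_object(void); // assumed to be defined elsewhere

void baz(void) {
  struct foo *ptr = fetch_object();
  ptr->bar(ptr, 3, 4);
  // equivalent to the C++ call ptr->bar(3, 4)
}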

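C++ objects are in fact implemented in (roughly) this way. To implement dynamic binding (calling a derived class’s method through a base-class pointer), we only need to put the entry points of the virtual functions into a table and look up the function’s actual entry address in that table: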

struct object_header {
  void **vptr;
};

struct foo {
  struct object_header header;
  ...
};

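void baz(void) {
  struct foo *ptr = fetch_object();
  // ptr->bar(3, 4), dynamic binding
  // INDEX_OF_BAR is determined by the compiler at compile time
  // The cast must wrap the table entry, not the result of the call
  ((int (*)(void *, int, int)) ptr->header.vptr[INDEX_OF_BAR])(ptr, 3, 4);
}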
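To make the table lookup concrete, here is a minimal self-contained sketch. The names bar_fn, foo_bar, foo_vtable, foo_init, call_bar, and the scale field are made up for illustration. The table stores typed function pointers instead of void *, an assumption chosen so that no cast between object pointers and function pointers is needed.

```c
struct object_header;

/* Hypothetical signature shared by every slot of the table in this sketch. */
typedef int (*bar_fn)(struct object_header *self, int a, int b);

struct object_header {
    const bar_fn *vptr;           /* points at the class's table of entry points */
};

struct foo {
    struct object_header header;  /* must stay the first member */
    int scale;                    /* illustrative data member */
};

enum { INDEX_OF_BAR = 0 };        /* slot number, fixed at compile time */

static int foo_bar(struct object_header *self, int a, int b)
{
    /* header is the first member, so the two pointers share an address */
    struct foo *this = (struct foo *)self;
    return this->scale * (a + b);
}

/* One table per "class", shared by all of its instances. */
static const bar_fn foo_vtable[] = { foo_bar };

void foo_init(struct foo *f, int scale)
{
    f->header.vptr = foo_vtable;
    f->scale = scale;
}

/* Plays the role of the C++ call ptr->bar(a, b) with dynamic binding. */
int call_bar(struct object_header *obj, int a, int b)
{
    return obj->vptr[INDEX_OF_BAR](obj, a, b);
}
```

The caller only needs the header and the slot index; which function actually runs depends on the table the object was initialized with.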

How Polymorphism Works

Polymorphism is at the core of object-oriented programming. Most modern languages have some concept of interfaces, virtual functions, and classes. Though each language differs in the details, and may have specialized concepts, the core idea remains the same: you define a base class with virtual functions, and a derived class can override some, all, or none of those functions. Have you ever stopped to wonder how this works? What overhead, or cost, is involved in such object-oriented programming?

I’ll walk us through how polymorphism works. Rather than just explain it, we’ll recreate polymorphism from the ground up. That is, using C, a language without built-in polymorphism, I’ll show how you can create it, and let you discover how languages implement this feature. I won’t jump directly to a full implementation; instead we’ll go through a logical set of steps that eventually brings us to an implementation used in common compilers.

The Interface

Rather than start with a base class, we’ll start with an interface. You’ll quickly see that every base class actually has an implicit interface, so it is a reasonable place to start. Let’s define a basic interface using Java syntax, which is simple enough that anybody can follow it.

interface PolyFunc
{
  public void oneParam( int a );
  public int withReturn( );
}

A class that implements this interface must define both functions. These are the member functions that an implementation defines.

Now consider a function that takes a parameter of type PolyFunc. What is that function actually expecting? Step down to C at this point and consider a function with the signature void someFunc( struct PolyFunc * pf ). It is a function that takes a pointer to a PolyFunc interface. What is this function expecting behind that pointer? It is an interface with functions, so it must have some way to call those functions. Since functions can themselves be referred to by pointers, the simplest answer is that the interface is just a set of function pointers.

In C we define the interface as a struct with two members: one for each of the functions. If you haven’t done much C coding you’ll have to excuse the rather ugly syntax for declaring function pointers.

struct PolyFunc
{
  void (*oneParam)(int);
  int (*withReturn)();
};

Does that look right, or does something appear to be missing? In Java, when a member function is called it has access to a variable called this, which points to the current object. In C there are no implicit parameters to functions, so we’ll need an explicit way to communicate this to the function. What type is this? The only type we currently have is PolyFunc, so we’ll have to assume that is the type of the pointer. Let’s redefine our structure.

struct PolyFunc
{
  void (*oneParam)( struct PolyFunc *, int);
  int (*withReturn)( struct PolyFunc *);
};

Not seeing how such functions are invoked leaves this a bit unclear. So here is a function, called someFunc, which does exactly that.

void someFunc( struct PolyFunc * pf )
{
  int r = (pf->withReturn)( pf );
  (pf->oneParam)( pf, r );
}

In C, the expression (pf->withReturn)( pf ) calls the function pointed to by the withReturn member of the structure pf, passing one argument: the value of pf itself. The second call adds an additional argument to show that this is just normal function call syntax.

There you have it. PolyFunc now completely defines the interface we need as a base of polymorphism. Our example shows exactly how one would call functions via this interface. We just need an implementation now.
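As a quick illustration of calling through the interface before any class exists, the sketch below fills in a PolyFunc by hand. The Recorder_* functions, the last_seen variable, and demo_interface are hypothetical names used only here; the implementation is deliberately stateless, so the pf parameter goes unused.

```c
struct PolyFunc {
    void (*oneParam)(struct PolyFunc *, int);
    int  (*withReturn)(struct PolyFunc *);
};

/* The caller only knows about the interface, not about who implements it. */
void someFunc(struct PolyFunc *pf)
{
    int r = (pf->withReturn)(pf);
    (pf->oneParam)(pf, r);
}

/* Hypothetical stateless implementation: remembers the last value it saw. */
static int last_seen = 0;

static void Recorder_oneParam(struct PolyFunc *pf, int a)
{
    (void)pf;       /* no per-object data yet, so "this" is unused */
    last_seen = a;
}

static int Recorder_withReturn(struct PolyFunc *pf)
{
    (void)pf;
    return 42;
}

int demo_interface(void)
{
    struct PolyFunc recorder = { Recorder_oneParam, Recorder_withReturn };
    someFunc(&recorder);
    return last_seen;   /* the value withReturn produced, handed on to oneParam */
}
```

Swapping in a different pair of functions changes what someFunc does without touching someFunc itself, which is the whole point of the interface.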

The Base Class

An interface isn’t very useful if nobody implements it. We’ll define our class again in Java, as it has a clear syntax for implementing interfaces. We won’t care too much about making a useful class, but rather just something that demonstrates how implementing an interface works.

class PolyBase implements PolyFunc {
  public int value;

  public void oneParam( int a ) { System.out.println( "" + value ); }
  public int withReturn( ) { return value; }
}

That has to be converted into C syntax that represents about the same thing. We know that we have the interface PolyFunc in there somewhere. We also have a variable value. Let’s just create a simple structure which contains both.

struct PolyBase {
  struct PolyFunc interfacePolyFunc;
  int value;
};

Recall that we’ll need a pointer to the PolyFunc interface. This structure has a PolyFunc of which we can take an address. It is also the first member of the structure; this gives us something extra. In C the first member will actually have the same pointer address as the structure itself. This may not seem important now, but remember it for later.
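The claim about the first member can be checked directly. The C standard guarantees that a pointer to a structure, suitably converted, points to its initial member, so the sketch below should hold on any conforming compiler. The helper name first_member_shares_address is made up for this example.

```c
#include <stddef.h>

struct PolyFunc {
    void (*oneParam)(struct PolyFunc *, int);
    int  (*withReturn)(struct PolyFunc *);
};

struct PolyBase {
    struct PolyFunc interfacePolyFunc;  /* first member */
    int value;
};

/* Returns 1 when the interface member and the enclosing object share an address. */
int first_member_shares_address(void)
{
    struct PolyBase pb;
    return (void *)&pb == (void *)&pb.interfacePolyFunc
        && offsetof(struct PolyBase, interfacePolyFunc) == 0;
}
```

If value were declared before interfacePolyFunc, the offset would no longer be zero and the casts used later in the article would stop being valid.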

We have to initialize this structure. Before we can initialize it we need to declare each of our functions. We’ll do that all together in the code below. To indicate that a function belongs to our PolyBase class we’ll prefix each name. We won’t define the bodies of the functions just yet; this is known as a forward declaration.

void PolyBase_oneParam( struct PolyFunc * pf, int a );
int PolyBase_withReturn( struct PolyFunc * pf );

void init_PolyBase( struct PolyBase * pb) {
  pb->interfacePolyFunc.oneParam = &PolyBase_oneParam;
  pb->interfacePolyFunc.withReturn = &PolyBase_withReturn;
}

It may look complicated, but we actually haven’t done much. First we declared the functions that will be our member functions in the PolyBase class. The second step is far more important: it is the first part of the actual polymorphism. Here we have populated the PolyFunc struct in a PolyBase with pointers to our member functions. Now, if some code would like to instantiate a PolyBase object it could do so as follows.

struct PolyBase pb;
init_PolyBase( &pb );
pb.value= 123;
someFunc( &pb.interfacePolyFunc );

In C that declares an instance of PolyBase on the stack and calls the initializer. Once initialized we can pass the pointer to our previously defined someFunc function. We have explicitly passed a pointer to the interface variable to show what we are doing. The point was made previously however that &pb.interfacePolyFunc and &pb will actually be the same address. This is an extremely important point to consider. We have to look at the body of our member functions to understand why.

Consider the signature of our second member function, int PolyBase_withReturn( struct PolyFunc * pf ). The pf parameter is of type PolyFunc and not PolyBase. Yet we can foresee that our member functions, if they intend to access the value variable, will need a pointer to PolyBase. Let’s use what we know and define our function.

int PolyBase_withReturn( struct PolyFunc * pf) {
  struct PolyBase * this = (struct PolyBase*)pf;
  return this->value;
}

That was actually quite easy. Since we initialized the interface we know that callers to this function actually have a pointer to PolyBase.interfacePolyFunc. We also know from before that this will actually be the exact same pointer as the PolyBase object itself. Thus we can statically cast it to our type and we have our familiar this pointer.

That’s everything we need to define classes implementing the PolyFunc interface. You can see that we could easily define any number of PolyBase like classes, each with their own member functions. The next step is to show how we can derive a class from PolyBase.
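Putting the pieces of this section together gives roughly the following complete translation unit. The body of PolyBase_oneParam is not shown in the article, so it is an assumption here, modelled on the Java version (which prints value and ignores its argument). Setting value to 0 in init_PolyBase is also an addition.

```c
#include <stdio.h>

struct PolyFunc {
    void (*oneParam)(struct PolyFunc *, int);
    int  (*withReturn)(struct PolyFunc *);
};

struct PolyBase {
    struct PolyFunc interfacePolyFunc;  /* first member: shares the object's address */
    int value;
};

void PolyBase_oneParam(struct PolyFunc *pf, int a)
{
    struct PolyBase *this = (struct PolyBase *)pf;
    (void)a;                       /* like the Java version, prints value, not a */
    printf("%d\n", this->value);
}

int PolyBase_withReturn(struct PolyFunc *pf)
{
    struct PolyBase *this = (struct PolyBase *)pf;
    return this->value;
}

void init_PolyBase(struct PolyBase *pb)
{
    pb->interfacePolyFunc.oneParam = &PolyBase_oneParam;
    pb->interfacePolyFunc.withReturn = &PolyBase_withReturn;
    pb->value = 0;                 /* addition: the article leaves value unset */
}

void someFunc(struct PolyFunc *pf)
{
    int r = (pf->withReturn)(pf);
    (pf->oneParam)(pf, r);
}
```

Declaring a struct PolyBase, calling init_PolyBase on it, setting value, and passing &pb.interfacePolyFunc to someFunc reproduces the calling code shown earlier.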

A Derived Class

One of the key features of polymorphism is virtual functions. A derived class should be able to override functions in a base class, specializing its behaviour. Continuing with our PolyBase example we’ll now create a class called PolyDerived. First we’ll start with what this looks like in Java.

class PolyDerived extends PolyBase {
  String prefix;

  PolyDerived() { prefix = "ABC:"; }
  // must stay public: an override cannot reduce the visibility of the base method
  public void oneParam( int a ) { System.out.println( prefix + a ); }
}

In this derived class oneParam is overridden to write the value with a prefix. Notice that a constructor has also been added in this class; when we convert to C you’ll see this isn’t really much of a change. Since we haven’t introduced any new functions, PolyDerived simply implements the PolyFunc interface. Converting to C we get the code below.

#include <string.h> // for strcpy

struct PolyDerived {
  struct PolyBase base_PolyBase;
  char prefix[20];
};

void PolyDerived_oneParam( struct PolyFunc * pf, int a );

void init_PolyDerived( struct PolyDerived * pd ) {
  init_PolyBase( (struct PolyBase*)pd );
  pd->base_PolyBase.interfacePolyFunc.oneParam = &PolyDerived_oneParam;

  strcpy( pd->prefix, "ABC:" );
}

The derived structure PolyDerived simply contains the PolyBase structure as its first member. Recall that tip about the pointers in PolyBase? In this case we have three equivalent pointers: the pointer to PolyDerived is also a pointer to PolyBase, which is in turn a pointer to the interface PolyFunc. Notice how the derived init function simply calls the base class init function and overrides a single function pointer.

In addition to the base class we’ve also defined the additional member prefix. It is initialized in the init_PolyDerived function. The base class doesn’t initialize any members, though it could, and probably should, initialize the value member. In this regard init_PolyDerived has become the constructor for our class.

That completes the derived class. Quite easy, wasn’t it? Now we can simply use the same code we had before and substitute PolyDerived for PolyBase. someFunc will now call one function in the base class and one in the derived class, exactly as you’d expect a virtual function to work.

struct PolyDerived pd;
init_PolyDerived( &pd );
pd.base_PolyBase.value= 123;
someFunc( (struct PolyFunc*)&pd );
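For readers who want to run the derived-class example end to end, here is one possible complete version. The bodies of PolyBase_oneParam and PolyDerived_oneParam are assumptions modelled on the Java classes, and demo_derived is a hypothetical helper name.

```c
#include <stdio.h>
#include <string.h>

struct PolyFunc {
    void (*oneParam)(struct PolyFunc *, int);
    int  (*withReturn)(struct PolyFunc *);
};

struct PolyBase {
    struct PolyFunc interfacePolyFunc;
    int value;
};

struct PolyDerived {
    struct PolyBase base_PolyBase;   /* base "class" as first member */
    char prefix[20];
};

void PolyBase_oneParam(struct PolyFunc *pf, int a)
{
    struct PolyBase *this = (struct PolyBase *)pf;
    (void)a;
    printf("%d\n", this->value);
}

int PolyBase_withReturn(struct PolyFunc *pf)
{
    struct PolyBase *this = (struct PolyBase *)pf;
    return this->value;
}

void PolyDerived_oneParam(struct PolyFunc *pf, int a)
{
    struct PolyDerived *this = (struct PolyDerived *)pf;
    printf("%s%d\n", this->prefix, a);
}

void init_PolyBase(struct PolyBase *pb)
{
    pb->interfacePolyFunc.oneParam = &PolyBase_oneParam;
    pb->interfacePolyFunc.withReturn = &PolyBase_withReturn;
}

void init_PolyDerived(struct PolyDerived *pd)
{
    init_PolyBase(&pd->base_PolyBase);   /* base "constructor" runs first */
    pd->base_PolyBase.interfacePolyFunc.oneParam = &PolyDerived_oneParam;
    strcpy(pd->prefix, "ABC:");
}

void someFunc(struct PolyFunc *pf)
{
    int r = (pf->withReturn)(pf);
    (pf->oneParam)(pf, r);
}

void demo_derived(void)
{
    struct PolyDerived pd;
    init_PolyDerived(&pd);
    pd.base_PolyBase.value = 123;
    someFunc((struct PolyFunc *)&pd);    /* prints "ABC:123" */
}
```

Here withReturn still resolves to the base implementation while oneParam resolves to the derived one, even though someFunc knows nothing about either class.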

First Review

In this article we’ve learned a way to implement simple polymorphism in C. The intent is to expose what is required to get virtual function calls working in a language like Java, or C++. I do not know of any compiler which actually implements virtual functions this way. We’ll get to the reason in the next article about virtual tables. Nonetheless the techniques you’ll encounter are quite similar to this. They simply improve on some aspects, and offer more features.

If you want to analyze the cost of virtual tables this is your starting point. Rather than making a simple function call you see that you first have to look up the value of the function pointer. You also need to pass a this pointer, but that isn’t really an additional cost: you would presume a non-virtual function version would also need a pointer to a data structure. In this simple case we’ve seen that conversion of the this pointer itself is zero-cost, so no overhead there. Our biggest overhead is the size of our structures, and that is what we’ll get to in the next article.

In the article on “How Polymorphism Works: Part 1” we learned how to create virtual functions. The method we chose has one significant problem: it requires a lot of memory and repeated initialization. In this article we will look at a way to improve on the situation and bring us closer to a modern vtable implementation.

Previous Problem

In our rather simple virtual function implementation we simply put a pointer to each virtual function inside the data for an instance of a class. This looked like the below C structures.

struct PolyFunc {
  void (*oneParam)(struct PolyFunc *,int);
  int (*withReturn)(struct PolyFunc *);
};
 
struct PolyBase {
  struct PolyFunc interfacePolyFunc;
  int value;
};

This means that for each instance of PolyBase we have a complete copy of the member function pointers. That seems like a lot of wasted memory, especially for small objects. It also seems unusual that every additional virtual function would increase the size of an object. Additionally, each time we initialize a new object we have to set all of those pointers. While it may not seem like much for a tiny class, this will add up and create a performance issue.

A Virtual Table

Knowing that the pointers to virtual functions are the same for every instance of the class it seems like we should be able to create that list of pointers just once. That is, we should only ever need one instance of PolyFunc per implementing class. Easily done, we simply modify the definition of PolyBase.

struct PolyBase {
  struct PolyFunc * interfacePolyFunc;
  int value;
};

Now, instead of setting the pointers each time we initialize this class we can define a single global singleton and assign the pointer at construction time.

struct PolyFunc PolyBase_VTable = {
  &PolyBase_oneParam,
  &PolyBase_withReturn
};
 
void init_PolyBase( struct PolyBase * pb ) {
  pb->interfacePolyFunc = &PolyBase_VTable;
}

We’ve called this global instance a VTable, short for Virtual Table, and you’ll find it called that quite often. The C initialization syntax is used here to set each member to the address of the function. With this setup we can add any number of virtual functions and never increase the size of an object nor increase the initialization time.
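The memory saving can be made visible with sizeof. This is only a sketch: the struct names PolyBaseEmbedded and PolyBaseShared and the two helper functions are invented for the comparison, and exact sizes depend on the platform, so no specific numbers are claimed.

```c
#include <stddef.h>

struct PolyFunc {
    void (*oneParam)(struct PolyFunc *, int);
    int  (*withReturn)(struct PolyFunc *);
};

/* Part 1 layout: every object carries its own copy of the function pointers. */
struct PolyBaseEmbedded {
    struct PolyFunc interfacePolyFunc;
    int value;
};

/* Part 2 layout: every object carries one pointer to a shared table. */
struct PolyBaseShared {
    struct PolyFunc *interfacePolyFunc;
    int value;
};

/* Per-object cost of the interface in each layout, in bytes. */
size_t embedded_overhead(void) { return sizeof(struct PolyFunc); }
size_t shared_overhead(void)   { return sizeof(struct PolyFunc *); }
```

On common platforms the embedded cost grows by one pointer for every virtual function added to PolyFunc, while the shared cost stays at a single pointer.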

The Callee

We have a bit of a problem however. The function we are calling is expecting a pointer to the PolyFunc struct, but it is also expecting that pointer to be a pointer to the instance itself. Seen in isolation you may not notice a problem, so below it is shown alongside the calling code and the member function.

void someFunc( struct PolyFunc * pf) {
  int r = (pf->withReturn)( pf );
  (pf->oneParam)( pf, r );
}
 
int PolyBase_withReturn( struct PolyFunc * pf ) {
  struct PolyBase * this = (struct PolyBase*)pf;
  return this->value;
}
 
void calling() {
  struct PolyBase pb;
  init_PolyBase( &pb );
  someFunc( pb.interfacePolyFunc ); //No & here!
}

Look at calling first. Previously we took the address of interfacePolyFunc since that is what someFunc expects. Now, since we already have a pointer, we shouldn’t need to do that. This seems okay so far. Inside the someFunc code the lookup of the function pointer will actually work: the PolyFunc object we’ve provided is valid in this regard. The trouble starts with the parameter we pass to the function.

In PolyBase_withReturn the first line expects the PolyFunc pointer to be directly convertible to a PolyBase pointer. This is no longer valid and this function will now fail: it may cause the program to crash, or silently return an invalid value. The problem started with the call to someFunc. By passing the value of the virtual table directly we’ve completely lost the reference to the original object. We need a way to pass the original object and the virtual table.

Pointer to Pointer

We could simply pass two pointers, but this introduces a slight overhead in every function call we make, and we’d like to avoid that. It may not be a terrible idea however, as there are likely situations where it is the better thing to do. In fact there is a very common situation where something like this is done: callback functions. Within many APIs you’ll see functions with signatures that include a callback function and a user-defined value to provide to that function.

void FuncWithCallback( void (*callback)( int ), int userValue ) {
  callback( userValue );
}

This is pretty much the standard C way of doing callback functions. You provide both an explicit function and the data to pass to that function. Here only a single function pointer is being passed rather than a table of functions, but that is rather a small detail. Since this method is used often you can’t completely discount the two parameter approach. In our general case however we will try to stick with one parameter.
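A slightly more general form of the same pattern passes a void * context instead of an int, so the callback can reach arbitrary caller-owned data. The sketch below is an illustration only; event_callback, for_each_event, Accumulator, add_event, and sum_events are all hypothetical names.

```c
/* Callback type: receives the caller-supplied context plus one event value. */
typedef void (*event_callback)(void *user_data, int event);

/* Invokes cb once per event, handing back the context unchanged each time. */
void for_each_event(const int *events, int count,
                    event_callback cb, void *user_data)
{
    for (int i = 0; i < count; i++)
        cb(user_data, events[i]);
}

struct Accumulator {
    int total;
};

static void add_event(void *user_data, int event)
{
    struct Accumulator *acc = user_data;  /* recover the typed "this" */
    acc->total += event;
}

int sum_events(const int *events, int count)
{
    struct Accumulator acc = { 0 };
    for_each_event(events, count, add_event, &acc);
    return acc.total;
}
```

The function pointer plays the role of a one-entry virtual table and user_data plays the role of this, passed as two separate arguments.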

In order to preserve the reference to the original object and provide a pointer to the virtual table someFunc is going to have to expect something slightly different as a parameter. Let’s first start by making it expect a pointer to the pointer.

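// Note: the member functions now receive the address of the object's vtable
// pointer, so their first parameter must be declared as struct PolyFunc **.
void someFunc( struct PolyFunc * * pf ) {
  int r = ((*pf)->withReturn)( pf );
  ((*pf)->oneParam)( pf, r );
}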
 
void calling() {
  struct PolyBase pb;
  init_PolyBase( &pb );
  someFunc( &pb.interfacePolyFunc );
}

This works! The calling code is exactly as we had in the last article and someFunc needs only to dereference the pointer. Again, since interfacePolyFunc is the first member in the PolyBase structure it actually has the same address as the instance itself. This allows the member functions to trivially cast this pointer back into their own type as they did before.
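Here is one way the whole scheme might look as a compilable unit, including a derived class with its own table. It assumes the member function signatures are changed to take struct PolyFunc ** so that the types line up with what someFunc passes. PolyDerived_VTable and the function bodies are assumptions carried over from the Part 1 example, not something the article spells out.

```c
#include <stdio.h>
#include <string.h>

struct PolyFunc {
    void (*oneParam)(struct PolyFunc **self, int a);
    int  (*withReturn)(struct PolyFunc **self);
};

struct PolyBase {
    struct PolyFunc *interfacePolyFunc;  /* vtable pointer, first member */
    int value;
};

struct PolyDerived {
    struct PolyBase base_PolyBase;
    char prefix[20];
};

static void PolyBase_oneParam(struct PolyFunc **self, int a)
{
    struct PolyBase *this = (struct PolyBase *)self;
    (void)a;
    printf("%d\n", this->value);
}

static int PolyBase_withReturn(struct PolyFunc **self)
{
    struct PolyBase *this = (struct PolyBase *)self;
    return this->value;
}

static void PolyDerived_oneParam(struct PolyFunc **self, int a)
{
    struct PolyDerived *this = (struct PolyDerived *)self;
    printf("%s%d\n", this->prefix, a);
}

/* One table per class, shared by every instance of that class. */
static struct PolyFunc PolyBase_VTable = {
    PolyBase_oneParam, PolyBase_withReturn
};
static struct PolyFunc PolyDerived_VTable = {
    PolyDerived_oneParam, PolyBase_withReturn  /* withReturn is inherited */
};

void init_PolyBase(struct PolyBase *pb)
{
    pb->interfacePolyFunc = &PolyBase_VTable;
    pb->value = 0;
}

void init_PolyDerived(struct PolyDerived *pd)
{
    init_PolyBase(&pd->base_PolyBase);
    pd->base_PolyBase.interfacePolyFunc = &PolyDerived_VTable;
    strcpy(pd->prefix, "ABC:");
}

void someFunc(struct PolyFunc **pf)
{
    int r = ((*pf)->withReturn)(pf);
    ((*pf)->oneParam)(pf, r);
}
```

Overriding a function no longer means patching a pointer inside each object; the derived initializer just points the object at a different table.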

The use of a pointer to a pointer may not look so clean however. Here C provides several options to make this cleaner. We could simply make a typedef of this type, or we could instead make a new structure which contains a pointer as its only member. Whichever way we choose won’t alter the actual semantics of the code, so we won’t go over it in detail here.
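As a brief illustration of the wrapper-structure option, the sketch below hides the double pointer behind a struct whose only member is the vtable pointer. All names here (Object, ObjectVTable, Counter, init_Counter, call_withReturn) are hypothetical and chosen only for this example.

```c
struct Object;

struct ObjectVTable {
    int (*withReturn)(struct Object *self);
};

/* Wrapper: the vtable pointer is the only member. */
struct Object {
    const struct ObjectVTable *vtable;
};

struct Counter {
    struct Object object;   /* first member, so Counter* and Object* coincide */
    int value;
};

static int Counter_withReturn(struct Object *self)
{
    struct Counter *this = (struct Counter *)self;
    return this->value;
}

static const struct ObjectVTable Counter_VTable = { Counter_withReturn };

void init_Counter(struct Counter *c, int value)
{
    c->object.vtable = &Counter_VTable;
    c->value = value;
}

/* Callers see a plain struct Object * instead of a pointer to a pointer. */
int call_withReturn(struct Object *obj)
{
    return obj->vtable->withReturn(obj);
}
```

The generated code should be equivalent to the pointer-to-pointer version; only the spelling of the types changes.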

The Common VTable

This implementation of a virtual table is a rather common one. What essentially defines it is that the first member of every object instance is a pointer to its virtual table. It reduces the overhead of instantiating classes and reduces the amount of memory used. The tradeoff is that functions have one additional dereference before they can call the function. The cost of this dereference is extremely low, generally a single low-cost CPU operation. You’d generally find this cost to be insignificant in most programs.

GCC, MSVC, and many other compilers use this setup for C++ virtual tables. In fact, Microsoft at one point defined this layout as the binary standard for C++ interfaces on Windows, and I’d suspect it is still the standard in use today. This doesn’t mean a compiler has to use this method (well, on Windows it kind of does). Ultimately any compiler will need to maintain a list of function tables somewhere. Knowing how this works for C++ will be a good step towards understanding whatever other method you may encounter.
