A gentle introduction to Template Metaprogramming with C++

moliate, 6 Mar 2003
Abusing your compiler for extremely early binding

Background

A while ago I was working on a program that used a large lookup table to do some computations. The contents of the table depended on a large number of parameters that I had to tweak to get optimal performance from my code. And every time I changed one parameter the whole table had to be recalculated...

I had written a function that dumped the content of the table to standard output. That way I could compile my program with the new parameters set, cut-and-paste the table from screen and into my source code and finally recompile the whole project. I didn't mind doing that - the first twenty times. After that I figured there had to be a better way. There was. Enter Template Metaprogramming.

What is Template Metaprogramming (TMP)

Template Metaprogramming is a generic programming technique that uses extremely early binding. The compiler acts as an interpreter or a "virtual computer" that emits the instructions that make up the final program. It can be used for static configuration, adaptive programs, optimization and much more. 

 

In this article I intend to explain how to use C++ template classes to produce compile-time generated code.

Different kinds of Metatemplates

I distinguish between two kinds of Metatemplates: ones that calculate a constant value and ones that produce code. The difference is that the first kind should never produce instructions that are executed at runtime.

Templates that calculate a value

Assume you want to calculate the number of bits set in a byte. If you do this at runtime you might have a function looking like this:
int bits_set(unsigned char byte)
{
    int count = 0;
    for (int i = 0; i < 8; i++)      // test each of the eight bits in turn
        if ( (0x1L << i) & byte )
            count++;

    return count;
}
In cases where the byte is known at compile time this can also be done by the compiler, using TMP:
template< unsigned char byte > class BITS_SET
{
public:
    enum {
     B0 = (byte & 0x01) ? 1:0,
     B1 = (byte & 0x02) ? 1:0,
     B2 = (byte & 0x04) ? 1:0,
     B3 = (byte & 0x08) ? 1:0,
     B4 = (byte & 0x10) ? 1:0,
     B5 = (byte & 0x20) ? 1:0,
     B6 = (byte & 0x40) ? 1:0,
     B7 = (byte & 0x80) ? 1:0
    };
public:
 enum{RESULT = B0+B1+B2+B3+B4+B5+B6+B7};
};

I have used an enum for the temporary variables as well as for the result since enumerators are easy to use and are always compile-time constants. Another way would be to use static const int members in the class instead.

You can now use BITS_SET<15>::RESULT and get the constant 4 in your code. In this case the compiler evaluates the line enum{RESULT = B0+B1+B2+B3+B4+B5+B6+B7}; to enum{RESULT = 1+1+1+1+0+0+0+0}; and finally to enum{RESULT = 4};.
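
As a minimal sketch of the static const int alternative mentioned above, the same calculation might look like the following. The name BITS_SET_SC and the buffer array are hypothetical and only serve as illustration; some older compilers may not accept the in-class initializer, which is one reason to prefer the enum.

```cpp
// Sketch only: BITS_SET_SC is a hypothetical variant of BITS_SET that
// stores the result in a static const int member instead of an enumerator.
template< unsigned char byte >
class BITS_SET_SC
{
public:
    static const int RESULT =
        ((byte & 0x01) ? 1 : 0) + ((byte & 0x02) ? 1 : 0) +
        ((byte & 0x04) ? 1 : 0) + ((byte & 0x08) ? 1 : 0) +
        ((byte & 0x10) ? 1 : 0) + ((byte & 0x20) ? 1 : 0) +
        ((byte & 0x40) ? 1 : 0) + ((byte & 0x80) ? 1 : 0);
};

// RESULT is a compile-time constant, so it can be used as an array size.
char buffer[ BITS_SET_SC< 15 >::RESULT ];
```

Because the value is needed only at compile time here, no out-of-class definition of RESULT is required for this usage.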

It is also possible to calculate a value using a loop. With TMP we rely on recursion within the template definition. The following code is a compile-time factorial calculator:
template< int i >
class FACTOR{
  public:
      enum {RESULT = i * FACTOR<i-1>::RESULT};
};

// Explicit specialization that ends the recursion
template<>
class FACTOR< 1 >{
  public:
      enum {RESULT = 1};
};
If we for example write this:
int j = FACTOR< 5 >::RESULT;

somewhere in our code the compiler will generate something like the following line of assembler code:

; int j = FACTOR< 5 >::RESULT;
mov    DWORD PTR _j$[ebp], 120            ; 00000078H - a constant value!

How does this work? As we instantiate FACTOR<5> the definition of this class depends on FACTOR<4>, which in turn depends on FACTOR<3> and so on. The compiler needs to create all these classes until the template specialization FACTOR<1> is reached. This means the entire recursion is done by the compiler, while the final program just contains a constant.
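
The same recursion pattern can replace the eight hand-written enumerators of the earlier bit counter. The following is only a sketch; COUNT_BITS is a hypothetical name, and the specialization for 0 plays the role that FACTOR<1> plays above.

```cpp
// Sketch only: COUNT_BITS is a hypothetical recursive version of BITS_SET.
// Each instantiation handles the lowest bit and recurses on the remaining bits.
template< unsigned int n >
class COUNT_BITS{
  public:
      enum {RESULT = (n & 1) + COUNT_BITS< (n >> 1) >::RESULT};
};

// The specialization for 0 ends the recursion: no bits are left to count.
template<>
class COUNT_BITS< 0 >{
  public:
      enum {RESULT = 0};
};
```

Instantiating COUNT_BITS<15> makes the compiler create COUNT_BITS<7>, COUNT_BITS<3>, COUNT_BITS<1> and finally COUNT_BITS<0>, so the recursion depth follows the position of the highest set bit.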

Templates that unroll loops/specialize functions

Template metaprograms can generate useful code when interpreted by the compiler, for example a massively inlined algorithm that has its loops unrolled. The result can be a large speed increase in the application, although this depends on the compiler and on the code being unrolled.

For example, look at the following code that calculates the sum of the numbers 1..1000:

int sum = 0;
for (int i = 1 ; i <= 1000; i++)
     sum += i;

We are actually performing 2000 additions, rather than 1000 (as we have to increment i by one for each loop). In addition we perform a thousand test operations on the variable i. Another way would be to write the code the following way:

int sum = 0;
sum += 1;
sum += 2;
...
sum += 1000;

This is the way a Template Metaprogram would expand a loop. Now we perform exactly a thousand additions, but this method also has a price. The code size increases, meaning that we theoretically could take a performance hit by increasing the number of page faults. In practice, though, code is often invoked multiple times and already loaded in cache.
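
A minimal sketch of such an expansion is given here. SUM_LOOP and sum_to_100 are hypothetical names, and the sketch stops at 100 rather than 1000 because compilers limit the depth of recursive template instantiation. Whether the calls really collapse into straight-line additions is up to the compiler's inliner.

```cpp
// Sketch only: SUM_LOOP< i > adds the numbers 1..i to sum without a runtime loop.
template< int i >
class SUM_LOOP{
  public:
    static inline void EXEC(int& sum){
        SUM_LOOP< i-1 >::EXEC(sum);  // first expand the additions for 1..i-1
        sum += i;                    // then add the current constant
    }
};

// The specialization for 0 ends the recursion and adds nothing.
template<>
class SUM_LOOP< 0 >{
  public:
    static inline void EXEC(int&){}
};

// Hypothetical helper showing how the template is invoked.
int sum_to_100()
{
    int sum = 0;
    SUM_LOOP< 100 >::EXEC(sum);
    return sum;
}
```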

Loop unrolling

Loop unrolling is easily defined using recursive templates, similar to calculating a value:

template< int i >
class LOOP{
  public:
    static inline void EXEC(){
      cout << "A-" << i << " ";
            LOOP< i-1 >::EXEC();
       cout << "B-" << i << " ";
    }
};

template<>
class LOOP< 0 >{
  public:
    static inline void EXEC(){
      cout << "A-" << 0;
      cout << "\n";
      cout << "B-" << 0 << " ";
    }
};

The output of LOOP< 8 >::EXEC() is shown below:

A-8 A-7 A-6 A-5 A-4 A-3 A-2 A-1 A-0
B-0 B-1 B-2 B-3 B-4 B-5 B-6 B-7 B-8

Again, the thing to notice is that there is no loop in the resulting binary code. The loop unrolls itself to produce code like:

cout << "A-" << 8 << " ";
cout << "A-" << 7 << " ";
...
cout << "A-" << 0;
cout << "\n";
cout << "B-" << 0 << " ";
...
cout << "B-" << 7 << " ";
cout << "B-" << 8 << " ";

An unrelated but interesting point concerns class LOOP< 0 >. The identifier i is declared in the template LOOP< int i >, and the compiler I used also accepted it inside the "special case" LOOP< 0 >. That is not standard C++ behavior, however: an explicit specialization has no template parameters in scope, so the code above uses the literal 0 and introduces the specialization with template<>.

Besides loops, other statements can be constructed:

IF - statement
template< bool Condition >
class IF {
public:
    static inline void EXEC(){
    cout << "Statement is true";
    }
};

template<>
class IF< false > {
public:
    static inline void EXEC(){
    cout << "Statement is false";
    }
};
 
SWITCH - statement
template< int _case >
class SWITCH {
public:
    static inline void EXEC(){
        cout << " SWITCH - default ";
    }
};

template<>
class SWITCH< 1 > {
    public:
    static inline void EXEC(){
        cout << " SWITCH - 1 ";
    }
};

template<>
class SWITCH< 2 > {
    public:
    static inline void EXEC(){
        cout << " SWITCH - 2 ";
    }
};

...

Example of usage of the two classes:

SWITCH< 2 > myTwoSwitch; // store for delayed execution
myTwoSwitch.EXEC();
IF< false >::EXEC();
myTwoSwitch.EXEC();

The output will be: " SWITCH - 2 Statement is false SWITCH - 2 "

Using Meta-Metatemplates

It is possible to define a generic template for a special kind of operation, like an if- or for-statement. I like to call this a Meta-Metatemplate since the operation is defined in a class outside the template itself. Even if this might be useful, it can make the code very hard to understand in complex cases. Also, it will probably take a few years before compilers are able to take full advantage of these kinds of constructs. For now I prefer to use specialized templates, but for simple cases Meta-Metatemplates might be useful. Sample code for the simplest of these constructs, the if-statement, is given below:
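template< bool Condition, class THEN, class ELSE > struct IF
{
    // General case: the condition is true, select THEN.
    // Dummy exists only so that the false case can be written as a partial
    // specialization, which standard C++ allows inside a class.
    template< bool C, class Dummy = void > struct selector
    {typedef THEN SELECT_CLASS;};

    template< class Dummy > struct selector< false, Dummy >
    {typedef ELSE SELECT_CLASS;};

    typedef typename selector< Condition >::SELECT_CLASS RESULT;
};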

Example of usage:

struct THEN
{
 static int func() 
 {cout << "Inside THEN";
  return 42;
 }
};

struct ELSE
{
 static int func() 
 {cout << "Inside ELSE";
  return 0;
 }
};

int main(int argc, char* argv[])
{
 int result = IF< 4 == sizeof(int), THEN, ELSE >::RESULT::func();
 cout << " - returning: " << result;
}

On platforms where sizeof(int) is 4 (typical for 32-bit architectures) this will print "Inside THEN - returning: 42" to standard output. Note that if func() were not defined inside ELSE, this would act as a simple compile-time assert, breaking compilation when 4 != sizeof(int).
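
The for-statement mentioned at the start of this section can be sketched in the same spirit: the loop is one template and the loop body is a separate class template passed in as a parameter. FOR, PRINT_SQUARE and squares_1_to_4 are hypothetical names, and the sketch assumes a compiler that supports template template parameters and partial specialization.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Sketch only: FOR< N, BODY > runs BODY< 1 >::EXEC ... BODY< N >::EXEC in order.
template< int N, template< int > class BODY >
struct FOR
{
    static inline void EXEC(std::ostream& out){
        FOR< N-1, BODY >::EXEC(out);  // expand the earlier iterations first
        BODY< N >::EXEC(out);         // then the current one
    }
};

// Partial specialization that ends the recursion.
template< template< int > class BODY >
struct FOR< 0, BODY >
{
    static inline void EXEC(std::ostream&){}
};

// A hypothetical loop body: writes the square of the loop index.
template< int i >
struct PRINT_SQUARE
{
    static inline void EXEC(std::ostream& out){ out << i * i << " "; }
};

// Hypothetical helper that collects the output of a four-step loop.
std::string squares_1_to_4()
{
    std::ostringstream out;
    FOR< 4, PRINT_SQUARE >::EXEC(out);
    return out.str();
}
```

The body class can be swapped without touching FOR, which is the attraction of the approach; the price is one more level of indirection for the reader.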

A real world example

Included as a sample is a small class that does generic CRC (Cyclic Redundancy Codes - a set of simple hash algorithms) calculations. The class uses a set of parameters that are #defined at the top of the header. The main reason I chose CRC for this article is that the CRC algorithms usually use a lookup table based on a set of parameters, making the example somewhat similar to my original problem.

The class generates a lookup table with 256 entries at compile time. The implementation is pretty straightforward, and I hope that the comments in the source are enough to explain usage of the class. In some cases I have used macros where TMP would have been a possible solution. The reason for this is readability, an important factor when choosing which technique to use.
If you want a further explanation of CRC algorithms, I suggest reading Ross N. Williams's "A PAINLESS GUIDE TO CRC ERROR DETECTION ALGORITHMS". I have included a verbatim copy of the text in the sample.

One thing to remember when compiling this is that the source is using a lot of compiler heap memory. I had to increase the memory allocation limit by using the compiler option '/Zm400'. This is one drawback of TMP - it really pushes the compiler to the limit.
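
The sample itself is not reproduced here, but the idea of computing a table entry at compile time can be sketched as follows. This is not the code from the sample: CRC_POLY and CRC_ENTRY are hypothetical names, and the sketch assumes the common reflected CRC-32 polynomial 0xEDB88320 and a compiler that provides <cstdint>.

```cpp
#include <cstdint>

// Assumption: the reflected CRC-32 polynomial. The sample takes its
// parameters from #defines instead.
const std::uint32_t CRC_POLY = 0xEDB88320u;

// Sketch only: CRC_ENTRY< value >::RESULT is the table entry for the byte
// 'value'. Each instantiation performs one of the eight shift/xor steps.
template< std::uint32_t value, int bits_left = 8 >
struct CRC_ENTRY
{
    static const std::uint32_t RESULT =
        CRC_ENTRY< ((value & 1u) ? ((value >> 1) ^ CRC_POLY) : (value >> 1)),
                   bits_left - 1 >::RESULT;
};

// After eight steps the remaining value is the finished table entry.
template< std::uint32_t value >
struct CRC_ENTRY< value, 0 >
{
    static const std::uint32_t RESULT = value;
};
```

A full table would then list CRC_ENTRY<0>::RESULT through CRC_ENTRY<255>::RESULT in an array initializer, which is the kind of repetition where a macro can be more readable than further templates.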

Coding conventions

The compiler doesn't like being abused as described above. Not a bit. And it will put up a struggle. Warnings and errors will range from cryptic to C1001 - INTERNAL COMPILER ERROR. And you can't debug a Metatemplate program like a runtime program.

For those reasons using well defined coding conventions is more important with TMP than with other programming techniques. Below I give some rules that I've found useful.

General suggestions

As template metaprograms are somewhat similar to macros I prefer to give all TMP classes uppercase names. Also try to make the name as descriptive as possible. The main reason for this is that public variables/functions usually have non-descriptive names (see below). A TMP class is the one defining the operation, not the member functions/variables.

Also try to limit a TMP class to a single operation. Template Metaprogramming is challenging enough without trying to generalize the class to support multiple operations. Often this will only result in code bloat.

Variable/function names

I usually prefer to have only one of two possible public operations defined in a TMP class:
  • RESULT for templates that calculate a value
  • EXEC() for loop unrolling/function simplification.
Usually one function or one member variable is the only thing the Template Metaprogram needs to expose to the programmer. It makes sense to give the type of operation a single name for all possible classes. Another advantage is that you can look at a template class and immediately see if it is a Template Metaprogram or an ordinary template class.

Should I use TMP in my program?

Template Metaprogramming is a great technique when used correctly. On the other hand, it might result in code bloat and decreased performance. Below are some rules of thumb for when to use TMP.

Use TMP when:

  • A macro is not enough. You need something more complex than a macro, and you need it expanded at compile time.
  • You are using a recursive function with a predetermined number of iterations. In this case the overhead of function calls and setting up stack variables can be avoided, and runtime can decrease significantly.
  • You are using loops that can be unrolled at compile time. For example, hash algorithms like MD5 and SHA1 contain well-defined block processing loops that can be unrolled with TMP.
  • You are calculating constant values. If you have constants that depend on other constants in your program, they might be a candidate for TMP.
  • The program should be portable to other platforms. In this case TMP might be an alternative to macros.
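
As an illustration of the "constants that depend on other constants" case, the sketch below derives a table size from a tuning parameter. POW, TABLE_BITS, TABLE_SIZE and lookup_table are hypothetical names, and the sketch assumes a compiler that supports partial specialization.

```cpp
// Sketch only: POW< base, exponent >::RESULT is base raised to exponent,
// computed by the compiler through recursion on the exponent.
template< int base, int exponent >
class POW{
  public:
      enum {RESULT = base * POW< base, exponent-1 >::RESULT};
};

// Partial specialization that ends the recursion: anything to the power 0 is 1.
template< int base >
class POW< base, 0 >{
  public:
      enum {RESULT = 1};
};

const int TABLE_BITS = 10;                            // hypothetical tuning parameter
const int TABLE_SIZE = POW< 2, TABLE_BITS >::RESULT;  // follows TABLE_BITS automatically
int lookup_table[ TABLE_SIZE ];
```

Changing TABLE_BITS is then the only edit needed; the dependent constant and the array size are recomputed on the next compile.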

Don't use TMP when:

  • A macro will do. In most cases a macro will be enough. And a macro is often easier to understand than TMP.
  • You want a small executable. Templates in general, and TMP in particular, will often increase the code size.
  • Your program already takes a long time to compile. TMP might significantly increase compile time.
  • You are on a strict deadline. As noted above, the compiler will be very unfriendly when working with Template Metaprograms. Unless the changes you intend to make are very simple, you might want to save yourself a few hours of banging your head against the wall.

Acknowledgements and references

  1. Thanks to Joaquín M López Muñoz for introducing me to TMP.
  2. Ross N. Williams has written a great tutorial on CRC algorithms: "A PAINLESS GUIDE TO CRC ERROR DETECTION ALGORITHMS". I have included a verbatim copy in the sample (zip-ball?) above.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
