Abstract
A few months ago, I wrote two snippets that formed the basis of the first half of this article. This is the complete, formal version.
More often than not, a seemingly trivial problem becomes a challenge once you take various requirements into account, have to decide among several design options, and have to balance a variety of tradeoffs. Challenges like this are what a library programmer faces every day. This article shows how I think about, design, and implement the task of adding things up, i.e., an accumulator. The task may sound too trivial to deserve an article, but it is not once you put all the requirements together to create a really flexible and customizable framework. You will see that modern C++ template techniques are the key weapon for taking down many difficult design problems.
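Copyright (c) prototype, all rights reserved.
You are welcome to repost this article freely, provided that the original content (including the author information) is not changed in any way.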
1. The primer
I often find myself writing a small fragment of code that sums things up. The code usually looks like this:
float sum; // I could forget to properly initialize `sum'.
int n = 0;
while (...) {
    sum += array[i];
    n++;
    ... // some other stuff
}
Potential problems with code like this are the following:
I. The programmer has to remember to properly initialize the sum variable.
II. The programmer has to keep a counter around and initialize it before the summation starts and increment it along with each addition operation.
III. If the type of the sum variable changes, the summation code will likely have to change as well.
IV. For floating-point summation, a simple loop like the one above may not be accurate enough, whereas a more accurate summation is actually quite complicated. The same can happen for other types. For complicated algorithms, it is essential to hide the implementation details so that the client code stays as clean, understandable, and maintainable as possible.
Below we will attack these problems one by one. Doing so will eventually lead to a complete and customizable framework that allows the user to treat any summation in a consistent and clean way.
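To make problem IV concrete, here is a small sketch that adds the same `float’ value many times, once with a `float’ running total and once with a `double’ running total. The function names (`sum_in_float’, `sum_in_double’, `float_error’, `double_error’, `check_drift’) are made up for this illustration, and the sketch assumes IEEE 754 arithmetic without value-changing floating-point optimizations.

```cpp
#include <cassert>
#include <cmath>

// Naive loop: the running total is kept in a float.
float sum_in_float( float value, unsigned long count )
{
    float sum = 0.0f;
    for ( unsigned long i = 0; i < count; ++i )
        sum += value;
    return sum;
}

// Same loop, but the running total is kept in a double.
double sum_in_double( float value, unsigned long count )
{
    double sum = 0.0;
    for ( unsigned long i = 0; i < count; ++i )
        sum += value;
    return sum;
}

// Absolute error of each total against value * count computed in double.
double float_error( float value, unsigned long count )
{
    return std::fabs( sum_in_float( value, count ) - double( value ) * count );
}

double double_error( float value, unsigned long count )
{
    return std::fabs( sum_in_double( value, count ) - double( value ) * count );
}

// Illustrative check: a million additions of 0.1f drift far more in float.
void check_drift()
{
    const unsigned long count = 1000000UL;
    assert( float_error( 0.1f, count ) > double_error( 0.1f, count ) );
}
```

Once the total is large, each addition of a small `float’ is rounded to the spacing of representable values around that total, and the rounding errors pile up in one direction. This is exactly the kind of detail that client code should not have to deal with.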
2. The class and the interface
For problems I and II, the correct approach is generally to create a class. This is a very natural decision, so I will skip its rationale for brevity. The crucial part is the interface of the class. Let’s think about it carefully:
1. Problems I and II require that the sum variable and the counter be initialized automatically. Thus we need to define a default ctor that does this.
2. The default initialization might not always be what the user wants, so a ctor for customized initialization should also be provided. The most natural way to initialize a variable is to give it a value. We therefore define another ctor that takes a single parameter: a constant reference to the initializing value.
3. A member function should be available for accumulating a given value into the sum. A plain member function would work, but the += operator is a more natural interface.
4. The summation result should be available to the user. Exposing a data member violates encapsulation and should be avoided. A member function should be provided instead. Let’s call it “sum()”.
The above is the basic interface. However, a couple of useful things can also be added.
5. The interface should let the user know how many values have been accumulated, so a member function can be provided for reading the current value of the counter. Let’s call this function “size()”, indicating the size of the data sample.
6. It is almost certain that an average will be calculated after summation, so a member function for this is useful. Let’s call this function “average()”.
Admittedly, the actual process of defining the interface is not as straightforward as I make it sound here, but let’s be succinct and get to the most exciting part faster. Once the interface is defined, an initial implementation of the class is easy:
class Accumulator
{
public:
    Accumulator() : sum_( 0 ), size_( 0 ) { }
    explicit Accumulator( const float& a ) : sum_( a ), size_( 0 ) { }
    // Add the value and count it, so that size() and average() stay meaningful.
    Accumulator& operator += ( const float& a ) { sum_ += a; ++size_; return *this; }
    unsigned long size() const { return size_; }
    float sum() const { return sum_; }
    float average() const { return sum() / size(); }
private:
    float sum_;
    unsigned long size_;
};
Just an aside to clarify a potential snag: “explicit” is a C++ keyword. Putting it before a ctor prevents unwanted implicit type conversions. In this case, a declaration like `explicit Accumulator( const float& a )’ makes code like the following produce a compile error:
void foo( const Accumulator& a );
float a;
…
foo( a ); // error
If the ctor were defined without “explicit”, the compiler would generate a temporary “Accumulator” object (by calling the ctor) and then pass this object into “foo” without any complaint. Such behavior is a source of bugs and should be avoided unless it is what you really want.
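The difference can also be checked at compile time. The following sketch assumes a C++11 compiler (for `static_assert’ and `<type_traits>’). The two classes `ImplicitAcc’ and `ExplicitAcc’ and the functions `read_implicit’, `read_explicit’, and `demo_explicit’ are hypothetical stand-ins for this illustration, not part of the accumulator itself.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in: ctor WITHOUT `explicit`.
class ImplicitAcc
{
public:
    ImplicitAcc( const float& a ) : sum_( a ) { }
    float sum() const { return sum_; }
private:
    float sum_;
};

// Hypothetical stand-in: ctor WITH `explicit`.
class ExplicitAcc
{
public:
    explicit ExplicitAcc( const float& a ) : sum_( a ) { }
    float sum() const { return sum_; }
private:
    float sum_;
};

float read_implicit( const ImplicitAcc& a ) { return a.sum(); }
float read_explicit( const ExplicitAcc& a ) { return a.sum(); }

// A float converts silently to ImplicitAcc, but not to ExplicitAcc.
static_assert( std::is_convertible<float, ImplicitAcc>::value,
               "float converts implicitly" );
static_assert( !std::is_convertible<float, ExplicitAcc>::value,
               "explicit ctor blocks the implicit conversion" );
// Direct construction from a float is still allowed.
static_assert( std::is_constructible<ExplicitAcc, float>::value,
               "explicit construction still works" );

void demo_explicit()
{
    assert( read_implicit( 2.5f ) == 2.5f );                 // silent temporary
    assert( read_explicit( ExplicitAcc( 2.5f ) ) == 2.5f );  // conversion spelled out
    // read_explicit( 2.5f );  // would not compile
}
```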
It is worth mentioning a technical merit of the implementation of `average()’. It uses `sum()’ and `size()’, instead of `sum_’ and `size_’, to obtain the average. It is generally better to use public member functions instead of member data in your implementation, whether the code is inside or outside the class. This is simply because public member functions are much more stable than member data. As the class evolves to become more complicated and powerful, an implementation that uses `sum_’ and `size_’ directly will no longer be possible.
Now, with this implementation, we have basically solved problems I and II. Next, we need to generalize this implementation to many different types.
3. The generic implementation
To obtain a generic version of the implementation, we can replace `float’ with a generic type `T’ to get an initial templatized version. Then we examine and fix any problems that result from the templatization. This is a useful approach to obtaining generic designs and implementations.
The initial template version looks like this:
template <typename T>
class Accumulator
{
public:
    Accumulator() : sum_( 0 ), size_( 0 ) { }
    explicit Accumulator( const T& a ) : sum_( a ), size_( 0 ) { }
    // Add the value and count it, so that size() and average() stay meaningful.
    Accumulator& operator += ( const T& a ) { sum_ += a; ++size_; return *this; }
    unsigned long size() const { return size_; }
    T sum() const { return sum_; }
    T average() const { return sum() / size(); }
private:
    T sum_;
    unsigned long size_;
};
This is actually a pretty good template that works for all built-in types that have well-defined += operators. However, there are still problems:
1. For a user-defined type (UDT), `sum_’ may not be initializable from zero.
2. A UDT may not define a += operator.
3. The user may not want the value returned by `average()’ to be of the same type as the data. For example, the user may want to sum up integer data but obtain an average of type `float’.
Below let’s try to solve these problems.
For problem (1), we first need to realize that there is no way to know a priori how an object of an arbitrary type should be initialized. The best bet is that if the user wants an `Accumulator’ object to be constructed via the default ctor, they also want `sum_’ to be initialized by the default ctor of `T’. This assumption is the best that we can do at this point (but we will indeed do better later on, I promise). For now, we accept this assumption and make it a requirement, or constraint, on `T’. This means that the default ctor should be changed to this:
Accumulator() : sum_(), size_( 0 ) { }
You might doubt whether this works for built-in types. The answer is yes. For a built-in type, an initializer of the form `sum_()’ zero-initializes the object (`sum_’ in this case), so it is no different from `sum_( 0 )’.
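A minimal sketch of this initialization rule follows. The `Holder’ template, the `Money’ type, and the `demo_value_init’ function are hypothetical names used only for this illustration. `Holder’ initializes its member with the same `member_()’ form, and the checks show that built-in types end up as zero while a UDT gets whatever its default ctor sets.

```cpp
#include <cassert>

// Hypothetical UDT whose default ctor defines its own "empty" state.
struct Money
{
    long cents;
    Money() : cents( 0 ) { }
};

// Hypothetical holder that initializes its member the same way as `sum_()`.
template <typename T>
class Holder
{
public:
    Holder() : value_() { }   // value-initialization
    const T& value() const { return value_; }
private:
    T value_;
};

void demo_value_init()
{
    assert( Holder<int>().value() == 0 );            // built-in: zero
    assert( Holder<double>().value() == 0.0 );       // built-in: zero
    assert( Holder<unsigned long>().value() == 0 );  // built-in: zero
    assert( Holder<Money>().value().cents == 0 );    // UDT: default ctor runs
}
```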
Problem (2) is difficult for now. As we will see below, the solution will change the framework of the class significantly. Let’s skip it for now and simply make it a requirement on `T’ that a UDT must have a well-defined += operator.
Problem (3) is not a severe problem. The user can always use `sum()’ and `size()’ with the necessary type conversions to get the wanted average value. But since we have already put `average()’ into the interface, we should complete the work. We need the user to specify what type of value `average()’ should return. To this end, a simple yet elegant way is to overload the `average()’ function with a member function template like this:
template <typename S> S average() const { return S( sum() ) / size(); }
The client code will look like this:
Accumulator<int> a;
…
float b = a.average<float>();
This beautifully solves our problem (3).
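For reference, here is one way the pieces discussed in this section might fit together, offered as a sketch rather than the definitive version. It assumes the counter is incremented inside `operator +=’. The helpers `int_average’, `float_average’, and `demo_average’ are hypothetical and exist only to exercise the two overloads.

```cpp
#include <cassert>

template <typename T>
class Accumulator
{
public:
    Accumulator() : sum_(), size_( 0 ) { }
    explicit Accumulator( const T& a ) : sum_( a ), size_( 0 ) { }
    Accumulator& operator += ( const T& a ) { sum_ += a; ++size_; return *this; }
    unsigned long size() const { return size_; }
    T sum() const { return sum_; }
    // Average in the data type itself.
    T average() const { return sum() / size(); }
    // Average converted to a caller-chosen type S.
    template <typename S>
    S average() const { return S( sum() ) / size(); }
private:
    T sum_;
    unsigned long size_;
};

// Hypothetical helpers: average of 1 and 2, in int and in float.
int int_average()
{
    Accumulator<int> a;
    a += 1;
    a += 2;
    return a.average();          // integer division: 3 / 2
}

float float_average()
{
    Accumulator<int> a;
    a += 1;
    a += 2;
    return a.average<float>();   // 3.0f / 2
}

void demo_average()
{
    assert( int_average() == 1 );
    assert( float_average() == 1.5f );
}
```

One caveat of this sketch: with a signed integer `T’, the plain `average()’ divides by the unsigned `size()’, so a negative sum is converted to an unsigned value before the division. Asking for a floating-point `S’ avoids that conversion.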
OK. Putting these solutions together (omitted here), we get a generic implementation of the `Accumulator’ class. It works nicely for any type that satisfies the two constraints mentioned above.
4. The customizable framework
Now, let’s think about the problems of our generic implementation of `Accumulator’:
- `T’ must satisfy the aforementioned constraints in order to use the template. This considerably restricts the use of the template.
- `sum_’ has to be the same type as the data, which is not always wanted. For example, we sometimes want to use `double’ for `sum_’, while the data are still `float’. There could be more complicated scenarios, for example, `sum_’ may not even be stored as a single variable.
Unfortunately, these problems are fundamental to our implementation; there is no way to solve them within the current framework of `Accumulator’. What they really require is that the user be allowed to define, for `Accumulator’, how `sum_’ is stored, how summation is performed, how the average is obtained, and so on. In other words, they require a mechanism to plug in not only a type (our generic implementation already does this), but also algorithms and internal data structures. Can we do this?
The short answer is yes. The long answer is what follows. Below, I will just show the implementation and then explain how it works.
namespace detail
{
    template <typename T>
    class Simple
    {
    public:
        typedef T DataType;
        Simple() : sum_(), size_( 0 ) { }
        explicit Simple( const T& t ) : sum_( t ), size_( 0 ) { }
        void operator += ( const T& rhs ) { sum_ += rhs; size_++; }
        const T& sum() const { return sum_; }
        T average() const { return sum_ / size_; }
        unsigned long size() const { return size_; }
        template <typename S>
        S average() const { return S( sum_ ) / size_; }
    private:
        T sum_;
        unsigned long size_;
    };
}
template <typename T, typename M = detail::Simple<T> >
class Accumulator : private M
{
public:
    Accumulator() : M() { }
    explicit Accumulator( const T& t ) : M( t ) { }
    Accumulator& operator += ( const T& rhs ) { M::operator += ( rhs ); return *this; }
    T sum () const { return T( M::sum() ); }
    T average() const { return T( M::average() ); }
    unsigned long size () const { return M::size(); }
    template <typename S>
    S average() const { return M::template average<S>(); }
};
In this implementation, we use a powerful technique that I call template mixin. The technique derives the current class from a template argument so that the behavior of the current class is completely or partly customized by the base class. In some sense, it can be thought of as the opposite of the object-oriented method. Here, you define the most derived class first and leave the base class for the user to define later. In the object-oriented method, you define the base class first and let the user define the derived class. Here, the interface is defined in the most derived class, whereas in the object-oriented method the interface is defined in the base class. This reversed thinking results in a very important feature: static polymorphism, with no virtual functions. In our case, the base class `M’ is where all the algorithms and data structures are implemented for the type `T’. This makes `Accumulator’ a real framework that basically only defines the interface and leaves all behavior to be customized by the UDT `M’. Because it does not define the implementation in any specific way by itself, `Accumulator’ achieves maximal customizability.
OK. That is the general idea. Let’s inspect `Accumulator’ closely to clarify a few important points in the design and implementation:
1. The member functions of `Accumulator’ are just very thin wrappers around the corresponding member functions of `M’. An optimizing compiler should be able to inline away the additional layer of function calls, so there should be no loss of efficiency.
2. By defining `detail::Simple<T>’, we provide an implementation for any type that satisfies the aforementioned constraints. We made this implementation the default through the template parameter declaration `typename M = detail::Simple<T>’.
3. The user can change the default behavior by plugging in their own class for `M’. Let’s see this through an example:
To use `double’ instead of `float’ to sum over a set of data of type `float’, we just change the declaration from:
Accumulator<float> a;
to
Accumulator<float, detail::Simple<double> > a;
All other code remains the same.
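As a sketch of how a user-supplied `M’ could address problem IV, the block below plugs a compensated (Kahan) summation policy into the same mixin structure. The `Kahan’ and `Plain’ policy classes and the `sum_tenths’ and `demo_policies’ helpers are assumptions made for this illustration; they are not part of the code above. The `Accumulator’ shown here is trimmed to `sum()’ and `size()’ to keep the example short. The sketch assumes that `T’ is a floating-point type and that the compiler does not apply value-changing floating-point optimizations.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical policy: plain summation, as in the default implementation.
template <typename T>
class Plain
{
public:
    Plain() : sum_(), size_( 0 ) { }
    explicit Plain( const T& t ) : sum_( t ), size_( 0 ) { }
    void operator += ( const T& rhs ) { sum_ += rhs; ++size_; }
    const T& sum() const { return sum_; }
    unsigned long size() const { return size_; }
private:
    T sum_;
    unsigned long size_;
};

// Hypothetical policy: compensated (Kahan) summation.
template <typename T>
class Kahan
{
public:
    Kahan() : sum_(), comp_(), size_( 0 ) { }
    explicit Kahan( const T& t ) : sum_( t ), comp_(), size_( 0 ) { }
    void operator += ( const T& rhs )
    {
        const T y = rhs - comp_;     // apply the correction from the previous step
        const T t = sum_ + y;        // low-order bits of y may be lost here
        comp_ = ( t - sum_ ) - y;    // recover what was lost
        sum_ = t;
        ++size_;
    }
    const T& sum() const { return sum_; }
    unsigned long size() const { return size_; }
private:
    T sum_;
    T comp_;                         // running compensation term
    unsigned long size_;
};

// Trimmed framework class: the interface lives here, the behavior in M.
template <typename T, typename M>
class Accumulator : private M
{
public:
    Accumulator() : M() { }
    explicit Accumulator( const T& t ) : M( t ) { }
    Accumulator& operator += ( const T& rhs ) { M::operator += ( rhs ); return *this; }
    T sum() const { return T( M::sum() ); }
    unsigned long size() const { return M::size(); }
};

// Add 0.1f `count` times through the policy M and return the total.
template <typename M>
float sum_tenths( unsigned long count )
{
    Accumulator<float, M> a;
    for ( unsigned long i = 0; i < count; ++i )
        a += 0.1f;
    return a.sum();
}

void demo_policies()
{
    const unsigned long n = 1000000UL;
    const float plain_error = std::fabs( sum_tenths< Plain<float> >( n ) - 100000.0f );
    const float kahan_error = std::fabs( sum_tenths< Kahan<float> >( n ) - 100000.0f );
    assert( kahan_error < plain_error );
}
```

The client loop in `sum_tenths’ is identical for both policies; only the template argument changes, which is the point of the design.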
5. Conclusions
Through three versions of the `Accumulator’ class, we incrementally solved problems I~IV, eventually achieving a flexible and fully customizable implementation. This is the general approach I personally take to designing and implementing software.
C++ templates are an essential and very powerful tool for conquering difficult design and implementation problems. While generic programming is a powerful technique that frees the programmer from writing an implementation for each specific case, it can still impose rigidity on the underlying data structures and algorithms. The mix-in technique, as demonstrated here, is a key to breaking that rigidity for maximal freedom.