Generic Programming in C++


seamanj


2.1 Introduction

In generic programming, we take the notion of an ADT a step further. Instead of writing
down the specification for a single type, we describe a family of types that all have a common
interface and semantic behavior. The set of requirements that describe the interface and
semantic behavior is referred to as a concept. Algorithms constructed in the generic style are
then applicable to any type that satisfies the requirements of the algorithm. This ability to
use many different types with the same variable (or parameter of a function) is referred to as
polymorphism.

2.1.1 Polymorphism in Object-Oriented Programming

In object-oriented programming (OOP), polymorphism is realized with virtual functions and
inheritance, which is called subtype polymorphism. The interface requirements of a concept
can be written as virtual functions in an abstract base class. The preconditions and invariants
become assertions when possible. Concrete classes inherit from the abstract base class and
provide the implementation of these functions. The concrete classes are said to be subtypes
(or derived classes) of the base class. Generic functions are written in terms of the abstract
base class and the function calls are dispatched at run-time based on the concrete type of the
object (via virtual function tables in C++). Any subtype of the abstract base class can be
interchanged and used in the generic function.
A classic example of a concept from mathematics is an Additive Abelian Group, which
is a set of elements with an addition operator that obeys the associative law, has an inverse,
and has an identity element (zero) [45]. We can represent this concept in C++ by defining an
abstract base class as follows:
// The AdditiveAbelianGroup concept as an abstract base class:
class AdditiveAbelianGroup {
public:
  virtual void add(AdditiveAbelianGroup* y) = 0;
  virtual AdditiveAbelianGroup* inverse() = 0;
  virtual AdditiveAbelianGroup* zero() = 0;
};
Using this abstract base class we can write a reusable function such as sum().
AdditiveAbelianGroup* sum(array<AdditiveAbelianGroup*> v)
{
  AdditiveAbelianGroup* total = v[0]->zero();
  for (int i = 0; i < v.size(); ++i)
    total->add(v[i]);
  return total;
}
The sum() function will work on any array as long as the element type derives from
AdditiveAbelianGroup. Examples of such types would be real numbers and vectors.
class Real : public AdditiveAbelianGroup {
// . . .
};
class Vector : public AdditiveAbelianGroup {
// . . .
};
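As a minimal sketch of how these pieces fit together, the following fills in a hypothetical Real class and uses std::vector in place of the array type, which the text leaves unspecified. The value member, the virtual destructor, the heap-allocated results of zero() and inverse(), and the helper sum_three_reals() are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <typeinfo>
#include <vector>

// Same interface as in the text, plus a virtual destructor so that
// objects can be destroyed through a base-class pointer.
class AdditiveAbelianGroup {
public:
    virtual ~AdditiveAbelianGroup() {}
    virtual void add(AdditiveAbelianGroup* y) = 0;
    virtual AdditiveAbelianGroup* inverse() = 0;
    virtual AdditiveAbelianGroup* zero() = 0;
};

// Hypothetical model: a real number stored as a double.
class Real : public AdditiveAbelianGroup {
public:
    explicit Real(double v) : value(v) {}
    // add() only receives a base pointer, so a downcast is required;
    // the reference form throws std::bad_cast if y is not a Real.
    void add(AdditiveAbelianGroup* y) override {
        value += dynamic_cast<Real&>(*y).value;
    }
    AdditiveAbelianGroup* inverse() override { return new Real(-value); }
    AdditiveAbelianGroup* zero() override { return new Real(0.0); }
    double value;
};

// sum() from the text, with std::vector standing in for array<>.
// Assumption: v is non-empty and the caller owns the returned object.
AdditiveAbelianGroup* sum(const std::vector<AdditiveAbelianGroup*>& v) {
    assert(!v.empty());
    AdditiveAbelianGroup* total = v[0]->zero();
    for (std::size_t i = 0; i < v.size(); ++i)
        total->add(v[i]);
    return total;
}

// Usage helper: adds three reals through the base-class interface.
double sum_three_reals() {
    Real a(1.5), b(2.5), c(3.0);
    std::vector<AdditiveAbelianGroup*> v = {&a, &b, &c};
    std::unique_ptr<AdditiveAbelianGroup> total(sum(v));
    return static_cast<Real&>(*total).value;
}
```

Note that every call inside sum() goes through the virtual function table, and that Real::add() already needs a downcast, which foreshadows the binary method problem discussed later in this chapter.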

2.1.2 Polymorphism in Generic Programming

In generic programming, polymorphism is realized through class or function templates. Tem-
plates provide parametric polymorphism. Below is sum() written as a function template.
The AdditiveAbelianGroup base class is no longer needed, although by convention (and for
documentation purposes) we use the name AdditiveAbelianGroup for the template parameter.
template <typename AdditiveAbelianGroup>
AdditiveAbelianGroup sum(array<AdditiveAbelianGroup> v)
{
  AdditiveAbelianGroup total = v[0].zero();
  for (int i = 0; i < v.size(); ++i)
    total.add(v[i]);
  return total;
}
In C++ a concept is a set of requirements that a template argument must meet so that the
class template or function template can compile and execute properly.
Even though concepts exist only implicitly in generic programming, they are vitally im-
portant and must be carefully documented. Currently, such documentation is typically ac-
complished in the comments of the code or in books such as Generic Programming and the
STL [3]. Consider again the example of an AdditiveAbelianGroup, but this time as a concept.
// concept AdditiveAbelianGroup
//   valid expressions:
//     x.add(y)
//     y = x.inverse()
//     y = x.zero()
//   semantics:
//     ...
Concrete types that satisfy the requirements of AdditiveAbelianGroup do not need to
inherit from a base class. The types of the template argument are substituted into the function
template during instantiation (at compile time). The term model is used to describe the rela-
tionship between concrete types and the concepts they satisfy. For example, Real and Vector
model the AdditiveAbelianGroup concept.
struct Real { // no inheritance
// . . .
};
struct Vector { // no inheritance
// . . .
};
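To make the modeling relationship concrete, here is a small sketch in which Real and Vector are given hypothetical bodies and sum() takes a std::vector (an assumption, since the text's array type is not specified). Neither struct inherits from anything; each simply provides the valid expressions listed in the concept.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of AdditiveAbelianGroup: no base class, no virtual functions.
struct Real {
    double value;
    void add(const Real& y) { value += y.value; }
    Real inverse() const { return Real{-value}; }
    Real zero() const { return Real{0.0}; }
};

// A second hypothetical model: a two-component vector.
struct Vector {
    double x, y;
    void add(const Vector& o) { x += o.x; y += o.y; }
    Vector inverse() const { return Vector{-x, -y}; }
    Vector zero() const { return Vector{0.0, 0.0}; }
};

// The function template from the text, written against std::vector.
// Assumption: v is non-empty.
template <typename AdditiveAbelianGroup>
AdditiveAbelianGroup sum(const std::vector<AdditiveAbelianGroup>& v) {
    assert(!v.empty());
    AdditiveAbelianGroup total = v[0].zero();
    for (std::size_t i = 0; i < v.size(); ++i)
        total.add(v[i]);
    return total;
}
```

Calling sum() on a std::vector<Real> and on a std::vector<Vector> instantiates two separate functions at compile time, and the calls to add() and zero() can be inlined because their targets are known statically.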

2.1.3 Comparison of GP and OOP

So far, we have loosely described generic programming as “programming with templates”
and object-oriented programming as “programming with inheritance.” This is somewhat mis-
leading because the core semantics of these two methodologies are only indirectly related
to templates and inheritance. More formally, generic programming is based on parametric
polymorphism, while object-oriented programming is based on subtype polymorphism. In
C++ these ideas are implemented with templates and inheritance, but other languages provide
different mechanisms. For example, the signatures extension in GNU C++ [4] provides an
alternate form of subtype polymorphism. Multimethods (in languages such as CLOS [21])
provide semantics closer to that of parametric polymorphism but with run-time dispatching
(compared to the compile-time dispatching of templates).
Nevertheless, since Standard C++ is our language choice, it is useful to compare GP and
OOP by comparing inheritance (and virtual functions) with templates in the context of C++.

Virtual Functions Are Slower than Function Templates

A virtual function call is slower than a call to a function template (which is the same speed
as a call to a normal function). A virtual function call includes an extra pointer dereference
to find the appropriate method in the virtual function table. By itself, this overhead may
not be significant. Significant slowdowns can result indirectly in compiled code, however,
because the indirection may prevent an optimizing compiler from inlining the function and
from applying subsequent optimizations to the surrounding code after inlining.
Of course the overall impact of the overhead is entirely dependent on the amount of work
done in the function—that is, how much the overhead will be amortized. For components at
the level of the STL iterators and containers, or at the level of graph iterators, function call
overhead is significant. Efficiency at this level is affected greatly by whether functions like
operator++() are inlined. For this reason, templates are the only choice for implementing
efficient, low-level, reusable components such as those you find in the STL or the BGL.

Run-time Dispatch versus Compile-time Dispatch

The run-time dispatch of virtual functions and inheritance is certainly one of the best features
of object-oriented programming. For certain kinds of components, run-time dispatching is an
absolute requirement; decisions need to be made based on information that is only available
at run time. When this is the case, virtual functions and inheritance are needed.
Templates do not offer run-time dispatching, but they do offer significant flexibility at
compile time. In fact, if the dispatching can be performed at compile time, templates offer
more flexibility than inheritance because they do not require the template argument types to
inherit from some base class (more about this later).

Code Size: Virtual Functions Are Small, Templates Are Big

A common concern in template-based programs is code bloat, which typically results from
naive use of templates. Carefully designed template components need not result in signifi-
cantly larger code size than their inheritance-based counterparts. The main technique in con-
trolling the code size is to separate out the functionality that depends on the template types
and the functionality that is independent of the template types. An example of how to do this
can be seen in the SGI STL implementation of std::list .
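The separation described above can be sketched as follows. This is not the SGI STL code; it is a deliberately small, hypothetical list (node_base, link_before, count_nodes, and tiny_list are invented names) that only illustrates the idea: pointer manipulation lives in non-template code that is compiled once, and the template layer adds only what depends on the element type.

```cpp
#include <cassert>
#include <cstddef>

// Type-independent part: compiled once, shared by every element type.
struct node_base {
    node_base* prev;
    node_base* next;
};

// Inserts node immediately before pos in a circular doubly linked list.
inline void link_before(node_base* pos, node_base* node) {
    node->next = pos;
    node->prev = pos->prev;
    pos->prev->next = node;
    pos->prev = node;
}

// Counts the nodes other than the sentinel itself.
inline std::size_t count_nodes(const node_base* sentinel) {
    std::size_t n = 0;
    for (const node_base* p = sentinel->next; p != sentinel; p = p->next)
        ++n;
    return n;
}

// Type-dependent part: only this thin layer is instantiated per T.
template <typename T>
struct node : node_base {
    explicit node(const T& v) : value(v) {}
    T value;
};

template <typename T>
class tiny_list {
public:
    tiny_list() { sentinel_.prev = sentinel_.next = &sentinel_; }
    ~tiny_list() {
        node_base* p = sentinel_.next;
        while (p != &sentinel_) {
            node_base* next = p->next;
            delete static_cast<node<T>*>(p);
            p = next;
        }
    }
    void push_back(const T& v) { link_before(&sentinel_, new node<T>(v)); }
    std::size_t size() const { return count_nodes(&sentinel_); }
    const T& front() const {
        assert(sentinel_.next != &sentinel_);  // list must not be empty
        return static_cast<const node<T>*>(sentinel_.next)->value;
    }
private:
    tiny_list(const tiny_list&);             // copying omitted for brevity
    tiny_list& operator=(const tiny_list&);
    node_base sentinel_;
};
```

With this layout, tiny_list<int> and tiny_list<double> share link_before() and count_nodes(); only the small member functions that touch T are duplicated per instantiation.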

The Binary Method Problem

There is a serious problem that shows up when using subtyping (inheritance and virtual func-
tions) to express operations that work on two or more objects. This problem is known as
the binary method problem [8]. The classic example for this problem, which we illustrate
next, is a point class interface (a coordinate in a plane) that has an equal() member function.
This problem is particularly important for the BGL, since most of the types it defines (vertex
and edge descriptors and iterators) require an operator==() much like a point class equal()
function.
The following abstract base class describes the interface for a point class.
class Point {
public:
virtual bool equal(const Point* p) const = 0;
};
Using this interface, a library writer could write a “generic” function that takes two objects
of any class derived from Point and prints out whether they are equal.
void print_equal(const Point* a, const Point* b) {
  std::cout << std::boolalpha << a->equal(b) << std::endl;
}
Now consider an implementation of a particular point class, say the ColorPoint class. Suppose
that in our application the only point class we will be using is the ColorPoint class. It is only
necessary to define equality between two color point objects, and not between a color point
and any other kind of point.

class ColorPoint : public Point {
public:
  ColorPoint(float x, float y, std::string c) : x(x), y(y), color(c) { }
  virtual bool equal(const ColorPoint* p) const
  { return color == p->color && x == p->x && y == p->y; }
protected:
  float x, y;
  std::string color;
};

However, when we try to use this class, we find out that the ColorPoint::equal() function did
not override the Point::equal() function. When trying to instantiate a ColorPoint object we
get the following error.
error: object of abstract class type "ColorPoint" is not allowed:
pure virtual function "Point::equal" has no overrider
It turns out that by the contravariance subtyping rule, the parameter type in the derived class's
member function must be either the same type as, or a base class of, the parameter type
in the base class. (C++ is stricter still: an overriding function must have exactly the same
parameter types.) In the case of the ColorPoint class, the parameter to equal() must be Point,
not ColorPoint. However, making this change causes another problem. Inside the equal()
function, the Point argument must be downcast in order to check whether the data
members are equal. The insertion of this downcast means that it is no longer known at compile
time whether a program using the ColorPoint class is type safe. An object of a different point
class could be passed to the equal() function in error, causing a failure at run time (the
dynamic_cast of a pointer then yields a null pointer, which the function goes on to dereference).
The following ColorPoint2 class changes the parameter of equal() to Point and inserts the
downcast.
class ColorPoint2 : public Point {
public:
  ColorPoint2(float x, float y, std::string s) : x(x), y(y), color(s) { }
  virtual bool equal(const Point* p) const {
    // cp is null if p does not point to a ColorPoint2
    const ColorPoint2* cp = dynamic_cast<const ColorPoint2*>(p);
    return color == cp->color && x == cp->x && y == cp->y;
  }
protected:
  float x, y;
  std::string color;
};
Now suppose that we were using function templates instead of virtual functions to express
polymorphism. Then the print_equal() function could be written like this:
template <typename PointType>
void print_equal2(const PointType* a, const PointType* b) {
  std::cout << std::boolalpha << a->equal(b) << std::endl;
}
To use this function, the color point class does not need to inherit from Point, and the subtyping
issues are irrelevant. When the print_equal2() function is called with two objects of type
ColorPoint, ColorPoint is substituted for the PointType parameter and the call to equal() simply
resolves to ColorPoint::equal(). Full compile-time type safety is therefore retained.
ColorPoint* a = new ColorPoint(0.0, 0.0, "blue");
ColorPoint* b = new ColorPoint(0.0, 0.0, "green");
print_equal2(a, b);

Since the BGL is implemented in terms of function templates, we did not have to be concerned
with the binary method problem. If instead the BGL had been implemented with virtual
functions, the binary method problem would have been a constant source of trouble.
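The template-based approach can be tried out with the following self-contained sketch. It assumes a ColorPoint with no base class, adds a hypothetical second type GridPoint, and replaces printing with a function points_equal() (an invented name) that returns the result so that it can be checked.

```cpp
#include <cassert>
#include <string>

// No Point base class: ColorPoint only has to provide equal().
class ColorPoint {
public:
    ColorPoint(float x, float y, const std::string& c) : x(x), y(y), color(c) {}
    bool equal(const ColorPoint* p) const {
        return color == p->color && x == p->x && y == p->y;
    }
private:
    float x, y;
    std::string color;
};

// Hypothetical second point type with its own, unrelated equal().
class GridPoint {
public:
    GridPoint(int col, int row) : col(col), row(row) {}
    bool equal(const GridPoint* p) const { return col == p->col && row == p->row; }
private:
    int col, row;
};

// Both arguments must have the same PointType, so mixing a ColorPoint with a
// GridPoint is rejected by the compiler instead of failing at run time.
template <typename PointType>
bool points_equal(const PointType* a, const PointType* b) {
    assert(a != 0 && b != 0);
    return a->equal(b);
}
```

A call such as points_equal(&color_point, &grid_point) does not compile, because template argument deduction finds two conflicting types for PointType. This is the compile-time counterpart of the failed downcast in ColorPoint2.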

2.2 Generic Programming and the STL

The problem domain underlying the STL is that of basic algorithms for computer science
(e.g., array and list structures, searching and sorting algorithms—the kind of things you dealt
with in your data structure and algorithms classes). Now, there have been any number of
“foundational” library collections that have attempted to provide some kind of comprehensive
set of data structures and algorithms. What differentiates the STL from the rest of these
efforts is generic programming (process and practice).
As described by Musser and Stepanov [35], the GP process as it is applied to a particular
problem domain consists of the following basic steps:
1. Identify useful and efficient algorithms
2. Find their generic representation (i.e., parameterize each algorithm such that it makes
the fewest possible requirements of the data on which it operates)
3. Derive a set of (minimal) requirements that allow these algorithms to run and to run
efficiently
4. Construct a framework based on classifications of requirements
This process is reflected in the structure and organization of the STL components.
In terms of programming practice, the minimization process and framework design imply
a structure where algorithms are expressed independently of any particular data types upon
which they might operate. Rather, algorithms are written to generic specifications that are
deduced from the algorithms’ needs.
For instance, algorithms typically need the abstract functionality of being able to traverse
through a data structure and to access its elements. If data structures provide a standard
interface for traversal and access, generic algorithms can be freely mixed and matched with
data structures (called containers in the terminology of the STL).
The main facilitator in the separation of algorithms and containers in the STL is the iter-
ator (sometimes called a generalized pointer). Iterators provide a mechanism for traversing
containers and accessing their elements. The interface between an algorithm and a container
is in terms of iterator requirements that must be met by the type of iterators exported by the
container. Generic algorithms are most flexible when they are written in terms of iterators and
do not rely on a particular container.
Iterators are classified into broad categories, some of which are InputIterator,
ForwardIterator, and RandomAccessIterator. Figure 2.1 depicts the relationship between containers,
algorithms, and iterators.

Accumulate Example
For a concrete example of generic programming we will look at the algorithm accumulate(),
which successively applies a binary operator to an initial value and each element in a
container. A typical use of accumulate() would be to sum the elements of a container using the
addition operator. The following code shows how one could implement the accumulate()
algorithm in C++. The first and last arguments are iterators that mark the beginning and
past-the-end positions of the sequence. All of the arguments to the function are parameterized
on type so that the algorithm can be used with any container whose iterators model the
InputIterator concept. Iterator traversal uses the same notation as pointers; specifically,
operator++() increments to the next position. Several other ways to move iterators (especially
random access iterators) are listed in Table 2.1. To access the container element under the
iterator, one uses the dereference operator, operator*(), or the subscript operator,
operator[](), to access an element at an offset from the iterator.
template <typename InputIterator, typename T, typename BinaryOperator>
T accumulate(InputIterator first, InputIterator last, T init, BinaryOperator binary_op)
{
  for (; first != last; ++first)
    init = binary_op(init, *first);
  return init;
}
To demonstrate the flexibility that the iterator interface provides, we use the accumulate()
function template with a vector and with a linked list (both from the STL).
// using accumulate with a vector
std::vector<double> x(10, 1.0);
double sum1;
sum1 = std::accumulate(x.begin(), x.end(), 0.0, std::plus<double>());
// using accumulate with a linked list
std::list<double> y;
double sum2;
// copy vector's values into the list
std::copy(x.begin(), x.end(), std::back_inserter(y));
sum2 = std::accumulate(y.begin(), y.end(), 0.0, std::plus<double>());
assert(sum1 == sum2); // they should be equal
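The same function template is not limited to summation or to STL containers. The sketch below repeats the accumulate() implementation inside a namespace named sketch (an assumption, chosen only to avoid clashing with std::accumulate) and applies it to a built-in array and to a list of strings. The join_with_comma function object is hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <list>
#include <string>

namespace sketch {
// Same algorithm as in the text.
template <typename InputIterator, typename T, typename BinaryOperator>
T accumulate(InputIterator first, InputIterator last, T init, BinaryOperator binary_op) {
    for (; first != last; ++first)
        init = binary_op(init, *first);
    return init;
}
}  // namespace sketch

// Hypothetical function object: appends s to acc, separated by a comma.
struct join_with_comma {
    std::string operator()(const std::string& acc, const std::string& s) const {
        return acc.empty() ? s : acc + "," + s;
    }
};

inline void accumulate_examples() {
    // A built-in array: plain pointers serve as the iterators.
    int data[] = {1, 2, 3, 4};
    int product = sketch::accumulate(data, data + 4, 1, std::multiplies<int>());
    assert(product == 24);

    // A linked list of strings combined with a user-defined operator.
    std::list<std::string> words;
    words.push_back("a");
    words.push_back("b");
    words.push_back("c");
    std::string joined = sketch::accumulate(words.begin(), words.end(),
                                            std::string(), join_with_comma());
    assert(joined == "a,b,c");
}
```

Only the iterator operations !=, ++, and * are used inside the algorithm, which is why pointers and list iterators work equally well.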

2.3 Concepts and Models

The previous section showed an example of the RandomAccessIterator requirements. It also
showed how InputIterator was used as a requirement for the accumulate() function and how
both std::list::iterator and std::vector::iterator could be used with this function. In this section,
we define the terms that describe the relationships between sets of requirements, functions,
and types.
In the context of generic programming, the term concept is used to describe the collection
of requirements that a template argument must meet for the function template or class tem-
plate to compile and operate properly. In the text, the sans-serif font is used to distinguish
concept names.
Examples of concept definitions can be found in the C++ Standard, many of which deal
with the requirements for iterators. In addition, Matthew Austern’s book Generic Program-
ming and the STL [3] and the SGI STL Web site provide comprehensive documentation on
the concepts used in the STL. These concepts are used heavily in the definition of the BGL
concepts. The SGI STL Web site is at the following URL:
http://www.sgi.com/tech/stl/

2.3.1 Sets of Requirements

2.4 Associated Types and Traits Classes

One of the most important techniques used in generic programming is the traits class, which
was introduced by Nathan Myers [36]. The traits class technique may seem somewhat un-
natural when first encountered (due to the syntax) but the essence of the idea is simple. It is
essential to learn how to use traits classes, for they are used regularly in generic libraries such
as the STL and the BGL.

2.4.1 Associated Types Needed in Function Template

A traits class is basically a way of determining information about a type that you would
otherwise know nothing about. For example, consider a generic sum() function:
template <typename Array>
X sum(const Array& v, int n)
{
X total = 0;
for (int i = 0; i < n; ++i)
total += v[i];
return total;
}
From the point of view of this function template, not much is known about the template
type Array . For instance, the type of the elements that are inside the array is not given. How-
ever, this information is necessary in order to declare the local variable total , which should
be the same type as the elements of Array . The X that is there now is just a placeholder that
needs to be replaced by something else to produce a correct sum() function.

2.4.2 Typedefs Nested in Classes

One way to access information out of a type is to use the scope operator :: to access typedefs
that are nested inside the class. For example, an array class might look like the following:
class my_array {
public:
  typedef double value_type; // the type for elements in the array
  double& operator[](int i) { return m_data[i]; }
  const double& operator[](int i) const { return m_data[i]; } // for indexing a const my_array
private:
  double* m_data;
};
The type of the elements in the array can be accessed via my_array::value_type. The generic
sum() function can be realized using this technique as follows (note that the X placeholders
have been replaced with typename Array::value_type):

template <typename Array>
typename Array::value_type sum(const Array& v, int n)
{
  typename Array::value_type total = 0;
  for (int i = 0; i < n; ++i)
    total += v[i];
  return total;
}
In the sum() function above, the technique of using a nested typedef works as long as
Array is a class type that has such a nested typedef. However, there are important cases for
which having a nested typedef is neither practical nor possible. For instance, one might want
to use the generic sum() function with a class from a third party that did not provide the
required typedef. Or, one might want to use the sum() function with a built-in type such as
double*.
int n = 100;
double* x = new double[n];
sum(x, n);
In both of these cases, it is quite likely that the functional requirements of our desired use
are met; that is, the operator[]() works with double* and with our imaginary third-party array.
The limitation to reuse is in how to communicate the type information from the classes we
want to use to the sum() function.

2.4.3 Definition of a Traits Class

The solution to this is a traits class, which is a class template whose sole purpose is to
provide a mapping from a type to other types, functions, or constants. The language mechanism
that allows a class template to create a mapping is template specialization. The mapping is
accomplished by creating different versions of the traits class to handle specific type
parameters. We will show how this works by creating an array_traits class that can be used in the
sum() function.
The array_traits class will be templated on the Array type and will allow us to determine
the value_type (the type of the element) of the array. The default (fully templated) case will
assume that the array is a class with a nested typedef such as my_array:

template <typename Array>
struct array_traits {
  typedef typename Array::value_type value_type;
};

We can then create a specialization of the array_traits template to handle when the Array
template argument is a built-in type like double*:
template <> struct array_traits<double*> {
  typedef double value_type;
};
Third-party classes, say johns_int_array, can be similarly accommodated:
template <> struct array_traits<johns_int_array> {
  typedef int value_type;
};
The sum() function, written with the array_traits class, is shown below. To access the type for
the total variable we extract the value_type from array_traits.
template <typename Array>
typename array_traits<Array>::value_type sum(const Array& v, int n)
{
  typename array_traits<Array>::value_type total = 0;
  for (int i = 0; i < n; ++i)
    total += v[i];
  return total;
}

2.4.4 Partial Specialization

Writing a traits class for every pointer type is not practical or desirable. The following shows
how to use partial specialization to provide array_traits for all pointer types. The C++ compiler
will attempt a pattern match between the template argument provided at the instantiation of
the traits class and all the specializations defined, picking the specialization that is the best
match. The partial specialization for T* will match whenever the type is a pointer. The
previous complete specialization for double* would still be chosen for that particular pointer
type, because it is the better match.
template <typename T>
struct array_traits<T*> {
  typedef T value_type;
};
Partial specialization can also be used to create a version of array_traits for a third-party class
template.
template <typename T>
struct array_traits< johns_array<T> > {
  typedef T value_type;
};
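Putting the pieces of this section together gives the following self-contained sketch. The constructors of my_array and johns_int_array, and the body of johns_int_array itself, are assumptions added so that the example can run; the traits class and sum() follow the text.

```cpp
#include <cassert>

// An array class that documents its element type with a nested typedef.
class my_array {
public:
    typedef double value_type;
    explicit my_array(double* data) : m_data(data) {}
    const double& operator[](int i) const { return m_data[i]; }
private:
    double* m_data;
};

// Hypothetical third-party class: usable with [], but no nested value_type.
class johns_int_array {
public:
    explicit johns_int_array(const int* data) : m_data(data) {}
    int operator[](int i) const { return m_data[i]; }
private:
    const int* m_data;
};

// Default case: read the nested typedef.
template <typename Array>
struct array_traits {
    typedef typename Array::value_type value_type;
};

// Partial specialization: any pointer type.
template <typename T>
struct array_traits<T*> {
    typedef T value_type;
};

// Full specialization: the third-party class.
template <>
struct array_traits<johns_int_array> {
    typedef int value_type;
};

template <typename Array>
typename array_traits<Array>::value_type sum(const Array& v, int n) {
    assert(n >= 0);
    typename array_traits<Array>::value_type total = 0;
    for (int i = 0; i < n; ++i)
        total += v[i];
    return total;
}
```

One detail worth noting: because sum() takes its argument by reference, passing a built-in array by name deduces an array type rather than a pointer type, so in this sketch the caller passes a pointer variable (for example double* p = raw;) to select the T* specialization.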

The most well-known use of a traits class is the iterator_traits class used in the STL. The
BGL also uses traits classes such as graph_traits and the property_traits classes. Typically, a
traits class is used with a particular concept or family of concepts. The iterator_traits class
is used with the family of iterator concepts. The graph_traits class is used with the family of
BGL graph concepts.
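As a short illustration of iterator_traits in use, the following sketch writes a summation over an iterator range without being told the element type. The function name sum_range and the helper sum_range_examples are hypothetical; std::iterator_traits and its value_type member are part of the standard library.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

// The element type is looked up through std::iterator_traits, which works
// for class-type iterators and for plain pointers alike.
template <typename InputIterator>
typename std::iterator_traits<InputIterator>::value_type
sum_range(InputIterator first, InputIterator last) {
    typedef typename std::iterator_traits<InputIterator>::value_type value_type;
    value_type total = value_type();  // value-initialized, i.e. zero for arithmetic types
    for (; first != last; ++first)
        total += *first;
    return total;
}

inline void sum_range_examples() {
    std::vector<int> v;
    v.push_back(1);
    v.push_back(2);
    v.push_back(3);
    assert(sum_range(v.begin(), v.end()) == 6);

    const double raw[] = {0.5, 1.5};
    assert(sum_range(raw, raw + 2) == 2.0);  // pointers are iterators too
}
```

The standard library supplies the pointer specializations of iterator_traits itself, which is the same partial-specialization technique shown above for array_traits.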

2.4.5 Tag Dispatching

A technique that often goes hand in hand with traits classes is tag dispatching, which is a way
of using function overloading to dispatch based on properties of a type. A good example of
this is the implementation of the std::advance() function in the STL, which, in the default
case, increments an iterator n times. Depending on the kind of iterator, there are different
optimizations that can be applied in the implementation. If the iterator is random access, then
the advance() function can simply be implemented with i += n and is very efficient; that is, it
runs in constant time. If the iterator is bidirectional, then n may be negative, in which case
we decrement the iterator |n| times. The relation between tag dispatching and traits
classes is that the property to be exploited for dispatch (in this case, the iterator_category) is
accessed through a traits class.
In the following example, the advance() function uses the iterator_traits class to determine
the iterator_category. It then makes a call to the overloaded advance_dispatch() function. The
appropriate advance_dispatch() is selected by the compiler based on whatever type the
iterator_category resolves to (one of the tag classes in the following code). A tag is simply a class
whose only purpose is to convey some property for use in tag dispatching. By convention, the
name of a tag class ends in _tag. We do not define a function overload for the
forward_iterator_tag because that case is handled by the function overloaded for input_iterator_tag.
struct input_iterator_tag {};
struct output_iterator_tag {};
struct forward_iterator_tag : public input_iterator_tag {};
struct bidirectional_iterator_tag : public forward_iterator_tag {};
struct random_access_iterator_tag : public bidirectional_iterator_tag {};
template <typename InputIterator, typename Distance>
void advance_dispatch(InputIterator& i, Distance n, input_iterator_tag)
{ while (n--) ++i; }
template <typename BidirectionalIterator, typename Distance>
void advance_dispatch(BidirectionalIterator& i, Distance n, bidirectional_iterator_tag)
{
  if (n >= 0)
    while (n--) ++i;
  else
    while (n++) --i;
}

template <typename RandomAccessIterator, typename Distance>
void advance_dispatch(RandomAccessIterator& i, Distance n, random_access_iterator_tag)
{
  i += n;
}
template <typename InputIterator, typename Distance>
void advance(InputIterator& i, Distance n)
{
  typedef typename iterator_traits<InputIterator>::iterator_category Cat;
  advance_dispatch(i, n, Cat());
}
The BGL graph_traits class includes three categories: directed_category,
edge_parallel_category, and traversal_category. The tags for these categories can be used for dispatching
similarly to iterator_category.
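The advance() example can be run against real containers with the sketch below. It assumes the standard library's own tag classes and std::iterator_traits in place of the hand-written ones, and it places the functions in a namespace named sketch (an invented name) so that they do not collide with std::advance.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

namespace sketch {

// Input (and forward) iterators: step forward one element at a time.
template <typename InputIterator, typename Distance>
void advance_dispatch(InputIterator& i, Distance n, std::input_iterator_tag) {
    assert(n >= 0);  // these iterators cannot move backward
    while (n--) ++i;
}

// Bidirectional iterators: step in either direction.
template <typename BidirectionalIterator, typename Distance>
void advance_dispatch(BidirectionalIterator& i, Distance n,
                      std::bidirectional_iterator_tag) {
    if (n >= 0)
        while (n--) ++i;
    else
        while (n++) --i;
}

// Random access iterators: jump in constant time.
template <typename RandomAccessIterator, typename Distance>
void advance_dispatch(RandomAccessIterator& i, Distance n,
                      std::random_access_iterator_tag) {
    i += n;
}

// The tag object Cat() selects the overload at compile time.
template <typename Iterator, typename Distance>
void advance(Iterator& i, Distance n) {
    typedef typename std::iterator_traits<Iterator>::iterator_category Cat;
    advance_dispatch(i, n, Cat());
}

}  // namespace sketch
```

For a std::vector iterator the random access overload is chosen, for a std::list iterator the bidirectional one, and an iterator tagged forward_iterator_tag falls back to the input overload through the inheritance relationship between the tags.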
