We will use a running example based on the vector data structure from [1]. The structure is declared as follows:
typedef int data_t;
#define IDENT 1
#define OP *
typedef struct {
    long int len;
    data_t *data;
} vec_rec, *vec_ptr;
Some basic procedures for accessing vector elements and determining the length of a vector are shown below:
// Retrieve vector element and store at dest
// Return 0 (out of bounds) or 1 (successful)
int get_vec_element(vec_ptr v, long int index, data_t *dest)
{
    if (index < 0 || index >= v->len)
        return 0;
    *dest = v->data[index];
    return 1;
}
// Return length of vector
long int vec_length(vec_ptr v)
{
    return v->len;
}
// Return the starting address of the data array
data_t *get_vec_start(vec_ptr v)
{
    return v->data;
}
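The accessors above assume that a vector has already been allocated. A minimal sketch of how one might be created and released is given here; new_vec and free_vec are hypothetical helpers that do not appear in the text above, and zero-initializing the elements is an assumption made for illustration.

```c
#include <stdlib.h>

typedef int data_t;

typedef struct {
    long int len;
    data_t *data;
} vec_rec, *vec_ptr;

/* Hypothetical helper (not part of the text above):
   allocate a vector of len zero-initialized elements.
   Returns NULL if len is negative or allocation fails. */
vec_ptr new_vec(long int len)
{
    if (len < 0)
        return NULL;
    vec_ptr v = malloc(sizeof(vec_rec));
    if (v == NULL)
        return NULL;
    v->len = len;
    v->data = NULL;
    if (len > 0) {
        v->data = calloc((size_t)len, sizeof(data_t));
        if (v->data == NULL) {
            free(v);
            return NULL;
        }
    }
    return v;
}

/* Hypothetical helper: release a vector created by new_vec. */
void free_vec(vec_ptr v)
{
    if (v != NULL) {
        free(v->data);
        free(v);
    }
}

/* Accessors repeated from the text so the sketch compiles on its own. */
int get_vec_element(vec_ptr v, long int index, data_t *dest)
{
    if (index < 0 || index >= v->len)
        return 0;
    *dest = v->data[index];
    return 1;
}

long int vec_length(vec_ptr v)
{
    return v->len;
}
```

With these helpers, a caller would obtain a vector with new_vec(n), read elements through get_vec_element (which reports out-of-bounds indices by returning 0), and release the vector with free_vec.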
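As an optimization example, consider the following code, which combines all of the elements in a vector into a single value by applying OP repeatedly, starting from IDENT (with the definitions above, a product starting from 1).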
// Implementation with maximum use of data abstraction
void combine1(vec_ptr v, data_t *dest)
{
    *dest = IDENT;
    for (long int i=0; i!=vec_length(v); ++i) {
        data_t val;
        get_vec_element(v, i, &val);
        *dest = *dest OP val;
    }
}
Three modified versions, combine2, combine3, and combine4, are presented below:
// Eliminating loop inefficiencies
// Move call to vec_length out of loop
void combine2(vec_ptr v, data_t *dest)
{
    long int length = vec_length(v);
    *dest = IDENT;
    for (long int i=0; i!=length; ++i) {
        data_t val;
        get_vec_element(v, i, &val);
        *dest = *dest OP val;
    }
}
// Reducing procedure calls
// Direct access to vector data
void combine3(vec_ptr v, data_t *dest)
{
    long int length = vec_length(v);
    data_t *data = get_vec_start(v);
    *dest = IDENT;
    for (long int i=0; i!=length; ++i) {
        *dest = *dest OP data[i];
    }
}
// Eliminating unneeded memory references
// Accumulate result in local variable
void combine4(vec_ptr v, data_t *dest)
{
    long int length = vec_length(v);
    data_t *data = get_vec_start(v);
    data_t acc = IDENT;
    for (long int i=0; i!=length; ++i) {
        acc = acc OP data[i];
    }
    *dest = acc;
}
Optimizing compilers are typically very cautious about making transformations that change where or how many times a procedure is called. They cannot reliably detect whether a function has side effects, and so they assume that it might. Therefore, the compiler will generally not move the call to vec_length out of the loop on its own; the transformation from combine1 to combine2 has to be made by the programmer.
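To see why the number of calls matters, consider a small sketch in which a function modifies global state. The names f, func1, func2, reset_counter, and counter are illustrative and are not part of the vector code above.

```c
/* Global state modified by f: this is the side effect. */
static long counter = 0;

/* Returns the current counter value, then increments it. */
long f(void)
{
    return counter++;
}

/* Calls f four times. */
long func1(void)
{
    return f() + f() + f() + f();
}

/* Calls f once; looks equivalent to func1 only if f has no side effects. */
long func2(void)
{
    return 4 * f();
}

/* Illustrative helper: restore the initial state between experiments. */
void reset_counter(void)
{
    counter = 0;
}
```

Starting from counter equal to 0, func1 returns 0 + 1 + 2 + 3 = 6, whereas func2 returns 4 * 0 = 0. A compiler that replaced four calls with one would change the result, which is why it must assume the worst about any call it cannot fully analyze.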
A purist might say that the transformation from combine2 to combine3 seriously impairs the program's modularity. A more pragmatic programmer would argue that this transformation is a necessary step toward high performance. For applications in which performance is a significant issue, one must often compromise modularity and abstraction for speed.
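One might think that a compiler should be able to transform combine3 into combine4 automatically. In fact, the two functions can behave differently because of memory aliasing: if dest points to an element of the vector itself, each write to *dest in combine3 changes the data that later iterations read, whereas combine4 does not write to memory until the loop has finished. Since the compiler cannot rule out this case, when given combine3 it takes the conservative approach and keeps reading and writing memory on every iteration, even though this is less efficient.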
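The difference can be made concrete with a small sketch. It assumes that combine3 reads data[i] directly and writes to *dest on every iteration, and that combine4 accumulates in a local variable; run_aliased is a hypothetical helper introduced only for this illustration.

```c
typedef int data_t;
#define IDENT 1
#define OP *

typedef struct {
    long int len;
    data_t *data;
} vec_rec, *vec_ptr;

long int vec_length(vec_ptr v) { return v->len; }
data_t *get_vec_start(vec_ptr v) { return v->data; }

/* Writes the running result to *dest on every iteration. */
void combine3(vec_ptr v, data_t *dest)
{
    long int length = vec_length(v);
    data_t *data = get_vec_start(v);
    *dest = IDENT;
    for (long int i = 0; i != length; ++i)
        *dest = *dest OP data[i];
}

/* Accumulates in a local variable and writes *dest once at the end. */
void combine4(vec_ptr v, data_t *dest)
{
    long int length = vec_length(v);
    data_t *data = get_vec_start(v);
    data_t acc = IDENT;
    for (long int i = 0; i != length; ++i)
        acc = acc OP data[i];
    *dest = acc;
}

/* Hypothetical helper: run a combiner on the vector [2, 3, 5] with dest
   aliased to the last element, and return that element's final value. */
data_t run_aliased(void (*combine)(vec_ptr, data_t *))
{
    data_t storage[3] = {2, 3, 5};
    vec_rec v = {3, storage};
    combine(&v, get_vec_start(&v) + 2);
    return storage[2];
}
```

With combine3, the last element is first overwritten with 1, then becomes 2, then 6, and finally 6 * 6 = 36, because the last iteration reads the value the loop itself just wrote. With combine4, the accumulator goes 2, 6, 30 and the last element ends up as 30. Both results are legitimate for the code as written, so a compiler cannot substitute one function for the other.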
References
[1] Randal E. Bryant and David R. O'Hallaron (2011). Computer Systems: A Programmer's Perspective (Second Edition). Beijing: China Machine Press.