http://en.wikibooks.org/wiki/Optimizing_C%2B%2B
Keep vector's capacity
To empty a vector<T> object x without deallocating its memory, use the statement x.resize(0); whereas to empty it and deallocate its memory, use the statement vector<T>().swap(x);
To empty a vector object, there also exists the clear() member function. Older versions of the C++ standard did not state explicitly whether this function preserves the allocated capacity of the vector; since C++11 the wording is generally read as leaving the capacity unchanged.
If you are repeatedly filling and emptying a vector object, and thus you want to avoid frequent reallocations, perform the emptying by calling the resize member function, which, according to the standard, preserves the capacity of the object. If instead you have finished using a large vector object, and you may not use it again or you are going to use it with substantially fewer elements, you should free the object's memory by calling the swap function on a new empty temporary vector object.
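The two emptying idioms above can be sketched as follows. The helper names empty_keep_capacity and empty_and_release are hypothetical and exist only for this illustration; the element type int is an arbitrary choice.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: empties v but keeps its buffer for later refills.
// Returns the capacity after the call.
std::size_t empty_keep_capacity(std::vector<int>& v) {
    v.resize(0);                     // elements are destroyed, the buffer is kept
    return v.capacity();
}

// Hypothetical helper: empties v and releases its buffer.
// The temporary takes over the buffer and frees it at the end of the statement.
std::size_t empty_and_release(std::vector<int>& v) {
    std::vector<int>().swap(v);
    return v.capacity();
}
```

After the first call the vector can be refilled up to its old capacity without a reallocation; after the second call the next refill has to allocate again.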
Function-objects
Instead of passing a function pointer as an argument to a function, pass a function-object (or, if using C++11 or later, a lambda expression).
For example, if you have the following array of structures:
struct S { int a, b; }; S arr[n_items];
… and you want to sort it by the b field, you could define the following comparison function:
bool compare(const S& s1, const S& s2) { return s1.b < s2.b; }
… and pass it to the standard sort algorithm:
std::sort(arr, arr + n_items, compare);
However, it is probably more efficient to define the following function-object class (aka functor):
struct Comparator { bool operator()(const S& s1, const S& s2) const { return s1.b < s2.b; } };
… and pass a temporary instance of it to the standard sort algorithm:
std::sort(arr, arr + n_items, Comparator());
Function-objects are usually expanded inline and are therefore as efficient as in-place code, while functions passed by pointers are rarely inlined. Lambda expressions are implemented as function-objects, so they have the same performance.
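Put together, the three ways of passing the comparison might look like the sketch below. S, compare and Comparator repeat the definitions given above; the sort_with_* wrappers are hypothetical names added only so that the variants can be called side by side.

```cpp
#include <algorithm>
#include <cstddef>

struct S { int a, b; };

// Plain function: std::sort receives a pointer to it.
bool compare(const S& s1, const S& s2) { return s1.b < s2.b; }

// Function-object: the call operator is part of the comparator's type,
// so the compiler knows at the call site exactly which code to inline.
struct Comparator {
    bool operator()(const S& s1, const S& s2) const { return s1.b < s2.b; }
};

void sort_with_pointer(S* arr, std::size_t n) {
    std::sort(arr, arr + n, compare);
}

void sort_with_functor(S* arr, std::size_t n) {
    std::sort(arr, arr + n, Comparator());
}

// Lambda (C++11 or later): the compiler generates an unnamed function-object class.
void sort_with_lambda(S* arr, std::size_t n) {
    std::sort(arr, arr + n,
              [](const S& s1, const S& s2) { return s1.b < s2.b; });
}
```

All three produce the same ordering; whether the functor and lambda versions are actually faster depends on the compiler and should be measured.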
Search in sorted sequences
To search a sorted sequence, use the std::lower_bound, std::upper_bound, std::equal_range, or std::binary_search generic algorithms.
Given that all the cited algorithms use a logarithmic complexity (O(log(n))) binary search, they are faster than the std::find algorithm, which uses a linear complexity (O(n)) sequential scan.
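As a small illustration of the algorithms cited above, the following sketch queries a sorted std::vector<int>. The wrapper names first_not_less, contains and count_equal are hypothetical; each one only forwards to the standard algorithm and assumes that the input is already sorted in ascending order.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Index of the first element that is not less than key (sorted.size() if none).
std::size_t first_not_less(const std::vector<int>& sorted, int key) {
    return static_cast<std::size_t>(
        std::lower_bound(sorted.begin(), sorted.end(), key) - sorted.begin());
}

// True if key occurs in the sorted sequence.
bool contains(const std::vector<int>& sorted, int key) {
    return std::binary_search(sorted.begin(), sorted.end(), key);
}

// Number of elements equal to key, found as the width of the equal range.
std::size_t count_equal(const std::vector<int>& sorted, int key) {
    std::pair<std::vector<int>::const_iterator,
              std::vector<int>::const_iterator> range =
        std::equal_range(sorted.begin(), sorted.end(), key);
    return static_cast<std::size_t>(range.second - range.first);
}
```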
static member functions
In every class, declare every member function that does not access the non-static members of the class as static.
In other words, declare all the member functions that you can as static.
In this way, the implicit this argument is not passed.
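A minimal sketch of the guideline, using a hypothetical Temperature class invented for this example: the conversion helper depends only on its argument, so it can be static, while the accessor reads a data member and cannot.

```cpp
// Hypothetical class used only to illustrate the guideline.
class Temperature {
public:
    explicit Temperature(double celsius) : celsius_(celsius) {}

    // Reads a non-static data member, so it needs `this` and cannot be static.
    double fahrenheit() const { return to_fahrenheit(celsius_); }

    // Uses only its argument: declared static, so no `this` pointer is passed.
    static double to_fahrenheit(double celsius) {
        return celsius * 9.0 / 5.0 + 32.0;
    }

private:
    double celsius_;
};
```

The static function can also be called without any object, as in Temperature::to_fahrenheit(100.0).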
Allocating many small objects
If you have to allocate many objects of the same size, use a block allocator.
A block allocator (aka pool allocator) allocates medium to large memory blocks and provides a service to allocate/deallocate smaller, fixed-size blocks. It allows high allocation/deallocation speed, low memory fragmentation and efficient use of data caches and of virtual memory.
In particular, an allocator of this kind can greatly improve the performance of the std::list, std::set, std::multiset, std::map, and std::multimap standard containers.
If your standard library implementation does not already use a block allocator for such containers, you should get one and specify it as a template parameter of instances of such container templates. Boost provides two customizable block allocators, pool_allocator and fast_pool_allocator. Other pool allocator libraries can be found on the World Wide Web. Always measure first to find the fastest allocator for the job at hand.
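To make the idea of a block allocator concrete, here is a deliberately simplified sketch. FixedPool is a hypothetical class written for this illustration, not the Boost API: it obtains memory in chunks of SlotsPerChunk slots and keeps the free slots on a singly linked list. It is not thread-safe, never returns chunks to the system before destruction, and leaves object construction (for example with placement new) to the caller.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical fixed-size pool: every slot is large enough for one T.
template <typename T, std::size_t SlotsPerChunk = 256>
class FixedPool {
public:
    FixedPool() = default;
    FixedPool(const FixedPool&) = delete;
    FixedPool& operator=(const FixedPool&) = delete;

    ~FixedPool() {
        for (Slot* chunk : chunks_) delete[] chunk;   // release every chunk at once
    }

    // Returns raw storage for one T, taken from the free list.
    void* allocate() {
        if (free_list_ == nullptr) grow();
        Slot* slot = free_list_;
        free_list_ = slot->next;
        return slot;
    }

    // Puts the slot back at the head of the free list.
    void deallocate(void* p) {
        Slot* slot = static_cast<Slot*>(p);
        slot->next = free_list_;
        free_list_ = slot;
    }

    std::size_t chunk_count() const { return chunks_.size(); }

private:
    union Slot {
        Slot* next;                                    // used while the slot is free
        alignas(T) unsigned char storage[sizeof(T)];   // used while the slot is allocated
    };

    // Allocates one more chunk and threads all of its slots onto the free list.
    void grow() {
        chunks_.push_back(nullptr);                    // reserve the entry first
        Slot* chunk = new Slot[SlotsPerChunk];
        chunks_.back() = chunk;
        for (std::size_t i = 0; i < SlotsPerChunk; ++i) {
            chunk[i].next = free_list_;
            free_list_ = &chunk[i];
        }
    }

    std::vector<Slot*> chunks_;
    Slot* free_list_ = nullptr;
};
```

Allocation and deallocation are each a couple of pointer operations, and slots from the same chunk sit next to each other in memory, which is where the speed and cache benefits described above come from.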
Appending elements to a collection
When you have to append elements to a collection, use push_back to append a single element, use insert to append a sequence of elements, and use back_inserter to cause an STL algorithm to append elements to a sequence.
The push_back function guarantees amortized constant time per call because, in the case of vectors, it increases the capacity exponentially.
The back_inserter class calls the push_back function internally.
The insert function allows a whole sequence to be inserted in an optimized way and therefore a single insert call is faster than many calls to push_back.
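The three ways of appending can be shown in one short sketch. The function name build_sequence and the sample values are hypothetical, chosen only for this illustration.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical example: builds a vector using each of the three techniques.
std::vector<int> build_sequence() {
    std::vector<int> v;

    v.push_back(1);                          // a single element

    const int more[] = {2, 3, 4};
    v.insert(v.end(), more, more + 3);       // a whole sequence in one call

    // An algorithm appends its output through back_inserter,
    // which calls push_back for every element written.
    std::transform(more, more + 3, std::back_inserter(v),
                   [](int x) { return x * 10; });
    return v;
}
```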
Memory-mapped file
Except in a critical section of a real-time system, if you need to access most parts of a binary file in a non-sequential fashion, instead of accessing it repeatedly with seek operations, or loading it all in an application buffer, use a memory-mapped file, if your operating system provides such a feature.
Memoization
Memoization techniques (aka caching techniques) are based on the principle that if you must repeatedly compute a pure function, that is a referentially transparent function (aka mathematical function), for the same argument, and if such computation requires significant time, you can save time by storing the result of the first evaluation and retrieving that result the other times.
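A minimal sketch of memoization, assuming a hypothetical pure function fib (Fibonacci numbers) as the computation worth caching. The cache is a function-local static std::unordered_map, which keeps the example short but is not safe for concurrent callers.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical pure function with a per-argument result cache.
std::uint64_t fib(unsigned n) {
    static std::unordered_map<unsigned, std::uint64_t> cache;

    if (n < 2) return n;                          // base cases need no cache

    auto it = cache.find(n);
    if (it != cache.end()) return it->second;     // computed before: reuse the result

    std::uint64_t result = fib(n - 1) + fib(n - 2);
    cache[n] = result;                            // store the first evaluation
    return result;
}
```

Without the cache the recursion takes exponential time; with it, each argument is computed once and later calls are a single lookup.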
Partitioning
If you have to split a sequence according to a criterion, use a partitioning algorithm, instead of a sorting one.
In STL there is the std::partition algorithm, which is faster than the std::sort algorithm, as it has O(N) complexity, instead of O(N log(N)).
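For instance, splitting a sequence of integers into even and odd values needs no sorting at all. The function name evens_first and the even/odd criterion are hypothetical choices for this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical example: moves the even values to the front of v and
// returns how many there are. The order inside each group is unspecified.
std::size_t evens_first(std::vector<int>& v) {
    std::vector<int>::iterator middle =
        std::partition(v.begin(), v.end(),
                       [](int x) { return x % 2 == 0; });
    return static_cast<std::size_t>(middle - v.begin());
}
```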
Stable partitioning and sorting
If you have to partition or sort a sequence for which equivalent entities may be swapped, don't use a stable algorithm.
In STL there is the std::stable_partition partitioning algorithm, which is slightly slower than the std::partition algorithm; and there is the std::stable_sort sorting algorithm, which is slightly slower than the std::sort algorithm.
Order partitioning
If you have to pick out the first N elements from a sequence, or the Nth element in a sequence, use an order partitioning algorithm, instead of a sorting one.
In STL there is the std::nth_element algorithm, which, although slightly slower than the std::stable_partition algorithm, is considerably faster than the std::sort algorithm, as it has O(N) average complexity, instead of O(N log(N)).
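A typical use of order partitioning is finding a median without sorting everything. The function median_of below is a hypothetical example; it assumes a non-empty input and takes it by value because std::nth_element reorders the elements.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical example: returns the element that would sit in the middle
// of v if v were sorted. Precondition: v is not empty.
int median_of(std::vector<int> v) {
    std::vector<int>::iterator mid = v.begin() + v.size() / 2;
    std::nth_element(v.begin(), mid, v.end());   // only *mid is guaranteed to be in its sorted place
    return *mid;
}
```

After the call, every element before mid is not greater than *mid and every element after it is not less, which is also what is needed to pick out the first N elements.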
Sorting only the first N elements
If you have to sort the first N elements of a much longer sequence, use an order statistic algorithm, instead of a sorting one.
In STL there are the std::partial_sort and std::partial_sort_copy algorithms, which, although slower than the std::nth_element algorithm, are faster than the std::sort algorithm, and the more so the shorter the partial sequence to sort is relative to the whole sequence.
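The following sketch returns the n smallest values of a sequence in ascending order. The function name smallest is hypothetical; the input is taken by value so that the caller's data is left untouched.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical example: the n smallest elements of v, sorted.
std::vector<int> smallest(std::vector<int> v, std::size_t n) {
    if (n > v.size()) n = v.size();
    // Sorts only the first n positions; the rest is left in unspecified order.
    std::partial_sort(v.begin(), v.begin() + n, v.end());
    v.resize(n);
    return v;
}
```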
Number to string conversion
Use optimized functions to convert numbers to strings.
The standard functions to convert an integer number to a string or a floating point number to a string are rather inefficient. To speed up such operations, use non-standard optimized functions, possibly written by yourself.
Use of cstdio functions
To perform input/output operations, instead of using the C++ streams, use the old C functions, declared in the cstdio header.
C++ I/O primitives have been designed mainly for type safety and for customization rather than for performance, and many library implementations of them turn out to be rather inefficient. In particular, the C language I/O functions fread and fwrite are more efficient than the fstream read and write member functions.
If you have to use C++ streams, use "\n" instead of std::endl since std::endl also flushes the stream.
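The advice about "\n" can be sketched as follows. The function names write_numbers and numbers_as_text are hypothetical; the second one only wraps the first around a std::ostringstream so that the output can be inspected.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical example: writes 0 .. count-1, one number per line.
void write_numbers(std::ostream& out, int count) {
    for (int i = 0; i < count; ++i)
        out << i << '\n';        // newline only: no flush after every line
    out.flush();                 // a single flush once all the output is written
}

// Convenience wrapper that collects the output in a string.
std::string numbers_as_text(int count) {
    std::ostringstream buffer;
    write_numbers(buffer, count);
    return buffer.str();
}
```

With std::endl in the loop the text produced would be identical, but a file or console stream would be flushed once per line.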
Access memory in increasing address order. In particular:
- scan arrays in increasing order;
- scan multi-dimensional arrays using the rightmost index for innermost loops;
- in class constructors and in assignment operators (operator=), access member variables in the order of declaration.
Data caches optimize memory access in increasing sequential order.
When a multi-dimensional array is scanned, the innermost loop should iterate on the last index, the innermost-but-one loop should iterate on the last-but-one index, and so on. In such a way, it is guaranteed that array cells are processed in the same order in which they are arranged in memory.
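The loop order described above might look like this for a two-dimensional array. The dimensions kRows and kCols and the function name sum_row_major are hypothetical values chosen for the sketch.

```cpp
#include <cstddef>

// Hypothetical dimensions for the illustration.
const std::size_t kRows = 4;
const std::size_t kCols = 3;

// Sums all cells, visiting them in the order in which they lie in memory.
long sum_row_major(const int (&grid)[kRows][kCols]) {
    long sum = 0;
    for (std::size_t r = 0; r < kRows; ++r)          // outer loop: leftmost index
        for (std::size_t c = 0; c < kCols; ++c)      // inner loop: rightmost index
            sum += grid[r][c];
    return sum;
}
```

Swapping the two loops gives the same result but jumps kCols elements ahead on every step, which tends to use the data cache less effectively on large arrays.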
Moving declarations outside loops
If a variable is declared in the body of a loop, and an assignment to it costs less than a construction plus a destruction, move that declaration before the loop.
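A std::string is a common case where assignment is cheaper than construction plus destruction, because an existing buffer can be reused. The sketch below assumes a hypothetical function total_width that builds one line of asterisks per requested width.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical example: builds a line of '*' for each width and
// returns the total number of characters produced.
std::size_t total_width(const std::vector<std::size_t>& widths) {
    std::string line;                 // declared once, before the loop
    std::size_t total = 0;
    for (std::size_t w : widths) {
        line.assign(w, '*');          // reuses the buffer when it is already large enough
        total += line.size();
    }
    return total;
}
```

Declaring line inside the loop body would construct and destroy a string, and possibly allocate and free memory, on every iteration.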
Variable scope
Declare variables as late as possible.
To do so, the programmer must declare all variables in the most local scope. By doing so, the variable is neither constructed nor destructed if that scope is never reached. Postponing the declaration as far as possible within a scope means that, should there be an early exit before the declaration (using a return, break, or continue statement), the object associated with the variable is neither constructed nor destructed.
It is often the case that at the beginning of a routine no appropriate value is available with which to initialize a variable. The variable is therefore initialized with a default value and a later assignment sets the correct value when it becomes available. If, instead, the variable is defined only when an appropriate value is available, the object is initialized with this value and no subsequent assignment is necessary. This is advised by the guideline "Initializations" in this section.
Initializations
Use initializations instead of assignments. In particular, in constructors, use initialization lists.
For example, instead of writing:
string s; ... s = "abc";
write:
string s("abc");
Even if a class instance (s in the first example above) is not explicitly initialized, it is nevertheless automatically initialized by the default constructor.
To call the default constructor followed by an assignment with a value may be less efficient than to call only a constructor with the same value.
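The same reasoning applies to data members in constructors. In the sketch below, the classes PersonAssigned and Person are hypothetical and differ only in how the member name_ receives its value.

```cpp
#include <string>

// Hypothetical class: name_ is first default-constructed, then assigned.
class PersonAssigned {
public:
    explicit PersonAssigned(const std::string& name) { name_ = name; }
    const std::string& name() const { return name_; }
private:
    std::string name_;
};

// Hypothetical class: name_ is constructed directly from the argument
// through the initialization list, with no intermediate empty string.
class Person {
public:
    explicit Person(const std::string& name) : name_(name) {}
    const std::string& name() const { return name_; }
private:
    std::string name_;
};
```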
Increment/decrement operators
Use prefix increment (++) or decrement (--) operators instead of the corresponding postfix operators if the expression value is not used.
Assignment composite operators
Use the assignment composite operators (like in a += b) instead of simple operators combined with assignment operators (like in a = a + b).
Function argument passing
When you pass an object x of type T as argument to a function, use the following criterion:
- If x is an input-only argument,
  - if x may be null,
    - pass it by pointer to constant (const T* x),
  - otherwise, if T is a fundamental type or an iterator or a function-object,
    - pass it by value (T x) or by constant value (const T x),
  - otherwise,
    - pass it by reference to constant (const T& x),
- otherwise, i.e. if x is an output-only or input/output argument,
  - if x may be null,
    - pass it by pointer to non-constant (T* x),
  - otherwise,
    - pass it by reference to non-constant (T& x).
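The criterion above translates into signatures such as the following. All the function names are hypothetical and the bodies are kept trivial; only the parameter types matter here.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Input-only, may be null: pointer to constant.
std::size_t length_or_zero(const std::string* s) { return s ? s->size() : 0; }

// Input-only, fundamental type: by value.
int twice(int n) { return 2 * n; }

// Input-only, larger object, never null: reference to constant.
std::size_t count_items(const std::vector<int>& v) { return v.size(); }

// Output-only, may be null: pointer to non-constant.
void store_if_wanted(int value, int* out) { if (out) *out = value; }

// Input/output, never null: reference to non-constant.
void append_zero(std::vector<int>& v) { v.push_back(0); }
```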
explicit declaration
Declare as explicit all constructors that receive only one argument, except for the copy constructors of concrete classes.
Non-explicit constructors may be called automatically by the compiler when it performs an automatic (implicit) type conversion. The execution of such constructors may take much time.
If such a conversion must be requested explicitly, and the class name is not written in the code, the compiler may choose another overloaded function and so avoid calling the costly constructor, or it may generate an error, forcing the programmer to find another way to avoid the constructor call.
For copy constructors of concrete classes a distinction must be made to allow their pass by value. For abstract classes, even copy constructors may be declared explicit, as, by definition, abstract classes cannot be instantiated and so objects of such type should never be passed by value.
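A short sketch of the effect of explicit, using a hypothetical Buffer class whose constructor allocates memory. The static_assert documents that a size can no longer be converted to a Buffer silently.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical class with a potentially costly one-argument constructor.
class Buffer {
public:
    explicit Buffer(std::size_t size) : data_(size) {}   // allocates size bytes
    std::size_t size() const { return data_.size(); }
private:
    std::vector<char> data_;
};

std::size_t used_bytes(const Buffer& b) { return b.size(); }

// Because the constructor is explicit, a call such as used_bytes(1000000)
// does not compile; the caller has to write used_bytes(Buffer(1000000)).
static_assert(!std::is_convertible<std::size_t, Buffer>::value,
              "a size must not convert implicitly to a Buffer");
```

Without explicit, the call used_bytes(1000000) would compile and quietly allocate a megabyte just to report its size.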
Rearrange an array of structures as several arrays
Instead of processing a single array of aggregate objects, process in parallel two or more arrays having the same length.
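The two layouts can be compared in a small sketch. Item, Items and their fields are hypothetical types invented for this example; the loop of interest reads only the prices.

```cpp
#include <vector>

// Array-of-structures layout: each price is interleaved in memory
// with fields that the summing loop never reads.
struct Item {
    double price;
    int stock;
    char label[32];
};

double total_price_aos(const std::vector<Item>& items) {
    double total = 0;
    for (const Item& item : items) total += item.price;
    return total;
}

// Several parallel arrays of the same length: element i of each
// vector describes the same item.
struct Items {
    std::vector<double> price;
    std::vector<int> stock;
};

double total_price_soa(const Items& items) {
    double total = 0;
    for (double p : items.price) total += p;   // reads only contiguous prices
    return total;
}
```

In the second layout every cache line loaded by the loop contains only prices, so fewer bytes are fetched for the same work; whether this pays off for a given program should be measured.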