- I suffer copy overhead
- I suffer slow speed for lookup tables
- My hand-coded (intrusive) lists are much faster than std::list
Let's look at those problems in turn. Often, a vector<X> is slower than somebody's specialized My_container<X> because My_container<X> is implemented as a container of pointers to X. The standard containers hold copies of values, and copy a value when you put it into the container. This is essentially unbeatable for small values, but can be quite unsuitable for huge objects:
vector<int> vi; vector<Image> vim; // ... int i = 7; Image im("portrait.jpg"); // initialize image from file // ... vi.push_back(i); // put (a copy of) i into vi vim.push_back(im); // put (a copy of) im into vimNow, if portrait.jpg is a couple of megabytes and Image has value semantics (i.e., copy assignment and copy construction make copies) then vim.push_back(im) will indeed be expensive. But -- as the saying goes -- if it hurts so much, just don't do it. Instead, either use a container of handles or a containers of pointers. For example, if Image had reference semantics, the code above would incur only the cost of a copy constructor call, which would be trivial compared to most image manipulation operators. If some class, say Image again, does have copy semantics for good reasons, a container of pointers is often a reasonable solution:
vector<int> vi; vector<Image*> vim; // ... Image im("portrait.jpg"); // initialize image from file // ... vi.push_back(7); // put (a copy of) 7 into vi vim.push_back(&im); // put (a copy of) &im into vimNaturally, if you use pointers, you have to think about resource management, but containers of pointers can themselves be effective and cheap resource handles (often, you need a container with a destructor for deleting the "owned" objects).
The second frequently occuring genuine performance problem is the use of a map for a large number of (string,X) pairs. Maps are fine for relatively small containers (say a few hundred or few thousand elements -- access to an element of a map of 10000 elements costs about 9 comparisons), where less-than is cheap, and where no good hash-function can be constructed. If you have lots of strings and a good hash function, use a hash table. The unordered_map from the standard committee's Tecnical Report is now widely available and is far better than most people's homebrew.
Sometimes, you can speed up things by using (const char*,X) pairs rather than (string,X) pairs, but remember that < doesn't do lexicographical comparison for C-style strings. Also, if X is large, you may have the copy problem also (solve it in one of the usual ways).
Intrusive lists can be really fast. However, consider whether you need a list at all: a vector is more compact and is therefore smaller and faster in many cases - even when you do inserts and erases. For example, if you logically have a list of a few integer elements, a vector is significantly faster than a list (any list). Also, intrusive lists cannot hold built-in types directly (an int does not have a link member). So, assume that you really need a list and that you can supply a link field for every element type. The standard-library list by default performs an allocation followed by a copy for each operation inserting an element (and a deallocation for each operation removing an element). For std::list with the default allocator, this can be significant. For small elements where the copy overhead is not significant, consider using an optimized allocator. Use a hand-crafted intrusive lists only where a list and the last ounce of performance is needed.
People sometimes worry about the cost of std::vector growing incrementally. I used to worry about that and used reserve() to optimize the growth. After measuring my code and repeatedly having trouble finding the performance benefits of reserve() in real programs, I stopped using it except where it is needed to avoid iterator invalidation (a rare case in my code). Again: measure before you optimize.