Power up C++ with STL: Part IV (Implementing real algorithms)


touzani


I strongly recommend that you read the three articles below first.
Power up C++ with STL: Part I (introduction, vector)
Power up C++ with STL: Part II (string, set, map)
Power up C++ with STL: Part III (more on STL)

Armed with STL, let's go on to the most interesting part of this tutorial: how to implement real algorithms efficiently.

Depth-first search (DFS)
I will not explain the theory of DFS here – instead, read this section of gladius's Introduction to Graphs and Data Structures tutorial – but I will show you how STL can help.

First, imagine we have an undirected graph. The simplest way to store a graph in STL is to use the lists of vertices adjacent to each vertex. This leads to the vector< vector<int> > W structure, where W[i] is a list of vertices adjacent to i. Let’s verify our graph is connected via DFS:

    /*
    Reminder from Part 1:
    typedef vector<int> vi;
    typedef vector<vi> vvi;
    */

    int N; // number of vertices
    vvi W; // graph
    vi V; // V is a visited flag

    void dfs(int i) {
        if(!V[i]) {
            V[i] = true;
            for_each(all(W[i]), dfs);
        }
    }

    bool check_graph_connected_dfs() {
        int start_vertex = 0;
        V = vi(N, false);
        dfs(start_vertex);
        return (find(all(V), 0) == V.end());
    }

That’s all. The STL algorithm 'for_each' calls the specified function, 'dfs', for each element in the range. In the check_graph_connected_dfs() function we first create the visited array V (of the correct size and filled with zeroes). After the DFS we have either visited all vertices or not; this is easy to determine by searching for at least one zero in V with a single call to find().
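The DFS snippet depends on the typedefs and the all() macro from Part I. A minimal compilable sketch with those written out might look like the following; the is_connected() helper and its edge-list input are assumptions added here for illustration and are not part of the original code.

```cpp
#include <algorithm>
#include <utility>
#include <vector>
using namespace std;

typedef vector<int> vi;
typedef vector<vi> vvi;
#define all(c) (c).begin(), (c).end()   // macro from Part I

int N;   // number of vertices
vvi W;   // W[i] is the list of vertices adjacent to i
vi V;    // visited flags (0 = not visited yet)

void dfs(int i) {
    if (!V[i]) {
        V[i] = true;
        for_each(all(W[i]), dfs);   // visit every neighbour of i
    }
}

bool check_graph_connected_dfs() {
    V = vi(N, false);
    dfs(0);
    return find(all(V), 0) == V.end();   // no zero left means all were visited
}

// Hypothetical helper (not in the original): builds an undirected graph
// with n >= 1 vertices from an edge list and runs the check.
bool is_connected(int n, const vector< pair<int,int> >& edges) {
    N = n;
    W = vvi(n);
    for (size_t k = 0; k < edges.size(); ++k) {
        W[edges[k].first].push_back(edges[k].second);
        W[edges[k].second].push_back(edges[k].first);
    }
    return check_graph_connected_dfs();
}
```

For a path 0-1-2 the check succeeds; adding an isolated fourth vertex makes it fail.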

A note on for_each: the last argument of this algorithm can be almost anything that “can be called like a function”. It may be not only a global function, but also an adapter, a standard algorithm, or even a member function. In the last case, you will need the mem_fun or mem_fun_ref adapters (in C++11 and later, std::mem_fn or a lambda does the same job, and mem_fun and mem_fun_ref were removed in C++17), but we will not touch on those now.
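As one illustration of “anything that can be called like a function”, here is a small sketch that passes a function object to for_each. The Summer struct and sum_with_for_each() are hypothetical names made up for this example.

```cpp
#include <algorithm>
#include <vector>
using namespace std;

// Hypothetical function object: accumulates the sum of the elements it is called with.
struct Summer {
    long long total;
    Summer() : total(0) {}
    void operator()(int x) { total += x; }
};

long long sum_with_for_each(const vector<int>& v) {
    // for_each returns the function object after the last call,
    // so the accumulated state can be read from the result.
    Summer s = for_each(v.begin(), v.end(), Summer());
    return s.total;
}
```

The state lives in the returned copy, which is why the result of for_each is captured rather than ignored.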

One note on this code: I don't recommend the use of vector<bool>. Although in this particular case it’s quite safe, you're better off not using it. Use the predefined ‘vi’ (vector<int>) instead. It’s quite OK to assign true and false to the ints in vi. Of course, this requires 8*sizeof(int) = 8*4 = 32 times more memory (assuming a 4-byte int and a bit-packed vector<bool>), but it works well in most cases and is quite fast on TopCoder.

A word on other container types and their usage
Vector is so popular because it's the simplest array container. In most cases you only require the functionality of an array from vector – but, sometimes, you may need a more advanced container.

It is not good practice to begin investigating the full functionality of some STL container during the heat of a Single Round Match. If you are not familiar with the container you are about to use, you'd be better off using vector or map/set. For example, stack can always be implemented via vector, and it’s much faster to act this way if you don’t remember the syntax of stack container.
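To make the “stack via vector” remark concrete, here is a minimal sketch in which push_back(), back() and pop_back() play the roles of push, top and pop. The bracket-matching task and the function name are only an assumed example.

```cpp
#include <string>
#include <vector>
using namespace std;

// Returns true if every '(' and '[' in s is closed in the right order.
bool brackets_balanced(const string& s) {
    vector<char> st;                      // a vector used as a stack
    for (size_t i = 0; i < s.size(); ++i) {
        char c = s[i];
        if (c == '(' || c == '[') {
            st.push_back(c);              // push
        } else if (c == ')' || c == ']') {
            if (st.empty()) return false;
            char open = st.back();        // top
            st.pop_back();                // pop
            if ((c == ')' && open != '(') || (c == ']' && open != '[')) return false;
        }
    }
    return st.empty();
}
```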

STL provides the following containers: list, stack, queue, deque, priority_queue. I’ve found list and deque quite useless in SRMs (except, probably, for very special tasks based on these containers). But queue and priority_queue are worth saying a few words about.

Queue
Queue is a data type that has three operations, all in O(1) amortized:

add an element to the back (the “tail”)
remove an element from the front (the “head”)
get the first unfetched element (the “head”)

In other words, queue is a FIFO buffer.
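A short sketch of the FIFO behaviour with std::queue: push() adds at the back, while front() and pop() work on the oldest element. The function name is made up for this example.

```cpp
#include <queue>
#include <vector>
using namespace std;

// Pushes the values in order, then drains the queue and returns the drain order.
vector<int> drain_in_fifo_order(const vector<int>& values) {
    queue<int> q;
    for (size_t i = 0; i < values.size(); ++i) {
        q.push(values[i]);          // add to the back
    }
    vector<int> out;
    while (!q.empty()) {
        out.push_back(q.front());   // look at the oldest element
        q.pop();                    // remove it
    }
    return out;
}
```

The elements come out in exactly the order they went in.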

Breadth-first search (BFS)
Again, if you are not familiar with the BFS algorithm, please refer back to this TopCoder tutorial first. Queue is very convenient to use in BFS, as shown below:

    /*
    The graph is assumed to be stored as lists of adjacent vertices.
    We also assume the graph is undirected.

    vvi is vector< vector<int> >
    W[v] is the list of vertices adjacent to v
    */

    int N; // number of vertices
    vvi W; // lists of adjacent vertices

    bool check_graph_connected_bfs() {
        int start_vertex = 0;
        vi V(N, false);
        queue<int> Q;
        Q.push(start_vertex);
        V[start_vertex] = true;
        while(!Q.empty()) {
            int i = Q.front();
            // get the front element from queue
            Q.pop();
            tr(W[i], it) {
                if(!V[*it]) {
                    V[*it] = true;
                    Q.push(*it);
                }
            }
        }
        return (find(all(V), 0) == V.end());
    }

More precisely, queue supports front(), back(), push() (== push_back()) and pop() (== pop_front()). If you also need push_front() and pop_back(), use deque. Deque provides all of the listed operations in O(1) amortized.
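For comparison, a tiny sketch of the extra operations deque offers at both ends; the function name is made up for this example.

```cpp
#include <deque>
using namespace std;

// Builds 1 2 3 by pushing at both ends, then removes one element from each end.
deque<int> both_ends_demo() {
    deque<int> d;
    d.push_back(2);    // 2
    d.push_front(1);   // 1 2
    d.push_back(3);    // 1 2 3
    d.pop_back();      // 1 2
    d.pop_front();     // 2
    return d;
}
```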

There is an interesting application of queue and map when implementing a shortest path search via BFS in a complex graph. Imagine that we have a graph whose vertices are referenced by some complex object, like:

    pair< pair<int,int>, pair< string, vector< pair<int, int> > > >

(this case is quite usual: a complex data structure may define the position in some game, a Rubik’s cube situation, etc.)

Suppose we know that the path we are looking for is quite short, and the total number of positions is also small. If all edges of this graph have the same length of 1, we can use BFS to find the shortest path in this graph. A section of pseudo-code follows:

    // Some very complex data structure
    typedef pair< pair<int,int>, pair< string, vector< pair<int, int> > > > POS;

    // ...

    int find_shortest_path_length(POS start, POS finish) {

        map<POS, int> D;
        // shortest path length to this position
        queue<POS> Q;

        D[start] = 0; // start from here
        Q.push(start);

        while(!Q.empty()) {
            POS current = Q.front();
            // Peek the front element
            Q.pop(); // remove it from queue

            int current_length = D[current];

            if(current == finish) {
                // shortest path is found, return its length
                return D[current];
            }

            tr(all possible paths from 'current', it) {
                if(!D.count(*it)) {
                    // same as if(D.find(*it) == D.end()), see Part I
                    // This location was not visited yet
                    D[*it] = current_length + 1;
                    Q.push(*it); // enqueue it so that its neighbours are explored too
                }
            }
        }

        // Path was not found
        return -1;
    }

    // ...
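To see the map-plus-queue pattern run end to end, here is a sketch with a deliberately simple position type. The choice of POS as a (row, column) pair and of knight moves on an 8x8 board as the “edges” is an assumption made purely for illustration; any type with operator< would work as the map key.

```cpp
#include <map>
#include <queue>
#include <utility>
using namespace std;

typedef pair<int,int> POS;   // assumed position type: (row, column) on an 8x8 board

// Minimum number of knight moves from start to finish, or -1 if unreachable.
int find_shortest_path_length(POS start, POS finish) {
    static const int dr[8] = {1, 1, -1, -1, 2, 2, -2, -2};
    static const int dc[8] = {2, -2, 2, -2, 1, -1, 1, -1};

    map<POS, int> D;         // shortest path length to each visited position
    queue<POS> Q;
    D[start] = 0;
    Q.push(start);

    while (!Q.empty()) {
        POS current = Q.front();
        Q.pop();
        int current_length = D[current];
        if (current == finish) return current_length;

        for (int k = 0; k < 8; ++k) {
            POS next(current.first + dr[k], current.second + dc[k]);
            if (next.first < 0 || next.first >= 8) continue;    // off the board
            if (next.second < 0 || next.second >= 8) continue;
            if (!D.count(next)) {          // not visited yet
                D[next] = current_length + 1;
                Q.push(next);
            }
        }
    }
    return -1;
}
```

Here the map doubles as the visited set, so no separate flag array indexed by position is needed.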

If the edges have different lengths, however, BFS will not work. We should use Dijkstra instead. It's possible to implement such a Dijkstra via priority_queue -- see below.

Priority_Queue
Priority queue is typically implemented as a binary heap. It is a data structure that can perform three operations:

push any element (push)
view top element (top)
pop top element (pop)

For the application of STL's priority_queue see the TrainRobber problem from SRM 307.
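A brief sketch of the three operations on std::priority_queue with its default ordering, where top() is the largest element. The largest_k() function is a made-up example, not taken from the TrainRobber problem.

```cpp
#include <queue>
#include <vector>
using namespace std;

// Returns up to k largest values in descending order.
vector<int> largest_k(const vector<int>& values, size_t k) {
    priority_queue<int> pq;               // default: largest element on top
    for (size_t i = 0; i < values.size(); ++i) {
        pq.push(values[i]);               // push any element
    }
    vector<int> out;
    while (!pq.empty() && out.size() < k) {
        out.push_back(pq.top());          // view the top element
        pq.pop();                         // pop the top element
    }
    return out;
}
```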

Dijkstra
In the last part of this tutorial I’ll describe how to efficiently implement Dijkstra’s algorithm in a sparse graph using STL containers. Please look through this tutorial for information on Dijkstra’s algorithm.

Suppose we have a weighted directed graph that is stored as vector< vector< pair<int,int> > > G, where

G.size() is the number of vertices in our graph
G[i].size() is the number of vertices directly reachable from vertex with index i
G[i][j].first is the index of j-th vertex reachable from vertex i
G[i][j].second is the length of the edge heading from vertex i to vertex G[i][j].first

We assume the following type definitions in both code snippets below:

    typedef pair<int,int> ii;
    typedef vector<ii> vii;
    typedef vector<vii> vvii;
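As a sketch of how such a graph could be filled in, the add_edge() helper and the four-vertex sample graph below are hypothetical and exist only to illustrate the layout of G.

```cpp
#include <utility>
#include <vector>
using namespace std;

typedef pair<int,int> ii;
typedef vector<ii> vii;
typedef vector<vii> vvii;

// Hypothetical helper: adds a directed edge from -> to with the given length.
void add_edge(vvii& G, int from, int to, int length) {
    G[from].push_back(ii(to, length));   // .first = target vertex, .second = length
}

// Hypothetical sample graph with 4 vertices, used only for illustration.
vvii build_sample_graph() {
    vvii G(4);
    add_edge(G, 0, 1, 5);
    add_edge(G, 0, 2, 1);
    add_edge(G, 2, 1, 2);
    add_edge(G, 1, 3, 1);
    return G;
}
```

In this sample, G[0][1] describes the second edge leaving vertex 0: it leads to vertex 2 and has length 1.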

Dijkstra via priority_queue
Many thanks to misof for spending the time to explain to me why the complexity of this algorithm is good despite not removing outdated entries from the queue.

    vi D(N, 987654321);
    // distance from start vertex to each vertex (N is the number of vertices, G.size())

    priority_queue<ii,vector<ii>, greater<ii> > Q;
    // priority_queue with reverse comparison operator,
    // so top() will return the least distance
    // initialize the start vertex, suppose it’s zero
    D[0] = 0;
    Q.push(ii(0,0));

    // iterate while queue is not empty
    while(!Q.empty()) {

        // fetch the nearest element
        ii top = Q.top();
        Q.pop();

        // v is vertex index, d is the distance
        int v = top.second, d = top.first;

        // this check is very important
        // we analyze each vertex only once
        // the other occurrences of it on queue (added earlier)
        // will have greater distance
        if(d <= D[v]) {
            // iterate through all outgoing edges from v
            tr(G[v], it) {
                int v2 = it->first, cost = it->second;
                if(D[v2] > D[v] + cost) {
                    // update distance if possible
                    D[v2] = D[v] + cost;
                    // add the vertex to queue
                    Q.push(ii(D[v2], v2));
                }
            }
        }
    }

I will not comment on the algorithm itself in this tutorial, but you should notice the priority_queue object definition. Normally, priority_queue<ii> will work, but the top() member function will return the largest element, not the smallest. One easy solution I often use is to store (-distance) instead of the distance in the first element of the pair. But if you want to implement it in the “proper” way, you need to reverse the comparison operation of the priority_queue. The comparison function is the third template parameter of priority_queue, while the second parameter is the underlying container type. So, you should write priority_queue<ii, vector<ii>, greater<ii> >.
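The (-distance) shortcut mentioned here can be sketched as a complete function. This is one possible way to write it, assuming non-negative edge lengths; the function name, the start parameter and the INF constant are choices made for this example.

```cpp
#include <queue>
#include <utility>
#include <vector>
using namespace std;

typedef vector<int> vi;
typedef pair<int,int> ii;
typedef vector<ii> vii;
typedef vector<vii> vvii;

const int INF = 987654321;

// G[v] holds (neighbour, edge length) pairs; returns distances from start.
vi dijkstra_negated(const vvii& G, int start) {
    vi D(G.size(), INF);
    priority_queue<ii> Q;      // default ordering, keyed on (-distance, vertex)
    D[start] = 0;
    Q.push(ii(0, start));
    while (!Q.empty()) {
        ii top = Q.top();
        Q.pop();
        int d = -top.first, v = top.second;   // undo the negation
        if (d > D[v]) continue;               // outdated entry, skip it
        for (size_t j = 0; j < G[v].size(); ++j) {
            int v2 = G[v][j].first, cost = G[v][j].second;
            if (D[v2] > D[v] + cost) {
                D[v2] = D[v] + cost;
                Q.push(ii(-D[v2], v2));       // largest -distance = smallest distance
            }
        }
    }
    return D;
}
```

Negating on the way in and out is easy to get wrong in one place, which is the main argument for the greater<ii> form.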

https://p-blog.csdn.net/images/p_blog_csdn_net/touzani/303255/o_Dijkstra.JPG

Dijkstra via set
Petr gave me this idea when I asked him about efficient Dijkstra implementation in C#. While implementing Dijkstra we use the priority_queue to add elements to the “vertices being analyzed” queue in O(logN) and fetch in O(log N). But there is a container besides priority_queue that can provide us with this functionality -- it’s ‘set’! I’ve experimented a lot and found that the performance of Dijkstra based on priority_queue and set is the same.

So, here’s the code:

    vi D(N, 987654321);

    // start vertex
    set<ii> Q;
    D[0] = 0;
    Q.insert(ii(0,0));

    while(!Q.empty()) {

        // again, fetch the closest to start element
        // from “queue” organized via set
        ii top = *Q.begin();
        Q.erase(Q.begin());
        int v = top.second, d = top.first;

        // here we do not need to check whether the distance
        // is perfect, because new vertices will always
        // add up in proper way in this implementation

        tr(G[v], it) {
            int v2 = it->first, cost = it->second;
            if(D[v2] > D[v] + cost) {
                // this operation can not be done with priority_queue,
                // because it does not support DECREASE_KEY
                if(D[v2] != 987654321) {
                    Q.erase(Q.find(ii(D[v2],v2)));
                }
                D[v2] = D[v] + cost;
                Q.insert(ii(D[v2], v2));
            }
        }
    }

One more important thing: STL’s priority_queue does not support the DECREASE_KEY operation. If you need this operation, ‘set’ may be your best bet.
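The DECREASE_KEY step on a set can be isolated into a small helper. The sketch below is an assumed formulation, not the author's code: it erases by value rather than through find(), so nothing happens if the old entry is not in the set.

```cpp
#include <set>
#include <utility>
#include <vector>
using namespace std;

typedef vector<int> vi;
typedef pair<int,int> ii;

// Hypothetical helper: lowers the tentative distance of vertex v to new_dist
// and keeps the set of (distance, vertex) pairs in sync with D.
void decrease_key(set<ii>& Q, vi& D, int v, int new_dist) {
    Q.erase(ii(D[v], v));        // erase by value: does nothing if the pair is absent
    D[v] = new_dist;
    Q.insert(ii(new_dist, v));   // the set keeps the smallest distance at begin()
}
```

Both the erase and the insert take O(log N), which matches the cost of a push on priority_queue.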

I’ve spent a lot of time trying to understand why the code that removes elements from the queue (with set) works as fast as the first one.

These two implementations have the same complexity and run in the same time. I’ve also set up practical experiments, and the performance is practically the same (the difference is about 0.1% of the running time).

As for me, I prefer to implement Dijkstra via ‘set’ because with ‘set’ the logic is simpler to understand, and we don’t need to remember the ‘greater<ii>’ predicate override.

What is not included in STL
If you have made it this far in the tutorial, I hope you have seen that STL is a very powerful tool, especially for TopCoder SRMs. But before you embrace STL wholeheartedly, keep in mind what is NOT included in it.

First, STL does not have BigInteger-s. If a task in an SRM calls for huge calculations, especially multiplication and division, you have three options:

use a pre-written template
use Java, if you know it well
say “Well, it was definitely not my SRM!”

I would recommend option number one.

Nearly the same issue arises with the geometry library. STL does not have geometry support, so you have those same three options again.

The last thing – and sometimes a very annoying thing – is that STL does not have a built-in string splitting function. This is especially annoying, given that this function is included in the default template for C++ in the ExampleBuilder plugin! But actually I’ve found that the use of istringstream(s) in trivial cases and sscanf(s.c_str(), …) in complex cases is sufficient.
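Both approaches can be sketched in a few lines. The function names and the "HH:MM" input format below are assumptions chosen for illustration.

```cpp
#include <cstdio>
#include <sstream>
#include <string>
#include <vector>
using namespace std;

// Trivial case: split on whitespace with istringstream.
vector<string> split_words(const string& s) {
    istringstream in(s);
    vector<string> out;
    string word;
    while (in >> word) {
        out.push_back(word);
    }
    return out;
}

// More complex case: parse a fixed format such as "HH:MM" with sscanf.
// Returns minutes since midnight, or -1 if the string does not match.
int parse_time(const string& s) {
    int h, m;
    if (sscanf(s.c_str(), "%d:%d", &h, &m) != 2) return -1;
    return h * 60 + m;
}
```

The stream version skips runs of spaces on its own, so leading, trailing and repeated separators need no special handling.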

Those caveats aside, though, I hope you have found this tutorial useful, and I hope you find the STL a useful addition to your use of C++. Best of luck to you in the Arena!

Note from the author: In both parts of this tutorial I recommend the use of some (macro) templates to minimize the time required to implement something. I must say that this suggestion should always be up to the coder. Aside from whether (macro) templates are a good or bad tactic for SRMs, in everyday life they can become annoying for other people who are trying to understand your code. While I did rely on them for some time, ultimately I reached the decision to stop. I encourage you to weigh the pros and cons of templates and to consider this decision for yourself.
