

by Sachin Malhotra

Deep Dive Into Graph Traversals

There are over 2.07 billion monthly active Facebook users worldwide as of Q3 2017. The most important aspect of the Facebook network is the social engagement between users. The more friends a user has, the more engaging the conversations become through comments on posts, messaging, etc. If you’ve used Facebook fairly regularly, you are probably familiar with the Friends Recommendation feature.

Facebook recommends a set of people that we can add as friends. Most of the time, these are people we’ve never heard of before. Still, Facebook thinks that we should add them. The question is: how does Facebook come up with a set of recommendations for a specific person?

One way to do this is based on mutual friends. For example, if users A and C don’t know each other but have a mutual friend B, then A and C should probably be friends too. What if A and C have 2 mutual friends and A and D have 3 mutual friends? How should the suggestions be ordered?

In this case, it seems pretty obvious to suggest D over C to A because they have more mutual friends and are more likely to get connected.
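Ranking by mutual friends is easy to sketch. The Python below is a minimal illustration, not Facebook’s actual method: it assumes friendships are stored as a dictionary mapping each user to the set of their friends, and the `recommend` function and the sample data are hypothetical. For every person who is not yet a friend of the user, it counts the friends they share with the user, then sorts by that count.

```python
from collections import Counter

def recommend(friends: dict[str, set[str]], user: str) -> list[tuple[str, int]]:
    """Rank people who are not yet friends of `user` by number of mutual friends."""
    counts = Counter()
    for friend in friends[user]:
        for candidate in friends[friend]:
            # Skip the user themselves and people they are already friends with.
            if candidate != user and candidate not in friends[user]:
                counts[candidate] += 1
    # Most mutual friends first; ties broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical data matching the example: A and C share 2 friends, A and D share 3.
friends = {
    "A": {"B1", "B2", "B3"},
    "B1": {"A", "C", "D"},
    "B2": {"A", "C", "D"},
    "B3": {"A", "D"},
    "C": {"B1", "B2"},
    "D": {"B1", "B2", "B3"},
}
print(recommend(friends, "A"))  # [('D', 3), ('C', 2)]
```

Because D shares more friends with A, D comes first in A’s suggestions.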


However, two people might not always have mutual friends, but they might have common 2nd-degree or 3rd-degree connections.


Nth Degree Connections

  • A and B are friends. (0 degree)

  • A and B are 1st-degree friends if they have a mutual friend.


  • A and B are 2nd-degree friends if one of them has a friend who is a 1st-degree friend of the other person. For example, if A - C - D - B, then A and B are 2nd-degree friends.

  • Similarly, A and B are Nth-degree friends if there are N people between them. For example: A - X1 - X2 - X3 - ... - XN - B.

With this approach to recommendations, we need to be able to find the degree of friendship that two given users share on Facebook.
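Under the definitions above, the degree of a connection is the number of people strictly between A and B on the shortest chain of friendships, which is one less than the number of edges on that chain. The small Python helper below, whose name is hypothetical, makes this off-by-one explicit. It assumes the path it receives is already a shortest one, given as a list of users from A to B.

```python
def degree_of_connection(path: list[str]) -> int:
    """Degree of connection for a shortest path [A, ..., B] between two users.

    Direct friends (A - B) are 0th degree; each person in between adds one.
    """
    if len(path) < 2:
        raise ValueError("a path needs at least two users")
    edges = len(path) - 1
    return edges - 1  # number of intermediate people

print(degree_of_connection(["A", "B"]))            # 0: direct friends
print(degree_of_connection(["A", "X", "B"]))       # 1: one mutual friend
print(degree_of_connection(["A", "C", "D", "B"]))  # 2: the A - C - D - B example
```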


Enter Graph Traversals

Now that we know how Friend Recommendations can be made, let’s restate this problem so that we can look at it from an algorithmic perspective.


Let’s imagine an undirected graph of all the users on Facebook, where vertices V represent the users and edges E represent friendships. In other words: if users A and B are friends on Facebook, there is an edge between vertices A and B. The challenge is to find out the degree of connection between any two users.
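In code, a graph like this is commonly stored as an adjacency list. The following minimal Python sketch assumes friendships arrive as pairs of user names; the sample pairs are made up. Because friendship is mutual, each pair is recorded in both directions, and that is what makes the graph undirected.

```python
from collections import defaultdict

def build_graph(friendships: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Build an undirected adjacency list: user -> set of friends."""
    graph = defaultdict(set)
    for a, b in friendships:
        graph[a].add(b)  # an undirected edge is stored in both directions
        graph[b].add(a)
    return dict(graph)

graph = build_graph([("A", "B"), ("B", "C"), ("A", "D")])
print(sorted(graph["A"]))  # ['B', 'D']
```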

More formally, we need to find the shortest distance between two nodes in an undirected, unweighted graph.


Consider two vertices, A and C, in an undirected graph. There are two different paths from A to C:


1. A → B → C
2. A → G → F → E → D → C

Clearly, we want the shortest path when determining the degree of connection between two people on the social network.


So far so good.


Before proceeding, let’s look at the complexity of this problem. As stated before, Facebook had around 2.07 billion users as of Q3 2017. That means our graph will have around 2.07 billion nodes and, if the network is connected (every user can reach every other user through some chain of friendships), at least (2.07 billion - 1) edges.

That is an enormous scale on which to solve this problem. We have also seen that there may be multiple paths from a given source vertex to a destination vertex in the graph, and we want the shortest one.

We will look at two classic graph traversal algorithms to solve our problem:


1. Depth First Search and 2. Breadth First Search.


Imagine that you are stuck in a maze.


You have to get out somehow. There might be multiple routes from your starting position to the exit. The natural approach to getting out of the maze is to try all the paths.

Let’s say you have two choices at the point where you are currently standing. Obviously, you don’t know which one leads out of the maze. So you decide to make the first choice and move onwards in the maze.

You keep moving forward until you hit a dead end. Now you want to try a different path, so you backtrack to the last checkpoint where you made a choice and try a different option this time.

You keep doing this until you find the exit.


Recursively trying out a specific path and backtracking are the two components forming the Depth First Search algorithm (DFS).

If we model the maze problem as a graph, the vertices would represent the individual’s positions in the maze, and a directed edge between two nodes would represent a single move from one position to another. Using DFS, the individual would try all possible routes until the exit is found.

Here is sample pseudo-code for DFS:


    procedure DFS(G, v):
        label v as discovered
        for all edges from v to w in G.adjacentEdges(v) do
            if vertex w is not labeled as discovered then
                recursively call DFS(G, w)
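The pseudo-code translates almost line for line into Python. This sketch assumes the graph is a dictionary mapping each vertex to a list of its neighbours (a directed graph, as in the maze model), and it returns the vertices in the order they are discovered. Python’s default recursion limit of about 1000 frames makes a recursive version unsuitable for very deep graphs; there, an explicit stack would be used instead.

```python
def dfs(graph: dict[str, list[str]], start: str) -> list[str]:
    """Recursive depth-first search; returns vertices in discovery order."""
    discovered = set()
    order = []

    def visit(v: str) -> None:
        discovered.add(v)              # label v as discovered
        order.append(v)
        for w in graph.get(v, []):     # all edges from v to w
            if w not in discovered:    # w is not labeled as discovered
                visit(w)               # recursively call DFS(G, w)

    visit(start)
    return order

# Hypothetical maze: from S you can go to A or B; A leads on to C, B leads on to D.
maze = {"S": ["A", "B"], "A": ["C"], "B": ["D"], "C": [], "D": []}
print(dfs(maze, "S"))  # ['S', 'A', 'C', 'B', 'D']
```

DFS follows S → A → C to the end before backtracking and trying B.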

For a deeper dive into this algorithm, check out:


Time Complexity: O(V + E)

Imagine a contagious disease gradually spreading across a region. Every day, the people who have the illness infect new people they come into physical contact with. In this way, the disease is performing a sort of breadth-first search (BFS) over the population. The “queue” is the set of people who have just been infected. The graph is the physical contact network of the region.

Imagine you need to simulate the spread of the disease through this network. The root node of the search is patient zero, the first known sufferer of the disease. At the start, they are the only person with the disease.

Now you iterate over the people they are in contact with and infect each of them. Then iterate over all of those newly infected people and pass the disease on to the people they are in contact with, unless those people have already had it. Keep going until you’ve infected everyone, or until you’ve infected your target. Then you’re done. That’s how breadth-first search works.

The BFS algorithm explores vertices layer by layer. It starts at the source vertex and moves on to the next layer only once all vertices in the current layer have been processed.
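The disease analogy maps directly onto a layer-by-layer BFS. Each “day” processes the whole current layer and produces the next one. The following is a minimal Python sketch. It assumes the contact network is a dictionary from each person to a list of their contacts, and the names in it are invented. It returns the people who are first infected on each day.

```python
def spread_by_day(contacts: dict[str, list[str]], patient_zero: str) -> list[list[str]]:
    """Return BFS layers: layers[d] holds everyone first infected on day d."""
    infected = {patient_zero}
    layers = [[patient_zero]]
    while True:
        next_layer = []
        for person in layers[-1]:              # everyone infected on the previous day
            for contact in contacts.get(person, []):
                if contact not in infected:    # nobody catches it twice
                    infected.add(contact)
                    next_layer.append(contact)
        if not next_layer:                     # no new infections: the spread has stopped
            return layers
        layers.append(next_layer)

contacts = {
    "P0": ["Ann", "Bob"],
    "Ann": ["P0", "Cat"],
    "Bob": ["P0", "Cat", "Dan"],
    "Cat": ["Ann", "Bob"],
    "Dan": ["Bob"],
}
print(spread_by_day(contacts, "P0"))  # [['P0'], ['Ann', 'Bob'], ['Cat', 'Dan']]
```

The layer index of each person equals their distance from patient zero. This is the property that later makes BFS suitable for shortest paths.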


Here is sample pseudo-code for BFS:


    procedure BFS(G, v):
        q = Queue()
        q.enqueue(v)
        while q is not empty:
            v = q.dequeue()
            if v is not visited:
                mark v as visited   // process the node
                for all edges from v to w in G.adjacentEdges(v) do
                    q.enqueue(w)
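Here is a runnable Python version of this pseudo-code, using `collections.deque` as the queue. It differs from the pseudo-code in one respect, which is a design choice of this sketch. Vertices are marked visited when they are enqueued rather than when they are dequeued. This is a common variant that keeps any vertex from entering the queue more than once. The graph format (a dictionary from vertex to a list of neighbours) is an assumption.

```python
from collections import deque

def bfs(graph: dict[str, list[str]], start: str) -> list[str]:
    """Breadth-first search; returns vertices in the order they are processed."""
    visited = {start}
    queue = deque([start])
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)                # process the node
        for w in graph.get(v, []):     # all edges from v to w
            if w not in visited:       # mark on enqueue so w is queued only once
                visited.add(w)
                queue.append(w)
    return order

# The same hypothetical maze as in the DFS example.
maze = {"S": ["A", "B"], "A": ["C"], "B": ["D"], "C": [], "D": []}
print(bfs(maze, "S"))  # ['S', 'A', 'B', 'C', 'D']
```

Unlike DFS, BFS processes both of S’s neighbours (A and B) before going any deeper.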

For a deeper understanding of BFS, look into this article.


Time Complexity: O(V + E)

Shortest Paths

Let’s move forward and solve our original problem: finding the shortest path between two given vertices in an undirected graph.


Looking only at the time complexities of the two algorithms, we can’t really tell them apart for this problem. Both algorithms will find a path from the given source to our destination, but as we’ll see, only BFS is guaranteed to find the shortest one.

Let’s look at the following example.


Suppose we want to find the shortest path from node 8 to node 10. Let’s look at the nodes that DFS and BFS explore before reaching the destination, starting with DFS.

  • Process 8 → Process 3 → Process 1.

  • Backtrack to 3.

  • Process 6 → Process 4.

  • Backtrack to 6.

  • Process 7.

  • Backtrack to 6 → Backtrack to 3 → Backtrack to 8.

  • Process 10.


A total of 7 nodes are processed before the destination is reached. Now let’s look at how BFS does it.

  • Process 8 → Enqueue 3, 10

  • Process 3 → Enqueue 1, 6

  • Process 10.

Woah, that was fast! Just 3 nodes had to be processed and we were at our destination.
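The two traces can be reproduced in Python. The figure for this example isn’t available, so the graph below is an assumption reconstructed from the traces: 8 is connected to 3 and 10, 3 to 1 and 6, and 6 to 4 and 7. Neighbour lists are ordered so that both searches pick neighbours in the same order as the traces. Each search stops as soon as it processes node 10 and reports which nodes it processed.

```python
from collections import deque

# Assumed undirected graph, reconstructed from the DFS and BFS traces in the text.
EDGES = [(8, 3), (8, 10), (3, 1), (3, 6), (6, 4), (6, 7)]

def build(edges):
    """Adjacency lists that preserve the order in which edges are listed."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    return graph

def dfs_processed(graph, source, target):
    """Nodes processed by recursive DFS until `target` is processed."""
    processed = []

    def visit(v):
        processed.append(v)
        if v == target:
            return True
        for w in graph[v]:
            if w not in processed and visit(w):
                return True   # target found deeper down: stop backtracking
        return False          # dead end: backtrack

    visit(source)
    return processed

def bfs_processed(graph, source, target):
    """Nodes processed by BFS until `target` is processed."""
    processed, visited, queue = [], {source}, deque([source])
    while queue:
        v = queue.popleft()
        processed.append(v)
        if v == target:
            break
        for w in graph[v]:
            if w not in visited:
                visited.add(w)
                queue.append(w)
    return processed

graph = build(EDGES)
print(dfs_processed(graph, 8, 10))  # [8, 3, 1, 6, 4, 7, 10]
print(bfs_processed(graph, 8, 10))  # [8, 3, 10]
```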

BFS is faster here because DFS commits to a specific path and follows it to the very end, that is, until it hits a dead end, and only then backtracks.


This is the major drawback of DFS. In a huge network like Facebook’s, it might go thousands of levels deep before reaching the path that contains our destination, simply because it picked a bad path at the very beginning. BFS never goes deeper than the layer containing the destination, so it doesn’t face this problem and is much faster for our purposes.

Additionally, even when DFS finds the destination, we cannot be sure that the path it took is the shortest one. There may be shorter paths that it has not explored yet.

That means that, for the shortest path problem, DFS would in general have to explore every possible path from the source before it could be sure it had found the shortest one.


With BFS, however, the first time the destination node is reached, it is guaranteed to have been reached along a shortest path from the source. This holds because BFS finishes every vertex at distance d before it moves on to distance d + 1.
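Because of this property, BFS can also return the shortest path itself if it remembers the vertex from which each vertex was discovered. The Python below is a minimal sketch. The `shortest_path` name is illustrative, and the sample graph is assumed: it encodes the two A-to-C paths from earlier (A → B → C and A → G → F → E → D → C). G is deliberately listed before B, so BFS starts toward the long route first and still returns the short one.

```python
from collections import deque
from typing import Optional

def shortest_path(graph: dict[str, list[str]], source: str, target: str) -> Optional[list[str]]:
    """Return a shortest path from source to target in an unweighted graph, or None."""
    parent = {source: None}  # discovered-from links; also serves as the visited set
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            # First time we reach the target: walk the parent links back to the source.
            path = []
            while v is not None:
                path.append(v)
                v = parent[v]
            return path[::-1]
        for w in graph.get(v, []):
            if w not in parent:
                parent[w] = v
                queue.append(w)
    return None  # target is unreachable from source

graph = {
    "A": ["G", "B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["E", "C"],
    "E": ["F", "D"], "F": ["G", "E"], "G": ["A", "F"],
}
path = shortest_path(graph, "A", "C")
print(path)              # ['A', 'B', 'C']
print(len(path) - 2)     # 1: A and C are 1st-degree friends (mutual friend B)
```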


Conclusion

So far, we discussed Facebook’s Friends Recommendation problem and boiled it down to finding the degree of connection between two users in the network graph.


Then we discussed two interesting and very commonly used graph traversal algorithms. Finally, we looked at which algorithm performs best for solving our problem.

Breadth First Search is the algorithm you want to use if you have to find the shortest distance between two nodes in an undirected, unweighted graph.


Let’s look at this fun problem, which illustrates the difference between the two algorithms.


Assuming that you’ve read the problem statement carefully, let’s first model it as a graph problem.


Let every possible string be a node in the graph, with an edge between two vertices if they differ by a single mutation (one changed character).
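In concrete terms, two genes are adjacent when they differ in exactly one position. Here is a small Python check that assumes both genes are strings of the same length. The function name is hypothetical.

```python
def is_one_mutation(a: str, b: str) -> bool:
    """True if equal-length strings a and b differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

print(is_one_mutation("AACCGGTT", "AACCGGTA"))  # True
print(is_one_mutation("AACCGGTT", "AAACGGTA"))  # False: two positions differ
```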


Easy, right?


We are given a starting string (read: source vertex), for example “AACCGGTT”, and we have to reach the destination string (read: destination vertex) “AACCGGTA” in the minimum number of mutations (read: minimum number of steps), such that every intermediate string (node) belongs to the given word bank.

Try and solve this problem on your own before looking at the solution below.


If you try to solve it using DFS, you will certainly arrive at a solution, but at least one test case will exceed the time limit on the LeetCode platform. The reason is the one described earlier: DFS can take much longer to reach the destination vertex, as it did when it processed 7 nodes against BFS’s 3 in our example.
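Here is a minimal BFS sketch in Python for this problem. It is an illustration and not necessarily the solution the original article had in mind. It assumes that genes are strings over the characters A, C, G and T, that every intermediate gene and the final gene must be in the bank, and that the answer is -1 when the target can’t be reached. The sketch does not compare every pair of bank strings. Instead, it generates each gene’s neighbours by trying every single-character substitution and keeps only those found in the bank.

```python
from collections import deque

def min_mutations(start: str, end: str, bank: list[str]) -> int:
    """Fewest single-character mutations from start to end, or -1 if impossible."""
    valid = set(bank)  # every intermediate (and the final) gene must be in the bank
    if start == end:
        return 0
    if end not in valid:
        return -1
    seen = {start}
    queue = deque([(start, 0)])  # (gene, number of mutations so far)
    while queue:
        gene, steps = queue.popleft()
        for i in range(len(gene)):
            for ch in "ACGT":
                if ch == gene[i]:
                    continue
                candidate = gene[:i] + ch + gene[i + 1:]
                if candidate in valid and candidate not in seen:
                    if candidate == end:
                        return steps + 1  # BFS: the first time we reach end is the minimum
                    seen.add(candidate)
                    queue.append((candidate, steps + 1))
    return -1

print(min_mutations("AACCGGTT", "AACCGGTA", ["AACCGGTA"]))  # 1
```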

I hope you got the main idea behind the two main graph traversals, and the difference between them when the task is finding shortest paths in an undirected, unweighted graph.




Original article: https://www.freecodecamp.org/news/deep-dive-into-graph-traversals-227a90c6a261/


