Asymptotic Analysis of Algorithms
by Divya Godayal
Asymptotic Analysis Explained with Pokémon: A Deep Dive into Complexity Analysis
by Sachin Malhotra and Divya Godayal
Let’s admit that we are either still stuck on the nuances of how to write a good algorithm or we dread the term itself.
An algorithm is nothing fancy. It is just the method of doing something. For instance, let’s say Pikachu has to visit his friend tonight. He can do it in many different ways. What matters is which method he chooses.
The method he picks would determine the time taken for him to reach his friend. We deal with such scenarios on a daily basis. We might not think of every decision as an algorithmic decision, but it might be one.
Programmers need to make an informed choice every time. This matters even more when you are building a highly scalable and responsive application.
You are responsible for every piece of code you write, even if it doesn’t work.
Why analyze an algorithm?
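Algorithms are everywhere. Like, literally everywhere. In fact, to write this article, we compiled a list of 1200 steps.
Don’t take that seriously now. I am kidding, of course!
I mean, there is no escaping algorithms in any walk of life. You had better learn the art of picking the right one!
Let’s say our beloved pokemons have set up a championship. Every time a pokemon wins a fight, its rank is updated. To break ties, the next fight is with a pokemon that has the same score.
You are asked to build a website that can quickly tell who the next match is with. The coding ninja in you gets excited and jumps right in. You build a sleek website with fancy graphics. Initially, you are told that 50 pokemons will take part in the fights.
To find the next match for a winning pokemon, you decide to compare its score with the score of every pokemon in the tournament, which is essentially a linear search. It works like a charm!
But on the day of the first match, 1000 new pokemons register! Ah, what a pity. You didn’t see that coming, did you?
Asymptotic analysis is a way of evaluating an algorithm’s performance purely in terms of the input size (N), where N is very large. It gives you an idea of the limiting behavior of an application, which makes it very important for gauging the performance of your code.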
For example, if the number of pokemons taking part in the fight is N, then the asymptotic complexity of your linear search algorithm is O(N). If you don’t know what this notation is, fret not. We will address it soon.
In simple words, it’s like asking all N pokemons what their rank is and then taking a decision. Imagine asking all 1000 pokemons. Tiring, right?
For a machine, O(N) might not be bad, but on a website where the focus is responsiveness and speed, it might not be the best choice.
The reason 1000 new pokemons became a huge problem is that you didn’t think about the scalability of the application from the get-go and used a naive approach to solve the problem. Running into such scalability issues was just a matter of time.
Analysis of algorithms is like that: it’s always hanging around, but you only get serious about it when it’s really needed. And then you are just beating around the tail… uh oh, I mean the bush!
Analyzing an algorithm helps measure the efficiency of your program, and it needs your attention from the moment you start thinking about a solution.
You could have just used a dictionary or a hash table to find all the pokemons with the same rank, reducing the time complexity of each lookup to O(1) on average. This is like going to just one manager pokemon who has the answer to your query.
A crazy reduction in time complexity, from O(N) to O(1). Analyzing an algorithm makes it possible to compare different approaches and decide on the best one.
What is N, by the way?
N defines the size of the input. Here, N is the number of pokemons. For the purpose of algorithmic analysis, we consider N to be very large.
Complexity and Asymptotic Behavior
Let’s say Pikachu is on the lookout for a co-pokemon who has some kind of special power. Pikachu starts by asking all the pokemons about their powers, one by one. This kind of approach is known as linear search, since it’s done linearly, one element at a time. But for our reference, let’s call it Pikachu’s Search.
1. Pikachu_Search(pokemons_list):     # List of pokemons
2.     for p in pokemons_list:           # No. of iterations - N
3.         if p has special power:       # Constant time operation
4.             return p                  # Constant time operation
5.     return "No Pokemon Found"         # Constant time operation
In the above code snippet, pokemons_list is the list of all pokemons participating in the championship. Hence, the size of this list is N.
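For readers who want to run it, here is a minimal Python sketch of Pikachu’s Search. It assumes each pokemon is a dict with "name" and "power" keys (a hypothetical representation), and it returns None instead of the "No Pokemon Found" string.

```python
from typing import List, Optional

def pikachu_search(pokemons_list: List[dict], power: str) -> Optional[dict]:
    """Linear search: ask each pokemon, one by one, whether it has `power`."""
    for p in pokemons_list:        # step 2: up to N iterations
        if p["power"] == power:    # step 3: constant-time check
            return p               # step 4: found it, stop early
    return None                    # step 5: "No Pokemon Found"

# Hypothetical roster for illustration.
roster = [
    {"name": "Bulbasaur", "power": "grass"},
    {"name": "Charmander", "power": "fire"},
    {"name": "Squirtle", "power": "water"},
]
print(pikachu_search(roster, "water"))    # {'name': 'Squirtle', 'power': 'water'}
print(pikachu_search(roster, "psychic"))  # None
```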
Runtime Analysis for Pikachu’s Search:
Step 2 is a for loop, so the operations inside it are repeated up to N times. Step 4 is executed only if the condition in step 3 is true. Once step 4 is executed, the loop breaks and the result is returned.
If step 3 takes a constant amount of time, say C1, then the total time taken in the for loop would be at most C1·N.
All the other operations are constant time operations not affected by the loop, so we can lump them together into a single cumulative constant, C2.
Total runtime f(N) = C1·N + C2, a function of N.
Let’s make it large. What if the value of N is very, very large? Do you think the constants would have any significance then?
In algorithmic analysis, an important idea is to chop off the less important part.
For example, if the run time for an algorithm is expressed as 10N² + 2N + 5, then asymptotically only the highest-order term, N², is of significance. This makes comparison between algorithms much easier.
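A quick numerical sketch shows why the lower-order terms can be chopped off. It evaluates the example running time 10N² + 2N + 5 and prints what fraction of the total comes from the 10N² term as N grows.

```python
def f(n: int) -> int:
    """The example running time from the text: 10N² + 2N + 5."""
    return 10 * n**2 + 2 * n + 5

for n in (10, 1_000, 1_000_000):
    share = 10 * n**2 / f(n)   # fraction of f(N) contributed by the 10N² term
    print(f"N = {n:>9,}: the N² term is {share:.6%} of f(N)")
```

Already at N = 10 the N² term accounts for over 97% of f(N), and by N = 1,000,000 the 2N + 5 part is a rounding error.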
Degrees of Complexity ◎◉●○⦿
An algorithm shows different behaviors when exposed to different types of inputs. This brings us to the discussion of how we can define this behavior or the complexity of the algorithm. Since Pikachu’s search is still on, let’s see what’s going on with him.
Best Case ~ Pure Optimism. He got very lucky since the very first pokemon he approached had the special power Pikachu was looking for.
Worst Case ~ Pure Pessimism. He had to visit all the pokemons, and to his dismay, the very last pokemon had the special power he wanted.
Average Case ~ Being Practical. Pikachu is a grown-up pokemon now. Experience has taught him a lot, and he knows it’s all a matter of time and luck. He estimated a high chance of finding the special pokemon within the first 500 or so pokemons he visits, and he was right.
An algorithm can be analyzed in any of the three ways mentioned above.
The best case complexity does not tell us much. It acts as a lower bound on the running time of an algorithm. If you go with it, you are just preparing for the best. You have to be pretty lucky for your algorithm to hit the best case bound anyway. In a practical sense, this doesn’t help much.
The average case complexity, while always good to know, is generally difficult to calculate, because it requires analyzing the performance of your algorithm over different variations of the input (and assuming how likely each variation is). Hence, it is not widely used.
The worst case complexity helps you prepare for the worst. In algorithms, this kind of pessimism is considered good, since it gives an upper bound on the running time. Hence, you always know the limits of your algorithm!
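To make the three cases concrete, here is a small Python sketch that counts how many pokemons Pikachu has to ask when the special pokemon sits at a given position among N = 1000. The helper name comparisons_until_found is hypothetical. The average assumes the special pokemon is equally likely to be at any position, which is exactly the kind of assumption the average case needs.

```python
def comparisons_until_found(n: int, target_index: int) -> int:
    """Number of pokemons Pikachu asks before finding the one at target_index."""
    count = 0
    for i in range(n):
        count += 1              # one question (comparison) per pokemon visited
        if i == target_index:
            return count
    return count                # not found: asked all n pokemons

n = 1000
best = comparisons_until_found(n, 0)         # the first pokemon has the power
worst = comparisons_until_found(n, n - 1)    # the last pokemon has the power
# Average case, assuming every position is equally likely.
average = sum(comparisons_until_found(n, k) for k in range(n)) / n
print(best, worst, average)  # 1 1000 500.5
```

The average works out to (N + 1) / 2, which matches Pikachu’s "first 500 or so" estimate.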
Tools for Complexity Analysis
We saw earlier that the total runtime for Pikachu’s Search is f(N) = C1·N + C2, a function of N. Let’s get to know the tools we have for representing running time, so that we can compare algorithms.
Big O: Oh yes! It’s pronounced just like that: Big-Oh! It describes an asymptotic upper bound on the complexity of an algorithm, which is why it is used to denote an algorithm’s worst behavior.
Essentially, it bounds the maximum runtime of an algorithm, no matter the input.
It is the most widely used notation, because it is easy to analyze an algorithm by studying its worst behavior.
For Pikachu’s search, we can say that f(N), the running time, is bounded from above by C·g(N) for very large N, where C is a constant and g(N) = N. Thus O(N) represents the asymptotic upper bound for Pikachu’s search.
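One possible choice of the constant C makes this concrete, assuming C1 and C2 are non-negative:

```latex
\begin{aligned}
f(N) &= C_1 N + C_2 \\
     &\le C_1 N + C_2 N && \text{for } N \ge 1,\ C_2 \ge 0 \\
     &= (C_1 + C_2)\,N,
\end{aligned}
\qquad \text{so } f(N) = O(N) \text{ with } C = C_1 + C_2 \text{ and } N_0 = 1.
```

Any larger C, or a later starting point N0, would also work. Big O only asks that some such constants exist.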
Big Omega (Ω): Similar to Big O notation, the Ω notation is used to define an asymptotic lower bound on the performance of an algorithm. Hence, it is used to represent best case scenarios.
The omega bound essentially means the minimum amount of time that our algorithm will take to execute, irrespective of the input.
This notation is not used often in practical scenarios, since the best behavior is not a reliable measure for comparing algorithms.
For Pikachu’s search, we can say that f(N), the running time, is bounded from below by C·g(N) for very large N, where C is a constant and g(N) = 1. Thus Ω(1) represents the asymptotic lower bound for Pikachu’s Search.
Big Theta (Θ): A tight bound on the behavior of an algorithm, this notation defines both the upper and lower bounds for a function. It is called a tight bound because we pin the running time to within a constant factor above and below: f(N) = Θ(g(N)) when there are positive constants c1 and c2 such that c1·g(N) ≤ f(N) ≤ c2·g(N) for all sufficiently large N.
An algorithm might exhibit different best and worst case behaviors. When both are the same, we tend to use the theta notation. Otherwise, the best and worst cases are called out separately as:
(a) For the worst case, f(N) is bounded by the function g(N) = N for large values of N. Hence, the tight bound complexity would be denoted as Θ(N). This means the worst case run time for Pikachu’s search is at least c1⋅N and at most c2⋅N, for some positive constants c1 and c2.
(b) Similarly, its best case tight bound complexity is Θ(1).
Let’s consider one more example, where f(N) = 10N² + 2N + 5. For this function, the asymptotic lower and upper bounds would be Ω(N²) and O(N²), respectively. Since the two match, the tight bound complexity is Θ(N²).
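For f(N) = 10N² + 2N + 5, one concrete pair of constants is c1 = 10 and c2 = 17, with N0 = 1. The comments in the sketch below give the one-line argument. The loop only spot-checks it numerically over a finite range and is not a proof.

```python
def f(n: int) -> int:
    return 10 * n**2 + 2 * n + 5   # f(N) = 10N² + 2N + 5

# Tight-bound constants: c1·N² ≤ f(N) ≤ c2·N² for all N ≥ n0.
# Lower: 2N + 5 > 0, so f(N) ≥ 10N².
# Upper: for N ≥ 1, 2N ≤ 2N² and 5 ≤ 5N², so f(N) ≤ (10 + 2 + 5)·N² = 17N².
c1, c2, n0 = 10, 17, 1

# Numerical spot check over a finite range (the comments above are the actual argument).
print(all(c1 * n**2 <= f(n) <= c2 * n**2 for n in range(n0, 10_001)))  # True
```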
Since worst case complexity acts as a better comparison metric, from now on we will be using Big-O for complexity analysis.
Space Complexity
We have been discussing time complexity all this while. Another important concept in complexity analysis is Space Complexity. As the name suggests, it describes how much space, or memory, an algorithm takes as a function of N, where N is very large.
Whenever we compare different algorithms for solving a particular problem, we don’t just focus on the time complexities. Space complexity is also an important aspect for comparing different algorithms. Yes, it’s true that we have a lot of memory available these days and hence, space is something which can be compromised on. However, it is not something we should ignore all the time.
There’s an interesting conundrum that developers face all the time when coming up with solutions for programming problems. Let’s discuss a little bit about what it is.
The Time and Space trade-off
More often than not, you want to make your algorithm blazingly fast. Sometimes in doing so you end up compromising the space complexity.
However, sometimes we trade away some time to save space.
In practical applications, one thing or the other is compromised and this is famously referred to as the time-space tradeoff in the algorithmic analysis world.
Pikachu realized that he was searching for a pokemon every other day. This essentially means running Pikachu’s Search over and over again. Huh! Naturally, he got very tired of the exhausting amount of work he had to put in every day.
In order to help him out and speed up his search process, we decided to use a hash table. We can use the power type of a pokemon as the key in the hash table.
If we need to find the pokemons with a special power, the lookup would take O(1) time, since hash table lookup is a constant time operation on average.
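Here is a minimal Python sketch of that idea. It uses a dict, Python’s built-in hash table, keyed by power type. The roster and the helper name find_by_power are hypothetical. Because several pokemons can share a power type, each key maps to a list of names.

```python
from collections import defaultdict

# Hypothetical roster: (name, power type) pairs.
roster = [
    ("Bulbasaur", "grass"),
    ("Charmander", "fire"),
    ("Squirtle", "water"),
    ("Vulpix", "fire"),
]

# Build the index once: O(N) time and O(N) extra space.
by_power = defaultdict(list)
for name, power in roster:
    by_power[power].append(name)

def find_by_power(power: str) -> list:
    """Average-case O(1) hash table lookup; .get() does not insert missing keys."""
    return by_power.get(power, [])

print(find_by_power("fire"))  # ['Charmander', 'Vulpix']
```

Building the index costs one pass over the roster. After that, Pikachu never has to visit every pokemon again, which is the time-for-space trade described next.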
Without using this hash table, poor little Pikachu would have had to go visit every pokemon individually and ask about their powers. And repeatedly doing this is insane.
All it took was creating a hash table once and, from then on, using it for look-ups to bring down the overall runtime!
But that’s not all: as you saw, it came at the cost of space. The hash table needs an entry for each pokemon. Hence, the space complexity would be O(N).
Choose between: O(N) Time, O(1) Space  or  O(1) Time, O(N) Space
This choice depends on the application’s needs. If we have a customer-facing application, it should not be slow. The priority in such a situation would be to make the application as responsive as possible, no matter the amount of space used. However, if we’re really constrained by the space available to us, we have to give up some time to make up for that.
Choosing your algorithm wisely helps to optimize both time and space.
Time and space complexity always go hand in hand. We need to do the math and go with the best approach. There is no golden rule to help you decide which one to compromise on. Everything is application dependent.
That’s a lot of theoretical concepts to soak in. We know, even poor Pikachu has gotten a bit bored. But worry not, we will now put all these concepts into some practice and use them to analyze the complexity of some algorithms. This will help clarify the minute differences between the different kinds of complexities, the importance of big-Oh complexity, the time-space tradeoff and more.
To start with, Pikachu wants to analyze all the sorting techniques. Sorting all the Pokemons by their ranks helps him to keep the rank table organized.
Let’s check out the basic yet crucial sorting algorithms. The input array pk_rank to be sorted is of size N.
In case you are not familiar with any of the sorting algorithms mentioned below, we advise you to read about them before moving on to the following sections. The intention of the following examples is not to explain the different algorithms, but to explain how you can derive their time and space complexity.
Bubble Sort
Bubble sort, one of the simplest sorting algorithms, repeatedly compares adjacent elements of an array and swaps them if they are out of order. The analogy is drawn from bubbles, which eventually rise to the top. As the array is sorted, elements gradually bubble up to their correct positions.
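The formal listing that the step numbers below refer to is not reproduced here. As a stand-in, here is a minimal Python sketch of bubble sort. The comments mark which lines we assume correspond to the steps mentioned in the analysis: the nested loops as steps 2 and 3, and the comparison and swap as steps 4 and 5. That mapping is our assumption.

```python
def bubble_sort(pk_rank: list) -> list:
    """Sort pk_rank in place (ascending) and return it."""
    n = len(pk_rank)
    for i in range(n):                       # outer loop (assumed "step 2")
        for j in range(n - 1 - i):           # inner loop over N - 1 - i pairs (assumed "step 3")
            if pk_rank[j] > pk_rank[j + 1]:  # compare adjacent pair (assumed "step 4")
                # swap the out-of-order pair (assumed "step 5")
                pk_rank[j], pk_rank[j + 1] = pk_rank[j + 1], pk_rank[j]
    return pk_rank

print(bubble_sort([5, 1, 4, 2, 8]))  # [1, 2, 4, 5, 8]
```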
Time Complexity: Now that we have the algorithm in place, let’s analyze its time and space complexity. We can clearly see from steps 2 and 3 that there is a nested loop structure in the algorithm. Also, the range of the second for loop is N - 1 - i, which clearly indicates that it depends on the outer loop.
if i = 0, second for loop would execute N-1 times
if i = 1, second for loop would execute N-2 times
if i = 2, second for loop would execute N-3 times
..
if i = N-1, second for loop would execute 0 times
Now we know the amount of time (iterations) our bubble sort algorithm takes at each step along the way. We mentioned before that there is a nested loop in the algorithm. For each value of the variable in the first loop, we know the amount of time taken in the second loop. All that remains now is to sum these up. Let’s do that.
S = (N-1) + (N-2) + (N-3) + ... + 3 + 2 + 1 = N(N-1)/2 ~ N² - N, ignoring the constant coefficient
If you look at step 4 and step 5, these are constant time operations. They don’t add anything to the time complexity (or the space complexity, for that matter) beyond a constant factor. That means we have on the order of N² - N iterations, and in each iteration a constant amount of work is performed.
Hence, the runtime of the bubble sort algorithm would be C·(N² - N), where C is a constant. Asymptotically, we can say the worst case time complexity for Bubble Sort is O(N²).
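A quick way to sanity-check the sum is to count the inner-loop iterations directly. This instrumented variant, with the hypothetical name bubble_sort_iterations, should report exactly N(N-1)/2 for any input, because this version of bubble sort has no early exit.

```python
def bubble_sort_iterations(pk_rank: list) -> int:
    """Bubble sort pk_rank in place and return how many inner-loop iterations ran."""
    n = len(pk_rank)
    iterations = 0
    for i in range(n):
        for j in range(n - 1 - i):
            iterations += 1  # one constant-time compare (and maybe swap)
            if pk_rank[j] > pk_rank[j + 1]:
                pk_rank[j], pk_rank[j + 1] = pk_rank[j + 1], pk_rank[j]
    return iterations

for n in (10, 100, 1000):
    reversed_input = list(range(n, 0, -1))
    print(n, bubble_sort_iterations(reversed_input), n * (n - 1) // 2)  # both counts match
```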
Is this a good sorting algorithm? We haven’t looked at any other algorithms to compare it with yet. However, let’s see how long it would take this algorithm to sort a billion pokemons (reproduction, overpopulation, you get it).
We leave the calculation up to you, but it would take bubble sort approximately 31,709 years to sort a billion pokemons (assuming every instruction takes 1 µs, i.e. one microsecond, to execute). Is Pikachu immortal or something?
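Here is the back-of-the-envelope arithmetic behind that figure, as a Python sketch. It assumes roughly N² = 10^18 instructions, with constants and lower-order terms dropped, 1 µs per instruction, and 365-day years. At 1 ms per instruction the result would be a thousand times larger.

```python
n = 10**9                              # a billion pokemons
instructions = n ** 2                  # ~N² steps, ignoring constants and lower-order terms
microseconds = instructions            # assumption: each instruction takes 1 µs
seconds = microseconds // 10**6
seconds_per_year = 365 * 24 * 60 * 60  # ignoring leap years

years = seconds // seconds_per_year
print(f"about {years:,} years")        # about 31,709 years
```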
Space Complexity: Analyzing the space complexity of this algorithm is comparatively simpler than analyzing its time complexity. The bubble sort algorithm repeatedly performs only one operation: swapping numbers. In doing so, it doesn’t use any extra memory beyond a few variables. It just rearranges the numbers in the original array, so the space complexity is constant, or O(1) or even Θ(1).
Insertion Sort
Do you like playing cards?
Well, even if you don’t, you should know that a good initial strategy in a lot of games is to arrange the cards in a specific order i.e. to sort the deck of cards. The idea for insertion sort is very similar to arranging the deck of cards.
Let’s say you have a few cards sorted in ascending order, and you are given another card to be inserted at the right position so that the cards in your hand are still sorted. What will you do?
You would start from either the left or the right end of the cards in your hand, compare the new card with each card in turn, and find the right spot.
Similarly, if more new cards are provided, you repeat the same process for each new card and keep the cards in your hand sorted.
Insertion sort works in the same manner. It starts from index 1 (array indexing starts at 0) and treats each element like a new card. Each new element can then be placed at the correct position in the already sorted left subarray.
The important thing to note here is that, given a new card (or, in our case, an element at index j), all the cards in the hand (or all the elements before that index) are already sorted.
Let’s look at a formal algorithm for insertion sort followed by an animation that executes the algorithm on a test input.
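The formal listing and the animation are not reproduced here. As a stand-in, here is a minimal Python sketch of insertion sort. The comments show where we assume the for loop (step 1) and the nested while loop (step 4) referenced in the analysis below sit, which is our guess at the original numbering.

```python
def insertion_sort(pk_rank: list) -> list:
    """Sort pk_rank in place (ascending) and return it."""
    for i in range(1, len(pk_rank)):      # for loop from index 1 (assumed "step 1")
        key = pk_rank[i]                   # the "new card"
        j = i - 1
        # Shift larger elements of the sorted left part one slot right (assumed "step 4").
        while j >= 0 and pk_rank[j] > key:
            pk_rank[j + 1] = pk_rank[j]
            j -= 1
        pk_rank[j + 1] = key               # drop the card into its spot
    return pk_rank

print(insertion_sort([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]
```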
Time Complexity: From steps 1 and 4, there is a nested while structure within a for loop. In the worst case, the while loop runs j+1 times, and j is clearly dependent on i. Let’s see how the value of j changes with changing values of i.
if i = 1, then j = 0, hence the while loop would execute 1 time
if i = 2, then j = 1, hence the while loop would execute 2 times
if i = 3, then j = 2, hence the while loop would execute 3 times
..
if i = N-1, then j = N-2, hence the while loop would execute N-1 times
Now we know the amount of time (iterations) our insertion sort algorithm takes at each step along the way. Total time is:
S = 1 + 2 + 3 + ... + (N-2) + (N-1) = N(N-1)/2 ~ N² - N, ignoring the constant coefficient
Steps 2 through 7 are constant time operations. They don’t add anything to the time complexity (or the space complexity, for that matter) beyond a constant factor. That means we have on the order of N² - N iterations, and in each iteration a constant amount of work is performed.
Hence, the worst case runtime of the insertion sort algorithm would be C·(N² - N), where C is a constant. Asymptotically, we can say the worst case time complexity for Insertion Sort is the same as that of bubble sort, i.e. O(N²).
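The worst case for insertion sort is an array in reverse order, where every new element has to travel all the way to the front. This instrumented Python sketch, with the hypothetical name insertion_sort_shifts, counts the while-loop iterations. On reversed input the count should match N(N-1)/2.

```python
def insertion_sort_shifts(pk_rank: list) -> int:
    """Insertion sort pk_rank in place; return the number of while-loop iterations."""
    shifts = 0
    for i in range(1, len(pk_rank)):
        key = pk_rank[i]
        j = i - 1
        while j >= 0 and pk_rank[j] > key:
            pk_rank[j + 1] = pk_rank[j]
            j -= 1
            shifts += 1
        pk_rank[j + 1] = key
    return shifts

n = 100
print(insertion_sort_shifts(list(range(n, 0, -1))), n * (n - 1) // 2)  # 4950 4950
```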
Space Complexity: Analyzing the space complexity of this algorithm is comparatively simpler than analyzing its time complexity. The insertion sort algorithm only rearranges the numbers in the original array. In doing so, it doesn’t use any extra memory beyond a few variables. Hence, the space complexity is constant, or O(1) or even Θ(1).
Note: Comparing algorithms on the basis of asymptotic complexity is easy and fast. At a high level, it’s also a good measure. But in practice, if two algorithms have the same asymptotic complexity, it doesn’t necessarily mean they perform the same in real-world scenarios.
When calculating the asymptotic complexity of an algorithm, we ignore all the constant factors and the lower order terms.
But these ignored values eventually do add to the execution time of an algorithm.
Insertion sort is much faster than bubble sort when the array is almost sorted. On each pass through the array, bubble sort (as described above) must go all the way to the end of the unsorted part and compare adjacent pairs. Insertion sort, on the other hand, bails out of its inner loop as soon as it finds that the new element is already in place. Try executing the two algorithms on a sorted array and look at the number of iterations each of them takes to finish.
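Here is one way to run that experiment, sketched in Python. Both helper names are hypothetical. The bubble sort is the plain version without a "swapped" flag. Some implementations add such a flag to stop early, which would change this comparison. On already sorted input, this bubble sort still performs N(N-1)/2 comparisons, while insertion sort performs only N-1.

```python
def bubble_comparisons(a: list) -> int:
    """Plain bubble sort (no early exit); returns the number of adjacent comparisons."""
    count = 0
    n = len(a)
    for i in range(n):
        for j in range(n - 1 - i):
            count += 1                       # one adjacent-pair comparison
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return count

def insertion_comparisons(a: list) -> int:
    """Insertion sort; returns the number of comparisons against the new element."""
    count = 0
    for i in range(1, len(a)):
        key, j = a[i], i - 1
        while j >= 0:
            count += 1                       # one comparison against key
            if a[j] <= key:
                break                        # already in place: bail out early
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return count

already_sorted = list(range(1000))
print(bubble_comparisons(already_sorted[:]), insertion_comparisons(already_sorted[:]))  # 499500 999
```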
Thus, whenever you are looking for the best algorithm for your application, it needs to be analyzed from many different aspects. Asymptotic analysis definitely helps to weed out slower algorithms, but observation and deeper insight help you find the algorithm best suited to your application.
Merge Sort
So far we’ve analyzed two of the most basic sorting algorithms. These are introductory sorting algorithms but are not the ones generally used in practice due to their high asymptotic complexity.
Let’s move on to a faster, more practical sorting algorithm. The merge sort algorithm deviates from the nested loop structured sorting that we have seen in the previous two algorithms and adopts a completely new paradigm that we will be discussing below.
The Merge Sort algorithm is based on something known as the Divide and Conquer programming paradigm. This paradigm is based on a very simple idea and finds use in a lot of different algorithms, including merge sort. Divide and Conquer consists of three basic steps:
- Divide: Break a big problem into smaller sub-problems.
- Conquer: Optimally solve the smaller sub-problems.
- Combine: Finally, combine the results of the sub-problems to find the solution for the original big problem.
Let’s look at a brief overview of how the merge sort algorithm makes use of the divide and conquer paradigm.
Divide ~ The first step in the process is to divide the given array into two equal-sized, smaller sub-arrays. This helps, since now we have 2 smaller arrays to sort, each with half the original number of elements.
Conquer ~ The next step is to sort the smaller arrays. This part is referred to as the conquer step since we are solving the sub-problems optimally.
Combine ~ Finally, we are presented with two sorted halves of the original array and we have to combine them optimally such that we get a single sorted array. This is the combine step of the paradigm explained above.
But wait. Is this it?
Given an array of 1000 elements, if we divide it into 2 equal halves of 500 each, we still have a lot of elements to sort in each sub-array.
Shouldn’t we divide the two halves further into 4 to get even shorter subarrays?
Yes! Indeed we should!
We recursively divide the array into smaller halves, then sort and merge the smaller halves to get the original array back in sorted order.
This essentially means we divide e.g. an array of size 1000 into 2 halves of 500 each. Then we further split these two halves into 4 portions of 250 each and so on. Don’t worry if you’re not able to contemplate all of this intuitively in terms of complexity analysis. We will get to that very soon.
Let’s have a look at the algorithm for merge-sort. The algorithm is divided into two functions, one which recursively sorts the two equal halves of a given array and another one which merges the two sorted halves together.
We will first analyze the complexity of the merge function and then get to analyzing the merge_sort function.
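The merge function’s listing is not reproduced here. As a stand-in, here is a minimal Python sketch with the interface the text describes: it merges the sorted ranges [left, mid] and [mid + 1, right] of arr using a temporary buffer. The comments point to the steps the analysis below refers to, but the exact step mapping is our assumption.

```python
def merge(arr: list, left: int, mid: int, right: int) -> None:
    """Merge sorted arr[left..mid] and arr[mid+1..right] (inclusive) in place."""
    # Copy the section into a temporary buffer (cf. "steps 2-3" in the text).
    buffer = arr[left:right + 1]
    i, j = 0, mid - left + 1          # heads of the left and right halves in buffer
    left_end, right_end = mid - left, right - left
    k = left                          # next write position in arr

    # Take the smaller head until one half is exhausted (cf. "step 5").
    while i <= left_end and j <= right_end:
        if buffer[i] <= buffer[j]:
            arr[k] = buffer[i]
            i += 1
        else:
            arr[k] = buffer[j]
            j += 1
        k += 1

    # Copy whatever remains in either half (cf. "steps 13 and 14").
    while i <= left_end:
        arr[k] = buffer[i]
        i += 1
        k += 1
    while j <= right_end:
        arr[k] = buffer[j]
        j += 1
        k += 1

a = [9, 1, 4, 7, 2, 3, 8, 0]
merge(a, 1, 3, 6)   # merges [1, 4, 7] with [2, 3, 8]
print(a)            # [9, 1, 2, 3, 4, 7, 8, 0]
```

Every element in [left, right] is copied into the buffer once and written back once, so each call does work proportional to the size of the range.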
The above function simply takes in two sorted halves of the array and merges them into a single sorted section. The two halves are defined using indices: the left half is [left, mid] and the right half is [mid + 1, right].
Steps 2-3 copy the elements from the original array into a temporary buffer, which we use for merging. The sorted elements are then copied back into the original array. Since we iterate over a portion of the array, the time complexity of this operation is O(N), considering there are N elements in that portion.
Step 5 is a while loop that runs until one of the two subarrays is exhausted. Together, this while loop and the ones that follow it in steps 13 and 14 cover all the elements of the two subarrays exactly once. So, their combined time complexity is O(N).
This means that the merging step is a linear time algorithm.
The overall complexity of merge sort is therefore decided by how many times the merge function is called and how many elements each call merges.
Let’s move on and look at the original merge_sort function. It’s extremely simple.
Step 4 calls the merge_sort function on the left half of the array.
Step 5 calls the merge_sort function on the right half of the array.
And then step 6 finally calls the merge function to combine the two halves.
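Here is a matching sketch of merge_sort in Python. It is self-contained and includes a compact merge, a single-loop variant of the buffer-based merge that we assume rather than take from the original listing. Note the base case: a range with zero or one element is already sorted, and that is what stops the recursion.

```python
def merge(arr: list, left: int, mid: int, right: int) -> None:
    """Merge sorted arr[left..mid] and arr[mid+1..right] back into arr (linear time)."""
    left_part, right_part = arr[left:mid + 1], arr[mid + 1:right + 1]
    i = j = 0
    for k in range(left, right + 1):
        # Take from the left half if the right half is used up or its head is larger.
        if j >= len(right_part) or (i < len(left_part) and left_part[i] <= right_part[j]):
            arr[k] = left_part[i]
            i += 1
        else:
            arr[k] = right_part[j]
            j += 1

def merge_sort(arr: list, left: int, right: int) -> None:
    """Recursively sort arr[left..right] (inclusive) in place."""
    if left >= right:                  # 0 or 1 element: already sorted
        return
    mid = (left + right) // 2
    merge_sort(arr, left, mid)         # sort the left half (cf. "step 4")
    merge_sort(arr, mid + 1, right)    # sort the right half (cf. "step 5")
    merge(arr, left, mid, right)       # combine the two halves (cf. "step 6")

data = [38, 27, 43, 3, 9, 82, 10]
merge_sort(data, 0, len(data) - 1)
print(data)  # [3, 9, 10, 27, 38, 43, 82]
```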
Uh. A function calling itself?
How does one calculate its complexity?
Until now, we have discussed the analysis of loops. Many algorithms, however, such as Merge Sort, are recursive in nature. When we analyze them, we get a recurrence relation for the time complexity: the running time on an input of size N is expressed as a function of N and of the running time on inputs of smaller sizes.
Primarily, there are two important ways of analyzing the complexity of a recurrence relation:
- Using a Recursion Tree, and
- Using the Master Method.
Recursion Tree Analysis
This is the most intuitive way for analyzing the complexity of recurrence relations. Essentially, we can visualize a recurrence relation in the form of a recursion tree.
The visualization helps to know the amount of work done by the algorithm at each step (read level) along the way and summing up the work done at each level tells us the overall complexity of the algorithm.
Before we look at the recursion tree for the Merge Sort algorithm, let’s first look at the recurrence relation for it.
T(N) = 2T(N / 2) + O(N)
Let T(N) represent the amount of work done (or the time taken) to sort an array consisting of N elements. The above relation states that the overall time taken is equal to the time taken to sort the two halves of the array plus the time taken to merge the two halves. We have already seen that merging the two halves takes O(N) time.
We can write the recurrence relation as follows:
T(N) = 2T(N / 2) + O(N)
T(N / 2) = 2T(N / 4) + O(N / 2)
T(N / 4) = 2T(N / 8) + O(N / 4)
...
It’s much easier to visualize this in the form of a tree. Each node in the tree would consist of two branches since we have two different subproblems given a single problem. Let’s look at the recursion tree for merge sort.
Each node in the tree represents a subproblem and the value at each node represents the amount of work spent at each subproblem. The root node represents the original problem.
In our recursion tree, every non-leaf node has 2 children, representing the number of subproblems it is splitting into. We have seen from the algorithm for Merge Sort that at each step of the recursion, the given array is divided into two equal halves.
So, there are two important things we need to figure out in order to analyze the complexity of the merge sort algorithm.
We need to know the amount of work done at each level in the tree and
We need to know the total number of levels in the tree, or, as it is more commonly called, the height of the tree.
First, we will calculate the height of our recursion tree. We can see from the recursion tree above that every non-leaf node splits into two nodes. Hence, what we have above is a full binary tree (every non-leaf node has exactly two children).
Intuitively, we will go on splitting the array until there is a single element left in a subarray, at which point we don’t need any sorting (this is the base case) and we simply return.
At the first level of our binary recursion tree, there is a single subproblem consisting of N elements. The next level in the tree consists of 2 subproblems (sub-arrays to be sorted) with N / 2 elements each.
Right now, we are not really concerned with the number of subproblems. We just want to know the size of each subproblem since we can see that all the subproblems on a particular level of the tree are of the same size.
At Level 0 we have subproblem(s) each consisting of N elements
At Level 1 we have subproblem(s) each consisting of N/2 elements
At Level 2 we have subproblem(s) each consisting of N/4 elements
At Level 3 we have subproblem(s) each consisting of N/8 elements
At Level 4 we have subproblem(s) each consisting of N/16 elements
...
At Level X we have subproblem(s) each consisting of 1 element.
The number of elements per subproblem halves at each level, so at level X each subproblem has N / 2^X elements. Since the last level has subproblems of a single element, N / 2^X = 1, which gives:
N = 2^X
X = log_2(N)
This means the height of our tree is log_2(N) (log base 2 of N). Now let’s look at the amount of work done by the algorithm at each step.
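Here is a small Python sketch that tabulates the pattern above for merge sort, assuming N is a power of 2 so that every split is exact (recursion_levels is a hypothetical helper, not part of the original code). It lists each level with its number of subproblems and their size; the index of the last level is the height of the tree.

```python
def recursion_levels(n):
    """Return (level, number of subproblems, subproblem size) for merge sort on n = 2^k elements."""
    levels = []
    level, size = 0, n
    while True:
        levels.append((level, 2 ** level, size))  # level l has 2^l subproblems
        if size == 1:                             # base case reached: single elements
            break
        size //= 2                                # each split halves the subproblem size
        level += 1
    return levels


for level, count, size in recursion_levels(16):
    print(f"level {level}: {count} subproblem(s) of size {size}")
# The last line printed is "level 4: 16 subproblem(s) of size 1", and log_2(16) = 4.
```

Note that count * size is N on every row, which anticipates the per-level work discussed next.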
T(N) is defined as the amount of work required to sort an array of N elements. We looked at the recurrence relation for this earlier on, and it was:
T(N) = 2T(N / 2) + O(N)
This implies that the amount of work done at the first level of the tree is O(N), and the rest of the work is done at the lower levels. This is due to the recursive call in the form of 2T(N / 2). At the next level, as we can see from the figure above, the amount of work done is 2 * O(N / 2) = O(N). Similarly, the amount of work done at the third level is 4 * O(N / 4) = O(N).
Surprisingly, the algorithm has to perform the same amount of work at each level, and that work amounts to O(N), which is the time consumed by the merge procedure. Thus, the number of levels determines the overall time complexity.
As we calculated earlier, the number of levels in our recursion tree is log_2(N), and hence the time complexity of Merge Sort is O(N log(N)).
Yay! We learnt a new methodology for asymptotic analysis in the form of recursion trees. It’s a fun way to build an intuition about the complexity of any recurrence relation. It may not always be feasible to draw out the complete recursion tree, but it definitely helps build an understanding.
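We can also check the O(N log(N)) result numerically. This Python sketch fixes constants that the text leaves open, so treat them as assumptions: merging at size N costs exactly N units of work, and T(1) = 1. Under those assumptions, T(N) works out to N * log_2(N) + N for powers of 2, i.e. N units of work on each of the log_2(N) + 1 levels of the tree.

```python
import math


def T(n):
    """Evaluate T(N) = 2T(N/2) + N with T(1) = 1, for N a power of two."""
    if n == 1:
        return 1                  # leaf: a single element costs one unit
    return 2 * T(n // 2) + n      # two half-size subproblems plus an N-unit merge


for n in [2, 16, 1024]:
    # Compare the recurrence with the closed form N * log_2(N) + N
    print(n, T(n), n * math.log2(n) + n)
```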
Master Method Analysis
We’ve looked at the recursion tree based method for asymptotic analysis of recurrences. However, as mentioned before, it might not be feasible to draw out the recursion tree every time for calculating the complexity.
The merge sort recursion breaks a given problem (array) into two smaller subproblems (subarrays). What if we get an algorithm where a problem is divided into, say, 100 subproblems? We won’t be able to draw out the recursion tree for analysis.
Thus, we need a more direct way for analyzing the complexity of a recurrence relation. We need a method which doesn’t require us to actually draw the recursion tree but one which builds on the same concepts as the recursion tree.
This is where the Master Method comes into the picture. This method is based on the recursion tree method. There are three different scenarios that are covered under the master method which essentially cover most of the recurrence relations. Before looking at these cases, however, let’s look at the recursion tree for the following general recursion relation:
T(n) = a T(n / b) + f(n)
- n is the size of the problem.
- a is the number of subproblems in the recursion.
- n / b is the size of each subproblem. (Here it is assumed that all subproblems are essentially the same size.)
- f(n) is the cost of the work done outside the recursive calls, which includes the cost of dividing the problem into smaller subproblems and the cost of merging the solutions to the subproblems.
To understand the master method, the two most important things to know are the amount of work done by the algorithm at the root and the amount of work done at the leaves.
The work done at the root is simply f(n). The amount of work done at the leaves depends on the height of the tree.
The height of this tree would be log_b(n), i.e. log base b of n. This follows from the recursion tree we saw for merge sort, where b is 2. The number of nodes at any level l is a^l, and so the number of leaf nodes at the last level would be:
a^{log_b(n)} = n ^ log_b(a) nodes.
Since the amount of work done on each subproblem at the final level is Θ(1), the total amount of work done at the leaf nodes is n ^ log_b(a).
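The identity used above for the number of leaves can be derived in one line by writing a as b^(log_b(a)) and swapping the order of the two exponents:

```latex
a^{\log_b n} = \left(b^{\log_b a}\right)^{\log_b n}
             = b^{(\log_b a)(\log_b n)}
             = \left(b^{\log_b n}\right)^{\log_b a}
             = n^{\log_b a}
```

For merge sort (a = 2, b = 2) this gives n^1 = n leaves, one per element of the array.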
If you focus on the generic recurrence relation above, you will notice that there are two main competing forces at play:
The Division step ~ the a T(n / b) term is desperately trying to reproduce, multiplying smaller and smaller copies of itself.
The Conquer step ~ the f(n) term represents merging, since it is desperately trying to collapse these mini-portions together.
The two forces oppose each other, and in doing so, each tries to control the total amount of work done by the algorithm and hence the overall time complexity.
Who will win?
Case 1 (Divide Step Wins)
If f(n) = Θ(n^c) such that c < log_b(a), then T(n) = Θ(n^log_b(a)). f(n) is the amount of work done at the root of the tree and n^log_b(a) is the amount of work done at the leaves.
If the work done at leaves is polynomially more, then leaves are the dominant part, and our result becomes the work done at leaves.
e.g. T(n) = 8 T(n / 2) + 1000 n^2
If we fit this recurrence relation in the Master method, we get:
a = 8, b = 2, and f(n) = Θ(n^2)
Hence, c = 2 and log_b(a) = log_2(8) = 3
Clearly, 2 < 3, so this fits Case 1 of the Master method. This implies that the amount of work done at the leaves of the tree is asymptotically higher than the work done at the root. Hence, the complexity of this recurrence relation is Θ(n^log_2(8)) = Θ(n^3).
Case 2 (Conquer Step Wins)
If f(n) = Θ(n^c) such that c > log_b(a), then T(n) = Θ(f(n)). If the work done at the root is asymptotically more, then our final complexity becomes the work done at the root.
We are not concerned with the amount of work done at the lower levels here, since the largest polynomial term dependent on n is the one that controls the complexity of the algorithm. Hence, the work done on all the lower levels can be safely ignored.
e.g. T(n) = 2 T(n / 2) + n^2
If we fit this recurrence relation in the Master method, we get:
a = 2, b = 2, and f(n) = Θ(n^2)
Hence, c = 2 and log_b(a) = log_2(2) = 1
Clearly, 2 > 1, so this fits Case 2 of the Master method, where the majority of the work is done at the root of the recursion tree; that is why Θ(f(n)) controls the complexity of the algorithm. Thus, the time complexity of this recurrence relation is Θ(n^2).
Case 3 [It’s a tie!]
If f(n) = Θ(n^c) such that c = log_b(a), then T(n) = Θ(n^c log(n)).
The final case is when there’s a tie amongst the work done at the leaves and the work done at the root of the tree.
In this case, both the conquer and the divide steps are equally dominant, and hence the total amount of work done is equal to the work done at any level * the height of the tree.
e.g. T(n) = 2T(n / 2) + O(n)
Yes! This is the recurrence relation of the Merge Sort algorithm. If we fit the recurrence relation for merge sort in the Master method, we get:
a = 2, b = 2, and f(n) = Θ(n^1)
Hence, c = 1 = log_2(2)
This fits the criterion for Case 3 described above. The amount of work done is the same on all the levels, as can be verified from the figure above. Thus, the time complexity is the work done at any level * the total number of levels (the height of the tree), i.e. Θ(n^1 log(n)) = Θ(n log(n)).
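The case analysis above is mechanical enough to write down as code. This Python sketch (master_method is a hypothetical helper) only handles driving functions of the form f(n) = Θ(n^c), which is the form the three cases above are stated for, and it keeps this article’s case numbering (1: leaves win, 2: root wins, 3: tie), which differs from the numbering used in some textbooks.

```python
import math


def master_method(a, b, c):
    """Classify T(n) = a*T(n/b) + Θ(n^c) using the three cases described above.

    Returns (case number, asymptotic bound as a string).
    """
    crit = math.log2(a) / math.log2(b)  # log_b(a): exponent of the work at the leaves
    if math.isclose(c, crit, abs_tol=1e-9):
        # Tie: every level does the same work, times log(n) levels
        return 3, "Θ(log n)" if c == 0 else f"Θ(n^{c:g} log n)"
    if c < crit:
        return 1, f"Θ(n^{crit:g})"  # leaves dominate
    return 2, f"Θ(n^{c:g})"         # root dominates


print(master_method(8, 2, 2))  # (1, 'Θ(n^3)')
print(master_method(2, 2, 2))  # (2, 'Θ(n^2)')
print(master_method(2, 2, 1))  # (3, 'Θ(n^1 log n)')
```

The tolerance in math.isclose guards against floating-point error when log_b(a) is not an exact power.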
We have analyzed the time complexity of the Merge Sort algorithm in two different ways, namely the Recursion Tree and the Master Method. We had to use these techniques because merge sort is a recursive algorithm, and the classical approaches we saw earlier for analyzing loops were of no use here.
Space Complexity: As for the space complexity, we don’t have to use any complicated techniques, and hence the analysis is much simpler. The main space-occupying data structure in the Merge Sort algorithm is the temp buffer array that is used during the merge procedure.
This array is initialized once and its size is N. Another data structure that occupies space is the recursion stack. Essentially, the maximum number of recursive calls that are active at the same time determines the size of the recursion stack. As we’ve seen in the recursion tree representation, the deepest chain of nested calls made by merge sort is essentially the height of the recursion tree.
The height of the recursion tree was log_2(N), and hence the size of the recursion stack will also be about log_2(N) at most.
Hence, the total space complexity for merge sort would be N + log_2(N) = O(N).
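To see the stack bound concretely, this Python sketch mirrors only the index arithmetic of merge_sort (assuming the midpoint split m = (l + r) // 2) and reports the deepest chain of calls that are active at once; max_stack_depth is a hypothetical helper. The count comes out one more than the tree height, because the call on a single element also occupies a frame; either way it grows as O(log(N)).

```python
import math


def max_stack_depth(l, r, depth=1):
    """Largest number of merge_sort(arr, l, r) frames active at the same time.

    Only the index arithmetic of merge_sort matters here, so no array is needed.
    """
    if l >= r:                 # base case: this call returns without recursing
        return depth
    m = (l + r) // 2
    return max(max_stack_depth(l, m, depth + 1),
               max_stack_depth(m + 1, r, depth + 1))


for n in [8, 1000, 1024]:
    # Observed depth next to ceil(log_2(n)) + 1
    print(n, max_stack_depth(0, n - 1), math.ceil(math.log2(n)) + 1)
```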
Binary Search
Remember our friend Pikachu and his search for a Pokemon of a specific power? Poor little Pikachu had 1000 Pokemons at his disposal, and he had to find the one Pokemon with a specific power. Yeah, Pikachu is very choosy about his opponents.
His requirements keep on changing day in and day out, and he certainly cannot go and check with each and every Pokemon every time his requirements change, i.e. he cannot perform a Linear Search through the list of Pokemons to find the one he is looking for.
We mentioned earlier the use of a Hash Table to store the Pokemons, using their unique power value as the key and the Pokemon itself as the value. This would bring the search complexity down to O(1) on average, i.e. constant time.
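A minimal Python sketch of the hash-table idea, using a dict as the hash table. The Pokemon names and power values are hypothetical, and powers are assumed to be unique, as in the text. Lookups take O(1) time on average, while the dict itself needs O(N) extra space.

```python
# Hypothetical data: (name, power) pairs with unique power values
pokemons = [("Pikachu", 320), ("Bulbasaur", 318), ("Charmander", 309), ("Squirtle", 314)]

# Build the table once: power -> Pokemon (O(N) extra space)
by_power = {power: name for name, power in pokemons}


def find_by_power(power):
    """Average-case O(1) lookup; returns None if no Pokemon has that power."""
    return by_power.get(power)


print(find_by_power(314))  # Squirtle
print(find_by_power(999))  # None
```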
However, this makes use of additional space, which raises the space complexity of the search operation to O(N), considering there are N Pokemons available. N in this case would be 1000. What if Pikachu didn’t have all this extra space available and he still wanted to speed up the search process?
Yes! Certainly Pikachu can make use of his profound knowledge about sorting algorithms to come up with a search strategy which would be faster than the slow Linear search.
Pikachu decided to ask his good friend Deoxys for help. Deoxys, being the fastest Pokemon out there, helps Pikachu sort the list of Pokemons according to their power.
Instead of relying on the traditional sorting algorithms, Deoxys makes use of the Quick Sort algorithm (of course he does!) for sorting the Pokemons.
In doing so, he doesn’t need the extra buffer that Merge Sort uses (Quick Sort sorts in place, apart from its recursion stack), and on average the time taken for sorting the N Pokemons is the same as that of the Merge Sort algorithm, O(N log(N)). So, Pikachu is happy with his friend helping him out in his time of need.
Pikachu, being extremely smart, comes up with a search strategy that makes use of the sorted nature of the list of Pokemons. This new strategy/algorithm is known as the Binary Search algorithm. (Note: sorting is a precondition for running binary search; once the list is sorted, Pikachu can run binary search on it as many times as he wants.)
Let’s have a look at the code for this algorithm and then analyze its complexity.
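The binary search code discussed here is not reproduced in this section, so the following is a minimal recursive Python sketch that uses the same parameter names l, r and x; the original implementation may differ in details such as how the midpoint is computed. It assumes arr is already sorted in ascending order.

```python
def binary_search(arr, l, r, x):
    """Return an index of x in the sorted slice arr[l..r] (inclusive), or -1 if absent."""
    if l > r:                      # empty search range: x is not present
        return -1
    mid = l + (r - l) // 2         # middle of the current search range
    if arr[mid] == x:
        return mid
    if arr[mid] > x:               # x can only be in the left half
        return binary_search(arr, l, mid - 1, x)
    return binary_search(arr, mid + 1, r, x)   # x can only be in the right half


powers = [12, 12, 34, 55, 70, 89]  # must already be sorted
print(binary_search(powers, 0, len(powers) - 1, 55))  # 3
print(binary_search(powers, 0, len(powers) - 1, 40))  # -1
```

Each call does a constant amount of work and recurses on at most half of the range, which is exactly the recurrence analyzed next.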
Clearly, the algorithm is recursive in nature. Let’s see if we can use our newly learnt tricks to analyze the time complexity of the binary search algorithm. The two variables l and r define the portion of the array in which we have to search for the given element, x.
If we look at the algorithm, all it’s really doing is halving the search portion of the input array. Apart from a constant-time comparison with the middle element and a recursive call on one half, it doesn’t really do anything. So, let’s quickly look at the recurrence relation for the binary search algorithm.
T(n) = T(n / 2) + O(1)
That seems like a pretty simple recurrence relation to analyze. First, let’s analyze the recursion tree and derive the complexity from it, and then we will look at the Master theorem and see which of the three cases fits this recurrence.
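Since the recursion-tree figure for this recurrence is not reproduced here, the same argument can be written out by unrolling the recurrence. The tree has a single node per level, each doing a constant amount of work (written as c below, for illustration), and there are log_2(n) levels:

```latex
\begin{aligned}
T(n) &= T(n/2) + c \\
     &= T(n/4) + 2c \\
     &= T(n/8) + 3c \\
     &\;\;\vdots \\
     &= T(n/2^k) + kc \\
     &= T(1) + c\log_2 n \qquad \text{(the recursion bottoms out when } n/2^k = 1 \text{, i.e. } k = \log_2 n\text{)} \\
     &= \Theta(\log n)
\end{aligned}
```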
Whoa! This binary search algorithm is super fast. It’s much faster than linear search. What this implies for our cute little friend Pikachu is that, for 1000 Pokemons, he would have to “ask” at most 10 of them to find the one special Pokemon he is looking for, since each question halves the remaining candidates and 2^10 = 1024 is more than 1000.
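To check the “at most 10” claim, this Python sketch counts how many middle elements an iterative version of binary search inspects before it stops (count_probes is a hypothetical helper). It assumes 1000 distinct, sorted power values, here simply 0 to 999, and takes the worst case over all of them.

```python
import math


def count_probes(arr, x):
    """Number of middle elements binary search inspects ("asks") before it stops."""
    l, r, probes = 0, len(arr) - 1, 0
    while l <= r:
        mid = (l + r) // 2
        probes += 1               # one more Pokemon "asked"
        if arr[mid] == x:
            break
        if arr[mid] < x:
            l = mid + 1           # discard the left half
        else:
            r = mid - 1           # discard the right half
    return probes


powers = list(range(1000))        # 1000 sorted, hypothetical power values
worst = max(count_probes(powers, x) for x in powers)
print(worst, math.floor(math.log2(1000)) + 1)
```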
Now let’s see how the more “formulaic” approach to recursive complexity analysis, i.e. the Master method, helps us in this case. The generic master method recurrence relation is
T(n) = a T(n / b) + f(n)
and for our binary search algorithm we have
T(n) = T(n / 2) + O(1)
f(n) = Θ(1) = Θ(n^0), hence c = 0
a = 1, b = 2, c = 0
There are 3 different cases for the master theorem, and comparing c with log_b(a) decides which of the three cases gets used for a particular analysis. In our case, c = 0 and log_2(1) = 0, i.e. 0 = 0. This implies that our binary search algorithm fits Case 3 of the master theorem, therefore T(n) = Θ(n^0 log(n)) = Θ(log(n)).
How to choose the best algorithm?
In this article we introduced the idea of complexity analysis which is an important part of algorithm design and development. We saw why analyzing an algorithm’s complexity is important and how it directly affects our scalability decisions. We even saw some great techniques for analyzing this complexity efficiently and correctly so as to make informed decisions in a timely manner. The question arises, however,
Given all that I know about the time and space complexities of two algorithms, how do I choose which one to finally go with? Is there a golden rule?
The answer to this question, unfortunately, is No!
There’s no golden rule to help you decide which algorithm to go with. It totally depends on a lot of external factors. Let’s try and look at a few of these scenarios that you might find yourself in and also look at the kind of decisions you would want to make.
No constraint on the space!
Well, if you have two algorithms A and B and you want to decide which one to use, apart from the time complexity, the space complexity also becomes an important factor.
However, given that space is not an issue that you are concerned with, it’s best to go with the algorithm that has the capability to reduce the time complexity even further given more space.
For example, Counting Sort is a linear-time sorting algorithm, but it’s heavily dependent upon the amount of space available. More precisely, the range of numbers it can deal with depends on the amount of space available. Given unlimited space, you’re better off using the counting sort algorithm for sorting a huge range of numbers.
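A minimal counting sort sketch in Python makes the space trade-off visible. It assumes the keys are non-negative integers no larger than a known max_value; the count array has max_value + 1 slots regardless of how many numbers are sorted, so the time is O(N + k) and the extra space is O(k), where k is the size of the range.

```python
def counting_sort(nums, max_value):
    """Sort non-negative integers in O(N + k) time using a count array of size k = max_value + 1."""
    counts = [0] * (max_value + 1)       # space grows with the range of values, not with N
    for v in nums:
        counts[v] += 1                   # tally each value
    result = []
    for value, count in enumerate(counts):
        result.extend([value] * count)   # emit each value as many times as it occurred
    return result


print(counting_sort([4, 1, 3, 4, 0, 2], max_value=4))  # [0, 1, 2, 3, 4, 4]
```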
Sub-second latency requirement and limited space available
If you find yourself in such a scenario, then it becomes really important to deeply understand the performance of the algorithm on a lot of varying inputs especially the kind of inputs you expect the algorithm to work with in your application.
For example, we have two sorting algorithms: Bubble sort and Insertion sort, and you want to decide amongst them which one to use for sorting a list of users based on their age. You analyzed the kind of input expected and you found the input array to be almost sorted. In such a scenario, it’s best to use Insertion sort over Bubble sort due to its inherent ability to deal amazingly well with almost sorted inputs.
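This Python sketch of insertion sort counts how many element shifts it performs. The number of shifts equals the number of out-of-order pairs in the input, so an almost sorted list needs only a handful of shifts (close to linear time overall), while a reversed list of N distinct values needs N(N - 1)/2. The ages below are hypothetical.

```python
def insertion_sort(arr):
    """Sort arr in place and return how many element shifts were needed."""
    shifts = 0
    for i in range(1, len(arr)):
        key = arr[i]
        j = i - 1
        # Shift larger elements one slot to the right until key's position is found
        while j >= 0 and arr[j] > key:
            arr[j + 1] = arr[j]
            j -= 1
            shifts += 1
        arr[j + 1] = key
    return shifts


ages = [18, 21, 22, 25, 24, 30, 31, 35, 33, 40]        # hypothetical, almost sorted
print(insertion_sort(ages[:]))                           # 2
print(insertion_sort(sorted(ages, reverse=True)))        # 45
```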
Wait, why would anyone use Bubble or Insertion sort in real world scenarios?
If you think that these algorithms are just for educational purposes and are not used in any real world scenarios, you’re not alone! However, this couldn’t be further from the truth. I’m sure you’ve all used the sort() functionality in Python at some point in your career.
Well, if you’ve used it and marveled at its performance, you’ve used a hybrid algorithm based on Insertion Sort and Merge Sort called Timsort. To read more about it, head over here:
Timsort — the fastest sorting algorithm you’ve never heard of (skerritt.blog)
It’s true that insertion sort might not be useful for very large inputs, as we’ve already seen from its quadratic, O(N^2), time complexity. However, its inherent ability to quickly sort an almost sorted range of numbers is what makes it so special, and that’s precisely the reason it’s used in the Timsort algorithm.
In short, you won’t ever have a clear black and white division between the algorithms you are struggling to choose from. You have to analyze all the properties of the algorithms, including their time and space complexity. You have to consider the size of inputs you are expecting your algorithm to work with and any other constraints that might exist. Considering all these factors, you have to make an informed decision!
If you had a fun time understanding the intricacies of complexity analysis and also playing around with our friend Pikachu, do remember to destroy that like button and spread some love. ❣️
If you want more programming problems with detailed complexity analysis, head over to our kitchen!
Analyzing an algorithm is an important part of any developer’s skill set, and if you feel there are others who might benefit from this article, then do share it as much as possible!