Daily Paper: Problem Solving with Algorithms and Data Structures using Python (Chapter 3)

Vicky__3021

Article link: https://blog.csdn.net/qq_51771374/article/details/112209254

Problem Solving with Algorithms and Data Structures using Python

By Brad Miller and David Ranum, Luther College

3. Analysis

3.1. Objectives

To understand why algorithm analysis is important.

To be able to use “Big-O” to describe execution time.

To understand the “Big-O” execution time of common operations on Python lists and dictionaries.

To understand how the implementation of Python data impacts algorithm analysis.

To understand how to benchmark simple Python programs.

3.2. What Is Algorithm Analysis?

It is very common for beginning computer science students to compare their programs with one another.
You may also have noticed that it is common for computer programs to look very similar, especially the simple ones.
An interesting question often arises.
When two programs solve the same problem but look different, is one program better than the other?

In order to answer this question, we need to remember that there is an important difference between a program and the underlying algorithm that the program is representing.
As we stated in Chapter 1, an algorithm is a generic, step-by-step list of instructions for solving a problem.
It is a method for solving any instance of the problem such that given a particular input, the algorithm produces the desired result.
A program, on the other hand, is an algorithm that has been encoded into some programming language.
There may be many programs for the same algorithm, depending on the programmer and the programming language being used.

To explore this difference further, consider the function shown in ActiveCode 1.
This function solves a familiar problem, computing the sum of the first n integers.
The algorithm uses the idea of an accumulator variable that is initialized to 0.
The solution then iterates through the n integers, adding each to the accumulator.

def sumOfN(n):
   theSum = 0
   for i in range(1,n+1):
       theSum = theSum + i

   return theSum

print(sumOfN(10))
# Output:
55

Now look at the function in ActiveCode 2.
At first glance it may look strange, but upon further inspection you can see that this function is essentially doing the same thing as the previous one.
The reason this is not obvious is poor coding.
We did not use good identifier names to assist with readability, and we used an extra assignment statement during the accumulation step that was not really necessary.

def foo(tom):
    fred = 0
    for bill in range(1,tom+1):
        barney = bill
        fred = fred + barney

    return fred

print(foo(10))
# Output:
55

The question we raised earlier asked whether one function is better than another.
The answer depends on your criteria.
The function sumOfN is certainly better than the function foo if you are concerned with readability.
In fact, you have probably seen many examples of this in your introductory programming course since one of the goals there is to help you write programs that are easy to read and easy to understand.
In this course, however, we are also interested in characterizing the algorithm itself.
(We certainly hope that you will continue to strive to write readable, understandable code.)

Algorithm analysis is concerned with comparing algorithms based upon the amount of computing resources that each algorithm uses.
We want to be able to consider two algorithms and say that one is better than the other because it is more efficient in its use of those resources or perhaps because it simply uses fewer.
From this perspective, the two functions above seem very similar.
They both use essentially the same algorithm to solve the summation problem.

At this point, it is important to think more about what we really mean by computing resources.
There are two different ways to look at this.
One way is to consider the amount of space or memory an algorithm requires to solve the problem.
The amount of space required by a problem solution is typically dictated by the problem instance itself.
Every so often, however, there are algorithms that have very specific space requirements, and in those cases we will be very careful to explain the variations.

As an alternative to space requirements, we can analyze and compare algorithms based on the amount of time they require to execute.
This measure is sometimes referred to as the “execution time” or “running time” of the algorithm.
One way we can measure the execution time for the function sumOfN is to do a benchmark analysis.
This means that we will track the actual time required for the program to compute its result.
In Python, we can benchmark a function by noting the starting time and ending time with respect to the system we are using.
In the time module there is a function called time that will return the current system clock time in seconds since some arbitrary starting point.
By calling this function twice, at the beginning and at the end, and then computing the difference, we can get the elapsed number of seconds (a fraction in most cases) for execution.

Listing 1

import time

def sumOfN2(n):
   start = time.time()

   theSum = 0
   for i in range(1,n+1):
      theSum = theSum + i

   end = time.time()

   return theSum,end-start

Listing 1 shows the original sumOfN function, renamed sumOfN2, with the timing calls embedded before and after the summation.
The function returns a tuple consisting of the result and the amount of time (in seconds) required for the calculation.
If we perform 5 invocations of the function, each computing the sum of the first 10,000 integers, we get the following:

>>> for i in range(5):
...     print("Sum is %d required %10.7f seconds" % sumOfN2(10000))
Sum is 50005000 required  0.0018950 seconds
Sum is 50005000 required  0.0018620 seconds
Sum is 50005000 required  0.0019171 seconds
Sum is 50005000 required  0.0019162 seconds
Sum is 50005000 required  0.0019360 seconds

We discover that the time is fairly consistent and it takes on average about 0.0019 seconds to execute that code.
What if we run the function adding the first 100,000 integers?

>>> for i in range(5):
...     print("Sum is %d required %10.7f seconds" % sumOfN2(100000))
Sum is 5000050000 required  0.0199420 seconds
Sum is 5000050000 required  0.0180972 seconds
Sum is 5000050000 required  0.0194821 seconds
Sum is 5000050000 required  0.0178988 seconds
Sum is 5000050000 required  0.0188949 seconds
>>>

Again, the time required for each run, although longer, is very consistent, averaging about 10 times as long.
For n equal to 1,000,000 we get:

>>> for i in range(5):
...     print("Sum is %d required %10.7f seconds" % sumOfN2(1000000))
Sum is 500000500000 required  0.1948988 seconds
Sum is 500000500000 required  0.1850290 seconds
Sum is 500000500000 required  0.1809771 seconds
Sum is 500000500000 required  0.1729250 seconds
Sum is 500000500000 required  0.1646299 seconds
>>>

In this case, the average again turns out to be about 10 times the previous.
Now consider ActiveCode 3, which shows a different means of solving the summation problem.
This function, sumOfN3, takes advantage of a closed equation $\sum_{i=1}^{n} i = \frac{(n)(n+1)}{2}$ to compute the sum of the first n integers without iterating.

def sumOfN3(n):
   return (n*(n+1))/2

print(sumOfN3(10))
# Output:
55.0

If we do the same benchmark measurement for sumOfN3, using five different values for n (10,000, 100,000, 1,000,000, 10,000,000, and 100,000,000), we get the following results:

Sum is 50005000 required 0.00000095 seconds
Sum is 5000050000 required 0.00000191 seconds
Sum is 500000500000 required 0.00000095 seconds
Sum is 50000005000000 required 0.00000095 seconds
Sum is 5000000050000000 required 0.00000119 seconds

There are two important things to notice about this output.
First, the times recorded above are shorter than any of the previous examples.
Second, they are very consistent no matter what the value of n.
It appears that sumOfN3 is hardly impacted by the number of integers being added.
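
The comparison above can be reproduced with a small sketch like the following. The names sum_of_n_loop, sum_of_n_formula, and time_call are our own and do not come from the chapter. We also assume time.perf_counter instead of time.time, since it is a clock intended for measuring short intervals, and we use integer division so the formula returns an integer. The timings you see will differ from those printed in the text.

```python
import time

def sum_of_n_loop(n):
    """Iterative summation, the same idea as sumOfN."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total

def sum_of_n_formula(n):
    """Closed-form summation; // keeps the result an integer."""
    return n * (n + 1) // 2

def time_call(func, n):
    """Return (result, elapsed seconds) for a single call of func(n)."""
    start = time.perf_counter()
    result = func(n)
    return result, time.perf_counter() - start

for n in (10_000, 100_000, 1_000_000):
    loop_result, loop_secs = time_call(sum_of_n_loop, n)
    formula_result, formula_secs = time_call(sum_of_n_formula, n)
    assert loop_result == formula_result  # both approaches agree on the sum
    print(f"n={n:>9,}  loop {loop_secs:.7f} s  formula {formula_secs:.7f} s")
```

On most machines the loop column should grow roughly tenfold from row to row while the formula column stays nearly flat.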

But what does this benchmark really tell us?
Intuitively, we can see that the iterative solutions seem to be doing more work since some program steps are being repeated.
This is likely the reason it is taking longer.
Also, the time required for the iterative solution seems to increase as we increase the value of n.
However, there is a problem.
If we ran the same function on a different computer or used a different programming language, we would likely get different results.
It could take even longer to perform sumOfN3 if the computer were older.

We need a better way to characterize these algorithms with respect to execution time.
The benchmark technique computes the actual time to execute.
It does not really provide us with a useful measurement, because it is dependent on a particular machine, program, time of day, compiler, and programming language.
Instead, we would like to have a characterization that is independent of the program or computer being used.
This measure would then be useful for judging the algorithm alone and could be used to compare algorithms across implementations.

3.3. Big-O Notation

When trying to characterize an algorithm’s efficiency in terms of execution time, independent of any particular program or computer, it is important to quantify the number of operations or steps that the algorithm will require.
If each of these steps is considered to be a basic unit of computation, then the execution time for an algorithm can be expressed as the number of steps required to solve the problem.
Deciding on an appropriate basic unit of computation can be a complicated problem and will depend on how the algorithm is implemented.

A good basic unit of computation for comparing the summation algorithms shown earlier might be to count the number of assignment statements performed to compute the sum.
In the function sumOfN, the number of assignment statements is 1 (theSum=0) plus the value of n (the number of times we perform theSum=theSum+i).
We can denote this by a function, call it T, where T(n)=1+n.
The parameter n is often referred to as the “size of the problem,” and we can read this as “T(n) is the time it takes to solve a problem of size n, namely 1+n steps.”
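
To make the count concrete, here is a minimal sketch that instruments the summation loop with a counter. The function name sum_of_n_counted and the assignments variable are hypothetical additions for illustration. The counter's own updates are bookkeeping and are deliberately not counted.

```python
def sum_of_n_counted(n):
    """Return (sum, number of assignment statements sumOfN would execute)."""
    assignments = 0
    the_sum = 0
    assignments += 1              # counts theSum = 0
    for i in range(1, n + 1):
        the_sum = the_sum + i
        assignments += 1          # counts theSum = theSum + i
    return the_sum, assignments

for n in (10, 100, 1000):
    total, steps = sum_of_n_counted(n)
    print(n, total, steps)        # steps is always 1 + n
```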

In the summation functions given above, it makes sense to use the number of terms in the summation to denote the size of the problem.
We can then say that the sum of the first 100,000 integers is a bigger instance of the summation problem than the sum of the first 1,000.
Because of this, it might seem reasonable that the time required to solve the larger case would be greater than for the smaller case.
Our goal then is to show how the algorithm’s execution time changes with respect to the size of the problem.

Computer scientists prefer to take this analysis technique one step further.
It turns out that the exact number of operations is not as important as determining the most dominant part of the T(n) function.
In other words, as the problem gets larger, some portion of the T(n) function tends to overpower the rest.
This dominant term is what, in the end, is used for comparison.
The order of magnitude function describes the part of T(n) that increases the fastest as the value of n increases.
Order of magnitude is often called Big-O notation (for “order”) and written as O(f(n)).
It provides a useful approximation to the actual number of steps in the computation.
The function f(n) provides a simple representation of the dominant part of the original T(n).

In the above example, T(n)=1+n.
As n gets large, the constant 1 will become less and less significant to the final result.
If we are looking for an approximation for T(n), then we can drop the 1 and simply say that the running time is O(n).
It is important to note that the 1 is certainly significant for T(n).
However, as n gets large, our approximation will be just as accurate without it.

As another example, suppose that for some algorithm, the exact number of steps is T(n)=5n²+27n+1005.
When n is small, say 1 or 2, the constant 1005 seems to be the dominant part of the function.
However, as n gets larger, the n² term becomes the most important.
In fact, when n is really large, the other two terms become insignificant in the role that they play in determining the final result.
Again, to approximate T(n) as n gets large, we can ignore the other terms and focus on 5n².
In addition, the coefficient 5 becomes insignificant as n gets large.
We would say then that the function T(n) has an order of magnitude f(n)=n², or simply that it is O(n²).
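
A quick numerical check shows how fast the n² term takes over. The sketch below evaluates T(n)=5n²+27n+1005 for a few values of n and prints the fraction of T(n) contributed by 5n². The helper name t is ours, chosen only for this illustration.

```python
def t(n):
    """Exact step count from the example: T(n) = 5n^2 + 27n + 1005."""
    return 5 * n**2 + 27 * n + 1005

for n in (1, 10, 100, 1000, 10000):
    share = 5 * n**2 / t(n)       # fraction of T(n) due to the 5n^2 term
    print(f"n={n:>6}  T(n)={t(n):>12}  share of 5n^2 = {share:.4f}")
```

For n=1 the quadratic term contributes less than one percent of the total, and by n=10,000 it contributes more than 99.9 percent.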

Although we do not see this in the summation example, sometimes the performance of an algorithm depends on the exact values of the data rather than simply the size of the problem.
For these kinds of algorithms we need to characterize their performance in terms of best case, worst case, or average case performance.
The worst case performance refers to a particular data set where the algorithm performs especially poorly.
A different data set for the exact same algorithm, by contrast, might have extraordinarily good performance.
However, in most cases the algorithm performs somewhere in between these two extremes (average case).
It is important for a computer scientist to understand these distinctions so they are not misled by one particular case.
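
The chapter does not give an example at this point, so the following is our own illustration rather than the authors': a left-to-right scan of a list, where the number of comparisons depends on where (or whether) the target appears. The function name sequential_search and the sample data are assumptions made for the sketch.

```python
def sequential_search(items, target):
    """Return (found, comparisons made) for a left-to-right scan."""
    comparisons = 0
    for item in items:
        comparisons += 1
        if item == target:
            return True, comparisons
    return False, comparisons

data = list(range(1, 101))             # 100 items: 1, 2, ..., 100
print(sequential_search(data, 1))      # best case: target is the first item
print(sequential_search(data, 100))    # target is the last item
print(sequential_search(data, 500))    # worst case: target is absent
```

The problem size is the same in all three calls, yet the work ranges from a single comparison to one comparison per item.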

A number of very common order of magnitude functions will come up over and over as you study algorithms.
These are shown in Table 1.
In order to decide which of these functions is the dominant part of any T(n) function, we must see how they compare with one another as n gets large.

Table 1: Common Functions for Big-O

f(n)       Name
1          Constant
log n      Logarithmic
n          Linear
n log n    Log Linear
n²         Quadratic
n³         Cubic
2ⁿ         Exponential

Figure 1 shows graphs of the common functions from Table 1.
Notice that when n is small, the functions are not clearly separated from one another.
It is hard to tell which is dominant.
However, as n grows, there is a definite relationship and it is easy to see how they compare with one another.

Figure 1: Plot of Common Big-O Functions

As a final example, suppose that we have the fragment of Python code shown in Listing 2.
Although this program does not really do anything, it is instructive to see how we can take actual code and analyze performance.

Listing 2

a=5
b=6
c=10
for i in range(n):
   for j in range(n):
      x = i * i
      y = j * j
      z = i * j
for k in range(n):
   w = a*k + 45
   v = b*b
d = 33

The number of assignment operations is the sum of four terms.
The first term is the constant 3, representing the three assignment statements at the start of the fragment.
The second term is 3n², since there are three statements that are performed n² times due to the nested iteration.
The third term is 2n, two statements iterated n times.
Finally, the fourth term is the constant 1, representing the final assignment statement.
This gives us T(n)=3+3n²+2n+1=3n²+2n+4.
By looking at the exponents, we can easily see that the n² term will be dominant and therefore this fragment of code is O(n²).
Note that all of the other terms as well as the coefficient on the dominant term can be ignored as n grows larger.
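
The count can be checked mechanically. The sketch below mirrors the loop structure of Listing 2 but replaces each group of assignments with an increment of a counter. The function name count_assignments is a hypothetical helper, not part of the listing.

```python
def count_assignments(n):
    """Count the assignment statements Listing 2 executes for a given n."""
    count = 3                      # a=5, b=6, c=10
    for i in range(n):
        for j in range(n):
            count += 3             # x, y, z inside the nested loops
    for k in range(n):
        count += 2                 # w, v inside the single loop
    count += 1                     # d = 33
    return count

for n in (1, 10, 100):
    assert count_assignments(n) == 3 * n**2 + 2 * n + 4
    print(n, count_assignments(n))
```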

Figure 2: Comparing T(n) with Common Big-O Functions

Figure 2 shows a few of the common Big-O functions as they compare with the T(n) function discussed above.
Note that T(n) is initially larger than the cubic function.
However, as n grows, the cubic function quickly overtakes T(n).
It is easy to see that T(n) then follows the quadratic function as n continues to grow.

Self Check
Write two Python functions to find the minimum number in a list.
The first function should compare each number to every other number on the list, which makes it O(n²).
The second function should be linear, O(n).
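
One possible answer is sketched below. Try the exercise yourself first. The function names are our own, and other correct solutions exist. Both functions raise ValueError on an empty list, which is a design choice of this sketch rather than a requirement of the exercise.

```python
def find_min_quadratic(numbers):
    """O(n^2): test each number against every number in the list."""
    for candidate in numbers:
        # candidate is the minimum if nothing in the list is smaller
        if all(candidate <= other for other in numbers):
            return candidate
    raise ValueError("empty list")

def find_min_linear(numbers):
    """O(n): a single pass that remembers the smallest value seen so far."""
    if not numbers:
        raise ValueError("empty list")
    smallest = numbers[0]
    for value in numbers[1:]:
        if value < smallest:
            smallest = value
    return smallest

sample = [5, 3, 8, 1, 9]
print(find_min_quadratic(sample), find_min_linear(sample))
```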

3.4. An Anagram Detection Example

A good example problem for showing algorithms with different orders of magnitude is the classic anagram detection problem for strings.
One string is an anagram of another if the second is simply a rearrangement of the first.
For example, ‘heart’ and ‘earth’ are anagrams.
The strings ‘python’ and ‘typhon’ are anagrams as well.
For the sake of simplicity, we will assume that the two strings in question are of equal length and that they are made up of symbols from the set of 26 lowercase alphabetic characters.
Our goal is to write a boolean function that will take two strings and return whether they are anagrams.

3.4.1. Solution 1: Checking Off

Our first solution to the anagram problem will check the lengths of the strings and then check that each character in the first string actually occurs in the second.
If it is possible to “check off” each character, then the two strings must be anagrams.
Checking off a character will be accomplished by replacing it with the special Python value None.
However, since strings in Python are immutable, the first step in the process will be to convert the second string to a list.
Each character from the first string can be checked against the characters in the list and if found, checked off by replacement.

ActiveCode 1 shows this function.

def anagramSolution1(s1,s2):
    stillOK = True
    if len(s1) != len(s2):
        stillOK = False

    alist = list(s2)
    pos1 = 0

    while pos1 < len(s1) and stillOK:
        pos2 = 0
        found = False
        while pos2 < len(alist) and not found:
            if s1[pos1] == alist[pos2]:
                found = True
            else:
                pos2 = pos2 + 1

        if found:
            alist[pos2] = None
        else:
            stillOK = False

        pos1 = pos1 + 1

    return stillOK

print(anagramSolution1('abcd','dcba'))

# Output:
True

To analyze this algorithm, we need to note that each of the n characters in s1 will cause an iteration through up to n characters in the list from s2.
Each of the n positions in the list will be visited once to match a character from s1.
The number of visits then becomes the sum of the integers from 1 to n.
We stated earlier that this can be written as
$\sum_{i=1}^{n} i = \frac{n(n+1)}{2} = \frac{1}{2}n^{2} + \frac{1}{2}n$
As n gets large, the n² term will dominate the n term and the 1/2 can be ignored.
Therefore, this solution is O(n²).
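
To see the sum appear in practice, the sketch below re-implements the checking-off idea with a counter for character comparisons. It is a simplified variant written for this illustration, not the ActiveCode function itself, and the name checking_off_comparisons is hypothetical.

```python
def checking_off_comparisons(s1, s2):
    """Return (is_anagram, comparisons) using the checking-off idea."""
    if len(s1) != len(s2):
        return False, 0
    remaining = list(s2)
    comparisons = 0
    for ch in s1:
        for pos, candidate in enumerate(remaining):
            comparisons += 1
            if candidate == ch:
                remaining[pos] = None   # check the character off
                break
        else:
            return False, comparisons   # ch was not found in what remains
    return True, comparisons

print(checking_off_comparisons('abcd', 'dcba'))
```

For 'abcd' and 'dcba' the inner loop makes 4, 3, 2, and 1 comparisons, for a total of 10, which matches n(n+1)/2 with n=4.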

3.4.2. Solution 2: Sort and Compare

Another solution to the anagram problem will make use of the fact that even though s1 and s2 are different, they are anagrams only if they consist of exactly the same characters.
So, if we begin by sorting each string alphabetically, from a to z, we will end up with the same string if the original two strings are anagrams.
ActiveCode 2 shows this solution.
Again, in Python we can use the built-in sort method on lists by simply converting each string to a list at the start.

def anagramSolution2(s1,s2):
    alist1 = list(s1)
    alist2 = list(s2)

    alist1.sort()
    alist2.sort()

    pos = 0
    matches = True

    while pos < len(s1) and matches:
        if alist1[pos]==alist2[pos]:
            pos = pos + 1
        else:
            matches = False

    return matches

print(anagramSolution2('abcde','edcba'))

# Output:
True

At first glance you may be tempted to think that this algorithm is O(n), since there is one simple iteration to compare the n characters after the sorting process.
However, the two calls to the Python sort method are not without their own cost.
As we will see in a later chapter, sorting is typically either O(n²) or O(nlogn), so the sorting operations dominate the iteration.
In the end, this algorithm will have the same order of magnitude as that of the sorting process.
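
The same sort-and-compare idea can be written more compactly with the built-in sorted function, which returns a new sorted list of the characters. This is a sketch of an equivalent formulation, with the name anagram_sorted chosen for illustration. The cost is still dominated by the two sorts.

```python
def anagram_sorted(s1, s2):
    """Two strings are anagrams if their sorted characters are identical."""
    return sorted(s1) == sorted(s2)

print(anagram_sorted('abcde', 'edcba'))
print(anagram_sorted('heart', 'earth'))
print(anagram_sorted('python', 'pythons'))
```

Comparing the two lists with == also handles strings of different lengths, so no separate length check is needed here.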

3.5. Performance of Python Data Structures

Now that you have a general idea of Big-O notation and the differences between the different functions, our goal in this section is to tell you about the Big-O performance for the operations on Python lists and dictionaries.
We will then show you some timing experiments that illustrate the costs and benefits of using certain operations on each data structure.
It is important for you to understand the efficiency of these Python data structures because they are the building blocks we will use as we implement other data structures in the remainder of the book.
In this section we are not going to explain why the performance is what it is.
In later chapters you will see some possible implementations of both lists and dictionaries and how the performance depends on the implementation.

3.6. Lists

The designers of Python had many choices to make when they implemented the list data structure.
Each of these choices could have an impact on how fast list operations perform.
To help them make the right choices they looked at the ways that people would most commonly use the list data structure and they optimized their implementation of a list so that the most common operations were very fast.
Of course they also tried to make the less common operations fast, but when a tradeoff had to be made the performance of a less common operation was often sacrificed in favor of the more common operation.

Two common operations are indexing and assigning to an index position.
Both of these operations take the same amount of time no matter how large the list becomes.
Operations like these, whose cost is independent of the size of the list, are O(1).

Another very common programming task is to grow a list.
There are two ways to create a longer list.
You can use the append method or the concatenation operator.
The append method is O(1).
However, the concatenation operator is O(k) where k is the size of the list that is being concatenated.
This is important for you to know because it can help you make your own programs more efficient by choosing the right tool for the job.

Let’s look at four different ways we might generate a list of n numbers starting with 0.
First we’ll try a for loop and create the list by concatenation, then we’ll use append rather than concatenation.
Next, we’ll try creating the list using a list comprehension and finally, and perhaps the most obvious way, using the range function wrapped by a call to the list constructor.
Listing 3 shows the code for making our list four different ways.

Listing 3

def test1():
    l = []
    for i in range(1000):
        l = l + [i]

def test2():
    l = []
    for i in range(1000):
        l.append(i)

def test3():
    l = [i for i in range(1000)]

def test4():
    l = list(range(1000))

To capture the time it takes for each of our functions to execute we will use Python’s timeit module.
The timeit module is designed to allow Python developers to make cross-platform timing measurements by running functions in a consistent environment and using timing mechanisms that are as similar as possible across operating systems.

To use timeit you create a Timer object whose parameters are two Python statements.
The first parameter is a Python statement that you want to time; the second parameter is a statement that will run once to set up the test.
The timeit module will then time how long it takes to execute the statement some number of times.
By default timeit will try to run the statement one million times.
When it’s done it returns the time as a floating point value representing the total number of seconds.
However, since it executes the statement a million times you can read the result as the number of microseconds to execute the test one time.
You can also pass timeit a named parameter called number that allows you to specify how many times the test statement is executed.
The following session shows how long it takes to run each of our test functions 1000 times.

from timeit import Timer

# timeit returns the total seconds for all runs; with number=1000 that
# figure also equals the average milliseconds per call, hence the labels.
t1 = Timer("test1()", "from __main__ import test1")
print("concat ",t1.timeit(number=1000), "milliseconds")
t2 = Timer("test2()", "from __main__ import test2")
print("append ",t2.timeit(number=1000), "milliseconds")
t3 = Timer("test3()", "from __main__ import test3")
print("comprehension ",t3.timeit(number=1000), "milliseconds")
t4 = Timer("test4()", "from __main__ import test4")
print("list range ",t4.timeit(number=1000), "milliseconds")

concat  6.54352807999 milliseconds
append  0.306292057037 milliseconds
comprehension  0.147661924362 milliseconds
list range  0.0655000209808 milliseconds

In the experiment above the statement that we are timing is the function call to test1(), test2(), and so on.
The setup statement may look very strange to you, so let’s consider it in more detail.
You are probably very familiar with the from ... import statement, but this is usually used at the beginning of a Python program file.
In this case the statement from __main__ import test1 imports the function test1 from the __main__ namespace into the namespace that timeit sets up for the timing experiment.
The timeit module does this because it wants to run the timing tests in an environment that is uncluttered by any stray variables you may have created, that may interfere with your function’s performance in some unforeseen way.

From the experiment above it is clear that the append operation at 0.30 milliseconds is much faster than concatenation at 6.54 milliseconds.
In the above experiment we also show the times for two additional methods for creating a list: using the list constructor with a call to range, and a list comprehension.
It is interesting to note that the list comprehension is twice as fast as a for loop with an append operation.

One final observation about this little experiment is that all of the times that you see above include some overhead for actually calling the test function, but we can assume that the function call overhead is identical in all four cases so we still get a meaningful comparison of the operations.
关于这个小实验的最后一点观察是，上面看到的所有时间都包含了实际调用测试函数的一些开销，但我们可以假设这四种情况下的函数调用开销是相同的，因此我们仍然可以对这些操作进行有意义的比较。
So it would not be accurate to say that the concatenation operation takes 6.54 milliseconds but rather the concatenation test function takes 6.54 milliseconds.
因此，说连接操作需要6.54毫秒并不准确，准确的说法是连接测试函数需要6.54毫秒。
As an exercise you could test the time it takes to call an empty function and subtract that from the numbers above.
作为练习，您可以测试调用一个空函数并从上面的数字中减去它所需的时间。
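One way to approach the exercise just mentioned is sketched below. It assumes that `test_append` mirrors the append-based test function timed earlier; `empty` and `net_time` are hypothetical helper names introduced only for this illustration.

```python
from timeit import Timer

def empty():
    """Does nothing; timing it approximates the bare function call overhead."""
    pass

def test_append():
    """Assumed to match the append-based test function from the experiment."""
    l = []
    for i in range(1000):
        l.append(i)

def net_time(func, number=1000):
    """Return (raw, overhead, raw - overhead), each in seconds for `number` calls."""
    raw = Timer(func).timeit(number=number)        # Timer also accepts a callable
    overhead = Timer(empty).timeit(number=number)  # cost of calling a no-op
    return raw, overhead, raw - overhead

if __name__ == "__main__":
    raw, overhead, net = net_time(test_append)
    print("raw %.6f s, call overhead %.6f s, net %.6f s" % (raw, overhead, net))
```

The overhead is usually tiny compared with the work done inside the test functions, which is why the comparison between the four approaches remains meaningful.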

Now that we have seen how performance can be measured concretely you can look at Table 2 to see the Big-O efficiency of all the basic list operations.
现在我们已经看到了如何具体地度量性能，您可以查看表2来查看所有基本列表操作的Big-O效率。
After thinking carefully about Table 2, you may be wondering about the two different times for pop.
仔细考虑过表2后，你可能会想知道“pop”有两个不同的时间。
When pop is called on the end of the list it takes O(1) but when pop is called on the first element in the list or anywhere in the middle it is O(n).
当在列表末尾调用 pop 时，耗时为O(1)；但当对列表的第一个元素或中间任何位置调用 pop 时，耗时为O(n)。
The reason for this lies in how Python chooses to implement lists.
其原因在于Python选择如何实现列表。
When an item is taken from the front of the list, in Python’s implementation, all the other elements in the list are shifted one position closer to the beginning.
在Python的实现中，当一个元素从列表的最前面取出时，列表中的所有其他元素都会向最开始的位置移动一个位置。
This may seem silly to you now, but if you look at Table 2 you will see that this implementation also allows the index operation to be O(1).
现在看来，这可能有点傻，但是如果您看一下表2，就会发现这个实现还允许索引操作为O(1)。
This is a tradeoff that the Python implementors thought was a good one.
Python实现者认为这是一个很好的折衷。
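The tradeoff described above can be illustrated with a toy model of a contiguous array. This is only a sketch for counting element moves, not CPython's actual list implementation; `ArrayList`, `pop_front`, `pop_end`, and `shifts` are hypothetical names.

```python
class ArrayList:
    """Toy model of a list stored in contiguous slots (illustration only)."""

    def __init__(self, items):
        self._slots = list(items)      # backing storage
        self._size = len(self._slots)  # number of slots currently in use
        self.shifts = 0                # counts how many elements were moved

    def __len__(self):
        return self._size

    def __getitem__(self, i):
        # Position i is found directly, whatever the length: O(1).
        if not 0 <= i < self._size:
            raise IndexError("index out of range")
        return self._slots[i]

    def pop_end(self):
        # Nothing has to move when the last element leaves: O(1).
        if self._size == 0:
            raise IndexError("pop from empty list")
        self._size -= 1
        return self._slots[self._size]

    def pop_front(self):
        # Every remaining element moves one slot toward the front: O(n).
        if self._size == 0:
            raise IndexError("pop from empty list")
        first = self._slots[0]
        for i in range(1, self._size):
            self._slots[i - 1] = self._slots[i]
            self.shifts += 1
        self._size -= 1
        return first

if __name__ == "__main__":
    for n in (10, 100, 1000):
        a = ArrayList(range(n))
        a.pop_front()
        print(n, "elements:", a.shifts, "moves for one pop_front")
```

Because the elements always sit in consecutive slots, indexing never has to search, and that is the benefit bought by the extra shifting work.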

Table 2: Big-O Efficiency of Python List Operators
表2:Python列表操作符的Big-O效率

Operation 操作	Big-O Efficiency Big-O效率
index []	O(1)
index assignment	O(1)
append	O(1)
pop()	O(1)
pop(i)	O(n)
insert(i,item)	O(n)
del operator	O(n)
iteration	O(n)
contains (in)	O(n)
get slice [x:y]	O(k)
del slice	O(n)
set slice	O(n+k)
reverse	O(n)
concatenate	O(k)
sort	O(n log n)
multiply	O(nk)

As a way of demonstrating this difference in performance let’s do another experiment using the timeit module.
为了演示这种性能差异，让我们使用’ timeit '模块做另一个实验。
Our goal is to be able to verify the performance of the pop operation on a list of a known size when the program pops from the end of the list, and again when the program pops from the beginning of the list.
我们的目标是，当程序从列表的末尾弹出时，以及当程序从列表的开头弹出时，能够验证在已知大小的列表上的“pop”操作的性能。
We will also want to measure this time for lists of different sizes.
我们还想测量不同大小列表的时间。
What we would expect to see is that the time required to pop from the end of the list will stay constant even as the list grows in size, while the time to pop from the beginning of the list will continue to increase as the list grows.
我们期望看到的是，从列表的末尾弹出所需的时间将保持不变，即使列表的大小在增长，而从列表的开头弹出所需的时间将继续随着列表的增长而增长。

Listing 4 shows one attempt to measure the difference between the two uses of pop.
清单4显示了测量pop两种用法之间的差异的一个尝试。
As you can see from this first example, popping from the end takes 0.0003 milliseconds, whereas popping from the beginning takes 4.82 milliseconds.
从第一个示例中可以看到，从末尾弹出需要0.0003毫秒，而从开始弹出需要4.82毫秒。
For a list of two million elements this is a factor of 16,000.
对于一个有200万个元素的列表来说，这是16000的倍数。

There are a couple of things to notice about Listing 4.
关于清单4，有两点需要注意。
The first is the statement from __main__ import x.
第一个是 from __main__ import x 语句。
Although we did not define a function we do want to be able to use the list object x in our test.
虽然我们没有定义一个函数，但我们确实希望能够在测试中使用列表对象x。
This approach allows us to time just the single pop statement and get the most accurate measure of the time for that single operation.
这种方法允许我们只计算单个“pop”语句的时间，并为单个操作获得最精确的时间度量。
Because the timer repeats 1000 times it is also important to point out that the list is decreasing in size by 1 each time through the loop.
因为计时器重复了1000次，所以需要指出的是，在整个循环中，列表的大小每次减少1。
But since the initial list is two million elements in size we only reduce the overall size by 0.05%.
但是由于初始列表的大小是200万个元素，所以我们只减少了0.05%的总大小。

Listing 4
清单4

import timeit

popzero = timeit.Timer("x.pop(0)",
                       "from __main__ import x")
popend = timeit.Timer("x.pop()",
                      "from __main__ import x")

x = list(range(2000000))
popzero.timeit(number=1000)
4.8213560581207275

x = list(range(2000000))
popend.timeit(number=1000)
0.0003161430358886719
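The figures quoted for Listing 4 can be checked with a little arithmetic. The sketch below uses only the two timings shown in the session above; the variable names are introduced here for illustration.

```python
# Timings copied from the Listing 4 session: total seconds for 1000 pops each.
popzero_total = 4.8213560581207275
popend_total = 0.0003161430358886719
runs = 1000

# Total seconds over 1000 runs equals the average milliseconds per run.
popzero_ms = popzero_total / runs * 1000
popend_ms = popend_total / runs * 1000

ratio = popzero_total / popend_total   # using the unrounded timings
ratio_rounded = 4.82 / 0.0003          # using the figures as rounded in the text

# Fraction of the two-million-element list removed by 1000 pops, in percent.
shrink_percent = runs / 2000000 * 100

print("pop(0): %.4f ms per call" % popzero_ms)
print("pop():  %.6f ms per call" % popend_ms)
print("ratio (unrounded): %.0f" % ratio)
print("ratio (rounded figures): %.0f" % ratio_rounded)
print("list shrinks by %.2f%%" % shrink_percent)
```

With the unrounded timings the ratio comes out a little above 15,000; rounding the faster time down to 0.0003 first gives the figure of roughly 16,000 quoted in the text.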

While our first test does show that pop(0) is indeed slower than pop(), it does not validate the claim that pop(0) is O(n) while pop() is O(1).
虽然我们的第一个测试确实表明’ pop(0) ‘确实比’ pop() ‘慢，但它没有验证’ pop(0) ‘是O(n)而’ pop() '是O(1)的声明。
To validate that claim we need to look at the performance of both calls over a range of list sizes.
为了验证这一声明，我们需要查看在一系列列表大小下这两个调用的性能。
Listing 5 implements this test.
清单5实现了这个测试。

Listing 5
清单5

from timeit import Timer

popzero = Timer("x.pop(0)",
                "from __main__ import x")
popend = Timer("x.pop()",
               "from __main__ import x")
print("pop(0)   pop()")
for i in range(1000000,100000001,1000000):
    x = list(range(i))
    pt = popend.timeit(number=1000)
    x = list(range(i))
    pz = popzero.timeit(number=1000)
    print("%15.5f, %15.5f" %(pz,pt))

Figure 3 shows the results of our experiment.
图3显示了我们的实验结果。
You can see that as the list gets longer and longer the time it takes to pop(0) also increases while the time for pop stays very flat.
你可以看到，随着列表变得越来越长，“pop(0)”所花费的时间也在增加，而“pop”所花费的时间却非常平稳。
This is exactly what we would expect to see for an O(n) and an O(1) algorithm, respectively.
这正是我们在O(n)和O(1)算法中所期望看到的。

Some sources of error in our little experiment include the fact that other processes running on the computer while we measure may slow down our code, so even though we try to minimize other activity on the computer there is bound to be some variation in time.
我们这个小实验的误差来源之一是：在测量时计算机上还运行着其他进程，它们可能会拖慢我们的代码；因此，即使我们尽量减少计算机上的其他活动，测得的时间也必然会有一些波动。
That is why the loop runs the test one thousand times in the first place: to gather enough information statistically to make the measurement reliable.
这就是为什么该循环首先运行测试一千次，以收集足够的统计信息，使测量可靠。
Figure 3: Comparing the Performance of pop and pop(0)
图3:比较’ pop ‘和’ pop(0) '的性能
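Besides running the statement many times, another common way to dampen the noise mentioned above is to repeat the whole measurement and look at the fastest run. The sketch below uses `timeit.repeat` from the standard library; the helper name `pop_end_times` and the list size are assumptions made for this illustration.

```python
import timeit

def pop_end_times(size=100000, repeat=5, number=1000):
    """Time x.pop() in `repeat` separate measurements; return the list of totals."""
    return timeit.repeat(
        stmt="x.pop()",
        setup="x = list(range(%d))" % size,  # a fresh list for every measurement
        repeat=repeat,
        number=number,                       # must not exceed `size`
    )

if __name__ == "__main__":
    times = pop_end_times()
    print("all runs:", times)
    print("best run:", min(times))
```

The minimum is often preferred over the mean here, on the reasoning that slower runs mostly reflect interference from other processes rather than the code being measured.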

3.7. Dictionaries

3.7. 字典

The second major Python data structure is the dictionary.
Python的第二个主要数据结构是字典。
As you probably recall, dictionaries differ from lists in that you can access items in a dictionary by a key rather than a position.
您可能还记得，字典与列表的不同之处在于，您可以通过键而不是位置访问字典中的项。
Later in this book you will see that there are many ways to implement a dictionary.
在本书的后面，你会看到有很多方法来实现字典。
The thing that is most important to notice right now is that the get item and set item operations on a dictionary are O(1).
现在需要注意的最重要的一点是，字典上的get项和set项操作是O(1)。
Another important dictionary operation is the contains operation.
另一个重要的字典操作是contains操作。
Checking to see whether a key is in the dictionary or not is also O(1).
检查一个键是否在字典中也是O(1)。
The efficiency of all dictionary operations is summarized in Table 3.
表3总结了所有字典操作的效率。
One important side note on dictionary performance is that the efficiencies we provide in the table are for average performance.
关于字典性能的一个重要提示是，我们在表中提供的效率是针对平均性能的。
In some rare cases the contains, get item, and set item operations can degenerate into O(n) performance but we will get into that in a later chapter when we talk about the different ways that a dictionary could be implemented.
在一些罕见的情况下，包含、获取项和设置项操作可能会退化为O(n)的性能，但我们将在后面一章讨论实现字典的不同方式时对此进行讨论。

Table 3: Big-O Efficiency of Python Dictionary Operations
表3:Python字典操作的Big-O效率

operation	Big-O Efficiency
copy	O(n)
get item	O(1)
set item	O(1)
delete item	O(1)
contains (in)	O(1)
iteration	O(n)
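The remark above that dictionary operations can occasionally degenerate to O(n) can be made concrete with a deliberately bad key type. This is a contrived sketch: `CollidingKey` and `comparisons_for_lookup` are hypothetical names, and the constant hash is chosen only to force every key to collide.

```python
class CollidingKey:
    """Hypothetical key whose hash is constant, so all keys collide."""
    eq_calls = 0  # class-wide counter of equality comparisons

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return 1  # deliberately terrible: every key gets the same hash

    def __eq__(self, other):
        CollidingKey.eq_calls += 1
        return isinstance(other, CollidingKey) and self.value == other.value

def comparisons_for_lookup(n):
    """Build a dict of n colliding keys, then count comparisons for one failed lookup."""
    d = {CollidingKey(i): None for i in range(n)}
    CollidingKey.eq_calls = 0
    found = CollidingKey(-1) in d  # this key is absent, so every candidate is checked
    return found, CollidingKey.eq_calls

if __name__ == "__main__":
    for n in (10, 100, 1000):
        found, calls = comparisons_for_lookup(n)
        print(n, "keys:", calls, "comparisons, found =", found)
```

With a well-behaved hash the number of comparisons per lookup stays small regardless of the dictionary size, which is the average case the table reports.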

For our last performance experiment we will compare the performance of the contains operation between lists and dictionaries.
在我们的最后一个性能实验中，我们将比较列表和字典之间的contains操作的性能。
In the process we will confirm that the contains operator for lists is O(n) and the contains operator for dictionaries is O(1).
在这个过程中，我们将确认列表的contains操作符是O(n)，字典的contains操作符是O(1)。
The experiment we will use to compare the two is simple.
我们将用来比较两者的实验很简单。
We’ll make a list with a range of numbers in it.
我们会列出一个包含一系列数字的列表。
Then we will pick numbers at random and check to see if the numbers are in the list.
然后我们将随机挑选数字并检查这些数字是否在列表中。
If our performance tables are correct the bigger the list the longer it should take to determine if any one number is contained in the list.
如果我们的性能表是正确的，那么列表越大，确定列表中是否包含任何一个数字所需的时间就越长。

We will repeat the same experiment for a dictionary that contains numbers as the keys.
我们将对一个包含数字作为键的字典重复同样的实验。
In this experiment we should see that determining whether or not a number is in the dictionary is not only much faster, but the time it takes to check should remain constant even as the dictionary grows larger.
在这个实验中，我们应该看到，确定一个数字是否在字典中不仅要快得多，而且即使字典越来越大，检查所用的时间也应该保持不变。

Listing 6 implements this comparison.
清单6实现了这种比较。
Notice that we are performing exactly the same operation, number in container.
注意，我们正在执行完全相同的操作，’ number in container ‘。
The difference is that on line 7 x is a list, and on line 9 x is a dictionary.
区别在于第7行’ x ‘是一个列表，第9行’ x '是一个字典。

Listing 6
清单6

import timeit
import random

for i in range(10000,1000001,20000):
    t = timeit.Timer("random.randrange(%d) in x"%i,
                     "from __main__ import random,x")
    x = list(range(i))
    lst_time = t.timeit(number=1000)
    x = {j:None for j in range(i)}
    d_time = t.timeit(number=1000)
    print("%d,%10.3f,%10.3f" % (i, lst_time, d_time))

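Output from one run (list size, list time, dictionary time; each time is the total in seconds for 1000 lookups). The absolute numbers depend on the machine, so they need not match the figures quoted in the discussion that follows.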
10000,      0.053,      0.001
30000,      0.166,      0.001
50000,      0.259,      0.001
70000,      0.358,      0.001
90000,      0.465,      0.001
110000,      0.585,      0.001
130000,      0.691,      0.001
150000,      0.788,      0.001
170000,      0.949,      0.001
190000,      1.031,      0.001
210000,      1.182,      0.001
230000,      1.234,      0.001
250000,      1.340,      0.001
270000,      1.458,      0.001
290000,      1.602,      0.001
310000,      1.685,      0.001
330000,      1.880,      0.001
350000,      1.902,      0.001
370000,      1.985,      0.001
390000,      2.146,      0.001
410000,      2.169,      0.001
430000,      2.342,      0.001
450000,      2.475,      0.001
470000,      2.721,      0.001
490000,      2.687,      0.001
510000,      2.842,      0.001
530000,      2.928,      0.001
550000,      2.925,      0.001
570000,      3.087,      0.001
590000,      3.226,      0.001
610000,      3.510,      0.001
630000,      3.568,      0.001
650000,      3.761,      0.001
670000,      3.920,      0.001
690000,      3.804,      0.001
710000,      4.046,      0.001
730000,      4.086,      0.001
750000,      4.265,      0.001
770000,      4.481,      0.001
790000,      4.744,      0.001
810000,      4.534,      0.001
830000,      4.766,      0.001
850000,      4.739,      0.001
870000,      4.738,      0.001
890000,      4.807,      0.001
910000,      5.018,      0.001
930000,      5.297,      0.001
950000,      5.315,      0.001
970000,      5.399,      0.001
990000,      5.468,      0.001

Figure 4 summarizes the results of running Listing 6.
图4总结了运行清单6的结果。
You can see that the dictionary is consistently faster.
你可以看到字典总是更快。
For the smallest list size of 10,000 elements a dictionary is 89.4 times faster than a list.
对于包含10,000个元素的最小列表，字典比列表快89.4倍。
For the largest list size of 990,000 elements the dictionary is 11,603 times faster!
对于包含99万个元素的最大列表，字典的速度要快11603倍!
You can also see that the time it takes for the contains operator on the list grows linearly with the size of the list.
您还可以看到，列表上的contains操作符所花费的时间随着列表的大小线性增长。
This verifies the assertion that the contains operator on a list is O(n).
这将验证列表上的contains操作符是O(n)的断言。
It can also be seen that the time for the contains operator on a dictionary is constant even as the dictionary size grows.
还可以看出，即使字典的大小增加，字典上的contains操作符的时间也是不变的。
In fact for a dictionary size of 10,000 the contains operation took 0.004 milliseconds and for the dictionary size of 990,000 it also took 0.004 milliseconds.
事实上，对于字典大小为10,000的情况，contains操作花了0.004毫秒，对于字典大小为99万的情况，contains操作也花了0.004毫秒。
Figure 4: Comparing the in Operator for Python Lists and Dictionaries
图4:比较Python列表和字典的in操作符
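The linear growth claimed for the list can also be checked numerically. The sketch below takes five rows from the sample output printed after Listing 6 and divides each list time by the list size; the variable names are introduced here for illustration.

```python
# (list size, seconds for 1000 list lookups), copied from the sample output above.
rows = [(10000, 0.053), (250000, 1.340), (490000, 2.687),
        (750000, 4.265), (990000, 5.468)]

# For an O(n) operation, time divided by n should stay roughly constant.
per_element = [t / n for n, t in rows]

for (n, t), c in zip(rows, per_element):
    print("%7d  %6.3f s  %.2e s per element" % (n, t, c))

# A spread close to 1 means the timings lie near a straight line through the origin.
spread = max(per_element) / min(per_element)
print("max/min of time per element: %.2f" % spread)
```

The dictionary column needs no such check: it reads 0.001 on every row of that output.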

Since Python is an evolving language, there are always changes going on behind the scenes.
因为Python是一种不断发展的语言，所以在幕后总是会有变化发生。
The latest information on the performance of Python data structures can be found on the Python website.
关于Python数据结构性能的最新信息可以在Python网站上找到。
As of this writing the Python wiki has a nice time complexity page that can be found at the Time Complexity Wiki.
在编写本文时，Python wiki有一个很好的时间复杂性页面，可以在time complexity wiki中找到。

Self Check
Q-1: Which of the list operations shown below is not O(1)?下面列出的哪些操作不是O(1)?

A. list.pop(0) √
When you remove the first element of a list, all the other elements of the list must be shifted forward.
当删除列表的第一个元素时，列表的所有其他元素必须向前移动。
B. list.pop()
Removing an element from the end of the list is a constant operation.
从列表末尾删除元素是一个常量操作。
C. list.append()
Appending to the end of the list is a constant operation.
向列表末尾添加内容是一个常量操作。
D. list[10]
Indexing a list is a constant operation.
索引列表是一个常量操作。
E. all of the above are O(1)
There is one operation that requires all other list elements to be moved.
有一种操作要求移动所有其他列表元素。

Q-2: Which of the dictionary operations shown below is O(1)?下面哪个字典操作是O(1)?

A. 'x' in mydict
in is a constant operation for a dictionary because you do not have to iterate but there is a better answer.
in是字典的一个常量操作，因为您不必迭代，但有一个更好的答案。
B. del mydict['x']
deleting an element from a dictionary is a constant operation but there is a better answer.
从字典中删除元素是一个常量操作，但是有一个更好的答案。
C. mydict['x'] = 10
Assignment to a dictionary key is constant but there is a better answer.
对字典键赋值是常量时间操作，但有更好的答案。
D. mydict['x'] = mydict['x'] + 1
Re-assignment to a dictionary key is constant but there is a better answer.
对字典键重新赋值是常量时间操作，但有更好的答案。
E. all of the above are O(1) √
The only dictionary operations that are not O(1) are those that require iteration.
唯一不是O(1)的字典操作是那些需要迭代的操作。

3.8. Summary

3.8. 总结

Algorithm analysis is an implementation-independent way of measuring an algorithm.
算法分析是一种测量算法的独立于实现的方法。
Big-O notation allows algorithms to be classified by their dominant process with respect to the size of the problem.
Big-O符号允许算法根据问题的大小根据它们的主导过程进行分类。

3.9. Key Terms

3.9. 关键术语

average case 平均情况

Big-O notation Big-O记号

brute force 暴力破解

checking off 检查

exponential 指数的

linear 线性的

log linear 对数线性

logarithmic 对数模型

order of magnitude 数量级

quadratic 二次

time complexity 时间复杂度

worst case 最坏情况

3.10. Discussion Questions

3.10. 讨论题

1.Give the Big-O performance of the following code fragment:
给出下面代码片段的Big-O性能:

for i in range(n):
   for j in range(n):
      k = 2 + 2

2.Give the Big-O performance of the following code fragment:
给出下面代码片段的Big-O性能:

for i in range(n):
     k = 2 + 2

3.Give the Big-O performance of the following code fragment:
给出下面代码片段的Big-O性能:

i = n
while i > 0:
   k = 2 + 2
   i = i // 2

4.Give the Big-O performance of the following code fragment:
给出下面代码片段的Big-O性能:

for i in range(n):
   for j in range(n):
      for k in range(n):
         k = 2 + 2

5.Give the Big-O performance of the following code fragment:
给出下面代码片段的Big-O性能:

for i in range(n):
   k = 2 + 2
for j in range(n):
   k = 2 + 2
for k in range(n):
   k = 2 + 2

3.11. Programming Exercises

3.11. 习题练习

Devise an experiment to verify that the list index operator is O(1).
设计一个实验来验证列表索引操作符是O(1)
Devise an experiment to verify that get item and set item are O(1) for dictionaries.
设计一个实验来验证get item和set item对于字典来说是O(1)。
Devise an experiment that compares the performance of the del operator on lists and dictionaries.
设计一个实验，比较del运算符在列表和字典上的性能。
Given a list of numbers in random order, write an algorithm that works in O(nlog(n)) to find the kth smallest number in the list.
给定一个随机顺序的数字列表，编写一个算法，在O(nlog(n))中找到列表中第k小的数。
Can you improve the algorithm from the previous problem to be linear? Explain.
你能把之前的问题改进成线性的算法吗?解释一下。
