Python Data Structures (2): Algorithm Analysis

D介子

Category: Python Data Structures. Tags: Python

Article link: https://blog.csdn.net/ds19991999/article/details/81276054

Algorithm Analysis

An Example

Benchmarking a simple Python program

Problem: compute the sum of the first n integers.

# version1
def sumOfN(n):
    theSum = 0
    for i in range(1,n+1):
        theSum += i
    return theSum
print(sumOfN(10))

# version2: the same algorithm with poorly chosen names
def foo(tom):
    fred = 0
    for bill in range(1,tom+1):
        # an unnecessary extra assignment
        barney = bill
        fred += barney
    return fred
print(foo(10))

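We can use benchmark analysis to measure the execution time of the function sumOfN. In Python, a function can be benchmarked by recording the system time just before it starts and just after it finishes, then taking the difference.

The time module provides a time function that returns the current system clock time, in seconds, wherever it is called.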

# version1
import time

def sumOfN2(n):
    start = time.time()
    theSum = 0
    for i in range(1,n+1):
        theSum += i
    end = time.time()
    return theSum,end-start

for i in range(5):
    print("Sum is %d required %10.7f seconds"%sumOfN2(100000))

Sum is 5000050000 required  0.0139999 seconds
Sum is 5000050000 required  0.0090001 seconds
Sum is 5000050000 required  0.0090001 seconds
Sum is 5000050000 required  0.0079999 seconds
Sum is 5000050000 required  0.0079999 seconds

# version2
def foo2(tom):
    start = time.time()
    fred = 0
    for bill in range(1,tom+1):
        # an unnecessary extra assignment
        barney = bill
        fred += barney
    end = time.time()
    return fred,end-start
for i in range(5):
    print("Sum is %d required %10.7f seconds"%foo2(100000))

Sum is 5000050000 required  0.0100000 seconds
Sum is 5000050000 required  0.0119998 seconds
Sum is 5000050000 required  0.0100000 seconds
Sum is 5000050000 required  0.0100000 seconds
Sum is 5000050000 required  0.0090001 seconds

# version3
def sumOfN3(n):
    start = time.time()
    s = (n*(n+1))//2  # integer division keeps the result an int in Python 3
    end = time.time()
    return s,end-start
for i in range(5):
    print("Sum is %d required %10.7f seconds"%sumOfN3(1000))

Sum is 500500 required  0.0000000 seconds
Sum is 500500 required  0.0000000 seconds
Sum is 500500 required  0.0000000 seconds
Sum is 500500 required  0.0000000 seconds
Sum is 500500 required  0.0000000 seconds
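
The hand-rolled timers above can also be replaced by the standard-library timeit module, which runs a call many times and uses a high-resolution clock. This matters for the closed-form version: a single call may be too short for time.time() to resolve, which is a plausible reason it reports 0.0000000 seconds. The following is only a sketch; the names sum_loop and sum_formula and the repeat counts are illustrative choices, not part of the original post.

```python
import timeit

def sum_loop(n):
    # iterative version: one addition per integer, so the work grows with n
    total = 0
    for i in range(1, n + 1):
        total += i
    return total

def sum_formula(n):
    # closed-form version: a fixed number of operations regardless of n
    return n * (n + 1) // 2

n = 100000
calls = 10  # calls per timing run (arbitrary choice)

# timeit.repeat returns one total time per run; the minimum is the least noisy
loop_time = min(timeit.repeat(lambda: sum_loop(n), number=calls, repeat=5))
formula_time = min(timeit.repeat(lambda: sum_formula(n), number=calls, repeat=5))

print("loop:    %.7f seconds per call" % (loop_time / calls))
print("formula: %.7f seconds per call" % (formula_time / calls))
```

The absolute numbers depend on the machine, so only their relative size is meaningful.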

Big-O Notation

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

n = np.linspace(0.1, 10)
f = plt.figure(figsize=(10,6), dpi=80)
plt.plot(n,n,label = '$n$')
plt.plot(n,np.log(n),label = r'$\log n$')
plt.plot(n,n*np.log(n),label = r'$n\log n$')
plt.plot(n,n**2,label = '$n^2$')
plt.plot(n,n**3,label = '$n^3$')
plt.plot(n,2**n,label = '$2^n$')
p = plt.xlim(0,10)
p = plt.ylim(0,50)
plt.grid(True)
plt.legend()

[Figure: growth of $n$, $\log n$, $n\log n$, $n^2$, $n^3$ and $2^n$ for n between 0 and 10]

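The time complexity of an algorithm is given by the dominant term of $f(n)$, that is, the term that grows fastest as n tends to infinity. Constant coefficients are dropped, and the result is written $O(f(n))$.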
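
To see why only the dominant term matters, here is a small sketch with a made-up step count $T(n) = 5n^2 + 27n + 1005$ (the function and its coefficients are hypothetical, chosen purely for illustration). As n grows, the ratio $T(n)/n^2$ settles toward the constant 5, so the lower-order terms stop mattering and the algorithm is $O(n^2)$.

```python
def T(n):
    # hypothetical step count: quadratic term + linear term + constant
    return 5 * n**2 + 27 * n + 1005

for n in (1, 10, 100, 1000, 10000, 100000):
    # the ratio approaches the leading coefficient 5 as n grows
    print("n = %6d   T(n) = %14d   T(n)/n^2 = %.4f" % (n, T(n), T(n) / n**2))
```

For small n the constant 1005 dominates, which is why Big-O statements are about behavior for large n only.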

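Anagram Detection Example

Description: one string is an anagram of another if it is simply a rearrangement of the other's characters. For example, 'heart' and 'earth' are anagrams, and so are 'python' and 'typhon'. For simplicity, we assume the two strings have equal length and are made up of the 26 lowercase letters.
Goal: write a boolean function that takes two strings as arguments and returns whether they are anagrams.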

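Method 1: Checking Off

Idea:
* Check whether each character of the first string occurs in the second string. If every character can be checked off, the two strings are anagrams.
* Python strings are immutable, so first convert the second string into a mutable list.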

def founction1(s1,s2):
    alist = list(s2)
    pos1 = 0
    stillOK = True

    while pos1<len(s1) and stillOK:
        pos2 = 0
        found = False
        while pos2<len(alist) and not found:
            if s1[pos1]==alist[pos2]:
                found = True
            else:
                pos2 += 1
        if found:
            # replace the matched character with None so it cannot be matched again
            alist[pos2]=None
        else:
            stillOK = False
        pos1 += 1
    return stillOK
print(founction1('abcd','dcab'))

True

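Analysis: each of the n characters in s1 triggers a scan of up to n entries in the list built from s2, and each of the n positions in that list is visited once to match a character from s1. The total number of visits is therefore the sum of the integers from 1 to n:
$\sum_{i=1}^{n} i = \frac{n(n+1)}{2} = \frac{1}{2}n^2 + \frac{1}{2}n$
As n grows, the $n^2$ term dominates, so the time complexity is $O(n^2)$.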
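
The quadratic growth can be checked empirically by counting comparisons. The sketch below re-implements the checking-off idea with a counter; the function name count_comparisons and its return shape are my own assumptions, not the original author's code. For a pair of true anagrams, the character matched at list index p costs p + 1 comparisons, and every index is matched exactly once, so the total comes to n(n+1)/2.

```python
def count_comparisons(s1, s2):
    """Return (is_anagram, number_of_character_comparisons)."""
    remaining = list(s2)  # mutable copy of s2 so entries can be checked off
    comparisons = 0
    for ch in s1:
        for idx, candidate in enumerate(remaining):
            comparisons += 1
            if candidate == ch:
                remaining[idx] = None  # check off the matched character
                break
        else:
            # the inner loop finished without a match: not an anagram
            return False, comparisons
    return True, comparisons

for s1, s2 in (("abcd", "dcba"), ("heart", "earth")):
    n = len(s1)
    ok, count = count_comparisons(s1, s2)
    print(s1, s2, ok, count, n * (n + 1) // 2)
```

For 'abcd' and 'dcba' this counts 10 comparisons, and for 'heart' and 'earth' it counts 15, matching n(n+1)/2 for n = 4 and n = 5.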

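Method 2: Sort and Compare

Idea: we want to decide whether the two strings consist of exactly the same characters, so we can sort both first. If $s1==s2$ after sorting, the two strings are anagrams.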

def founction2(s1,s2):
    list_1 = list(s1)
    list_2 = list(s2)
    list_1.sort()
    list_2.sort()

    match = True

    if list_1!=list_2:
        match = False
    return match
#     pos = 0
#     match = True
#     while pos<len(s1) and match:
#         if list_1[pos]==list_2[pos]:
#             pos +=1
#         else:
#             match = False
#     return match
print(founction2("abcde","edcba"))

True

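Analysis: calling Python's sort has its own time cost. Sorting is typically $O(n^2)$ or $O(n\log n)$ (Python's built-in sort is $O(n\log n)$ in the worst case), so the sort costs more than the single linear pass that compares the results. Overall, the algorithm has the same order of magnitude as the sorting step.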
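
The sort-and-compare idea can also be written more compactly with the built-in sorted function, which accepts a string directly and returns a sorted list of its characters. This is a minimal sketch under the same assumptions as above; the name is_anagram_sorted is illustrative.

```python
def is_anagram_sorted(s1, s2):
    # sorted() returns a new list of characters; equal lists mean equal multisets
    return sorted(s1) == sorted(s2)

print(is_anagram_sorted("abcde", "edcba"))  # True
print(is_anagram_sorted("abcde", "abcdf"))  # False
```

The cost is still dominated by the two sorts, so the complexity is the same as in the longer version.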

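Method 3: Brute Force

Idea: generate the list of all possible rearrangements of $s1$ and check whether $s2$ is among them. There are $n!$ such orderings of n characters, and $n!$ grows far faster than $n^2$, so this approach is impractical for all but very short strings.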
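
The post gives no code for this method, so the following is only a sketch of how it might look using itertools.permutations from the standard library. The function name is_anagram_brute_force is hypothetical. The short loop at the end prints $n^2$ next to $n!$ to show how quickly the number of orderings grows.

```python
from itertools import permutations
from math import factorial

def is_anagram_brute_force(s1, s2):
    if len(s1) != len(s2):
        return False
    # permutations(s1) yields every ordering of the characters of s1 as a tuple
    return any("".join(p) == s2 for p in permutations(s1))

print(is_anagram_brute_force("heart", "earth"))  # True

# compare the growth of n^2 with the number of orderings n!
for n in (5, 10, 15, 20):
    print("n = %2d   n^2 = %4d   n! = %d" % (n, n**2, factorial(n)))
```

Even at n = 20 there are 2432902008176640000 orderings, so this function should only be tried on very short strings.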

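Method 4: Count and Compare

Idea: two anagrams contain the same number of a's, the same number of b's, the same number of c's, and so on, so we start by counting how many times each letter occurs. Since there are 26 possible characters, we use a list of 26 counters, one position per character. Each time we see a particular character, we increment the counter at its position. If the two lists of counters are identical at the end, the strings are anagrams.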

def founction(s1,s2):
    # one counter per lowercase letter 'a'..'z'
    c1 = [0]*26
    c2 = [0]*26

    # count the letters of s1
    for i in range(len(s1)):
        pos = ord(s1[i])-ord('a')
        c1[pos] += 1

    # count the letters of s2
    for i in range(len(s2)):
        pos = ord(s2[i])-ord('a')
        c2[pos] += 1

    # the strings are anagrams only if all 26 counters agree
    j = 0
    stillOK = True
    while j<26 and stillOK:
        if c1[j]==c2[j]:
            j += 1
        else:
            stillOK = False
    return stillOK
print(founction("abcdnf","bcafnd"))
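
The same counting idea can be sketched with collections.Counter from the standard library, which builds the per-character counts in one pass and is not limited to the 26 lowercase letters. The name is_anagram_counter is illustrative. Each string is traversed once and the final comparison covers only the distinct characters, so the work grows linearly with the string length.

```python
from collections import Counter

def is_anagram_counter(s1, s2):
    # Counter maps each character to the number of times it occurs
    return Counter(s1) == Counter(s2)

print(is_anagram_counter("abcdnf", "bcafnd"))  # True
print(is_anagram_counter("apple", "applf"))    # False
```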