Efficiency Comparison: Python Built-ins vs. Self-Written Implementations

License: CC BY-SA 4.0

Tags: #efficiency #built-in modules #timeit #algorithm testing #efficiency comparison

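This post measures the performance gap between Python's built-ins and my own implementations of max, sort, and string shuffling, and shows how much more efficient the built-ins are.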

Today a question suddenly occurred to me: how much faster are Python's built-ins than wheels I reinvent myself?

So I tested max and sort. First, my own max, a recursive divide-and-conquer version:

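def max_iteration(list_sample):
    # Recursive divide and conquer (despite the name, this is not iterative).
    # Assumes a non-empty list; an empty list raises IndexError.
    length = len(list_sample)
    if length <= 1:
        return list_sample[0]
    # // is integer division; in Python 3, / returns a float and breaks slicing
    max1 = max_iteration(list_sample[:length // 2])  # each slice copies half the list
    max2 = max_iteration(list_sample[length // 2:])
    return max(max1, max2)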

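I timed it with timeit on a list of 1,000 elements ("original" is the built-in max; times are in seconds, two runs each). The result is clear: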

max iteration: 6.85511016846 
max original: 0.713296890259
max iteration: 6.77465510368 
max original: 0.699170827866

Next comes sort. My merge sort implementation:

def sort_iteration(list_sample):
    # Recursive merge sort (despite the name, this is not iterative).
    length = len(list_sample)
    if length <= 1:
        return list_sample
    # // is integer division; in Python 3, / returns a float and breaks slicing
    list1 = sort_iteration(list_sample[:length // 2])
    list2 = sort_iteration(list_sample[length // 2:])
    result = []
    # Merge: repeatedly take the smaller head; <= keeps equal elements in order (stable)
    while list1 and list2:
        if list1[0] <= list2[0]:
            result.append(list1.pop(0))  # pop(0) shifts the rest of the list: O(n) per call
        else:
            result.append(list2.pop(0))
    # One side is now empty; append whatever remains of the other
    result.extend(list1)
    result.extend(list2)
    return result
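Because list.pop(0) has to shift every remaining element, each merge step in this version costs O(n) instead of O(1). Here is a minimal sketch of the same recursive merge sort that merges with two index pointers instead. The name merge_sort_indexed is hypothetical and not from the original post. It should narrow the gap somewhat, though it is still a Python-level loop competing with C code.

```python
def merge_sort_indexed(items):
    """Recursive merge sort that merges with index pointers instead of pop(0)."""
    if len(items) <= 1:
        return list(items)  # always return a new list
    mid = len(items) // 2
    left = merge_sort_indexed(items[:mid])
    right = merge_sort_indexed(items[mid:])
    merged = []
    i = j = 0
    # Each step only advances an index: O(1), unlike list.pop(0)
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # <= takes from the left on ties, keeping the sort stable
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # At most one of these slices is non-empty
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(merge_sort_indexed([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```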

The results:

sort iteration: 31.4509849548 
sort original: 1.15701317787

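Sure enough, the gap is striking: the built-in max is roughly 10 times faster than mine, and the built-in sort roughly 27 times faster.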

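Of course, both tasks have better implementations than mine. Finding the maximum only needs a single linear pass, with none of the recursion and list-slicing overhead above. For sorting, a heap (heapsort) is another O(n log n) option.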
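As a rough illustration of those alternatives, here is a sketch with two helpers, max_linear and heap_sort. Both names are hypothetical. The first is a single linear pass. The second sorts with the standard-library heapq module, which provides a binary min-heap. Both still run as Python-level loops, so they would not be expected to beat the C-implemented built-ins.

```python
import heapq

def max_linear(items):
    """Maximum in one pass: O(n) comparisons, no recursion or slicing."""
    iterator = iter(items)
    try:
        best = next(iterator)
    except StopIteration:
        raise ValueError("max_linear() arg is an empty sequence") from None
    for x in iterator:
        if x > best:
            best = x
    return best

def heap_sort(items):
    """Heapsort using heapq's binary min-heap: O(n log n) overall."""
    heap = list(items)        # copy so the caller's list is left untouched
    heapq.heapify(heap)       # O(n) in-place heap construction
    # Popping the smallest element n times yields ascending order; each pop is O(log n)
    return [heapq.heappop(heap) for _ in range(len(heap))]

data = [3, 8, 1, 9, 4]
print(max_linear(data))  # 9
print(heap_sort(data))   # [1, 3, 4, 8, 9]
```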

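I recently needed to shuffle a string, so I tested that too. Strings are immutable and have no shuffle of their own. One approach is to convert the string to a list, shuffle the list, and join it back into a string: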

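import random

def string_shuffle_list():
    '''
    shuffle the string with list
    '''
    # STRING_SAMPLE is the module-level test string used by the benchmark
    string_list = list(STRING_SAMPLE)      # str is immutable, so copy it into a list
    random.shuffle(string_list)            # shuffles in place and returns None
    string_shuffle = ''.join(string_list)
    return string_shuffle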

Another approach is to randomly sample all of the string's characters and join them back into a string:

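import random

def string_shuffle_sample():
    '''
    shuffle the string with sample
    '''
    # random.sample with k = len(STRING_SAMPLE) returns every character, in random order, as a new list
    string_shuffle = ''.join(random.sample(STRING_SAMPLE, len(STRING_SAMPLE)))
    return string_shuffle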

The second approach looks more concise, and I had expected it to be faster, but the result surprised me (times in seconds):

string shuffle with list: 168.345906019
string shuffle with sample: 207.21715498

Converting to a list first turns out to be faster.

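Both methods are also quite slow. Some people online say most of the time goes into ''.join(), though I have not verified that. Well, I'll leave it here for now and look for a faster approach in a few days.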
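One way to check where the time goes is to time each step of the list approach on its own. The sketch below does this with two assumed values: a made-up 1,040-character STRING_SAMPLE, because the post's actual test string is not shown, and an arbitrary NUMBER of runs. One plausible suspect is random.shuffle itself, which in CPython is a Python-level loop drawing one random index per element. The sketch lets you measure this instead of guessing.

```python
import random
import string
import timeit

# Hypothetical test input; the post's real STRING_SAMPLE is not shown
STRING_SAMPLE = string.ascii_letters * 20   # 52 * 20 = 1,040 characters
NUMBER = 1000                               # arbitrary number of runs per step

shuffled_list = list(STRING_SAMPLE)
random.shuffle(shuffled_list)               # prepared once, so the join step is timed alone

steps = {
    "list(STRING_SAMPLE)": lambda: list(STRING_SAMPLE),
    # includes the list() copy; subtract the first line to isolate the shuffle
    "list + random.shuffle": lambda: random.shuffle(list(STRING_SAMPLE)),
    "''.join(list)": lambda: ''.join(shuffled_list),
}

results = {name: timeit.timeit(func, number=NUMBER) for name, func in steps.items()}
for name, seconds in results.items():
    print(f"{name:<24} {seconds:.4f} s for {NUMBER} runs")
```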

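Finally, a note on timeit. Basic usage is explained all over the internet, but few explain how to pass arguments to the function being timed. One way is to wrap the call in a lambda: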

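import timeit

def test_sort():
    # Timer accepts a zero-argument callable, so a lambda can bind the argument.
    # sort_iteration, sort_original and LIST_SAMPLE are defined elsewhere in the test script.
    t1 = timeit.Timer(lambda: sort_iteration(LIST_SAMPLE))
    t2 = timeit.Timer(lambda: sort_original(LIST_SAMPLE))
    # timeit() runs the callable 1,000,000 times by default and returns the total in seconds
    print("sort iteration: %s \nsort original: %s" % (t1.timeit(), t2.timeit()))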
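A lambda is not the only option. The sketch below shows a few other standard ways to pass arguments: functools.partial, a statement string combined with timeit's globals parameter (Python 3.5+), and timeit.repeat, which reruns the measurement so you can take the fastest run. sort_original and LIST_SAMPLE here are hypothetical stand-ins, since the post does not show their definitions. The small number=1000 is also an assumption, chosen so the example finishes quickly.

```python
import functools
import timeit

def sort_original(list_sample):
    # Hypothetical stand-in for the post's built-in sort wrapper
    return sorted(list_sample)

LIST_SAMPLE = list(range(1000, 0, -1))   # hypothetical 1,000-element input

# 1. lambda closure, as in the post
t_lambda = timeit.timeit(lambda: sort_original(LIST_SAMPLE), number=1000)

# 2. functools.partial pre-binds the argument and produces a zero-argument callable
t_partial = timeit.timeit(functools.partial(sort_original, LIST_SAMPLE), number=1000)

# 3. a statement string, evaluated against this module's global namespace
t_globals = timeit.timeit("sort_original(LIST_SAMPLE)", globals=globals(), number=1000)

# 4. repeat the whole measurement and keep the fastest run (the least disturbed by other processes)
t_best = min(timeit.repeat(lambda: sort_original(LIST_SAMPLE), repeat=5, number=1000))

print(t_lambda, t_partial, t_globals, t_best)
```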

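So Python's built-ins are far more efficient than I had imagined. In CPython, max and sorting are implemented in C. That is a good reason to use the built-ins whenever possible.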

-------- Update (10.17) --------

When iterating, is a set or a list faster?

def in_test(iterable):
    for i in iterable:
        pass

With a set and a list of 1,000 elements each, the results were as follows (t1 is the set, t2 the list):

In [287]: t1.timeit()
Out[287]: 59.14477300643921

In [288]: t2.timeit()
Out[288]: 27.456761837005615

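Iterating over the list is faster, which, to be honest, surprised me again. A likely reason, in CPython, is that a list keeps its elements in one contiguous array that iteration walks in order. A set, by contrast, is iterated by walking its hash table slot by slot and skipping the empty slots.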
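The post does not show how t1 and t2 were built. The sketch below shows one way to set them up, using the lambda pattern from the timeit section. It assumes integer elements, and SIZE, NUMBER, sample_list and sample_set are all assumptions rather than the author's code.

```python
import timeit

def in_test(iterable):
    # The post's function: iterate without doing any work
    for i in iterable:
        pass

SIZE = 1000                      # assumed collection size, matching the post's 1,000
NUMBER = 10_000                  # assumed number of runs per Timer
sample_list = list(range(SIZE))  # assumed integer elements
sample_set = set(sample_list)

t1 = timeit.Timer(lambda: in_test(sample_set))   # t1: the set, as in the post
t2 = timeit.Timer(lambda: in_test(sample_list))  # t2: the list

set_seconds = t1.timeit(number=NUMBER)
list_seconds = t2.timeit(number=NUMBER)
print(f"set:  {set_seconds:.4f} s")
print(f"list: {list_seconds:.4f} s")
```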

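For checking whether an element is in the collection (the in operator), though, a set is much faster. On average, a set membership test is a single hash lookup, O(1), while a list has to be scanned element by element, O(n).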
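A minimal sketch of the membership comparison, under the same assumptions as before (1,000 integers and an arbitrary NUMBER). The target is the list's last element, which is the worst case for a linear scan.

```python
import timeit

SIZE = 1000                       # assumed collection size
NUMBER = 10_000                   # assumed number of runs
sample_list = list(range(SIZE))
sample_set = set(sample_list)
target = SIZE - 1                 # last element: the list has to scan everything

# `target in sample_list` compares element by element; `target in sample_set` hashes once
list_seconds = timeit.timeit(lambda: target in sample_list, number=NUMBER)
set_seconds = timeit.timeit(lambda: target in sample_set, number=NUMBER)
print(f"list membership: {list_seconds:.4f} s")
print(f"set membership:  {set_seconds:.4f} s")
```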

Source code: https://github.com/gt11799/test_modules_efficiency