Python collection types (list, tuple, dict, set, generator): similarities, differences and an illustrated comparison

Python's built-in collection types are list, tuple, set and dict.

list: looks like an array but is more capable; it supports indexing, slicing, searching, appending and more.

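tuple: works much like a list, but once created, its length and its elements cannot be changed (although a mutable element, such as a nested list, can still be modified in place). It behaves like a lighter-weight, safer list.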
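A small sketch of the parenthetical remark above: a tuple's own slots are fixed, but a mutable object stored in one of them can still change. The name `record` and its values are made up for illustration, and the snippet uses Python 3 syntax.

```python
# Illustrative only: `record` is a hypothetical tuple holding a nested list.
record = (1, "a", [10, 20])

# Rebinding a slot is rejected, because the tuple itself is immutable.
try:
    record[0] = 99
    assignment_error = None
except TypeError as exc:
    assignment_error = exc

# The nested list is a separate, mutable object, so it can still grow.
record[2].append(30)

# A tuple is hashable only if every element is hashable.
flat_is_hashable = True
try:
    hash((1, "a"))
except TypeError:
    flat_is_hashable = False

nested_is_hashable = True
try:
    hash(record)
except TypeError:
    nested_is_hashable = False

print(record, type(assignment_error).__name__, flat_is_hashable, nested_is_hashable)
```

The hashability check matters in practice: only tuples made entirely of hashable elements can be used as dict keys or set members.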

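dict: a hash table of key-value pairs, with the usual hash table properties: keys are unique, and inserting, deleting and updating entries is fast. In the Python 2 used for this post, key order is arbitrary; since Python 3.7, a dict preserves insertion order.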

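set: an unordered collection of unique elements, essentially a dict with keys but no values. Java's HashSet is implemented on top of HashMap; hopefully Python does not do the same, since a set needs no values and can save a lot of pointers. (CPython's set does in fact have its own hash table, which stores keys only.)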
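The uniqueness guarantees described for dict and set can be seen in a few lines. This is only a sketch in Python 3 syntax; the word list and variable names are invented for the example.

```python
# Hypothetical input with repeated items.
words = ["spam", "egg", "spam", "ham", "egg", "spam"]

# A set keeps one copy of each element.
unique_words = set(words)

# A dict maps each unique key to a value; assigning to an existing key overwrites it.
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

# Sets support algebra such as intersection and difference.
shared = unique_words & {"ham", "jam"}
missing = {"ham", "jam"} - unique_words

print(sorted(unique_words), counts, shared, missing)
```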

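Generator:

A generator, which can be written as a generator expression (the parenthesized counterpart of a list comprehension), is a special type in Python. It is not really a data structure: it holds only an algorithm and its suspended state, and it supports iteration.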
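To make "an algorithm plus suspended state" concrete, here is a minimal sketch in Python 3 syntax. The function `countdown` is a made-up example, not something from the original post.

```python
def countdown(n):
    """Generator function: yields n, n-1, ..., 1, pausing at each yield."""
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)
first = next(gen)        # runs the body up to the first yield
rest = list(gen)         # resumes from the saved state and drains the rest
exhausted = list(gen)    # a generator can be consumed only once

# A generator expression uses parentheses; a list comprehension uses brackets.
lazy_squares = (x * x for x in range(5))
eager_squares = [x * x for x in range(5)]
total = sum(lazy_squares)

print(first, rest, exhausted, total, eager_squares)
```

The generator expression never holds all five squares at once, while the list comprehension builds the whole list up front.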

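First, memory usage. Build a set, dict, generator, tuple and list of 100,000 elements each, using comprehensions and generator expressions. The memory consumed decreases in the order dict, set, list, tuple, and the object sizes reported by sys.getsizeof follow the same order. A generator does not build a table of data, so it needs only a small, constant amount of memory:

    # Python 2 (xrange, print statement); requires the memory_profiler package
    import sys
    from memory_profiler import profile

    @profile  # prints a line-by-line memory report when the function is called
    def create_data(data_size):
        data_generator = (x for x in xrange(data_size))
        data_set = {x for x in xrange(data_size)}
        data_dict = {x: None for x in xrange(data_size)}
        data_tuple = tuple(x for x in xrange(data_size))
        data_list = [x for x in xrange(data_size)]
        return data_set, data_dict, data_generator, data_tuple, data_list

    data_size = 100000
    for data in create_data(data_size):
        # sys.getsizeof reports the container itself, not the objects it references
        print data.__class__, sys.getsizeof(data)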

    Mem usage    Increment   Line Contents
    ================================================
    14.6 MiB     0.0 MiB     @profile
                             def create_data(data_size):
    14.7 MiB     0.0 MiB         data_generator = (x for x in xrange(data_size))
    21.4 MiB     6.7 MiB         data_set = {x for x in xrange(data_size)}
    29.8 MiB     8.5 MiB         data_dict = {x:None for x in xrange(data_size)}
    33.4 MiB     3.6 MiB         data_tuple = tuple(x for x in xrange(data_size))
    38.2 MiB     4.8 MiB         data_list = [x for x in xrange(data_size)]
    38.2 MiB     0.0 MiB         return data_set, data_dict, data_generator, data_tuple, data_list

Object sizes in bytes, in the order returned by create_data:

    <type 'set'> 4194528
    <type 'dict'> 6291728
    <type 'generator'> 72
    <type 'tuple'> 800048
    <type 'list'> 824464
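For readers without Python 2 or memory_profiler at hand, the size comparison can be approximated with sys.getsizeof alone. This is a sketch in Python 3 syntax, assuming CPython; the exact byte counts vary between versions, so none are quoted here, and the `containers` and `sizes` names are illustrative.

```python
import sys

n = 100000
containers = {
    "generator": (x for x in range(n)),
    "set": {x for x in range(n)},
    "dict": {x: None for x in range(n)},
    "tuple": tuple(x for x in range(n)),
    "list": [x for x in range(n)],
}

# sys.getsizeof reports the container's own footprint only,
# not the int objects it references.
sizes = {name: sys.getsizeof(obj) for name, obj in containers.items()}

# Print from smallest to largest.
for name, size in sorted(sizes.items(), key=lambda item: item[1]):
    print("%-10s %10d bytes" % (name, size))
```

The hash-based containers come out largest because their tables are kept partly empty to keep collisions rare, while a tuple stores exactly one pointer per element.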

Next, lookup performance. dict and set lookups take constant time on average (O(1)), while list and tuple lookups take linear time (O(n)). Build objects of a given size, then look up randomly generated numbers:

    import time
    import random

    def create_data(data_size):
        data_set = {x for x in xrange(data_size)}
        data_dict = {x: None for x in xrange(data_size)}
        data_tuple = tuple(x for x in xrange(data_size))
        data_list = [x for x in xrange(data_size)]
        return data_set, data_dict, data_tuple, data_list

    def cost_time(func):
        """Decorator: time func and return (func's result, elapsed seconds)."""
        def wrapper(*args, **kwargs):
            start = time.time()
            r = func(*args, **kwargs)
            elapsed = time.time() - start
            print 'find in %s cost time %s' % (r, elapsed)
            return r, elapsed  # the data type name and the elapsed time
        return wrapper

    @cost_time
    def test_find(test_data, data):
        for d in test_data:
            if d in data:
                pass
        return data.__class__.__name__

    data_size = 100
    test_size = 10000000
    # random.randint is inclusive at both ends, so data_size - 1 keeps every value present
    test_data = [random.randint(0, data_size - 1) for x in xrange(test_size)]
    for data in create_data(data_size):
        test_find(test_data, data)

Output:

----------------------------------------------

    find in set cost time 0.47200012207
    find in dict cost time 0.429999828339
    find in tuple cost time 5.36500000954
    find in list cost time 5.53399991989

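With collections of 100 elements and 10 million lookups each, the gap is obvious. However, these random numbers can all be found in the collection. Change the random number generation so that half of the values can be found and half cannot. The printed timings show that when half of the lookups hit the worst case, list and tuple perform even worse.

    def randint(index, data_size):
        # even index: a value present in the collection; odd index: a value that is absent
        if index % 2 == 0:
            return random.randint(0, data_size - 1)
        return random.randint(data_size, data_size * 2)

    test_data = [randint(x, data_size) for x in xrange(test_size)]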

Output:

----------------------------------------------

    find in set cost time 0.450000047684
    find in dict cost time 0.397000074387
    find in tuple cost time 7.83299994469
    find in list cost time 8.27800011635
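Wall-clock timings vary between machines, so here is a deterministic way to see the same effect: count equality comparisons instead of seconds. This is a sketch in Python 3 syntax, assuming CPython; `CountingKey` and `count_comparisons` are hypothetical helpers invented for this illustration.

```python
class CountingKey:
    """Wraps an int and counts every equality comparison (illustration only)."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return hash(self.value)

    def __eq__(self, other):
        CountingKey.comparisons += 1
        return self.value == other.value


def count_comparisons(container, needle):
    """Return (found, number of == calls made by `needle in container`)."""
    CountingKey.comparisons = 0
    found = needle in container
    return found, CountingKey.comparisons


keys = [CountingKey(i) for i in range(100)]
as_list = list(keys)
as_set = set(keys)

# A list scans element by element, so a miss or a hit on the last
# element compares against all 100 entries.
list_hit = count_comparisons(as_list, CountingKey(99))
list_miss = count_comparisons(as_list, CountingKey(1000))

# A set jumps to the slot chosen by the hash and compares only
# entries whose hash matches.
set_hit = count_comparisons(as_set, CountingKey(99))
set_miss = count_comparisons(as_set, CountingKey(1000))

print(list_hit, list_miss, set_hit, set_miss)
```

The miss is the worst case for the list because the scan cannot stop early, which matches the slower timings above once half of the lookups are misses.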

Grow the number of elements from 10 to 500, time 100,000 lookups at each size, and plot the resulting curves. The figure below shows that dict and set take constant lookup time regardless of the number of elements, while list and tuple take time that grows linearly with it:

    import matplotlib.pyplot as plot
    from numpy import array

    data_size = array([x for x in xrange(10, 500, 10)])
    test_size = 100000
    cost_result = {}
    for size in data_size:
        test_data = [randint(x, size) for x in xrange(test_size)]
        for data in create_data(size):
            name, cost = test_find(test_data, data)  # the decorator returns (type name, elapsed time)
            cost_result.setdefault(name, []).append(cost)

    # One curve per data type: elapsed time against collection size
    plot.figure(figsize=(10, 6))
    xline = data_size
    for data_type, result in cost_result.items():
        yline = array(result)
        plot.plot(xline, yline, label=data_type)
    plot.ylabel('Time spent (s)')
    plot.xlabel('Number of elements')
    plot.grid()
    plot.legend()
    plot.show()

[Figure: lookup time versus number of elements for set, dict, tuple and list (c44e7f8985b3ca2834a75b4643f0a9a9.png)]

For iteration time the differences are very small; dict and set take slightly longer:

    @cost_time
    def test_iter(data):
        for d in data:
            pass
        return data.__class__.__name__

    data_size = array([x for x in xrange(1, 500000, 1000)])
    cost_result = {}
    for size in data_size:
        for data in create_data(size):
            name, cost = test_iter(data)
            cost_result.setdefault(name, []).append(cost)

    # Plot the curves
    plot.figure(figsize=(10, 6))
    xline = data_size
    for data_type, result in cost_result.items():
        yline = array(result)
        plot.plot(xline, yline, label=data_type)
    plot.ylabel('Time spent (s)')
    plot.xlabel('Number of elements')
    plot.grid()
    plot.legend()
    plot.show()

[Figure: iteration time versus number of elements for set, dict, tuple and list (d887c3a5af3160949b38761bdd77db68.png)]

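The time to delete elements is plotted below for the random deletion of 1,000 elements. A tuple cannot have elements deleted, so it is left out of the comparison:

[Figure: time to delete 1,000 random elements from list, set and dict (ce0a55fff16e02261c4929657b056a0d.png)]

When a random half of the elements is deleted instead, the time grows quadratically (O(n²)). For a list this is expected: each removal is O(n), and the number of removals now grows with n as well.

[Figure: time to delete a random half of the elements (c922de40c6ecd3a5f2e9ad64830f8a1a.png)]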
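The post does not include the code behind the deletion figures. The sketch below is an assumption about what such a benchmark could look like, with the timing decorator left out so that only the delete operations remain. It uses Python 3 syntax, and the function names and the collection size of 5,000 are invented for the example.

```python
import random

def delete_from_list(data, victims):
    for value in victims:
        data.remove(value)      # linear scan plus shifting the tail: O(n) per call

def delete_from_set(data, victims):
    for value in victims:
        data.discard(value)     # hash lookup: O(1) on average

def delete_from_dict(data, victims):
    for key in victims:
        del data[key]           # hash lookup: O(1) on average

size = 5000
victims = random.sample(range(size), 1000)   # 1000 distinct values to delete

data_list = list(range(size))
data_set = set(range(size))
data_dict = dict.fromkeys(range(size))

delete_from_list(data_list, victims)
delete_from_set(data_set, victims)
delete_from_dict(data_dict, victims)

print(len(data_list), len(data_set), len(data_dict))
```

With a fixed 1,000 deletions the list cost grows linearly with the collection size; once the number of deletions is n/2, the list needs about n/2 removals of O(n) each, which is where the quadratic curve comes from.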

The time to add elements is plotted below, measured as the number of elements added grows in steps of 10,000. All three types grow linearly and no difference is visible. A tuple cannot have new elements added, so it is left out of the comparison:

    @cost_time
    def test_dict_add(test_data, data):
        for d in test_data:
            data[d] = None
        return data.__class__.__name__

    @cost_time
    def test_set_add(test_data, data):
        for d in test_data:
            data.add(d)
        return data.__class__.__name__

    @cost_time
    def test_list_add(test_data, data):
        for d in test_data:
            data.append(d)
        return data.__class__.__name__

    # Initialize the data: pair a fresh, empty container of each type with its add function
    def init_data():
        test_data = {
            'list': (list(), test_list_add),
            'set': (set(), test_set_add),
            'dict': (dict(), test_dict_add)
        }
        return test_data

    # Measure the add time for sizes that grow in steps of 10,000
    data_size = array([x for x in xrange(10000, 1000000, 10000)])
    cost_result = {}
    for size in data_size:
        test_data = [x for x in xrange(size)]
        for data_type, (data, add) in init_data().items():
            name, cost = add(test_data, data)  # the decorator returns (type name, elapsed time)
            cost_result.setdefault(data_type, []).append(cost)

    plot.figure(figsize=(10, 6))
    xline = data_size
    for data_type, result in cost_result.items():
        yline = array(result)
        plot.plot(xline, yline, label=data_type)
    plot.ylabel('Time spent (s)')
    plot.xlabel('Number of elements added')
    plot.grid()
    plot.legend()
    plot.show()

[Figure: time to add elements versus number of elements added for list, set and dict (7798e9a2727ad5b5c0ee97b7ad7b5065.png)]
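One reason the total add time stays linear is that each individual append is cheap on average: the container grows its buffer in chunks rather than on every call. The sketch below is a CPython-specific observation in Python 3 syntax, not part of the original benchmark; it counts how often a list's reported size changes while appending.

```python
import sys

data = []
resizes = 0
last_size = sys.getsizeof(data)
appends = 10000

for i in range(appends):
    data.append(i)
    size = sys.getsizeof(data)
    if size != last_size:      # the list's buffer was reallocated
        resizes += 1
        last_size = size

print("%d appends triggered %d buffer resizes" % (appends, resizes))
```

Most appends simply write into spare capacity, so the occasional reallocation is spread over many calls. set and dict grow their hash tables in a broadly similar stepwise fashion.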
