Fastest way to get union of lists - Python

There's a C++ comparison of ways to get the union of lists from a list of lists: The fastest way to find union of sets

There are also several other Python-related questions, but none of them suggests the fastest way to take the union of the lists.

From the answers, I've gathered that there are at least 2 ways to do it:

    >>> from itertools import chain
    >>> x = [[1,2,3], [3,4,5], [1,7,8]]
    >>> list(set().union(*x))
    [1, 2, 3, 4, 5, 7, 8]
    >>> list(set(chain(*x)))
    [1, 2, 3, 4, 5, 7, 8]

Note that I'm casting the set to list afterwards because I need the order of the list to be fixed for further processing.
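A set has no guaranteed iteration order, so `list(set(...))` gives a stable order only as a side effect of the implementation. Below is a minimal sketch of two ways to make the order deterministic. It assumes the elements are sortable (for `sorted`) and Python 3.7+ (for insertion-ordered `dict`); the helper names `sorted_union` and `first_seen_union` are hypothetical.

```python
from itertools import chain

def sorted_union(lists):
    """Union of all sublists, returned in ascending order."""
    return sorted(set().union(*lists))

def first_seen_union(lists):
    """Union of all sublists, keeping the order of first appearance."""
    # dict preserves insertion order (guaranteed from Python 3.7),
    # so its keys act as an ordered set.
    return list(dict.fromkeys(chain.from_iterable(lists)))

x = [[1, 2, 3], [3, 4, 5], [1, 7, 8]]
print(sorted_union(x))      # [1, 2, 3, 4, 5, 7, 8]
print(first_seen_union(x))  # [1, 2, 3, 4, 5, 7, 8]
```

Both variants do extra work on top of building the set, so they trade some speed for a reproducible order.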

After some comparison, it seems like list(set(chain(*x))) is more stable and takes less time:

    from itertools import chain
    import time
    import random

    # Dry run.
    x = [[random.choice(range(10000))
          for i in range(10)] for j in range(10)]
    list(set().union(*x))
    list(set(chain(*x)))

    y_time = 0
    z_time = 0

    for _ in range(1000):
        x = [[random.choice(range(10000))
              for i in range(10)] for j in range(10)]
        start = time.time()
        y = list(set().union(*x))
        y_time += time.time() - start
        # print('list(set().union(*x)):\t', y_time)
        start = time.time()
        z = list(set(chain(*x)))
        z_time += time.time() - start
        # print('list(set(chain(*x))):\t', z_time)
        assert sorted(y) == sorted(z)
        # print()

    # Average time per call, in seconds.
    print(y_time / 1000.)
    print(z_time / 1000.)

[out]:

    1.39586925507e-05
    1.09834671021e-05

Removing the set-to-list conversion as a variable:

    y_time = 0
    z_time = 0

    for _ in range(1000):
        x = [[random.choice(range(10000))
              for i in range(10)] for j in range(10)]
        start = time.time()
        y = set().union(*x)
        y_time += time.time() - start
        start = time.time()
        z = set(chain(*x))
        z_time += time.time() - start
        assert sorted(y) == sorted(z)

    # Average time per call, in seconds.
    print(y_time / 1000.)
    print(z_time / 1000.)

[out]:

    1.22241973877e-05
    1.02684497833e-05

Here's the full output when I try to print the intermediate timings (without list casting): http://pastebin.com/raw/y3i6dXZ8
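The per-call differences above are only a few microseconds, which can be close to the resolution of `time.time()` on some platforms. Below is a hedged sketch of the same comparison using `timeit`, which uses a higher-resolution timer and runs each statement many times. The helper names `make_lists` and `compare` and the repeat counts are assumptions, and the numbers printed will differ from machine to machine.

```python
import random
import timeit
from itertools import chain

def make_lists(n_lists=10, n_items=10, upper=10000):
    """Build a list of lists of random integers, like x above."""
    return [[random.randrange(upper) for _ in range(n_items)]
            for _ in range(n_lists)]

def compare(x, number=10000, repeat=5):
    """Return the best per-call time in seconds for each approach."""
    candidates = {
        'set().union(*x)': lambda: set().union(*x),
        'set(chain(*x))': lambda: set(chain(*x)),
    }
    results = {}
    for label, func in candidates.items():
        # The minimum over several repeats is the least noisy estimate.
        best = min(timeit.repeat(func, number=number, repeat=repeat))
        results[label] = best / number
    return results

if __name__ == '__main__':
    for label, seconds in compare(make_lists()).items():
        print('{:20} {:.3e} s per call'.format(label, seconds))
```

Timing the same `x` for both candidates, rather than a fresh random `x` per iteration, keeps the input out of the comparison.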

Why is it that list(set(chain(*x))) takes less time than list(set().union(*x))?

Is there another way of achieving the same union of lists? Using numpy or pandas or sframe or something? Is the alternative faster?

Source: https://stackoverflow.com/questions/35866067

Updated: 2019-06-03 23:45

Best answer

What's fastest depends on the nature of x -- whether it is a long list or a short list, with many sublists or few sublists, whether the sublists are long or short, and whether there are many duplicates or few duplicates.

Here are some timeit results comparing some alternatives. There are so many possibilities that this is by no means a complete analysis, but perhaps this will give you a framework for studying your use case.

    func                 | x                    | time
    unique_concatenate   | many_uniques         | 0.863
    empty_set_union      | many_uniques         | 1.191
    short_set_union_rest | many_uniques         | 1.192
    long_set_union_rest  | many_uniques         | 1.194
    set_chain            | many_uniques         | 1.224

    func                 | x                    | time
    long_set_union_rest  | many_duplicates      | 0.958
    short_set_union_rest | many_duplicates      | 0.969
    empty_set_union      | many_duplicates      | 0.971
    set_chain            | many_duplicates      | 1.128
    unique_concatenate   | many_duplicates      | 2.411

    func                 | x                    | time
    empty_set_union      | many_small_lists     | 1.023
    long_set_union_rest  | many_small_lists     | 1.028
    set_chain            | many_small_lists     | 1.032
    short_set_union_rest | many_small_lists     | 1.036
    unique_concatenate   | many_small_lists     | 1.351

    func                 | x                    | time
    long_set_union_rest  | few_large_lists      | 0.791
    empty_set_union      | few_large_lists      | 0.813
    unique_concatenate   | few_large_lists      | 0.814
    set_chain            | few_large_lists      | 0.829
    short_set_union_rest | few_large_lists      | 0.849

Be sure to run the timeit benchmarks on your own machine since results may vary.

    from __future__ import print_function
    import random
    import timeit
    from itertools import chain
    import numpy as np

    def unique_concatenate(x):
        # Returns a sorted NumPy array rather than a list
        return np.unique(np.concatenate(x))

    def short_set_union_rest(x):
        # This assumes x[0] is the shortest list in x
        return list(set(x[0]).union(*x[1:]))

    def long_set_union_rest(x):
        # This assumes x[-1] is the longest list in x
        return list(set(x[-1]).union(*x[:-1]))

    def empty_set_union(x):
        return list(set().union(*x))

    def set_chain(x):
        return list(set(chain(*x)))

    big_range = list(range(10**7))
    small_range = list(range(10**5))
    many_uniques = [[random.choice(big_range) for i in range(j)]
                    for j in range(10, 10000, 10)]
    many_duplicates = [[random.choice(small_range) for i in range(j)]
                       for j in range(10, 10000, 10)]
    many_small_lists = [[random.choice(big_range) for i in range(10)]
                        for j in range(10, 10000, 10)]
    few_large_lists = [[random.choice(big_range) for i in range(1000)]
                       for j in range(10, 100, 10)]

    if __name__=='__main__':
        # n is the number of calls timed for each data set
        for x, n in [('many_uniques', 1), ('many_duplicates', 4),
                     ('many_small_lists', 800), ('few_large_lists', 800)]:
            timing = dict()
            for func in [
                    'unique_concatenate', 'short_set_union_rest', 'long_set_union_rest',
                    'empty_set_union', 'set_chain']:
                timing[func, x] = timeit.timeit(
                    '{}({})'.format(func, x), number=n,
                    setup='from __main__ import {}, {}'.format(func, x))
            print('{:20} | {:20} | {}'.format('func', 'x', 'time'))
            # Print the fastest function first
            for key, t in sorted(timing.items(), key=lambda item: item[1]):
                func, x = key
                print('{:20} | {:20} | {:.3f}'.format(func, x, t))
            print(end='\n')
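`short_set_union_rest` and `long_set_union_rest` rely on where the shortest and longest sublists sit in `x`. The sketch below is not part of the benchmark above; it picks the longest sublist explicitly so the seed set does not depend on ordering. The name `longest_first_union` is hypothetical, and whether seeding with the longest list helps at all is something to measure on your own data.

```python
def longest_first_union(lists):
    """Union of all sublists, seeding the set with the longest one."""
    if not lists:
        return []
    # Index of the longest sublist; ties resolve to the first one.
    longest = max(range(len(lists)), key=lambda i: len(lists[i]))
    rest = lists[:longest] + lists[longest + 1:]
    return list(set(lists[longest]).union(*rest))

x = [[1, 2, 3], [3, 4, 5, 6], [1, 7, 8]]
print(sorted(longest_first_union(x)))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

Finding the longest sublist costs one pass over `x`, which may cancel any gain when there are many small sublists.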

2016-03-08
