python并行遍历_在python中并行遍历单个列表

The objective is to do calculations on a single iter in parallel using builtin sum & map functions concurrently. Maybe using (something like) itertools instead of classic for loops to analyze (LARGE) data that arrives via an iterator...

In one simple example case I want to calculate ilen, sum_x & sum_x_sq:

ilen,sum_x,sum_x_sq=iterlen(iter),sum(iter),sum(map(lambda x:x*x, iter))

But without converting the (large) iter to a list (as with iter=list(iter))

n.b. Do this using sum & map and without for loops, maybe using the itertools and/or threading modules?

def example_large_data(n=100000000, mean=0, std_dev=1):

for i in range(n): yield random.gauss(mean,std_dev)

-- edit --

Being VERY specific: I was taking a good look at itertools hoping that there was a dual function like map that could do it. For example: len_x,sum_x,sum_x_sq=itertools.iterfork(iter_x,iterlen,sum,sum_sq)

If I was to be very very specific: I am looking for just one answer, python source code for the "iterfork" procedure.

解决方案

You can use itertools.tee to turn your single iterator into three iterators which you can pass to your three functions.

iter0, iter1, iter2 = itertools.tee(input_iter, 3)

ilen, sum_x, sum_x_sq = count(iter0),sum(iter1),sum(map(lambda x:x*x, iter2))

That will work, but the builtin function sum (and map in Python 2) is not implemented in a way that supports parallel iteration. The first function you call will consume its iterator completely, then the second one will consume the second iterator, then the third function will consume the third iterator. Since tee has to store the values seen by one of its output iterators but not all of the others, this is essentially the same as creating a list from the iterator and passing it to each function.

Now, if you use generator functions that consume only a single value from their input for each value they output, you might be able to make parallel iteration work using zip. In Python 3, map and zip are both generators. The question is how to make sum into a generator.

I think you can get pretty much what you want by using itertools.accumulate (which was added in Python 3.2). It is a generator that yields a running sum of its input. Here's how you could make it work for your problem (I'm assuming your count function was supposed to be an iterator-friendly version of len):

iter0, iter1, iter2 = itertools.tee(input_iter, 3)

len_gen = itertools.accumulate(map(lambda x: 1, iter0))

sum_gen = itertools.accumulate(iter1)

sum_sq_gen = itertools.accumulate(map(lambda x: x*x, iter2))

parallel_gen = zip(len_gen, sum_gen, sum_sq_gen) # zip is a generator in Python 3

for ilen, sum_x, sum_x_sq in parallel_gen:

pass # the generators do all the work, so there's nothing for us to do here

# ilen_x, sum_x, sum_x_sq have the right values here!

If you're using Python 2, rather than 3, you'll have to write your own accumulate generator function (there's a pure Python implementation in the docs I linked above), and use itertools.imap and itertools.izip rather than the builtin map and zip functions.

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值