java parallelstream_java – When .stream().parallel() does the same thing, ...

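The Javadocs for Collection.(parallelS|s)tream() and for Stream itself don't answer the question, so it's off to the mailing lists for the rationale. I went through the lambda-libs-spec-observers archives and found one thread specifically about Collection.parallelStream() and another thread that touched on whether java.util.Arrays should provide parallelStream() to match (or, actually, whether it should be removed). There was no once-and-for-all conclusion, so perhaps I've missed something from another list, or the matter was settled in private discussion. (Perhaps Brian Goetz, one of the principals in this discussion, can fill in anything missing.)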

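The participants made their points well, so this answer is mostly just an organization of the relevant quotes, with a few clarifications in [brackets], presented in order of importance (as I interpret it).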

parallelStream() covers a very common case

Brian Goetz, in the first thread, explaining why Collection.parallelStream() is valuable enough to keep even after the other parallel stream factory methods were removed:

We do not have explicit parallel versions of each of these [stream factories]; we did originally, and to prune down the API surface area, we cut them on the theory that dropping 20+ methods from the API was worth the tradeoff of the surface yuckiness and performance cost of .intRange(...).parallel(). But we did not make that choice with Collection.

We could either remove the Collection.parallelStream(), or we could add the parallel versions of all the generators, or we could do nothing and leave it as is. I think all are justifiable on API design grounds.

I kind of like the status quo, despite its inconsistency. Instead of having 2N stream construction methods, we have N+1 -- but that extra 1 covers a huge number of cases, because it is inherited by every Collection. So I can justify to myself why having that extra 1 method is worth it, and why accepting the inconsistency of going no further is acceptable.

Do others disagree? Is N+1 [Collection.parallelStream() only] the practical choice here? Or should we go for the purity of N [rely on Stream.parallel()]? Or the convenience and consistency of 2N [parallel versions of all factories]? Or is there some even better N+3 [Collection.parallelStream() plus other special cases], for some other specially chosen cases we want to give special support to?
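For reference, this is roughly how the N+1 arrangement looks in the API as released (a minimal sketch; the class and method names are hypothetical). Collection keeps its dedicated parallelStream(), while array and generator sources need a trailing .parallel(): java.util.Arrays ended up with no parallelStream() method at all. I am assuming IntStream.range is the released counterpart of the .intRange(...) mentioned in the thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;

public class ParallelFactories {
    // Collection: the one extra factory, inherited by every List, Set, Queue, ...
    static long sumCollection(List<Integer> list) {
        return list.parallelStream().mapToLong(Integer::longValue).sum();
    }

    // Arrays: only stream() exists, so parallelism is requested afterwards.
    static long sumArray(int[] array) {
        return Arrays.stream(array).parallel().asLongStream().sum();
    }

    // Generators: IntStream.range (assumed counterpart of .intRange(...))
    // also needs an explicit .parallel().
    static long sumRange(int endExclusive) {
        return IntStream.range(0, endExclusive).parallel().asLongStream().sum();
    }

    public static void main(String[] args) {
        System.out.println(sumCollection(Arrays.asList(1, 2, 3, 4, 5))); // 15
        System.out.println(sumArray(new int[] {1, 2, 3, 4, 5}));         // 15
        System.out.println(sumRange(6));                                  // 15 (0 + 1 + ... + 5)
    }
}
```

All three produce the same kind of parallel pipeline; the only difference is whether the source offers a one-call spelling.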

Brian Goetz stood by this position in the later discussion about Arrays.parallelStream():

I still really like Collection.parallelStream; it has huge discoverability advantages, and offers a pretty big return on API surface area -- one more method, but provides value in a lot of places, since Collection will be a really common case of a stream source.

parallelStream() is more performant

Direct version [parallelStream()] is more performant, in that it requires less wrapping (to turn a stream into a parallel stream, you have to first create the sequential stream, then transfer ownership of its state into a new Stream.)
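Either spelling yields a parallel stream; the argument above is only about setup cost. A minimal sketch (class name hypothetical) that checks both with isParallel(). The identity check at the end reflects OpenJDK's released implementation, where parallel() flips a flag on the existing pipeline and returns it, close to the "bit-flip" Brian mentions later in the thread. The BaseStream.parallel() contract only says it "may return itself", so treat that line as an observation, not a guarantee.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class ParallelSpellings {
    public static void main(String[] args) {
        List<String> words = new ArrayList<>();
        words.add("alpha");
        words.add("beta");
        words.add("gamma");

        // Direct factory: the pipeline is created in parallel mode.
        Stream<String> direct = words.parallelStream();

        // Two-step spelling: build a sequential pipeline, then request parallelism.
        Stream<String> sequential = words.stream();
        Stream<String> flipped = sequential.parallel();

        System.out.println(direct.isParallel());  // true
        System.out.println(flipped.isParallel()); // true

        // In OpenJDK, parallel() sets a flag and returns the same object;
        // other implementations are free to return a new stream instead.
        System.out.println(flipped == sequential);
    }
}
```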

In response to Kevin Bourrillion's skepticism about whether the effect is significant, Brian again:

Depends how seriously you are counting. Doug counts individual object creations and virtual invocations on the way to a parallel operation, because until you start forking, you’re on the wrong side of Amdahl’s law -- this is all “serial fraction” that happens before you can fork any work, which pushes your breakeven threshold further out. So getting the setup path for parallel ops fast is valuable.

People dealing with parallel library support need some attitude adjustment about such things. On a soon-to-be-typical machine, every cycle you waste setting up parallelism costs you, say, 64 cycles. You would probably have had a different reaction if it required 64 object creations to start a parallel computation.

That said, I’m always completely supportive of forcing implementors to work harder for the sake of better APIs, so long as the APIs do not rule out efficient implementation. So if killing parallelStream is really important, we’ll find some way to turn stream().parallel() into a bit-flip or somesuch.
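Doug's point is Amdahl's law. As a worked illustration (the serial fractions below are illustrative numbers, not figures from the thread), let s be the fraction of the work that is serial, including all setup before the first fork, and N the number of cores. The ideal speedup is then:

```latex
\begin{aligned}
S(N) &= \frac{1}{s + \dfrac{1 - s}{N}} \\
N = 64,\; s = 0 &\;\Rightarrow\; S = 64 \\
N = 64,\; s = 0.01 &\;\Rightarrow\; S = \frac{1}{0.01 + 0.99/64} \approx 39.3 \\
N = 64,\; s = 0.05 &\;\Rightarrow\; S = \frac{1}{0.05 + 0.95/64} \approx 15.4
\end{aligned}
```

On the 64-core machine Brian imagines, even 1% of serial setup caps the speedup well below 64, which is why shaving object creations and virtual calls off the setup path is worth the effort.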

stream().parallel() statefulness complicates the future

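At the time of the discussion, switching a stream from sequential to parallel and back could be interleaved with other stream operations. Brian Goetz, on behalf of Doug Lea, explained why sequential/parallel mode switching could complicate future development of the Java platform: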

I’ll take my best stab at explaining why: because it (like the stateful methods (sort, distinct, limit)) which you also don’t like, move us incrementally farther from being able to express stream pipelines in terms of traditional data-parallel constructs, which further constrains our ability to map them directly to tomorrow’s computing substrate, whether that be vector processors, FPGAs, GPUs, or whatever we cook up.

Filter-map-reduce map[s] very cleanly to all sorts of parallel computing substrates; filter-parallel-map-sequential-sorted-limit-parallel-map-uniq-reduce does not.
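To make the contrast concrete, here is a minimal Java sketch (method names and numbers are hypothetical). The first pipeline is pure filter-map-reduce; the second adds the stateful operations (distinct, sorted, limit) that the quote says are harder to map onto data-parallel hardware. Both give correct results in parallel on today's JDK; the difference is how much cross-chunk coordination the library has to do.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class StatelessVsStateful {
    // Filter-map-reduce: each element is handled independently, so every
    // parallel chunk can be processed without looking at any other chunk.
    static long sumOfSquaresOfEvens(int n) {
        return IntStream.range(0, n).parallel()
                .filter(i -> i % 2 == 0)
                .mapToLong(i -> (long) i * i)
                .sum();
    }

    // distinct must remember what it has seen, sorted must see every element,
    // and limit must respect encounter order across chunks, so this pipeline
    // needs coordination between the parallel tasks.
    static List<Integer> firstFiveDistinctRemainders(int n) {
        return IntStream.range(0, n).parallel()
                .map(i -> (i * 7) % 11)
                .distinct()
                .sorted()
                .limit(5)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquaresOfEvens(10));          // 120 (0 + 4 + 16 + 36 + 64)
        System.out.println(firstFiveDistinctRemainders(100)); // [0, 1, 2, 3, 4]
    }
}
```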

So the whole API design here embodies many tensions between making it easy to express things the user is likely to want to express, and doing it in a manner that we can predictably make fast with transparent cost models.

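This mode switching was removed after further discussion. In the current version of the library, a stream pipeline is either sequential or parallel; the last call to sequential()/parallel() wins. Besides sidestepping the statefulness problem, this change also improved the performance of using parallel() to set up a parallel pipeline from a sequential stream factory.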
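A minimal sketch of the "last call wins" rule (class and method names are hypothetical). parallel() is called early and sequential() late; per the java.util.stream package documentation, the most recent mode setting applies to the entire pipeline, including the peek() stage that appears before sequential(). In OpenJDK a sequential pipeline runs on the calling thread, so only that thread's name is recorded.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.IntStream;

public class LastCallWins {
    // Records which threads actually process elements in a pipeline that
    // calls parallel() first and sequential() last.
    static Set<String> threadsUsed() {
        Set<String> threads = ConcurrentHashMap.newKeySet();
        IntStream.range(0, 10_000)
                .parallel()                 // earlier mode request...
                .filter(i -> i % 3 == 0)
                .peek(i -> threads.add(Thread.currentThread().getName()))
                .sequential()               // ...overridden for the whole pipeline
                .map(i -> i * 2)
                .sum();
        return threads;
    }

    public static void main(String[] args) {
        System.out.println(threadsUsed()); // only the calling thread, e.g. [main]
        System.out.println(IntStream.range(0, 10).parallel().sequential().isParallel()); // false
        System.out.println(IntStream.range(0, 10).sequential().parallel().isParallel()); // true
    }
}
```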

Treating parallelStream() as a first-class citizen improves programmers' perception of the library, leading them to write better code

I have a slightly different viewpoint about the value of this sequential intuition -- I view the pervasive “sequential expectation” as one of the biggest challenges of this entire effort; people are constantly bringing their incorrect sequential bias, which leads them to do stupid things like using a one-element array as a way to “trick” the “stupid” compiler into letting them capture a mutable local, or using lambdas as arguments to map that mutate state that will be used during the computation (in a non-thread-safe way), and then, when it’s pointed out what they’re doing, shrug it off and say “yeah, but I’m not doing it in parallel.”

We’ve made a lot of design tradeoffs to merge sequential and parallel streams. The result, I believe, is a clean one and will add to the library’s chances of still being useful in 10+ years, but I don’t particularly like the idea of encouraging people to think this is a sequential library with some parallel bags nailed on the side.
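The "one-element array" trick from the quote looks like the first method below (a hypothetical illustration, not code from the thread). The array reference is effectively final, so the lambda compiles, but under parallelStream() several threads perform unsynchronized read-modify-write updates on total[0], so updates can be lost. The second method expresses the same computation as a reduction with no shared mutable state, which is correct in either mode.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class SequentialBias {
    // Anti-pattern: capture a mutable "local" via a one-element array.
    // Concurrent += on the same slot is a data race in parallel mode.
    static long racySum(List<Integer> values) {
        long[] total = {0};
        values.parallelStream().forEach(v -> total[0] += v);
        return total[0];
    }

    // Stream-native version: a reduction, safe sequentially or in parallel.
    static long safeSum(List<Integer> values) {
        return values.parallelStream().mapToLong(Integer::longValue).sum();
    }

    public static void main(String[] args) {
        List<Integer> values = IntStream.rangeClosed(1, 1_000_000).boxed()
                .collect(Collectors.toList());
        System.out.println("safe: " + safeSum(values)); // 500000500000
        System.out.println("racy: " + racySum(values)); // nondeterministic; may be wrong
    }
}
```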
