Translated from: Should I always use a parallel stream when possible?
With Java 8 and lambdas, it's easy to iterate over collections as streams, and just as easy to use a parallel stream. Here are two examples from the docs, the second one using parallelStream:
myShapesCollection.stream()
.filter(e -> e.getColor() == Color.RED)
.forEach(e -> System.out.println(e.getName()));
myShapesCollection.parallelStream() // <-- This one uses parallel
.filter(e -> e.getColor() == Color.RED)
.forEach(e -> System.out.println(e.getName()));
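For readers who want to run the two snippets above, here is a self-contained sketch. The docs' Shape type isn't shown, so the Shape class and Color enum below are hypothetical stand-ins. The sketch collects names instead of printing them: a parallel forEach makes no ordering promise, but collect on an ordered source (a List) still returns results in encounter order, even in parallel.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-ins for the types used in the docs example.
enum Color { RED, GREEN, BLUE }

final class Shape {
    private final String name;
    private final Color color;

    Shape(String name, Color color) {
        this.name = name;
        this.color = color;
    }

    String getName() { return name; }
    Color getColor() { return color; }
}

class RedShapes {
    // Sequential pipeline: elements are processed one at a time, in list order.
    static List<String> redNamesSequential(List<Shape> shapes) {
        return shapes.stream()
                .filter(e -> e.getColor() == Color.RED)
                .map(Shape::getName)
                .collect(Collectors.toList());
    }

    // Parallel pipeline: work may be split across threads, but collect()
    // on an ordered source still returns results in encounter order.
    static List<String> redNamesParallel(List<Shape> shapes) {
        return shapes.parallelStream()
                .filter(e -> e.getColor() == Color.RED)
                .map(Shape::getName)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Shape> shapes = List.of(
                new Shape("a", Color.RED), new Shape("b", Color.BLUE),
                new Shape("c", Color.RED), new Shape("d", Color.GREEN));
        System.out.println(redNamesSequential(shapes)); // [a, c]
        System.out.println(redNamesParallel(shapes));   // [a, c]
        // forEach on a parallel stream may print "a" and "c" in either order;
        // forEachOrdered restores encounter order at some cost to parallelism.
        shapes.parallelStream()
                .filter(e -> e.getColor() == Color.RED)
                .forEachOrdered(e -> System.out.println(e.getName()));
    }
}
```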
As long as I don't care about the order, would it always be beneficial to use the parallel one? One would think that dividing the work across more cores is faster.
Are there other considerations? When should a parallel stream be used, and when should a sequential one be used?
(I'm asking this to trigger a discussion about how and when to use parallel streams, not because I think always using them is a good idea.)
#1
Reference: https://stackoom.com/question/1NUVs/如果可能-是否应该始终使用并行流
#2
A parallel stream has a much higher overhead than a sequential one. Coordinating the threads takes a significant amount of time. I would use sequential streams by default and only consider parallel ones if:
I have a massive number of items to process (or the processing of each item takes time and is parallelizable)
I have a performance problem in the first place
I don't already run the process in a multi-threaded environment (for example, in a web container that already processes many requests in parallel, adding another layer of parallelism inside each request could do more harm than good)
In your example, performance will be dominated by the synchronized access to System.out.println() anyway, and making this process parallel will have no effect, or even a negative one.
Moreover, remember that parallel streams don't magically solve all synchronization problems. If the predicates and functions in the pipeline use a shared resource, you have to make sure that everything is thread-safe. In particular, side effects are something you really have to worry about if you go parallel.
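To make the side-effect warning concrete, here is a minimal sketch (class and method names are illustrative). The commented-out pattern mutates a shared, non-thread-safe ArrayList from a parallel forEach; the working method instead lets a collector build the result, so no state is shared between threads.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class SideEffects {
    // UNSAFE pattern, shown only as a comment:
    //   List<Integer> out = new ArrayList<>();
    //   IntStream.range(0, n).parallel().filter(i -> i % 2 == 0).forEach(out::add);
    // ArrayList is not thread-safe, so concurrent add() calls from worker threads
    // can silently lose elements or throw ArrayIndexOutOfBoundsException.

    // SAFE: no shared mutable state. The collector gives each sub-task its own
    // container and merges them, and it preserves encounter order.
    static List<Integer> evens(int n) {
        return IntStream.range(0, n).parallel()
                .filter(i -> i % 2 == 0)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> result = evens(1_000_000);
        System.out.println(result.size()); // 500000
        System.out.println(result.get(3)); // 6
    }
}
```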
In any case, measure, don't guess! Only a measurement will tell you whether the parallelism is worth it.
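As a first look at "measure, don't guess", here is a crude timing sketch using only the JDK. Single-shot System.nanoTime timing is noisy (JIT warm-up, GC, other load on the machine), so treat it as a quick sanity check; for real decisions, a benchmark harness such as OpenJDK's JMH is the usual tool. CrudeTiming and its methods are illustrative names.

```java
import java.util.function.LongSupplier;
import java.util.stream.LongStream;

class CrudeTiming {
    // Results are written here so the JIT cannot treat the work as dead code.
    static volatile long blackhole;

    // Runs the task a few times so the JIT can compile it, then times one run.
    static long timeNanos(LongSupplier task, int warmups) {
        for (int i = 0; i < warmups; i++) {
            blackhole = task.getAsLong();
        }
        long start = System.nanoTime();
        blackhole = task.getAsLong();
        return System.nanoTime() - start;
    }

    static long sumSequential(long n) {
        return LongStream.rangeClosed(1, n).sum();
    }

    static long sumParallel(long n) {
        return LongStream.rangeClosed(1, n).parallel().sum();
    }

    public static void main(String[] args) {
        long n = 10_000_000L;
        long seq = timeNanos(() -> sumSequential(n), 5);
        long par = timeNanos(() -> sumParallel(n), 5);
        // Numbers vary by machine, JVM, and load; a single run proves little.
        System.out.printf("sequential: %d us, parallel: %d us%n", seq / 1_000, par / 1_000);
    }
}
```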
#3
JB hit the nail on the head. The only thing I can add is that Java 8 doesn't do pure parallel processing; it does "paraquential" processing. Yes, I wrote the article, and I've been doing F/J (fork/join) for thirty years, so I do understand the issue.
#4
The Stream API was designed to make it easy to write computations in a way that is abstracted away from how they will be executed, which makes switching between sequential and parallel execution easy.
However, just because it's easy doesn't mean it's always a good idea, and in fact, it is a bad idea to just drop .parallel() all over the place simply because you can.
First, note that parallelism offers no benefits other than the possibility of faster execution when more cores are available. A parallel execution will always involve more work than a sequential one, because in addition to solving the problem, it also has to dispatch and coordinate sub-tasks. The hope is that you'll be able to get to the answer faster by breaking up the work across multiple processors; whether this actually happens depends on a lot of things, including the size of your data set, how much computation you are doing on each element, the nature of the computation (specifically, does the processing of one element interact with the processing of others?), the number of processors available, and the number of other tasks competing for those processors.
Further, note that parallelism also often exposes nondeterminism in the computation that is hidden by sequential implementations; sometimes this doesn't matter, or can be mitigated by constraining the operations involved (i.e., reduction operators must be stateless and associative).
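A small sketch of why reduction operators must be associative. Addition gives the same answer however the range is split; subtraction violates the reduce contract (it is not associative, and 0 is not a true identity for it, since 0 - t != t), so the parallel result can depend on how the work was split. The class name is illustrative, and only the sequential results are stated as fixed values.

```java
import java.util.stream.IntStream;

class Associativity {
    // Associative operator: sequential and parallel results agree.
    static int sumSequential(int n) {
        return IntStream.rangeClosed(1, n).reduce(0, Integer::sum);
    }

    static int sumParallel(int n) {
        return IntStream.rangeClosed(1, n).parallel().reduce(0, Integer::sum);
    }

    // Non-associative operator: sequentially this computes 0 - 1 - 2 - ... - n.
    static int differenceSequential(int n) {
        return IntStream.rangeClosed(1, n).reduce(0, (a, b) -> a - b);
    }

    // In parallel, partial results of sub-ranges are combined with the same
    // lambda, so the answer depends on where the range was split.
    static int differenceParallel(int n) {
        return IntStream.rangeClosed(1, n).parallel().reduce(0, (a, b) -> a - b);
    }

    public static void main(String[] args) {
        System.out.println(sumSequential(1000) + " " + sumParallel(1000)); // 500500 500500
        System.out.println(differenceSequential(1000));                  // -500500
        System.out.println(differenceParallel(1000)); // typically not -500500
    }
}
```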
In reality, sometimes parallelism will speed up your computation, sometimes it will not, and sometimes it will even slow it down. It is best to develop first using sequential execution and then apply parallelism where (A) you know that there's actually a benefit to increased performance and (B) it will actually deliver increased performance. (A) is a business problem, not a technical one. If you are a performance expert, you'll usually be able to look at the code and determine (B), but the smart path is to measure. (And don't even bother until you're convinced of (A); if the code is fast enough, better to apply your brain cycles elsewhere.)
The simplest performance model for parallelism is the "NQ" model, where N is the number of elements and Q is the computation per element. In general, you need the product NQ to exceed some threshold before you start getting a performance benefit. For a low-Q problem like "add up the numbers from 1 to N", you will generally see a breakeven between N=1000 and N=10000. With higher-Q problems, you'll see breakevens at lower thresholds.
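The low-Q versus high-Q distinction can be sketched as two workloads. Summing a range costs about one addition per element (low Q), while a trial-division primality test (a hypothetical workload chosen here for illustration) costs up to about sqrt(x) divisions per element (higher Q). The sketch only shows both computations give the same answer either way; where the breakeven actually falls on a given machine has to be measured.

```java
import java.util.stream.IntStream;
import java.util.stream.LongStream;

class NqExamples {
    // Low Q: one addition per element, so N must be large before splitting
    // and combining pay for themselves.
    static long sumTo(long n, boolean parallel) {
        LongStream s = LongStream.rangeClosed(1, n);
        return (parallel ? s.parallel() : s).sum();
    }

    // Higher Q: trial division does up to sqrt(x) work per element.
    static boolean isPrime(int x) {
        if (x < 2) return false;
        for (int d = 2; (long) d * d <= x; d++) {
            if (x % d == 0) return false;
        }
        return true;
    }

    static long countPrimesBelow(int n, boolean parallel) {
        IntStream s = IntStream.range(0, n);
        return (parallel ? s.parallel() : s).filter(NqExamples::isPrime).count();
    }

    public static void main(String[] args) {
        System.out.println(sumTo(1_000_000, true));          // 500000500000
        System.out.println(countPrimesBelow(100_000, true)); // 9592
    }
}
```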
But the reality is quite complicated. So until you achieve experthood, first identify when sequential processing is actually costing you something, and then measure whether parallelism will help.
#5
I watched one of Brian Goetz's presentations (he is the Java Language Architect and specification lead for Lambda Expressions). He explains in detail the following four points to consider before going for parallelization:
Splitting / decomposition costs
– Sometimes splitting is more expensive than just doing the work!
Task dispatch / management costs
– You can do a lot of work in the time it takes to hand work to another thread.
Result combination costs
– Sometimes combination involves copying lots of data. For example, adding numbers is cheap, whereas merging sets is expensive.
Locality
– The elephant in the room. This is an important point that everyone may miss. You should consider cache misses: if a CPU waits for data because of cache misses, you won't gain anything by parallelization. That's why array-based sources parallelize best: the next indices (near the current index) are cached, so there is less chance that the CPU will experience a cache miss.
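The "result combination costs" point can be sketched with two reductions (names are illustrative). A parallel sum merges partial results with a single addition, while collecting into a HashSet gives each sub-task its own set and then merges them with HashSet::addAll, which re-inserts every element of one partial set into another.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.stream.IntStream;

class CombineCost {
    // Cheap combine: each sub-task returns a partial sum, merged with one addition.
    static long sumParallel(int n) {
        return IntStream.range(0, n).parallel().asLongStream().sum();
    }

    // Expensive combine: each sub-task fills its own HashSet (supplier + accumulator),
    // and the combiner HashSet::addAll copies one partial set into another,
    // so merging costs time proportional to the size of the sets.
    static Set<Integer> distinctRemaindersParallel(int n, int mod) {
        return IntStream.range(0, n).parallel()
                .map(i -> i % mod)
                .<HashSet<Integer>>collect(HashSet::new, HashSet::add, HashSet::addAll);
    }

    public static void main(String[] args) {
        System.out.println(sumParallel(1_000));                              // 499500
        System.out.println(distinctRemaindersParallel(10_000, 100).size()); // 100
    }
}
```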
He also mentions a relatively simple formula to estimate the chance of a parallel speedup.
NQ model:
N x Q > 10000
where
N = number of data items
Q = amount of work per item
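The rule of thumb can be written down as a tiny helper. worthMeasuringParallel is a hypothetical name, and Q is only a rough per-element cost estimate (about 1 for a single addition), so a true result means "worth measuring", not "will be faster". The check is written as n > 10000 / q to avoid overflowing n * q; for positive integers the two are equivalent.

```java
class NqRuleOfThumb {
    // Threshold from the N x Q > 10000 rule of thumb.
    static final long THRESHOLD = 10_000L;

    // Hypothetical helper: true when N x Q exceeds the threshold.
    // n and q must be positive; q is a rough per-element cost estimate.
    static boolean worthMeasuringParallel(long n, long q) {
        if (n <= 0 || q <= 0) {
            throw new IllegalArgumentException("n and q must be positive");
        }
        // Integer division keeps this overflow-free and equivalent to n * q > THRESHOLD.
        return n > THRESHOLD / q;
    }

    public static void main(String[] args) {
        System.out.println(worthMeasuringParallel(1_000, 1));     // false: 1,000 <= 10,000
        System.out.println(worthMeasuringParallel(1_000_000, 1)); // true
        System.out.println(worthMeasuringParallel(100, 1_000));   // true: 100,000 > 10,000
    }
}
```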
#6
Other answers have already covered profiling to avoid premature optimization, and the overhead cost of parallel processing. This answer explains the ideal choice of data structures for parallel streaming.
As a rule, performance gains from parallelism are best on streams over ArrayList, HashMap, HashSet, and ConcurrentHashMap instances; arrays; int ranges; and long ranges. What these data structures have in common is that they can all be accurately and cheaply split into subranges of any desired sizes, which makes it easy to divide work among parallel threads. The abstraction used by the streams library to perform this task is the spliterator, which is returned by the spliterator method on Stream and Iterable.
Another important factor that all of these data structures have in common is that they provide good-to-excellent locality of reference when processed sequentially: sequential element references are stored together in memory. The objects referred to by those references may not be close to one another in memory, which reduces locality of reference. Locality of reference turns out to be critically important for parallelizing bulk operations: without it, threads spend much of their time idle, waiting for data to be transferred from memory into the processor's cache. The data structures with the best locality of reference are primitive arrays, because the data itself is stored contiguously in memory.
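A short sketch of what "accurately and cheaply split" means in practice. ArrayList's spliterator is documented to report SIZED, SUBSIZED, and ORDERED, and in OpenJDK's implementation trySplit cuts it at the midpoint (an implementation detail, not a spec guarantee). A spliterator wrapped around a plain Iterator has no size information, so it has to split by copying elements into arrays batch by batch. SplitDemo and its methods are illustrative names.

```java
import java.util.ArrayList;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class SplitDemo {
    static ArrayList<Integer> numbers(int n) {
        return IntStream.range(0, n).boxed()
                .collect(Collectors.toCollection(ArrayList::new));
    }

    // ArrayList knows its exact size, and each split half knows its size too.
    static boolean arrayListIsSubsized(int n) {
        return numbers(n).spliterator().hasCharacteristics(Spliterator.SUBSIZED);
    }

    // trySplit returns a spliterator for a prefix; 'right' keeps the rest.
    static long[] splitArrayList(int n) {
        Spliterator<Integer> right = numbers(n).spliterator();
        Spliterator<Integer> left = right.trySplit();
        return new long[] { left.estimateSize(), right.estimateSize() };
    }

    // An Iterator-backed spliterator has no size to balance the split against.
    static boolean iteratorSourceIsSized(int n) {
        Spliterator<Integer> s = Spliterators.spliteratorUnknownSize(
                numbers(n).iterator(), Spliterator.ORDERED);
        return s.hasCharacteristics(Spliterator.SIZED);
    }

    public static void main(String[] args) {
        long[] halves = splitArrayList(1_000);
        System.out.println(halves[0] + " / " + halves[1]);  // 500 / 500 on OpenJDK
        System.out.println(arrayListIsSubsized(1_000));     // true
        System.out.println(iteratorSourceIsSized(1_000));   // false
    }
}
```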
Source: Item 48, "Use caution when making streams parallel", Effective Java, 3rd edition, by Joshua Bloch