With Java 8 and lambdas it's easy to iterate over collections as streams, and just as easy to use a parallel stream. Two examples from the documentation, the second one using parallelStream:

myShapesCollection.stream()
    .filter(e -> e.getColor() == Color.RED)
    .forEach(e -> System.out.println(e.getName()));

myShapesCollection.parallelStream() // <-- This one uses parallel
    .filter(e -> e.getColor() == Color.RED)
    .forEach(e -> System.out.println(e.getName()));

Is it always beneficial to use parallel streams as long as I don't care about the order? One would think that splitting the work over more cores is faster.

Are there other considerations? When should a parallel stream be used and when a non-parallel one?

(This question is asked to trigger a discussion about how and when to use parallel streams, not because I think always using them is a good idea.)


Current answer

A parallel stream has a much higher overhead compared to a sequential one. Coordinating the threads takes a significant amount of time. I would use sequential streams by default and only consider parallel ones if:

- I have a huge amount of items to process (or the processing of each item takes time and is parallelizable)
- I have a performance problem in the first place
- I don't already run the process in a multi-threaded environment (for example, in a web container: if I already have many requests being processed in parallel, adding an additional layer of parallelism inside each request could have more negative than positive effects)

In your example, the performance is in any case driven by the synchronized access to System.out.println(), and making this process parallel will have no effect, or even a negative one.

Moreover, remember that parallel streams don't magically solve all synchronization problems. If a shared resource is used by the predicates and functions in the pipeline, you have to make sure everything is thread-safe. In particular, side effects are things you really have to worry about if you go parallel.
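
As an illustration of the side-effect problem, here is a minimal, self-contained sketch (the class name SideEffectDemo is just for this example) showing a racy shared counter versus expressing the same result as a reduction:

    import java.util.stream.IntStream;

    public class SideEffectDemo {
        public static void main(String[] args) {
            // Unsafe: unsynchronized increments from multiple worker threads race
            // with each other, so the final count is usually lower than expected.
            int[] counter = new int[1];
            IntStream.range(0, 1_000_000).parallel().forEach(i -> counter[0]++);
            System.out.println("racy count:    " + counter[0]);   // typically < 1_000_000

            // Safe: no shared mutable state; the stream accumulates the result itself.
            long count = IntStream.range(0, 1_000_000).parallel().count();
            System.out.println("correct count: " + count);        // always 1_000_000
        }
    }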

In any case, measure, don't guess! Only a measurement will tell you whether the parallelism is worth it.
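
In that spirit, a rough timing sketch (illustrative only; for serious measurements a harness such as JMH is the right tool, because naive timing like this is affected by JIT warm-up and GC):

    import java.util.stream.LongStream;

    public class QuickTiming {
        public static void main(String[] args) {
            long n = 100_000_000L;

            // A few warm-up passes so the JIT has compiled the hot paths before timing.
            for (int i = 0; i < 3; i++) {
                LongStream.rangeClosed(1, n).map(x -> x * 2).sum();
                LongStream.rangeClosed(1, n).parallel().map(x -> x * 2).sum();
            }

            long t0 = System.nanoTime();
            long seq = LongStream.rangeClosed(1, n).map(x -> x * 2).sum();
            long t1 = System.nanoTime();
            long par = LongStream.rangeClosed(1, n).parallel().map(x -> x * 2).sum();
            long t2 = System.nanoTime();

            System.out.printf("sequential: %d ms, parallel: %d ms, results equal: %b%n",
                    (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, seq == par);
        }
    }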

Other answers

The other answers have already covered profiling to avoid premature optimization and the overhead cost of parallel processing. This answer explains the ideal choice of data structures for parallel streaming.

As a rule, performance gains from parallelism are best on streams over ArrayList, HashMap, HashSet, and ConcurrentHashMap instances; arrays; int ranges; and long ranges. What these data structures have in common is that they can all be accurately and cheaply split into subranges of any desired sizes, which makes it easy to divide work among parallel threads. The abstraction used by the streams library to perform this task is the spliterator, which is returned by the spliterator method on Stream and Iterable. Another important factor that all of these data structures have in common is that they provide good-to-excellent locality of reference when processed sequentially: sequential element references are stored together in memory. The objects referred to by those references may not be close to one another in memory, which reduces locality-of-reference. Locality-of-reference turns out to be critically important for parallelizing bulk operations: without it, threads spend much of their time idle, waiting for data to be transferred from memory into the processor's cache. The data structures with the best locality of reference are primitive arrays because the data itself is stored contiguously in memory.

Source: Effective Java, 3rd Edition, by Joshua Bloch, "Use caution when making streams parallel"
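
To make the splittability point concrete, a small sketch (class name and numbers are just for illustration; absolute timings will vary by machine) comparing the same parallel sum over an int range, which splits cheaply and has ideal locality, and over a LinkedList, which splits poorly and chases references:

    import java.util.LinkedList;
    import java.util.List;
    import java.util.stream.Collectors;
    import java.util.stream.IntStream;

    public class SourceMatters {
        static long timeMs(Runnable task) {
            long start = System.nanoTime();
            task.run();
            return (System.nanoTime() - start) / 1_000_000;
        }

        public static void main(String[] args) {
            int n = 10_000_000;

            // LinkedList: the spliterator must walk nodes to split, and the boxed
            // elements are scattered across the heap (poor locality of reference).
            List<Integer> linked = IntStream.range(0, n).boxed()
                    .collect(Collectors.toCollection(LinkedList::new));

            System.out.println("int range, parallel sum:  "
                    + timeMs(() -> IntStream.range(0, n).parallel().asLongStream().sum()) + " ms");
            System.out.println("LinkedList, parallel sum: "
                    + timeMs(() -> linked.parallelStream().mapToLong(Integer::longValue).sum()) + " ms");
        }
    }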

Never parallelize an infinite stream with a limit. Here is what happens:

    public static void main(String[] args) {
        // let's count to 1 in parallel
        System.out.println(
            IntStream.iterate(0, i -> i + 1)
                .parallel()
                .skip(1)
                .findFirst()
                .getAsInt());
    }

Result:

    Exception in thread "main" java.lang.OutOfMemoryError
        at ...
        at java.base/java.util.stream.IntPipeline.findFirst(IntPipeline.java:528)
        at InfiniteTest.main(InfiniteTest.java:24)
    Caused by: java.lang.OutOfMemoryError: Java heap space
        at java.base/java.util.stream.SpinedBuffer$OfInt.newArray(SpinedBuffer.java:750)
        at ...

The same happens if you use .limit(…)

Explanation: see "Java 8, using .parallel in a stream causes an OOM error"
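
If parallelism is wanted anyway, one way out (a sketch) is to make the source finite and sized up front, so the pipeline can split it into bounded chunks instead of buffering an unbounded prefix:

    import java.util.stream.IntStream;

    public class BoundedSource {
        public static void main(String[] args) {
            // A finite range has a known size, so it splits into bounded chunks
            // and the pipeline never has to buffer an unbounded number of elements.
            int result = IntStream.range(0, 1_000)
                    .parallel()
                    .skip(1)
                    .findFirst()
                    .getAsInt();
            System.out.println(result);   // prints 1
        }
    }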

Similarly, don't use parallel if the stream is ordered and has far more elements than you want to process, e.g.:

public static void main(String[] args) {
    // find the element after the first 100, in parallel
    System.out.println(
            IntStream.range(1, 1000_000_000)
                    .parallel()
                    .skip(100)
                    .findFirst()
                    .getAsInt());
}

This may run much longer, because the parallel threads may work on plenty of number ranges instead of the crucial 0-100 one, causing this to take a very long time.
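
Two cheaper formulations of the same query, sketched below: narrow the source to roughly the part you actually care about, or keep the big range but stay sequential so evaluation stops as soon as the element is found:

    import java.util.stream.IntStream;

    public class SkipCheaply {
        public static void main(String[] args) {
            // Option 1: don't feed the pipeline a billion elements when only
            // the first ~101 matter; narrow the source first.
            System.out.println(
                    IntStream.range(1, 200)
                            .skip(100)
                            .findFirst()
                            .getAsInt());   // 101

            // Option 2: keep the big range but stay sequential; an ordered
            // sequential skip/findFirst is lazy and stops after 101 elements.
            System.out.println(
                    IntStream.range(1, 1000_000_000)
                            .skip(100)
                            .findFirst()
                            .getAsInt());   // 101
        }
    }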

Collection.parallelStream() is a great way to do work in parallel. However, you need to keep in mind that under the hood it uses a common thread pool with only a few worker threads (by default the number of threads equals the number of CPU cores), see ForkJoinPool.commonPool(). If some of the pool's tasks are long-running, I/O-bound work, then other, potentially fast, parallelStream calls will get stuck waiting for free pool threads. This obviously leads to the requirement that fork-join tasks be non-blocking and short or, in other words, CPU-bound. For a better understanding of the details I strongly recommend careful reading of the java.util.concurrent.ForkJoinTask javadoc; here are some relevant quotes:

The efficiency of ForkJoinTasks stems from ... their main use as computational tasks calculating pure functions or operating on purely isolated objects.

Computations should ideally avoid synchronized methods or blocks, and should minimize other blocking synchronization

Subdividable tasks should also not perform blocking I/O

This indicates that the main purpose of parallelStream() tasks is short computations over isolated in-memory structures. It is also worth checking out the article Common parallel stream pitfalls.
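
A commonly used workaround, sketched below, is to run the parallel pipeline inside a dedicated ForkJoinPool so it cannot starve the shared commonPool(). Note that this relies on an implementation detail (a parallel stream uses the fork-join pool it is invoked from) rather than on anything guaranteed by the specification, and slowOperation() here is just a stand-in for slow, possibly blocking per-element work:

    import java.util.List;
    import java.util.concurrent.ForkJoinPool;
    import java.util.stream.Collectors;
    import java.util.stream.IntStream;

    public class DedicatedPool {
        public static void main(String[] args) throws Exception {
            // Dedicated pool for this workload; other parallelStream users keep
            // the common pool to themselves.
            ForkJoinPool pool = new ForkJoinPool(4);
            try {
                List<Integer> result = pool.submit(() ->
                        IntStream.range(0, 100)
                                .parallel()                    // runs in 'pool', not commonPool()
                                .map(DedicatedPool::slowOperation)
                                .boxed()
                                .collect(Collectors.toList())
                ).get();
                System.out.println(result.size());
            } finally {
                pool.shutdown();
            }
        }

        // Stand-in for slow (e.g. I/O-bound) per-element work.
        private static int slowOperation(int i) {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return i * 2;
        }
    }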

I watched a presentation by Brian Goetz (Java Language Architect & specification lead for Lambda Expressions). He explains in detail the following 4 points to consider before going for parallelization:

- Splitting / decomposition costs – Sometimes splitting is more expensive than just doing the work!
- Task dispatch / management costs – Can do a lot of work in the time it takes to hand work to another thread.
- Result combination costs – Sometimes combination involves copying lots of data. For example, adding numbers is cheap whereas merging sets is expensive (see the sketch after this list).
- Locality – The elephant in the room. This is an important point which everyone may miss. You should consider cache misses: if a CPU waits for data because of cache misses, then you wouldn't gain anything by parallelization. That's why array-based sources parallelize the best, as the next indices (near the current index) are cached and there are fewer chances that the CPU would experience a cache miss.
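
A sketch of the result-combination point (illustrative sizes only): reducing to a number merges partial results with a single addition per combine step, while collecting into a Set forces each merge to copy one partial set into another element by element:

    import java.util.Set;
    import java.util.stream.Collectors;
    import java.util.stream.IntStream;

    public class CombineCosts {
        public static void main(String[] args) {
            int n = 5_000_000;

            // Cheap to combine: each pair of partial sums merges with one addition.
            long sum = IntStream.range(0, n).parallel().asLongStream().sum();

            // Expensive to combine: each merge copies one partial HashSet into
            // another, element by element, on top of the per-element boxing.
            Set<Integer> set = IntStream.range(0, n).parallel()
                    .boxed()
                    .collect(Collectors.toSet());

            System.out.println(sum + " / " + set.size());
        }
    }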

He also mentioned a relatively simple formula to estimate the chance of a parallel speedup.

The NQ model:

N x Q > 10000

where N = number of data items and Q = amount of work per item
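
To put illustrative (made-up) numbers on it: summing 10,000 ints is roughly N = 10,000 with Q around a single cheap operation, so N x Q is only about 10,000, right at the threshold, and the parallel overhead will usually eat any gain. Processing 5,000 items where each one costs thousands of operations (say, parsing a sizeable document) puts N x Q in the millions, so parallelism is at least worth measuring.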
