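With Java 8 and lambdas it's easy to iterate over collections as streams, and just as easy to use a parallel stream. Two examples from the docs, the second one using parallelStream: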

    myShapesCollection.stream()
        .filter(e -> e.getColor() == Color.RED)
        .forEach(e -> System.out.println(e.getName()));

    myShapesCollection.parallelStream() // <-- This one uses parallel
        .filter(e -> e.getColor() == Color.RED)
        .forEach(e -> System.out.println(e.getName()));
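The two pipelines above cannot run without the docs' surrounding types. The sketch below is self-contained, with the `Shape` class, `Color` enum and sample data as hypothetical stand-ins (not the docs' actual types). It also illustrates a difference that matters once you switch to `parallelStream()`: `collect(Collectors.toList())` keeps encounter order, while `forEach` on a parallel stream gives no ordering guarantee.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ShapesDemo {
    // Hypothetical stand-ins for the Shape and Color types used in the docs example.
    enum Color { RED, BLUE }

    static final class Shape {
        private final String name;
        private final Color color;

        Shape(String name, Color color) {
            this.name = name;
            this.color = color;
        }

        String getName() { return name; }
        Color getColor() { return color; }
    }

    static final List<Shape> SHAPES = Arrays.asList(
        new Shape("circle", Color.RED),
        new Shape("square", Color.BLUE),
        new Shape("triangle", Color.RED),
        new Shape("hexagon", Color.RED));

    // Same filter, sequential or parallel; collect(toList()) keeps encounter order either way.
    static List<String> redNames(boolean parallel) {
        return (parallel ? SHAPES.parallelStream() : SHAPES.stream())
            .filter(e -> e.getColor() == Color.RED)
            .map(Shape::getName)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(redNames(false)); // [circle, triangle, hexagon]
        System.out.println(redNames(true));  // [circle, triangle, hexagon]

        // forEach on a parallel stream has no ordering guarantee: these lines may appear in any order.
        SHAPES.parallelStream()
            .filter(e -> e.getColor() == Color.RED)
            .forEach(e -> System.out.println(e.getName()));
    }
}
```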






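I would use sequential streams by default and only consider parallel ones if:

- I have a massive number of items to process (or the processing of each item takes time and is parallelizable)
- I have a performance problem in the first place
- I don't already run the process in a multi-threaded environment (for example, in a web container, if I already have many requests being processed in parallel, adding an additional layer of parallelism inside each request could have more negative than positive effects)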
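The checklist above can be turned into a small helper, sketched here under clear assumptions: `streamFor`, its size threshold and the `alreadyConcurrent` flag are hypothetical names, not part of any library. Only the size check and the "already multi-threaded" check are encoded; whether there is a real performance problem in the first place still has to be measured by the caller.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.stream.Stream;

public class StreamChoice {
    // Hypothetical helper: sequential by default. It only goes parallel when the
    // input has at least minSizeForParallel elements and the caller says it is not
    // already running inside a busy multi-threaded environment (e.g. a web request).
    // Whether there is an actual performance problem is left to the caller to measure.
    static <T> Stream<T> streamFor(Collection<T> items, int minSizeForParallel,
                                   boolean alreadyConcurrent) {
        boolean goParallel = !alreadyConcurrent && items.size() >= minSizeForParallel;
        return goParallel ? items.parallelStream() : items.stream();
    }

    public static void main(String[] args) {
        List<Integer> big = new ArrayList<>();
        for (int i = 0; i < 50_000; i++) big.add(i);
        List<Integer> small = Arrays.asList(1, 2, 3);

        System.out.println(streamFor(big, 10_000, false).isParallel());   // true
        System.out.println(streamFor(big, 10_000, true).isParallel());    // false
        System.out.println(streamFor(small, 10_000, false).isParallel()); // false
    }
}
```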






As a rule, performance gains from parallelism are best on streams over ArrayList, HashMap, HashSet, and ConcurrentHashMap instances; arrays; int ranges; and long ranges. What these data structures have in common is that they can all be accurately and cheaply split into subranges of any desired sizes, which makes it easy to divide work among parallel threads. The abstraction used by the streams library to perform this task is the spliterator, which is returned by the spliterator method on Stream and Iterable. Another important factor that all of these data structures have in common is that they provide good-to-excellent locality of reference when processed sequentially: sequential element references are stored together in memory. The objects referred to by those references may not be close to one another in memory, which reduces locality-of-reference. Locality-of-reference turns out to be critically important for parallelizing bulk operations: without it, threads spend much of their time idle, waiting for data to be transferred from memory into the processor’s cache. The data structures with the best locality of reference are primitive arrays because the data itself is stored contiguously in memory.
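To see what "accurately and cheaply split" means in practice, the sketch below calls `trySplit()` once on an `ArrayList` spliterator. On OpenJDK the `ArrayList` spliterator splits its remaining index range in half without touching the elements, and it reports the `SUBSIZED` characteristic, so every split-off part knows its exact size. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;

public class SplitDemo {
    // Builds an ArrayList of n elements, splits its spliterator once, and returns
    // the estimated sizes of the split-off prefix and of the remaining part.
    static long[] splitOnce(int n) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < n; i++) list.add(i);
        Spliterator<Integer> rest = list.spliterator();
        Spliterator<Integer> prefix = rest.trySplit(); // null if the range is too small to split
        long prefixSize = (prefix == null) ? 0 : prefix.estimateSize();
        return new long[] { prefixSize, rest.estimateSize() };
    }

    // SUBSIZED means every part produced by trySplit() also knows its exact size.
    static boolean arrayListIsSubsized() {
        return new ArrayList<Integer>().spliterator().hasCharacteristics(Spliterator.SUBSIZED);
    }

    public static void main(String[] args) {
        long[] sizes = splitOnce(1_000);
        System.out.println("prefix=" + sizes[0] + ", rest=" + sizes[1]); // prefix=500, rest=500
        System.out.println("SUBSIZED: " + arrayListIsSubsized());        // SUBSIZED: true
    }
}
```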

Source: Effective Java, 3rd Edition, by Joshua Bloch, Item 48: "Use caution when making streams parallel"

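I watched a presentation by Brian Goetz (Java Language Architect & specification lead for Lambda Expressions). He explains in detail the following 4 points to consider before going for parallelization: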

- Splitting / decomposition costs: sometimes splitting is more expensive than just doing the work!
- Task dispatch / management costs: you can do a lot of work in the time it takes to hand work to another thread.
- Result combination costs: sometimes combination involves copying lots of data. For example, adding numbers is cheap, whereas merging sets is expensive.
- Locality: the elephant in the room. This is an important point that everyone may miss. You should consider cache misses: if a CPU waits for data because of cache misses, then you won't gain anything by parallelization. That's why array-based sources parallelize best: the next indices (near the current index) are cached, so there is less chance that the CPU experiences a cache miss.
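The "result combination costs" point can be made concrete with the three-argument forms of `reduce` and `collect`, where the combining step is spelled out. In the sketch below (class and method names are illustrative), partial sums are merged with one addition, while partial `HashSet`s are merged with `addAll`, which re-inserts every element of one set into the other. Both give the same result as a sequential run; only the cost of combining differs.

```java
import java.util.HashSet;
import java.util.stream.IntStream;

public class CombineCost {
    // Cheap combine: each thread produces a partial sum, and partial results are
    // merged with a single long addition (Long::sum).
    static long parallelSum(int n) {
        return IntStream.range(0, n).parallel().asLongStream().reduce(0L, Long::sum);
    }

    // Expensive combine: each thread fills its own HashSet (accumulator HashSet::add),
    // and partial sets are merged with HashSet::addAll, which re-inserts every
    // element of one set into the other.
    static HashSet<Integer> parallelToSet(int n) {
        return IntStream.range(0, n).parallel().boxed()
                .collect(HashSet::new, HashSet::add, HashSet::addAll);
    }

    public static void main(String[] args) {
        System.out.println(parallelSum(1_000));          // 499500
        System.out.println(parallelToSet(1_000).size()); // 1000
    }
}
```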



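A relatively simple formula for estimating the chance of a parallel speedup is the NQ model:

N x Q > 10000

where N = number of data items and Q = amount of work per item.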
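As a rough worked example, assuming Q is counted in simple operations per element (the model itself does not fix a unit), summing 1,000 numbers falls well below the threshold, while doing about 100 operations on each of those same 1,000 elements clears it:

```latex
\begin{aligned}
N = 1\,000,\; Q \approx 1   &\;\Rightarrow\; N \times Q = 1\,000   < 10\,000 \quad \text{(parallel unlikely to help)} \\
N = 1\,000,\; Q \approx 100 &\;\Rightarrow\; N \times Q = 100\,000 > 10\,000 \quad \text{(worth measuring)}
\end{aligned}
```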

Collection.parallelStream() is a great way to do work in parallel. However, keep in mind that it effectively uses a common thread pool with only a few worker threads internally: roughly one per CPU core (by default the pool has one worker fewer than the number of available processors, because the thread that starts the stream also helps); see ForkJoinPool.commonPool(). If some of the pool's tasks are long-running, I/O-bound work, then other, potentially fast, parallelStream calls will get stuck waiting for free pool threads. This leads to the requirement that fork-join tasks be non-blocking and short, or, in other words, CPU-bound. For a better understanding of the details, I strongly recommend a careful reading of the java.util.concurrent.ForkJoinTask javadoc; here are some relevant quotes:
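To check which threads a parallel stream really uses, the sketch below records the name of every thread that processes an element. On a typical run this is the calling thread plus a few `ForkJoinPool.commonPool-worker-N` threads, i.e. the same shared pool every other `parallelStream()` call in the JVM competes for. The printed parallelism depends on the machine and on the `java.util.concurrent.ForkJoinPool.common.parallelism` system property; the class and method names are illustrative.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

public class CommonPoolDemo {
    // Records which threads actually process the elements of a parallel stream.
    static Set<String> threadsUsed() {
        Set<String> names = ConcurrentHashMap.newKeySet(); // thread-safe set
        IntStream.range(0, 100_000).parallel()
                 .forEach(i -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        System.out.println("available processors:    " + Runtime.getRuntime().availableProcessors());
        System.out.println("common pool parallelism: " + ForkJoinPool.commonPool().getParallelism());
        // Typically "main" plus some "ForkJoinPool.commonPool-worker-N" threads.
        System.out.println("threads used: " + threadsUsed());
    }
}
```

Because the pool is shared, a blocking call inside one of these `forEach` lambdas ties up a worker that unrelated parallel streams elsewhere in the application may be waiting for.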






    import java.util.stream.IntStream;

    public class InfiniteTest {
        public static void main(String[] args) {
            // let's count to 1 in parallel
            System.out.println(
                IntStream.iterate(0, i -> i + 1)
                    .parallel()
                    .skip(1)
                    .findFirst()
                    .getAsInt());
        }
    }

Result:

    Exception in thread "main" java.lang.OutOfMemoryError
        at ...
        at java.base/java.util.stream.IntPipeline.findFirst(IntPipeline.java:528)
        at InfiniteTest.main(InfiniteTest.java:24)
    Caused by: java.lang.OutOfMemoryError: Java heap space
        at java.base/java.util.stream.SpinedBuffer$OfInt.newArray(SpinedBuffer.java:750)
        at ...


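Explanation: Java 8, using .parallel in a stream causes OOM error. Because skip() on an ordered parallel stream must respect encounter order, the worker threads keep splitting off and buffering chunks of the unbounded iterate() source, and that buffer eventually exhausts the heap.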
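Two ways to get the intended answer without the unbounded buffering are sketched below (names are illustrative): keep the infinite `iterate()` source sequential, where `skip` and `findFirst` are lazy and short-circuit, or, if a parallel stream is really wanted, use a bounded, sized source such as `IntStream.range`. For a task this small, neither version benefits from parallelism; the point is only that the parallel version no longer runs out of memory.

```java
import java.util.stream.IntStream;

public class BoundedCount {
    // Sequential: iterate() generates elements lazily and findFirst() short-circuits,
    // so only the first couple of elements are ever produced.
    static int sequentialCountToOne() {
        return IntStream.iterate(0, i -> i + 1).skip(1).findFirst().getAsInt();
    }

    // Parallel, but over a bounded, sized source: range() knows its size and
    // splits cheaply, so skip() never faces an unbounded input.
    // Assumes upperBound >= 2, otherwise getAsInt() throws.
    static int parallelBoundedCountToOne(int upperBound) {
        return IntStream.range(0, upperBound).parallel().skip(1).findFirst().getAsInt();
    }

    public static void main(String[] args) {
        System.out.println(sequentialCountToOne());           // 1
        System.out.println(parallelBoundedCountToOne(1_000)); // 1
    }
}
```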


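    public static void main(String[] args) {
        // skip 100 elements in parallel, then take the next one (prints 101);
        // the skip count here is illustrative
        System.out.println(IntStream.range(1, 1000_000_000)
            .parallel().skip(100).findFirst().getAsInt());
    }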




First, note that parallelism offers no benefits other than the possibility of faster execution when more cores are available. A parallel execution will always involve more work than a sequential one, because in addition to solving the problem, it also has to perform dispatching and coordinating of sub-tasks. The hope is that you'll be able to get to the answer faster by breaking up the work across multiple processors; whether this actually happens depends on a lot of things, including the size of your data set, how much computation you are doing on each element, the nature of the computation (specifically, does the processing of one element interact with processing of others?), the number of processors available, and the number of other tasks competing for those processors.
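A minimal sketch of the favorable case described above: CPU-bound work where each element is processed independently (no shared mutable state) and results are combined by the stream's own `sum()`. The per-element function `expensive` is a hypothetical stand-in for real work. The timing is deliberately naive: the first run also pays JIT warm-up and the numbers depend on core count and machine load, so treat them as a hint and use a harness such as JMH for real measurements. Only the equality of the two results is guaranteed.

```java
import java.util.stream.IntStream;

public class IndependentWork {
    // Hypothetical CPU-bound work whose result depends only on x,
    // so processing one element never interacts with processing another.
    static long expensive(int x) {
        long acc = x;
        for (int i = 0; i < 1_000; i++) {
            acc = (acc * 31 + i) % 1_000_003;
        }
        return acc;
    }

    // Partial results are combined by the stream (sum), not written to shared state.
    static long total(boolean parallel) {
        IntStream s = IntStream.range(0, 20_000);
        if (parallel) s = s.parallel();
        return s.mapToLong(IndependentWork::expensive).sum();
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        long seq = total(false);
        long t1 = System.nanoTime();
        long par = total(true);
        long t2 = System.nanoTime();
        System.out.println("sequential: " + seq + " in " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("parallel:   " + par + " in " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```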






