我有一个列表myListToParse,我想在其中过滤元素并对每个元素应用一个方法,并将结果添加到另一个列表myFinalList中。

在Java 8中,我注意到我可以用两种不同的方式来做到这一点。我想知道他们之间更有效的方式,并了解为什么一种方式比另一种更好。

我愿意接受任何关于第三条路的建议。

方法1:

myFinalList = new ArrayList<>();
myListToParse.stream()
        .filter(elt -> elt != null)
        .forEach(elt -> myFinalList.add(doSomething(elt)));

方法2:

myFinalList = myListToParse.stream()
        .filter(elt -> elt != null)
        .map(elt -> doSomething(elt))
        .collect(Collectors.toList()); 

当前回答

还有第三种选择-使用stream(). toarray() -请参阅“为什么stream没有toList方法”下的注释。它比forEach()或collect()慢,表达能力也更差。它可能会在以后的JDK构建中得到优化,所以在这里添加它以防万一。

假设列表<字符串>

    myFinalList = Arrays.asList(
            myListToParse.stream()
                    .filter(Objects::nonNull)
                    .map(this::doSomething)
                    .toArray(String[]::new)
    );

在doSomething()中有一个微型基准测试,1M个条目,20%的空值和简单的转换

private LongSummaryStatistics benchmark(final String testName, final Runnable methodToTest, int samples) {
    long[] timing = new long[samples];
    for (int i = 0; i < samples; i++) {
        long start = System.currentTimeMillis();
        methodToTest.run();
        timing[i] = System.currentTimeMillis() - start;
    }
    final LongSummaryStatistics stats = Arrays.stream(timing).summaryStatistics();
    System.out.println(testName + ": " + stats);
    return stats;
}

结果如下

并行:

toArray: LongSummaryStatistics{count=10, sum=3721, min=321, average=372,100000, max=535}
forEach: LongSummaryStatistics{count=10, sum=3502, min=249, average=350,200000, max=389}
collect: LongSummaryStatistics{count=10, sum=3325, min=265, average=332,500000, max=368}

顺序:

toArray: LongSummaryStatistics{count=10, sum=5493, min=517, average=549,300000, max=569}
forEach: LongSummaryStatistics{count=10, sum=5316, min=427, average=531,600000, max=571}
collect: LongSummaryStatistics{count=10, sum=5380, min=444, average=538,000000, max=557}

parallel没有空值和过滤器(所以流的大小): toArrays在这种情况下具有最好的性能,而.forEach()在接收数组列表上的"indexOutOfBounds"失败,必须替换为.forEachOrdered()

toArray: LongSummaryStatistics{count=100, sum=75566, min=707, average=755,660000, max=1107}
forEach: LongSummaryStatistics{count=100, sum=115802, min=992, average=1158,020000, max=1254}
collect: LongSummaryStatistics{count=100, sum=88415, min=732, average=884,150000, max=1014}

其他回答

如果使用Eclipse Collections,则可以使用collectIf()方法。

MutableList<Integer> source =
    Lists.mutable.with(1, null, 2, null, 3, null, 4, null, 5);

MutableList<String> result = source.collectIf(Objects::nonNull, String::valueOf);

Assert.assertEquals(Lists.immutable.with("1", "2", "3", "4", "5"), result);

它会快速计算,应该比使用流快一点。

注意:我是Eclipse Collections的提交者。

不要担心任何性能差异,在这种情况下,它们通常是最小的。

方法2更可取,因为

it doesn't require mutating a collection that exists outside the lambda expression. it's more readable because the different steps that are performed in the collection pipeline are written sequentially: first a filter operation, then a map operation, then collecting the result (for more info on the benefits of collection pipelines, see Martin Fowler's excellent article.) you can easily change the way values are collected by replacing the Collector that is used. In some cases you may need to write your own Collector, but then the benefit is that you can easily reuse that.

如果使用第三方库是可以的,cyclops-react定义了内置此功能的Lazy扩展集合。例如,我们可以简单地写

ListX myListToParse;

ListX myFinalList = myListToParse.filter(elt -> elt != null) .map(elt -> doSomething(elt));

myFinalList在第一次访问之前(在物化列表被缓存和重用之后)不会被求值。

[披露我是cyclops-react的主要开发者]

还有第三种选择-使用stream(). toarray() -请参阅“为什么stream没有toList方法”下的注释。它比forEach()或collect()慢,表达能力也更差。它可能会在以后的JDK构建中得到优化,所以在这里添加它以防万一。

假设列表<字符串>

    myFinalList = Arrays.asList(
            myListToParse.stream()
                    .filter(Objects::nonNull)
                    .map(this::doSomething)
                    .toArray(String[]::new)
    );

在doSomething()中有一个微型基准测试,1M个条目,20%的空值和简单的转换

private LongSummaryStatistics benchmark(final String testName, final Runnable methodToTest, int samples) {
    long[] timing = new long[samples];
    for (int i = 0; i < samples; i++) {
        long start = System.currentTimeMillis();
        methodToTest.run();
        timing[i] = System.currentTimeMillis() - start;
    }
    final LongSummaryStatistics stats = Arrays.stream(timing).summaryStatistics();
    System.out.println(testName + ": " + stats);
    return stats;
}

结果如下

并行:

toArray: LongSummaryStatistics{count=10, sum=3721, min=321, average=372,100000, max=535}
forEach: LongSummaryStatistics{count=10, sum=3502, min=249, average=350,200000, max=389}
collect: LongSummaryStatistics{count=10, sum=3325, min=265, average=332,500000, max=368}

顺序:

toArray: LongSummaryStatistics{count=10, sum=5493, min=517, average=549,300000, max=569}
forEach: LongSummaryStatistics{count=10, sum=5316, min=427, average=531,600000, max=571}
collect: LongSummaryStatistics{count=10, sum=5380, min=444, average=538,000000, max=557}

parallel没有空值和过滤器(所以流的大小): toArrays在这种情况下具有最好的性能,而.forEach()在接收数组列表上的"indexOutOfBounds"失败,必须替换为.forEachOrdered()

toArray: LongSummaryStatistics{count=100, sum=75566, min=707, average=755,660000, max=1107}
forEach: LongSummaryStatistics{count=100, sum=115802, min=992, average=1158,020000, max=1254}
collect: LongSummaryStatistics{count=100, sum=88415, min=732, average=884,150000, max=1014}

我更喜欢第二种方式。

当您使用第一种方法时,如果您决定使用并行流来提高性能,则无法控制forEach将元素添加到输出列表中的顺序。

当您使用toList时,Streams API将保留顺序,即使您使用并行流。