Java 8 Stream versus For-Loop Performance


Published: 2019-03-03
Updated: 2019-03-21
Web: https://fritzthecat-blog.blogspot.com/2019/03/java-8-stream-versus-for-loop.html


I couldn't believe that Java 8 streams are slower than for-loops when not running in parallel on multiple cores, but it's true, read this article by Angelika Langer.

→ If you use sequential streams then you don’t do it for performance reasons; you do it because you like the functional programming style.

I myself couldn't even verify that streams are faster than for-loops when running on multiple cores in parallel, not even with Java 11.

In this Blog I will present a performance test that compares a for-loop with a stream implementation. You need several test runs (first time is slowest!), different surrounding conditions, and an average calculation to get a realistic view on the performance of your application. Finally I will run the test also with parallel streams on 4 cores and present the results.

Mind that one CPU can have several cores. In Java you call Runtime.getRuntime().availableProcessors() to find out the number of cores, on Ubuntu-LINUX you can see it in System-Monitor on the "Resources" tab, on WINDOWS in Task-Manager on the "Performance" tab.

Test Source

Here is the main() runner with initial test parameters:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;

public class StreamPerformance
{
public static void main(String[] args) {
System.out.println("Java: "+System.getProperty("java.version"));

StreamPerformance streamPerformance = new StreamPerformance(5000000);
streamPerformance.performTests(40);
}

.....
}

This is the test class. It will iterate 5 million random test-numbers and find the maximum. I want 40 executions of this, taking into account also the warm-up phase.

All following source goes where the "....." is.

    private final long[] longArray;
private final Collection<Long> longList;
private Long firstMax;

public StreamPerformance(int testDataLength) {
longArray = createTestArray(testDataLength);
longList = createTestList(longArray);
}

private long[] createTestArray(int testDataLength) {
final long[] longArray = new long[testDataLength];
for (int i = 0; i < testDataLength; i++)
longArray[i] = (long) (Math.random() * 1000000d);
return longArray;
}

private Collection<Long> createTestList(long[] longArray) {
final Collection<Long> longList = new ArrayList<Long>(longArray.length);
for (int i = 0; i < longArray.length; i++)
longList.add(Long.valueOf(longArray[i]));
return longList;
}

The StreamPerformance constructor sets up the test data as array and Collection. The collection is a clone of the array. It takes part in the test because its iteration is slower than that of the array (different surrounding conditions).

    public void performTests(final int numberOfTests) {
long sumStreamWithArray = 0L;
long sumForLoopWithArray = 0L;
long sumStreamWithList = 0L;
long sumForLoopWithList = 0L;

for (int testNumber = 0; testNumber < numberOfTests; testNumber++) {
final boolean doForLoop = (testNumber % 2 == 0);
final boolean doArray = ((testNumber / 2) % 2 == 0);

long millis = performTest(doForLoop, doArray);

if (doForLoop)
if (doArray)
sumForLoopWithArray += millis;
else
sumForLoopWithList += millis;
else
if (doArray)
sumStreamWithArray += millis;
else
sumStreamWithList += millis;

System.out.println(testNumber+": "+(doForLoop ? "ForLoop" : "Stream")+" maximum in "+(doArray ? "array" : "list")+" found in "+millis+" millis.");
}

System.out.println("ForLoop with array millis: "+sumForLoopWithArray);
System.out.println("Stream with array millis: "+sumStreamWithArray);
System.out.println("ForLoop with list millis: "+sumForLoopWithList);
System.out.println("Stream with list millis: "+sumStreamWithList);

final long forLoopSum = sumForLoopWithArray + sumForLoopWithList;
final long streamSum = sumStreamWithArray + sumStreamWithList;
System.out.println("ForLoop millis: "+forLoopSum);
System.out.println("Stream millis: "+streamSum);

System.out.println("ForLoop was "+Math.round((double) streamSum / (double) forLoopSum)+" times faster than Stream.");
}

This is still test management, not the maximum retrieval loop. The performTests() method executes the test as many times as its parameter demands, but it will switch between array and collection, and it will switch between stream and for-loop in a way that every second run will be done with stream. Finally the collected sums are written to stdout, first the results of the 4 test modes, then a general result for stream and for-loop, and last it will tell how many times the for-loop was faster.

    private long performTest(boolean doForLoop, boolean doArray)    {
final long startValue = Long.MIN_VALUE;
final long before = System.currentTimeMillis();
final Long max;
if (doForLoop)
max = performForLoopTest(doArray, startValue);
else
max = performStreamTest(doArray, startValue);

if (firstMax == null)
firstMax = max;
else if (firstMax.equals(max) == false) // assert maximum
throw new IllegalStateException("Maximum "+max+" was not found correctly, should be "+firstMax);

return (System.currentTimeMillis() - before);
}

private Long performForLoopTest(boolean doArray, final long startValue) {
if (doArray) {
long max = startValue;
for (int i = 0; i < longArray.length; i++) {
long l = longArray[i];
if (l > max)
max = l;
}
return Long.valueOf(max);
}

Long max = Long.valueOf(startValue);
for (Long l : longList)
if (l > max)
max = l;
return max;
}

private Long performStreamTest(boolean doArray, final long startValue) {
if (doArray) {
long max = Arrays.stream(longArray).reduce(startValue, Math::max);
return Long.valueOf(max);
}

Long max = longList.stream().reduce(Long.valueOf(startValue), Math::max);
return max;
}

These three methods perform one single test run, i.e. performTest() method will do exactly one iteration to find a maximum. It receives the test-mode in parameters doForLoop and doArray. According to that mode it calls either the stream test or the for-loop test. It takes the execution time as milliseconds and returns it to the caller. Mind that this also asserts that all maximum retrievals yield the same result.

The performForLoopTest() method executes a for-loop, either iterating the array or the collection.
The performStreamTest() does the same but using streams.
Both return the found maximum to the caller for assertion.

Click here to see the entire test-source for copy & paste.

  1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
package streams;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;

/**
* Check whether streams outperform for-loops.
* See also https://jaxenter.com/java-performance-tutorial-how-fast-are-the-java-8-streams-118830.html
*/
public class StreamPerformance
{
public static void main(String[] args) {
System.out.println("Java: "+System.getProperty("java.version"));

StreamPerformance streamPerformance = new StreamPerformance(5000000);
streamPerformance.performTests(40);
}


private final long[] longArray;
private final Collection<Long> longList;
private Long firstMax;

public StreamPerformance(int testDataLength) {
longArray = createTestArray(testDataLength);
longList = createTestList(longArray);
}

private long[] createTestArray(int testDataLength) {
final long[] longArray = new long[testDataLength];
for (int i = 0; i < testDataLength; i++)
longArray[i] = (long) (Math.random() * 1000000d);
return longArray;
}

private Collection<Long> createTestList(long[] longArray) {
final Collection<Long> longList = new ArrayList<Long>(longArray.length);
for (int i = 0; i < longArray.length; i++)
longList.add(Long.valueOf(longArray[i]));
return longList;
}

public void performTests(final int numberOfTests) {
long sumStreamWithArray = 0L;
long sumForLoopWithArray = 0L;
long sumStreamWithList = 0L;
long sumForLoopWithList = 0L;

for (int testNumber = 0; testNumber < numberOfTests; testNumber++) {
final boolean doForLoop = (testNumber % 2 == 0);
final boolean doArray = ((testNumber / 2) % 2 == 0);

long millis = performTest(doForLoop, doArray);

if (doForLoop)
if (doArray)
sumForLoopWithArray += millis;
else
sumForLoopWithList += millis;
else
if (doArray)
sumStreamWithArray += millis;
else
sumStreamWithList += millis;

//System.out.println(testNumber+": "+(doForLoop ? "ForLoop" : "Stream")+" maximum in "+(doArray ? "array" : "list")+" found in "+millis+" millis.");
}

System.out.println("ForLoop with array millis: "+sumForLoopWithArray);
System.out.println("Stream with array millis: "+sumStreamWithArray);
System.out.println("ForLoop with list millis: "+sumForLoopWithList);
System.out.println("Stream with list millis: "+sumStreamWithList);

final long forLoopSum = sumForLoopWithArray + sumForLoopWithList;
final long streamSum = sumStreamWithArray + sumStreamWithList;
System.out.println("ForLoop millis: "+forLoopSum);
System.out.println("Stream millis: "+streamSum);

System.out.println("ForLoop was "+Math.round((double) streamSum / (double) forLoopSum)+" times faster than Stream.");
}

private long performTest(boolean doForLoop, boolean doArray) {
final long startValue = Long.MIN_VALUE;
final long before = System.currentTimeMillis();
final Long max;
if (doForLoop)
max = performForLoopTest(doArray, startValue);
else
max = performStreamTest(doArray, startValue);

if (firstMax == null)
firstMax = max;
else if (firstMax.equals(max) == false) // assert maximum
throw new IllegalStateException("Maximum "+max+" was not found correctly, should be "+firstMax);

return (System.currentTimeMillis() - before);
}

private Long performForLoopTest(boolean doArray, final long startValue) {
if (doArray) {
long max = startValue;
for (int i = 0; i < longArray.length; i++) {
long l = longArray[i];
if (l > max)
max = l;
}
return Long.valueOf(max);
}

Long max = Long.valueOf(startValue);
for (Long l : longList)
if (l > max)
max = l;
return max;
}

private Long performStreamTest(boolean doArray, final long startValue) {
if (doArray) {
long max = Arrays.stream(longArray).reduce(startValue, Math::max);
return Long.valueOf(max);
}

Long max = longList.stream().reduce(Long.valueOf(startValue), Math::max);
return max;
}

}

So far the test logic. Let's see what it did on my machine.

Sequential Result

You could run this test now with Java 8 or Java 11. Here is the result of Java 8:


Java: 1.8.0_25
ForLoop with array millis: 58
Stream with array millis: 101
ForLoop with list millis: 185
Stream with list millis: 862
ForLoop millis: 243
Stream millis: 963
ForLoop was 4 times faster than Stream.

Now such results could depend on compiler or JVM optimizations. Let's see what Java 11 does (don't forget to also recompile with Java 11!):


Java: 11.0.2
ForLoop with array millis: 61
Stream with array millis: 126
ForLoop with list millis: 250
Stream with list millis: 662
ForLoop millis: 311
Stream millis: 788
ForLoop was 3 times faster than Stream.

Got better. But not good enough to blindly rewrite everything to use streams. Maybe it would make sense when using parallel(), but this has some subtle implications which could cause hard-to-detect bugs. You will have to rethink every single case before rewriting.

Parallel Test

For this test, rewriting is easy. You just need to change the perfomStreamTest() method to the following:

    private Long performStreamTest(boolean doArray, final long startValue) {
if (doArray) {
long max = Arrays.stream(longArray).parallel().reduce(startValue, Math::max);
return Long.valueOf(max);
}

Long max = longList.stream().parallel().reduce(Long.valueOf(startValue), Math::max);
return max;
}

The sequential stream can easily be turned into a parallel one, just by appending a parallel() call. For convenience you could also use parallelStream() instead of stream().

Here are the results of parallel processing on my 4-core machine:


Java: 1.8.0_25
ForLoop with array millis: 50
Stream with array millis: 113
ForLoop with list millis: 144
Stream with list millis: 703
ForLoop millis: 194
Stream millis: 816
ForLoop was 4 times faster than Stream.

Java: 11.0.2
ForLoop with array millis: 72
Stream with array millis: 124
ForLoop with list millis: 350
Stream with list millis: 638
ForLoop millis: 422
Stream millis: 762
ForLoop was 2 times faster than Stream.

I expected more. With 4 cores, parallel should be about 3 times faster than the sequential. Instead parallel stream improved just by 30%, and that only with Java 11, with Java 8 it's obviously at the same speed as its sequential counterpart.





ɔ⃝ Fritz Ritzberger, 2019-03-03