Lecture08 - nus-cs2030/2324-s2 GitHub Wiki
In this lecture, we focus on the stream programming model where we define Java Streams to perform iteration (or looping). Streams are lazily evaluated which enables us to write infinite streams. Streams also take advantage of multi-core processors by allowing computation to be done in parallel.
Let us start with a simple iteration problem:
jshell> int sum = 0
sum ==> 0
jshell> for (int x = 1; x <= 10; x = x + 1) {
...> sum = sum + x;
...> }
jshell> sum
sum ==> 55
There are many state changes involved in this iterative code; one can count the number of assignments.
Compare this with the stream version below, which makes use of the integer primitive stream IntStream.
jshell> int sum = IntStream.rangeClosed(1, 10).
...> sum()
sum ==> 55
Notice that, other than assigning the resulting sum, there are no further assignments; hence the computation is (side-)effect free. One may argue that the computations of rangeClosed and sum are abstracted away. However, we can equivalently write an effect-free solution using the more general iterate and reduce operations, again with no need for further assignments.
jshell> int sum = IntStream.iterate(1, x -> x <= 10, x -> x + 1).
...> reduce(0, (x, y) -> x + y)
sum ==> 55
A stream represents a sequence of elements generated via a generator (or data source). These elements go through some processing via intermediate pipeline operations, and end up at a terminator with a result.
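As a minimal sketch of the three stages just described, the pipeline below labels its generator, intermediate operations and terminal operation. The class name PipelineStages and the method sumOfEvenSquares are hypothetical names chosen for illustration only.

```java
import java.util.stream.IntStream;

class PipelineStages {
    // Sums the even squares of 1..n; the comments label each pipeline stage.
    static int sumOfEvenSquares(int n) {
        return IntStream.rangeClosed(1, n)   // generator (data source)
                .map(x -> x * x)             // intermediate operation
                .filter(x -> x % 2 == 0)     // intermediate operation
                .sum();                      // terminal operation
    }

    public static void main(String[] args) {
        System.out.println(sumOfEvenSquares(10)); // 4 + 16 + 36 + 64 + 100 = 220
    }
}
```

As in the jshell examples, the only assignment-like step is returning the final result; no variable is updated along the way.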
It is also interesting to note that this iterate method takes in a seed value as its first argument, followed by a predicate as the second argument, and then a function. Since IntStream only processes integer elements, the predicate and the function are their primitive equivalents, i.e. IntPredicate of type (int -> boolean) and IntUnaryOperator of type (int -> int).
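To make the argument types concrete, here is a small sketch that names each argument of the three-argument iterate before passing it in. The class IterateTypes, the method sumUpTo and the variable names are assumptions made for this illustration.

```java
import java.util.function.IntPredicate;
import java.util.function.IntUnaryOperator;
import java.util.stream.IntStream;

class IterateTypes {
    // Sums 1 + 2 + ... + limit using the three-argument iterate.
    static int sumUpTo(int limit) {
        int seed = 1;                              // first element of the stream
        IntPredicate hasNext = x -> x <= limit;    // int -> boolean
        IntUnaryOperator next = x -> x + 1;        // int -> int
        return IntStream.iterate(seed, hasNext, next)
                .reduce(0, (x, y) -> x + y);
    }

    public static void main(String[] args) {
        System.out.println(sumUpTo(10)); // 55
    }
}
```

If the seed itself fails the predicate (e.g. sumUpTo(0)), the stream is empty and reduce simply returns its starting value.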
At this point, you are advised to familiarize yourself with some of the operations of IntStream in the Java API. Take note of the generators (static factory methods), the intermediate pipeline operations (methods that return IntStream) and the terminators (methods that do not return IntStream).
Most methods do not have side effects (i.e. no mutable state), other than IntStream.builder and collect. These will not be used in our course. If you are interested, IntStream.builder allows us to build a stream by adding elements imperatively, and collect allows us to define an imperative reduction and construct a mutable collection out of the stream.
As Java is inherently OOP, each function type that a stream's higher-order methods (e.g. map and filter) take in is associated with a name. You have already seen IntUnaryOperator, which is a mapping from integer to integer, hence (int -> int). Indeed, when using, say, the map method of IntStream, one only needs to note the behaviour of the function, i.e. taking in an integer argument and returning an integer result, so as to write stream expressions effectively:
jshell> IntStream.rangeClosed(1,10).map(x -> x * 2).filter(x -> x % 3 == 0).count() // 6, 12, 18
$.. ==> 3
You will, however, need to be more careful when associating types with the functions, e.g.
jshell> Function<Integer,Integer> f = x -> x * 2
f ==> $Lambda$..
jshell> Predicate<Integer> p = x -> x % 3 == 0
p ==> $Lambda$..
jshell> IntStream.rangeClosed(1,10).map(f).filter(p).count()
| Error:
| incompatible types: java.util.function.Function<java.lang.Integer,java.lang.Integer> cannot be converted to java.util.function.IntUnaryOperator
| IntStream.rangeClosed(1,10).map(f).filter(p).count()
|
jshell> IntUnaryOperator f = x -> x * 2
f ==> $Lambda$..
jshell> IntPredicate p = x -> x % 3 == 0
p ==> $Lambda$..
jshell> IntStream.rangeClosed(1,10).map(f).filter(p).count()
$.. ==> 3
It is also interesting to compare a Java stream with the ImList that we extended as a collection pipeline during one of our recitation sessions. ImList is made up of a finite list of elements and is strictly evaluated, while Stream is lazily evaluated. You can observe the difference in behaviour below:
jshell> new ImList<Integer>(List.of(1, 2, 3)).
...> map(x -> { System.out.println(x.toString()); return x * 2;})
1
2
3
$.. ==> [2, 4, 6]
jshell> new ImList<Integer>(List.of(1, 2, 3)).
...> map(x -> { System.out.println(x.toString()); return x * 2;}).
...> reduce(0, (x,y) -> x + y)
1
2
3
$.. ==> 12
Both pipelines above perform the mapping regardless of whether a reduce method terminates the pipeline.
Compare this with a solution using IntStream:
jshell> IntStream.of(1, 2, 3).
...> map(x -> { System.out.println(x + ""); return x * 2;})
$.. ==> java.util.stream.IntPipeline$4@4bf558aa
jshell> IntStream.of(1, 2, 3).
...> map(x -> { System.out.println(x + ""); return x * 2;}).
...> reduce(0, (x,y) -> x + y)
1
2
3
$.. ==> 12
Notice that no operation is performed on the elements in the absence of a terminal operation.
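The same observation can be checked programmatically. The sketch below counts how often the mapping function runs, with and without a terminal operation. The class LazyMapDemo and its method names are hypothetical, and the AtomicInteger counter is a deliberate side effect used purely for instrumentation; it is not how stream code should normally be written.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

class LazyMapDemo {
    // Builds a pipeline but never attaches a terminal operation.
    static int callsWithoutTerminal() {
        AtomicInteger calls = new AtomicInteger(0);
        IntStream pending = IntStream.of(1, 2, 3)
                .map(x -> { calls.incrementAndGet(); return x * 2; });
        return calls.get(); // the lambda has not been invoked
    }

    // Same pipeline, this time terminated by sum().
    static int callsWithTerminal() {
        AtomicInteger calls = new AtomicInteger(0);
        int sum = IntStream.of(1, 2, 3)
                .map(x -> { calls.incrementAndGet(); return x * 2; })
                .sum();
        return calls.get(); // the lambda ran once per element
    }

    public static void main(String[] args) {
        System.out.println(callsWithoutTerminal()); // 0
        System.out.println(callsWithTerminal());    // 3
    }
}
```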
Another difference is that ImList can be processed multiple times, while streams can only be processed once.
jshell> ImList<Integer> list = new ImList<Integer>(List.of(1, 2, 3)).
...> map(x -> x * 2)
list ==> [2, 4, 6]
jshell> list.reduce(0, (x, y) -> x + y) // sum the elements
$.. ==> 12
jshell> list.reduce(0, (x, y) -> x + 1) // count the number of elements
$.. ==> 3
jshell> IntStream stream = IntStream.of(1, 2, 3).
...> map(x -> x * 2)
stream ==> java.util.stream.IntPipeline$4@497470ed
jshell> stream.reduce(0, (x, y) -> x + y)
$.. ==> 12
jshell> stream.reduce(0, (x, y) -> x + 1)
| Exception java.lang.IllegalStateException: stream has already been operated upon or closed
| at AbstractPipeline.evaluate (AbstractPipeline.java:229)
| at IntPipeline.reduce (IntPipeline.java:515)
| at (#26:1)
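One possible way to obtain both results from a stream pipeline is to rebuild the pipeline each time it is needed. The sketch below assumes a Supplier is acceptable for that purpose; the class SingleUseDemo and its method names are hypothetical.

```java
import java.util.function.Supplier;
import java.util.stream.IntStream;

class SingleUseDemo {
    // Returns true if a second terminal operation on the same stream is rejected.
    static boolean secondUseFails() {
        IntStream stream = IntStream.of(1, 2, 3).map(x -> x * 2);
        stream.reduce(0, (x, y) -> x + y);          // first terminal operation: fine
        try {
            stream.reduce(0, (x, y) -> x + 1);      // second terminal operation on the same stream
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // The Supplier builds a fresh pipeline on every call to get(),
    // so each reduction consumes its own stream.
    static int[] sumAndCount() {
        Supplier<IntStream> doubled = () -> IntStream.of(1, 2, 3).map(x -> x * 2);
        int sum = doubled.get().reduce(0, (x, y) -> x + y);
        int count = doubled.get().reduce(0, (x, y) -> x + 1);
        return new int[] {sum, count};
    }

    public static void main(String[] args) {
        System.out.println(secondUseFails());  // true
        int[] result = sumAndCount();
        System.out.println(result[0] + " " + result[1]); // 12 3
    }
}
```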
Here is an example of using streams to determine if a given integer is prime or otherwise:
jshell> boolean isPrime(int n) {
...> return n > 1 && IntStream.range(2, n).
...> noneMatch(x -> n % x == 0);
...> }
| created method isPrime(int)
jshell> isPrime(3)
$.. ==> true
jshell> isPrime(9)
$.. ==> false
jshell> isPrime(11)
$.. ==> true
An alternative stream implementation is given below that uses more general pipeline operations:
jshell> boolean isPrime(int n) { // alternative solution using filter and count
...> return n > 1 && IntStream.range(2, n).
...> filter(x -> n % x == 0).
...> reduce(0, (x, y) -> x + 1) == 0; // or count() == 0;
...> }
| modified method isPrime(int)
How do we make use of isPrime to generate the first five hundred primes? Doing this in an iterative way requires us to consciously count the number of valid primes as the while loop progresses.
jshell> int n = 2;
n ==> 2
jshell> int numOfPrimes = 0;
numOfPrimes ==> 0
jshell> while (numOfPrimes < 500) {
...> if (isPrime(n)) {
...> System.out.println(n);
...> numOfPrimes = numOfPrimes + 1;
...> }
...> n = n + 1;
...> }
2
3
5
:
3571
With streams, one can just declare to iterate successive values starting from 2, filter those values that are prime, and limit these to 500 values.
jshell> IntStream.iterate(2, x -> x + 1).
...> filter(x -> isPrime(x)).
...> limit(500).
...> forEach(x -> System.out.println(x))
2
3
5
:
3571
Indeed, the two-argument iterate suggests a limitless (or infinite) iteration of elements; compare this with the three-argument version that we used earlier. This is possible because streams are lazily evaluated.
One can construct an infinite stream in the following way
jshell> IntStream.iterate(1, x -> x + 1).
...> filter(x -> isPrime(x))
$.. ==> java.util.stream.IntPipeline$..
and no evaluation is done since there are no terminal operations; you cannot do that using imperative control flow.
However, if you do have a terminal (e.g. forEach), then you will need to limit the number of elements first.
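Putting the pieces together, the following sketch wraps the infinite stream in a method that returns the first few primes as an array rather than printing them. The class FirstPrimes and the method firstPrimes are names assumed for this illustration; isPrime is the same definition used earlier.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

class FirstPrimes {
    static boolean isPrime(int n) {
        return n > 1 && IntStream.range(2, n).noneMatch(x -> n % x == 0);
    }

    // The infinite iterate is safe here only because limit bounds the
    // number of elements before the terminal toArray is reached.
    static int[] firstPrimes(int count) {
        return IntStream.iterate(2, x -> x + 1)
                .filter(x -> isPrime(x))
                .limit(count)
                .toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(firstPrimes(5))); // [2, 3, 5, 7, 11]
    }
}
```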
At times you may see printing each element using forEach written as forEach(System.out::println). This syntax uses a method reference. You are advised not to use method references due to limitations of our grader.
As mentioned earlier, reduce produces a final outcome by aggregating the stream elements. There are two forms of reduce: the one-argument version and the two-argument version. The two-argument version is the usual one; it requires a starting integer value to be specified, followed by an IntBinaryOperator function ((int, int) -> int).
jshell> IntStream.rangeClosed(1, 10).
...> reduce(0, (x, y) -> x + y)
$.. ==> 55
On the other hand, the one-argument reduce begins reduction from the first element and returns an OptionalInt (an Optional that operates only on integers; different from Optional<Integer>). If there is only one element, that value is returned wrapped in an OptionalInt. If there are no elements in the stream, OptionalInt.empty is returned. Otherwise, the reduced result wrapped in an OptionalInt is returned.
jshell> IntStream.rangeClosed(1, 1).
...> reduce((x, y) -> x + y)
$.. ==> OptionalInt[1]
jshell> IntStream.rangeClosed(1, -1).
...> reduce((x, y) -> x + y)
$.. ==> OptionalInt.empty
jshell> IntStream.rangeClosed(1, 10).
...> reduce((x, y) -> x + y)
$.. ==> OptionalInt[55]
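To see why the two forms return different types, consider this sketch. Finding the largest element has no sensible starting value, so the one-argument form is a natural fit; a product has the natural starting value 1, so the two-argument form works. The class ReduceForms and its method names are hypothetical.

```java
import java.util.OptionalInt;
import java.util.stream.IntStream;

class ReduceForms {
    // One-argument reduce: no starting value, so an empty stream yields OptionalInt.empty().
    static OptionalInt largest(IntStream values) {
        return values.reduce((x, y) -> x > y ? x : y);
    }

    // Two-argument reduce: the starting value is also the result for an empty stream.
    static int product(IntStream values) {
        return values.reduce(1, (x, y) -> x * y);
    }

    public static void main(String[] args) {
        System.out.println(largest(IntStream.of(3, 9, 4)));           // OptionalInt[9]
        System.out.println(largest(IntStream.empty()));               // OptionalInt.empty
        System.out.println(largest(IntStream.empty()).orElse(-1));    // -1
        System.out.println(product(IntStream.rangeClosed(1, 5)));     // 120
    }
}
```

The caller of largest is forced to decide what an empty stream should mean, here via orElse.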
We have seen the use of streams for looping.
What about nested loops?
One can make use of flatMap.
jshell> IntStream.rangeClosed(1, 3).
...> flatMap(x -> IntStream.rangeClosed(x, 3).map(y -> x * y)).
...> forEach(x -> System.out.print(x + " "))
1 2 3 4 6 9
The flatMap operation takes in a function of the form (int -> IntStream). What if we use map instead? It will result in a compilation error since map expects a function of the form (int -> int).
jshell> IntStream.rangeClosed(1, 3).
...> map(x -> IntStream.rangeClosed(x, 3).map(y -> x * y)).
...> forEach(x -> System.out.print(x + " "))
| Error:
| incompatible types: bad return type in lambda expression
| java.util.stream.IntStream cannot be converted to int
| map(x -> IntStream.rangeClosed(x, 3).map(y -> x * y)).
|
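For comparison, here is a sketch that places the nested-loop version next to the flatMap version and lets us check that they agree. The class NestedLoops and both method names are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

class NestedLoops {
    // Imperative version: the inner loop over y starts from the current x.
    static int[] productsWithLoops(int n) {
        int[] result = new int[n * (n + 1) / 2];
        int i = 0;
        for (int x = 1; x <= n; x = x + 1) {
            for (int y = x; y <= n; y = y + 1) {
                result[i] = x * y;
                i = i + 1;
            }
        }
        return result;
    }

    // Stream version: the outer stream plays the role of the outer loop, and
    // flatMap flattens the inner stream produced for each x into one IntStream.
    static int[] productsWithFlatMap(int n) {
        return IntStream.rangeClosed(1, n)
                .flatMap(x -> IntStream.rangeClosed(x, n).map(y -> x * y))
                .toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(productsWithLoops(3)));   // [1, 2, 3, 4, 6, 9]
        System.out.println(Arrays.toString(productsWithFlatMap(3))); // [1, 2, 3, 4, 6, 9]
    }
}
```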
So far we have worked with the primitive stream IntStream. Java also provides a generic Stream<T> type. Here is the equivalent generic stream implementation for the code above:
jshell> Stream.<Integer>of(1,2,3).
...> flatMap(x -> Stream.<Integer>iterate(x, y -> y <= 3, y -> y + 1).map(y -> x * y)).
...> forEach(x -> System.out.print(x + " "))
1 2 3 4 6 9
If we use map instead, you will notice that there is no longer a compilation error.
jshell> Stream.<Integer>of(1,2,3).
...> map(x -> Stream.<Integer>iterate(x, y -> y <= 3, y -> y + 1).map(y -> x * y)).
...> forEach(x -> System.out.print(x + " "))
java.util.stream.ReferencePipeline$.. java.util.stream.ReferencePipeline$.. java.util.stream.ReferencePipeline$..
The above generates a stream of three streams! This is because map takes in a Function<T,R> where T is bound to Integer and R is bound to Stream<Integer>; ReferencePipeline is an implementation of Stream<T> whose name has been exposed in the output.
Two noteworthy methods that allow you to convert from primitive to generic streams are
- boxed(), which maps each primitive element to its wrapper type;
- mapToObj(..), which maps each primitive element to any reference type.
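A short sketch of both conversions follows. It also shows that once the stream has been boxed, a Function<Integer,Integer> is accepted by map, unlike with IntStream. The class PrimitiveToGeneric and its method names are hypothetical.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.IntStream;

class PrimitiveToGeneric {
    // boxed() turns the IntStream into a Stream<Integer>,
    // whose map takes a Function rather than an IntUnaryOperator.
    static List<Integer> doubled(int n) {
        Function<Integer, Integer> f = x -> x * 2;
        return IntStream.rangeClosed(1, n)
                .boxed()
                .map(f)
                .toList();
    }

    // mapToObj goes directly from int elements to any reference type, here String.
    static List<String> labelled(int n) {
        return IntStream.rangeClosed(1, n)
                .mapToObj(x -> "item" + x)
                .toList();
    }

    public static void main(String[] args) {
        System.out.println(doubled(3));  // [2, 4, 6]
        System.out.println(labelled(2)); // [item1, item2]
    }
}
```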
We have discussed lazy evaluation in streams and their relation to infinite streams. Let us study the following pipeline closely.
jshell> Stream.<Integer>iterate(1, x -> x + 1).
...> map(x -> { System.out.println("map1: " + x); return x;}).
...> map(x -> { System.out.println("map2: " + x); return x;}).
...> limit(5).
...> toList()
map1: 1
map2: 1
map1: 2
map2: 2
map1: 3
map2: 3
map1: 4
map2: 4
map1: 5
map2: 5
$14 ==> [1, 2, 3, 4, 5]
We know that iterate will generate an infinite stream and limit will let only the first five stream elements through. Within the two map operations, notice that it is not the case that all five elements go through the first map operation before they all go through the second map. Rather, each element goes through the two map operations one after another.
Specifically, when a terminal is invoked (in this case toList()), a request for a value is initiated and passed upstream:
- toList signals to the limit operation for a value;
- limit signals to the upstream map operation for a value (provided it has not reached the limit);
- map requests its upstream map operation for a value;
- map requests a value from iterate.
The iterate generator then generates a value and passes the value downstream for processing:
- iterate passes an element to the downstream map;
- map performs its transformation and passes the result to its downstream map;
- map performs its transformation and passes the result to limit;
- limit passes the value downstream, while taking note of the number of values that have passed through it;
- toList adds the value to the resultant list.
This is the essence of lazy evaluation in streams.
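The interleaving described above can also be captured as data instead of printed output. In the sketch below, each map stage appends to a log so that the order of events can be inspected afterwards. The class PullTrace and the method trace are hypothetical, and mutating the log from inside the lambdas is done only for instrumentation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class PullTrace {
    // Records the order in which the two map stages see each element.
    static List<String> trace(int limit) {
        List<String> log = new ArrayList<>();
        Stream.<Integer>iterate(1, x -> x + 1)
                .map(x -> { log.add("map1:" + x); return x; })
                .map(x -> { log.add("map2:" + x); return x; })
                .limit(limit)
                .toList();  // terminal operation; its result is not needed here
        return log;
    }

    public static void main(String[] args) {
        System.out.println(trace(2)); // [map1:1, map2:1, map1:2, map2:2]
    }
}
```

If the stream were strictly evaluated stage by stage, the log would instead list all map1 entries before any map2 entry, and with an infinite generator it would never finish.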
Let us start our discussion with set comprehension from Math. Suppose we want to generate a set comprising pairs of integers where the first value of the pair ranges from 1 to 3, and the second value of the pair ranges from the first value to 3. The notation is { (x, y) | x ∈ {1, 2, 3}, y ∈ {x, ..., 3} }.
In Python, we can generate the pairs of values using list comprehension notation. For example,
>>> [ (x,y) for x in range(1, 4) for y in range(x, 4) ]
[(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)]
>>> [ x * y for x in range(1, 4) for y in range(x, 4) ]
[1, 2, 3, 4, 6, 9]
How do we generate such a list in Java?
One can make use of Stream.Builder to construct a finite stream imperatively:
jshell> Stream.Builder<Pair<Integer,Integer>> builder = Stream.<Pair<Integer,Integer>>builder()
builder ==> java.util.stream.Streams$StreamBuilderImpl@..
jshell> for (int x = 1; x <= 3; x = x + 1) {
...> for (int y = x; y <= 3; y = y + 1) {
...> builder.accept(new Pair<Integer,Integer>(x, y)); // mutable!
...> }
...> }
jshell> Stream<Pair<Integer,Integer>> stream = builder.build()
stream ==> java.util.stream.ReferencePipeline$..
jshell> stream.toList()
$.. ==> [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)]
Alternatively, we can make use of map and flatMap:
jshell> List.<Integer>of(1, 2, 3).stream().
...> flatMap(x -> Stream.<Integer>iterate(x, y -> y <= 3, y -> y + 1).map(y -> new Pair<Integer,Integer>(x, y))).
...> toList()
$.. ==> [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)]
Given a list comprehension, one can make use of a translation scheme to generate the equivalent Java stream using map, flatMap and filter.
Here are some examples:
- [ 2 * x | x <- [1,2,3] ], where [1,2,3] denotes a stream generator, is equivalent to
  Stream.of(1,2,3).map(x -> 2 * x)
- [ x + y | x <- [1,2,3,4], y <- [1,2,3] ], comprising two stream generators, is equivalent to
  Stream.of(1,2,3,4).flatMap(x -> Stream.of(1,2,3).map(y -> x + y))
What about [ z + x + y | z <- [1,2], x <- [1,2,3,4], y <- [1,2,3] ]?
Notice that this can be simplified to
Stream.of(1,2).flatMap(z -> [ z + x + y | x <- [1,2,3,4], y <- [1,2,3] ])
and expanding the inner list comprehension gives
Stream.of(1,2).flatMap(z -> Stream.of(1,2,3,4).flatMap(x -> Stream.of(1,2,3).map(y -> z + x + y)))
Moreover, a list comprehension may also comprise a generator followed by a test. An example in Python would be
>>> [ (x,y) for x in range(1,4) if x % 2 == 1 for y in range(x,4) ]
[(1, 1), (1, 2), (1, 3), (3, 3)]
with the additional condition that the x values generated must be odd.
Such a list comprehension, denoted [ (x, y) | x <- [1,2,3], odd(x), y <- [x,..,3] ], is equivalent to
jshell> Stream.of(1,2,3).filter(x -> x % 2 == 1).
...> flatMap(x -> Stream.iterate(x, y -> y <= 3, y -> y + 1).
...> map(y -> new Pair<Integer,Integer>(x,y))).
...> toList()
$.. ==> [(1, 1), (1, 2), (1, 3), (3, 3)]
From the examples above, we can devise a translation scheme as follows:
- [ e | i <- str ] <-> str.map(i -> e)
  where str is a stream generator, and e is an expression over stream elements i
- [ e | i <- str1, j <- str2, ..E.. ] <-> str1.flatMap(i -> [ e | j <- str2, ..E.. ])
  with stream generators str1, str2, ..E.., and e is an expression over i, j and possibly elements from other generators ..E..
- [ e | i <- str, test, ..E.. ] <-> [ e | i <- filter(str,test), ..E.. ]
  where filter(str,test) denotes a stream generator resulting from the application of test on elements generated from str; filter(str,test) will be equivalent to str.filter(i -> test)
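As a worked application of the scheme, consider the comprehension [ (a, b, c) | a <- [1..n], b <- [a..n], c <- [b..n], a*a + b*b == c*c ], which is an example chosen here and not taken from the lecture. Applying the flatMap rule twice, then the filter rule, then the map rule gives the sketch below. The class Triples, the helper range and the use of List.of(a, b, c) in place of a tuple type are all assumptions for illustration.

```java
import java.util.List;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class Triples {
    // Stream generator for [lo..hi] as a generic Stream<Integer>.
    static Stream<Integer> range(int lo, int hi) {
        return IntStream.rangeClosed(lo, hi).boxed();
    }

    static List<List<Integer>> pythagorean(int n) {
        return range(1, n)
                .flatMap(a -> range(a, n)                              // a <- [1..n] via flatMap rule
                        .flatMap(b -> range(b, n)                      // b <- [a..n] via flatMap rule
                                .filter(c -> a * a + b * b == c * c)   // test via filter rule
                                .map(c -> List.of(a, b, c))))          // c <- [b..n] via map rule
                .toList();
    }

    public static void main(String[] args) {
        System.out.println(pythagorean(13)); // [[3, 4, 5], [5, 12, 13], [6, 8, 10]]
    }
}
```

The nesting of the lambdas mirrors the left-to-right order of the generators, which is what lets the innermost expression refer to a, b and c together.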
To ensure the correct execution of streams, one must obey the following usage rules:
- Stream operations must not interfere with stream data. As long as we keep to our discipline of effect-free coding, this will not be an issue.
- Stream operations should preferably be stateless. In other words, how an element is processed should not be dependent on neighbouring elements.
The latter is especially important when streams are processed in parallel using the parallel() operator. Let us revisit the previous example, this time generating the first ten prime numbers.
jshell> IntStream.iterate(2, x -> x + 1).
...> filter(x -> isPrime(x)).
...> limit(10).
...> peek(x -> System.out.println(x)).
...> forEach(x -> {})
2
3
5
7
11
13
17
19
23
29
Rather than output at the forEach terminal, we make use of peek to output each element as soon as it reaches the operation. Since the stream here is sequential, we would expect the stream elements to be processed one by one starting from 2.
Now we make the stream parallel.
jshell> IntStream.iterate(2, x -> x + 1).
...> parallel().
...> filter(x -> isPrime(x)).
...> limit(10).
...> peek(x -> System.out.println(x)).
...> forEach(x -> {})
17
23
19
11
13
2
3
5
7
29
Notice now that the stream elements are not processed in order, as multiple processors are available to process different parts of the stream. It should be noted that the elements still fall between 2 and 29. This is because limit is stateful. Try to perform the peek before limit and observe the output.
In addition, one needs to be mindful that performing a reduction on a parallel stream requires that the reduction be associative. For example, summing values is associative, i.e. ((1 + 2) + 3) is the same as (1 + (2 + 3)).
Here is summation using a sequential stream.
jshell> IntStream.iterate(2, x -> x + 1).
...> limit(10).
...> reduce(0, (x, y) -> { System.out.println("Adding " + x + " and " + y); return x + y;})
Adding 0 and 2
Adding 2 and 3
Adding 5 and 4
Adding 9 and 5
Adding 14 and 6
Adding 20 and 7
Adding 27 and 8
Adding 35 and 9
Adding 44 and 10
Adding 54 and 11
$.. ==> 65
Here is the output when summing a parallel stream
jshell> IntStream.iterate(2, x -> x + 1).
...> parallel().
...> limit(10).
...> reduce(0, (x, y) -> { System.out.println("Adding " + x + " and " + y); return x + y;})
Adding 0 and 10
Adding 0 and 5
Adding 0 and 9
Adding 0 and 6
Adding 0 and 8
Adding 0 and 3
Adding 0 and 7
Adding 7 and 8
Adding 0 and 4
Adding 5 and 6
Adding 4 and 11
Adding 0 and 2
Adding 2 and 3
Adding 5 and 15
Adding 0 and 11
Adding 10 and 11
Adding 9 and 21
Adding 15 and 30
Adding 20 and 45
Adding 65 and 0
$.. ==> 65
No matter how the summing proceeds, we are guaranteed that the final result will always be the same. It is also interesting to note that 0 is added to more than one stream element, which provides further evidence that several processors start reduction at the same time with the same starting value. One has to be mindful that the starting value provided does not give a wrong result when reduced in parallel.
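The role of the starting value can be illustrated with the sketch below, which sums the same range sequentially and in parallel. The class ParallelSum and its method names are hypothetical. With 0 as the starting value both versions agree; with any other starting value the parallel result depends on how the stream happens to be split, so no particular output is claimed for that case.

```java
import java.util.stream.IntStream;

class ParallelSum {
    static int sequentialSum(int lo, int hi) {
        return IntStream.rangeClosed(lo, hi)
                .reduce(0, (x, y) -> x + y);
    }

    static int parallelSum(int lo, int hi) {
        return IntStream.rangeClosed(lo, hi)
                .parallel()
                .reduce(0, (x, y) -> x + y);
    }

    // Same parallel reduction with a caller-chosen starting value. Each chunk of
    // the stream begins from this value, so a non-zero start is added at least
    // once and possibly several times.
    static int parallelSumFrom(int start, int lo, int hi) {
        return IntStream.rangeClosed(lo, hi)
                .parallel()
                .reduce(start, (x, y) -> x + y);
    }

    public static void main(String[] args) {
        System.out.println(sequentialSum(2, 11));        // 65
        System.out.println(parallelSum(2, 11));          // 65
        System.out.println(parallelSumFrom(10, 2, 11));  // 65 plus some multiple of 10
    }
}
```

This is why the starting value of a parallel reduction should be one that leaves every element unchanged under the operation, such as 0 for addition.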
Now contrast addition with division which is a non-associative operation. First the sequential version.
jshell> DoubleStream.iterate(1.0, x -> x + 1).
...> limit(4).
...> reduce(24.0, (x, y) -> { System.out.println("Dividing " + x + " by " + y); return x / y;})
Dividing 24.0 by 1.0
Dividing 24.0 by 2.0
Dividing 12.0 by 3.0
Dividing 4.0 by 4.0
$.. ==> 1.0
Note that `((((24.0/1.0)/2.0)/3.0)/4.0)` gives 1.0. What if we
parallelize the stream?
jshell> DoubleStream.iterate(1.0, x -> x + 1).
...> limit(4).parallel().
...> reduce(24.0, (x, y) -> { System.out.println("Dividing " + x + " by " + y); return x / y;})
Dividing 24.0 by 3.0
Dividing 24.0 by 4.0
Dividing 8.0 by 6.0
Dividing 24.0 by 2.0
Dividing 24.0 by 1.0
Dividing 24.0 by 12.0
Dividing 2.0 by 1.3333333333333333
Dividing 1.5 by 24.0
$.. ==> 0.0625
The result is no longer correct!
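One possible way around this, offered as a sketch and not as the lecture's own solution, is to restate the computation so that the reduction itself is associative: multiply the divisors together inside the stream, then divide once outside it. The class DivideSafely and the method divideByAll are hypothetical. Note that floating-point multiplication is only approximately associative in general; for the small whole-number values used here the product is exact.

```java
import java.util.stream.DoubleStream;

class DivideSafely {
    // Divides start by each of 1.0, 2.0, ..., count. The stream only multiplies
    // (starting value 1.0); the single division happens after the stream has finished.
    static double divideByAll(double start, int count, boolean parallel) {
        DoubleStream divisors = DoubleStream.iterate(1.0, x -> x + 1).limit(count);
        DoubleStream source = parallel ? divisors.parallel() : divisors;
        double product = source.reduce(1.0, (x, y) -> x * y);
        return start / product;
    }

    public static void main(String[] args) {
        System.out.println(divideByAll(24.0, 4, false)); // 1.0
        System.out.println(divideByAll(24.0, 4, true));  // 1.0
    }
}
```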
Even though parallelizing a stream on a multi-core processor might suggest a linear speedup in computation, always keep in mind that there is an overhead in managing parallel tasks. As such, do not parallelize trivial tasks.