The slides of my talk at Devoxx BE 2017. This in-depth talk is all about collectors: those that are available, because we need to know them, those that we can create, those we had no idea could be created, and the others, as there is in fact no limit to what can be done with this API. The concept of downstream collector will be used to show how we can write entire data processing pipelines using collectors only, and pass them as parameters to other pipelines.
10. Collectors?
Why should we be interested in collectors?
▪ They are part of the Stream API
▪ And kind of left aside…
And it’s a pity because it is a very powerful API
▪ Even if we can also write unreadable code with it!
11. Agenda
Quick overview of streams
About collectors
Extending existing collectors
Making a collector readable
Creating new collectors
Composing Collectors
13. About Streams
A Stream:
▪ Is an object that connects to a source
▪ Has intermediate & terminal operations
▪ Some of the terminal operations can be collectors
▪ A collector can take other collectors as
parameters
14. A Stream is…
An object that connects to a source of data and
watches it flow
There is no data « in » a stream: stream ≠ collection
15. About Streams
On a stream:
▪ Any operation can be modeled with a collector
▪ Why is it interesting?
stream.collect(collector);
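As a minimal sketch of the claim above, the same count can be obtained with the dedicated terminal operation or with a collector. The class and method names are hypothetical and only serve the illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

class CollectInsteadOfCount {

    // Counts the non-empty strings with the count() terminal operation.
    static long countWithTerminalOperation(List<String> strings) {
        return strings.stream()
                      .filter(s -> !s.isEmpty())
                      .count();
    }

    // Same computation, modeled with a collector passed to collect().
    static long countWithCollector(List<String> strings) {
        return strings.stream()
                      .filter(s -> !s.isEmpty())
                      .collect(Collectors.counting());
    }

    public static void main(String[] args) {
        List<String> strings = List.of("one", "", "two", "three");
        System.out.println(countWithTerminalOperation(strings));
        System.out.println(countWithCollector(strings));
    }
}
```

The collector version is interesting because the collector is an object: it can be stored, returned from a method, or handed to another collector.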
26. Limit and Skip
Two methods that rely on the order of the
elements:
- Limit = keeps the first n elements
- Skip = skips the first n elements
They need to keep track of the index of the
elements and to process them in order
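A short sketch of both methods on an ordered source. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

class LimitAndSkip {

    // Keeps the first n elements, in encounter order.
    static List<String> firstN(List<String> strings, int n) {
        return strings.stream()
                      .limit(n)
                      .collect(Collectors.toList());
    }

    // Drops the first n elements and keeps the rest.
    static List<String> withoutFirstN(List<String> strings, int n) {
        return strings.stream()
                      .skip(n)
                      .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> strings = List.of("one", "two", "three", "four");
        System.out.println(firstN(strings, 2));
        System.out.println(withoutFirstN(strings, 2));
    }
}
```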
28. Intermediate vs Terminal
Only a terminal operation triggers the
consumption of the data from the source
movies.stream()
.filter(movie -> movie.releaseYear() == 2007)
.flatMap(movie -> movie.actors().stream())
.map(actor -> actor.toString()); // after flatMap the elements are actors, not movies
29. Intermediate vs Terminal
Only a terminal operation triggers the
consumption of the data from the source
movies.stream()
.filter(movie -> movie.releaseYear() == 2007)
.flatMap(movie -> movie.actors().stream())
.map(actor -> actor.toString()) // after flatMap the elements are actors, not movies
.forEach(name -> System.out.println(name));
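One way to observe this laziness, as a sketch: count the elements that go through peek() before and after a terminal operation is called. The class and method names are hypothetical, and the Movie model of the slides is replaced by plain strings to keep the example self-contained.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

class LazyPipeline {

    // Returns {elements seen before the terminal operation, elements seen after it}.
    static int[] elementsSeen(List<String> strings) {
        AtomicInteger seen = new AtomicInteger();

        // Only intermediate operations: nothing is pulled from the source yet.
        Stream<String> pipeline = strings.stream()
                                         .peek(s -> seen.incrementAndGet())
                                         .filter(s -> !s.isEmpty());
        int before = seen.get();

        // The terminal operation triggers the processing of the data.
        pipeline.forEach(s -> { });
        int after = seen.get();

        return new int[] {before, after};
    }

    public static void main(String[] args) {
        int[] seen = elementsSeen(List.of("one", "two", "three"));
        System.out.println(seen[0] + " then " + seen[1]);
    }
}
```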
33. Terminal Operations
Second Batch:
- allMatch
- anyMatch
- noneMatch
- findFirst
- findAny
Do not need to consume
all the data = short-circuit
operations
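A sketch of the short-circuit behavior, on a sequential stream: anyMatch() stops pulling elements as soon as it has its answer. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class ShortCircuit {

    // Returns how many elements went through peek() while anyMatch() was running.
    static int elementsVisitedByAnyMatch(List<String> strings, int length) {
        AtomicInteger visited = new AtomicInteger();
        strings.stream()
               .peek(s -> visited.incrementAndGet())
               .anyMatch(s -> s.length() == length);
        return visited.get();
    }

    public static void main(String[] args) {
        List<String> strings = List.of("one", "two", "three", "four");
        // "three" matches, so "four" does not need to be visited.
        System.out.println(elementsVisitedByAnyMatch(strings, 5));
        // No match: the whole source has to be consumed.
        System.out.println(elementsVisitedByAnyMatch(strings, 10));
    }
}
```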
34. Terminal Operations
Special cases:
- max
- min
- reduce
Return an Optional (to handle empty streams); reduce does so only when called without an identity element
https://www.youtube.com/watch?v=Ej0sss6cq14 @StuartMarks
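A small sketch of why an Optional is returned: on an empty stream there is simply no max to return. The class and method names are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class OptionalResults {

    // The longest string, or an empty Optional if there is no element at all.
    static Optional<String> longest(List<String> strings) {
        return strings.stream()
                      .max(Comparator.comparing(String::length));
    }

    public static void main(String[] args) {
        System.out.println(longest(List.of("one", "three")));
        System.out.println(longest(List.of()).isPresent());
    }
}
```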
35. A First Collector
And then there is collect!
The most seen:
Takes a collector as a parameter
List<String> result =
strings.stream()
.filter(s -> !s.isEmpty())
.collect(Collectors.toList());
36. A First Collector (bis)
And then there is collect!
The most seen:
Takes a collector as a parameter
Set<String> result =
strings.stream()
.filter(s -> !s.isEmpty())
.collect(Collectors.toSet());
37. A Second Collector
And then there is collect!
Maybe less known:
Takes a collector as a parameter
String names =
authors.stream()
.map(Author::getName)
.collect(Collectors.joining(", "));
39. A Third Collector
Creating a Map
Map<Integer, List<String>> result =
strings.stream()
.filter(s -> !s.isEmpty())
.collect(
Collectors.groupingBy(
s -> s.length()
)
);
40. one, two, three, four, five, six, seven, eight, nine, ten
groupingBy(String::length)
Map<Integer, List<String>>
3 -> one, two, six, ten
4 -> four, five, nine
5 -> three, seven, eight
43. A Third Collector (bis)
Creating a Map
Map<Integer, Long> result =
strings.stream()
.filter(s -> !s.isEmpty())
.collect(
Collectors.groupingBy(
s -> s.length(), Collectors.counting()
)
);
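The two groupingBy() variants can be run on the ten strings of the previous slides. This is a sketch; the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupingByLength {

    // Key: length of the strings, value: the strings of that length.
    static Map<Integer, List<String>> byLength(List<String> strings) {
        return strings.stream()
                      .filter(s -> !s.isEmpty())
                      .collect(Collectors.groupingBy(String::length));
    }

    // Same keys, but the downstream collector counts the strings instead of listing them.
    static Map<Integer, Long> countByLength(List<String> strings) {
        return strings.stream()
                      .filter(s -> !s.isEmpty())
                      .collect(Collectors.groupingBy(String::length, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> strings = List.of(
            "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten");
        System.out.println(byLength(strings));
        System.out.println(countByLength(strings));
    }
}
```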
50. Creating Lists
A closer look at that code:
List<String> result =
strings.stream()
.filter(s -> !s.isEmpty())
.collect(Collectors.toList());
51. [Diagram: the elements of the stream flow into the collector, which 1) builds the list and 2) adds the elements one by one to an ArrayList]
52. Creating Lists
1) Building the list: supplier
2) Adding an element to that list: accumulator
Supplier<List<E>> supplier = () -> new ArrayList<>();
BiConsumer<List<E>, E> accumulator = (list, e) -> list.add(e);
54. Creating Lists
1) Building the list: supplier
2) Adding an element to that list: accumulator
3) Combining two lists
Supplier<List<E>> supplier = ArrayList::new;
BiConsumer<List<E>, E> accumulator = List::add;
BiConsumer<List<E>, List<E>> combiner = List::addAll;
55. Creating Lists
So we have:
List<String> result =
strings.stream()
.filter(s -> !s.isEmpty())
.collect(ArrayList::new,
List::add,
List::addAll);
56. Creating Lists
So we have:
List<String> result =
strings.stream()
.filter(s -> !s.isEmpty())
.collect(ArrayList::new,
Collection::add,
Collection::addAll);
57. Creating Sets
Almost the same:
Set<String> result =
strings.stream()
.filter(s -> !s.isEmpty())
.collect(HashSet::new,
Collection::add,
Collection::addAll);
58. String Concatenation
Now we need to create a String by
concatenating the elements using a separator:
« one, two, six »
Works with Streams of Strings
59. String Concatenation
Let us collect
strings.stream()
.filter(s -> s.length() == 3)
.collect(() -> new String(),
(finalString, s) -> finalString.concat(s),
(s1, s2) -> s1.concat(s2));
60. String Concatenation
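Let us collect... but this does not work: a String is immutable, concat() returns a new String and the container stays empty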
strings.stream()
.filter(s -> s.length() == 3)
.collect(() -> new String(),
(finalString, s) -> finalString.concat(s),
(s1, s2) -> s1.concat(s2));
61. String Concatenation
Let us collect
strings.stream()
.filter(s -> s.length() == 3)
.collect(() -> new StringBuilder(),
(sb, s) -> sb.append(s),
(sb1, sb2) -> sb1.append(sb2));
62. String Concatenation
Let us collect
strings.stream()
.filter(s -> s.length() == 3)
.collect(StringBuilder::new,
StringBuilder::append,
StringBuilder::append);
63. String Concatenation
Let us collect
StringBuilder stringBuilder =
strings.stream()
.filter(s -> s.length() == 3)
.collect(StringBuilder::new,
StringBuilder::append,
StringBuilder::append);
64. String Concatenation
Let us collect
String string =
strings.stream()
.filter(s -> s.length() == 3)
.collect(StringBuilder::new,
StringBuilder::append,
StringBuilder::append)
.toString();
65. A Collector is…
3 Operations
- Supplier: creates the mutable container
- Accumulator
- Combiner
66. A Collector is…
3 + 1 Operations
- Supplier: creates the mutable container
- Accumulator
- Combiner
- Finisher, that can be the identity function
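The 3 + 1 operations can be assembled with the Collector.of() factory method. This is a sketch that rebuilds the StringBuilder concatenation of the previous slides; the class and method names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collector;

class StringBuilderCollector {

    static Collector<String, StringBuilder, String> concatenating() {
        return Collector.of(
            () -> new StringBuilder(),      // supplier: creates the mutable container
            (sb, s) -> sb.append(s),        // accumulator: adds one element to the container
            (sb1, sb2) -> sb1.append(sb2),  // combiner: merges two partial containers
            sb -> sb.toString()             // finisher: builds the final result
        );
    }

    public static void main(String[] args) {
        String result = List.of("one", "two", "six").stream()
                            .collect(concatenating());
        System.out.println(result);
    }
}
```

With the finisher in the collector, the trailing call to toString() is no longer needed on the caller's side.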
67. Collecting and Then
And we have a collector for that!
strings.stream()
.filter(s -> s.length() == 3)
.collect(
Collectors.collectingAndThen(
collector,
finisher // Function
)
);
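As a concrete sketch of this pattern, a list can be collected and then wrapped in an unmodifiable view by the finisher. The class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

class ThreeLetterWords {

    // Collects into a list, then applies the finisher to the collected list.
    static List<String> of(List<String> strings) {
        return strings.stream()
                      .filter(s -> s.length() == 3)
                      .collect(
                          Collectors.collectingAndThen(
                              Collectors.toList(),           // collector
                              Collections::unmodifiableList  // finisher
                          )
                      );
    }

    public static void main(String[] args) {
        System.out.println(of(List.of("one", "three", "two")));
    }
}
```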
73. Collect toMap
Useful for remapping maps
Do not generate duplicate keys!
map.entrySet().stream()
.collect(
Collectors.toMap(
entry -> entry.getKey(),
entry -> // create a new value
)
);
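A sketch of both situations: remapping the values of a map, where keys cannot collide, and building a map from keys that may collide, in which case a merge function has to be supplied (without it, toMap() throws an IllegalStateException on the first duplicate key). The class and method names, and the choice of the string length as the new value, are assumptions made for the illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class RemappingMaps {

    // Same keys, new values: here the length of the original value.
    static Map<String, Integer> valueLengths(Map<String, String> map) {
        return map.entrySet().stream()
                  .collect(
                      Collectors.toMap(
                          entry -> entry.getKey(),
                          entry -> entry.getValue().length()
                      )
                  );
    }

    // Several words share the same length: the third argument merges their values.
    static Map<Integer, String> wordsByLength(List<String> words) {
        return words.stream()
                    .collect(
                        Collectors.toMap(
                            String::length,
                            word -> word,
                            (left, right) -> left + ", " + right
                        )
                    );
    }

    public static void main(String[] args) {
        System.out.println(valueLengths(Map.of("a", "one", "b", "three")));
        System.out.println(wordsByLength(List.of("one", "two", "three")));
    }
}
```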
76. The Collector Interface
public interface Collector<T, A, R> {
public Supplier<A> supplier(); // A: mutable container
public BiConsumer<A, T> accumulator(); // T: processed elements
public BinaryOperator<A> combiner(); // Often the type returned
public Function<A, R> finisher(); // Final touch
}
77. The Collector Interface
public interface Collector<T, A, R> {
public Supplier<A> supplier(); // A: mutable container
public BiConsumer<A, T> accumulator(); // T: processed elements
public BinaryOperator<A> combiner(); // Often the type returned
public Function<A, R> finisher(); // Final touch
public Set<Characteristics> characteristics();
}
78. Type of a Collector
In a nutshell:
- T: type of the elements of the stream
- A: type of the mutable container
- R: type of the final container
We often have A = R
The finisher may be the identity function
79. one, two, three, four, five, six, seven, eight, nine, ten
groupingBy(String::length)
3 -> one, two, six, ten
4 -> four, five, nine
5 -> three, seven, eight
80. one, two, three, four, five, six, seven, eight, nine, ten
Collector<String, ?, Map<Integer, List<String>>> c =
groupingBy(String::length)
3 -> one, two, six, ten
4 -> four, five, nine
5 -> three, seven, eight
83. one, two, three, four, five, six, seven, eight, nine, ten
Collector<String, ?, Map<Integer, List<String>>> c =
groupingBy(
String::length,
?
)
3 -> one, two, six, ten
4 -> four, five, nine
5 -> three, seven, eight
84. one, two, three, four, five, six, seven, eight, nine, ten
Collector<String, ?, Map<Integer, List<String>>> c =
groupingBy(
String::length,
Collector<String, ?, >
)
3 -> one, two, six, ten
4 -> four, five, nine
5 -> three, seven, eight
89. Intermediate Collectors
The mapping Collector provides an
intermediate operation
Why is it interesting?
To create downstream collectors!
So what about integrating all our stream
processing as a collector?
stream.collect(mapping(function, downstream));
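A sketch of mapping() used as a downstream collector: the mapping step that would normally be an intermediate operation of the stream is applied inside each group. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class MappingDownstream {

    // Groups the strings by length, and upper-cases them inside each group.
    static Map<Integer, List<String>> upperCaseByLength(List<String> strings) {
        return strings.stream()
                      .collect(
                          Collectors.groupingBy(
                              String::length,
                              Collectors.mapping(s -> s.toUpperCase(), Collectors.toList())
                          )
                      );
    }

    public static void main(String[] args) {
        System.out.println(upperCaseByLength(List.of("one", "two", "three")));
    }
}
```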
91. Intermediate Collectors
The mapping Collector provides an
intermediate operation
We have a Stream<T>
So predicate is a Predicate<T>
Downstream is a Collector<T, ?, R>
stream.collect(mapping(function, downstream));
stream.collect(filtering(predicate, downstream));
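A sketch of filtering() as a downstream collector (available from Java 9). Unlike a filter() call placed before the grouping, the key of a group is kept even if all its elements are rejected. The class and method names, and the predicate itself, are assumptions made for the illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class FilteringDownstream {

    // Groups by length, then keeps only the strings containing an "e" in each group.
    static Map<Integer, List<String>> wordsWithAnE(List<String> strings) {
        return strings.stream()
                      .collect(
                          Collectors.groupingBy(
                              String::length,
                              Collectors.filtering(s -> s.contains("e"), Collectors.toList())
                          )
                      );
    }

    public static void main(String[] args) {
        // "four" has no "e": the key 4 is still there, mapped to an empty list.
        System.out.println(wordsWithAnE(List.of("one", "two", "three", "four")));
    }
}
```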
92. Intermediate Collectors
The mapping Collector provides an
intermediate operation
We have a Stream<T>
So flatMapper is a Function<T, Stream<TT>>
And downstream is a Collector<TT, ?, R>
stream.collect(mapping(function, downstream));
stream.collect(flatMapping(flatMapper, downstream));
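A sketch of flatMapping() as a downstream collector (available from Java 9): each string is flat-mapped to its letters, and the letters are gathered per group. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FlatMappingDownstream {

    // The flatMapper: a Function<String, Stream<Character>>.
    static Stream<Character> letters(String s) {
        return s.chars().mapToObj(c -> (char) c);
    }

    // Groups by length, then collects the distinct letters used in each group.
    static Map<Integer, Set<Character>> lettersByLength(List<String> strings) {
        return strings.stream()
                      .collect(
                          Collectors.groupingBy(
                              String::length,
                              Collectors.flatMapping(FlatMappingDownstream::letters, Collectors.toSet())
                          )
                      );
    }

    public static void main(String[] args) {
        System.out.println(lettersByLength(List.of("one", "two", "three")));
    }
}
```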
94. Characteristics
Three characteristics for the collectors:
- IDENTITY_FINISH: the finisher is the identity
function
- UNORDERED: the collector does not preserve
the order of the elements
- CONCURRENT: the collector is thread safe
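Characteristics are declared when the collector is created. The following sketch uses the three-function overload of Collector.of(), which adds IDENTITY_FINISH by itself since there is no finisher, and declares UNORDERED explicitly. The class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collector;

class ToUnorderedSet {

    static <T> Collector<T, Set<T>, Set<T>> toUnorderedSet() {
        return Collector.of(
            HashSet::new,                                                // supplier
            Set::add,                                                    // accumulator
            (left, right) -> { left.addAll(right); return left; },      // combiner
            Collector.Characteristics.UNORDERED                          // a HashSet does not keep the order
        );
    }

    public static void main(String[] args) {
        Collector<String, Set<String>, Set<String>> collector = toUnorderedSet();
        System.out.println(collector.characteristics());
        System.out.println(List.of("one", "two", "one").stream().collect(collector));
    }
}
```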
95. Handling Empty Optionals
Two things:
- Make an Optional a Stream
- Remove the empty Streams with flatMap
Map<K, Optional<V>> // with empty Optionals...
-> Map<K, Stream<V>> // with empty Streams
-> Stream<Map.Entry<K, V>> // the empty ones are gone
-> Map<K, V> // using a toMap
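The steps above can be sketched as follows, assuming Java 9 or later for Optional.stream() and Map.entry(). The class and method names are hypothetical, and Map.entry() rejects null keys, so the sketch assumes the map has none.

```java
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

class EmptyOptionals {

    static <K, V> Map<K, V> withoutEmpties(Map<K, Optional<V>> map) {
        return map.entrySet().stream()
                  // Optional -> Stream: an empty Optional becomes an empty Stream,
                  // and flatMap removes the empty Streams
                  .flatMap(entry -> entry.getValue()
                                         .map(value -> Map.entry(entry.getKey(), value))
                                         .stream())
                  // back to a Map, using a toMap
                  .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    public static void main(String[] args) {
        Map<String, Optional<Integer>> map =
            Map.of("a", Optional.of(1), "b", Optional.<Integer>empty());
        System.out.println(withoutEmpties(map));
    }
}
```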
96. Joins
1) The authors that published the most
together
2) The authors that published the most
together in a year
StreamsUtils to the rescue!
99. Application
What is interesting in modeling a data processing
pipeline as a collector?
We can reuse this collector as a downstream
collector for other pipelines
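A sketch of this reuse: the same collector is first applied to a whole stream, then passed as a downstream collector to groupingBy(). The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class ReusableCollector {

    // A small processing modeled as a collector: upper-case, then join.
    static Collector<String, ?, String> upperCaseJoined() {
        return Collectors.mapping(s -> s.toUpperCase(), Collectors.joining(", "));
    }

    // Used directly on a stream...
    static String all(List<String> strings) {
        return strings.stream().collect(upperCaseJoined());
    }

    // ...and reused as a downstream collector of another processing.
    static Map<Integer, String> byLength(List<String> strings) {
        return strings.stream()
                      .collect(Collectors.groupingBy(String::length, upperCaseJoined()));
    }

    public static void main(String[] args) {
        List<String> strings = List.of("one", "two", "three");
        System.out.println(all(strings));
        System.out.println(byLength(strings));
    }
}
```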
102. Dealing with Issues
The main issue is the empty stream
A whole stream may have elements
But when we build a histogram, a given
substream may become empty…
104. API Collector
A very rich API indeed
Quite complex…
One needs to have a very precise idea of the
data processing pipeline
Can be extended!
105. API Collector
A collector can model a whole processing pipeline
Once it is written, it can be passed as a
downstream to another processing pipeline
Can be made composable to improve
readability
https://github.com/JosePaumard