Java Collectors fatigue

Biju Kunjummen
Feb 29, 2024

I was recently working on a small project involving hierarchical data. The data was represented in a structure similar to this Java record:

record Link(int parentId, int childId) {
}

Now, given a list of these links, I needed to turn it into a Map from each parent to its children. Using a loop, the code would look like this:

Map<Integer, Set<Integer>> parentChild = new HashMap<>();

for (Link link : links) {
    Set<Integer> children = parentChild.computeIfAbsent(link.parentId(), k -> new TreeSet<>());
    children.add(link.childId());
}

Using Java Streams, this kind of grouping can be expressed far more succinctly:

Map<Integer, Set<Integer>> result = links.stream()
    .collect(
        Collectors.groupingBy(
            link -> link.parentId(),
            Collectors.mapping(link -> link.childId(), Collectors.toSet())));
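
To make the comparison concrete, here is a minimal, self-contained sketch that runs both versions on the same input and checks that they agree. The class name `ParentChildDemo` and the sample links are made up for illustration, and method references stand in for the lambdas.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

class ParentChildDemo {
    record Link(int parentId, int childId) {}

    // Hypothetical sample data; note the duplicate (1, 2) link.
    static final List<Link> LINKS = List.of(
        new Link(1, 2), new Link(1, 3), new Link(2, 4), new Link(1, 2));

    // Loop version: one set of children per parent.
    static Map<Integer, Set<Integer>> withLoop(List<Link> links) {
        Map<Integer, Set<Integer>> parentChild = new HashMap<>();
        for (Link link : links) {
            parentChild.computeIfAbsent(link.parentId(), k -> new TreeSet<>())
                       .add(link.childId());
        }
        return parentChild;
    }

    // Stream version: group by parent, then map each link to its child id.
    static Map<Integer, Set<Integer>> withStream(List<Link> links) {
        return links.stream()
            .collect(Collectors.groupingBy(
                Link::parentId,
                Collectors.mapping(Link::childId, Collectors.toSet())));
    }

    public static void main(String[] args) {
        // Map and Set equality compare contents, not implementation classes.
        System.out.println(withLoop(LINKS).equals(withStream(LINKS))); // prints true
    }
}
```

Both versions drop the duplicate link because the children are collected into a Set.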

However, while working on the project I couldn’t recall the right collector to use for the Streams version of the code and fell back to the loop.

I call this Collector fatigue: the Collector and Stream version, though concise, is not always intuitive, and I often need to check the documentation or samples to find the right variation to get the job done.

It reminded me that it may be a good time to review once more how the Collector works. This post is a short exploration of how it works under the covers.

Collector Under the Covers

Consider first a simple Collector, taking in a stream of String types and creating a set out of it.

Set<String> result = List.of("one", "two", "three", "four", "five", "one")
    .stream()
    .collect(Collectors.toSet());

Clearly, this is way more intuitive than the loop version of the code:

Set<String> set = new HashSet<>();
for (String s : List.of("one", "two", "three", "four", "five", "one")) {
    set.add(s);
}

Under the covers, the Collector created by Collectors.toSet() works exactly like the loop version.

Collectors.toSet() returns a Collector, a type made up of four functions:

  1. A supplier which initializes the result container — `Set` in this instance
  2. An accumulator which takes care of adding new elements to the result container
  3. A combiner that merges the contents of multiple result containers
  4. A finisher that applies any final transformation to the result container
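
The four pieces listed above can be sketched with the JDK's `Collector.of` factory. This is an illustrative stand-in, not the actual source of `Collectors.toSet()`. The class and method names are hypothetical, and the finisher wraps the set as unmodifiable purely to show a finisher doing something.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collector;

class SetCollectorSketch {
    // Hypothetical helper: collects into a HashSet, then wraps it as unmodifiable.
    static <T> Collector<T, Set<T>, Set<T>> toUnmodifiableHashSet() {
        return Collector.of(
            HashSet::new,                         // 1. supplier: a fresh result container
            Set::add,                             // 2. accumulator: add one element
            (left, right) -> {                    // 3. combiner: merge two containers
                left.addAll(right);
                return left;
            },
            Collections::unmodifiableSet,         // 4. finisher: final transformation
            Collector.Characteristics.UNORDERED); // a set has no encounter order
    }

    public static void main(String[] args) {
        Set<String> result = List.of("one", "two", "three", "four", "five", "one")
            .stream()
            .collect(toUnmodifiableHashSet());
        System.out.println(result.size()); // prints 5, the duplicate "one" is dropped
    }
}
```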

Driving a Collector by hand in a loop (here a Collector named customCollector) would look like this:

Set<String> set = customCollector.supplier().get();
for (String s : List.of("one", "two", "three", "four", "five", "one")) {
    customCollector.accumulator().accept(set, s);
}
Set<String> finalResult = customCollector.finisher().apply(set);

The combiner hasn’t come into play here. A combiner is typically used with parallel streams, where multiple partial accumulations need to be combined.
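
To see where the combiner fits, the loop can be extended to accumulate two halves of the input separately and then merge them. This is only a rough imitation of what a parallel stream does, under the assumption of a simple two-way split. The helper `collectInTwoHalves` is a hypothetical name.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class CombinerSketch {
    // Hypothetical helper: drives any Collector by hand over two partial containers.
    static <T, A, R> R collectInTwoHalves(List<T> items, Collector<T, A, R> collector) {
        int mid = items.size() / 2;

        // Each half gets its own result container from the supplier.
        A left = collector.supplier().get();
        for (T item : items.subList(0, mid)) {
            collector.accumulator().accept(left, item);
        }
        A right = collector.supplier().get();
        for (T item : items.subList(mid, items.size())) {
            collector.accumulator().accept(right, item);
        }

        // The combiner merges the partial results; the finisher produces the final value.
        A merged = collector.combiner().apply(left, right);
        return collector.finisher().apply(merged);
    }

    public static void main(String[] args) {
        List<String> words = List.of("one", "two", "three", "four", "five", "one");
        Set<String> viaHalves = collectInTwoHalves(words, Collectors.toSet());
        Set<String> viaStream = words.stream().collect(Collectors.toSet());
        System.out.println(viaHalves.equals(viaStream)); // prints true
    }
}
```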

This is the high-level detail of how a Collector works under the covers.

So, how does a Custom Collector look?

A Custom Collector

Going back to the original problem, converting a list of Links to a parent-child map, if I were to define a custom collector to do such a transformation, the Collector definition would look like this:

record CustomCollector<T, A, R>(Supplier<A> supplier,
        BiConsumer<A, T> accumulator,
        BinaryOperator<A> combiner,
        Function<A, R> finisher,
        Set<Characteristics> characteristics
) implements Collector<T, A, R> {

}

See how the custom version of the Collector is just a holder for the supplier, accumulator, combiner, and finisher, plus a set of characteristics!

So, given this, what would a custom Collector for a Map type look like?

Supplier<Map<Integer, Set<Integer>>> supplier = () -> new HashMap<>();

BiConsumer<Map<Integer, Set<Integer>>, Link> accumulator = (map, item) -> {
    final Set<Integer> children = map.computeIfAbsent(item.parentId(), k -> new HashSet<>());
    children.add(item.childId());
};

BinaryOperator<Map<Integer, Set<Integer>>> combiner = (left, right) -> {
    // Merge the child sets parent by parent. A plain putAll would replace
    // the children of any parent that appears in both maps.
    right.forEach((parentId, children) ->
        left.computeIfAbsent(parentId, k -> new HashSet<>()).addAll(children));
    return left;
};

Collector<Link, Map<Integer, Set<Integer>>, Map<Integer, Set<Integer>>> customCollector =
    new CustomCollector<>(
        supplier,
        accumulator,
        combiner,
        Function.identity(),
        Set.of(Collector.Characteristics.UNORDERED));

See how the supplier, accumulator, combiner and finisher are being put together into the Collector type.

Using this collector looks like this:

Map<Integer, Set<Integer>> result = links.stream().collect(customCollector);
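
A parallel stream is a quick way to exercise the combiner of a collector like this. The sketch below is self-contained and rests on a few assumptions: it builds the collector with `Collector.of` rather than the record, its combiner merges child sets parent by parent (the same parent can show up in more than one partial map), and the class name and sample data are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ParallelCollectSketch {
    record Link(int parentId, int childId) {}

    static Collector<Link, Map<Integer, Set<Integer>>, Map<Integer, Set<Integer>>> parentChildCollector() {
        return Collector.of(
            HashMap::new,
            (map, link) -> map.computeIfAbsent(link.parentId(), k -> new HashSet<>())
                              .add(link.childId()),
            (left, right) -> {
                // Merge per parent so no children are lost when partial maps overlap.
                right.forEach((parentId, children) ->
                    left.computeIfAbsent(parentId, k -> new HashSet<>()).addAll(children));
                return left;
            },
            Collector.Characteristics.UNORDERED,
            Collector.Characteristics.IDENTITY_FINISH);
    }

    // Hypothetical data: 1000 links spread over only 3 parents, so partial maps overlap.
    static List<Link> sampleLinks() {
        return IntStream.range(0, 1000)
            .mapToObj(i -> new Link(i % 3, i))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Link> links = sampleLinks();
        Map<Integer, Set<Integer>> parallel =
            links.parallelStream().collect(parentChildCollector());
        Map<Integer, Set<Integer>> reference = links.stream()
            .collect(Collectors.groupingBy(
                Link::parentId,
                Collectors.mapping(Link::childId, Collectors.toSet())));
        System.out.println(parallel.equals(reference)); // prints true
    }
}
```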

Final Thoughts

The utility methods in Collectors hide the complexity of building a custom Collector by hand. To create the Map, the Collectors version looks like this, and internally it produces something similar to the custom collector:

Map<Integer, Set<Integer>> result = links.stream()
    .collect(
        Collectors.groupingBy(
            link -> link.parentId(),
            Collectors.mapping(link -> link.childId(), Collectors.toSet())));
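
One detail worth noting: the loop at the top of the post stored children in a TreeSet, while `Collectors.toSet()` makes no promise about ordering. If sorted children matter, one option is to swap in `Collectors.toCollection(TreeSet::new)`. This is a minimal sketch, with a hypothetical class name and made-up sample links.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

class SortedChildrenSketch {
    record Link(int parentId, int childId) {}

    // Same grouping as before, but each parent's children land in a TreeSet.
    static Map<Integer, Set<Integer>> sortedChildren(List<Link> links) {
        return links.stream()
            .collect(Collectors.groupingBy(
                Link::parentId,
                Collectors.mapping(Link::childId, Collectors.toCollection(TreeSet::new))));
    }

    public static void main(String[] args) {
        List<Link> links = List.of(
            new Link(1, 9), new Link(1, 2), new Link(1, 5), new Link(2, 4));
        System.out.println(sortedChildren(links).get(1)); // prints [2, 5, 9]
    }
}
```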

The loop version may look simpler but knowing the right Collector and right utility method in Collectors to use will go a long way in making the code far more concise and readable.
