Monday, July 6, 2015

Playing with the Java 8 Collectors API

I recently came across a problem that looked like it had to be a walk in the park using the Java 8 Collectors API. A short glance at the API doc, with its myriad angle brackets, one-letter type parameters and predefined Collectors, promised a type-safe solution just waiting to be discovered.

I did indeed find a solution to the problem quite quickly, but I was not happy with how clumsy it looked. This blog post shares what I found while trying to solve a task as simple as the one captured by the following unit test:

public class MyCollectorsTest {  
    private List<KeyValuePair> createKeyValuePairs() {
        return new ImmutableList.Builder<KeyValuePair>()
                .add(new KeyValuePair("java", "christoph"))
                .add(new KeyValuePair("java", "susanne"))
                .add(new KeyValuePair("scala", "susanne"))
                .add(new KeyValuePair("java", "martin"))
                .add(new KeyValuePair("java", "thomas"))
                .add(new KeyValuePair("java", "armin"))
                .add(new KeyValuePair("scala", "armin"))
                .build();
    }

    @Test
    public void testGroupByKeysAndJoinValues() {
        final Map<String, String> result = new MyCollectors().groupByKeysAndJoinValues(createKeyValuePairs());
        assertThat(result.size(), is(2));
        assertThat(result.get("java"), is("armin, christoph, martin, susanne, thomas"));
        assertThat(result.get("scala"), is("armin, susanne"));
    }
}
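
The test relies on a KeyValuePair class that is not shown in the post. A minimal sketch of what it might look like follows; only the accessor names getTheKey() and getTheValue() are taken from the code below, everything else (field names, immutability, visibility) is an assumption.

```java
// Hypothetical sketch of the KeyValuePair class used throughout the post.
// Only the two accessor names come from the post; the rest is assumed.
final class KeyValuePair {
    private final String theKey;
    private final String theValue;

    KeyValuePair(final String theKey, final String theValue) {
        this.theKey = theKey;
        this.theValue = theValue;
    }

    // Used as the grouping classifier (KeyValuePair::getTheKey).
    public String getTheKey() {
        return theKey;
    }

    // Used to extract the values that get sorted and joined.
    public String getTheValue() {
        return theValue;
    }
}
```

An immutable value class keeps the method references in the collectors free of side effects.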

Version 1

    // Version 1: use groupingBy, get entrySet and collect it to a map, sorting the values in the values function
    public Map<String, String> groupByKeysAndJoinValuesVersion1(final List<KeyValuePair> tuples) {
        return tuples.stream()
                .collect(groupingBy(KeyValuePair::getTheKey))
                .entrySet()
                .stream()
                .collect(toMap(Map.Entry::getKey, this::sortAndJoin1));
    }

    private String sortAndJoin1(final Map.Entry<String, List<KeyValuePair>> e) {
        return e.getValue().stream()
                .map(KeyValuePair::getTheValue)
                .sorted()
                .collect(joining(", "));
    }

Well... it works, but it feels kinda cumbersome to have to pick the values out of each Map.Entry inside the toMap() value function only to get them sorted. Also, I wasn't happy with the fact that sortAndJoin1(), as its name suggests, does more than one thing. Let's keep trying:

Version 2

    // Version 2: the same as version 1 but implemented with nested collectors
    public Map<String, String> groupByKeysAndJoinValuesVersion2(final List<KeyValuePair> tuples) {
        return tuples.stream()
                .collect(
                        groupingBy(
                                KeyValuePair::getTheKey,
                                mapping(
                                        KeyValuePair::getTheValue,
                                        collectingAndThen(toList(), this::sortAndJoin2)
                                )
                        )
                );
    }

    private String sortAndJoin2(final List<String> stringList) {
        return stringList.stream().sorted().collect(joining(", "));
    }

Oooook... this is basically the same code as above, but it uses nested (downstream) collectors. To be perfectly honest, I think that having downstream collectors is great, but the syntax is rather awkward and it takes some weird formatting to make it readable at all. Furthermore, this solution does not solve the problem I had with version 1: a separate method for sorting and joining. Yes, I know I can nest lambdas until the cows come home, but some time in the far future someone else might need to read (and understand) my code, and I don't want them to curse me when it gives them nightmares ;)
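
One possible way to get rid of the separate sort-and-join helper, sketched here as an alternative and not as one of the versions from this post, is to sort the stream by value before grouping. On a sequential, ordered stream groupingBy keeps the encounter order inside each group, so the downstream collector can be a plain mapping plus joining. The class name SortThenGroupSketch and the nested KeyValuePair stand-in are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class SortThenGroupSketch {

    // Hypothetical stand-in for the post's KeyValuePair class.
    static final class KeyValuePair {
        private final String theKey;
        private final String theValue;

        KeyValuePair(final String theKey, final String theValue) {
            this.theKey = theKey;
            this.theValue = theValue;
        }

        String getTheKey() { return theKey; }
        String getTheValue() { return theValue; }
    }

    // Sort by value first; each group then receives its values already sorted,
    // so joining is the only downstream work left.
    static Map<String, String> groupByKeysAndJoinValues(final List<KeyValuePair> tuples) {
        return tuples.stream()
                .sorted(Comparator.comparing(KeyValuePair::getTheValue))
                .collect(Collectors.groupingBy(
                        KeyValuePair::getTheKey,
                        Collectors.mapping(KeyValuePair::getTheValue, Collectors.joining(", "))));
    }

    public static void main(final String[] args) {
        final List<KeyValuePair> pairs = Arrays.asList(
                new KeyValuePair("java", "christoph"),
                new KeyValuePair("java", "susanne"),
                new KeyValuePair("scala", "susanne"),
                new KeyValuePair("java", "martin"),
                new KeyValuePair("java", "thomas"),
                new KeyValuePair("java", "armin"),
                new KeyValuePair("scala", "armin"));

        final Map<String, String> result = groupByKeysAndJoinValues(pairs);
        System.out.println(result.get("java"));  // armin, christoph, martin, susanne, thomas
        System.out.println(result.get("scala")); // armin, susanne
    }
}
```

The trade-off is that the whole input is sorted once up front instead of each group being sorted separately, and the result depends on the stream being ordered.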

Anyway, since I'm always sure there's a better way to do anything, I kept on trying and eventually came up with the following:

Version 3

    // Version 3: collect to TreeSet thus sorting the values.
    public Map<String, String> groupByKeysAndJoinValuesVersion3(final List<KeyValuePair> tuples) {
        return tuples.stream()
                .collect(
                        groupingBy(
                                KeyValuePair::getTheKey,
                                mapping(
                                        KeyValuePair::getTheValue,
                                        collectingAndThen(
                                                toCollection(TreeSet::new),
                                                (theSet) -> theSet.stream().collect(joining(", "))
                                        )
                                )
                        )
                );
    }

By collecting the mapped values into a TreeSet, I could remove the need to sort the collection manually before joining it. Still, it's a small victory, given that the code still looks complicated even though it doesn't do much.
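
What the TreeSet contributes can be seen in isolation. The following minimal sketch (class and method names are made up for illustration) shows that it iterates in natural order and silently drops duplicates. That is harmless for the test data above, but it differs from the sorted-list approach of versions 1 and 2 whenever a key can carry the same value twice. The sketch also uses String.join, which accepts any Iterable of strings, as a shorter alternative to streaming the set into joining.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

final class TreeSetJoinSketch {

    // Copies the values into a TreeSet (sorted, no duplicates) and joins them.
    static String joinSorted(final Collection<String> values) {
        final Set<String> sorted = new TreeSet<>(values);
        return String.join(", ", sorted);
    }

    public static void main(final String[] args) {
        // "susanne" appears twice in the input but only once in the output.
        System.out.println(joinSorted(Arrays.asList("susanne", "armin", "susanne")));
        // prints: armin, susanne
    }
}
```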

I thought I could avoid it, but it seemed the only way to get readable code was to write my own collector. So, with a little bit of help from IntelliJ completing the type arguments for me, I set out on a last desperate attempt to achieve concise and readable code by removing all the details from the call site:

Version 4

    // Version 4: use custom collector to hide sorting and joining.
    public Map<String, String> groupByKeysAndJoinValuesVersion4(final List<KeyValuePair> tuples) {
        return tuples.stream()
                .collect(groupingBy(KeyValuePair::getTheKey, new KeyValuePairSetStringCollector()));
    }

    private static class KeyValuePairSetStringCollector implements Collector<KeyValuePair, Set<String>, String> {
        @Override
        public Supplier<Set<String>> supplier() {
            return TreeSet::new;
        }

        @Override
        public BiConsumer<Set<String>, KeyValuePair> accumulator() {
            return (strings, keyValuePair) -> strings.add(keyValuePair.getTheValue());
        }

        @Override
        public BinaryOperator<Set<String>> combiner() {
            return (keyValuePairs, keyValuePairs2) -> {
                keyValuePairs.addAll(keyValuePairs2);
                return keyValuePairs;
            };
        }

        @Override
        public Function<Set<String>, String> finisher() {
            return (set) -> set.stream().collect(joining(", "));
        }

        @Override
        public Set<Characteristics> characteristics() {
            return new HashSet<>();
        }
    }

The collector can be rewritten to use a bit less boilerplate like so:

    private static Collector<KeyValuePair, Set<String>, String> toKeyValuePairSet() {
        final Supplier<Set<String>> supplier = TreeSet::new;
        final BiConsumer<Set<String>, KeyValuePair> accumulator = 
            (strings, keyValuePair) -> strings.add(keyValuePair.getTheValue());
        final BinaryOperator<Set<String>> combiner = (keyValuePairs1, keyValuePairs2) -> {
            keyValuePairs1.addAll(keyValuePairs2);
            return keyValuePairs1;
        };
        final Function<Set<String>, String> finisher = (set) -> set.stream().collect(joining(", "));
        return Collector.of(supplier, accumulator, combiner, finisher);
    }

It may depend on the case at hand, but here the custom Collector is probably the option that pollutes the call site's code least.
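
For completeness, this is roughly what the call site looks like with the factory-method variant. It is a self-contained sketch: the nested KeyValuePair stand-in and the class name CollectorFactoryCallSiteSketch are assumptions, and the factory mirrors the one above except that the finisher uses String.join.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;
import java.util.stream.Collectors;

final class CollectorFactoryCallSiteSketch {

    // Hypothetical stand-in for the post's KeyValuePair class.
    static final class KeyValuePair {
        private final String theKey;
        private final String theValue;

        KeyValuePair(final String theKey, final String theValue) {
            this.theKey = theKey;
            this.theValue = theValue;
        }

        String getTheKey() { return theKey; }
        String getTheValue() { return theValue; }
    }

    // Collects the values into a sorted set and joins them in the finisher.
    static Collector<KeyValuePair, Set<String>, String> toKeyValuePairSet() {
        final Supplier<Set<String>> supplier = TreeSet::new;
        final BiConsumer<Set<String>, KeyValuePair> accumulator =
                (strings, pair) -> strings.add(pair.getTheValue());
        final BinaryOperator<Set<String>> combiner = (left, right) -> {
            left.addAll(right);
            return left;
        };
        final Function<Set<String>, String> finisher = set -> String.join(", ", set);
        return Collector.of(supplier, accumulator, combiner, finisher);
    }

    // The call site only names the classifier and the downstream collector.
    static Map<String, String> groupByKeysAndJoinValues(final List<KeyValuePair> tuples) {
        return tuples.stream()
                .collect(Collectors.groupingBy(KeyValuePair::getTheKey, toKeyValuePairSet()));
    }

    public static void main(final String[] args) {
        final Map<String, String> result = groupByKeysAndJoinValues(Arrays.asList(
                new KeyValuePair("java", "christoph"),
                new KeyValuePair("java", "susanne"),
                new KeyValuePair("scala", "susanne"),
                new KeyValuePair("scala", "armin")));
        System.out.println(result.get("java"));  // christoph, susanne
        System.out.println(result.get("scala")); // armin, susanne
    }
}
```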

Conclusion

The Collectors API is definitely powerful. I do have my honest doubts, however, that the more advanced features will gain a lot of popularity outside of library code. Maybe it was intended to be that way, I don't know. Java 8 gives the developer much better tools to transform and modify data than previous versions did, but it is still a far cry from being as comfortable or intuitive as other languages.

In the end I decided I needed to know how this would look in Scala.

The Final Version

In Scala this is basically a three-liner:

keyValuePairs.groupBy(_.theKey).collect {  
  case (key: String, values: List[KeyValuePair]) =>
    (key, values.map(_.theValue).sorted.mkString(", "))
}

Happy Hacking :)