Java 8 introduced a new abstraction called Stream that lets us process data in a declarative way. Furthermore, parallel streams can leverage multi-core architectures without you having to write a single line of multithreaded code.
Collectors is a class that provides implementations of Collector for various useful reduction operations, such as accumulating elements into collections, summarizing elements according to various criteria, etc.
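To make "an implementation of Collector" concrete, here is a minimal sketch of a hand-written collector built with Collector.of, which bundles a supplier, an accumulator and a combiner. The class CollectorSketch and the method myToList() are hypothetical names used only for illustration; this is not how the JDK itself implements toList().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class CollectorSketch {
    // Hypothetical helper: a hand-rolled equivalent of Collectors.toList()
    static <T> Collector<T, ?, List<T>> myToList() {
        return Collector.of(
                ArrayList::new,       // supplier: creates a fresh mutable container
                List::add,            // accumulator: adds one element to the container
                (left, right) -> {    // combiner: merges two partial containers (used by parallel streams)
                    left.addAll(right);
                    return left;
                });
    }

    public static void main(String[] args) {
        List<String> fromLibrary = Stream.of("a", "b", "c").collect(Collectors.toList());
        List<String> fromSketch = Stream.of("a", "b", "c").collect(myToList());
        System.out.println(fromLibrary.equals(fromSketch)); // true
    }
}
```

Every collector in the Collectors class follows this same shape; the factory methods simply save you from writing these functions by hand.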
Using Collectors
To demonstrate the usage of stream Collectors, let me define a class to hold the data:
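```java
class Employee {
    private String empId;
    private String name;
    private Double salary;
    private String department;

    public Employee(String empId, String name, Double salary, String department) {
        this.empId = empId;
        this.name = name;
        this.salary = salary;
        this.department = department;
    }

    public String getEmpId() { return empId; }
    public String getName() { return name; }
    public Double getSalary() { return salary; }
    public String getDepartment() { return department; }

    // Produces the format used in the sample outputs of this article
    @Override
    public String toString() {
        return "{empId='" + empId + "', name='" + name + "', salary=" + salary
                + ", department='" + department + "'}";
    }
}
```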
Next, let me create a list of employees:
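```java
import java.text.DecimalFormat;
import java.util.*;
// The snippets in this article also rely on these static imports
import static java.util.Comparator.comparingDouble;
import static java.util.stream.Collectors.*;

Employee john = new Employee("E123", "John Nhoj", 200.99, "IT");
Employee south = new Employee("E223", "South Htuos", 299.99, "Sales");
Employee reet = new Employee("E133", "Reet Teer", 300.99, "IT");
Employee prateema = new Employee("E143", "Prateema Rai", 300.99, "Benefits");
Employee yogen = new Employee("E323", "Yogen Rai", 200.99, "Sales");
List<Employee> employees = Arrays.asList(john, south, reet, prateema, yogen);
```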
1. Calculating statistical values
Finding average salary
```java
Double averageSalary = employees.stream()
        .collect(averagingDouble(Employee::getSalary)); // 260.79
```
Similarly, there are averagingInt(ToIntFunction<? super T> mapper) and averagingLong(ToLongFunction<? super T> mapper) to find the average values for Integer and Long types.
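As a small standalone sketch of the int and long variants (the class AveragingSketch and its methods are hypothetical names), note that both still return a Double, even though the mapped values are integral:

```java
import java.util.stream.Collectors;
import java.util.stream.Stream;

class AveragingSketch {
    // averagingInt maps each word to its length (an int) and averages those ints
    static double averageLength() {
        return Stream.of("ab", "abcd").collect(Collectors.averagingInt(String::length));
    }

    // averagingLong does the same for long-valued mappers
    static double averageOfLongs() {
        return Stream.of(10L, 20L, 30L).collect(Collectors.averagingLong(Long::longValue));
    }

    public static void main(String[] args) {
        System.out.println(averageLength());  // 3.0
        System.out.println(averageOfLongs()); // 20.0
    }
}
```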
Finding total salary
```java
Double totalSalary = employees.stream()
        .collect(summingDouble(Employee::getSalary)); // 1303.95
```
summingInt(ToIntFunction<? super T> mapper) and summingLong(ToLongFunction<? super T> mapper) are available for summing Integer and Long types.
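Unlike the averaging collectors, the summing collectors keep the element type: summingInt yields an Integer and summingLong yields a Long. A minimal sketch (SummingSketch and its methods are hypothetical names) shows why summingLong matters once totals exceed the int range:

```java
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SummingSketch {
    // summingInt produces an Integer: 2 + 4 + 1
    static int totalLength() {
        return Stream.of("ab", "abcd", "x").collect(Collectors.summingInt(String::length));
    }

    // summingLong produces a Long, so this total does not overflow as an int would
    static long bigTotal() {
        return Stream.of(2_000_000_000L, 2_000_000_000L).collect(Collectors.summingLong(Long::longValue));
    }

    public static void main(String[] args) {
        System.out.println(totalLength()); // 7
        System.out.println(bigTotal());    // 4000000000
    }
}
```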
Finding max salary
```java
Double maxSalary = employees.stream()
        .collect(collectingAndThen(maxBy(comparingDouble(Employee::getSalary)),
                emp -> emp.get().getSalary())); // 300.99
```
The collectingAndThen() method is declared as:
Collector<T,A,RR> collectingAndThen(Collector<T,A,R> downstream, Function<R,RR> finisher)
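Because maxBy() returns an Optional (the stream might be empty), calling emp.get() in the max-salary snippet throws NoSuchElementException for an empty list. A minimal sketch of a safer finisher, assuming a fallback value is acceptable for your use case (SafeMaxSketch and maxOrDefault are hypothetical names), unwraps the Optional with orElse:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import static java.util.stream.Collectors.collectingAndThen;
import static java.util.stream.Collectors.maxBy;

class SafeMaxSketch {
    // Returns the largest value, or the fallback when the list is empty.
    // maxBy yields an Optional<Double>; the finisher unwraps it without calling get().
    static double maxOrDefault(List<Double> values, double fallback) {
        return values.stream()
                .collect(collectingAndThen(maxBy(Comparator.<Double>naturalOrder()),
                        max -> max.orElse(fallback)));
    }

    public static void main(String[] args) {
        System.out.println(maxOrDefault(Arrays.asList(200.99, 300.99, 299.99), 0.0)); // 300.99
        System.out.println(maxOrDefault(Collections.<Double>emptyList(), 0.0));        // 0.0
    }
}
```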
The finisher Function transforms the result of the downstream collector; for example, it can format the average salary as a currency string:
```java
String avgSalary = employees.stream()
        .collect(collectingAndThen(averagingDouble(Employee::getSalary),
                new DecimalFormat("'$'0.000")::format)); // $260.790
```
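Another common use of a finisher is to make the collected result read-only. The sketch below (FinisherSketch and readOnlyNames are hypothetical names) wraps the list produced by toList() with Collections::unmodifiableList:

```java
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;
import static java.util.stream.Collectors.collectingAndThen;
import static java.util.stream.Collectors.toList;

class FinisherSketch {
    // toList() builds the list; the finisher wraps it so callers cannot modify it
    static List<String> readOnlyNames() {
        return Stream.of("John", "Reet")
                .collect(collectingAndThen(toList(), Collections::unmodifiableList));
    }

    public static void main(String[] args) {
        try {
            readOnlyNames().add("Yogen");
        } catch (UnsupportedOperationException e) {
            System.out.println("read-only"); // printed, because the list rejects add()
        }
    }
}
```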
Calculating statistics in one shot
```java
DoubleSummaryStatistics statistics = employees.stream()
        .collect(summarizingDouble(Employee::getSalary));
System.out.println("Average: " + statistics.getAverage() + ", Total: " + statistics.getSum()
        + ", Max: " + statistics.getMax() + ", Min: " + statistics.getMin());
// Average: 260.79, Total: 1303.95, Max: 300.99, Min: 200.99
```
Similarly, summarizingInt(ToIntFunction<? super T> mapper) and summarizingLong(ToLongFunction<? super T> mapper) are available for Integer and Long types.
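A minimal standalone sketch of summarizingInt (SummarizingSketch is a hypothetical name) shows the accessors of IntSummaryStatistics; note that getCount() and getSum() return long, getMin() and getMax() return int, and getAverage() returns double:

```java
import java.util.IntSummaryStatistics;
import java.util.stream.Stream;
import static java.util.stream.Collectors.summarizingInt;

class SummarizingSketch {
    // One pass over the stream collects count, sum, min, max and average together
    static IntSummaryStatistics stats() {
        return Stream.of(3, 1, 4, 1, 5).collect(summarizingInt(Integer::intValue));
    }

    public static void main(String[] args) {
        IntSummaryStatistics s = stats();
        System.out.println("count=" + s.getCount() + ", sum=" + s.getSum() + ", min=" + s.getMin()
                + ", max=" + s.getMax() + ", average=" + s.getAverage());
        // count=5, sum=14, min=1, max=5, average=2.8
    }
}
```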
2. Mapping and Joining Stream Elements
Mapping only employee names
```java
List<String> employeeNames = employees.stream()
        .collect(mapping(Employee::getName, toList()));
// [John Nhoj, South Htuos, Reet Teer, Prateema Rai, Yogen Rai]
```
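On its own, mapping(...) behaves like map(...) followed by collect(toList()); it becomes more useful as a downstream collector, where it transforms the elements of each group. A minimal sketch using plain strings instead of Employee (MappingSketch and lengthsByInitial are hypothetical names):

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.mapping;
import static java.util.stream.Collectors.toList;

class MappingSketch {
    // Groups words by first letter, but stores only each word's length in the groups
    static Map<String, List<Integer>> lengthsByInitial() {
        return Stream.of("apple", "avocado", "banana")
                .collect(groupingBy(s -> s.substring(0, 1), TreeMap::new,
                        mapping(String::length, toList())));
    }

    public static void main(String[] args) {
        System.out.println(lengthsByInitial()); // {a=[5, 7], b=[6]}
    }
}
```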
Joining employee names
```java
String employeeNamesStr = employees.stream()
        .map(Employee::getName)
        .collect(joining(","));
// John Nhoj,South Htuos,Reet Teer,Prateema Rai,Yogen Rai
```
joining() has an overloaded version that also takes a prefix and a suffix:
Collector<CharSequence,?,String> joining(CharSequence delimiter, CharSequence prefix, CharSequence suffix)
So, to collect the employee names in a specific format, you can write:
```java
String employeeNamesStr = employees.stream()
        .map(Employee::getName)
        .collect(joining(", ", "Employees = {", "}"));
// Employees = {John Nhoj, South Htuos, Reet Teer, Prateema Rai, Yogen Rai}
```
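Two edge cases are worth knowing: joining() with no arguments concatenates the elements with no delimiter, and the prefix/suffix overload still emits the prefix and suffix for an empty stream. A small sketch (JoiningSketch and its methods are hypothetical names):

```java
import java.util.stream.Stream;
import static java.util.stream.Collectors.joining;

class JoiningSketch {
    // No delimiter: elements are simply concatenated
    static String concatenated() {
        return Stream.of("a", "b", "c").collect(joining());
    }

    // An empty stream still yields prefix + suffix
    static String emptyWithBraces() {
        return Stream.<String>empty().collect(joining(", ", "{", "}"));
    }

    public static void main(String[] args) {
        System.out.println(concatenated());    // abc
        System.out.println(emptyWithBraces()); // {}
    }
}
```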
3. Grouping Elements
Grouping employees by Department
groupingBy() takes a classifier Function:
Collector<T,?,Map<K,List<T>>> groupingBy(Function<? super T,? extends K> classifier)
So, grouping the employees by department looks like this:
```java
Map<String, List<Employee>> deptEmps = employees.stream()
        .collect(groupingBy(Employee::getDepartment));
// {Sales=[{empId='E223', name='South Htuos', salary=299.99, department='Sales'},
//         {empId='E323', name='Yogen Rai', salary=200.99, department='Sales'}],
//  Benefits=[{empId='E143', name='Prateema Rai', salary=300.99, department='Benefits'}],
//  IT=[{empId='E123', name='John Nhoj', salary=200.99, department='IT'},
//      {empId='E133', name='Reet Teer', salary=300.99, department='IT'}]}
```
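Conceptually, groupingBy(classifier) puts each element into a list keyed by classifier(element). The sketch below is an imperative approximation for illustration only (GroupingSketch and groupBy are hypothetical names); unlike the library collector, it does not reject null keys. The Collectors documentation makes no guarantee about the Map type returned, so the iteration order of the output above should not be relied upon.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class GroupingSketch {
    // Hypothetical helper: an imperative approximation of groupingBy(classifier)
    static <T, K> Map<K, List<T>> groupBy(List<T> items, Function<? super T, ? extends K> classifier) {
        Map<K, List<T>> result = new HashMap<>();
        for (T item : items) {
            // create the bucket the first time a key is seen, then append the element
            result.computeIfAbsent(classifier.apply(item), key -> new ArrayList<>()).add(item);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "avocado", "banana");
        Map<String, List<String>> manual = groupBy(words, w -> w.substring(0, 1));
        Map<String, List<String>> viaCollector =
                words.stream().collect(Collectors.groupingBy(w -> w.substring(0, 1)));
        System.out.println(manual.equals(viaCollector)); // true
    }
}
```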
Counting employees per Department
There is also an overloaded version of groupingBy() that applies a downstream collector to each group:
Collector<T,?,Map<K,D>> groupingBy(Function<? super T,? extends K> classifier, Collector<? super T,A,D> downstream)
So, counting the employees per department looks like this:
```java
Map<String, Long> deptEmpsCount = employees.stream()
        .collect(groupingBy(Employee::getDepartment, counting())); // {Sales=2, Benefits=1, IT=2}
```
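The downstream collector replaces the default "collect into a List" step for each group. As an illustration, the sketch below compares groupingBy(..., counting()) with an imperative version built on Map.merge; CountingSketch and its methods are hypothetical names, and plain strings stand in for Employee.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;

class CountingSketch {
    // Imperative version: Map.merge adds 1 to the running count of each key
    static Map<String, Long> countManually(List<String> words) {
        Map<String, Long> counts = new HashMap<>();
        for (String w : words) {
            counts.merge(w.substring(0, 1), 1L, Long::sum);
        }
        return counts;
    }

    // Collector version: counting() is the downstream applied to each group
    static Map<String, Long> countWithCollector(List<String> words) {
        return words.stream().collect(groupingBy(w -> w.substring(0, 1), counting()));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "avocado", "banana");
        System.out.println(countManually(words).equals(countWithCollector(words))); // true
    }
}
```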
Calculating average salary per department, sorted by department name
Another overload of groupingBy() also lets you choose the map implementation through a map factory:
Collector<T,?,M> groupingBy(Function<? super T,? extends K> classifier, Supplier<M> mapFactory, Collector<? super T,A,D> downstream)
Passing TreeMap::new as the map factory keeps the result sorted by department name:
```java
Map<String, Double> averageSalaryDeptSorted = employees.stream()
        .collect(groupingBy(Employee::getDepartment, TreeMap::new, averagingDouble(Employee::getSalary)));
// {Benefits=300.99, IT=250.99, Sales=250.49}
```
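The map factory is an ordinary Supplier, so it can also configure the map. As a sketch, assuming you want departments in reverse alphabetical order, a TreeMap can be created with Comparator.reverseOrder(); ReverseGroupingSketch and countDescending are hypothetical names, and department names are plain strings here.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;

class ReverseGroupingSketch {
    // Counts occurrences per department, with keys in reverse alphabetical order
    static Map<String, Long> countDescending(List<String> departments) {
        return departments.stream()
                .collect(groupingBy(Function.identity(),
                        () -> new TreeMap<String, Long>(Comparator.reverseOrder()),
                        counting()));
    }

    public static void main(String[] args) {
        List<String> departments = Arrays.asList("IT", "Sales", "IT", "Benefits", "Sales");
        System.out.println(countDescending(departments)); // {Sales=2, IT=2, Benefits=1}
    }
}
```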
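There is also a concurrent version, groupingByConcurrent(), which collects into a ConcurrentMap (a ConcurrentHashMap by default) and can take advantage of multi-core architectures when used with a parallel stream.
```java
Map<String, Long> deptEmpCount = employees.stream()
        .collect(groupingByConcurrent(Employee::getDepartment, counting())); // {Sales=2, IT=2, Benefits=1}
```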
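The benefit of groupingByConcurrent() shows up with a parallel stream: the collector is CONCURRENT and UNORDERED, so worker threads can accumulate into one shared ConcurrentMap instead of building per-thread maps and merging them. A minimal sketch (ConcurrentGroupingSketch and countInParallel are hypothetical names); because the map's iteration order is unspecified, it prints a single lookup rather than the whole map.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingByConcurrent;

class ConcurrentGroupingSketch {
    // parallelStream() lets several threads feed the same concurrent collector
    static ConcurrentMap<String, Long> countInParallel(List<String> departments) {
        return departments.parallelStream()
                .collect(groupingByConcurrent(Function.identity(), counting()));
    }

    public static void main(String[] args) {
        List<String> departments = Arrays.asList("IT", "Sales", "IT", "Benefits", "Sales");
        System.out.println(countInParallel(departments).get("IT")); // 2
    }
}
```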
4. Partitioning Elements
partitioningBy() takes a Predicate and splits the elements into a map whose true key holds the elements that satisfy the predicate and whose false key holds those that do not:
Collector<T,?,Map<Boolean,List<T>>> partitioningBy(Predicate<? super T> predicate)
Finding the employees whose salary is greater than the average salary:
```java
Map<Boolean, List<Employee>> partitionedEmployees = employees.stream()
        .collect(partitioningBy(e -> e.getSalary() > averageSalary));
// {false=[{empId='E123', name='John Nhoj', salary=200.99, department='IT'},
//         {empId='E323', name='Yogen Rai', salary=200.99, department='Sales'}],
//  true=[{empId='E223', name='South Htuos', salary=299.99, department='Sales'},
//        {empId='E133', name='Reet Teer', salary=300.99, department='IT'},
//        {empId='E143', name='Prateema Rai', salary=300.99, department='Benefits'}]}
```
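One difference from groupingBy() is that the partition map always contains both the true and the false key, even when one side is empty. A small sketch on integers (PartitionSketch and partitionByEven are hypothetical names):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import static java.util.stream.Collectors.partitioningBy;

class PartitionSketch {
    // Splits numbers into even (true) and odd (false)
    static Map<Boolean, List<Integer>> partitionByEven(List<Integer> numbers) {
        return numbers.stream().collect(partitioningBy(n -> n % 2 == 0));
    }

    public static void main(String[] args) {
        // No even numbers, yet the true key is still present with an empty list
        System.out.println(partitionByEven(Arrays.asList(1, 3, 5))); // {false=[1, 3, 5], true=[]}
    }
}
```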
An overloaded version of this method accepts a downstream collector that is applied to each partition:
Collector<T,?,Map<Boolean,D>> partitioningBy(Predicate<? super T> predicate, Collector<? super T,A,D> downstream)
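For example, passing counting() as the downstream collector returns the size of each partition instead of its elements. A minimal sketch on integers (EvenOddSketch and countEvenOdd are hypothetical names):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.partitioningBy;

class EvenOddSketch {
    // counting() runs separately on the true and false partitions
    static Map<Boolean, Long> countEvenOdd(List<Integer> numbers) {
        return numbers.stream().collect(partitioningBy(n -> n % 2 == 0, counting()));
    }

    public static void main(String[] args) {
        System.out.println(countEvenOdd(Arrays.asList(1, 2, 3, 4, 5))); // {false=3, true=2}
    }
}
```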
Conclusion
The Collectors class provides many utility methods for reducing a stream into a meaningful result.
All the source code for the examples above is available on GitHub.