A Guide to Java 8 Collectors: From the Basics to Custom Implementations

1. Overview

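This article takes a close look at Java 8's Collectors, which are used in the final step of processing a Stream. For the basics of the Stream API, see this tutorial. If you are interested in using Collectors for parallel processing, take a look at this project.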

2. The Stream.collect() Method

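collect() is one of the terminal operations of the Java 8 Stream API. It lets us perform mutable fold operations on the elements of a Stream (repackaging elements into a data structure, applying additional logic, concatenating them, and so on). The strategy for the operation is supplied through an implementation of the Collector interface.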
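
To make the idea of a "strategy" concrete, the sketch below collects the same stream twice: once by passing the three building blocks of a mutable reduction by hand (the three-argument overload of collect()), and once by handing the job to a predefined Collector. The class and method names are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class CollectFormsDemo {

    // Spells out the mutable reduction by hand: how to create the container,
    // how to add one element, and how to merge two partial containers.
    static List<String> collectManually(List<String> source) {
        return source.stream()
          .collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
    }

    // Delegates the same strategy to a predefined Collector.
    static List<String> collectWithCollector(List<String> source) {
        return source.stream()
          .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> givenList = Arrays.asList("a", "bb", "ccc", "dd");
        System.out.println(collectManually(givenList));      // [a, bb, ccc, dd]
        System.out.println(collectWithCollector(givenList)); // [a, bb, ccc, dd]
    }
}
```

Both calls produce the same elements in the same order; the Collector form simply packages the three functions (plus an optional finishing step) into one reusable object.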

3. The Collectors Utility Class

All predefined collector implementations live in the Collectors class. To improve readability, they are usually brought in with a static import:

import static java.util.stream.Collectors.*;

Alternatively, import only the collectors you need:

import static java.util.stream.Collectors.toList;
import static java.util.stream.Collectors.toMap;
import static java.util.stream.Collectors.toSet;

The examples that follow reuse this list:

List<String> givenList = Arrays.asList("a", "bb", "ccc", "dd");

3.1. Collectors.toCollection()

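With toSet() or toList() we cannot choose the implementation class. When a specific collection type is required, use toCollection() and pass a constructor reference for the target collection: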

List<String> result = givenList.stream()
  .collect(toCollection(LinkedList::new));

Note: this approach does not work with immutable collections. In that case, either write a custom collector or use collectingAndThen().
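
As a further illustration of choosing the target type, here is a small sketch that collects into a TreeSet, so the result comes back sorted and without duplicates. The class name and the input list are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;

class ToCollectionDemo {

    // Collects into a TreeSet, so the result is sorted (natural order)
    // and contains each element only once.
    static TreeSet<String> toSortedSet(List<String> source) {
        return source.stream()
          .collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("dd", "a", "ccc", "bb", "a");
        System.out.println(toSortedSet(input)); // [a, bb, ccc, dd]
    }
}
```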

3.2. Collectors.toList()

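toList() collects the Stream elements into a List instance. Two points to keep in mind:

The concrete List implementation is not guaranteed
For precise control over the implementation class, use toCollection() instead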

List<String> result = givenList.stream()
  .collect(toList());

3.3. Collectors.toUnmodifiableList()

A convenience method introduced in Java 10 that produces an unmodifiable List:

List<String> result = givenList.stream()
  .collect(toUnmodifiableList());

Any attempt to modify the result throws UnsupportedOperationException:

assertThatThrownBy(() -> result.add("foo"))
  .isInstanceOf(UnsupportedOperationException.class);
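
One more behavior worth knowing: the unmodifiable collectors do not accept null elements, and collecting a stream that contains null fails with a NullPointerException. The sketch below (Java 10 or later, with a made-up helper name) demonstrates this using only the JDK.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class UnmodifiableListDemo {

    // Returns true if collecting the given elements fails with a NullPointerException.
    static boolean rejectsNull(List<String> source) {
        try {
            source.stream().collect(Collectors.toUnmodifiableList());
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejectsNull(Arrays.asList("a", null, "bb"))); // true
        System.out.println(rejectsNull(Arrays.asList("a", "bb")));       // false
    }
}
```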

3.4. Collectors.toSet()

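toSet() collects the Stream elements into a Set instance. Key points:

The concrete Set implementation is not guaranteed
Duplicates are removed automatically (each repeated element is kept once)
To control the implementation class, use toCollection() instead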

Set<String> result = givenList.stream()
  .collect(toSet());

Deduplication example:

List<String> listWithDuplicates = Arrays.asList("a", "bb", "c", "d", "bb");
Set<String> result = listWithDuplicates.stream()
  .collect(toSet());
assertThat(result).hasSize(4);

3.5. Collectors.toUnmodifiableSet()

The unmodifiable Set collector, available since Java 10:

Set<String> result = givenList.stream()
  .collect(toUnmodifiableSet());

Any attempt to modify it throws UnsupportedOperationException:

assertThatThrownBy(() -> result.add("foo"))
  .isInstanceOf(UnsupportedOperationException.class);

3.6. Collectors.toMap()

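toMap() collects the Stream elements into a Map and takes two functions:

keyMapper: extracts the Map key from an element
valueMapper: extracts the Map value from an element

Example: the string itself as the key and its length as the value: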

Map<String, Integer> result = givenList.stream()
  .collect(toMap(Function.identity(), String::length));

Pitfall: a duplicate key immediately causes an IllegalStateException, even when the values are identical:

List<String> listWithDuplicates = Arrays.asList("a", "bb", "c", "d", "bb");
assertThatThrownBy(() -> {
    listWithDuplicates.stream().collect(toMap(Function.identity(), String::length));
}).isInstanceOf(IllegalStateException.class);

Solution: use the overload that takes a merge function:

Map<String, Integer> result = listWithDuplicates.stream()
  .collect(toMap(Function.identity(), String::length, (item, identicalItem) -> item));
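
The merge function can also combine the colliding values instead of discarding one of them, and a four-argument overload of toMap() additionally lets us pick the Map implementation. A minimal sketch, with class and method names invented for this example:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class ToMapMergeDemo {

    // Keys are string lengths; strings that share a length are merged with a comma.
    // The fourth argument selects the Map implementation (a sorted TreeMap here).
    static Map<Integer, String> byLength(List<String> source) {
        return source.stream()
          .collect(Collectors.toMap(
            String::length,
            s -> s,
            (first, second) -> first + "," + second,
            TreeMap::new));
    }

    public static void main(String[] args) {
        List<String> givenList = Arrays.asList("a", "bb", "ccc", "dd");
        System.out.println(byLength(givenList)); // {1=a, 2=bb,dd, 3=ccc}
    }
}
```

Here "bb" and "dd" collide on the key 2, so the merge function is called once and joins them.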

3.7. Collectors.toUnmodifiableMap()

The unmodifiable Map collector, available since Java 10:

Map<String, Integer> result = givenList.stream()
  .collect(toUnmodifiableMap(Function.identity(), String::length));

Any attempt to modify it throws UnsupportedOperationException:

assertThatThrownBy(() -> result.put("foo", 3))
  .isInstanceOf(UnsupportedOperationException.class);

3.8. Collectors.collectingAndThen()

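collectingAndThen() is a special collector that lets us run one more operation on the result right after collection completes. Example: collect into a List, then convert it to an immutable collection (ImmutableList comes from Guava):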

List<String> result = givenList.stream()
  .collect(collectingAndThen(toList(), ImmutableList::copyOf));
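
If Guava is not on the classpath, the same pattern works with the JDK alone. The sketch below wraps the collected list with Collections.unmodifiableList() as the finishing step; the class and helper names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

class CollectingAndThenDemo {

    // Collects to a List, then wraps it in an unmodifiable view as the finishing step.
    static List<String> toReadOnlyList(List<String> source) {
        return source.stream()
          .collect(Collectors.collectingAndThen(
            Collectors.toList(),
            Collections::unmodifiableList));
    }

    // Helper for the demo: reports whether add() is rejected by the list.
    static boolean isReadOnly(List<String> list) {
        try {
            list.add("x");
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> result = toReadOnlyList(Arrays.asList("a", "bb", "ccc", "dd"));
        System.out.println(result);             // [a, bb, ccc, dd]
        System.out.println(isReadOnly(result)); // true
    }
}
```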

3.9. Collectors.joining()

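joining() concatenates the elements of a Stream<String>:

No arguments: plain concatenation
One argument: a delimiter
Three arguments: a delimiter, a prefix, and a suffix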

// Plain concatenation
String result = givenList.stream()
  .collect(joining()); // "abbcccdd"

// With a delimiter
String result = givenList.stream()
  .collect(joining(" ")); // "a bb ccc dd"

// With a delimiter, prefix, and suffix
String result = givenList.stream()
  .collect(joining(" ", "PRE-", "-POST")); // "PRE-a bb ccc dd-POST"
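
Because joining() only works on character sequences, a stream of other types has to be mapped to strings first. A short sketch, assuming a list of integers and an invented helper name:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class JoiningDemo {

    // joining() accepts only CharSequence elements,
    // so the numbers are converted to strings before collecting.
    static String joinNumbers(List<Integer> numbers) {
        return numbers.stream()
          .map(String::valueOf)
          .collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        System.out.println(joinNumbers(Arrays.asList(1, 2, 3))); // [1, 2, 3]
    }
}
```

For an empty stream, the three-argument form still emits the prefix and suffix, so the result here would be "[]".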

3.10. Collectors.counting()

counting() simply counts the elements:

Long result = givenList.stream()
  .collect(counting());

3.11. Collectors.summarizingDouble/Long/Int()

Collectors for numeric statistics. They return a dedicated class that holds the summary:

DoubleSummaryStatistics result = givenList.stream()
  .collect(summarizingDouble(String::length));

The result exposes:

Average: result.getAverage()
Count: result.getCount()
Maximum: result.getMax()
Minimum: result.getMin()
Sum: result.getSum()
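
Putting these getters together, the following sketch uses the Int variant on the string lengths of givenList and prints every statistic. The class and method names are assumptions for this example.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.stream.Collectors;

class SummarizingDemo {

    // Gathers count, sum, min, max, and average of the string lengths in one pass.
    static IntSummaryStatistics lengthStats(List<String> source) {
        return source.stream()
          .collect(Collectors.summarizingInt(String::length));
    }

    public static void main(String[] args) {
        List<String> givenList = Arrays.asList("a", "bb", "ccc", "dd");
        IntSummaryStatistics stats = lengthStats(givenList);
        System.out.println(stats.getCount());   // 4
        System.out.println(stats.getSum());     // 8
        System.out.println(stats.getMin());     // 1
        System.out.println(stats.getMax());     // 3
        System.out.println(stats.getAverage()); // 2.0
    }
}
```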

3.12. Collectors.averagingDouble/Long/Int()

Computes the average of the extracted values:

Double result = givenList.stream()
  .collect(averagingDouble(String::length));

3.13. Collectors.summingDouble/Long/Int()

Computes the sum of the extracted values:

Double result = givenList.stream()
  .collect(summingDouble(String::length));

3.14. Collectors.maxBy/minBy()

Returns the largest or smallest element according to a Comparator, wrapped in an Optional:

Optional<String> result = givenList.stream()
  .collect(maxBy(Comparator.naturalOrder()));
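
The Comparator does not have to be the natural order. As a sketch, the example below compares by string length and also shows why the result is an Optional: an empty stream has no maximum. Names are made up for the illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

class MaxByDemo {

    // Longest string according to a length-based Comparator.
    static Optional<String> longest(List<String> source) {
        return source.stream()
          .collect(Collectors.maxBy(Comparator.comparingInt(String::length)));
    }

    // Shortest string according to the same Comparator.
    static Optional<String> shortest(List<String> source) {
        return source.stream()
          .collect(Collectors.minBy(Comparator.comparingInt(String::length)));
    }

    public static void main(String[] args) {
        List<String> givenList = Arrays.asList("a", "bb", "ccc", "dd");
        System.out.println(longest(givenList));                      // Optional[ccc]
        System.out.println(shortest(givenList));                     // Optional[a]
        System.out.println(longest(Collections.emptyList()));        // Optional.empty
    }
}
```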

3.15. Collectors.groupingBy()

groupingBy() groups elements by a property and stores the groups in a Map. Example: group strings by length, storing each group as a Set:

Map<Integer, Set<String>> result = givenList.stream()
  .collect(groupingBy(String::length, toSet()));

Sample result:

{1=["a"], 2=["bb", "dd"], 3=["ccc"]}
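
The downstream collector can be any collector, and a three-argument overload also accepts a Map factory. The sketch below counts the strings per length and asks for a TreeMap so the keys come out sorted; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class GroupingByDemo {

    // Groups by length, counts each group, and stores the groups in a sorted TreeMap.
    static Map<Integer, Long> countByLength(List<String> source) {
        return source.stream()
          .collect(Collectors.groupingBy(
            String::length,
            TreeMap::new,
            Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> givenList = Arrays.asList("a", "bb", "ccc", "dd");
        System.out.println(countByLength(givenList)); // {1=1, 2=2, 3=1}
    }
}
```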

3.16. Collectors.partitioningBy()

partitioningBy() is a special case of groupingBy that groups by a Predicate:

true key: elements that match the predicate
false key: elements that do not match

Map<Boolean, List<String>> result = givenList.stream()
  .collect(partitioningBy(s -> s.length() > 2));

Result:

{false=["a", "bb", "dd"], true=["ccc"]}
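
Like groupingBy(), partitioningBy() accepts a downstream collector. The sketch below counts the elements on each side of the predicate; note that both keys are present in the result even when one partition is empty. The helper name and the threshold parameter are assumptions for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class PartitioningByDemo {

    // Counts how many strings are longer than the given length and how many are not.
    static Map<Boolean, Long> countLongerThan(List<String> source, int length) {
        return source.stream()
          .collect(Collectors.partitioningBy(
            s -> s.length() > length,
            Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> givenList = Arrays.asList("a", "bb", "ccc", "dd");
        System.out.println(countLongerThan(givenList, 2)); // {false=3, true=1}
        System.out.println(countLongerThan(givenList, 5)); // {false=4, true=0}
    }
}
```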

3.17. Collectors.teeing()

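Added in Java 12, teeing() applies two collectors at once and merges their results. Example: computing the minimum and maximum in a single pass: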

List<Integer> numbers = Arrays.asList(42, 4, 2, 24);
String result = numbers.stream().collect(teeing(
  minBy(Integer::compareTo), // first collector
  maxBy(Integer::compareTo), // second collector
  (min, max) -> "Min: " + min.get() + ", Max: " + max.get() // merge function
));
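
The two collectors may also produce results of different types. As a rough sketch (Java 12 or later, names invented here), the example below derives an average from a sum and a count collected in the same pass, guarding against an empty stream.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class TeeingDemo {

    // Collects the sum (Integer) and the count (Long) in one pass,
    // then merges them into an average; an empty input yields 0.0.
    static double average(List<Integer> numbers) {
        return numbers.stream().collect(Collectors.teeing(
          Collectors.summingInt(Integer::intValue),
          Collectors.counting(),
          (sum, count) -> count == 0 ? 0.0 : (double) sum / count));
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(42, 4, 2, 24);
        System.out.println(average(numbers)); // 18.0
    }
}
```

In practice averagingInt() already covers this case; the point of the sketch is only to show how the merge function receives both intermediate results.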

4. Custom Collectors

A custom collector implements the Collector interface, which declares three generic parameters:

public interface Collector<T, A, R> {
    // T: type of the input elements
    // A: type of the mutable accumulator
    // R: type of the final result
}

Example: implementing an ImmutableSet collector (ImmutableSet and Sets come from Guava)

public class ImmutableSetCollector<T>
  implements Collector<T, ImmutableSet.Builder<T>, ImmutableSet<T>> {

    @Override
    public Supplier<ImmutableSet.Builder<T>> supplier() {
        return ImmutableSet::builder;
    }

    @Override
    public BiConsumer<ImmutableSet.Builder<T>, T> accumulator() {
        return ImmutableSet.Builder::add;
    }

    @Override
    public BinaryOperator<ImmutableSet.Builder<T>> combiner() {
        return (left, right) -> left.addAll(right.build());
    }

    @Override
    public Function<ImmutableSet.Builder<T>, ImmutableSet<T>> finisher() {
        return ImmutableSet.Builder::build;
    }

    @Override
    public Set<Characteristics> characteristics() {
        return Sets.immutableEnumSet(Characteristics.UNORDERED);
    }

    public static <T> ImmutableSetCollector<T> toImmutableSet() {
        return new ImmutableSetCollector<>();
    }
}

Usage example:

List<String> givenList = Arrays.asList("a", "bb", "ccc", "dddd");
ImmutableSet<String> result = givenList.stream()
  .collect(toImmutableSet());
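
For simple cases, writing a full class is not required: the static factory Collector.of() assembles a collector from the same building blocks. The sketch below is a JDK-only variant of the idea, assuming an unmodifiable HashSet-backed result is acceptable; toReadOnlySet and the other names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collector;

class ReadOnlySetCollectors {

    // Builds a collector from its parts; each argument mirrors one Collector method.
    static <T> Collector<T, Set<T>, Set<T>> toReadOnlySet() {
        return Collector.of(
          HashSet::new,                   // supplier: creates an empty accumulator
          Set::add,                       // accumulator: adds one element
          (left, right) -> {              // combiner: merges two partial results
              left.addAll(right);
              return left;
          },
          Collections::unmodifiableSet,   // finisher: wraps the accumulator
          Collector.Characteristics.UNORDERED);
    }

    // Convenience method for the demo: distinct elements as a read-only Set.
    static Set<String> distinct(List<String> source) {
        return source.stream().collect(toReadOnlySet());
    }

    // Helper for the demo: reports whether add() is rejected by the set.
    static boolean isReadOnly(Set<String> set) {
        try {
            set.add("x");
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        Set<String> result = distinct(Arrays.asList("a", "bb", "ccc", "bb"));
        System.out.println(result.size());      // 3
        System.out.println(isReadOnly(result)); // true
    }
}
```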

5. Java 9 Improvements

5.1. Collectors.filtering()

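Similar to Stream.filter(), but intended for filtering inside a grouping. The difference:

filter(): filters before grouping, so the filtered-out elements vanish entirely
filtering(): groups first and then filters, so every group key is kept, even when its group ends up empty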

List<Integer> numbers = List.of(1, 2, 3, 5, 5);

// Filter first, then group (keys whose elements were all filtered out are lost)
Map<Integer, Long> result1 = numbers.stream()
  .filter(val -> val > 3)
  .collect(Collectors.groupingBy(i -> i, Collectors.counting()));
assertEquals(1, result1.size());

// Group first, then filter (every key is kept)
Map<Integer, Long> result2 = numbers.stream()
  .collect(Collectors.groupingBy(i -> i,
    Collectors.filtering(val -> val > 3, Collectors.counting())));
assertEquals(4, result2.size());

5.2. Collectors.flatMapping()

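Similar to mapping(), but finer-grained: the mapper returns a Stream for each element, and the elements of those streams are passed straight to the downstream collector, which avoids intermediate collections. Example model: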

class Blog {
    private String authorName;
    private List<String> comments;
      
    // constructor and getters
}

Comparing mapping() and flatMapping():

Blog blog1 = new Blog("1", "Nice", "Very Nice");
Blog blog2 = new Blog("2", "Disappointing", "Ok", "Could be better");
List<Blog> blogs = List.of(blog1, blog2);

// mapping produces a nested structure
Map<String, List<List<String>>> result1 = blogs.stream()
  .collect(Collectors.groupingBy(Blog::getAuthorName,
    Collectors.mapping(Blog::getComments, Collectors.toList())));
assertEquals(2, result1.get("1").get(0).size()); // nested List

// flatMapping flattens the structure
Map<String, List<String>> result2 = blogs.stream()
  .collect(Collectors.groupingBy(Blog::getAuthorName,
    Collectors.flatMapping(blog -> blog.getComments().stream(), 
    Collectors.toList())));
assertEquals(2, result2.get("1").size()); // flat List
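
The Blog model above leaves out its constructor and getters. For a version that runs as-is, the sketch below fills them in under one assumption: the constructor takes the author name followed by any number of comments. It also gives one author two blogs, to show that flatMapping() merges their comments into a single flat list (Java 9 or later).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class FlatMappingDemo {

    // Assumed shape of the Blog model: author name plus a varargs list of comments.
    static final class Blog {
        private final String authorName;
        private final List<String> comments;

        Blog(String authorName, String... comments) {
            this.authorName = authorName;
            this.comments = Arrays.asList(comments);
        }

        String getAuthorName() { return authorName; }
        List<String> getComments() { return comments; }
    }

    // Groups by author and flattens each blog's comments into one list per author.
    static Map<String, List<String>> commentsByAuthor(List<Blog> blogs) {
        return blogs.stream()
          .collect(Collectors.groupingBy(Blog::getAuthorName,
            Collectors.flatMapping(blog -> blog.getComments().stream(),
              Collectors.toList())));
    }

    public static void main(String[] args) {
        List<Blog> blogs = Arrays.asList(
          new Blog("1", "Nice", "Very Nice"),
          new Blog("1", "Ok"),
          new Blog("2", "Disappointing"));
        Map<String, List<String>> result = commentsByAuthor(blogs);
        System.out.println(result.get("1")); // [Nice, Very Nice, Ok]
        System.out.println(result.get("2")); // [Disappointing]
    }
}
```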

6. Conclusion

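This article covered the core features of Java 8 Collectors, from the predefined collectors to a custom implementation, along with the notable additions in Java 9, 10, and 12. Try these techniques in a real project, and explore the parallel-processing enhancement project. More technical articles are available on my blog.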
