Java HashSet 完全指南 | Baeldung中文网

1. 概述

本指南将深入探讨 Java 集合框架中的核心实现之一 —— HashSet。作为最常用的 Set 接口实现，它提供了高效的去重存储方案，是每个 Java 开发者必须掌握的基础数据结构。

2. HashSet 简介

HashSet 是 Java 集合框架的基础组件，其核心特点如下：

✅ 存储唯一元素：自动去重，允许存储一个 null 值
✅ 基于 HashMap 实现：底层使用 HashMap 存储数据
❌ 不保证插入顺序：元素遍历顺序与插入顺序无关
⚠️ 非线程安全：多线程环境下需手动同步

当创建 HashSet 实例时，会自动初始化内部 HashMap：

public HashSet() {
    map = new HashMap<>();
}

要深入理解 HashMap 的工作原理，建议阅读 HashMap 详解。

3. HashSet 的核心 API

3.1. add() 方法

功能：向集合添加元素，仅当元素不存在时才添加成功
返回值：成功添加返回 true，元素已存在返回 false

@Test
public void whenAddingElement_shouldAddElement() {
    Set<String> hashset = new HashSet<>();
 
    assertTrue(hashset.add("String Added"));
}

实现原理：直接调用 HashMap 的 put() 方法，元素作为 key，固定对象 PRESENT 作为 value：

public boolean add(E e) {
    return map.put(e, PRESENT) == null;
}

其中 map 是内部 HashMap 实例：

private transient HashMap<E, Object> map;

⚠️ 关键点：HashSet 的性能严重依赖 hashCode() 和 equals() 的正确实现。建议先熟悉 hashCode 原理。

3.2. contains() 方法

功能：检查集合是否包含指定元素
返回值：存在返回 true，否则返回 false

@Test
public void whenCheckingForElement_shouldSearchForElement() {
    Set<String> hashsetContains = new HashSet<>();
    hashsetContains.add("String Added");
 
    assertTrue(hashsetContains.contains("String Added"));
}

查找流程：

计算元素的哈希值
定位到对应的桶（bucket）
遍历桶中的链表（或红黑树）进行精确匹配

3.3. remove() 方法

功能：从集合中移除指定元素
返回值：成功移除返回 true，元素不存在返回 false

@Test
public void whenRemovingElement_shouldRemoveElement() {
    Set<String> removeFromHashSet = new HashSet<>();
    removeFromHashSet.add("String Added");
 
    assertTrue(removeFromHashSet.remove("String Added"));
}

3.4. clear() 方法

功能：清空集合中的所有元素
实现：直接调用底层 HashMap 的 clear() 方法

@Test
public void whenClearingHashSet_shouldClearHashSet() {
    Set<String> clearHashSet = new HashSet<>();
    clearHashSet.add("String Added");
    clearHashSet.clear();
    
    assertTrue(clearHashSet.isEmpty());
}

3.5. size() 方法

功能：返回集合中的元素数量
实现：委托给 HashMap 的 size() 方法

@Test
public void whenCheckingTheSizeOfHashSet_shouldReturnThesize() {
    Set<String> hashSetSize = new HashSet<>();
    hashSetSize.add("String Added");
    
    assertEquals(1, hashSetSize.size());
}

3.6. isEmpty() 方法

功能：检查集合是否为空
返回值：空集合返回 true，否则返回 false

@Test
public void whenCheckingForEmptyHashSet_shouldCheckForEmpty() {
    Set<String> emptyHashSet = new HashSet<>();
    
    assertTrue(emptyHashSet.isEmpty());
}

3.7. iterator() 方法

功能：返回集合元素的迭代器
特点：

遍历顺序不确定（无序）
迭代器是快速失败（fail-fast）的

@Test
public void whenIteratingHashSet_shouldIterateHashSet() {
    Set<String> hashset = new HashSet<>();
    hashset.add("First");
    hashset.add("Second");
    hashset.add("Third");
    Iterator<String> itr = hashset.iterator();
    while(itr.hasNext()){
        System.out.println(itr.next());
    }
}

⚠️ 快速失败机制：迭代过程中若通过非迭代器方法修改集合，会抛出 ConcurrentModificationException：

@Test(expected = ConcurrentModificationException.class)
public void whenModifyingHashSetWhileIterating_shouldThrowException() {
 
    Set<String> hashset = new HashSet<>();
    hashset.add("First");
    hashset.add("Second");
    hashset.add("Third");
    Iterator<String> itr = hashset.iterator();
    while (itr.hasNext()) {
        itr.next();
        hashset.remove("Second"); // 触发异常
    }
}

正确移除方式：使用迭代器的 remove() 方法：

@Test
public void whenRemovingElementUsingIterator_shouldRemoveElement() {
 
    Set<String> hashset = new HashSet<>();
    hashset.add("First");
    hashset.add("Second");
    hashset.add("Third");
    Iterator<String> itr = hashset.iterator();
    while (itr.hasNext()) {
        String element = itr.next();
        if (element.equals("Second"))
            itr.remove(); // 安全移除
    }
 
    assertEquals(2, hashset.size());
}

💡 注意：快速失败机制是尽力而为的，不保证在所有并发场景下都能触发异常。

4. 如何将 HashSet 转换为 TreeSet（对象集合）

通过 TreeSet(Collection<? extends E> c) 构造器可将 HashSet 转换为 TreeSet。转换要求集合元素必须实现 Comparable 接口，否则会抛出 ClassCastException。

示例对象类（需实现 Comparable）：

public class Employee implements Comparable<Employee> {
    private int employeeId;
    private String employeeName;

    Employee(int employeeId, String employeeName) {
        this.employeeId = employeeId;
        this.employeeName = employeeName;
    }

    int getEmployeeId() {
        return employeeId;
    }

    public String getEmployeeName() {
        return employeeName;
    }

    @Override
    public String toString() {
        return employeeId + " " + employeeName;
    }

    @Override
    public int compareTo(Employee o) {
        return Integer.compare(o.employeeId, this.employeeId); // 降序排列
    }
}

成功转换场景（元素实现 Comparable）：

@Test
public void givenComparableObject_whenConvertingToTreeSet_thenNoExceptionThrown() {

    HashSet<Employee> hashSet = new HashSet<>();
        
    hashSet.add(new Employee(3, "John"));
    hashSet.add(new Employee(5, "Mike"));
    hashSet.add(new Employee(2, "Bob"));
    hashSet.add(new Employee(1, "Tom"));
    hashSet.add(new Employee(4, "Johnny"));  
        
    assertDoesNotThrow(()->{
        TreeSet<Employee> treeSet = new TreeSet<>(hashSet);
    });
}

失败转换场景（元素未实现 Comparable）：

@Test
public void givenNonComparableObject_whenConvertingToTreeSet_thenExceptionThrown() {

    HashSet<Employee> hashSet = new HashSet<>();
        
    hashSet.add(new Employee(3, "John"));
    hashSet.add(new Employee(5, "Mike"));
    hashSet.add(new Employee(2, "Bob"));
    hashSet.add(new Employee(1, "Tom"));
    hashSet.add(new Employee(4, "Johnny"));  
        
    assertThrows(ClassCastException.class,() -> { 
        TreeSet<Employee> treeSet = new TreeSet<>(hashSet); 
    });
}

5. HashSet 如何保证唯一性？

HashSet 通过两步检查保证元素唯一性：

哈希值检查：使用 hashCode() 定位桶位置
对象相等检查：在桶内使用 equals() 精确匹配

⚠️ 关键点：两个对象即使 hashCode() 相同，equals() 也可能返回 false。因此必须同时满足：

hashCode() 值相同

equals() 返回 true

6. HashSet 的性能

HashSet 性能主要受两个参数影响：

6.1. 初始容量（Initial Capacity）

默认值：16
影响：过小会导致频繁扩容（rehash），过大则浪费内存

6.2. 加载因子（Load Factor）

默认值：0.75
含义：当元素数量达到容量×加载因子时触发扩容

自定义构造方式：

Set<String> hashset = new HashSet<>();              // 默认：容量16，加载因子0.75
Set<String> hashset = new HashSet<>(20);            // 自定义容量
Set<String> hashset = new HashSet<>(20, 0.5f);      // 自定义容量和加载因子

6.3. 性能权衡

场景	推荐配置	原因
大数据量 + 少迭代	高初始容量	减少扩容次数
小数据量 + 频繁迭代	低初始容量	节省内存，提高遍历效率

⚠️ 时间复杂度：

平均情况：O(1)（理想哈希分布）

最坏情况：O(log n)（JDK 8+，红黑树优化）

极端情况：O(n)（所有元素哈希冲突）

7. 总结

HashSet 是 Java 集合框架中高效去重的利器，其核心优势在于：

✅ 常数级时间复杂度：理想情况下增删查均为 O(1)
✅ 自动去重：基于 hashCode() 和 equals() 实现唯一性
✅ 灵活配置：可自定义容量和加载因子优化性能

最佳实践建议：

确保存储对象正确实现 hashCode() 和 equals()
根据数据规模和访问模式调整初始容量
多线程环境使用 Collections.synchronizedSet() 或 ConcurrentHashMap.newKeySet()

掌握 HashSet 的工作原理和性能特性，能帮助我们在实际开发中写出更高效、更健壮的代码。

Persistence

REST

Security