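1. Overview

The Apache Commons CSV library offers a rich set of features for creating and reading CSV files. This article walks through a simple example to show how to use it effectively.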

2. Maven Dependency

First, add the latest version of the library as a Maven dependency:

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-csv</artifactId>
    <version>1.10.0</version>
</dependency>

✅ The latest version can be looked up in the Maven repository.

3. Reading a CSV File

Suppose a book.csv file contains the following book information:

author,title
Dan Simmons,Hyperion
Douglas Adams,The Hitchhiker's Guide to the Galaxy

The following test reads it:

Map<String, String> AUTHOR_BOOK_MAP = new HashMap<>() {
    {
        put("Dan Simmons", "Hyperion");
        put("Douglas Adams", "The Hitchhiker's Guide to the Galaxy");
    }
};
String[] HEADERS = { "author", "title" };
@Test
void givenCSVFile_whenRead_thenContentsAsExpected() throws IOException {
    Reader in = new FileReader("src/test/resources/book.csv");

    CSVFormat csvFormat = CSVFormat.DEFAULT.builder()
        .setHeader(HEADERS)
        .setSkipHeaderRecord(true)
        .build();

    Iterable<CSVRecord> records = csvFormat.parse(in);

    for (CSVRecord record : records) {
        String author = record.get("author");
        String title = record.get("title");
        assertEquals(AUTHOR_BOOK_MAP.get(author), title);
    }
}

Key points:

  • The header line is skipped with setSkipHeaderRecord(true)
  • The file format is defined through CSVFormat
  • Later sections show more format configuration options
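
As a hedged illustration of how CSVFormat describes a file's layout, the sketch below parses semicolon-separated text instead of the default comma-separated form. The class name SemicolonCsvSketch, the method readBooks, and the sample input are assumptions made for this example; the builder calls setDelimiter and setTrim are part of the CSVFormat.Builder API in version 1.10.0.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

// Hypothetical example class, not part of the original article.
public class SemicolonCsvSketch {

    // Parses semicolon-separated text whose first line is "author;title".
    static Map<String, String> readBooks(String csvText) throws IOException {
        CSVFormat format = CSVFormat.DEFAULT.builder()
            .setDelimiter(';')               // fields are separated by ';' instead of ','
            .setHeader("author", "title")    // column names used by record.get(name)
            .setSkipHeaderRecord(true)       // do not return the header line as a record
            .setTrim(true)                   // strip whitespace around each value
            .build();

        Map<String, String> books = new LinkedHashMap<>();
        try (Reader in = new StringReader(csvText);
             CSVParser parser = format.parse(in)) {
            for (CSVRecord record : parser) {
                books.put(record.get("author"), record.get("title"));
            }
        }
        return books;
    }

    public static void main(String[] args) throws IOException {
        String csv = "author;title\n"
            + "Dan Simmons; Hyperion\n"
            + "Douglas Adams; The Hitchhiker's Guide to the Galaxy\n";
        readBooks(csv).forEach((author, title) -> System.out.println(author + " -> " + title));
    }
}
```

Only the format object changes between layouts; the loop that reads the records stays the same.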

4. Creating a CSV File

The following test generates a CSV file with the same structure:

@Test
void givenAuthorBookMap_whenWrittenToStream_thenOutputStreamAsExpected() throws IOException {
    StringWriter sw = new StringWriter();

    CSVFormat csvFormat = CSVFormat.DEFAULT.builder()
        .setHeader(HEADERS)
        .build();
    
    try (final CSVPrinter printer = new CSVPrinter(sw, csvFormat)) {
        AUTHOR_BOOK_MAP.forEach((author, title) -> {
            try {
                printer.printRecord(author, title);
            } catch (IOException e) {
                e.printStackTrace();
            }
        });
    }
    assertEquals(EXPECTED_FILESTREAM, sw.toString().trim());
}

⚠️ Note: try-with-resources ensures that the CSVPrinter is closed properly.
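
A minimal variant of the writing step, under two assumptions that differ from the test above: the input is a LinkedHashMap, so records come out in insertion order (a plain HashMap does not guarantee iteration order, which matters when the output is compared against a fixed string), and a regular for loop replaces forEach so that an IOException propagates instead of being printed and ignored. The class name WriteBooksSketch and the method writeBooks are hypothetical.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVPrinter;

// Hypothetical example class, not part of the original article.
public class WriteBooksSketch {

    // Writes one record per map entry, preceded by the "author,title" header.
    static String writeBooks(Map<String, String> books) throws IOException {
        StringWriter sw = new StringWriter();
        CSVFormat format = CSVFormat.DEFAULT.builder()
            .setHeader("author", "title")
            .build();

        // The printer is closed (and flushed) when the try block exits.
        try (CSVPrinter printer = new CSVPrinter(sw, format)) {
            for (Map.Entry<String, String> entry : books.entrySet()) {
                printer.printRecord(entry.getKey(), entry.getValue());
            }
        }
        return sw.toString();
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> books = new LinkedHashMap<>();
        books.put("Dan Simmons", "Hyperion");
        books.put("Douglas Adams", "The Hitchhiker's Guide to the Galaxy");
        System.out.print(writeBooks(books));
    }
}
```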

5. Headers and Reading Columns

5.1. Accessing Columns by Index

This is the most basic approach, and it suits files that have no header:

Reader in = new FileReader("book.csv");
Iterable<CSVRecord> records = csvFormat.parse(in);
for (CSVRecord record : records) {
    String columnOne = record.get(0);
    String columnTwo = record.get(1);
}
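
The fragment above leaves csvFormat undefined. A self-contained sketch, assuming CSVFormat.DEFAULT with no header configured and an in-memory string in place of book.csv, could look like this (IndexAccessSketch and firstColumn are hypothetical names):

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

// Hypothetical example class, not part of the original article.
public class IndexAccessSketch {

    // Collects the first column of every record; no header is configured,
    // so the first line of the input is treated as ordinary data.
    static List<String> firstColumn(String csvText) throws IOException {
        List<String> values = new ArrayList<>();
        try (CSVParser parser = CSVFormat.DEFAULT.parse(new StringReader(csvText))) {
            for (CSVRecord record : parser) {
                if (record.size() > 0) {          // guard against records with no fields
                    values.add(record.get(0));    // columns are zero-based
                }
            }
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        String csv = "Dan Simmons,Hyperion\n"
            + "Douglas Adams,The Hitchhiker's Guide to the Galaxy\n";
        System.out.println(firstColumn(csv));
    }
}
```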

5.2. Accessing Columns by Predefined Header

Accessing columns by name is more readable:

Iterable<CSVRecord> records = csvFormat.parse(in);
for (CSVRecord record : records) {
    String author = record.get("author");
    String title = record.get("title");
}

5.3. Using an Enum as the Header

An enum avoids hard-coded strings and makes the code more robust:

enum BookHeaders {
    author, title
}

CSVFormat csvFormat = CSVFormat.DEFAULT.builder()
    .setHeader(BookHeaders.class)
    .setSkipHeaderRecord(true)
    .build();

Iterable<CSVRecord> records = csvFormat.parse(in);

for (CSVRecord record : records) {
    String author = record.get(BookHeaders.author);
    String title = record.get(BookHeaders.title);
    assertEquals(AUTHOR_BOOK_MAP.get(author), title);
}

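5.4. Skipping the Header Line

CSV files usually carry the header in the first line. Calling setHeader() with no arguments reads the column names from that line, and setSkipHeaderRecord(true) keeps it out of the returned records:

CSVFormat csvFormat = CSVFormat.DEFAULT.builder()
    .setHeader()
    .setSkipHeaderRecord(true)
    .build();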

Iterable<CSVRecord> records = csvFormat.parse(in);

for (CSVRecord record : records) {
    String author = record.get("author");
    String title = record.get("title");
}
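
To make the header detection visible, the following sketch reads the column names from the first line and then accesses records by those names. It assumes an in-memory string as input; AutoHeaderSketch and its two methods are hypothetical names, while getHeaderNames() is a CSVParser method available in version 1.10.0.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

// Hypothetical example class, not part of the original article.
public class AutoHeaderSketch {

    // An empty setHeader() call tells the parser to take the column names
    // from the first record of the input.
    private static final CSVFormat FORMAT = CSVFormat.DEFAULT.builder()
        .setHeader()
        .setSkipHeaderRecord(true)
        .build();

    // Returns the column names found in the first line.
    static List<String> headerNames(String csvText) throws IOException {
        try (CSVParser parser = FORMAT.parse(new StringReader(csvText))) {
            return parser.getHeaderNames();
        }
    }

    // Returns the "title" column of every data record.
    static List<String> titles(String csvText) throws IOException {
        List<String> titles = new ArrayList<>();
        try (CSVParser parser = FORMAT.parse(new StringReader(csvText))) {
            for (CSVRecord record : parser) {
                titles.add(record.get("title"));
            }
        }
        return titles;
    }

    public static void main(String[] args) throws IOException {
        String csv = "author,title\nDan Simmons,Hyperion\n";
        System.out.println(headerNames(csv));
        System.out.println(titles(csv));
    }
}
```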

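5.5. Creating a File with a Header

When the format defines a header, the printer writes it automatically as the first line of the file: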

FileWriter out = new FileWriter("book_new.csv");
CSVPrinter printer = csvFormat.print(out);
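
The two lines above omit the format definition and never close the printer. A fuller sketch, assuming a header of author and title and a temporary file in place of book_new.csv (HeaderFileSketch and writeBooks are hypothetical names):

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVPrinter;

// Hypothetical example class, not part of the original article.
public class HeaderFileSketch {

    // Writes a header line followed by two records to the given file.
    static void writeBooks(Path target) throws IOException {
        CSVFormat format = CSVFormat.DEFAULT.builder()
            .setHeader("author", "title")
            .build();

        // format.print(out) creates the printer and writes the header immediately.
        try (Writer out = Files.newBufferedWriter(target, StandardCharsets.UTF_8);
             CSVPrinter printer = format.print(out)) {
            printer.printRecord("Dan Simmons", "Hyperion");
            printer.printRecord("Douglas Adams", "The Hitchhiker's Guide to the Galaxy");
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("book_new", ".csv");
        writeBooks(tmp);
        Files.readAllLines(tmp, StandardCharsets.UTF_8).forEach(System.out::println);
    }
}
```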

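6. Conclusion

This article demonstrated the core features of Apache Commons CSV through examples.

💡 Pitfall: watch memory consumption when processing large files. Prefer streaming through the records over loading them all at once.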
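
One way to follow this advice, sketched under the assumption that the input has an author column in its header line: iterate over the CSVParser, which parses records as they are requested, instead of calling getRecords(), which collects every remaining record into a list. The class name StreamingCountSketch and the method countByAuthor are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

// Hypothetical example class, not part of the original article.
public class StreamingCountSketch {

    // Counts the records written by the given author without keeping
    // previously read records in memory.
    static long countByAuthor(Reader in, String author) throws IOException {
        CSVFormat format = CSVFormat.DEFAULT.builder()
            .setHeader()                  // take column names from the first line
            .setSkipHeaderRecord(true)
            .build();

        long count = 0;
        try (CSVParser parser = format.parse(in)) {
            // Each iteration reads one more record from the underlying reader.
            for (CSVRecord record : parser) {
                if (author.equals(record.get("author"))) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        String csv = "author,title\n"
            + "Dan Simmons,Hyperion\n"
            + "Dan Simmons,The Fall of Hyperion\n"
            + "Douglas Adams,The Hitchhiker's Guide to the Galaxy\n";
        System.out.println(countByAuthor(new StringReader(csv), "Dan Simmons"));
    }
}
```

For a real file, the StringReader would be replaced by a buffered file reader; the loop itself does not change.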


Original title: Introduction to Apache Commons CSV