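Replacing a Specific Word in a File Using Java

1. Overview

This article shows how to replace a specific word in a file using the Java standard library and the Apache Commons IO library. We compare several approaches to help you choose the one that best fits your use case.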

2. Sample Data

All examples use the same sample data.

First, create a test file named data.txt with the following content:

This is a sample file.
This is a sample file.
This is a sample file.

The key constants are defined as follows:

private static final String FILE_PATH = "src/test/resources/data.txt";
private static final String FILE_OUTPUT_PATH = "src/test/resources/data_output.txt";
private static final String OUTPUT_TO_VERIFY 
  = "This is a test file."+System.lineSeparator()+"This is a test file."+System.lineSeparator()+"This is a test file.";
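
The sample file can also be generated in code instead of by hand. The following is a minimal sketch under the assumption that the file should live at the path used in this article; the class and method names (SampleDataSetup, sampleLines, createSampleFile) are hypothetical and not part of the original examples.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of the original article's code.
public class SampleDataSetup {

    // Builds the three identical lines used throughout the examples.
    static List<String> sampleLines() {
        return Collections.nCopies(3, "This is a sample file.");
    }

    // Writes the sample lines to the given path, creating parent
    // directories first if they do not exist yet.
    static Path createSampleFile(Path target) throws IOException {
        Path parent = target.getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        // Files.write appends a line separator after every line,
        // including the last one.
        return Files.write(target, sampleLines(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path created = createSampleFile(Paths.get("src/test/resources/data.txt"));
        System.out.println("Created " + created.toAbsolutePath());
    }
}
```

In a test class, a call such as createSampleFile(Paths.get(FILE_PATH)) could run in a setup method so that every test starts from the same input.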

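3. Using BufferedReader

BufferedReader lets us read a file line by line. The processing steps are:

Read the file line by line
Concatenate the lines into the full text with a StringBuilder
Call String.replace() to replace the target word
Write the result to a new file with FileWriter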

@Test
void givenFile_whenUsingBufferedReader_thenReplacedWordCorrect() throws IOException {
    StringBuilder fileContent = new StringBuilder();
    try (BufferedReader br = Files.newBufferedReader(Paths.get(FILE_PATH))) {
        String line;
        while ((line = br.readLine()) != null) {
            fileContent.append(line).append(System.lineSeparator());
        }
        String replacedContent = fileContent.toString().replace("sample", "test").trim();
        try (FileWriter fw = new FileWriter(FILE_OUTPUT_PATH)) {
            fw.write(replacedContent);
        }

        assertEquals(OUTPUT_TO_VERIFY, replacedContent);
    }
}

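✅ Pros: simple to implement; suitable for small and medium-sized files
⚠️ Cons: the whole file is held in memory, so memory usage is high for large files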
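
Note that String.replace() matches substrings, so a word such as "samples" would also be changed. If only standalone words should be replaced, a regular expression with word boundaries is one option. The sketch below illustrates the idea; the class WholeWordReplacer and the method replaceWholeWord are hypothetical names introduced here, and the approach assumes the target word starts and ends with a letter, digit, or underscore.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original article's code.
public class WholeWordReplacer {

    // Replaces only standalone occurrences of `word`.
    // Pattern.quote and Matcher.quoteReplacement keep any special
    // characters in the arguments literal.
    static String replaceWholeWord(String text, String word, String replacement) {
        Pattern pattern = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
        return pattern.matcher(text).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String line = "This is a sample file with samples.";
        // Substring replacement also changes "samples".
        System.out.println(line.replace("sample", "test"));
        // Word-boundary replacement leaves "samples" untouched.
        System.out.println(replaceWholeWord(line, "sample", "test"));
    }
}
```

The same method could stand in for the replace("sample", "test") call in any of the examples in this article.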

4. Using Scanner

The Scanner class can do the same job; the core logic is the same as in the BufferedReader approach:

@Test
void givenFile_whenUsingScanner_thenReplacedWordCorrect() throws IOException {
    StringBuilder fileContent = new StringBuilder();
    try (Scanner scanner = new Scanner(new File(FILE_PATH))) {
        while (scanner.hasNextLine()) {
            fileContent.append(scanner.nextLine()).append(System.lineSeparator());
        }
        String replacedContent = fileContent.toString().replace("sample", "test").trim();
        try (FileWriter fw = new FileWriter(FILE_OUTPUT_PATH)) {
            fw.write(replacedContent);
        }

        assertEquals(OUTPUT_TO_VERIFY, replacedContent);
    }
}

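❌ Performance note: on large files Scanner is somewhat slower than BufferedReader; it is better suited to parsing structured input

5. Using the NIO2 Files API

The Files.lines() method from Java NIO2 exposes the file as a stream of lines. Combined with a lambda expression, it gives a concise line-by-line replacement: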

@Test
void givenFile_whenUsingFilesAPI_thenReplacedWordCorrect() throws IOException {
    try (Stream<String> lines = Files.lines(Paths.get(FILE_PATH))) {
        List<String> list = lines.map(line -> line.replace("sample", "test"))
          .collect(Collectors.toList());
        Files.write(Paths.get(FILE_OUTPUT_PATH), list, StandardCharsets.UTF_8);

        assertEquals(OUTPUT_TO_VERIFY, String.join(System.lineSeparator(), list));
    }
}

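✅ Pros: modern API design, concise code
⚠️ Note: Files.write() overwrites the target file if it already exists, so take care with the output path and file permissions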
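
One way to guard against accidental overwrites is to pass StandardOpenOption.CREATE_NEW, which makes the write fail if the file already exists. The following sketch shows this under the assumption that skipping the write is acceptable when the output is already present; SafeWriteExample and writeIfAbsent are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical helper, not part of the original article's code.
public class SafeWriteExample {

    // Writes the lines only if the target does not exist yet.
    // Returns false instead of overwriting an existing file.
    static boolean writeIfAbsent(Path target, List<String> lines) throws IOException {
        try {
            Files.write(target, lines, StandardCharsets.UTF_8,
              StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempDirectory("replace-demo").resolve("data_output.txt");
        List<String> lines = List.of("This is a test file.");
        System.out.println(writeIfAbsent(target, lines)); // file is created
        System.out.println(writeIfAbsent(target, lines)); // file already exists
    }
}
```

Whether to skip, fail, or overwrite is a design decision that depends on the application; the default overwrite behavior is convenient for repeatable tests like the ones in this article.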

6. Using Apache Commons IO

The FileUtils class from the third-party Apache Commons IO library simplifies the file operations:

@Test
void givenFile_whenUsingFileUtils_thenReplacedWordCorrect() throws IOException {
    StringBuilder fileContent = new StringBuilder();
    List<String> lines = FileUtils.readLines(new File(FILE_PATH), "UTF-8");
    lines.forEach(line -> fileContent.append(line).append(System.lineSeparator()));
    String replacedContent = fileContent.toString().replace("sample", "test").trim();
    try (FileWriter fw = new FileWriter(FILE_OUTPUT_PATH)) {
        fw.write(replacedContent);
    }

    assertEquals(OUTPUT_TO_VERIFY, replacedContent);
}

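❌ Memory pitfall: this approach holds both the original content and the modified content in memory at the same time
⚠️ Applicability: recommended only for small files; large files risk an OutOfMemoryError

7. A Memory-Efficient Approach

For large files, we process the content as a stream so that it never accumulates in memory. Note that Files.writeString() and Files.readString() require Java 11 or later: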

@Test
void givenLargeFile_whenUsingFilesAPI_thenReplacedWordCorrect() throws IOException {
    try (Stream<String> lines = Files.lines(Paths.get(FILE_PATH))) {
        // Create the output file, or empty it if it already exists.
        Files.writeString(Paths.get(FILE_OUTPUT_PATH), "",
          StandardCharsets.UTF_8, StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING);
        lines.forEach(line -> {
            line = line.replace("sample", "test") + System.lineSeparator();
            try {
                // Append the replaced line; the file is reopened on every call.
                Files.writeString(Paths.get(FILE_OUTPUT_PATH), line, StandardCharsets.UTF_8,
                  StandardOpenOption.APPEND);
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        });

        assertEquals(OUTPUT_TO_VERIFY, Files.readString(Paths.get(FILE_OUTPUT_PATH)).trim());
    }
}

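✅ Key advantage: memory usage stays roughly constant regardless of file size, which makes it suitable for gigabyte-scale files
⚠️ Performance trade-off: the frequent I/O operations add overhead and increase CPU load
🔧 Tip: the output file ends with an extra line separator, so call trim() on the content when reading it back for comparison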
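
A possible variation keeps the line-by-line processing but opens the output file only once, by pairing a BufferedReader with a BufferedWriter. This is a sketch rather than the article's own method; StreamingReplacer and replaceInFile are hypothetical names, and the code assumes that source and target are different files, since opening the target truncates it.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, not part of the original article's code.
public class StreamingReplacer {

    // Copies source to target line by line, replacing `word` on each line.
    // Only one line is held in memory at a time, and the output file is
    // opened once. Source and target must be different files.
    static void replaceInFile(Path source, Path target, String word, String replacement)
            throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(source, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(line.replace(word, replacement));
                writer.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Temporary files keep the demo independent of the project layout.
        Path source = Files.createTempFile("data", ".txt");
        Path target = Files.createTempFile("data_output", ".txt");
        Files.write(source, List.of("This is a sample file.", "This is a sample file."));

        replaceInFile(source, target, "sample", "test");
        Files.readAllLines(target).forEach(System.out::println);
    }
}
```

As in the example above, the output ends with a trailing line separator, because newLine() is called after every line.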

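8. Summary

The approaches compare as follows:

Approach	Memory usage	CPU load	Best suited for
BufferedReader	High	Low	Simple processing of small and medium-sized files
Scanner	High	Medium	Scenarios that need format parsing
NIO2 Files API	Medium	Medium	First choice for modern Java projects
Apache Commons IO	Very high	Low	Small files in projects that already include the dependency
Memory-efficient approach	Very low	High	Essential for large files

Recommendations:

For general use, prefer the NIO2 Files API: the code is concise and the performance is balanced
For very large files, choose the memory-efficient approach, but monitor CPU usage
Projects that already use Apache Commons can keep using FileUtils, but should watch out for memory problems

All example code is available in the GitHub repository. We recommend running the test cases yourself to get a better feel for each approach.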
