Unescaping HTML Characters in Java | Baeldung中文网

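1. Overview

Server-side applications sometimes need to parse HTML characters, and that is where escaping and unescaping come in. This article covers several ways to unescape HTML characters in Java and looks at when each library is a good fit. ⚠️ Note: a single HTML symbol can occupy several characters in a Java string (for example, < is written as &lt;), so handling it by hand is easy to get wrong.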
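To make the warning above concrete, here is a minimal sketch using only the JDK. The class name EntityLengthDemo and the helper containsRawAngleBracket are hypothetical names chosen for illustration. It shows that an escaped string is longer than the text it stands for, and that character-level checks see the entity text rather than the symbol.

```java
class EntityLengthDemo {

    // Hypothetical helper: true if the string contains a literal '<' or '>'.
    static boolean containsRawAngleBracket(String s) {
        return s.indexOf('<') >= 0 || s.indexOf('>') >= 0;
    }

    public static void main(String[] args) {
        String escaped = "&lt;p&gt;"; // entity-encoded form of "<p>"
        String unescaped = "<p>";

        // Each entity takes four chars in the Java string but stands for one symbol.
        System.out.println(escaped.length());   // 9
        System.out.println(unescaped.length()); // 3

        // Until the text is unescaped, searching for '<' or '>' finds nothing.
        System.out.println(containsRawAngleBracket(escaped));   // false
        System.out.println(containsRawAngleBracket(unescaped)); // true
    }
}
```

Any logic that counts characters or searches for markup therefore has to run after unescaping, not before.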

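2. Ways to Unescape HTML Characters

HTML symbols need careful handling, because characters such as < and > are escaped as &lt; and &gt; in HTML. Unescaping can be implemented by hand, but that tends to be inefficient and error-prone. A mature library is the better choice; the main options are: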
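Before turning to the libraries, the following sketch illustrates one way a hand-written unescaper goes wrong. It is not taken from any of the libraries discussed here; the class ManualUnescapeSketch and its methods are hypothetical, and only four named entities are covered. The naive version chains String.replace() calls and decodes &amp;amp; first, so text that was escaped twice gets unescaped twice. The single-pass version scans the input once, which avoids that bug but still knows only a tiny entity table.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ManualUnescapeSketch {

    // Naive approach: chained replacements. Decoding "&amp;" first means the
    // '&' it produces can be combined with the following text and decoded again.
    static String naiveUnescape(String input) {
        return input.replace("&amp;", "&")
                    .replace("&lt;", "<")
                    .replace("&gt;", ">")
                    .replace("&quot;", "\"");
    }

    // Deliberately tiny entity table (an assumption of this sketch);
    // real libraries ship the full HTML entity set.
    private static final Map<String, String> NAMED =
            Map.of("amp", "&", "lt", "<", "gt", ">", "quot", "\"");

    // Matches hex references (&#x42;), decimal references (&#65;) and named entities (&lt;).
    private static final Pattern ENTITY = Pattern.compile(
            "&(#[xX][0-9a-fA-F]{1,6}|#[0-9]{1,7}|[a-zA-Z][a-zA-Z0-9]*);");

    // Single pass: every entity is decoded exactly once, left to right.
    static String singlePassUnescape(String input) {
        Matcher m = ENTITY.matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String decoded = decode(m.group(1), m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(decoded));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Returns the decoded text, or the original entity if it is unknown or invalid.
    private static String decode(String body, String original) {
        if (body.charAt(0) != '#') {
            return NAMED.getOrDefault(body, original);
        }
        boolean hex = body.charAt(1) == 'x' || body.charAt(1) == 'X';
        int codePoint = hex ? Integer.parseInt(body.substring(2), 16)
                            : Integer.parseInt(body.substring(1));
        return Character.isValidCodePoint(codePoint)
                ? new String(Character.toChars(codePoint))
                : original;
    }

    public static void main(String[] args) {
        String doubleEscaped = "&amp;lt;"; // should decode to the literal text "&lt;"
        System.out.println(naiveUnescape(doubleEscaped));      // prints "<" (decoded twice)
        System.out.println(singlePassUnescape(doubleEscaped)); // prints "&lt;"
    }
}
```

Even the single-pass version leaves entities such as &amp;nbsp; untouched because they are missing from its table, which is a large part of why the libraries below are the safer choice.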

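2.1. Using Apache Commons' `StringEscapeUtils`

Apache Commons is one of the most widely used library families in the Java ecosystem, and its StringEscapeUtils class offers convenient methods for this task. The key method, unescapeHtml4(), belongs to StringEscapeUtils in the org.apache.commons.text package (Apache Commons Text):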

// Requires: org.apache.commons.text.StringEscapeUtils and org.junit.Assert (JUnit 4)

// Named entity for the double quote
String expectedQuote = "\"Hello\" Baeldung";
String escapedQuote = "&quot;Hello&quot; Baeldung";
Assert.assertEquals(expectedQuote, StringEscapeUtils.unescapeHtml4(escapedQuote));

// Escaped markup is turned back into literal tags
String escapedStringsWithHtmlSymbol = "&lt;p&gt;&lt;strong&gt;Test sentence in bold type.&lt;/strong&gt;&lt;/p&gt;";
String expectedStringsWithHtmlSymbol = "<p><strong>Test sentence in bold type.</strong></p>";
Assert.assertEquals(expectedStringsWithHtmlSymbol, StringEscapeUtils.unescapeHtml4(escapedStringsWithHtmlSymbol));

✅ Pros: full-featured, supports HTML 4 entities
❌ Note: requires adding the commons-text dependency

2.2. Using the Spring Framework's `HtmlUtils`

The Spring Framework provides the HtmlUtils.htmlUnescape() method for unescaping HTML characters. The class lives in the org.springframework.web.util package:

// Requires: org.springframework.web.util.HtmlUtils and org.junit.Assert (JUnit 4)

// Named entity for the double quote
String expectedQuote = "\"Code smells\" -Martin Fowler";
String escapedQuote = "&quot;Code smells&quot; -Martin Fowler";
Assert.assertEquals(expectedQuote, HtmlUtils.htmlUnescape(escapedQuote));

// Escaped markup is turned back into literal tags
String escapedStringsWithHtmlSymbol = "&lt;p&gt;Loren Ipsum is a popular paragraph.&lt;/p&gt;";
String expectedStringsWithHtmlSymbol = "<p>Loren Ipsum is a popular paragraph.</p>";
Assert.assertEquals(expectedStringsWithHtmlSymbol, HtmlUtils.htmlUnescape(escapedStringsWithHtmlSymbol));

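✅ Pros: no extra dependency in projects that already use Spring
⚠️ Note: covers the HTML 4.01 entity set and numeric references, not HTML5-only named entities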

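2.3. Using Unbescape's `HtmlEscape`

The Unbescape library focuses on escaping and unescaping, and supports HTML, JSON, CSS, and several other formats. Its key method, HtmlEscape.unescapeHtml(), is straightforward to use: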

// Requires: org.unbescape.html.HtmlEscape and org.junit.Assert (JUnit 4)

// Named entity for the double quote
String expectedQuote = "\"Carpe diem\" -Horace";
String escapedQuote = "&quot;Carpe diem&quot; -Horace";
Assert.assertEquals(expectedQuote, HtmlEscape.unescapeHtml(escapedQuote));

// Escaped markup is turned back into literal tags
String escapedStringsWithHtmlSymbol = "&lt;p&gt;&lt;em&gt;Pizza is a famous Italian food. Duh.&lt;/em&gt;&lt;/p&gt;";
String expectedStringsWithHtmlSymbol = "<p><em>Pizza is a famous Italian food. Duh.</em></p>";
Assert.assertEquals(expectedStringsWithHtmlSymbol, HtmlEscape.unescapeHtml(escapedStringsWithHtmlSymbol));

✅ Pros: lightweight, supports multiple formats
❌ Note: a less active community than the mainstream libraries

2.4. Using Jsoup's `Entities.unescape()`

Jsoup is a powerful HTML-processing library, and its Entities.unescape() method is dedicated to unescaping HTML characters:

// Requires: org.jsoup.nodes.Entities and org.junit.Assert (JUnit 4)

// Named entity for the double quote
String expectedQuote = "\"Jsoup\" is another strong library";
String escapedQuote = "&quot;Jsoup&quot; is another strong library";
Assert.assertEquals(expectedQuote, Entities.unescape(escapedQuote));

// Escaped markup is turned back into literal tags
String escapedStringsWithHtmlSymbol = "&lt;p&gt;It simplifies working with real-world &lt;strong&gt;HTML&lt;/strong&gt; and &lt;strong&gt;XML&lt;/strong&gt;&lt;/p&gt;";
String expectedStringsWithHtmlSymbol = "<p>It simplifies working with real-world <strong>HTML</strong> and <strong>XML</strong></p>";
Assert.assertEquals(expectedStringsWithHtmlSymbol, Entities.unescape(escapedStringsWithHtmlSymbol));

✅ Pros: the most capable HTML processing of the four options
⚠️ Note: a comparatively large library

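3. Summary

This article compared four mainstream options for unescaping HTML characters:

Option	Best suited for	Key strength
Apache Commons	General-purpose projects	Full-featured, well-supported community
Spring Framework	Spring projects	No extra dependency
Unbescape	Lightweight needs	Multi-format support
Jsoup	HTML-heavy processing	Strongest HTML parsing capability

Recommendations for day-to-day development:

In a Spring project, use HtmlUtils directly
When complex HTML processing is needed, choose Jsoup
For lightweight needs, consider Unbescape
For general-purpose use, Apache Commons is the safest choice

All the example code is available in the GitHub repository.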
