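Java HTML Sanitization: A Practical Guide to Defending Against XSS Attacks

1. Introduction

Cross-site scripting (XSS) is one of the most common security vulnerabilities in web applications. By injecting a malicious script, an attacker can run arbitrary code in a user's browser, which can lead to data theft, session hijacking, or page defacement. This article shows how to defend against XSS in Java applications by sanitizing HTML.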
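As a minimal sketch of the underlying problem, the plain-Java class below interpolates a user comment into markup with and without escaping. `CommentRenderer` and its methods are hypothetical names used only for illustration; they are not part of any library discussed here. Escaping is enough when the input is meant to be plain text, whereas the sanitizers covered in this article address the case where some HTML has to survive.

```java
final class CommentRenderer {
    private CommentRenderer() {}

    // Vulnerable: user input is concatenated into the page unchanged,
    // so an embedded <script> element reaches the browser as markup.
    static String renderUnsafe(String comment) {
        return "<p>" + comment + "</p>";
    }

    // Safer for plain-text contexts: HTML metacharacters become entities.
    static String renderEscaped(String comment) {
        return "<p>" + escapeHtml(comment) + "</p>";
    }

    // Replaces the five characters that are significant in HTML text and attributes.
    static String escapeHtml(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }
}
```

With the input `<script>alert(1)</script>`, `renderUnsafe` returns the script element intact, while `renderEscaped` returns it as inert text.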

2. Project Setup

First, add the OWASP Java HTML Sanitizer dependency to pom.xml:

<dependency>
    <groupId>com.googlecode.owasp-java-html-sanitizer</groupId>
    <artifactId>owasp-java-html-sanitizer</artifactId>
    <version>20240325.1</version>
</dependency>

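✅ The library provides a highly configurable, policy-driven sanitizer that can handle complex HTML while defending against XSS.

3. Basic Sanitization with the OWASP HTML Sanitizer

Create a utility class that implements basic HTML sanitization: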

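import org.owasp.html.PolicyFactory;
import org.owasp.html.Sanitizers;

public class HtmlSanitizerUtil {
    // Combine two of the library's prebuilt policies: inline formatting and links.
    private static final PolicyFactory POLICY = Sanitizers.FORMATTING.and(Sanitizers.LINKS);

    public static String sanitize(String htmlContent) {
        return POLICY.sanitize(htmlContent);
    }
}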

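Key points:

It combines two built-in policies, FORMATTING and LINKS
FORMATTING allows basic formatting tags such as <b>, <i>, and <u>
LINKS allows <a> hyperlink tags
sanitize() applies the policy and returns the sanitized HTML

Verify it with a test: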

String input = "<script>alert('XSS')</script><b>Hello</b> <a href='https://example.com'>link</a>";
String expectedOutput = "<b>Hello</b> <a href=\"https://example.com\" rel=\"nofollow\">link</a>";

String sanitized = HtmlSanitizerUtil.sanitize(input);
assertEquals(expectedOutput, sanitized);

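⚠️ Sanitization result:

The malicious <script> element is removed
Safe tags are preserved
rel="nofollow" is added to the link automatically

4. Flexible Sanitization with HtmlPolicyBuilder

When finer control is needed, build a custom policy with HtmlPolicyBuilder: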

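import org.owasp.html.HtmlPolicyBuilder;
import org.owasp.html.PolicyFactory;

public class HtmlSanitizer {
    // Built once and shared across calls.
    private static final PolicyFactory POLICY = new HtmlPolicyBuilder()
      .allowCommonBlockElements()
      .allowCommonInlineFormattingElements()
      .toFactory();

    public static String sanitize(String html) {
        return POLICY.sanitize(html);
    }
}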

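Policy characteristics:

Allows block-level elements such as <div>, <p>, <ul>, and <ol>
Allows inline elements such as <b>, <i>, and <em>
PolicyFactory is thread-safe and can be reused

Test it: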

String input = "<div onclick='alert(1)'><p><b>Text</b></p></div><script>alert('x')</script>";
String expectedOutput = "<div><p><b>Text</b></p></div>";

String sanitized = HtmlSanitizer.sanitize(input);
assertEquals(expectedOutput, sanitized);

✅ Sanitization result:

The onclick event handler is removed
The <script> element is removed
Safe structural tags are preserved
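The checks above can be generalized into a small regression helper for test suites. The sketch below is plain Java and purely heuristic: `XssSmokeCheck` and `looksUnsafe` are hypothetical names, the pattern only looks for a few well-known markers, and it can flag harmless text (for example `one=1` after a space). It is meant as a tripwire for sanitizer output in tests, not as a sanitizer.

```java
import java.util.regex.Pattern;

final class XssSmokeCheck {
    // Heuristic markers: a <script> tag, an inline on*= event handler,
    // or a javascript: URL. Matching is case-insensitive.
    private static final Pattern SUSPICIOUS = Pattern.compile(
        "<\\s*script\\b|\\son\\w+\\s*=|javascript\\s*:",
        Pattern.CASE_INSENSITIVE);

    private XssSmokeCheck() {}

    // Returns true if the markup contains any of the markers above.
    static boolean looksUnsafe(String html) {
        return SUSPICIOUS.matcher(html).find();
    }
}
```

In a test, this could be applied to sanitizer output, for example `assertFalse(XssSmokeCheck.looksUnsafe(HtmlSanitizer.sanitize(input)))`, across a list of known payloads.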

5. Creating a Custom Policy

For more complex scenarios, build a fine-grained policy:

import org.owasp.html.HtmlPolicyBuilder;
import org.owasp.html.PolicyFactory;

public class CustomHtmlSanitizer {
    private static final PolicyFactory POLICY = new HtmlPolicyBuilder()
      .allowElements("a", "p", "div", "span", "h1", "h2", "h3")
      .allowUrlProtocols("https")
      .allowAttributes("href").onElements("a")
      .requireRelNofollowOnLinks()
      .allowAttributes("class").globally()
      .allowStyling()
      .toFactory();

    public static String sanitize(String html) {
        return POLICY.sanitize(html);
    }
}

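Policy rules:

Allows only specific elements: <a>, <p>, <div>, <span>, and <h1> through <h3>
Allows only HTTPS links
Restricts the href attribute to <a> tags
Adds rel="nofollow" to links automatically
Allows the class attribute globally
Allows safe CSS styles (such as color and font-weight)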
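To make the protocol rule concrete, here is a standalone sketch of what a scheme allowlist check amounts to, written with `java.net.URI` only. It is not how the OWASP library implements the rule: `UrlProtocolAllowlist` and `isAllowed` are hypothetical names, and rejecting relative URLs is an assumption of this sketch that may differ from the library's own behavior.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

final class UrlProtocolAllowlist {
    // Assumption for this sketch: only absolute https URLs are acceptable.
    private static final Set<String> ALLOWED_SCHEMES = Set.of("https");

    private UrlProtocolAllowlist() {}

    static boolean isAllowed(String href) {
        if (href == null) {
            return false;
        }
        try {
            String scheme = new URI(href.trim()).getScheme();
            if (scheme == null) {
                return false; // relative URL: rejected in this sketch
            }
            // URI schemes are case-insensitive, so normalize before comparing.
            return ALLOWED_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
        } catch (URISyntaxException e) {
            return false; // unparseable input is treated as unsafe
        }
    }
}
```

The key design choice is the same as in the policy above: name the schemes that are acceptable and reject everything else, rather than trying to enumerate dangerous ones such as `javascript:`.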

Test case:

String input = "<h1 class='title' style='color:red;'>Welcome</h1>"
  + "<a href='https://example.com' onclick='stealCookies()'>Click</a>"
  + "<script>alert('xss');</script>";

String expectedOutput = 
  "<h1 class=\"title\" style=\"color:red\">Welcome</h1><a href=\"https://example.com\" rel=\"nofollow\">Click</a>";

String sanitized = CustomHtmlSanitizer.sanitize(input);
assertEquals(expectedOutput, sanitized);

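✅ This suits scenarios that need to keep some formatting, such as blog comments and CMS content.

6. Alternative: JSoup HTML Cleaner

When the application also needs to parse and manipulate HTML, JSoup may be the better choice:

Add the dependency: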

<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.20.1</version>
</dependency>

Implement the sanitizer:

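import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

public class JsoupHtmlSanitizer {
    public static String sanitize(String html) {
        // Safelist.basic() already permits <a href> and enforces rel="nofollow" on links.
        Safelist safelist = Safelist.basic()
          .addTags("h1", "h2", "h3")
          .addAttributes("a", "target")
          .addProtocols("a", "href", "http", "https");

        return Jsoup.clean(html, safelist);
    }
}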

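Policy characteristics:

Based on the Safelist model
Basic tags plus heading tags
Allows the target attribute (for opening links in a new tab)
Accepts http and https link protocols; addProtocols only adds to the list, and Safelist.basic() also accepts ftp and mailto, so call removeProtocols("a", "href", "ftp", "mailto") to restrict links to HTTP/HTTPS

Test it: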

String input = "<h1 onclick='x()'>Title</h1><a href='javascript:alert(1)' target='_blank'>Click</a>";
String expectedOutput = "<h1>Title</h1><a target=\"_blank\" rel=\"nofollow\">Click</a>";

String sanitized = JsoupHtmlSanitizer.sanitize(input);
assertEquals(expectedOutput, sanitized);

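⚠️ Safelist.basic() enforces rel="nofollow" on every <a> element, regardless of its target attribute. This does not protect against reverse tabnabbing; that requires rel="noopener" on links that open a new tab, which can be enforced with addEnforcedAttribute("a", "rel", "nofollow noopener"). Also note that Jsoup.clean pretty-prints its output by default, so the result may contain a line break after the <h1> block; passing a Document.OutputSettings with prettyPrint(false) to the four-argument Jsoup.clean overload avoids this in exact string comparisons.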

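7. Summary

Comparison of HTML sanitization options for defending against XSS:

Scenario	Recommended option	Advantage
Strict security policy	OWASP HTML Sanitizer	Fine-grained, policy-driven control
HTML parsing and manipulation	JSoup	Strong DOM manipulation capabilities
Simple requirements	JSoup Safelist	Intuitive and easy to use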

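✅ Key practices:

Always sanitize user-submitted HTML
Choose the tool that fits the scenario
Review and update sanitization policies regularly
Combine sanitization with other security measures, such as CSP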
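For the last point, a Content-Security-Policy header gives the browser a second line of defense if a payload slips past sanitization. The sketch below only assembles a header value in plain Java. `CspHeader` is a hypothetical helper, and the directives shown are an illustrative starting point that would need tuning for a real application.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class CspHeader {
    static final String NAME = "Content-Security-Policy";

    private CspHeader() {}

    // Joins directives as "name value; name value", preserving the map's iteration order.
    static String build(Map<String, String> directives) {
        return directives.entrySet().stream()
            .map(e -> e.getKey() + " " + e.getValue())
            .collect(Collectors.joining("; "));
    }

    // Illustrative policy: same-origin resources only, no plugins.
    static String defaultPolicy() {
        Map<String, String> directives = new LinkedHashMap<>();
        directives.put("default-src", "'self'");
        directives.put("script-src", "'self'");
        directives.put("object-src", "'none'");
        return build(directives);
    }
}
```

In a servlet-based application (an assumption, since the article does not name a web framework), the value could be attached with `response.setHeader(CspHeader.NAME, CspHeader.defaultPolicy())`.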

The complete code examples are available in the GitHub repository.
