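1. Overview

In this article, we look at the events that Resilience4j uses internally to drive its resilience mechanisms, and at the endpoints that expose those events in a Spring Boot application.

We reuse the project from our guide to Resilience4j with Spring Boot to show how Resilience4j lists the events of each pattern under the actuator endpoints.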

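2. Pattern Events

The library uses events internally to drive the behavior of the resilience patterns (permitting or rejecting calls), so events act as its core communication mechanism. These events also provide valuable information for monitoring, observability, and troubleshooting.

Events emitted by circuit breaker, retry, rate limiter, bulkhead, and time limiter instances are stored separately in circular event consumer buffers. The buffer size is configurable through the eventConsumerBufferSize property and defaults to 100 events.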
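To make the buffer behavior concrete, here is a minimal sketch of a bounded buffer that keeps only the most recent events. It is not Resilience4j's internal class; the name RecentEventsBuffer and its methods are hypothetical, and the capacity argument merely plays the role of eventConsumerBufferSize.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustrative only: a bounded buffer that overwrites the oldest entry when full.
// Hypothetical class, not part of Resilience4j.
class RecentEventsBuffer<T> {
    private final int capacity; // plays the role of eventConsumerBufferSize
    private final Deque<T> events = new ArrayDeque<>();

    RecentEventsBuffer(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    synchronized void add(T event) {
        if (events.size() == capacity) {
            events.removeFirst(); // drop the oldest event to make room
        }
        events.addLast(event);
    }

    // Returns a copy of the buffered events, oldest first.
    synchronized List<T> snapshot() {
        return new ArrayList<>(events);
    }

    public static void main(String[] args) {
        RecentEventsBuffer<String> buffer = new RecentEventsBuffer<>(3);
        for (int i = 1; i <= 5; i++) {
            buffer.add("event-" + i);
        }
        System.out.println(buffer.snapshot()); // [event-3, event-4, event-5]
    }
}
```

With a small buffer, older events disappear from the endpoint output quickly, so the size is worth raising when longer event histories are needed for troubleshooting.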

Next, we look at the specific events that each pattern lists under the actuator endpoints.
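The event endpoints used in this article share one naming scheme: the pattern name followed by "events" under /actuator. The sketch below builds those URLs and fetches the raw JSON with the JDK HTTP client. The class ActuatorEventsClient and its methods are hypothetical helpers, and the code assumes the application runs on localhost:8080 with the actuator endpoints exposed.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper for reading the event endpoints shown in this article.
class ActuatorEventsClient {

    // Builds e.g. http://localhost:8080/actuator/retryevents from the pattern name "retry".
    static URI eventsUri(String baseUrl, String pattern) {
        return URI.create(baseUrl + "/actuator/" + pattern + "events");
    }

    // Fetches the raw JSON body; this only works while the application is running.
    static String fetchEvents(String baseUrl, String pattern) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(eventsUri(baseUrl, pattern)).GET().build();
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) {
        String[] patterns = {"circuitbreaker", "retry", "timelimiter", "bulkhead", "ratelimiter"};
        for (String pattern : patterns) {
            System.out.println(eventsUri("http://localhost:8080", pattern));
        }
    }
}
```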

3. Circuit Breaker

3.1. Configuration

The circuit breaker instance defined for the /api/circuit-breaker endpoint uses the default configuration:

resilience4j.circuitbreaker:
  configs:
    default:
      registerHealthIndicator: true
      slidingWindowSize: 10
      minimumNumberOfCalls: 5
      permittedNumberOfCallsInHalfOpenState: 3
      automaticTransitionFromOpenToHalfOpenEnabled: true
      waitDurationInOpenState: 5s
      failureRateThreshold: 50
      eventConsumerBufferSize: 50
  instances:
    externalService:
      baseConfig: default

3.2. Events

Resilience4j exposes the circuit breaker events under the following actuator endpoint:

http://localhost:8080/actuator/circuitbreakerevents

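The circuit breaker is the most complex resilience mechanism and defines the most event types. Its implementation relies on a state machine, and events trigger the state transitions. Let's look at the events emitted as the breaker moves from the initial CLOSED state to OPEN and back to CLOSED.

✅ A successful call triggers a CircuitBreakerOnSuccessEvent: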

{
    "circuitBreakerName": "externalService",
    "type": "SUCCESS",
    "creationTime": "2023-03-22T16:45:26.349252+02:00",
    "errorMessage": null,
    "durationInMs": 526,
    "stateTransition": null
}

❌ Now let's see what happens when the circuit breaker handles failed requests:

@Test
void testCircuitBreakerEvents() throws Exception {
    EXTERNAL_SERVICE.stubFor(WireMock.get("/api/external")
      .willReturn(serverError()));

    IntStream.rangeClosed(1, 5)
      .forEach(i -> {
        ResponseEntity<String> response = restTemplate.getForEntity("/api/circuit-breaker", String.class);
        assertThat(response.getStatusCode()).isEqualTo(HttpStatus.INTERNAL_SERVER_ERROR);
      });
    ...
}

**A failed request triggers a CircuitBreakerOnErrorEvent**:

{
    "circuitBreakerName": "externalService",
    "type": "ERROR",
    "creationTime": "2023-03-19T20:13:05.069002+02:00",
    "errorMessage": "org.springframework.web.client.HttpServerErrorException$InternalServerError: 500 Server Error: \"{\"error\": \"Internal Server Error\"}\"",
    "durationInMs": 519,
    "stateTransition": null
}

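⚠️ Both the success and error events carry a durationInMs attribute, which is a useful performance metric.

When the failure rate reaches or exceeds the configured threshold, the instance emits a CircuitBreakerOnFailureRateExceededEvent, transitions to the OPEN state, and emits a CircuitBreakerOnStateTransitionEvent: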

{
    "circuitBreakerName": "externalService",
    "type": "FAILURE_RATE_EXCEEDED",
    "creationTime": "2023-03-19T20:13:07.554813+02:00",
    "errorMessage": null,
    "durationInMs": null,
    "stateTransition": null
},
{
    "circuitBreakerName": "externalService",
    "type": "STATE_TRANSITION",
    "creationTime": "2023-03-19T20:13:07.563623+02:00",
    "errorMessage": null,
    "durationInMs": null,
    "stateTransition": "CLOSED_TO_OPEN"
}

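As the stateTransition attribute of the last event shows, the circuit breaker is now in the OPEN state. A new call attempt throws a CallNotPermittedException, which in turn triggers a CircuitBreakerOnCallNotPermittedEvent: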

{
    "circuitBreakerName": "externalService",
    "type": "NOT_PERMITTED",
    "creationTime": "2023-03-22T16:50:11.897977+02:00",
    "errorMessage": null,
    "durationInMs": null,
    "stateTransition": null
}

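Once the configured waitDurationInOpenState elapses, the circuit breaker moves to the intermediate HALF_OPEN state. A CircuitBreakerOnStateTransitionEvent with the OPEN_TO_HALF_OPEN transition signals the change: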

{
    "circuitBreakerName": "externalService",
    "type": "STATE_TRANSITION",
    "creationTime": "2023-03-22T16:50:14.787381+02:00",
    "errorMessage": null,
    "durationInMs": null,
    "stateTransition": "OPEN_TO_HALF_OPEN"
}

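In the HALF_OPEN state, once the permitted trial calls (permittedNumberOfCallsInHalfOpenState, 3 in our configuration) complete with a failure rate below the threshold, another CircuitBreakerOnStateTransitionEvent switches the breaker back to the CLOSED state: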

{
    "circuitBreakerName": "externalService",
    "type": "STATE_TRANSITION",
    "creationTime": "2023-03-22T17:48:45.931978+02:00",
    "errorMessage": null,
    "durationInMs": null,
    "stateTransition": "HALF_OPEN_TO_CLOSED"
}

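Circuit breaker events give insight into the instance's performance and the way it handles requests. By analyzing them, we can identify potential problems and track performance metrics.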
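The event sequence walked through in this section can be reproduced with a deliberately simplified state machine. The sketch below is not Resilience4j's implementation: the class CircuitBreakerSketch is hypothetical, it counts all calls since the last transition instead of using a sliding window, and the wait duration is simulated by calling waitDurationElapsed() by hand.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of the CLOSED -> OPEN -> HALF_OPEN -> CLOSED cycle.
class CircuitBreakerSketch {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int minimumNumberOfCalls;
    private final int permittedCallsInHalfOpen;
    private final double failureRateThreshold; // percentage, e.g. 50
    private State state = State.CLOSED;
    private int calls;
    private int failures;
    final List<String> events = new ArrayList<>();

    CircuitBreakerSketch(int minimumNumberOfCalls, int permittedCallsInHalfOpen, double failureRateThreshold) {
        this.minimumNumberOfCalls = minimumNumberOfCalls;
        this.permittedCallsInHalfOpen = permittedCallsInHalfOpen;
        this.failureRateThreshold = failureRateThreshold;
    }

    // Records the outcome of one call; returns false if the call was rejected.
    boolean record(boolean success) {
        if (state == State.OPEN) {
            events.add("NOT_PERMITTED");
            return false;
        }
        events.add(success ? "SUCCESS" : "ERROR");
        calls++;
        if (!success) {
            failures++;
        }
        int required = (state == State.HALF_OPEN) ? permittedCallsInHalfOpen : minimumNumberOfCalls;
        if (calls >= required) {
            double failureRate = 100.0 * failures / calls;
            if (failureRate >= failureRateThreshold) {
                if (state == State.CLOSED) {
                    events.add("FAILURE_RATE_EXCEEDED");
                }
                transitionTo(State.OPEN);
            } else if (state == State.HALF_OPEN) {
                transitionTo(State.CLOSED);
            }
        }
        return true;
    }

    // Stands in for the timer that fires after waitDurationInOpenState.
    void waitDurationElapsed() {
        if (state == State.OPEN) {
            transitionTo(State.HALF_OPEN);
        }
    }

    private void transitionTo(State next) {
        events.add("STATE_TRANSITION:" + state + "_TO_" + next);
        state = next;
        calls = 0;    // start counting afresh in the new state
        failures = 0;
    }

    public static void main(String[] args) {
        CircuitBreakerSketch breaker = new CircuitBreakerSketch(5, 3, 50);
        for (int i = 0; i < 5; i++) breaker.record(false); // five failures open the breaker
        breaker.record(true);                              // rejected while OPEN
        breaker.waitDurationElapsed();                     // OPEN -> HALF_OPEN
        for (int i = 0; i < 3; i++) breaker.record(true);  // trial calls close it again
        breaker.events.forEach(System.out::println);
    }
}
```

Running main prints the same order of event types as the JSON samples above: errors, FAILURE_RATE_EXCEEDED, the CLOSED_TO_OPEN transition, NOT_PERMITTED, OPEN_TO_HALF_OPEN, successes, and finally HALF_OPEN_TO_CLOSED.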

4. Retry

4.1. Configuration

The /api/retry endpoint uses a retry instance with the following configuration:

resilience4j.retry:
  configs:
    default:
      maxAttempts: 3
      waitDuration: 100
  instances:
    externalService:
      baseConfig: default

4.2. Events

Let's look at the events the retry pattern lists under the actuator endpoint:

http://localhost:8080/actuator/retryevents

When a call fails, it is retried according to the configuration:

@Test
void testRetryEvents() throws Exception {
    EXTERNAL_SERVICE.stubFor(WireMock.get("/api/external")
      .willReturn(serverError()));
    ResponseEntity<String> response = restTemplate.getForEntity("/api/retry", String.class);

    ...
}

Each failed attempt that is followed by a retry triggers a RetryOnRetryEvent (type RETRY), and the retry instance schedules the next attempt according to its configuration. The event carries a numberOfAttempts counter:

{
    "retryName": "retryApi",
    "type": "RETRY",
    "creationTime": "2023-03-19T22:57:51.458811+02:00",
    "errorMessage": "org.springframework.web.client.HttpServerErrorException$InternalServerError: 500 Server Error: \"{\"error\": \"Internal Server Error\"}\"",
    "numberOfAttempts": 1
}

Once the configured attempts are exhausted, the retry instance publishes a RetryOnErrorEvent (type ERROR) and rethrows the underlying exception:

{
    "retryName": "retryApi",
    "type": "ERROR",
    "creationTime": "2023-03-19T23:30:11.440423+02:00",
    "errorMessage": "org.springframework.web.client.HttpServerErrorException$InternalServerError: 500 Server Error: \"{\"error\": \"Internal Server Error\"}\"",
    "numberOfAttempts": 3
}

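The retry mechanism uses these events to decide whether to retry again or to give up and report the failure, so they reflect the current state of the process. Monitoring them helps us tune the retry configuration.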
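The relationship between maxAttempts, the RETRY events, and the final ERROR event can be sketched with a plain loop. This is an illustration under simplifying assumptions, not the library's code: RetrySketch and callWithRetry are hypothetical names, the wait between attempts is omitted, and events are recorded as strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of a retry loop that records one event per failed attempt.
class RetrySketch {

    static <T> T callWithRetry(Supplier<T> call, int maxAttempts, List<String> events) {
        for (int attempt = 1; ; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                if (attempt >= maxAttempts) {
                    // Attempts exhausted: report the failure and rethrow the cause.
                    events.add("ERROR numberOfAttempts=" + attempt);
                    throw e;
                }
                // Another attempt will follow (a real retry would wait waitDuration here).
                events.add("RETRY numberOfAttempts=" + attempt);
            }
        }
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        try {
            callWithRetry(() -> { throw new IllegalStateException("500 Server Error"); }, 3, events);
        } catch (IllegalStateException e) {
            System.out.println("gave up: " + e.getMessage());
        }
        events.forEach(System.out::println);
    }
}
```

With maxAttempts set to 3 and a call that always fails, the loop records two RETRY events followed by one ERROR event whose numberOfAttempts is 3, matching the samples above.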

5. Time Limiter

5.1. Configuration

The /api/time-limiter endpoint uses a time limiter instance with the following configuration:

resilience4j.timelimiter:
  configs:
    default:
      cancelRunningFuture: true
      timeoutDuration: 2s
  instances:
    externalService:
      baseConfig: default

5.2. Events

Time limiter events are listed at the following endpoint:

http://localhost:8080/actuator/timelimiterevents

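Time limiter events describe the status of an operation. Based on them, the instance either lets the request complete or cancels it on timeout.

✅ A call that completes within the configured time limit triggers a TimeLimiterOnSuccessEvent: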

{
    "timeLimiterName":"externalService",
    "type":"SUCCESS",
    "creationTime":"2023-03-20T20:48:43.089529+02:00"
}

❌ **A call that fails within the time limit triggers a TimeLimiterOnErrorEvent**:

{
    "timeLimiterName":"externalService",
    "type":"ERROR",
    "creationTime":"2023-03-20T20:49:12.089537+02:00"
}

Because the /api/time-limiter endpoint introduces a delay longer than timeoutDuration, the call times out. The resulting TimeoutException triggers a TimeLimiterOnTimeoutEvent:

@Test
void testTimeLimiterEvents() throws Exception {
    EXTERNAL_SERVICE.stubFor(WireMock.get("/api/external")
      .willReturn(ok()));
    ResponseEntity<String> response = restTemplate.getForEntity("/api/time-limiter", String.class);
        
    ...
}

{
    "timeLimiterName":"externalService",
    "type":"TIMEOUT",
    "creationTime":"2023-03-20T19:32:38.733874+02:00"
}

Monitoring time limiter events lets us track request status and troubleshoot timeouts, which helps optimize response times.
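The three outcomes above (SUCCESS, ERROR, TIMEOUT) map naturally onto waiting for a future with a deadline. The following sketch uses only the JDK; TimeLimiterSketch and classify are hypothetical names, and the returned strings simply mirror the event types shown in the samples.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical sketch: classify a call the way the time limiter events do.
class TimeLimiterSketch {

    static String classify(Supplier<String> call, long timeoutMillis) {
        CompletableFuture<String> future = CompletableFuture.supplyAsync(call);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return "SUCCESS";               // finished within the limit
        } catch (TimeoutException e) {
            // Comparable in spirit to cancelRunningFuture; note that
            // CompletableFuture.cancel does not interrupt the running thread.
            future.cancel(true);
            return "TIMEOUT";
        } catch (ExecutionException e) {
            return "ERROR";                 // the call itself failed within the limit
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "ERROR";
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(() -> "ok", 2000));
        System.out.println(classify(() -> { throw new IllegalStateException("boom"); }, 2000));
        System.out.println(classify(() -> {
            try {
                Thread.sleep(1000);         // slower than the 50 ms limit below
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "late";
        }, 50));
    }
}
```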

6. Bulkhead

6.1. Configuration

The /api/bulkhead endpoint uses a bulkhead instance with the following configuration:

resilience4j.bulkhead:
  configs:
    default:
      max-concurrent-calls: 3
      max-wait-duration: 1
  instances:
    externalService:
      baseConfig: default

6.2. Events

Bulkhead events are listed under the following actuator endpoint:

http://localhost:8080/actuator/bulkheadevents

Let's look at the events produced when we submit more calls than the concurrency limit allows:

@Test
void testBulkheadEvents() throws Exception {
    EXTERNAL_SERVICE.stubFor(WireMock.get("/api/external").willReturn(ok()));
    Map<Integer, Integer> responseStatusCount = new ConcurrentHashMap<>();
    ExecutorService executorService = Executors.newFixedThreadPool(5);

    List<Callable<Integer>> tasks = new ArrayList<>();
    IntStream.rangeClosed(1, 5)
      .forEach(
        i ->
          tasks.add(
            () -> {
            ResponseEntity<String> response =
              restTemplate.getForEntity("/api/bulkhead", String.class);
            return response.getStatusCodeValue();
            }));

    List<Future<Integer>> futures = executorService.invokeAll(tasks);
    for (Future<Integer> future : futures) {
      int statusCode = future.get();
      responseStatusCount.merge(statusCode, 1, Integer::sum);
    }
    ...
}

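The bulkhead permits or rejects calls according to its configuration. A call within the concurrency limit takes one of the available slots and triggers a BulkheadOnCallPermittedEvent: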

{
    "bulkheadName":"externalService",
    "type":"CALL_PERMITTED",
    "creationTime":"2023-03-20T14:10:52.417063+02:00"
}

When the concurrency limit is reached, **the bulkhead instance rejects further calls, throwing a BulkheadFullException and triggering a BulkheadOnCallRejectedEvent**:

{
    "bulkheadName":"externalService",
    "type":"CALL_REJECTED",
    "creationTime":"2023-03-20T14:10:52.419099+02:00"
}

When a call finishes, successfully or not, its slot is released and a BulkheadOnCallFinishedEvent is triggered:

{
    "bulkheadName":"externalService",
    "type":"CALL_FINISHED",
    "creationTime":"2023-03-20T14:10:52.500715+02:00"
}

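Observing bulkhead events helps ensure resource isolation and stable performance under high load or during failures. By tracking the number of permitted and rejected calls, we can tune the bulkhead configuration to balance service availability against resource protection.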
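A semaphore captures the slot accounting described above. The sketch below is a hypothetical BulkheadSketch, not the library's bulkhead: it rejects immediately instead of honoring max-wait-duration, and it throws a plain IllegalStateException where Resilience4j throws BulkheadFullException.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical semaphore-based bulkhead that records the three event types.
class BulkheadSketch {
    private final Semaphore slots;
    final List<String> events = new CopyOnWriteArrayList<>();

    BulkheadSketch(int maxConcurrentCalls) {
        this.slots = new Semaphore(maxConcurrentCalls);
    }

    <T> T execute(Supplier<T> call) {
        if (!slots.tryAcquire()) {          // no free slot: reject without waiting
            events.add("CALL_REJECTED");
            throw new IllegalStateException("bulkhead is full");
        }
        events.add("CALL_PERMITTED");
        try {
            return call.get();
        } finally {
            slots.release();                // free the slot on success or failure
            events.add("CALL_FINISHED");
        }
    }

    public static void main(String[] args) {
        BulkheadSketch bulkhead = new BulkheadSketch(1);
        // The nested call arrives while the outer call still holds the only slot.
        String result = bulkhead.<String>execute(() -> {
            try {
                return bulkhead.<String>execute(() -> "inner");
            } catch (IllegalStateException e) {
                return "rejected";
            }
        });
        System.out.println(result);          // rejected
        System.out.println(bulkhead.events); // [CALL_PERMITTED, CALL_REJECTED, CALL_FINISHED]
    }
}
```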

7. Rate Limiter

7.1. Configuration

The /api/rate-limiter endpoint uses a rate limiter instance with the following configuration:

resilience4j.ratelimiter:
  configs:
    default:
      limit-for-period: 5
      limit-refresh-period: 60s
      timeout-duration: 0s
      allow-health-indicator-to-fail: true
      subscribe-for-events: true
      event-consumer-buffer-size: 50
  instances:
    externalService:
      baseConfig: default

7.2. Events

Rate limiter events are listed at the following endpoint:

http://localhost:8080/actuator/ratelimiterevents

Let's look at the events produced when a burst of calls to the /api/rate-limiter endpoint exceeds the rate limit:

@Test
void testRateLimiterEvents() throws Exception {
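    EXTERNAL_SERVICE.stubFor(WireMock.get("/api/external")
      .willReturn(ok()));
    // Tally of response status codes, declared locally so the snippet is self-contained
    Map<Integer, Integer> responseStatusCount = new HashMap<>();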

    IntStream.rangeClosed(1, 50)
      .forEach(i -> {
        ResponseEntity<String> response = restTemplate.getForEntity("/api/rate-limiter", String.class);
        int statusCode = response.getStatusCodeValue();
        responseStatusCount.put(statusCode, responseStatusCount.getOrDefault(statusCode, 0) + 1);
      });
        
    ...
}

The first requests successfully acquire a permission from the rate limiter, triggering a RateLimiterOnSuccessEvent:

{
    "rateLimiterName":"externalService",
    "type":"SUCCESSFUL_ACQUIRE",
    "creationTime":"2023-03-20T10:55:19.314306+02:00"
}

Once the permissions for the configured limit-refresh-period are used up, **further calls fail with a RequestNotPermitted exception and trigger a RateLimiterOnFailureEvent**:

{
    "rateLimiterName":"externalService",
    "type":"FAILED_ACQUIRE",
    "creationTime":"2023-03-20T12:48:28.623726+02:00"
}

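Rate limiter events let us monitor the rate at which the endpoint handles requests. By tracking successful and failed acquire events, we can judge whether the rate limit is appropriate, so that clients get good service while resources stay protected.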
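The split between SUCCESSFUL_ACQUIRE and FAILED_ACQUIRE can be illustrated with a fixed-window counter. This is a simplified stand-in rather than Resilience4j's algorithm: RateLimiterSketch is a hypothetical class, the clock is injected so the example is deterministic, and callers never wait (comparable to timeout-duration: 0s).

```java
import java.util.function.LongSupplier;

// Hypothetical fixed-window limiter: limitForPeriod permissions per refresh period.
class RateLimiterSketch {
    private final int limitForPeriod;
    private final long refreshPeriodNanos;
    private final LongSupplier clock;       // injected so the example needs no real waiting
    private long cycleStart;
    private int remaining;

    RateLimiterSketch(int limitForPeriod, long refreshPeriodNanos, LongSupplier clock) {
        this.limitForPeriod = limitForPeriod;
        this.refreshPeriodNanos = refreshPeriodNanos;
        this.clock = clock;
        this.cycleStart = clock.getAsLong();
        this.remaining = limitForPeriod;
    }

    // Returns the event type this attempt would produce.
    synchronized String tryAcquire() {
        long now = clock.getAsLong();
        if (now - cycleStart >= refreshPeriodNanos) {
            cycleStart = now;               // a new period begins: refill the permissions
            remaining = limitForPeriod;
        }
        if (remaining > 0) {
            remaining--;
            return "SUCCESSFUL_ACQUIRE";
        }
        return "FAILED_ACQUIRE";
    }

    public static void main(String[] args) {
        long[] now = {0L};
        RateLimiterSketch limiter = new RateLimiterSketch(5, 60_000_000_000L, () -> now[0]);
        int ok = 0;
        int failed = 0;
        for (int i = 0; i < 50; i++) {
            if (limiter.tryAcquire().equals("SUCCESSFUL_ACQUIRE")) ok++; else failed++;
        }
        System.out.println(ok + " successful, " + failed + " failed"); // 5 successful, 45 failed
        now[0] = 60_000_000_000L;            // simulate the refresh period elapsing
        System.out.println(limiter.tryAcquire()); // SUCCESSFUL_ACQUIRE
    }
}
```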

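8. Conclusion

In this article, we walked through the events that Resilience4j emits for the circuit breaker, retry, time limiter, bulkhead, and rate limiter patterns, along with the endpoints that expose them.

The complete source code is available on GitHub.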


Original title: Resilience4j Events Endpoints