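1. Overview

Netflix Servo is a metrics tool for Java applications. It is similar to Dropwizard Metrics, but simpler: it relies only on JMX to provide an interface for exposing and publishing metrics. This article introduces Servo's core features and shows how to use it to collect and publish application metrics.

2. Maven Dependencies

Before getting started, add the Servo core dependency to pom.xml: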

<dependency>
    <groupId>com.netflix.servo</groupId>
    <artifactId>servo-core</artifactId>
    <version>0.13.2</version>
</dependency>

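Several extensions are also available (for example Servo-Apache and Servo-AWS) and may be needed later. The latest version can be found on Maven Central.

3. Collecting Metrics

Servo provides four primary metric types: Counter, Gauge, Timer, and Informational.

3.1. Metric Types - Counter

A Counter records increments. The commonly used implementations are:

  • BasicCounter: a plain counter
  • PeakRateCounter: tracks the peak rate
  • StepCounter: tracks the rate per polling step

A basic counter example: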

Counter counter = new BasicCounter(MonitorConfig.builder("test").build());
assertEquals("counter should start with 0", 0, counter.getValue().intValue());

counter.increment();
assertEquals("counter should have increased by 1", 1, counter.getValue().intValue());

counter.increment(-1);
assertEquals("counter should have decreased by 1", 0, counter.getValue().intValue());

PeakRateCounter (records the maximum count per second):

Counter counter = new PeakRateCounter(MonitorConfig.builder("test").build());
assertEquals("counter should start with 0", 0, counter.getValue().intValue());

counter.increment();
SECONDS.sleep(1);

counter.increment();
counter.increment();

assertEquals("peak rate should have been 2", 2, counter.getValue().intValue());
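
To make the "maximum count per second" idea concrete, here is a minimal plain-Java sketch. It is not Servo's implementation: the class name PeakRateSketch and the explicit epochSecond argument are hypothetical, chosen so the behavior is deterministic and needs no sleeping.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of a peak-rate counter; not Servo's PeakRateCounter.
final class PeakRateSketch {
    // Number of increments observed in each one-second bucket.
    private final Map<Long, Long> countsBySecond = new HashMap<>();

    // Record one event that happened during the given second.
    synchronized void increment(long epochSecond) {
        countsBySecond.merge(epochSecond, 1L, Long::sum);
    }

    // Largest per-second count seen so far, or 0 if nothing was recorded.
    synchronized long peak() {
        long max = 0;
        for (long count : countsBySecond.values()) {
            max = Math.max(max, count);
        }
        return max;
    }

    public static void main(String[] args) {
        PeakRateSketch counter = new PeakRateSketch();
        counter.increment(100); // one event in second 100
        counter.increment(101); // two events in second 101
        counter.increment(101);
        System.out.println(counter.peak()); // prints 2
    }
}
```

As in the Servo example above, one increment in the first second followed by two in the next gives a peak of 2, not a total of 3.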

StepCounter (reports the rate over the previous polling interval):

System.setProperty("servo.pollers", "1000"); // set the polling interval to 1 second
Counter counter = new StepCounter(MonitorConfig.builder("test").build());

assertEquals("counter should start with rate 0.0", 0.0, counter.getValue());

counter.increment();
SECONDS.sleep(1);

assertEquals("counter rate should have increased to 1.0", 1.0, counter.getValue());

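⚠️ Pitfall: the default polling intervals are 60 seconds and 10 seconds. To change them, set the servo.pollers system property (the example above uses 1000 for a 1-second interval).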
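
The reason the rate only shows up after a full polling interval can be sketched in plain Java. This is an illustration under assumptions, not Servo's StepCounter: the class StepRateSketch and its rollOver method are hypothetical, and the interval boundary is triggered by hand instead of by a clock.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration of a per-interval rate counter; not Servo's StepCounter.
final class StepRateSketch {
    private final double stepSeconds;                    // length of one polling interval
    private final AtomicLong current = new AtomicLong(); // events in the interval in progress
    private volatile long previous;                      // events in the last completed interval

    StepRateSketch(double stepSeconds) {
        this.stepSeconds = stepSeconds;
    }

    void increment() {
        current.incrementAndGet();
    }

    // Called once per polling interval: close the current interval and start a new one.
    void rollOver() {
        previous = current.getAndSet(0);
    }

    // Events per second over the last completed interval.
    double rate() {
        return previous / stepSeconds;
    }

    public static void main(String[] args) {
        StepRateSketch counter = new StepRateSketch(1.0);
        counter.increment();
        System.out.println(counter.rate()); // prints 0.0, the interval has not completed yet
        counter.rollOver();
        System.out.println(counter.rate()); // prints 1.0
    }
}
```

With a long interval, a test that reads the value too early would still see the previous interval's rate, which is why the example above shortens the interval to one second before sleeping.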

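3.2. Metric Types - Gauge

A Gauge returns the current value. The commonly used implementations are:

  • BasicGauge: reports the value returned by a supplied function
  • MaxGauge/MinGauge: track the maximum/minimum value
  • NumberGauge: wraps a Number (the wrapped Number must be thread-safe)

A basic Gauge example: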

Gauge<Double> gauge = new BasicGauge<>(MonitorConfig.builder("test")
  .build(), () -> 2.32);
assertEquals(2.32, gauge.getValue(), 0.01);

A MaxGauge example:

MaxGauge gauge = new MaxGauge(MonitorConfig.builder("test").build());
assertEquals(0, gauge.getValue().intValue());

gauge.update(4);
assertEquals(4, gauge.getCurrentValue(0));

gauge.update(1);
assertEquals(4, gauge.getCurrentValue(0)); // the maximum is still retained
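
The "keep the largest value seen" behavior can be sketched with the JDK alone. The class MaxGaugeSketch below is hypothetical and is not how Servo implements MaxGauge. It only shows one thread-safe way to track a running maximum, starting from 0 as in the example above.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration of a maximum-tracking gauge; not Servo's MaxGauge.
final class MaxGaugeSketch {
    private final AtomicLong max = new AtomicLong(0);

    // Atomically replace the stored value only if the new one is larger.
    void update(long value) {
        max.accumulateAndGet(value, Math::max);
    }

    long value() {
        return max.get();
    }

    public static void main(String[] args) {
        MaxGaugeSketch gauge = new MaxGaugeSketch();
        gauge.update(4);
        gauge.update(1);
        System.out.println(gauge.value()); // prints 4
    }
}
```

Using accumulateAndGet avoids a separate read-then-write step, so concurrent updates cannot overwrite a larger value with a smaller one.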

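3.3. Metric Types - Timer

A Timer measures how long events take. The commonly used implementations are:

  • BasicTimer: a plain timer
  • StatsTimer: a timer that provides rich statistics
  • BucketTimer: a timer that groups samples into buckets by range

A basic timer example: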

BasicTimer timer = new BasicTimer(MonitorConfig.builder("test").build(), SECONDS);
Stopwatch stopwatch = timer.start();

SECONDS.sleep(1);
timer.record(2, SECONDS);
stopwatch.stop();

assertEquals("timer should count 1 second", 1, timer.getValue().intValue());
assertEquals("timer should count 3 seconds in total", 3.0, timer.getTotalTime(), 0.01);
assertEquals("timer should record 2 updates", 2, timer.getCount().intValue());

StatsTimer (provides percentiles, variance, and more):

System.setProperty("netflix.servo", "1000");
StatsTimer timer = new StatsTimer(MonitorConfig
  .builder("test")
  .build(), new StatsConfig.Builder()
  .withComputeFrequencyMillis(2000)
  .withPercentiles(new double[] { 99.0, 95.0, 90.0 })
  .withPublishMax(true)
  .withPublishMin(true)
  .withPublishCount(true)
  .withPublishMean(true)
  .withPublishStdDev(true)
  .withPublishVariance(true)
  .build(), SECONDS);

// ... (code that records timings)

final Map<String, Number> metricMap = timer.getMonitors().stream()
  .collect(toMap(monitor -> getMonitorTagValue(monitor, "statistic"),
    monitor -> (Number) monitor.getValue()));
 
assertThat(metricMap.keySet(), containsInAnyOrder(
  "count", "totalTime", "max", "min", "variance", "stdDev", "avg", 
  "percentile_99", "percentile_95", "percentile_90"));

BucketTimer (groups samples by time range):

BucketTimer timer = new BucketTimer(MonitorConfig
  .builder("test")
  .build(), new BucketConfig.Builder()
  .withBuckets(new long[] { 2L, 5L }) // define the bucket boundaries
  .withTimeUnit(SECONDS)
  .build(), SECONDS);

timer.record(3); // falls into the 5s bucket
timer.record(6); // falls into the overflow bucket

Map<String, Long> metricMap = timer.getMonitors().stream()
  .filter(monitor -> monitor.getConfig().getTags().containsKey("servo.bucket"))
  .collect(toMap(
    m -> getMonitorTagValue(m, "servo.bucket"),
    m -> (Long) m.getValue()));

assertThat(metricMap, allOf(hasEntry("bucket=2s", 0L), hasEntry("bucket=5s", 1L),
  hasEntry("bucket=overflow", 1L)));
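
How a sample ends up in the 5s bucket or the overflow bucket can be sketched in a few lines of plain Java. This is not Servo's BucketTimer: BucketSketch is a hypothetical class, and treating each boundary as an inclusive upper bound is an assumption made for the illustration. The labels mirror the ones asserted above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of bucketed counting; not Servo's BucketTimer.
final class BucketSketch {
    private final long[] upperBounds; // ascending bucket boundaries, e.g. {2, 5}
    private final long[] counts;      // one slot per bucket plus a final overflow slot

    BucketSketch(long... upperBounds) {
        this.upperBounds = upperBounds.clone();
        this.counts = new long[upperBounds.length + 1];
    }

    // Count the sample in the first bucket whose (assumed inclusive) bound is not exceeded.
    void record(long seconds) {
        int i = 0;
        while (i < upperBounds.length && seconds > upperBounds[i]) {
            i++;
        }
        counts[i]++; // i == upperBounds.length means overflow
    }

    Map<String, Long> snapshot() {
        Map<String, Long> out = new LinkedHashMap<>();
        for (int i = 0; i < upperBounds.length; i++) {
            out.put("bucket=" + upperBounds[i] + "s", counts[i]);
        }
        out.put("bucket=overflow", counts[upperBounds.length]);
        return out;
    }

    public static void main(String[] args) {
        BucketSketch timer = new BucketSketch(2, 5);
        timer.record(3);
        timer.record(6);
        System.out.println(timer.snapshot()); // prints {bucket=2s=0, bucket=5s=1, bucket=overflow=1}
    }
}
```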

3.4. Metric Types - Informational

Informational records descriptive information. Its only implementation is BasicInformational:

BasicInformational informational = new BasicInformational(
  MonitorConfig.builder("test").build());
informational.setValue("information collected");

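3.5. MonitorRegistry

All metric types derive from Monitor. A monitor must be registered before its metrics can be published:

  • Singleton registry: through DefaultMonitorRegistry
  • Dynamic registration: with DynamicCounter/DynamicTimer
  • Annotation-based registration: through @Monitor and @MonitorTags

Basic registration: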

Gauge<Double> gauge = new BasicGauge<>(MonitorConfig.builder("test")
  .build(), () -> 2.32);
DefaultMonitorRegistry.getInstance().register(gauge);

Dynamic registration (⚠️ performance warning: every update triggers a lookup):

DynamicCounter.increment("monitor-name", "tag-key", "tag-value");
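
Where the per-update lookup cost comes from can be sketched as follows. This is a hypothetical illustration, not Servo's DynamicCounter: the class DynamicCounterSketch, its string key format, and the value helper are all assumptions. The point is that every increment has to build a key and consult a shared map before it can touch the counter.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical illustration of lookup-on-every-update; not Servo's DynamicCounter.
final class DynamicCounterSketch {
    private static final ConcurrentHashMap<String, LongAdder> COUNTERS = new ConcurrentHashMap<>();

    // Hypothetical key format combining the monitor name with one tag.
    private static String key(String name, String tagKey, String tagValue) {
        return name + "|" + tagKey + "=" + tagValue;
    }

    // Each call builds the key and looks the counter up (creating it on first use).
    static void increment(String name, String tagKey, String tagValue) {
        COUNTERS.computeIfAbsent(key(name, tagKey, tagValue), k -> new LongAdder()).increment();
    }

    static long value(String name, String tagKey, String tagValue) {
        LongAdder adder = COUNTERS.get(key(name, tagKey, tagValue));
        return adder == null ? 0 : adder.sum();
    }

    public static void main(String[] args) {
        increment("monitor-name", "tag-key", "tag-value");
        increment("monitor-name", "tag-key", "tag-value");
        System.out.println(value("monitor-name", "tag-key", "tag-value")); // prints 2
    }
}
```

A counter that is created once and kept in a field skips the key construction and the map access, which is the trade-off the warning above refers to.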

Annotation-based registration (handled automatically through reflection):

@Monitor(
  name = "integerCounter",
  type = DataSourceType.COUNTER,
  description = "Total number of update operations.")
private AtomicInteger updateCount = new AtomicInteger(0);

@MonitorTags
private TagList tags = new BasicTagList(
  newArrayList(new BasicTag("tag-key", "tag-value")));

@Test
public void givenAnnotatedMonitor_whenUpdated_thenDataCollected() throws Exception {
    System.setProperty("servo.pollers", "1000");
    Monitors.registerObject("testObject", this); // registers the annotated fields automatically
    assertTrue(Monitors.isObjectRegistered("testObject", this));

    updateCount.incrementAndGet();
    updateCount.incrementAndGet();
    SECONDS.sleep(1);

    List<List<Metric>> metrics = observer.getObservations();
    // ... (assertions)
}
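
The reflection step behind annotation-based registration can be sketched with the JDK alone. Everything here is hypothetical: the @Tracked annotation stands in for the idea of @Monitor, and AnnotationScanSketch.read is not a Servo method. It only shows how annotated fields can be discovered and read at runtime.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of annotation scanning; not Servo's Monitors.registerObject.
final class AnnotationScanSketch {
    @Retention(RetentionPolicy.RUNTIME) // must be visible to reflection at runtime
    @Target(ElementType.FIELD)
    @interface Tracked {
        String name();
    }

    static final class Service {
        @Tracked(name = "integerCounter")
        private final AtomicInteger updateCount = new AtomicInteger(0);

        private final AtomicInteger notTracked = new AtomicInteger(7); // ignored by the scan

        void update() {
            updateCount.incrementAndGet();
        }
    }

    // Collect every @Tracked field of the target whose value is a Number.
    static Map<String, Number> read(Object target) {
        Map<String, Number> values = new LinkedHashMap<>();
        for (Field field : target.getClass().getDeclaredFields()) {
            Tracked tracked = field.getAnnotation(Tracked.class);
            if (tracked == null) {
                continue;
            }
            try {
                field.setAccessible(true); // the field is private
                Object value = field.get(target);
                if (value instanceof Number) {
                    values.put(tracked.name(), (Number) value);
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("cannot read " + field.getName(), e);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        Service service = new Service();
        service.update();
        service.update();
        System.out.println(read(service)); // prints {integerCounter=2}
    }
}
```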

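4. Publishing Metrics

Once metrics are collected, they can be published to external systems through a polling mechanism.

4.1. MetricPoller

A MetricPoller fetches metrics from a data source:

  • MonitorRegistryMetricPoller: reads from the monitor registry
  • JvmMetricPoller: JVM metrics
  • JmxMetricPoller: JMX metrics
  • ✅ Extensions: Apache/Tomcat metrics

A JVM metrics polling example: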

MemoryMetricObserver observer = new MemoryMetricObserver();
PollRunnable pollRunnable = new PollRunnable(new JvmMetricPoller(),
  new BasicMetricFilter(true), observer);
PollScheduler.getInstance().start();
PollScheduler.getInstance().addPoller(pollRunnable, 1, SECONDS); // poll every second

SECONDS.sleep(1);
PollScheduler.getInstance().stop();
List<List<Metric>> metrics = observer.getObservations();

assertThat(metrics, hasSize(greaterThanOrEqualTo(1)));
List<String> keys = extractKeys(metrics);
assertThat(keys, hasItems("loadedClassCount", "initUsage", "maxUsage", "threadCount"));
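
The keys asserted above resemble values that the JDK's own platform MXBeans expose. The sketch below reads similar values directly through java.lang.management. It does not use Servo, and the mapping from these MXBean getters to Servo's metric names is an assumption made for illustration, as is the class name JvmSnapshotSketch.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of reading JVM metrics; not Servo's JvmMetricPoller.
final class JvmSnapshotSketch {
    // Take one snapshot of a few JVM values from the platform MXBeans.
    static Map<String, Number> poll() {
        Map<String, Number> metrics = new LinkedHashMap<>();
        metrics.put("loadedClassCount",
            ManagementFactory.getClassLoadingMXBean().getLoadedClassCount());
        metrics.put("threadCount",
            ManagementFactory.getThreadMXBean().getThreadCount());
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            // Assumed naming: prefix each value with the memory pool name.
            metrics.put(pool.getName() + ".initUsage", usage.getInit());
            metrics.put(pool.getName() + ".maxUsage", usage.getMax());
        }
        return metrics;
    }

    public static void main(String[] args) {
        // Values differ from run to run, so no expected output is shown.
        poll().forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```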

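4.2. MetricObserver

A MetricObserver handles the metrics that were polled:

  • MemoryMetricObserver: keeps observations in memory
  • FileMetricObserver: writes observations to files
  • AsyncMetricObserver: processes updates asynchronously
  • ✅ Extensions: Atlas/CloudWatch/Graphite

A custom observer: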

public class CustomObserver extends BaseMetricObserver {
    @Override
    public void updateImpl(List<Metric> metrics) {
        // implement the custom publishing logic here
    }
}
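
How a poller, a filter, and an observer fit together can be sketched without Servo. The types below are hypothetical stand-ins (a Supplier for the poller, a Predicate for the filter, and a small MemoryObserver class), and pollOnce performs a single pass where a scheduler would normally repeat it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical illustration of the poll -> filter -> observe flow; not Servo's PollRunnable.
final class PollPipelineSketch {
    // Keeps every batch it receives, similar in spirit to an in-memory observer.
    static final class MemoryObserver {
        private final List<Map<String, Number>> observations = new ArrayList<>();

        synchronized void update(Map<String, Number> batch) {
            observations.add(batch);
        }

        synchronized List<Map<String, Number>> observations() {
            return new ArrayList<>(observations); // defensive copy
        }
    }

    // One polling pass: pull metrics, drop what the filter rejects, hand the rest over.
    static void pollOnce(Supplier<Map<String, Number>> poller,
                         Predicate<String> filter,
                         MemoryObserver observer) {
        Map<String, Number> kept = new LinkedHashMap<>();
        poller.get().forEach((name, value) -> {
            if (filter.test(name)) {
                kept.put(name, value);
            }
        });
        observer.update(kept);
    }

    public static void main(String[] args) {
        MemoryObserver observer = new MemoryObserver();
        Map<String, Number> source = new LinkedHashMap<>();
        source.put("threadCount", 5);
        source.put("debug.tmp", 1);
        pollOnce(() -> source, name -> !name.startsWith("debug."), observer);
        System.out.println(observer.observations()); // prints [{threadCount=5}]
    }
}
```

Each pass produces one batch, which matches the shape of the List<List<Metric>> that the in-memory observer returns in the polling example earlier.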

4.3. Publishing to Netflix Atlas

Atlas is Netflix's time-series database and is a good fit for storing Servo metrics.

Add the dependency:

<dependency>
    <groupId>com.netflix.servo</groupId>
    <artifactId>servo-atlas</artifactId>
    <version>0.13.2</version>
</dependency>

Start an Atlas server:

curl -LO 'https://github.com/Netflix/atlas/releases/download/v1.4.4/atlas-1.4.4-standalone.jar'
curl -LO 'https://raw.githubusercontent.com/Netflix/atlas/v1.4.x/conf/memory.conf'
java -jar atlas-1.4.4-standalone.jar memory.conf

Configure publishing to Atlas:

System.setProperty("servo.pollers", "1000");
System.setProperty("servo.atlas.batchSize", "1");
System.setProperty("servo.atlas.uri", "http://localhost:7101/api/v1/publish");
AtlasMetricObserver observer = new AtlasMetricObserver(
  new BasicAtlasConfig(), BasicTagList.of("servo", "counter"));

PollRunnable task = new PollRunnable(
  new MonitorRegistryMetricPoller(), new BasicMetricFilter(true), observer);

Publish metrics and verify the result:

Counter counter = new BasicCounter(MonitorConfig
  .builder("test")
  .withTag("servo", "counter")
  .build());
DefaultMonitorRegistry.getInstance().register(counter);

// simulate changes to the metric
for (int i = 0; i < 3; i++) {
    counter.increment(RandomUtils.nextInt(10));
    SECONDS.sleep(1);
    counter.increment(-1 * RandomUtils.nextInt(10));
    SECONDS.sleep(1);
}

assertThat(atlasValuesOfTag("servo"), containsString("counter"));

[Figure: graph generated by Atlas]

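5. Summary

Netflix Servo offers a lightweight approach to metrics monitoring for Java:

  1. ✅ Four primary metric types that cover different scenarios
  2. ✅ Flexible registration mechanisms (singleton/dynamic/annotation)
  3. ✅ A range of publishing extensions (Atlas/CloudWatch/Graphite)

For a comparison with Dropwizard Metrics, see this article. The complete code is available on GitHub.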


Original title: Introduction to Netflix Servo | Baeldung