Introduce jitter metric based on RFC 1889 Appendix A#8586
Introduce jitter metric based on RFC 1889 Appendix A#8586GGraziadei wants to merge 8 commits intoapache:masterfrom
Conversation
|
Question on the if (d <= 0) {
return;
}Is this skip intentional? Reading RFC 3550 §A.8 (which supersedes RFC 1889 with the same text): d = transit - s->transit;
s->transit = transit;
if (d < 0) d = -d;
s->jitter += (1./16.) * ((double)d - s->jitter);The update is unconditional. When For comparison, the major RTP stacks all apply the update unconditionally:
The test Suggested fix: drop the Was the skip intentional? |
|
Hello @rzo1 thank you for your comment.
|
rzo1
left a comment
There was a problem hiding this comment.
Follow-up review now that the d <= 0 skip is fixed. Most items are nits; the ones I'd want resolved before merge are the config-key naming (smoothing_factor vs the rest of Storm's dot-case keys), the RFC 1889a metric-name suffix (load-bearing for downstream dashboards once shipped), and the validator/runtime mismatch.
One general note that doesn't fit any single line: enabling jitter triples the per-(component:stream) gauge count on every task. On topologies with lots of streams that's a meaningful cardinality bump for the metrics backend (and most TSDBs charge per series). Worth a sentence in docs/Metrics.md warning operators that enabling this multiplies stream-keyed metrics 3×.
| topology.max.spout.pending: null # ideally should be larger than topology.producer.batch.size. (esp. if topology.batch.flush.interval.millis=0) | ||
| topology.state.synchronization.timeout.secs: 60 | ||
| topology.stats.sample.rate: 0.05 | ||
| topology.stats.ewma.enable: false |
There was a problem hiding this comment.
topology.stats.ewma.smoothing_factor isn't represented here even though it's a tunable. Either add topology.stats.ewma.smoothing.factor: 0.0625 (or null with a comment about the RFC1889_ALPHA fallback) so operators can discover the knob via defaults.yaml, or call out the default in the Config.java Javadoc — right now the 1/16 default is invisible outside the source.
| * @see <a href="https://www.rfc-editor.org/rfc/rfc1889#appendix-A.8">RFC 1889 Appendix A.8</a> | ||
| */ | ||
| @CustomValidator(validatorClass = ConfigValidation.EwmaSmoothingFactorValidator.class) | ||
| public static final String TOPOLOGY_STATS_EWMA_SMOOTHING_FACTOR = "topology.stats.ewma.smoothing_factor"; |
There was a problem hiding this comment.
Underscore breaks Storm's dot.case config convention. Every neighbour key uses dots (topology.stats.sample.rate, topology.builtin.metrics.bucket.size.secs, …). Suggest topology.stats.ewma.smoothing.factor. Worth changing now — once released it's a public surface.
| @IsPositiveNumber | ||
| public static final String TOPOLOGY_STATS_SAMPLE_RATE = "topology.stats.sample.rate"; | ||
| /** | ||
| * Enabling jitter streaming calculation (RFC 1889a). |
There was a problem hiding this comment.
Minor: "RFC 1889a" isn't a real RFC label — the algorithm lives in Appendix A.8 of RFC 1889 (and was carried unchanged into RFC 3550 §A.8). Suggest "RFC 1889 §A.8" or "RFC 3550 §A.8" everywhere it appears in this PR (Javadoc, metric names, docs). RFC 1889 has been obsolete since 2003, so a forward link to RFC 3550 is friendlier to readers.
| * Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with | ||
| * this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version | ||
| * 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at | ||
| * <p> |
There was a problem hiding this comment.
<p> tags in the license header are inconsistent with the rest of storm-client (the standard ASF header uses blank lines between paragraphs — see e.g. TaskMetrics.java in this same PR). Same applies to EwmaGaugeTest.java (<p/>) and TaskMetricsTest.java (<p/>). Looks like an IDE auto-formatter artifact.
| import java.util.concurrent.atomic.AtomicLong; | ||
|
|
||
| /** | ||
| * Lock-free jitter estimator following RFC 1889 Section 6.3.1. |
There was a problem hiding this comment.
RFC 1889 §6.3.1 is the Receiver Reports section, not the jitter algorithm. The algorithm is in §A.8. Suggest: Lock-free jitter estimator following RFC 1889 §A.8 / RFC 3550 §A.8.
| private static final String METRIC_NAME_PROCESS_RFC_1889a_JITTER = "__process-rfc1889a-jitter"; | ||
| private static final String METRIC_NAME_COMPLETE_LATENCY = "__complete-latency"; | ||
| private static final String METRIC_NAME_COMPLETE_RFC_1889a_JITTER = "__complete-rfc1889a-jitter"; | ||
| private static final String METRIC_NAME_EXECUTE_LATENCY = "__execute-latency"; | ||
| private static final String METRIC_NAME_EXECUTE_RFC_1889a_JITTER = "__execute-rfc1889a-jitter"; |
There was a problem hiding this comment.
Two issues with these metric names:
rfc1889asuffix isn't a real RFC label — the algorithm is in Appendix A.8 of RFC 1889. Theareads like a version suffix. Suggest__complete-rfc1889-jitteretc., or simply__complete-jitter(the RFC reference belongs inMetrics.md, not in every metric series name).- Inconsistent identifier casing — the constants spell it
RFC_1889a(camel-case mid-word with lowercasea). Convention in this file is all-uppercase:RFC_1889_JITTERorRFC1889_JITTER.
These metric names become public API the moment the PR ships — much easier to fix now than after dashboards are wired up.
|
|
||
| private final ConcurrentMap<String, RateCounter> rateCounters = new ConcurrentHashMap<>(); | ||
| private final ConcurrentMap<String, RollingAverageGauge> gauges = new ConcurrentHashMap<>(); | ||
| private final ConcurrentMap<String, Gauge> gauges = new ConcurrentHashMap<>(); |
There was a problem hiding this comment.
Going to a raw ConcurrentMap<String, Gauge> loses type info that the previous ConcurrentMap<String, RollingAverageGauge> had. Cleaner would be ConcurrentMap<String, Gauge<?>> — that keeps the wildcard typing, lets getOrCreateGauge work unchanged, and the only raw cast moves to registerGauge where you already have @SuppressWarnings({"unchecked", "rawtypes"}).
| return getOrCreateGauge(metricName, streamId, RollingAverageGauge.class, this.rollingAverageGaugeFactory); | ||
| } | ||
|
|
||
| private EwmaGauge getExponentialWeightedMobileAverageGauge(String metricName, String streamId) { |
There was a problem hiding this comment.
Typo: "Mobile" → "Moving". EWMA = Exponentially Weighted Moving Average. (The field name ewmaSmoothingFactor is fine; just the helper method.)
| if (o instanceof Number) { | ||
| double alpha = ((Number) o).doubleValue(); | ||
| if (alpha > 0.0 && alpha < 1.0) { | ||
| return; | ||
| } | ||
| } | ||
| throw new IllegalArgumentException( | ||
| "Field " + name + " must be a number in the open interval (0, 1), got: " + o); |
There was a problem hiding this comment.
This validator only accepts Number, but ConfigUtils.ewmaSmoothingFactor calls ObjectReader.getDouble(value) which also parses strings like "0.5". Other numeric configs in Storm (e.g. topology.stats.sample.rate) accept stringified numbers from YAML, so the asymmetry is surprising — a value would be rejected at validation time but accepted at runtime if validation is bypassed (programmatic conf, tests, etc.).
Suggest mirroring what the runtime does: try ObjectReader.getDouble(o) inside a try/catch, then range-check. That way the validator and the runtime parser agree.
| @Mock private RollingAverageGauge rollingAverageGauge; | ||
| @Mock private EwmaGauge ewmaGauge; |
There was a problem hiding this comment.
These two @Mock fields are declared but never referenced anywhere in the test class — Mockito still spends init cost on them. Drop both.
What is the purpose of the change
In deterministic real-time processing, predictability of latency is as important as latency itself. This is a constraint to building a deterministic system.
To ensure negligible performance impact, I propose to use an Exponentially Weighted Moving Average (EWMA), following RFC 1889 logic https://www.rfc-editor.org/rfc/rfc1889#appendix-A.8
Mathematical Model:
J_new = J_old + (|D_current - D_previous| - J_old) * smoothing_factor
Performance impact
How was the change tested
Config,TaskMetrics,EwmaGaugemetrics2doesn't affect it.worker_log.zip
Example results in worker logs
In the context of #8583