Methodology: Why & How Optimum Gateway Collects Metrics
The methodology behind Optimum Gateway’s telemetry design comes down to three guiding principles:
Capture the Critical Path
The system sits in the hot path of validator rewards: blocks, attestations, and gossip messages passing between Ethereum CL clients and the Optimum network acceleration layer. We need visibility into three things:
- Throughput → Are we carrying the expected traffic volume?
- Latency → Are we delivering faster than vanilla libp2p?
- Reliability → Are we dropping or mangling messages?
Hence metrics are anchored at the ingress (CL → Gateway) and egress (Gateway → Optimum) points, and mirrored on the reverse path.
Measure Each Layer Separately
To debug propagation issues, we separate metrics into orthogonal categories:
Message Counters
Why: Know how much data flows through at all times.
How: Simple overall counters (libp2p_total_messages, optimum_total_messages) plus per-topic counters (_per_topic_total) to isolate heavy hitters (sketched below).
Impact: Lets operators see if certain topics (e.g., beacon_block) are lagging behind.
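As a rough illustration, here is how such counters might be wired up with the Prometheus Go client. This is a sketch under assumptions: the metric names above come from the text, but the per-topic name, the label set, and the CountIngress hook are hypothetical, and the Gateway → Optimum side would mirror the same pattern.

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

var (
	// Total gossip messages received from the CL client over libp2p.
	libp2pTotalMessages = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "libp2p_total_messages",
		Help: "Total gossip messages received from the CL client over libp2p.",
	})

	// Per-topic breakdown so heavy hitters (e.g. beacon_block) stand out.
	libp2pPerTopicTotal = prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: "libp2p_per_topic_total",
		Help: "Gossip messages received over libp2p, broken down by topic.",
	}, []string{"topic"})
)

func init() {
	prometheus.MustRegister(libp2pTotalMessages, libp2pPerTopicTotal)
}

// CountIngress is a hypothetical hook called once per message on the
// CL → Gateway path; optimum_total_messages would be incremented the same
// way on the Gateway → Optimum path.
func CountIngress(topic string) {
	libp2pTotalMessages.Inc()
	libp2pPerTopicTotal.WithLabelValues(topic).Inc()
}
```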
Latency
Why: Ethereum validator rewards depend directly on milliseconds of propagation.
How (sketched below):
- Message propagation latency: time from sender to receiver (libp2p_propagation_latency).
- Block arrival latency: time from slot start to block arrival.
- Cross-gateway ETH latency: spread between fastest and slowest gateways seeing the same block.
Impact: Makes latency bottlenecks visible at message, block, and multi-gateway levels.
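A minimal sketch of the first two latency measurements as Prometheus histograms, assuming the sender timestamp and slot start time are available to the receiving gateway. Only libp2p_propagation_latency is named in the text; the block-arrival metric name, bucket layouts, and the ObserveBlock hook are assumptions.

```go
package metrics

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

var (
	// Sender-to-receiver propagation latency, in seconds.
	propagationLatency = prometheus.NewHistogram(prometheus.HistogramOpts{
		Name:    "libp2p_propagation_latency",
		Help:    "Time from message send to receipt at the gateway (seconds).",
		Buckets: prometheus.ExponentialBuckets(0.001, 2, 14), // 1 ms up to ~8 s
	})

	// Time from the start of the slot to the block arriving at this gateway.
	blockArrivalLatency = prometheus.NewHistogram(prometheus.HistogramOpts{
		Name:    "block_arrival_latency_seconds",
		Help:    "Time from slot start to block arrival (seconds).",
		Buckets: prometheus.LinearBuckets(0, 0.25, 24), // 0 to ~6 s in 250 ms steps
	})
)

func init() {
	prometheus.MustRegister(propagationLatency, blockArrivalLatency)
}

// ObserveBlock records both latencies for a block received at recvTime.
// sentAt would come from the sender's envelope, slotStart from the beacon
// chain clock.
func ObserveBlock(sentAt, slotStart, recvTime time.Time) {
	propagationLatency.Observe(recvTime.Sub(sentAt).Seconds())
	blockArrivalLatency.Observe(recvTime.Sub(slotStart).Seconds())
}
```

The cross-gateway spread is presumably derived at query time by comparing block arrival times reported by different gateways for the same block, rather than recorded by any single gateway.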
Message Size Distribution
Why: Large gossip messages can dominate bandwidth and CPU. Outliers often indicate malformed messages or DoS attempts.
How (sketched below):
- Histograms for raw size distribution (message_size_bytes).
- Rolling 5 s windows for max/min/mean/median.
Impact: Separates “normal” beacon blocks (~300 KB) from anomalous blobs (>1 MB).
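A hedged sketch of pairing the message_size_bytes histogram with a 5 s rolling window; the bucket layout and the rollingWindow helper are illustrative assumptions, not the gateway's actual implementation.

```go
package metrics

import (
	"sort"
	"sync"
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// Histogram for the raw size distribution of gossip messages.
var messageSizeBytes = prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:    "message_size_bytes",
	Help:    "Size of gossip messages passing through the gateway (bytes).",
	Buckets: prometheus.ExponentialBuckets(256, 4, 10), // 256 B up to ~64 MB
})

func init() { prometheus.MustRegister(messageSizeBytes) }

// rollingWindow keeps the last 5 s of sizes so max/min/mean/median can be derived.
type rollingWindow struct {
	mu    sync.Mutex
	sizes []float64
	times []time.Time
}

// Observe records a message size in the histogram and the rolling window.
func (w *rollingWindow) Observe(size float64) {
	messageSizeBytes.Observe(size)

	w.mu.Lock()
	defer w.mu.Unlock()
	now := time.Now()
	w.sizes, w.times = append(w.sizes, size), append(w.times, now)
	// Evict samples older than the 5 s window.
	cutoff := now.Add(-5 * time.Second)
	for len(w.times) > 0 && w.times[0].Before(cutoff) {
		w.sizes, w.times = w.sizes[1:], w.times[1:]
	}
}

// Median of the current window; zero if the window is empty.
func (w *rollingWindow) Median() float64 {
	w.mu.Lock()
	defer w.mu.Unlock()
	if len(w.sizes) == 0 {
		return 0
	}
	s := append([]float64(nil), w.sizes...)
	sort.Float64s(s)
	return s[len(s)/2]
}
```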
Aggregation Efficiency
Why: The Gateway batches messages every ~25 ms (the interval can change) before sending them into the Optimum network. Batching efficiency directly affects throughput and latency.
How (sketched below):
- aggregation_included_total counts messages per batch.
- aggregation_message_size_bytes tracks how big aggregated blobs are.
Impact: Operators can tune the batch size/interval tradeoff (more batching = higher throughput but higher latency).
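A sketch of recording both aggregation metrics once per flushed batch. The text doesn't say whether aggregation_included_total is a plain counter or a per-batch histogram, so this treats it as a counter incremented by the batch size; the bucket layout and the RecordBatch hook are assumptions.

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

var (
	// Running total of messages folded into aggregated batches.
	aggregationIncludedTotal = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "aggregation_included_total",
		Help: "Total messages included in aggregated batches.",
	})

	// Size distribution of aggregated blobs sent into the Optimum network.
	aggregationMessageSizeBytes = prometheus.NewHistogram(prometheus.HistogramOpts{
		Name:    "aggregation_message_size_bytes",
		Help:    "Size of aggregated blobs sent into the Optimum network (bytes).",
		Buckets: prometheus.ExponentialBuckets(1024, 4, 10), // 1 KiB up to ~256 MiB
	})
)

func init() {
	prometheus.MustRegister(aggregationIncludedTotal, aggregationMessageSizeBytes)
}

// RecordBatch is a hypothetical hook called once per ~25 ms flush.
func RecordBatch(messageCount, batchBytes int) {
	aggregationIncludedTotal.Add(float64(messageCount))
	aggregationMessageSizeBytes.Observe(float64(batchBytes))
}
```

Dividing the two series (bytes per batch over messages per batch) gives a quick view of how well the batching interval is being used at the current traffic level.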
Peer Connectivity
Why: No peers = no propagation. Peer churn indicates network health issues.
How: Gauges for current peer counts, counters for connect/disconnect events (sketched below).
Impact: Provides alerting hooks (e.g., “peer count dropped to zero”).
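A small sketch of the gauge/counter split; only that split is from the text, while the metric names (peer_count, peer_events_total) and the connection hooks are assumptions.

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

var (
	// Current number of connected peers; a gauge because it moves both ways.
	peerCount = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "peer_count",
		Help: "Number of currently connected peers.",
	})

	// Connect/disconnect counters expose churn even when the gauge looks stable.
	peerEvents = prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: "peer_events_total",
		Help: "Peer connect and disconnect events.",
	}, []string{"event"}) // event = "connect" | "disconnect"
)

func init() { prometheus.MustRegister(peerCount, peerEvents) }

// Hypothetical hooks wired into the gateway's connection notifier.
func OnPeerConnected()    { peerCount.Inc(); peerEvents.WithLabelValues("connect").Inc() }
func OnPeerDisconnected() { peerCount.Dec(); peerEvents.WithLabelValues("disconnect").Inc() }
```

An alert such as “peer count dropped to zero” can then be expressed directly against the gauge, while the event counters distinguish a quiet network from one that is rapidly churning.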
Error / Quality Signals
Why: If we’re passing garbage, propagation speed doesn’t matter.
How: Counters for “bad messages” (decode failures, failed publishes), sketched below.
Impact: Helps detect malformed messages, protocol drift, or attack attempts.
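A sketch of reason-labeled bad-message counters; the metric name bad_messages_total, the reason values, and the RecordBadMessage hook are illustrative assumptions.

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

// badMessages counts messages the gateway could not handle, labeled by failure reason.
var badMessages = prometheus.NewCounterVec(prometheus.CounterOpts{
	Name: "bad_messages_total",
	Help: "Messages dropped or rejected by the gateway, by reason.",
}, []string{"reason"}) // e.g. "decode_failure", "publish_failure"

func init() { prometheus.MustRegister(badMessages) }

// RecordBadMessage is a hypothetical hook called wherever decoding or
// publishing fails; a sudden spike on one reason is an early-warning signal.
func RecordBadMessage(reason string) {
	badMessages.WithLabelValues(reason).Inc()
}
```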
Enable Operations & Research
These metrics are designed not just for “uptime dashboards” but for research-grade measurements:
- Validator ROI: Combine block arrival latency + inclusion stats to estimate ETH/year impact.
- Protocol Research: Compare gossip vs. the mumP2P protocol on throughput and delay distributions.
- Operations: Peer counts, error rates, and bad messages support on-call debugging.
- Security: Message size anomalies + bad message counters act as an early warning system.
Check out the metrics section.