Methodology: Why & How Optimum Gateway Collects Metrics
The methodology behind Optimum Gateway’s telemetry design comes down to three guiding principles:
Capture the Critical Path
The system sits in the hot path of validator rewards: blocks, attestations, and gossip messages passing between Ethereum CL clients and the Optimum network acceleration layer. We need visibility into three things:
- Throughput → Are we carrying the expected traffic volume?
- Latency → Are we delivering faster than vanilla libp2p?
- Reliability → Are we dropping or mangling messages?
Metrics are therefore anchored at the ingress (CL → Gateway) and egress (Gateway → Optimum) points, and mirrored on the reverse path.
Measure Each Layer Separately
To debug propagation issues, we separate metrics into orthogonal categories:
Message Counters
Why: Know how much data is flowing through at all times.
How: Global counters (libp2p_total_messages, optimum_total_messages), plus per-topic counters (_per_topic_total) to isolate heavy hitters.
Impact: Lets operators see if certain topics (e.g., beacon_block) are lagging behind.
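A minimal in-process sketch of the two counter layers. The metric names above are Prometheus-style; this plain-Python store is an illustrative stand-in, not the gateway's actual implementation:

```python
from collections import defaultdict

class MessageCounters:
    """Global and per-topic monotonic counters for gossip traffic."""
    def __init__(self):
        self.total = 0                     # stand-in for libp2p_total_messages
        self.per_topic = defaultdict(int)  # stand-in for *_per_topic_total

    def observe(self, topic: str) -> None:
        self.total += 1
        self.per_topic[topic] += 1

    def heavy_hitters(self, n: int = 3):
        """Topics carrying the most traffic, to isolate laggards."""
        return sorted(self.per_topic.items(), key=lambda kv: -kv[1])[:n]

c = MessageCounters()
for t in ["beacon_block", "beacon_attestation", "beacon_attestation"]:
    c.observe(t)
print(c.total)             # 3
print(c.heavy_hitters(1))  # [('beacon_attestation', 2)]
```

Keeping the global and per-topic counters separate means the cheap global counter answers "how much traffic?" while the labeled counters answer "which topic?" without a full scan.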
Latency
Why: Ethereum validator rewards depend directly on milliseconds of propagation.
How:
- Message propagation latency: time from sender to receiver (libp2p_propagation_latency).
- Block arrival latency: time from slot start to block arrival.
- Cross-gateway ETH latency: spread between the fastest and slowest gateways seeing the same block.
Impact: Makes latency bottlenecks visible at the message, block, and multi-gateway levels.
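The three latencies above can be sketched as follows. The genesis timestamp and 12-second slot length are Ethereum mainnet constants; the gateway IDs and the shape of the arrivals map are assumptions for illustration:

```python
GENESIS_TIME = 1606824023   # Ethereum mainnet genesis (unix seconds)
SECONDS_PER_SLOT = 12       # mainnet slot length

def slot_start(slot: int) -> float:
    """Unix timestamp of the slot boundary."""
    return GENESIS_TIME + slot * SECONDS_PER_SLOT

def propagation_latency(sent_at: float, received_at: float) -> float:
    """Sender -> receiver delay (libp2p_propagation_latency)."""
    return received_at - sent_at

def block_arrival_latency(slot: int, arrived_at: float) -> float:
    """Time from the slot boundary to block arrival at this gateway."""
    return arrived_at - slot_start(slot)

def cross_gateway_spread(arrivals: dict) -> float:
    """Spread between fastest and slowest gateway seeing the same block."""
    return max(arrivals.values()) - min(arrivals.values())

# Hypothetical arrival times for one block at slot 100 on two gateways:
arrivals = {"gw-eu": slot_start(100) + 0.8, "gw-us": slot_start(100) + 1.3}
print(round(cross_gateway_spread(arrivals), 3))  # 0.5
```

Note that propagation and cross-gateway measurements compare timestamps from different machines, so in practice they are only as good as the clock synchronization (e.g., NTP) between hosts.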
Message Size Distribution
Why: Large gossip messages can dominate bandwidth and CPU, and outliers often indicate malformed messages or DoS attempts.
How:
- Histograms for the raw size distribution (message_size_bytes).
- Rolling 5 s windows for max/min/mean/median.
Impact: Separates "normal" beacon blocks (~300 KB) from anomalous blobs (>1 MB).
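A sketch of the rolling-window half of this, under the assumption that samples are simply pruned by age (the real gateway may implement the window differently). The injectable clock exists only to make the expiry behavior testable:

```python
import time
from collections import deque
from statistics import mean, median

class RollingSizeWindow:
    """Rolling window of message sizes for max/min/mean/median stats."""
    def __init__(self, window_s: float = 5.0, clock=time.monotonic):
        self.window_s = window_s
        self.clock = clock
        self.samples = deque()   # (timestamp, size_bytes), oldest first

    def observe(self, size_bytes: int) -> None:
        now = self.clock()
        self.samples.append((now, size_bytes))
        # Drop samples older than the window.
        while self.samples and now - self.samples[0][0] > self.window_s:
            self.samples.popleft()

    def stats(self) -> dict:
        sizes = [s for _, s in self.samples] or [0]   # zeros when empty
        return {"max": max(sizes), "min": min(sizes),
                "mean": mean(sizes), "median": median(sizes)}

t = [0.0]
w = RollingSizeWindow(clock=lambda: t[0])
w.observe(300_000)        # typical beacon block
t[0] = 6.0                # advance past the 5 s window
w.observe(1_200_000)      # anomalous blob-sized message
print(w.stats()["max"])   # 1200000 -- the old sample has expired
```

The histogram captures the long-term distribution; the rolling window is what makes a sudden burst of oversized messages visible within seconds rather than after the fact.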
Aggregation Efficiency
Why: The gateway batches messages roughly every 25 ms (the interval is configurable) before sending them into the Optimum network. Batching efficiency directly affects throughput and latency.
How:
- aggregation_included_total counts messages per batch.
- aggregation_message_size_bytes tracks how big the aggregated blobs are.
Impact: Operators can tune the batch size/interval tradeoff (more batching = higher throughput but higher latency).
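A toy aggregator showing where those two metrics are recorded. The flush logic here (flush on submit once the interval has elapsed) is a simplification of whatever timer-driven batching the gateway actually uses; only the metric names come from the text:

```python
import time

class Aggregator:
    """Batches messages for ~25 ms before flushing as one blob."""
    def __init__(self, flush_interval_s: float = 0.025, clock=time.monotonic):
        self.flush_interval_s = flush_interval_s
        self.clock = clock
        self.pending = []
        self.last_flush = clock()
        # Metric stand-ins:
        self.aggregation_included_total = 0   # messages included, cumulative
        self.aggregation_message_sizes = []   # size of each aggregated blob

    def submit(self, msg: bytes):
        """Queue a message; flush if the batching interval has elapsed."""
        self.pending.append(msg)
        if self.clock() - self.last_flush >= self.flush_interval_s:
            return self.flush()
        return None

    def flush(self) -> bytes:
        blob = b"".join(self.pending)
        self.aggregation_included_total += len(self.pending)
        self.aggregation_message_sizes.append(len(blob))
        self.pending.clear()
        self.last_flush = self.clock()
        return blob

t = [0.0]
agg = Aggregator(clock=lambda: t[0])
agg.submit(b"a" * 10)       # queued, interval not yet elapsed
t[0] = 0.03                 # past the 25 ms interval
blob = agg.submit(b"b" * 20)
print(len(blob))            # 30: both messages went out in one batch
```

Dividing aggregation_message_sizes by aggregation_included_total per batch gives the average payload per batched message, which is the number an operator watches when tuning the interval.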
Peer Connectivity
Why: No peers = no propagation, and peer churn indicates network health issues.
How: Gauges for current peer counts; counters for connect/disconnect events.
Impact: Provides alerting hooks (e.g., "peer count dropped to zero").
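The gauge/counter split matters here: the gauge answers "how many peers right now?" while the monotonic counters expose churn even when the gauge looks stable. A sketch, with a hypothetical on_zero callback standing in for the alerting hook:

```python
class PeerMetrics:
    """Gauge for current peers; monotonic counters for churn events."""
    def __init__(self, on_zero=None):
        self.current_peers = 0      # gauge: goes up and down
        self.connects_total = 0     # counter: only ever increases
        self.disconnects_total = 0  # counter: only ever increases
        self.on_zero = on_zero      # alerting hook (illustrative assumption)

    def peer_connected(self) -> None:
        self.current_peers += 1
        self.connects_total += 1

    def peer_disconnected(self) -> None:
        self.current_peers -= 1
        self.disconnects_total += 1
        if self.current_peers == 0 and self.on_zero:
            self.on_zero()

alerts = []
pm = PeerMetrics(on_zero=lambda: alerts.append("peer count dropped to zero"))
pm.peer_connected()
pm.peer_disconnected()
print(alerts)   # ['peer count dropped to zero']
```

High connects_total with a flat gauge is the churn signature: peers are cycling even though the instantaneous count looks healthy.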
Error / Quality Signals
Why: If we’re passing garbage, propagation speed doesn’t matter.
How: Counters for “bad messages” (decode failures, failed publishes).
Impact: Helps detect malformed messages, protocol drift, or attack attempts.
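The key design point is that the error path increments a counter instead of raising, so one malformed message never stalls the pipeline. A sketch; the JSON payload is a stand-in for whatever wire format the gateway actually decodes:

```python
import json

class QualityCounters:
    """Error-path counters: decode failures and failed publishes."""
    def __init__(self):
        self.decode_failures_total = 0
        self.publish_failures_total = 0

def handle_raw(raw: bytes, counters: QualityCounters):
    """Decode a payload, counting (not raising on) bad messages."""
    try:
        return json.loads(raw)
    except ValueError:
        counters.decode_failures_total += 1
        return None

qc = QualityCounters()
ok = handle_raw(b'{"slot": 100}', qc)   # well-formed
bad = handle_raw(b"not json", qc)       # malformed: counted, not raised
print(qc.decode_failures_total)         # 1
```

A sustained non-zero rate on these counters is what distinguishes "the network is slow" from "we are relaying garbage", which changes the on-call response entirely.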
Enable Operations & Research
These metrics are designed not just for “uptime dashboards” but for research-grade measurements:
- Validator ROI: Combine block arrival latency with inclusion stats to estimate ETH/year impact.
- Protocol research: Compare the gossip and mump2p protocols on throughput and delay distributions.
- Operations: Peer counts, error rates, and bad-message counters support on-call debugging.
- Security: Message size anomalies plus bad-message counters act as an early warning system.
Check out the metrics section for details.

