
Methodology: Why & How Optimum Gateway Collects Metrics

The methodology behind Optimum Gateway’s telemetry design comes down to three guiding principles:

Capture the Critical Path

The system sits in the hot path of validator rewards: blocks, attestations, and gossip messages passing between Ethereum CL clients and the Optimum network acceleration layer. We need visibility into three things:

  • Throughput → Are we carrying the expected traffic volume?
  • Latency → Are we delivering faster than vanilla libp2p?
  • Reliability → Are we dropping or mangling messages?

Hence metrics are anchored at the ingress (CL → Gateway) and egress (Gateway → Optimum) points, and mirrored on the reverse path.
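One way to realize this anchoring is a single counter family labelled by path, incremented from a hook at each of the four points. This is only a sketch: the metric name, label values, and hook functions below are illustrative, not the Gateway’s actual code.

```go
package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// One counter family, labelled by path, covers all four anchor points,
// so the forward and reverse directions can be compared series-by-series.
var messagesByPath = promauto.NewCounterVec(prometheus.CounterOpts{
	Name: "gateway_messages_total", // hypothetical name
	Help: "Messages observed at each ingress/egress anchor point.",
}, []string{"path"})

// Each hook increments its own series.
func onCLIngress()      { messagesByPath.WithLabelValues("cl_to_gateway").Inc() }
func onOptimumEgress()  { messagesByPath.WithLabelValues("gateway_to_optimum").Inc() }
func onOptimumIngress() { messagesByPath.WithLabelValues("optimum_to_gateway").Inc() }
func onCLEgress()       { messagesByPath.WithLabelValues("gateway_to_cl").Inc() }
```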

Measure Each Layer Separately

To debug propagation issues, we separate metrics into orthogonal categories:

Message Counters

  • Why: Know how much data flows through at all times.
  • How: Simple total counters (libp2p_total_messages, optimum_total_messages), plus per-topic counters (_per_topic_total) to isolate heavy hitters; see the sketch after this list.
  • Impact: Lets operators see if certain topics (e.g., beacon_block) are lagging behind.
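A minimal sketch of how these counters could be wired up with the Prometheus Go client. The CountLibp2pMessage helper is hypothetical; the metric names follow the ones quoted above.

```go
package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var (
	// Overall volume on the libp2p side (CL ↔ Gateway).
	libp2pTotal = promauto.NewCounter(prometheus.CounterOpts{
		Name: "libp2p_total_messages",
		Help: "Total gossip messages seen on the libp2p side.",
	})
	// Per-topic breakdown to spot heavy hitters such as beacon_block.
	libp2pPerTopic = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "libp2p_per_topic_total",
		Help: "Gossip messages seen on the libp2p side, by topic.",
	}, []string{"topic"})
)

// CountLibp2pMessage is called once per message received from the CL client.
func CountLibp2pMessage(topic string) {
	libp2pTotal.Inc()
	libp2pPerTopic.WithLabelValues(topic).Inc()
}
```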

Latency

  • Why: Ethereum validator rewards depend directly on milliseconds of propagation.
  • How:
    • Message propagation latency: time from sender to receiver (libp2p_propagation_latency).
    • Block arrival latency: time from slot start to block arrival.
    • Cross-gateway ETH latency: spread between fastest and slowest gateways seeing the same block.
  • Impact: Makes latency bottlenecks visible at message, block, and multi-gateway levels.
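Block arrival latency reduces to “arrival time minus the slot’s scheduled start”. A hedged sketch, assuming Ethereum mainnet’s 12-second slot time; the histogram name and the ObserveBlockArrival helper are illustrative:

```go
package metrics

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

const slotDuration = 12 * time.Second // Ethereum mainnet slot time

var blockArrivalLatency = promauto.NewHistogram(prometheus.HistogramOpts{
	Name:    "block_arrival_latency_seconds", // hypothetical name
	Help:    "Time from slot start to block arrival at the gateway.",
	Buckets: prometheus.ExponentialBuckets(0.05, 2, 12), // 50 ms .. ~100 s
})

// ObserveBlockArrival records how long after its slot started a block reached us.
// genesis is the beacon chain genesis time; slot is the block's slot number.
func ObserveBlockArrival(genesis time.Time, slot uint64, arrival time.Time) {
	slotStart := genesis.Add(time.Duration(slot) * slotDuration)
	blockArrivalLatency.Observe(arrival.Sub(slotStart).Seconds())
}
```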

Message Size Distribution

  • Why: Large gossip messages can dominate bandwidth and CPU. Outliers often indicate malformed messages or DoS attempts.
  • How:
    • Histograms for raw size distribution (message_size_bytes).
    • Rolling 5 s windows for max/min/mean/median.
  • Impact: Separates “normal” beacon blocks (~300 KB) from anomalous blobs (>1 MB).
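The rolling window can be a small in-memory buffer pruned on every observation. A sketch of one possible implementation; the SizeWindow type and its methods are illustrative, not the Gateway’s actual code:

```go
package metrics

import (
	"sort"
	"sync"
	"time"
)

// sample is one observed message size with its arrival time.
type sample struct {
	at   time.Time
	size int
}

// SizeWindow keeps the sizes seen in the last `window` (e.g. 5 s) and
// derives max/min/mean/median over that rolling window.
type SizeWindow struct {
	mu      sync.Mutex
	window  time.Duration
	samples []sample
}

func NewSizeWindow(window time.Duration) *SizeWindow {
	return &SizeWindow{window: window}
}

// Observe records a message size and drops samples older than the window.
func (w *SizeWindow) Observe(size int) {
	w.mu.Lock()
	defer w.mu.Unlock()
	now := time.Now()
	w.samples = append(w.samples, sample{at: now, size: size})
	cutoff := now.Add(-w.window)
	i := 0
	for i < len(w.samples) && w.samples[i].at.Before(cutoff) {
		i++
	}
	w.samples = w.samples[i:]
}

// Stats returns max, min, mean and median size over the current window.
func (w *SizeWindow) Stats() (max, min int, mean, median float64) {
	w.mu.Lock()
	defer w.mu.Unlock()
	if len(w.samples) == 0 {
		return 0, 0, 0, 0
	}
	sizes := make([]int, len(w.samples))
	sum := 0
	for i, s := range w.samples {
		sizes[i] = s.size
		sum += s.size
	}
	sort.Ints(sizes)
	max = sizes[len(sizes)-1]
	min = sizes[0]
	mean = float64(sum) / float64(len(sizes))
	mid := len(sizes) / 2
	if len(sizes)%2 == 1 {
		median = float64(sizes[mid])
	} else {
		median = float64(sizes[mid-1]+sizes[mid]) / 2
	}
	return
}
```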

Aggregation Efficiency

  • Why: The Gateway batches messages every ~25 ms (the interval can change) before sending them into the Optimum network. Batching efficiency directly affects throughput and latency (see the sketch below).
  • How:
    • aggregation_included_total counts messages per batch.
    • aggregation_message_size_bytes tracks how big aggregated blobs are.
  • Impact: Operators can tune batch size/interval tradeoffs (more batching = higher throughput but higher latency).
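A sketch of what the batching loop might look like: a ticker-driven flush that records both metrics named above. The batchLoop and send functions are assumptions for illustration, not the Gateway’s actual code.

```go
package metrics

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var (
	aggregationIncluded = promauto.NewCounter(prometheus.CounterOpts{
		Name: "aggregation_included_total",
		Help: "Messages included into aggregated batches.",
	})
	aggregationSize = promauto.NewHistogram(prometheus.HistogramOpts{
		Name:    "aggregation_message_size_bytes",
		Help:    "Size of aggregated blobs sent to the Optimum network.",
		Buckets: prometheus.ExponentialBuckets(1024, 2, 14), // 1 KiB .. ~8 MiB
	})
)

// batchLoop collects incoming messages and flushes them every interval
// (~25 ms by default), recording batch size and aggregated blob size.
func batchLoop(in <-chan []byte, send func([][]byte), interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	var batch [][]byte
	flush := func() {
		if len(batch) == 0 {
			return
		}
		total := 0
		for _, m := range batch {
			total += len(m)
		}
		aggregationIncluded.Add(float64(len(batch)))
		aggregationSize.Observe(float64(total))
		send(batch)
		batch = nil
	}

	for {
		select {
		case msg, ok := <-in:
			if !ok {
				flush()
				return
			}
			batch = append(batch, msg)
		case <-ticker.C:
			flush()
		}
	}
}
```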

Peer Connectivity

  • Why: No peers = no propagation. Peer churn indicates network health issues.
  • How: Gauges for current peer counts, counters for connect/disconnect events.
  • Impact: Provides alerting hooks (e.g., “peer count dropped to zero”).
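With go-libp2p, one way to do this is a network.NotifyBundle that bumps a gauge and per-event counters on connect/disconnect. The metric names and the TrackPeers helper are assumptions:

```go
package metrics

import (
	"github.com/libp2p/go-libp2p/core/host"
	"github.com/libp2p/go-libp2p/core/network"
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var (
	// Connected/Disconnected fire per connection, so this gauge tracks
	// open connections, a close proxy for peer count.
	peerCount = promauto.NewGauge(prometheus.GaugeOpts{
		Name: "libp2p_peer_count",
		Help: "Currently open libp2p connections.",
	})
	peerEvents = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "libp2p_peer_events_total",
		Help: "Peer connect/disconnect events.",
	}, []string{"event"})
)

// TrackPeers wires the gauge and churn counters to the host's
// connection notifications.
func TrackPeers(h host.Host) {
	h.Network().Notify(&network.NotifyBundle{
		ConnectedF: func(_ network.Network, _ network.Conn) {
			peerCount.Inc()
			peerEvents.WithLabelValues("connect").Inc()
		},
		DisconnectedF: func(_ network.Network, _ network.Conn) {
			peerCount.Dec()
			peerEvents.WithLabelValues("disconnect").Inc()
		},
	})
}
```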

Error / Quality Signals

  • Why: If we’re passing garbage, propagation speed doesn’t matter.
  • How: Counters for “bad messages” (decode failures, failed publishes).
  • Impact: Helps detect malformed messages, protocol drift, or attack attempts.
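A minimal sketch of where such counters get incremented, at the decode and publish steps; the handleIncoming helper and metric name are hypothetical:

```go
package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// badMessages counts quality problems by reason, e.g. decode failures
// or failed publishes.
var badMessages = promauto.NewCounterVec(prometheus.CounterOpts{
	Name: "bad_messages_total",
	Help: "Messages that failed to decode or publish, by reason.",
}, []string{"reason"})

// handleIncoming decodes a raw gossip payload and republishes it,
// counting each failure mode separately.
func handleIncoming(raw []byte, decode func([]byte) (any, error), publish func(any) error) {
	msg, err := decode(raw)
	if err != nil {
		badMessages.WithLabelValues("decode_failure").Inc()
		return
	}
	if err := publish(msg); err != nil {
		badMessages.WithLabelValues("publish_failure").Inc()
	}
}
```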

Enable Operations & Research

These metrics are designed not just for “uptime dashboards” but for research-grade measurements:

  • Validator ROI: Combine block arrival latency + inclusion stats to estimate ETH/year impact.
  • Protocol Research: Compare gossip vs. mumP2P on throughput and delay distributions.
  • Operations: Peer counts, error rates, and bad messages support on-call debugging.
  • Security: Message size anomalies + bad message counters act as an early warning system.

Check out the Metrics section.