
Methodology: Why & How Optimum Gateway Collects Metrics

The methodology behind Optimum Gateway’s telemetry design comes down to three guiding principles:

Capture the Critical Path

The system sits in the hot path of validator rewards: blocks, attestations, and gossip messages passing between Ethereum CL clients and the Optimum network acceleration layer. We need visibility into three things:

  • Throughput → Are we carrying the expected traffic volume?
  • Latency → Are we delivering faster than vanilla libp2p?
  • Reliability → Are we dropping or mangling messages?

Hence metrics are anchored at the ingress (CL → Gateway) and egress (Gateway → Optimum) points, and mirrored on the reverse path.
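One way to realize this anchoring is a single counter family labelled by path, incremented from a hook at each of the four points. This is only a sketch: the metric name, label values, and hook functions below are illustrative, not the Gateway’s actual code.

```go
package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// One counter family, labelled by path, covers all four anchor points,
// so the forward and reverse directions can be compared series-by-series.
var messagesByPath = promauto.NewCounterVec(prometheus.CounterOpts{
	Name: "gateway_messages_total", // hypothetical name
	Help: "Messages observed at each ingress/egress anchor point.",
}, []string{"path"})

// Each hook increments its own series.
func onCLIngress()      { messagesByPath.WithLabelValues("cl_to_gateway").Inc() }
func onOptimumEgress()  { messagesByPath.WithLabelValues("gateway_to_optimum").Inc() }
func onOptimumIngress() { messagesByPath.WithLabelValues("optimum_to_gateway").Inc() }
func onCLEgress()       { messagesByPath.WithLabelValues("gateway_to_cl").Inc() }
```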

Measure Each Layer Separately

To debug propagation issues, we separate metrics into orthogonal categories:

Message Counters

  • Why: Know how much data flows through at all times.
  • How: Simple total counters (libp2p_total_messages, optimum_total_messages), plus per-topic counters (_per_topic_total) to isolate heavy hitters; see the sketch after this list.
  • Impact: Lets operators see if certain topics (e.g., beacon_block) are lagging behind.
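A minimal sketch of how these counters could be wired up with the Prometheus Go client. The CountLibp2pMessage helper is hypothetical; the metric names follow the ones quoted above.

```go
package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var (
	// Overall volume on the libp2p side (CL ↔ Gateway).
	libp2pTotal = promauto.NewCounter(prometheus.CounterOpts{
		Name: "libp2p_total_messages",
		Help: "Total gossip messages seen on the libp2p side.",
	})
	// Per-topic breakdown to spot heavy hitters such as beacon_block.
	libp2pPerTopic = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "libp2p_per_topic_total",
		Help: "Gossip messages seen on the libp2p side, by topic.",
	}, []string{"topic"})
)

// CountLibp2pMessage is called once per message received from the CL client.
func CountLibp2pMessage(topic string) {
	libp2pTotal.Inc()
	libp2pPerTopic.WithLabelValues(topic).Inc()
}
```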

Latency

  • Why: Ethereum validator rewards depend directly on milliseconds of propagation.
  • How:
    • Message propagation latency: time from sender to receiver (libp2p_propagation_latency).
    • Block arrival latency: time from slot start to block arrival.
    • Cross-gateway ETH latency: spread between fastest and slowest gateways seeing the same block.
  • Impact: Makes latency bottlenecks visible at message, block, and multi-gateway levels.
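Block arrival latency reduces to “arrival time minus the slot’s scheduled start”. A hedged sketch, assuming Ethereum mainnet’s 12-second slot time; the histogram name and the ObserveBlockArrival helper are illustrative:

```go
package metrics

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

const slotDuration = 12 * time.Second // Ethereum mainnet slot time

var blockArrivalLatency = promauto.NewHistogram(prometheus.HistogramOpts{
	Name:    "block_arrival_latency_seconds", // hypothetical name
	Help:    "Time from slot start to block arrival at the gateway.",
	Buckets: prometheus.ExponentialBuckets(0.05, 2, 12), // 50 ms .. ~100 s
})

// ObserveBlockArrival records how long after its slot started a block reached us.
// genesis is the beacon chain genesis time; slot is the block's slot number.
func ObserveBlockArrival(genesis time.Time, slot uint64, arrival time.Time) {
	slotStart := genesis.Add(time.Duration(slot) * slotDuration)
	blockArrivalLatency.Observe(arrival.Sub(slotStart).Seconds())
}
```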

Message Size Distribution

  • Why: Large gossip messages can dominate bandwidth and CPU. Outliers often indicate malformed messages or DoS attempts.
  • How:
    • Histograms for raw size distribution (message_size_bytes).
    • Rolling 5 s windows for max/min/mean/median.
  • Impact: Separates “normal” beacon blocks (~300 KB) from anomalous blobs (>1 MB).
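The rolling window can be a small in-memory buffer pruned on every observation. A sketch of one possible implementation; the SizeWindow type and its methods are illustrative, not the Gateway’s actual code:

```go
package metrics

import (
	"sort"
	"sync"
	"time"
)

// sample is one observed message size with its arrival time.
type sample struct {
	at   time.Time
	size int
}

// SizeWindow keeps the sizes seen in the last `window` (e.g. 5 s) and
// derives max/min/mean/median over that rolling window.
type SizeWindow struct {
	mu      sync.Mutex
	window  time.Duration
	samples []sample
}

func NewSizeWindow(window time.Duration) *SizeWindow {
	return &SizeWindow{window: window}
}

// Observe records a message size and drops samples older than the window.
func (w *SizeWindow) Observe(size int) {
	w.mu.Lock()
	defer w.mu.Unlock()
	now := time.Now()
	w.samples = append(w.samples, sample{at: now, size: size})
	cutoff := now.Add(-w.window)
	i := 0
	for i < len(w.samples) && w.samples[i].at.Before(cutoff) {
		i++
	}
	w.samples = w.samples[i:]
}

// Stats returns max, min, mean and median size over the current window.
func (w *SizeWindow) Stats() (max, min int, mean, median float64) {
	w.mu.Lock()
	defer w.mu.Unlock()
	if len(w.samples) == 0 {
		return 0, 0, 0, 0
	}
	sizes := make([]int, len(w.samples))
	sum := 0
	for i, s := range w.samples {
		sizes[i] = s.size
		sum += s.size
	}
	sort.Ints(sizes)
	max = sizes[len(sizes)-1]
	min = sizes[0]
	mean = float64(sum) / float64(len(sizes))
	mid := len(sizes) / 2
	if len(sizes)%2 == 1 {
		median = float64(sizes[mid])
	} else {
		median = float64(sizes[mid-1]+sizes[mid]) / 2
	}
	return
}
```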

Aggregation Efficiency

  • Why: The Gateway batches messages every ~25 ms (the interval can change) before sending them into the Optimum network. Batching efficiency directly affects throughput and latency (see the sketch below).
  • How:
    • aggregation_included_total counts messages per batch.
    • aggregation_message_size_bytes tracks how big aggregated blobs are.
  • Impact: Operators can tune batch size/interval tradeoffs (more batching = higher throughput but higher latency).
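A sketch of what the batching loop might look like: a ticker-driven flush that records both metrics named above. The batchLoop and send functions are assumptions for illustration, not the Gateway’s actual code.

```go
package metrics

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var (
	aggregationIncluded = promauto.NewCounter(prometheus.CounterOpts{
		Name: "aggregation_included_total",
		Help: "Messages included into aggregated batches.",
	})
	aggregationSize = promauto.NewHistogram(prometheus.HistogramOpts{
		Name:    "aggregation_message_size_bytes",
		Help:    "Size of aggregated blobs sent to the Optimum network.",
		Buckets: prometheus.ExponentialBuckets(1024, 2, 14), // 1 KiB .. ~8 MiB
	})
)

// batchLoop collects incoming messages and flushes them every interval
// (~25 ms by default), recording batch size and aggregated blob size.
func batchLoop(in <-chan []byte, send func([][]byte), interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	var batch [][]byte
	flush := func() {
		if len(batch) == 0 {
			return
		}
		total := 0
		for _, m := range batch {
			total += len(m)
		}
		aggregationIncluded.Add(float64(len(batch)))
		aggregationSize.Observe(float64(total))
		send(batch)
		batch = nil
	}

	for {
		select {
		case msg, ok := <-in:
			if !ok {
				flush()
				return
			}
			batch = append(batch, msg)
		case <-ticker.C:
			flush()
		}
	}
}
```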

Peer Connectivity

  • Why: No peers = no propagation. Peer churn indicates network health issues.
  • How: Gauges for current peer counts, counters for connect/disconnect events.
  • Impact: Provides alerting hooks (e.g., “peer count dropped to zero”).
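With go-libp2p, one way to do this is a network.NotifyBundle that bumps a gauge and per-event counters on connect/disconnect. The metric names and the TrackPeers helper are assumptions:

```go
package metrics

import (
	"github.com/libp2p/go-libp2p/core/host"
	"github.com/libp2p/go-libp2p/core/network"
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var (
	// Connected/Disconnected fire per connection, so this gauge tracks
	// open connections, a close proxy for peer count.
	peerCount = promauto.NewGauge(prometheus.GaugeOpts{
		Name: "libp2p_peer_count",
		Help: "Currently open libp2p connections.",
	})
	peerEvents = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "libp2p_peer_events_total",
		Help: "Peer connect/disconnect events.",
	}, []string{"event"})
)

// TrackPeers wires the gauge and churn counters to the host's
// connection notifications.
func TrackPeers(h host.Host) {
	h.Network().Notify(&network.NotifyBundle{
		ConnectedF: func(_ network.Network, _ network.Conn) {
			peerCount.Inc()
			peerEvents.WithLabelValues("connect").Inc()
		},
		DisconnectedF: func(_ network.Network, _ network.Conn) {
			peerCount.Dec()
			peerEvents.WithLabelValues("disconnect").Inc()
		},
	})
}
```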

Error / Quality Signals

  • Why: If we’re passing garbage, propagation speed doesn’t matter.
  • How: Counters for “bad messages” (decode failures, failed publishes).
  • Impact: Helps detect malformed messages, protocol drift, or attack attempts.
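A minimal sketch of where such counters get incremented, at the decode and publish steps; the handleIncoming helper and metric name are hypothetical:

```go
package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// badMessages counts quality problems by reason, e.g. decode failures
// or failed publishes.
var badMessages = promauto.NewCounterVec(prometheus.CounterOpts{
	Name: "bad_messages_total",
	Help: "Messages that failed to decode or publish, by reason.",
}, []string{"reason"})

// handleIncoming decodes a raw gossip payload and republishes it,
// counting each failure mode separately.
func handleIncoming(raw []byte, decode func([]byte) (any, error), publish func(any) error) {
	msg, err := decode(raw)
	if err != nil {
		badMessages.WithLabelValues("decode_failure").Inc()
		return
	}
	if err := publish(msg); err != nil {
		badMessages.WithLabelValues("publish_failure").Inc()
	}
}
```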

Enable Operations & Research

These metrics are designed not just for “uptime dashboards” but for research-grade measurements:

  • Validator ROI: Combine block arrival latency + inclusion stats to estimate ETH/year impact.
  • Protocol Research: Compare gossip vs. mumP2P on throughput and delay distributions.
  • Operations: Peer counts, error rates, and bad messages support on-call debugging.
  • Security: Message size anomalies + bad message counters act as an early warning system.

Check out the Metrics section.