High Availability Topologies for API Gateways

Designing resilient request routing requires deliberate architectural planning that anticipates node failures, network partitions, and regional outages. This page covers the deployment models, middleware resilience patterns, and automated recovery workflows that form the HA layer within the broader API Gateway Fundamentals & Architecture domain. High availability extends well beyond simple redundancy: it demands deterministic control-plane synchronization, stateless data-plane execution, and explicit traffic isolation boundaries. Engineering teams must treat the gateway as a distributed system with defined failure domains — not a monolithic routing proxy.

Architectural Baseline

Before choosing a topology, engineers need a shared mental model of how gateways separate concerns:

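Control plane — stores route configuration, plugin state, and certificate metadata. Examples: Kong’s PostgreSQL database or DB-less declarative config, an xDS management server feeding Envoy, the Tyk Dashboard (which stores API definitions in MongoDB or PostgreSQL, while gateway nodes rely on Redis for shared state).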
Data plane — the hot path that evaluates every inbound request against loaded routes and policies. In most production deployments the data plane must survive a control-plane outage without dropping traffic.
Health-check layer — synthetic probes (active) plus connection-state signals (passive) that determine when a node or upstream is ineligible to receive traffic.

Choosing a topology without understanding which layer carries which failure risk is a common root cause of extended outages. A control-plane failure in an active-passive setup, for instance, blocks failover promotion whenever promotion is orchestrated through the control plane, even though the standby data plane is fully operational.

Evaluate gateway selection criteria before committing to a topology: the gateway’s native node-grouping model (e.g., Kong DB-less vs. DB-backed, Envoy with a central xDS server vs. standalone) constrains which HA patterns are feasible without custom glue code.

Topology Overview

The diagram below illustrates how active-active, active-passive, and multi-region deployments relate to each other and to the shared control plane.

Primary Concept: Active-Passive vs Active-Active

Active-Passive

Active-passive configurations route all production traffic through a primary instance while maintaining a synchronized standby. This model simplifies state management but introduces measurable recovery latency during failover, because the primary’s failure must be detected and traffic redirected before the standby serves any requests.

In production, active-passive setups typically rely on DNS-based failover (e.g., AWS Route 53 failover routing policies driven by health checks, with record TTLs of 30–60 seconds) or cloud load-balancer health checks (GCP Backend Services, Azure Application Gateway). The standby must maintain:

Warm TLS session caches and pre-loaded certificates
Synchronized plugin and route configuration (replicated from the control plane)
Pre-warmed upstream connection pools to avoid cold-start latency on promotion
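
Recovery latency for DNS-based failover is roughly the time to detect the failure plus the time clients keep using the cached record. A rough sketch of that arithmetic, assuming detection takes failure_threshold probe intervals and that resolvers honor the TTL exactly; the function name and the example values are hypothetical:

```python
def worst_case_failover_seconds(check_interval_s: float,
                                failure_threshold: int,
                                dns_ttl_s: float) -> float:
    """Rough upper bound on DNS-based failover time (hypothetical helper).

    Assumes the primary dies just after a passing probe, so marking it
    unhealthy takes `failure_threshold` more probe intervals, and that a
    client cached the old record just before the switch and honors the TTL.
    """
    if check_interval_s <= 0 or failure_threshold < 1 or dns_ttl_s < 0:
        raise ValueError("need interval > 0, threshold >= 1, TTL >= 0")
    detection_s = check_interval_s * failure_threshold
    return detection_s + dns_ttl_s


# Illustrative values: 10 s probes, 3 failures to mark unhealthy, 30 s TTL.
print(worst_case_failover_seconds(10, 3, 30))  # 60
```

Resolvers and client runtimes that cache records beyond the TTL push the real figure higher.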

Kong 3.x — DB-less declarative config with standby sync:

# kong.yml  (Kong Gateway 3.6, DB-less mode)
_format_version: "3.0"

services:
  - name: payments-api
    url: https://payments.internal.svc
    retries: 3
    connect_timeout: 1000   # ms
    write_timeout: 5000
    read_timeout: 5000
    routes:
      - name: payments-route
        paths: ["/v1/payments"]
        methods: ["POST", "GET"]
    plugins:
      - name: prometheus
        config:
          status_code_metrics: true
          latency_metrics: true
          upstream_health_metrics: true

Ship this file to both nodes via CI. On failover, DNS switches the A record; the standby is already running an identical configuration with no cold-start config pull required.
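
Because promotion relies on the standby running exactly the file CI shipped, a pipeline can compare a digest of each node's deployed copy against the committed one. A minimal sketch, assuming the pipeline can already fetch each node's file as bytes; that transport, the helper names, and the node names are hypothetical:

```python
import hashlib
from typing import Dict, List


def config_digest(config: bytes) -> str:
    """SHA-256 hex digest of a declarative config file's raw bytes."""
    return hashlib.sha256(config).hexdigest()


def drifted_nodes(expected: bytes, deployed: Dict[str, bytes]) -> List[str]:
    """Names of nodes whose deployed config differs from the CI copy (hypothetical helper)."""
    want = config_digest(expected)
    return sorted(node for node, data in deployed.items() if config_digest(data) != want)


ci_copy = b'_format_version: "3.0"\nservices: []\n'
deployed = {"gw-primary": ci_copy, "gw-standby": ci_copy}   # hypothetical node names
print(drifted_nodes(ci_copy, deployed))  # []
```

Hashing raw bytes is deliberately strict: a reformatted but semantically identical file is reported as drift, which errs on the side of flagging a standby that may not match the primary.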

Active-Active

Active-active topologies distribute live traffic across multiple gateway instances simultaneously. This eliminates the single-node failure risk but requires:

Distributed session stores — Redis or Memcached for rate-limiting counters, idempotency keys, and auth token caches shared across nodes
Consistent hashing for session affinity (Envoy’s MAGLEV policy, Kong’s consistent-hashing load balancer)
Leader election for control-plane write operations, while keeping data-plane nodes fully independent
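
Consistent hashing keeps a given session key on the same node while the pool changes. Envoy's MAGLEV policy is based on Google's Maglev lookup-table algorithm; the sketch below is a simplified Python rendering of that algorithm for illustration only, not Envoy's implementation (hash functions, weighting, and table handling differ). Backend names are hypothetical.

```python
import hashlib
from typing import List


def _h(value: str, seed: str) -> int:
    # Derive independent 64-bit hashes by salting SHA-256 with different seeds.
    digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def _is_prime(n: int) -> bool:
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))


def maglev_table(backends: List[str], size: int = 65537) -> List[str]:
    """Fill a lookup table of `size` slots; size must be prime."""
    if not backends or not _is_prime(size):
        raise ValueError("need at least one backend and a prime table size")
    # A prime size makes every skip coprime with it, so each backend's
    # preference sequence visits every slot exactly once.
    offsets = [_h(b, "offset") % size for b in backends]
    skips = [_h(b, "skip") % (size - 1) + 1 for b in backends]
    next_pos = [0] * len(backends)
    table = [-1] * size
    filled = 0
    while True:
        # Round-robin: each backend claims its next preferred free slot in turn,
        # which keeps per-backend slot counts within one of each other.
        for i in range(len(backends)):
            slot = (offsets[i] + next_pos[i] * skips[i]) % size
            while table[slot] >= 0:
                next_pos[i] += 1
                slot = (offsets[i] + next_pos[i] * skips[i]) % size
            table[slot] = i
            next_pos[i] += 1
            filled += 1
            if filled == size:
                return [backends[j] for j in table]


def pick(table: List[str], key: str) -> str:
    """Map a request key (for example a session ID) to a backend."""
    return table[_h(key, "key") % len(table)]
```

For example, with table = maglev_table(["gw-a", "gw-b", "gw-c"], size=251), pick(table, "session-42") returns the same node on every call, and rebuilding the table without "gw-c" remaps mostly the keys that pointed at it.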

Split-brain scenarios, where two nodes independently accept conflicting writes, are the primary danger. Prevent them by keeping the data plane read-only with respect to configuration (shared runtime state such as counters and idempotency keys lives in the external store listed above), and by routing all config mutations through a single control-plane endpoint protected by a consensus protocol such as Raft.

Envoy 1.32 — Active-Active with outlier detection and health checks:

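# envoy.yaml  (Envoy 1.32+)
static_resources:
  clusters:
    - name: backend_service_pool
      connect_timeout: 0.5s
      type: EDS
      eds_cluster_config:           # endpoints are pushed by the xDS management server;
        eds_config:                 # requires dynamic_resources.ads_config elsewhere in this file
          ads: {}
          resource_api_version: V3
      lb_policy: MAGLEV             # consistent hashing; pair with a route-level hash_policy,
                                    # otherwise requests are not pinned to a host
      outlier_detection:
        consecutive_5xx: 3
        interval: 5s
        base_ejection_time: 30s
        max_ejection_percent: 50    # never eject more than half the pool
      health_checks:
        - timeout: 2s
          interval: 10s
          unhealthy_threshold: 3
          healthy_threshold: 2
          http_health_check:
            path: "/healthz"
            expected_statuses:
              - start: 200
                end: 205            # half-open range: accepts 200-204
      circuit_breakers:
        thresholds:
          - priority: DEFAULT
            max_connections: 1000
            max_pending_requests: 500
            max_retries: 3          # concurrent retries across the whole cluster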
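
Outlier detection reacts to real responses rather than probes. A simplified model of how consecutive_5xx, base_ejection_time, and max_ejection_percent interact, assuming Envoy's documented behavior that a host's ejection lasts base_ejection_time multiplied by the number of times it has been ejected. The class, the 300 s cap, and the floor rounding of the ejection allowance are assumptions; Envoy's interval-based sweeps and success-rate detection are not modeled:

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class HostState:
    consecutive_5xx: int = 0
    times_ejected: int = 0
    ejected_until: Optional[float] = None


class OutlierDetector:
    """Simplified consecutive-5xx ejection (hypothetical class, not Envoy code)."""

    def __init__(self, hosts: Iterable[str], consecutive_5xx: int = 3,
                 base_ejection_s: float = 30.0, max_ejection_percent: int = 50,
                 max_ejection_s: float = 300.0):
        self.hosts: Dict[str, HostState] = {h: HostState() for h in hosts}
        self.threshold = consecutive_5xx
        self.base_ejection_s = base_ejection_s
        self.max_ejection_percent = max_ejection_percent
        self.max_ejection_s = max_ejection_s   # assumed cap on ejection length

    def is_ejected(self, host: str, now: float) -> bool:
        until = self.hosts[host].ejected_until
        return until is not None and until > now

    def record(self, host: str, status: int, now: float) -> None:
        """Feed one response; `now` is a monotonic timestamp in seconds."""
        state = self.hosts[host]
        if status < 500:
            state.consecutive_5xx = 0          # any non-5xx breaks the streak
            return
        state.consecutive_5xx += 1
        if state.consecutive_5xx < self.threshold or self.is_ejected(host, now):
            return
        ejected = sum(self.is_ejected(h, now) for h in self.hosts)
        allowed = len(self.hosts) * self.max_ejection_percent // 100
        if ejected >= allowed:
            return                              # pool protection: keep the host in rotation
        # Each repeat ejection lasts longer: base time multiplied by the ejection count.
        state.times_ejected += 1
        state.ejected_until = now + min(self.base_ejection_s * state.times_ejected,
                                        self.max_ejection_s)
        state.consecutive_5xx = 0
```

With four hosts and a 50 % cap, a third host that crosses the 5xx threshold while two are already ejected stays in rotation, which is exactly the situation max_ejection_percent guards against.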

Multi-Region

Multi-region extends active-active across geographic boundaries, using Global Server Load Balancing (GSLB) or anycast routing to direct users to the nearest healthy data plane. Critical constraints:

GeoDNS TTLs (commonly 60–300 seconds), plus resolvers and client runtimes that cache records longer than the TTL, mean regional failover cannot be instantaneous without client-side retry logic or anycast IPs (e.g., AWS Global Accelerator or Cloudflare’s anycast network)
Cross-region traffic must respect data residency requirements — verify that failover paths do not route EU traffic through US regions in violation of GDPR
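Async replication lag between regions creates a consistency window; idempotency tokens stored in a global Redis deployment (cross-region replication with read-local, write-primary) close this gap only if each key is reserved with an atomic write on the primary, because a check that reads the local replica can miss a key written moments earlier in another region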
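
The residency constraint can be enforced at the point where the failover target is chosen. A minimal sketch, assuming each region is tagged with a jurisdiction and a measured latency; the Region type, the region names, and the fail-closed choice are illustrative and not any GSLB product's API:

```python
from dataclasses import dataclass
from typing import AbstractSet, Iterable, Optional


@dataclass(frozen=True)
class Region:
    name: str
    jurisdiction: str     # e.g. "EU" or "US" (hypothetical tagging scheme)
    healthy: bool
    latency_ms: float     # measured from the client's vantage point


def failover_target(regions: Iterable[Region],
                    allowed: AbstractSet[str]) -> Optional[Region]:
    """Lowest-latency healthy region in an allowed jurisdiction, else None."""
    candidates = [r for r in regions if r.healthy and r.jurisdiction in allowed]
    # default=None: with no compliant region left, fail closed instead of
    # silently sending regulated traffic elsewhere.
    return min(candidates, key=lambda r: r.latency_ms, default=None)


regions = [
    Region("eu-west-1", "EU", healthy=False, latency_ms=12),
    Region("us-east-1", "US", healthy=True, latency_ms=80),
    Region("eu-central-1", "EU", healthy=True, latency_ms=25),
]
print(failover_target(regions, {"EU"}).name)  # eu-central-1
```

A fail-open variant that falls back to a non-compliant region would need an explicit, documented policy behind it.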

Secondary Concept: Middleware Resilience and Idempotency

Circuit Breakers and Retry Budgets

The middleware chain is responsible for preventing upstream degradation from cascading into gateway saturation. Circuit breakers, rate limiting and throttling, and exponential backoff must all be configured at the edge before a partial upstream failure becomes a gateway-wide incident.

Kong 3.x — Circuit-breaker and retry middleware for a payment service:

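# kong.yml  (Kong Gateway 3.6)
_format_version: "3.0"
services:
  - name: payment_processor
    url: https://internal-payments.svc.cluster.local
    retries: 3
    connect_timeout: 1000   # ms
    write_timeout: 5000
    read_timeout: 5000
    plugins:
      - name: request-termination    # circuit-breaker fallback
        config:
          status_code: 503
          message: "Payment service temporarily unavailable. Retry with your idempotency key."
        enabled: false               # toggled by automation when an error threshold is breached

      - name: request-transformer
        config:
          add:
            headers:
              # "add" never overwrites an existing header, so a client-sent key wins;
              # the x-request-id fallback only works if the client sets it and keeps
              # it stable across its retries.
              - "X-Idempotency-Key:$(headers['x-request-id'])"
              # Templates expose request headers, query params, and URI captures only;
              # there is no $(hostname), so set a per-node X-Gateway-Node value at deploy time.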

Retry storms are the primary failure amplifier in HA deployments. Implement jittered exponential backoff with a per-route retry budget and expose retry metrics at the route level so SREs can correlate upstream degradation with retry amplification before a thundering herd forms.
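
A minimal sketch of jittered exponential backoff using the "full jitter" variant (delay drawn uniformly between zero and an exponentially growing ceiling). The 100 ms base and 2 s cap match the Envoy retry_back_off values in this section, but the function is hypothetical and does not reproduce Envoy's exact formula:

```python
import random
from typing import Optional


def backoff_delay(attempt: int, base_s: float = 0.1, cap_s: float = 2.0,
                  rng: Optional[random.Random] = None) -> float:
    """Full-jitter exponential backoff before retry number `attempt` (1-based).

    The ceiling doubles per attempt from base_s and is clamped to cap_s; the
    delay is drawn uniformly from [0, ceiling] so clients that failed together
    spread their retries out instead of retrying in lockstep.
    """
    if attempt < 1:
        raise ValueError("attempt is 1-based")
    rng = rng if rng is not None else random.Random()
    # Bound the exponent so huge attempt numbers cannot overflow float conversion.
    ceiling = min(cap_s, base_s * 2 ** min(attempt - 1, 32))
    return rng.uniform(0.0, ceiling)
```

Successive calls for attempts 1 through 5 draw from ceilings of 0.1, 0.2, 0.4, 0.8, and 1.6 seconds; from the sixth retry onward the ceiling stays at 2 s.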

Envoy 1.32 — Retry policy with jitter and budget:

# route config fragment  (Envoy 1.32+)
virtual_hosts:
  - name: payments
    domains: ["payments.api.example.com"]
    routes:
      - match:
          prefix: "/v1/payments"
        route:
          cluster: backend_service_pool
          retry_policy:
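            # Retrying POSTs on 5xx is only safe behind idempotency keys.
            retry_on: "5xx,reset,connect-failure,retriable-4xx"
            num_retries: 3
            per_try_timeout: 2s
            retry_back_off:
              base_interval: 100ms
              max_interval: 2s     # jitter applied automatically
          timeout: 10s
# The retry budget is set on the cluster (backend_service_pool), not the route:
#   circuit_breakers:
#     thresholds:
#       - priority: DEFAULT
#         retry_budget:
#           budget_percent: { value: 20.0 }
#           min_retry_concurrency: 3
# When retry_budget is set, it overrides the max_retries circuit breaker.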
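
Envoy caps retries cluster-wide through a retry_budget in the cluster's circuit_breakers thresholds: concurrent retries may not exceed budget_percent of active plus pending requests, with min_retry_concurrency as a floor. A minimal single-threaded model of that rule, for reasoning about budget sizing only; the class is hypothetical, the 20 % and 3 values are illustrative, and Envoy's internal accounting is not reproduced:

```python
from dataclasses import dataclass


@dataclass
class RetryBudget:
    """Concurrency-based retry budget modeled on Envoy's documented semantics (hypothetical)."""
    budget_percent: float = 20.0
    min_retry_concurrency: int = 3
    active_requests: int = 0
    pending_requests: int = 0
    active_retries: int = 0

    def limit(self) -> int:
        # Share of in-flight work that may be retries, never below the floor.
        share = int((self.active_requests + self.pending_requests) * self.budget_percent / 100)
        return max(self.min_retry_concurrency, share)

    def try_acquire(self) -> bool:
        """Reserve a retry slot; call release() when the retry completes."""
        if self.active_retries >= self.limit():
            return False                 # over budget: return the original error instead
        self.active_retries += 1
        return True

    def release(self) -> None:
        self.active_retries = max(0, self.active_retries - 1)
```

With 100 requests in flight and a 20 % budget, the 21st concurrent retry is refused and the original error goes back to the client, which is the load shedding the budget exists to provide.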

Idempotency Keys

Idempotency keys prevent duplicate processing during network drops or split-brain events in active-active setups. The gateway intercepts POST and PATCH requests, hashes critical request attributes (X-Idempotency-Key header + route fingerprint), and checks the hash in a low-latency distributed cache (Redis). The TTL must cover the whole window in which a client may retry, which is at least the upstream’s maximum processing time and usually much longer; a key that expires while the client is still retrying lets the duplicate through. On a cache hit the gateway returns the cached response immediately, bypassing the upstream entirely.
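
The lookup flow can be sketched with an in-memory dictionary standing in for Redis. This is a single-process illustration only: the class, the 409 reply for a key that is still in flight (a common convention, not a standard), and the choice not to cache 5xx responses are assumptions. Against Redis, the reservation step would typically be an atomic SET with the NX and EX options so that two gateway nodes cannot both claim the same key.

```python
import hashlib
import time
from typing import Callable, Dict, Optional, Tuple

Response = Tuple[int, bytes]  # (status, body): simplified response type


class IdempotencyCache:
    """In-memory stand-in for a shared idempotency store (hypothetical)."""

    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock
        # fingerprint -> (expires_at, response, or None while still in flight)
        self._entries: Dict[str, Tuple[float, Optional[Response]]] = {}

    @staticmethod
    def fingerprint(key: str, method: str, route: str) -> str:
        # Bind the key to method + route so it cannot be replayed on another endpoint.
        return hashlib.sha256(f"{key}|{method.upper()}|{route}".encode()).hexdigest()

    def handle(self, key: str, method: str, route: str,
               upstream: Callable[[], Response]) -> Response:
        fp = self.fingerprint(key, method, route)
        now = self.clock()
        entry = self._entries.get(fp)
        if entry is not None and entry[0] > now:
            if entry[1] is None:
                return (409, b"request with this idempotency key is in progress")
            return entry[1]                            # hit: upstream is bypassed
        self._entries[fp] = (now + self.ttl_s, None)   # reserve before calling upstream
        try:
            response = upstream()
        except Exception:
            del self._entries[fp]                      # release so the client can retry
            raise
        if response[0] >= 500:
            del self._entries[fp]                      # don't pin a transient failure
        else:
            self._entries[fp] = (now + self.ttl_s, response)
        return response
```

Not caching 5xx responses lets a client's retry reach the upstream again after a transient failure, while successful results are replayed for the full TTL.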

For stateful protocols such as WebSockets or Server-Sent Events, middleware must implement graceful connection draining — not hard resets — during node rotation to preserve in-flight message delivery.

Vendor-Specific Behavior: Tyk 5.x Uptime Tests

Tyk’s built-in uptime test framework (uptime_tests) runs synthetic probes against upstream URLs on a configurable schedule, independent of the request path. Unlike Envoy’s passive outlier detection, Tyk probes fire even when there is zero traffic — making them valuable for warming standby upstreams before a failover event and for detecting silent upstream failures that would otherwise go unnoticed until a real request hits the degraded node.

{
  "uptime_tests": {
    "check_list": [
      {
        "url": "https://payments.internal.svc/healthz",
        "method": "GET",
        "timeout": 2,
        "body": "",
        "headers": { "X-Health-Check": "tyk-gateway" }
      }
    ],
    "config": {
      "failure_trigger_sample_size": 3,
      "time_wait": 10,
      "checker_pool_size": 16,
      "enable_uptime_analytics": true
    }
  }
}

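The failure_trigger_sample_size of 3 means Tyk waits for 3 failed probes before raising a HostDown event and treating the upstream as unhealthy, which filters out a single dropped probe under transient packet loss and avoids flapping eviction. In Tyk's documented layout the check_list belongs to the API definition, while failure_trigger_sample_size, time_wait, checker_pool_size, and enable_uptime_analytics are gateway-level settings under uptime_tests.config in tyk.conf; the snippet above combines them for brevity.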
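
The threshold logic behind these settings, and behind Envoy's unhealthy_threshold and healthy_threshold pair, can be modeled as a small state machine with hysteresis. A minimal sketch using the 3-to-eject, 2-to-re-admit values recommended on this page; the class is hypothetical and models neither product's probe scheduling nor its exact counting rules:

```python
from dataclasses import dataclass


@dataclass
class HealthTracker:
    """Consecutive-result health state with hysteresis (hypothetical class).

    A healthy target turns unhealthy only after `unhealthy_threshold`
    consecutive failed probes; an unhealthy one is re-admitted only after
    `healthy_threshold` consecutive passes. With these defaults a single
    lost probe never flips the state.
    """
    unhealthy_threshold: int = 3
    healthy_threshold: int = 2
    healthy: bool = True
    _streak: int = 0  # consecutive probe results that disagree with `healthy`

    def observe(self, probe_ok: bool) -> bool:
        """Record one probe result and return the (possibly updated) state."""
        if probe_ok == self.healthy:
            self._streak = 0
            return self.healthy
        self._streak += 1
        needed = self.unhealthy_threshold if self.healthy else self.healthy_threshold
        if self._streak >= needed:
            self.healthy = probe_ok
            self._streak = 0
        return self.healthy
```

At a 10 s interval this marks a dead target unhealthy after about 30 s; with a 5 s interval and an unhealthy threshold of 2, after about 10 s.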

Comparative Implementation Table

Gateway	HA Config Approach	Key Parameters	Trade-off
Kong 3.x (DB-less)	Declarative `kong.yml` shipped to all nodes via CI; no shared DB required	`retries`, `connect_timeout`, `upstream.healthchecks.active`	Simple GitOps story; control-plane is a flat file, but config rollback requires a re-deploy
Envoy 1.32+	xDS management server (e.g., one built on go-control-plane, or Istio’s istiod) pushes dynamic config; the data plane keeps serving its last-received config through an xDS outage	`outlier_detection`, `circuit_breakers`, `retry_policy`, `lb_policy: MAGLEV`	Highly flexible but adds management-server operational burden
Tyk 5.x	Tyk Dashboard acts as the control plane; a Redis data store shares state and reload signals across Tyk Gateway nodes	`proxy.check_host_against_uptime_tests`, `health_check.enable_health_checks`, `uptime_tests`	Tight Redis coupling, so Redis HA is a prerequisite; simpler than xDS at medium scale
NGINX / OpenResty	Upstream `keepalive` pools with passive checks; active checks via `lua-resty-healthcheck` on OpenResty (the active `health_check` directive is NGINX Plus only); config sync via rsync or S3 bucket pull on reload	`upstream.max_fails`, `upstream.fail_timeout`, `keepalive_requests`	Lowest operational overhead but manual failover; no native circuit breaker without OpenResty

Operational Gotchas

DNS TTL Misconfiguration

Setting DNS TTL too high (>300 s) means clients cache stale A records long after failover, extending the effective outage. Setting it too low (<30 s) multiplies query volume against your authoritative DNS servers, and some resolvers clamp very low TTLs to a minimum anyway, so the faster failover may never materialize. Use 60 seconds as a default for internet-facing APIs and 10–30 seconds for internal service discovery backed by Consul or Kubernetes DNS.

Control-Plane SPOF Hidden Inside “HA” Deployments

Many teams replicate gateway data-plane nodes but leave the control plane (Kong Admin API, Tyk Dashboard, Envoy management server) as a single instance. A control-plane failure does not drop live traffic (data planes continue serving cached config), but it blocks any configuration change during an incident — exactly when you need to route traffic away from a degraded upstream. Run the control plane in its own HA configuration, or use DB-less/declarative mode to eliminate the dependency entirely.

Health-Check Endpoint Collateral Damage

If your /healthz endpoint calls downstream services to verify “deep” health, a partial upstream failure cascades into failed health checks across all gateway nodes simultaneously, causing the load balancer to mark the entire pool unhealthy. Use shallow health checks at the gateway (process is alive, TLS cert is valid) and rely on outlier detection to eject unhealthy upstreams from the pool.

Retry Amplification Under Latency Spikes

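When an upstream degrades (slow, not failing), gateway retries multiply the in-flight connections against it. With num_retries: 3, each client request can turn into up to 4 upstream attempts; if the service behind the gateway also retries its own dependency 3 times, one client request can reach the deepest tier up to 4 × 4 = 16 times, all aimed at a service that is already short of capacity. Cap concurrent retries at the Envoy circuit-breaker level (max_retries or a retry_budget) and set max_pending_requests to shed load early rather than queue retries indefinitely.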
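
Worst-case amplification is the product of (1 + retries) over every layer that retries. A minimal sketch of that arithmetic; the function name is hypothetical:

```python
from math import prod
from typing import Sequence


def worst_case_attempts(retries_per_layer: Sequence[int]) -> int:
    """Upper bound on attempts reaching the deepest tier for one client request.

    A layer that retries r times turns one incoming request into up to
    (1 + r) outgoing attempts, and the factors multiply down the call chain.
    """
    if any(r < 0 for r in retries_per_layer):
        raise ValueError("retry counts must be non-negative")
    return prod(1 + r for r in retries_per_layer)


print(worst_case_attempts([3]))     # 4: gateway only
print(worst_case_attempts([3, 3]))  # 16: gateway plus one retrying service
```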

Certificate Rotation Race Conditions

In active-active setups, rolling certificate rotation can cause a brief window where one node presents the old certificate and another presents the new one, so clients that reconnect or spread connections across nodes can see both. Coordinate rotation through the control plane with a grace period during which both the old and new certificates remain valid (typically 24–48 hours) before revoking the old leaf. Full guidance on gateway-edge certificate handling is in the security boundaries and zero-trust section.

Observability Gaps During Failover

When a node is ejected, metrics it has collected but not yet exported can be lost. Export through a push-based pipeline that flushes on shutdown (for example, an OpenTelemetry OTLP exporter), or scrape with a Prometheus server or agent that runs outside the gateway pod and uses remote_write to durable storage, so data survives process shutdown and container eviction. Use structured logs with a gateway_node_id field to correlate per-node error rates during failover events.

Production Configuration Checklist

Control plane is itself deployed with redundancy (two or more replicas, or DB-less with Git as source of truth)
Active health checks configured: 10 s interval, 2 s timeout, 3 consecutive failures to eject, 2 to re-admit
max_ejection_percent capped at 50 % to prevent ejecting the entire upstream pool
Per-route retry budget set with jittered exponential backoff (base_interval: 100ms, max_interval: 2s)
Circuit-breaker max_connections and max_pending_requests thresholds set per upstream SLA
Idempotency key middleware enabled for all non-idempotent routes (POST, PATCH)
Redis or equivalent shared-state store deployed in HA mode (replication with Sentinel, or Redis Cluster)
DNS TTL set to 60 s for internet-facing endpoints; 10–30 s for internal service discovery
Traffic mirroring configured on at least one canary route to validate standby before promotion
Prometheus alerts on: connection pool overflow (envoy_cluster_upstream_cx_overflow_total), upstream 5xx ratio, retry amplification rate
Graceful drain timeout matches the longest expected in-flight request (WebSocket / SSE connections included)
Certificate rotation grace period ≥ 24 h; rotation coordinated through the control plane
mTLS between gateway nodes and upstreams enforced in multi-region deployments
GitOps pipeline enforces declarative config validation before promotion (deck diff for Kong, envoy --mode validate for Envoy)

Observability and Automated Recovery

Continuous health validation requires synthetic probing, distributed tracing, and real-time traffic mirroring. OpenTelemetry instrumentation should capture gateway-to-upstream latency, error rates, and connection pool exhaustion metrics per node.

Prometheus alerting for connection pool saturation (Envoy 1.32+):

# prometheus-rules.yaml
groups:
  - name: gateway_ha_alerts
    rules:
      - alert: GatewayConnectionPoolExhausted
        expr: rate(envoy_cluster_upstream_cx_overflow_total[5m]) > 10
        for: 2m
        labels:
          severity: critical
          team: platform
        annotations:
          summary: "Data-plane connection pool saturation on "
          description: >
            Connection overflow rate exceeds 10/s for 2 m.
            Trigger HPA scale-out and review upstream connection limits.
            Runbook: https://runbooks.internal/gateway/pool-exhaustion

      - alert: GatewayRetryAmplification
        expr: |
          rate(envoy_cluster_upstream_rq_retry_total[5m]) /
          rate(envoy_cluster_upstream_rq_total[5m]) > 0.15
        for: 3m
        labels:
          severity: warning
        annotations:
          summary: "Retry ratio above 15 % — risk of thundering herd"
          description: "Reduce retry budget or investigate upstream latency spike."

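Traffic mirroring duplicates production requests to a standby node without impacting user-facing latency, because mirrored responses are discarded; this enables safe validation of failover topologies. Mirror only idempotent traffic, or point the standby at a sandboxed upstream, since a mirrored POST executes its side effects a second time. The same pattern is useful for validating protocol translation layers before promoting them to primary routing paths.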

Automated traffic shifting uses weighted routing and canary deployments to drain connections from unhealthy nodes without dropping in-flight requests. When automated recovery fails, escalation matrices must prioritize data consistency over availability: quarantine split-brain writes and reconcile through distributed consensus before re-admitting the node.
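
Automated shifting is usually a loop: move a slice of weight, observe, then continue or roll back. A minimal sketch of that control loop; error_rate is a hypothetical callback standing in for a metrics query, the 1 % threshold is illustrative, and pushing each weight pair to the gateway and waiting a soak period between steps are left out:

```python
from typing import Callable, List, Tuple


def shift_weights(step_percent: int,
                  error_rate: Callable[[int], float],
                  max_error_rate: float = 0.01) -> List[Tuple[int, int]]:
    """Move traffic from an old node to a new one in fixed steps.

    Returns the (old_weight, new_weight) pairs in the order they would be
    applied. After each step, error_rate(new_weight) reports the new node's
    observed error ratio; above max_error_rate, all traffic returns to the
    old node and shifting stops.
    """
    if not 0 < step_percent <= 100:
        raise ValueError("step_percent must be in (0, 100]")
    applied: List[Tuple[int, int]] = []
    new = 0
    while new < 100:
        new = min(100, new + step_percent)
        applied.append((100 - new, new))
        if error_rate(new) > max_error_rate:
            applied.append((100, 0))   # roll back: old node takes all traffic again
            break
    return applied


print(shift_weights(25, lambda w: 0.0))
# [(75, 25), (50, 50), (25, 75), (0, 100)]
```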

Integrating alert thresholds with auto-scaling policies keeps capacity planning proactive. When queue depth or TLS handshake failure rates breach thresholds, automated workflows trigger horizontal pod autoscaling, adjust retry budgets, or initiate graceful node eviction — transforming HA from a static topology into a continuously self-validating system.

FAQ

What is the difference between active-active and active-passive gateway deployments?

Active-active sends live traffic to all nodes simultaneously, requiring distributed state synchronization and consistent hashing. Active-passive keeps a warm standby that only takes traffic after the primary fails, simplifying state management at the cost of recovery latency (typically 15–60 seconds for DNS-based failover).

How do I prevent retry storms in a high-availability gateway setup?

Implement per-route retry budgets with jittered exponential backoff, cap retries both per request and cluster-wide (e.g., Envoy’s route-level num_retries: 3 with retry_on: 5xx,reset, plus a retry_budget or max_retries circuit breaker on the cluster), and expose retry metrics so SREs can correlate upstream degradation with downstream amplification before a thundering herd forms.

What health-check interval should I use for API gateway nodes?

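Use a 10-second interval with a 2-second timeout, requiring 3 consecutive failures before ejecting a node and 2 consecutive successes before re-admitting it. This balances detection speed (about 30 seconds) against flap sensitivity. For latency-sensitive paths, shorten the interval to 5 seconds and lower the unhealthy threshold to 2, which cuts detection to about 10 seconds at the cost of more flapping under transient failures.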

Can I use Kubernetes Ingress for a highly available gateway?

Standard Kubernetes Ingress lacks advanced traffic-splitting, weighted routing, and fine-grained health-check controls needed for production HA. Migrate to the Kubernetes Gateway API (with an Envoy or Kong data plane) or deploy Kong Gateway 3.x in DB-less mode for declarative, version-controlled configuration with zero control-plane SPOF.

Active-Active vs Active-Passive Gateway Failover — failover time, split-brain, and state-sync trade-offs between the two HA models
Gateway Selection Criteria — capability matrix for choosing between Kong, Tyk, Envoy, and NGINX before committing to an HA topology
Kong vs Tyk vs Envoy for Microservices — side-by-side comparison including node grouping and HA feature sets
Scaling Limits & Capacity Planning — connection pool sizing, autoscaling thresholds, and load-test methodology
Implementing mTLS at the Gateway Edge — certificate rotation and network segmentation in multi-region deployments
Rate Limiting & Throttling Strategies — distributed rate-limiting with Redis backends that survive node failover
