API Gateway Fundamentals & Architecture
Modern distributed systems require a centralized ingress control plane to manage north-south traffic, enforce Security Boundaries & Zero Trust, and abstract backend complexity. This architectural layer acts as a reverse proxy, handling authentication, rate limiting, and Protocol Translation Patterns before routing requests to microservices. Understanding the trade-offs between centralized and decentralized deployment models is critical for platform teams designing resilient, cross-cluster routing topologies.
Key architectural mandates:
- Centralized ingress control for north-south traffic management
- Decoupling client-facing contracts from backend service implementations
- Cross-cluster request routing and service mesh integration boundaries
- Observability, policy enforcement, and lifecycle management at scale
Target platforms: Envoy Proxy, Kong Gateway, NGINX Plus, AWS API Gateway, Apigee, Tyk.
Core Architectural Patterns & Deployment Models
Platform teams must choose between centralized ingress gateways and decentralized sidecar proxies. Centralized deployments minimize operational overhead but increase blast radius during edge failures. Sidecar architectures (e.g., Envoy in service meshes) provide granular east-west control but complicate lifecycle management and increase resource consumption per pod.
Modern gateways enforce strict control plane and data plane separation. The control plane manages configuration distribution, policy compilation, and certificate rotation. The data plane executes routing decisions, TLS termination, and request transformation with deterministic latency. Stateless data planes enable horizontal scaling, while stateful session management requires external stores or consistent hashing.
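To make the consistent-hashing option concrete, here is a minimal Python sketch of a hash ring that pins a session key to a data plane node. The class name, the virtual-node count, and the node labels are illustrative assumptions, not part of any gateway's API.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring mapping session keys to backend nodes (sketch)."""

    def __init__(self, nodes, vnodes=100):
        self._vnodes = vnodes  # virtual nodes per backend smooth the distribution
        self._ring = []        # sorted list of (hash, node) tuples
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key):
        # Stable 64-bit hash; Python's built-in hash() is randomized per process.
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def add(self, node):
        for i in range(self._vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def lookup(self, session_key):
        if not self._ring:
            raise LookupError("ring is empty")
        # First virtual node clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect_left(self._ring, (self._hash(session_key), ""))
        return self._ring[idx % len(self._ring)][1]
```

Removing a node only remaps the keys that node owned; every other session keeps its original target, which is the property that makes this approach usable for affinity without an external session store.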
Cross-cluster routing relies on DNS-based service discovery or registry synchronization (Consul, Kubernetes API). Bootstrap configurations must define cluster endpoints, TLS verification contexts, and admin interfaces.
# Envoy Data Plane Bootstrap (envoy.yaml)
admin:
  address:
    socket_address: { address: 127.0.0.1, port_value: 9901 }
static_resources:
  clusters:
    - name: upstream_service
      connect_timeout: 0.5s
      type: STRICT_DNS
      lb_policy: ROUND_ROBIN
      load_assignment:
        cluster_name: upstream_service
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address: { address: api.internal.svc, port_value: 8080 }
      transport_socket:
        name: envoy.transport_sockets.tls
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
          sni: api.internal.svc
          # Without a validation context the upstream certificate is not verified.
          common_tls_context:
            validation_context:
              trusted_ca:
                filename: /etc/ssl/certs/ca-certificates.crt  # assumed CA bundle path
Request Routing & Traffic Management
Layer 7 routing enables path-, header-, and method-based dispatch, in contrast to Layer 4 TCP/UDP load balancing. Dynamic routing updates require tight integration with service registries so that endpoint changes propagate without dropping connections. Traffic splitting enables canary releases and A/B testing by routing a percentage of traffic to each versioned upstream.
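As an illustration of percentage-based splitting, the Python sketch below maps a request key (for example a client ID) to an upstream by weight. The function name and the upstream labels are hypothetical; real gateways implement this internally, and many balance per request rather than per key.

```python
import hashlib


def pick_upstream(request_key, weights):
    """Deterministically map a request key to an upstream by weight.

    weights: ordered mapping of upstream name -> non-negative integer weight.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one upstream needs a positive weight")
    # Hash the key into a bucket in [0, total); the same key always lands in
    # the same bucket, so a given client stays on one version.
    digest = hashlib.sha256(request_key.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % total
    cumulative = 0
    for name, weight in weights.items():
        cumulative += weight
        if bucket < cumulative:
            return name
    raise AssertionError("unreachable: bucket is always below total")
```

Hashing a stable key instead of drawing a random number keeps each client on one version for the duration of the canary, at the cost of a split that is only approximately proportional for small populations.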
Cross-region active-active setups require latency-aware routing and consistent hashing for session affinity. Retry policies must implement exponential backoff with jitter to prevent thundering herd effects. Circuit breakers isolate degraded services before their failures cascade across the topology.
# Kong Declarative Config (weighted routing + retries)
_format_version: "3.0"
services:
  - name: payment-service
    # The host must match the upstream name below; pointing it at a single
    # backend host would bypass the weighted targets.
    url: http://payment-service:8080
    retries: 3
    routes:
      - name: payment-canary
        paths: ["/v2/payments"]
        strip_path: true
    plugins:
      - name: rate-limiting
        config:
          minute: 1000
          policy: local
upstreams:
  - name: payment-service
    algorithm: round-robin
    targets:
      - target: payment-v1.internal:8080
        weight: 90
      - target: payment-v2.internal:8080
        weight: 10
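The retry guidance above (exponential backoff with jitter) can be sketched in Python as follows. This is a minimal illustration using the "full jitter" variant; the function names, default values, and the injected `sleep` and `rng` parameters are assumptions made for the example and for testability.

```python
import random
import time


def backoff_delays(base=0.1, cap=5.0, attempts=4, rng=random.random):
    """Yield one sleep duration per retry using capped exponential backoff
    with full jitter: uniform in [0, min(cap, base * 2**attempt))."""
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng() * ceiling


def call_with_retries(operation, is_retryable, sleep=time.sleep, **backoff):
    """Run operation(); on a retryable exception, sleep and try again.

    The final failure is re-raised once the delay schedule is exhausted.
    """
    delays = backoff_delays(**backoff)
    while True:
        try:
            return operation()
        except Exception as exc:
            if not is_retryable(exc):
                raise
            delay = next(delays, None)
            if delay is None:
                raise  # retries exhausted: surface the last error
            sleep(delay)
```

Randomizing the whole delay, not just adding a small offset, spreads clients that failed at the same moment across the full backoff window, which is what breaks up synchronized retry waves.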
Gateway Selection & Vendor Evaluation
Evaluating commercial versus open-source solutions requires balancing extensibility against operational overhead. Plugin architectures (Kong, Tyk) enable rapid feature iteration but introduce runtime overhead and dependency management complexity. Natively compiled modules (NGINX, HAProxy) offer more predictable performance but require a rebuild for custom logic, although both proxies also support embedded scripting (njs or Lua) for lighter customization.
Declarative configuration pipelines prevent configuration drift and enable GitOps workflows. Imperative admin APIs introduce state inconsistency risks during concurrent deployments. Cloud-native integration maturity dictates Kubernetes Ingress/CRD compatibility and operator reliability. For a structured evaluation matrix, consult Gateway Selection Criteria.
# Tyk Declarative API Definition (classic-style fields shown as YAML, with plugin bundle)
tyk_api:
  api_id: "user-service"
  name: "User Management API"
  org_id: "platform"
  use_keyless: false
  auth:
    auth_header_name: "Authorization"
  version_data:
    not_versioned: true
    versions:
      default:
        name: "v1"
        use_extended_paths: true
        extended_paths:
          ignored: []
          white_list: []
          black_list: []
  proxy:
    listen_path: "/users/"
    target_url: "http://user-svc.internal:8080"
    strip_listen_path: true
  middleware:
    global:
      plugin_bundle: "tyk-bundle-auth"
Production Scaling & Capacity Planning
Horizontal scaling limits are dictated by connection multiplexing efficiency and TLS handshake CPU costs. I/O-bound workloads benefit from async event loops, while CPU-bound tasks (JWT validation, regex matching) saturate worker threads. Auto-scaling must account for cold-start latency and graceful connection drain periods.
Backpressure mechanisms prevent cascading failures by enforcing queue limits and rejecting excess traffic. Detailed methodologies for capacity modeling are documented in Scaling Limits & Capacity Planning.
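# NGINX Upstream & Rate Limiting (keepalive + least_conn)
# Fragment: these directives belong inside the http {} context.
upstream backend_pool {
    least_conn;
    server 10.0.1.10:8080;
    server 10.0.1.11:8080;
    keepalive 32;              # idle upstream connections cached per worker
    keepalive_timeout 60s;
}
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=50r/s;
server {
    listen 443 ssl;
    # ssl_certificate and ssl_certificate_key are required here; omitted in this fragment.
    location /api/ {
        limit_req zone=api_limit burst=20 nodelay;
        proxy_pass http://backend_pool;
        proxy_http_version 1.1;          # required for upstream keepalive
        proxy_set_header Connection "";  # clear the header so connections are reused
    }
}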
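To show how a rate with a burst allowance behaves, here is a small Python token-bucket sketch. NGINX documents limit_req as a leaky bucket, so this is a closely related technique and not NGINX's implementation; the class name and the injected clock are assumptions for the example.

```python
import time


class TokenBucket:
    """Token-bucket limiter: refills at `rate` tokens/second up to `burst`."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = float(rate)
        self.burst = float(burst)
        self._clock = clock
        self._tokens = float(burst)  # start full so an initial burst is allowed
        self._last = clock()

    def allow(self, cost=1.0):
        now = self._clock()
        # Refill for the elapsed time, never exceeding the bucket capacity.
        self._tokens = min(self.burst, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False  # caller should reject, for example with HTTP 429
```

Rejecting immediately when the bucket is empty, instead of queueing, is the backpressure behavior described above: excess load is shed at the edge before it reaches the upstream pool.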
Resilience & High Availability Design
Enterprise-grade uptime requires multi-AZ active-active deployments with local configuration caching to survive control plane outages. Health check propagation must be decoupled from client-facing endpoints to prevent false positives during transient network partitions. Circuit breakers and outlier detection isolate degraded upstreams before they trigger cascading failures.
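Local configuration caching can be sketched as a last-known-good fallback. In the Python example below, `fetch_remote` is a hypothetical callable standing in for whatever client talks to the control plane, and the JSON file format is an assumption chosen for brevity.

```python
import json
import os
import tempfile


def load_config(fetch_remote, cache_path):
    """Return (config, source), preferring the control plane over the cache.

    fetch_remote: hypothetical callable returning a JSON-serializable dict and
    raising OSError (e.g. ConnectionError) when the control plane is unreachable.
    """
    try:
        config = fetch_remote()
    except OSError:
        # Control plane outage: keep serving with the last-known-good snapshot.
        with open(cache_path, "r", encoding="utf-8") as fh:
            return json.load(fh), "cache"
    # Write to a temp file, then rename, so a crash never leaves a torn cache.
    directory = os.path.dirname(os.path.abspath(cache_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(config, fh)
    os.replace(tmp_path, cache_path)
    return config, "remote"
```

If no snapshot exists yet, the file error propagates, which is intentional in this sketch: a node that has never received a valid configuration should fail its readiness check instead of serving with an empty one.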
Cross-region disaster recovery relies on DNS failover with short TTLs and traffic shifting policies. Reference High Availability Topologies for enterprise deployment blueprints.
# Envoy Outlier Detection & Circuit Breaking
clusters:
  - name: resilient_upstream
    connect_timeout: 0.5s
    type: STRICT_DNS
    circuit_breakers:
      thresholds:
        - priority: DEFAULT
          max_connections: 1000
          max_pending_requests: 500
          max_retries: 3
    outlier_detection:
      consecutive_5xx: 5
      interval: 10s
      base_ejection_time: 30s
      max_ejection_percent: 50
      enforcing_consecutive_5xx: 100
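The ejection behavior configured above can be approximated in a few lines of Python. This is a simplified model loosely based on the consecutive_5xx, base_ejection_time, and max_ejection_percent settings, not Envoy's actual algorithm; the class and method names are hypothetical, and the periodic interval sweep is replaced by a check on each call.

```python
import time


class OutlierDetector:
    """Ejects a host after N consecutive 5xx responses (simplified model)."""

    def __init__(self, hosts, consecutive_5xx=5, base_ejection_time=30.0,
                 max_ejection_percent=50, clock=time.monotonic):
        self._threshold = consecutive_5xx
        self._base = base_ejection_time
        # Assumption: always permit at least one ejection, whatever the percent.
        self._max_ejected = max(1, len(hosts) * max_ejection_percent // 100)
        self._clock = clock
        self._streak = {h: 0 for h in hosts}
        self._ejected_until = {}  # host -> time at which it may return
        self._ejections = {h: 0 for h in hosts}

    def healthy_hosts(self):
        now = self._clock()
        # Re-admit hosts whose ejection window has elapsed.
        for host, until in list(self._ejected_until.items()):
            if now >= until:
                del self._ejected_until[host]
        return [h for h in self._streak if h not in self._ejected_until]

    def record(self, host, status):
        if status < 500:
            self._streak[host] = 0
            return
        self._streak[host] += 1
        if self._streak[host] < self._threshold:
            return
        self._streak[host] = 0
        self.healthy_hosts()  # expire stale ejections before checking the cap
        if host in self._ejected_until or len(self._ejected_until) >= self._max_ejected:
            return
        # Each repeat ejection lasts longer: base time * ejection count.
        self._ejections[host] += 1
        self._ejected_until[host] = self._clock() + self._base * self._ejections[host]
```

The cap on simultaneous ejections matters as much as the threshold: without it, a shared dependency failure could eject every host at once and turn a partial degradation into a full outage.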
Common Pitfalls
- Monolithic logic layer violation: Embedding business logic in the gateway violates separation of concerns and complicates versioning.
- TLS termination saturation: Terminating a high rate of full RSA/ECDHE handshakes without hardware acceleration or session resumption causes CPU bottlenecks.
- Control plane latency blindness: Rolling out config updates without staged validation introduces routing inconsistencies across the fleet.
- Retry misconfiguration: Aggressive retries without jitter or budget limits trigger thundering herd effects and upstream exhaustion.
- Missing request size limits: An unbounded client_max_body_size or uncapped payload buffers expose the gateway to memory exhaustion and DoS vectors.
FAQ
What is the primary architectural difference between an API gateway and a traditional load balancer? A load balancer operates primarily at Layer 4 (TCP/UDP) or basic Layer 7 (HTTP/HTTPS) for traffic distribution, while an API gateway provides advanced Layer 7 routing, protocol translation, security policy enforcement, and developer lifecycle management.
How do you prevent an API gateway from becoming a single point of failure? Deploy the data plane as a stateless, horizontally scalable cluster across multiple availability zones, implement local configuration caching for control plane outages, and use DNS-based failover with health-checked endpoints.
When should routing logic reside in the gateway versus a service mesh? Gateways should handle north-south traffic, external authentication, and rate limiting. Service meshes should manage east-west traffic, mTLS, and intra-cluster retries. Overlapping responsibilities cause configuration drift and increased latency.
What are the typical scaling limits for a production API gateway? Scaling limits depend on connection multiplexing efficiency, TLS handshake overhead, and worker thread allocation. As a rough guide, production nodes handle 50k to 150k concurrent connections, but the actual ceiling varies with hardware, payload size, and TLS configuration and should be confirmed by load testing. Horizontal scaling and connection pooling keep individual nodes below CPU saturation.