Security Boundaries & Zero Trust at the API Gateway

Every request entering a distributed system carries an implicit question: who sent this, and are they allowed to do what they are asking? Traditional perimeter security answered that question once at the network edge and then trusted everything inside. In cloud-native environments where workload IPs are ephemeral, container orchestrators reschedule pods across nodes, and multi-tenant traffic shares physical infrastructure, that answer is no longer good enough. As part of API gateway fundamentals and architecture, zero trust moves that verification into every request, at every hop, with cryptographic evidence — and makes the gateway the primary enforcement point.

This page covers the prerequisites and mental model you need, then goes deep on the two primary controls — mutual TLS and JWT-driven policy — with runnable configurations for Kong 3.x, Envoy 1.32+, and Tyk 5.x, followed by a comparative trade-off table and the operational gotchas that trip up production teams.

Architectural Baseline

Before implementing zero trust controls at the gateway, three mental model shifts are required.

Identity is not the same as connectivity. A client that can reach your gateway port has established network reachability, nothing more. Zero trust requires the client to additionally prove cryptographic identity (via a certificate or signed token) and authorization (via claims or policy evaluation). These are separate verification steps and must be enforced in that order.

The gateway is a policy enforcement point, not a policy decision point. Policy decisions — “is service A allowed to call endpoint /billing/v2 with scope write?” — belong in a dedicated engine like Open Policy Agent. The gateway enforces that decision by calling an external authorization sidecar and blocking or passing the request. Collapsing both roles into the gateway configuration creates brittle, hard-to-audit YAML that breaks under policy drift.
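The split between deciding and enforcing can be sketched in JavaScript. This is only an illustration: `decide`, `enforce`, and the request fields are hypothetical names rather than any gateway's API, and the in-process `decide` stub stands in for an external engine such as OPA.

```javascript
// Policy decision point (PDP): owns the rules. In production this would be
// an external engine such as OPA; here it is a hypothetical in-process stub.
function decide(input) {
  const rules = {
    "POST /billing/v2": "write",
    "GET /billing/v2": "read",
  };
  const required = rules[input.method + " " + input.path];
  return required !== undefined && input.scopes.includes(required);
}

// Policy enforcement point (PEP): the gateway. It forwards the facts it has
// verified to the PDP and acts on the answer; it holds no rules of its own.
function enforce(request, pdp) {
  const allowed = pdp({
    caller: request.verifiedIdentity,
    method: request.method,
    path: request.path,
    scopes: request.scopes,
  });
  return allowed
    ? { status: 200, action: "proxy" }
    : { status: 403, action: "block" };
}
```

Because `enforce` takes the decision function as an argument, the rules can change (or move to a remote engine) without touching the enforcement code.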

Certificate lifecycle is infrastructure, not configuration. Certificates that expire in production are among the most common causes of avoidable outages. Build automatic rotation, CA bundle overlap windows, and SPIFFE/SPIRE workload identity into your platform before you write a single gateway policy. The deep-dive on certificate rotation mechanics lives in implementing mTLS at the gateway edge.

The Zero Trust Request Lifecycle

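Before reaching an upstream microservice, a request travels through the full enforcement sequence: TLS handshake, mTLS certificate validation, OPA ext_authz call, JWT claim extraction, rate limiting, and finally the proxy hop to the upstream.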

Primary Concept: Mutual TLS and OPA External Authorization

mTLS Certificate Validation — Kong 3.x

Kong 3.x ships the mtls-auth plugin (an Enterprise plugin) at priority 1600, which ensures it executes before JWT validation (priority 1450) and rate limiting (priority 910). The plugin validates client certificates against a stored CA bundle and, optionally, maps the certificate subject to a Kong Consumer for fine-grained ACL control.

Step 1 — Register the CA certificate:

# Kong 3.x Admin API — register the SPIFFE CA bundle
curl -s -X POST http://localhost:8001/ca_certificates \
  --data-urlencode "cert@/etc/spiffe/ca-bundle.pem" \
  | jq '.id'
# Save the returned UUID — you'll reference it in the plugin config

Step 2 — Attach the plugin to a service:

# kong.yml — declarative format, Kong 3.x
_format_version: "3.0"
_transform: true

ca_certificates:
  - cert: |
      -----BEGIN CERTIFICATE-----
      <your-spiffe-ca-bundle>
      -----END CERTIFICATE-----
    id: "a1b2c3d4-e5f6-7890-abcd-ef1234567890"

services:
  - name: billing-api
    url: https://billing.internal.svc.cluster.local:8443
    routes:
      - name: billing-route
        paths: ["/api/billing"]
        protocols: ["https"]
    plugins:
      - name: mtls-auth
        config:
          ca_certificates:
            - "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
          skip_consumer_lookup: false
          revocation_check_mode: SKIP   # set to STRICT once CRL endpoint is live
          send_ca_dn: true
      - name: jwt
        config:
          # The jwt plugin accepts only "exp" and "nbf" here; enforce iss and aud in policy
          claims_to_verify: ["exp", "nbf"]
          key_claim_name: "sub"
          secret_is_base64: false
      - name: rate-limiting
        config:
          minute: 200
          hour: 5000
          policy: redis
          redis_host: redis.infra.svc.cluster.local

The send_ca_dn: true flag instructs Kong to include the CA distinguished name in the TLS CertificateRequest message during the handshake, so the client knows which CA to select from its keystore. Without this, clients with multiple installed certificates may send the wrong one.

Step 3 — Verify the handshake in Kong’s access log:

# Confirm mTLS is operating — look for the tls.client_cert field
curl -s --cert client.crt --key client.key --cacert ca.crt \
  https://gateway.example.com/api/billing/v1/invoices

# Kong structured log should include:
# "tls": {"client_cert": {"serial_number": "...", "subject": "spiffe://cluster.local/ns/billing/sa/invoicer"}}

OPA External Authorization — Envoy 1.32+

For fine-grained attribute-based policies, Envoy’s ext_authz filter delegates every access decision to an OPA sidecar running a Rego policy bundle. This decouples policy logic from gateway configuration and makes it auditable and version-controlled independently.

Envoy filter configuration (envoy.yaml, Envoy 1.32+):

# Envoy 1.32+ — listener with ext_authz + router filter chain
static_resources:
  listeners:
    - name: ingress
      address:
        socket_address: { address: 0.0.0.0, port_value: 8443 }
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: ingress_http
                http_filters:
                  # ext_authz MUST come before router
                  - name: envoy.filters.http.ext_authz
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.ext_authz.v3.ExtAuthz
                      grpc_service:
                        envoy_grpc:
                          cluster_name: opa_authz
                        timeout: 0.05s                 # 50 ms; durations use the protobuf "<seconds>s" form
                      failure_mode_allow: false        # deny on OPA timeout
                      transport_api_version: V3
                      with_request_body:
                        max_request_bytes: 8192
                        allow_partial_message: false
                      include_peer_certificate: true   # forwards mTLS cert to OPA
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router

  clusters:
    - name: opa_authz
      connect_timeout: 0.25s
      type: STRICT_DNS
      lb_policy: ROUND_ROBIN
      typed_extension_protocol_options:
        envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
          "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
          explicit_http_config:
            http2_protocol_options: {}
      load_assignment:
        cluster_name: opa_authz
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: 127.0.0.1
                      port_value: 9191

Corresponding OPA Rego policy (policy.rego):

package envoy.authz

import future.keywords.if

default allow := false

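# Allow when: peer SPIFFE ID is in the expected trust domain AND JWT scope covers the method
allow if {
    # Envoy reports the peer certificate's URI SAN in source.principal;
    # source.certificate carries the URL-encoded PEM, not parsed fields
    spiffe_id := input.attributes.source.principal
    startswith(spiffe_id, "spiffe://cluster.local/ns/")

    # io.jwt.decode_verify returns [valid, header, payload]
    [valid, _, payload] := io.jwt.decode_verify(
        trim_prefix(input.attributes.request.http.headers.authorization, "Bearer "),
        { "iss": "https://auth.internal", "cert": data.jwks_pem }
    )
    valid
    payload.scope == required_scope
}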

required_scope := "billing:write" if {
    input.attributes.request.http.method == "POST"
}

required_scope := "billing:read" if {
    input.attributes.request.http.method == "GET"
}

The key design decision here is failure_mode_allow: false. If the OPA sidecar is unavailable or times out, Envoy rejects the request rather than passing it through, returning the filter's status_on_error code (403 Forbidden by default). This is the correct default for a zero trust posture; address availability concerns by running OPA as a local sidecar (not a remote service) to minimize the failure window.
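The fail-closed behavior can be illustrated outside Envoy with a small JavaScript sketch. `callPolicy` is a hypothetical function standing in for the call to the policy engine; it is assumed to return a boolean and to throw on a timeout or connection error.

```javascript
// Fail-closed wrapper around a policy call. `callPolicy` is a hypothetical
// function that returns true/false, or throws on timeout or connection errors.
function authorizeFailClosed(callPolicy, request) {
  try {
    const allowed = callPolicy(request);
    // Only an explicit `true` allows the request; anything else denies.
    return allowed === true
      ? { allow: true }
      : { allow: false, reason: "denied" };
  } catch (err) {
    // Policy engine unreachable or timed out: block instead of passing through.
    return { allow: false, reason: "policy_unavailable" };
  }
}
```

Treating anything other than an explicit `true` as a denial means that a malformed or empty policy response also blocks the request.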

Secondary Concept: Claim-Based Routing and Header Sanitization

Once a request has passed cryptographic verification and policy evaluation, the gateway can use the validated JWT claims or mTLS Subject Alternative Names (SANs) to drive routing decisions. This is more reliable than routing by source IP or request path alone, because claims are signed and non-forgeable (assuming the signing key is protected). The broader mechanics of path and header-based routing show how these signals compose with versioning and tenant selection.

Tyk 5.x — Claim-Based Routing with Virtual Endpoints

Tyk 5.x virtual endpoints execute JavaScript at the gateway, giving you access to the validated session metadata (which Tyk populates from the JWT after its own validation step):

{
  "api_definition": {
    "name": "billing-gateway",
    "version_data": {
      "not_versioned": true,
      "versions": {
        "Default": {
          "extended_paths": {
            "virtual": [
              {
                "response_function_name": "routeByScope",
                "function_source_type": "blob",
                "function_source_uri": "",
                "path": "/api/billing",
                "method": "ANY",
                "function_source_value": "ZnVuY3Rpb24gcm91dGVCeVNjb3BlKHJlcXVlc3QsIHNlc3Npb24sIGNvbmZpZykge..."
              }
            ]
          }
        }
      }
    },
    "auth": {
      "auth_header_name": "Authorization",
      "use_param": false,
      "use_cookie": false
    },
    "jwt_signing_method": "rsa",
    "jwt_source": "https://auth.internal/.well-known/jwks.json",
    "jwt_identity_base_field": "sub",
    "jwt_policy_field_name": "policy_id",
    "strip_auth_data": true
  }
}

The JavaScript virtual endpoint function (decoded from the base64 function_source_value above):

// Tyk 5.x virtual endpoint — claim-based upstream routing
function routeByScope(request, session, config) {
  var scope = session.jwt_data && session.jwt_data.scope;
  var upstream;

  // Strip any client-injected identity headers before routing
  delete request.Headers["X-Real-Forwarded-For"];
  delete request.Headers["X-Service-Identity"];

  if (scope === "billing:write") {
    upstream = "https://billing-write.internal.svc.cluster.local:8443";
  } else if (scope === "billing:read") {
    upstream = "https://billing-read.internal.svc.cluster.local:8443";
  } else {
    return TykJsResponse({
      Body: JSON.stringify({ error: "insufficient_scope" }),
      Code: 403,
      Headers: { "Content-Type": ["application/json"] }
    }, session.meta_data);
  }

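  // Forward validated identity downstream. These are plain headers, not signed:
  // upstreams should accept them only on the mTLS hop from the gateway.
  request.Headers["X-Verified-Identity"] = [session.alias];
  request.Headers["X-Forwarded-Scope"] = [scope];

  // NOTE: a virtual endpoint answers the request itself. As written, `upstream`
  // is selected but never called; proxying to it needs an explicit outbound
  // call (for example via TykMakeHttpRequest) before building the response.
  return TykJsResponse({
    Body: request.Body,
    Code: 200,
    Headers: request.Headers
  }, session.meta_data);
}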
}

The critical step here is the header sanitization block: any header that encodes identity information and originates from the client (rather than from the gateway’s own verification) must be stripped before routing. Common injection vectors include X-Real-Forwarded-For, X-User-ID, X-Service-Identity, and X-Auth-Token — clients that can set these can impersonate any identity if the upstream blindly trusts them. The routing by API key vs JWT claims page covers the comparative trust model in detail.
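The sanitization step can be sketched independently of any gateway. This is a minimal illustration, assuming headers arrive as a plain object; `CLIENT_IDENTITY_HEADERS` and `stripIdentityHeaders` are hypothetical names, and the deny-list simply reuses the headers named above.

```javascript
// Hypothetical deny-list; extend it with every identity header your upstreams read.
const CLIENT_IDENTITY_HEADERS = [
  "x-real-forwarded-for",
  "x-user-id",
  "x-service-identity",
  "x-auth-token",
];

// Returns a copy of `headers` without client-supplied identity headers.
// Header names are compared case-insensitively, as HTTP requires.
function stripIdentityHeaders(headers) {
  const clean = {};
  for (const [name, value] of Object.entries(headers)) {
    if (!CLIENT_IDENTITY_HEADERS.includes(name.toLowerCase())) {
      clean[name] = value;
    }
  }
  return clean;
}
```

The case-insensitive comparison matters: a deny-list that matches only `X-User-ID` would let `x-user-id` through.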

Comparative Implementation Table

Gateway	mTLS Approach	JWT Validation	Policy Engine Integration	Key Trade-off
Kong 3.x	`mtls-auth` plugin (Enterprise, priority 1600); CA cert registered via Admin API	`jwt` plugin (priority 1450); RSA public key or shared secret stored as a consumer credential	KongPlugin `pre-function` calling OPA HTTP API, or Konnect OPA integration	Easiest ops model; plugin priorities make chain ordering explicit, but the order is fixed rather than freely configurable
Envoy 1.32+	`transport_socket` with `downstream_tls_context`; `require_client_certificate: true`	`jwt_authn` HTTP filter with remote JWKS; caches keys in cluster	`ext_authz` filter with gRPC OPA sidecar; `include_peer_certificate: true` passes cert to OPA	Most flexible and composable; filter ordering is manual — wrong order causes silent bypasses
Tyk 5.x	Mutual TLS enforced via `mutual_tls_auth` API definition field; CA certs uploaded to Tyk Dashboard	JWT middleware with RSA/ECDSA key; `jwt_policy_field_name` maps claims to Tyk policies	Virtual endpoint JS functions or OPA plugin (Enterprise); policy per JWT claim value	Fastest time-to-production; JavaScript virtual endpoints add latency (~1–3 ms per request)
NGINX + lua-resty-jwt	`ssl_verify_client on`; `ssl_client_certificate /etc/nginx/ca.crt` in server block	`lua-resty-jwt` library in `access_by_lua_block`; manual JWKS fetch or cached secret	OPA called via `ngx.location.capture` to a proxied OPA endpoint; or inline Rego via `lua-resty-opa`	Maximum configurability; no built-in plugin system means every control is hand-authored Lua

For a deeper gateway-by-gateway comparison across capability dimensions, see Kong vs Tyk vs Envoy for microservices.

Operational Gotchas

1. Certificate Chain Validation Order Is Not Obvious

Gateways validate client certificates against the CA bundle in configuration, but the precise validation semantics differ. Kong checks the full chain up to any registered CA: if you register only an intermediate CA and the client sends a leaf certificate without the intermediate, validation fails without an obvious log message. Always register both root and intermediate CAs. In Envoy, verify_certificate_spki checks only the public key hash, which is useful for pinning but does nothing to validate identity; SAN checks must be configured separately with match_typed_subject_alt_names (the replacement for the deprecated verify_subject_alt_name).

2. OPA Timeout on `ext_authz` Causes Cascading Failures

failure_mode_allow: false is the correct posture, but it means an OPA sidecar that becomes unresponsive turns 100% of traffic into error responses (403 by default, or whatever status_on_error is set to). Run OPA as a sidecar co-located in the same pod as Envoy (not as a separate deployment) so the failure domain matches the gateway instance. Set the ext_authz timeout to 50–100 ms and configure OPA’s decision log to async (not synchronous disk write) to avoid disk I/O blocking policy evaluation under load.

3. JWT `aud` Claim Validation Is Commonly Skipped

Gateway JWT plugins commonly validate the signature and exp but do not enforce aud (audience) unless explicitly configured. Without aud validation, a JWT issued for your internal analytics service is also accepted by your billing API, a classic confused deputy attack. In Envoy, set audiences on the jwt_authn provider. Kong's open-source jwt plugin accepts only exp and nbf in claims_to_verify, so enforce iss and aud there with the Enterprise openid-connect plugin (audience_required) or in the external policy engine. In Tyk, use the jwt_allowed_audiences field where your version supports it. See implementing JWT validation in Kong plugins for the Kong-specific configuration path.
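The audience check itself is small. A minimal sketch, assuming the token's signature and other claims have already been verified elsewhere; `audienceMatches` is a hypothetical helper, and `billing-api` and `analytics` are made-up audience values:

```javascript
// Checks the `aud` claim of an already signature-verified JWT payload.
// Per RFC 7519, `aud` may be a single string or an array of strings.
function audienceMatches(payload, expectedAudience) {
  const aud = payload.aud;
  if (typeof aud === "string") return aud === expectedAudience;
  if (Array.isArray(aud)) return aud.includes(expectedAudience);
  return false; // missing or malformed aud: reject
}

// A token minted for analytics must not be accepted by the billing API.
const analyticsToken = { sub: "reporter", aud: "analytics" };
const billingToken = { sub: "invoicer", aud: ["billing-api", "ledger"] };
console.log(audienceMatches(analyticsToken, "billing-api")); // false
console.log(audienceMatches(billingToken, "billing-api"));   // true
```

Rejecting tokens with no `aud` at all is a deliberate choice here: treating a missing audience as "valid everywhere" reintroduces the confused deputy problem.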

4. Header Injection via X-Forwarded-Client-Cert

Envoy uses the x-forwarded-client-cert (XFCC) header to propagate mTLS identity to the next hop so internal services can read the peer certificate details without re-validating them. The forward_client_cert_details setting on HttpConnectionManager controls this: the default, SANITIZE, drops the header entirely, while APPEND_FORWARD and ALWAYS_FORWARD_ONLY preserve whatever XFCC value arrived with the request. If your gateway sits behind another proxy, or accepts traffic from clients that can set XFCC themselves, those forwarding modes let a forged identity reach your services. Set forward_client_cert_details: SANITIZE_SET so the header is always replaced with the value Envoy itself verified, and list the fields to emit in set_current_client_cert_details.

5. Rate Limiting Scope Must Follow Identity, Not IP

Standard rate limiting and throttling keyed to client IP breaks behind NAT or shared egress gateways, where hundreds of services sharing one IP hit the limit collectively. Key rate limit counters to the mTLS certificate subject or the JWT sub claim. In Kong, the rate-limiting plugin’s limit_by: credential setting uses the authenticated credential (populated by mtls-auth) as the rate limit key. In Envoy, ext_authz can add a request header on allow (for example x-rate-limit-key), and a request_headers rate limit action can turn that header into the descriptor the ratelimit filter uses as the quota identifier.
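The difference between IP-keyed and identity-keyed limits can be shown with a small fixed-window counter. This is a sketch under stated assumptions, not any gateway's implementation: `request.certSubject` and `request.jwtSub` are hypothetical fields that the gateway would populate after mTLS and JWT validation, and the in-memory `Map` stands in for a shared store such as Redis.

```javascript
// Picks the rate-limit key from verified identity, never from the source IP.
function rateLimitKey(request) {
  if (request.certSubject) return "cert:" + request.certSubject;
  if (request.jwtSub) return "sub:" + request.jwtSub;
  return null; // unauthenticated: reject rather than fall back to IP
}

// Minimal fixed-window counter keyed by identity.
function createLimiter(limit, windowMs) {
  const windows = new Map();
  return function allow(key, nowMs) {
    const windowStart = Math.floor(nowMs / windowMs) * windowMs;
    const entry = windows.get(key);
    if (!entry || entry.windowStart !== windowStart) {
      // First request in this window for this identity.
      windows.set(key, { windowStart, count: 1 });
      return true;
    }
    entry.count += 1;
    return entry.count <= limit;
  };
}
```

Two callers behind the same NAT address get separate quotas because their keys differ, which is the property IP-based keys cannot provide.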

6. SPIFFE SAN Mismatch After Pod Rescheduling

SPIRE issues workload certificates whose SANs encode the Kubernetes namespace and service account, for example spiffe://cluster.local/ns/billing/sa/invoicer. If a workload is redeployed into a different namespace or under a different service account (after a namespace migration or a Helm chart rename), its SAN no longer matches the SPIFFE IDs that your authorization policies expect, and service-to-service calls start returning 401. Automate a post-deployment check that compares expected SPIFFE IDs against the SVID the workload actually received (spire-agent api fetch x509), and surface mismatches in your deployment pipeline before traffic shifts.
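The comparison step of such a check could look like the sketch below. It assumes the `ns/<namespace>/sa/<service-account>` path layout from the example above and leaves out how the actual ID is extracted from the `spire-agent` output; `parseSpiffeId` and `spiffeMismatches` are hypothetical helper names.

```javascript
// Parses a Kubernetes-style SPIFFE ID of the form
// spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>.
function parseSpiffeId(id) {
  const match = /^spiffe:\/\/([^/]+)\/ns\/([^/]+)\/sa\/([^/]+)$/.exec(id);
  if (!match) return null;
  return { trustDomain: match[1], namespace: match[2], serviceAccount: match[3] };
}

// Compares the expected SPIFFE ID with the one the workload actually received
// and returns a list of human-readable mismatches (an empty list means OK).
function spiffeMismatches(expectedId, actualId) {
  const expected = parseSpiffeId(expectedId);
  const actual = parseSpiffeId(actualId);
  if (!expected || !actual) return ["unparseable SPIFFE ID"];
  const problems = [];
  for (const field of ["trustDomain", "namespace", "serviceAccount"]) {
    if (expected[field] !== actual[field]) {
      problems.push(field + ": expected " + expected[field] + ", got " + actual[field]);
    }
  }
  return problems;
}
```

A pipeline step can fail the deployment whenever the returned list is non-empty and print the entries as the failure message.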

Production Configuration Checklist

TLS 1.3 enforced at the gateway listener; TLS 1.0 and 1.1 explicitly disabled
require_client_certificate: true set on every mTLS-protected listener; no fallback to unauthenticated
CA bundle includes both root and all intermediate CAs; rotation overlap window is at least 24 hours
revocation_check_mode set to STRICT once CRL or OCSP endpoint is operational
OPA deployed as a sidecar (same pod/node as gateway), not as a remote cluster dependency
failure_mode_allow: false on all ext_authz configurations
JWT aud claim validated explicitly, not just exp and iss
All client-supplied identity headers (X-User-ID, X-Service-Identity, X-Auth-Token) stripped before upstream proxying
forward_client_cert_details: SANITIZE_SET on Envoy HttpConnectionManager
Rate limit keys bound to authenticated credential (sub or cert subject), not source IP
SPIFFE SVID SAN validated in CI/CD deployment check after pod rescheduling events
OpenTelemetry spans emitted for each middleware stage; http.auth.status and tls.handshake.duration captured
Outlier detection configured on upstream clusters (consecutive_5xx: 5 with a 30-second base ejection time), with circuit breaker thresholds sized alongside it
Scaling limits and capacity planning reviewed for cryptographic operation overhead (TLS handshake CPU, OPA eval latency)

FAQ

What is the difference between mTLS and JWT for zero trust at the gateway?

mTLS authenticates the transport layer — it verifies that both the client and server hold certificates signed by a trusted CA, making it suitable for service-to-service identity. JWT operates at the application layer, carrying user or service identity claims that the gateway validates cryptographically. Production zero trust deployments use both: mTLS to establish connection-level trust, JWT to carry fine-grained authorization scopes that drive routing and policy decisions.

Where should OPA policy evaluation sit in the gateway middleware chain?

OPA (or any external authorization sidecar) must execute after TLS termination and client certificate validation but before JWT parsing or upstream proxying. If OPA runs after JWT parsing, a malformed token can trigger application errors before policy blocks the request. The correct order is: TLS handshake → mTLS cert validation → OPA ext_authz call → JWT claim extraction → rate limiting → proxy to upstream.
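The ordering can be sketched as a chain that stops at the first failing stage. This is an illustration only: `runChain` and the stage stubs are hypothetical, the request fields are made up, and the status codes are placeholders (a real mTLS failure, for instance, ends the TLS handshake rather than producing an HTTP status).

```javascript
// Runs enforcement stages in order and stops at the first failure.
// Each stage returns { ok: true } or { ok: false, status: <HTTP status> }.
function runChain(stages, request) {
  const passed = [];
  for (const [name, stage] of stages) {
    const result = stage(request);
    if (!result.ok) {
      return { allowed: false, status: result.status, failedAt: name, passed };
    }
    passed.push(name);
  }
  return { allowed: true, status: 200, passed };
}

// Stage order taken from the answer above; the checks themselves are stubs.
const stages = [
  ["mtls", (r) => (r.clientCertValid ? { ok: true } : { ok: false, status: 401 })],
  ["ext_authz", (r) => (r.policyAllows ? { ok: true } : { ok: false, status: 403 })],
  ["jwt_claims", (r) => (r.scope ? { ok: true } : { ok: false, status: 401 })],
  ["rate_limit", (r) => (r.underQuota ? { ok: true } : { ok: false, status: 429 })],
];
```

Because the loop returns on the first failure, a request denied by policy never reaches claim extraction or consumes rate limit quota.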

How do you rotate certificates without dropping live connections in production?

Use SPIFFE/SPIRE for workload identity and configure gateway certificate stores to accept certificates from both the expiring and the new CA bundle during the rotation window. In Kong 3.x, update the ca_certificates object via the Admin API — it takes effect immediately without a reload. In Envoy 1.32+, SDS (Secret Discovery Service) delivers rotated certs to the data plane without connection disruption. Maintain a minimum 24-hour overlap between old and new CA validity windows.
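The overlap requirement is easy to check mechanically before a rotation goes out. A minimal sketch, assuming you already have the old CA's notAfter and the new CA's notBefore as ISO 8601 strings; `overlapMs` and `hasSafeOverlap` are hypothetical helpers:

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Returns the number of milliseconds during which both CAs are valid.
// oldNotAfter: when the expiring CA stops being valid.
// newNotBefore: when the replacement CA starts being valid.
function overlapMs(oldNotAfter, newNotBefore) {
  const overlap = Date.parse(oldNotAfter) - Date.parse(newNotBefore);
  return Math.max(0, overlap);
}

// True when the overlap meets the minimum (24 hours by default).
function hasSafeOverlap(oldNotAfter, newNotBefore, minimumMs = DAY_MS) {
  return overlapMs(oldNotAfter, newNotBefore) >= minimumMs;
}
```

Running this check in the pipeline that publishes the new CA bundle catches a too-short window before any workload is issued a certificate the gateway cannot yet validate.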

What failure signal indicates a misconfigured mTLS policy on Envoy?

The most common signal is a flood of TLS alert code 42 (bad_certificate) on client connections, combined with rising listener TLS counters such as ssl.fail_verify_no_cert, ssl.fail_verify_error, and ssl.fail_verify_san. This means Envoy rejected the handshake before any request reached the upstream, typically because the client sent a certificate signed by a CA not present in the trusted_ca bundle, or because require_client_certificate: true is set on the DownstreamTlsContext but no trusted_ca is attached to its validation context.

Gateway Selection Criteria — capability matrix for choosing between Kong, Tyk, Envoy, and NGINX across security, performance, and ops dimensions
High Availability Topologies — maintaining policy consistency across geo-distributed ingress points
Authentication Proxying & Token Validation — how the gateway passes verified identity downstream to internal services
Protocol Translation Patterns — preserving security context when translating between gRPC, REST, and WebSocket
Multi-Tenant Routing Strategies — extending identity-based enforcement to per-tenant isolation and routing policies
