How Envoy Proxy Works: The Service Mesh Data Plane
Envoy is a high-performance C++ service proxy designed for cloud-native applications. Originally built at Lyft to solve observability and reliability problems across a sprawling microservices architecture, it was donated to the CNCF in 2017 and has since become the default data plane for Istio, the foundation of AWS App Mesh, the core of Contour and Gloo Edge, and the proxy behind Google Cloud's Traffic Director. Envoy is not a traditional reverse proxy like Nginx or HAProxy. It was built from the ground up around dynamic configuration via APIs, first-class HTTP/2 and gRPC support, deep observability, and extensibility through a filter chain architecture. Internally, Envoy uses a non-blocking, event-loop-per-worker-thread model: each worker thread runs its own event loop (typically one worker per CPU core), and the request processing path takes no locks. This design eliminates contention between threads and is a key contributor to Envoy's high throughput and low tail latency under load. Understanding how Envoy works internally is essential for anyone operating a service mesh, building a platform on Kubernetes, or designing a modern load balancing tier.
Architecture: Listeners, Filter Chains, and Clusters
Envoy's architecture revolves around three core primitives: listeners, filter chains, and clusters. A listener binds to a network address (IP + port) and accepts incoming connections. Each connection passes through a filter chain — an ordered pipeline of processing stages. At the end of the pipeline, traffic is routed to a cluster, which is a named group of upstream endpoints (backend servers). Routes map incoming requests to clusters based on matching criteria like URL path, headers, or gRPC method name.
This separation is fundamental. Listeners define where traffic enters. Filter chains define how it is processed. Clusters define where it goes. Routes connect the processing to the destination. Each of these can be configured independently and updated at runtime via the xDS API without restarting the proxy.
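This separation can be made concrete with a minimal static bootstrap. The sketch below is illustrative only: the names ingress_http, my_service, and my-service.internal are placeholders, and a real mesh deployment would receive most of this over xDS rather than from a static file.

```yaml
static_resources:
  listeners:
  - name: ingress_http                      # where traffic enters
    address:
      socket_address: { address: 0.0.0.0, port_value: 8080 }
    filter_chains:
    - filters:                              # how traffic is processed
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          route_config:                     # routes connect processing to destination
            name: local_routes
            virtual_hosts:
            - name: backend
              domains: ["*"]
              routes:
              - match: { prefix: "/" }
                route: { cluster: my_service }
          http_filters:
          - name: envoy.filters.http.router # the router must be the last HTTP filter
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
  - name: my_service                        # where traffic goes
    type: STRICT_DNS
    load_assignment:
      cluster_name: my_service
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address: { address: my-service.internal, port_value: 8080 }
```

Each top-level block maps directly to one of the primitives: the listener, its filter chain, the route table, and the cluster can each be replaced independently at runtime.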
Listeners in Detail
A listener is Envoy's entry point for traffic. Each listener binds to a specific socket address and protocol. A typical Envoy deployment might have a listener on port 8080 for HTTP traffic, another on port 8443 for HTTPS with TLS termination, and an admin listener on port 9901 for the admin interface and health checks.
Listeners support listener filters that run before the connection is dispatched to a filter chain. The most common listener filter is envoy.filters.listener.tls_inspector, which peeks at the initial bytes of a connection to detect whether it is TLS and, if so, extracts the SNI (Server Name Indication) value. This allows a single listener to serve multiple domains with different TLS certificates, routing each connection to the appropriate filter chain based on the SNI hostname.
Filter chain matching on a listener uses criteria like the destination port, SNI, ALPN (Application-Layer Protocol Negotiation), source IP CIDR range, and transport protocol. This is how Envoy implements the equivalent of Nginx's server blocks or HAProxy's bind lines — but with dynamic reconfiguration via LDS (Listener Discovery Service).
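As a rough sketch of SNI-based filter chain selection (the hostnames and certificate paths are hypothetical, and the network filters inside each chain are elided):

```yaml
listeners:
- name: https_in
  address:
    socket_address: { address: 0.0.0.0, port_value: 8443 }
  listener_filters:
  - name: envoy.filters.listener.tls_inspector   # peeks at the ClientHello, extracts SNI
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.listener.tls_inspector.v3.TlsInspector
  filter_chains:
  - filter_chain_match:
      server_names: ["api.example.com"]          # matched against the extracted SNI
    transport_socket:
      name: envoy.transport_sockets.tls
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
        common_tls_context:
          tls_certificates:
          - certificate_chain: { filename: /etc/certs/api.crt }
            private_key: { filename: /etc/certs/api.key }
    filters: []                                  # HCM etc. elided for brevity
  - filter_chain_match:
      server_names: ["www.example.com"]          # different cert, different chain
    filters: []
```

One socket, multiple certificates, one chain selected per connection based on the SNI hostname.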
The HTTP Connection Manager
The HTTP Connection Manager (HCM) is Envoy's most important network filter. It bridges the gap between L4 (raw TCP bytes) and L7 (HTTP semantics). When a connection arrives on a listener and matches a filter chain, the HCM takes over: it parses the raw byte stream into HTTP requests, manages the codec (HTTP/1.1, HTTP/2, or HTTP/3), handles request/response lifecycle, and dispatches each request through the HTTP filter pipeline.
The HCM is where most of Envoy's HTTP intelligence lives. It handles:
- Codec negotiation — Automatically detects and handles HTTP/1.1, HTTP/2, and HTTP/3 (QUIC). For HTTP/2, it manages the full multiplexing lifecycle, tracking streams, flow control windows, and HPACK header compression. This is why Envoy has native gRPC support — gRPC is HTTP/2 with Protocol Buffer payloads, and Envoy handles the framing natively.
- Access logging — Every request/response pair can be logged with configurable format strings. Envoy's access log supports hundreds of format variables: response code, upstream response time, downstream TLS version, gRPC status, retry count, and more.
- Request ID generation and tracing — The HCM generates a unique x-request-id for each request and propagates distributed tracing headers (Zipkin B3, W3C Trace Context, OpenTelemetry). It can also initiate trace spans and report them to a tracing backend.
- Timeout management — Configurable idle timeouts, request timeouts, and stream timeouts. These are independent: a request timeout limits how long a single request can take, while an idle timeout closes connections with no active streams.
- HTTP/1.1 upgrade handling — Supports WebSocket upgrades and HTTP CONNECT for tunneling, allowing WebSocket connections to pass through the proxy transparently.
HTTP Filter Chains
Within the HCM, each request passes through an ordered list of HTTP filters. Filters are the primary extension mechanism in Envoy. They can inspect, modify, or reject requests and responses at the L7 layer. The last filter in every HTTP filter chain must be envoy.filters.http.router, which performs the actual upstream routing based on the route configuration.
Some of the most commonly used HTTP filters:
- envoy.filters.http.rbac — Role-based access control. Evaluates policies to allow or deny requests based on source identity, headers, or paths. This is how Istio implements mTLS-based authorization policies.
- envoy.filters.http.jwt_authn — JWT authentication. Validates JWT tokens against configurable JWKS endpoints, extracts claims, and makes them available for downstream filters and routing decisions.
- envoy.filters.http.ratelimit — Calls an external rate limiting service (typically implementing the Envoy rate limit proto) to enforce global rate limits across the fleet.
- envoy.filters.http.cors — CORS handling. Adds appropriate CORS headers based on configurable origins, methods, and headers.
- envoy.filters.http.fault — Fault injection. Injects configurable delays or aborts into the request path for chaos testing. You can inject a 5-second delay on 10% of requests to a specific route to test timeout handling in callers.
- envoy.filters.http.ext_authz — External authorization. Calls an external gRPC or HTTP service to authorize each request. The external service receives request headers and can add/remove headers before forwarding.
- envoy.filters.http.lua — Runs Lua scripts inline for custom request/response manipulation without compiling a custom Envoy binary.
- envoy.filters.http.wasm — Runs WebAssembly modules for custom logic; the modern alternative to Lua for extensibility.
The filter ordering matters. RBAC runs before the router to reject unauthorized requests early. Rate limiting runs before the router to prevent overloaded upstreams from receiving excess traffic. Fault injection runs before the router so it can short-circuit the request without ever contacting an upstream. Getting the order wrong can lead to subtle bugs — for instance, placing rate limiting after ext_authz means unauthenticated requests still consume rate limit quota.
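An ordering like the one described above might be sketched as follows (typed_config blocks omitted; the exact filter set depends on the deployment):

```yaml
http_filters:                         # evaluated top to bottom for each request
- name: envoy.filters.http.fault      # can short-circuit without touching an upstream
- name: envoy.filters.http.jwt_authn  # authenticate first...
- name: envoy.filters.http.rbac       # ...then authorize, rejecting early
- name: envoy.filters.http.ratelimit  # after authn, so anonymous requests don't burn quota
- name: envoy.filters.http.router     # always last: performs the upstream routing
```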
Clusters: Upstream Management
A cluster in Envoy represents a group of logically equivalent upstream hosts. Each cluster has its own load balancing algorithm, health checking configuration, connection pool settings, circuit breaking thresholds, and outlier detection parameters. Clusters are what Envoy routes traffic to after the filter chain and route matching have determined the destination.
Envoy supports several cluster types that determine how endpoints are discovered:
- STATIC — Endpoints are hardcoded in the configuration. Useful for fixed infrastructure like database proxies or external APIs.
- STRICT_DNS — Envoy resolves a DNS name and uses all returned A/AAAA records as endpoints. It re-resolves on a configurable interval, picking up changes as DNS records are updated.
- LOGICAL_DNS — Like STRICT_DNS, but only uses the first resolved address. Used for large DNS-based pools where you want connection affinity to a single host.
- EDS — Endpoints are provided dynamically via the Endpoint Discovery Service. This is the standard mode in service mesh deployments, where the control plane pushes the current set of healthy endpoints.
- ORIGINAL_DST — Routes to the original destination of the connection before it was redirected (by iptables) to Envoy. This is how the sidecar proxy in a service mesh handles connections to services that are not in the mesh.
The load balancing algorithms available per cluster include: round robin, least request (weighted by active request count), ring hash (consistent hashing for session affinity), Maglev (Google's consistent hashing algorithm for high performance), and random. For gRPC load balancing, least request is often preferred because gRPC uses long-lived HTTP/2 connections with multiplexed streams, and round-robin over connections would lead to badly skewed load distribution.
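A cluster definition combining EDS discovery with least-request balancing for a gRPC backend might look like this sketch (the cluster name is a placeholder):

```yaml
clusters:
- name: grpc_backend
  type: EDS                     # endpoints pushed dynamically by the control plane
  lb_policy: LEAST_REQUEST      # balances by active request/stream count, not connections
  eds_cluster_config:
    eds_config:
      ads: {}                   # receive endpoint updates over the shared ADS stream
      resource_api_version: V3
```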
The xDS API: Dynamic Configuration at Scale
The xDS (x Discovery Service) protocol is what makes Envoy fundamentally different from traditional proxies. Instead of reading a static config file and requiring a reload signal, Envoy connects to a management server (the control plane) via gRPC streams and receives configuration updates in real time. The entire proxy configuration — listeners, routes, clusters, endpoints, secrets — can be pushed from the control plane without any restarts or connection drops.
The xDS protocol family consists of:
- LDS (Listener Discovery Service) — Configures what ports Envoy listens on, which filter chains to use, and how to match incoming connections. When a new virtual host needs to be added, LDS pushes an updated listener configuration.
- RDS (Route Discovery Service) — Configures the routing table within the HTTP Connection Manager. Route configurations map incoming requests (by path, headers, gRPC method) to clusters. RDS updates allow traffic shifting (canary deployments, A/B testing) without changing listener configuration.
- CDS (Cluster Discovery Service) — Configures upstream clusters, their load balancing policies, health check parameters, and circuit breaking thresholds. When a new microservice is deployed, CDS pushes a new cluster definition.
- EDS (Endpoint Discovery Service) — Provides the actual IP addresses and ports of upstream hosts within each cluster. This is the most frequently updated xDS resource — as pods scale up and down in Kubernetes, EDS pushes the new endpoint list. EDS also supports locality-aware load balancing by tagging endpoints with zone/region metadata.
- SDS (Secret Discovery Service) — Delivers TLS certificates and private keys to Envoy. This is how service meshes implement automatic mTLS certificate rotation — the control plane generates short-lived certificates from a CA and pushes them to each sidecar via SDS, with no restart required.
- ECDS (Extension Config Discovery Service) — Pushes individual HTTP filter configurations independently of the listener or HCM. Allows updating a single Wasm filter or rate limit config without touching the rest of the configuration.
xDS uses two transport variants: State of the World (SotW) and Delta (Incremental). In SotW mode, every update sends the entire resource set — all clusters, all endpoints. This is simple but scales poorly when there are thousands of services. Delta xDS sends only the resources that changed (additions, modifications, removals), which is critical for large deployments. Istio switched to delta xDS in recent versions to reduce control plane load and Envoy memory churn.
There is also an Aggregated Discovery Service (ADS) that multiplexes all xDS resource types over a single gRPC stream. ADS provides ordering guarantees: the control plane can ensure that a CDS update (new cluster) is applied before the corresponding EDS update (endpoints for that cluster), and before the RDS update (routes pointing to that cluster). Without ADS, xDS updates are eventually consistent across a fleet — each Envoy instance processes updates from independent streams at its own pace, meaning two sidecars may briefly have different routing configurations. Race conditions between independent xDS streams can cause brief periods where routes reference clusters that do not yet exist. ADS eliminates these ordering issues at the cost of serializing all updates through a single stream.
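A bootstrap that wires all xDS resource types through a single ADS stream could be sketched like this. The control plane address and cluster name are assumptions; the one static cluster exists only so Envoy can reach the management server.

```yaml
dynamic_resources:
  ads_config:
    api_type: GRPC
    transport_api_version: V3
    grpc_services:
    - envoy_grpc: { cluster_name: xds_cluster }   # must be defined statically below
  lds_config:
    ads: {}                    # listeners arrive over the same ADS stream...
    resource_api_version: V3
  cds_config:
    ads: {}                    # ...as clusters, so ordering guarantees hold
    resource_api_version: V3
static_resources:
  clusters:
  - name: xds_cluster          # bootstrap cluster pointing at the control plane
    type: STRICT_DNS
    typed_extension_protocol_options:
      envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
        "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
        explicit_http_config:
          http2_protocol_options: {}   # xDS uses gRPC, which requires HTTP/2
    load_assignment:
      cluster_name: xds_cluster
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address: { address: control-plane.internal, port_value: 18000 }
```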
xDS in Practice: Istio's Control Plane
In an Istio service mesh, istiod is the xDS control plane. It watches the Kubernetes API server for changes — new deployments, services, endpoints, Istio custom resources (VirtualService, DestinationRule, AuthorizationPolicy) — and translates them into Envoy configuration pushed via xDS to every sidecar proxy in the mesh.
When you create an Istio VirtualService with a traffic split (90% to v1, 10% to v2), istiod generates an RDS update with weighted cluster routes and pushes it to every relevant Envoy sidecar. When a pod scales down and its endpoint is removed from the Kubernetes Endpoints object, istiod pushes an EDS update removing that endpoint. When an Istio PeerAuthentication policy requires mTLS, istiod generates the appropriate filter chain configuration via LDS and the certificates via SDS.
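The RDS output for a 90/10 split is conceptually a weighted-clusters route. A simplified sketch (cluster and host names are illustrative, and istiod's actual generated config carries much more metadata):

```yaml
virtual_hosts:
- name: reviews
  domains: ["reviews.example.com"]
  routes:
  - match: { prefix: "/" }
    route:
      weighted_clusters:
        clusters:
        - name: reviews-v1
          weight: 90          # 90% of traffic stays on v1
        - name: reviews-v2
          weight: 10          # 10% canaries to v2
```

Shifting traffic is then just an RDS push with new weights; no listener or cluster changes are required.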
The xDS protocol also supports NACK (negative acknowledgment) — if an Envoy sidecar receives an invalid configuration, it rejects the update and continues running with the last known good configuration. This is a critical safety mechanism that prevents a bad control plane push from taking down the entire data plane.
Circuit Breaking
Circuit breaking in Envoy prevents cascading failures by limiting the resources any single upstream cluster can consume. Unlike application-level circuit breakers (Hystrix-style open/closed/half-open state machines), Envoy's circuit breakers are resource-based thresholds that limit concurrent connections, concurrent requests, pending requests, and retries.
The four circuit breaker thresholds per cluster are:
- max_connections — Maximum number of TCP connections to the cluster. Once reached, new connections are queued or rejected. Default: 1024.
- max_pending_requests — Maximum number of requests waiting for an available connection from the connection pool. Once this queue fills, additional requests receive a 503. Default: 1024.
- max_requests — Maximum total active requests to the cluster. For HTTP/2 and gRPC, this limits total concurrent streams across all connections. Default: 1024.
- max_retries — Maximum concurrent retries to the cluster. This prevents retry storms from amplifying failures. Default: 3.
Circuit breaker thresholds are configured per priority level: Envoy distinguishes between default and high-priority traffic, each with its own thresholds. Health check traffic uses the high-priority pool so it is never starved by application traffic.
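Per-cluster, per-priority thresholds might be tuned like this sketch (the raised max_requests reflects a gRPC-style HTTP/2 cluster; the specific numbers are illustrative):

```yaml
circuit_breakers:
  thresholds:
  - priority: DEFAULT
    max_connections: 1024
    max_pending_requests: 1024
    max_requests: 4096        # raised for HTTP/2: streams, not connections, are the unit
    max_retries: 3            # caps concurrent retries to prevent retry storms
  - priority: HIGH            # separate pool, e.g., for health check traffic
    max_connections: 128
    max_requests: 256
```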
Circuit breaker events are exposed as counters in Envoy's statistics (cluster.<name>.circuit_breakers.default.cx_open, etc.) and can trigger alerts. When debugging a 503 cascade, checking whether any circuit breaker threshold has been hit is the first thing to look at. A common production mistake is focusing on max_connections for a gRPC cluster: since gRPC multiplexes many streams over a few connections, the cluster may need only 2-3 connections, and the max_requests limit is what actually governs its capacity.
Outlier Detection
While circuit breaking limits the total resources consumed by a cluster, outlier detection (also called outlier ejection) removes individual unhealthy hosts from the load balancing pool. It is Envoy's equivalent of a more traditional circuit breaker at the host level — automatically detecting misbehaving endpoints and temporarily removing them from rotation.
Outlier detection works by monitoring the behavior of each host in a cluster and ejecting hosts that exceed configured failure thresholds. The detection modes include:
- Consecutive errors — If a host returns 5 consecutive 5xx errors (or connection failures), it is ejected. The default threshold is 5.
- Success rate — Envoy calculates the success rate (percentage of non-5xx responses) for each host and ejects hosts whose success rate is more than one standard deviation below the cluster mean. This is statistical outlier detection — a host that is significantly worse than its peers gets removed, even if its absolute error rate might seem acceptable in isolation.
- Failure percentage — Similar to success rate, but uses a fixed threshold instead of statistical deviation. A host is ejected if its failure percentage exceeds a configured value (e.g., 30%).
- Locally-originated errors — Tracks connection failures (TCP connect errors, timeouts) separately from HTTP errors. This catches backends that are network-unreachable even if they never send an HTTP 5xx.
Ejected hosts are removed from the load balancing rotation for a configurable base ejection time (default: 30 seconds), which increases exponentially with each subsequent ejection. There is a maximum ejection percentage (default: 10%) that prevents Envoy from ejecting all hosts in a cluster, which would leave no backends to handle traffic. This safeguard means that if 50% of your backends are failing, outlier detection will eject up to 10% and continue routing to the remaining unhealthy hosts rather than dropping all traffic.
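The detection modes and safeguards above map to a cluster-level block along these lines (a sketch; values shown are the documented defaults or illustrative choices):

```yaml
outlier_detection:
  consecutive_5xx: 5            # eject after 5 consecutive 5xx/connect failures
  interval: 10s                 # how often the detection sweep runs
  base_ejection_time: 30s       # grows with each subsequent ejection of the same host
  max_ejection_percent: 10      # never eject more than 10% of the cluster
  enforcing_success_rate: 100   # statistical success-rate ejection fully enforced
```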
The interplay between outlier detection and circuit breaking is important. Outlier detection removes individual bad hosts, reducing the error rate seen by the circuit breaker. If enough hosts are ejected that the remaining hosts become overloaded and start hitting circuit breaker thresholds, the circuit breaker prevents additional requests from being sent. Together, they provide defense in depth: outlier detection handles partial failures, circuit breaking handles total overload.
Connection Pooling and HTTP/2 Multiplexing
Envoy manages connection pools to upstream clusters differently for HTTP/1.1 and HTTP/2. For HTTP/1.1, Envoy maintains a pool of connections to each upstream host and dispatches one request per connection at a time (unless HTTP pipelining is enabled, which it rarely is). For HTTP/2, Envoy uses a single connection (or a small number of connections) per upstream host and multiplexes all requests as concurrent streams on that connection.
This distinction matters for gRPC services. gRPC exclusively uses HTTP/2, so a single TCP connection carries all RPC streams. If you set max_connections: 1 on a gRPC cluster, Envoy will open exactly one HTTP/2 connection per host and multiplex potentially thousands of streams on it. The max_requests circuit breaker then controls how many concurrent streams are allowed. This is why gRPC clusters should have a high max_requests value and a relatively low max_connections value — the opposite of what you'd configure for an HTTP/1.1 REST API.
Envoy also supports upstream HTTP/2 connection pooling with max_concurrent_streams limits. If you configure a maximum of 100 streams per connection, Envoy will automatically open additional connections when the stream count exceeds the limit. This is useful for backends that cannot handle unbounded stream counts on a single connection.
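A per-stream cap on upstream HTTP/2 connections might be expressed like this sketch (cluster name hypothetical):

```yaml
clusters:
- name: grpc_backend
  typed_extension_protocol_options:
    envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
      "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
      explicit_http_config:
        http2_protocol_options:
          max_concurrent_streams: 100   # spill to a new connection beyond 100 streams
```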
Observability: Stats, Tracing, and Access Logs
Envoy was designed from the start with observability as a first-class concern, not an afterthought. Every connection, request, failure, retry, and timeout is tracked. The three pillars of Envoy observability are metrics (stats), distributed tracing, and access logs.
Statistics
Envoy emits thousands of metrics covering every aspect of its operation. Stats are hierarchically named with dot-separated components and come in three types: counters (monotonically increasing totals), gauges (instantaneous values), and histograms (distribution of values, typically latencies).
Key stat prefixes and what they measure:
- cluster.<name>.upstream_rq_total — Total requests to the cluster
- cluster.<name>.upstream_rq_2xx — Responses with 2xx status codes
- cluster.<name>.upstream_rq_5xx — 5xx responses (server errors)
- cluster.<name>.upstream_rq_time — Histogram of upstream response times
- cluster.<name>.upstream_cx_active — Current active connections to the cluster
- cluster.<name>.outlier_detection.ejections_active — Currently ejected hosts
- cluster.<name>.circuit_breakers.default.rq_open — Whether the request circuit breaker is open
- listener.<address>.downstream_cx_total — Total connections accepted on this listener
- http.<stat_prefix>.downstream_rq_total — Total HTTP requests received
- server.live — Whether Envoy considers itself healthy
Stats can be exported to Prometheus (via the built-in /stats/prometheus endpoint), StatsD, DogStatsD, or any other backend via stats sinks. In production, Prometheus scraping is by far the most common approach: each Envoy sidecar exposes a /stats/prometheus endpoint, and a Prometheus server scrapes it on a configured interval.
Distributed Tracing
Envoy supports distributed tracing out of the box. The HCM generates a trace span for each request and reports it to a configured tracing backend (Zipkin, Jaeger, Datadog, or OpenTelemetry Collector). In a service mesh, each sidecar generates a span, and when all spans for a request are assembled, they form a complete trace showing the request's path through the system with latency at each hop.
Envoy propagates trace context headers automatically — it reads and writes B3 headers (Zipkin), W3C Trace Context, and OpenTelemetry baggage. The application does not need to instrument its code for basic inter-service tracing. However, for intra-service tracing (spans within the application itself), the application must propagate the trace context headers on its outgoing requests. This is the main limitation of sidecar-based tracing: Envoy can only see inter-service hops, not what happens inside the application.
Access Logging
Envoy's access log subsystem records details about every request and response. The log format is configurable with command operators that expand to request attributes. A typical access log entry includes: response code, response flags (whether the request was retried, timed out, or circuit-broken), upstream host, upstream response time, downstream TLS version, and gRPC status code.
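A format like the one described might be configured as follows. This is a sketch using real command operators (%RESPONSE_FLAGS%, %UPSTREAM_HOST%, etc.); the exact field selection is a matter of taste.

```yaml
access_log:
- name: envoy.access_loggers.stdout
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
    log_format:
      text_format_source:
        inline_string: >
          [%START_TIME%] "%REQ(:METHOD)% %REQ(:PATH)%" %RESPONSE_CODE%
          %RESPONSE_FLAGS% upstream=%UPSTREAM_HOST% duration_ms=%DURATION%
          upstream_ms=%RESP(X-ENVOY-UPSTREAM-SERVICE-TIME)%
```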
Response flags are particularly useful for debugging. Some critical flags:
- UH — No healthy upstream (all hosts ejected or cluster empty)
- UF — Upstream connection failure
- UO — Upstream overflow (circuit breaker open)
- NR — No route configured for the request
- RL — Rate limited by the rate limit service
- DC — Downstream connection termination
- UT — Upstream request timeout
- LR — Connection local reset (Envoy reset the connection)
When a user reports a failed request, checking the Envoy access log for the response flag immediately tells you whether it was a routing problem (NR), a backend failure (UF), a capacity issue (UO), or a timeout (UT).
Health Checking
Envoy supports active and passive health checking. Active health checks send periodic probes to each upstream host — HTTP requests to a health endpoint, TCP connections, or gRPC health check RPCs (using the grpc.health.v1.Health protocol). Passive health checking is essentially outlier detection, as described above.
Active health checks and outlier detection interact: an actively health-checked host that fails becomes unhealthy and is removed from the load balancing rotation. An outlier-ejected host continues to receive active health checks, and if the active checks pass after the ejection period expires, the host is returned to service.
For gRPC services, Envoy can use the gRPC health checking protocol, sending grpc.health.v1.Health/Check RPCs to each upstream and evaluating the response. This is preferable to HTTP health checks for gRPC backends because it validates the entire gRPC stack (HTTP/2, gRPC framing, application logic), not just TCP connectivity.
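An active gRPC health check on a cluster might be sketched like this (the service_name value is a hypothetical gRPC service identifier passed in the Check request):

```yaml
health_checks:
- timeout: 1s
  interval: 5s
  unhealthy_threshold: 3      # consecutive failures before marking a host unhealthy
  healthy_threshold: 2        # consecutive passes before returning it to rotation
  grpc_health_check:
    service_name: my.package.OrderService
```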
Wasm Extensions: Custom Logic Without Recompilation
Envoy's Wasm (WebAssembly) extension system allows developers to write custom proxy logic in any language that compiles to Wasm (Rust, Go, C++, AssemblyScript) and load it into Envoy at runtime. This is the most significant extensibility feature added to Envoy since its inception, replacing the need to maintain custom Envoy builds with compiled-in C++ filters.
Wasm extensions run in a sandboxed VM within the Envoy process. They interact with Envoy through a well-defined ABI (the proxy-wasm ABI), which provides functions for reading and modifying request/response headers, body, trailers, accessing shared state, making HTTP calls, and emitting metrics and logs. The sandbox ensures that a buggy extension cannot crash the entire proxy — it can be terminated without affecting other requests.
A typical Wasm extension lifecycle:
- The extension is compiled to a .wasm binary (typically a few hundred KB to a few MB).
- The binary is loaded into Envoy via static configuration or dynamically via ECDS (Extension Config Discovery Service).
- Envoy instantiates a Wasm VM (V8 or Wasmtime) and loads the module.
- For each request, Envoy calls the extension's on_request_headers, on_request_body, on_response_headers, and on_response_body callbacks.
- The extension can read headers, modify them, add new headers, pause processing to make an async HTTP call, or return an immediate response (e.g., 403 Forbidden).
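Loading a compiled module as an HTTP filter might look like this sketch (the filter name and file path are placeholders):

```yaml
# inside the HCM's http_filters list, before the router
- name: envoy.filters.http.wasm
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
    config:
      name: my_filter
      vm_config:
        runtime: envoy.wasm.runtime.v8      # run the module on the V8 engine
        code:
          local:
            filename: /etc/envoy/my_filter.wasm
```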
Common use cases for Wasm extensions include: custom authentication protocols, request transformation and enrichment, data masking in access logs, custom rate limiting logic, and A/B testing header injection. Istio uses Wasm extensions for its telemetry and stats collection, replacing what used to be compiled-in Mixer adapters.
The performance overhead of Wasm compared to native C++ filters is approximately 2-10x slower for CPU-bound operations, but for typical filter workloads (header inspection, a few string comparisons, metric emission), the per-request cost is negligible — typically under 100 microseconds. The operational benefits of not maintaining custom Envoy builds almost always outweigh the performance cost.
Retry Policies
Envoy supports fine-grained retry policies at both the route and virtual host level. Retries are not a simple "retry on failure" — Envoy allows you to specify exactly which conditions trigger a retry, how many retries to attempt, and how long to wait between them.
Retry conditions can be combined:
- 5xx — Retry on any 5xx response from the upstream
- gateway-error — Retry on 502, 503, or 504
- connect-failure — Retry on TCP connection failures
- retriable-4xx — Retry on 409 Conflict (useful for optimistic concurrency)
- reset — Retry when the upstream resets the connection
- retriable-status-codes — Retry on a configurable set of status codes
- retriable-headers — Retry based on response header values
The retry budget (controlled by max_retries in circuit breaking) prevents retry storms. If the cluster's retry budget is exhausted, Envoy stops retrying even if individual routes have retries configured. This is a critical safeguard: without it, a failing backend could receive 3x its normal traffic from retries, making the failure worse.
Envoy also supports retry backoff with configurable base interval and maximum interval (defaulting to 25ms base with a 10x max). The actual delay is jittered randomly between 0 and the current backoff interval to prevent thundering herds. For hedged requests (speculative retries), Envoy can send a retry before the original request has failed, useful for latency-sensitive services where a P99 tail latency from one backend can be masked by a fast retry to another.
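Putting the conditions, retry count, and backoff together on a route might look like this sketch (the cluster name and timeout values are illustrative):

```yaml
route:
  cluster: my_service
  retry_policy:
    retry_on: "5xx,reset,connect-failure"   # comma-separated retry conditions
    num_retries: 3
    per_try_timeout: 2s                     # each attempt gets its own timeout
    retry_back_off:
      base_interval: 0.025s                 # 25ms, jittered between 0 and current backoff
      max_interval: 0.25s                   # capped at 10x the base
```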
Envoy in the Network: Real-World Deployments
Envoy operates at multiple layers of the network stack in production deployments. As a sidecar proxy in Kubernetes, it intercepts all pod traffic via iptables rules injected by an init container. As an ingress gateway, it terminates external TLS connections and routes traffic into the mesh. As an egress gateway, it controls and monitors outbound traffic from the mesh to external services.
In a typical Istio deployment on a cloud provider, external traffic hits a cloud load balancer (like AWS NLB or GCP External LB), which forwards to the Istio Ingress Gateway — an Envoy deployment running as a Kubernetes Deployment with a LoadBalancer Service. The ingress Envoy terminates TLS, applies routing rules from Istio Gateway and VirtualService resources, and forwards traffic to the destination pod's sidecar Envoy, which then delivers it to the application on localhost.
The BGP routing that underlies all of this is transparent to Envoy. When Envoy connects to an upstream endpoint at 10.244.1.5:8080, the Kubernetes CNI plugin handles the routing — possibly via VXLAN overlays, direct BGP peering (as with Calico), or eBPF programs. You can examine how traffic reaches a cloud provider's network by looking up the provider's ASN (e.g., Google's AS15169 or Amazon's AS16509) to understand the upstream network topology your Envoy proxies are operating within.
Rate Limiting
Envoy supports two rate limiting models: local rate limiting and global rate limiting. Local rate limiting uses a token bucket per Envoy instance — simple, fast, no external dependencies, but each instance limits independently so the aggregate limit scales linearly with the number of Envoy proxies. Global rate limiting calls an external gRPC rate limit service that maintains centralized counters, providing accurate fleet-wide limits at the cost of an additional network hop per request.
The global rate limit service receives descriptors from Envoy — key-value pairs that identify what to rate limit. For example, a descriptor might be [{"key": "remote_address", "value": "203.0.113.10"}, {"key": "path", "value": "/api/v1/orders"}]. The rate limit service looks up the configured limit for that descriptor set and returns either OK or OVER_LIMIT. Envoy then either forwards the request or returns 429 Too Many Requests.
Rate limit descriptors can be generated from request attributes: source IP, destination cluster, request headers, path, and more. This allows complex rate limiting policies like "100 requests per minute per API key to the /search endpoint" without any application-level rate limiting code.
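On a route or virtual host, descriptor generation is configured as actions. A sketch (the header name and descriptor values are hypothetical):

```yaml
rate_limits:
- actions:
  - remote_address: {}               # adds ("remote_address", <client IP>)
  - request_headers:
      header_name: x-api-key
      descriptor_key: api_key        # adds ("api_key", <header value>)
  - generic_key:
      descriptor_value: search       # adds a static ("generic_key", "search") entry
```

The external rate limit service then matches the resulting descriptor tuple against its configured limits.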
Envoy vs. Other Proxies
Envoy occupies a distinct niche in the proxy landscape. Compared to Nginx, Envoy has dynamic configuration (no reloads needed), native HTTP/2 support on both client and upstream sides, and built-in distributed tracing and rich metrics. Nginx is faster for static file serving and has a larger ecosystem of modules, but lacks native xDS support (Nginx's dynamic configuration requires the commercial NGINX Plus or manual API calls).
Compared to HAProxy, Envoy has better HTTP/2 and gRPC support, the xDS API for dynamic configuration, and Wasm extensibility. HAProxy has superior raw throughput for TCP proxying, more mature connection-level features (like stick tables for session persistence), and a simpler operational model for teams that do not need dynamic configuration.
Compared to Linkerd's proxy (linkerd2-proxy, written in Rust), Envoy is larger, more configurable, and more general-purpose. Linkerd's proxy is purpose-built for service mesh sidecar use cases and is significantly smaller in binary size and memory footprint. The trade-off is flexibility: Envoy can be configured to do almost anything, while linkerd2-proxy does one thing well.
The reason Envoy won the service mesh proxy war is the xDS API. By defining a standard configuration protocol, Envoy enabled an ecosystem of control planes (Istio, Consul Connect, AWS App Mesh, Gloo) that all speak the same language to the data plane. This is similar to how BGP became the dominant routing protocol not because it was the fastest, but because its standardized protocol allowed interoperability between every vendor and network.
Debugging Envoy in Production
When things go wrong with Envoy, the debugging process follows a consistent pattern. Start with the admin interface (typically on port 9901):
- /config_dump — Dumps the entire running configuration as JSON. Use this to verify that xDS pushed what you expected. Filter by resource type: /config_dump?resource=dynamic_listeners.
- /clusters — Shows all clusters, their endpoints, health status, and outstanding requests. This immediately reveals whether the right endpoints are registered and whether any are unhealthy.
- /stats — All counters, gauges, and histograms. Filter with /stats?filter=cluster.my_service. Look for upstream_rq_5xx, circuit_breakers.*.rq_open, and outlier_detection.ejections_active.
- /logging?level=debug — Temporarily enable debug logging. This is extremely verbose but shows every xDS update, connection event, and routing decision. Always set it back to info or warning after debugging.
- /ready — Whether Envoy is ready to accept traffic. During startup, Envoy will not report ready until it has received its initial xDS configuration from the control plane.
A common debugging scenario: a service returns 503s. Check the access log for the response flag. If it is UO (upstream overflow), the circuit breaker is open — check /stats for which threshold was hit. If it is UH (no healthy upstream), check /clusters to see if all endpoints were ejected by outlier detection. If it is NR (no route), check /config_dump to verify the RDS configuration includes a route for the request's path and Host header. If it is UF (upstream connection failure), the problem is at the network layer — check whether the upstream pod is running, whether the network policy allows the connection, and whether the upstream port is correct.
Understanding Envoy's internals is essential for anyone building or operating modern distributed systems. Whether you are running a service mesh, designing a load balancing tier, or building a platform that needs dynamic traffic management, Envoy's architecture — listeners, filter chains, clusters, xDS, circuit breaking, outlier detection, and Wasm extensibility — provides the building blocks. Its adoption across the industry, from Lyft's original deployment to Google's Traffic Director to every major service mesh, validates that the architecture is sound and the abstraction boundaries are right.