How Reverse Proxies Work: TLS Termination, Routing, and Connection Pooling

A reverse proxy is a server that sits in front of one or more backend servers, intercepting every client request before it reaches the origin. Unlike a forward proxy, which acts on behalf of clients to access external resources, a reverse proxy acts on behalf of servers to receive inbound traffic. Clients connect to the reverse proxy's IP address and never learn the identity or location of the actual backend. This fundamental indirection enables TLS termination, request routing, connection pooling, caching, load balancing, and security enforcement at a single choke point in the network path. Nearly every major website and API on the internet today sits behind a reverse proxy -- whether it is Nginx, HAProxy, Envoy, Caddy, or a cloud-managed load balancer. Understanding how reverse proxies work at the protocol level is essential for anyone building or operating production infrastructure.

Forward Proxy vs Reverse Proxy

The terms "forward proxy" and "reverse proxy" describe opposite orientations of the same concept: an intermediary that terminates a connection on one side and opens a new connection on the other. A forward proxy sits in front of clients. A reverse proxy sits in front of servers. The distinction matters because it determines who configures the proxy, who trusts it, and what it can see.

A forward proxy is configured by the client (or the client's network administrator). The client explicitly sends requests to the proxy, which then forwards them to the destination server. The server sees the proxy's IP address, not the client's. Corporate web filters, Tor exit nodes, and SOCKS proxies are all forward proxies. The client knows it is using a proxy; the server does not necessarily know.

A reverse proxy is configured by the server operator. The client connects to the proxy's public IP address, typically without any awareness that a proxy exists. DNS resolves the domain name to the reverse proxy's IP, not the backend's. The proxy terminates the client connection, inspects the request, makes routing decisions, and opens a separate connection to the chosen backend. The backend sees the proxy's IP address, not the client's -- which is why headers like X-Forwarded-For exist.

[Diagram: forward proxy vs reverse proxy. A forward proxy is client-configured and hides client identity: the destination servers see only the proxy's IP. A reverse proxy is server-configured and hides backend identity: clients see only the proxy's IP, and backends are unaware of the original client. Key difference: forward is client-side, reverse is server-side; both terminate the inbound connection and open a new one.]

The critical architectural point is that both types of proxy create two independent connections. The proxy terminates the inbound connection fully -- including TLS, if present -- and establishes a new connection to the destination. This connection boundary is what enables every capability a reverse proxy provides: it can inspect, modify, cache, rate-limit, and route traffic because it has full visibility into the decrypted request on both sides.

TLS Termination

TLS termination is the single most common reason organizations deploy a reverse proxy. The proxy holds the server's TLS certificate and private key, performs the TLS handshake with the client, and decrypts the traffic. Backend servers receive plaintext HTTP (or optionally re-encrypted traffic over a separate TLS connection, known as TLS re-encryption or backend TLS).

Centralizing TLS at the reverse proxy has several advantages. Certificate management is consolidated -- you rotate certificates in one place instead of on every backend. The proxy can use hardware acceleration (AES-NI instructions on modern CPUs) and optimized TLS libraries (BoringSSL, OpenSSL 3.x) to handle handshakes at scale. The proxy can enforce a uniform TLS policy: minimum TLS version (1.2 or 1.3), allowed cipher suites, OCSP stapling, and HSTS headers. Backends do not need to be TLS-aware at all, simplifying their configuration and reducing their attack surface.

The proxy terminates TLS at the socket level. When a client connects, the proxy performs the TCP three-way handshake, then the TLS handshake (ServerHello, certificate exchange, key exchange, Finished). Once the TLS session is established, every byte the client sends is decrypted by the proxy, parsed as HTTP, and then forwarded to the backend over a separate connection. If the backend connection also uses TLS, the proxy re-encrypts the data -- this is called TLS bridging. If the backend connection is plaintext, it is called TLS offloading.

# Nginx TLS termination example
server {
    listen 443 ssl http2;
    server_name api.example.com;

    ssl_certificate     /etc/ssl/certs/api.example.com.pem;
    ssl_certificate_key /etc/ssl/private/api.example.com.key;
    ssl_protocols       TLSv1.2 TLSv1.3;
    ssl_ciphers         ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384;
    ssl_session_cache   shared:SSL:10m;
    ssl_session_timeout 1h;

    location / {
        proxy_pass http://backend_pool;  # plaintext to backend
    }
}

TLS session resumption (session tickets or session IDs) is critical at scale. A full TLS 1.2 handshake requires two round trips; a resumed session requires one. TLS 1.3 reduces a full handshake to one round trip and supports 0-RTT resumption, though 0-RTT is vulnerable to replay attacks and must be used carefully. The reverse proxy manages session state and ticket keys, rotating them periodically for forward secrecy.

In multi-proxy deployments (multiple Nginx or HAProxy instances behind a load balancer), TLS session ticket keys must be synchronized across all instances. Otherwise, a client resuming a session on a different proxy instance will fall back to a full handshake, negating the performance benefit. Some deployments use shared ticket key files distributed via configuration management; others use a centralized key management service.

Request Routing: Host-Based, Path-Based, and Header-Based

Once TLS is terminated and the HTTP request is parsed, the reverse proxy must decide where to send it. This routing decision is the core intelligence of a reverse proxy. Routing rules can match on virtually any property of the HTTP request: the Host header, the URL path, query parameters, HTTP method, custom headers, cookies, source IP, or combinations of all of these.

Host-Based Routing (Virtual Hosting)

Host-based routing maps different domain names to different backend pools, all served from the same IP address. This is the modern equivalent of name-based virtual hosting in Apache. The reverse proxy inspects the Host header (or the :authority pseudo-header in HTTP/2) and selects the appropriate backend.

# HAProxy host-based routing
frontend https_front
    bind *:443 ssl crt /etc/ssl/certs/
    use_backend api_servers  if { hdr(host) -i api.example.com }
    use_backend web_servers  if { hdr(host) -i www.example.com }
    use_backend admin_servers if { hdr(host) -i admin.example.com }
    default_backend web_servers

With SNI (Server Name Indication), the reverse proxy can also route at the TLS layer -- before even decrypting the traffic. This enables TLS passthrough, where the proxy forwards the encrypted connection to a backend that terminates TLS itself. SNI-based routing inspects the ClientHello message to extract the requested hostname and routes accordingly, without needing the server's private key.
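As a concrete illustration, SNI extraction can be sketched in a few dozen lines of Python. This is a simplified parser for demonstration only -- real proxies handle fragmented records, GREASE values, and malformed input far more defensively -- and the builder function exists purely to produce a test input:

```python
import struct

def extract_sni(record):
    """Return the SNI hostname from a raw TLS ClientHello record, or None.
    Simplified: assumes one complete, well-formed handshake record."""
    if len(record) < 5 or record[0] != 0x16:    # 0x16 = handshake record
        return None
    pos = 5                                     # skip 5-byte record header
    if record[pos] != 0x01:                     # 0x01 = ClientHello
        return None
    pos += 4                                    # handshake type + 3-byte length
    pos += 2 + 32                               # client_version + random
    pos += 1 + record[pos]                      # session_id
    (cs_len,) = struct.unpack_from(">H", record, pos)
    pos += 2 + cs_len                           # cipher_suites
    pos += 1 + record[pos]                      # compression_methods
    (ext_total,) = struct.unpack_from(">H", record, pos)
    pos += 2
    end = pos + ext_total
    while pos + 4 <= end:
        ext_type, ext_len = struct.unpack_from(">HH", record, pos)
        pos += 4
        if ext_type == 0x0000:                  # server_name extension
            # list length (2) + name_type (1) + name length (2) + name bytes
            (name_len,) = struct.unpack_from(">H", record, pos + 3)
            return record[pos + 5 : pos + 5 + name_len].decode("ascii")
        pos += ext_len
    return None

def minimal_client_hello(hostname):
    """Build a minimal ClientHello carrying one SNI entry (test input only)."""
    name = hostname.encode("ascii")
    sni = struct.pack(">HBH", len(name) + 3, 0, len(name)) + name
    ext = struct.pack(">HH", 0, len(sni)) + sni
    body = (b"\x03\x03" + bytes(32)             # client_version + random
            + b"\x00"                           # empty session_id
            + b"\x00\x02\x13\x01"               # one cipher suite
            + b"\x01\x00"                       # null compression
            + struct.pack(">H", len(ext)) + ext)
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body
    return b"\x16\x03\x01" + struct.pack(">H", len(handshake)) + handshake
```

A passthrough proxy reads only this much of the stream, picks a backend from the hostname, then splices the untouched encrypted bytes through to it.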

Path-Based Routing

Path-based routing sends different URL paths to different backend pools. This is the standard pattern for decomposing a monolith into microservices without changing the public-facing URL structure:

# Nginx path-based routing
location /api/ {
    proxy_pass http://api_upstream;
}
location /static/ {
    proxy_pass http://cdn_upstream;
}
location /ws {
    proxy_pass http://websocket_upstream;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
}
location / {
    proxy_pass http://web_upstream;
}

Path matching semantics vary between proxies. Nginx uses a priority-based system: exact matches (=) beat prefix matches, and regex matches (~) beat generic prefix matches. Envoy evaluates routes in order and uses the first match. HAProxy uses ACLs with explicit use_backend directives. Understanding the matching semantics of your proxy is critical -- a misconfigured path rule can route traffic to the wrong backend or expose internal services.
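To make the difference concrete, here is a hypothetical Python model of Nginx-style priorities -- exact match first, then longest prefix, with ^~ suppressing the ordered regex pass. It illustrates the matching rules, not any proxy's actual implementation:

```python
import re

def route(rules, path):
    """Select a backend for 'path' from rules of the form (modifier,
    pattern, backend), where modifiers mirror nginx: '=' exact,
    '^~' priority prefix, '~' regex, '' plain prefix."""
    best_prefix = None
    for mod, pat, backend in rules:
        if mod == "=" and path == pat:
            return backend                       # exact match wins outright
        if mod in ("", "^~") and path.startswith(pat):
            if best_prefix is None or len(pat) > len(best_prefix[1]):
                best_prefix = (mod, pat, backend)
    if best_prefix and best_prefix[0] == "^~":
        return best_prefix[2]                    # ^~ suppresses regex check
    for mod, pat, backend in rules:
        if mod == "~" and re.search(pat, path):
            return backend                       # first matching regex wins
    return best_prefix[2] if best_prefix else None

rules = [
    ("",  "/",       "web_upstream"),
    ("",  "/api/",   "api_upstream"),
    ("~", r"\.png$", "static_upstream"),
    ("=", "/health", "health_upstream"),
]
```

With these rules, /api/users routes to api_upstream (longest prefix), /logo.png to static_upstream (regex beats a generic prefix), and /health to health_upstream (exact match).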

Header-Based and Advanced Routing

Beyond host and path, reverse proxies can route on arbitrary headers, cookies, and request properties. Common patterns include routing based on Accept headers (content negotiation), API version headers (X-API-Version), A/B testing cookies, geographic headers injected by an upstream CDN, or client certificate properties in mTLS deployments.

Envoy supports particularly expressive routing via its route configuration, including weighted routing (send 5% of traffic to a canary deployment), header-based matching with regex, and runtime-configurable route tables updated via xDS APIs without proxy restarts.

Connection Pooling and Multiplexing

Connection pooling is one of the most impactful performance optimizations a reverse proxy provides. Without a proxy, every client that connects to a backend server requires a dedicated TCP connection (and TLS handshake, if encrypted). With thousands of clients, the backend must maintain thousands of connections, each consuming kernel memory for socket buffers, file descriptors, and TLS state.

A reverse proxy collapses this fan-in. It terminates all client connections and multiplexes them over a small, persistent pool of connections to each backend. If 10,000 clients are connected to the proxy, the proxy might maintain only 50 connections to each backend server, reusing them across requests. This is possible because HTTP requests are short-lived: a client sends a request, the proxy forwards it over an existing backend connection, receives the response, and the backend connection is immediately available for the next request from any client.

# Nginx upstream connection pooling
upstream api_pool {
    server 10.0.1.10:8080;
    server 10.0.1.11:8080;
    server 10.0.1.12:8080;

    keepalive 64;              # maintain 64 idle connections per worker
    keepalive_requests 1000;   # max requests per connection before recycling
    keepalive_timeout 60s;     # close idle connections after 60s
}

Connection pooling interacts with HTTP protocol versions. With HTTP/1.1, each backend connection can process one request at a time (head-of-line blocking). The proxy needs enough pooled connections to handle concurrent requests to each backend. With HTTP/2, a single connection can multiplex hundreds of concurrent streams, so the proxy may need only one or two connections per backend. Envoy supports HTTP/2 to backends natively and can maintain a single multiplexed connection per upstream host, drastically reducing connection overhead.
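A rough way to reason about HTTP/1.1 pool sizing is Little's law: the average number of in-flight requests equals arrival rate times mean service time. This hypothetical helper just performs that arithmetic:

```python
import math

def http1_pool_size(rps, mean_latency_s, headroom=1.5):
    """Estimate how many HTTP/1.1 backend connections a proxy needs:
    average concurrency = requests/sec x mean response time (Little's
    law), padded by a headroom factor for bursts. Illustrative only."""
    return math.ceil(rps * mean_latency_s * headroom)
```

For example, 2,000 requests/sec at 50 ms mean latency implies about 100 concurrent requests, so roughly 150 pooled connections with 1.5x headroom -- whereas an HTTP/2 backend could absorb the same load over a handful of multiplexed connections.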

Connection lifecycle management matters. Long-lived pooled connections can encounter issues: backends may silently close idle connections, TCP keepalive timers may expire, or load balancers between the proxy and backend may drop idle connections. Production configurations set keepalive_timeout values lower than the backend's idle timeout to avoid sending requests on stale connections. Some proxies implement connection health checks at the pool level, proactively replacing connections that have been open too long.

Buffering and Streaming

Reverse proxies must decide how to handle request and response bodies: buffer them entirely before forwarding, or stream them byte-by-byte as they arrive. This decision has profound implications for latency, memory usage, and backend protection.

Full buffering means the proxy reads the entire request body from the client before forwarding any data to the backend. This protects backends from slow clients (the "slowloris" pattern) -- the backend connection is occupied only for the time it takes to process the request, not for the time it takes a slow mobile client to upload the body. Nginx buffers request bodies by default (controlled by proxy_request_buffering) and also buffers responses (controlled by proxy_buffering).

Streaming (unbuffered mode) forwards data as soon as it is received. This is necessary for use cases where buffering would break the protocol: WebSocket connections, server-sent events (SSE), gRPC streaming RPCs, chunked transfer encoding with unknown body sizes, and large file uploads where buffering would exhaust proxy memory.

# Nginx: disable buffering for streaming endpoints
location /events {
    proxy_pass http://sse_backend;
    proxy_buffering off;           # do not buffer responses
    proxy_request_buffering off;   # do not buffer requests
    proxy_http_version 1.1;
    proxy_set_header Connection "";
}

# Nginx: configure buffering limits for normal traffic
location /api/ {
    proxy_pass http://api_backend;
    proxy_buffering on;
    proxy_buffer_size 8k;         # buffer for first part of response (headers)
    proxy_buffers 16 8k;          # 16 buffers of 8k each = 128k max buffered
    proxy_busy_buffers_size 16k;  # send to client while still buffering
}

Response buffering has a particularly important interaction with backend performance. When the proxy buffers the response, the backend can write the entire response quickly and release its resources (thread, database connection, memory). The proxy then drains the buffered response to the client at whatever speed the client can accept. Without buffering, a slow client holds the backend connection open for the duration of the transfer, consuming backend resources. This is why Nginx enables response buffering by default -- it allows backends to serve more requests by decoupling backend processing time from client download speed.

Client Identity: X-Forwarded-For, X-Real-IP, and the Forwarded Header

When a reverse proxy terminates the client connection and opens a new connection to the backend, the backend sees the proxy's IP address as the source, not the original client's. This is a fundamental consequence of the two-connection architecture. To preserve client identity, the proxy injects headers that carry the original client information.

X-Forwarded-For (XFF)

X-Forwarded-For is the most widely used client identity header. It contains a comma-separated list of IP addresses, where each proxy in the chain appends the IP of the host that connected to it:

X-Forwarded-For: 203.0.113.50, 198.51.100.10, 10.0.0.1

In this example, 203.0.113.50 is the original client, 198.51.100.10 is the first proxy (perhaps a CDN edge), and 10.0.0.1 is the second proxy (perhaps an internal load balancer). To recover the client's real address, the backend walks the list from right to left, skipping known trusted proxies; the first untrusted entry is the client.

XFF is trivially spoofable. A malicious client can send a request with a pre-populated X-Forwarded-For header. If the backend naively reads the leftmost IP, the client controls what it sees. The correct approach is to configure the backend (or proxy) with a list of trusted proxy IPs and strip or ignore XFF entries from untrusted sources. Nginx implements this via the realip module with set_real_ip_from directives that define trusted proxy ranges.
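The trusted-proxy resolution logic can be sketched as follows. This is a hypothetical helper, not any framework's actual API: it walks the hop chain from the right, skipping known proxies, so attacker-supplied entries on the left are never trusted.

```python
import ipaddress

def client_ip(xff_header, peer_ip, trusted_nets):
    """Resolve the real client IP from an X-Forwarded-For header plus the
    direct peer address, given CIDR ranges of trusted proxies."""
    nets = [ipaddress.ip_network(n) for n in trusted_nets]
    def trusted(ip):
        return any(ipaddress.ip_address(ip) in n for n in nets)
    # The XFF entries plus the direct peer form the hop chain, nearest last.
    chain = [e.strip() for e in xff_header.split(",") if e.strip()] + [peer_ip]
    for hop in reversed(chain):
        if not trusted(hop):
            return hop          # first untrusted hop from the right = client
    return chain[0]             # entire chain trusted: take the leftmost entry
```

With trusted ranges of 10.0.0.0/8 and 198.51.100.0/24, the header from the example above resolves to 203.0.113.50 -- and a client that prepends a forged entry gains nothing, because forged entries sit to the left of its own untrusted address.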

X-Real-IP

X-Real-IP is a simpler, non-standard header that contains a single IP address -- the original client's. Unlike XFF, it does not accumulate entries as the request traverses multiple proxies. The first proxy in the chain sets it, and subsequent proxies typically pass it through unchanged. This header is popular in Nginx deployments:

proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;

The Standardized Forwarded Header (RFC 7239)

RFC 7239 introduced the Forwarded header as a standardized replacement for the ad-hoc X-Forwarded-* headers. It uses a structured syntax that encodes the client address, the protocol, the original host, and an identifier for each proxy in the chain:

Forwarded: for=203.0.113.50;proto=https;host=api.example.com;by=10.0.0.1

Multiple proxies add entries separated by commas. The for parameter can include IPv6 addresses (quoted and bracketed: for="[2001:db8::1]") and port numbers (for="203.0.113.50:8443"). Despite being an RFC standard, adoption of the Forwarded header has been slow -- most production deployments still rely on X-Forwarded-For because of its universal support across backend frameworks and middleware.
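Parsing the structured syntax is straightforward. A minimal sketch -- it handles quoted values but not RFC 7239's escaped characters:

```python
def parse_forwarded(value):
    """Parse an RFC 7239 Forwarded header into a list of dicts,
    one dict per proxy hop (comma-separated element)."""
    hops = []
    for element in value.split(","):
        hop = {}
        for pair in element.split(";"):
            if "=" not in pair:
                continue
            key, _, val = pair.partition("=")
            hop[key.strip().lower()] = val.strip().strip('"')
        hops.append(hop)
    return hops
```

The example header above parses to a single hop with for=203.0.113.50, proto=https, host=api.example.com, and by=10.0.0.1.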

X-Forwarded-Proto and X-Forwarded-Host

X-Forwarded-Proto tells the backend whether the original client connection used HTTP or HTTPS. This is essential when the proxy terminates TLS and forwards plaintext to the backend -- without this header, the backend cannot generate correct redirect URLs or set secure cookie flags. X-Forwarded-Host preserves the original Host header when the proxy changes it (for example, when rewriting api.example.com to an internal hostname like api-v2.internal:8080).

PROXY Protocol

The X-Forwarded-For family of headers works at Layer 7 (HTTP). But what about Layer 4 proxying, where the proxy forwards raw TCP without parsing HTTP? The PROXY protocol, developed by HAProxy's Willy Tarreau, solves this by prepending a small header to the TCP connection that carries the original client's IP address and port.

PROXY protocol version 1 is a human-readable text line prepended to the TCP stream:

PROXY TCP4 203.0.113.50 198.51.100.10 56324 443\r\n

This single line tells the backend: the original client was 203.0.113.50:56324, connecting to 198.51.100.10:443, over TCP/IPv4. After this line, the rest of the stream is the raw application data (which might be a TLS ClientHello, an SMTP greeting, or any other protocol).
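Parsing the v1 header is little more than splitting a line. A minimal Python sketch, assuming the whole header line has already arrived on the socket:

```python
def parse_proxy_v1(stream):
    """Strip a PROXY protocol v1 header off the front of a byte stream.
    Returns ((src_ip, src_port), (dst_ip, dst_port), remaining_bytes)."""
    line, sep, rest = stream.partition(b"\r\n")
    if not sep or not line.startswith(b"PROXY "):
        raise ValueError("not a PROXY v1 header")
    parts = line.decode("ascii").split(" ")
    # Format: PROXY <TCP4|TCP6|UNKNOWN> <src> <dst> <src_port> <dst_port>
    if parts[1] == "UNKNOWN":
        return None, None, rest
    _, _, src, dst, sport, dport = parts
    return (src, int(sport)), (dst, int(dport)), rest
```

Everything after the \r\n is the untouched application payload -- a TLS ClientHello, an SMTP greeting, or any other protocol.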

PROXY protocol version 2 uses a binary encoding that is more efficient to parse and supports additional metadata via TLV (Type-Length-Value) extensions. These extensions can carry information like the SNI hostname, the TLS version, the client certificate's distinguished name, and custom vendor-specific data.

# HAProxy: send PROXY protocol v2 to backend
backend mail_servers
    mode tcp
    server mail1 10.0.1.20:25 send-proxy-v2

# Nginx: accept PROXY protocol from upstream load balancer
server {
    listen 80 proxy_protocol;
    set_real_ip_from 10.0.0.0/8;
    real_ip_header proxy_protocol;
}

PROXY protocol is essential in architectures where an L4 load balancer (like an AWS Network Load Balancer or a bare-metal HAProxy in TCP mode) sits in front of the reverse proxy. Without it, the reverse proxy would see the load balancer's IP as the client IP and have no way to recover the original client address. The protocol is supported by Nginx, HAProxy, Envoy, AWS Network and Classic Load Balancers, and most modern proxy software.

Health Checks

A reverse proxy that routes traffic to multiple backends must know which backends are healthy. If a backend is down and the proxy sends traffic to it, every request fails. Health checks continuously probe backends and remove unhealthy ones from the routing pool.

Passive (Reactive) Health Checks

Passive health checks infer backend health from real traffic. If a backend returns errors (5xx responses, connection timeouts, connection refusals), the proxy marks it as unhealthy after a configurable threshold. Nginx implements this with the max_fails and fail_timeout parameters:

upstream api_pool {
    server 10.0.1.10:8080 max_fails=3 fail_timeout=30s;
    server 10.0.1.11:8080 max_fails=3 fail_timeout=30s;
}

After 3 failures within the 30-second window, the server is marked unavailable for the following 30 seconds. The limitation of passive checks is that they require actual traffic to detect failures -- a low-traffic backend might not be probed frequently enough to detect a failure promptly. And the requests that trigger the failure detection are themselves failed requests, which means some users experience errors before the proxy reacts.

Active Health Checks

Active health checks send synthetic probe requests to backends at regular intervals, independent of real traffic. These probes can be TCP connection checks (can the proxy open a connection?), HTTP checks (does GET /health return 200?), or custom checks (does the response body contain a specific string? does the response time meet an SLA?).

# HAProxy active health check
backend api_servers
    option httpchk GET /health HTTP/1.1\r\nHost:\ api.internal
    http-check expect status 200
    server srv1 10.0.1.10:8080 check inter 5s fall 3 rise 2
    server srv2 10.0.1.11:8080 check inter 5s fall 3 rise 2

The inter 5s parameter sends a probe every 5 seconds. fall 3 marks the server down after 3 consecutive failures. rise 2 marks it back up after 2 consecutive successes. This hysteresis prevents flapping -- a server that is intermittently failing will not bounce rapidly between up and down states.
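The fall/rise hysteresis can be modeled as a small state machine. A hypothetical sketch of the counting logic:

```python
class HealthTracker:
    """Hysteresis health state mirroring HAProxy's fall/rise counters:
    'fall' consecutive failures mark a server down, 'rise' consecutive
    successes mark it back up, so intermittent results do not flap."""
    def __init__(self, fall=3, rise=2):
        self.fall, self.rise = fall, rise
        self.healthy = True
        self.streak = 0    # consecutive probes contradicting current state

    def record(self, probe_ok):
        """Record one probe result; return current health state."""
        if probe_ok == self.healthy:
            self.streak = 0            # result agrees: reset the counter
        else:
            self.streak += 1
            threshold = self.rise if probe_ok else self.fall
            if self.streak >= threshold:
                self.healthy = probe_ok
                self.streak = 0
        return self.healthy
```

A single failed probe (or a single success while down) only increments a counter; the state flips only after a full run of consistent results.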

Envoy supports a particularly sophisticated health checking model: it can perform HTTP health checks with custom headers, gRPC health checks using the standard gRPC health checking protocol, and TCP health checks that verify expected response bytes. Envoy also supports outlier detection, which is a form of passive health checking that ejects hosts based on statistical analysis of error rates and latencies -- if a host's error rate is significantly higher than the cluster average, it is ejected temporarily.

Caching at the Reverse Proxy

A reverse proxy is an ideal location for caching HTTP responses. It sits between all clients and all backends, so a single cached response can serve thousands of clients without hitting the backend at all. This reduces backend load, decreases response latency, and improves availability -- if the backend goes down, the proxy can continue serving cached content.

Reverse proxy caching follows the HTTP caching model defined in RFC 7234 (now RFC 9111). The proxy respects Cache-Control directives from the backend: max-age sets the cache TTL, no-store prevents caching entirely, private prevents shared caches (like a proxy) from caching the response, s-maxage overrides max-age specifically for shared caches, and must-revalidate requires the proxy to check with the backend before serving a stale cached response.

# Nginx caching configuration
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=app_cache:100m
                 max_size=10g inactive=60m use_temp_path=off;

server {
    location /api/ {
        proxy_cache app_cache;
        proxy_cache_valid 200 10m;        # cache 200 responses for 10 minutes
        proxy_cache_valid 404 1m;         # cache 404s for 1 minute
        proxy_cache_use_stale error timeout updating http_500 http_502;
        proxy_cache_lock on;              # coalesce concurrent requests for same resource
        proxy_cache_key $scheme$host$request_uri;
        add_header X-Cache-Status $upstream_cache_status;
        proxy_pass http://api_upstream;
    }
}

The proxy_cache_use_stale directive is particularly important for availability. When a backend is down (error, timeout, http_502), the proxy serves the stale cached response instead of returning an error to the client. The proxy_cache_lock directive prevents the thundering herd problem: when multiple clients request the same uncached resource simultaneously, only one request is forwarded to the backend while the others wait for the cache to be populated.

Cache key design matters. The default key typically includes the scheme, host, and URI. But if your application varies responses based on headers (like Accept-Language or Authorization), the cache key must include those headers -- otherwise, the proxy will serve a cached English response to a French-speaking user, or a cached response for user A to user B. The Vary response header instructs the proxy which request headers affect the response.
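A sketch of Vary-aware cache key construction, using hypothetical names -- real proxies build keys from configured templates plus the stored Vary header, but the principle is the same:

```python
import hashlib

def cache_key(scheme, host, uri, request_headers, vary=()):
    """Build a cache key from scheme + host + URI, extended with the
    values of whichever request headers the backend's Vary response
    header named. request_headers keys are assumed lowercase."""
    parts = [scheme, host, uri]
    for header in sorted(h.lower() for h in vary):
        parts.append(f"{header}={request_headers.get(header, '')}")
    return hashlib.sha256("|".join(parts).encode()).hexdigest()
```

With Vary: Accept-Language in effect, French and English requests for the same URI produce different keys and therefore separate cache entries; without it, they would collide on a single entry.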

CDNs like Cloudflare, Fastly, and Akamai are essentially globally distributed reverse proxy caches. They operate on the same principles -- TLS termination, caching, request routing -- but distributed across hundreds of edge locations worldwide, using anycast and BGP to route clients to the nearest edge.

Reverse Proxy Deployment Patterns

Reverse proxies appear in several distinct deployment patterns, each serving a different architectural purpose.

Edge Proxy

The edge proxy is the most traditional pattern: a reverse proxy at the perimeter of the network, exposed to the public internet. It terminates TLS, enforces rate limits, blocks malicious requests, and routes traffic to internal services. The edge proxy is the single point of ingress for all external traffic. Nginx, HAProxy, and cloud load balancers (AWS ALB, GCP HTTPS LB) typically serve this role.

In a Kubernetes environment, the edge proxy is the Ingress controller. It maps external HTTP(S) traffic to internal Kubernetes Services based on Ingress resource rules (host-based and path-based routing). Popular Ingress controllers include Nginx Ingress, Contour (built on Envoy), Traefik, and HAProxy Ingress.

[Diagram: reverse proxy deployment patterns. Edge proxy: a single ingress point (Nginx, HAProxy, ALB, Cloudflare) applying TLS, WAF, and routing before traffic reaches API, web, and static backends. Sidecar proxy (service mesh): an Envoy instance per pod, intercepting traffic transparently via iptables and handling mTLS, retries, circuit breaking, and observability (Istio, Linkerd). API gateway: a reverse proxy plus business logic -- auth, rate limiting, transformation, versioning, analytics -- in front of user-facing services (Kong, Ambassador, Apigee).]

Sidecar Proxy

In the sidecar pattern, every service instance has its own dedicated reverse proxy running alongside it in the same network namespace (same pod in Kubernetes, same VM, or same container group). The sidecar intercepts all inbound and outbound traffic for its paired application, typically via iptables rules that redirect traffic transparently.

This is the foundation of the service mesh architecture. Envoy is the dominant sidecar proxy, used by Istio, and configured dynamically via xDS APIs. Each sidecar proxy handles mutual TLS, retry logic, circuit breaking, timeout enforcement, and telemetry collection. The application code does not need to implement any of these concerns -- the sidecar handles them transparently at the network layer.

The trade-off is latency and resource overhead. Every request traverses two additional proxy hops (source sidecar to destination sidecar), adding a few hundred microseconds to a few milliseconds of latency per hop. Each sidecar consumes CPU and memory. In large deployments with thousands of services, the aggregate cost of running thousands of Envoy sidecars is non-trivial.

API Gateway

An API gateway is a reverse proxy with application-aware intelligence layered on top. Beyond basic routing and TLS termination, an API gateway enforces authentication (OAuth 2.0 token validation, API key checks, JWT verification), rate limiting (per-client, per-endpoint, per-plan), request and response transformation (field filtering, format conversion, schema validation), API versioning (routing /v1/ and /v2/ to different backends), and analytics (request logging, latency tracking, error rate monitoring).

API gateways like Kong, AWS API Gateway, and Apigee are reverse proxies at their core -- they terminate connections, route traffic, and proxy to backends. The distinction is in the higher-level policies they enforce. Some API gateways are built directly on top of existing reverse proxies: Kong is built on Nginx, Ambassador is built on Envoy, and Gloo Edge is also Envoy-based.

Load Balancing Algorithms

When a reverse proxy has multiple backends for a given route, it must choose which backend receives each request. The selection algorithm is the load balancing strategy. Common algorithms include:

Round Robin distributes requests sequentially across backends. Simple and effective when backends are homogeneous. Weighted round robin assigns proportional traffic shares to backends with different capacities.

Least Connections sends each request to the backend with the fewest active connections. This naturally adapts to backends with varying response times -- slow backends accumulate connections and receive fewer new requests.

IP Hash hashes the client's IP address to select a backend, ensuring the same client always reaches the same server. Useful for session affinity without cookies, but produces uneven distribution when many clients share an IP (corporate NATs, CGNAT).

Random with Two Choices (P2C) picks two random backends and sends the request to the one with fewer active connections. This provides near-optimal load distribution with minimal coordination overhead and is the mechanism behind Envoy's least-request load balancer.

Consistent Hashing maps requests to backends using a hash ring, minimizing disruption when backends are added or removed. Only requests that hash to the added or removed segment of the ring are redistributed; all others continue to reach the same backend. This is critical for caching use cases where backend affinity determines cache hit rates.
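The remapping property of consistent hashing is easy to demonstrate. A minimal hash ring with virtual nodes -- illustrative only, not any proxy's actual implementation:

```python
import bisect
import hashlib

class HashRing:
    """Consistent hash ring with virtual nodes. Removing a backend remaps
    only the keys that hashed to it; every other key keeps its backend."""
    def __init__(self, backends, vnodes=100):
        self.ring = []                      # sorted (point, backend) pairs
        for b in backends:
            for i in range(vnodes):
                self.ring.append((self._hash(f"{b}#{i}"), b))
        self.ring.sort()
        self.points = [p for p, _ in self.ring]

    @staticmethod
    def _hash(key):
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def lookup(self, key):
        """Route key to the backend owning the next point clockwise."""
        i = bisect.bisect(self.points, self._hash(key)) % len(self.ring)
        return self.ring[i][1]
```

If a three-backend ring loses one backend, only the keys that previously landed on the removed backend move; comparing lookups before and after shows every other key unchanged, which is exactly what preserves cache hit rates.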

Connection Draining and Graceful Shutdown

When a backend needs to be taken out of rotation (for deployment, maintenance, or scaling down), active requests to that backend must not be interrupted. Connection draining is the process of allowing in-flight requests to complete while preventing new requests from being routed to the backend.

The proxy marks the backend as "draining" -- it stops sending new requests but does not forcibly close existing connections. A configurable timeout determines how long the proxy waits before forcibly closing remaining connections. In HAProxy, setting a server's weight to 0 or administrative state to "drain" achieves this. In Kubernetes, when a pod enters the Terminating state, the Ingress controller removes it from the backend pool while the pod's terminationGracePeriodSeconds allows in-flight requests to complete.

For WebSocket connections and long-lived streaming connections, draining can take minutes or hours. Production deployments set a hard-stop-after timeout that forcibly terminates connections that have not completed within the drain window, balancing availability against deployment velocity.

Security at the Reverse Proxy

The reverse proxy is the natural enforcement point for security policies because it sees every request before backends do.

Rate limiting restricts the number of requests a client can make within a time window. The proxy can rate-limit by source IP, API key, authenticated user, or any combination. Rate limiting at the proxy protects backends from abuse, DoS attacks, and misbehaving clients.
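Most proxy rate limiters are a variant of the token bucket. A minimal sketch -- the injectable clock is purely for testability:

```python
import time

class TokenBucket:
    """Per-client token bucket: 'rate' tokens/sec refill up to 'burst'.
    Each request consumes one token; an empty bucket means reject
    (typically surfaced to the client as HTTP 429)."""
    def __init__(self, rate, burst, now=time.monotonic):
        self.rate, self.burst, self.now = rate, burst, now
        self.tokens = float(burst)
        self.last = now()

    def allow(self):
        t = self.now()
        self.tokens = min(self.burst, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A proxy keeps one bucket per key (source IP, API key, or user) and checks allow() before forwarding; the burst parameter lets short spikes through while the rate bounds sustained throughput.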

WAF (Web Application Firewall) rules inspect request bodies, headers, and URL parameters for attack patterns: SQL injection, cross-site scripting, path traversal, and command injection. ModSecurity is a widely deployed WAF engine that integrates with Nginx and HAProxy.

Request size limits prevent clients from sending excessively large bodies that could exhaust backend memory. The proxy rejects oversized requests before they reach the backend.

Header sanitization strips or overwrites dangerous headers. The proxy should always overwrite X-Forwarded-For (not append to an attacker-controlled value), strip internal routing headers that clients should not be able to set, and enforce Host header validation to prevent DNS rebinding and host header injection attacks.

Client certificate validation in mTLS deployments verifies that the connecting client presents a valid certificate signed by a trusted CA. The proxy can extract the client identity from the certificate and pass it to the backend via a header, enabling strong authentication without the backend needing to handle TLS client certificate logic.

Observability and Logging

Because a reverse proxy handles every request, it is the single best vantage point for observability. The proxy can emit structured access logs with fields that no individual backend can provide: the total client-facing latency (including TLS handshake and proxy processing time), the upstream response time, cache hit/miss status, the backend server that handled the request, retry count, client TLS version and cipher suite, and request/response body sizes.

Envoy goes further with built-in support for distributed tracing (Zipkin, Jaeger, OpenTelemetry), Prometheus metrics export (histograms of latency, counters of requests by status code, gauges of active connections), and access log filtering (log only 5xx errors, or only requests above a latency threshold). In a service mesh, the sidecar proxies provide a complete map of service-to-service communication without any application instrumentation.

Key metrics to monitor on a reverse proxy include: requests per second (by backend, status code, and path), p50/p95/p99 latency, active connections, connection pool utilization, cache hit ratio, backend health check status, TLS handshake latency, and error rates (4xx, 5xx, connection resets). These metrics directly inform capacity planning, alerting, and incident response.

Reverse Proxies and BGP: The Network Layer Connection

Reverse proxies operate at Layer 7, but their reachability depends entirely on Layer 3 -- BGP routing. The proxy's public IP address is announced via BGP to the global routing table. If the BGP announcement is withdrawn (due to a misconfiguration, a fiber cut, or a BGP hijack), no client can reach the proxy, regardless of how well it is configured.

Large-scale reverse proxy deployments use anycast -- the same IP address is announced from multiple locations via BGP, and clients are routed to the topologically nearest instance. Cloudflare (AS13335) operates one of the largest anycast reverse proxy networks, with the same IP addresses announced from over 300 cities worldwide. When a client connects to a Cloudflare-proxied domain, BGP routing directs them to the nearest Cloudflare edge, where a reverse proxy terminates TLS, applies WAF rules, checks the cache, and if necessary, forwards the request to the origin server.

You can trace this entire path using BGP tools. The god.ad BGP Looking Glass lets you look up any IP address to see which AS announces it, what the AS path looks like, and how traffic routes across the internet to reach the reverse proxy that fronts the service. Understanding both the application layer (how the proxy handles the request) and the network layer (how BGP routes the packet to the proxy) gives you the complete picture of how internet services operate.
