How HAProxy Works: High-Performance Load Balancing
HAProxy (High Availability Proxy) is the most widely deployed open-source load balancer and reverse proxy on the internet. It handles millions of concurrent connections for companies like GitHub, Reddit, Stack Overflow, Tumblr, and Airbnb. Unlike general-purpose web servers that bolt on proxying as an afterthought, HAProxy was designed from the ground up as a traffic management engine — its architecture reflects that singular focus. Understanding how HAProxy works at the connection, protocol, and configuration level is essential for anyone designing production infrastructure.
Architecture: The Event-Driven Engine
HAProxy runs as a single multi-threaded process (or optionally multiple processes in older versions) using an event-driven, non-blocking I/O model. Each thread runs its own event loop, processing connection events (accepts, reads, writes, timeouts) without blocking. This is fundamentally different from thread-per-connection models like Apache's prefork MPM — a single HAProxy process can handle hundreds of thousands of concurrent connections with minimal memory overhead.
The core abstraction is the session. When a client connects, HAProxy creates a session that tracks the full lifecycle: client-side connection, server-side connection (for L7), buffers, stick table entries, ACL evaluations, and logging metadata. Sessions flow through a pipeline of analyzers that inspect and transform traffic at each protocol layer.
Memory management is pool-based. HAProxy pre-allocates pools of fixed-size buffers and recycles them across connections. The default buffer size is 16KB, which means a single connection consumes roughly 32KB (one buffer for each direction). At 100,000 concurrent connections, that is about 3.2GB — well within the capacity of a modern server. Buffer sizes are tunable via tune.bufsize, but changing them affects all connections uniformly.
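Buffer size is a global tuning knob; as a sketch (the value here is illustrative, not a recommendation):

```
global
    # Halve per-connection memory relative to the 16KB default, at the cost
    # of rejecting requests whose headers exceed the buffer (HAProxy returns
    # a 400 for oversized requests and a 502 for oversized responses)
    tune.bufsize 8192
```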
HAProxy supports seamless reloads: when you send SIGUSR2 or use the -sf flag, the master process spawns a new worker that inherits listening sockets via SO_REUSEPORT. The old worker continues serving existing connections until they drain or hit hard-stop-after. No connections are dropped during reload — this is critical for zero-downtime deployments.
L4 vs L7: Two Modes of Operation
HAProxy operates in two distinct modes, configured per-proxy section: mode tcp (Layer 4) and mode http (Layer 7). The mode determines what HAProxy can see, what it can do, and the performance characteristics of the proxy.
Mode TCP (Layer 4)
In TCP mode, HAProxy operates as a Layer 4 load balancer. It accepts TCP connections from clients and forwards them to backend servers without inspecting the application-layer payload. The proxy sees the 4-tuple (source IP, source port, destination IP, destination port) and can make routing decisions based on that, plus any information gleaned from the TCP handshake itself.
TCP mode is used for non-HTTP protocols: database connections (MySQL, PostgreSQL, Redis), mail servers (SMTP, IMAP), gRPC without HTTP inspection, custom binary protocols, and TLS passthrough where the backend must terminate TLS itself. In TCP mode, HAProxy can still inspect the TLS ClientHello to extract the SNI (Server Name Indication) field, enabling routing based on the requested hostname without decrypting the traffic.
frontend mysql_front
mode tcp
bind :3306
default_backend mysql_servers
backend mysql_servers
mode tcp
balance roundrobin
server db1 10.0.1.10:3306 check
server db2 10.0.1.11:3306 check backup
Mode HTTP (Layer 7)
In HTTP mode, HAProxy fully parses the HTTP request and response. It terminates the client's TCP connection (and optionally TLS), reads the HTTP headers and body, applies rules, selects a backend, and opens a separate connection to the chosen server. This is a full reverse proxy architecture with two independent TCP connections.
HTTP mode enables content-aware routing: decisions based on URL path, Host header, query parameters, cookies, HTTP method, request body content, and custom headers. It also enables connection pooling, where HAProxy maintains persistent connections to backends and multiplexes many client requests over fewer server connections. With HTTP/2 support, HAProxy can accept HTTP/2 from clients while speaking HTTP/1.1 to backends (or vice versa), acting as a protocol translator.
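A minimal sketch of this protocol translation, accepting HTTP/2 via ALPN on the frontend while backends speak HTTP/1.1 (hostnames, certificate path, and addresses are hypothetical):

```
frontend web_in
    mode http
    bind :443 ssl crt /etc/ssl/certs/site.pem alpn h2,http/1.1
    default_backend app_servers

backend app_servers
    mode http
    # servers speak plain HTTP/1.1; adding "proto h2" to a server line
    # would make HAProxy speak HTTP/2 upstream instead
    server app1 10.0.0.10:8080 check
    server app2 10.0.0.11:8080 check
```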
Configuration Model: Frontend, Backend, Listen
HAProxy's configuration is organized around three core proxy sections:
- frontend — Defines a listening socket and the rules for accepting client connections. A frontend binds to one or more IP:port pairs, applies ACLs and rate limits, and routes requests to backends via use_backend rules.
- backend — Defines a pool of servers and the algorithm used to distribute requests among them. Backends contain server definitions, health check configuration, load balancing algorithm, and connection settings.
- listen — A shorthand that combines a frontend and backend into a single section. Useful for simple configurations where you do not need separate frontend routing logic.
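For example, a single listen section (addresses hypothetical) stands in for a frontend/backend pair:

```
listen redis_lb
    mode tcp
    bind :6379
    balance leastconn
    server redis1 10.0.2.10:6379 check
    server redis2 10.0.2.11:6379 check
```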
A frontend can route to multiple backends based on ACL conditions. This is where HAProxy's power as a traffic router becomes apparent — a single frontend listening on port 443 can serve dozens of different applications, each with its own backend pool, health checks, and balancing algorithm.
frontend https_in
bind :443 ssl crt /etc/ssl/certs/combined.pem alpn h2,http/1.1
mode http
# ACL definitions
acl is_api path_beg /api/
acl is_grpc req.hdr(content-type) -m beg application/grpc
acl is_websocket hdr(Upgrade) -i websocket
acl rate_abuse sc0_http_req_rate gt 100
# Rate limiting via stick table
http-request track-sc0 src table per_ip_rates
http-request deny deny_status 429 if rate_abuse
# Routing rules (evaluated in order, first match wins)
use_backend grpc_servers if is_grpc
use_backend ws_servers if is_websocket
use_backend api_servers if is_api
default_backend web_servers
Load Balancing Algorithms
HAProxy implements several load balancing algorithms, each suited to different traffic patterns. The algorithm is configured per-backend with the balance directive.
Round Robin
balance roundrobin cycles through servers sequentially. It respects server weights: a server with weight 2 receives twice as many connections as one with weight 1. This is the default and works well when requests are roughly uniform in cost. HAProxy's roundrobin is dynamic — servers can be added, removed, or have their weights changed at runtime via the stats socket without restarting.
Least Connections
balance leastconn routes each new connection to the server with the fewest active connections. This is superior to round robin when request durations vary significantly (e.g., some API calls take 50ms and others take 5 seconds). It naturally avoids overloading slow servers. Weights are also respected: a server with weight 2 is considered "half full" compared to its raw connection count.
Source Hash
balance source hashes the client's source IP address to deterministically select a server. The same client IP always reaches the same backend server (assuming the server pool is stable). This provides a crude form of session persistence without cookies, but it breaks when clients share IP addresses (corporate NATs, mobile carriers) and causes uneven distribution when traffic is concentrated in a few source IPs.
URI Hash
balance uri hashes the request URI to route requests for the same resource to the same server. This is valuable when backends maintain per-URI caches — hashing ensures that cache hits are maximized by keeping the same URIs on the same servers. The whole parameter hashes the full URI including query string; without it, only the path is hashed.
Consistent Hashing
balance hash with hash-type consistent uses a consistent hashing ring. When a server is added or removed, only 1/N of the keys are remapped (where N is the number of servers). Without consistent hashing, adding a server remaps nearly all keys, which is catastrophic for cache-heavy workloads. The hash input can be a header, cookie, URL parameter, or arbitrary sample expression.
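A sketch of consistent hashing for a caching backend, hashing on the request path (backend name and addresses are hypothetical):

```
backend cache_servers
    # hash the request path onto a consistent ring, so adding or removing
    # a server remaps only ~1/N of the keyspace
    balance hash path
    hash-type consistent
    server cache1 10.0.4.10:8080 check
    server cache2 10.0.4.11:8080 check
    server cache3 10.0.4.12:8080 check
```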
Random
balance random selects a server randomly, weighted by server weights. The random(N) variant picks N random servers and selects the one with fewest connections (power of N choices). random(2) gives near-optimal distribution with minimal overhead — it avoids the thundering-herd problem of leastconn where many connections simultaneously route to the same server.
ACLs: The Decision Engine
ACLs (Access Control Lists) are HAProxy's mechanism for making conditional decisions about traffic. An ACL defines a test that evaluates to true or false based on the current request, connection, or session state. ACLs can then be combined with boolean logic and referenced in routing, blocking, and header manipulation rules.
ACL Syntax and Fetch Functions
An ACL has three components: a name, a fetch function (what data to examine), and a match (what value to compare against). Fetch functions pull data from every layer of the connection:
- Layer 3/4: src (client IP), dst (destination IP), src_port, dst_port
- TLS: ssl_fc_sni (SNI hostname), ssl_fc_protocol (TLS version), ssl_c_s_dn (client cert subject DN for mTLS)
- HTTP request: path, path_beg, path_end, hdr(name), method, url_param(name), req.body
- HTTP response: status, res.hdr(name)
- Stick tables: sc0_http_req_rate, sc0_conn_cur, sc0_gpc0 (general purpose counters)
- Environment: nbsrv(backend) (number of active servers in a backend), avg_queue, connslots
Match functions support exact match, prefix (-m beg), suffix (-m end), substring (-m sub), regex (-m reg), and IP address/CIDR matching (-m ip). You can also load match values from files for large rule sets:
# Block known bad IPs loaded from a file
acl bad_actors src -f /etc/haproxy/blocklist.txt
http-request deny if bad_actors
# Route based on path prefix
acl is_api path_beg /api/ /v2/ /graphql
acl is_static path_end .css .js .png .jpg .woff2
# Combine ACLs with boolean logic
use_backend api_servers if is_api !is_static
use_backend cdn_origin if is_static
ACL Processing Order
HAProxy evaluates use_backend rules in the order they appear in the configuration. The first matching rule wins. If no rule matches, the default_backend is used. This sequential evaluation model is simple but powerful — complex routing logic is expressed as an ordered list of rules rather than a nested conditional tree.
For http-request and http-response rules, the processing order is: http-request rules execute before backend selection, and http-response rules execute after the backend responds. This allows you to modify requests before routing and responses before returning them to the client.
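A minimal illustration of the two phases (directives are real; names and addresses are hypothetical):

```
frontend web_in
    mode http
    bind :80
    # runs before backend selection, so the chosen backend sees the header
    http-request set-header X-Forwarded-Proto http
    # runs after the backend responds, before the client sees the response
    http-response del-header Server
    default_backend app_servers
```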
Stick Tables: Stateful Traffic Intelligence
Stick tables are HAProxy's in-memory key-value stores that track per-client or per-session state across connections. They are the foundation for rate limiting, abuse detection, session persistence, and connection tracking. Unlike external tools (Redis, memcached), stick tables are built into HAProxy's event loop with zero external dependencies and sub-microsecond lookup times.
Table Structure
A stick table is defined with a key type, a maximum size, and an expiry. Supported key types include ip, ipv6, integer, string, and binary. Each entry can store multiple counters and rates:
backend per_ip_rates
stick-table type ip size 1m expire 10m \
store http_req_rate(10s),conn_cur,conn_rate(10s),gpc0,gpc0_rate(60s),bytes_out_rate(60s)
This creates a table keyed by client IP address with room for 1 million entries, each expiring after 10 minutes of inactivity. Each entry tracks:
- http_req_rate(10s) — HTTP requests per second over a 10-second sliding window
- conn_cur — Current number of concurrent connections from this IP
- conn_rate(10s) — New TCP connections per second
- gpc0 / gpc0_rate — General purpose counter (used for custom logic like failed login tracking)
- bytes_out_rate(60s) — Bandwidth consumption per second
Rate Limiting with Stick Tables
The combination of stick tables and ACLs creates a powerful rate limiting system. Here is a real-world example that implements tiered rate limiting:
frontend https_in
bind :443 ssl crt /etc/ssl/cert.pem
# Track client IP at connection time: tcp-request connection rules run
# before any http-request rule, so tracking must happen at this stage
# for the connection-level reject below to see the counters
tcp-request connection track-sc0 src table per_ip_rates
# Tiered rate limiting
acl rate_soft sc0_http_req_rate(per_ip_rates) gt 50
acl rate_hard sc0_http_req_rate(per_ip_rates) gt 200
acl too_many_conns sc0_conn_cur(per_ip_rates) gt 30
acl bandwidth_hog sc0_bytes_out_rate(per_ip_rates) gt 10000000
# Hard block at 200 req/s
http-request deny deny_status 429 if rate_hard
# Tarpit (slow down) at 50 req/s
http-request tarpit if rate_soft
# Block connection floods
tcp-request connection reject if too_many_conns
The tarpit action is notable: instead of immediately rejecting the request, HAProxy holds the connection open for a configurable period (set with timeout tarpit) before returning an error. This ties up the attacker's resources without consuming significant HAProxy resources, since tarpitted connections sit in a near-idle state.
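The hold duration is a proxy-level timeout; as a sketch:

```
defaults
    # hold tarpitted requests for 10 seconds before returning the error
    timeout tarpit 10s
```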
Stick Table Replication
In multi-instance deployments, stick tables can be replicated between HAProxy peers. Each HAProxy instance pushes table updates to its configured peers, keeping rate counters and session data synchronized across the cluster. This is configured with a peers section:
peers mycluster
peer haproxy1 10.0.1.1:1024
peer haproxy2 10.0.1.2:1024
backend per_ip_rates
stick-table type ip size 1m expire 10m \
store http_req_rate(10s),conn_cur peers mycluster
Replication is asynchronous and eventually consistent — there is a small window where a client can exceed limits by splitting traffic across instances before counters sync. For most rate limiting use cases, this is acceptable.
Health Checks
HAProxy's health checking determines which backend servers are eligible to receive traffic. It supports multiple check types at different protocol layers, and the health check configuration directly affects failover behavior and responsiveness.
TCP Health Checks
The simplest check is a TCP connection attempt (check on the server line). HAProxy opens a TCP connection to the server; if the three-way handshake completes, the server is considered up. If the connection is refused or times out, it is marked down. This verifies that the process is listening but tells you nothing about application health.
HTTP Health Checks
An HTTP check sends a real HTTP request and inspects the response:
backend api_servers
mode http
balance leastconn
option httpchk
http-check send meth GET uri /health ver HTTP/1.1 hdr Host api.example.com
http-check expect status 200
server api1 10.0.1.10:8080 check inter 3s fall 3 rise 2
server api2 10.0.1.11:8080 check inter 3s fall 3 rise 2
server api3 10.0.1.12:8080 check inter 3s fall 3 rise 2
The inter 3s parameter sets the check interval to 3 seconds. fall 3 means the server is marked down after 3 consecutive failed checks (9 seconds total). rise 2 means it is marked back up after 2 consecutive successes (6 seconds). These parameters control the trade-off between failover speed and false positives.
Agent Health Checks
Agent checks connect to a separate port on the backend server where an agent process reports the server's health and desired weight. The agent returns a simple text string like up 75% (up, at 75% weight) or drain (stop sending new connections). This allows the application to dynamically adjust its own load: if a server is experiencing garbage collection pressure, it can reduce its weight temporarily without requiring a full restart.
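Agent checks are enabled per server; a sketch with a hypothetical agent listening on port 9999 (the agent-check, agent-port, and agent-inter keywords are real):

```
backend api_servers
    mode http
    # regular health check plus an agent consulted every 5 seconds;
    # the agent's reply can adjust weight or drain the server
    server api1 10.0.1.10:8080 check agent-check agent-port 9999 agent-inter 5s
    server api2 10.0.1.11:8080 check agent-check agent-port 9999 agent-inter 5s
```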
Advanced Check Patterns
HAProxy supports multi-step health checks using http-check connect, http-check send, and http-check expect sequences. This lets you implement checks that verify database connectivity through the application: send a request to /health/deep, expect a 200 with body containing "db_ok", and only then consider the server healthy.
backend api_servers
option httpchk
http-check connect
http-check send meth GET uri /health/ready ver HTTP/1.1 hdr Host internal
http-check expect rstatus ^2
http-check connect port 3306
http-check expect rstring mysql
This checks both the HTTP application health and that the backend can reach its database — if either fails, the server is pulled from rotation.
Connection Handling and Timeouts
HAProxy's timeout model is one of the most critical aspects of its configuration and one of the most commonly misconfigured. Each timeout controls a different phase of the connection lifecycle, and setting them correctly is essential for both reliability and resource management.
Key Timeouts
- timeout connect — How long HAProxy waits for a TCP connection to the backend server to establish. Typically 5–10 seconds. If the backend is on the same LAN, even 1 second is generous; if it is across the internet, allow more.
- timeout client — The maximum inactivity time on the client side. If the client sends no data for this duration, HAProxy closes the connection. For web traffic, 30–60 seconds is typical. For WebSocket or long-polling, this must be much longer or handled with timeout tunnel.
- timeout server — The maximum inactivity time on the server side. If the backend sends no data for this duration, HAProxy closes the server connection and returns a 504 to the client. This should match your application's expected maximum response time.
- timeout http-request — The maximum time to receive the complete HTTP request headers. This defends against slowloris attacks where clients send headers very slowly to tie up connections. Set this to 5–10 seconds.
- timeout http-keep-alive — How long to wait for a new request on a keep-alive connection. Shorter than timeout client — typically 5–10 seconds, since idle keep-alive connections consume memory but provide little value.
- timeout queue — How long a request can wait in the backend queue (when all servers are at their maxconn limit) before HAProxy returns a 503. Without this, requests can queue indefinitely.
- timeout tunnel — Applied after a connection is upgraded (WebSocket, CONNECT). Replaces both timeout client and timeout server for the tunneled connection. Set this to hours or days for WebSocket connections.
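Pulling these together, a hedged starting point in a defaults section (values should be tuned to your traffic, not copied as-is):

```
defaults
    mode http
    timeout connect 5s
    timeout client 50s
    timeout server 50s
    timeout http-request 10s
    timeout http-keep-alive 10s
    timeout queue 30s
    timeout tunnel 1h
```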
Connection Limits and Queuing
Each server can have a maxconn setting that limits concurrent connections. When all servers in a backend are at their limit, new requests enter a queue. The timeout queue determines how long they wait. This is crucial for protecting backends from being overwhelmed:
backend api_servers
balance leastconn
# Global queue timeout
timeout queue 30s
server api1 10.0.1.10:8080 check maxconn 100
server api2 10.0.1.11:8080 check maxconn 100
server api3 10.0.1.12:8080 check maxconn 100
If all three servers have 100 active connections and a new request arrives, it enters the queue. If a connection to any server finishes within 30 seconds, the queued request is dispatched. Otherwise, the client receives a 503. This back-pressure mechanism prevents cascading failures where an overloaded backend becomes slower, causing more connections to pile up, causing it to become even slower.
Connection Pooling
In HTTP mode, HAProxy maintains pools of idle connections to backend servers. When a new request needs to be forwarded, HAProxy first checks its pool for an existing connection to the selected server. This avoids the overhead of a new TCP handshake and TLS negotiation for every request. With HTTP/2 backends, HAProxy can multiplex many requests over a single backend connection, dramatically reducing the number of connections backends need to handle.
The http-reuse directive controls this behavior: http-reuse safe (the default) sends the first request of each client session over its own connection and lets subsequent requests use idle pooled connections; http-reuse aggressive also reuses pooled connections for first requests once a connection has proven it can be reused; and http-reuse always reuses any idle connection for any request (highest efficiency, but backends must treat every request as fully independent).
TLS Termination and SNI Routing
HAProxy supports full TLS termination, including TLS 1.3, OCSP stapling, client certificate authentication (mTLS), and multi-certificate SNI-based routing. TLS termination at the load balancer is the most common pattern, as it offloads CPU-intensive cryptographic operations from backends and centralizes certificate management.
frontend https_in
# Force TLS 1.2+ on the bind line; the same defaults can be set for all
# binds with ssl-default-bind-options in the global section
bind :443 v4v6 ssl crt /etc/ssl/certs/ alpn h2,http/1.1 ssl-min-ver TLSv1.2 no-tls-tickets
# HSTS
http-response set-header Strict-Transport-Security "max-age=63072000; includeSubDomains; preload"
# Client certificate authentication for admin endpoints
acl is_admin path_beg /admin/
acl has_client_cert ssl_c_used
acl valid_cert ssl_c_verify 0
http-request deny if is_admin !has_client_cert
http-request deny if is_admin !valid_cert
When the crt parameter points to a directory, HAProxy loads all certificate files and automatically selects the correct certificate based on the SNI in the client's TLS ClientHello. This makes multi-domain hosting simple: drop a new certificate file in the directory and reload HAProxy.
For deployments where TLS must be terminated at the backend (regulatory requirements, end-to-end encryption mandates), HAProxy can operate in TLS passthrough mode using mode tcp with SNI-based routing:
frontend tls_passthrough
mode tcp
bind :443
tcp-request inspect-delay 5s
tcp-request content accept if { req_ssl_hello_type 1 }
# Route based on SNI without decrypting
use_backend app1_servers if { req_ssl_sni -i app1.example.com }
use_backend app2_servers if { req_ssl_sni -i app2.example.com }
default_backend default_tls_servers
HTTP/2 and gRPC Support
HAProxy supports HTTP/2 on both the client-facing (frontend) and server-facing (backend) sides. On the frontend, HTTP/2 multiplexing allows many concurrent requests over a single TCP connection, reducing connection overhead and improving page load times. On the backend, HTTP/2 connections can be shared across clients via connection pooling.
HAProxy can also act as a protocol bridge: accepting HTTP/2 from clients and forwarding as HTTP/1.1 to backends (common when backends do not support HTTP/2), or accepting HTTP/1.1 from clients and forwarding as HTTP/2 to backends (useful for gRPC backends).
gRPC is HTTP/2-based, so HAProxy's HTTP/2 support extends naturally to gRPC load balancing. This includes content-based routing on gRPC service and method names (which appear as HTTP/2 path), gRPC health checking (the gRPC health protocol), and proper handling of gRPC trailers. For gRPC-specific load balancing considerations, see how load balancers work — the key challenge is that gRPC uses long-lived HTTP/2 connections, so L4 balancing fails because all requests on a multiplexed connection go to the same backend.
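A sketch of end-to-end HTTP/2 for gRPC (certificate path, addresses, and port are hypothetical; proto h2 on a server line forces HTTP/2 upstream, as gRPC requires):

```
frontend grpc_in
    mode http
    bind :443 ssl crt /etc/ssl/certs/grpc.pem alpn h2
    default_backend grpc_servers

backend grpc_servers
    mode http
    balance leastconn
    # proto h2 makes HAProxy speak HTTP/2 to the backend
    server grpc1 10.0.3.10:50051 check proto h2
    server grpc2 10.0.3.11:50051 check proto h2
```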
Maps and Dynamic Configuration
HAProxy maps are external files that define key-value mappings loaded into memory at startup. They provide a way to manage large routing tables, hostname-to-backend mappings, and rate limit tiers without embedding everything in the configuration file:
# /etc/haproxy/domain_backends.map
api.example.com api_servers
www.example.com web_servers
admin.example.com admin_servers
app.example.com app_servers
frontend https_in
bind :443 ssl crt /etc/ssl/certs/
use_backend %[req.hdr(host),lower,map(/etc/haproxy/domain_backends.map,web_servers)]
Maps can be updated at runtime via the HAProxy stats socket (Unix socket or TCP) without reloading the process:
echo "set map /etc/haproxy/domain_backends.map new-app.example.com app_v2_servers" | socat stdio /var/run/haproxy/admin.sock
This enables dynamic reconfiguration for blue-green deployments, canary releases, and automated scaling: an orchestrator can add or remove server entries and update routing maps in real time.
The Stats Socket and Runtime API
HAProxy exposes a runtime API via a Unix socket or TCP port. This API allows you to inspect and modify nearly every aspect of HAProxy's state without reloading:
- show stat — Dump current statistics for all frontends, backends, and servers (request counts, error rates, queue depths, response times)
- show info — Display process-level information (uptime, memory usage, connection counts, SSL rate)
- disable server / enable server — Take a server in or out of rotation for maintenance
- set server weight — Dynamically change a server's weight for gradual traffic shifting
- set server addr — Change a server's IP address (useful with service discovery)
- show table — Inspect stick table contents, seeing per-IP rate counters and session data
- clear table — Remove entries from stick tables (e.g., unblock a rate-limited IP)
- show sess — List active sessions with detailed state information
The stats socket is also how monitoring systems like Prometheus, Datadog, and Grafana collect metrics from HAProxy. The built-in Prometheus endpoint (http-request use-service prometheus-exporter if { path /metrics }) exposes all metrics in Prometheus format directly, without needing a separate exporter.
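A sketch of a dedicated metrics listener exposing both the HTML stats page and the built-in Prometheus endpoint (the port is a common convention, not required):

```
frontend stats_in
    mode http
    bind :8404
    # serve Prometheus metrics directly from HAProxy
    http-request use-service prometheus-exporter if { path /metrics }
    # human-readable stats page on the same port
    stats enable
    stats uri /stats
    stats refresh 10s
```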
Session Persistence (Sticky Sessions)
Session persistence ensures that requests from the same client are routed to the same backend server. This is necessary when backends maintain local state (in-memory sessions, local caches, WebSocket connections). HAProxy supports several persistence mechanisms:
Cookie-Based Persistence
HAProxy can insert a cookie that identifies the backend server. On subsequent requests, HAProxy reads the cookie and routes the request to the same server:
backend web_servers
balance roundrobin
cookie SERVERID insert indirect nocache httponly secure
server web1 10.0.1.10:8080 check cookie w1
server web2 10.0.1.11:8080 check cookie w2
server web3 10.0.1.12:8080 check cookie w3
The insert parameter tells HAProxy to add the cookie. indirect means the cookie is removed before forwarding to the backend (the backend never sees it). nocache adds Cache-Control: no-cache to prevent intermediaries from caching the response with the cookie. httponly and secure set the corresponding cookie flags for security.
If a server goes down, HAProxy automatically re-routes the client to another server. The old cookie value becomes invalid, and HAProxy inserts a new one. This means session data on the failed server is lost — which is why stateless architectures with external session stores (Redis, database) are preferred for high-availability deployments.
Stick Table Persistence
For protocols that do not support cookies (TCP mode, non-browser HTTP clients), stick tables can provide persistence based on source IP or other connection attributes:
backend tcp_servers
mode tcp
balance roundrobin
stick-table type ip size 100k expire 30m
stick on src
server srv1 10.0.1.10:5432 check
server srv2 10.0.1.11:5432 check
The stick on src directive records which server was selected for each client IP. Subsequent connections from the same IP are routed to the same server. This is commonly used for database connection routing where the client library does not support cookies.
Logging and Observability
HAProxy's logging is structured and comprehensive. Each log line captures the full lifecycle of a request with timing information at every phase. The default HTTP log format includes:
%ci:%cp [%tr] %ft %b/%s %TR/%Tw/%Tc/%Tr/%Ta %ST %B %CC %CS %tsc %ac/%fc/%bc/%sc/%rc %sq/%bq %hr %hs %{+Q}r
This cryptic-looking format encodes:
- %TR — Time to receive the full HTTP request (detects slow clients)
- %Tw — Time waiting in the queue (detects backend saturation)
- %Tc — Time to connect to the backend (detects network issues)
- %Tr — Backend response time (detects slow applications)
- %Ta — Total active time for the request
- %tsc — Termination state (who closed the connection and why — client abort, server timeout, HAProxy limit)
- %ac/%fc/%bc/%sc/%rc — Connection counts at each layer when the log line was generated
The termination state codes are particularly valuable for debugging. CD means the client aborted during the data transfer phase. sH means the server timed out before sending its response headers (surfacing to the client as a 504). cD means the connection was in the data phase and the client timed out. These codes immediately tell you where in the pipeline a problem occurred.
HAProxy vs NGINX vs Envoy
HAProxy, NGINX, and Envoy are the three dominant reverse proxies in production use. Each has distinct strengths:
- HAProxy excels at pure load balancing and traffic management. Its stick tables, advanced health checking, and connection management are unmatched. Configuration is declarative and static (with runtime modifications via the stats socket). It is the go-to choice for L4/L7 load balancing where reliability and performance are paramount.
- NGINX excels as a web server that also does reverse proxying. It handles static file serving, FastCGI, and content caching better than HAProxy. It is often used as the outer edge (TLS termination, static assets, rate limiting) with HAProxy or Envoy behind it for backend routing.
- Envoy excels in service mesh and cloud-native environments. Its xDS API enables fully dynamic configuration from control planes (Istio, Consul Connect). It has native support for gRPC, circuit breaking, distributed tracing (Zipkin, Jaeger), and automatic retries. It is the default data plane proxy in most service mesh implementations.
In many production deployments, these tools coexist at different layers. HAProxy or NGINX at the edge handling external traffic, Envoy as a sidecar handling inter-service communication within a Kubernetes cluster. The choice is not either/or — it is about using each tool where its strengths are most valuable.
Production Patterns
Blue-Green and Canary Deployments
HAProxy supports gradual traffic shifting through server weights. A canary deployment starts by adding the new version as a server with weight 1 while existing servers have weight 100. Traffic to the canary is approximately 1%. If metrics look good, the weight is increased via the stats socket — no reload needed. For blue-green, you swap the backend entirely using a map file update or the set server addr command.
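In config form, the initial canary weighting might look like this (names and addresses hypothetical):

```
backend web_servers
    balance roundrobin
    # canary receives roughly 0.5% of traffic: 1 / (100 + 100 + 1)
    server stable1 10.0.1.10:8080 check weight 100
    server stable2 10.0.1.11:8080 check weight 100
    server canary  10.0.1.20:8080 check weight 1
```

Raising the canary's weight through the stats socket then shifts traffic gradually without touching the config file.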
Circuit Breaking
While HAProxy does not have Envoy-style circuit breakers as a first-class feature, the maxconn per-server, maxqueue, and timeout queue parameters achieve similar results. When a server becomes slow, its connection count rises to maxconn, new requests queue up, and if the queue exceeds maxqueue or timeout queue expires, requests are failed fast with a 503. The server's health checks will eventually mark it down if the problem persists.
Graceful Server Drain
Before taking a server offline for maintenance, you can drain it via the stats socket: set server backend/server state drain. In drain state, no new connections are routed to the server, but existing connections are allowed to complete. Monitor show stat until the server's scur (current sessions) reaches zero, then safely shut it down.
Performance Characteristics
HAProxy's performance is well-documented with published benchmarks. On modern hardware (64-core, 256GB RAM), HAProxy 2.x can handle:
- 2 million concurrent connections in TCP mode
- 400,000+ HTTP requests per second per core in L7 mode
- 40 Gbps+ throughput with large transfers
- Sub-millisecond latency overhead for L7 proxying under typical loads
The key to HAProxy's performance is its event-driven architecture: no thread-per-connection overhead, no context switching for I/O, and pool-based memory allocation that avoids fragmentation. Multi-threading was added in HAProxy 1.8 and refined through 2.x, with each thread running an independent event loop on its own CPU core. The threads share the process's configuration and stick tables but maintain independent connection state.
For TLS termination, HAProxy uses OpenSSL and can offload to hardware accelerators (AES-NI, QAT) when available. TLS resumption via session tickets or session IDs reduces the cost of repeat connections. With TLS 1.3's reduced handshake, HAProxy can sustain 50,000+ new TLS connections per second per core on modern CPUs.
HAProxy and BGP: Where Load Balancing Meets Routing
Load balancers and BGP routing are complementary layers of traffic management. At large scale, HAProxy instances are themselves distributed across multiple data centers and points of presence, and BGP is the mechanism that directs client traffic to the nearest or healthiest HAProxy instance.
ECMP and BGP Anycast
In a typical deployment, multiple HAProxy instances behind a Top-of-Rack (ToR) switch announce the same Virtual IP (VIP) via BGP. The upstream router uses Equal-Cost Multi-Path (ECMP) routing to distribute flows across all announcing instances. If an HAProxy instance fails, it stops announcing the VIP, the router's BGP session drops, and traffic automatically shifts to remaining instances. This is the same anycast pattern used by Cloudflare (AS13335) and Google (AS15169) for their global infrastructure.
The same pattern extends to multi-site deployments. HAProxy instances in each data center announce the service VIP via BGP to their upstream providers. BGP's path selection algorithm routes each client to the topologically closest data center. When a site goes down, its BGP announcements are withdrawn and traffic converges on remaining sites within seconds.
Health-Aware BGP Announcements
Sophisticated deployments tie HAProxy's backend health to BGP announcements. If all backends in a critical backend pool go down, a health-check script withdraws the BGP route for the VIP. This ensures that traffic is not directed to an HAProxy instance that has no healthy backends to serve it. Tools like ExaBGP, BIRD, or FRRouting can be scripted to announce or withdraw routes based on HAProxy's health check status, queried via the stats socket.
Observing HAProxy Infrastructure via BGP
Many of the services you use daily are fronted by HAProxy instances discoverable through BGP. You can look up the routing infrastructure behind them:
- AS36459 (GitHub) — GitHub uses HAProxy extensively for load balancing across its infrastructure
- AS54113 (Fastly) — CDN whose edge cache is built on Varnish, another high-performance event-driven proxy
- AS14618 (Amazon) — AWS ELB Classic is widely reported to have used HAProxy internally
- AS32934 (Facebook/Meta) — Uses Proxygen, an in-house C++ HTTP stack (not an HAProxy derivative), for edge load balancing
Look up any IP address or ASN to see the BGP routes and AS paths behind the HAProxy-fronted services you depend on. Understanding both the load balancing layer (HAProxy) and the routing layer (BGP) gives you the full picture of how internet traffic reaches its destination.