How HTTP/2 Works
The Hypertext Transfer Protocol version 2 — HTTP/2 — is a major revision of the protocol that powers the web. Standardized in 2015 as RFC 7540, HTTP/2 addressed deep structural problems in HTTP/1.1 that had constrained web performance for over fifteen years. Rather than changing HTTP's semantics (the familiar methods, headers, and status codes remained the same), HTTP/2 replaced the text-based wire format with a binary framing layer, introduced multiplexed streams over a single TCP connection, added header compression, and offered server push. These changes fundamentally altered how browsers and servers communicate, producing measurable improvements in page load times across the web.
The Problems with HTTP/1.1
HTTP/1.1, defined in RFC 2616 (1999) and later refined in RFC 7230 (2014), is a text-based request-response protocol. A client sends a request, waits for the complete response, and only then can send the next request on the same connection. This creates a fundamental bottleneck: head-of-line (HOL) blocking.
Imagine a browser needs to load a page with a large CSS file, twenty images, and several JavaScript bundles. With HTTP/1.1, it opens a connection to the server and requests the CSS file. Until that entire response is received, no other request can be sent on that connection. If the CSS file is large or the server takes time generating it, every other resource is stuck waiting in line.
HTTP/1.1 did introduce pipelining as an optimization: a client could send multiple requests without waiting for each response. But responses still had to arrive in the exact order they were requested (FIFO). If the first response was slow, all subsequent responses were blocked behind it — the head-of-line blocking problem simply moved from requests to responses. In practice, pipelining was so buggy and poorly supported by intermediaries that most browsers never enabled it.
To work around this limitation, browsers adopted a crude workaround: they opened multiple parallel TCP connections to the same server. Most browsers cap this at 6 connections per origin. This means a browser loading resources from example.com would maintain up to 6 separate TCP connections, each with its own TLS handshake, TCP slow start phase, and congestion window. This is wasteful — it increases server load, competes with other connections for bandwidth rather than sharing it cooperatively, and doesn't fundamentally solve the problem.
Web developers responded with increasingly elaborate workarounds:
- Domain sharding — Serving assets from img1.example.com, img2.example.com, etc. to circumvent the per-origin connection limit. This added DNS lookups and TLS handshakes for each shard.
- Spriting — Combining dozens of small images into a single large sprite sheet to reduce the number of requests, even though the browser might only need a few of the images on any given page.
- Inlining — Embedding CSS and JavaScript directly into HTML to avoid extra requests, defeating caching in the process.
- Concatenation — Merging all JavaScript or CSS files into a single bundle, even when only a small part was needed for a given page.
These hacks traded one set of problems for another. HTTP/2 was designed to make them unnecessary.
The Binary Framing Layer
The most fundamental change in HTTP/2 is the replacement of HTTP/1.1's text-based protocol with a binary framing layer. In HTTP/1.1, messages are plain ASCII text — you can literally read a request in a packet capture: GET / HTTP/1.1\r\nHost: example.com\r\n\r\n. Parsing this text is slow, error-prone, and ambiguous (how much whitespace is allowed? which header-folding rules apply?).
HTTP/2 encodes everything into binary frames — compact, fixed-format structures that are efficient to parse and unambiguous in meaning. A frame is the smallest unit of communication in HTTP/2. Each frame has a fixed 9-byte header followed by a variable-length payload:
The key fields in the frame header are:
- Length (24 bits) — The size of the frame payload in bytes. The maximum payload size defaults to 16,384 bytes (16 KB) but can be raised as high as 16,777,215 bytes (2^24 − 1, just under 16 MB) via the SETTINGS_MAX_FRAME_SIZE parameter.
- Type (8 bits) — Identifies the frame type. There are ten defined types: DATA, HEADERS, PRIORITY, RST_STREAM, SETTINGS, PUSH_PROMISE, PING, GOAWAY, WINDOW_UPDATE, and CONTINUATION. Unknown types are ignored, allowing future extensibility.
- Flags (8 bits) — Type-specific boolean flags. For example, END_STREAM (0x1) on a DATA or HEADERS frame signals that this is the last frame the sender will send on this stream; END_HEADERS (0x4) on a HEADERS frame indicates the header block is complete.
- Stream Identifier (31 bits) — Identifies which stream this frame belongs to. Stream 0 is reserved for connection-level frames (SETTINGS, PING, GOAWAY). Client-initiated streams use odd numbers (1, 3, 5, ...), and server-initiated streams (for push) use even numbers (2, 4, 6, ...).
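The fixed layout makes the header trivial to parse. As a minimal Python sketch (field offsets per RFC 7540, section 4.1):

```python
def parse_frame_header(buf: bytes):
    """Parse the fixed 9-byte HTTP/2 frame header."""
    if len(buf) < 9:
        raise ValueError("need at least 9 bytes")
    length = int.from_bytes(buf[0:3], "big")         # 24-bit payload length
    frame_type = buf[3]                              # 8-bit type
    flags = buf[4]                                   # 8-bit type-specific flags
    stream_id = int.from_bytes(buf[5:9], "big") & 0x7FFFFFFF  # clear reserved bit
    return length, frame_type, flags, stream_id

# A HEADERS frame (type 0x1) with END_STREAM|END_HEADERS (0x5) on stream 1,
# announcing a 10-byte header block:
header = (10).to_bytes(3, "big") + bytes([0x1, 0x5]) + (1).to_bytes(4, "big")
print(parse_frame_header(header))  # (10, 1, 5, 1)
```

Note that the parser masks off the top bit of the last four bytes: the stream identifier is only 31 bits, and the remaining bit is reserved and must be ignored.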
The binary format has several advantages beyond parsing speed. It enables efficient multiplexing (since frame boundaries are unambiguous), makes the protocol more compact, and eliminates many classes of parsing vulnerabilities that plagued HTTP/1.1 implementations.
Multiplexed Streams
Multiplexing is HTTP/2's signature feature. Within a single TCP connection, HTTP/2 can carry any number of streams simultaneously. A stream is an independent, bidirectional sequence of frames exchanged between the client and server. Each HTTP request-response pair occupies its own stream.
Streams are lightweight and cheap to create — they don't require a separate handshake. A client initiates a new stream simply by sending a HEADERS frame with an unused odd-numbered stream ID. The server responds on the same stream ID. Hundreds of concurrent streams can be active at once (servers typically advertise a limit via SETTINGS_MAX_CONCURRENT_STREAMS, with common values ranging from 100 to 256).
The key difference from HTTP/1.1 pipelining is that streams are fully interleaved. The server can send frames from different streams in any order and can intersperse them freely. If the response to stream 1 is slow, the server can continue sending data on streams 3, 5, and 7 without waiting. If stream 9 encounters an error, it can be reset with a RST_STREAM frame without affecting any other stream — something impossible in HTTP/1.1, where an error on a connection ruins everything in flight.
A stream goes through a well-defined lifecycle with the following states:
- idle — The initial state before any frames are sent
- open — Both sides can send frames (after HEADERS is sent/received)
- half-closed (local) — This side sent END_STREAM; it can receive but not send
- half-closed (remote) — The other side sent END_STREAM; this side can send but not receive
- closed — Both sides are done; no more frames on this stream
- reserved (local/remote) — Reserved via PUSH_PROMISE for server push
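The lifecycle above can be sketched as a small transition table. This simplified Python sketch covers only the common request/response path; error transitions, PRIORITY in the idle state, and the PUSH_PROMISE reserved states are omitted:

```python
# Simplified HTTP/2 stream state machine (happy path only, per RFC 7540 s5.1).
TRANSITIONS = {
    ("idle", "send_headers"): "open",
    ("idle", "recv_headers"): "open",
    ("open", "send_end_stream"): "half-closed (local)",
    ("open", "recv_end_stream"): "half-closed (remote)",
    ("half-closed (local)", "recv_end_stream"): "closed",
    ("half-closed (remote)", "send_end_stream"): "closed",
}

def step(state: str, event: str) -> str:
    return TRANSITIONS[(state, event)]

# A typical client request: send HEADERS, finish the request with END_STREAM,
# then receive the complete response.
state = "idle"
for event in ("send_headers", "send_end_stream", "recv_end_stream"):
    state = step(state, event)
print(state)  # closed
```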
This multiplexing eliminates the need for domain sharding, spriting, and the other HTTP/1.1 workarounds. A single connection is sufficient to load an entire page's worth of resources concurrently.
Header Compression: HPACK
HTTP headers are repetitive. Every request to the same server carries similar Host, User-Agent, Accept, Cookie, and other headers. In HTTP/1.1, these are sent as uncompressed text on every request. A typical set of headers can be 800 bytes to several kilobytes (especially with large cookies), and they are repeated identically across dozens of requests. On a page load with 80 requests, that is 64 KB or more of redundant header data.
HTTP/2 introduced HPACK (RFC 7541), a compression format designed specifically for HTTP headers. HPACK uses three complementary techniques:
1. Static Table. HPACK defines a table of 61 commonly used header field name-value pairs. For example, entry 2 is :method: GET, entry 4 is :path: /, and entry 16 is accept-encoding: gzip, deflate. If a header matches a static table entry, the entire header can be encoded as a single byte — the table index.
2. Dynamic Table. Each connection maintains a dynamic table that starts empty and accumulates headers as they are sent. When the encoder sends a new header, it can add it to the dynamic table. Subsequent references to the same header need only transmit the table index, not the full value. The dynamic table has a maximum size (default 4,096 bytes, configurable via SETTINGS_HEADER_TABLE_SIZE) and uses FIFO eviction when full.
3. Huffman Encoding. Header values that aren't in either table can be Huffman-coded using a static Huffman table optimized for HTTP header characters. This typically reduces the size of literal header values by 20-30%.
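The indexed representation can be illustrated with a short Python sketch. Only a handful of static entries are shown here, and the prefix-integer continuation bytes needed for large indices and the Huffman coding of literals are omitted:

```python
# Sketch of HPACK's indexed header field representation (RFC 7541, s6.1):
# a header matching a table entry is encoded as one byte, 0x80 | index.
STATIC_TABLE = {
    (":method", "GET"): 2,
    (":path", "/"): 4,
    (":scheme", "https"): 7,
    ("accept-encoding", "gzip, deflate"): 16,
}

def encode_indexed(name: str, value: str) -> bytes:
    index = STATIC_TABLE[(name, value)]
    assert index < 127  # larger indices need prefix-integer continuation bytes
    return bytes([0x80 | index])

print(encode_indexed(":method", "GET").hex())                  # 82
print(encode_indexed("accept-encoding", "gzip, deflate").hex())  # 90
```

A full request line that would cost dozens of bytes in HTTP/1.1 can thus shrink to a few one-byte index references once the relevant headers are in the static or dynamic table.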
The result is dramatic compression. The first request on a connection sees modest savings (30-50%), but subsequent requests — which share most headers — can be compressed by 85-95%. HPACK was carefully designed to be resistant to CRIME-style compression oracle attacks. Unlike general-purpose compressors (gzip, deflate), HPACK never compresses across different header fields, making it safe to use even with sensitive data like cookies.
Both the encoder and decoder must maintain identical copies of the dynamic table. If they get out of sync (for example, due to an implementation bug or a missed table-size update), decoding fails and the connection must be torn down with a GOAWAY frame. This is one reason why header blocks must be sent as contiguous HEADERS/CONTINUATION frames and decoded in the order they arrive, even though DATA frames from different streams can be freely interleaved.
Stream Prioritization and Dependency Trees
Not all resources are equally important. The CSS that controls page layout should load before decorative images. The JavaScript that makes a page interactive should load before analytics tracking scripts. HTTP/2 provides a mechanism for clients to express these priorities to the server.
Each stream can declare a dependency on another stream and a weight (1-256). Dependencies form a tree where child streams should only receive resources after their parent streams are satisfied. Among siblings with the same parent, resources are allocated proportionally to their weights.
For example, a browser might structure priorities like this:
- Stream 1 (HTML document) — root, highest priority
- Stream 3 (CSS) — depends on stream 1, weight 256
- Stream 5 (JavaScript) — depends on stream 1, weight 220
- Stream 7 (hero image) — depends on stream 3, weight 128
- Stream 9, 11, 13 (other images) — depend on stream 5, weight 32 each
This tells the server: finish sending CSS and JavaScript first, then the hero image, then split remaining bandwidth among the other images. The server is not required to follow these priorities (they are advisory), but a well-implemented server can use them to significantly improve perceived page load time by delivering render-critical resources first.
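The weight-proportional split among siblings can be sketched in a few lines of Python. This is a simplified model: a real server allocates bandwidth incrementally as streams produce data, rather than computing static shares up front:

```python
# Sketch of HTTP/2's advisory weighted allocation (RFC 7540, s5.3):
# siblings under the same parent split the parent's bandwidth in
# proportion to their weights (1-256).
def split_bandwidth(siblings: dict, available: float) -> dict:
    """Map stream id -> share of `available` bandwidth."""
    total = sum(siblings.values())
    return {sid: available * weight / total for sid, weight in siblings.items()}

# Streams 3 (CSS, weight 256) and 5 (JS, weight 220) under the document:
shares = split_bandwidth({3: 256, 5: 220}, available=1.0)
print(shares)  # stream 3 gets ~54%, stream 5 gets ~46%

# Streams 9, 11, 13 (other images, weight 32 each) split their share equally.
print(split_bandwidth({9: 32, 11: 32, 13: 32}, available=1.0))
```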
In practice, stream prioritization has been one of HTTP/2's most inconsistently implemented features. Different browsers build radically different priority trees, and many server implementations ignore priorities entirely or implement them poorly. Chrome, Firefox, and Safari all used different prioritization strategies, making it hard for server developers to optimize. This led the IETF to develop a simpler priority scheme — Extensible Priority (RFC 9218) — which HTTP/3 adopted.
Server Push
Server push allows a server to send responses before the client has even requested them. When a server knows that a client requesting /index.html will also need /style.css and /app.js, it can proactively push those resources alongside the HTML response.
The mechanism works through PUSH_PROMISE frames. The server sends a PUSH_PROMISE on the stream of the original request (e.g., the HTML page), containing the headers that the pushed response would have been requested with. This reserves a new even-numbered stream for the pushed resource. The server then sends the pushed response on that reserved stream.
The client can reject a push by sending a RST_STREAM frame on the promised stream. This is important when the client already has the resource cached — without this escape valve, push would waste bandwidth resending resources the client already has.
Despite its theoretical appeal, server push has been widely regarded as a disappointment. The problems are numerous:
- Cache awareness — The server has no way to know what the client already has cached. It might push resources the browser already has, wasting bandwidth.
- Timing — By the time the server decides what to push, the client may have already sent requests for those resources, creating redundant transfers.
- Complexity — Implementing push correctly in servers and CDNs proved difficult. Getting the timing, priority, and resource selection right requires deep integration with application logic.
- Browser support — Chrome removed server push support in version 106 (2022), effectively killing the feature for most of the web.
The 103 Early Hints status code (RFC 8297) has largely replaced server push. Early Hints lets the server send a preliminary response with Link headers that tell the browser to start fetching critical resources, without the complexity of push.
Flow Control
HTTP/2 implements flow control at two levels: per-stream and per-connection. Flow control prevents a fast sender from overwhelming a slow receiver. It also allows a receiver to allocate bandwidth between multiple streams — for example, pausing a low-priority stream to let a high-priority one complete.
Flow control in HTTP/2 works through a credit-based system using WINDOW_UPDATE frames:
- Each stream, and the connection as a whole, has a flow control window, initially set to 65,535 bytes (configurable via SETTINGS_INITIAL_WINDOW_SIZE).
- When a sender transmits DATA frames, the window shrinks by the amount of data sent.
- When the window reaches 0, the sender must stop sending DATA frames on that stream (or connection) until the receiver sends a WINDOW_UPDATE frame to grant more credit.
- The receiver sends WINDOW_UPDATE frames to indicate it has consumed data and is ready for more.
- HEADERS, SETTINGS, and other control frames are not subject to flow control — only DATA frames are.
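The bookkeeping above can be sketched in Python. This is a simplified sender-side model only; hypothetical class and method names, no actual frames are exchanged:

```python
# Sketch of HTTP/2 credit-based flow control: sending DATA consumes credit
# from both the stream window and the connection window; WINDOW_UPDATE
# frames (on a stream, or on stream 0 for the connection) replenish it.
class FlowControlledSender:
    def __init__(self, initial_window: int = 65_535):
        self.connection_window = initial_window
        self.stream_windows = {}

    def open_stream(self, stream_id: int, initial_window: int = 65_535):
        self.stream_windows[stream_id] = initial_window

    def can_send(self, stream_id: int, size: int) -> bool:
        return size <= min(self.connection_window, self.stream_windows[stream_id])

    def send_data(self, stream_id: int, size: int):
        if not self.can_send(stream_id, size):
            raise RuntimeError("window exhausted; must wait for WINDOW_UPDATE")
        self.connection_window -= size
        self.stream_windows[stream_id] -= size

    def window_update(self, stream_id, increment: int):
        if stream_id is None:  # stream 0: connection-level credit
            self.connection_window += increment
        else:
            self.stream_windows[stream_id] += increment

sender = FlowControlledSender()
sender.open_stream(1)
sender.send_data(1, 65_535)        # exhausts both stream and connection windows
print(sender.can_send(1, 1))       # False: sender must stall
sender.window_update(None, 16_384) # receiver grants connection-level credit
sender.window_update(1, 16_384)    # ...and stream-level credit
print(sender.can_send(1, 1))       # True
```

Note that the sender needed both updates to resume: credit on stream 1 alone would not help while the connection window sat at zero, which is exactly how a receiver can throttle one stream without starving the rest.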
This design lets a receiver independently throttle each stream. A media player streaming two video qualities could pause the lower-quality stream while continuing the higher-quality one, without affecting other streams on the same connection.
The initial window size of 65,535 bytes is often too small for high-bandwidth connections. Servers and clients commonly increase it to several megabytes at connection startup using SETTINGS frames. Getting flow control tuning right is critical for performance — an undersized window causes the sender to stall, while an oversized window defeats the purpose of flow control.
Connection Establishment and ALPN
HTTP/2 requires TLS in practice (all major browsers require it, though the spec allows cleartext). During the TLS handshake, the client and server use Application-Layer Protocol Negotiation (ALPN) to agree on using HTTP/2. ALPN is a TLS extension (RFC 7301) that lets the client send a list of supported protocols and the server picks one.
The ALPN negotiation works as follows:
- The client sends a ClientHello message with an ALPN extension listing supported protocols: ["h2", "http/1.1"]
- The server checks whether it supports HTTP/2 and, if so, responds with "h2" in its ServerHello
- Both sides now know to speak HTTP/2 on this connection
This negotiation happens during the TLS handshake itself — no extra round trips are needed. With TLS 1.3, the entire connection setup (TCP handshake + TLS handshake + ALPN) completes in just two round trips (1 RTT for TCP SYN/SYN-ACK, 1 RTT for TLS).
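With Python's standard ssl module, the client side of this negotiation looks roughly like the following sketch (the function name is illustrative; the network call is shown but not exercised here):

```python
import socket
import ssl

def negotiated_protocol(host: str, port: int = 443):
    """Connect with TLS, offering h2 and http/1.1 via ALPN, and report
    which protocol the server selected."""
    ctx = ssl.create_default_context()
    ctx.set_alpn_protocols(["h2", "http/1.1"])  # client's ordered offer
    with socket.create_connection((host, port)) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            # The server's choice is available as soon as the handshake ends.
            return tls.selected_alpn_protocol()

# negotiated_protocol("example.com") returns "h2" if the server selected
# HTTP/2, "http/1.1" if it fell back, or None if it sent no ALPN response.
```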
Once TLS is established, the client sends a connection preface: the 24-byte magic string PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n followed by a SETTINGS frame. This magic string was chosen because it would cause any HTTP/1.1 server that accidentally received it to return an error, providing a clean detection mechanism. The server responds with its own SETTINGS frame, and both sides acknowledge each other's settings with SETTINGS frames carrying the ACK flag. After this exchange, the connection is fully operational.
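The preface and an empty SETTINGS frame can be constructed byte for byte in a short Python sketch:

```python
# The fixed 24-byte client connection preface (RFC 7540, s3.5).
PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"

def settings_frame(ack: bool = False) -> bytes:
    """Build an empty SETTINGS frame: 9-byte header, no payload."""
    length = (0).to_bytes(3, "big")       # no settings parameters
    frame_type = bytes([0x4])             # SETTINGS
    flags = bytes([0x1 if ack else 0x0])  # ACK flag acknowledges peer settings
    stream_id = (0).to_bytes(4, "big")    # connection-level: stream 0
    return length + frame_type + flags + stream_id

client_preface = PREFACE + settings_frame()
print(len(client_preface))  # 33: 24-byte magic + 9-byte empty SETTINGS
```

A real client would usually include parameters (such as SETTINGS_MAX_CONCURRENT_STREAMS) in the payload rather than sending an empty frame, but the empty form above is valid.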
HTTP/2 over Cleartext (h2c)
While browsers only support HTTP/2 over TLS (h2), the protocol also defines a cleartext variant called h2c. This is used primarily for server-to-server communication within trusted networks — for example, between a reverse proxy and a backend application server, or in gRPC services on internal networks.
There are two ways to initiate h2c:
HTTP Upgrade mechanism. The client sends an HTTP/1.1 request with an Upgrade: h2c header and an HTTP2-Settings header containing base64-encoded HTTP/2 settings. If the server supports h2c, it responds with 101 Switching Protocols, and both sides switch to HTTP/2. This adds a round trip but works through HTTP/1.1-aware proxies.
Prior knowledge. If the client knows in advance that the server supports h2c (through configuration, not discovery), it can send the HTTP/2 connection preface directly on a bare TCP connection. This is the most common approach for gRPC over cleartext and avoids any upgrade overhead. The --http2-prior-knowledge flag in curl uses this method.
h2c does not provide encryption, so it should never be used over untrusted networks. In production, it is typically used only when TLS termination happens at a load balancer or reverse proxy, and the backend communication occurs over a private network.
Performance Characteristics
HTTP/2 delivers its most significant performance improvements in scenarios with high latency and many small resources — exactly the profile of a typical web page. The gains come from several compounding effects:
Reduced connection overhead. Instead of 6 parallel TCP connections (each requiring a TCP handshake and TLS handshake), HTTP/2 uses a single connection. On a 100ms RTT link, this saves roughly 500ms of cumulative handshake time across the 5 additional connections HTTP/1.1 would open. The single connection also means one TLS session, one congestion window, and one set of keep-alive timers.
Better congestion control. A single TCP connection develops one congestion window that accurately reflects the available bandwidth. With 6 parallel connections competing for bandwidth, each has a smaller, less accurate congestion window, and together they are more aggressive than a single connection would be — effectively cheating TCP's congestion control fairness.
Header compression savings. On a page with 80 requests sharing similar headers, HPACK can save 50-100 KB of header data. This matters on mobile networks where every byte counts.
Elimination of HOL blocking. At the HTTP layer, multiplexing means no resource blocks another. A slow API call on stream 3 does not prevent images on streams 5 through 19 from loading. This is the most visible performance improvement — pages with a mix of fast and slow resources load substantially faster.
However, HTTP/2 is not universally faster. In some scenarios, it can actually be slower than HTTP/1.1:
- Very low latency connections — When latency is already low (e.g., on a LAN), the overhead savings from fewer connections are minimal, and multiplexing provides little benefit since requests complete quickly anyway.
- Single large downloads — Downloading a single large file sees no benefit from multiplexing and may even suffer from the overhead of framing.
- Packet loss — This is HTTP/2's Achilles' heel, discussed in the next section.
TCP Head-of-Line Blocking: HTTP/2's Fundamental Limitation
HTTP/2 solved head-of-line blocking at the HTTP layer through multiplexing, but it introduced a new, arguably worse form of HOL blocking at the TCP layer. This single limitation motivated the development of HTTP/3 and QUIC.
TCP is a reliable, in-order byte stream. When a TCP segment is lost, the operating system's TCP stack buffers all subsequently received segments until the lost one is retransmitted and arrives. This is by design — TCP guarantees in-order delivery to the application. But for HTTP/2, this behavior is catastrophic.
Consider an HTTP/2 connection carrying 10 multiplexed streams. Frames from all 10 streams are interleaved in the TCP byte stream. If a single TCP segment is lost — containing perhaps a fragment of stream 3's DATA frame — all 10 streams are blocked. Streams 1, 2, 4 through 10 have complete, ready-to-process data sitting in the kernel's receive buffer, but the application cannot read any of it until stream 3's lost segment is retransmitted and received. This retransmission takes at minimum one RTT.
On a clean, low-loss connection (like a wired network), this rarely matters. But on a lossy connection with 1-2% packet loss (typical for mobile networks), HTTP/2 can be slower than HTTP/1.1. With 6 separate HTTP/1.1 connections, a lost packet only blocks 1 of the 6 connections — the other 5 continue unaffected. With HTTP/2's single connection, one lost packet blocks everything.
This is not a flaw in HTTP/2's design per se — it is an inherent limitation of building a multiplexed protocol on top of TCP. The only solution is to replace TCP itself, which is exactly what QUIC does. QUIC runs over UDP and implements its own reliable delivery with per-stream ordering. A lost QUIC packet only blocks the stream it belongs to; other streams continue unimpeded. This was the primary motivation for HTTP/3.
HTTP/2 in Practice
HTTP/2 adoption has been widespread. As of 2025, over 60% of all websites support HTTP/2, and all major browsers, web servers, and CDNs support it. The protocol has been particularly impactful for:
- API traffic — Multiplexing benefits applications making many concurrent API calls (e.g., GraphQL over HTTP/2, microservice communication).
- gRPC — Google's RPC framework is built on HTTP/2, using its multiplexing for concurrent RPCs and its framing for message serialization.
- Server-sent events — Long-lived streams for real-time updates work naturally within HTTP/2's multiplexing model.
- CDNs — Content delivery networks were early adopters, as HTTP/2 reduces connection overhead between edge servers and origin servers.
Web developers can largely treat HTTP/2 as a transparent upgrade — existing applications work without modification. However, some HTTP/1.1-era optimizations should be reconsidered:
- Stop domain sharding. It is counterproductive with HTTP/2 because it prevents multiplexing and forces multiple connections.
- Stop spriting and concatenation. Granular resources are better with HTTP/2 because they can be individually cached and loaded on demand.
- Use HTTP/2 connection coalescing. If multiple hostnames resolve to the same IP and share a TLS certificate, the browser can reuse a single HTTP/2 connection for all of them.
From HTTP/2 to HTTP/3
HTTP/2 was a massive step forward from HTTP/1.1, but the TCP head-of-line blocking problem and the inability to migrate connections between networks (e.g., switching from Wi-Fi to cellular) motivated the next evolution. HTTP/3 replaces TCP with the QUIC transport protocol, which runs over UDP and provides per-stream reliable delivery, built-in TLS 1.3 encryption, and connection migration. HTTP/3 also replaces HPACK with QPACK, a header compression scheme adapted for QUIC's out-of-order delivery model.
HTTP/2 remains the workhorse of the web, and understanding its framing layer, multiplexing model, and flow control mechanisms provides the foundation for understanding both gRPC (which builds directly on HTTP/2) and HTTP/3 (which adopted HTTP/2's concepts while solving its transport-layer limitations). The protocol's design decisions — binary framing, stream multiplexing, header compression, and advisory priorities — represent the state of the art in application-layer protocol design, even as the transport layer beneath it continues to evolve.