How WebSockets Work

WebSockets provide persistent, full-duplex communication channels between a client and a server over a single TCP connection. Unlike the request-response model of HTTP, where the client must initiate every exchange, a WebSocket connection allows either side to send data at any time. This makes WebSockets the foundation for real-time web applications — chat systems, live dashboards, collaborative editors, multiplayer games, and streaming feeds like the RIS Live BGP feed that powers this site.

The WebSocket protocol (RFC 6455, standardized in 2011) was designed to solve a specific problem: HTTP was never meant for real-time, bidirectional communication. Before WebSockets, developers resorted to hacks like long polling and hidden iframes to simulate push communication. WebSockets replaced all of that with a clean, standardized protocol that begins life as an HTTP request and then upgrades into something entirely different.

The HTTP Upgrade Handshake

Every WebSocket connection starts as a normal HTTP/1.1 request. The client sends a special Upgrade request asking the server to switch protocols from HTTP to WebSocket. This reuse of port 80 (or 443 for TLS) was a deliberate design choice — it allows WebSocket traffic to traverse firewalls and proxies that only permit HTTP.

The client's opening handshake looks like this:

GET /stream HTTP/1.1
Host: example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
Origin: https://example.com

The critical headers are Upgrade: websocket and Connection: Upgrade, which signal the protocol switch. Sec-WebSocket-Version: 13 identifies the protocol version (13 is the only version in use today). The Sec-WebSocket-Key is a base64-encoded 16-byte random nonce that prevents caching proxies from replaying old WebSocket handshakes.
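Generating a valid key on the client side is trivial. A minimal sketch in Python (the helper name `make_websocket_key` is ours, not part of any standard API):

```python
import base64
import os

# Sec-WebSocket-Key is just 16 random bytes, base64-encoded.
# It is a nonce, not a secret: its only job is to make each
# handshake unique so a cached response cannot be replayed.
def make_websocket_key() -> str:
    return base64.b64encode(os.urandom(16)).decode("ascii")

key = make_websocket_key()
assert len(base64.b64decode(key)) == 16  # always 16 bytes before encoding
```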

[Diagram: the WebSocket upgrade handshake. (1) The client sends GET /stream with Upgrade: websocket and a Sec-WebSocket-Key. (2) The server replies 101 Switching Protocols with Sec-WebSocket-Accept. (3) The connection stays open for full-duplex text/binary frames in both directions.]

Sec-WebSocket-Key and Sec-WebSocket-Accept

The Sec-WebSocket-Key / Sec-WebSocket-Accept exchange is not about security or authentication. It serves a narrow purpose: proving that the server understands the WebSocket protocol and is not a naive HTTP server accidentally accepting an upgrade request.

The server takes the client's key, concatenates it with the magic GUID 258EAFA5-E914-47DA-95CA-C5AB0DC85B11, computes the SHA-1 hash, and returns the base64-encoded result as Sec-WebSocket-Accept. The client verifies this value. If a caching proxy were to replay an old HTTP response for the upgrade request, the accept key would not match, and the client would reject the connection.

The server's successful response:

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
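The accept-key computation is short enough to show in full. A sketch in Python, using the sample key from the handshake above (the function name `accept_key` is ours; the GUID and the hashing steps are from RFC 6455):

```python
import base64
import hashlib

GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def accept_key(client_key: str) -> str:
    # Concatenate the client's key with the magic GUID, SHA-1 it,
    # and base64-encode the raw digest -- exactly as RFC 6455 specifies.
    digest = hashlib.sha1((client_key + GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))  # → s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

Note that the digest is base64-encoded directly as bytes, not as a hex string -- a common implementation mistake that produces a value the client will reject.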

After this handshake, the HTTP connection is done. The underlying TCP socket is now a WebSocket connection, and both sides communicate using the WebSocket frame format.

The WebSocket Frame Format

WebSocket data is transmitted in frames, not as a raw byte stream. Every message sent over a WebSocket — whether text, binary, or a control signal — is wrapped in a frame with a structured header. Understanding this format is essential for debugging WebSocket issues and implementing the protocol.

[Diagram: WebSocket frame layout (RFC 6455). Byte 0: FIN (1 bit), RSV1-3 (3 bits), opcode (4 bits). Byte 1: MASK (1 bit), payload length (7 bits). Followed by: extended payload length (16 or 64 bits, if the 7-bit length is 126 or 127), masking key (32 bits, present only if MASK = 1), and the payload data. Opcodes: 0x0 = continuation, 0x1 = text, 0x2 = binary, 0x8 = close, 0x9 = ping, 0xA = pong.]

FIN Bit and Opcode

The first bit of a frame is the FIN flag. When set to 1, it indicates this is the final fragment of a message. When 0, more fragments follow. The opcode occupies the next 4 bits and defines the type of frame:

- 0x0 — continuation frame (part of a fragmented message)
- 0x1 — text frame (UTF-8 payload)
- 0x2 — binary frame
- 0x8 — connection close
- 0x9 — ping
- 0xA — pong

The remaining opcodes (0x3-0x7 and 0xB-0xF) are reserved for future use.

Payload Length Encoding

The payload length uses a variable-width encoding. If the 7-bit length field contains 0-125, that is the actual payload length. If it is 126, the following 2 bytes contain the actual length as a 16-bit unsigned integer (up to 65,535 bytes). If it is 127, the following 8 bytes contain the length as a 64-bit unsigned integer whose most significant bit must be 0. This scheme keeps the frame header compact for small messages while supporting payloads up to 2^63 − 1 bytes.
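The three-tier encoding can be sketched in a few lines of Python (the function name `encode_payload_length` is ours; it emits the length bytes as they appear on the wire, ignoring the MASK bit that shares the first byte):

```python
import struct

def encode_payload_length(n: int) -> bytes:
    if n <= 125:
        return bytes([n])                       # fits in the 7-bit field
    if n <= 0xFFFF:
        return bytes([126]) + struct.pack("!H", n)  # marker + 16-bit length
    return bytes([127]) + struct.pack("!Q", n)      # marker + 64-bit length

assert encode_payload_length(5) == b"\x05"
assert encode_payload_length(300) == b"\x7e\x01\x2c"   # 126 marker, then 0x012C
assert encode_payload_length(70000) == b"\x7f" + (70000).to_bytes(8, "big")
```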

Masking

The MASK bit and masking key are among the most misunderstood parts of the protocol. All frames sent from client to server must be masked. Frames from server to client must not be masked. The 32-bit masking key is a random value, and each byte of the payload is XORed with a byte from the key in a cycling pattern: payload[i] ^= mask[i % 4].

Masking is not encryption — the key is sent in the clear right before the payload. Its purpose is to prevent a specific attack against infrastructure: without masking, a malicious JavaScript application could craft WebSocket frames whose byte patterns resemble valid HTTP requests. A transparent proxy that does not understand WebSocket could interpret these bytes as HTTP, leading to cache poisoning. The random masking key ensures that the on-the-wire bytes of a WebSocket payload are unpredictable to the sender, making this attack infeasible.
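The masking operation itself is a one-liner. A sketch in Python (the function name `mask_payload` is ours; a fixed demo key is used here for reproducibility, whereas a real client must pick 4 fresh random bytes per frame):

```python
def mask_payload(payload: bytes, key: bytes) -> bytes:
    # XOR each payload byte with the masking key, cycling every 4 bytes.
    # XOR is its own inverse, so the same function masks and unmasks.
    return bytes(b ^ key[i % 4] for i, b in enumerate(payload))

key = b"\x12\x34\x56\x78"        # demo only -- real clients use os.urandom(4)
data = b'{"type":"UPDATE"}'
masked = mask_payload(data, key)

assert masked != data                       # wire bytes differ from the payload
assert mask_payload(masked, key) == data    # applying the mask again restores it
```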

Text vs Binary Frames

WebSocket distinguishes between text frames (opcode 0x1) and binary frames (opcode 0x2). Text frames must contain valid UTF-8. If a text frame contains invalid UTF-8, the receiving side must close the connection with status code 1007 (Invalid frame payload data).
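The validity check maps directly onto a strict UTF-8 decode. A sketch in Python (the function name `validate_text_frame` is ours):

```python
def validate_text_frame(payload: bytes) -> bool:
    # A text frame's payload must be valid UTF-8; on failure the
    # receiver closes the connection with status code 1007.
    try:
        payload.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

assert validate_text_frame("héllo".encode("utf-8"))
assert not validate_text_frame(b"\xff\xfe")   # 0xFF is never a valid UTF-8 byte
```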

Binary frames have no such constraint and can carry any byte sequence. The choice between text and binary is a contract between client and server. JSON-based protocols like RIS Live (which streams BGP updates to this site in real time) use text frames. Protocols like Protocol Buffers or MessagePack use binary frames for more compact encoding.

Fragmentation

A single logical message can be split across multiple frames using fragmentation. The first fragment uses the message's opcode (0x1 or 0x2) with FIN=0. Subsequent fragments use opcode 0x0 (continuation) with FIN=0. The last fragment uses opcode 0x0 with FIN=1.

Fragmentation exists so that intermediaries can forward frames without buffering the entire message. A server streaming a large binary payload can send it in chunks, and each chunk can be forwarded by a proxy as soon as it arrives. Control frames (ping, pong, close) can be interleaved between fragments of a data message — they are never fragmented themselves.
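The FIN/opcode pattern for fragmentation can be sketched as a generator in Python (the function name `fragment` and the chunking strategy are ours, for illustration):

```python
OP_CONT, OP_TEXT = 0x0, 0x1

def fragment(message: bytes, chunk_size: int, opcode: int = OP_TEXT):
    # Yields (fin, opcode, chunk) triples: the first frame carries the real
    # opcode, continuations use 0x0, and only the last frame sets FIN.
    chunks = [message[i:i + chunk_size]
              for i in range(0, len(message), chunk_size)] or [b""]
    for i, chunk in enumerate(chunks):
        fin = i == len(chunks) - 1
        yield (fin, opcode if i == 0 else OP_CONT, chunk)

frames = list(fragment(b"hello world", 4))
assert [f[:2] for f in frames] == [(False, OP_TEXT), (False, OP_CONT), (True, OP_CONT)]
assert b"".join(f[2] for f in frames) == b"hello world"
```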

Control Frames: Ping, Pong, and Close

Control frames manage the lifecycle and health of the connection. They have opcodes 0x8 through 0xA, must have a payload of 125 bytes or fewer, and must not be fragmented.

Ping and Pong

Ping (0x9) is a keepalive mechanism. Either side can send a ping at any time, and the other side must respond with a pong (0xA) carrying the same payload as the ping. This serves two purposes: detecting dead connections where the TCP socket appears open but the remote end is gone, and keeping the connection alive through NAT gateways and load balancers that time out idle connections.

Unsolicited pongs (sent without a preceding ping) are also allowed and must be silently ignored. This enables a useful pattern: a client can send periodic unsolicited pongs as a unidirectional heartbeat, keeping the connection alive without requiring the server to track ping/pong state.

Close

The close frame (0x8) initiates a graceful shutdown. The payload, if present, begins with a 2-byte status code followed by an optional UTF-8 reason string. Common status codes include:

- 1000 — normal closure
- 1001 — going away (server shutting down, browser navigating away)
- 1002 — protocol error
- 1007 — invalid frame payload data (e.g. malformed UTF-8 in a text frame)
- 1009 — message too big
- 1011 — internal server error
- 1006 — abnormal closure; never sent on the wire, but reported locally when the connection drops without a close frame

The close handshake is two-way: when one side sends a close frame, the other side must respond with its own close frame and then close the TCP connection. If the close handshake does not complete within a reasonable timeout, the TCP connection is terminated anyway.
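Building and parsing the close payload is a matter of one 16-bit field. A sketch in Python (the helper names are ours):

```python
import struct

def build_close_payload(code: int, reason: str = "") -> bytes:
    # 2-byte big-endian status code, then an optional UTF-8 reason.
    return struct.pack("!H", code) + reason.encode("utf-8")

def parse_close_payload(payload: bytes):
    if not payload:
        return None, ""          # a close frame with an empty body is legal
    (code,) = struct.unpack("!H", payload[:2])
    return code, payload[2:].decode("utf-8")

payload = build_close_payload(1000, "bye")
assert parse_close_payload(payload) == (1000, "bye")
```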

WebSocket over TLS (wss://)

Just as HTTPS wraps HTTP in TLS, the wss:// scheme wraps the WebSocket protocol in TLS. The TLS handshake happens first at the transport layer, and then the WebSocket handshake proceeds over the encrypted channel. This means the HTTP upgrade request, including the Sec-WebSocket-Key, is encrypted and invisible to network observers.

[Diagram: protocol layers. ws:// stacks WebSocket frames over TCP (port 80) over IP; wss:// inserts TLS 1.2/1.3 between the frames and TCP (port 443).]

Using wss:// is critical in production for several reasons. First, many corporate proxies and middleboxes intercept and sometimes corrupt unencrypted WebSocket traffic they do not understand. TLS prevents this interference because the proxy cannot inspect or modify the encrypted bytes. Second, web browsers enforce mixed content rules: a page loaded over HTTPS cannot connect to a ws:// endpoint. Third, TLS provides authentication and integrity, ensuring the client is actually talking to the intended server and not a man-in-the-middle.

Comparison: WebSocket vs SSE vs Long Polling

WebSockets are not the only way to push data from server to client. Understanding the alternatives helps clarify when each approach is appropriate.

Long Polling

Long polling is the oldest technique. The client sends an HTTP request, and the server holds the request open until it has data to send. When data arrives, the server responds, and the client immediately sends a new request. This creates a cycle of request-response pairs that approximates server push.

Long polling works everywhere HTTP works — no special protocol support needed. But it adds latency (each message requires a new HTTP round-trip), consumes more server resources (each pending request ties up a connection), and produces heavier traffic (HTTP headers on every exchange).

Server-Sent Events (SSE)

SSE (EventSource API) is a standard for server-to-client streaming over plain HTTP. The server sends a text/event-stream response that never closes, streaming events as lines of text. SSE is simpler than WebSockets: it uses standard HTTP, supports automatic reconnection with last-event-ID tracking, and works with HTTP/2 multiplexing.

The tradeoff: SSE is unidirectional — only the server can push data. The client cannot send messages through the SSE connection. If you need to send data to the server, you make separate HTTP requests. For many use cases (news feeds, stock tickers, notification streams), this is perfectly adequate. SSE also only supports UTF-8 text, not binary data.

[Diagram: long polling issues a new HTTP request per message (highest latency, lowest complexity); Server-Sent Events stream server-to-client over one HTTP connection; WebSocket is full-duplex over a single TCP connection (lowest latency, highest complexity).]

When to Use Each

Use long polling when you need maximum compatibility and very low message rates. Use SSE when you only need server-to-client push, want automatic reconnection, and are happy with text data — it is simpler and more HTTP-friendly than WebSockets. Use WebSockets when you need true bidirectional communication, binary data, or very high message rates where the per-message overhead of HTTP is unacceptable.

Scaling WebSockets

Scaling stateless HTTP is well understood: add more servers behind a load balancer and any server can handle any request. WebSockets break this model because they are stateful — each connection is a long-lived session bound to a specific server process. This introduces two fundamental scaling challenges.

Sticky Sessions

If a WebSocket client reconnects (due to a network hiccup, server restart, or load balancer timeout), it may land on a different server that has no knowledge of its previous session state. Solutions include:

- Sticky sessions at the load balancer (cookie- or IP-hash-based routing), so a reconnecting client returns to the same server.
- Keeping session state in an external store such as Redis, so any server can pick up the session.
- Designing reconnection to be stateless: the client re-authenticates and re-subscribes from scratch on every new connection.

Redis Pub/Sub Fan-Out

When you have N servers each holding a fraction of total WebSocket connections, a message that needs to reach all connected clients must be broadcast across all servers. A common architecture uses Redis Pub/Sub (or similar message brokers like NATS, Kafka, or RabbitMQ) as a fan-out layer.

[Diagram: an event source publishes to Redis Pub/Sub; each of WS Server 1-3 subscribes to Redis and pushes messages to its local WebSocket clients.]

The pattern works as follows: each WebSocket server subscribes to relevant Redis channels. When an event occurs (a new BGP update, a chat message, a price change), it is published to Redis. Redis fans the message out to all subscribed servers, and each server pushes it to its local WebSocket clients. This decouples the event source from the connection layer, allowing each to scale independently.

A single modern server can handle hundreds of thousands of concurrent WebSocket connections. The bottleneck is usually not raw connection count but the rate of messages being broadcast — if every connection receives every message, bandwidth becomes the constraint. Channelized subscriptions (where clients only receive messages for topics they care about) dramatically reduce the fan-out factor.
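The shape of the fan-out pattern can be shown without any real infrastructure. A minimal in-memory sketch (the `Broker` class stands in for Redis Pub/Sub, lists stand in for sockets, and the channel name `bgp.updates` is hypothetical):

```python
from collections import defaultdict

class Broker:
    """In-memory stand-in for Redis Pub/Sub: fans each published
    message out to every subscribed server."""
    def __init__(self):
        self.subscribers = defaultdict(list)   # channel -> callbacks

    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel, message):
        for callback in self.subscribers[channel]:
            callback(message)

class WSServer:
    """Each server holds its local connections and relays broker
    messages to the clients it owns."""
    def __init__(self, broker, channel):
        self.clients = []                      # lists stand in for sockets
        broker.subscribe(channel, self.broadcast)

    def broadcast(self, message):
        for client in self.clients:
            client.append(message)

broker = Broker()
servers = [WSServer(broker, "bgp.updates") for _ in range(3)]
client_a, client_b = [], []
servers[0].clients.append(client_a)            # clients on different servers
servers[2].clients.append(client_b)

broker.publish("bgp.updates", '{"type":"UPDATE"}')
assert client_a == ['{"type":"UPDATE"}'] and client_b == ['{"type":"UPDATE"}']
```

The key property the sketch demonstrates: the publisher never knows which server holds which client, so connection servers can be added or removed independently of the event source.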

WebSocket Compression (permessage-deflate)

RFC 7692 defines the permessage-deflate extension, which compresses WebSocket message payloads using the DEFLATE algorithm (the same compression used in gzip and zlib). The extension is negotiated during the handshake via the Sec-WebSocket-Extensions header:

GET /stream HTTP/1.1
Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits

HTTP/1.1 101 Switching Protocols
Sec-WebSocket-Extensions: permessage-deflate; server_no_context_takeover

The RSV1 bit in the frame header is repurposed to indicate whether a frame is compressed. When set, the payload is DEFLATE-compressed and must be decompressed before processing.

Two parameters control memory usage: server_no_context_takeover and client_no_context_takeover. Without context takeover, the compression dictionary is reset between messages, reducing memory usage but worsening compression ratio. With context takeover (the default if not specified), the compressor maintains its dictionary across messages, achieving better compression because later messages benefit from patterns seen in earlier ones — but at the cost of per-connection memory for the compression state.

For text-heavy protocols like JSON, permessage-deflate typically achieves 60-80% compression. For already-compressed binary data, it provides little benefit. Servers with many concurrent connections should carefully consider the memory cost: maintaining compression context for 100,000 connections at ~32KB each requires 3.2GB of memory just for compression state.
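The effect of context takeover can be demonstrated with zlib's raw DEFLATE mode (wbits=-15, the same framing permessage-deflate uses). This is a sketch of the compression behavior only, not of the extension's frame handling; the sample JSON message is invented:

```python
import zlib

message = b'{"type":"UPDATE","prefix":"192.0.2.0/24","path":[64496,64511]}'

# Context takeover: one compressor shared across messages. The second
# message can back-reference the first, so it compresses far smaller.
shared = zlib.compressobj(wbits=-15)
first = shared.compress(message) + shared.flush(zlib.Z_SYNC_FLUSH)
second = shared.compress(message) + shared.flush(zlib.Z_SYNC_FLUSH)

# No context takeover: a fresh compressor per message, dictionary reset.
fresh = zlib.compressobj(wbits=-15)
alone = fresh.compress(message) + fresh.flush(zlib.Z_SYNC_FLUSH)

assert len(second) < len(alone)   # the shared dictionary pays off on repeats
```

This is exactly the memory-vs-ratio tradeoff the `*_no_context_takeover` parameters control: the `shared` compressor's state must be kept alive per connection to get the smaller `second`.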

Security Considerations

WebSockets introduce security concerns that do not exist in plain HTTP.

Origin Checking

The browser sends an Origin header in the WebSocket handshake, just like in a CORS request. The server should validate this header and reject connections from unexpected origins. Unlike HTTP CORS, WebSocket has no preflight mechanism — the connection is established before the server can reject it. If the server does not check the origin, any webpage on the internet can open a WebSocket to your server using the user's cookies, leading to cross-site WebSocket hijacking (CSWSH).
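The check itself is a simple allow-list comparison performed before accepting the upgrade. A sketch in Python (the `ALLOWED_ORIGINS` set and function name are ours; real deployments would read the list from configuration):

```python
ALLOWED_ORIGINS = {"https://example.com", "https://app.example.com"}

def origin_allowed(headers: dict) -> bool:
    # Reject the handshake if Origin is missing or not on the allow-list.
    # Browsers always send Origin; a missing header means a non-browser
    # client, which this policy treats as untrusted.
    return headers.get("Origin") in ALLOWED_ORIGINS

assert origin_allowed({"Origin": "https://example.com"})
assert not origin_allowed({"Origin": "https://evil.example"})
assert not origin_allowed({})
```

Note the comparison is against the full origin (scheme + host + port), never a substring match — `https://example.com.evil.example` must not pass.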

Authentication

WebSocket connections carry cookies from the originating domain, but the WebSocket API in browsers does not support custom headers. This means you cannot pass an Authorization: Bearer <token> header in the upgrade request from browser JavaScript. Common workarounds include:

- Passing a token in the URL query string (wss://example.com/ws?token=...), accepting that URLs may end up in server logs.
- Relying on cookies, which the browser attaches to the upgrade request automatically.
- Sending an authentication message as the first frame after the connection opens, and closing the connection if it does not arrive promptly.
- A ticket system: the client obtains a short-lived, single-use ticket over an authenticated HTTPS request, then presents it during the WebSocket handshake.

Rate Limiting and Abuse

A single WebSocket connection can send an unlimited number of messages. Without rate limiting, a malicious client can flood a server with messages. Servers should enforce per-connection message rate limits and maximum message sizes (rejecting frames that exceed the limit with status code 1009).

Socket.IO and Its Fallback Mechanism

Socket.IO is a popular JavaScript library that provides a higher-level abstraction over WebSockets. It is not WebSocket — it is a separate protocol that uses WebSocket as its primary transport, but can fall back to HTTP long polling when WebSocket is unavailable.

Socket.IO's key features beyond raw WebSocket include:

- Automatic reconnection with exponential backoff.
- Transparent fallback to HTTP long polling when WebSocket is blocked or unavailable.
- Rooms and namespaces for grouping connections and routing messages to subsets of clients.
- Event-based messaging with optional acknowledgement callbacks.
- Multiplexing several logical channels over a single connection.

The tradeoff: Socket.IO is not compatible with plain WebSocket. A Socket.IO client cannot connect to a standard WebSocket server, and vice versa. Socket.IO adds its own framing protocol on top of WebSocket (the Engine.IO protocol layer), so you are locked into the Socket.IO ecosystem on both client and server. If you do not need the fallback or the higher-level features, using the native WebSocket API directly is simpler and avoids the dependency.

WebTransport: The Future Beyond WebSockets

WebTransport is a next-generation API for client-server communication that addresses WebSocket's fundamental limitations. While WebSocket runs over TCP, WebTransport can run over HTTP/3 and QUIC, unlocking capabilities that are impossible with TCP.

Key differences from WebSocket:

- Multiple independent streams over one connection: a lost packet stalls only its own stream instead of everything behind it, eliminating TCP's head-of-line blocking.
- Unreliable datagrams for data where a late message is worthless, such as game state or live position updates.
- Runs over HTTP/3 and QUIC, inheriting faster connection establishment and connection migration across network changes.
- Encryption is mandatory: there is no unencrypted WebTransport equivalent of ws://.

As of 2026, WebTransport is supported in Chromium-based browsers and Firefox, with Safari support in development. Server-side support is growing but not yet as ubiquitous as WebSocket. For most applications today, WebSockets remain the pragmatic choice, but WebTransport is the clear direction for latency-sensitive, high-throughput real-time communication.

WebSockets in Practice: Real-Time BGP Data

This site uses WebSockets to consume the RIPE RIS Live feed — a continuous stream of BGP update messages from route collectors around the world. The RIS Live service connects via wss://ris-live.ripe.net/v1/ws/ and streams JSON-encoded BGP updates including route announcements, withdrawals, and AS path changes.

Each update contains the prefix, the origin AS, the full AS path, the collector that observed it, and the peer that sent it. By maintaining a persistent WebSocket connection, the site receives updates within seconds of a route change occurring anywhere on the internet.

This is a textbook use case for WebSockets: high-frequency, server-pushed data that the client needs in real time, over a long-lived connection. An HTTP polling approach would either miss updates between polls or waste bandwidth polling when nothing has changed.

Summary

WebSocket is a mature, well-supported protocol that fills a real gap in the HTTP model. It starts with a clever HTTP upgrade handshake that traverses existing infrastructure, then switches to a compact binary framing protocol optimized for bidirectional, low-latency communication. The protocol handles fragmentation, keepalives, and graceful shutdown through well-defined control frames. Masking protects network infrastructure from cache-poisoning attacks. TLS (wss://) protects the channel itself.

For scaling, the stateful nature of WebSocket connections demands different architectural patterns than stateless HTTP — sticky sessions, external state stores, and message brokers for fan-out. Libraries like Socket.IO add reliability features at the cost of protocol lock-in. Looking forward, WebTransport over QUIC will eventually supersede WebSockets for applications that need unreliable messaging, multi-stream multiplexing, or freedom from TCP's head-of-line blocking.

You can see WebSockets in action on this site: the BGP data displayed when you look up an IP address or ASN is collected via a persistent WebSocket connection to RIPE's global route collectors.
