gRPC Streaming: Patterns and Best Practices

gRPC streaming transforms a simple request-response protocol into a persistent, bidirectional data channel built on top of HTTP/2. Where a unary RPC call mirrors the familiar pattern of REST — send a request, get a response — streaming RPCs keep the connection open and allow either side (or both simultaneously) to send a sequence of messages over time. This capability unlocks patterns that are awkward or impossible with traditional HTTP: real-time event feeds, incremental file uploads, live collaboration, and telemetry pipelines that aggregate thousands of data points before flushing them to the server.

Understanding streaming requires understanding the transport layer beneath it. gRPC does not invent its own framing or multiplexing. It delegates that entirely to HTTP/2 — and the properties of HTTP/2 streams, frames, and flow control directly shape how gRPC streaming behaves in production. Every decision you make about stream lifecycle, backpressure, error handling, and keepalives is ultimately a decision about HTTP/2 behavior.

HTTP/2 Streams and Frames: The Foundation

HTTP/2 multiplexes multiple logical streams over a single TCP connection. Each stream is identified by an integer ID and carries an independent sequence of frames. A gRPC call — whether unary or streaming — maps to exactly one HTTP/2 stream. The key frame types that matter for gRPC are:

- HEADERS: opens a stream and carries request or response metadata; a final HEADERS frame with the END_STREAM flag carries gRPC's trailing status.
- DATA: carries length-prefixed gRPC messages.
- WINDOW_UPDATE: replenishes flow-control windows as the receiver consumes data.
- RST_STREAM: aborts a single stream (cancellation) without disturbing the connection.
- PING: liveness probing, used by gRPC keepalives.
- GOAWAY: tells the peer to stop opening new streams so the connection can drain.

[Diagram: gRPC streaming over HTTP/2, frame sequences. A single TCP connection multiplexes stream 1 (unary: HEADERS and DATA from the client, then HEADERS, DATA, and TRAILERS in response), stream 3 (server streaming: one request, a sequence of response DATA frames, then TRAILERS), and stream 5 (bidi: client and server DATA frames interleaved, then TRAILERS).]

The critical insight is that HTTP/2 stream IDs are cheap. Opening a new stream costs a single HEADERS frame: no new TCP handshake, no TLS negotiation. This is why gRPC can multiplex hundreds of concurrent RPCs on a single connection. But it also means that all those RPCs share the same TCP congestion window, and a single lost packet stalls every stream on the connection (head-of-line blocking at the TCP level). This is one motivation for QUIC and HTTP/3, which eliminate this coupling by delivering streams independently, so loss on one stream does not stall the others.

Server Streaming: Real-Time Feeds and Pagination Alternatives

In a server-streaming RPC, the client sends a single request and the server responds with a stream of messages. The protobuf service definition looks like this:

service RouteService {
  rpc WatchRoutes (WatchRequest) returns (stream RouteUpdate);
}

The client sends one WatchRequest, and the server pushes RouteUpdate messages as they become available. The stream stays open until the server closes it (by sending trailers), the client cancels it, or an error occurs.

Pattern: Real-Time Event Feeds

Server streaming is the natural fit for real-time data feeds: BGP route updates, stock price tickers, log tailing, database change streams. The server maintains a long-lived stream and pushes events as they occur. Compared to polling, this eliminates wasted requests during quiet periods and reduces latency during busy periods (no polling interval delay).

The key design decisions for event feed streams are:

- Resumption after disconnect. Tag every event with a cursor or resource version so a reconnecting client can ask for "everything since version X" instead of forcing a full resync.
- Liveness during quiet periods. Send periodic heartbeats so both sides can distinguish "no events" from a dead connection.
- Handling slow consumers. Rely on flow control to apply backpressure, coalesce or drop events, or disconnect the laggard; the right choice depends on the feed's semantics.

Pattern: Pagination Alternative

Server streaming can replace traditional pagination for large result sets. Instead of the client making N requests for N pages (each requiring the server to re-execute and skip rows), the server opens a stream and sends results as fast as the client can consume them. This is particularly effective when:

- The result set is large and the client processes rows incrementally rather than needing a complete page at once.
- Re-executing the query per page is expensive, since the server keeps one cursor open instead of skipping rows on every request.
- Time-to-first-row matters: results arrive as they are produced instead of after a full page is assembled.

However, server streaming for pagination has a tradeoff: if the client disconnects mid-stream, all progress is lost unless you implement cursoring. With traditional pagination, each page request is independent and stateless.
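One way to implement that cursoring, assuming you control the message schema, is to have every streamed row carry an opaque server-issued position token; a reconnecting client then resumes from the last row it fully processed. A minimal sketch of the client-side bookkeeping (the Row and Resumer types are illustrative, not part of gRPC):

```go
package main

import "fmt"

// Row is a hypothetical streamed result carrying an opaque cursor.
type Row struct {
	Key    string
	Cursor string // server-issued position token for this row
}

// Resumer remembers the last cursor the client fully processed.
// On reconnect, NextRequestCursor is sent back to the server so the
// stream restarts after that row instead of from the beginning.
type Resumer struct {
	last string
}

// Observe records a row only after the client has durably processed it.
func (r *Resumer) Observe(row Row) { r.last = row.Cursor }

// NextRequestCursor is the resume point for the next stream request.
func (r *Resumer) NextRequestCursor() string { return r.last }

func main() {
	r := &Resumer{}
	r.Observe(Row{Key: "a", Cursor: "c1"})
	r.Observe(Row{Key: "b", Cursor: "c2"})
	fmt.Println(r.NextRequestCursor()) // prints "c2"
}
```

The important design point is to advance the cursor only after the row's side effects are durable; otherwise a crash between receive and process silently skips rows.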

[Diagram: server streaming, real-time feed pattern. The client sends WatchRequest { resource_version: "4521" }; the server pushes RouteUpdate { type: ADDED, rv: "4522" } and RouteUpdate { type: MODIFIED, rv: "4523" }, sends Heartbeat { rv: "4523" } through an idle period, then RouteUpdate { type: DELETED, rv: "4530" }.]

Client Streaming: File Upload and Telemetry Batching

Client streaming is the inverse: the client sends a stream of messages and the server responds with a single message after the client closes its half of the stream. The service definition:

service IngestService {
  rpc UploadFile (stream FileChunk) returns (UploadResult);
  rpc ReportMetrics (stream MetricBatch) returns (IngestSummary);
}

Pattern: Chunked File Upload

Client streaming is the idiomatic way to upload large files over gRPC. The client breaks the file into chunks (typically 16KB–64KB, staying well below the default 4MB max message size) and sends each chunk as a separate message in the stream. The server reassembles the chunks, and once the client signals end-of-stream, the server responds with metadata about the upload (hash, size, storage location).

Advantages over a single large message:

- Each chunk stays comfortably under the 4MB default message size limit, so arbitrarily large files work without configuration changes.
- Memory usage is bounded on both sides: neither endpoint ever holds the whole file in a single buffer.
- HTTP/2 flow control operates per chunk, so a slow server naturally backpressures the uploading client.

The main limitation is that gRPC client streaming does not natively support resumable uploads. If the stream breaks at chunk 500 of 1000, the client must restart from chunk 0 unless you build a resumption protocol into your messages (e.g., including an offset field in the request, and having the server report what it has received so far via a separate unary RPC).

Pattern: Telemetry Batching

High-volume telemetry systems use client streaming to aggregate many small data points into an efficient pipeline. Instead of making a separate RPC for each metric or log entry (which would create enormous overhead from per-call metadata, serialization, and scheduling), the client opens a single stream and sends batches of metrics at regular intervals.

OpenTelemetry's OTLP/gRPC exporter uses a variant of this pattern: the client accumulates spans or metrics in memory and periodically flushes them to the collector. OTLP itself sends each batch as a unary Export RPC rather than over a client stream, but the accumulate-and-flush structure is the same, and custom high-throughput pipelines can go one step further with a persistent client stream that avoids repeated stream creation.
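The accumulate-and-flush logic can be sketched without any gRPC dependency. This batcher hands back a full batch when a size threshold is reached; a real pipeline would also call Flush from a timer tick to bound latency during quiet periods (the string payload and threshold are illustrative):

```go
package main

import "fmt"

// Batcher accumulates telemetry points and returns a full batch when
// the size threshold is reached. A caller would also invoke Flush on
// a periodic ticker so quiet periods still bound delivery latency.
type Batcher struct {
	max     int
	pending []string // stand-in for metric/span messages
}

func NewBatcher(max int) *Batcher { return &Batcher{max: max} }

// Add buffers one point and returns a batch to send only when full.
func (b *Batcher) Add(p string) []string {
	b.pending = append(b.pending, p)
	if len(b.pending) >= b.max {
		return b.Flush()
	}
	return nil
}

// Flush returns everything buffered and resets the batcher.
func (b *Batcher) Flush() []string {
	out := b.pending
	b.pending = nil
	return out
}

func main() {
	b := NewBatcher(3)
	fmt.Println(b.Add("m1"), b.Add("m2")) // two nil (not-yet-full) results
	fmt.Println(b.Add("m3"))              // the full batch [m1 m2 m3]
	fmt.Println(b.Flush())                // empty: the batcher was reset
}
```

Each returned batch would become one message on the client stream (or one Export call in the OTLP case).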

Design considerations for telemetry streams:

- Batch size versus flush interval: larger batches amortize per-message overhead but add latency and lose more data on a crash.
- Buffering policy when the stream is down: bound the in-memory queue and decide explicitly whether to drop the oldest or the newest points.
- Backpressure: if the collector cannot keep up, the client's Send() will block; the flush loop must tolerate that without stalling the application it instruments.

Bidirectional Streaming: Chat, Collaboration, and Beyond

Bidirectional (bidi) streaming is the most powerful and most complex streaming pattern. Both the client and server send independent streams of messages concurrently on the same HTTP/2 stream. Neither side has to wait for the other to finish sending before it can send its own messages.

service CollabService {
  rpc EditDocument (stream EditOp) returns (stream EditOp);
  rpc Chat (stream ChatMessage) returns (stream ChatMessage);
}

The two message streams are logically independent. The client can send 10 messages before the server sends any, or the server can start sending immediately. The ordering guarantee is per-direction: messages from client to server arrive in order, and messages from server to client arrive in order, but there is no guaranteed ordering between the two directions.

[Diagram: bidirectional streaming, concurrent message flow. The client sends EditOp messages (insert "Hello", delete range(5,8), insert "World") while the server concurrently sends its own (a remote user's cursor move, a transform, ack: seq=3); the two directions interleave independently.]

Pattern: Chat and Messaging

Chat is the textbook bidi streaming use case. Each participant opens a stream to the server. Messages from the client are relayed to other participants, and messages from other participants arrive on the server-to-client half of the stream. The server acts as a fan-out hub, receiving messages from many client streams and distributing them across the appropriate recipient streams.

The practical challenge is managing the lifecycle of many concurrent streams. If one participant's network goes down, their stream breaks. The server must detect this (via keepalive or send failure), clean up the stream's resources, and notify other participants. The dead participant's client must reconnect and re-establish state, possibly replaying missed messages.

Pattern: Live Collaboration

Collaborative editing tools (think Google Docs-style real-time editing) use bidi streaming to exchange operational transforms or CRDTs between participants. The client sends local edits as they happen, and the server sends remote edits from other participants plus any conflict resolution transforms.

This pattern demands careful attention to ordering and acknowledgment. Each edit carries a sequence number. The server maintains an authoritative order and transforms concurrent edits to preserve consistency. The acknowledgment flow (server sending back ack: seq=N) lets the client know which of its edits have been committed and can be removed from the local undo buffer.
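The client-side bookkeeping for that acknowledgment flow can be sketched as a buffer of unacknowledged edits, pruned as acks arrive. The sequence-number scheme here is an assumption for illustration, not a fixed protocol:

```go
package main

import "fmt"

// Edit is a locally applied operation awaiting server commit.
type Edit struct {
	Seq int
	Op  string
}

// PendingBuffer holds edits that were sent but not yet acknowledged.
// On reconnect, everything still in the buffer must be replayed
// (possibly after transformation against remote edits).
type PendingBuffer struct {
	edits []Edit
}

func (p *PendingBuffer) Push(e Edit) { p.edits = append(p.edits, e) }

// Ack drops every edit with Seq <= n: the server has committed them,
// so they no longer need to sit in the local undo/replay buffer.
func (p *PendingBuffer) Ack(n int) {
	i := 0
	for i < len(p.edits) && p.edits[i].Seq <= n {
		i++
	}
	p.edits = p.edits[i:]
}

func (p *PendingBuffer) Unacked() []Edit { return p.edits }

func main() {
	var p PendingBuffer
	p.Push(Edit{1, `insert "Hello"`})
	p.Push(Edit{2, `delete range(5,8)`})
	p.Push(Edit{3, `insert "World"`})
	p.Ack(2)                 // server committed seq 1 and 2
	fmt.Println(p.Unacked()) // only seq 3 remains pending
}
```

This relies on the per-direction ordering guarantee described above: because the client's edits arrive at the server in order, a single high-water-mark ack covers everything up to that sequence number.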

Pattern: Interactive Computation

Some RPCs need both sides to send data reactively. Consider a machine learning inference service where the client streams audio frames and the server streams back partial transcription results in real time. Or a database query service where the client sends SQL and the server streams results, but the client can send a cancellation or a follow-up query mid-stream without opening a new RPC.

Flow Control and Backpressure

Flow control is the mechanism that prevents a fast sender from overwhelming a slow receiver. gRPC inherits HTTP/2's flow control, which operates at two levels:

- Stream level: each HTTP/2 stream has its own window; the sender may have at most that many unacknowledged bytes in flight on that stream.
- Connection level: a second window caps the total unacknowledged bytes across all streams on the connection.

A sender must respect both windows, and the receiver replenishes each with WINDOW_UPDATE frames as it consumes data.

In gRPC, the flow-control window sizes are configurable. Go's gRPC implementation defaults to 64KB per stream and 16MB per connection; other implementations ship different defaults (grpc-java, for example, starts with a larger initial window). The critical point is that when a stream's flow-control window reaches zero, the sending side blocks: Send() calls will block (or return a "resource exhausted" error in non-blocking mode) until the receiver consumes data and issues WINDOW_UPDATE frames.

[Diagram: HTTP/2 flow control, backpressure in action. A fast producer sends 16KB DATA frames, draining the receiver's 64KB window (48KB, 32KB, 16KB, 0 remaining) until it blocks; a 32KB WINDOW_UPDATE from the slow consumer lets sending resume.]

This is backpressure in action. If the server is producing messages faster than the client can consume them, the flow-control window on the client side fills up, and the server's Send() eventually blocks. This prevents unbounded memory growth. But it also means that a single slow consumer on a shared connection can indirectly slow down other streams by consuming the connection-level window.
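The window accounting behind this behavior fits in a few lines. This is an illustration of the mechanism, not gRPC's implementation: the sender debits the window per DATA frame and must wait at zero until a WINDOW_UPDATE credits bytes back:

```go
package main

import "fmt"

// window models a per-stream HTTP/2 flow-control window.
type window struct{ avail int }

// trySend debits n bytes and reports false when the window cannot
// cover the frame, mirroring a Send() that would block.
func (w *window) trySend(n int) bool {
	if w.avail < n {
		return false
	}
	w.avail -= n
	return true
}

// update is the receiver crediting bytes back via WINDOW_UPDATE
// after it has consumed data from its buffer.
func (w *window) update(n int) { w.avail += n }

func main() {
	w := &window{avail: 64 * 1024} // 64KB initial window, as in the text
	for i := 0; i < 4; i++ {
		fmt.Println(w.trySend(16*1024), w.avail) // four 16KB frames drain it
	}
	fmt.Println(w.trySend(16 * 1024)) // false: the sender is blocked
	w.update(32 * 1024)              // the consumer caught up
	fmt.Println(w.trySend(16 * 1024)) // true: sending resumes
}
```

Real implementations maintain one such counter per stream plus one for the connection, and a frame must fit under both before it goes on the wire.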

In practice, you should:

- Consume messages promptly and hand expensive processing to worker goroutines or threads, so Recv() keeps draining the window.
- Treat a blocked Send() as a signal, not a bug: decide explicitly whether to wait, drop, or disconnect when a consumer cannot keep up.
- Tune window sizes for high-bandwidth, high-latency links, where the defaults can throttle throughput.
- Consider a dedicated connection for bulk transfer streams so they do not starve latency-sensitive RPCs of the shared connection-level window.

Keepalive and Idle Timeouts

Long-lived streams are particularly vulnerable to silent connection death. TCP keepalive operates at the transport layer with long default intervals (typically 2 hours), which is far too slow for application-level health checking. Load balancers, firewalls, and NAT devices may silently drop idle connections after 30–300 seconds of inactivity. When this happens, the client's Recv() hangs indefinitely — it does not know the connection is gone until it tries to send or the TCP retransmission timeout expires (which can take minutes).

gRPC addresses this with HTTP/2 PING frames. The keepalive mechanism works as follows:

- After a configurable period of inactivity (the keepalive time), the client sends an HTTP/2 PING frame.
- If no PING acknowledgment arrives within the keepalive timeout, the client declares the connection dead, closes it, and fails in-flight streams (typically with UNAVAILABLE) so the application can reconnect.
- A separate setting controls whether pings are sent when no RPC is active, which matters for long-lived but quiet streams.

The server side has complementary settings:

- A minimum allowed ping interval: clients that ping more often are sent a GOAWAY and disconnected.
- Whether pings are permitted at all when the client has no active streams.
- The server's own keepalive time and timeout for probing its clients, plus maximum connection idle and age settings that proactively recycle connections.

A common production misconfiguration: the client sets keepalive to 10 seconds, but the server's minimum ping interval remains at the default 5 minutes. The server perceives the frequent pings as abusive and kills the connection. You see ENHANCE_YOUR_CALM errors in the client logs and unexplained stream resets.

Error Handling in Streams

Error handling in streaming RPCs is fundamentally different from unary RPCs. In a unary call, the error is the response — you get either a result or an error, never both. In a streaming call, you might receive 500 successful messages and then an error on the 501st. The partial results are real and valid; the error applies to the stream going forward, not retroactively.

gRPC defines a set of canonical status codes that map to specific error conditions. The ones that matter most for streams include:

- CANCELLED: the client (or a propagated deadline chain) aborted the call.
- DEADLINE_EXCEEDED: the deadline attached to the stream expired.
- UNAVAILABLE: the transport failed; typically retryable with backoff.
- RESOURCE_EXHAUSTED: quotas, message size limits, or flow-control limits were hit.
- INTERNAL: protocol-level breakage, such as a malformed frame.

For streaming RPCs, the error arrives in one of two ways:

- As a status in the trailers: the server ends the stream with a non-OK status, and Recv() returns that error after delivering all prior messages.
- As a transport failure: the stream or connection breaks (RST_STREAM, network loss), and both Recv() and Send() surface an error such as UNAVAILABLE.

Best practices for stream error handling:

- Treat already-received messages as valid; design message schemas so partial progress is usable or resumable.
- Retry with a cursor, not from scratch: on UNAVAILABLE, reconnect and resume from the last processed position.
- Distinguish CANCELLED (often intentional: the caller went away) from transport errors before alerting or retrying.
- Propagate stream termination to whatever work the stream was driving, so resources are freed promptly.

Stream Lifecycle and Cancellation

Understanding the full lifecycle of a gRPC stream is essential for writing correct streaming code. A stream goes through several states:

  1. Open — the client sends HEADERS, creating the HTTP/2 stream. The server receives the request and begins processing.
  2. Half-closed (client) — the client has sent all its messages and calls CloseSend(). The server's Recv() will eventually return EOF. The server can still send messages.
  3. Half-closed (server) — the server has sent all its messages and sends trailing HEADERS. The client's Recv() will eventually return EOF. The client can still send messages (though the server won't process them).
  4. Closed — both sides are done. The HTTP/2 stream is freed.

Cancellation is the mechanism for either side to abort a stream prematurely. When the client cancels a stream (in Go, by calling the cancel function returned by context.WithCancel; other languages have equivalents), the gRPC library sends a RST_STREAM frame to the server. The server's handler receives a cancellation signal (via context cancellation in Go, ServerCallStreamObserver.setOnCancelHandler() in Java).

Cancellation propagation is critical for resource management. If a client cancels a server-streaming RPC that is scanning a large database, the server should stop the scan immediately rather than continuing to read rows that will never be sent. Failure to observe cancellation is one of the most common resource leaks in gRPC servers — goroutines or threads that continue executing after the client has gone.

// Go: Observe context cancellation in server handler
func (s *server) WatchRoutes(req *pb.WatchRequest, stream pb.RouteService_WatchRoutesServer) error {
    for {
        select {
        case <-stream.Context().Done():
            // Client cancelled or deadline exceeded
            return stream.Context().Err()
        case update := <-s.routeUpdates:
            if err := stream.Send(update); err != nil {
                return err
            }
        }
    }
}

Comparison: gRPC Streaming vs. WebSockets vs. SSE

gRPC streaming, WebSockets, and Server-Sent Events (SSE) all provide mechanisms for streaming data between clients and servers, but they serve different niches and make different trade-offs.

WebSockets

WebSockets provide a raw, bidirectional byte stream over a single TCP connection. After an HTTP/1.1 upgrade handshake, the connection becomes a full-duplex channel with minimal framing. WebSockets are:

- Natively supported in every browser, with a simple JavaScript API.
- Schemaless: message serialization, RPC semantics, and anything above raw messages are up to you.
- Unmultiplexed: one logical channel per connection, with only TCP-level flow control.

Server-Sent Events (SSE)

SSE is a simple, unidirectional (server-to-client) protocol built on plain HTTP. The server sends a stream of text events, each prefixed with data: lines. SSE is:

- Trivial to implement: any HTTP server that can hold a response open can serve SSE.
- Natively supported in browsers via EventSource, including automatic reconnection with a Last-Event-ID header for resumption.
- Limited to text payloads and server-to-client push; client messages require separate HTTP requests.

When to Use Each

                   gRPC Streaming               WebSockets             SSE
  Direction        Unary, server, client, bidi  Bidirectional          Server-to-client only
  Schema           Protobuf (strongly typed)    None (bring your own)  None (text-based)
  Multiplexing     Yes (HTTP/2)                 No                     Limited
  Browser support  Via grpc-web (limited)       Native                 Native
  Flow control     HTTP/2 per-stream            TCP only               TCP only
  Best for         Service-to-service           Browser real-time      Simple notifications

gRPC streaming shines in service-to-service communication where both endpoints are controlled infrastructure. You get strong typing, multiplexing, flow control, and a rich ecosystem of load balancers and observability tools that understand gRPC semantics. WebSockets win when you need bidirectional communication with a browser. SSE wins when you need dead-simple server push with automatic reconnection.

Note that HTTP/3 (QUIC) may shift this calculus. QUIC provides stream-level multiplexing without head-of-line blocking, making it an even more attractive transport for gRPC. The gRPC-over-QUIC effort is ongoing, and once mature, it will offer the multiplexing benefits of HTTP/2 streams without the TCP-level coupling that currently limits them.

Real-World Patterns

Kubernetes Watch API

The Kubernetes API server uses server streaming extensively for its Watch mechanism. When a controller or operator calls Watch(), it opens a long-lived stream that receives events about resource changes (pods created, services updated, config maps deleted). This is the foundation of Kubernetes' reconciliation loop architecture.

Key design choices in the Kubernetes Watch API:

- Every object carries a resourceVersion; a watch starts from a version, and a reconnecting client resumes from the last version it observed.
- Periodic bookmark events advance the client's resourceVersion even when its objects are unchanged, keeping resumption points fresh.
- If the requested version has already been compacted away, the server rejects the watch and the client falls back to a full relist followed by a new watch.
- Events carry a type (ADDED, MODIFIED, DELETED) plus the full object, so consumers can maintain a local cache without extra reads.

The gRPC implementation (since Kubernetes has been migrating toward gRPC for internal communication) maps cleanly: the Watch RPC is a server-streaming call where the server pushes WatchEvent messages containing the event type and the serialized resource.

Google Cloud Spanner Change Streams

Spanner's change streams use server-streaming gRPC to push database mutations to consumers in real time. When you create a change stream on a table, readers can open a gRPC stream that receives a chronologically ordered sequence of data change records, heartbeat records, and partition metadata.

Spanner's design is notable for how it handles partitioning: a single change stream is split into multiple partitions, and the client must open a separate gRPC stream for each partition. Partitions can split and merge over time as the database reshards. The client receives metadata about these splits in-band and must dynamically adjust the number of concurrent streams. This is far more complex than a simple single-stream pattern, but it scales to enormous throughput.
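The client-side bookkeeping for such partition splits can be sketched as a set of partitions that should currently have an open stream, updated as in-band split notices arrive. The tokens and the Split notice shape here are illustrative, not Spanner's actual API:

```go
package main

import (
	"fmt"
	"sort"
)

// PartitionSet tracks which partition streams should currently be open.
type PartitionSet struct {
	active map[string]bool
}

func NewPartitionSet(initial ...string) *PartitionSet {
	s := &PartitionSet{active: map[string]bool{}}
	for _, p := range initial {
		s.active[p] = true
	}
	return s
}

// Split handles the in-band notice that parent has resharded into
// children: close the parent's stream, open one stream per child.
func (s *PartitionSet) Split(parent string, children ...string) {
	delete(s.active, parent)
	for _, c := range children {
		s.active[c] = true
	}
}

// Active lists the partitions needing an open stream, sorted for
// stable output.
func (s *PartitionSet) Active() []string {
	var out []string
	for p := range s.active {
		out = append(out, p)
	}
	sort.Strings(out)
	return out
}

func main() {
	s := NewPartitionSet("p0")
	s.Split("p0", "p0a", "p0b") // p0 resharded into two children
	fmt.Println(s.Active())     // [p0a p0b]
}
```

A real consumer would diff this set against its open streams after each metadata record, opening and closing gRPC streams to converge on it.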

The heartbeat records in Spanner serve a dual purpose: they advance the low watermark (ensuring that the consumer knows no earlier mutations are pending) and they keep the gRPC stream alive through load balancers and proxies that would otherwise kill idle connections.

Google Cloud Pub/Sub Streaming Pull

Pub/Sub's StreamingPull is a bidirectional streaming RPC. The client sends StreamingPullRequest messages that include both the initial subscription binding and ongoing acknowledgments for processed messages. The server sends StreamingPullResponse messages containing batches of published messages.

service Subscriber {
  rpc StreamingPull (stream StreamingPullRequest) returns (stream StreamingPullResponse);
}

This bidirectional design is deliberate. The client needs to continuously acknowledge processed messages (via the client-to-server stream) while simultaneously receiving new messages (via the server-to-client stream). A unary acknowledge RPC would add latency and connection overhead. The streaming acknowledgment flow also allows the server to dynamically adjust delivery rate based on how quickly the client is acknowledging — a form of application-level flow control layered on top of HTTP/2 flow control.

Pub/Sub's implementation also demonstrates a pattern for graceful stream recycling: the server periodically closes streams after a maximum lifetime (roughly 10 minutes) and the client library automatically reopens them. This forces rebalancing across server backends and prevents long-lived streams from pinning to a single server that may be overloaded or about to be drained for maintenance.

[Diagram: Pub/Sub StreamingPull (bidi). The subscriber opens the stream with StreamingPullRequest { subscription: "projects/..." }; the server returns batched StreamingPullResponse messages while the subscriber interleaves requests carrying ack_ids and deadline modifications; after the server sends GOAWAY at max connection age, the client opens a fresh stream with a new StreamingPullRequest.]

Production Best Practices

After working through the patterns and mechanisms above, here is a consolidated set of best practices for operating gRPC streams in production.

Connection Management

- Configure keepalives on both sides, and make sure the client's ping interval respects the server's enforcement policy.
- Set a maximum connection age on servers so long-lived streams periodically recycle and rebalance; handle the resulting GOAWAY gracefully in clients.
- Reconnect with capped exponential backoff and jitter, and resume from a cursor instead of restarting work.

Observability

- Export metrics for active stream counts, messages per second in each direction, and stream duration; RPC counts alone undercount streaming work.
- Log stream termination status codes: a rise in UNAVAILABLE or CANCELLED is often the first sign of network or deployment trouble.
- Propagate trace context in the initial metadata so long-lived streams remain attributable to their callers.

Resource Management

- Always observe cancellation in handlers; a stream whose client is gone must release its goroutines, cursors, and buffers immediately.
- Bound every per-stream buffer and cap the number of concurrent streams a server will accept.
- Apply deadlines where a stream has a natural maximum lifetime, and rely on keepalives where it does not.

Load Balancing

- L4 load balancers see one long-lived connection, not the streams inside it; use a gRPC-aware L7 balancer or client-side load balancing to spread streams across backends.
- Long-lived streams pin work to one backend; combine maximum connection age with graceful draining so deployments do not hard-kill streams.
- Watch for imbalance after backend restarts: clients that never reconnect stay pinned to the surviving backends.

Choosing the Right Streaming Pattern

Not every RPC needs to be a streaming RPC. Streaming adds complexity to error handling, reconnection, and testing. Use the simplest pattern that meets your requirements.

The decision tree is straightforward: if you need real-time push, use server streaming. If you need efficient bulk send, use client streaming. If you need both simultaneously and independently, use bidi. If you are unsure, start with unary and evolve to streaming when you hit its limitations. The upgrade path from unary to streaming is usually clean in gRPC — you add the stream keyword to your proto definition and adjust the handler code.

For browser-facing workloads, remember that gRPC streaming in the browser requires grpc-web, which only supports unary and server streaming (client streaming and bidi streaming require a proxy that translates to full HTTP/2). If you need bidirectional browser communication, WebSockets remain the pragmatic choice until HTTP/3 WebTransport matures.
