gRPC Streaming: Patterns and Best Practices

gRPC streaming transforms a simple request-response protocol into a persistent, bidirectional data channel built on top of HTTP/2. Where a unary RPC call mirrors the familiar pattern of REST — send a request, get a response — streaming RPCs keep the connection open and allow either side (or both simultaneously) to send a sequence of messages over time. This capability unlocks patterns that are awkward or impossible with traditional HTTP: real-time event feeds, incremental file uploads, live collaboration, and telemetry pipelines that aggregate thousands of data points before flushing them to the server.

Understanding streaming requires understanding the transport layer beneath it. gRPC does not invent its own framing or multiplexing. It delegates that entirely to HTTP/2 — and the properties of HTTP/2 streams, frames, and flow control directly shape how gRPC streaming behaves in production. Every decision you make about stream lifecycle, backpressure, error handling, and keepalives is ultimately a decision about HTTP/2 behavior.

HTTP/2 Streams and Frames: The Foundation

HTTP/2 multiplexes multiple logical streams over a single TCP connection. Each stream is identified by an integer ID and carries an independent sequence of frames. A gRPC call — whether unary or streaming — maps to exactly one HTTP/2 stream. The key frame types that matter for gRPC are:

- HEADERS: opens a stream and carries request or response metadata; a final HEADERS frame with the END_STREAM flag carries gRPC's trailing status.
- DATA: carries length-prefixed gRPC messages.
- WINDOW_UPDATE: replenishes flow-control windows as the receiver consumes data.
- RST_STREAM: aborts a single stream (cancellation) without disturbing the connection.
- PING: liveness probing, used by gRPC keepalives.
- GOAWAY: tells the peer to stop opening new streams so the connection can drain.

[Diagram: gRPC streaming over HTTP/2, frame sequences. A single TCP connection multiplexes stream 1 (unary: HEADERS and DATA from the client, then HEADERS, DATA, and TRAILERS in response), stream 3 (server streaming: one request, a sequence of response DATA frames, then TRAILERS), and stream 5 (bidi: client and server DATA frames interleaved, then TRAILERS).]

The critical insight is that HTTP/2 stream IDs are cheap. Opening a new stream costs a single HEADERS frame: no new TCP handshake, no TLS negotiation. This is why gRPC can multiplex hundreds of concurrent RPCs on a single connection. But it also means that all those RPCs share the same TCP congestion window, and a single lost packet stalls every stream on the connection (head-of-line blocking at the TCP level). This is one motivation for QUIC and HTTP/3, which eliminate this coupling by delivering streams independently, so loss on one stream does not stall the others.

Server Streaming: Real-Time Feeds and Pagination Alternatives

In a server-streaming RPC, the client sends a single request and the server responds with a stream of messages. The protobuf service definition looks like this:

service RouteService {
  rpc WatchRoutes (WatchRequest) returns (stream RouteUpdate);
}

The client sends one WatchRequest, and the server pushes RouteUpdate messages as they become available. The stream stays open until the server closes it (by sending trailers), the client cancels it, or an error occurs.

Pattern: Real-Time Event Feeds

Server streaming is the natural fit for real-time data feeds: BGP route updates, stock price tickers, log tailing, database change streams. The server maintains a long-lived stream and pushes events as they occur. Compared to polling, this eliminates wasted requests during quiet periods and reduces latency during busy periods (no polling interval delay).

The key design decisions for event feed streams are:

- Resumption after disconnect. Tag every event with a cursor or resource version so a reconnecting client can ask for "everything since version X" instead of forcing a full resync.
- Liveness during quiet periods. Send periodic heartbeats so both sides can distinguish "no events" from a dead connection.
- Handling slow consumers. Rely on flow control to apply backpressure, coalesce or drop events, or disconnect the laggard; the right choice depends on the feed's semantics.

Pattern: Pagination Alternative

Server streaming can replace traditional pagination for large result sets. Instead of the client making N requests for N pages (each requiring the server to re-execute and skip rows), the server opens a stream and sends results as fast as the client can consume them. This is particularly effective when:

- The result set is large and the client processes rows incrementally rather than needing a complete page at once.
- Re-executing the query per page is expensive, since the server keeps one cursor open instead of skipping rows on every request.
- Time-to-first-row matters: results arrive as they are produced instead of after a full page is assembled.

However, server streaming for pagination has a tradeoff: if the client disconnects mid-stream, all progress is lost unless you implement cursoring. With traditional pagination, each page request is independent and stateless.
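One way to implement that cursoring, assuming you control the message schema, is to have every streamed row carry an opaque server-issued position token; a reconnecting client then resumes from the last row it fully processed. A minimal sketch of the client-side bookkeeping (the Row and Resumer types are illustrative, not part of gRPC):

```go
package main

import "fmt"

// Row is a hypothetical streamed result carrying an opaque cursor.
type Row struct {
	Key    string
	Cursor string // server-issued position token for this row
}

// Resumer remembers the last cursor the client fully processed.
// On reconnect, NextRequestCursor is sent back to the server so the
// stream restarts after that row instead of from the beginning.
type Resumer struct {
	last string
}

// Observe records a row only after the client has durably processed it.
func (r *Resumer) Observe(row Row) { r.last = row.Cursor }

// NextRequestCursor is the resume point for the next stream request.
func (r *Resumer) NextRequestCursor() string { return r.last }

func main() {
	r := &Resumer{}
	r.Observe(Row{Key: "a", Cursor: "c1"})
	r.Observe(Row{Key: "b", Cursor: "c2"})
	fmt.Println(r.NextRequestCursor()) // prints "c2"
}
```

The important design point is to advance the cursor only after the row's side effects are durable; otherwise a crash between receive and process silently skips rows.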

[Diagram: server streaming, real-time feed pattern. The client sends WatchRequest { resource_version: "4521" }; the server pushes RouteUpdate { type: ADDED, rv: "4522" } and RouteUpdate { type: MODIFIED, rv: "4523" }, sends Heartbeat { rv: "4523" } through an idle period, then RouteUpdate { type: DELETED, rv: "4530" }.]

Client Streaming: File Upload and Telemetry Batching

Client streaming is the inverse: the client sends a stream of messages and the server responds with a single message after the client closes its half of the stream. The service definition:

service IngestService {
  rpc UploadFile (stream FileChunk) returns (UploadResult);
  rpc ReportMetrics (stream MetricBatch) returns (IngestSummary);
}

Pattern: Chunked File Upload

Client streaming is the idiomatic way to upload large files over gRPC. The client breaks the file into chunks (typically 16KB–64KB, staying well below the default 4MB max message size) and sends each chunk as a separate message in the stream. The server reassembles the chunks, and once the client signals end-of-stream, the server responds with metadata about the upload (hash, size, storage location).

Advantages over a single large message:

- Each chunk stays comfortably under the 4MB default message size limit, so arbitrarily large files work without configuration changes.
- Memory usage is bounded on both sides: neither endpoint ever holds the whole file in a single buffer.
- HTTP/2 flow control operates per chunk, so a slow server naturally backpressures the uploading client.

The main limitation is that gRPC client streaming does not natively support resumable uploads. If the stream breaks at chunk 500 of 1000, the client must restart from chunk 0 unless you build a resumption protocol into your messages (e.g., including an offset field in the request, and having the server report what it has received so far via a separate unary RPC).

Pattern: Telemetry Batching

High-volume telemetry systems use client streaming to aggregate many small data points into an efficient pipeline. Instead of making a separate RPC for each metric or log entry (which would create enormous overhead from per-call metadata, serialization, and scheduling), the client opens a single stream and sends batches of metrics at regular intervals.

OpenTelemetry's OTLP/gRPC exporter uses a variant of this pattern: the client accumulates spans or metrics in memory and periodically flushes them to the collector. OTLP itself sends each batch as a unary Export RPC rather than over a client stream, but the accumulate-and-flush structure is the same, and custom high-throughput pipelines can go one step further with a persistent client stream that avoids repeated stream creation.
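The accumulate-and-flush logic can be sketched without any gRPC dependency. This batcher hands back a full batch when a size threshold is reached; a real pipeline would also call Flush from a timer tick to bound latency during quiet periods (the string payload and threshold are illustrative):

```go
package main

import "fmt"

// Batcher accumulates telemetry points and returns a full batch when
// the size threshold is reached. A caller would also invoke Flush on
// a periodic ticker so quiet periods still bound delivery latency.
type Batcher struct {
	max     int
	pending []string // stand-in for metric/span messages
}

func NewBatcher(max int) *Batcher { return &Batcher{max: max} }

// Add buffers one point and returns a batch to send only when full.
func (b *Batcher) Add(p string) []string {
	b.pending = append(b.pending, p)
	if len(b.pending) >= b.max {
		return b.Flush()
	}
	return nil
}

// Flush returns everything buffered and resets the batcher.
func (b *Batcher) Flush() []string {
	out := b.pending
	b.pending = nil
	return out
}

func main() {
	b := NewBatcher(3)
	fmt.Println(b.Add("m1"), b.Add("m2")) // two nil (not-yet-full) results
	fmt.Println(b.Add("m3"))              // the full batch [m1 m2 m3]
	fmt.Println(b.Flush())                // empty: the batcher was reset
}
```

Each returned batch would become one message on the client stream (or one Export call in the OTLP case).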

Design considerations for telemetry streams:

- Batch size versus flush interval: larger batches amortize per-message overhead but add latency and lose more data on a crash.
- Buffering policy when the stream is down: bound the in-memory queue and decide explicitly whether to drop the oldest or the newest points.
- Backpressure: if the collector cannot keep up, the client's Send() will block; the flush loop must tolerate that without stalling the application it instruments.

Bidirectional Streaming: Chat, Collaboration, and Beyond

Bidirectional (bidi) streaming is the most powerful and most complex streaming pattern. Both the client and server send independent streams of messages concurrently on the same HTTP/2 stream. Neither side has to wait for the other to finish sending before it can send its own messages.

service CollabService {
  rpc EditDocument (stream EditOp) returns (stream EditOp);
  rpc Chat (stream ChatMessage) returns (stream ChatMessage);
}

The two message streams are logically independent. The client can send 10 messages before the server sends any, or the server can start sending immediately. The ordering guarantee is per-direction: messages from client to server arrive in order, and messages from server to client arrive in order, but there is no guaranteed ordering between the two directions.

[Diagram: bidirectional streaming, concurrent message flow. The client sends EditOp messages (insert "Hello", delete range(5,8), insert "World") while the server concurrently sends its own (a remote user's cursor move, a transform, ack: seq=3); the two directions interleave independently.]

Pattern: Chat and Messaging

Chat is the textbook bidi streaming use case. Each participant opens a stream to the server. Messages from the client are relayed to other participants, and messages from other participants arrive on the server-to-client half of the stream. The server acts as a fan-out hub, receiving messages from many client streams and distributing them across the appropriate recipient streams.

The practical challenge is managing the lifecycle of many concurrent streams. If one participant's network goes down, their stream breaks. The server must detect this (via keepalive or send failure), clean up the stream's resources, and notify other participants. The dead participant's client must reconnect and re-establish state, possibly replaying missed messages.

Pattern: Live Collaboration

Collaborative editing tools (think Google Docs-style real-time editing) use bidi streaming to exchange operational transforms or CRDTs between participants. The client sends local edits as they happen, and the server sends remote edits from other participants plus any conflict resolution transforms.

This pattern demands careful attention to ordering and acknowledgment. Each edit carries a sequence number. The server maintains an authoritative order and transforms concurrent edits to preserve consistency. The acknowledgment flow (server sending back ack: seq=N) lets the client know which of its edits have been committed and can be removed from the local undo buffer.
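The client-side bookkeeping for that acknowledgment flow can be sketched as a buffer of unacknowledged edits, pruned as acks arrive. The sequence-number scheme here is an assumption for illustration, not a fixed protocol:

```go
package main

import "fmt"

// Edit is a locally applied operation awaiting server commit.
type Edit struct {
	Seq int
	Op  string
}

// PendingBuffer holds edits that were sent but not yet acknowledged.
// On reconnect, everything still in the buffer must be replayed
// (possibly after transformation against remote edits).
type PendingBuffer struct {
	edits []Edit
}

func (p *PendingBuffer) Push(e Edit) { p.edits = append(p.edits, e) }

// Ack drops every edit with Seq <= n: the server has committed them,
// so they no longer need to sit in the local undo/replay buffer.
func (p *PendingBuffer) Ack(n int) {
	i := 0
	for i < len(p.edits) && p.edits[i].Seq <= n {
		i++
	}
	p.edits = p.edits[i:]
}

func (p *PendingBuffer) Unacked() []Edit { return p.edits }

func main() {
	var p PendingBuffer
	p.Push(Edit{1, `insert "Hello"`})
	p.Push(Edit{2, `delete range(5,8)`})
	p.Push(Edit{3, `insert "World"`})
	p.Ack(2)                 // server committed seq 1 and 2
	fmt.Println(p.Unacked()) // only seq 3 remains pending
}
```

This relies on the per-direction ordering guarantee described above: because the client's edits arrive at the server in order, a single high-water-mark ack covers everything up to that sequence number.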

Pattern: Interactive Computation

Some RPCs need both sides to send data reactively. Consider a machine learning inference service where the client streams audio frames and the server streams back partial transcription results in real time. Or a database query service where the client sends SQL and the server streams results, but the client can send a cancellation or a follow-up query mid-stream without opening a new RPC.

Flow Control and Backpressure

Flow control is the mechanism that prevents a fast sender from overwhelming a slow receiver. gRPC inherits HTTP/2's flow control, which operates at two levels:

- Stream level: each HTTP/2 stream has its own window; the sender may have at most that many unacknowledged bytes in flight on that stream.
- Connection level: a second window caps the total unacknowledged bytes across all streams on the connection.

A sender must respect both windows, and the receiver replenishes each with WINDOW_UPDATE frames as it consumes data.

In gRPC, the flow-control window sizes are configurable. Go's gRPC implementation defaults to 64KB per stream and 16MB per connection; other implementations ship different defaults (grpc-java, for example, starts with a larger initial window). The critical point is that when a stream's flow-control window reaches zero, the sending side blocks: Send() calls will block (or return a "resource exhausted" error in non-blocking mode) until the receiver consumes data and issues WINDOW_UPDATE frames.

[Diagram: HTTP/2 flow control, backpressure in action. A fast producer sends 16KB DATA frames, draining the receiver's 64KB window (48KB, 32KB, 16KB, 0 remaining) until it blocks; a 32KB WINDOW_UPDATE from the slow consumer lets sending resume.]

This is backpressure in action. If the server is producing messages faster than the client can consume them, the flow-control window on the client side fills up, and the server's Send() eventually blocks. This prevents unbounded memory growth. But it also means that a single slow consumer on a shared connection can indirectly slow down other streams by consuming the connection-level window.
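The window accounting behind this behavior fits in a few lines. This is an illustration of the mechanism, not gRPC's implementation: the sender debits the window per DATA frame and must wait at zero until a WINDOW_UPDATE credits bytes back:

```go
package main

import "fmt"

// window models a per-stream HTTP/2 flow-control window.
type window struct{ avail int }

// trySend debits n bytes and reports false when the window cannot
// cover the frame, mirroring a Send() that would block.
func (w *window) trySend(n int) bool {
	if w.avail < n {
		return false
	}
	w.avail -= n
	return true
}

// update is the receiver crediting bytes back via WINDOW_UPDATE
// after it has consumed data from its buffer.
func (w *window) update(n int) { w.avail += n }

func main() {
	w := &window{avail: 64 * 1024} // 64KB initial window, as in the text
	for i := 0; i < 4; i++ {
		fmt.Println(w.trySend(16*1024), w.avail) // four 16KB frames drain it
	}
	fmt.Println(w.trySend(16 * 1024)) // false: the sender is blocked
	w.update(32 * 1024)              // the consumer caught up
	fmt.Println(w.trySend(16 * 1024)) // true: sending resumes
}
```

Real implementations maintain one such counter per stream plus one for the connection, and a frame must fit under both before it goes on the wire.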

In practice, you should:

- Consume messages promptly and hand expensive processing to worker goroutines or threads, so Recv() keeps draining the window.
- Treat a blocked Send() as a signal, not a bug: decide explicitly whether to wait, drop, or disconnect when a consumer cannot keep up.
- Tune window sizes for high-bandwidth, high-latency links, where the defaults can throttle throughput.
- Consider a dedicated connection for bulk transfer streams so they do not starve latency-sensitive RPCs of the shared connection-level window.

Keepalive and Idle Timeouts

Long-lived streams are particularly vulnerable to silent connection death. TCP keepalive operates at the transport layer with long default intervals (typically 2 hours), which is far too slow for application-level health checking. Load balancers, firewalls, and NAT devices may silently drop idle connections after 30–300 seconds of inactivity. When this happens, the client's Recv() hangs indefinitely — it does not know the connection is gone until it tries to send or the TCP retransmission timeout expires (which can take minutes).

gRPC addresses this with HTTP/2 PING frames. The keepalive mechanism works as follows:

- After a configurable period of inactivity (the keepalive time), the client sends an HTTP/2 PING frame.
- If no PING acknowledgment arrives within the keepalive timeout, the client declares the connection dead, closes it, and fails in-flight streams (typically with UNAVAILABLE) so the application can reconnect.
- A separate setting controls whether pings are sent when no RPC is active, which matters for long-lived but quiet streams.

The server side has complementary settings:

- A minimum allowed ping interval: clients that ping more often are sent a GOAWAY and disconnected.
- Whether pings are permitted at all when the client has no active streams.
- The server's own keepalive time and timeout for probing its clients, plus maximum connection idle and age settings that proactively recycle connections.

A common production misconfiguration: the client sets keepalive to 10 seconds, but the server's minimum ping interval remains at the default 5 minutes. The server perceives the frequent pings as abusive and kills the connection. You see ENHANCE_YOUR_CALM errors in the client logs and unexplained stream resets.

Error Handling in Streams

Error handling in streaming RPCs is fundamentally different from unary RPCs. In a unary call, the error is the response — you get either a result or an error, never both. In a streaming call, you might receive 500 successful messages and then an error on the 501st. The partial results are real and valid; the error applies to the stream going forward, not retroactively.

gRPC defines a set of canonical status codes that map to specific error conditions. The ones that matter most for streams include:

- CANCELLED: the client (or a propagated deadline chain) aborted the call.
- DEADLINE_EXCEEDED: the deadline attached to the stream expired.
- UNAVAILABLE: the transport failed; typically retryable with backoff.
- RESOURCE_EXHAUSTED: quotas, message size limits, or flow-control limits were hit.
- INTERNAL: protocol-level breakage, such as a malformed frame.

For streaming RPCs, the error arrives in one of two ways:

- As a status in the trailers: the server ends the stream with a non-OK status, and Recv() returns that error after delivering all prior messages.
- As a transport failure: the stream or connection breaks (RST_STREAM, network loss), and both Recv() and Send() surface an error such as UNAVAILABLE.

Best practices for stream error handling:

- Treat already-received messages as valid; design message schemas so partial progress is usable or resumable.
- Retry with a cursor, not from scratch: on UNAVAILABLE, reconnect and resume from the last processed position.
- Distinguish CANCELLED (often intentional: the caller went away) from transport errors before alerting or retrying.
- Propagate stream termination to whatever work the stream was driving, so resources are freed promptly.

Stream Lifecycle and Cancellation

Understanding the full lifecycle of a gRPC stream is essential for writing correct streaming code. A stream goes through several states:

  1. Open — the client sends HEADERS, creating the HTTP/2 stream. The server receives the request and begins processing.
  2. Half-closed (client) — the client has sent all its messages and calls CloseSend(). The server's Recv() will eventually return EOF. The server can still send messages.
  3. Half-closed (server) — the server has sent all its messages and sends trailing HEADERS. The client's Recv() will eventually return EOF. The client can still send messages (though the server won't process them).
  4. Closed — both sides are done. The HTTP/2 stream is freed.

Cancellation is the mechanism for either side to abort a stream prematurely. When the client cancels a stream (in Go, by calling the cancel function returned by context.WithCancel; other languages have equivalents), the gRPC library sends a RST_STREAM frame to the server. The server's handler receives a cancellation signal (via context cancellation in Go, ServerCallStreamObserver.setOnCancelHandler() in Java).

Cancellation propagation is critical for resource management. If a client cancels a server-streaming RPC that is scanning a large database, the server should stop the scan immediately rather than continuing to read rows that will never be sent. Failure to observe cancellation is one of the most common resource leaks in gRPC servers — goroutines or threads that continue executing after the client has gone.

// Go: Observe context cancellation in server handler
func (s *server) WatchRoutes(req *pb.WatchRequest, stream pb.RouteService_WatchRoutesServer) error {
    for {
        select {
        case <-stream.Context().Done():
            // Client cancelled or deadline exceeded
            return stream.Context().Err()
        case update := <-s.routeUpdates:
            if err := stream.Send(update); err != nil {
                return err
            }
        }
    }
}

Comparison: gRPC Streaming vs. WebSockets vs. SSE

gRPC streaming, WebSockets, and Server-Sent Events (SSE) all provide mechanisms for streaming data between clients and servers, but they serve different niches and make different trade-offs.

WebSockets

WebSockets provide a raw, bidirectional byte stream over a single TCP connection. After an HTTP/1.1 upgrade handshake, the connection becomes a full-duplex channel with minimal framing. WebSockets are:

- Natively supported in every browser, with a simple JavaScript API.
- Schemaless: message serialization, RPC semantics, and anything above raw messages are up to you.
- Unmultiplexed: one logical channel per connection, with only TCP-level flow control.

Server-Sent Events (SSE)

SSE is a simple, unidirectional (server-to-client) protocol built on plain HTTP. The server sends a stream of text events, each prefixed with data: lines. SSE is:

- Trivial to implement: any HTTP server that can hold a response open can serve SSE.
- Natively supported in browsers via EventSource, including automatic reconnection with a Last-Event-ID header for resumption.
- Limited to text payloads and server-to-client push; client messages require separate HTTP requests.

When to Use Each

                   gRPC Streaming               WebSockets             SSE
  Direction        Unary, server, client, bidi  Bidirectional          Server-to-client only
  Schema           Protobuf (strongly typed)    None (bring your own)  None (text-based)
  Multiplexing     Yes (HTTP/2)                 No                     Limited
  Browser support  Via grpc-web (limited)       Native                 Native
  Flow control     HTTP/2 per-stream            TCP only               TCP only
  Best for         Service-to-service           Browser real-time      Simple notifications

gRPC streaming shines in service-to-service communication where both endpoints are controlled infrastructure. You get strong typing, multiplexing, flow control, and a rich ecosystem of load balancers and observability tools that understand gRPC semantics. WebSockets win when you need bidirectional communication with a browser. SSE wins when you need dead-simple server push with automatic reconnection.

Note that HTTP/3 (QUIC) may shift this calculus. QUIC provides stream-level multiplexing without head-of-line blocking, making it an even more attractive transport for gRPC. The gRPC-over-QUIC effort is ongoing, and once mature, it will offer the multiplexing benefits of HTTP/2 streams without the TCP-level coupling that currently limits them.

Real-World Patterns

Kubernetes Watch API

The Kubernetes API server uses server streaming extensively for its Watch mechanism. When a controller or operator calls Watch(), it opens a long-lived stream that receives events about resource changes (pods created, services updated, config maps deleted). This is the foundation of Kubernetes' reconciliation loop architecture.

Key design choices in the Kubernetes Watch API:

- Every object carries a resourceVersion; a watch starts from a version, and a reconnecting client resumes from the last version it observed.
- Periodic bookmark events advance the client's resourceVersion even when its objects are unchanged, keeping resumption points fresh.
- If the requested version has already been compacted away, the server rejects the watch and the client falls back to a full relist followed by a new watch.
- Events carry a type (ADDED, MODIFIED, DELETED) plus the full object, so consumers can maintain a local cache without extra reads.

The gRPC implementation (since Kubernetes has been migrating toward gRPC for internal communication) maps cleanly: the Watch RPC is a server-streaming call where the server pushes WatchEvent messages containing the event type and the serialized resource.

Google Cloud Spanner Change Streams

Spanner's change streams use server-streaming gRPC to push database mutations to consumers in real time. When you create a change stream on a table, readers can open a gRPC stream that receives a chronologically ordered sequence of data change records, heartbeat records, and partition metadata.

Spanner's design is notable for how it handles partitioning: a single change stream is split into multiple partitions, and the client must open a separate gRPC stream for each partition. Partitions can split and merge over time as the database reshards. The client receives metadata about these splits in-band and must dynamically adjust the number of concurrent streams. This is far more complex than a simple single-stream pattern, but it scales to enormous throughput.
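The client-side bookkeeping for such partition splits can be sketched as a set of partitions that should currently have an open stream, updated as in-band split notices arrive. The tokens and the Split notice shape here are illustrative, not Spanner's actual API:

```go
package main

import (
	"fmt"
	"sort"
)

// PartitionSet tracks which partition streams should currently be open.
type PartitionSet struct {
	active map[string]bool
}

func NewPartitionSet(initial ...string) *PartitionSet {
	s := &PartitionSet{active: map[string]bool{}}
	for _, p := range initial {
		s.active[p] = true
	}
	return s
}

// Split handles the in-band notice that parent has resharded into
// children: close the parent's stream, open one stream per child.
func (s *PartitionSet) Split(parent string, children ...string) {
	delete(s.active, parent)
	for _, c := range children {
		s.active[c] = true
	}
}

// Active lists the partitions needing an open stream, sorted for
// stable output.
func (s *PartitionSet) Active() []string {
	var out []string
	for p := range s.active {
		out = append(out, p)
	}
	sort.Strings(out)
	return out
}

func main() {
	s := NewPartitionSet("p0")
	s.Split("p0", "p0a", "p0b") // p0 resharded into two children
	fmt.Println(s.Active())     // [p0a p0b]
}
```

A real consumer would diff this set against its open streams after each metadata record, opening and closing gRPC streams to converge on it.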

The heartbeat records in Spanner serve a dual purpose: they advance the low watermark (ensuring that the consumer knows no earlier mutations are pending) and they keep the gRPC stream alive through load balancers and proxies that would otherwise kill idle connections.

Google Cloud Pub/Sub Streaming Pull

Pub/Sub's StreamingPull is a bidirectional streaming RPC. The client sends StreamingPullRequest messages that include both the initial subscription binding and ongoing acknowledgments for processed messages. The server sends StreamingPullResponse messages containing batches of published messages.

service Subscriber {
  rpc StreamingPull (stream StreamingPullRequest) returns (stream StreamingPullResponse);
}

This bidirectional design is deliberate. The client needs to continuously acknowledge processed messages (via the client-to-server stream) while simultaneously receiving new messages (via the server-to-client stream). A unary acknowledge RPC would add latency and connection overhead. The streaming acknowledgment flow also allows the server to dynamically adjust delivery rate based on how quickly the client is acknowledging — a form of application-level flow control layered on top of HTTP/2 flow control.

Pub/Sub's implementation also demonstrates a pattern for graceful stream recycling: the server periodically closes streams after a maximum lifetime (roughly 10 minutes) and the client library automatically reopens them. This forces rebalancing across server backends and prevents long-lived streams from pinning to a single server that may be overloaded or about to be drained for maintenance.

[Diagram: Pub/Sub StreamingPull (bidi). The subscriber opens the stream with StreamingPullRequest { subscription: "projects/..." }; the server returns batched StreamingPullResponse messages while the subscriber interleaves requests carrying ack_ids and deadline modifications; after the server sends GOAWAY at max connection age, the client opens a fresh stream with a new StreamingPullRequest.]

Production Best Practices

After working through the patterns and mechanisms above, here is a consolidated set of best practices for operating gRPC streams in production.

Connection Management

- Configure keepalives on both sides, and make sure the client's ping interval respects the server's enforcement policy.
- Set a maximum connection age on servers so long-lived streams periodically recycle and rebalance; handle the resulting GOAWAY gracefully in clients.
- Reconnect with capped exponential backoff and jitter, and resume from a cursor instead of restarting work.

Observability

- Export metrics for active stream counts, messages per second in each direction, and stream duration; RPC counts alone undercount streaming work.
- Log stream termination status codes: a rise in UNAVAILABLE or CANCELLED is often the first sign of network or deployment trouble.
- Propagate trace context in the initial metadata so long-lived streams remain attributable to their callers.

Resource Management

- Always observe cancellation in handlers; a stream whose client is gone must release its goroutines, cursors, and buffers immediately.
- Bound every per-stream buffer and cap the number of concurrent streams a server will accept.
- Apply deadlines where a stream has a natural maximum lifetime, and rely on keepalives where it does not.

Load Balancing

- L4 load balancers see one long-lived connection, not the streams inside it; use a gRPC-aware L7 balancer or client-side load balancing to spread streams across backends.
- Long-lived streams pin work to one backend; combine maximum connection age with graceful draining so deployments do not hard-kill streams.
- Watch for imbalance after backend restarts: clients that never reconnect stay pinned to the surviving backends.

Choosing the Right Streaming Pattern

Not every RPC needs to be a streaming RPC. Streaming adds complexity to error handling, reconnection, and testing. Use the simplest pattern that meets your requirements.

The decision tree is straightforward: if you need real-time push, use server streaming. If you need efficient bulk send, use client streaming. If you need both simultaneously and independently, use bidi. If you are unsure, start with unary and evolve to streaming when you hit its limitations. The upgrade path from unary to streaming is usually clean in gRPC — you add the stream keyword to your proto definition and adjust the handler code.

For browser-facing workloads, remember that gRPC streaming in the browser requires grpc-web, which only supports unary and server streaming (client streaming and bidi streaming require a proxy that translates to full HTTP/2). If you need bidirectional browser communication, WebSockets remain the pragmatic choice until HTTP/3 WebTransport matures.
