How gRPC Works
gRPC is a high-performance, open-source Remote Procedure Call (RPC) framework originally developed at Google. It lets a client application call methods on a server application running on a different machine as if they were local function calls, abstracting away the details of network communication. Unlike REST APIs that exchange JSON over HTTP/1.1, gRPC uses Protocol Buffers for serialization and HTTP/2 as its transport, achieving significantly lower latency and higher throughput for service-to-service communication.
gRPC has become the de facto standard for internal microservice communication at companies running large distributed systems. Understanding how it works means understanding several layers of technology working together: HTTP/2 framing, binary serialization, streaming, and a rich middleware ecosystem.
The Foundation: HTTP/2 as Transport
gRPC is built entirely on top of HTTP/2. This is not an incidental choice — HTTP/2's features map directly onto the capabilities gRPC needs. To understand gRPC, you must first understand what HTTP/2 provides. For deeper background on the security layer that typically wraps HTTP/2, see how TLS and HTTPS work.
Multiplexing
HTTP/1.1 suffers from head-of-line blocking: on a single TCP connection, requests and responses must be processed sequentially. If one response is slow, everything behind it waits. HTTP/2 solves this with streams — multiple logical request/response pairs can be interleaved on a single TCP connection. Each gRPC call maps to one HTTP/2 stream, so hundreds of concurrent RPCs can share one connection without blocking each other.
Header Compression (HPACK)
HTTP/2 uses HPACK header compression, which maintains a shared dictionary of previously sent headers between client and server. Since gRPC calls to the same service repeat the same method paths, content types, and authority headers, HPACK compresses these to just a few bytes after the first request. For high-volume internal services making thousands of RPCs per second, this eliminates significant overhead.
Flow Control
HTTP/2 provides per-stream and per-connection flow control. Each side advertises a window size — the number of bytes it is willing to receive before the sender must pause. This prevents a fast producer from overwhelming a slow consumer. gRPC leverages this for streaming RPCs: if a server is streaming results faster than a client can process them, HTTP/2 flow control automatically applies backpressure without any application-level logic.
Server Push and PING
HTTP/2 also supports server push (though gRPC does not use it) and PING frames for connection keepalive. gRPC uses HTTP/2 PING frames to detect dead connections — if a PING goes unanswered, the connection is considered broken and will be re-established.
Protocol Buffers: Binary Serialization
Where REST APIs typically use JSON — a text-based format that must be parsed character by character — gRPC uses Protocol Buffers (protobuf), a binary serialization format. Protobuf messages are smaller on the wire and significantly faster to serialize and deserialize than JSON.
How Protobuf Encoding Works
Protobuf uses a tag-length-value (TLV) encoding. Each field in a message is identified by its field number (not its name), followed by a wire type that indicates how to parse the value. Integer types use varint encoding — small values take fewer bytes. A field with value 1 takes just one byte; a field with value 300 takes two bytes. Field names are never transmitted; the receiver uses a compiled schema to map field numbers back to names.
This means a protobuf message containing {user_id: 42, name: "Alice"} might be 10 bytes, versus 30+ bytes for the equivalent JSON {"user_id":42,"name":"Alice"}. At scale, these savings compound dramatically.
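The arithmetic above can be checked by hand. The sketch below, in plain Python with no protobuf library, encodes varints and hand-assembles the example message; the field numbers (user_id = 1, name = 2) are assumptions for illustration.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf varint: 7 bits per byte,
    high bit set on every byte except the last."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def encode_tag(field_number: int, wire_type: int) -> bytes:
    # A field's tag packs its number and wire type: (number << 3) | wire_type
    return encode_varint((field_number << 3) | wire_type)

# Hand-encode {user_id: 42, name: "Alice"}, assuming user_id is field 1
# (wire type 0, varint) and name is field 2 (wire type 2, length-delimited).
msg = (
    encode_tag(1, 0) + encode_varint(42)              # user_id = 42
    + encode_tag(2, 2) + encode_varint(5) + b"Alice"  # name = "Alice"
)

assert encode_varint(1) == b"\x01"        # small values fit in one byte
assert encode_varint(300) == b"\xac\x02"  # 300 needs two bytes
assert msg == b"\x08\x2a\x12\x05Alice"    # 9 bytes, versus 29 for the JSON
```

Field names never appear in those 9 bytes, which is exactly why the receiver needs the compiled schema to interpret them.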
Schema Evolution
Protobuf's field-number-based encoding enables forward and backward compatibility. You can add new fields to a message type, and old clients that do not know about the new fields simply skip them. You can remove fields (by reserving their numbers) without breaking old clients. This is critical for evolving APIs in large organizations where clients and servers cannot be updated simultaneously.
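The skipping behavior can be sketched with a minimal hand-rolled decoder (plain Python, no protobuf library; field names and numbers are illustrative): a message carrying an unknown field 3 still decodes cleanly for an "old" client that only knows fields 1 and 2.

```python
def decode_varint(buf: bytes, i: int):
    """Decode a varint starting at buf[i]; return (value, next_index)."""
    shift = result = 0
    while True:
        b = buf[i]; i += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, i
        shift += 7

def decode_message(buf: bytes, known_fields: dict):
    """Decode top-level fields, silently dropping field numbers the
    'schema' (known_fields: number -> name) does not recognize."""
    fields, i = {}, 0
    while i < len(buf):
        tag, i = decode_varint(buf, i)
        field_number, wire_type = tag >> 3, tag & 7
        if wire_type == 0:                    # varint
            value, i = decode_varint(buf, i)
        elif wire_type == 2:                  # length-delimited (string, bytes)
            length, i = decode_varint(buf, i)
            value, i = buf[i:i + length], i + length
        else:
            raise ValueError("wire type not handled in this sketch")
        if field_number in known_fields:
            fields[known_fields[field_number]] = value
        # Unknown fields are still decoded (to find their length), then dropped.
    return fields

# A "new" message with field 3 (email) that an "old" client never heard of.
new_msg = b"\x08\x2a" + b"\x12\x05Alice" + b"\x1a\x07a@b.com"
old_schema = {1: "user_id", 2: "name"}  # no field 3
assert decode_message(new_msg, old_schema) == {"user_id": 42, "name": b"Alice"}
```

The wire type is what makes the skip safe: even without the schema, the decoder knows how many bytes the unknown field occupies.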
Service Definitions: The .proto File
Every gRPC service starts with a .proto file — a language-neutral contract defining the service's methods and their request/response types. The protoc compiler then generates client and server code in any supported language (Go, Java, Python, C++, Rust, and many others).
syntax = "proto3";

package routeinfo;

service RouteInfoService {
  // Unary: one request, one response
  rpc GetRoute (RouteRequest) returns (RouteResponse);

  // Server streaming: one request, stream of responses
  rpc WatchRoutes (WatchRequest) returns (stream RouteUpdate);

  // Client streaming: stream of requests, one response
  rpc BatchSubmitRoutes (stream RouteSubmission) returns (BatchResult);

  // Bidirectional streaming: both sides stream simultaneously
  rpc RouteExchange (stream RouteUpdate) returns (stream RouteUpdate);
}

message RouteRequest {
  string prefix = 1;
  uint32 prefix_length = 2;
}

message RouteResponse {
  string prefix = 1;
  uint32 origin_asn = 2;
  repeated uint32 as_path = 3;
  string next_hop = 4;
  int64 last_updated = 5;
}
This single file generates type-safe client stubs and server interfaces for every target language. The generated code handles serialization, deserialization, HTTP/2 framing, and error propagation — the developer just implements business logic.
Four RPC Patterns
gRPC supports four communication patterns, each mapping to different HTTP/2 stream behaviors. Understanding when to use each one is essential for designing efficient APIs.
Unary RPC
The simplest pattern: client sends one request, server sends one response. This is equivalent to a normal function call or a REST API request. Under the hood, the client sends HTTP/2 HEADERS (containing the method path, e.g., /routeinfo.RouteInfoService/GetRoute) followed by a DATA frame with the protobuf-encoded request. The server responds with HEADERS and a DATA frame containing the response, followed by trailers with the gRPC status code.
Server Streaming
The client sends a single request and receives a stream of responses. This is ideal for scenarios like subscribing to real-time updates — for example, a BGP monitoring system that streams route changes as they happen. The server sends multiple DATA frames on the same HTTP/2 stream, each containing one protobuf message prefixed by a 5-byte header (1 byte compression flag + 4 bytes message length). The stream ends when the server sends trailers.
Client Streaming
The client sends a stream of messages and the server responds with a single message after consuming the entire stream. Use this for batch uploads or aggregation: the client streams hundreds of data points, and the server returns a summary when the stream is complete.
Bidirectional Streaming
Both client and server send streams of messages independently. The two message flows share a single HTTP/2 stream but are otherwise decoupled — neither side needs to wait for the other. This is the most powerful pattern and is used for real-time bidirectional communication like chat systems, multiplayer game state synchronization, or collaborative editing. The read and write streams can be consumed in any order, giving maximum flexibility.
The gRPC Wire Format
Every gRPC message on the wire follows a specific framing format that sits between HTTP/2 and protobuf.
The client initiates an RPC by sending HTTP/2 headers that include the pseudo-headers :method: POST and :path: /package.Service/Method, plus content-type: application/grpc. The request body is a sequence of length-prefixed protobuf messages. The response follows the same format, with the final HTTP/2 trailers carrying the gRPC status code and any error details.
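The length-prefixed framing is simple enough to sketch directly. This minimal Python version (function names are mine, not a real gRPC API) packs and unpacks the 1-byte compression flag plus 4-byte big-endian length that precedes every message:

```python
import struct

def frame_message(payload: bytes, compressed: bool = False) -> bytes:
    """Wrap one serialized protobuf message in the gRPC length-prefixed
    frame: 1-byte compression flag + 4-byte big-endian length + payload."""
    return struct.pack(">BI", 1 if compressed else 0, len(payload)) + payload

def unframe_messages(body: bytes):
    """Split a request/response body back into (compressed, payload) pairs,
    as a receiver would when reading DATA frames off an HTTP/2 stream."""
    i, messages = 0, []
    while i < len(body):
        flag, length = struct.unpack_from(">BI", body, i)
        i += 5
        messages.append((bool(flag), body[i:i + length]))
        i += length
    return messages

# Two back-to-back messages, as a server-streaming response would carry them.
body = frame_message(b"\x08\x01") + frame_message(b"\x08\x02")
assert unframe_messages(body) == [(False, b"\x08\x01"), (False, b"\x08\x02")]
assert frame_message(b"hello")[:5] == b"\x00\x00\x00\x00\x05"
```

The explicit length prefix is what lets multiple messages flow on one stream: HTTP/2 DATA frame boundaries carry no meaning for gRPC, so receivers re-segment by these prefixes.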
Channels, Stubs, and Connection Lifecycle
In gRPC client code, you interact with two key abstractions:
A channel represents a virtual connection to a gRPC endpoint. It encapsulates connection management, including DNS resolution, TCP connection establishment, TLS handshake, HTTP/2 setup, load balancing, and reconnection. Channels are heavyweight objects designed to be long-lived — you create one at application startup and reuse it for the entire application lifecycle. A single channel may maintain multiple underlying TCP connections (subchannels) for redundancy and load distribution.
A stub (or client) is a lightweight wrapper around a channel that provides the generated method signatures for a particular service. Stubs are cheap to create — they hold a reference to the channel but maintain no state of their own. You typically create one stub per service interface.
Connection States
A gRPC channel transitions through five connectivity states: IDLE (no RPCs in flight, no connection), CONNECTING (establishing a connection), READY (connection established and usable), TRANSIENT_FAILURE (connection failed, will retry with backoff), and SHUTDOWN (channel has been closed). Applications can subscribe to state changes to implement custom health monitoring or logging.
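As a rough sketch, the legal transitions can be written down as a table (simplified from the gRPC connectivity semantics; real channels have additional nuances, such as when idle timeouts fire):

```python
# Legal transitions between channel connectivity states (simplified sketch).
TRANSITIONS = {
    "IDLE": {"CONNECTING", "SHUTDOWN"},
    "CONNECTING": {"READY", "TRANSIENT_FAILURE", "SHUTDOWN"},
    "READY": {"IDLE", "TRANSIENT_FAILURE", "SHUTDOWN"},
    "TRANSIENT_FAILURE": {"CONNECTING", "SHUTDOWN"},
    "SHUTDOWN": set(),  # terminal: a closed channel never reconnects
}

def can_transition(old: str, new: str) -> bool:
    return new in TRANSITIONS[old]

assert can_transition("TRANSIENT_FAILURE", "CONNECTING")  # retry with backoff
assert not can_transition("SHUTDOWN", "CONNECTING")
```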
Deadlines and Cancellation
Unlike REST APIs where timeouts are typically an afterthought — set on the HTTP client and opaque to the server — gRPC makes deadlines a first-class concept propagated across the entire call chain.
When a client sets a deadline (e.g., "this call must complete within 500ms"), that deadline is transmitted to the server as the grpc-timeout header. If the server makes downstream gRPC calls to other services, it propagates the remaining deadline — not the original timeout. If only 200ms remain when the downstream call is made, the downstream service sees a 200ms deadline. This prevents cascading timeouts from consuming resources across the system after the original caller has already given up.
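The propagation arithmetic can be sketched in a few lines. This hypothetical helper (not a real gRPC API) computes the grpc-timeout value a client would send downstream, using millisecond granularity for simplicity (the real header format supports several unit suffixes):

```python
def grpc_timeout_header(deadline_ms: int, now_ms: int) -> str:
    """Format the remaining time budget as a grpc-timeout header value.
    gRPC encodes the timeout as ASCII digits plus a unit suffix; only
    milliseconds ("m") are used in this sketch."""
    remaining = deadline_ms - now_ms
    if remaining <= 0:
        # Nothing left: fail locally instead of making a doomed downstream call.
        raise TimeoutError("deadline already expired")
    return f"{remaining}m"

# The caller allowed 500ms; 300ms were spent locally before the downstream
# call was made, so the downstream service sees only the 200ms that remain.
assert grpc_timeout_header(500, now_ms=0) == "500m"
assert grpc_timeout_header(500, now_ms=300) == "200m"
```

The key design point is that the absolute deadline, not the original timeout, is what flows through the call chain.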
Cancellation propagates similarly. When a client cancels an RPC (or the deadline expires), the server receives a cancellation signal via the HTTP/2 RST_STREAM frame. Well-written servers check for cancellation in their handlers and abort expensive work immediately, freeing resources for other requests. This is in stark contrast to REST, where a client disconnect is often undetectable until the server tries to write the response.
Metadata: gRPC's Headers
gRPC metadata is the equivalent of HTTP headers — key-value pairs sent alongside requests and responses. Metadata carries authentication tokens, request IDs, tracing context, and other cross-cutting concerns. There are two types:
- Initial metadata — sent by the client with the request, or by the server before the first response message. These map to HTTP/2 headers.
- Trailing metadata — sent by the server after the last response message. These map to HTTP/2 trailers and always include the grpc-status code.
Binary metadata is supported by appending -bin to the key name (e.g., trace-context-bin), which tells gRPC to base64-encode the value for HTTP/2 transport. This is how binary tracing contexts and serialized tokens are efficiently propagated without text conversion overhead.
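A minimal sketch of that convention (the helper name is mine; in practice this encoding happens inside the gRPC library):

```python
import base64

def encode_metadata_value(key: str, value):
    """Prepare a metadata value for the wire. Keys ending in -bin carry
    binary values, which are base64-encoded for HTTP/2 transport
    (implementations are expected to tolerate missing padding)."""
    if key.endswith("-bin"):
        return base64.b64encode(value).decode("ascii").rstrip("=")
    return value  # ordinary text metadata passes through unchanged

assert encode_metadata_value("request-id", "abc123") == "abc123"
assert encode_metadata_value("trace-context-bin", b"\x00\x01\xff") == "AAH/"
```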
Interceptors: Middleware for gRPC
Interceptors are gRPC's middleware mechanism — they intercept every RPC call on the client side, the server side, or both. They are the standard place to implement cross-cutting concerns:
- Authentication — attach bearer tokens or mTLS certificates to outgoing calls; validate credentials on incoming calls
- Logging — record method names, durations, status codes, and payload sizes for every RPC
- Metrics — emit latency histograms, error rates, and throughput counters (commonly exported to Prometheus)
- Distributed tracing — inject and extract trace context (OpenTelemetry span IDs) so traces flow across service boundaries
- Retry logic — automatically retry failed RPCs with exponential backoff for transient errors
- Rate limiting — enforce per-client or per-method rate limits on the server side
Interceptors chain together: a request passes through each interceptor in order before reaching the handler. This composability makes it easy to build standardized observability and security without modifying individual service implementations.
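The chaining can be sketched without any gRPC dependency. This plain-Python composition mirrors the shape of a server-side interceptor chain: each interceptor wraps the next, so the first-registered one runs first on the way in and last on the way out (all names here are illustrative).

```python
def chain(interceptors, handler):
    """Compose interceptors around a handler, outermost first."""
    for interceptor in reversed(interceptors):
        handler = interceptor(handler)
    return handler

calls = []  # records execution order for the demonstration below

def make_interceptor(name):
    def interceptor(next_handler):
        def wrapped(request):
            calls.append(f"{name}:before")   # e.g. check auth, start a timer
            response = next_handler(request)
            calls.append(f"{name}:after")    # e.g. record latency, log status
            return response
        return wrapped
    return interceptor

def handler(request):
    calls.append("handler")
    return request.upper()  # stand-in for real business logic

rpc = chain([make_interceptor("auth"), make_interceptor("logging")], handler)
assert rpc("hello") == "HELLO"
# auth is outermost: first on the way in, last on the way out.
assert calls == ["auth:before", "logging:before", "handler",
                 "logging:after", "auth:after"]
```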
Load Balancing
Load balancing gRPC is fundamentally different from load balancing HTTP/1.1 REST APIs, because HTTP/2 connections are long-lived and multiplexed. A naive Layer 4 (TCP) load balancer would distribute connections — but since gRPC reuses a single connection for thousands of RPCs, all traffic from one client would go to a single server. gRPC therefore requires per-RPC load balancing, not per-connection.
Client-Side Load Balancing
gRPC has a built-in load balancing framework. The client discovers all backend addresses (via DNS, a service registry, or a custom resolver) and maintains a subchannel to each one. A load balancing policy then decides which subchannel to use for each RPC. The two built-in policies are pick_first (try addresses in order, stick with the first that works) and round_robin (distribute RPCs evenly across all healthy backends). Custom policies can implement weighted balancing, locality-aware routing, or outlier detection.
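A stripped-down sketch of the round_robin idea (illustrative only; the real policy also tracks subchannel connectivity state rather than a boolean flag):

```python
import itertools

class RoundRobinPicker:
    """Per-RPC round-robin over healthy subchannels: each pick() call
    corresponds to one outgoing RPC, not one connection."""
    def __init__(self, backends):
        self.backends = backends
        self.counter = itertools.count()

    def pick(self) -> str:
        healthy = [b for b in self.backends if b["healthy"]]
        if not healthy:
            raise ConnectionError("no healthy backends")
        return healthy[next(self.counter) % len(healthy)]["addr"]

backends = [
    {"addr": "10.0.0.1:50051", "healthy": True},
    {"addr": "10.0.0.2:50051", "healthy": True},
    {"addr": "10.0.0.3:50051", "healthy": False},  # removed from rotation
]
picker = RoundRobinPicker(backends)
assert [picker.pick() for _ in range(4)] == [
    "10.0.0.1:50051", "10.0.0.2:50051", "10.0.0.1:50051", "10.0.0.2:50051"]
```

Note that the unhealthy backend is skipped per RPC, which is exactly what a per-connection (L4) balancer cannot do once the connection is pinned.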
Proxy Load Balancing (L7)
An alternative is placing a Layer 7 proxy (like Envoy, which was originally built at Lyft specifically for gRPC) in front of backend servers. The proxy terminates the client's HTTP/2 connection and opens separate connections to each backend, distributing individual RPCs across them. Service meshes like Istio and Linkerd use this approach, deploying a sidecar proxy alongside each service. The tradeoff is an additional network hop but simpler client configuration. For more on load balancing architectures, see how load balancers work.
The xDS Protocol
For large-scale deployments, gRPC supports the xDS protocol (originally from Envoy's "x Discovery Service" APIs). The gRPC client itself can speak xDS, receiving load balancing configuration, routing rules, and backend lists from a control plane — effectively getting service mesh capabilities without a sidecar proxy. This is called "proxyless service mesh" and is increasingly used in production at companies that need the performance of client-side load balancing with the manageability of a service mesh.
Health Checking
gRPC defines a standardized health checking protocol via the grpc.health.v1.Health service. Any gRPC server can implement this service to report its health status, and load balancers and orchestrators (like Kubernetes) can query it to decide whether to send traffic to that server.
service Health {
  rpc Check (HealthCheckRequest) returns (HealthCheckResponse);
  rpc Watch (HealthCheckRequest) returns (stream HealthCheckResponse);
}

message HealthCheckResponse {
  enum ServingStatus {
    UNKNOWN = 0;
    SERVING = 1;
    NOT_SERVING = 2;
    SERVICE_UNKNOWN = 3;
  }
  ServingStatus status = 1;
}
The Check method returns the current status; the Watch method streams status changes. Servers can report per-service health — a server hosting multiple gRPC services might report one as SERVING and another as NOT_SERVING during a graceful shutdown or rolling deployment. gRPC's client-side load balancer can use health checking to automatically remove unhealthy backends from rotation without waiting for connection failures.
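The server-side bookkeeping behind this can be sketched as a per-service status registry (names here are illustrative, not the real health-service API; the real protocol reports a service it has never heard of to Check callers as a NOT_FOUND error):

```python
SERVING, NOT_SERVING, SERVICE_UNKNOWN = "SERVING", "NOT_SERVING", "SERVICE_UNKNOWN"

class HealthRegistry:
    """Per-service health statuses that a Check handler would read and a
    graceful-shutdown routine would flip."""
    def __init__(self):
        self._status = {"": SERVING}  # "" = overall server health

    def set_status(self, service: str, status: str):
        self._status[service] = status

    def check(self, service: str) -> str:
        return self._status.get(service, SERVICE_UNKNOWN)

health = HealthRegistry()
health.set_status("routeinfo.RouteInfoService", SERVING)
assert health.check("routeinfo.RouteInfoService") == "SERVING"

# During a rolling deployment, one service can drain while others keep serving:
health.set_status("routeinfo.RouteInfoService", NOT_SERVING)
assert health.check("routeinfo.RouteInfoService") == "NOT_SERVING"
assert health.check("no.such.Service") == "SERVICE_UNKNOWN"
```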
Name Resolution
gRPC abstracts name resolution behind a pluggable resolver interface. When you create a channel with a target like dns:///my-service.example.com, the DNS resolver resolves the hostname to IP addresses and creates a subchannel for each one. But the resolver framework supports many schemes:
- dns: — standard DNS resolution (A/AAAA records, or SRV records for port information)
- xds: — resolve via xDS control plane
- unix: — Unix domain socket (for local IPC)
- ipv4: / ipv6: — static IP addresses
- Custom schemes for service registries like Consul, etcd, or ZooKeeper
The resolver continuously watches for changes — if a DNS entry is updated, the channel automatically creates new subchannels and drains old ones. This is critical for environments like Kubernetes where pod IPs change frequently during deployments.
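Target parsing is the first step the resolver framework performs. A simplified sketch (real gRPC target parsing handles more edge cases, such as non-empty authorities and unix socket path forms):

```python
def parse_target(target: str):
    """Split a channel target into (scheme, authority, endpoint), roughly
    the way a resolver framework would for targets like dns:///host."""
    if "://" not in target:
        return "dns", "", target  # schemeless targets fall back to DNS
    scheme, rest = target.split("://", 1)
    authority, _, endpoint = rest.partition("/")
    return scheme, authority, endpoint

# dns:///name leaves the authority empty, meaning "use the default resolver".
assert parse_target("dns:///my-service.example.com") == \
    ("dns", "", "my-service.example.com")
assert parse_target("xds:///backend-service")[0] == "xds"
assert parse_target("localhost:50051") == ("dns", "", "localhost:50051")
```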
gRPC Status Codes
gRPC defines its own set of status codes, separate from HTTP status codes. While the HTTP/2 response always carries a 200 OK (even for errors), the actual gRPC status is in the trailers. This is because the error might not be known until the server has finished processing the stream. The key status codes are:
- OK (0) — success
- CANCELLED (1) — the operation was cancelled by the caller
- INVALID_ARGUMENT (3) — client sent an invalid request
- DEADLINE_EXCEEDED (4) — the deadline expired before the operation completed
- NOT_FOUND (5) — requested entity does not exist
- ALREADY_EXISTS (6) — entity already exists (e.g., duplicate creation)
- PERMISSION_DENIED (7) — caller lacks permission
- RESOURCE_EXHAUSTED (8) — rate limit or quota exceeded
- UNIMPLEMENTED (12) — method not implemented by the server
- INTERNAL (13) — internal server error
- UNAVAILABLE (14) — server is temporarily unavailable (safe to retry)
- UNAUTHENTICATED (16) — missing or invalid authentication credentials
The distinction between UNAVAILABLE (transient, retry) and INTERNAL (possibly permanent) lets clients implement intelligent retry policies. gRPC's retry framework can be configured to automatically retry UNAVAILABLE errors with exponential backoff, while treating INTERNAL as a non-retryable failure.
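That policy can be sketched as a small wrapper (illustrative shape only; gRPC's real retry support is configured declaratively via service config and adds jitter and hedging):

```python
RETRYABLE = {"UNAVAILABLE"}  # transient failures that are safe to retry

class RpcError(Exception):
    def __init__(self, code):
        self.code = code

def call_with_retry(rpc, max_attempts: int = 3):
    """Retry only retryable status codes, up to max_attempts total tries.
    The backoff sleep is elided so the sketch runs instantly."""
    for attempt in range(1, max_attempts + 1):
        try:
            return rpc()
        except RpcError as e:
            if e.code not in RETRYABLE or attempt == max_attempts:
                raise  # INTERNAL etc.: fail immediately, no retries
            # time.sleep(base_delay * 2 ** attempt) would go here

attempts = []
def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise RpcError("UNAVAILABLE")  # transient: retried
    return "ok"

assert call_with_retry(flaky) == "ok" and len(attempts) == 3

attempts.clear()
def broken():
    attempts.append(1)
    raise RpcError("INTERNAL")  # non-retryable: surfaces after one attempt

try:
    call_with_retry(broken)
except RpcError as e:
    assert e.code == "INTERNAL" and len(attempts) == 1
```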
gRPC vs REST/JSON: A Detailed Comparison
The choice between gRPC and REST is not about one being universally better — it is about understanding the tradeoffs for your specific use case.
Performance: gRPC's binary serialization and HTTP/2 transport typically achieve 2-10x lower latency and 5-10x higher throughput than equivalent REST/JSON APIs, depending on message size and structure. For internal service-to-service calls measured in microseconds, this difference matters significantly.
Type safety: The .proto schema generates strongly-typed client and server code. Misspelling a field name, sending the wrong type, or forgetting a required field are caught at compile time, not at runtime. REST APIs with JSON typically rely on runtime validation.
Browser compatibility: This is gRPC's primary weakness. Browsers cannot directly make HTTP/2 requests with the fine-grained control gRPC needs (trailer support, binary framing). The gRPC-Web project addresses this by proxying gRPC through a translation layer, but it only supports unary and server-streaming RPCs, not client or bidirectional streaming. For browser-facing APIs, REST/JSON remains more practical. QUIC and HTTP/3 may improve this situation over time.
Tooling and debugging: JSON is human-readable; protobuf is not. Debugging gRPC requires tools like grpcurl (a curl-like CLI for gRPC), grpcui (a web UI), or Wireshark with protobuf dissectors. REST APIs can be tested with a browser or curl with no special tooling.
gRPC and the Network Stack
gRPC sits at the top of a deep network stack. Understanding each layer helps when debugging latency or connectivity issues:
At the bottom, IP routing (powered by BGP for inter-network traffic) determines how packets traverse autonomous systems to reach the destination. TCP provides reliable, ordered delivery. TLS encrypts the connection (gRPC strongly recommends TLS in production, and most implementations default to it). HTTP/2 provides multiplexing and flow control. Protocol Buffers handle serialization. And gRPC ties it all together with its RPC semantics, deadline propagation, and interceptor chain.
Security: Authentication and Encryption
gRPC supports several authentication mechanisms:
- TLS / mTLS — transport-layer encryption, optionally with mutual authentication via client certificates. This is the most common approach for service-to-service communication in production. In mTLS, both client and server present certificates, enabling zero-trust architectures where every service verifies the identity of every other service.
- Token-based — bearer tokens (JWT, OAuth2) passed as metadata. The CallCredentials interface provides a hook to attach tokens to every outgoing RPC, refreshing them as needed.
- Google ALTS — Application Layer Transport Security, used within Google's infrastructure as an alternative to TLS optimized for their network.
- Custom — the credential framework is pluggable, so organizations can implement proprietary authentication schemes.
gRPC's security model integrates cleanly with service meshes. When running behind an Envoy sidecar or within an Istio mesh, mTLS can be handled transparently by the mesh infrastructure, and the application-level gRPC code does not need to manage certificates at all.
gRPC in Practice: When to Use It
gRPC excels in specific scenarios:
- Microservice-to-microservice communication — where latency and throughput matter, and both sides are controlled by the same organization. The strict schema contract prevents integration bugs.
- Real-time streaming — server streaming for live data feeds, bidirectional streaming for interactive protocols. gRPC's streaming is more natural than bolting WebSockets onto a REST API.
- Polyglot environments — when services are written in different languages (Go, Java, Python, C++), the shared .proto definition generates idiomatic code for each language, ensuring compatibility.
- Mobile clients — gRPC's binary format reduces bandwidth usage on cellular networks, and its connection multiplexing reduces battery drain from repeated connection setup.
- High-fan-out architectures — where one service calls many others to serve a single request, deadline propagation prevents wasted work after the original caller has timed out.
REST remains a better choice for public-facing APIs consumed by browsers, APIs where human readability of requests and responses is important, CRUD-style APIs that map naturally to HTTP verbs and resource URLs, and situations where HTTP caching is needed.
Many systems use both: gRPC for internal communication and a REST/JSON gateway (like grpc-gateway) at the edge to expose the same services as REST APIs to external consumers.
Reflection and Service Discovery
gRPC servers can optionally enable server reflection, which allows clients to discover available services and their methods at runtime without having the .proto files. This is primarily used for debugging and tooling — tools like grpcurl and grpcui use reflection to inspect a server and make ad-hoc calls. In production, reflection is often disabled for security (it reveals your API surface) but enabled in staging and development environments.
gRPC and Infrastructure
The network infrastructure that carries gRPC traffic is the same infrastructure that routes all internet traffic. gRPC calls between services in different data centers traverse the same BGP-routed paths, cross the same internet exchange points, and are subject to the same routing decisions made by autonomous systems. The latency of a gRPC call between services in different cloud regions is dominated by the physical distance and routing path, not by the protocol overhead — which is exactly why gRPC optimizes the protocol layer so aggressively, to keep that overhead as close to zero as possible.
You can explore the network paths between cloud regions by looking up their IP addresses in the looking glass.