gRPC and Service Mesh: Istio, Envoy, and Linkerd

Microservices architectures decompose monolithic applications into dozens or hundreds of independently deployable services. This decomposition solves organizational scaling problems, but it creates a new class of infrastructure challenges: how do you secure communication between all these services? How do you observe the latency and error rates of every inter-service call? How do you roll out a new version without taking down production? A service mesh answers all of these questions by moving cross-cutting networking concerns out of application code and into a dedicated infrastructure layer. When the dominant RPC protocol in your mesh is gRPC, the interplay between HTTP/2 framing, Protocol Buffers, and mesh-level traffic management becomes particularly interesting.

The Problems a Service Mesh Solves

Before service meshes, every microservice had to implement its own retry logic, circuit breakers, mutual TLS, distributed tracing headers, and load balancing. This meant that every team, writing in potentially different languages, had to get all of these cross-cutting concerns right. Libraries like Netflix's Hystrix and Finagle from Twitter attempted to standardize this, but they were language-specific and required application-level integration. A service mesh solves this by extracting these concerns into the infrastructure itself.

The three pillars of a service mesh are:

  - Traffic management — fine-grained routing, load balancing, retries, timeouts, and fault injection, controlled by configuration rather than application code.
  - Security — mutual TLS between services, cryptographic workload identity, and authorization policy.
  - Observability — uniform metrics, access logs, and distributed traces for every inter-service call.

The Sidecar Proxy Model

The defining architectural pattern of a service mesh is the sidecar proxy. For every application container in your cluster, the mesh injects a small proxy that intercepts all inbound and outbound network traffic. The application connects to localhost and the proxy handles everything else: TLS termination and origination, routing decisions, retries, and telemetry collection. The application never needs to know the mesh exists.

[Diagram: the app in Pod A talks to its Envoy sidecar over localhost; that sidecar carries the request to Pod B's Envoy sidecar over mTLS (HTTP/2), which hands it to Pod B's app.]

Traffic interception is typically handled by iptables rules (or eBPF programs in newer meshes) that redirect all TCP traffic to the sidecar's ports. On Kubernetes, this happens automatically via an init container that configures the network namespace before the application starts. The application sends a gRPC call to what it thinks is the remote service, but the kernel redirects that connection to the local Envoy sidecar. Envoy then establishes an mTLS connection to the remote pod's sidecar, which terminates TLS and forwards the plaintext request to the destination application on localhost.
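Conceptually, the interception rules look something like the following simplified sketch (Istio's actual istio-init rule set is considerably more involved; ports 15001/15006 are Istio's default outbound/inbound listener ports, and UID 1337 is the proxy's user):

```shell
# Simplified sketch of sidecar traffic interception -- not the full istio-init rules.
# Redirect all outbound TCP from the pod to Envoy's outbound listener (15001)...
iptables -t nat -A OUTPUT -p tcp -j REDIRECT --to-ports 15001
# ...and all inbound TCP to Envoy's inbound listener (15006).
iptables -t nat -A PREROUTING -p tcp -j REDIRECT --to-ports 15006
# Exempt traffic originating from the Envoy process itself (by UID) to avoid loops.
iptables -t nat -I OUTPUT -p tcp -m owner --uid-owner 1337 -j RETURN
```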

Envoy: The Universal Data Plane

Envoy is the proxy that powers most service meshes. Originally built at Lyft and donated to the CNCF, Envoy has become the de facto standard data plane for service mesh deployments. It is a high-performance, C++-based proxy designed for cloud-native environments, with first-class support for HTTP/2 and gRPC.

Envoy's architecture is built around a few core concepts:

  - Listeners — the ports Envoy binds and the filter chains applied to connections accepted on them.
  - Filters — pluggable network- and HTTP-level processing stages (the HTTP connection manager, gRPC-aware filters, rate limiting, and so on).
  - Clusters — named groups of upstream endpoints, each with its own load balancing, health checking, and circuit breaking policy.
  - Endpoints — the individual addresses within a cluster that actually receive traffic.

What makes Envoy particularly powerful for service meshes is its xDS API — a set of gRPC-based discovery services that allow a control plane to dynamically push configuration to every Envoy instance. The "x" in xDS stands for "anything": there are separate discovery protocols for listeners (LDS), routes (RDS), clusters (CDS), endpoints (EDS), and secrets/certificates (SDS). This means the entire mesh configuration can change at runtime without restarting any proxy.

For gRPC traffic specifically, Envoy understands the HTTP/2 framing and can make routing decisions based on the gRPC service name and method. A request to /mypackage.MyService/MyMethod is a first-class routing primitive, enabling per-method traffic policies, rate limits, and retry configurations.
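Because the gRPC path maps directly onto HTTP/2 routing, per-method policy is just route configuration. A hedged sketch of an Envoy route fragment (the cluster name and retry values are hypothetical):

```yaml
# Sketch of per-method gRPC routing in an Envoy route configuration.
route_config:
  virtual_hosts:
  - name: myservice
    domains: ["*"]
    routes:
    # Per-method policy: MyMethod gets its own retry configuration.
    - match:
        path: /mypackage.MyService/MyMethod
        grpc: {}
      route:
        cluster: myservice-backend
        retry_policy:
          retry_on: unavailable
          num_retries: 2
    # Everything else on this gRPC service falls through to the default route.
    - match:
        prefix: /mypackage.MyService/
      route:
        cluster: myservice-backend
```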

Istio Architecture

Istio is the most widely deployed service mesh. Its architecture is divided into a control plane and a data plane. The data plane consists of all the Envoy sidecar proxies running alongside application workloads. The control plane is istiod — a single binary that consolidates what used to be three separate components (Pilot, Citadel, and Galley) into one process.

[Diagram: istiod (control plane) — consolidating Pilot (xDS server), Citadel (CA/certs), and Galley (config) — pushes xDS configuration to the Envoy sidecars in each data-plane pod (svc-orders, svc-payments, svc-inventory), with mTLS between all sidecars.]

Istiod performs three critical functions:

  1. Configuration distribution (formerly Pilot) — watching Kubernetes resources, computing the corresponding Envoy configuration, and pushing it to every sidecar over xDS.
  2. Certificate authority (formerly Citadel) — issuing and rotating the workload certificates that underpin mesh-wide mTLS.
  3. Configuration validation and ingestion (formerly Galley) — validating Istio resources as they are admitted into the cluster.

When you apply a VirtualService in Istio, the flow is: kubectl apply writes to the Kubernetes API → istiod watches and picks up the change → istiod computes new Envoy configuration → istiod pushes xDS updates to all relevant Envoy sidecars → Envoy hot-reloads its routing rules without dropping connections. The entire process typically completes in under a second.

Linkerd and the Rust Micro-Proxy

Linkerd takes a fundamentally different approach from Istio. Rather than using Envoy (a general-purpose proxy with a vast feature set), Linkerd built its own purpose-built proxy called Linkerd2-proxy, written entirely in Rust. This is one of the largest production Rust networking projects in the CNCF ecosystem.

The design philosophy behind Linkerd2-proxy is minimalism and operational simplicity. Where Envoy is a Swiss Army knife that can be configured to do almost anything, Linkerd2-proxy does exactly what a service mesh data plane needs and nothing more. This translates to concrete operational benefits:

  - A far smaller resource footprint — the proxy typically uses a fraction of the memory of a comparably loaded Envoy sidecar.
  - Memory safety — Rust eliminates the buffer-overflow and use-after-free bug classes that have historically affected C/C++ network code.
  - No proxy configuration surface — the proxy is driven entirely by Linkerd's control plane, so there is nothing for operators to tune or misconfigure.

Linkerd's control plane runs on Kubernetes and consists of a destination controller (service discovery and policy), an identity controller (mTLS certificate issuance), and a proxy injector (automatic sidecar injection via a Kubernetes mutating webhook). Like Istio, Linkerd provides automatic mTLS, but it enables it by default with zero configuration — every meshed pod gets encrypted communication immediately upon injection.

Mutual TLS Between Services

One of the most impactful features of a service mesh is mutual TLS (mTLS) — encrypted, authenticated communication between every pair of services. Without a mesh, achieving mTLS requires each application to manage certificates, trust stores, and rotation schedules. A mesh automates all of this.

The mTLS flow in a service mesh works as follows:

  1. The mesh control plane operates a certificate authority (CA). In Istio, this is Citadel (inside istiod); in Linkerd, it is the identity controller.
  2. When a workload starts, its sidecar proxy generates a private key and a Certificate Signing Request (CSR). It sends the CSR to the control plane CA over a secure channel, proving its identity via the Kubernetes service account token.
  3. The CA validates the request, issues a short-lived X.509 certificate (typically valid for 24 hours), and returns it to the sidecar.
  4. The sidecar uses this certificate for all outbound connections (client certificate) and inbound connections (server certificate).
  5. Before the certificate expires, the sidecar automatically requests a new one. No application restart is needed.
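The rotation logic in step 5 reduces to a timing check. A minimal sketch, assuming a half-life rotation threshold (real sidecars use grace periods tuned by the mesh; the 24-hour lifetime matches the typical value above):

```python
from datetime import datetime, timedelta

def should_rotate(not_before: datetime, not_after: datetime,
                  now: datetime, threshold: float = 0.5) -> bool:
    """Request a new certificate once more than `threshold` of its
    lifetime has elapsed, well before it actually expires."""
    lifetime = not_after - not_before
    elapsed = now - not_before
    return elapsed >= lifetime * threshold

issued = datetime(2024, 1, 1, 0, 0)
expires = issued + timedelta(hours=24)

# 10 hours in: still within the first half of the lifetime.
print(should_rotate(issued, expires, issued + timedelta(hours=10)))  # False
# 13 hours in: past the half-life, time to request a new certificate.
print(should_rotate(issued, expires, issued + timedelta(hours=13)))  # True
```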

Both Istio and Linkerd use SPIFFE (Secure Production Identity Framework for Everyone) identities encoded in the certificates. A SPIFFE ID looks like spiffe://cluster.local/ns/payments/sa/payments-api and uniquely identifies a workload based on its namespace and service account rather than its IP address. This is critical because in Kubernetes, pod IPs are ephemeral and cannot be trusted as stable identifiers.
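The structure of these identities is simple enough to parse mechanically. A small sketch (the `parse_spiffe_id` helper is illustrative, not part of any SPIFFE library):

```python
from urllib.parse import urlparse

def parse_spiffe_id(spiffe_id: str) -> dict:
    """Split a Kubernetes-style SPIFFE ID into trust domain, namespace,
    and service account. Raises ValueError on malformed input."""
    parsed = urlparse(spiffe_id)
    if parsed.scheme != "spiffe":
        raise ValueError(f"not a SPIFFE ID: {spiffe_id}")
    parts = parsed.path.strip("/").split("/")
    # Expect the Istio/Linkerd convention: /ns/<namespace>/sa/<service-account>
    if len(parts) != 4 or parts[0] != "ns" or parts[2] != "sa":
        raise ValueError(f"unexpected SPIFFE path: {parsed.path}")
    return {
        "trust_domain": parsed.netloc,
        "namespace": parts[1],
        "service_account": parts[3],
    }

print(parse_spiffe_id("spiffe://cluster.local/ns/payments/sa/payments-api"))
# {'trust_domain': 'cluster.local', 'namespace': 'payments', 'service_account': 'payments-api'}
```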

The mesh can then enforce authorization policies based on these cryptographic identities. For example, you can write a policy that says "only the checkout service in the production namespace can call the payments service's ChargeCard gRPC method." This is far stronger than network-level policies based on IP CIDRs.
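In Istio, that example policy would be expressed as an AuthorizationPolicy. A hedged sketch — the service, namespace, and gRPC method names here are hypothetical:

```yaml
# Sketch: allow only the checkout service account to call ChargeCard.
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: payments-chargecard
  namespace: production
spec:
  selector:
    matchLabels:
      app: payments
  action: ALLOW
  rules:
  - from:
    - source:
        # Istio principals are SPIFFE IDs without the spiffe:// prefix.
        principals:
        - cluster.local/ns/production/sa/checkout
    to:
    - operation:
        # gRPC methods appear as HTTP/2 paths.
        paths: ["/payments.Payments/ChargeCard"]
```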

Traffic Splitting and Canary Deployments

Service meshes give operators fine-grained control over how traffic is distributed across different versions of a service. This is the foundation of canary deployments, blue-green deployments, and A/B testing.

In Istio, traffic splitting is configured via VirtualService resources. Here is what a canary deployment looks like for a gRPC service:

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: payments-service
spec:
  hosts:
  - payments.production.svc.cluster.local
  http:
  - route:
    - destination:
        host: payments.production.svc.cluster.local
        subset: stable
      weight: 95
    - destination:
        host: payments.production.svc.cluster.local
        subset: canary
      weight: 5

This routes 95% of traffic to the stable version and 5% to the canary. The subsets are defined in a DestinationRule that maps subset names to Kubernetes label selectors (e.g., version: v2). You can gradually increase the canary weight as you gain confidence, and roll back instantly by setting it back to 0%.
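A companion DestinationRule sketch defining those subsets (the version labels are illustrative):

```yaml
# Maps the subset names used by the VirtualService to pod label selectors.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: payments-service
spec:
  host: payments.production.svc.cluster.local
  subsets:
  - name: stable
    labels:
      version: v1
  - name: canary
    labels:
      version: v2
```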

For gRPC specifically, the mesh operates at the HTTP/2 level, meaning traffic splitting happens per-request, not per-connection. This is important because gRPC uses long-lived connections with multiplexed streams. A naive connection-level split would route all requests on a given connection to the same backend. Envoy and Linkerd2-proxy both perform request-level balancing within existing HTTP/2 connections, which is essential for correct gRPC load distribution.
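The difference between the two strategies is easy to see in a toy simulation (backend names are made up; the proxy's real round-robin state is per-cluster, but the effect is the same):

```python
import itertools

backends = ["pod-a", "pod-b", "pod-c"]

def connection_level(requests: int) -> dict:
    """L4 split: the single long-lived HTTP/2 connection is pinned to one
    backend, so every multiplexed stream lands on the same pod."""
    chosen = backends[0]  # picked once, at connect time
    return {chosen: requests}

def request_level(requests: int) -> dict:
    """L7 split: the proxy picks a backend per request (round-robin here)."""
    counts = dict.fromkeys(backends, 0)
    rr = itertools.cycle(backends)
    for _ in range(requests):
        counts[next(rr)] += 1
    return counts

print(connection_level(90))  # {'pod-a': 90} -- one pod takes all the load
print(request_level(90))     # {'pod-a': 30, 'pod-b': 30, 'pod-c': 30}
```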

More advanced traffic routing can match on gRPC metadata (headers), specific methods, or even request content. You could route all requests from internal test clients to the canary while keeping production users on stable, by matching on a custom gRPC metadata key.

Circuit Breaking

Circuit breaking prevents cascading failures in a microservices system. When a downstream service is struggling, continuing to send it traffic makes the situation worse — the failing service uses resources on requests it will never complete, and callers waste time waiting for responses that will never arrive. A circuit breaker detects this condition and fails fast.

Envoy implements circuit breaking with several configurable thresholds per upstream cluster:

  - max_connections — the maximum number of connections Envoy will establish to the cluster.
  - max_pending_requests — the maximum number of requests queued while waiting for a connection pool slot.
  - max_requests — the maximum number of concurrent requests (the key limit for HTTP/2 and gRPC traffic).
  - max_retries — the maximum number of retries that may be outstanding to the cluster at once.

Envoy also supports outlier detection, a form of adaptive circuit breaking. If a specific endpoint (pod) returns too many errors (e.g., five consecutive 5xx responses or gRPC status codes indicating failure), Envoy ejects that endpoint from the load balancing pool for a configurable duration. The endpoint is reintroduced after a backoff period, and if it keeps failing, the ejection duration grows: Envoy multiplies the base ejection time by the number of times the host has been ejected, up to a configurable cap.
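That backoff is a one-line calculation. A sketch, assuming a 30-second base and a 300-second cap (both illustrative; Envoy's actual defaults and field names live in its outlier detection config):

```python
def ejection_time(base_s: float, ejections: int, max_s: float = 300.0) -> float:
    """Envoy-style outlier ejection: base time scaled by how many times
    this host has been ejected so far, capped at a maximum."""
    return min(base_s * ejections, max_s)

# With a 30s base, repeated failures eject the host for longer each time.
print([ejection_time(30, n) for n in (1, 2, 3, 4)])  # [30, 60, 90, 120]
```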

In Istio, circuit breaking is configured via DestinationRule:

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: payments-circuit-breaker
spec:
  host: payments.production.svc.cluster.local
  trafficPolicy:
    connectionPool:
      http:
        h2UpgradePolicy: UPGRADE
        maxRequestsPerConnection: 1000
        http2MaxRequests: 500
    outlierDetection:
      consecutive5xxErrors: 3
      interval: 10s
      baseEjectionTime: 30s
      maxEjectionPercent: 50

Retry Budgets

Retries are essential for resilience — transient network failures and brief service disruptions are normal in distributed systems. But naive retry policies can cause retry storms: if a downstream service is slow, every caller retries, tripling or quadrupling the load on a system that is already struggling. This positive feedback loop turns partial failures into total outages.

Service meshes solve this with retry budgets. Instead of configuring a fixed number of retries per request, you define a budget as a percentage of the total request rate. Linkerd pioneered this approach: by default, Linkerd allows retries to add at most 20% additional load. If your service normally handles 100 requests per second, Linkerd will allow at most 20 retries per second — regardless of how many individual requests are failing. Once the budget is exhausted, additional failed requests return errors immediately rather than retrying.
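The budget arithmetic is straightforward. A sketch mirroring Linkerd's defaults (a 20% retry ratio plus a small floor so low-traffic services can still retry at all; the floor value here is illustrative):

```python
def allowed_retries(request_rate: float, retry_ratio: float = 0.2,
                    min_retries_per_sec: float = 10.0) -> float:
    """Retry budget: retries may add at most `retry_ratio` extra load,
    with a floor so a quiet service isn't starved of retries entirely."""
    return max(request_rate * retry_ratio, min_retries_per_sec)

print(allowed_retries(100.0))  # 20.0 retries/s for a 100 rps service
print(allowed_retries(10.0))   # 10.0 -- the floor dominates at low volume
```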

Istio/Envoy supports retries with configurable limits per route:

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: payments
spec:
  hosts:
  - payments.production.svc.cluster.local
  http:
  - retries:
      attempts: 3
      perTryTimeout: 2s
      retryOn: unavailable,resource-exhausted,internal
    route:
    - destination:
        host: payments.production.svc.cluster.local

The retryOn field specifies which gRPC status codes trigger a retry. This is important — you should only retry idempotent operations, and you should not retry client errors like INVALID_ARGUMENT or PERMISSION_DENIED because they will always fail. The correct set typically includes UNAVAILABLE (the service is temporarily overloaded), RESOURCE_EXHAUSTED (rate limited), and sometimes INTERNAL for transient server errors.
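That classification logic can be made explicit. A sketch of the decision a retry policy encodes (the status-code sets below are a reasonable default, not an exhaustive or authoritative list):

```python
# Transient failures worth retrying vs. errors that will always fail.
RETRYABLE = {"UNAVAILABLE", "RESOURCE_EXHAUSTED"}
NEVER_RETRY = {"INVALID_ARGUMENT", "PERMISSION_DENIED", "UNAUTHENTICATED",
               "NOT_FOUND", "ALREADY_EXISTS"}

def should_retry(status: str, idempotent: bool) -> bool:
    """Retry only transient failures, and only for idempotent operations."""
    if not idempotent or status in NEVER_RETRY:
        return False
    return status in RETRYABLE

print(should_retry("UNAVAILABLE", idempotent=True))       # True
print(should_retry("INVALID_ARGUMENT", idempotent=True))  # False
print(should_retry("UNAVAILABLE", idempotent=False))      # False
```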

Distributed Tracing with gRPC and OpenTelemetry

When a single user request fans out to ten different microservices, understanding where latency is introduced requires distributed tracing. A trace is a tree of spans, where each span represents a unit of work (typically a single RPC call). The root span is the initial request from the user, and child spans are the downstream calls made while processing that request.

[Trace waterfall: a 200ms /v1/checkout request at api-gateway fans out to svc-orders CreateOrder (145ms), which calls svc-inventory Reserve (55ms) and svc-payments Charge (62ms); the latter spans a database SELECT (12ms) and a Stripe gateway auth call (45ms), followed by svc-notify (30ms). Trace ID: 4bf92f3577b34da6a3ce929d0e0e4736, propagated via gRPC metadata (traceparent header).]

The service mesh plays two roles in distributed tracing:

  1. Span generation — the sidecar automatically records a span for every request it proxies, capturing latency, status, and peer identity with no application changes.
  2. Context propagation — but only partially. The proxy cannot correlate an inbound request with the outbound requests the application makes while handling it, so the application must still forward the trace headers (e.g., traceparent) on its outgoing calls, typically via a tracing library or gRPC interceptor.

OpenTelemetry has become the standard framework for instrumentation. It provides SDKs for every major language that handle context propagation automatically and let you add custom spans and attributes. Envoy natively exports spans in OpenTelemetry, Zipkin, and Jaeger formats. A typical production setup sends spans from both the application (via OpenTelemetry SDK) and the mesh sidecars to an OpenTelemetry Collector, which then forwards them to a backend like Jaeger, Tempo, or Honeycomb.

For gRPC, OpenTelemetry provides dedicated interceptors (middleware) that automatically create spans for each RPC, record the gRPC method, status code, and message sizes, and propagate trace context through gRPC metadata. Combined with the mesh's automatic sidecar spans, this gives you a complete picture of every request's journey through the system.
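The context being propagated is just a small, structured header. A sketch of building a W3C traceparent value as it would appear in gRPC metadata (the helper function is illustrative; OpenTelemetry SDKs do this for you):

```python
import secrets
from typing import Optional

def make_traceparent(trace_id: Optional[str] = None,
                     sampled: bool = True) -> str:
    """Build a W3C `traceparent` value: version-traceid-spanid-flags."""
    trace_id = trace_id or secrets.token_hex(16)   # 32 hex chars
    span_id = secrets.token_hex(8)                 # 16 hex chars
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"

# Reuse an existing trace ID so the new span joins the same trace.
tp = make_traceparent("4bf92f3577b34da6a3ce929d0e0e4736")
print(tp)  # e.g. 00-4bf92f3577b34da6a3ce929d0e0e4736-<16 hex chars>-01
```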

Ambient Mesh and Istio's ztunnel

The sidecar model works, but it has real costs. Every pod in the mesh runs an additional container that consumes CPU, memory, and adds latency. At scale — thousands of pods — the aggregate resource overhead is significant. Sidecar injection also introduces operational complexity: upgrades require restarting every pod, and the interaction between sidecar lifecycle and application lifecycle can cause subtle issues.

Istio's ambient mesh mode (introduced as beta in Istio 1.22) offers an alternative. Instead of injecting a sidecar into every pod, ambient mesh uses two layers:

  1. A secure L4 overlay — ztunnel, a per-node proxy that provides mTLS, L4 authorization, and TCP telemetry for every pod on its node, tunneling traffic between nodes over the HBONE protocol.
  2. Optional L7 processing — waypoint proxies, Envoy instances deployed per namespace that handle HTTP-level policy such as retries, traffic splitting, and rich authorization.

[Diagram: the secure L4 overlay — a ztunnel on each node, connected over HBONE/mTLS — carries traffic for all apps; opt-in L7 processing runs in per-namespace waypoint proxies (Envoy, e.g. for the payments and orders namespaces) that handle retries, traffic splitting, and authorization.]

This two-tier model means that services which only need mTLS and basic L4 authorization (which is the majority in many deployments) pay zero sidecar overhead. The ztunnel on each node is lightweight and shared across all pods. Services that need advanced L7 features opt in by deploying a waypoint proxy for their namespace.
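Enrollment is label-driven. A hedged sketch of opting a namespace into ambient mode and deploying a waypoint via the Kubernetes Gateway API (namespace name is illustrative; consult the Istio ambient docs for the current resource shapes):

```yaml
# Enroll a namespace: its pods get ztunnel L4 handling, no sidecar injection.
apiVersion: v1
kind: Namespace
metadata:
  name: payments
  labels:
    istio.io/dataplane-mode: ambient
---
# Opt the namespace's services into L7 processing with a waypoint proxy.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: waypoint
  namespace: payments
spec:
  gatewayClassName: istio-waypoint
  listeners:
  - name: mesh
    port: 15008
    protocol: HBONE
```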

The ztunnel implementation is noteworthy for being written in Rust using the Tokio async runtime. Like Linkerd2-proxy, it benefits from memory safety and a small binary size. The choice of Rust for both Linkerd's sidecar and Istio's ambient mode node proxy reflects the ecosystem's recognition that data plane proxies need the performance characteristics of a systems language with the safety guarantees that C++ cannot provide.

eBPF-Based Approaches

eBPF (extended Berkeley Packet Filter) represents a fundamentally different approach to service mesh networking. Rather than proxying traffic through a userspace process (whether sidecar or per-node), eBPF programs run inside the Linux kernel itself, making networking decisions at the socket or packet level with minimal overhead.

Cilium, the most prominent eBPF-based networking project, offers service mesh capabilities as part of its CNI (Container Network Interface) plugin. When Cilium is the cluster's CNI, it can provide:

  - Sidecar-free L4 load balancing, network policy, and transparent encryption implemented directly in eBPF.
  - L7 policy and routing (HTTP, gRPC, Kafka) via a Cilium-managed per-node Envoy proxy.
  - Deep network observability through Hubble, which surfaces flow-level data collected by the eBPF datapath.

The eBPF approach offers significant performance benefits for L4 operations. Packets that would normally traverse the full Linux networking stack (netfilter, conntrack, routing tables) can be short-circuited by eBPF programs that make forwarding decisions at the socket or TC (traffic control) layer. Benchmarks typically show 10–30% lower latency for L4 operations compared to iptables-based traffic interception.

However, eBPF has limitations. Complex L7 processing (full HTTP/2 parsing, gRPC header inspection, traffic splitting) is difficult or impractical in eBPF's restricted execution environment. The BPF verifier imposes limits on program complexity, stack depth, and loop bounds that make general-purpose protocol parsing challenging. For this reason, Cilium and other eBPF-based systems typically combine kernel-level L4 processing with userspace Envoy proxies for L7 features — a hybrid approach that tries to get the best of both worlds.

gRPC Load Balancing in a Service Mesh

gRPC's use of HTTP/2 creates a specific challenge for load balancing. HTTP/2 multiplexes many concurrent requests (streams) over a single long-lived TCP connection. An L4 load balancer that distributes connections will route all streams on a given connection to the same backend, creating hot spots. This is a well-known problem in gRPC deployments.

Service mesh proxies solve this by performing L7 (request-level) load balancing. The sidecar proxy terminates the incoming HTTP/2 connection from the application and creates separate connections (or reuses pooled connections) to upstream backends. Each individual gRPC request is independently routed to a backend chosen by the load balancing algorithm. Envoy supports several algorithms relevant to gRPC:

  - Round robin — cycle through healthy endpoints in order.
  - Least request — sample two endpoints at random and pick the one with fewer active requests (power of two choices); usually the best default for gRPC.
  - Ring hash / Maglev — consistent hashing on a request property (e.g., a header) for session affinity.
  - Random — pick a healthy endpoint uniformly at random.
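Envoy's least-request policy is typically implemented as power-of-two-choices sampling, which is easy to sketch (pod names and in-flight counts here are made up):

```python
import random

def pick_least_request(in_flight: dict, rng: random.Random) -> str:
    """Power-of-two-choices: sample two endpoints at random and pick the
    one with fewer in-flight requests -- near-optimal load balance at a
    fraction of the cost of scanning every endpoint."""
    a, b = rng.sample(list(in_flight), 2)
    return a if in_flight[a] <= in_flight[b] else b

loads = {"pod-a": 12, "pod-b": 3, "pod-c": 7}
choice = pick_least_request(loads, random.Random(42))
print(choice)  # whichever of the two sampled pods is less loaded
```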

Additionally, gRPC supports client-side load balancing through its built-in name resolution and load balancing framework. A gRPC client can receive a list of endpoints from a name resolver (e.g., Kubernetes DNS returning multiple pod IPs via headless services) and distribute requests directly, bypassing the proxy. In practice, most service mesh deployments use proxy-based load balancing because it provides the mesh with visibility and control, but for latency-critical paths, some teams use gRPC's native xDS-based load balancing to talk directly to the mesh control plane, combining the benefits of both approaches.
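A toy model of that client-side path — a resolver hands back pod addresses and the client spreads RPCs across them with no proxy in between (the resolver and addresses are stand-ins, not a real gRPC API):

```python
import itertools

class RoundRobinChannel:
    """Toy model of gRPC client-side load balancing: a name resolver
    returns pod addresses (e.g. from a headless Service DNS lookup) and
    the client round-robins RPCs across them directly."""
    def __init__(self, resolver):
        self._picker = itertools.cycle(sorted(resolver()))

    def call(self, method: str) -> str:
        target = next(self._picker)
        return f"{method} -> {target}"

def fake_dns_resolver():
    # Stands in for a headless-Service lookup returning pod IPs.
    return ["10.0.0.5:50051", "10.0.0.6:50051", "10.0.0.7:50051"]

ch = RoundRobinChannel(fake_dns_resolver)
print([ch.call("/payments.Payments/Charge") for _ in range(3)])
```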

Choosing Between Istio and Linkerd

Both Istio and Linkerd are production-ready service meshes, but they target different points on the complexity-capability spectrum:

  - Istio offers the broadest feature set — rich traffic management, extensibility via Envoy filters and WebAssembly, multi-cluster topologies — at the cost of a larger configuration surface and more operational overhead.
  - Linkerd optimizes for simplicity and low overhead: mTLS, retries, and golden-signal metrics work out of the box with minimal configuration, while advanced routing and extensibility are more limited.

For gRPC-heavy workloads, both meshes handle HTTP/2 and gRPC correctly. Istio provides more granular gRPC-specific configuration options (per-method routing, gRPC-JSON transcoding via Envoy filters), while Linkerd provides automatic per-route metrics and retry budgets with less configuration.

The service mesh landscape also includes Consul Connect (from HashiCorp, using Envoy sidecars with Consul's service discovery), Kuma (from Kong, also Envoy-based), and Traefik Mesh (using per-node proxies). The trend across all of these is convergence on Envoy as the data plane and increasing adoption of eBPF for L4 acceleration.

The Networking Connection

Service meshes operate at the application layer (L7), but they sit on top of the same network infrastructure that this looking glass monitors. Every gRPC call between services ultimately traverses physical or virtual networks, routed by the same BGP protocol that connects autonomous systems on the internet. In a multi-region Kubernetes deployment, inter-cluster mesh traffic travels over the public internet (or private interconnects) and is subject to the same BGP routing, prefix announcements, and path selection that governs all internet traffic. Understanding both layers — the L7 mesh and the L3 network fabric beneath it — gives you a complete picture of how your services communicate.
