How xDS Works: The Service Mesh Control Plane Protocol

The xDS protocol is the configuration and service discovery API that powers modern service meshes and proxyless gRPC deployments. Originally designed as Envoy proxy's dynamic configuration mechanism, xDS has evolved into a universal control plane protocol — a way for infrastructure to tell data plane components (proxies, gRPC clients, load balancers) how to route traffic, where to find backends, what TLS certificates to use, and how to enforce policy. If BGP is the protocol that distributes reachability information across the internet's routing infrastructure, xDS is the protocol that distributes reachability and routing policy across a service mesh's internal infrastructure. Both are control plane protocols that separate the decision of where traffic should go from the act of forwarding it.

This article covers the xDS transport protocol, each of the discovery service resource types (EDS, CDS, RDS, LDS, SDS), the Aggregated Discovery Service (ADS), incremental xDS via DeltaDiscovery, control plane implementations, and the relationship between the control plane and data plane in production deployments.

Control Plane vs. Data Plane

The separation between control plane and data plane is the foundational architectural principle behind xDS. The data plane is everything that touches actual user traffic: the Envoy sidecar proxies intercepting requests, the load balancers distributing connections, the gRPC client libraries picking backends. The control plane is the system that tells the data plane what to do: which backends exist, how to route requests, what certificates to present, what retry policies to apply.

This is directly analogous to how internet routing works. In BGP, routers (the data plane) forward packets based on their forwarding information base (FIB). The BGP protocol (the control plane) populates that FIB by exchanging route announcements between autonomous systems. A router does not discover routes by probing — it receives them from its BGP peers. Similarly, an Envoy proxy does not discover endpoints by scanning the network — it receives them from its xDS control plane. Both protocols solve the same fundamental problem: distributing routing state to forwarding elements so they can make local forwarding decisions without global knowledge.

[Diagram: Control plane / data plane separation, BGP vs. xDS. In internet routing, BGP speakers (control plane) push FIB updates to routers (data plane). In a service mesh, an xDS server such as istiod (control plane) pushes configuration over xDS streams to Envoy sidecars and gRPC clients (data plane). Both architectures: the control plane distributes routing state; the data plane forwards traffic.]

The key difference is scope and dynamics. BGP propagates relatively stable routing information across autonomous systems on a timescale of seconds to minutes, converging after events like link failures or route leaks. xDS propagates rapidly changing service topology — pods starting and stopping, deployments rolling out, canary weights shifting — on a timescale of milliseconds to seconds. But the architectural principle is identical: decouple routing intelligence from packet forwarding.

The xDS Transport Protocol

xDS uses gRPC bidirectional streaming as its transport. The data plane client (Envoy, a gRPC client library, or any xDS-aware software) opens a long-lived gRPC stream to the control plane server. On this stream, the client sends DiscoveryRequest messages and receives DiscoveryResponse messages. This is not a simple request-response pattern — the server can push updates at any time, and the client ACKs or NACKs each update.

The DiscoveryRequest / DiscoveryResponse Flow

A DiscoveryRequest contains:

- version_info — the configuration version the client currently has applied (empty on the first request)
- node — the client's identity and metadata (ID, cluster, locality)
- resource_names — the specific resources the client subscribes to (an empty list means "all resources of this type")
- type_url — the resource type being requested (e.g. type.googleapis.com/envoy.config.cluster.v3.Cluster)
- response_nonce and error_detail — used for the ACK/NACK mechanism described below

A DiscoveryResponse contains:

- version_info — the version of this configuration snapshot
- resources — the resource payloads, each packed as a protobuf Any
- type_url — the resource type of the payload
- nonce — a unique value the client must echo back when ACKing or NACKing this response

The ACK/NACK mechanism is critical. When a client receives a DiscoveryResponse, it attempts to apply the configuration. If successful, it sends a new DiscoveryRequest with the version_info from the response and the same response_nonce — this is an ACK. If the configuration is invalid (malformed routes, unknown filters, TLS certificate parse failure), the client sends a DiscoveryRequest with the previous version_info (the last known good version), the response's nonce, and an error_detail explaining the failure — this is a NACK. The client continues operating with its last known good configuration until the control plane sends a corrected update.
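Rendered as YAML for readability, an ACK/NACK exchange for CDS might look like the following sketch (versions, nonces, and the error message are illustrative):

```yaml
# --- DiscoveryResponse (server -> client) ---
version_info: "7"
type_url: type.googleapis.com/envoy.config.cluster.v3.Cluster
nonce: "a1b2"
# resources: the full set of Cluster resources (SotW always sends everything)

# --- DiscoveryRequest, ACK (client -> server) ---
version_info: "7"          # adopts the new version
type_url: type.googleapis.com/envoy.config.cluster.v3.Cluster
response_nonce: "a1b2"     # echoes the response's nonce

# --- DiscoveryRequest, NACK (client -> server) ---
version_info: "6"          # stays on the last known good version
type_url: type.googleapis.com/envoy.config.cluster.v3.Cluster
response_nonce: "a1b2"
error_detail: { message: "http filter my-filter not found" }
```

The nonce is what lets the server pair each request with the specific response it answers, even when multiple pushes are in flight.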

This is conceptually similar to how BGP handles invalid route announcements. A BGP speaker that receives a malformed UPDATE message does not tear down its entire routing table — it sends a NOTIFICATION or treats the route as withdrawn and continues forwarding based on previously accepted routes. Both protocols are designed for safe, incremental configuration convergence.

xDS Resource Types

The "x" in xDS is a wildcard. Each resource type corresponds to a different aspect of proxy configuration. Together, they form a complete description of how traffic should flow through the system.

LDS — Listener Discovery Service

LDS configures the listeners — the network sockets on which the proxy accepts connections. A Listener resource defines:

- The address and port to bind (or how to receive traffic redirected by iptables)
- Filter chains — ordered lists of network filters, most importantly the HTTP connection manager for HTTP and gRPC traffic
- Filter chain matching criteria (SNI, ALPN, destination port) for selecting among multiple chains
- Listener filters such as the TLS inspector, which examines connection metadata before a filter chain is chosen

In a service mesh, each Envoy sidecar typically has two classes of listeners: an inbound listener that receives traffic destined for the local application, and multiple outbound listeners (or a single outbound listener with complex filter chains) that capture traffic the application sends to other services. The control plane generates these listener configurations based on the services deployed in the cluster.

The HTTP connection manager filter within a listener is where routing configuration lives. It can either embed route configuration directly (inline routes) or reference an RDS resource by name, allowing routes to be updated independently of the listener.
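As a sketch (listener and route names are illustrative), a listener whose HTTP connection manager defers to RDS looks like this:

```yaml
# Envoy Listener (LDS resource) delegating its route table to RDS
name: outbound_15001
address:
  socket_address: { address: 0.0.0.0, port_value: 15001 }
filter_chains:
- filters:
  - name: envoy.filters.network.http_connection_manager
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
      stat_prefix: outbound
      rds:
        route_config_name: outbound_routes   # resolved via RDS
        config_source:
          ads: {}                            # fetch over the ADS stream
      http_filters:
      - name: envoy.filters.http.router
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
```

Because only the route configuration's name is embedded, the control plane can replace the routes without ever touching the listener.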

RDS — Route Discovery Service

RDS configures route tables — the rules that map incoming requests to upstream clusters. A RouteConfiguration resource contains a set of virtual hosts, each matching on a domain name (the HTTP Host header or gRPC authority). Within each virtual host, routes match on path prefixes, exact paths, regex patterns, or header values, and direct traffic to a named cluster.

Route configuration is where traffic management policy lives:

- Weighted cluster splits for canary and blue/green deployments (e.g. 95% to v1, 5% to v2)
- Header- and metadata-based matching for A/B testing and request pinning
- Retry policies, per-route timeouts, and request hedging
- Traffic mirroring (shadowing) to a secondary cluster
- Request and response header manipulation, redirects, and direct responses

Separating RDS from LDS is a critical design choice. Listeners change rarely (new ports are seldom added), but routes change constantly as deployments roll out and traffic policies shift. By splitting them into separate xDS resource types, the control plane can push route updates without triggering listener re-creation, which would cause connection drops.
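A RouteConfiguration implementing a canary split might look like this sketch (service and cluster names are illustrative):

```yaml
# RouteConfiguration (RDS resource) with a 95/5 canary split
name: outbound_routes
virtual_hosts:
- name: my-service
  domains: ["my-service.default.svc.cluster.local"]
  routes:
  - match: { prefix: "/" }
    route:
      weighted_clusters:
        clusters:
        - name: my-service-v1
          weight: 95
        - name: my-service-v2
          weight: 5
      retry_policy:
        retry_on: "5xx"        # retry on server errors
        num_retries: 2
      timeout: 3s
```

Shifting the canary from 5% to 50% is a single RDS push; no listener or cluster changes are involved.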

CDS — Cluster Discovery Service

CDS configures clusters — logical groups of upstream hosts that the proxy can route traffic to. A Cluster resource defines:

- The service discovery type (STATIC, STRICT_DNS, LOGICAL_DNS, EDS, or ORIGINAL_DST)
- The load balancing policy (round robin, least request, ring hash, Maglev, random)
- Active health checking and passive outlier detection settings
- Circuit breaker thresholds (maximum connections, pending requests, retries)
- The upstream TLS context, connection timeouts, and HTTP protocol options
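A representative EDS-backed cluster, sketched in Envoy's config form (names and thresholds illustrative):

```yaml
# Cluster (CDS resource) whose membership comes from EDS
name: my-service-v1
type: EDS
eds_cluster_config:
  eds_config:
    ads: {}                   # endpoints delivered over the ADS stream
lb_policy: LEAST_REQUEST
connect_timeout: 1s
outlier_detection:
  consecutive_5xx: 5          # eject an endpoint after 5 consecutive 5xx
  base_ejection_time: 30s
circuit_breakers:
  thresholds:
  - max_connections: 1024
    max_pending_requests: 1024
```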

EDS — Endpoint Discovery Service

EDS configures endpoints — the actual IP addresses and ports of individual service instances within a cluster. A ClusterLoadAssignment resource (the EDS resource type) contains:

- The name of the cluster it belongs to
- Endpoints grouped by locality (region, zone, sub-zone)
- Per-endpoint health status and load balancing weight
- Per-locality priority and weight, which drive locality-aware load balancing and failover

EDS is the most dynamic of the xDS resource types. In a Kubernetes cluster, pods start and stop constantly — rolling deployments, autoscaling, node failures, spot instance reclamation. Every pod lifecycle event generates an EDS update. This is where the comparison with BGP is most direct: EDS updates are the xDS equivalent of BGP UPDATE messages announcing or withdrawing routes. When a new pod starts, it is like a new prefix being announced. When a pod terminates, it is like a prefix being withdrawn.

Locality-aware load balancing in EDS mirrors BGP's preference for locally originated routes. Just as BGP routers prefer routes with shorter AS paths or higher local preference (keeping traffic as local as possible), Envoy prefers endpoints in the same zone or region before falling back to more distant ones. Both systems optimize for locality to reduce latency and cross-zone data transfer costs.
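The locality structure above looks like this in a ClusterLoadAssignment sketch (addresses and localities illustrative):

```yaml
# ClusterLoadAssignment (EDS resource) with locality-grouped endpoints
cluster_name: my-service-v1
endpoints:
- locality: { region: us-east-1, zone: us-east-1a }
  priority: 0                  # preferred: same zone as the proxy
  lb_endpoints:
  - endpoint:
      address:
        socket_address: { address: 10.0.0.5, port_value: 8080 }
    load_balancing_weight: 1
- locality: { region: us-east-1, zone: us-east-1b }
  priority: 1                  # failover: used when priority 0 is unhealthy
  lb_endpoints:
  - endpoint:
      address:
        socket_address: { address: 10.0.1.9, port_value: 8080 }
```

Traffic stays on priority 0 until its endpoints degrade, at which point Envoy spills over to priority 1 — the mesh equivalent of falling back to a less-preferred route.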

SDS — Secret Discovery Service

SDS delivers TLS certificates and keys to the data plane. Instead of storing certificates on disk and restarting the proxy when they rotate, SDS allows the control plane to push certificates over the same gRPC channel used for other xDS resources. A Secret resource contains:

- A TLS certificate — the certificate chain and private key for terminating or originating TLS
- A validation context — the trusted CA certificates and subject alternative name matchers used to verify peers
- Or TLS session ticket keys for session resumption
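Other resources reference secrets by name rather than embedding key material. As a sketch, a cluster's upstream TLS context pulling its certificate via SDS (the secret name is illustrative):

```yaml
# UpstreamTlsContext referencing a certificate by its SDS name
transport_socket:
  name: envoy.transport_sockets.tls
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
    common_tls_context:
      tls_certificate_sds_secret_configs:
      - name: default            # fetched and rotated via SDS
        sds_config: { ads: {} }
```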

In Istio, the control plane (istiod) acts as a certificate authority. It issues short-lived SPIFFE identity certificates to each workload via SDS. These certificates are rotated automatically, typically every 24 hours, without any proxy restart. The SDS stream delivers the new certificate, the proxy hot-swaps it, and connections continue without interruption.

SDS is a security-critical component. The gRPC channel between the data plane and the control plane must itself be authenticated and encrypted — otherwise an attacker could intercept certificate delivery. In Istio, the initial bootstrap uses a Kubernetes-issued service account token to authenticate the first SDS request, after which the proxy uses its issued certificate for subsequent connections.

Resource Ordering and Dependencies

The xDS resource types are not independent. They form a dependency graph that the control plane and client must respect:

LDS (Listener)
 └── references RDS (RouteConfiguration) by name
      └── routes point to CDS (Cluster) by name
           └── EDS-type clusters reference EDS (ClusterLoadAssignment)
                └── endpoints may use SDS (Secret) for upstream TLS
LDS/CDS also reference SDS for downstream/upstream TLS contexts

This ordering matters during startup and during updates. If the client receives a route pointing to cluster service-b but has not yet received the CDS resource for service-b, it cannot route traffic there. The xDS specification defines a warming mechanism: a new listener or cluster is held in a "warming" state until all its dependencies are resolved. Only after the dependent RDS, CDS, and EDS resources have been received does the listener become active.

Without ADS (discussed below), each resource type is delivered on a separate gRPC stream, which creates race conditions. The client might receive a new route configuration before the clusters it references exist. Warming handles this gracefully, but it introduces latency during configuration updates. This is why most production deployments use ADS.

ADS — Aggregated Discovery Service

ADS multiplexes all xDS resource types onto a single gRPC stream. Instead of the client maintaining separate streams for LDS, RDS, CDS, EDS, and SDS, it opens one ADS stream and sends typed DiscoveryRequest messages for all resource types on that stream.

The primary benefit is ordering guarantees. On a single stream, the control plane can ensure that CDS resources arrive before the EDS resources that reference them, and that RDS resources reference only clusters that have already been delivered. This eliminates the configuration inconsistency window that exists with per-type streams.

ADS also simplifies operational concerns:

- One connection per proxy to authenticate, monitor, and load-balance
- All resource types come from the same control plane instance, so the client never sees a mix of configuration from two servers with inconsistent state
- A single stream to inspect when debugging configuration propagation

In practice, ADS is the default transport for production xDS deployments. Istio uses ADS exclusively. The separate per-type streams exist mainly for backward compatibility and edge cases where a component only needs a single resource type.

Incremental xDS (DeltaDiscovery)

The original xDS protocol (sometimes called "State of the World" or SotW) sends the complete set of resources in every DiscoveryResponse. If a cluster has 10,000 endpoints and one endpoint changes, the control plane sends all 10,000 endpoints again. This is simple to implement and reason about — the client replaces its entire configuration for that resource type — but it is inefficient at scale.

Incremental xDS (Delta xDS) uses DeltaDiscoveryRequest and DeltaDiscoveryResponse messages that express changes relative to the current state. A DeltaDiscoveryResponse contains:

- resources — only the resources that were added or changed, each tagged with its own version
- removed_resources — the names of resources that no longer exist
- system_version_info and a nonce for ACK/NACK, as in SotW

A DeltaDiscoveryRequest contains:

- resource_names_subscribe / resource_names_unsubscribe — the resources the client wants to start or stop tracking
- initial_resource_versions — on stream restart, the versions the client already holds, so the server can send only what changed
- response_nonce and error_detail for the ACK/NACK mechanism

The performance implications are significant. Consider a production mesh with 5,000 service instances across 200 services. With SotW xDS, any endpoint change means resending the full state for every subscribed resource of that type. With Delta xDS, only the resource that actually changed — the affected cluster's load assignment — is resent, and unchanged clusters are untouched. At scale, this reduces control plane CPU (serialization), network bandwidth, and client-side processing (deserialization and diffing) by orders of magnitude.
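Rendered as YAML for readability (versions, nonces, and names illustrative), a delta push for a single changed cluster might look like:

```yaml
# --- DeltaDiscoveryResponse (server -> client): only the change travels ---
system_version_info: "42"
type_url: type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment
nonce: "c3d4"
resources:
- name: my-service-v1
  version: "9"
  resource: {}               # the updated ClusterLoadAssignment (payload elided)
removed_resources: []        # nothing withdrawn in this push

# --- DeltaDiscoveryRequest, ACK (client -> server) ---
response_nonce: "c3d4"       # no error_detail means the update was accepted
type_url: type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment
```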

Delta xDS also enables on-demand resource loading. In SotW mode, the client must subscribe to all resources of a type upfront. With Delta, the client can subscribe to specific resources as needed. An Envoy proxy that only routes to 10 out of 200 services can subscribe to just those 10 clusters and their endpoints, dramatically reducing memory usage and update volume. This is the xDS equivalent of BGP's ORF (Outbound Route Filtering), where a BGP peer tells its neighbor "only send me routes matching this filter" to reduce unnecessary route processing.

[Diagram: State-of-the-World vs. incremental (Delta) xDS. With SotW, an initial push of 5 endpoints, a change to ep3, and the removal of ep2 each resend the full set — 14 endpoint objects in total. With Delta, the same sequence sends the initial 5, then only the changed ep3, then only the removal of ep2 — 7 objects in total. At production scale (thousands of endpoints), Delta reduces xDS traffic by 90%+.]

The xDS Bootstrap

Before a data plane client can connect to the control plane, it needs a bootstrap configuration that tells it where the xDS server is and how to authenticate. For Envoy, this is typically a YAML or JSON file mounted into the pod. For proxyless gRPC clients, it is a JSON bootstrap file pointed to by the GRPC_XDS_BOOTSTRAP environment variable.

A minimal Envoy bootstrap specifies the node identity, the ADS configuration, and a static cluster pointing at the control plane:

node:
  id: "sidecar~10.0.0.5~my-app-pod~default.svc.cluster.local"
  cluster: "my-app"
  locality:
    region: "us-east-1"
    zone: "us-east-1a"

dynamic_resources:
  ads_config:
    api_type: GRPC
    transport_api_version: V3
    grpc_services:
    - envoy_grpc:
        cluster_name: xds_cluster

static_resources:
  clusters:
  - name: xds_cluster
    type: STRICT_DNS
    load_assignment:
      cluster_name: xds_cluster
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: istiod.istio-system.svc
                port_value: 15010

The node section is particularly important. The node ID encodes the proxy type, IP address, pod name, and namespace. The control plane parses this to determine which configuration to send — a sidecar for service A receives different listeners, routes, and clusters than a sidecar for service B. This is analogous to how BGP peers negotiate capabilities and exchange routes based on their neighbor relationship — each peer gets a view of the routing table tailored to its role and location.

Control Plane Implementations

Several production-grade control planes implement the xDS API. Each makes different tradeoffs around complexity, features, and operational model.

Istio (istiod)

Istio is the most feature-rich service mesh, and istiod is its monolithic control plane. It watches Kubernetes API resources (Services, Endpoints, Pods, plus Istio-specific CRDs like VirtualService, DestinationRule, Gateway) and translates them into xDS configuration for every Envoy sidecar in the mesh.

Istiod combines several functions:

- The xDS server (formerly Pilot), translating Kubernetes and Istio configuration into Envoy configuration
- The certificate authority (formerly Citadel), issuing workload identity certificates over SDS
- Configuration validation and distribution (formerly Galley)
- The sidecar injection webhook that adds the Envoy container to application pods

Istio's architecture has evolved significantly. Early versions (pre-1.5) split the control plane into separate Pilot (xDS server), Citadel (CA), and Galley (config validation) processes. The consolidation into a single istiod binary dramatically simplified operations and reduced inter-component communication overhead. For a deeper look at how Istio manages gRPC traffic in a service mesh, see the dedicated article.

Consul Connect

HashiCorp Consul Connect takes a different architectural approach. Instead of Kubernetes-native CRDs, Consul uses its own service catalog and key-value store as the source of truth. Consul agents run on each node, and the Consul server cluster generates xDS configuration for Envoy sidecar proxies.

Consul Connect's advantages:

- Runs on any platform — VMs, bare metal, and Kubernetes — without requiring a Kubernetes API server as the source of truth
- Multi-datacenter federation through Consul's WAN gossip and mesh gateways
- An integrated service catalog and health checking system that predate the mesh features

Consul generates xDS configuration through its consul connect envoy command, which bootstraps an Envoy sidecar with the correct xDS control plane settings. The xDS server is built into the Consul agent, so there is no separate control plane deployment to manage.

Custom Control Planes

The xDS protocol is an open specification, and building a custom control plane is a viable option for organizations with specific requirements that do not map cleanly to Istio or Consul. The go-control-plane library (github.com/envoyproxy/go-control-plane) provides a reference implementation of the xDS server in Go, including the ADS server, resource caching, and snapshot-based configuration management.

A minimal custom control plane:

// Create a snapshot cache keyed by node ID; "true" enables ADS mode
cache := cachev3.NewSnapshotCache(true, cachev3.IDHash{}, logger)

// Build a versioned configuration snapshot. makeCluster, makeEndpoint,
// makeListener, and makeRoute are application-defined helpers that
// construct the corresponding v3 protobuf resources.
snap, _ := cachev3.NewSnapshot("v1",
    map[resource.Type][]types.Resource{
        resource.ClusterType:  {makeCluster("my-service")},
        resource.EndpointType: {makeEndpoint("my-service", "10.0.0.5", 8080)},
        resource.ListenerType: {makeListener("my-listener")},
        resource.RouteType:    {makeRoute("my-route", "my-service")},
    },
)

// Push the snapshot; connected clients whose node ID matches
// receive the update immediately
cache.SetSnapshot(ctx, "node-id", snap)

// Start the ADS gRPC server (error handling elided for brevity)
server := serverv3.NewServer(ctx, cache, nil)
grpcServer := grpc.NewServer()
discoverygrpc.RegisterAggregatedDiscoveryServiceServer(grpcServer, server)
lis, _ := net.Listen("tcp", ":18000")
grpcServer.Serve(lis)

Custom control planes are common in organizations that need to integrate xDS with their existing service discovery infrastructure (ZooKeeper, etcd, internal systems) or that need control plane behavior that Istio does not support. Companies like Lyft, Pinterest, and Stripe have built custom xDS control planes tailored to their infrastructure.

Other notable implementations include:

- go-control-plane-based Kubernetes ingress controllers such as Contour and Emissary-ingress, which use xDS to configure Envoy at the edge
- Gloo (Solo.io), an Envoy-based API gateway with its own xDS control plane
- AWS App Mesh, a managed control plane that speaks xDS to Envoy sidecars
- java-control-plane, the Java counterpart to go-control-plane

xDS in Proxyless gRPC

One of the most significant developments in the xDS ecosystem is proxyless gRPC — where the gRPC client library itself acts as an xDS client, eliminating the sidecar proxy entirely. Instead of routing traffic through a local Envoy, the gRPC client connects directly to the xDS control plane, receives configuration, and applies it within the client process.

Proxyless gRPC supports a subset of xDS features:

- LDS, RDS, CDS, and EDS for discovery and routing
- Weighted cluster splits, path and header matching, and per-route timeouts
- Load balancing policies such as round robin and ring hash
- More recent additions including retries, fault injection, outlier detection, and mTLS with control plane-issued certificates

The gRPC client uses the xds:/// resolver scheme. When a client dials xds:///my-service, the gRPC library reads the bootstrap file, connects to the xDS control plane, subscribes to the relevant LDS and RDS resources, discovers the target cluster, subscribes to CDS and EDS for that cluster, and uses the endpoint list for load balancing — all within the client process, with zero network hops through a proxy.
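A minimal proxyless gRPC bootstrap — the JSON file pointed to by GRPC_XDS_BOOTSTRAP — might look like this sketch (server URI and node ID are illustrative):

```json
{
  "xds_servers": [
    {
      "server_uri": "istiod.istio-system.svc:15010",
      "channel_creds": [{ "type": "insecure" }],
      "server_features": ["xds_v3"]
    }
  ],
  "node": {
    "id": "sidecar~10.0.0.6~client-pod~default",
    "locality": { "region": "us-east-1", "zone": "us-east-1a" }
  }
}
```

With this file in place, a client simply dials xds:///my-service and the gRPC library handles the xDS subscriptions internally.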

The performance benefit is substantial. A sidecar proxy adds two extra hops to every RPC (client to local sidecar, remote sidecar to server). For latency-sensitive applications, eliminating these hops can reduce p99 latency by hundreds of microseconds. The tradeoff is reduced feature coverage (not all Envoy filters are available in proxyless mode) and language-specific support (Go, Java, C++, and Node.js have the most mature xDS integration in their gRPC libraries).

Practical Considerations

Control Plane Scalability

The control plane is a potential bottleneck. Every Envoy sidecar maintains a persistent gRPC stream to the control plane, and every service topology change (pod start, pod stop, config update) generates xDS pushes to potentially thousands of clients. At scale, this requires careful engineering:

- Horizontal scaling of control plane replicas, with clients distributed across them
- Debouncing and batching, so a burst of topology changes produces one push rather than dozens
- Scoping configuration per proxy, so each client receives only the resources it actually needs
- Caching generated configuration so identical proxies share the same serialized payload
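One common scoping mechanism is Istio's Sidecar resource, which limits what a namespace's proxies receive — sketched here with an illustrative namespace name:

```yaml
# Istio Sidecar resource: proxies in this namespace only receive
# xDS resources for the services they can actually reach
apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: default
  namespace: my-namespace
spec:
  egress:
  - hosts:
    - "./*"                    # services in the same namespace
    - "istio-system/*"         # plus the control plane's namespace
```

Without such scoping, every sidecar receives configuration for every service in the mesh, and push fan-out grows quadratically with mesh size.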

Failure Modes

What happens when the control plane goes down? This is where xDS's design shows maturity. Data plane clients cache their last known good configuration and continue operating independently. If the control plane is unreachable, Envoy keeps routing traffic based on its current configuration. New configuration updates will not be received, but existing traffic flows are unaffected.

This "last known good" behavior is directly analogous to BGP. When a BGP session drops, the router does not immediately withdraw all routes learned from that peer — it continues forwarding based on its existing RIB (Routing Information Base) for the graceful restart hold time. Similarly, Envoy continues forwarding based on its existing xDS configuration indefinitely when the control plane is unavailable. The system degrades gracefully: it loses the ability to adapt to changes, but it does not lose the ability to function.

Other failure modes include:

- NACK loops — the control plane keeps pushing a configuration the client keeps rejecting, leaving the client stuck on stale state until an operator intervenes
- Stale configuration — a proxy disconnected from the control plane continues routing to endpoints that may no longer exist
- Thundering herds — a control plane restart causes every proxy to reconnect and request its full configuration at once

Debugging xDS

Envoy exposes its current xDS configuration through the admin interface (usually on port 15000 for Istio sidecars). Key endpoints:

- /config_dump — the complete active configuration, including the xDS version of each resource
- /clusters — cluster membership, endpoint health, and per-endpoint statistics
- /listeners — the active listeners and their addresses
- /certs — the loaded TLS certificates and their expiry times
- /stats — counters and gauges, including xDS update successes and rejections

Istio provides istioctl proxy-status to show the sync state between istiod and every sidecar (SYNCED, NOT SENT, or STALE), and istioctl proxy-config to inspect individual sidecar configuration by resource type. When an xDS configuration is not applying correctly, start with these tools before diving into raw Envoy admin dumps.

xDS and Container Networking

In a Kubernetes-based service mesh, xDS interacts closely with the container networking layer. Traffic interception (redirecting application traffic through the sidecar) is handled by iptables rules or eBPF programs in the pod's network namespace. The xDS configuration tells the sidecar what to do with the traffic, but the interception mechanism determines which traffic reaches the sidecar in the first place.

Misconfigured interception is a common source of mesh issues. If iptables rules are not set up correctly, traffic may bypass the sidecar entirely, or the sidecar may intercept traffic it should not (e.g., traffic to the control plane itself, causing bootstrap deadlocks). Understanding the interplay between network namespace configuration and xDS configuration is essential for troubleshooting mesh deployments.

The Future of xDS

The xDS protocol continues to evolve through the CNCF's xDS Working Group. Several developments are shaping its trajectory:

- Standardization of xDS as a vendor-neutral "universal data plane API", decoupling the protocol from Envoy's implementation
- Growing feature parity for proxyless gRPC, narrowing the gap with sidecar deployments
- Wider adoption of incremental and on-demand xDS to scale meshes to very large fleets

The arc of xDS's evolution mirrors a broader trend in infrastructure: the separation of policy (what should happen) from mechanism (how it happens). Just as BGP separates routing policy from packet forwarding, and just as Kubernetes separates desired state from reconciliation, xDS separates traffic management policy from traffic forwarding. The control plane declares intent; the data plane executes it. This pattern — simple in principle, endlessly complex in implementation — is the foundation of every scalable distributed system.
