gRPC Observability: Metrics, Tracing, and Logging
Running gRPC services in production without observability is flying blind. gRPC's binary protocol, multiplexed streams, and code-generated stubs make it extraordinarily productive for developers, but those same characteristics make it harder to debug than traditional REST APIs. You cannot simply open browser dev tools and read request bodies. You need structured metrics, distributed traces, and correlated logs to understand what your services are doing under load, why certain calls are slow, and where failures originate across service boundaries. This article covers the three pillars of gRPC observability, the tooling ecosystem built around OpenTelemetry, and the operational patterns that keep gRPC-based architectures debuggable at scale.
If you are new to gRPC fundamentals, start with How gRPC Works. For server- and client-side hook points where observability code is typically installed, see gRPC Interceptors. For how observability fits into broader infrastructure, see gRPC and Service Mesh.
The Three Pillars for gRPC
Observability for any distributed system rests on three complementary signals: metrics, traces, and logs. Each answers a different class of question. Metrics tell you what is happening in aggregate. Traces tell you where time is spent for a specific request. Logs tell you why something happened at a particular moment. For gRPC services, these signals have specific shapes that differ from HTTP/1.1 REST services.
Metrics are numeric aggregates: counters, histograms, gauges. They are cheap to collect and store, and they are always on. For gRPC, key metrics include request counts partitioned by method and status code, latency distributions as histograms, active stream counts, and connection pool sizes. Metrics answer questions like "what is the 99th percentile latency of UserService.GetProfile?" or "how many UNAVAILABLE errors has this service returned in the last hour?"
Traces capture the lifecycle of a single request as it flows through multiple services. Each gRPC call becomes a span, and spans link together into a tree that shows exactly where time was spent. When a client calls Service A, which calls Service B and Service C, the trace shows the full fan-out with timing for each hop. Traces answer "why was this particular request slow?" and "which downstream dependency is the bottleneck?"
Logs are discrete events with rich context. For gRPC, structured logs attached to interceptors capture the method name, request metadata, response status, and optionally serialized request/response payloads. Logs are the most detailed signal but also the most expensive to store. The key to making logs useful at scale is correlating them with trace IDs so you can jump from a trace span to the exact log lines for that request.
OpenTelemetry gRPC Instrumentation
OpenTelemetry (OTel) is the CNCF project that has become the de facto standard for collecting and exporting observability data. It provides language-specific SDKs, a collector for routing telemetry data, and a protocol (OTLP) for transmitting metrics, traces, and logs to backends. gRPC is a first-class citizen in OpenTelemetry: the SDKs include auto-instrumentation for gRPC clients and servers out of the box.
Auto-Instrumentation
Auto-instrumentation means you get metrics and traces for every gRPC call without modifying your application code. You install interceptors (or middleware) provided by the OpenTelemetry SDK, and they automatically create spans, record durations, and emit metrics for each RPC. Here is what auto-instrumentation gives you in Go:
import (
"go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc"
"google.golang.org/grpc"
"google.golang.org/grpc/credentials/insecure"
)
// Server side
server := grpc.NewServer(
grpc.StatsHandler(otelgrpc.NewServerHandler()),
)
// Client side (insecure credentials shown for brevity; use TLS in production)
conn, err := grpc.Dial(target,
grpc.WithTransportCredentials(insecure.NewCredentials()),
grpc.WithStatsHandler(otelgrpc.NewClientHandler()),
)
With those two stats handlers registered, every server handler and every client call automatically gets a trace span with the gRPC method name, status code, request/response sizes, and duration. Equivalent auto-instrumentation exists in Java (via the opentelemetry-java-instrumentation agent), Python (opentelemetry-instrumentation-grpc), Node.js (@opentelemetry/instrumentation-grpc), and every other language OTel supports.
The auto-instrumented spans include standard semantic conventions. The span name follows the format package.Service/Method. Attributes include rpc.system=grpc, rpc.service, rpc.method, rpc.grpc.status_code, and net.peer.name/net.peer.port. These standardized attribute names mean that dashboards and alerts built for one gRPC service work for any other, regardless of language.
Manual Enrichment
Auto-instrumentation captures the mechanical details, but you often want to add business-specific context. You can enrich spans with custom attributes inside your handler code:
func (s *server) GetProfile(ctx context.Context, req *pb.GetProfileRequest) (*pb.Profile, error) {
span := trace.SpanFromContext(ctx)
span.SetAttributes(
attribute.String("user.id", req.UserId),
attribute.String("user.tier", "premium"),
)
// ... handler logic
}
This gives you the ability to filter traces by business dimensions: "show me all slow requests for premium-tier users" or "find all errors for user ID X across all services."
Key gRPC Metrics
The metrics that matter most for gRPC services break down into five categories. These apply whether you are using OpenTelemetry, Prometheus client libraries, or a service mesh data plane to collect them.
Request Count
A counter of total RPCs, partitioned by method, status code, and type (unary, client-streaming, server-streaming, bidirectional). This is the foundation of error rate calculations and throughput tracking. In OpenTelemetry, this is rpc.server.duration (which doubles as both a count and a latency histogram). In Prometheus with grpc-ecosystem middleware, it is grpc_server_handled_total.
Latency Histogram
A histogram of RPC durations, recording the distribution of response times rather than just an average. Histograms let you compute percentiles (p50, p95, p99) which are far more useful than averages for understanding tail latency. gRPC latency should be measured from when the server receives the request metadata to when the final response is sent, including any streaming messages in between.
Error Rate by Status Code
gRPC has a richer error model than HTTP. Instead of broad categories like 4xx and 5xx, gRPC defines 17 specific status codes: OK, CANCELLED, UNKNOWN, INVALID_ARGUMENT, DEADLINE_EXCEEDED, NOT_FOUND, ALREADY_EXISTS, PERMISSION_DENIED, RESOURCE_EXHAUSTED, FAILED_PRECONDITION, ABORTED, OUT_OF_RANGE, UNIMPLEMENTED, INTERNAL, UNAVAILABLE, DATA_LOSS, and UNAUTHENTICATED. Tracking error counts per status code is essential because they indicate different failure modes: UNAVAILABLE suggests connectivity issues, DEADLINE_EXCEEDED signals timeout problems, RESOURCE_EXHAUSTED means rate limiting or memory pressure, and INTERNAL points to bugs.
Stream Message Counts
For streaming RPCs, you need to know how many messages were sent and received per stream. A server-streaming RPC that returns 10,000 messages behaves very differently from one that returns 10. Tracking rpc.server.request.size and rpc.server.response.size (or the per-message equivalents) helps you understand data volume and detect streams that have grown unexpectedly large.
Connection Count
gRPC uses HTTP/2, which multiplexes many RPCs over a single TCP connection. Tracking the number of active connections, active streams per connection, and the connection churn rate (how often connections are established and torn down) reveals infrastructure-level issues. A sudden drop in connection count might indicate network partitions. A spike in connection churn could mean clients are failing health checks and reconnecting repeatedly.
Prometheus Integration
Prometheus remains the most widely deployed metrics backend for gRPC services, especially in Kubernetes environments. The grpc-ecosystem/go-grpc-prometheus library (and equivalents in other languages) exposes a standardized set of metrics that have become the industry convention.
grpc_server_handled_total
This counter tracks the total number of RPCs completed on the server, labeled by grpc_type (unary, client_stream, server_stream, bidi_stream), grpc_service, grpc_method, and grpc_code (the gRPC status code). It is the single most important gRPC metric. From it, you can derive:
- Request rate: rate(grpc_server_handled_total[5m])
- Error rate: rate(grpc_server_handled_total{grpc_code!="OK"}[5m])
- Availability: 1 - (rate(grpc_server_handled_total{grpc_code!="OK"}[5m]) / rate(grpc_server_handled_total[5m]))
- Error breakdown: sum by (grpc_code)(rate(grpc_server_handled_total{grpc_code!="OK"}[5m]))
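These derived expressions are often precomputed as Prometheus recording rules so that dashboards and alerts query cheap, pre-aggregated series instead of re-evaluating the raw counters. A sketch of such rules (the rule names and `job` grouping label are illustrative):

```yaml
groups:
  - name: grpc-derived
    rules:
      # Per-job request rate, evaluated continuously
      - record: job:grpc_request_rate:5m
        expr: sum by (job)(rate(grpc_server_handled_total[5m]))
      # Per-job error ratio (fraction of non-OK responses)
      - record: job:grpc_error_ratio:5m
        expr: |
          sum by (job)(rate(grpc_server_handled_total{grpc_code!="OK"}[5m]))
          /
          sum by (job)(rate(grpc_server_handled_total[5m]))
```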
grpc_server_handling_seconds
This histogram records the latency of RPCs, with the same label set as grpc_server_handled_total. It captures the full server-side duration from receiving the request to sending the final response. You use it to compute percentiles:
# p99 latency for GetProfile
histogram_quantile(0.99,
rate(grpc_server_handling_seconds_bucket{
grpc_method="GetProfile"
}[5m])
)
# p50 latency across all methods
histogram_quantile(0.50,
sum by (le)(
rate(grpc_server_handling_seconds_bucket[5m])
)
)
The default histogram buckets (.005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10) work for most services, but you should customize them for your latency profile. A service that consistently responds in under 1ms needs finer-grained buckets at the low end. A batch processing service that takes minutes needs buckets extending much higher.
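Custom bucket layouts are usually generated rather than hand-typed. The sketch below mirrors the shape of `prometheus.ExponentialBuckets` with a plain function, producing finer-grained low-end buckets for a sub-millisecond service (the function name is illustrative, not a library API):

```go
package main

import "fmt"

// exponentialBuckets returns count histogram upper bounds starting at
// start, each factor times the previous. This mirrors the behavior of
// prometheus.ExponentialBuckets for illustration.
func exponentialBuckets(start, factor float64, count int) []float64 {
	buckets := make([]float64, count)
	for i := range buckets {
		buckets[i] = start
		start *= factor
	}
	return buckets
}

func main() {
	// 100µs doubling up to ~51ms -- suited to a service that
	// consistently answers in well under 1ms.
	fmt.Println(exponentialBuckets(0.0001, 2, 10))
}
```

You would pass the resulting slice as the Buckets field when constructing the latency HistogramVec.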
Client-Side Metrics
The corresponding client-side metrics are grpc_client_handled_total and grpc_client_handling_seconds. These measure latency as seen by the caller, including network round-trip time, serialization, and any client-side retry overhead. Comparing client-side and server-side latency for the same RPC reveals network or load-balancer-induced delays.
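One way to surface that comparison directly is a PromQL panel that subtracts the server-side p99 from the client-side p99 for the same method; a persistent positive gap points at network, proxy, or load-balancer overhead (query sketch, assuming the standard grpc-prometheus metric names on both sides):

```
# Client-observed p99 minus server-observed p99 for one method
histogram_quantile(0.99,
  sum by (le)(rate(grpc_client_handling_seconds_bucket{grpc_method="GetProfile"}[5m]))
)
-
histogram_quantile(0.99,
  sum by (le)(rate(grpc_server_handling_seconds_bucket{grpc_method="GetProfile"}[5m]))
)
```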
Additional Prometheus Metrics
Beyond the core handled/handling pair, the grpc-prometheus library can also expose:
- grpc_server_started_total -- RPCs that were started but may not have completed (useful for detecting in-flight requests and connection drops)
- grpc_server_msg_received_total / grpc_server_msg_sent_total -- per-message counters for streaming RPCs
Distributed Tracing with gRPC Metadata Propagation
Distributed tracing works by propagating a trace context from service to service. When Service A calls Service B, the trace ID and span ID must travel with the request so that Service B can create a child span linked to Service A's span. In HTTP/1.1, this context is carried in headers. In gRPC, it is carried in metadata -- gRPC's equivalent of HTTP headers, which are key-value pairs sent at the beginning of each RPC.
W3C Trace Context
The W3C Trace Context standard (a W3C Recommendation since 2020) defines two headers for propagation: traceparent and tracestate. In gRPC, these are carried as metadata keys. The traceparent header encodes the trace ID, parent span ID, and trace flags in a compact format:
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
^^-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^-^^^^^^^^^^^^^^^^-^^
version trace-id parent-span-id flags
The trace ID is a 128-bit identifier shared across all spans in the same trace. The parent span ID identifies the specific span that initiated the current request. The flags field indicates whether the trace is sampled (01) or not (00). The tracestate header carries vendor-specific data, allowing multiple tracing systems to coexist.
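To make the field layout concrete, here is a small parser for the traceparent value using only the standard library. This is a sketch for illustration; in real services the OpenTelemetry propagator handles extraction, and `parseTraceparent` is not a library function:

```go
package main

import (
	"fmt"
	"strings"
)

// parseTraceparent splits a W3C traceparent value into its four
// dash-separated fields: version, trace-id (32 hex chars),
// parent-span-id (16 hex chars), and trace-flags.
func parseTraceparent(v string) (version, traceID, spanID, flags string, err error) {
	parts := strings.Split(v, "-")
	if len(parts) != 4 || len(parts[1]) != 32 || len(parts[2]) != 16 {
		return "", "", "", "", fmt.Errorf("malformed traceparent: %q", v)
	}
	return parts[0], parts[1], parts[2], parts[3], nil
}

func main() {
	_, traceID, spanID, flags, err := parseTraceparent(
		"00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	if err != nil {
		panic(err)
	}
	// flags == "01" means the trace is sampled
	fmt.Println(traceID, spanID, flags == "01")
}
```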
When using OpenTelemetry auto-instrumentation, W3C Trace Context propagation is the default. The client-side interceptor injects the traceparent metadata key before sending each RPC, and the server-side interceptor extracts it and creates a child span. You do not need to write any propagation code.
B3 Propagation
The B3 format, originated by Zipkin, is an older propagation standard still widely used, especially in service mesh environments where Envoy or Istio are involved. B3 uses multiple headers (or a single compact header):
# Multi-header format
X-B3-TraceId: 463ac35c9f6413ad48485a3953bb6124
X-B3-SpanId: 0020000000000001
X-B3-ParentSpanId: 0000000000000000
X-B3-Sampled: 1
# Single-header format
b3: 463ac35c9f6413ad48485a3953bb6124-0020000000000001-1
In gRPC, these become metadata keys: x-b3-traceid, x-b3-spanid, etc. OpenTelemetry supports B3 propagation via a configurable propagator. If you are running in a service mesh that uses B3, you configure OTel to use the B3 propagator:
from opentelemetry.propagators.b3 import B3MultiFormat
from opentelemetry import propagate
propagate.set_global_textmap(B3MultiFormat())
You can also compose propagators to support both W3C and B3 simultaneously, which is useful during migrations:
from opentelemetry.propagators.composite import CompositePropagator
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator
propagate.set_global_textmap(CompositePropagator([
TraceContextTextMapPropagator(),
B3MultiFormat(),
]))
gRPC-Specific Propagation Concerns
Unlike HTTP/1.1, gRPC uses HTTP/2 which has specific header handling rules. Metadata keys in gRPC are case-insensitive and are always lowercased on the wire. Binary metadata values must use keys ending in -bin, and the values are base64-encoded automatically. Trace context headers are text-based, so they propagate naturally without the -bin suffix.
For streaming RPCs, the trace context is propagated once when the stream is established. All messages on the stream belong to the same span. If you need finer-grained tracing for individual stream messages, you must create child spans manually within your handler.
Channelz: Runtime Diagnostics
Channelz is a gRPC-specific diagnostic service built into the gRPC runtime itself. Unlike metrics and tracing, which are external observability signals, channelz provides an introspective view into the gRPC transport layer. It exposes real-time information about channels, subchannels, sockets, and servers -- the internal plumbing that connects your services.
What Channelz Exposes
Channels represent client-side connections to a target (e.g., dns:///payment-service:443). Each channel has a connectivity state (IDLE, CONNECTING, READY, TRANSIENT_FAILURE, SHUTDOWN) and tracks call counts, last call timestamps, and the channel's target string.
Subchannels represent connections to individual backend endpoints. When a channel uses a load balancer (round-robin, pick-first, etc.), it creates one subchannel per resolved address. Subchannels show you which specific backends are healthy and which are in TRANSIENT_FAILURE, which is invaluable for debugging load balancing issues.
Sockets expose the lowest level: individual TCP connections with their local and remote addresses, TLS information, streams started/succeeded/failed, messages sent/received, bytes transferred, and flow control window sizes. Socket-level data helps diagnose HTTP/2 flow control issues, where a slow consumer can back-pressure the sender.
Servers show server-side listening sockets with aggregate call counts and the list of active server sockets. This reveals how many clients are connected and the overall call volume.
Enabling Channelz
In Go, channelz is available by importing the channelz service and registering it with your server:
import (
"google.golang.org/grpc/channelz/service"
)
// Register channelz service with your gRPC server
service.RegisterChannelzServiceToServer(server)
You can then query the channelz data via gRPC calls to the grpc.channelz.v1.Channelz service, or use the grpc-zpages web UI to browse it visually. In production, channelz should be exposed on a separate admin port, not on the public-facing server port.
Structured Logging with Interceptors
The most effective pattern for gRPC logging uses interceptors to capture structured log entries for every RPC. Rather than scattering log statements throughout handler code, a logging interceptor provides consistent, comprehensive request/response logging at the framework level.
A well-designed logging interceptor captures:
- The full method name (/package.Service/Method)
- The gRPC status code
- Request and response durations
- The trace ID and span ID (for correlation with traces)
- Peer information (client IP, TLS details)
- Optionally, request and response payloads (redacted for sensitive fields)
Here is a Go logging interceptor that emits structured JSON logs with trace correlation:
func LoggingUnaryInterceptor(
ctx context.Context,
req interface{},
info *grpc.UnaryServerInfo,
handler grpc.UnaryHandler,
) (interface{}, error) {
start := time.Now()
// Extract trace context
spanCtx := trace.SpanContextFromContext(ctx)
// Call the handler
resp, err := handler(ctx, req)
// Extract gRPC status
st, _ := status.FromError(err)
// Emit structured log
slog.InfoContext(ctx, "grpc_request",
"method", info.FullMethod,
"code", st.Code().String(),
"duration_ms", time.Since(start).Milliseconds(),
"trace_id", spanCtx.TraceID().String(),
"span_id", spanCtx.SpanID().String(),
"peer", peerFromContext(ctx),
)
return resp, err
}
The critical detail is including the trace_id in every log line. This allows you to search your log aggregation system (Loki, Elasticsearch, CloudWatch) for a specific trace ID and see all the log lines from all services that participated in that request. Without this correlation, logs from different services are disconnected islands of information.
Payload Logging
Logging request and response payloads is powerful for debugging but must be done carefully. gRPC messages are protocol buffers, which serialize to binary by default. For logging, you need to marshal them to JSON or use proto's text format. You should also redact sensitive fields (passwords, tokens, PII) and truncate large messages to avoid overwhelming your log storage.
// Only log payloads in non-production or for specific methods
if shouldLogPayload(info.FullMethod) {
reqJSON, _ := protojson.Marshal(req.(proto.Message))
slog.Debug("grpc_request_payload",
"method", info.FullMethod,
"payload", string(reqJSON),
"trace_id", spanCtx.TraceID().String(),
)
}
Streaming RPC Logging
Logging streaming RPCs requires wrapping the stream to intercept individual messages. You create a wrapper around grpc.ServerStream that logs each SendMsg and RecvMsg call. This is more complex than unary logging but follows the same pattern: capture the method, message count, timing, and trace context.
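The wrapping pattern looks roughly like this. To keep the sketch self-contained, a local `serverStream` interface stands in for the subset of `grpc.ServerStream` that matters here; real code would embed `grpc.ServerStream` directly and the wrapper would be installed by a stream interceptor:

```go
package main

import "fmt"

// serverStream is a stand-in for the part of grpc.ServerStream this
// sketch needs; in real code, embed grpc.ServerStream.
type serverStream interface {
	SendMsg(m any) error
	RecvMsg(m any) error
}

// loggingStream wraps a stream and counts messages in each direction so
// a completion log line can report per-stream totals.
type loggingStream struct {
	serverStream
	method         string
	sent, received int
}

func (s *loggingStream) SendMsg(m any) error {
	if err := s.serverStream.SendMsg(m); err != nil {
		return err
	}
	s.sent++
	return nil
}

func (s *loggingStream) RecvMsg(m any) error {
	if err := s.serverStream.RecvMsg(m); err != nil {
		return err
	}
	s.received++
	return nil
}

// nopStream lets the sketch run without a real transport.
type nopStream struct{}

func (nopStream) SendMsg(m any) error { return nil }
func (nopStream) RecvMsg(m any) error { return nil }

func main() {
	s := &loggingStream{serverStream: nopStream{}, method: "/pkg.Svc/Watch"}
	for i := 0; i < 3; i++ {
		_ = s.SendMsg(i)
	}
	fmt.Printf("method=%s sent=%d received=%d\n", s.method, s.sent, s.received)
}
```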
Dashboards and Alerting Patterns
Raw observability data is only useful if it is presented in dashboards that surface problems and wired to alerts that wake you up when things break. For gRPC services, dashboards and alerts should be organized around the RED method (Rate, Errors, Duration) and the Four Golden Signals (latency, traffic, errors, saturation).
Essential Dashboard Panels
A standard gRPC service dashboard should include these panels, roughly in this order:
- Request rate -- rate(grpc_server_handled_total[5m]) broken down by method. This is your traffic signal. A sudden drop means clients cannot reach you or have stopped calling. A spike may indicate a retry storm.
- Error rate -- rate(grpc_server_handled_total{grpc_code!="OK"}[5m]) as both an absolute count and as a percentage of total traffic. Break this down by status code to distinguish client errors (INVALID_ARGUMENT, NOT_FOUND) from server errors (INTERNAL, UNAVAILABLE).
- Latency percentiles -- p50, p95, p99, and p999 of grpc_server_handling_seconds. Show these per method and as an aggregate. The gap between p50 and p99 tells you how much tail latency your users experience.
- In-flight requests -- grpc_server_started_total - grpc_server_handled_total (or a dedicated gauge). A rising count of in-flight requests signals that handlers are getting slower and work is queuing up.
- Stream message rates -- for streaming methods, rate(grpc_server_msg_sent_total[5m]) and rate(grpc_server_msg_received_total[5m]). This reveals throughput for long-lived streams.
- Downstream latency -- client-side metrics (grpc_client_handling_seconds) for each downstream dependency. If your service calls three backends, show their latency side by side to quickly identify which dependency is slowing you down.
Alerting Rules
Effective alerts for gRPC services focus on symptoms, not causes. Alert on things that directly impact users, and use dashboards to investigate causes.
# High error rate (more than 5% of requests failing)
- alert: GrpcHighErrorRate
expr: |
sum(rate(grpc_server_handled_total{grpc_code!="OK"}[5m]))
/
sum(rate(grpc_server_handled_total[5m]))
> 0.05
for: 5m
labels:
severity: critical
annotations:
summary: "gRPC error rate above 5%"
# High latency (p99 above 500ms)
- alert: GrpcHighLatency
expr: |
histogram_quantile(0.99,
sum by (le)(rate(grpc_server_handling_seconds_bucket[5m]))
) > 0.5
for: 5m
labels:
severity: warning
# Deadline exceeded spike (often indicates cascading timeouts)
- alert: GrpcDeadlineExceeded
expr: |
rate(grpc_server_handled_total{grpc_code="DeadlineExceeded"}[5m]) > 10
for: 2m
labels:
severity: critical
annotations:
summary: "High rate of deadline exceeded errors -- possible cascading failure"
The DEADLINE_EXCEEDED alert is especially important for gRPC services because deadline propagation is a core gRPC feature. When Service A sets a 2-second deadline and calls Service B, the remaining deadline is automatically propagated via metadata. If Service B calls Service C and the deadline is nearly exhausted, Service C immediately returns DEADLINE_EXCEEDED. A spike in deadline exceeded errors at any point in the call chain usually indicates a latency increase at one of the downstream services, causing cascading timeouts across the entire tree.
Tail-Based Sampling for High-Volume Services
Sampling is the fundamental tradeoff in distributed tracing: storing every trace is expensive, but sampling uniformly at a low rate means you miss the interesting traces -- the ones that are slow, errored, or unusual. Tail-based sampling solves this by making the sampling decision after the trace is complete, based on the actual characteristics of the trace rather than a random coin flip at the start.
How Head-Based Sampling Fails
With head-based sampling, the decision to record a trace is made at the first service in the call chain, before anything interesting has happened. If you sample at 1%, you record 1 in 100 traces. This is fine for understanding general patterns, but it means you only have a 1% chance of capturing any given error. If an error occurs once per 10,000 requests, you would need to wait for roughly 1,000,000 requests to capture a single trace of that error. For rare but critical failures, head-based sampling is effectively useless.
How Tail-Based Sampling Works
Tail-based sampling buffers all trace spans temporarily (typically in the OpenTelemetry Collector) and makes the sampling decision after the root span completes. At that point, the collector knows the full picture: the total duration, the final status code, whether any span in the trace errored, and all the attributes set by application code. Sampling policies can then express rules like:
- Always keep traces with any error status code
- Always keep traces with latency above the p99 threshold
- Always keep traces for specific high-value customers (based on a span attribute)
- Sample healthy, fast traces at 0.1%
This approach dramatically reduces storage costs while ensuring you never miss the traces that matter. A service handling 100,000 requests per second might keep only 100 healthy traces per second but all 50 error traces, resulting in a 99.85% reduction in trace volume without losing any signal about failures.
OpenTelemetry Collector Configuration
The OpenTelemetry Collector supports tail-based sampling via the tail_sampling processor. Here is a configuration that implements the policies described above:
processors:
tail_sampling:
decision_wait: 10s # buffer spans for 10 seconds
num_traces: 100000 # max traces in memory
expected_new_traces_per_sec: 10000
policies:
# Always keep errors
- name: errors
type: status_code
status_code:
status_codes: [ERROR]
# Always keep slow traces
- name: slow-traces
type: latency
latency:
threshold_ms: 500
# Always keep traces for specific methods
- name: critical-methods
type: string_attribute
string_attribute:
key: rpc.method
values: [ProcessPayment, TransferFunds]
# Sample everything else at 1%
- name: baseline
type: probabilistic
probabilistic:
sampling_percentage: 1
The decision_wait parameter is critical: it defines how long the collector buffers spans before making a decision. If your traces typically complete within 5 seconds, a 10-second wait gives adequate margin. Setting it too low risks making decisions before all spans have arrived; setting it too high increases memory usage.
Scaling Tail-Based Sampling
Tail-based sampling has a fundamental challenge: all spans for a given trace must arrive at the same collector instance, because the decision is made per-trace. In a scaled-out collector deployment, you need a load-balancing layer that routes spans by trace ID. The OpenTelemetry Collector's loadbalancing exporter handles this by consistently hashing the trace ID to a specific downstream collector. The architecture looks like this: services export spans to a pool of "gateway" collectors, which use the load-balancing exporter to route spans to a pool of "sampling" collectors, each of which runs the tail_sampling processor and exports kept traces to the backend.
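The routing idea reduces to hashing the trace ID to a collector index so all spans of one trace land on the same instance. A minimal sketch (the real loadbalancing exporter uses consistent hashing over a ring so that scaling the pool reshuffles as few traces as possible; `collectorFor` is illustrative):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// collectorFor deterministically maps a trace ID to one of
// numCollectors sampling-collector instances.
func collectorFor(traceID string, numCollectors int) int {
	h := fnv.New32a()
	h.Write([]byte(traceID))
	return int(h.Sum32()) % numCollectors
}

func main() {
	id := "4bf92f3577b34da6a3ce929d0e0e4736"
	// Every span carrying this trace ID routes to the same collector,
	// so the tail_sampling processor sees the complete trace.
	fmt.Println(collectorFor(id, 4) == collectorFor(id, 4)) // true
}
```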
Putting It All Together: Correlation
The real power of gRPC observability comes from correlating signals. A metric alert fires: "error rate for OrderService.CreateOrder is above 5%." You open the dashboard and see the error rate spiked at 14:32. You switch to the trace view and filter for OrderService.CreateOrder traces with error status between 14:30 and 14:35. You find a trace showing that the InventoryService.ReserveStock call is returning UNAVAILABLE. You click on the span and jump to the logs for that trace ID in Loki. The logs show that the inventory service's database connection pool was exhausted. From alert to root cause in under two minutes.
This workflow requires three things to be in place: (1) all three signals share a common identifier (trace ID), (2) your dashboarding tool supports jumping between metrics, traces, and logs (Grafana does this natively with its Tempo/Loki/Prometheus stack), and (3) your log lines include the trace ID as a structured field.
Operational Best Practices
After instrumenting gRPC services across many teams, several patterns emerge as consistently valuable:
Use Exemplars to Bridge Metrics and Traces
Prometheus exemplars attach a trace ID to specific metric observations. When you see a spike in your latency histogram, the exemplar tells you the trace ID of an actual request that contributed to that spike. Grafana can then link directly from the metric panel to the trace view. This eliminates the guesswork of finding a relevant trace after a metric alert fires.
// Go: record a histogram observation with an exemplar
duration := time.Since(start).Seconds()
if eo, ok := serverHandlingSeconds.With(labels).(prometheus.ExemplarObserver); ok {
eo.ObserveWithExemplar(duration, prometheus.Labels{
"traceID": spanCtx.TraceID().String(),
})
}
Set Meaningful Deadlines
gRPC's deadline propagation is a double-edged sword. When every service sets aggressive deadlines and the call chain is deep, a latency spike at a leaf service causes DEADLINE_EXCEEDED errors to cascade up the entire chain. Monitor deadline exceeded errors at every service in the chain. If service D shows deadline exceeded errors but its own latency is normal, the problem is upstream -- the deadline was already almost expired by the time the request reached service D. Tracing makes this visible: the span for service D shows a very short time between receiving the request and the deadline firing.
Monitor the Interceptor Stack
It is easy to pile up interceptors: logging, metrics, auth, rate limiting, validation, retry. Each interceptor adds latency. Measure the time spent in the interceptor chain versus the actual handler. If 40% of your request latency is interceptor overhead, it is time to consolidate or optimize. Some teams use a single "observability" interceptor that handles metrics, logging, and trace enrichment in one pass.
Separate Admin Ports
Expose metrics endpoints (/metrics), health checks (/healthz), and channelz on a separate port from your application gRPC server. This prevents Prometheus scrapes from appearing in your application metrics, keeps health check traffic out of your access logs, and avoids exposing diagnostic endpoints to external clients. A common pattern is application gRPC on port 9090 and admin HTTP on port 9091.
Version Your Dashboards
Store Grafana dashboard JSON in version control alongside the service code. When a service adds a new method or changes its metrics, the dashboard update is part of the same pull request. This prevents dashboards from drifting out of sync with the service's actual interface. Jsonnet or Grafonnet can template dashboards for consistent structure across services.
Conclusion
gRPC observability is not a single tool but a layered system. Metrics give you the aggregate view: request rates, error rates, latency distributions. Traces give you the per-request view: where time is spent across service boundaries. Logs give you the event-level view: what exactly happened and why. Channelz gives you the transport-level view: connection states, flow control, subchannel health. OpenTelemetry provides the unified instrumentation layer that collects all of these signals with minimal application code changes. And tail-based sampling ensures you keep the traces that matter without drowning in storage costs.
The key architectural insight is correlation. Every log line, every trace span, and every metric exemplar should carry a trace ID. When an alert fires, you should be able to navigate from the metric that triggered the alert to a specific trace to the exact log lines that explain the failure -- all within seconds. That workflow, more than any individual tool, is what makes gRPC services operable at scale.
For the foundational concepts that underpin everything in this article, review How gRPC Works. For details on the interceptor pattern that powers most of the instrumentation described here, see gRPC Interceptors. And for how service meshes like Istio and Linkerd can provide observability without application-level instrumentation, see gRPC and Service Mesh.