gRPC-Gateway and HTTP/JSON Transcoding
gRPC is built on HTTP/2 and Protocol Buffers, delivering compact binary serialization and bidirectional streaming between services. But the moment you need to expose that same API to a browser, a mobile app using plain HTTP, a partner integration expecting JSON, or a legacy system that only speaks REST, you hit a wall. Browsers cannot make native gRPC calls. Curl cannot send Protobuf. Your monitoring dashboard expects JSON. The solution is HTTP/JSON transcoding — a translation layer that sits in front of your gRPC services and maps RESTful HTTP requests to gRPC calls and back again, automatically and with minimal overhead.
Why You Need HTTP/JSON Alongside gRPC
gRPC's binary protocol is excellent for service-to-service communication inside a data center, but it creates friction at every boundary where the consumer cannot speak gRPC natively. There are several common scenarios where transcoding becomes necessary.
Browser Clients
The browser's fetch() API and XMLHttpRequest work over HTTP/1.1 and HTTP/2, but they cannot construct raw HTTP/2 frames with gRPC's framing protocol. gRPC-Web partially addresses this by defining a browser-compatible subset of gRPC, but it still requires a proxy and does not support all streaming modes. For many teams, a clean JSON/REST API that browsers can call directly is simpler than deploying a gRPC-Web proxy and generating client stubs.
Public APIs
When you publish an API for external developers, you cannot mandate that they install a Protobuf compiler, generate language-specific stubs, and learn the gRPC toolchain. REST/JSON is the lingua franca of public APIs — every programming language, every HTTP library, and every developer already knows how to call it. Google, for instance, exposes nearly all of its Cloud APIs through both gRPC and REST, using the same Protobuf service definitions to generate both interfaces.
Legacy Systems and Third-Party Integrations
Enterprise environments are full of systems that speak HTTP/JSON: webhook consumers, API gateways, monitoring tools, log aggregators, CI/CD pipelines, and SaaS integrations. Rewriting every integration point to use gRPC is rarely feasible. Transcoding lets you keep your internal services on gRPC while presenting a JSON interface to everything that needs one.
Tooling and Debugging
Developers can test a REST endpoint with curl, Postman, or a browser. Testing a gRPC endpoint requires grpcurl, Evans, or a custom client with compiled stubs. During development, incident response, and debugging, the ability to quickly fire off an HTTP request and read a JSON response is invaluable.
google.api.http Annotations in Proto Files
The foundation of gRPC transcoding is a set of annotations defined in Google's API design guide. You add google.api.http options to your Protobuf service methods to specify how each RPC maps to an HTTP endpoint. These annotations live in your .proto files, keeping the REST mapping co-located with the service definition.
Here is a concrete example. Suppose you have a service that manages network prefixes:
syntax = "proto3";

package network.v1;

import "google/api/annotations.proto";

service PrefixService {
  // Look up a single prefix by its CIDR block
  rpc GetPrefix(GetPrefixRequest) returns (Prefix) {
    option (google.api.http) = {
      get: "/v1/prefixes/{cidr}"
    };
  }

  // List all prefixes for an autonomous system
  rpc ListPrefixes(ListPrefixesRequest) returns (ListPrefixesResponse) {
    option (google.api.http) = {
      get: "/v1/asns/{asn}/prefixes"
    };
  }

  // Announce a new prefix
  rpc AnnouncePrefix(AnnouncePrefixRequest) returns (Prefix) {
    option (google.api.http) = {
      post: "/v1/prefixes"
      body: "*"
    };
  }

  // Update prefix attributes
  rpc UpdatePrefix(UpdatePrefixRequest) returns (Prefix) {
    option (google.api.http) = {
      patch: "/v1/prefixes/{prefix.cidr}"
      body: "prefix"
    };
  }
}
Each annotation specifies the HTTP method (get, post, put, patch, delete), a URL pattern with path parameters in curly braces, and optionally a body field that controls how the request body maps to the Protobuf message. The annotations are defined in google/api/annotations.proto and google/api/http.proto, which you import from the googleapis repository.
Path Parameters, Query Parameters, and Body Mapping
The transcoding layer applies precise rules to map between HTTP requests and Protobuf messages. Understanding these rules is essential for designing clean APIs.
Path Parameters
Curly-brace segments in the URL pattern bind to fields in the request message. If your URL is /v1/asns/{asn}/prefixes/{prefix_id} and your request message has fields asn (string) and prefix_id (string), an HTTP request to /v1/asns/AS15169/prefixes/8.8.8.0-24 will populate both fields automatically. Path parameters can reference nested fields using dot notation: {prefix.cidr} maps to the cidr field inside a nested prefix sub-message.
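The binding logic can be sketched in a few lines of Go. This is an illustrative simplification, not the grpc-gateway's actual matcher — the real path template grammar also supports wildcards and multi-segment captures:

```go
package main

import (
	"fmt"
	"strings"
)

// bindPathParams matches a concrete URL path against a template like
// "/v1/asns/{asn}/prefixes/{prefix_id}" and returns the bound variables.
// This mirrors, in simplified form, what a transcoder does before
// populating the Protobuf request message.
func bindPathParams(template, path string) (map[string]string, bool) {
	tSegs := strings.Split(strings.Trim(template, "/"), "/")
	pSegs := strings.Split(strings.Trim(path, "/"), "/")
	if len(tSegs) != len(pSegs) {
		return nil, false
	}
	params := map[string]string{}
	for i, seg := range tSegs {
		if strings.HasPrefix(seg, "{") && strings.HasSuffix(seg, "}") {
			// A {var} segment binds the concrete path segment to a field name.
			params[strings.Trim(seg, "{}")] = pSegs[i]
		} else if seg != pSegs[i] {
			// A literal segment must match exactly.
			return nil, false
		}
	}
	return params, true
}

func main() {
	params, ok := bindPathParams("/v1/asns/{asn}/prefixes/{prefix_id}",
		"/v1/asns/AS15169/prefixes/8.8.8.0-24")
	fmt.Println(ok, params["asn"], params["prefix_id"])
}
```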
Query Parameters
Any request message fields that are not bound by a path parameter and not consumed by the body mapping are automatically available as query parameters. For a ListPrefixes RPC with fields asn, page_size, and page_token, and a URL pattern of /v1/asns/{asn}/prefixes, the page_size and page_token fields become query parameters:
GET /v1/asns/AS13335/prefixes?page_size=50&page_token=abc123
This is automatic — you do not need to annotate each query parameter individually. Repeated fields (Protobuf arrays) map to repeated query parameters: ?status=active&status=pending.
Body Mapping
The body field in the annotation controls what portion of the request message is populated from the HTTP request body:
- body: "*" — The entire request message (minus path-bound fields) is mapped from the JSON body. This is the most common choice for POST and PUT methods.
- body: "resource" — Only the resource sub-field of the request message is populated from the body. Other fields come from path or query parameters. This is useful for update operations where the URL identifies the resource and the body carries the new values.
- No body field — No request body is expected. All fields must come from path and query parameters. This is the default for GET and DELETE.
On the response side, the entire response message is serialized to JSON by default. You can use response_body to select a sub-field, though this is uncommon.
The grpc-gateway: Go Reverse Proxy Generator
The grpc-gateway is the most widely used open-source tool for gRPC transcoding. It is a protoc plugin that reads your .proto files with google.api.http annotations and generates a Go reverse proxy server. This proxy accepts RESTful HTTP/JSON requests, translates them into gRPC calls, forwards them to your backend service, and translates the gRPC responses back to JSON.
How It Works
You run protoc with the --grpc-gateway_out plugin, and it generates a .gw.go file for each service. This file contains HTTP handler registration functions that you attach to the grpc-gateway's runtime.ServeMux, which you then mount on a standard net/http server like any other handler. At runtime, the generated code:
- Parses the incoming HTTP request (method, path, query string, body)
- Extracts path parameters and query parameters according to the annotations
- Deserializes the JSON body into the corresponding Protobuf request message
- Makes a gRPC call to the backend service
- Serializes the Protobuf response message to JSON
- Returns the JSON response with appropriate HTTP status codes
The generated proxy handles error mapping automatically: gRPC status codes are translated to HTTP status codes (NOT_FOUND becomes 404, INVALID_ARGUMENT becomes 400, INTERNAL becomes 500, and so on).
Deployment Topology
In production, the grpc-gateway proxy typically runs as a sidecar or a separate service in front of your gRPC servers. Internal service-to-service communication goes directly via gRPC, while external traffic flows through the gateway:
The grpc-gateway can also run in-process alongside your gRPC server. Instead of making network calls, the generated proxy invokes the gRPC service handler directly through Go interfaces, eliminating network round-trip overhead entirely. This is useful for small deployments where running a separate proxy process is unnecessary.
Envoy gRPC-JSON Transcoding Filter
If you are already running Envoy as your service mesh proxy or API gateway, you can use its built-in gRPC-JSON transcoding filter instead of deploying a separate grpc-gateway process. Envoy's transcoding filter reads the same google.api.http annotations from a compiled Protobuf descriptor file and performs the translation at the proxy layer.
The configuration involves two steps. First, you compile your proto files into a binary descriptor set:
protoc --descriptor_set_out=api_descriptor.pb \
--include_imports \
my_service.proto
Then you configure the Envoy filter to use this descriptor:
http_filters:
- name: envoy.filters.http.grpc_json_transcoder
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.grpc_json_transcoder.v3.GrpcJsonTranscoder
    proto_descriptor: "/etc/envoy/api_descriptor.pb"
    services:
    - network.v1.PrefixService
    print_options:
      add_whitespace: true
      always_print_primitive_fields: true
      preserve_proto_field_names: true
Envoy handles the transcoding at the network layer with minimal overhead. Because Envoy is written in C++ and already sits in the data path for many microservice architectures (as the sidecar proxy in Istio, for example), adding transcoding requires no additional network hop. The filter operates within the same connection pipeline that handles TLS termination, load balancing, retries, and observability.
The tradeoff is flexibility. Envoy's transcoding filter does not support some edge cases that the grpc-gateway handles, such as custom error response shaping or middleware hooks. But for straightforward REST-to-gRPC mapping, Envoy's approach is operationally simpler — you do not need to build, deploy, or maintain a separate proxy application.
Cloud Endpoints and API Gateway Transcoding
Managed cloud services offer transcoding as a turnkey feature, removing the operational burden entirely.
Google Cloud Endpoints
Google Cloud Endpoints is built on the Extensible Service Proxy (ESP), which is itself built on Envoy. You upload your compiled proto descriptor and a service configuration file that references your google.api.http annotations. The ESP proxy is deployed alongside your gRPC service (as a sidecar container in Cloud Run or GKE), and it automatically transcodes HTTP/JSON to gRPC. Cloud Endpoints also adds API key validation, authentication, rate limiting, and monitoring — all configured declaratively.
AWS API Gateway
AWS API Gateway does not natively understand Protobuf or google.api.http annotations. To transcode with AWS, you typically deploy a grpc-gateway or a Lambda function that performs the translation, then place API Gateway in front for routing, authentication, and throttling. AWS's approach requires more custom glue code, but it integrates with the broader AWS ecosystem (IAM, CloudWatch, WAF).
Other Managed Gateways
Kong and Apigee both support gRPC proxying, with varying levels of transcoding capability. Kong's grpc-gateway plugin maps HTTP routes to gRPC methods using a proto descriptor, similar to Envoy. Apigee supports gRPC backends through its API proxy configuration, letting you define REST endpoints that forward to gRPC services.
Request and Response Transcoding in Detail
The transcoding process involves precise transformations at multiple layers. Understanding these details helps you design proto definitions that produce clean REST APIs.
JSON Field Name Mapping
Protobuf field names use snake_case by convention, but JSON APIs typically use camelCase. The transcoding layer handles this automatically: a Protobuf field named origin_as appears as originAs in JSON by default. You can control this with the json_name option in your proto definition, or by setting preserve_proto_field_names in the transcoder configuration to keep snake_case in JSON.
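The default name derivation is simple enough to show directly; a small Go function that mirrors the rule (drop each underscore and capitalize the letter that follows):

```go
package main

import (
	"fmt"
	"strings"
)

// toJSONName reproduces Protobuf's default JSON name derivation:
// origin_as becomes originAs. A proto author can override the result
// per field with the json_name option.
func toJSONName(protoName string) string {
	var b strings.Builder
	upperNext := false
	for _, r := range protoName {
		switch {
		case r == '_':
			upperNext = true // drop the underscore, capitalize what follows
		case upperNext:
			b.WriteString(strings.ToUpper(string(r)))
			upperNext = false
		default:
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	fmt.Println(toJSONName("origin_as"))   // originAs
	fmt.Println(toJSONName("rpki_status")) // rpkiStatus
}
```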
Protobuf-JSON Type Mapping
Several Protobuf types have special JSON representations that the transcoder handles:
- int64, uint64, fixed64 — Serialized as JSON strings to avoid precision loss in JavaScript, which only supports 53-bit integers natively.
- bytes — Base64-encoded strings in JSON.
- google.protobuf.Timestamp — RFC 3339 date-time strings (e.g., "2025-01-15T08:30:00Z").
- google.protobuf.Duration — Seconds with nanosecond precision as a string (e.g., "3.5s").
- google.protobuf.Struct — Arbitrary JSON objects, useful for dynamic or untyped data.
- google.protobuf.FieldMask — Comma-separated field paths (e.g., "origin_as,rpki_status").
- Enums — Serialized as strings by default ("VALID" instead of 1), though numeric representation is also accepted.
HTTP Status Code Mapping
gRPC uses a fixed set of status codes that differ from HTTP status codes. The transcoder maps between them:
gRPC Status HTTP Status Meaning
------------------------------------------------------
OK 200 Success
INVALID_ARGUMENT 400 Bad request
UNAUTHENTICATED 401 Missing/invalid auth
PERMISSION_DENIED 403 Forbidden
NOT_FOUND 404 Resource not found
ALREADY_EXISTS 409 Conflict
FAILED_PRECONDITION 400 Precondition failed
RESOURCE_EXHAUSTED 429 Rate limited
CANCELLED 499 Client cancelled
INTERNAL 500 Internal server error
UNIMPLEMENTED 501 Not implemented
UNAVAILABLE 503 Service unavailable
DEADLINE_EXCEEDED 504 Gateway timeout
The transcoder also maps gRPC error detail messages to a JSON error response body, typically in the format {"code": 5, "message": "prefix not found", "details": [...]}. This follows Google's standard error model, but you can customize the error shape in the grpc-gateway through interceptors.
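The mapping in the table is simple enough to express as a lookup; a dependency-free Go sketch (real gateways key off google.golang.org/grpc/codes values rather than strings):

```go
package main

import "fmt"

// httpStatusFor mirrors the gRPC-to-HTTP status table above.
func httpStatusFor(grpcCode string) int {
	m := map[string]int{
		"OK":                  200,
		"INVALID_ARGUMENT":    400,
		"FAILED_PRECONDITION": 400,
		"UNAUTHENTICATED":     401,
		"PERMISSION_DENIED":   403,
		"NOT_FOUND":           404,
		"ALREADY_EXISTS":      409,
		"RESOURCE_EXHAUSTED":  429,
		"CANCELLED":           499,
		"INTERNAL":            500,
		"UNIMPLEMENTED":       501,
		"UNAVAILABLE":         503,
		"DEADLINE_EXCEEDED":   504,
	}
	if s, ok := m[grpcCode]; ok {
		return s
	}
	return 500 // unknown codes fall back to a generic server error
}

func main() {
	fmt.Println(httpStatusFor("NOT_FOUND"))          // 404
	fmt.Println(httpStatusFor("RESOURCE_EXHAUSTED")) // 429
}
```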
Additional Bindings
A single RPC method can be exposed at multiple HTTP endpoints using the additional_bindings option. This is useful for API versioning or when you want multiple URL patterns to invoke the same backend method:
rpc GetPrefix(GetPrefixRequest) returns (Prefix) {
  option (google.api.http) = {
    get: "/v1/prefixes/{cidr}"
    additional_bindings {
      get: "/v1/routes/{cidr}/prefix"
    }
  };
}
Streaming Transcoding Limitations
gRPC supports four communication patterns: unary, server streaming, client streaming, and bidirectional streaming. Transcoding works well for some of these, but not all.
Unary RPCs
A single request and single response — this maps cleanly to standard HTTP request/response semantics. Every transcoding solution handles unary RPCs without issues.
Server Streaming
The server sends multiple response messages for a single request. The grpc-gateway represents this as newline-delimited JSON (NDJSON), where each line is a complete JSON object representing one response message. Envoy uses HTTP chunked transfer encoding to stream responses. This works for use cases like watching BGP route updates in real time or tailing a log stream, but clients must be written to handle streaming responses rather than expecting a single JSON array.
// Server streaming over HTTP:
GET /v1/routes/watch?prefix=8.8.8.0/24
// Response (NDJSON - one JSON object per line):
{"prefix":"8.8.8.0/24","origin_as":"AS15169","event":"UPDATE"}
{"prefix":"8.8.8.0/24","origin_as":"AS15169","event":"WITHDRAW"}
{"prefix":"8.8.8.0/24","origin_as":"AS15169","event":"UPDATE"}
Client Streaming
The client sends multiple request messages and receives a single response. This has no natural HTTP/1.1 equivalent. You cannot send multiple JSON bodies in a single HTTP request without a framing protocol. The grpc-gateway does not support client streaming transcoding. You must either batch the messages into a single request (by redesigning the API with a repeated field) or require clients to use native gRPC for this method.
Bidirectional Streaming
Both client and server stream messages simultaneously. This is fundamentally incompatible with HTTP/1.1 request-response semantics. Neither the grpc-gateway nor Envoy's transcoding filter supports bidirectional streaming over REST. WebSockets could bridge the gap in theory, but none of the standard transcoding tools implement this. For bidirectional streaming, clients must use native gRPC or gRPC-Web.
This limitation is fundamental to the REST model, not a deficiency in the tools. HTTP request-response semantics are inherently half-duplex at the application layer. If your service relies heavily on client or bidirectional streaming, transcoding is not the right approach for those specific methods — expose them through native gRPC or gRPC-Web instead, while transcoding the unary and server-streaming methods for REST consumers.
OpenAPI/Swagger Generation from Proto Files
One of the most valuable side effects of annotating your proto files with google.api.http is the ability to automatically generate an OpenAPI (Swagger) specification for your REST API. The grpc-gateway project includes a companion protoc plugin, protoc-gen-openapiv2, that reads your annotated protos and produces an OpenAPI 2.0 (Swagger) JSON file.
protoc --openapiv2_out=./docs \
--openapiv2_opt=logtostderr=true \
--openapiv2_opt=json_names_for_fields=true \
my_service.proto
The generated specification includes all the HTTP endpoints, request/response schemas derived from Protobuf message definitions, path and query parameters, and even comments from your proto files as descriptions. You can feed this directly into Swagger UI, Redoc, or any other API documentation tool to produce interactive documentation.
This means your Protobuf service definitions become the single source of truth for:
- gRPC client stubs (generated by protoc)
- REST reverse proxy code (generated by protoc-gen-grpc-gateway)
- OpenAPI documentation (generated by protoc-gen-openapiv2)
- Client SDKs for REST consumers (generated from OpenAPI by tools like openapi-generator)
The Buf ecosystem offers a more modern alternative. The buf generate command can run all three plugins in a single invocation, managed by a buf.gen.yaml configuration file. Buf also handles dependency management for the googleapis protos, eliminating the manual setup of importing google/api/annotations.proto.
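As a sketch, a buf.gen.yaml for this setup might look like the following; plugin names, versions, and output paths vary by project, so treat this as illustrative rather than a drop-in config:

```yaml
version: v1
plugins:
  - plugin: go            # gRPC message types
    out: gen/go
    opt: paths=source_relative
  - plugin: go-grpc       # gRPC service stubs
    out: gen/go
    opt: paths=source_relative
  - plugin: grpc-gateway  # REST reverse proxy
    out: gen/go
    opt: paths=source_relative
  - plugin: openapiv2     # Swagger/OpenAPI 2.0 documentation
    out: docs
```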
OpenAPI v3 Support
The grpc-gateway's OpenAPI plugin currently generates OpenAPI 2.0 (Swagger). For OpenAPI 3.0, you can use the gnostic converter to transform the 2.0 spec to 3.0, or use third-party protoc plugins like protoc-gen-openapi (from the Google gnostic project) that produce OpenAPI 3.0 directly. Google's own API tooling (used for Cloud APIs) generates OpenAPI 3.0 from annotated proto files, confirming that the annotation format supports both specification versions.
Performance Overhead of Transcoding
Transcoding introduces overhead at several points. Understanding where the cost lies helps you decide whether it matters for your use case.
Serialization Cost
The dominant overhead is the JSON-to-Protobuf and Protobuf-to-JSON conversion. Protobuf binary encoding is compact and fast to parse. JSON is text-based, verbose, and requires more CPU to parse and generate. For small messages (under a few kilobytes), the difference is negligible — microseconds at most. For large messages (megabytes of data, deeply nested structures, large repeated fields), JSON serialization can consume meaningful CPU time.
Benchmarks consistently show that Protobuf serialization is 3-10x faster than JSON, and the encoded size is 2-5x smaller. But in a transcoding scenario, you pay the JSON cost only on the external-facing edge. Internal service-to-service calls still use native Protobuf, so the performance impact is limited to the boundary.
Network Overhead
If the transcoding proxy runs as a separate process, there is an additional network hop between the proxy and the gRPC backend. This adds latency (typically sub-millisecond on a local network, a few milliseconds cross-zone in a cloud environment) and consumes bandwidth for the internal gRPC call. Running the proxy in-process (grpc-gateway's in-process mode) or as an Envoy sidecar (on the same pod/VM) minimizes this overhead.
Memory and CPU
The transcoding proxy must buffer the full JSON request body before it can deserialize it into a Protobuf message (JSON is not streamable in the same way Protobuf is, because field ordering and nesting require seeing the complete object). For very large request bodies, this means the proxy's memory usage scales with request size. gRPC's native streaming avoids this issue by processing messages incrementally.
Practical Impact
For the vast majority of APIs, the transcoding overhead is not the bottleneck. Database queries, network calls to downstream services, and business logic dominate latency. The transcoding layer typically adds 0.5-2ms of latency for unary calls with reasonably sized payloads. At Google scale, where Cloud APIs serve millions of requests per second through transcoding, the architecture has proven viable for production workloads.
If you are optimizing for absolute minimum latency on the external API, consider:
- Running the transcoding proxy in-process or as a sidecar to eliminate the network hop
- Using Envoy's C++ transcoding filter rather than the Go-based grpc-gateway for lower per-request CPU cost
- Enabling HTTP/2 between the client and the transcoding proxy to avoid head-of-line blocking
- Caching frequently requested resources at the CDN or proxy layer to bypass transcoding entirely
Designing Proto Files for Good REST APIs
Not every Protobuf service definition produces a clean REST API through transcoding. Following a few design principles ensures the generated REST endpoints feel natural to consumers who have never seen your proto files.
Use Resource-Oriented Design
Structure your services around resources (nouns) rather than actions (verbs). Instead of rpc ActivatePrefix(ActivatePrefixRequest) mapped to POST /v1/activatePrefix, model it as rpc UpdatePrefix(UpdatePrefixRequest) mapped to PATCH /v1/prefixes/{id} with an active field. This follows the standard REST conventions that API consumers expect.
Follow the AIP (API Improvement Proposals)
Google's AIP guidelines (aip.dev) codify best practices for designing APIs that work well with both gRPC and REST. Key AIPs include standard methods (AIP-131 through AIP-135 for Get, List, Create, Update, Delete), field masks for partial updates (AIP-134), filtering and pagination (AIP-132 and AIP-158), and error handling (AIP-193). Following these patterns produces consistent, predictable REST endpoints.
Keep Messages Flat for Simple Endpoints
Deeply nested Protobuf messages create complex JSON structures. Where possible, flatten request messages so that the HTTP mapping is straightforward. A request with top-level fields maps cleanly to path parameters, query parameters, and a simple JSON body. A request that requires navigating three levels of nesting to set a field creates friction for REST consumers.
End-to-End Workflow
Putting it all together, here is the typical workflow for adding gRPC transcoding to a service:
- Define your service in .proto files with google.api.http annotations on each RPC method.
- Generate gRPC server stubs in your language of choice using protoc.
- Implement the gRPC service with your business logic.
- Generate the transcoding layer — either grpc-gateway Go code, an Envoy descriptor, or a cloud service configuration.
- Generate OpenAPI documentation using protoc-gen-openapiv2.
- Deploy the gRPC service and the transcoding proxy together.
- Test the REST API using curl or Postman, and the gRPC API using grpcurl or generated client stubs.
The key insight is that you write your service logic once, in your gRPC implementation. The REST API is entirely derived — no separate REST controllers, no manual JSON parsing, no duplicate validation logic. When you add a field to your Protobuf message, it automatically appears in both the gRPC and REST interfaces. When you add a new RPC method with HTTP annotations, a new REST endpoint appears.
When Not to Transcode
Transcoding is not always the right choice. If your API is exclusively consumed by services you control and all of them can speak gRPC, transcoding adds complexity without benefit. If your API relies heavily on client streaming or bidirectional streaming, the transcoded REST surface will be incomplete, creating a confusing experience where some methods work over REST and others do not. If you need to support REST-specific patterns that do not map to gRPC semantics — like content negotiation, HATEOAS links, or multipart file uploads — a custom REST layer may be more appropriate than transcoding.
For everything else — public APIs that need broad accessibility, internal services with mixed client ecosystems, gradual migrations from REST to gRPC, or teams that want the type safety of Protobuf with the convenience of JSON — transcoding is one of the most effective patterns in the gRPC ecosystem.
Further Reading
- How gRPC Works — understand the protocol that transcoding translates from
- gRPC vs REST — compare the two API styles and when to use each
- How gRPC-Web Works — an alternative approach for browser clients
- How Protocol Buffers Work — the serialization format underlying gRPC