gRPC Service Design
gRPC is an RPC framework built on HTTP/2 and Protocol Buffers. You define services and methods in a .proto file, generate server and client code, and get type-safe, high-performance remote procedure calls. Unlike REST, where you model resources and use HTTP verbs, gRPC models services and methods — you call functions on remote servers as if they were local.
Service Definition
A gRPC service is defined in a .proto file alongside the message types it uses:
syntax = "proto3";
package orders.v1;
import "google/protobuf/timestamp.proto";
import "google/protobuf/field_mask.proto";
service OrderService {
rpc CreateOrder (CreateOrderRequest) returns (Order);
rpc GetOrder (GetOrderRequest) returns (Order);
rpc ListOrders (ListOrdersRequest) returns (ListOrdersResponse);
rpc UpdateOrder (UpdateOrderRequest) returns (Order);
rpc CancelOrder (CancelOrderRequest) returns (Order);
}
message CreateOrderRequest {
string customer_id = 1;
repeated LineItem items = 2;
string currency = 3;
string idempotency_key = 4;
}
message GetOrderRequest {
string order_id = 1;
}
message ListOrdersRequest {
string customer_id = 1;
int32 page_size = 2;
string page_token = 3;
OrderStatus status_filter = 4;
}
message ListOrdersResponse {
repeated Order orders = 1;
string next_page_token = 2;
int32 total_count = 3;
}
message UpdateOrderRequest {
Order order = 1;
google.protobuf.FieldMask update_mask = 2;
}
message CancelOrderRequest {
string order_id = 1;
string reason = 2;
}
Each rpc line declares a method: its name, request type, and response type. The generated server code provides an interface to implement; the generated client code provides methods to call.
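To make the split between the generated interface and the hand-written implementation concrete, here is a minimal Python sketch. It does not use real generated code: the dataclasses and the OrderServiceBase class are hypothetical stand-ins for what the protobuf compiler would emit, the Order fields are assumed (the definition above does not show them), and the in-memory dictionaries exist only for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
import uuid


# Hypothetical stand-ins for generated message classes (fields are assumptions).
@dataclass
class Order:
    order_id: str
    customer_id: str
    currency: str


@dataclass
class CreateOrderRequest:
    customer_id: str
    currency: str
    idempotency_key: str = ""


@dataclass
class GetOrderRequest:
    order_id: str


class OrderServiceBase(ABC):
    """Stand-in for the generated server interface: one method per rpc."""

    @abstractmethod
    def CreateOrder(self, request: CreateOrderRequest) -> Order: ...

    @abstractmethod
    def GetOrder(self, request: GetOrderRequest) -> Order: ...


class InMemoryOrderService(OrderServiceBase):
    """Hand-written implementation of the interface, backed by dictionaries."""

    def __init__(self) -> None:
        self._orders = {}                # order_id -> Order
        self._by_idempotency_key = {}    # idempotency_key -> order_id

    def CreateOrder(self, request: CreateOrderRequest) -> Order:
        key = request.idempotency_key
        # A repeated idempotency key returns the original order, not a duplicate.
        if key and key in self._by_idempotency_key:
            return self._orders[self._by_idempotency_key[key]]
        order = Order(str(uuid.uuid4()), request.customer_id, request.currency)
        self._orders[order.order_id] = order
        if key:
            self._by_idempotency_key[key] = order.order_id
        return order

    def GetOrder(self, request: GetOrderRequest) -> Order:
        # A real handler would report NOT_FOUND; here a KeyError stands in for it.
        return self._orders[request.order_id]
```

Calling CreateOrder twice with the same idempotency_key returns the same order, which is the behavior the idempotency_key field in CreateOrderRequest is meant to enable when clients retry.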
RPC Types
gRPC supports four communication patterns.
Unary RPCs
One request, one response. This is the standard request-response pattern, equivalent to a REST call:
rpc GetOrder (GetOrderRequest) returns (Order);
The client sends a request and waits for a response. This covers the majority of use cases: fetching data, creating resources, updating state.
Server Streaming
One request, many responses. The client sends a single request, and the server sends back a stream of messages:
rpc StreamOrderUpdates (StreamOrderUpdatesRequest) returns (stream OrderUpdate);
message StreamOrderUpdatesRequest {
string order_id = 1;
}
message OrderUpdate {
string order_id = 1;
OrderStatus previous_status = 2;
OrderStatus new_status = 3;
google.protobuf.Timestamp timestamp = 4;
}
Use cases: real-time notifications, log tailing, progress updates for long-running operations. The client opens the stream and receives updates as they happen, without polling.
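In Python gRPC, a server-streaming handler is typically written as a generator that yields one response per message. The sketch below imitates that shape without the gRPC library: the status_events argument is a hypothetical source of status changes, the starting status "PENDING" is an assumption, and the timestamp field is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class OrderUpdate:
    # Simplified stand-in for the OrderUpdate message (timestamp omitted).
    order_id: str
    previous_status: str
    new_status: str


def stream_order_updates(order_id: str, status_events: Iterable[str]) -> Iterator[OrderUpdate]:
    """Yield one OrderUpdate for each actual status transition."""
    previous = "PENDING"  # assumed initial status
    for new_status in status_events:
        if new_status == previous:
            continue  # no transition, so nothing is sent to the client
        yield OrderUpdate(order_id, previous, new_status)
        previous = new_status
```

Because the handler is a generator, each update is produced only when the consumer asks for the next one, which mirrors how the client receives updates as they happen.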
Client Streaming
Many requests, one response. The client sends a stream of messages and the server responds with a single message after the client finishes:
rpc UploadOrderItems (stream LineItem) returns (UploadSummary);
message UploadSummary {
int32 items_received = 1;
int32 items_accepted = 2;
int32 items_rejected = 3;
}
Use cases: batch uploads, file streaming, aggregating data from a client. The server processes items as they arrive and returns a summary when the stream closes.
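A client-streaming handler has the opposite shape: it consumes an iterator of requests and returns a single response. The following sketch shows that shape in plain Python. The LineItem fields (sku, quantity) and the validation rule are assumptions made for illustration, since the source does not define LineItem.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class LineItem:
    # Hypothetical fields; the real LineItem message is not shown in the text.
    sku: str
    quantity: int


@dataclass
class UploadSummary:
    items_received: int = 0
    items_accepted: int = 0
    items_rejected: int = 0


def upload_order_items(items: Iterable[LineItem]) -> UploadSummary:
    """Process items as they arrive and return one summary at end of stream."""
    summary = UploadSummary()
    for item in items:
        summary.items_received += 1
        # Assumed rule: an item needs a non-empty SKU and a positive quantity.
        if item.sku and item.quantity > 0:
            summary.items_accepted += 1
        else:
            summary.items_rejected += 1
    return summary
```

The function never holds the full batch in memory; it keeps only the running counters, which is the main advantage over sending one large unary request.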
Bidirectional Streaming
Many requests, many responses. Both sides send and receive messages independently:
rpc Chat (stream ChatMessage) returns (stream ChatMessage);
Use cases: chat systems, collaborative editing, interactive debugging sessions. Both sides send messages whenever they have data, without waiting for the other side.
Bidirectional streaming is the most complex pattern. Use it only when the other three patterns are insufficient.
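To illustrate why the two directions are independent, here is a plain-Python sketch of a bidirectional handler: it takes an iterator of incoming messages and yields outgoing ones, with no one-to-one pairing between them. The ChatMessage fields and the "/ping" convention are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class ChatMessage:
    # Hypothetical fields; the real ChatMessage message is not shown in the text.
    sender: str
    text: str


def chat(incoming: Iterable[ChatMessage]) -> Iterator[ChatMessage]:
    """Yield server messages independently of how many client messages arrive."""
    # The server may speak first, before reading anything from the client.
    yield ChatMessage("server", "welcome")
    for message in incoming:
        if message.text == "/ping":
            yield ChatMessage("server", "pong")
        # Any other message is consumed without producing a reply.
```

Here two client messages can produce two server messages, one, or none, which is part of what makes error handling and testing harder than in the other three patterns.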
Deadlines & Timeouts
Every gRPC call should have a deadline. A deadline is an absolute point in time by which the call must complete. If the deadline passes, the call fails with DEADLINE_EXCEEDED.
Client sets deadline: now + 5 seconds
Client sends request to Service A
Service A calls Service B (propagates remaining deadline: 4.8 seconds)
Service B calls Service C (propagates remaining deadline: 4.2 seconds)
Deadlines propagate through the call chain. If the client sets a 5-second deadline and Service A takes 3 seconds, Service B has 2 seconds remaining. This prevents cascading timeouts where a slow downstream service holds up the entire chain.
Without deadlines, a failed service causes requests to pile up indefinitely. Servers run out of resources. Clients hang forever. Set deadlines on every call.
Servers should check the remaining deadline before starting expensive operations, a practice that the gRPC project's own guidance on deadlines also recommends. If the deadline has already passed or too little time remains, return DEADLINE_EXCEEDED immediately rather than wasting work.
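The deadline arithmetic described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a gRPC API: the Deadline class and DeadlineExceeded exception are hypothetical helpers, and in a real service the framework supplies the remaining time (in Python gRPC, for example, through the servicer context).

```python
import time


class DeadlineExceeded(Exception):
    """Stand-in for a call failing with DEADLINE_EXCEEDED."""


class Deadline:
    """An absolute point in time, created from a relative timeout."""

    def __init__(self, timeout_seconds: float, clock=time.monotonic) -> None:
        self._clock = clock
        self._expires_at = clock() + timeout_seconds

    def remaining(self) -> float:
        """Seconds left; this is the value to propagate to downstream calls."""
        return max(0.0, self._expires_at - self._clock())

    def check(self, needed_seconds: float = 0.0) -> None:
        """Fail fast if the deadline has passed or too little time is left."""
        remaining = self.remaining()
        if remaining <= 0.0 or remaining < needed_seconds:
            raise DeadlineExceeded(
                f"{remaining:.3f}s left, {needed_seconds:.3f}s needed"
            )
```

A handler would call check() with a rough estimate of the cost before starting expensive work, and pass remaining() as the timeout of any downstream call, so that Service B never gets more time than the original client allowed.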
Metadata
gRPC metadata is the equivalent of HTTP headers. It carries key-value pairs alongside the request and response, outside the protobuf message.
Common metadata uses:
- Authentication: `authorization: Bearer <token>`
- Request tracing: `x-request-id: abc123`, `traceparent: 00-...`
- Client version: `x-client-version: 2.1.0`
- Idempotency keys: `idempotency-key: order-456-attempt-1`
Clients send metadata once, at the start of a call. Servers can send initial metadata (before the first response message) and trailing metadata (after the last message, alongside the final status). Trailing metadata is where servers typically put status details and debugging information.
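As a small illustration of working with metadata, the sketch below treats it as a sequence of key-value pairs, which is how Python gRPC exposes it to handlers. The helper names get_metadata and bearer_token are hypothetical, and the sketch only extracts the token; it does not validate it.

```python
from typing import Optional, Sequence, Tuple

Metadata = Sequence[Tuple[str, str]]


def get_metadata(metadata: Metadata, key: str) -> Optional[str]:
    """Return the first value stored under key, compared case-insensitively."""
    wanted = key.lower()
    for name, value in metadata:
        if name.lower() == wanted:
            return value
    return None


def bearer_token(metadata: Metadata) -> Optional[str]:
    """Extract the token from an 'authorization: Bearer <token>' entry."""
    value = get_metadata(metadata, "authorization")
    if value is None:
        return None
    scheme, _, token = value.partition(" ")
    if scheme.lower() != "bearer" or not token:
        return None
    return token
```

Keeping this logic in a helper (or an interceptor) rather than in each handler keeps authentication and tracing concerns out of the protobuf request messages.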
Status Codes
gRPC has its own status code system, separate from HTTP status codes:
| Code | Name | When to Use |
|---|---|---|
| 0 | OK | Success |
| 1 | CANCELLED | Client canceled the request |
| 2 | UNKNOWN | Server error without a specific code |
| 3 | INVALID_ARGUMENT | Client sent bad input (like HTTP 400) |
| 4 | DEADLINE_EXCEEDED | Operation timed out |
| 5 | NOT_FOUND | Resource does not exist (like HTTP 404) |
| 6 | ALREADY_EXISTS | Resource already exists (like HTTP 409) |
| 7 | PERMISSION_DENIED | Caller lacks permission (like HTTP 403) |
| 8 | RESOURCE_EXHAUSTED | Quota, rate limit, or other resource exhausted (like HTTP 429) |
| 9 | FAILED_PRECONDITION | System not in required state for the operation |
| 10 | ABORTED | Concurrency conflict, retry may succeed |
| 11 | OUT_OF_RANGE | Value outside valid range |
| 12 | UNIMPLEMENTED | Method not implemented |
| 13 | INTERNAL | Internal server error (like HTTP 500) |
| 14 | UNAVAILABLE | Service temporarily unavailable, retry with backoff |
| 15 | DATA_LOSS | Unrecoverable data loss or corruption |
| 16 | UNAUTHENTICATED | Missing or invalid authentication (like HTTP 401) |
The most commonly used: OK, INVALID_ARGUMENT, NOT_FOUND, PERMISSION_DENIED, INTERNAL, UNAVAILABLE, and UNAUTHENTICATED.
Use UNAVAILABLE (not INTERNAL) for transient errors that the client should retry. Use FAILED_PRECONDITION for logical errors where the client needs to change state before retrying (e.g., "account must be verified before placing an order").
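The distinction between retryable and non-retryable codes is what makes client retry logic possible. The following is a minimal sketch of that idea in plain Python: the StatusCode enum copies a few numeric values from the table above, while RpcError, call_with_retry, and the backoff parameters are hypothetical and not part of any gRPC library.

```python
import time
from enum import IntEnum


class StatusCode(IntEnum):
    OK = 0
    INVALID_ARGUMENT = 3
    NOT_FOUND = 5
    FAILED_PRECONDITION = 9
    INTERNAL = 13
    UNAVAILABLE = 14


# Assumed policy: only transient unavailability is retried automatically.
RETRYABLE = {StatusCode.UNAVAILABLE}


class RpcError(Exception):
    """Stand-in for a failed call carrying a status code."""

    def __init__(self, code: StatusCode) -> None:
        super().__init__(code.name)
        self.code = code


def call_with_retry(call, max_attempts=3, base_delay=0.1, max_delay=2.0, sleep=time.sleep):
    """Invoke call(), retrying retryable failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RpcError as err:
            last_attempt = attempt == max_attempts - 1
            if err.code not in RETRYABLE or last_attempt:
                raise  # the caller must fix the request or the system state
            # Delay doubles on each attempt, capped at max_delay.
            sleep(min(max_delay, base_delay * 2 ** attempt))
```

If a server returned INTERNAL for a transient failure, this client would give up immediately; if it returned UNAVAILABLE for bad input, the client would retry a request that can never succeed. A production policy would usually add jitter to the delay and respect the call's deadline.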
Rich Error Details
The status code alone is often insufficient. gRPC supports rich error details using the google.rpc.Status message:
// The error includes structured details
message Status {
int32 code = 1;
string message = 2;
repeated google.protobuf.Any details = 3;
}
Google provides standard error detail types:
- BadRequest: field violations, each with a field name and description
- PreconditionFailure: preconditions that were not met
- RetryInfo: how long to wait before retrying
- ResourceInfo: which resource was not found or already exists
This is similar to Stripe's structured error responses in REST: machine-readable codes, human-readable messages, and specific field-level details.
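To show how a status and its details fit together, here is a plain-Python sketch that models Status and BadRequest as dataclasses. These classes are simplified stand-ins, not the generated google.rpc types (for instance, details holds objects directly rather than google.protobuf.Any), and the validation rules in validate_create_order are assumptions for illustration.

```python
from dataclasses import dataclass, field as dc_field
from typing import List, Optional

INVALID_ARGUMENT = 3  # numeric value from the status code table


@dataclass
class FieldViolation:
    field: str
    description: str


@dataclass
class BadRequest:
    field_violations: List[FieldViolation] = dc_field(default_factory=list)


@dataclass
class Status:
    code: int
    message: str
    details: List[object] = dc_field(default_factory=list)


def validate_create_order(customer_id: str, currency: str) -> Optional[Status]:
    """Return None for valid input, or a Status listing every violation."""
    violations = []
    if not customer_id:
        violations.append(FieldViolation("customer_id", "must not be empty"))
    if len(currency) != 3 or not currency.isalpha():
        violations.append(FieldViolation("currency", "must be a 3-letter code"))
    if not violations:
        return None
    return Status(
        INVALID_ARGUMENT,
        "invalid CreateOrderRequest",
        [BadRequest(violations)],
    )
```

Reporting all violations in one response lets a client highlight every invalid field at once instead of discovering them one failed call at a time.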
gRPC-Web for Browser Clients
Standard gRPC depends on HTTP/2 features, notably trailers and fine-grained control over HTTP/2 frames, that browser APIs do not expose to application code. gRPC-Web is a protocol adaptation that works in browsers:
Browser --> gRPC-Web (HTTP/1.1 or HTTP/2) --> Envoy Proxy --> gRPC (HTTP/2)
A proxy (typically Envoy) translates between gRPC-Web and standard gRPC. The browser client uses a generated gRPC-Web client that looks similar to a standard gRPC client but uses a different wire format.
gRPC-Web supports unary and server streaming RPCs. Client streaming and bidirectional streaming are not supported in the browser environment.
If your primary consumers are browsers, REST or GraphQL is simpler. gRPC-Web is for organizations that want a unified RPC framework and are willing to run the proxy infrastructure.
Load Balancing
gRPC uses persistent HTTP/2 connections with multiplexed streams. This creates a load balancing challenge: a traditional L4 (TCP) load balancer assigns a connection to a backend, and all RPCs on that connection go to the same backend.
Client-Side Load Balancing
The client maintains a list of backends and distributes RPCs across them. This is common in Kubernetes with headless services:
Client discovers backends: [backend-1:50051, backend-2:50051, backend-3:50051]
RPC 1 -> backend-1
RPC 2 -> backend-2
RPC 3 -> backend-3
RPC 4 -> backend-1 (round-robin)
gRPC clients have built-in support for client-side load balancing with pluggable policies (round-robin, weighted, pick-first).
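The round-robin behavior shown above can be sketched with a few lines of Python. RoundRobinPicker is a hypothetical class written for illustration; real gRPC clients select a built-in policy through configuration rather than hand-written code, and they also handle backends that appear, disappear, or fail.

```python
import itertools
from typing import Iterable


class RoundRobinPicker:
    """Hand out backends in a fixed rotation, one per RPC."""

    def __init__(self, backends: Iterable[str]) -> None:
        backend_list = list(backends)
        if not backend_list:
            raise ValueError("at least one backend is required")
        self._rotation = itertools.cycle(backend_list)

    def pick(self) -> str:
        """Return the backend that should receive the next RPC."""
        return next(self._rotation)
```

With three backends, the fourth call to pick() wraps around to the first backend, matching the RPC 4 line in the listing above.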
Proxy-Based Load Balancing
An L7 (application-level) proxy inspects individual gRPC requests and distributes them across backends. Envoy, Linkerd, and Istio all support gRPC-aware load balancing:
Client --> Envoy (L7 proxy) --> backend-1
                            --> backend-2
                            --> backend-3
The proxy sees each RPC as a separate request and can balance per-RPC, not per-connection. This is transparent to the client.
Recommendation: use proxy-based load balancing unless you have a specific reason to push balancing logic into clients. Proxies are easier to operate and update without changing client code.
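The difference between per-connection and per-RPC balancing is easiest to see with numbers. The simulation below is a deliberately simplified sketch: each client is assumed to open exactly one long-lived connection, both balancers use round-robin, and the function names are hypothetical.

```python
from itertools import cycle
from typing import Dict, List


def l4_distribution(rpcs_per_client: List[int], backends: List[str]) -> Dict[str, int]:
    """Per-connection balancing: a backend is chosen once per client connection."""
    counts = {backend: 0 for backend in backends}
    rotation = cycle(backends)
    for rpc_count in rpcs_per_client:
        backend = next(rotation)       # decided when the connection is opened
        counts[backend] += rpc_count   # every RPC on it lands on that backend
    return counts


def l7_distribution(rpcs_per_client: List[int], backends: List[str]) -> Dict[str, int]:
    """Per-RPC balancing: a backend is chosen for every individual RPC."""
    counts = {backend: 0 for backend in backends}
    rotation = cycle(backends)
    for rpc_count in rpcs_per_client:
        for _ in range(rpc_count):
            counts[next(rotation)] += 1
    return counts
```

With two clients sending 90 and 10 RPCs to three backends, l4_distribution places all 90 RPCs on one backend and leaves another idle, while l7_distribution spreads the same 100 RPCs almost evenly.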
Common Pitfalls
- No deadlines: every gRPC call without a deadline is a potential resource leak. Clients hang, servers accumulate connections, and the system degrades under load. Always set deadlines.
- Wrong status codes: returning `INTERNAL` for client errors, or `UNKNOWN` when a more specific code applies. Use `INVALID_ARGUMENT` for bad input, `NOT_FOUND` for missing resources, and `UNAVAILABLE` for transient failures.
- Too many streaming RPCs: streaming adds complexity to error handling, backpressure, and connection management. Use unary RPCs for request-response patterns and streaming only when data genuinely arrives over time.
- Ignoring backpressure: a server streams faster than the client can consume. gRPC provides flow control via HTTP/2 windows, but application logic must respect it. Monitor stream buffer sizes.
- L4 load balancing with persistent connections: TCP-level load balancers assign an entire HTTP/2 connection to one backend, creating hot spots. Use L7 proxies or client-side balancing.
- Large messages: gRPC limits received messages to 4 MB by default. Sending larger messages requires configuration changes and often indicates a design problem. Paginate or stream instead.
- No health checks: gRPC has a standard health checking protocol (`grpc.health.v1.Health`). Implement it so load balancers and orchestrators can detect unhealthy instances.
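For the large-message pitfall, one way to stream instead is to split the payload into chunks and send each chunk as a separate message on a client-streaming or server-streaming RPC. The helper below is a minimal sketch of the splitting step only; chunk_payload and its 1 MiB default are assumptions, chosen to stay well under the 4 MB default limit.

```python
from typing import Iterator


def chunk_payload(payload: bytes, chunk_size: int = 1024 * 1024) -> Iterator[bytes]:
    """Yield consecutive slices of payload, each at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(payload), chunk_size):
        yield payload[start:start + chunk_size]
```

The receiving side reassembles the chunks in order, or better, processes each one as it arrives so that neither side has to hold the full payload in memory.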
Key Takeaways
- gRPC services are defined in `.proto` files with `service` and `rpc` declarations. The generated code provides type-safe clients and server interfaces.
- Four RPC types: unary (one-to-one), server streaming (one-to-many), client streaming (many-to-one), and bidirectional streaming (many-to-many). Use unary for most operations; add streaming only when needed.
- Always set deadlines on gRPC calls. Deadlines propagate through the call chain and prevent cascading failures.
- Use the correct status code: `INVALID_ARGUMENT` for bad input, `NOT_FOUND` for missing resources, `UNAVAILABLE` for transient errors, `PERMISSION_DENIED` for authorization failures.
- gRPC-Web enables browser access but requires a proxy (Envoy) and does not support all streaming modes. For browser-first APIs, REST or GraphQL is simpler.
- Use L7 (application-level) load balancing for gRPC. L4 load balancers do not distribute individual RPCs across backends because gRPC multiplexes streams over persistent connections.