gRPC Service Design

gRPC is an RPC framework built on HTTP/2 and Protocol Buffers. You define services and methods in a .proto file, generate server and client code, and get type-safe remote procedure calls with a compact binary encoding. Unlike REST, where you model resources and use HTTP verbs, gRPC models services and methods: calling a remote method looks like calling a local function, although it can still fail or time out like any other network call.

Service Definition

A gRPC service is defined in a .proto file alongside the message types it uses:

syntax = "proto3";

package orders.v1;

import "google/protobuf/timestamp.proto";
import "google/protobuf/field_mask.proto";

service OrderService {
  rpc CreateOrder (CreateOrderRequest) returns (Order);
  rpc GetOrder (GetOrderRequest) returns (Order);
  rpc ListOrders (ListOrdersRequest) returns (ListOrdersResponse);
  rpc UpdateOrder (UpdateOrderRequest) returns (Order);
  rpc CancelOrder (CancelOrderRequest) returns (Order);
}

message CreateOrderRequest {
  string customer_id = 1;
  repeated LineItem items = 2;
  string currency = 3;
  string idempotency_key = 4;
}

message GetOrderRequest {
  string order_id = 1;
}

message ListOrdersRequest {
  string customer_id = 1;
  int32 page_size = 2;
  string page_token = 3;
  OrderStatus status_filter = 4;
}

message ListOrdersResponse {
  repeated Order orders = 1;
  string next_page_token = 2;
  int32 total_count = 3;
}

message UpdateOrderRequest {
  Order order = 1;
  google.protobuf.FieldMask update_mask = 2;
}

message CancelOrderRequest {
  string order_id = 1;
  string reason = 2;
}

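Each rpc line declares a method: its name, request type, and response type. The Order, LineItem, and OrderStatus types referenced above are not shown here and would be defined in the same package. The generated server code provides an interface to implement; the generated client code provides methods to call.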
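To make the split between "interface to implement" and "methods to call" concrete, here is a rough Python sketch of that shape. It is not real generated code: GetOrderRequest, Order, OrderServiceBase, InMemoryOrderService, and OrderServiceClient are hand-written stand-ins, the Order fields are assumed, and the client calls the server object directly instead of going over the network.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class GetOrderRequest:
    order_id: str


@dataclass
class Order:
    # Assumed fields; the Order message is not shown in the .proto above.
    order_id: str
    customer_id: str


class OrderServiceBase(ABC):
    """Stand-in for the generated server interface: one method per rpc."""

    @abstractmethod
    def GetOrder(self, request: GetOrderRequest) -> Order:
        ...


class InMemoryOrderService(OrderServiceBase):
    """The part you write: the implementation behind the interface."""

    def __init__(self) -> None:
        self._orders = {"o-1": Order(order_id="o-1", customer_id="c-9")}

    def GetOrder(self, request: GetOrderRequest) -> Order:
        return self._orders[request.order_id]


class OrderServiceClient:
    """Stand-in for the generated client.

    A real client would serialize the request and send it over HTTP/2;
    this one simply forwards the call to a local object.
    """

    def __init__(self, server: OrderServiceBase) -> None:
        self._server = server

    def GetOrder(self, request: GetOrderRequest) -> Order:
        return self._server.GetOrder(request)
```

The caller only ever sees OrderServiceClient.GetOrder, which is what makes a remote call read like a local one.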

RPC Types

gRPC supports four communication patterns.

Unary RPCs

One request, one response. This is the standard request-response pattern, equivalent to a REST call:

rpc GetOrder (GetOrderRequest) returns (Order);

The client sends a request and waits for a response. This covers the majority of use cases: fetching data, creating resources, updating state.

Server Streaming

One request, many responses. The client sends a single request, and the server sends back a stream of messages:

rpc StreamOrderUpdates (StreamOrderUpdatesRequest) returns (stream OrderUpdate);

message StreamOrderUpdatesRequest {
  string order_id = 1;
}

message OrderUpdate {
  string order_id = 1;
  OrderStatus previous_status = 2;
  OrderStatus new_status = 3;
  google.protobuf.Timestamp timestamp = 4;
}

Use cases: real-time notifications, log tailing, progress updates for long-running operations. The client opens the stream and receives updates as they happen, without polling.
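In the Python gRPC library, a server-streaming handler is written as a generator that yields one response per message. The sketch below keeps that shape but leaves out the network entirely; stream_order_updates and the status strings are illustrative assumptions, and OrderUpdate is a simplified stand-in for the message above.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class OrderUpdate:
    order_id: str
    previous_status: str
    new_status: str


def stream_order_updates(order_id: str, statuses: list[str]) -> Iterator[OrderUpdate]:
    """Yield one OrderUpdate per status transition, as a streaming handler would."""
    for previous, new in zip(statuses, statuses[1:]):
        yield OrderUpdate(order_id, previous, new)


# The consuming side is a plain loop (or list()) over the stream.
updates = list(stream_order_updates("o-1", ["PENDING", "PAID", "SHIPPED"]))
```

Each yielded value corresponds to one message on the stream; the stream ends when the generator returns.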

Client Streaming

Many requests, one response. The client sends a stream of messages and the server responds with a single message after the client finishes:

rpc UploadOrderItems (stream LineItem) returns (UploadSummary);

message UploadSummary {
  int32 items_received = 1;
  int32 items_accepted = 2;
  int32 items_rejected = 3;
}

Use cases: batch uploads, file streaming, aggregating data from a client. The server processes items as they arrive and returns a summary when the stream closes.
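The "process as they arrive, summarize at the end" flow can be sketched as a function that consumes an iterator and returns a single value. This is a network-free illustration: LineItem's fields and the "quantity must be positive" validation rule are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class LineItem:
    # Assumed fields; LineItem is not defined in the .proto above.
    sku: str
    quantity: int


@dataclass
class UploadSummary:
    items_received: int = 0
    items_accepted: int = 0
    items_rejected: int = 0


def upload_order_items(items: Iterable[LineItem]) -> UploadSummary:
    """Consume the request stream item by item, then return one summary."""
    summary = UploadSummary()
    for item in items:  # each iteration handles one streamed message
        summary.items_received += 1
        if item.quantity > 0:  # hypothetical validation rule
            summary.items_accepted += 1
        else:
            summary.items_rejected += 1
    return summary  # produced only after the client closes its stream
```

Because the handler never holds the full list in memory, the same shape works for uploads of any length.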

Bidirectional Streaming

Many requests, many responses. Both sides send and receive messages independently:

rpc Chat (stream ChatMessage) returns (stream ChatMessage);

Use cases: chat systems, collaborative editing, interactive debugging sessions. Both sides send messages whenever they have data, without waiting for the other side.

Bidirectional streaming is the most complex pattern. Use it only when the other three patterns are insufficient.
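As a very rough illustration of why this pattern is harder to reason about, the sketch below shows a handler whose replies are not in lockstep with the incoming messages. It is a single generator, so it cannot show true concurrency; it only shows that the number and timing of replies need not match the requests. The chat function and its reply rules are invented for the example.

```python
from typing import Iterable, Iterator


def chat(incoming: Iterable[str]) -> Iterator[str]:
    """Reply to questions, stay silent on statements."""
    yield "welcome"  # the server may send before it has read anything
    for text in incoming:
        if text.endswith("?"):
            yield f"answer to: {text}"
            yield "anything else?"  # two replies to one message
        # statements produce no reply at all


replies = list(chat(["hi", "status?"]))
```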

Deadlines & Timeouts

Every gRPC call should have a deadline. A deadline is an absolute point in time by which the call must complete. If the deadline passes, the call fails with DEADLINE_EXCEEDED.

Client sets deadline: now + 5 seconds
Client sends request to Service A
Service A calls Service B (propagates remaining deadline: 4.8 seconds)
Service B calls Service C (propagates remaining deadline: 4.2 seconds)

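Deadlines should propagate through the call chain. If the client sets a 5-second deadline and Service A takes 3 seconds before calling Service B, Service B has 2 seconds remaining. Some gRPC implementations forward the deadline automatically when you pass the incoming call context to outgoing calls; in others you must pass the remaining time explicitly. Propagation prevents cascading timeouts where a slow downstream service holds up the entire chain, and it stops downstream services from working on a request the original caller has already given up on.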
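The arithmetic behind propagation is small enough to sketch directly. The Deadline class below is a hypothetical helper, not part of any gRPC library, and it takes the current time as an argument so the example stays deterministic. On the wire, gRPC sends the remaining time as a relative timeout, which is what remaining() computes here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deadline:
    """An absolute point in time, in seconds on the caller's clock."""

    expires_at: float

    @classmethod
    def after(cls, timeout: float, now: float) -> "Deadline":
        return cls(expires_at=now + timeout)

    def remaining(self, now: float) -> float:
        """Time budget left for this call and everything downstream of it."""
        return max(0.0, self.expires_at - now)


# The client sets a 5-second deadline at t=0.
deadline = Deadline.after(5.0, now=0.0)

# Service A works for 3 seconds, then calls Service B at t=3.
budget_for_b = deadline.remaining(now=3.0)

# If B has not answered by t=5.5, nothing is left for anyone.
budget_after_expiry = deadline.remaining(now=5.5)
```

Service A forwards budget_for_b, not a fresh 5 seconds, which is the whole point: the total never exceeds what the original caller asked for.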

Without deadlines, a slow or unresponsive service causes requests to pile up: servers run out of resources and clients wait indefinitely. Set deadlines on every call.

Google's API design guide recommends that servers check the remaining deadline before starting expensive operations. If the deadline has already passed or there is not enough time left to finish, return DEADLINE_EXCEEDED immediately rather than wasting work.
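A minimal sketch of that check, assuming the server has some estimate of how long the operation takes. In the Python gRPC library the remaining time is available from the servicer context via context.time_remaining(); here it is passed in as a plain number so the example runs without a server. DeadlineExceeded and run_if_time_allows are hypothetical names.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class DeadlineExceeded(Exception):
    """Stand-in for failing the call with DEADLINE_EXCEEDED."""


def run_if_time_allows(
    remaining_seconds: float,
    estimated_cost_seconds: float,
    operation: Callable[[], T],
) -> T:
    """Run the operation only if the remaining budget can plausibly cover it."""
    if remaining_seconds <= 0 or remaining_seconds < estimated_cost_seconds:
        raise DeadlineExceeded(
            f"need about {estimated_cost_seconds}s, only {remaining_seconds}s left"
        )
    return operation()
```

The estimate does not need to be precise; even a coarse lower bound avoids starting work that cannot finish in time.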

Metadata

gRPC metadata is the equivalent of HTTP headers. It carries key-value pairs alongside the request and response, outside the protobuf message.

Common metadata uses:

Authentication — authorization: Bearer <token>
Request tracing — x-request-id: abc123, traceparent: 00-...
Client version — x-client-version: 2.1.0
Idempotency keys — idempotency-key: order-456-attempt-1

Metadata is sent as initial metadata (before the first message) or trailing metadata (after the last message). Clients send only initial metadata; servers can send both. Trailing metadata is where servers typically put status details and debugging information.
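In the Python gRPC library, metadata is passed as a sequence of (key, value) pairs. The sketch below builds such a list for the uses listed above and checks keys against the naming rules as I understand them from the gRPC over HTTP/2 specification: lowercase letters, digits, "_", "-", and ".", with the grpc- prefix reserved. metadata_entry and call_metadata are hypothetical helpers, not library functions.

```python
import re

# Allowed key characters: lowercase ASCII letters, digits, "_", "-", "."
_KEY_PATTERN = re.compile(r"[0-9a-z_.-]+")


def metadata_entry(key: str, value: str) -> tuple[str, str]:
    """Validate a metadata key and return the (key, value) pair."""
    if not _KEY_PATTERN.fullmatch(key):
        raise ValueError(f"invalid metadata key: {key!r}")
    if key.startswith("grpc-"):
        raise ValueError(f"the grpc- prefix is reserved: {key!r}")
    return (key, value)


def call_metadata(token: str, request_id: str, client_version: str) -> list[tuple[str, str]]:
    """Build the initial metadata a client would attach to a call."""
    return [
        metadata_entry("authorization", f"Bearer {token}"),
        metadata_entry("x-request-id", request_id),
        metadata_entry("x-client-version", client_version),
    ]
```

A list like this would typically be handed to the call through its metadata argument; the protobuf request message itself stays free of cross-cutting concerns such as auth and tracing.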

Status Codes

gRPC has its own status code system, separate from HTTP status codes:

Code	Name	When to Use
0	OK	Success
1	CANCELLED	Client canceled the request
2	UNKNOWN	Server error without a specific code
3	INVALID_ARGUMENT	Client sent bad input (like HTTP 400)
4	DEADLINE_EXCEEDED	Operation timed out
5	NOT_FOUND	Resource does not exist (like HTTP 404)
6	ALREADY_EXISTS	Resource already exists (like HTTP 409)
7	PERMISSION_DENIED	Caller lacks permission (like HTTP 403)
8	RESOURCE_EXHAUSTED	Quota or rate limit exceeded (like HTTP 429)
9	FAILED_PRECONDITION	System not in required state for the operation
10	ABORTED	Concurrency conflict, retry may succeed
11	OUT_OF_RANGE	Value outside valid range
12	UNIMPLEMENTED	Method not implemented
13	INTERNAL	Internal server error (like HTTP 500)
14	UNAVAILABLE	Service temporarily unavailable, retry with backoff
15	DATA_LOSS	Unrecoverable data loss or corruption
16	UNAUTHENTICATED	Missing or invalid authentication (like HTTP 401)

The codes you will use most often are OK, INVALID_ARGUMENT, NOT_FOUND, PERMISSION_DENIED, INTERNAL, UNAVAILABLE, and UNAUTHENTICATED.

Use UNAVAILABLE (not INTERNAL) for transient errors that the client should retry. Use FAILED_PRECONDITION for logical errors where the client needs to change state before retrying (e.g., "account must be verified before placing an order").
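One way to keep these choices consistent is to map application errors to status codes in a single place and let the client decide about retries from the code alone. The sketch below assumes three hypothetical exception types and treats a failed connection to a dependency as the transient case; the numeric values match the table above. The backoff schedule is a plain doubling with a cap, and real clients usually add jitter.

```python
from enum import IntEnum


class StatusCode(IntEnum):
    """Subset of the gRPC status codes used in this example."""

    OK = 0
    INVALID_ARGUMENT = 3
    NOT_FOUND = 5
    FAILED_PRECONDITION = 9
    INTERNAL = 13
    UNAVAILABLE = 14


# Hypothetical application errors.
class ValidationError(Exception):
    pass


class OrderNotFound(Exception):
    pass


class AccountNotVerified(Exception):
    pass


def to_status_code(error: Exception) -> StatusCode:
    """Pick the most specific code; fall back to INTERNAL."""
    if isinstance(error, ValidationError):
        return StatusCode.INVALID_ARGUMENT
    if isinstance(error, OrderNotFound):
        return StatusCode.NOT_FOUND
    if isinstance(error, AccountNotVerified):
        return StatusCode.FAILED_PRECONDITION  # client must change state first
    if isinstance(error, ConnectionError):
        return StatusCode.UNAVAILABLE  # transient: a dependency is unreachable
    return StatusCode.INTERNAL


def should_retry(code: StatusCode) -> bool:
    """Only transient failures are worth retrying unchanged."""
    return code is StatusCode.UNAVAILABLE


def backoff_delays_ms(attempts: int, base_ms: int = 100, cap_ms: int = 2000) -> list[int]:
    """Exponential backoff schedule in milliseconds, capped at cap_ms."""
    return [min(cap_ms, base_ms * 2**n) for n in range(attempts)]
```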

Rich Error Details

The status code alone is often insufficient. gRPC supports rich error details using the google.rpc.Status message:

// The error includes structured details
message Status {
  int32 code = 1;
  string message = 2;
  repeated google.protobuf.Any details = 3;
}

Google provides standard error detail types:

BadRequest        — field violations with field name and description
PreconditionFailure — preconditions that were not met
RetryInfo         — how long to wait before retrying
ResourceInfo      — which resource was not found or already exists

This is similar to Stripe's structured error responses in REST: machine-readable codes, human-readable messages, and specific field-level details.
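To show how a status code, a message, and field-level details fit together, here is a sketch using plain dataclasses as stand-ins for google.rpc.Status and BadRequest. The real messages are protobufs, and details are packed into google.protobuf.Any; that packing is skipped here. invalid_request and field_errors are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass
class FieldViolation:
    field: str
    description: str


@dataclass
class BadRequest:
    field_violations: list[FieldViolation]


@dataclass
class Status:
    code: int
    message: str
    details: list[object]  # stand-in for repeated google.protobuf.Any


INVALID_ARGUMENT = 3


def invalid_request(violations: dict[str, str]) -> Status:
    """Server side: attach one FieldViolation per bad field."""
    bad_request = BadRequest(
        [FieldViolation(name, why) for name, why in violations.items()]
    )
    return Status(INVALID_ARGUMENT, "request validation failed", [bad_request])


def field_errors(status: Status) -> dict[str, str]:
    """Client side: pull field-level errors out of the details, if any."""
    errors: dict[str, str] = {}
    for detail in status.details:
        if isinstance(detail, BadRequest):
            for violation in detail.field_violations:
                errors[violation.field] = violation.description
    return errors
```

A client can branch on the code, show the message to a developer, and use the field map to highlight specific form inputs.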

gRPC-Web for Browser Clients

Standard gRPC requires HTTP/2 with trailers. Browsers speak HTTP/2, but their JavaScript APIs do not give pages the control over HTTP/2 framing and trailers that a gRPC client needs. gRPC-Web is a protocol adaptation that works in browsers:

Browser  -->  gRPC-Web (HTTP/1.1 or HTTP/2)  -->  Envoy Proxy  -->  gRPC (HTTP/2)

A proxy (typically Envoy) translates between gRPC-Web and standard gRPC. The browser client uses a generated gRPC-Web client that looks similar to a standard gRPC client but uses a different wire format.

gRPC-Web supports unary and server streaming RPCs. Client streaming and bidirectional streaming are not supported in the browser environment.

If your primary consumers are browsers, REST or GraphQL is simpler. gRPC-Web is for organizations that want a unified RPC framework and are willing to run the proxy infrastructure.

Load Balancing

gRPC uses persistent HTTP/2 connections with multiplexed streams. This creates a load balancing challenge: a traditional L4 (TCP) load balancer assigns a connection to a backend, and all RPCs on that connection go to the same backend.

Client-Side Load Balancing

The client maintains a list of backends and distributes RPCs across them. This is common in Kubernetes with headless services:

Client discovers backends: [backend-1:50051, backend-2:50051, backend-3:50051]
RPC 1 -> backend-1
RPC 2 -> backend-2
RPC 3 -> backend-3
RPC 4 -> backend-1 (round-robin)

gRPC clients have built-in support for client-side load balancing with pluggable policies such as pick_first (the default), round_robin, and weighted variants.
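The round-robin behavior in the trace above comes down to cycling through the backend list. The sketch below is a toy picker, not the gRPC library's implementation: RoundRobinPicker is a hypothetical class, and it ignores backend health and connection state, which a real policy has to track.

```python
import itertools
from typing import Iterable


class RoundRobinPicker:
    """Hand out backends in a fixed rotation, one per RPC."""

    def __init__(self, backends: Iterable[str]) -> None:
        backend_list = list(backends)
        if not backend_list:
            raise ValueError("at least one backend is required")
        self._cycle = itertools.cycle(backend_list)

    def pick(self) -> str:
        return next(self._cycle)


picker = RoundRobinPicker(["backend-1:50051", "backend-2:50051", "backend-3:50051"])
picks = [picker.pick() for _ in range(4)]
```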

Proxy-Based Load Balancing

An L7 (application-level) proxy inspects individual gRPC requests and distributes them across backends. Envoy, Linkerd, and Istio all support gRPC-aware load balancing:

Client  -->  Envoy (L7 proxy)  -->  backend-1
                                -->  backend-2
                                -->  backend-3

The proxy sees each RPC as a separate request and can balance per-RPC, not per-connection. This is transparent to the client.
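The difference between per-connection and per-RPC balancing is easy to see with a small simulation. The client names and RPC counts below are made up, and both functions use simple modulo assignment as a stand-in for whatever the balancer actually does.

```python
from collections import Counter


def per_connection(clients: dict[str, int], backends: list[str]) -> Counter:
    """L4 style: each client's single connection is pinned to one backend."""
    load = Counter({backend: 0 for backend in backends})
    for index, rpc_count in enumerate(clients.values()):
        load[backends[index % len(backends)]] += rpc_count
    return load


def per_rpc(clients: dict[str, int], backends: list[str]) -> Counter:
    """L7 style: every RPC is routed on its own, whichever client sent it."""
    load = Counter({backend: 0 for backend in backends})
    for index in range(sum(clients.values())):
        load[backends[index % len(backends)]] += 1
    return load


# One busy client and two quiet ones, each holding a single connection.
clients = {"batch-job": 900, "web-1": 50, "web-2": 50}
backends = ["backend-1", "backend-2", "backend-3"]

l4_load = per_connection(clients, backends)
l7_load = per_rpc(clients, backends)
```

With these numbers, per-connection balancing sends all 900 of the busy client's RPCs to one backend, while per-RPC balancing spreads the 1,000 calls almost evenly.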

Recommendation: use proxy-based load balancing unless you have a specific reason to push balancing logic into clients. Proxies are easier to operate and update without changing client code.

Common Pitfalls

No deadlines — every gRPC call without a deadline is a potential resource leak. Clients hang, servers accumulate connections, and the system degrades under load. Always set deadlines.
Wrong status codes — returning INTERNAL for client errors, or UNKNOWN when a more specific code applies. Use INVALID_ARGUMENT for bad input, NOT_FOUND for missing resources, and UNAVAILABLE for transient failures.
Too many streaming RPCs — streaming adds complexity to error handling, backpressure, and connection management. Use unary RPCs for request-response patterns and streaming only when data genuinely arrives over time.
Ignoring backpressure — a server streaming faster than the client can consume. gRPC provides flow control via HTTP/2 windows, but application logic must respect it. Monitor stream buffer sizes.
L4 load balancing with persistent connections — TCP-level load balancers assign an entire HTTP/2 connection to one backend, creating hot spots. Use L7 proxies or client-side balancing.
Large messages — gRPC has a default message size limit of 4 MB. Sending larger messages requires configuration changes and often indicates a design problem. Paginate or stream instead.
No health checks — gRPC has a standard health checking protocol (grpc.health.v1.Health). Implement it so load balancers and orchestrators can detect unhealthy instances.
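For the large-message pitfall, "stream instead" usually means splitting the payload into chunks and sending them over a client-streaming or server-streaming RPC. A minimal chunker, assuming a 1 MB chunk size chosen to stay well under the 4 MB default; chunk_payload is a hypothetical helper, and the receiving side would reassemble the chunks in order.

```python
from typing import Iterator

DEFAULT_CHUNK_SIZE = 1024 * 1024  # 1 MB, an assumed value below the 4 MB default limit


def chunk_payload(payload: bytes, chunk_size: int = DEFAULT_CHUNK_SIZE) -> Iterator[bytes]:
    """Yield consecutive slices of payload, each at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(payload), chunk_size):
        yield payload[start:start + chunk_size]
```

Each yielded slice would become the body of one streamed message, so no single message approaches the size limit regardless of how large the payload is.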

Key Takeaways

gRPC services are defined in .proto files with service and rpc declarations. The generated code provides type-safe clients and server interfaces.
Four RPC types: unary (one-to-one), server streaming (one-to-many), client streaming (many-to-one), and bidirectional streaming (many-to-many). Use unary for most operations; add streaming only when needed.
Always set deadlines on gRPC calls. Deadlines propagate through the call chain and prevent cascading failures.
Use the correct status code: INVALID_ARGUMENT for bad input, NOT_FOUND for missing resources, UNAVAILABLE for transient errors, PERMISSION_DENIED for authorization failures.
gRPC-Web enables browser access but needs a translation layer, typically an Envoy proxy, and supports only unary and server streaming RPCs. For browser-first APIs, REST or GraphQL is simpler.
Use L7 (application-level) load balancing for gRPC. L4 load balancers do not distribute individual RPCs across backends because gRPC multiplexes streams over persistent connections.