Code Generation

When Writing Code Is Not the Answer

Some code shouldn't be written by hand. It's repetitive, follows rigid patterns, and the interesting decisions were already made elsewhere — in a schema, a spec, or a template. Writing it manually is slow, error-prone, and soul-crushing.

Code generation exists for exactly this: take a source of truth (a schema, a template, a configuration file) and mechanically produce the code that implements it. The generated code is consistent, cheap to reproduce, and as correct as the source and the generator behind it. The humans focus on the parts that actually require thought.

But code generation has a dark side. Over-applied, it creates systems that are harder to understand, harder to debug, and harder to modify than the hand-written code it replaced. The line between useful generation and over-engineering is easy to cross, and most teams end up on the wrong side of it at least once.

Types of Code Generation

Schema-driven generation

A schema defines the structure; a generator produces the code. This is the most common and most useful form of generation.

Source of truth        Generated output
----------------------------------------------
OpenAPI spec       ->  API client code, types, validation
Protobuf files     ->  gRPC service stubs, message types
GraphQL schema     ->  TypeScript types, query hooks
Database schema    ->  ORM models, migration files
JSON Schema        ->  Validation code, TypeScript interfaces

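The value here is that the schema is the single source of truth. Change the schema, regenerate, and every consumer built from the generated code picks up the change; in typed languages, a breaking change surfaces as a compile error rather than a runtime surprise. No more manually keeping API clients in sync with the server, or TypeScript types in sync with the database.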

Example: OpenAPI code generation

# Generate a TypeScript API client from an OpenAPI spec
npx openapi-typescript-codegen \
  --input api-spec.yaml \
  --output src/generated/api-client \
  --client axios

# What you get:
# src/generated/api-client/
#   models/User.ts          - TypeScript interface for User
#   models/Order.ts         - TypeScript interface for Order
#   services/UserService.ts - API client with typed methods
#   services/OrderService.ts

Now when the API adds a field to the User model, you regenerate and the TypeScript types update automatically. No manual sync required.

Example: Protobuf generation

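// user.proto defines the service contract
syntax = "proto3";

package user.v1;

// protoc-gen-go needs a Go import path; this one is a placeholder.
option go_package = "example.com/yourproject/gen/userv1";

service UserService {
  rpc GetUser (GetUserRequest) returns (User);
  rpc CreateUser (CreateUserRequest) returns (User);
}

// Illustrative request messages; adjust the fields to your API.
message GetUserRequest {
  string id = 1;
}

message CreateUserRequest {
  string email = 1;
  string name = 2;
}

message User {
  string id = 1;
  string email = 2;
  string name = 3;
}

# Generate Go server and client code
# (requires the protoc-gen-go and protoc-gen-go-grpc plugins on your PATH;
# paths=source_relative writes the generated files next to user.proto)
protoc --go_out=. --go_opt=paths=source_relative \
  --go-grpc_out=. --go-grpc_opt=paths=source_relative user.proto

# You get typed request/response structs and a service interface
# to implement. The networking, serialization, and routing are handled.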
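
Under the hood, most schema-driven generators follow the same shape: load a structured description, render it through a template, and format the result. The Go sketch below illustrates that shape under simplifying assumptions. The Schema and Field types and the generator name example-gen are hypothetical stand-ins for a real schema format, not part of any tool mentioned above.

```go
package main

import (
	"bytes"
	"fmt"
	"go/format"
	"text/template"
)

// Field and Schema are a hypothetical, simplified stand-in for a real
// schema format such as OpenAPI or Protobuf.
type Field struct {
	Name    string // Go field name, e.g. "Email"
	Type    string // Go type, e.g. "string"
	JSONKey string // wire name, e.g. "email"
}

type Schema struct {
	Package string
	Name    string
	Fields  []Field
}

// structTmpl renders one struct per schema. "example-gen" is a made-up name.
var structTmpl = template.Must(template.New("struct").Parse(
	`// Code generated by example-gen. DO NOT EDIT.

package {{.Package}}

type {{.Name}} struct {
{{- range .Fields}}
	{{.Name}} {{.Type}} ` + "`json:\"{{.JSONKey}}\"`" + `
{{- end}}
}
`))

// Generate renders the schema as gofmt-formatted Go source.
func Generate(s Schema) (string, error) {
	var buf bytes.Buffer
	if err := structTmpl.Execute(&buf, s); err != nil {
		return "", err
	}
	// format.Source normalizes layout and rejects output that does not parse.
	out, err := format.Source(buf.Bytes())
	if err != nil {
		return "", fmt.Errorf("generated invalid Go: %w", err)
	}
	return string(out), nil
}

func main() {
	src, err := Generate(Schema{
		Package: "api",
		Name:    "User",
		Fields: []Field{
			{Name: "ID", Type: "string", JSONKey: "id"},
			{Name: "Email", Type: "string", JSONKey: "email"},
		},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(src)
}
```

Running the output through format.Source is a cheap safety net: a template bug that yields unparseable code fails at generation time instead of at the next build.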

Template-based scaffolding

Templates generate files for common patterns in your project. Unlike schema-driven generation, the source of truth is a template plus a name or configuration.

# Scaffolding a new microservice ("scaffold" stands in for your team's own CLI)
scaffold new-service payment-service

# Generates:
# payment-service/
#   cmd/server/main.go
#   internal/handler/handler.go
#   internal/service/service.go
#   internal/repository/repository.go
#   Dockerfile
#   Makefile
#   README.md
#   .github/workflows/ci.yml

The value is consistency and speed. Every new service starts with the same structure, the same CI pipeline, the same Dockerfile. Teams don't waste time debating project layout because the generator enforces it.
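
A scaffolding tool can be little more than a map of path templates to content templates. The following Go sketch shows that idea; the layout map is a hypothetical, trimmed-down version of the tree above, and the function returns the rendered files in memory rather than writing them, which is an assumption made here to keep the example easy to test.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
	"text/template"
)

// Params holds the values a developer supplies when scaffolding.
type Params struct {
	Name string // service name, e.g. "payment-service"
}

// layout maps output path templates to file content templates.
// It is a hypothetical, trimmed-down version of a real service skeleton.
var layout = map[string]string{
	"{{.Name}}/cmd/server/main.go": "package main\n\n// Entry point for {{.Name}}.\nfunc main() {}\n",
	"{{.Name}}/Makefile":           "build:\n\tgo build ./...\n",
	"{{.Name}}/README.md":          "# {{.Name}}\n",
}

// render executes a single template string against the parameters.
func render(text string, p Params) (string, error) {
	t, err := template.New("file").Parse(text)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, p); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// Scaffold returns rendered path -> contents. Writing the files to disk is
// left to the caller so the rendering stays easy to test.
func Scaffold(p Params) (map[string]string, error) {
	if p.Name == "" {
		return nil, fmt.Errorf("service name is required")
	}
	files := make(map[string]string, len(layout))
	for pathTmpl, bodyTmpl := range layout {
		path, err := render(pathTmpl, p)
		if err != nil {
			return nil, err
		}
		body, err := render(bodyTmpl, p)
		if err != nil {
			return nil, err
		}
		files[path] = body
	}
	return files, nil
}

func main() {
	files, err := Scaffold(Params{Name: "payment-service"})
	if err != nil {
		panic(err)
	}
	paths := make([]string, 0, len(files))
	for p := range files {
		paths = append(paths, p)
	}
	sort.Strings(paths) // map iteration order is random; sort for stable output
	for _, p := range paths {
		fmt.Println(p)
	}
}
```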

AST-based generation

The most powerful and most dangerous form. These tools parse your source code, understand its structure, and generate code based on it.

Source code                 Generated output
----------------------------------------------
Go struct tags          ->  JSON serialization, database queries
TypeScript decorators   ->  API routes, validation, documentation
Java annotations        ->  Dependency injection, ORM mapping
Rust derive macros      ->  Trait implementations

Example: Go code generation with struct tags

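// You write the struct, plus a directive naming the generator to run.
// "structgen" and its -type flag are placeholders for whichever tool you use.
//go:generate structgen -type=User
type User struct {
    ID    string `json:"id" db:"id"`
    Email string `json:"email" db:"email"`
    Name  string `json:"name" db:"name"`
}

// Running "go generate ./..." executes that command, which can produce:
// - SQL queries for CRUD operations
// - Specialized JSON marshaling/unmarshaling
// - Validation functions
// All derived from the struct definition and tags.
// Note: go generate only runs the commands listed in directives. It generates
// nothing on its own, and "go build" never invokes it.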
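
To show what "parsing your source code" means in practice, here is a minimal sketch using Go's standard go/parser and go/ast packages. It reads the db tags from a struct and derives a query from them. The helper names dbColumns and selectByID, the users table name, and the PostgreSQL-style $1 placeholder are all assumptions for illustration; a real generator would handle far more cases.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"reflect"
	"strconv"
	"strings"
)

// source is the hand-written input the generator reads.
const source = `package model

type User struct {
	ID    string ` + "`json:\"id\" db:\"id\"`" + `
	Email string ` + "`json:\"email\" db:\"email\"`" + `
	Name  string ` + "`json:\"name\" db:\"name\"`" + `
}
`

// dbColumns parses Go source and returns the db tag of every tagged field
// in the named struct, in declaration order.
func dbColumns(src, structName string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return nil, err
	}
	var cols []string
	ast.Inspect(file, func(n ast.Node) bool {
		ts, ok := n.(*ast.TypeSpec)
		if !ok || ts.Name.Name != structName {
			return true // keep walking until the struct is found
		}
		st, ok := ts.Type.(*ast.StructType)
		if !ok {
			return false
		}
		for _, f := range st.Fields.List {
			if f.Tag == nil {
				continue
			}
			// The tag literal includes its backticks; Unquote strips them.
			raw, err := strconv.Unquote(f.Tag.Value)
			if err != nil {
				continue
			}
			if col, ok := reflect.StructTag(raw).Lookup("db"); ok && col != "-" {
				cols = append(cols, col)
			}
		}
		return false
	})
	if len(cols) == 0 {
		return nil, fmt.Errorf("no db-tagged fields found in struct %q", structName)
	}
	return cols, nil
}

// selectByID builds the kind of query a generator might emit for the struct.
func selectByID(table string, cols []string) string {
	return fmt.Sprintf("SELECT %s FROM %s WHERE id = $1", strings.Join(cols, ", "), table)
}

func main() {
	cols, err := dbColumns(source, "User")
	if err != nil {
		panic(err)
	}
	fmt.Println(selectByID("users", cols))
	// prints: SELECT id, email, name FROM users WHERE id = $1
}
```

Even this small example hints at why the approach is dangerous: the generator's behavior depends on details of the input code (tag spelling, field order, embedded types) that are invisible at the call site.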

One-off generation scripts

Sometimes you need to generate code once, not as part of an ongoing pipeline. A quick script that turns a list of inputs into boilerplate is enough.

#!/usr/bin/env bash
set -euo pipefail

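# Generate handler functions from a list of endpoints.
# Each line of endpoints.csv is: method,path,handler (e.g. GET,/users,ListUsers).
# Output goes to stdout; redirect it into a .go file.
# The "|| [[ -n ... ]]" keeps a final line that lacks a trailing newline.
while IFS=, read -r method path handler || [[ -n "$method" ]]; do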
    cat << EOF

func ${handler}Handler(w http.ResponseWriter, r *http.Request) {
    // TODO: implement ${method} ${path}
    w.WriteHeader(http.StatusNotImplemented)
}
EOF
done < endpoints.csv

This is throwaway automation — write the script, run it, delete it. The generated code is what you keep.

When to Generate Code vs Write It

Generate when

The source of truth exists elsewhere. If a schema, spec, or configuration already defines the structure, generate the implementation. Writing it by hand means maintaining two representations of the same thing, which will inevitably diverge.

The pattern is completely mechanical. If you could describe the generation rules to a non-programmer and they could do it (slowly), it's a candidate for generation. CRUD endpoints from a database schema. TypeScript types from a JSON schema. Test stubs from function signatures.

Consistency matters more than flexibility. When every instance of a pattern must follow exactly the same structure — API clients, serialization code, database models — generation enforces consistency that hand-writing cannot.

The volume makes hand-writing impractical. 200 API endpoints. 50 database tables. 100 event types. At these scales, hand-writing is not just slow — it's a source of bugs.

Write by hand when

The code requires judgment. Business logic, error handling strategies, algorithm selection, user experience decisions — these require human thought that generation cannot provide.

The pattern has high variance. If every instance of the pattern is significantly different, a generator either produces lowest-common-denominator output that needs heavy modification, or it becomes so configurable that it's harder to use than writing the code directly.

The generated code would be hard to debug. If something goes wrong, can you read the generated code and understand it? If the generation is so complex that the output is opaque, you've traded one problem (manual coding) for a worse one (debugging generated code you don't understand).

The generation tooling is more complex than the code it produces. If your code generator is 500 lines and produces only 200 lines of output in total, the economics don't work. You now maintain a generator instead of maintaining code, and the generator is harder to understand. The trade only flips when the same generator runs across many inputs or is rerun often.

Managing Generated Code

Mark it clearly

Generated files should be immediately recognizable:

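// Code generated by openapi-codegen. DO NOT EDIT.
// Source: api-spec.yaml

package api

type User struct {
    // ...
}

The comment serves two purposes: it tells developers not to modify the file (their changes will be overwritten), and it lets tools skip the file. Go tooling, for example, treats a file as generated when a line matching the pattern ^// Code generated .* DO NOT EDIT\.$ appears before the first non-comment, non-blank text in the file. Leave timestamps out of the header: they make every regeneration produce a diff even when nothing changed, which defeats any check that compares regenerated output against the committed files.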
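
A tool that wants to honor the marker can check for it in a few lines. The sketch below is a simplified take on the Go convention: it scans lines until the package clause and does not handle block comments. The function name IsGenerated is chosen here for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// generatedRE is the marker pattern described by "go help generate".
var generatedRE = regexp.MustCompile(`^// Code generated .* DO NOT EDIT\.$`)

// IsGenerated reports whether src carries the marker line before its
// package clause. Simplified: block comments are not handled.
func IsGenerated(src string) bool {
	sc := bufio.NewScanner(strings.NewReader(src))
	for sc.Scan() {
		line := sc.Text()
		if generatedRE.MatchString(line) {
			return true
		}
		if strings.HasPrefix(line, "package ") {
			return false // the marker only counts ahead of the package clause
		}
	}
	return false
}

func main() {
	generated := "// Code generated by openapi-codegen. DO NOT EDIT.\n// Source: api-spec.yaml\n\npackage api\n"
	handwritten := "// Package api talks to the user service.\npackage api\n"

	fmt.Println(IsGenerated(generated))   // prints: true
	fmt.Println(IsGenerated(handwritten)) // prints: false
}
```

Recent Go releases also ship an ast.IsGenerated helper in go/ast that works on an already parsed file, which is likely the better choice when you have the AST anyway.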

Version control strategy

There are two schools of thought:

Commit generated code. The generated files are in the repo. PRs show the generated diff alongside source changes. CI doesn't need the generation tools installed.

Pros: Easy to review generated changes. No build dependency on generators.
Cons: Noisy diffs. Risk of generated files getting out of sync with source.

Generate at build time. Only the source (schemas, templates) is committed. CI runs the generator as a build step.

Pros: Source of truth is always source. No sync issues.
Cons: Reviewers can't see the generated code in PRs. Build is slower.

For most teams, committing generated code is pragmatically better. Add a CI check that regenerates and verifies the output matches what's committed:

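# CI step: verify generated code is up to date.
# git status --porcelain also reports untracked files, which git diff alone misses.
make generate
test -z "$(git status --porcelain)" || { git status --short; echo "Generated code is out of date. Run 'make generate'."; exit 1; }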
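
The same check can also live inside a test suite. The sketch below assumes you have already loaded two maps of path to contents, one from the repository and one from a fresh in-memory generation run; how they get loaded is left out. The Drift function and its message wording are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Drift compares committed generated files with freshly generated ones and
// returns one sorted message per discrepancy. Both maps are path -> contents.
func Drift(committed, fresh map[string]string) []string {
	var out []string
	for path, want := range fresh {
		got, ok := committed[path]
		switch {
		case !ok:
			out = append(out, path+": missing from repo")
		case got != want:
			out = append(out, path+": out of date")
		}
	}
	// Files that are committed but no longer produced are drift too.
	for path := range committed {
		if _, ok := fresh[path]; !ok {
			out = append(out, path+": no longer generated")
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	committed := map[string]string{
		"models/User.ts":   "export interface User { id: string }",
		"models/Legacy.ts": "export interface Legacy {}",
	}
	fresh := map[string]string{
		"models/User.ts":  "export interface User { id: string; email: string }",
		"models/Order.ts": "export interface Order { id: string }",
	}
	for _, msg := range Drift(committed, fresh) {
		fmt.Println(msg)
	}
	// prints:
	// models/Legacy.ts: no longer generated
	// models/Order.ts: missing from repo
	// models/User.ts: out of date
}
```

Distinguishing the three cases matters because each has a different fix: commit the new file, rerun the generator, or delete the orphan.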

Regeneration workflow

Make regeneration trivial:

# Makefile
.PHONY: generate
generate: ## Regenerate all generated code
	protoc --go_out=. --go-grpc_out=. proto/*.proto
	npx openapi-typescript-codegen --input api-spec.yaml --output src/generated/api-client --client axios
	go generate ./...
	@echo "Generation complete. Review changes with 'git diff'"

One command regenerates everything. No one needs to remember which tools to run or in what order.

Scaffolding Tools

Scaffolding is code generation for new files and projects. It's less about ongoing generation and more about getting started quickly.

Built-in framework scaffolding

Many frameworks ship scaffolding commands or project starters:

# Rails
rails generate model User name:string email:string
rails generate controller Users index show

# Angular
ng generate component user-profile
ng generate service auth

# Next.js (with create-next-app)
npx create-next-app@latest my-app --typescript

Custom scaffolding

For patterns specific to your project, build your own scaffolding:

# Using Plop.js
npx plop component     # Interactive prompt for component name
npx plop service       # Interactive prompt for service name

# Using Cookiecutter (Python ecosystem)
cookiecutter gh:your-org/microservice-template

# Using a simple shell script
./scripts/new-endpoint.sh POST /api/users CreateUser

The key principle: scaffolding should produce files that are immediately usable. If the developer has to modify the scaffolded code extensively before it works, the template needs updating.

Common Pitfalls

Generating code that nobody understands. If the team can't read, debug, and reason about the generated output, the generator is a liability. Generated code should look like code a human would write.

Over-relying on generation to avoid learning. Using a code generator because you don't understand the underlying patterns means you can't debug issues when the generator produces unexpected output. Understand the domain first, then automate it.

Not regenerating when the source changes. If the API spec is updated but nobody reruns the code generator, the generated client is now wrong. CI checks that verify generated code is up to date prevent this.

Generators that require complex configuration. If your code generator needs a 200-line configuration file to produce useful output, the tool is too complex. Good generators have sensible defaults and minimal required configuration.

Treating generation as a substitute for abstraction. If you're generating 50 nearly identical files, the real answer might be a shared abstraction (a base class, a generic function, a higher-order component) instead of 50 generated copies.

Building custom generators too early. Before building a generator, check if one already exists. The OpenAPI, Protobuf, and GraphQL ecosystems have mature generation tools. Don't reinvent them.
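
To make the abstraction pitfall above concrete: in a language with generics, one parameterized type can often replace a whole family of generated files. The Go sketch below uses a hypothetical in-memory Repository type; the User and Order types are illustrative, and a real data layer would of course talk to a database.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound is returned when no item exists for the requested ID.
var ErrNotFound = errors.New("not found")

// Repository is a single generic store that stands in for a generated
// UserRepository, OrderRepository, and so on. Not safe for concurrent use.
type Repository[T any] struct {
	items map[string]T
}

// NewRepository returns an empty repository for values of type T.
func NewRepository[T any]() *Repository[T] {
	return &Repository[T]{items: make(map[string]T)}
}

// Put stores v under id, replacing any previous value.
func (r *Repository[T]) Put(id string, v T) {
	r.items[id] = v
}

// Get returns the value stored under id, or ErrNotFound.
func (r *Repository[T]) Get(id string) (T, error) {
	v, ok := r.items[id]
	if !ok {
		var zero T
		return zero, fmt.Errorf("%s: %w", id, ErrNotFound)
	}
	return v, nil
}

// User and Order are illustrative domain types.
type User struct{ Email string }
type Order struct{ Total int }

func main() {
	users := NewRepository[User]()
	orders := NewRepository[Order]()

	users.Put("u1", User{Email: "a@example.com"})
	orders.Put("o1", Order{Total: 42})

	u, _ := users.Get("u1")
	o, _ := orders.Get("o1")
	fmt.Println(u.Email, o.Total) // prints: a@example.com 42

	_, err := users.Get("missing")
	fmt.Println(errors.Is(err, ErrNotFound)) // prints: true
}
```

A bug fix in Get lands once here. With 50 generated copies, the same fix requires changing the generator and regenerating every file.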

Key Takeaways

Code generation is most valuable when a single source of truth (schema, spec, template) can mechanically produce code that would otherwise be written by hand. The canonical examples are API clients from OpenAPI specs and types from database schemas.

Generate mechanical code. Write code that requires judgment. The line between the two is usually clear: if you can describe the transformation rules completely, it's mechanical.

Always mark generated files clearly and provide a single command to regenerate everything. CI should verify generated code is in sync with its source.

Scaffolding is code generation for new files and projects. Use framework built-in tools when available, and build custom scaffolding for patterns specific to your project.

The danger zone is over-engineering: generators that are more complex than the code they produce, generated output that nobody can debug, and generation as a substitute for proper abstractions. When in doubt, write the code by hand.