Writing Good Dockerfiles

The difference between a 50 MB image that starts in under a second and a 1.5 GB image that takes a minute is how you write your Dockerfile. Most Dockerfiles in production are mediocre: they work, but they are bloated, insecure, and slow to build. This file covers the patterns that separate production-grade images from quick prototypes.

Multi-Stage Builds

A multi-stage build uses multiple FROM statements. The first stage compiles or builds the application. The final stage copies only the output into a clean, minimal image. Build tools, source code, and intermediate files never make it into the production image.

# Stage 1: Build
FROM golang:1.22-alpine AS build
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o /server .

# Stage 2: Production
FROM alpine:3.19
RUN apk --no-cache add ca-certificates
COPY --from=build /server /server
EXPOSE 8080
CMD ["/server"]

Without multi-stage:
  golang:1.22 base image:  ~800 MB
  + source code:            ~50 MB
  + build cache:           ~200 MB
  Total:                   ~1050 MB

With multi-stage:
  alpine:3.19 base image:   ~7 MB
  + compiled binary:        ~15 MB
  Total:                    ~22 MB

For Node.js applications:

# Stage 1: Install dependencies and build
FROM node:20-alpine AS build
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
RUN npm run build
# Drop devDependencies so they are not copied into the production stage
RUN npm prune --omit=dev

# Stage 2: Production
FROM node:20-alpine
WORKDIR /app
COPY --from=build /app/dist ./dist
COPY --from=build /app/node_modules ./node_modules
COPY --from=build /app/package.json ./
EXPOSE 3000
CMD ["node", "dist/main.js"]

For compiled languages like Go or Rust, you can use scratch (an empty image) or distroless as the final stage, provided the binary is statically linked (for Go, that is what CGO_ENABLED=0 ensures).

Distroless & Scratch Bases

scratch is a completely empty image. No shell, no package manager, no libc. The only thing in the image is what you copy into it. This produces the smallest possible images and the smallest possible attack surface.

FROM golang:1.22-alpine AS build
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 go build -o /server .

FROM scratch
COPY --from=build /server /server
EXPOSE 8080
ENTRYPOINT ["/server"]

distroless images from Google contain only the application runtime (no shell, no package manager) but include essential things like CA certificates and timezone data.

FROM golang:1.22 AS build
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 go build -o /server .

FROM gcr.io/distroless/static-debian12
COPY --from=build /server /server
EXPOSE 8080
CMD ["/server"]

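The tradeoff: neither image contains a shell, so you cannot docker exec into a scratch or distroless container for an interactive debugging session. For production, this is a feature. For development, use a fuller base image or one of the distroless debug tags, which add a BusyBox shell.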
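
One way to keep a shell available during development without maintaining a second Dockerfile is to parameterize the tag of the final base image. The sketch below assumes the distroless project's debug tag, which adds a BusyBox shell; the RUNTIME_TAG argument name is made up for this example.

```dockerfile
# RUNTIME_TAG is a hypothetical build argument: "latest" for production,
# "debug" for a variant of the same base that includes a BusyBox shell.
ARG RUNTIME_TAG=latest

FROM golang:1.22 AS build
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 go build -o /server .

# An ARG declared before the first FROM can be used in any FROM line
FROM gcr.io/distroless/static-debian12:${RUNTIME_TAG}
COPY --from=build /server /server
EXPOSE 8080
CMD ["/server"]
```

Building with docker build --build-arg RUNTIME_TAG=debug -t my-app:debug . should give an image you can enter with docker run -it --entrypoint=sh my-app:debug, while the default build stays shell-free.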

Non-Root Users

By default, processes inside containers run as root. If an attacker exploits a vulnerability in your application, they get root access inside the container -- and potentially on the host if a container escape vulnerability exists.

FROM node:20-alpine
WORKDIR /app

# Create a non-root user
RUN addgroup -S appgroup && adduser -S appuser -G appgroup

# Copy build output from an earlier stage named "build" and set ownership at
# copy time (a separate RUN chown -R would duplicate the files in a new layer)
COPY --from=build --chown=appuser:appgroup /app/dist ./dist
COPY --from=build --chown=appuser:appgroup /app/node_modules ./node_modules

# Switch to non-root user
USER appuser

EXPOSE 3000
CMD ["node", "dist/main.js"]

Some base images already include a non-root user. Node.js images have a node user. Distroless images include a nonroot user, but their default tags still run as root: either set USER nonroot yourself or use a :nonroot tag.

FROM gcr.io/distroless/nodejs20-debian12
COPY --from=build /app/dist /app/dist
COPY --from=build /app/node_modules /app/node_modules
WORKDIR /app
USER nonroot
CMD ["dist/main.js"]
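
The built-in node user mentioned above can be used the same way. This is a minimal sketch, assuming the project layout from the earlier Node.js example (an npm build script that writes dist/main.js):

```dockerfile
FROM node:20-alpine AS build
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
RUN npm run build

FROM node:20-alpine
WORKDIR /app
# node:node is the user and group that ship with the official Node.js image,
# so no addgroup/adduser step is needed
COPY --from=build --chown=node:node /app/dist ./dist
COPY --from=build --chown=node:node /app/node_modules ./node_modules
USER node
EXPOSE 3000
CMD ["node", "dist/main.js"]
```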

.dockerignore

Without a .dockerignore, docker build sends your entire project directory to the Docker daemon as the build context, and every COPY . . copies all of it into the image -- including node_modules, .git, test fixtures, IDE configuration, and potentially secrets.

# .dockerignore
.git
.github
.env
.env.*
node_modules
dist
*.md
Dockerfile
docker-compose.yml
.dockerignore
coverage
.vscode
.idea
**/*.test.js
**/*.spec.js

A good .dockerignore makes builds faster (less data to send to the daemon), images smaller (less data to copy), and more secure (no secrets or unnecessary files in the image).

Layer Ordering for Cache Efficiency

Docker caches each layer. When a layer changes, all subsequent layers are invalidated. Order your instructions so that infrequently-changing steps come first.

# Optimal layer ordering for a Python app

# 1. Base image (changes rarely)
FROM python:3.12-slim

# 2. System dependencies (changes rarely)
RUN apt-get update && \
    apt-get install -y --no-install-recommends libpq-dev && \
    rm -rf /var/lib/apt/lists/*

# 3. Python dependencies (changes occasionally)
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# 4. Application code (changes frequently)
COPY . .

# 5. Startup command (changes rarely)
CMD ["python", "-m", "uvicorn", "main:app", "--host", "0.0.0.0"]

Cache behavior when only app code changes:
  Layer 1: FROM python:3.12-slim         -> CACHED
  Layer 2: RUN apt-get install ...       -> CACHED
  Layer 3: COPY requirements.txt         -> CACHED
  Layer 4: RUN pip install ...           -> CACHED
  Layer 5: COPY . .                      -> REBUILT (code changed)
  Layer 6: CMD [...]                     -> REBUILT (follows invalidated layer)

Build time: seconds instead of minutes.
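
For contrast, here is a sketch of the ordering to avoid, using the same hypothetical Python app. Because COPY . . comes before the dependency install, any source edit invalidates the pip layer as well.

```dockerfile
# Anti-pattern: application code is copied before dependencies are installed
FROM python:3.12-slim
WORKDIR /app

# Any edit to any file in the build context changes this layer...
COPY . .

# ...so this step reruns on every code change, even when requirements.txt
# is untouched
RUN pip install --no-cache-dir -r requirements.txt

CMD ["python", "-m", "uvicorn", "main:app", "--host", "0.0.0.0"]
```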

Health Checks

Docker and orchestrators need to know if your application is healthy. A HEALTHCHECK instruction tells Docker how to test it.

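FROM python:3.12-slim
# curl is not included in python:3.12-slim; install it for the health check
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl && \
    rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
  CMD curl -f http://localhost:8000/health || exit 1

CMD ["python", "-m", "uvicorn", "main:app", "--host", "0.0.0.0"]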

If curl is not available in your image (and it should not be in a minimal image), use a language-native health check:

HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1
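
Alpine-based images are a middle case: they ship without curl, but BusyBox provides a small wget. Below is a sketch for a Node.js image, assuming the app serves a /health endpoint on port 3000 and starts from main.js (all three are assumptions for illustration).

```dockerfile
FROM node:20-alpine
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci --omit=dev
COPY . .

# BusyBox wget: -q suppresses output, --spider checks the URL without saving
# the response. An HTTP error status or a connection failure exits non-zero.
# 127.0.0.1 is used so the check does not depend on how localhost resolves.
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
  CMD wget -q --spider http://127.0.0.1:3000/health || exit 1

EXPOSE 3000
CMD ["node", "main.js"]
```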

Kubernetes ignores the HEALTHCHECK instruction; there you define health checks (liveness and readiness probes) in the pod spec instead of the Dockerfile.

Signal Handling: The PID 1 Problem

The first process in a container runs as PID 1. In Linux, PID 1 is treated specially: the kernel does not apply default signal actions to it, so a signal such as SIGTERM is ignored unless the process has installed a handler for it. If your application does not explicitly handle SIGTERM, docker stop will wait for the timeout (default 10 seconds) and then send SIGKILL, which cannot be caught. This means no graceful shutdown: in-flight requests are dropped, database connections are not closed cleanly, and files being written may be left corrupted.

# Problem: shell form runs via /bin/sh, which does not forward signals
CMD python main.py

# Solution: exec form runs the process directly as PID 1
CMD ["python", "main.py"]
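
When a shell script has to run before the application, the same rule applies inside the script: it should end with exec so the application replaces the shell as PID 1. This is a sketch under assumptions; docker-entrypoint.sh is a hypothetical script in the build context, and its contents are shown in the comments.

```dockerfile
FROM python:3.12-slim
WORKDIR /app
COPY . .

# docker-entrypoint.sh (hypothetical) would look like this:
#
#   #!/bin/sh
#   set -e
#   echo "running startup tasks"   # placeholder for real setup work
#   exec "$@"                      # replace the shell with the CMD below
#
COPY docker-entrypoint.sh /usr/local/bin/docker-entrypoint.sh
RUN chmod +x /usr/local/bin/docker-entrypoint.sh

# Exec form for both: the script starts as PID 1, then exec hands PID 1
# over to python, which receives SIGTERM directly
ENTRYPOINT ["docker-entrypoint.sh"]
CMD ["python", "main.py"]
```

Without the exec line, the shell would stay PID 1 and python would run as its child, which brings back the signal-forwarding problem of shell form.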

For more complex startup scripts, use tini as an init process:

FROM python:3.12-slim
RUN apt-get update && \
    apt-get install -y --no-install-recommends tini && \
    rm -rf /var/lib/apt/lists/*
ENTRYPOINT ["tini", "--"]
CMD ["python", "main.py"]

Tini forwards signals properly and reaps zombie processes. Docker can also inject it at run time with docker run --init, with no change to the image. On Alpine-based images, install it with apk add tini.
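
A sketch of the Alpine variant, assuming the same hypothetical main.py. The Alpine package installs the binary under /sbin, so the ENTRYPOINT uses the full path.

```dockerfile
FROM python:3.12-alpine
# The Alpine tini package installs the binary at /sbin/tini
RUN apk add --no-cache tini
WORKDIR /app
COPY . .
# tini runs as PID 1, forwards signals to python, and reaps zombies
ENTRYPOINT ["/sbin/tini", "--"]
CMD ["python", "main.py"]
```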

Reducing Image Size: Practical Techniques

Every megabyte matters. Larger images mean slower CI/CD, slower scaling, higher registry storage costs, and a larger attack surface.

# Technique 1: Use slim or alpine base images
# python:3.12-slim is ~150 MB; python:3.12 is ~1 GB
# (Dockerfile comments must be on their own line, not after an instruction)
FROM python:3.12-slim

# Technique 2: Clean up package manager caches in the same RUN layer
RUN apt-get update && \
    apt-get install -y --no-install-recommends gcc libpq-dev && \
    pip install --no-cache-dir -r requirements.txt && \
    apt-get purge -y gcc && \
    apt-get autoremove -y && \
    rm -rf /var/lib/apt/lists/*

# Technique 3: Use --no-cache-dir with pip
RUN pip install --no-cache-dir -r requirements.txt

# Technique 4: Use npm ci --omit=dev for production Node.js
RUN npm ci --omit=dev

# Check image size
docker images my-app

# Analyze layers and find waste
docker history my-app:v1

Size comparison for a typical Go API server:
  golang:1.22 (no multi-stage):     ~1050 MB
  alpine:3.19 (multi-stage):          ~22 MB
  distroless (multi-stage):            ~18 MB
  scratch (multi-stage):               ~12 MB
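
Technique 4 combines naturally with multi-stage builds. The sketch below is one possible arrangement, assuming a Node.js project whose build script writes dist/main.js: a dedicated stage installs production dependencies only, so the devDependencies needed for the build never reach the final image.

```dockerfile
# Stage 1: production dependencies only
FROM node:20-alpine AS deps
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci --omit=dev

# Stage 2: full install (including devDependencies) and build
FROM node:20-alpine AS build
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
RUN npm run build

# Stage 3: runtime image with build output and production dependencies
FROM node:20-alpine
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY --from=build /app/dist ./dist
COPY package.json ./
EXPOSE 3000
CMD ["node", "dist/main.js"]
```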

A Complete Production Dockerfile

Putting it all together for a Go application:

FROM golang:1.22-alpine AS build
RUN apk --no-cache add ca-certificates
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -ldflags="-s -w" -o /server .

FROM scratch
COPY --from=build /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=build /server /server
EXPOSE 8080
USER 65534
ENTRYPOINT ["/server"]

This image is roughly 10-15 MB, runs as a non-root user (UID 65534, conventionally the nobody user; scratch has no /etc/passwd, so the UID is given numerically), has no shell or package manager (minimal attack surface), and starts in milliseconds.
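
The Dockerfile above hard-codes GOARCH=amd64. If the image also has to run on other architectures, one option is to let BuildKit supply the target platform. This is a sketch, assuming a BuildKit-enabled builder; TARGETOS and TARGETARCH are build arguments that BuildKit sets automatically.

```dockerfile
# Run the compiler on the build machine's native platform
FROM --platform=$BUILDPLATFORM golang:1.22-alpine AS build
RUN apk --no-cache add ca-certificates
# Set automatically by BuildKit for the requested target platform
ARG TARGETOS
ARG TARGETARCH
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# Cross-compile for the target platform instead of a fixed amd64
RUN CGO_ENABLED=0 GOOS=$TARGETOS GOARCH=$TARGETARCH \
    go build -ldflags="-s -w" -o /server .

FROM scratch
COPY --from=build /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=build /server /server
EXPOSE 8080
USER 65534
ENTRYPOINT ["/server"]
```

A build such as docker buildx build --platform linux/arm64 -t my-app . would then produce an arm64 binary without changing the Dockerfile.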

Common Pitfalls

Installing dev dependencies in production images -- Build tools, test frameworks, and linters have no place in production images. Use multi-stage builds.
Using ADD when COPY will do -- ADD has hidden behavior (URL fetching, archive extraction). Use COPY for clarity. Use ADD only when you specifically need its extra features.
Not pinning base image versions -- FROM python:3 can change without warning. Pin to FROM python:3.12.3-slim for reproducible builds.
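Storing secrets in images -- Use RUN --mount=type=secret to expose a secret to a single build step. Do not pass secrets as build arguments: ARG values can be recovered from the image with docker history. Never COPY .env into an image.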
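One RUN per command -- Each RUN creates a layer, and files deleted in a later RUN still take up space in the earlier one. Combine related commands (install, use, clean up) with && so temporary files never reach a layer.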
Ignoring build context size -- A missing .dockerignore means Docker sends your .git directory (potentially hundreds of MB) to the daemon on every build.
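
The RUN --mount=type=secret form mentioned above can be sketched as follows. The example assumes a Node.js project that needs a private registry token stored in an .npmrc file on the build machine; the secret id npmrc is an arbitrary name chosen here.

```dockerfile
# syntax=docker/dockerfile:1
FROM node:20-alpine AS build
WORKDIR /app
COPY package.json package-lock.json ./

# The secret with id "npmrc" is mounted at /root/.npmrc for this RUN step
# only; it is not written to any image layer
RUN --mount=type=secret,id=npmrc,target=/root/.npmrc npm ci

COPY . .
RUN npm run build
```

The secret is supplied at build time, for example with docker build --secret id=npmrc,src=$HOME/.npmrc .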

Key Takeaways

Multi-stage builds are non-negotiable for production images -- they often cut image size by 90% or more (roughly 1050 MB to 22 MB in the Go example)
Run as a non-root user. No exceptions in production.
Order Dockerfile instructions by change frequency: base image and dependencies first, application code last
Use .dockerignore to exclude unnecessary files from the build context
Handle PID 1 signals correctly: use exec form for CMD/ENTRYPOINT or add tini as an init process
Pin base image versions for reproducible builds
The goal is a small, secure, fast-starting image with minimal attack surface