Containers & Docker
What Are Containers?
Containers package an application with all its dependencies into a portable, isolated unit that runs consistently across environments. Unlike virtual machines, containers share the host kernel, making them lightweight and fast to start.
Virtual Machine: Container:
┌─────────────────┐ ┌─────────────────┐
│ Application │ │ Application │
├─────────────────┤ ├─────────────────┤
│ Libraries │ │ Libraries │
├─────────────────┤ ├─────────────────┤
│ Guest OS │ │ (no guest OS) │
├─────────────────┤ └────────┬────────┘
│ Hypervisor │ │
├─────────────────┤ ┌────────┴────────┐
│ Host OS │ │ Host OS │
└─────────────────┘ └─────────────────┘
A VM includes an entire guest operating system (gigabytes). A container shares the host kernel and only packages the application and its direct dependencies (megabytes).
Docker Internals
Docker builds on three Linux kernel features: namespaces for isolation, cgroups for resource limits, and a union filesystem for layered images.
Namespaces
Namespaces give each container its own isolated view of system resources:
| Namespace | Isolates | Effect |
|-----------|----------|--------|
| PID | Process IDs | Container sees only its own processes; PID 1 is the container's entrypoint |
| NET | Network stack | Container gets its own IP address, ports, routing table |
| MNT | Filesystem mounts | Container has its own root filesystem |
| UTS | Hostname | Container can have its own hostname |
| IPC | Inter-process communication | Shared memory and semaphores are isolated |
| USER | User/group IDs | Root inside container can map to non-root on host |
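On Linux, each process exposes its namespace memberships as symbolic links under /proc/<pid>/ns, with targets such as pid:[4026531836]. The sketch below reads those links and compares two processes; two processes share a namespace exactly when the identifiers match. The function names are hypothetical and the directory argument is only there so the code can be pointed at any process.

```python
import os
import re

# Link targets look like "pid:[4026531836]": namespace type, then an identifier.
_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def namespace_ids(ns_dir: str = "/proc/self/ns") -> dict[str, int]:
    """Map each entry in ns_dir (pid, net, mnt, ...) to its namespace identifier."""
    ids: dict[str, int] = {}
    for entry in sorted(os.listdir(ns_dir)):
        target = os.readlink(os.path.join(ns_dir, entry))
        match = _NS_LINK.match(target)
        if match:
            ids[entry] = int(match.group(2))
    return ids


def shared_namespaces(a: dict[str, int], b: dict[str, int]) -> list[str]:
    """Return the namespace types that two processes have in common."""
    return sorted(name for name in a if a[name] == b.get(name))


if __name__ == "__main__":
    # Only meaningful on Linux, where /proc/self/ns exists.
    if os.path.isdir("/proc/self/ns"):
        for name, ident in namespace_ids().items():
            print(f"{name}: {ident}")
```

Run from the host and again from inside a container, the identifiers for pid, net, mnt, and so on would normally differ, which is the isolation described in the table.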
cgroups (Control Groups)
cgroups limit and account for resource usage:
- CPU -- limit a container to N cores or a percentage of CPU time.
- Memory -- set a hard memory limit; if the container exceeds it, the kernel OOM-kills a process inside the container.
- I/O -- throttle disk read/write bandwidth.
- PIDs -- limit the number of processes (prevents fork bombs).
Kubernetes resources.limits are enforced through these cgroup controls. resources.requests mainly guide scheduling, although CPU requests also set the container's relative CPU weight.
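To make the limits concrete, here is a minimal sketch of how human-friendly limits could be translated into cgroup v2 control-file values. It assumes the cgroup v2 file names cpu.max, memory.max, and pids.max, the default 100 ms CPU period, and binary memory units; the function names are hypothetical, and a real container runtime handles many more cases.

```python
CPU_PERIOD_US = 100_000  # assumed default scheduling period: 100 ms in microseconds

_UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}


def parse_memory(limit: str) -> int:
    """Convert a limit such as '512m' or '2g' to bytes (binary units)."""
    text = limit.strip().lower()
    if text and text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)  # already a plain byte count


def cgroup_v2_settings(cpus: float, memory: str, pids: int) -> dict[str, str]:
    """Return the control-file values that would express the given limits."""
    quota_us = round(cpus * CPU_PERIOD_US)  # CPU time allowed per period
    return {
        "cpu.max": f"{quota_us} {CPU_PERIOD_US}",
        "memory.max": str(parse_memory(memory)),
        "pids.max": str(pids),
    }


if __name__ == "__main__":
    settings = cgroup_v2_settings(cpus=1.5, memory="512m", pids=100)
    for name, value in settings.items():
        print(f"{name}: {value}")
```

With these inputs the sketch prints cpu.max: 150000 100000, memory.max: 536870912, and pids.max: 100. In other words, 1.5 cores means 150 ms of CPU time in every 100 ms period.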
Union Filesystem (OverlayFS)
Docker images are built in layers. Each Dockerfile instruction that changes the filesystem (RUN, COPY, ADD) creates a new layer; other instructions only add metadata. Layers are read-only and shared between images.
Layer 4: COPY target/release/myapp (application binary, ~10 MB)
Layer 3: RUN apt-get install ... (runtime deps, ~30 MB)
Layer 2: debian:bookworm-slim (minimal OS, ~80 MB)
Layer 1: (base)
When a container runs, a thin read-write layer is added on top. This is why containers start almost instantly -- they do not copy the image; they overlay a writable layer on shared read-only layers.
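The lookup rule can be illustrated with a toy model. This is not how OverlayFS is implemented; it is a sketch that assumes each layer is a plain dictionary from path to content, and the class name OverlayModel is hypothetical. Reads check the writable layer first and then the read-only layers from top to bottom; writes and deletes touch only the writable layer.

```python
class OverlayModel:
    """Toy overlay mount: shared read-only layers plus one private writable layer."""

    _WHITEOUT = object()  # marks a path as deleted in the writable layer

    def __init__(self, lower_layers: list[dict[str, str]]):
        # Ordered bottom to top. These dictionaries are never modified.
        self.lower = lower_layers
        self.upper: dict[str, object] = {}

    def read(self, path: str) -> str:
        if path in self.upper:
            value = self.upper[path]
            if value is self._WHITEOUT:
                raise FileNotFoundError(path)
            return value
        for layer in reversed(self.lower):  # the topmost layer wins
            if path in layer:
                return layer[path]
        raise FileNotFoundError(path)

    def write(self, path: str, data: str) -> None:
        self.upper[path] = data  # the image layers stay untouched

    def delete(self, path: str) -> None:
        self.read(path)  # raises FileNotFoundError if the path is absent
        self.upper[path] = self._WHITEOUT


if __name__ == "__main__":
    image = [{"/etc/motd": "hello"}, {"/usr/local/bin/myapp": "binary"}]
    first, second = OverlayModel(image), OverlayModel(image)
    first.write("/etc/motd", "changed in first container")
    print(first.read("/etc/motd"))
    print(second.read("/etc/motd"))
```

Both containers reference the same image layers, so starting the second one copies nothing, and the change made in the first is invisible to the second.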
Multi-Stage Dockerfile for Rust
The Rust toolchain is large (~1.5 GB). You do not want it in your production image. Multi-stage builds solve this by using one stage to compile and another to package only the binary.
# ── Stage 1: Build ──────────────────────────────────────────────
FROM rust:1.77 AS builder
WORKDIR /app
# Copy manifests first for dependency caching.
COPY Cargo.toml Cargo.lock ./
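# Build dependencies only, against a dummy main.rs.
# This layer is cached unless Cargo.toml or Cargo.lock change.
RUN mkdir src \
&& echo "fn main() {}" > src/main.rs \
&& cargo build --release \
&& rm -rf src
# Now copy the real source and build.
# COPY can keep the original file timestamps, so touch main.rs to make sure
# Cargo does not treat the dummy build output as up to date.
COPY src ./src
RUN touch src/main.rs && cargo build --release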
# ── Stage 2: Runtime (minimal image) ───────────────────────────
FROM debian:bookworm-slim
RUN apt-get update \
&& apt-get install -y --no-install-recommends ca-certificates \
&& rm -rf /var/lib/apt/lists/*
# Run as non-root user.
RUN useradd --create-home appuser
USER appuser
COPY --from=builder /app/target/release/myapp /usr/local/bin/
EXPOSE 8080
CMD ["myapp"]
Why multi-stage matters
| | Build image | Runtime image |
|---|---|---|
| Size | ~1.5 GB (Rust toolchain + deps) | ~90 MB (binary + minimal OS) |
| Attack surface | Compiler, build tools, source code | Only the binary and CA certs |
| Deploy speed | Slow to pull | Fast to pull |
Dependency caching trick
The "dummy main.rs" step is important. Docker reuses a cached layer only while the instruction, the files it copies, and every layer before it are unchanged. If you copy src/ before building dependencies, any source change invalidates the dependency layer and forces a full rebuild. By building dependencies in a separate, earlier step, they are rebuilt only when Cargo.toml or Cargo.lock changes.
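The effect of instruction order on caching can be modeled in a few lines. This is a simplified sketch, not Docker's actual cache-key algorithm: it assumes each step's key is a hash of the parent key, the instruction text, and the bytes of any copied files. All function names are hypothetical.

```python
import hashlib

Step = tuple[str, bytes]  # (instruction text, bytes of any files the step copies)


def layer_keys(steps: list[Step]) -> list[str]:
    """Return one cache key per step; each key depends on all earlier steps."""
    keys: list[str] = []
    parent = ""
    for instruction, copied in steps:
        hasher = hashlib.sha256()
        hasher.update(parent.encode())
        hasher.update(instruction.encode())
        hasher.update(copied)
        parent = hasher.hexdigest()
        keys.append(parent)
    return keys


def reusable_steps(before: list[str], after: list[str]) -> int:
    """Count the leading steps whose keys are unchanged and can be reused."""
    count = 0
    for old, new in zip(before, after):
        if old != new:
            break
        count += 1
    return count


def manifests_first(manifests: bytes, src: bytes) -> list[Step]:
    return [
        ("COPY Cargo.toml Cargo.lock ./", manifests),
        ("RUN cargo build --release (dummy main.rs)", b""),
        ("COPY src ./src", src),
        ("RUN cargo build --release", b""),
    ]


def copy_everything(manifests: bytes, src: bytes) -> list[Step]:
    return [
        ("COPY . .", manifests + src),
        ("RUN cargo build --release", b""),
    ]


if __name__ == "__main__":
    manifests = b'[dependencies]\nserde = "1"\n'
    for layout in (manifests_first, copy_everything):
        old = layer_keys(layout(manifests, b"fn main() { v1() }"))
        new = layer_keys(layout(manifests, b"fn main() { v2() }"))
        print(f"{layout.__name__}: {reusable_steps(old, new)} of {len(old)} steps reused")
```

After a source-only change, the manifests-first layout reuses 2 of 4 steps, including the expensive dependency build, while the copy-everything layout reuses 0 of 2.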
Image Optimization
Size reduction techniques
- Use slim or distroless base images. debian:bookworm-slim is ~80 MB. gcr.io/distroless/cc-debian12 is ~20 MB. Alpine is ~5 MB but uses musl libc (can cause issues with some Rust crates).
- Multi-stage builds. Covered above. Never ship the compiler.
- Minimize layers. Combine RUN commands where logical to reduce layer count.
- Remove package manager caches. Always add && rm -rf /var/lib/apt/lists/* after apt-get install.
- Use .dockerignore. Exclude target/, .git/, test data, and documentation from the build context.
Example .dockerignore:
target/
.git/
.github/
*.md
tests/
benches/
Static linking for minimal images
For the smallest possible image, compile a fully static binary and use scratch (empty) or distroless:
FROM rust:1.77 AS builder
RUN rustup target add x86_64-unknown-linux-musl
WORKDIR /app
COPY . .
RUN cargo build --release --target x86_64-unknown-linux-musl
FROM scratch
COPY --from=builder /app/target/x86_64-unknown-linux-musl/release/myapp /myapp
EXPOSE 8080
ENTRYPOINT ["/myapp"]
This produces an image that contains only your binary -- typically under 20 MB. The downside: no shell for debugging, no package manager, no CA certificates (bundle them in or use distroless).
Container Security
Principle of least privilege
- Run as non-root. Add USER appuser in your Dockerfile. Never run production containers as root.
- Read-only filesystem. Use the --read-only flag or Kubernetes readOnlyRootFilesystem: true. Write only to explicitly mounted volumes.
- Drop capabilities. Docker grants containers a default set of Linux capabilities. Drop all and add back only what is needed.
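The three rules above map onto docker run flags. The sketch below assembles such a command line so the flags can be reviewed or reused in a deployment script; it only builds the argument list and does not start a container. The helper name hardened_run_args, the appdata volume, and the NET_BIND_SERVICE capability are illustrative assumptions.

```python
import shlex


def hardened_run_args(
    image: str,
    user: str = "appuser",
    add_caps: tuple[str, ...] = (),
    volumes: tuple[str, ...] = (),
) -> list[str]:
    """Build a docker run command that applies the least-privilege rules."""
    args = [
        "docker", "run", "--rm",
        "--user", user,        # run as non-root
        "--read-only",         # read-only root filesystem
        "--cap-drop", "ALL",   # drop every capability ...
    ]
    for cap in add_caps:
        args += ["--cap-add", cap]  # ... and add back only what is needed
    for volume in volumes:
        args += ["--volume", volume]  # the only writable locations
    args.append(image)
    return args


if __name__ == "__main__":
    command = hardened_run_args(
        "myapp:a1b2c3d",
        add_caps=("NET_BIND_SERVICE",),
        volumes=("appdata:/data",),
    )
    print(shlex.join(command))
```

Building the command as a list rather than a string avoids shell-quoting mistakes if it is later passed to subprocess.run.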
Image scanning
Scan images for known vulnerabilities before deploying:
- Trivy (trivy image myapp:latest) -- fast, open-source scanner.
- Grype (grype myapp:latest) -- alternative from Anchore.
- GitHub Dependabot / Container Scanning -- integrated into CI.
Run scans in CI so vulnerable images never reach production:
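# Fail the job when CRITICAL or HIGH vulnerabilities are found.
# In a real pipeline, pin the action to a release tag or commit SHA rather than master.
- name: Scan image
  uses: aquasecurity/trivy-action@master
  with:
    image-ref: myapp:${{ github.sha }}
    severity: CRITICAL,HIGH
    exit-code: 1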
Image signing
Sign images to verify they came from your CI pipeline and were not tampered with:
- cosign (from Sigstore) signs and verifies container images.
- Kubernetes admission controllers (e.g., Kyverno, OPA Gatekeeper) can enforce that only signed images are deployed.
Base image hygiene
- Pin base image versions: FROM debian:bookworm-slim@sha256:abc123... not FROM debian:latest.
- Rebuild images regularly to pick up security patches in base layers.
- Use docker scout or trivy to monitor base image CVEs.
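Pinning is easy to check mechanically. The following sketch scans Dockerfile text for FROM lines whose image reference carries no digest, skipping scratch and references to earlier build stages. The function name unpinned_base_images is hypothetical, and the check only looks for the presence of @sha256:; it does not verify that the digest exists.

```python
import re

# Matches "FROM [--platform=...] <image>" at the start of a line.
_FROM = re.compile(r"^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)", re.IGNORECASE)
_ALIAS = re.compile(r"\sAS\s+(\S+)", re.IGNORECASE)


def unpinned_base_images(dockerfile: str) -> list[str]:
    """Return base image references that are not pinned by digest."""
    stages: set[str] = set()  # names of earlier build stages
    unpinned: list[str] = []
    for line in dockerfile.splitlines():
        match = _FROM.match(line)
        if not match:
            continue
        ref = match.group(1)
        is_stage = ref in stages
        if ref.lower() != "scratch" and not is_stage and "@sha256:" not in ref:
            unpinned.append(ref)
        alias = _ALIAS.search(line)
        if alias:
            stages.add(alias.group(1))
    return unpinned


if __name__ == "__main__":
    sample = (
        "FROM rust:1.77 AS builder\n"
        "RUN cargo build --release\n"
        "FROM debian:bookworm-slim@sha256:abc123...\n"
        "COPY --from=builder /app/target/release/myapp /usr/local/bin/\n"
    )
    for ref in unpinned_base_images(sample):
        print(f"not pinned by digest: {ref}")
```

A check like this could run in CI next to the vulnerability scan, so an unpinned base image fails the build before it is pushed.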
Common Mistakes
- Using latest tag. latest is mutable -- you never know what version you are running. Tag with the Git SHA: myapp:a1b2c3d.
- Running as root. Default Docker containers run as root. A container escape with root privileges compromises the host.
- Large images. Shipping the entire build toolchain. Fix: multi-stage builds.
- No .dockerignore. Sending the entire .git/ directory and target/ folder to the Docker daemon. Slows builds and leaks information.
- "It works on my machine" Docker. Dockerfile depends on host state (cached layers, local files not in the build context). Fix: build from a clean CI environment.
Key Takeaways
- Containers solve "works on my machine" by packaging the application with its dependencies.
- Docker uses namespaces (isolation), cgroups (resource limits), and union filesystems (efficient layering).
- Multi-stage builds are essential for Rust: build in a full toolchain image, run in a minimal runtime image.
- Image size matters: smaller images deploy faster and have less attack surface.
- Security is not optional: run as non-root, scan for vulnerabilities, sign images, pin base versions.