Docker Fundamentals

Containers solve the "works on my machine" problem. A container packages an application with everything it needs to run -- code, runtime, libraries, system tools -- into a single portable unit. If it runs in a container on your laptop, it runs the same way in CI, staging, and production, provided those machines share the same CPU architecture (an image built only for arm64 needs emulation on an amd64 host). Docker is the tool that made containers practical for everyday development.

Containers Are Not VMs

A virtual machine runs a full operating system on top of a hypervisor. A container shares the host operating system's kernel and isolates only the application's processes and filesystem. This distinction matters for performance and resource usage.

Virtual Machine:                     Container:
+---------------------------+        +----------------------------+
| App A      | App B        |        | App A       | App B        |
| Bins/Libs  | Bins/Libs    |        | Bins/Libs   | Bins/Libs    |
| Guest OS   | Guest OS     |        +----------------------------+
+---------------------------+        | Container Runtime (Docker) |
| Hypervisor                |        +----------------------------+
+---------------------------+        | Host OS Kernel             |
| Host OS Kernel            |        +----------------------------+
+---------------------------+        | Hardware                   |
| Hardware                  |        +----------------------------+
+---------------------------+

VM (typical):        gigabytes per instance, tens of seconds to minutes to boot
Container (typical): tens to hundreds of MB per instance, usually under a second to start

VMs provide stronger isolation (separate kernels). Containers provide faster startup, lower overhead, and higher density. For most application workloads, containers are the right choice. For strong multi-tenant isolation, or for workloads that need a different kernel or operating system than the host, VMs are still necessary.

Core Concepts

Images

An image is a read-only template that contains everything needed to run an application. It is built from a Dockerfile and usually stored in a registry. Images are identified by a repository name plus either a tag or an immutable content digest (repository@sha256:<hash>).

# Image naming convention
registry/repository:tag

# Examples
docker.io/library/python:3.12-slim
ghcr.io/myorg/api-server:v1.2.3
my-registry.example.com/backend:latest

The latest tag is a convention, not a guarantee. It is simply the tag Docker applies when you do not specify one, so it points to whatever image was last pushed as latest, which is not necessarily the newest version. In production, always use specific tags or digests.
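The naming convention above can be made concrete with a small parser. This is a minimal Python sketch of Docker's reference normalization as commonly described: a first path component containing a dot or colon (or equal to localhost) is treated as a registry, otherwise docker.io is assumed; single-name Docker Hub images get the library/ prefix; and latest is assumed only when neither a tag nor a digest is given. ImageRef and parse_image_ref are hypothetical names, and the sketch does not validate names the way Docker does.

```python
from typing import NamedTuple, Optional

# Simplified sketch; ImageRef and parse_image_ref are hypothetical names.
DEFAULT_REGISTRY = "docker.io"


class ImageRef(NamedTuple):
    registry: str
    repository: str
    tag: Optional[str]
    digest: Optional[str]


def parse_image_ref(ref: str) -> ImageRef:
    """Split an image reference into registry, repository, tag, and digest."""
    # A digest, if present, follows '@' (name@sha256:<hash>).
    name, _, digest = ref.partition("@")

    # A tag follows the last ':' that appears after the last '/'.
    # A ':' before that slash is a registry port (localhost:5000/app).
    tag = None
    colon = name.rfind(":")
    if colon > name.rfind("/"):
        name, tag = name[:colon], name[colon + 1:]

    # The first component is a registry only if it looks like a hostname.
    first, sep, rest = name.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, repository = first, rest
    else:
        registry, repository = DEFAULT_REGISTRY, name

    # Official Docker Hub images live under the 'library/' namespace.
    if registry == DEFAULT_REGISTRY and "/" not in repository:
        repository = "library/" + repository

    # 'latest' is assumed only when no tag and no digest were given.
    if tag is None and not digest:
        tag = "latest"
    return ImageRef(registry, repository, tag, digest or None)
```

For example, parse_image_ref("python:3.12-slim") returns ImageRef(registry='docker.io', repository='library/python', tag='3.12-slim', digest=None), which is why python:3.12-slim and docker.io/library/python:3.12-slim name the same image.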

Containers

A container is a running instance of an image. You can run multiple containers from the same image. Each container gets its own writable filesystem layer on top of the shared image, plus its own network interface and process space, but shares the host kernel.

# Run a container from an image
docker run -d --name my-api -p 8080:8080 api-server:v1.2.3

# List running containers
docker ps

# View logs
docker logs my-api

# Execute a command inside a running container
docker exec -it my-api /bin/sh

# Stop and remove
docker stop my-api
docker rm my-api

Registries

A registry stores and distributes images. Docker Hub is the default public registry. For private images, use a cloud provider's registry or self-host one.

# Push an image to a registry (authenticate first with docker login ghcr.io)
docker tag api-server:v1.2.3 ghcr.io/myorg/api-server:v1.2.3
docker push ghcr.io/myorg/api-server:v1.2.3

# Pull an image from a registry
docker pull ghcr.io/myorg/api-server:v1.2.3

The Dockerfile

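A Dockerfile is a text file with instructions for building an image. Instructions run in order; RUN, COPY, and ADD each add a filesystem layer, while instructions such as EXPOSE and CMD only record image metadata.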

# Start from a base image
FROM python:3.12-slim

# Set the working directory inside the container
WORKDIR /app

# Copy dependency file first (for cache efficiency)
COPY requirements.txt .

# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Expose the port the app listens on
EXPOSE 8000

# Define the command to run the application
CMD ["python", "-m", "uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
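The CMD above expects a module named main that exposes an object named app; uvicorn imports it from the "main:app" module:attribute string. As an illustration only, here is a hypothetical main.py written as a bare ASGI callable so it needs no web framework. It assumes requirements.txt lists uvicorn; a framework such as FastAPI would normally provide app instead.

```python
# main.py (hypothetical). uvicorn's "main:app" means: import the module
# `main` and serve its attribute `app`.


async def app(scope, receive, send):
    """Bare ASGI app: answer every HTTP request with a plain-text 200."""
    if scope["type"] != "http":
        return  # this sketch ignores lifespan and websocket events
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain; charset=utf-8")],
    })
    await send({
        "type": "http.response.body",
        "body": b"hello from the container\n",
    })
```

Because the CMD binds to 0.0.0.0, the server accepts connections arriving on the container's network interface, so with docker run -p 8080:8000 a request to port 8080 on the host reaches it.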

Key Instructions

FROM sets the base image. Every Dockerfile begins with FROM (only ARG instructions, comments, and parser directives may precede it). Choose the smallest base that has what you need. The sizes below are approximate uncompressed sizes and vary by release.

FROM python:3.12-slim    # Debian-based, ~150MB, Python and pip on a minimal Debian
FROM python:3.12-alpine  # Alpine-based (musl libc), ~50MB, but C extensions can cause issues
FROM ubuntu:22.04        # Ubuntu base, ~77MB, no Python included; use when you need Ubuntu packages
FROM scratch             # Empty image, 0MB, only for statically linked binaries

COPY copies files from the build context (the directory you pass to docker build) into the image. ADD does the same but can also fetch remote URLs and automatically extracts local tar archives. Prefer COPY -- it is simpler and more explicit.

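RUN executes a command during the build, and each RUN creates a new layer. Combine related commands into one RUN: a file deleted in a later layer still takes up space in the layer that created it, so cleanup only shrinks the image when it happens in the same RUN. Keeping apt-get update and apt-get install together also stops Docker from reusing a stale, cached package index.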

# Bad: three layers for one logical operation
RUN apt-get update
RUN apt-get install -y curl
RUN rm -rf /var/lib/apt/lists/*

# Good: one layer
RUN apt-get update && \
    apt-get install -y curl && \
    rm -rf /var/lib/apt/lists/*
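Why cleanup has to share a RUN with the install can be seen in a toy model of layered filesystems. In this Python sketch (not Docker's actual storage driver), each layer keeps every byte it added, and a deletion in a later layer only hides the file from the merged view the container sees. The file paths and sizes are made up for illustration.

```python
from typing import Dict, List, Set, Tuple

# Toy layer: files it adds (path -> size in bytes) and paths it deletes.
Layer = Tuple[Dict[str, int], Set[str]]


def stored_bytes(layers: List[Layer]) -> int:
    """Bytes the image carries: each layer keeps everything it added."""
    return sum(sum(added.values()) for added, _ in layers)


def visible_files(layers: List[Layer]) -> Dict[str, int]:
    """Merged view a container sees: apply additions, then deletions, layer by layer."""
    fs: Dict[str, int] = {}
    for added, deleted in layers:
        fs.update(added)
        for path in deleted:
            # Hides the file in the merged view; the earlier layer is unchanged.
            fs.pop(path, None)
    return fs


# Hypothetical sizes for the apt package index and the curl binary.
APT_LISTS = {"/var/lib/apt/lists/index": 40_000_000}
CURL = {"/usr/bin/curl": 500_000}

separate_runs = [
    (APT_LISTS, set()),     # RUN apt-get update
    (CURL, set()),          # RUN apt-get install -y curl
    ({}, set(APT_LISTS)),   # RUN rm -rf /var/lib/apt/lists/*
]
single_run = [
    (CURL, set()),          # one RUN: update, install, clean up; only curl remains
]

print(visible_files(separate_runs) == visible_files(single_run))  # True
print(stored_bytes(separate_runs), stored_bytes(single_run))      # 40500000 500000
```

Both versions expose the same files, but the split version still ships the package index inside its first layer.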

CMD sets the default command when a container starts. Only one CMD takes effect: if a Dockerfile contains several, the last one wins. Any arguments you pass after the image name in docker run replace CMD.

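ENTRYPOINT sets the executable that runs when the container starts. With the exec (JSON array) form, CMD supplies default arguments that are appended to it; arguments given to docker run replace CMD but keep ENTRYPOINT (use --entrypoint to override it).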

# ENTRYPOINT + CMD pattern
ENTRYPOINT ["python", "-m", "uvicorn"]
CMD ["main:app", "--host", "0.0.0.0", "--port", "8000"]

# Running the container:
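docker run my-app                                         # runs: python -m uvicorn main:app --host 0.0.0.0 --port 8000
docker run my-app other:app --host 0.0.0.0 --port 9000    # runs: python -m uvicorn other:app --host 0.0.0.0 --port 9000
# Arguments replace all of CMD, so --host 0.0.0.0 must be repeated; without it,
# uvicorn binds to 127.0.0.1 and is unreachable from outside the container.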
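The override rules can be summarized as a small function. This Python sketch models only the exec (JSON array) form; resolve_command is a hypothetical name, and --entrypoint overrides are not modeled. With a shell-form ENTRYPOINT, Docker ignores CMD and run arguments entirely, which this sketch also does not cover.

```python
from typing import List, Optional


def resolve_command(entrypoint: List[str], cmd: List[str],
                    run_args: Optional[List[str]] = None) -> List[str]:
    """Hypothetical helper: the argv a container starts with (exec form only).

    Arguments given after the image name in `docker run` replace CMD
    entirely; ENTRYPOINT is kept (--entrypoint is not modeled).
    """
    args = run_args if run_args else cmd
    return entrypoint + args


ENTRYPOINT = ["python", "-m", "uvicorn"]
CMD = ["main:app", "--host", "0.0.0.0", "--port", "8000"]

print(" ".join(resolve_command(ENTRYPOINT, CMD)))
# python -m uvicorn main:app --host 0.0.0.0 --port 8000
print(" ".join(resolve_command(ENTRYPOINT, CMD, ["other:app", "--port", "9000"])))
# python -m uvicorn other:app --port 9000
```

The second call shows why run arguments need care: --host 0.0.0.0 disappears because the whole CMD list is replaced, not merged.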

Layers & Caching

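Docker images are built in layers. Each Dockerfile instruction is a build step, and Docker caches the result of every step. A cached step is reused only if its instruction is unchanged, all preceding steps were cache hits, and, for COPY and ADD, the copied files have the same contents.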

Step 1: FROM python:3.12-slim         (cached from base image)
Step 2: COPY requirements.txt .       (cached if requirements.txt hasn't changed)
Step 3: RUN pip install ...           (cached if Step 2 is cached)
Step 4: COPY . .                      (invalidated if any file in the build context changed)
Step 5: CMD [...]                     (invalidated because Step 4 was invalidated)

This is why you copy dependency files before source code. Dependencies change rarely; source code changes constantly. If you copied everything at once, changing a single line of code would reinstall all dependencies.

# Good: dependencies cached separately
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .

# Bad: any code change invalidates the dependency install
COPY . .
RUN pip install -r requirements.txt
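The caching rule can be modeled in a few lines. This Python sketch chains one hash per step from the previous step's key, the instruction text, and the contents of any copied files, which approximates how Docker decides whether to reuse a step; the real builder's cache keys include more inputs than this. The function names and file contents are hypothetical.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

# Simplified model. A build step: (instruction text, build-context files it copies).
Step = Tuple[str, List[str]]


def cache_keys(steps: List[Step], context: Dict[str, str]) -> List[str]:
    """Chain one key per step: parent key + instruction + copied file contents."""
    keys: List[str] = []
    parent = ""
    for instruction, copied in steps:
        h = hashlib.sha256()
        h.update(parent.encode() + b"\0" + instruction.encode())
        for path in sorted(copied):
            h.update(b"\0" + path.encode() + b"\0" + context[path].encode())
        parent = h.hexdigest()
        keys.append(parent)
    return keys


def deps_first(context: Dict[str, str]) -> List[Step]:
    return [
        ("FROM python:3.12-slim", []),
        ("COPY requirements.txt .", ["requirements.txt"]),
        ("RUN pip install -r requirements.txt", []),
        ("COPY . .", list(context)),
    ]


def all_at_once(context: Dict[str, str]) -> List[Step]:
    return [
        ("FROM python:3.12-slim", []),
        ("COPY . .", list(context)),
        ("RUN pip install -r requirements.txt", []),
    ]


def rerun_steps(plan: Callable[[Dict[str, str]], List[Step]],
                old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """Instructions whose cache keys differ between two versions of the context."""
    steps = plan(new)
    before, after = cache_keys(plan(old), old), cache_keys(steps, new)
    return [instr for (instr, _), a, b in zip(steps, before, after) if a != b]


# Hypothetical build contexts: only main.py changes between v1 and v2.
v1 = {"requirements.txt": "uvicorn\n", "main.py": "print('v1')\n"}
v2 = {**v1, "main.py": "print('v2')\n"}

print(rerun_steps(deps_first, v1, v2))   # ['COPY . .']
print(rerun_steps(all_at_once, v1, v2))  # ['COPY . .', 'RUN pip install -r requirements.txt']
```

Because each key includes its parent's key, one changed step invalidates every step after it; ordering decides how much work that costs.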

Building & Running

# Build an image from a Dockerfile in the current directory
docker build -t my-app:v1 .

# Build with a specific Dockerfile
docker build -t my-app:v1 -f Dockerfile.prod .

# Run interactively
docker run -it my-app:v1 /bin/sh

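# Run in the background with port mapping (host 8080 -> container 8000) and
# environment variables. Inside a container, localhost is the container itself,
# so DATABASE_URL must name a host the container can reach (db.example.com is a placeholder).
docker run -d \
  --name my-app \
  -p 8080:8000 \
  -e DATABASE_URL="postgres://db.example.com:5432/mydb" \
  my-app:v1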

# Mount a local directory for development
docker run -d \
  --name my-app \
  -p 8080:8000 \
  -v "$(pwd)/src:/app/src" \
  my-app:v1
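On the application side, -e simply sets an environment variable for the container's process. Here is a minimal Python sketch of reading it; database_url is a hypothetical helper. Failing fast when the variable is missing is a design choice: a silent fallback to localhost would point at the container itself, not at a database on the host.

```python
import os


def database_url() -> str:
    """Hypothetical helper: DATABASE_URL as passed with `docker run -e`."""
    url = os.environ.get("DATABASE_URL")
    if not url:
        # No silent default: "localhost" inside a container is the container itself.
        raise RuntimeError("DATABASE_URL is not set; pass it with docker run -e DATABASE_URL=...")
    return url
```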

Why Containers Matter

Before containers, deployment was brittle. Applications depended on specific versions of libraries, language runtimes, and system packages installed on the host. Deploying to a new server meant replicating the exact environment -- manually. Configuration drift was inevitable.

Without containers:
  Developer: "It works on my machine."
  Ops: "Your machine has Python 3.11. Production has Python 3.9."
  Developer: "Also I have libssl 1.1. What does production have?"
  Ops: "Let me check... 1.0."
  Developer: "That explains the segfault."

With containers:
  Developer: "Here's the image. It has everything baked in."
  Ops: "It runs. Same as it did on your machine."

Containers standardize the unit of deployment. The image is the artifact. It runs the same everywhere. This unlocked everything else in modern DevOps: reproducible CI builds, immutable deployments, easy scaling, and consistent environments from development through production.

Common Pitfalls

Using latest in production -- The latest tag is mutable: a docker pull today may return a different image than the same pull yesterday. Always pin to a specific tag or digest in production.
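Running as root -- By default, processes in a container run as root, and without user namespace remapping that is the same UID 0 as root on the host. If an attacker escapes the container, they may have root on the host. Create a non-root user and switch to it with the USER instruction.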
Ignoring image size -- A 1.5 GB image takes longer to build, push, pull, and start. Use slim base images, multi-stage builds, and clean up after package installs.
Not using .dockerignore -- Without a .dockerignore, docker build sends the entire build context (including node_modules, .git, and test data) to the Docker daemon. This slows builds, and a broad COPY . . can bake sensitive files such as .env files or credentials into the image.
Treating containers like VMs -- Shelling into a running container (via docker exec or SSH) to change it defeats the purpose: the change is lost when the container is replaced. Containers should be immutable. Fix the Dockerfile, rebuild, redeploy.
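Ignoring the PID 1 problem -- The first process in a container runs as PID 1, and the kernel applies no default action to signals sent to PID 1, so SIGTERM is ignored unless the process installs a handler. docker stop sends SIGTERM, waits 10 seconds by default, then sends SIGKILL, so an app that ignores SIGTERM is killed ungracefully. Use the exec form of CMD or ENTRYPOINT so your app (not a shell) is PID 1, handle SIGTERM, or run with docker run --init.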
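Handling SIGTERM in the application is the most direct fix. A minimal Python sketch, assuming a hypothetical worker.py started as PID 1 through an exec-form CMD such as CMD ["python", "worker.py"]: the handler only sets a flag, and the main loop notices it, cleans up, and exits before Docker's timeout escalates to SIGKILL. Python installs no SIGTERM handler of its own, so without this the process would ignore SIGTERM as PID 1.

```python
import signal
import sys
import threading

# Hypothetical worker.py, run as PID 1 via an exec-form CMD.
# Set by the signal handler; the main loop checks it between units of work.
shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Record the shutdown request; keep signal handlers short."""
    shutdown_requested.set()


def main() -> None:
    # As PID 1, the process ignores SIGTERM unless it installs a handler.
    signal.signal(signal.SIGTERM, handle_sigterm)
    # wait() returns False on timeout, so this loops until SIGTERM arrives.
    while not shutdown_requested.wait(timeout=1.0):
        pass  # placeholder for real work, e.g. processing one job per iteration
    # Close connections, flush buffers, finish in-flight work, then exit 0.
    print("SIGTERM received, shutting down cleanly")
    sys.exit(0)
```

Call main() from the script's entry point; docker stop then produces a clean exit instead of a SIGKILL after the grace period.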

Key Takeaways

Containers package an application with all its dependencies into a portable, reproducible unit
Containers share the host kernel and are much lighter than VMs -- they typically start in under a second and use megabytes, not gigabytes
A Dockerfile defines how to build an image; an image is a read-only template; a container is a running instance of an image
Layer caching is critical for fast builds -- order your Dockerfile so that infrequently changing layers come first
Always use specific image tags in production, run as a non-root user, and keep images small
Containers are the foundation of modern deployment: they standardize the artifact and eliminate environment drift