
Version Control

Version control systems (VCS) track changes to code over time, enabling collaboration, history, and rollback.

Git Internals

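Git is a content-addressable filesystem: every object is stored under, and identified by, the SHA-1 hash of its contents plus a short type-and-size header. (Git can also be configured to use SHA-256.)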
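To make "content-addressable" concrete, here is a minimal Python sketch of how an object's ID is derived. Git hashes a header of the form `<type> <size>` plus a NUL byte, followed by the raw content. It assumes the default SHA-1 object format; `git_object_id` and `blob_id` are illustrative helper names, not a Git API.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """SHA-1 ID of a Git object: hash of '<type> <size>\\0' followed by the content."""
    header = f"{obj_type} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


def blob_id(content: bytes) -> str:
    # A blob is just file contents: no filename, no permissions.
    return git_object_id("blob", content)


print(blob_id(b""))                # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (empty file)
print(blob_id(b"test content\n"))  # d670460b4b4aece5915caf5c68d12f560a9fe3e4
```

These are the IDs `git hash-object` reports for an empty file and for the output of `echo 'test content'`. Identical content always gets the same ID, which is why a blob needs no filename.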

Object Types

Blob: File contents (no filename, just data).

Tree: Directory listing. Maps filenames to blob/tree hashes.

Commit: Snapshot of the project. Points to a tree + parent commit(s) + metadata (author, message, timestamp).

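Tag: An annotated tag object names another object (usually a commit) and stores the tagger, a message, and an optional signature. Lightweight tags are plain refs, not objects.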

commit abc123
├── tree def456  → project root
│   ├── blob bbb222  → "Cargo.toml"
│   └── tree ccc333  → "src/"
│       ├── blob aaa111  → "main.rs"
│       ├── blob ddd444  → "lib.rs"
│       └── blob eee555  → "utils.rs"
├── parent: commit 789abc
├── author: Alice <alice@example.com>
└── message: "Add user authentication"
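The point that names live in trees, not blobs, can be checked with a sketch of tree serialization. Assumptions: the SHA-1 object format, only regular files (mode `100644`) and subdirectories (mode `40000`), and illustrative helper names. Each entry is `<mode> <name>`, a NUL byte, then the 20-byte binary ID of the child; Git sorts entries by name, comparing a subdirectory's name as if it ended in `/`.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    header = f"{obj_type} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


def tree_id(entries: list[tuple[str, str, str]]) -> str:
    """entries: (mode, name, hex object ID); '100644' = file, '40000' = subdirectory."""
    def sort_key(entry: tuple[str, str, str]) -> bytes:
        mode, name, _ = entry
        # Git compares a subdirectory's name as if it ended with '/'
        return (name + "/" if mode == "40000" else name).encode()

    body = b"".join(
        f"{mode} {name}".encode() + b"\0" + bytes.fromhex(oid)  # binary, not hex, ID
        for mode, name, oid in sorted(entries, key=sort_key)
    )
    return git_object_id("tree", body)


# Mirror the example: Cargo.toml at the root, main.rs inside src/
main_rs = git_object_id("blob", b"fn main() {}\n")
src_tree = tree_id([("100644", "main.rs", main_rs)])
root_tree = tree_id([
    ("100644", "Cargo.toml", git_object_id("blob", b"[package]\n")),
    ("40000", "src", src_tree),
])
```

Renaming `main.rs` leaves its blob ID unchanged but produces a new `src/` tree ID, which in turn changes the root tree and therefore any commit that points at it.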

Refs and HEAD

Refs: Named pointers to commits. Branches are refs: refs/heads/main → commit hash.

HEAD: Points to the current branch (or directly to a commit in detached HEAD state).

HEAD → refs/heads/main → commit abc123
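Resolving `HEAD` with the default files-based ref storage can be sketched as follows. `.git/HEAD` holds either `ref: refs/heads/<branch>` or, when detached, a commit hash; the branch is looked up as a loose file under `.git/refs/`, falling back to `.git/packed-refs`. This is a simplified sketch: it ignores nested symbolic refs and the newer reftable backend, and `read_ref`/`resolve_head` are hypothetical helper names.

```python
from pathlib import Path


def read_ref(git_dir: Path, refname: str) -> str:
    """Look up a ref such as 'refs/heads/main' (files backend)."""
    loose = git_dir / refname
    if loose.is_file():
        return loose.read_text().strip()
    packed = git_dir / "packed-refs"
    if packed.is_file():
        for line in packed.read_text().splitlines():
            if line.startswith(("#", "^")):
                continue  # header comment, or peeled target of an annotated tag
            oid, _, name = line.partition(" ")
            if name == refname:
                return oid
    raise KeyError(f"unknown ref {refname}")


def resolve_head(git_dir: Path) -> str:
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        # Normal case: HEAD -> refs/heads/<branch> -> commit
        return read_ref(git_dir, head[len("ref: "):])
    # Detached HEAD: the file holds a commit hash directly
    return head
```

In the common cases, `resolve_head(Path(".git"))` run at the top of a checkout should agree with `git rev-parse HEAD`.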

The Index (Staging Area)

The index sits between the working directory and the repository. It holds the next commit's snapshot.

Working Directory → (git add) → Index/Staging → (git commit) → Repository

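git add: Store the file's current contents in the object database as a blob and record that blob in the index. git commit: Create a commit from the index contents (unstaged working-directory changes are not included).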
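A toy model of the three areas makes the flow concrete. It is a sketch only: the dictionaries stand in for files on disk, `.git/objects`, and `.git/index`, and the `ToyRepo` class is hypothetical. It shows one consequence of staging: an edit made after `git add` is not part of the next commit unless the file is added again.

```python
import hashlib


def blob_id(data: bytes) -> str:
    # Same object-ID rule Git uses for blobs (SHA-1 format)
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


class ToyRepo:
    def __init__(self) -> None:
        self.working: dict[str, bytes] = {}  # working directory: path -> contents
        self.objects: dict[str, bytes] = {}  # object database: blob ID -> contents
        self.index: dict[str, str] = {}      # staging area: path -> blob ID
        self.commits: list[dict[str, str]] = []

    def add(self, path: str) -> None:
        """git add: store the current contents as a blob and stage its ID."""
        data = self.working[path]
        oid = blob_id(data)
        self.objects[oid] = data
        self.index[path] = oid

    def commit(self) -> dict[str, str]:
        """git commit: snapshot the index, ignoring the working directory."""
        snapshot = dict(self.index)
        self.commits.append(snapshot)
        return snapshot


repo = ToyRepo()
repo.working["main.rs"] = b"fn main() {}\n"
repo.add("main.rs")
repo.working["main.rs"] = b"fn main() { todo!() }\n"  # edited after git add
snapshot = repo.commit()
```

Here the commit records the first version of `main.rs`; the second version only reaches a commit after another `add`.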

Branching Strategies

Git Flow

main ─────────────────────────────────────────→ releases
  ├── hotfix/critical ──→ (merge to main + develop)
  └── develop ──────────────────────────────→ integration
         ├── feature/auth ────→ (merge back)
         ├── feature/search ──→ (merge back)
         └── release/1.0 ────→ (merge to main + develop)

Branches: main (production), develop (integration), feature/* (new features), release/* (release prep), hotfix/* (urgent fixes).

Pros: Well-defined process. Clear release management. Good for versioned software.

Cons: Complex. Many branches to manage. Slow for continuous deployment.

GitHub Flow

main ──────────────────────────────→ always deployable
  ├── feature-branch → PR → review → merge
  ├── bugfix-branch  → PR → review → merge
  └── experiment     → PR → review → merge

Simple: Branch from main, make changes, open PR, review, merge to main, deploy.

Pros: Simple. Fast. Good for continuous deployment. Low overhead.

Cons: Less structured release process. Main must always be deployable.

Trunk-Based Development

All developers commit to main (trunk) frequently (at least daily). Feature branches, if used at all, are short-lived (a day or two at most).

main ──●──●──●──●──●──●──●──●──●──●──●──→
        └──branch──┘  └─branch─┘
        (< 1 day)      (< 1 day)

Feature flags hide incomplete features in production. Deploy continuously.
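A feature flag can be as small as a conditional around the new code path. The sketch below assumes a hypothetical in-process flag table with a percentage rollout; production systems usually load flags from a config service. Users are bucketed with `zlib.crc32` rather than Python's `hash()`, because string hashing is randomized per process and would not give each user a stable experience.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Flag:
    enabled: bool = False
    rollout_percent: int = 0  # share of users (0-100) who get the feature


# Hypothetical flag table: the new search is merged to main but mostly dark
FLAGS = {"new_search": Flag(enabled=True, rollout_percent=10)}


def is_enabled(flag: Flag, user_id: str) -> bool:
    if not flag.enabled:
        return False
    bucket = zlib.crc32(user_id.encode()) % 100  # stable bucket 0..99 per user
    return bucket < flag.rollout_percent


def search(query: str, user_id: str) -> str:
    if is_enabled(FLAGS["new_search"], user_id):
        return f"new:{query}"      # incomplete feature, hidden from most users
    return f"legacy:{query}"
```

Turning the flag off is a config change rather than a revert, which is what lets unfinished work merge to trunk daily.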

Pros: Minimal merge conflicts. Always integrated. Fast feedback. Used by Google, Facebook, Microsoft.

Cons: Requires feature flags. Requires strong CI. Discipline to keep main green.

Key Git Operations

Merge vs Rebase

Merge: Create a merge commit combining two branches. Preserves history.

main:    A ── B ── C ──── M (merge commit)
feature: A ── B ── D ── E ↗

Rebase: Replay feature commits on top of main. Linear history.

Before:  main: A-B-C    feature: A-B-D-E
After rebase:  main: A-B-C    feature: A-B-C-D'-E'

Rebase advantages: Clean, linear history. Easy to read.

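Rebase dangers: Rewrites history (replayed commits get new hashes). Never rebase shared/published branches. Updating an already-pushed branch after a rebase requires a force push, which can overwrite others' work; git push --force-with-lease is safer because it refuses if the remote branch has moved since you last fetched.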

Guideline: Rebase local, unpublished branches. Merge for shared branches and PRs.
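The merge and rebase diagrams can be reproduced with a toy commit graph. In this sketch, commits carry only a message and parent IDs, and a commit's ID is a hash of exactly those (a simplification: real Git hashes the tree, parents, author, committer, and message). Merging adds one commit with two parents, while rebasing creates new commits D' and E' because their parents change. The original D and E still exist afterwards, which is what makes recovery through the reflog possible.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    message: str
    parents: tuple[str, ...] = ()

    @property
    def id(self) -> str:
        # Toy ID: hash of parent IDs + message only
        payload = ",".join(self.parents) + "|" + self.message
        return hashlib.sha1(payload.encode()).hexdigest()[:7]


def commit(repo: dict[str, Commit], message: str, *parents: str) -> str:
    c = Commit(message, parents)
    repo[c.id] = c
    return c.id


def ancestors(repo: dict[str, Commit], tip: str) -> set[str]:
    """Every commit reachable from tip, including tip itself."""
    seen, stack = set(), [tip]
    while stack:
        cid = stack.pop()
        if cid not in seen:
            seen.add(cid)
            stack.extend(repo[cid].parents)
    return seen


def first_parent_chain(repo: dict[str, Commit], tip: str) -> list[str]:
    """Commits from the root to tip, oldest first, following first parents."""
    chain = [tip]
    while repo[chain[-1]].parents:
        chain.append(repo[chain[-1]].parents[0])
    return chain[::-1]


def merge(repo: dict[str, Commit], main_tip: str, feature_tip: str) -> str:
    # One new commit with two parents; no existing commit changes.
    return commit(repo, "Merge feature", main_tip, feature_tip)


def rebase(repo: dict[str, Commit], feature_tip: str, onto: str) -> str:
    # Feature commits not already on the target, oldest first
    # (assumes the feature branch itself contains no merge commits).
    on_target = ancestors(repo, onto)
    to_replay = [c for c in first_parent_chain(repo, feature_tip) if c not in on_target]
    new_tip = onto
    for cid in to_replay:
        # Same change, new parent, therefore a new ID: D becomes D'.
        new_tip = commit(repo, repo[cid].message, new_tip)
    return new_tip


# The diagrams' history: A-B-C on main, D-E on feature (branched at B)
repo: dict[str, Commit] = {}
a = commit(repo, "A")
b = commit(repo, "B", a)
c = commit(repo, "C", b)
d = commit(repo, "D", b)
e = commit(repo, "E", d)
```

Calling `rebase(repo, e, onto=c)` yields a chain whose messages read A, B, C, D, E, while `merge(repo, c, e)` leaves D and E in place and adds a single commit whose parents are C and E.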

Cherry-Pick

Apply a specific commit from one branch to another.

git cherry-pick abc123   # apply commit abc123 to current branch

Use case: Backport a bugfix from main to a release branch.
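Conceptually, cherry-pick takes the change a commit made relative to its parent and applies it to the current branch. The sketch below is a deliberate simplification: snapshots are hypothetical `path -> contents` dictionaries and changes are compared per whole file, whereas Git performs a line-level three-way merge using the picked commit's parent as the base.

```python
from typing import Optional

Snapshot = dict[str, str]  # toy model: path -> whole-file contents
Change = tuple[Optional[str], Optional[str]]  # (before, after); None = file absent


def diff(parent: Snapshot, child: Snapshot) -> dict[str, Change]:
    """Files a commit changed relative to its parent."""
    return {
        path: (parent.get(path), child.get(path))
        for path in parent.keys() | child.keys()
        if parent.get(path) != child.get(path)
    }


def cherry_pick(target: Snapshot, parent: Snapshot, picked: Snapshot) -> Snapshot:
    """Apply the change between `parent` and `picked` on top of `target`."""
    result = dict(target)
    for path, (before, after) in diff(parent, picked).items():
        current = result.get(path)
        if current == after:
            continue                      # change already present
        if current != before:
            raise ValueError(f"conflict in {path}: target differs from the commit's base")
        if after is None:
            result.pop(path)              # the commit deleted this file
        else:
            result[path] = after
    return result


# Hypothetical backport: a fix to auth.py on main, applied to a release branch
base_on_main = {"auth.py": "v1", "util.py": "u1"}
fix_on_main = {"auth.py": "v1-fixed", "util.py": "u1"}
release = {"auth.py": "v1", "util.py": "u0"}
```

Only the files the picked commit touched change; `util.py` keeps the release branch's version.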

Bisect

Binary search through commit history to find the commit that introduced a bug.

git bisect start
git bisect bad          # current commit has the bug
git bisect good v1.0    # this version was good
# Git checks out a commit in the middle
# Test it, then:
git bisect good  # or  git bisect bad
# Repeat until the first bad commit is found
git bisect reset        # return to the branch you started from

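Automated: git bisect run ./test_script.sh runs the script on each candidate commit and narrows the range automatically. Exit code 0 marks the commit good, 125 skips it (for example, when it does not build), and any other code from 1 to 127 marks it bad.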
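The search that git bisect drives is an ordinary binary search for the first bad commit. This sketch assumes a linear list of commits from a known-good one to a known-bad one and a bug that persists once introduced; real Git bisects over the commit graph and also supports skipping commits. `is_bad` plays the role of your manual test or `./test_script.sh`.

```python
from typing import Callable, Sequence


def bisect_first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> tuple[str, int]:
    """Return (first bad commit, number of commits tested).

    Assumes commits[0] is known good, commits[-1] is known bad, and once
    the bug appears every later commit is bad.
    """
    good, bad = 0, len(commits) - 1
    tested = 0
    while bad - good > 1:          # untested commits remain in between
        mid = (good + bad) // 2    # check out a commit in the middle
        tested += 1
        if is_bad(commits[mid]):   # like running `git bisect bad`
            bad = mid
        else:                      # like running `git bisect good`
            good = mid
    return commits[bad], tested
```

With 100 commits between the good and bad endpoints, the loop tests at most 7 of them, since 2^7 = 128 exceeds 100.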

Reflog

Records all changes to HEAD and branch tips. Your safety net for recovering lost commits.

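git reflog
# Lists every move of HEAD: checkout, commit, rebase, reset, ...
# Commits dropped by git reset --hard stay reachable from the reflog for 30 days by default
git checkout HEAD@{3}           # inspect where HEAD was 3 moves ago (detached HEAD)
git branch recovered HEAD@{3}   # or create a branch there to keep those commits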
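The HEAD@{n} syntax indexes into the reflog, which Git keeps as a plain text file (`.git/logs/HEAD`) with one line per move, oldest first. A minimal parser, assuming the usual line layout `<old> <new> <name> <email> <timestamp> <timezone>`, a tab, then the message, shows how HEAD@{1} can still name a commit that a reset moved away from. The function names are illustrative.

```python
from typing import NamedTuple


class ReflogEntry(NamedTuple):
    old: str      # commit HEAD pointed to before the move
    new: str      # commit HEAD pointed to after the move
    who: str      # "Name <email> unix-time timezone"
    message: str  # e.g. "commit: Add login", "reset: moving to HEAD~1"


def parse_reflog(text: str) -> list[ReflogEntry]:
    """Parse lines of the form '<old> <new> <name> <email> <time> <tz><TAB><message>'."""
    entries = []
    for line in text.splitlines():
        meta, _, message = line.partition("\t")
        old, new, who = meta.split(" ", 2)
        entries.append(ReflogEntry(old, new, who, message))
    return entries


def head_at(entries: list[ReflogEntry], n: int) -> str:
    """HEAD@{n}: where HEAD pointed n moves ago (the file is oldest-first)."""
    return entries[-1 - n].new
```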

Submodules and Worktrees

Submodules

Include one Git repository inside another as a dependency.

git submodule add https://github.com/lib/library.git vendor/library
# Records the URL and path in .gitmodules; the pinned commit is stored
# as a gitlink entry in the parent repository's tree

Pros: Pin exact dependency version. Full history of the dependency.

Cons: Complex workflow (must explicitly update submodule). Easy to forget git submodule update. Many teams prefer package managers instead.

Worktrees

Multiple working directories from a single repository. Work on multiple branches simultaneously without stashing.

git worktree add ../project-feature feature-branch
# Now ../project-feature is a separate working directory on feature-branch

Use case: Run tests on main while developing on a feature branch.

Hooks

Git hooks run scripts at specific points in the Git workflow.

| Hook | When | Use Case |
|---|---|---|
| pre-commit | Before commit | Lint, format, run fast tests |
| commit-msg | After message written | Enforce commit message format |
| pre-push | Before push | Run full test suite |
| post-merge | After merge | Update dependencies |
| pre-receive | Server-side, before accepting push | Enforce policies |

Husky (JS), pre-commit (Python), cargo-husky (Rust): Frameworks for managing hooks.
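As an example of the commit-msg hook, here is one sketched in Python. Git runs `.git/hooks/commit-msg` with the path of the file holding the proposed message as its only argument, and a non-zero exit status aborts the commit. The subject-line pattern is an assumed Conventional Commits-style rule, not something Git enforces; adapt it to your team's convention.

```python
#!/usr/bin/env python3
"""commit-msg hook sketch: reject commits whose subject line breaks the team's format."""
import re
import sys

# Assumed convention (Conventional Commits style): "type(scope): summary"
PATTERN = re.compile(r"^(feat|fix|docs|refactor|test|chore)(\([\w-]+\))?: \S")


def first_line(message: str) -> str:
    """First non-blank line that is not a '#' comment."""
    for line in message.splitlines():
        if line.strip() and not line.startswith("#"):
            return line
    return ""


def main(argv: list[str]) -> int:
    # Git passes the path of the file containing the proposed message
    with open(argv[1], encoding="utf-8") as f:
        subject = first_line(f.read())
    if PATTERN.match(subject):
        return 0
    print(f"commit-msg: {subject!r} does not match {PATTERN.pattern}", file=sys.stderr)
    return 1  # non-zero exit aborts the commit


# Git always supplies the message-file path; do nothing if run without it
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Save it as `.git/hooks/commit-msg` and make it executable (`chmod +x`); hook frameworks such as those above can distribute an equivalent hook to the whole team.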

Monorepo Tools

Managing large monorepos (one repo, many projects).

Bazel (Google): Build system for monorepos. Hermetic builds. Caches aggressively. Supports many languages.

Nx (JS/TS): Monorepo toolkit. Dependency graph. Affected commands (only build/test what changed).

Turborepo: Fast monorepo build system. Remote caching. Incremental builds.
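The "affected commands" idea reduces to a reverse-dependency walk: start from the projects whose files changed and add every project that depends on them, directly or transitively. This is a minimal sketch over a hypothetical workspace graph; real tools derive the graph from imports or build files and the changed set from the Git diff.

```python
from collections import deque

# Hypothetical workspace: project -> projects it depends on
DEPS: dict[str, set[str]] = {
    "web-app": {"ui-kit", "api-client"},
    "admin": {"ui-kit"},
    "api-client": {"shared-types"},
    "api-server": {"shared-types"},
    "ui-kit": set(),
    "shared-types": set(),
}


def affected(deps: dict[str, set[str]], changed: set[str]) -> set[str]:
    """Changed projects plus every project that depends on them, transitively."""
    dependents: dict[str, set[str]] = {}
    for project, uses in deps.items():
        for dep in uses:
            dependents.setdefault(dep, set()).add(project)
    result, queue = set(changed), deque(changed)
    while queue:  # breadth-first walk over reverse edges
        for parent in dependents.get(queue.popleft(), set()):
            if parent not in result:
                result.add(parent)
                queue.append(parent)
    return result
```

A change to `shared-types` here selects `api-client`, `api-server`, and `web-app` for rebuilding and testing, while `admin` and `ui-kit` are skipped.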

Advantages of monorepos: Atomic cross-project changes. Shared tooling. Easier code sharing. Single source of truth. (Used by Google, Facebook, Microsoft.)

Disadvantages: Large repo size. Slower cloning. Needs specialized tooling. Access control is harder.

Applications in CS

  • Collaboration: Git is the de facto standard for code collaboration, hosted on platforms such as GitHub, GitLab, and Bitbucket.
  • Code review: PRs/MRs enable structured review. Comments, approvals, CI checks.
  • CI/CD: Git push triggers automated build, test, and deploy pipelines.
  • Auditing: Git history provides a complete audit trail of who changed what and when.
  • Deployment: Git tags mark releases. GitOps tools (ArgoCD, Flux) deploy from the state declared in Git.
  • Documentation: Docs versioned alongside code. Docs-as-code workflow.