Version Control
Version control systems (VCS) track changes to code over time, enabling collaboration, history, and rollback.
Git Internals
Git is a content-addressable filesystem: every object is stored under the hash of its own contents (SHA-1 by default; newer Git versions can also create SHA-256 repositories).
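As a rough illustration of content addressing, the sketch below reproduces how Git derives a SHA-1 object ID: it hashes a small header (object type, a space, the payload size, a NUL byte) followed by the payload. The helper name git_object_id is made up for this example; the resulting IDs should match what git hash-object prints for the same bytes in a SHA-1 repository.

```python
import hashlib


def git_object_id(obj_type: str, payload: bytes) -> str:
    """Hash an object the way Git does with SHA-1: '<type> <size>' + NUL + payload."""
    header = f"{obj_type} {len(payload)}".encode("ascii")
    return hashlib.sha1(header + b"\x00" + payload).hexdigest()


# Same bytes always give the same ID, regardless of filename or location.
blob_id = git_object_id("blob", b"hello world\n")
empty_id = git_object_id("blob", b"")

print(blob_id)   # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(empty_id)  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the ID depends only on the bytes, two files with identical contents are stored once, which is why a blob carries no filename.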
Object Types
Blob: File contents (no filename, just data).
Tree: Directory listing. Maps filenames to blob/tree hashes.
Commit: Snapshot of the project. Points to a tree + parent commit(s) + metadata (author, message, timestamp).
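Tag: An annotated tag is an object that names another object (usually a commit) and carries a tagger, message, and optional signature. A lightweight tag is just a ref, not an object.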
commit abc123
├── tree def456
│   ├── blob bbb222 → "Cargo.toml"
│   └── tree ccc333 → "src/"
│       ├── blob aaa111 → "main.rs"
│       ├── blob ddd444 → "lib.rs"
│       └── blob eee555 → "utils.rs"
├── parent: commit 789xyz
├── author: Alice <alice@example.com>
└── message: "Add user authentication"
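The object layout above can be sketched in a few lines, assuming SHA-1 object IDs. A tree payload is a sequence of entries, each holding a mode, a single path component, a NUL byte, and the raw 20-byte ID of the child. The function names and file contents here are illustrative only, and the sketch covers just regular files (mode 100644) and directories (mode 40000).

```python
import hashlib


def object_id(obj_type: str, payload: bytes) -> str:
    """SHA-1 over '<type> <size>' + NUL + payload, as Git does."""
    header = f"{obj_type} {len(payload)}".encode("ascii")
    return hashlib.sha1(header + b"\x00" + payload).hexdigest()


def tree_payload(entries):
    """Serialize (mode, name, hex_id) entries into a tree object's payload.

    Git sorts entries by name, treating directories as if the name ended in '/'.
    """
    def sort_key(entry):
        mode, name, _ = entry
        return (name + "/" if mode == "40000" else name).encode()

    body = b""
    for mode, name, hex_id in sorted(entries, key=sort_key):
        body += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(hex_id)
    return body


# Blobs hold contents only; names live in the tree that references them.
main_rs = object_id("blob", b"fn main() {}\n")
cargo = object_id("blob", b"[package]\n")

# "src/" is its own tree; the root tree points at it by ID.
src_body = tree_payload([("100644", "main.rs", main_rs)])
src_tree = object_id("tree", src_body)
root_body = tree_payload([("100644", "Cargo.toml", cargo), ("40000", "src", src_tree)])
root_tree = object_id("tree", root_body)

empty_tree = object_id("tree", tree_payload([]))
print(empty_tree)  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

Note that the root tree's payload mentions "src" but never "main.rs": nested paths exist only as a chain of trees.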
Refs and HEAD
Refs: Named pointers to commits. Branches are refs: refs/heads/main → commit hash.
HEAD: Points to the current branch (or directly to a commit in detached HEAD state).
HEAD → refs/heads/main → commit abc123
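A minimal model of that chain, assuming refs are kept in a plain dictionary (real Git stores them as files under .git or in packed-refs). The "ref: " prefix mirrors what the HEAD file contains when it points at a branch; the resolve function is a made-up helper.

```python
def resolve(name, refs):
    """Follow symbolic refs ("ref: <target>") until a commit hash is reached."""
    seen = set()
    value = refs[name]
    while value.startswith("ref: "):
        target = value[len("ref: "):]
        if target in seen:
            raise ValueError(f"symbolic ref loop at {target}")
        seen.add(target)
        value = refs[target]
    return value


refs = {
    "HEAD": "ref: refs/heads/main",   # attached: HEAD names a branch
    "refs/heads/main": "abc123",
}
attached = resolve("HEAD", refs)      # "abc123"

# Committing on main moves the branch ref; HEAD follows automatically.
refs["refs/heads/main"] = "def456"
after_commit = resolve("HEAD", refs)  # "def456"

# Detached HEAD: HEAD holds a commit hash directly, so no branch moves with it.
refs["HEAD"] = "789xyz"
detached = resolve("HEAD", refs)      # "789xyz"
```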
The Index (Staging Area)
The index sits between the working directory and the repository. It holds the next commit's snapshot.
Working Directory → (git add) → Index/Staging → (git commit) → Repository
git add: Copy file from working directory to index.
git commit: Create a commit from the index contents.
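The three areas can be modelled with dictionaries to show why the index matters: a commit records what was staged, not what is currently on disk. This is a toy model with hypothetical names (Repo, working, index); real Git stores blob IDs and file metadata in the index rather than contents.

```python
class Repo:
    """Toy model of working directory, index, and commit history."""

    def __init__(self):
        self.working = {}   # path -> content currently on disk
        self.index = {}     # path -> content staged for the next commit
        self.commits = []   # list of (message, snapshot) tuples

    def add(self, path):
        # Stage the file's content as it is right now.
        self.index[path] = self.working[path]

    def commit(self, message):
        # The snapshot comes from the index, not from the working directory.
        snapshot = dict(self.index)
        self.commits.append((message, snapshot))
        return snapshot


repo = Repo()
repo.working["a.txt"] = "v1"
repo.add("a.txt")
repo.working["a.txt"] = "v2"      # edited again after staging
snapshot = repo.commit("first")

print(snapshot["a.txt"])  # v1
```

The later edit ("v2") stays in the working directory until it is staged with another add.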
Branching Strategies
Git Flow
main ─────────────────────────────────────────→ releases
  ├── develop ──────────────────────────────→ integration
  │     ├── feature/auth ────→ (merge back)
  │     ├── feature/search ──→ (merge back)
  │     └── release/1.0 ────→ (merge to main + develop)
  └── hotfix/critical → (merge to main + develop)
Branches: main (production), develop (integration), feature/* (new features), release/* (release prep), hotfix/* (urgent fixes).
Pros: Well-defined process. Clear release management. Good for versioned software.
Cons: Complex. Many branches to manage. Slow for continuous deployment.
GitHub Flow
main ──────────────────────────────→ always deployable
├── feature-branch → PR → review → merge
├── bugfix-branch → PR → review → merge
└── experiment → PR → review → merge
Simple: Branch from main, make changes, open PR, review, merge to main, deploy.
Pros: Simple. Fast. Good for continuous deployment. Low overhead.
Cons: Less structured release process. Main must always be deployable.
Trunk-Based Development
All developers integrate into main (the trunk) at least daily, either by committing directly or through short-lived branches that last no more than a day or two.
main ──●──●──●──●──●──●──●──●──●──●──●──→
         └──branch──┘     └─branch─┘
          (< 1 day)        (< 1 day)
Feature flags hide incomplete features in production. Deploy continuously.
Pros: Minimal merge conflicts. Always integrated. Fast feedback. Used by Google, Facebook, Microsoft.
Cons: Requires feature flags. Requires strong CI. Discipline to keep main green.
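A feature flag can be as small as a lookup that guards the new code path. The sketch below is one simple way to do it, with a hypothetical flag name (new_checkout) and an in-memory flag table; real systems usually load flags from configuration or a flag service.

```python
# Hypothetical flag table; unfinished work is merged to main but stays switched off.
FLAGS = {"new_checkout": False}


def is_enabled(name, flags=FLAGS):
    """Unknown flags default to off, so a missing entry never exposes new code."""
    return flags.get(name, False)


def checkout(cart, flags=FLAGS):
    if is_enabled("new_checkout", flags):
        return f"new flow: {len(cart)} items"     # incomplete feature, hidden by default
    return f"legacy flow: {len(cart)} items"      # what production users see


print(checkout(["book", "pen"]))                           # legacy flow: 2 items
print(checkout(["book", "pen"], {"new_checkout": True}))   # new flow: 2 items
```

Both code paths live on main and are exercised by CI, which is what lets the branch stay short-lived.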
Key Git Operations
Merge vs Rebase
Merge: Create a merge commit combining two branches. Preserves history.
main: A ── B ── C ──── M (merge commit)
feature: A ── B ── D ── E ↗
Rebase: Replay feature commits on top of main. Linear history.
Before: main: A-B-C feature: A-B-D-E
After rebase: main: A-B-C feature: A-B-C-D'-E'
Rebase advantages: Clean, linear history. Easy to read.
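Rebase dangers: Rewrites history, because every replayed commit gets a new hash. Never rebase shared/published branches. If the branch was already pushed, updating it requires a force push, which can discard commits others have pushed; git push --force-with-lease is safer than git push --force because it refuses to overwrite remote work you have not fetched.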
Guideline: Rebase local, unpublished branches. Merge for shared branches and PRs.
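The difference is easy to see in a toy commit graph where, loosely mirroring Git, a commit's ID is derived from its message and parents. The helpers below (make_commit, rebase) are illustrative names, not Git's implementation, and snapshots and conflicts are ignored.

```python
import hashlib


def make_commit(graph, message, parents):
    """Add a commit whose ID depends on its message and parent IDs."""
    cid = hashlib.sha1((message + "|" + ",".join(parents)).encode()).hexdigest()
    graph[cid] = {"message": message, "parents": list(parents)}
    return cid


def rebase(graph, commits, onto):
    """Replay commits in order on top of `onto`; each copy gets a new ID."""
    tip = onto
    for cid in commits:
        tip = make_commit(graph, graph[cid]["message"], [tip])
    return tip


graph = {}
a = make_commit(graph, "A", [])
b = make_commit(graph, "B", [a])
c = make_commit(graph, "C", [b])   # tip of main
d = make_commit(graph, "D", [b])
e = make_commit(graph, "E", [d])   # tip of feature

# Merge: one new commit with two parents; D and E keep their IDs.
m = make_commit(graph, "Merge feature", [c, e])

# Rebase: D and E are re-created as D' and E' on top of C.
e_prime = rebase(graph, [d, e], onto=c)
d_prime = graph[e_prime]["parents"][0]

print(graph[m]["parents"] == [c, e])   # True
print(d_prime != d and e_prime != e)   # True
```

Since D' and E' are different commits from D and E, anyone who based work on the originals ends up with a diverged history, which is the reason for the rule about published branches.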
Cherry-Pick
Apply a specific commit from one branch to another.
git cherry-pick abc123 # apply commit abc123 to current branch
Use case: Backport a bugfix from main to a release branch.
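Conceptually, a cherry-pick takes the change a commit made relative to its parent and applies that change to the current branch. The sketch below works on whole files in dictionaries and reports a conflict when the target already changed the same file; Git itself performs a line-level three-way merge, so this is only a simplified model with made-up file names.

```python
def diff(parent, commit):
    """Return {path: new_content_or_None} for every path the commit changed."""
    changes = {}
    for path in parent.keys() | commit.keys():
        if parent.get(path) != commit.get(path):
            changes[path] = commit.get(path)   # None means the file was deleted
    return changes


def cherry_pick(target, parent, commit):
    """Apply the commit's change (relative to its parent) on top of target."""
    result = dict(target)
    for path, new in diff(parent, commit).items():
        ours = target.get(path)
        if ours != parent.get(path) and ours != new:
            raise ValueError(f"conflict in {path}")
        if new is None:
            result.pop(path, None)
        else:
            result[path] = new
    return result


# The fix on main touched only auth.py.
main_parent = {"auth.py": "check(pw)", "ui.py": "v2"}
main_fix = {"auth.py": "check(pw, salt)", "ui.py": "v2"}

# The release branch has an older ui.py, which the backport must leave alone.
release = {"auth.py": "check(pw)", "ui.py": "v1"}
backported = cherry_pick(release, main_parent, main_fix)

print(backported)  # {'auth.py': 'check(pw, salt)', 'ui.py': 'v1'}
```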
Bisect
Binary search through commit history to find the commit that introduced a bug.
git bisect start
git bisect bad # current commit has the bug
git bisect good v1.0 # this version was good
# Git checks out a commit in the middle
# Test it, then:
git bisect good # or git bisect bad
# Repeat until the bad commit is found
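Automated: git bisect run ./test_script.sh runs the script on each candidate commit and narrows the range automatically. Exit code 0 marks the commit good, 125 skips it, and any other code from 1 to 127 marks it bad.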
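The search itself is an ordinary binary search. The sketch below assumes a linear history ordered oldest to newest, with the first commit known good and the last known bad; the commit names and the is_bad predicate are hypothetical stand-ins for checking out a commit and running a test.

```python
def bisect(commits, is_bad):
    """Return (first bad commit, number of commits tested).

    Assumes commits[0] is good, commits[-1] is bad, and every commit
    after the first bad one is also bad.
    """
    good, bad = 0, len(commits) - 1
    steps = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        steps += 1
        if is_bad(commits[mid]):
            bad = mid      # bug already present: look earlier
        else:
            good = mid     # still fine: look later
    return commits[bad], steps


commits = [f"c{i}" for i in range(100)]

# Pretend the bug was introduced in c37.
first_bad, steps = bisect(commits, lambda c: int(c[1:]) >= 37)

print(first_bad, steps)  # c37 6
```

With 100 commits the culprit is found in about log2(100), roughly 7, tests instead of up to 98.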
Reflog
Records all changes to HEAD and branch tips. Your safety net for recovering lost commits.
git reflog
# Shows: checkout, commit, rebase, reset — every action
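# Even after git reset --hard, the discarded commits stay in the reflog
# (entries for unreachable commits expire after 30 days by default)
git checkout HEAD@{3} # check out where HEAD pointed 3 moves ago (detached HEAD)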
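One way to picture the reflog is as an append-only journal of every value HEAD has held. The class below is a toy model with hypothetical names; it ignores per-branch reflogs and expiry.

```python
class Head:
    """Toy HEAD that journals every move, newest entry last."""

    def __init__(self, start):
        self.value = start
        self.reflog = [(start, "initial")]

    def move(self, new, action):
        self.value = new
        self.reflog.append((new, action))

    def at(self, n):
        """HEAD@{n}: where HEAD pointed n moves ago (HEAD@{0} is the current value)."""
        return self.reflog[-1 - n][0]


head = Head("A")
head.move("B", "commit")
head.move("C", "commit")
head.move("A", "reset --hard")   # B and C are no longer reachable from the branch

lost = head.at(1)                # the journal still remembers C
print(lost)                      # C
```

Recovering the lost work is then a matter of pointing a branch back at that entry.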
Submodules and Worktrees
Submodules
Include one Git repository inside another as a dependency.
git submodule add https://github.com/lib/library.git vendor/library
# Records the submodule's path and URL in .gitmodules; the pinned commit is
# stored as a gitlink entry in the parent repository's tree
Pros: Pin exact dependency version. Full history of the dependency.
Cons: Complex workflow (must explicitly update submodule). Easy to forget git submodule update. Many teams prefer package managers instead.
Worktrees
Multiple working directories from a single repository. Work on multiple branches simultaneously without stashing.
git worktree add ../project-feature feature-branch
# Now ../project-feature is a separate working directory on feature-branch
Use case: Run tests on main while developing on a feature branch.
Hooks
Git hooks run scripts at specific points in the Git workflow.
| Hook | When | Use Case |
|---|---|---|
| pre-commit | Before commit | Lint, format, run fast tests |
| commit-msg | After message written | Enforce commit message format |
| pre-push | Before push | Run full test suite |
| post-merge | After merge | Update dependencies |
| pre-receive | Server-side, before accepting push | Enforce policies |
Husky (JS), pre-commit (Python), cargo-husky (Rust): Frameworks for managing hooks.
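A hook is just an executable file in .git/hooks named after the hook. As a sketch, a commit-msg hook could be written as below: Git passes the path of the file holding the proposed message as the first argument, and a non-zero exit status aborts the commit. The message convention enforced here ("type: summary") is a hypothetical house rule, not something Git requires.

```python
#!/usr/bin/env python3
import re
import sys

# Hypothetical convention: "<type>: <summary>" on a short first line.
PATTERN = re.compile(r"^(feat|fix|docs|refactor|test|chore): \S.{0,70}$")


def is_valid_message(text):
    """Check only the first line (the subject) of the commit message."""
    lines = text.splitlines()
    subject = lines[0] if lines else ""
    return bool(PATTERN.match(subject))


def main(argv):
    # argv[1] is the path to the file containing the proposed commit message.
    with open(argv[1], encoding="utf-8") as f:
        ok = is_valid_message(f.read())
    if not ok:
        print("commit message must look like 'fix: short summary'", file=sys.stderr)
    return 0 if ok else 1   # non-zero exit makes Git abort the commit


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

To try it, the script would be saved as .git/hooks/commit-msg and made executable; hook frameworks mainly automate that installation step.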
Monorepo Tools
Managing large monorepos (one repo, many projects).
Bazel (Google): Build system for monorepos. Hermetic builds. Caches aggressively. Supports many languages.
Nx (JS/TS): Monorepo toolkit. Dependency graph. Affected commands (only build/test what changed).
Turborepo: Fast monorepo build system. Remote caching. Incremental builds.
Advantages of monorepos: Atomic cross-project changes. Shared tooling. Easier code sharing. Single source of truth. (Used by Google, Facebook, Microsoft.)
Disadvantages: Large repo size. Slower cloning. Needs specialized tooling. Access control is harder.
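The "affected" idea used by monorepo tools can be approximated with a walk over the dependency graph: start from the projects that changed and collect everything that depends on them, directly or transitively. The project names and the affected function are made up for this sketch; real tools also map changed files to projects and cache results.

```python
from collections import deque


def affected(changed, deps):
    """Return the changed projects plus all of their transitive dependents.

    deps maps each project to the projects it depends on.
    """
    # Invert the graph: for each project, who depends on it?
    dependents = {}
    for project, needs in deps.items():
        for dep in needs:
            dependents.setdefault(dep, set()).add(project)

    result = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent not in result:
                result.add(parent)
                queue.append(parent)
    return result


deps = {
    "web": ["ui", "api-client"],
    "admin": ["ui"],
    "api-client": ["core"],
    "ui": [],
    "core": [],
    "docs": [],
}

to_build = affected(["core"], deps)
print(sorted(to_build))  # ['api-client', 'core', 'web']
```

A change to core rebuilds three projects and leaves admin, ui, and docs untouched, which is where the time savings come from.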
Applications in CS
- Collaboration: Git is the universal standard for code collaboration. GitHub, GitLab, Bitbucket.
- Code review: PRs/MRs enable structured review. Comments, approvals, CI checks.
- CI/CD: Git push triggers automated build, test, and deploy pipelines.
- Auditing: Git history provides a complete audit trail of who changed what and when.
- Deployment: Git tags mark releases. GitOps tools (Argo CD, Flux) deploy from the state recorded in Git.
- Documentation: Docs versioned alongside code. Docs-as-code workflow.