4 min read
On this page

Database Fundamentals

A database is an organized collection of structured data. A Database Management System (DBMS) is the software that manages databases — handling storage, retrieval, concurrency, recovery, and security.

Why Databases?

Without a DBMS, applications must handle:

  • Data storage format and layout
  • Concurrent access by multiple users
  • Crash recovery (partial writes, corruption)
  • Efficient querying (indexes, optimization)
  • Access control and security
  • Data integrity constraints

The DBMS provides all of this as a service, letting applications focus on business logic.

DBMS Architecture

DBMS component architecture — parser, optimizer, executor, storage

┌──────────────────────────────────────┐
│        Client Applications           │
├──────────────────────────────────────┤
│        Query Interface (SQL)         │
├──────────────────────────────────────┤
│  Query Parser → Optimizer → Executor │
├──────────────────────────────────────┤
│   Transaction Manager │ Lock Manager │
├──────────────────────────────────────┤
│   Buffer Pool Manager                │
├──────────────────────────────────────┤
│   Storage Engine (Disk Manager)      │
├──────────────────────────────────────┤
│   File System / Raw Disk             │
└──────────────────────────────────────┘

Key Components

Query Parser: Validates SQL syntax, checks permissions, resolves names.

Query Optimizer: Generates and evaluates execution plans. Chooses the cheapest plan based on statistics and cost models.

Execution Engine: Executes the chosen plan. Implements operators (scan, join, sort, aggregate).

Transaction Manager: Ensures ACID properties. Manages commit/rollback, isolation levels.

Buffer Pool: Caches disk pages in memory. Manages page replacement (similar to OS page cache but application-controlled).

Storage Engine: Manages on-disk data structures (B-trees, heap files, WAL).

Data Models

Relational Model

Data organized into tables (relations) of rows (tuples) and columns (attributes). The dominant model since the 1970s.

Users:
| id | name    | email           | age |
|----|---------|-----------------|-----|
| 1  | Alice   | alice@mail.com  | 30  |
| 2  | Bob     | bob@mail.com    | 25  |

Strengths: Strong consistency (ACID), powerful query language (SQL), well-understood theory (normalization, relational algebra), mature ecosystem.

Systems: PostgreSQL, MySQL, SQLite, Oracle, SQL Server, CockroachDB.

Document Model

Data stored as documents (JSON/BSON). Schema-flexible — each document can have different fields.

{
    "id": "1",
    "name": "Alice",
    "email": "alice@mail.com",
    "addresses": [
        {"city": "NYC", "zip": "10001"},
        {"city": "SF", "zip": "94102"}
    ]
}

Strengths: Flexible schema (agile development), natural mapping to application objects, easy horizontal scaling.

Systems: MongoDB, CouchDB, Amazon DocumentDB, Firestore.

Key-Value Model

Simplest model. Each entry is a (key, value) pair. Value is opaque to the database.

Strengths: Extremely fast (O(1) lookups), simple API (get/set/delete), easy to scale.

Systems: Redis, Memcached, DynamoDB, etcd, RocksDB.

Graph Model

Data modeled as nodes (entities) and edges (relationships). Natural for interconnected data.

(Alice)-[:FRIENDS_WITH]->(Bob)
(Alice)-[:WORKS_AT]->(Acme Corp)
(Bob)-[:WORKS_AT]->(Acme Corp)

Strengths: Efficient traversal of relationships. Natural for social networks, knowledge graphs, recommendation engines.

Systems: Neo4j, Amazon Neptune, JanusGraph, ArangoDB (multi-model).

Columnar Model

Store data by column instead of by row. Each column is stored contiguously on disk.

Row-oriented (OLTP): [id,name,email,age] [id,name,email,age] ... Column-oriented (OLAP): [id,id,...] [name,name,...] [email,email,...] [age,age,...]

Strengths: Excellent compression (similar values together). Fast aggregations (read only needed columns). Efficient for analytical queries.

Systems: ClickHouse, DuckDB, Apache Druid, Google BigQuery, Amazon Redshift.

Time-Series Model

Optimized for time-stamped data: metrics, logs, sensor data, financial data.

Strengths: Efficient time-range queries. Downsampling and retention policies. High write throughput.

Systems: InfluxDB, TimescaleDB (PostgreSQL extension), Prometheus, QuestDB.

Three-Schema Architecture

External schema (view level): What each user/application sees. Different views for different users.

Conceptual schema (logical level): The overall structure of the database. Tables, relationships, constraints.

Internal schema (physical level): How data is stored. Files, indexes, storage format.

Data independence:

  • Logical: Change the conceptual schema without changing applications.
  • Physical: Change storage structures without changing the logical schema.

Database Users and Roles

Role Responsibility
DBA (Database Administrator) Installation, configuration, backup, security, tuning
Database Designer Schema design, normalization, index selection
Application Developer Write queries, integrate with application code
End User Query data via applications or ad-hoc queries

ACID Properties (Preview)

The guarantees provided by a DBMS for transactions (detailed in transactions file):

  • Atomicity: All or nothing — a transaction either fully completes or fully rolls back.
  • Consistency: Database moves from one valid state to another.
  • Isolation: Concurrent transactions don't interfere with each other.
  • Durability: Committed data survives crashes.

Applications in CS

  • Web applications: User data, sessions, content — the backbone of nearly every web app.
  • Analytics: Data warehouses aggregate business data for reporting and decision-making.
  • Mobile apps: SQLite for local storage. Cloud databases for sync.
  • IoT: Time-series databases for sensor data. Edge databases for local processing.
  • Machine learning: Feature stores, training data management, experiment tracking.
  • Gaming: Player profiles, inventory, leaderboards, matchmaking.
  • Finance: Transaction processing, compliance, audit trails.