4 min read
On this page

Database Fundamentals

A database is an organized collection of structured data. A Database Management System (DBMS) is the software that manages databases — handling storage, retrieval, concurrency, recovery, and security.

Why Databases?

Without a DBMS, applications must handle:

  • Data storage format and layout
  • Concurrent access by multiple users
  • Crash recovery (partial writes, corruption)
  • Efficient querying (indexes, optimization)
  • Access control and security
  • Data integrity constraints

The DBMS provides all of this as a service, letting applications focus on business logic.

DBMS Architecture

DBMS component architecture — parser, optimizer, executor, storage

┌──────────────────────────────────────┐
│        Client Applications           │
├──────────────────────────────────────┤
│        Query Interface (SQL)         │
├──────────────────────────────────────┤
│  Query Parser → Optimizer → Executor │
├──────────────────────────────────────┤
│   Transaction Manager │ Lock Manager │
├──────────────────────────────────────┤
│   Buffer Pool Manager                │
├──────────────────────────────────────┤
│   Storage Engine (Disk Manager)      │
├──────────────────────────────────────┤
│   File System / Raw Disk             │
└──────────────────────────────────────┘

Key Components

Query Parser: Validates SQL syntax, checks permissions, resolves names.

Query Optimizer: Generates and evaluates execution plans. Chooses the cheapest plan based on statistics and cost models.

Execution Engine: Executes the chosen plan. Implements operators (scan, join, sort, aggregate).

Transaction Manager: Ensures ACID properties. Manages commit/rollback, isolation levels.

Buffer Pool: Caches disk pages in memory. Manages page replacement (similar to OS page cache but application-controlled).

Storage Engine: Manages on-disk data structures (B-trees, heap files, WAL).

Data Models

Relational Model

Data organized into tables (relations) of rows (tuples) and columns (attributes). The dominant model since the 1970s.

Users:
| id | name    | email           | age |
|----|---------|-----------------|-----|
| 1  | Alice   | alice@mail.com  | 30  |
| 2  | Bob     | bob@mail.com    | 25  |

Strengths: Strong consistency (ACID), powerful query language (SQL), well-understood theory (normalization, relational algebra), mature ecosystem.

Systems: PostgreSQL, MySQL, SQLite, Oracle, SQL Server, CockroachDB.

Document Model

Data stored as documents (JSON/BSON). Schema-flexible — each document can have different fields.

{
    "id": "1",
    "name": "Alice",
    "email": "alice@mail.com",
    "addresses": [
        {"city": "NYC", "zip": "10001"},
        {"city": "SF", "zip": "94102"}
    ]
}

Strengths: Flexible schema (agile development), natural mapping to application objects, easy horizontal scaling.

Systems: MongoDB, CouchDB, Amazon DocumentDB, Firestore.

Key-Value Model

Simplest model. Each entry is a (key, value) pair. Value is opaque to the database.

Strengths: Extremely fast (O(1) lookups), simple API (get/set/delete), easy to scale.

Systems: Redis, Memcached, DynamoDB, etcd, RocksDB.

Graph Model

Data modeled as nodes (entities) and edges (relationships). Natural for interconnected data.

(Alice)-[:FRIENDS_WITH]->(Bob)
(Alice)-[:WORKS_AT]->(Acme Corp)
(Bob)-[:WORKS_AT]->(Acme Corp)

Strengths: Efficient traversal of relationships. Natural for social networks, knowledge graphs, recommendation engines.

Systems: Neo4j, Amazon Neptune, JanusGraph, ArangoDB (multi-model).

Columnar Model

Store data by column instead of by row. Each column is stored contiguously on disk.

Row-oriented (OLTP): [id,name,email,age] [id,name,email,age] ... Column-oriented (OLAP): [id,id,...] [name,name,...] [email,email,...] [age,age,...]

Strengths: Excellent compression (similar values together). Fast aggregations (read only needed columns). Efficient for analytical queries.

Systems: ClickHouse, DuckDB, Apache Druid, Google BigQuery, Amazon Redshift.

Time-Series Model

Optimized for time-stamped data: metrics, logs, sensor data, financial data.

Strengths: Efficient time-range queries. Downsampling and retention policies. High write throughput.

Systems: InfluxDB, TimescaleDB (PostgreSQL extension), Prometheus, QuestDB.

Three-Schema Architecture

External schema (view level): What each user/application sees. Different views for different users.

Conceptual schema (logical level): The overall structure of the database. Tables, relationships, constraints.

Internal schema (physical level): How data is stored. Files, indexes, storage format.

Data independence:

  • Logical: Change the conceptual schema without changing applications.
  • Physical: Change storage structures without changing the logical schema.

Database Users and Roles

| Role | Responsibility | |---|---| | DBA (Database Administrator) | Installation, configuration, backup, security, tuning | | Database Designer | Schema design, normalization, index selection | | Application Developer | Write queries, integrate with application code | | End User | Query data via applications or ad-hoc queries |

ACID Properties (Preview)

The guarantees provided by a DBMS for transactions (detailed in transactions file):

  • Atomicity: All or nothing — a transaction either fully completes or fully rolls back.
  • Consistency: Database moves from one valid state to another.
  • Isolation: Concurrent transactions don't interfere with each other.
  • Durability: Committed data survives crashes.

Applications in CS

  • Web applications: User data, sessions, content — the backbone of nearly every web app.
  • Analytics: Data warehouses aggregate business data for reporting and decision-making.
  • Mobile apps: SQLite for local storage. Cloud databases for sync.
  • IoT: Time-series databases for sensor data. Edge databases for local processing.
  • Machine learning: Feature stores, training data management, experiment tracking.
  • Gaming: Player profiles, inventory, leaderboards, matchmaking.
  • Finance: Transaction processing, compliance, audit trails.