Database Fundamentals
A database is an organized collection of structured data. A Database Management System (DBMS) is the software that manages databases — handling storage, retrieval, concurrency, recovery, and security.
Why Databases?
Without a DBMS, applications must handle:
- Data storage format and layout
- Concurrent access by multiple users
- Crash recovery (partial writes, corruption)
- Efficient querying (indexes, optimization)
- Access control and security
- Data integrity constraints
The DBMS provides all of this as a service, letting applications focus on business logic.
DBMS Architecture

┌──────────────────────────────────────┐
│ Client Applications │
├──────────────────────────────────────┤
│ Query Interface (SQL) │
├──────────────────────────────────────┤
│ Query Parser → Optimizer → Executor │
├──────────────────────────────────────┤
│ Transaction Manager │ Lock Manager │
├──────────────────────────────────────┤
│ Buffer Pool Manager │
├──────────────────────────────────────┤
│ Storage Engine (Disk Manager) │
├──────────────────────────────────────┤
│ File System / Raw Disk │
└──────────────────────────────────────┘
Key Components
Query Parser: Validates SQL syntax, checks permissions, resolves names.
Query Optimizer: Generates and evaluates execution plans. Chooses the cheapest plan based on statistics and cost models.
Execution Engine: Executes the chosen plan. Implements operators (scan, join, sort, aggregate).
Transaction Manager: Ensures ACID properties. Manages commit/rollback, isolation levels.
Buffer Pool: Caches disk pages in memory. Manages page replacement (similar to OS page cache but application-controlled).
Storage Engine: Manages on-disk data structures (B-trees, heap files, WAL).
Data Models
Relational Model
Data organized into tables (relations) of rows (tuples) and columns (attributes). The dominant model since the 1970s.
Users:
| id | name | email | age |
|----|---------|-----------------|-----|
| 1 | Alice | alice@mail.com | 30 |
| 2 | Bob | bob@mail.com | 25 |
Strengths: Strong consistency (ACID), powerful query language (SQL), well-understood theory (normalization, relational algebra), mature ecosystem.
Systems: PostgreSQL, MySQL, SQLite, Oracle, SQL Server, CockroachDB.
Document Model
Data stored as documents (JSON/BSON). Schema-flexible — each document can have different fields.
{
"id": "1",
"name": "Alice",
"email": "alice@mail.com",
"addresses": [
{"city": "NYC", "zip": "10001"},
{"city": "SF", "zip": "94102"}
]
}
Strengths: Flexible schema (agile development), natural mapping to application objects, easy horizontal scaling.
Systems: MongoDB, CouchDB, Amazon DocumentDB, Firestore.
Key-Value Model
Simplest model. Each entry is a (key, value) pair. Value is opaque to the database.
Strengths: Extremely fast (O(1) lookups), simple API (get/set/delete), easy to scale.
Systems: Redis, Memcached, DynamoDB, etcd, RocksDB.
Graph Model
Data modeled as nodes (entities) and edges (relationships). Natural for interconnected data.
(Alice)-[:FRIENDS_WITH]->(Bob)
(Alice)-[:WORKS_AT]->(Acme Corp)
(Bob)-[:WORKS_AT]->(Acme Corp)
Strengths: Efficient traversal of relationships. Natural for social networks, knowledge graphs, recommendation engines.
Systems: Neo4j, Amazon Neptune, JanusGraph, ArangoDB (multi-model).
Columnar Model
Store data by column instead of by row. Each column is stored contiguously on disk.
Row-oriented (OLTP): [id,name,email,age] [id,name,email,age] ... Column-oriented (OLAP): [id,id,...] [name,name,...] [email,email,...] [age,age,...]
Strengths: Excellent compression (similar values together). Fast aggregations (read only needed columns). Efficient for analytical queries.
Systems: ClickHouse, DuckDB, Apache Druid, Google BigQuery, Amazon Redshift.
Time-Series Model
Optimized for time-stamped data: metrics, logs, sensor data, financial data.
Strengths: Efficient time-range queries. Downsampling and retention policies. High write throughput.
Systems: InfluxDB, TimescaleDB (PostgreSQL extension), Prometheus, QuestDB.
Three-Schema Architecture
External schema (view level): What each user/application sees. Different views for different users.
Conceptual schema (logical level): The overall structure of the database. Tables, relationships, constraints.
Internal schema (physical level): How data is stored. Files, indexes, storage format.
Data independence:
- Logical: Change the conceptual schema without changing applications.
- Physical: Change storage structures without changing the logical schema.
Database Users and Roles
| Role | Responsibility | |---|---| | DBA (Database Administrator) | Installation, configuration, backup, security, tuning | | Database Designer | Schema design, normalization, index selection | | Application Developer | Write queries, integrate with application code | | End User | Query data via applications or ad-hoc queries |
ACID Properties (Preview)
The guarantees provided by a DBMS for transactions (detailed in transactions file):
- Atomicity: All or nothing — a transaction either fully completes or fully rolls back.
- Consistency: Database moves from one valid state to another.
- Isolation: Concurrent transactions don't interfere with each other.
- Durability: Committed data survives crashes.
Applications in CS
- Web applications: User data, sessions, content — the backbone of nearly every web app.
- Analytics: Data warehouses aggregate business data for reporting and decision-making.
- Mobile apps: SQLite for local storage. Cloud databases for sync.
- IoT: Time-series databases for sensor data. Edge databases for local processing.
- Machine learning: Feature stores, training data management, experiment tracking.
- Gaming: Player profiles, inventory, leaderboards, matchmaking.
- Finance: Transaction processing, compliance, audit trails.