Relational Model

The relational model (Codd, 1970) organizes data into relations (tables) and provides a mathematical foundation for querying and manipulating data.

Relations

A relation is a set of tuples (rows) over a set of attributes (columns).

Students(id: INT, name: VARCHAR, gpa: DECIMAL, major: VARCHAR)

| id  | name    | gpa  | major   |
|-----|---------|------|---------|
| 101 | Alice   | 3.8  | CS      |
| 102 | Bob     | 3.2  | Math    |
| 103 | Carol   | 3.9  | CS      |

Properties of Relations

Unordered rows: No inherent ordering (unlike arrays).
Unordered columns: Attributes are identified by name, not position.
No duplicates: A relation is a set (no duplicate tuples). In practice, SQL tables are bags (allow duplicates unless constrained).
Atomic values: Each cell contains a single value (1NF).

Schema vs Instance

Schema (intension): The structure — attribute names and types. Students(id: INT, name: VARCHAR, ...).

Instance (extension): The data at a particular point in time — the actual rows.

Keys

Types of Keys

Superkey: A set of attributes that uniquely identifies a tuple. {id, name} is a superkey if id alone is unique.

Candidate key: A minimal superkey — no subset is also a superkey. If {id} uniquely identifies students, it's a candidate key.

Primary key: The chosen candidate key. Used as the main identifier. Cannot be NULL. One per table.

Foreign key: An attribute that references the primary key of another table. Enforces referential integrity.

CREATE TABLE Enrollments (
    student_id INT REFERENCES Students(id),  -- foreign key
    course_id  INT REFERENCES Courses(id),   -- foreign key
    grade      CHAR(1),
    PRIMARY KEY (student_id, course_id)       -- composite primary key
);

Composite key: A key consisting of multiple attributes.

Surrogate key: An artificial key (auto-increment integer, UUID) with no business meaning. Stable, compact, no update anomalies.

Natural key: A key with business meaning (email, SSN). May change, may not be truly unique across time.

Relational Algebra

A set of operations on relations, forming the theoretical foundation of SQL.

Selection (σ)

Filter rows by a predicate.

σ_{gpa > 3.5}(Students) = {(101, Alice, 3.8, CS), (103, Carol, 3.9, CS)}

SQL: SELECT * FROM Students WHERE gpa > 3.5

Projection (π)

Select specific columns.

π_{name, major}(Students) = {(Alice, CS), (Bob, Math), (Carol, CS)}

SQL: SELECT name, major FROM Students

Join (⋈)

Combine tuples from two relations based on a condition.

Natural join: Join on all common attribute names.

Theta join (⋈_θ): Join on arbitrary condition.

Equi-join: Join where θ is equality on specific attributes.

Students ⋈_{Students.id = Enrollments.student_id} Enrollments

SQL: SELECT * FROM Students JOIN Enrollments ON Students.id = Enrollments.student_id

Union (∪)

Combine tuples from two union-compatible relations (same schema).

SQL: SELECT ... UNION SELECT ...

Intersection (∩)

Tuples appearing in both relations.

SQL: SELECT ... INTERSECT SELECT ...

Difference (−)

Tuples in the first relation but not the second.

SQL: SELECT ... EXCEPT SELECT ...

Division (÷)

Find tuples in R that are associated with ALL tuples in S. Useful for "for all" queries.

Example: "Find students enrolled in ALL courses."

No direct SQL equivalent — typically implemented with double negation or counting.

Rename (ρ)

Rename a relation or its attributes. ρ_{S(a,b)}(R) renames R to S with attributes a, b.

SQL: SELECT a AS x, b AS y FROM R AS S

Relational Calculus

Tuple Relational Calculus

Declarative: specify what you want, not how to get it.

{t | t ∈ Students ∧ t.gpa > 3.5}

"The set of tuples t from Students where gpa > 3.5."

Domain Relational Calculus

Variables range over domain values (individual attributes) instead of tuples.

{<n, m> | ∃i, g (<i, n, g, m> ∈ Students ∧ g > 3.5)}

CASCADE: Delete/update the referencing rows too.
SET NULL: Set the foreign key to NULL.
SET DEFAULT: Set to a default value.
RESTRICT / NO ACTION: Prevent the delete/update.

CREATE TABLE Enrollments (
    student_id INT REFERENCES Students(id) ON DELETE CASCADE,
    course_id  INT REFERENCES Courses(id) ON DELETE RESTRICT
);

Domain Constraints

Values must be of the correct type and within a specified range.

ALTER TABLE Students ADD CONSTRAINT check_gpa CHECK (gpa >= 0.0 AND gpa <= 4.0);

CREATE TRIGGER update_timestamp
BEFORE UPDATE ON orders
FOR EACH ROW
SET NEW.updated_at = NOW();

Relational Algebra Equivalences

The query optimizer uses these to transform queries:

σ_{p∧q}(R) = σ_p(σ_q(R))                     (cascade selections)
σ_p(R ⋈ S) = σ_p(R) ⋈ S  (if p involves only R's attrs)  (push selection)
π_L(R ⋈ S) = π_L(π_{L∩attrs(R)}(R) ⋈ π_{L∩attrs(S)}(S))  (push projection)
R ⋈ S = S ⋈ R                                 (join commutativity)
(R ⋈ S) ⋈ T = R ⋈ (S ⋈ T)                   (join associativity)

These rules are the foundation of query optimization (covered in later files).

Applications in CS

Schema design: Understanding relations, keys, and constraints is essential for designing correct, efficient database schemas.
Query writing: Relational algebra provides the mental model behind SQL — understanding it helps write better queries.
ORM design: Object-relational mapping (Diesel, SQLAlchemy, Hibernate) bridges the relational model and application objects.
Query optimization: Knowing relational algebra equivalences helps understand and influence query plans.
Data modeling: ER diagrams → relational schema is the standard design process (covered in database design file).