Relational Model
The relational model (Codd, 1970) organizes data into relations (tables) and provides a mathematical foundation for querying and manipulating data.
Relations
A relation is a set of tuples (rows) over a set of attributes (columns).
Students(id: INT, name: VARCHAR, gpa: DECIMAL, major: VARCHAR)
| id | name | gpa | major |
|-----|---------|------|---------|
| 101 | Alice | 3.8 | CS |
| 102 | Bob | 3.2 | Math |
| 103 | Carol | 3.9 | CS |
Properties of Relations
- Unordered rows: No inherent ordering (unlike arrays).
- Unordered columns: Attributes are identified by name, not position.
- No duplicates: A relation is a set (no duplicate tuples). In practice, SQL tables are bags (allow duplicates unless constrained).
- Atomic values: Each cell contains a single value (1NF).
Schema vs Instance
Schema (intension): The structure — attribute names and types. Students(id: INT, name: VARCHAR, ...).
Instance (extension): The data at a particular point in time — the actual rows.
Keys
Types of Keys
Superkey: A set of attributes that uniquely identifies a tuple. {id, name} is a superkey if id alone is unique.
Candidate key: A minimal superkey — no subset is also a superkey. If {id} uniquely identifies students, it's a candidate key.
Primary key: The chosen candidate key. Used as the main identifier. Cannot be NULL. One per table.
Foreign key: An attribute that references the primary key of another table. Enforces referential integrity.
CREATE TABLE Enrollments (
student_id INT REFERENCES Students(id), -- foreign key
course_id INT REFERENCES Courses(id), -- foreign key
grade CHAR(1),
PRIMARY KEY (student_id, course_id) -- composite primary key
);
Composite key: A key consisting of multiple attributes.
Surrogate key: An artificial key (auto-increment integer, UUID) with no business meaning. Stable, compact, no update anomalies.
Natural key: A key with business meaning (email, SSN). May change, may not be truly unique across time.
Relational Algebra
A set of operations on relations, forming the theoretical foundation of SQL.
Selection (σ)
Filter rows by a predicate.
σ_{gpa > 3.5}(Students) = {(101, Alice, 3.8, CS), (103, Carol, 3.9, CS)}
SQL: SELECT * FROM Students WHERE gpa > 3.5
Projection (π)
Select specific columns.
π_{name, major}(Students) = {(Alice, CS), (Bob, Math), (Carol, CS)}
SQL: SELECT name, major FROM Students
Join (⋈)
Combine tuples from two relations based on a condition.
Natural join: Join on all common attribute names.
Theta join (⋈_θ): Join on arbitrary condition.
Equi-join: Join where θ is equality on specific attributes.
Students ⋈_{Students.id = Enrollments.student_id} Enrollments
SQL: SELECT * FROM Students JOIN Enrollments ON Students.id = Enrollments.student_id
Union (∪)
Combine tuples from two union-compatible relations (same schema).
SQL: SELECT ... UNION SELECT ...
Intersection (∩)
Tuples appearing in both relations.
SQL: SELECT ... INTERSECT SELECT ...
Difference (−)
Tuples in the first relation but not the second.
SQL: SELECT ... EXCEPT SELECT ...
Division (÷)
Find tuples in R that are associated with ALL tuples in S. Useful for "for all" queries.
Example: "Find students enrolled in ALL courses."
No direct SQL equivalent — typically implemented with double negation or counting.
Rename (ρ)
Rename a relation or its attributes. ρ_{S(a,b)}(R) renames R to S with attributes a, b.
SQL: SELECT a AS x, b AS y FROM R AS S
Relational Calculus
Tuple Relational Calculus
Declarative: specify what you want, not how to get it.
{t | t ∈ Students ∧ t.gpa > 3.5}
"The set of tuples t from Students where gpa > 3.5."
Domain Relational Calculus
Variables range over domain values (individual attributes) instead of tuples.
{<n, m> | ∃i, g (<i, n, g, m> ∈ Students ∧ g > 3.5)}
Equivalence
Codd's theorem: Relational algebra, tuple relational calculus, and domain relational calculus are all equivalent in expressive power (for safe queries).
SQL is based on tuple relational calculus with extensions (aggregation, ordering, grouping).
Constraints
Entity Integrity
Primary key attributes cannot be NULL. Every tuple must be uniquely identifiable.
Referential Integrity
Foreign key values must either be NULL or match an existing primary key in the referenced table.
Actions on violation:
CASCADE: Delete/update the referencing rows too.SET NULL: Set the foreign key to NULL.SET DEFAULT: Set to a default value.RESTRICT/NO ACTION: Prevent the delete/update.
CREATE TABLE Enrollments (
student_id INT REFERENCES Students(id) ON DELETE CASCADE,
course_id INT REFERENCES Courses(id) ON DELETE RESTRICT
);
Domain Constraints
Values must be of the correct type and within a specified range.
ALTER TABLE Students ADD CONSTRAINT check_gpa CHECK (gpa >= 0.0 AND gpa <= 4.0);
Check Constraints
Arbitrary boolean conditions on rows.
Assertions
Conditions that must hold for the entire database (across tables). Rarely supported in practice (expensive to enforce).
Triggers
Procedural code executed automatically when certain events occur (INSERT, UPDATE, DELETE).
CREATE TRIGGER update_timestamp
BEFORE UPDATE ON orders
FOR EACH ROW
SET NEW.updated_at = NOW();
Relational Algebra Equivalences
The query optimizer uses these to transform queries:
σ_{p∧q}(R) = σ_p(σ_q(R)) (cascade selections)
σ_p(R ⋈ S) = σ_p(R) ⋈ S (if p involves only R's attrs) (push selection)
π_L(R ⋈ S) = π_L(π_{L∩attrs(R)}(R) ⋈ π_{L∩attrs(S)}(S)) (push projection)
R ⋈ S = S ⋈ R (join commutativity)
(R ⋈ S) ⋈ T = R ⋈ (S ⋈ T) (join associativity)
These rules are the foundation of query optimization (covered in later files).
Applications in CS
- Schema design: Understanding relations, keys, and constraints is essential for designing correct, efficient database schemas.
- Query writing: Relational algebra provides the mental model behind SQL — understanding it helps write better queries.
- ORM design: Object-relational mapping (Diesel, SQLAlchemy, Hibernate) bridges the relational model and application objects.
- Query optimization: Knowing relational algebra equivalences helps understand and influence query plans.
- Data modeling: ER diagrams → relational schema is the standard design process (covered in database design file).