5 min read
On this page

Database Design

Database design transforms requirements into a well-structured schema. The standard process: conceptual design (ER model) → logical design (relational schema) → physical design (indexes, storage).

Entity-Relationship Model

Entities

An entity is a real-world object or concept. Represented as a rectangle.

Entity set: A collection of similar entities (e.g., all Students, all Courses).

Attributes

Properties of an entity. Represented as ovals connected to the entity.

Types:

  • Simple: Atomic value (name, age).
  • Composite: Composed of sub-attributes (address → street, city, zip).
  • Multivalued: Multiple values (phone numbers). Denoted by double oval.
  • Derived: Computed from other attributes (age from birth_date). Denoted by dashed oval.
  • Key attribute: Uniquely identifies an entity. Underlined.

Relationships

An association between entity sets. Represented as a diamond.

[Student] ──── <Enrolls> ──── [Course]

Relationship attributes: Properties of the relationship itself (e.g., grade in Enrolls).

Cardinality

How many entities of one set can be associated with entities of another set.

| Cardinality | Meaning | Example | |---|---|---| | 1:1 (one-to-one) | Each A relates to at most 1 B | Person – Passport | | 1:N (one-to-many) | Each A relates to many Bs | Department – Employees | | M:N (many-to-many) | Each A relates to many Bs and vice versa | Students – Courses |

Participation

Total participation (double line): Every entity must participate in the relationship. Every employee MUST belong to a department.

Partial participation (single line): Some entities may not participate. A student MAY have a thesis advisor.

ER Diagrams

[Student]──(id)──(name)──(gpa)
     │
     │ total
     │
   <Enrolls>──(grade)──(semester)
     │
     │ partial
     │
[Course]──(course_id)──(title)──(credits)

Enhanced ER (EER)

Generalization/Specialization (inheritance):

        [Person]
        /      \
  [Student]  [Employee]
                 /    \
         [Faculty]  [Staff]

Disjoint (d): An entity is in at most one subclass (Student OR Employee, not both).

Overlapping (o): An entity can be in multiple subclasses (Student AND Employee).

Total specialization: Every superclass entity must be in at least one subclass.

Partial specialization: Some superclass entities may not be in any subclass.

Aggregation: Treat a relationship as an entity for participation in other relationships.

ER to Relational Mapping

Rules

Strong entity → Table with all simple attributes. Key attribute becomes primary key.

Weak entity → Table with all attributes + foreign key to the identifying owner + discriminator as part of the primary key.

1:1 relationship → Add foreign key to the side with total participation (or merge tables if both are total).

1:N relationship → Add foreign key to the "many" side (the child table references the parent).

M:N relationship → Create a new junction table with foreign keys to both entities. The combination is the primary key.

Students(id, name, gpa)
Courses(course_id, title, credits)
Enrollments(student_id, course_id, grade, semester)
    PK: (student_id, course_id)
    FK: student_id → Students(id)
    FK: course_id → Courses(course_id)

Multivalued attribute → Separate table with foreign key to the entity.

Composite attribute → Flatten into simple attributes in the table.

Generalization → Three approaches:

  1. One table per subclass (includes superclass attributes). Redundant but simple queries.
  2. One table for superclass + one per subclass (foreign key to superclass). Normalized but requires joins.
  3. One table with all attributes + type discriminator. Simple but many NULLs.

Functional Dependencies

A functional dependency X → Y means: if two tuples agree on X, they must agree on Y.

id → name, email, age    (id determines all other attributes)
zip → city, state         (zip code determines city and state)

Trivial dependency: X → Y where Y ⊆ X (always holds).

Armstrong's Axioms

Sound and complete rules for deriving all functional dependencies:

  1. Reflexivity: If Y ⊆ X, then X → Y.
  2. Augmentation: If X → Y, then XZ → YZ.
  3. Transitivity: If X → Y and Y → Z, then X → Z.

Derived rules:

  • Union: If X → Y and X → Z, then X → YZ.
  • Decomposition: If X → YZ, then X → Y and X → Z.
  • Pseudotransitivity: If X → Y and WY → Z, then WX → Z.

Attribute Closure

The attribute closure X⁺ is the set of all attributes functionally determined by X.

Algorithm: Start with X⁺ = X. For each FD A → B: if A ⊆ X⁺, add B to X⁺. Repeat until no change.

Use: Check if X is a superkey (X⁺ = all attributes). Find candidate keys.

Canonical Cover

A minimal set of FDs equivalent to the original set:

  1. Each FD has a single attribute on the right side.
  2. No FD can be removed without changing the closure.
  3. No attribute can be removed from any FD's left side without changing the closure.

Used in normalization algorithms.

Design Methodology

Requirements Analysis

  1. Identify entities and their attributes.
  2. Identify relationships and their cardinalities.
  3. Identify constraints (uniqueness, participation, business rules).

Conceptual Design

Draw the ER diagram. Iterate with stakeholders. Focus on completeness and correctness, not implementation.

Logical Design

Map ER to relational schema. Normalize. Define constraints, keys, and foreign keys.

Physical Design

Choose indexes, storage parameters, partitioning. Optimize for the expected query workload.

Denormalization

Selectively violate normal forms for performance. Add redundancy to avoid expensive joins.

When: Read-heavy workloads, OLAP, materialized aggregations. Always with awareness of the update anomalies introduced.

Design Patterns

Polymorphic Associations

One table references different entity types.

Single Table Inheritance: All types in one table with a type discriminator. Many NULLs.

Concrete Table Inheritance: Separate table per type. No NULLs but queries span multiple tables.

Class Table Inheritance: Base table + subtype tables. Normalized but requires joins.

Adjacency List (Trees)

CREATE TABLE categories (
    id        INT PRIMARY KEY,
    name      VARCHAR(100),
    parent_id INT REFERENCES categories(id)
);

Simple. Self-referencing foreign key. Recursive queries for tree traversal (recursive CTE).

Nested Sets, Materialized Path, Closure Table

Alternative tree representations with different query/update tradeoffs. Covered in advanced database topics.

Applications in CS

  • Application development: Good schema design prevents bugs, improves performance, and simplifies code.
  • Data warehousing: Star/snowflake schemas for analytical databases (covered in advanced databases).
  • API design: Database schema often drives API structure. ER model → REST resources.
  • Migration management: Schema changes (ALTER TABLE) must be managed carefully. Migration tools (Diesel migrations, Alembic, Flyway).
  • Domain-driven design: Bounded contexts align with database schema boundaries.