Modeling the Real World

Data is a structured description of something real. Every time you write down a contact's name and phone number, fill out a form, or organize a spreadsheet, you are modeling the real world as data. You are deciding what matters, what to capture, and how to structure it.

This chapter explores how to represent real things as structured data, starting with everyday examples and connecting to how technology does the same thing at scale.

What Does It Mean to Represent Something as Data?

The real world is messy, continuous, and infinitely detailed. Data is clean, discrete, and deliberately limited. Turning reality into data means choosing which aspects to capture and which to ignore.

A person is infinitely complex — their appearance, personality, history, relationships, preferences, and more. But a contact list does not try to capture a person. It captures just enough to contact them:

Contact:
  Name:    Maria Santos
  Phone:   555-0142
  Email:   maria.santos@email.com

Three fields. That is all you need to reach Maria. Everything else about her — her favorite color, her childhood memories, the way she laughs — is real but irrelevant for the purpose of a contact list.

This is modeling: choosing the right level of detail for a specific purpose.

Everyday Data Models

A Contact List

Name            Phone        Email
----            -----        -----
Maria Santos    555-0142     maria.santos@email.com
James Park      555-0198     jpark@email.com
Aisha Johnson   555-0267     aisha.j@email.com

This model captures people in terms of contact information. Each row represents a person. Each column represents a property of that person. Most contact lists, whether on paper or in your phone, follow this structure.
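As a minimal sketch, the same contact list could be written in Python as a list of records. The `Contact` class and the `find_by_name` helper are illustrative names chosen here, not part of any particular library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contact:
    """One row of the contact list: just enough to reach a person."""
    name: str
    phone: str
    email: str


# Each object is a row; each attribute is a column.
contacts = [
    Contact("Maria Santos", "555-0142", "maria.santos@email.com"),
    Contact("James Park", "555-0198", "jpark@email.com"),
    Contact("Aisha Johnson", "555-0267", "aisha.j@email.com"),
]


def find_by_name(rows, name):
    """Return the first contact with a matching name, or None."""
    return next((c for c in rows if c.name == name), None)


print(find_by_name(contacts, "James Park").phone)  # 555-0198
```

Because every record has the same three fields, a lookup like this works the same way for every person in the list.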

A Restaurant Menu

Item                Category      Price    Vegetarian
----                --------      -----    ----------
Margherita Pizza    Pizza         $12.99   Yes
Pepperoni Pizza     Pizza         $14.99   No
Caesar Salad        Salad         $9.99    Yes
Grilled Salmon      Entree        $18.99   No
Chocolate Cake      Dessert       $7.99    Yes

The menu models food items with just enough detail for ordering decisions: what it is, what category it belongs to, what it costs, and whether it is vegetarian. It does not model calorie counts, ingredient lists, or preparation times unless the restaurant decides those details serve its purpose.

A Class Roster

Student          Grade    Attendance    Parent Contact
-------          -----    ----------    --------------
Emma Wilson      3rd      Present       555-0321
Liam Chen        3rd      Absent        555-0455
Sophia Davis     3rd      Present       555-0189

The teacher needs to know who is in class today and how to reach a parent if needed. This model serves those needs. It does not include shoe size or favorite subject because those are irrelevant to the daily task.

Choosing What to Capture

The most important decision in data modeling is what to include and what to leave out. This depends entirely on the purpose.

Consider modeling a book for different purposes:

For a library catalog:
  Title, Author, ISBN, Genre, Available copies, Location on shelf

For a bookstore:
  Title, Author, ISBN, Price, Publisher, Stock count, Cover image

For a book club:
  Title, Author, Number of pages, Genre, Discussion date, Rating

For a reading tracker:
  Title, Author, Date started, Date finished, Personal notes

Same real-world thing — a book — but four different models because four different purposes require four different sets of information.
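One way to see this in code is to write two of these models side by side. The sketch below is only an illustration: the class names, field names, and the placeholder book values are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class LibraryBook:
    """A book as a library catalog sees it."""
    title: str
    author: str
    isbn: str
    genre: str
    available_copies: int
    shelf_location: str


@dataclass
class TrackedBook:
    """The same book as a personal reading tracker sees it."""
    title: str
    author: str
    date_started: date
    date_finished: Optional[date] = None  # unknown until the book is finished
    personal_notes: str = ""


# Placeholder values, not a real book.
catalog_entry = LibraryBook(
    "Example Title", "Example Author", "000-0-00-000000-0", "Fiction", 2, "Shelf A3"
)
reading_entry = TrackedBook("Example Title", "Example Author", date(2026, 3, 1))

# Only the identifying fields are shared; everything else follows the purpose.
shared = set(vars(catalog_entry)) & set(vars(reading_entry))
print(sorted(shared))  # ['author', 'title']
```

The two classes describe the same object in the world, yet they overlap in only two fields.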

Structure Matters

How you organize data affects how easily you can use it. Consider tracking household chores:

Unstructured (just notes)

Monday - did laundry, vacuumed. Maria cleaned the bathroom Tuesday.
Wednesday nothing got done. James took out trash Thursday.

This is hard to search, sort, or summarize. Who did the most chores? What has not been done this week? The answers are buried in text.

Structured (organized into fields)

Date        Person    Chore           Status
----        ------    -----           ------
Monday      You       Laundry         Done
Monday      You       Vacuuming       Done
Tuesday     Maria     Bathroom        Done
Wednesday   -         -               Nothing done
Thursday    James     Trash           Done

Now you can easily answer questions: Who did the most chores? Which day had no chores? Structure turns raw information into something you can reason about.
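To make that concrete, here is a short Python sketch that answers two such questions from the structured chore data. It assumes each completed chore is stored as a `(day, person, chore)` tuple, so a day with no chores simply has no rows; the variable names are illustrative.

```python
from collections import Counter

# One tuple per completed chore: (day, person, chore).
chore_log = [
    ("Monday", "You", "Laundry"),
    ("Monday", "You", "Vacuuming"),
    ("Tuesday", "Maria", "Bathroom"),
    ("Thursday", "James", "Trash"),
]

# Who did the most chores?
chores_per_person = Counter(person for _, person, _ in chore_log)
top_person, top_count = chores_per_person.most_common(1)[0]

# Which tracked days had no chores at all?
tracked_days = ["Monday", "Tuesday", "Wednesday", "Thursday"]
days_with_chores = {day for day, _, _ in chore_log}
idle_days = [day for day in tracked_days if day not in days_with_chores]

print(top_person, top_count)  # You 2
print(idle_days)              # ['Wednesday']
```

Neither question needs any text parsing, which is the practical payoff of putting the data into fields.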

Data Models in Technology

Database Tables

Databases store information in tables — the same rows-and-columns structure as the examples above, but enforced and optimized by software.

customers table:
  id    name            email                  city
  1     Maria Santos    maria@email.com        Chicago
  2     James Park      jpark@email.com        Seattle
  3     Aisha Johnson   aisha.j@email.com      Austin

orders table:
  id    customer_id    date          total
  101   1              2026-03-15    $45.99
  102   3              2026-03-16    $22.50
  103   1              2026-03-17    $31.00

Notice that the orders table uses customer_id to link to the customers table. This relationship — "an order belongs to a customer" — is a core concept in data modeling. Instead of repeating Maria's full details in every order, you store her information once and reference it.
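The following sketch builds these two tables with Python's built-in `sqlite3` module and follows `customer_id` back to the customer. The column types are assumptions made for the example; in particular, storing totals as `REAL` is a simplification, and real systems often store money as integer cents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""
    CREATE TABLE customers (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        email TEXT NOT NULL,
        city  TEXT
    )
""")
conn.execute("""
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        date        TEXT NOT NULL,
        total       REAL NOT NULL
    )
""")

conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", [
    (1, "Maria Santos", "maria@email.com", "Chicago"),
    (2, "James Park", "jpark@email.com", "Seattle"),
    (3, "Aisha Johnson", "aisha.j@email.com", "Austin"),
])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    (101, 1, "2026-03-15", 45.99),
    (102, 3, "2026-03-16", 22.50),
    (103, 1, "2026-03-17", 31.00),
])

# Follow customer_id back to the customers table to list Maria's orders.
marias_orders = conn.execute("""
    SELECT o.id, o.total
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    WHERE c.name = ?
    ORDER BY o.id
""", ("Maria Santos",)).fetchall()

print(marias_orders)  # [(101, 45.99), (103, 31.0)]
```

Maria's name and email appear once, in `customers`; the two orders carry only her id.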

JSON (JavaScript Object Notation)

JSON is a common format for representing structured data, especially on the web:

{
  "name": "Margherita Pizza",
  "category": "Pizza",
  "price": 12.99,
  "vegetarian": true,
  "toppings": ["mozzarella", "tomato sauce", "basil"]
}

JSON can represent nested data — like a list of toppings inside a menu item — which flat tables cannot do as naturally.
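As a small illustration, Python's standard `json` module can read this menu item into ordinary dictionaries and lists. The variable names below are chosen for the example.

```python
import json

menu_item_text = """
{
  "name": "Margherita Pizza",
  "category": "Pizza",
  "price": 12.99,
  "vegetarian": true,
  "toppings": ["mozzarella", "tomato sauce", "basil"]
}
"""

item = json.loads(menu_item_text)

# JSON true becomes Python True; the JSON array becomes a Python list.
print(item["vegetarian"])     # True
print(len(item["toppings"]))  # 3

# Serializing back to text and parsing again keeps the nested structure.
round_trip = json.loads(json.dumps(item))
print(round_trip == item)     # True
```

The list of toppings stays attached to its menu item, with no second table needed to hold it.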

Formal Data Models

A data model defines the structure of your data before you start filling it in. It answers questions like:

  • What entities exist? (customers, orders, products)
  • What properties does each entity have? (name, price, date)
  • How do entities relate to each other? (a customer places orders)
  • What rules apply? (price must be positive, email must be unique)

Entity: Product
  - name (text, required)
  - price (number, must be > 0)
  - category (text, from predefined list)
  - in_stock (true/false)

Entity: Order
  - customer (reference to Customer)
  - items (list of references to Product)
  - date (date, defaults to today)
  - status (one of: pending, shipped, delivered, cancelled)

Defining this model upfront prevents chaos later. Without it, you might end up with prices stored as text, missing customer references, or statuses spelled three different ways.
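One possible way to express such a model in code is with Python dataclasses and an enum, checking the rules when a record is created. This is a sketch under assumptions: the `ALLOWED_CATEGORIES` set stands in for the "predefined list", and the customer reference is represented as a plain integer id.

```python
import datetime
from dataclasses import dataclass, field
from enum import Enum
from typing import List

# Assumed stand-in for the predefined category list.
ALLOWED_CATEGORIES = {"Pizza", "Salad", "Entree", "Dessert"}


class OrderStatus(Enum):
    PENDING = "pending"
    SHIPPED = "shipped"
    DELIVERED = "delivered"
    CANCELLED = "cancelled"


@dataclass
class Product:
    name: str
    price: float
    category: str
    in_stock: bool = True

    def __post_init__(self):
        # Enforce the model's rules at creation time.
        if not self.name:
            raise ValueError("name is required")
        if self.price <= 0:
            raise ValueError("price must be > 0")
        if self.category not in ALLOWED_CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")


@dataclass
class Order:
    customer_id: int  # reference to a Customer (assumed to be an integer id)
    items: List[Product]
    date: datetime.date = field(default_factory=datetime.date.today)
    status: OrderStatus = OrderStatus.PENDING


pizza = Product("Margherita Pizza", 12.99, "Pizza")
order = Order(customer_id=1, items=[pizza])
print(order.status.value)  # pending

try:
    Product("Free Lunch", 0, "Pizza")
except ValueError as err:
    print(err)  # price must be > 0
```

With the status defined as an enum, a misspelled value such as "Shipped" is rejected instead of quietly becoming a fourth spelling.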

Relationships Between Data

Real-world things have relationships, and data models need to capture them:

One-to-many

One customer has many orders. One teacher has many students. One author has many books.

Author: J.K. Rowling
  -> Book: Harry Potter and the Sorcerer's Stone
  -> Book: Harry Potter and the Chamber of Secrets
  -> Book: The Casual Vacancy

Many-to-many

A student takes many classes. A class has many students.

Student: Emma  -> Class: Math, Class: Science, Class: Art
Student: Liam  -> Class: Math, Class: Music
Class: Math    -> Student: Emma, Student: Liam
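A common way to store a many-to-many relationship is as a set of pairs, one per enrollment, which plays the role a junction table plays in a database. The sketch below assumes that approach; the function names are illustrative.

```python
# Each (student, class) pair is one enrollment.
enrollments = {
    ("Emma", "Math"), ("Emma", "Science"), ("Emma", "Art"),
    ("Liam", "Math"), ("Liam", "Music"),
}


def classes_for(student):
    """All classes a given student takes, in alphabetical order."""
    return sorted(c for s, c in enrollments if s == student)


def students_in(class_name):
    """All students enrolled in a given class, in alphabetical order."""
    return sorted(s for s, c in enrollments if c == class_name)


print(classes_for("Emma"))  # ['Art', 'Math', 'Science']
print(students_in("Math"))  # ['Emma', 'Liam']
```

Storing the pairs once lets you read the relationship from either side without keeping two lists in sync.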

One-to-one

In a simple model, one person has one passport, and one passport belongs to one person.

Understanding these relationships is essential for building data models that accurately reflect reality.

Common Pitfalls

Capturing too much

Including every conceivable detail makes the model unwieldy and expensive to maintain. A contact list does not need blood type. A menu does not need the chef's personal history with each dish. Capture what you need for your purpose.

Capturing too little

Missing critical fields leads to workarounds and data quality issues. A customer model without an email address means you cannot send order confirmations. Think through your use cases before finalizing the model.

Inconsistent structure

If some contacts have phone numbers and others have "call Maria" in the notes field, the data is inconsistent. Define a clear structure and stick to it.
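A simple check can surface this kind of inconsistency early. The sketch below assumes phone numbers follow the `555-0142` pattern used in this chapter's examples; the pattern and the sample records are made up for illustration.

```python
import re

# Assumed format: three digits, a hyphen, four digits.
PHONE_PATTERN = re.compile(r"\d{3}-\d{4}")

# Hypothetical records, two of which break the structure.
records = [
    {"name": "Maria Santos", "phone": "555-0142"},
    {"name": "James Park", "phone": "call Maria"},
    {"name": "Aisha Johnson", "phone": ""},
]

inconsistent = [
    r["name"] for r in records if not PHONE_PATTERN.fullmatch(r["phone"])
]
print(inconsistent)  # ['James Park', 'Aisha Johnson']
```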

Ignoring relationships

Modeling entities in isolation misses important connections. An order without a customer reference is an orphan — you know what was ordered but not by whom.
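Orphans are straightforward to detect once references are explicit. The following sketch uses plain Python data; order 104 is a hypothetical record added to show the problem.

```python
customer_ids = {1, 2, 3}

orders = [
    {"id": 101, "customer_id": 1},
    {"id": 102, "customer_id": 3},
    {"id": 104, "customer_id": None},  # hypothetical order with no customer
]

# An order is an orphan if its reference does not point at a known customer.
orphans = [o["id"] for o in orders if o["customer_id"] not in customer_ids]
print(orphans)  # [104]
```

Many databases can enforce this rule for you through foreign key constraints, so the orphan is rejected at insert time rather than discovered later.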

Not planning for change

The real world changes, and your data model will need to change with it. A restaurant that adds catering services will need new fields. Build with some flexibility in mind, but do not over-engineer for changes that may never come.

Key Takeaways

  • Representing the real world as data means choosing which aspects to capture and structuring them for a specific purpose.
  • The same real-world thing can be modeled differently depending on what questions you need to answer.
  • Structure matters: organized data is searchable, sortable, and analyzable in ways that unstructured notes are not.
  • In technology, database tables, JSON, and formal data models are the tools for structuring real-world information.
  • Relationships between entities (one-to-many, many-to-many, one-to-one) are a critical part of any data model.
  • The most important question is always: "What is this data for?" The answer drives every modeling decision.