Reading Strategies

There is no single correct way to read code. Different situations call for different strategies, and the engineer who only reads code one way is slow in most situations. The two fundamental approaches — top-down and bottom-up — each have strengths. Knowing when to use each, and when to combine them, is what separates an engineer who can quickly understand any codebase from one who flounders in unfamiliar territory.

Top-Down Reading

Top-down means starting from the highest level of abstraction and drilling down only as needed. You read the architecture first, the module boundaries second, the interfaces third, and the implementations last — if at all.

When to Use Top-Down

- Joining a new codebase or team
- Reviewing a large pull request (50+ files)
- Evaluating a library or framework before adopting it
- Understanding a system you need to integrate with
- Architecture reviews and design discussions

How to Read Top-Down

Layer 1: Project structure
  Read the directory tree. What are the top-level modules?
  What does the naming suggest about responsibilities?

Layer 2: Public interfaces
  Read the exported types, function signatures, and API endpoints.
  Do NOT read the implementations yet. Just the contracts.

Layer 3: Data flow
  Trace how data moves through the system. What goes in,
  what comes out, what gets stored, what gets transformed?

Layer 4: Implementation (only where needed)
  Now drill into specific implementations, but only the ones
  relevant to your current question.
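Layer 1 is easy to automate. The sketch below assumes a Node.js project and prints a directory tree to a fixed depth without opening any file. The skip list and the `listTree` name are illustrative and do not come from any particular tool.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Folders that rarely say anything about architecture. This list is an
// assumption; adjust it for each project.
const SKIP = new Set(["node_modules", ".git", "dist", "build", "coverage"]);

// Return the tree under `root` as indented lines, at most `maxDepth` levels
// deep. Only names are read; file contents are never opened.
function listTree(root, maxDepth = 2, depth = 0) {
  if (depth >= maxDepth) return [];
  const entries = fs
    .readdirSync(root, { withFileTypes: true })
    .filter((entry) => !SKIP.has(entry.name))
    .sort((a, b) => a.name.localeCompare(b.name));

  const lines = [];
  for (const entry of entries) {
    const indent = "  ".repeat(depth);
    if (entry.isDirectory()) {
      lines.push(`${indent}${entry.name}/`);
      lines.push(...listTree(path.join(root, entry.name), maxDepth, depth + 1));
    } else {
      lines.push(`${indent}${entry.name}`);
    }
  }
  return lines;
}
```

Calling `console.log(listTree("src").join("\n"))` prints a two-level outline of src/. Where it is installed, `tree -L 2` gives similar output.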

The discipline of top-down reading is resisting the urge to drill into details too early. When you see a function call, note what it does (from its name and signature) without reading its body. When you see an import, note the dependency without opening that file. Build the map before exploring the territory.
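A crude way to practice reading signatures without bodies is to extract only the declarations from a file. The sketch below relies on two regular expressions, which is a shortcut with clear limits. It finds `function name(...)` declarations and `const name = (...) =>` arrow functions, but it misses class methods, default values that contain calls, and anything else a real parser would handle.

```javascript
// Two rough patterns: `function name(...)` declarations and
// `const name = (...) =>` arrow functions, optionally exported or async.
const SIGNATURE_PATTERNS = [
  /^[ \t]*(?:export\s+)?(?:async\s+)?function\s+(\w+)\s*\(([^)]*)\)/gm,
  /^[ \t]*(?:export\s+)?const\s+(\w+)\s*=\s*(?:async\s+)?\(([^)]*)\)\s*=>/gm,
];

// Return "name(params)" for each match, in source order, skipping bodies.
function skimSignatures(source) {
  const found = [];
  for (const pattern of SIGNATURE_PATTERNS) {
    for (const match of source.matchAll(pattern)) {
      const params = match[2]
        .split(",")
        .map((param) => param.trim())
        .filter(Boolean);
      found.push({ index: match.index, text: `${match[1]}(${params.join(", ")})` });
    }
  }
  return found.sort((a, b) => a.index - b.index).map((entry) => entry.text);
}
```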

Top-Down Example

Reading a new e-commerce service top-down:

Directory tree:
  src/
    routes/       -> HTTP endpoint definitions
    handlers/     -> Request handling logic
    services/     -> Business logic
    models/       -> Data types and database access
    middleware/   -> Cross-cutting concerns (auth, logging)
    utils/        -> Shared helpers

Read routes/ first:
  POST /orders          -> createOrder handler
  GET  /orders/:id      -> getOrder handler
  POST /orders/:id/pay  -> processPayment handler

Read handler signatures:
  createOrder(req, res) -> calls orderService.create()
  processPayment(req, res) -> calls paymentService.charge()

Read service interfaces:
  orderService.create(items, userId) -> Order
  paymentService.charge(orderId, paymentMethod) -> PaymentResult

At this point you understand the system's structure and main request
paths without reading a single line of implementation.
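The "read routes/ first" step can be sketched in the same way. This version assumes Express-style registrations such as `router.post("/orders", createOrder)`. The route file contents and the `requireAuth` middleware in the check below are hypothetical, and routes that are built dynamically will not be found.

```javascript
// Matches registrations like router.post("/orders", requireAuth, createOrder);
// the last argument is treated as the handler, any earlier ones as middleware.
const ROUTE_PATTERN =
  /\b(?:router|app)\.(get|post|put|patch|delete)\(\s*["'`]([^"'`]+)["'`]((?:\s*,\s*[\w.]+)+)\s*\)/g;

function routeTable(source) {
  return [...source.matchAll(ROUTE_PATTERN)].map(([, method, routePath, args]) => {
    const names = args.split(",").map((name) => name.trim()).filter(Boolean);
    const handler = names.pop();
    return { method: method.toUpperCase(), path: routePath, handler, middleware: names };
  });
}
```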

Bottom-Up Reading

Bottom-up means starting from a specific function, error, or behavior and working outward. You start with a concrete detail and expand your understanding as needed.

When to Use Bottom-Up

- Debugging a specific bug (start from the error location)
- Understanding a specific feature (start from its entry point)
- Code review of a focused change (start from the diff)
- Answering "how does X work?" (start from X)
- Performance investigation (start from the slow function)

How to Read Bottom-Up

Step 1: Find the specific function or line
Step 2: Read the function body. What does it do?
Step 3: Read the callers. Who calls this and when?
Step 4: Read the callees. What does this depend on?
Step 5: Expand outward only as much as needed

The discipline of bottom-up reading is knowing when to stop expanding. You do not need to understand the entire system to fix a bug in one function. Expand only when your current understanding is insufficient to answer your question.
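Step 3, finding the callers, is usually handled by an editor's "find all references" feature or by `grep -rn "calculateTotal(" src/`. The sketch below applies the same idea as a plain-text search over a map of file contents. Because it only matches text, it misses aliases and dynamic calls, and it cannot tell apart same-named functions in different modules. The file names in the check are hypothetical.

```javascript
// Crude caller search: lines that contain `name(` but are not the
// declaration itself. Assumes `functionName` is a plain identifier.
function findCallSites(files, functionName) {
  const call = new RegExp(`\\b${functionName}\\s*\\(`);
  const declaration = new RegExp(`\\bfunction\\s+${functionName}\\b`);
  const hits = [];
  for (const [file, source] of Object.entries(files)) {
    source.split("\n").forEach((line, index) => {
      if (call.test(line) && !declaration.test(line)) {
        hits.push(`${file}:${index + 1}: ${line.trim()}`);
      }
    });
  }
  return hits;
}
```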

Bottom-Up Example

Debugging why order totals are sometimes wrong:

Start: calculateTotal() function

  function calculateTotal(items) {
    // item.price is the line price (unit price x quantity)
    return items.reduce((sum, item) => sum + item.price, 0);
  }

Question: is item.price always a number?
  -> Check callers: where do items come from?
  -> Found: items come from the database query in getOrderItems()
  -> Check: the database stores price as a string (legacy decision)
  -> number + string is concatenation, not addition:
     0 + "20.00" + "5.50" evaluates to "020.005.50"

Root cause found without understanding the rest of the system.
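Below is a minimal sketch of the bug and one possible fix. It assumes getOrderItems() returns rows whose price is a decimal string such as "20.00". In JavaScript, `*` converts string operands to numbers, but `+` concatenates as soon as either operand is a string, so the damage happens wherever a raw price is added. The fix converts each row once, at the boundary, so later code only ever sees numbers. The row shape and the `toOrderItem` helper are hypothetical.

```javascript
// Hypothetical rows as returned by getOrderItems(): price is a string
// holding each line's price.
const rows = [
  { sku: "ABC", price: "20.00" },
  { sku: "XYZ", price: "5.50" },
];

// The bug in miniature: once a string is involved, `+` concatenates.
const broken = rows.reduce((sum, row) => sum + row.price, 0); // "020.005.50"

// Fix at the boundary: convert once, right after the query, and fail
// loudly on values that are not numbers instead of passing them along.
function toOrderItem(row) {
  // Number("") is 0, so treat blank strings as invalid explicitly.
  const price = String(row.price).trim() === "" ? NaN : Number(row.price);
  if (!Number.isFinite(price)) {
    throw new TypeError(`Invalid price for ${row.sku}: "${row.price}"`);
  }
  return { ...row, price };
}

const items = rows.map(toOrderItem);
const fixed = items.reduce((sum, item) => sum + item.price, 0); // 25.5
```

For real money, integer cents or a decimal type also avoids binary floating-point surprises such as `0.1 + 0.2` evaluating to `0.30000000000000004`.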

Following the Data Flow

One of the most powerful reading strategies is to trace data through the system. Pick a piece of data — a user input, a database record, an API response — and follow it from creation to consumption.

Data flow tracing:
  1. Where does this data originate? (user input, API call, database)
  2. What transformations does it undergo? (parsing, validation, mapping)
  3. Where is it stored? (database, cache, file, memory)
  4. Where is it read back? (queries, cache lookups)
  5. Where does it leave the system? (API response, email, webhook)

This strategy reveals the true architecture of the system. The directory structure shows you how files are organized. The data flow shows you how the system actually works.
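You can also trace mechanically: run a value through each stage and record what it looks like afterwards. The sketch below is a toy in which the pipeline steps and the Map standing in for a database are hypothetical. The useful part is recording the type at each stage, which here shows that the price is still a string when it leaves the system.

```javascript
// Push `value` through named steps and record its type and value after
// each one. Steps and storage here are hypothetical.
function traceFlow(value, steps) {
  const trail = [{ step: "origin", type: typeof value, value }];
  let current = value;
  for (const [name, fn] of steps) {
    current = fn(current);
    trail.push({ step: name, type: typeof current, value: current });
  }
  return trail;
}

// A toy pipeline: request body -> parsed object -> stored -> read -> response.
const store = new Map();
const trail = traceFlow('{"sku":"ABC","price":"10.00"}', [
  ["parse", (body) => JSON.parse(body)],
  ["extract price", (order) => order.price],
  ["store", (price) => { store.set("ABC", price); return price; }],
  ["read back", () => store.get("ABC")],
  ["respond", (price) => JSON.stringify({ price })],
]);

for (const { step, type, value } of trail) {
  console.log(step.padEnd(14), type.padEnd(7), JSON.stringify(value));
}
```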

Data Flow Example

Following "user email" through a registration system:

  1. Entered in form -> POST /api/register -> body.email
  2. Validated: regex check, MX record lookup
  3. Normalized: lowercase, trim whitespace
  4. Stored: users table, email column (unique index)
  5. Used: login lookup, password reset, notification delivery
  6. Sent outward: welcome email, billing system sync

This flow reveals:
  - Email is case-insensitive (normalized to lowercase)
  - MX validation means domains that cannot receive mail are rejected at signup
  - The unique index means duplicate detection happens at the database level
  - Billing system sync means email changes need to propagate
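Steps 2 through 4 of the email flow fit in a few lines. This sketch assumes a deliberately loose shape check and uses an in-memory Set in place of the unique index. The MX lookup is omitted because it needs network access; Node's `dns.promises.resolveMx` is one way to perform it. Two caveats: the sketch normalizes before it validates, and lowercasing the whole address is a policy choice. RFC 5321 says the local part is case-sensitive, although many providers treat it as case-insensitive in practice.

```javascript
// Loose shape check: something@something.something, no whitespace.
const EMAIL_SHAPE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

// Step 3 (normalize) followed by step 2 (validate).
function normalizeEmail(raw) {
  const email = raw.trim().toLowerCase();
  if (!EMAIL_SHAPE.test(email)) {
    throw new Error(`Invalid email: ${raw}`);
  }
  return email;
}

// Stand-in for the users.email column with a unique index (step 4).
const registeredEmails = new Set();

function register(rawEmail) {
  const email = normalizeEmail(rawEmail);
  if (registeredEmails.has(email)) {
    throw new Error(`Duplicate email: ${email}`);
  }
  registeredEmails.add(email);
  return email;
}
```

Since normalization runs first, "Alice@Example.COM" and "alice@example.com" collide. That is exactly the case-insensitivity the flow above reveals.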

Reading Tests to Understand Behavior

Tests are executable documentation. When you need to understand what a function or module does, its tests often explain it more clearly than its implementation.

How to read tests for understanding:

  1. Read the test names as a behavior specification:
     test_create_order_with_valid_items
     test_create_order_fails_with_empty_cart
     test_create_order_applies_discount_code
     test_create_order_rejects_out_of_stock_items

  2. Read the setup (arrange) to understand inputs:
     items = [Item(sku="ABC", quantity=2, price=10.00)]
     discount = DiscountCode("SAVE20", percent=20)

  3. Read the assertions (assert) to understand outputs:
     assert order.total == 16.00
     assert order.status == "pending"
     assert order.discount_applied is True

  4. Read the edge case tests to understand boundaries:
     test_create_order_with_max_items (what is the limit?)
     test_create_order_with_zero_price (is free allowed?)
     test_create_order_with_negative_quantity (input validation)

Tests tell you what the developer considered important. If there is a test for an oddly specific edge case, that case may well have caused a bug before.

When to Read Carefully vs When to Skim

Not all code deserves the same level of attention. Learning to calibrate your reading depth saves an enormous amount of time.

Read carefully:
  - Code you will modify or extend
  - Security-critical code (auth, encryption, access control)
  - Core business logic (the revenue-generating paths)
  - Code that a bug report points to
  - Interfaces you will depend on

Skim:
  - Boilerplate and configuration
  - Generated code
  - Test utilities and fixtures (unless debugging tests)
  - Third-party library internals (unless debugging the library)
  - UI layout code (unless the bug is visual)
  - Migration files (unless investigating a schema issue)

Skimming means reading function names and signatures without reading bodies. It means noting that a file exists and what it roughly does without understanding every line. It means trusting that the tests pass and the function works as its name suggests.

Calibrating Depth in Code Review

In a pull request review:

  High attention:
    - Changes to public interfaces
    - Changes to database schemas
    - Changes to security or auth logic
    - Deletion of code (what relied on this?)
    - Changes to error handling

  Medium attention:
    - New feature implementation
    - Refactoring of existing logic
    - Test changes

  Low attention:
    - Formatting changes
    - Dependency version bumps
    - Documentation updates
    - Adding new test cases for existing behavior
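A first pass over a large diff can be automated, crudely. The sketch below sorts changed files into the three attention levels using path patterns and the change status. The patterns assume one hypothetical repository layout and are deliberately rough (the `auth` pattern also matches `author.js`). They cannot detect interface or error-handling changes, which still need a human reading the diff.

```javascript
const LEVELS = ["high", "medium", "low"];

// Heuristic attention level for one changed file (hypothetical rules).
function attentionLevel({ path, status }) {
  if (status === "deleted") return "high"; // what relied on this?
  if (/(^|\/)migrations\//.test(path)) return "high"; // schema changes
  if (/auth|security/i.test(path)) return "high"; // security-sensitive code
  if (/\.md$/i.test(path) || path === "package-lock.json") return "low";
  return "medium";
}

// Highest-attention files first; Array.prototype.sort is stable, so files
// at the same level keep their original order.
function triage(changes) {
  return changes
    .map((change) => ({ ...change, level: attentionLevel(change) }))
    .sort((a, b) => LEVELS.indexOf(a.level) - LEVELS.indexOf(b.level));
}
```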

Combining Strategies

The best reading sessions combine strategies fluidly:

Scenario: understanding a failing feature in an unfamiliar codebase

  1. Top-down: read directory structure, find the feature's module (2 min)
  2. Bottom-up: start from the failing test, read the function it tests (3 min)
  3. Data flow: trace the input from the API endpoint to the failing point (5 min)
  4. Tests: read other tests in the same file for context (3 min)
  5. Bottom-up: follow the dependency that seems wrong (5 min)
  6. Bug found (18 minutes total)

No single strategy would have been as fast. Top-down got you to the right area. Bottom-up focused your attention. Data flow revealed the transformation chain. Tests confirmed expected behavior. Flexibility is the skill.

Real-World Example: Reading a Framework's Source

An engineer needed to understand why a web framework was handling request timeouts differently than documented. Their approach:

1. Top-down: read the framework's src/ directory tree
   -> found a timeout/ module
2. Bottom-up: searched for "timeout" in the module
   -> found the default timeout configuration
3. Data flow: traced the timeout value from config to the HTTP handler
   -> found it was being overridden by middleware
4. Tests: read the timeout tests
   -> found a test that expected the WRONG behavior (the bug was known)
5. Git blame: checked who wrote the overriding middleware
   -> found a commit message: "temporary workaround for #456"
   -> issue #456 was closed 2 years ago but the workaround was never removed

Common Pitfalls

Reading every line with equal attention — most code does not deserve careful reading. Learn to skim boilerplate and focus on logic.
Only reading top-down — top-down is slow when you need to understand a specific behavior. Start from the specific behavior and work outward.
Only reading bottom-up — bottom-up without any architectural context leads to getting lost in implementation details. Get the big picture first when you are new.
Ignoring tests — tests are often clearer than implementation. They show intended behavior, expected inputs, and edge cases in a structured format.
Reading code without running it — reading plus running is faster than reading alone. Set a breakpoint, run the test, and inspect state. The code on screen is static; the running program is alive.

Key Takeaways

Top-down reading starts from architecture and drills into details. Use it for new codebases, large reviews, and system understanding.
Bottom-up reading starts from a specific point and expands outward. Use it for debugging, focused features, and answering specific questions.
Data flow tracing reveals the true architecture. Follow a piece of data from entry to exit to understand how the system actually works.
Tests are executable documentation. Read test names as a behavior spec, setup as input examples, and assertions as expected outputs.
Calibrate reading depth to the code's importance. Read security and business logic carefully. Skim boilerplate and generated code. Not all lines deserve equal attention.