Event-Driven Architecture

Event-driven architecture structures systems around producing, detecting, and reacting to events. Instead of calling each other directly, services communicate by emitting and consuming events, which yields loosely coupled systems that can evolve independently.

Events vs Commands

Understanding the distinction between events and commands is foundational to designing event-driven systems correctly.

Events

An event describes something that has already happened. It is an immutable fact, named in the past tense. The publisher does not know or care who consumes it.

Event examples:
  - OrderPlaced { orderId: "abc-123", userId: "u-456", total: 59.99 }
  - PaymentCompleted { paymentId: "pay-789", orderId: "abc-123" }
  - UserRegistered { userId: "u-456", email: "user@example.com" }

Events are owned by the publisher. The publisher defines the schema, and consumers adapt to it. Multiple consumers can react to the same event in different ways.

Commands

A command is a request for something to happen. It is imperative, named as an instruction, and directed at a specific handler. The sender expects exactly one receiver to act on it.

Command examples:
  - ProcessPayment { orderId: "abc-123", amount: 59.99 }
  - SendEmail { to: "user@example.com", template: "welcome" }
  - ReserveInventory { sku: "WIDGET-01", quantity: 2 }

Commands create coupling between sender and receiver. The sender must know which service handles the command and what interface it exposes.
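
The distinction can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed design: the class names mirror the examples above, and `handle_process_payment` is a hypothetical handler invented for the sketch.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class OrderPlaced:
    """Event: a past-tense fact. frozen=True makes instances immutable."""
    order_id: str
    user_id: str
    total: float


@dataclass(frozen=True)
class ProcessPayment:
    """Command: an imperative request aimed at exactly one handler."""
    order_id: str
    amount: float


def handle_process_payment(command: ProcessPayment) -> str:
    """Hypothetical single receiver. The sender calls it and must deal with errors."""
    if command.amount <= 0:
        raise ValueError("amount must be positive")
    return f"charged {command.amount:.2f} for {command.order_id}"


event = OrderPlaced(order_id="abc-123", user_id="u-456", total=59.99)
try:
    event.total = 0.0  # facts cannot be rewritten after the event is created
except FrozenInstanceError:
    print("events are immutable")

print(handle_process_payment(ProcessPayment(order_id="abc-123", amount=59.99)))
```

The event carries no notion of a receiver, while the command only makes sense together with the handler that accepts it.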

When to Use Each

| Aspect          | Event                              | Command                          |
|-----------------|------------------------------------|----------------------------------|
| Direction       | Broadcast (one to many)            | Targeted (one to one)            |
| Coupling        | Loose (publisher doesn't know)     | Tight (sender knows receiver)    |
| Failure         | Publisher unaffected               | Sender needs to handle failure   |
| Naming          | Past tense (OrderPlaced)           | Imperative (ProcessPayment)      |
| Ownership       | Publisher owns schema              | Receiver owns contract           |

Stripe's webhooks illustrate the event side. When a charge succeeds, Stripe emits a charge.succeeded event, and any registered webhook endpoint can receive it. The command side would look different: inside a payment system, a charge service might send a RecordRevenue command to a billing service (targeted, one receiver).

Event Bus

An event bus is the infrastructure that routes events from publishers to subscribers. It acts as the central nervous system of an event-driven architecture.

In-Process Event Bus

For monolithic applications, an in-process event bus dispatches events within a single application using the observer pattern.

In-process event bus:
  OrderService.placeOrder()
    --> eventBus.publish(OrderPlaced)
    --> InventoryHandler.handle(OrderPlaced)  [same process]
    --> NotificationHandler.handle(OrderPlaced)  [same process]

MediatR in .NET and Spring's ApplicationEventPublisher are common in-process event bus implementations.
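
The flow above can be sketched as a small observer-pattern bus. This is a minimal sketch, not how MediatR or Spring implement theirs: `EventBus`, `OrderService`, and the two lambda handlers are hypothetical names chosen to match the diagram.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List


@dataclass(frozen=True)
class OrderPlaced:
    order_id: str


class EventBus:
    """Minimal observer-pattern bus. Handlers run synchronously in the caller's process."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[type, List[Callable]] = defaultdict(list)

    def subscribe(self, event_type: type, handler: Callable) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event: object) -> None:
        # .get avoids creating entries for event types nobody subscribed to.
        for handler in self._handlers.get(type(event), []):
            handler(event)


class OrderService:
    def __init__(self, bus: EventBus) -> None:
        self._bus = bus

    def place_order(self, order_id: str) -> None:
        # A real service would persist the order first, then announce the fact.
        self._bus.publish(OrderPlaced(order_id))


bus = EventBus()
reserved: List[str] = []
notified: List[str] = []
bus.subscribe(OrderPlaced, lambda e: reserved.append(e.order_id))  # InventoryHandler
bus.subscribe(OrderPlaced, lambda e: notified.append(e.order_id))  # NotificationHandler
OrderService(bus).place_order("abc-123")
```

`OrderService` depends only on the bus, so handlers can be added or removed without touching it.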

Distributed Event Bus

For microservices, a distributed event bus uses a message broker to route events across network boundaries.

Distributed event bus:
  OrderService --> [Kafka: order-events] --> InventoryService (separate process)
                                          --> NotificationService (separate process)
                                          --> AnalyticsService (separate process)

LinkedIn, where Kafka originated, has reported processing over 7 trillion messages per day through it. User activity, system events, and data changes flow through this event bus to power features across the platform.

Event Bus Design Considerations

Schema management:
  - Use a schema registry (Confluent Schema Registry, AWS Glue Schema Registry)
  - Version events with backward compatibility
  - Include event type and version in message headers

Event format:
  {
    "eventId": "evt-001",
    "eventType": "order.placed",
    "version": 2,
    "timestamp": "2025-03-15T10:30:00Z",
    "source": "order-service",
    "data": { ... },
    "metadata": { "correlationId": "req-123", "causationId": "evt-000" }
  }
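
One way to build this envelope so that `correlationId` and `causationId` stay consistent along a chain of events is sketched below. `make_envelope` is a hypothetical helper, and the ID prefixes simply follow the sample above.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional


def make_envelope(event_type: str, version: int, source: str, data: dict,
                  cause: Optional[dict] = None) -> dict:
    """Wrap a payload in the envelope format shown above.

    If `cause` is given, the new event joins the same correlation chain
    and records the causing event's ID as its causationId.
    """
    if cause is None:
        correlation_id = f"req-{uuid.uuid4()}"  # start of a new chain
        causation_id = None
    else:
        correlation_id = cause["metadata"]["correlationId"]
        causation_id = cause["eventId"]
    return {
        "eventId": f"evt-{uuid.uuid4()}",
        "eventType": event_type,
        "version": version,
        # isoformat() renders UTC as "+00:00" rather than the "Z" suffix.
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "source": source,
        "data": data,
        "metadata": {"correlationId": correlation_id, "causationId": causation_id},
    }


placed = make_envelope("order.placed", 2, "order-service", {"orderId": "abc-123"})
paid = make_envelope("payment.completed", 1, "payment-service",
                     {"orderId": "abc-123"}, cause=placed)
```

Every event in the chain shares one `correlationId`, while `causationId` points at its direct parent, which is what lets a tracing tool rebuild the tree.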

Choreography vs Orchestration

These are two approaches to coordinating multi-step business processes across services.

Choreography

In choreography, there is no central coordinator. Each service listens for events and reacts independently, emitting new events that trigger the next step. The process emerges from the interaction of independent services.

Order placement choreography:
  1. OrderService publishes OrderPlaced
  2. PaymentService hears OrderPlaced, charges card, publishes PaymentCompleted
  3. InventoryService hears PaymentCompleted, reserves stock, publishes StockReserved
  4. ShippingService hears StockReserved, creates shipment, publishes ShipmentCreated
  5. NotificationService hears ShipmentCreated, sends email to customer

Each service owns its step. No service knows the full workflow. This is highly decoupled but makes it hard to understand the overall process by reading any single service's code.
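
A toy version of the five steps above shows how the workflow emerges without any coordinator. This sketch assumes an in-memory queue standing in for the broker; the service functions and `deliver` are hypothetical names.

```python
from collections import deque
from typing import Dict, List

Event = Dict[str, str]
emails_sent: List[str] = []


def payment_service(event: Event) -> List[Event]:
    if event["type"] == "OrderPlaced":
        return [{"type": "PaymentCompleted", "orderId": event["orderId"]}]
    return []


def inventory_service(event: Event) -> List[Event]:
    if event["type"] == "PaymentCompleted":
        return [{"type": "StockReserved", "orderId": event["orderId"]}]
    return []


def shipping_service(event: Event) -> List[Event]:
    if event["type"] == "StockReserved":
        return [{"type": "ShipmentCreated", "orderId": event["orderId"]}]
    return []


def notification_service(event: Event) -> List[Event]:
    if event["type"] == "ShipmentCreated":
        emails_sent.append(event["orderId"])
    return []


SERVICES = [payment_service, inventory_service, shipping_service, notification_service]


def deliver(initial: Event) -> List[str]:
    """Stand-in for a broker: hand every event to every service until none remain."""
    queue = deque([initial])
    history: List[str] = []
    while queue:
        event = queue.popleft()
        history.append(event["type"])
        for service in SERVICES:
            queue.extend(service(event))
    return history


history = deliver({"type": "OrderPlaced", "orderId": "abc-123"})
```

No function here mentions the full sequence. The order of steps exists only in `history`, which is why tracing is needed to see it in a real system.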

Orchestration

In orchestration, a central orchestrator service directs the workflow. It sends commands to each service and waits for responses before proceeding.

Order placement orchestration:
  OrderOrchestrator:
    1. Send ProcessPayment command to PaymentService, wait for response
    2. Send ReserveInventory command to InventoryService, wait for response
    3. Send CreateShipment command to ShippingService, wait for response
    4. Send SendConfirmation command to NotificationService

The orchestrator contains the business logic for the process. Each participating service is simpler because it only handles its specific task without knowing the broader context.
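
The same order flow, written as an orchestrator, might look like the sketch below. The service functions are hypothetical stand-ins for remote calls, and the stock limit is invented so that one step can fail.

```python
from typing import Callable, List, Tuple


# Hypothetical stand-ins for remote services: each takes the order and returns a response.
def payment_service(order: dict) -> dict:
    return {"ok": True}


def inventory_service(order: dict) -> dict:
    return {"ok": order["quantity"] <= 5}  # pretend only 5 units are in stock


def shipping_service(order: dict) -> dict:
    return {"ok": True}


def notification_service(order: dict) -> dict:
    return {"ok": True}


# The whole workflow is readable in one place.
WORKFLOW: List[Tuple[str, Callable[[dict], dict]]] = [
    ("ProcessPayment", payment_service),
    ("ReserveInventory", inventory_service),
    ("CreateShipment", shipping_service),
    ("SendConfirmation", notification_service),
]


def run_order_workflow(order: dict) -> Tuple[str, List[str]]:
    """Send each command in turn and wait for the response before moving on."""
    completed: List[str] = []
    for command, service in WORKFLOW:
        response = service(order)  # a blocking call stands in for send-and-wait
        if not response["ok"]:
            return f"failed at {command}", completed
        completed.append(command)
    return "completed", completed


print(run_order_workflow({"orderId": "abc-123", "quantity": 2}))
print(run_order_workflow({"orderId": "abc-124", "quantity": 9}))
```

When the second order fails at `ReserveInventory`, the payment has already gone through. Undoing completed steps is the job of compensations, covered under Saga Pattern.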

Comparing the Two Approaches

| Aspect                  | Choreography                    | Orchestration                  |
|-------------------------|---------------------------------|--------------------------------|
| Coupling                | Very loose                      | Orchestrator coupled to all    |
| Visibility              | Hard to trace full flow         | Clear workflow in one place    |
| Single point of failure | No central bottleneck           | Orchestrator is critical path  |
| Complexity growth       | Rapid (event chains multiply)   | Linear (add steps to workflow) |
| Best for                | Simple flows, few steps         | Complex flows, many conditions |

Netflix uses choreography for straightforward event flows like content ingestion notifications. For complex workflows like payment retries with escalation paths, they use orchestration through their Conductor workflow engine.

Uber moved from pure choreography toward orchestration with its Cadence workflow engine (Temporal is a later fork started by Cadence's original creators) after finding that debugging event chains across dozens of services during a ride lifecycle was nearly impossible.

Saga Pattern

Sagas manage distributed transactions across multiple services without using two-phase commit. Each step has a compensating action that undoes its effect if a later step fails.

Choreography-Based Saga

Each service publishes events and listens for failure events to trigger compensation.

Happy path:
  OrderPlaced --> PaymentCompleted --> StockReserved --> ShipmentCreated

Failure at stock reservation:
  OrderPlaced --> PaymentCompleted --> StockReservationFailed
                                        |
                                        v
                                   PaymentService hears failure, issues refund
                                        |
                                        v
                                   OrderService hears refund, marks order cancelled

Orchestration-Based Saga

The orchestrator explicitly manages both forward actions and compensations.

OrderSagaOrchestrator:
  Step 1: ProcessPayment
    Success: proceed to step 2
    Failure: mark order failed, done
  Step 2: ReserveInventory
    Success: proceed to step 3
    Failure: compensate step 1 (RefundPayment), mark order failed
  Step 3: CreateShipment
    Success: mark order completed
    Failure: compensate step 2 (ReleaseInventory),
             compensate step 1 (RefundPayment), mark order failed
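
The step table above generalizes to "on failure, compensate every completed step in reverse order". Here is a minimal in-memory sketch of that rule. `run_saga` and the lambda actions are hypothetical, and a real orchestrator would also persist its state between steps.

```python
from typing import Callable, List, Tuple

# (step name, forward action, compensating action)
Step = Tuple[str, Callable[[dict], None], Callable[[dict], None]]


def run_saga(steps: List[Step], order: dict) -> Tuple[str, List[str]]:
    """Run steps in order; if one raises, undo the completed ones newest-first."""
    completed: List[Step] = []
    trace: List[str] = []
    for step in steps:
        name, action, _ = step
        try:
            action(order)
        except Exception:  # any failure of the step triggers compensation
            trace.append(f"{name} failed")
            for done_name, _, compensate in reversed(completed):
                compensate(order)
                trace.append(f"compensated {done_name}")
            return "failed", trace
        completed.append(step)
        trace.append(f"{name} ok")
    return "completed", trace


def failing_shipment(order: dict) -> None:
    raise RuntimeError("carrier unavailable")


effects: List[str] = []
steps: List[Step] = [
    ("ProcessPayment", lambda o: effects.append("charge"), lambda o: effects.append("refund")),
    ("ReserveInventory", lambda o: effects.append("reserve"), lambda o: effects.append("release")),
    ("CreateShipment", failing_shipment, lambda o: effects.append("cancel shipment")),
]
status, trace = run_saga(steps, {"orderId": "abc-123"})
```

With shipment creation failing, `effects` ends up as charge, reserve, release, refund: the failed step itself is not compensated because it never took effect.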

Saga Implementation Considerations

Key requirements:
  - Every action must have a compensating action
  - Compensations must be idempotent (may run more than once)
  - Saga state must be persisted (survives orchestrator restarts)
  - Timeout handling for steps that never respond
  
Compensation examples:
  ProcessPayment     --> RefundPayment
  ReserveInventory   --> ReleaseInventory
  CreateShipment     --> CancelShipment
  DebitAccount       --> CreditAccount
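
Idempotency is the requirement most easily missed, so here is a small sketch of an idempotent RefundPayment. `PaymentLedger` is a hypothetical in-memory stand-in; a real service would keep the "already refunded" record in durable storage.

```python
from typing import Dict, Set


class PaymentLedger:
    """Hypothetical payment store with an idempotent refund compensation."""

    def __init__(self) -> None:
        self.held: Dict[str, float] = {}   # payment_id -> amount currently charged
        self.refunded: Set[str] = set()    # payment_ids that were already refunded

    def charge(self, payment_id: str, amount: float) -> None:
        self.held[payment_id] = amount

    def refund(self, payment_id: str) -> float:
        """Return the amount refunded by this call; repeat calls refund nothing."""
        if payment_id in self.refunded or payment_id not in self.held:
            return 0.0
        self.refunded.add(payment_id)
        return self.held.pop(payment_id)


ledger = PaymentLedger()
ledger.charge("pay-789", 59.99)
first = ledger.refund("pay-789")   # performs the refund
second = ledger.refund("pay-789")  # retry after a timeout: no double refund
```

Keying the check on the payment ID means the orchestrator can safely retry a compensation whose response was lost.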

Airbnb uses the saga pattern for booking. When a guest books, the system reserves the listing, charges the guest, and schedules the payout to the host. If the charge fails after the listing is reserved, the compensating action releases the reservation.

Event Sourcing Connection

Event-driven architecture pairs naturally with event sourcing, where the state of an entity is derived from a sequence of events rather than stored only as its current value.

Traditional: Store current state (order status = "shipped")
Event sourced: Store event history
  1. OrderPlaced { items: [...] }
  2. PaymentCompleted { amount: 59.99 }
  3. OrderShipped { trackingNumber: "1Z999..." }
  
  Current state is derived by replaying these events

This provides a complete audit trail and the ability to rebuild state at any point in time. The event log becomes both the communication mechanism and the source of truth.
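
Replaying can be expressed as a fold over the event history. The sketch below assumes dictionary-shaped events and a hypothetical `apply` function; the item list is an illustrative value, and the tracking number is left truncated as in the example above.

```python
from functools import reduce
from typing import List, Optional


def apply(state: dict, event: dict) -> dict:
    """Return the new state after one event; the input state is never mutated."""
    kind = event["type"]
    if kind == "OrderPlaced":
        return {**state, "status": "placed", "items": event["items"]}
    if kind == "PaymentCompleted":
        return {**state, "status": "paid", "amountPaid": event["amount"]}
    if kind == "OrderShipped":
        return {**state, "status": "shipped", "trackingNumber": event["trackingNumber"]}
    return state  # unknown event types are ignored


def replay(events: List[dict], upto: Optional[int] = None) -> dict:
    """Fold the first `upto` events (all of them if None) into a state."""
    return reduce(apply, events[:upto], {})


history = [
    {"type": "OrderPlaced", "items": ["WIDGET-01"]},
    {"type": "PaymentCompleted", "amount": 59.99},
    {"type": "OrderShipped", "trackingNumber": "1Z999..."},
]
current = replay(history)
after_payment = replay(history, upto=2)
```

`after_payment` is the "state at any point in time" mentioned above: the same log, replayed only as far as the second event.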

Common Pitfalls

  • Event storms. A single event triggers a cascade of events that amplify into millions of messages. Design circuit breakers and monitor event rates.
  • Distributed debugging nightmare. Without correlation IDs and distributed tracing, following an event chain across services is nearly impossible.
  • Schema evolution breakage. Changing an event schema without backward compatibility can silently break consumers. Prefer additive changes (such as new optional fields) and enforce compatibility through a schema registry.
  • Choreography spaghetti. As a rough guide, beyond five or six services a choreographed workflow becomes very hard to understand or debug. Switch to orchestration for complex flows.
  • Missing compensation logic. Implementing the happy path of a saga without compensations means failures leave the system in an inconsistent state.
  • Temporal coupling through synchronous expectations. Publishing an event and then polling for a response reintroduces the coupling that events were meant to eliminate.

Key Takeaways

  • Events describe facts that happened and are broadcast. Commands are directed requests to a specific service. Use events for loose coupling, commands for explicit coordination.
  • Choreography works for simple flows with few steps. Orchestration is better for complex workflows where visibility and control matter.
  • The saga pattern replaces distributed transactions with a sequence of local transactions and compensating actions for rollback.
  • Every event-driven system needs correlation IDs, schema versioning, and dead letter handling to be production-ready.
  • Start with orchestration for new complex workflows. Migrate to choreography only when you have strong observability and the coupling cost of orchestration becomes a bottleneck.