Event-Driven Architecture
Event-driven architecture structures systems around producing, detecting, and reacting to events. Instead of calling each other directly, services communicate by emitting and consuming events, which yields loosely coupled systems that can evolve independently.
Events vs Commands
Understanding the distinction between events and commands is foundational to designing event-driven systems correctly.
Events
An event describes something that has already happened. It is an immutable fact, named in the past tense. The publisher does not know or care who consumes it.
Event examples:
- OrderPlaced { orderId: "abc-123", userId: "u-456", total: 59.99 }
- PaymentCompleted { paymentId: "pay-789", orderId: "abc-123" }
- UserRegistered { userId: "u-456", email: "user@example.com" }
Events are owned by the publisher. The publisher defines the schema, and consumers adapt to it. Multiple consumers can react to the same event in different ways.
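One way to make the "immutable fact" property concrete is a frozen dataclass. The sketch below is only an illustration: the field names mirror the OrderPlaced example above, and the class itself is an assumption, not a prescribed schema.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class OrderPlaced:
    """A fact that already happened: past-tense name, no way to edit it."""
    order_id: str
    user_id: str
    total: float


event = OrderPlaced(order_id="abc-123", user_id="u-456", total=59.99)

try:
    event.total = 0.0  # attempting to rewrite history
    mutated = True
except FrozenInstanceError:
    mutated = False  # frozen dataclasses reject attribute assignment
```

A consumer that needs a different shape builds its own object from the event rather than modifying the one it received.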
Commands
A command is a request for something to happen. It is imperative, named as an instruction, and directed at a specific handler. The sender expects exactly one receiver to act on it.
Command examples:
- ProcessPayment { orderId: "abc-123", amount: 59.99 }
- SendEmail { to: "user@example.com", template: "welcome" }
- ReserveInventory { sku: "WIDGET-01", quantity: 2 }
Commands create coupling between sender and receiver. The sender must know which service handles the command and what interface it exposes.
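The "exactly one receiver" rule can be sketched as a small command bus. The `CommandBus` class and its method names are hypothetical, chosen for this example; the point is that registering a second handler, or sending a command nobody handles, is an error the sender sees.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Type


@dataclass(frozen=True)
class ReserveInventory:
    sku: str
    quantity: int


class CommandBus:
    """Routes each command type to exactly one registered handler."""

    def __init__(self) -> None:
        self._handlers: Dict[Type, Callable] = {}

    def register(self, command_type: Type, handler: Callable) -> None:
        if command_type in self._handlers:
            raise ValueError(f"{command_type.__name__} already has a handler")
        self._handlers[command_type] = handler

    def send(self, command):
        # Unlike an event, a command with no handler is a failure for the sender.
        handler = self._handlers.get(type(command))
        if handler is None:
            raise LookupError(f"no handler for {type(command).__name__}")
        return handler(command)


bus = CommandBus()
bus.register(ReserveInventory, lambda cmd: f"reserved {cmd.quantity} x {cmd.sku}")
result = bus.send(ReserveInventory(sku="WIDGET-01", quantity=2))
```

The return value flowing back to the sender is part of what makes commands more tightly coupled than events.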
When to Use Each
| Aspect    | Event                                     | Command                        |
|-----------|-------------------------------------------|--------------------------------|
| Direction | Broadcast (one to many)                   | Targeted (one to one)          |
| Coupling  | Loose (publisher doesn't know consumers)  | Tight (sender knows receiver)  |
| Failure   | Publisher unaffected by consumer failures | Sender needs to handle failure |
| Naming    | Past tense (OrderPlaced)                  | Imperative (ProcessPayment)    |
| Ownership | Publisher owns schema                     | Receiver owns contract         |
Stripe's webhooks illustrate the event side. When a charge succeeds, Stripe emits a charge.succeeded event, and every webhook endpoint registered for that event type receives it. The command side would look like the charge service sending a RecordRevenue command to a billing service (an illustrative internal example: targeted, one receiver).
Event Bus
An event bus is the infrastructure that routes events from publishers to subscribers. It acts as the central nervous system of an event-driven architecture.
In-Process Event Bus
For monolithic applications, an in-process event bus dispatches events within a single application using the observer pattern.
In-process event bus:
OrderService.placeOrder()
  --> eventBus.publish(OrderPlaced)
        --> InventoryHandler.handle(OrderPlaced)     [same process]
        --> NotificationHandler.handle(OrderPlaced)  [same process]
MediatR in .NET and Spring's ApplicationEventPublisher are common in-process event bus implementations.
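As a rough illustration of the observer pattern behind an in-process bus, here is a minimal Python sketch. The `EventBus` class is an assumption written for this example; it is not the API of MediatR or Spring.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class EventBus:
    """Synchronous, in-process publish/subscribe keyed by event type name."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        # Every subscriber runs in the caller's process and thread.
        for handler in list(self._subscribers[event_type]):
            handler(payload)


calls: List[str] = []
bus = EventBus()
bus.subscribe("OrderPlaced", lambda e: calls.append(f"inventory:{e['orderId']}"))
bus.subscribe("OrderPlaced", lambda e: calls.append(f"notification:{e['orderId']}"))
bus.publish("OrderPlaced", {"orderId": "abc-123"})
```

Because dispatch is synchronous here, an exception in one handler would stop later handlers from running; a real bus usually needs an explicit policy for that case.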
Distributed Event Bus
For microservices, a distributed event bus uses a message broker to route events across network boundaries.
Distributed event bus:
OrderService --> [Kafka: order-events] --> InventoryService    (separate process)
                                       --> NotificationService (separate process)
                                       --> AnalyticsService    (separate process)
LinkedIn's event bus processes over 7 trillion messages per day through Kafka. Every user action, system event, and data change flows through their event bus to power features across the platform.
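The fan-out in the distributed diagram above relies on each consuming service reading the same log at its own pace. The sketch below imitates that with an in-memory stand-in; `InMemoryTopic` is a hypothetical class for illustration and is not the Kafka client API.

```python
from collections import defaultdict
from typing import Dict, List


class InMemoryTopic:
    """Append-only log with one read offset per consumer group."""

    def __init__(self) -> None:
        self._log: List[dict] = []
        self._offsets: Dict[str, int] = defaultdict(int)

    def publish(self, event: dict) -> None:
        self._log.append(event)

    def poll(self, group: str) -> List[dict]:
        # Each group advances its own offset, so groups never take
        # events away from one another.
        start = self._offsets[group]
        batch = self._log[start:]
        self._offsets[group] = len(self._log)
        return batch


topic = InMemoryTopic()
topic.publish({"eventType": "order.placed", "orderId": "abc-123"})

inventory_batch = topic.poll("inventory-service")
analytics_batch = topic.poll("analytics-service")
inventory_again = topic.poll("inventory-service")  # nothing new for this group
```

A real broker adds durability, partitioning, and delivery guarantees that this sketch leaves out.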
Event Bus Design Considerations
Schema management:
- Use a schema registry (Confluent Schema Registry, AWS Glue Schema Registry)
- Version events with backward compatibility
- Include event type and version in message headers
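One common way for a consumer to stay compatible with older event versions is to "upcast" them to the current shape before handling. The sketch below assumes a hypothetical change in which version 2 added a `currency` field; the field, the default, and the function names are all illustrative.

```python
from typing import Callable, Dict


def upcast_v1_to_v2(data: dict) -> dict:
    # Hypothetical change: v2 added "currency"; older events get a default.
    return {**data, "currency": "USD"}


# Maps a version to the function that lifts it to the next version.
UPCASTERS: Dict[int, Callable[[dict], dict]] = {1: upcast_v1_to_v2}
CURRENT_VERSION = 2


def normalize(event: dict) -> dict:
    """Return the event's data in the current schema version."""
    version, data = event["version"], event["data"]
    while version < CURRENT_VERSION:
        data = UPCASTERS[version](data)
        version += 1
    return data


old = {"eventType": "order.placed", "version": 1, "data": {"total": 59.99}}
new = {"eventType": "order.placed", "version": 2,
       "data": {"total": 59.99, "currency": "EUR"}}
```

Handlers then only need to understand the current version, while the upcaster chain absorbs the history of additive changes.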
Event format:
{
  "eventId": "evt-001",
  "eventType": "order.placed",
  "version": 2,
  "timestamp": "2025-03-15T10:30:00Z",
  "source": "order-service",
  "data": { ... },
  "metadata": { "correlationId": "req-123", "causationId": "evt-000" }
}
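The envelope above becomes useful for tracing once follow-up events inherit the correlation ID and record what caused them. The helper below is a minimal sketch of that propagation; `make_event` and its `caused_by` parameter are assumptions introduced for this example.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional


def make_event(event_type: str, version: int, source: str, data: dict,
               caused_by: Optional[dict] = None) -> dict:
    """Wrap a payload in the envelope fields shown above.

    A follow-up event inherits the correlationId of the event that caused it
    and records that event's id as its causationId.
    """
    event_id = f"evt-{uuid.uuid4()}"
    if caused_by is None:
        correlation_id, causation_id = f"req-{uuid.uuid4()}", None
    else:
        correlation_id = caused_by["metadata"]["correlationId"]
        causation_id = caused_by["eventId"]
    return {
        "eventId": event_id,
        "eventType": event_type,
        "version": version,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "source": source,
        "data": data,
        "metadata": {"correlationId": correlation_id, "causationId": causation_id},
    }


placed = make_event("order.placed", 2, "order-service", {"orderId": "abc-123"})
paid = make_event("payment.completed", 1, "payment-service",
                  {"orderId": "abc-123"}, caused_by=placed)
```

Filtering logs by one correlationId then returns the whole chain, and the causationId links reconstruct its order.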
Choreography vs Orchestration
These are two approaches to coordinating multi-step business processes across services.
Choreography
In choreography, there is no central coordinator. Each service listens for events and reacts independently, emitting new events that trigger the next step. The process emerges from the interaction of independent services.
Order placement choreography:
1. OrderService publishes OrderPlaced
2. PaymentService hears OrderPlaced, charges card, publishes PaymentCompleted
3. InventoryService hears PaymentCompleted, reserves stock, publishes StockReserved
4. ShippingService hears StockReserved, creates shipment, publishes ShipmentCreated
5. NotificationService hears ShipmentCreated, sends email to customer
Each service owns its step. No service knows the full workflow. This is highly decoupled but makes it hard to understand the overall process by reading any single service's code.
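The five steps above can be sketched with a tiny in-memory bus. The `publish` and `on` helpers are assumptions for this example, and the handlers do no real work; what matters is that each one knows only the event it hears and the event it emits.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

subscribers: DefaultDict[str, List[Callable[[dict], None]]] = defaultdict(list)
trace: List[str] = []  # records the order in which things happened


def publish(event_type: str, data: dict) -> None:
    trace.append(event_type)
    for handler in subscribers[event_type]:
        handler(data)


def on(event_type: str):
    """Decorator that subscribes a handler to one event type."""
    def register(handler):
        subscribers[event_type].append(handler)
        return handler
    return register


@on("OrderPlaced")
def payment_service(data: dict) -> None:
    publish("PaymentCompleted", data)  # after charging the card


@on("PaymentCompleted")
def inventory_service(data: dict) -> None:
    publish("StockReserved", data)  # after reserving stock


@on("StockReserved")
def shipping_service(data: dict) -> None:
    publish("ShipmentCreated", data)  # after creating the shipment


@on("ShipmentCreated")
def notification_service(data: dict) -> None:
    trace.append(f"email sent for {data['orderId']}")


publish("OrderPlaced", {"orderId": "abc-123"})
```

Note that the overall workflow appears nowhere in the code; it only becomes visible in `trace` at run time, which is the visibility cost described above.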
Orchestration
In orchestration, a central orchestrator service directs the workflow. It sends commands to each service and waits for responses before proceeding.
Order placement orchestration:
OrderOrchestrator:
1. Send ProcessPayment command to PaymentService, wait for response
2. Send ReserveInventory command to InventoryService, wait for response
3. Send CreateShipment command to ShippingService, wait for response
4. Send SendConfirmation command to NotificationService
The orchestrator contains the business logic for the process. Each participating service is simpler because it only handles its specific task without knowing the broader context.
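For contrast, here is a minimal sketch of the same flow with an orchestrator. The service functions are local stand-ins for remote calls, and their return values (including the tracking number) are made up for the example.

```python
from typing import Callable, Dict, List


# Stand-ins for remote services: each takes a command payload, returns a reply.
def process_payment(cmd: dict) -> dict:
    return {"ok": True, "paymentId": "pay-789"}


def reserve_inventory(cmd: dict) -> dict:
    return {"ok": True}


def create_shipment(cmd: dict) -> dict:
    return {"ok": True, "trackingNumber": "TRACK-1"}


def send_confirmation(cmd: dict) -> dict:
    return {"ok": True}


class OrderOrchestrator:
    """Holds the whole workflow; the services know nothing about each other."""

    STEPS = ("ProcessPayment", "ReserveInventory",
             "CreateShipment", "SendConfirmation")

    def __init__(self, services: Dict[str, Callable[[dict], dict]]) -> None:
        self.services = services
        self.completed: List[str] = []

    def run(self, order: dict) -> bool:
        for step in self.STEPS:
            reply = self.services[step](order)  # send command, wait for reply
            if not reply["ok"]:
                return False  # stop at the first failed step
            self.completed.append(step)
        return True


orchestrator = OrderOrchestrator({
    "ProcessPayment": process_payment,
    "ReserveInventory": reserve_inventory,
    "CreateShipment": create_shipment,
    "SendConfirmation": send_confirmation,
})
succeeded = orchestrator.run({"orderId": "abc-123"})
```

The step order lives in one place (`STEPS`), which is what makes the workflow easy to read and change.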
Comparing the Two Approaches
| Aspect                  | Choreography                           | Orchestration                            |
|-------------------------|----------------------------------------|------------------------------------------|
| Coupling                | Very loose                             | Orchestrator coupled to all participants |
| Visibility              | Hard to trace full flow                | Clear workflow in one place              |
| Single point of failure | No central coordinator                 | Orchestrator is on the critical path     |
| Complexity growth       | Rises quickly as event chains multiply | Roughly linear (add steps to workflow)   |
| Best for                | Simple flows, few steps                | Complex flows, many conditions           |
Netflix uses choreography for straightforward event flows like content ingestion notifications. For complex workflows like payment retries with escalation paths, they use orchestration through their Conductor workflow engine.
Uber shifted from pure choreography to orchestration with its Cadence platform (later forked by its original authors as Temporal) after finding that debugging event chains across dozens of services during a ride lifecycle was nearly impossible.
Saga Pattern
Sagas manage distributed transactions across multiple services without using two-phase commit. Each step has a compensating action that undoes its effect if a later step fails.
Choreography-Based Saga
Each service publishes events and listens for failure events to trigger compensation.
Happy path:
OrderPlaced --> PaymentCompleted --> StockReserved --> ShipmentCreated
Failure at stock reservation:
OrderPlaced --> PaymentCompleted --> StockReservationFailed
                                              |
                                              v
                        PaymentService hears failure, issues refund
                                              |
                                              v
                        OrderService hears refund, marks order cancelled
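The failure path above can be sketched as follows. The in-memory `publish` helper, the empty `STOCK` table, and the `PaymentRefunded` event name are assumptions made for this example; the source only says that the order service "hears refund".

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

subscribers: DefaultDict[str, List[Callable[[dict], None]]] = defaultdict(list)
order_status: Dict[str, str] = {}
refunded: List[str] = []
STOCK = {"WIDGET-01": 0}  # hypothetical inventory: nothing left to reserve


def publish(event_type: str, data: dict) -> None:
    for handler in subscribers[event_type]:
        handler(data)


def payment_on_order_placed(data: dict) -> None:
    publish("PaymentCompleted", data)


def inventory_on_payment_completed(data: dict) -> None:
    if STOCK.get(data["sku"], 0) >= data["quantity"]:
        STOCK[data["sku"]] -= data["quantity"]
        publish("StockReserved", data)
    else:
        publish("StockReservationFailed", data)


def payment_on_stock_failed(data: dict) -> None:
    refunded.append(data["orderId"])  # compensating action for the charge
    publish("PaymentRefunded", data)


def order_on_payment_refunded(data: dict) -> None:
    order_status[data["orderId"]] = "cancelled"


subscribers["OrderPlaced"].append(payment_on_order_placed)
subscribers["PaymentCompleted"].append(inventory_on_payment_completed)
subscribers["StockReservationFailed"].append(payment_on_stock_failed)
subscribers["PaymentRefunded"].append(order_on_payment_refunded)

order_status["abc-123"] = "placed"
publish("OrderPlaced", {"orderId": "abc-123", "sku": "WIDGET-01", "quantity": 2})
```

Each service compensates only its own step, so the rollback logic is as distributed as the forward logic.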
Orchestration-Based Saga
The orchestrator explicitly manages both forward actions and compensations.
OrderSagaOrchestrator:
  Step 1: ProcessPayment
    Success: proceed to step 2
    Failure: mark order failed, done
  Step 2: ReserveInventory
    Success: proceed to step 3
    Failure: compensate step 1 (RefundPayment), mark order failed
  Step 3: CreateShipment
    Success: mark order completed
    Failure: compensate step 2 (ReleaseInventory),
             compensate step 1 (RefundPayment), mark order failed
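A compact way to express the orchestrated saga above is to pair each action with its compensation and, on failure, run the compensations of the completed steps in reverse order. The `run_saga` function below is a minimal in-memory sketch under that assumption; the lambdas simulate service calls.

```python
from typing import Callable, List, Tuple

# (step name, action returning True on success, compensation)
Step = Tuple[str, Callable[[dict], bool], Callable[[dict], None]]


def run_saga(steps: List[Step], order: dict, log: List[str]) -> str:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    done: List[Step] = []
    for step in steps:
        name, action, _ = step
        if action(order):
            log.append(name)
            done.append(step)
            continue
        for _, _, compensate in reversed(done):
            compensate(order)
        return "failed"
    return "completed"


log: List[str] = []
steps: List[Step] = [
    ("ProcessPayment", lambda o: True, lambda o: log.append("RefundPayment")),
    ("ReserveInventory", lambda o: True, lambda o: log.append("ReleaseInventory")),
    # Simulated failure at the last step; it has nothing of its own to undo.
    ("CreateShipment", lambda o: False, lambda o: log.append("CancelShipment")),
]
status = run_saga(steps, {"orderId": "abc-123"}, log)
```

The failed step itself is not compensated, only the steps that completed before it, which matches the step 3 failure branch above.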
Saga Implementation Considerations
Key requirements:
- Every action that changes state must have a compensating action
- Compensations must be idempotent (they may run more than once)
- Saga state must be persisted so it survives orchestrator restarts
- Steps that never respond must be detected with timeouts
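Idempotency is often implemented by remembering which compensations have already been applied. The sketch below assumes a hypothetical `RefundHandler` that deduplicates on a saga ID plus payment ID; in a real system the set of seen keys would live in durable storage rather than in memory.

```python
from typing import Dict, Set


class RefundHandler:
    """Compensation that is safe to deliver more than once."""

    def __init__(self) -> None:
        self.refunded_amounts: Dict[str, float] = {}
        self._seen: Set[str] = set()  # would be persisted in production

    def refund(self, saga_id: str, payment_id: str, amount: float) -> bool:
        key = f"{saga_id}:{payment_id}"
        if key in self._seen:
            return False  # duplicate delivery: do nothing
        self._seen.add(key)
        self.refunded_amounts[payment_id] = (
            self.refunded_amounts.get(payment_id, 0.0) + amount
        )
        return True


handler = RefundHandler()
first = handler.refund("saga-1", "pay-789", 59.99)
second = handler.refund("saga-1", "pay-789", 59.99)  # retried message
```

The second call is a no-op, so a retry after a timeout cannot refund the customer twice.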
Compensation examples:
ProcessPayment --> RefundPayment
ReserveInventory --> ReleaseInventory
CreateShipment --> CancelShipment
DebitAccount --> CreditAccount
Airbnb uses the saga pattern for booking. When a guest books, the system reserves the listing, charges the guest, and schedules the payout to the host. If the charge fails after the listing is reserved, the compensating action releases the reservation.
Event Sourcing Connection
Event-driven architecture pairs naturally with event sourcing, where the state of an entity is derived from a sequence of events rather than stored only as its current value.
Traditional: Store current state (order status = "shipped")
Event sourced: Store event history
  1. OrderPlaced { items: [...] }
  2. PaymentCompleted { amount: 59.99 }
  3. OrderShipped { trackingNumber: "1Z999..." }
  Current state is derived by replaying these events
This provides a complete audit trail and the ability to rebuild state as of any point in time. The event log becomes both the communication mechanism and the source of truth.
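Replaying amounts to folding the event history through a pure transition function. The sketch below uses the three events from the example above; the state shape, the `apply` and `replay` names, and the placeholder tracking number are assumptions for illustration.

```python
from typing import Iterable, List, Optional, Tuple

Event = Tuple[str, dict]


def apply(state: dict, event: Event) -> dict:
    """Pure transition: returns the next state without mutating the old one."""
    kind, data = event
    if kind == "OrderPlaced":
        return {"status": "placed", "items": data["items"], "paid": 0.0}
    if kind == "PaymentCompleted":
        return {**state, "status": "paid", "paid": data["amount"]}
    if kind == "OrderShipped":
        return {**state, "status": "shipped", "tracking": data["trackingNumber"]}
    return state  # unknown events are ignored


def replay(events: Iterable[Event], upto: Optional[int] = None) -> dict:
    """Fold events into state; `upto` rebuilds state as of an earlier point."""
    state: dict = {}
    for event in list(events)[:upto]:
        state = apply(state, event)
    return state


history: List[Event] = [
    ("OrderPlaced", {"items": ["WIDGET-01"]}),
    ("PaymentCompleted", {"amount": 59.99}),
    ("OrderShipped", {"trackingNumber": "TRACK-1"}),  # placeholder value
]
current = replay(history)
before_shipping = replay(history, upto=2)
```

Here `current` reflects all three events, while `before_shipping` is the order as it stood after payment.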
Common Pitfalls
- Event storms. A single event triggers a cascade of events that amplify into millions of messages. Design circuit breakers and monitor event rates.
- Distributed debugging nightmare. Without correlation IDs and distributed tracing, following an event chain across services is nearly impossible.
- Schema evolution breakage. Changing an event schema without backward compatibility silently breaks consumers. Prefer additive changes and validate them against a schema registry.
- Choreography spaghetti. Beyond roughly five or six services, choreographed workflows become very hard to understand or debug. Switch to orchestration for complex flows.
- Missing compensation logic. Implementing the happy path of a saga without compensations means failures leave the system in an inconsistent state.
- Temporal coupling through synchronous expectations. Publishing an event and then polling for a response reintroduces the coupling that events were meant to eliminate.
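For the event storm pitfall, one possible guard is a sliding-window breaker that refuses to publish once the rate passes a limit. The `EventRateBreaker` class is a hypothetical sketch; the clock is injected so the behaviour can be shown deterministically, and in practice it would be something like `time.monotonic`.

```python
from collections import deque
from typing import Callable, Deque


class EventRateBreaker:
    """Opens when `limit` events have already been allowed within `window` seconds."""

    def __init__(self, limit: int, window: float, clock: Callable[[], float]) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self._times: Deque[float] = deque()

    def allow(self) -> bool:
        now = self.clock()
        # Drop timestamps that have aged out of the sliding window.
        while self._times and now - self._times[0] >= self.window:
            self._times.popleft()
        if len(self._times) >= self.limit:
            return False  # breaker open: caller should drop, park, or alert
        self._times.append(now)
        return True


fake_now = [0.0]  # mutable stand-in for a real clock, for a deterministic demo
breaker = EventRateBreaker(limit=3, window=1.0, clock=lambda: fake_now[0])
burst = [breaker.allow() for _ in range(5)]
fake_now[0] = 1.5
after_window = breaker.allow()
```

What to do with rejected events (drop, buffer, or page someone) is a policy decision the sketch leaves to the caller.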
Key Takeaways
- Events describe facts that happened and are broadcast. Commands are directed requests to a specific service. Use events for loose coupling, commands for explicit coordination.
- Choreography works for simple flows with few steps. Orchestration is better for complex workflows where visibility and control matter.
- The saga pattern replaces distributed transactions with a sequence of local transactions and compensating actions for rollback.
- To be production-ready, an event-driven system needs correlation IDs, schema versioning, and dead-letter handling.
- Start with orchestration for new complex workflows. Migrate to choreography only when you have strong observability and the coupling cost of orchestration becomes a bottleneck.