Case Study: Payment Processing System
A payment processing system handles the movement of money between parties, encompassing operations like charging credit cards, processing refunds, managing account balances, and settling transactions with banks and payment networks. Companies like Stripe, Square, and PayPal have built platforms that abstract the complexity of financial infrastructure behind clean APIs, but the systems behind those APIs are among the most demanding to design correctly.
Payment systems occupy a unique position in system design because correctness is non-negotiable. A social media post that fails to load can simply be retried; a payment that is processed twice causes real financial harm. This makes idempotency, consistency, and auditability first-class design concerns rather than optimizations. Every state transition must be tracked, every failure mode must be handled explicitly, and the system must maintain a ledger whose accuracy can be verified at any time.
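To make the double-processing concern concrete, here is a minimal sketch of request deduplication with an idempotency key. Everything in it is an assumption for illustration: the `IdempotentChargeService` class, the `charge` signature, and the in-memory dictionary stand in for whatever durable store and API a real system would use.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class ChargeResult:
    charge_id: str
    amount_cents: int


class IdempotentChargeService:
    """Hypothetical in-memory service; a real system would persist keys durably."""

    def __init__(self):
        self._lock = threading.Lock()
        # idempotency_key -> (request fingerprint, stored result)
        self._results = {}
        self._next_id = 1

    def charge(self, idempotency_key, account, amount_cents):
        fingerprint = (account, amount_cents)
        with self._lock:
            stored = self._results.get(idempotency_key)
            if stored is not None:
                stored_fingerprint, result = stored
                if stored_fingerprint != fingerprint:
                    # Same key with different parameters is a client bug, not a retry.
                    raise ValueError("idempotency key reused with different parameters")
                # A retry: return the original result without charging again.
                return result
            result = ChargeResult(f"ch_{self._next_id}", amount_cents)
            self._next_id += 1
            self._results[idempotency_key] = (fingerprint, result)
            return result
```

Calling `charge("key-123", "acct_1", 500)` twice on the same service returns the same `ChargeResult` both times, so a client that retries after a timeout does not create a second charge. Holding the lock across the lookup and the write is what prevents two concurrent retries from both passing the check.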
The integration landscape adds further complexity. A payment system must communicate with external processors, card networks, and banking APIs, each with their own protocols, latency characteristics, and failure modes. The system must handle partial failures gracefully: what happens when a charge succeeds at the processor but the confirmation is lost in transit? Designing for these edge cases requires careful state machine design and reconciliation processes.
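The lost-confirmation case can be sketched as an explicit state machine with a dedicated state for ambiguous outcomes, which a reconciliation pass later resolves. The state names, the `Payment` class, and the `processor_records` dictionary below are hypothetical; a real reconciliation job would query the processor's API or settlement files instead.

```python
from enum import Enum


class PaymentState(Enum):
    CREATED = "created"
    PENDING = "pending"      # request sent to the processor
    UNKNOWN = "unknown"      # timed out; the outcome is not known locally
    SUCCEEDED = "succeeded"
    FAILED = "failed"


# Every legal transition is listed; anything else is rejected.
ALLOWED = {
    PaymentState.CREATED: {PaymentState.PENDING},
    PaymentState.PENDING: {
        PaymentState.SUCCEEDED,
        PaymentState.FAILED,
        PaymentState.UNKNOWN,
    },
    PaymentState.UNKNOWN: {PaymentState.SUCCEEDED, PaymentState.FAILED},
    PaymentState.SUCCEEDED: set(),  # terminal
    PaymentState.FAILED: set(),     # terminal
}


class Payment:
    def __init__(self, payment_id):
        self.payment_id = payment_id
        self.state = PaymentState.CREATED
        self.history = [PaymentState.CREATED]  # record of every state reached

    def transition(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"illegal transition {self.state.name} -> {new_state.name}"
            )
        self.state = new_state
        self.history.append(new_state)


def reconcile(payment, processor_records):
    """Resolve an UNKNOWN payment from the processor's own record (assumed dict)."""
    if payment.state is not PaymentState.UNKNOWN:
        return
    status = processor_records.get(payment.payment_id)
    if status == "succeeded":
        payment.transition(PaymentState.SUCCEEDED)
    elif status == "failed":
        payment.transition(PaymentState.FAILED)
    # No record yet: stay UNKNOWN and let a later reconciliation run retry.
```

The design choice worth noting is that a timeout moves the payment to `UNKNOWN` rather than `FAILED`. Treating an ambiguous response as a failure would invite a fresh charge attempt for a payment the processor may already have completed.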
Key Challenges
- Idempotency: Guaranteeing that retried or duplicated requests never result in double-charging, using idempotency keys, request deduplication, and careful state machine transitions.
- Consistency and correctness: Maintaining an accurate, auditable ledger using techniques like double-entry bookkeeping, saga patterns, and distributed transaction management to ensure money is never created or destroyed.
- External processor integration: Interfacing with third-party payment gateways and banking APIs that have varying reliability, latency, and error semantics, while handling timeouts and ambiguous responses.
- Audit trails and compliance: Recording every state change immutably for regulatory compliance, dispute resolution, and financial reconciliation, often across multiple jurisdictions with different requirements.
- Fault tolerance and recovery: Designing robust failure handling so that crashes, network partitions, or downstream outages never leave transactions in an inconsistent state, using persistent queues and reconciliation jobs.
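The ledger and audit-trail points above can be illustrated with a small double-entry sketch. The `Ledger` and `Entry` names, the sign convention (positive for debits, negative for credits), and the use of integer cents are all assumptions made for this example, not a prescribed design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    transaction_id: str
    account: str
    amount_cents: int  # assumed convention: positive = debit, negative = credit


class Ledger:
    """Append-only, in-memory ledger; entries are never updated or deleted."""

    def __init__(self):
        self._entries = []

    def post(self, transaction_id, lines):
        """Record a transaction given as a list of (account, amount_cents) pairs."""
        # Double-entry invariant: the lines of one transaction must sum to zero,
        # so money only moves between accounts and is never created or destroyed.
        if sum(amount for _, amount in lines) != 0:
            raise ValueError("unbalanced transaction")
        self._entries.extend(
            Entry(transaction_id, account, amount) for account, amount in lines
        )

    def balance(self, account):
        # Balances are derived from the entry history rather than stored,
        # which keeps the history as the single source of truth.
        return sum(e.amount_cents for e in self._entries if e.account == account)

    def total(self):
        # Zero whenever every posted transaction was balanced.
        return sum(e.amount_cents for e in self._entries)

    def entries(self):
        return tuple(self._entries)  # read-only view for auditing
```

For example, posting `[("customer", -500), ("merchant", 500)]` under one transaction id moves 500 cents from the customer account to the merchant account and leaves `total()` at zero. A correction would be posted as a new reversing transaction rather than by editing existing entries, which preserves the audit trail.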
Prerequisites
- 03-reliability -- Fault tolerance, retry strategies, and exactly-once semantics that are critical for financial correctness.
- 04-data-systems -- ACID transactions, database consistency models, and durable storage patterns for ledger management.
- 10-security-architecture -- Encryption, authentication, PCI compliance, and secure handling of sensitive financial data.