SQL vs NoSQL

Database type selection decision tree

Overview

Choosing between SQL and NoSQL is not about which is "better." It is about which data model fits your access patterns, consistency requirements, and operational constraints. Most production systems use both.

Relational Databases (SQL)

Relational databases store data in tables with rows and columns, enforcing a predefined schema. They excel at structured data with complex relationships.

Core Strengths

ACID transactions guarantee data integrity
Joins allow flexible querying across related entities
Mature tooling, widespread expertise
Strong consistency by default

When to Use

Financial systems where transactions must be atomic
Applications with complex queries and reporting needs
Data with well-defined relationships and structure
Systems where schema enforcement prevents bugs

Popular Choices

PostgreSQL: Feature-rich, extensible, strong JSON support. Used by Instagram, Stripe, Reddit.
MySQL: Battle-tested, read-heavy workloads. Used by Facebook, Twitter, Shopify.
Amazon Aurora: Cloud-native MySQL/PostgreSQL compatible. Used by Samsung, Dow Jones.

Limitations

Vertical scaling hits hardware limits
Horizontal sharding is complex and application-managed
Schema migrations on large tables can cause downtime
Rigid schema slows iteration on rapidly changing data models

Document Stores

Document databases store data as flexible JSON-like documents. Each document can have a different structure.

Core Strengths

Schema flexibility allows rapid iteration
Documents map naturally to application objects
Horizontal scaling is built into the architecture
Reads of a single entity are fast since related data is embedded

When to Use

Content management systems with varied content types
User profiles with different attributes per user
Product catalogs where items have different fields
Prototyping and MVP development

Popular Choices

MongoDB: Most popular document store. Used by Forbes, Toyota, Uber.
Amazon DynamoDB: Fully managed, single-digit millisecond latency. Used by Lyft, Netflix, Airbnb.
Couchbase: Strong mobile sync capabilities. Used by Amadeus, Comcast.

Limitations

No joins means data duplication or multiple queries
Lack of schema enforcement can lead to data inconsistency
Transactions across documents are limited or expensive
Aggregation queries are slower than SQL equivalents

Key-Value Stores

The simplest NoSQL model. Every item is stored as a key paired with a value. The database does not interpret the value.

Core Strengths

Extremely fast reads and writes (sub-millisecond)
Simple API: GET, PUT, DELETE
Easy to scale horizontally by partitioning keys
Predictable performance at any scale

When to Use

Session management and user state
Shopping carts and temporary data
Caching layers
Real-time leaderboards and counters
Feature flags and configuration

Popular Choices

Redis: In-memory with optional persistence. Used by Twitter, GitHub, Stack Overflow.
Amazon DynamoDB: Can function as a key-value store with single-digit ms latency.
etcd: Distributed key-value store for configuration. Used by Kubernetes.

Limitations

No query capability beyond key lookup
No relationships between values
Large values degrade performance
Range queries require careful key design

Column-Family Stores

Data is organized by columns rather than rows. Related columns are grouped into families, optimized for reading large datasets with known query patterns.

Core Strengths

Excellent write throughput for time-series and event data
Efficient compression since columns contain similar data types
Scales to petabytes across commodity hardware
Tunable consistency per operation

When to Use

Time-series data (IoT sensors, metrics, logs)
Write-heavy workloads with known read patterns
Analytics on large datasets
Event logging at massive scale

Popular Choices

Apache Cassandra: Peer-to-peer, no single point of failure. Used by Netflix, Apple, Instagram.
Apache HBase: Built on HDFS, strong consistency. Used by Facebook Messenger.
Google Bigtable: Managed column store. Used by Google Maps, Gmail.

Limitations

Poor for ad-hoc queries; you must design around known access patterns
No joins or subqueries
Read-before-write patterns are expensive
Data modeling requires significant upfront planning

Graph Databases

Graph databases store entities as nodes and relationships as edges. They are purpose-built for traversing connections.

Core Strengths

Relationship traversal in constant time regardless of dataset size
Natural representation of connected data
Pattern matching queries are intuitive
Discovering indirect relationships is efficient

When to Use

Social networks (friends of friends, recommendations)
Fraud detection (finding suspicious connection patterns)
Knowledge graphs and ontologies
Network topology and dependency mapping
Recommendation engines

Popular Choices

Neo4j: Most popular graph database. Used by eBay, NASA, Walmart.
Amazon Neptune: Managed graph database. Used by Samsung, Pearson.
TigerGraph: Designed for deep-link analytics. Used by Intuit, UnitedHealth.

Limitations

Not suited for bulk data processing
Sharding a graph is fundamentally hard (connected data resists partitioning)
Smaller community and fewer tools than relational databases
Aggregate queries across all nodes are slow

Decision Framework

Start with your access patterns, not your data structure.

Ask yourself:
1. Do I need ACID transactions across multiple entities? -> SQL
2. Is my data highly connected with relationship-heavy queries? -> Graph
3. Do I need flexible schemas with entity-level reads? -> Document
4. Is this time-series or write-heavy with known queries? -> Column-Family
5. Do I just need fast key-based lookups? -> Key-Value

Real-world pattern:
- PostgreSQL for core transactional data
- Redis for caching and session management
- Elasticsearch for full-text search
- Cassandra for event logging
- Neo4j for recommendation graph

Real-World Trade-Offs

Uber migrated from PostgreSQL to a custom solution built on MySQL and Schemaless (document-like) because PostgreSQL's write amplification and replication model did not scale for their write-heavy workload.

Discord moved from MongoDB to Cassandra for message storage because they needed better write performance and horizontal scaling. They later migrated from Cassandra to ScyllaDB (Cassandra-compatible) for better tail latency.

Airbnb uses a polyglot persistence strategy: MySQL for bookings and payments, Elasticsearch for search, Redis for caching, and HBase for analytics data.

Common Pitfalls

Choosing NoSQL to avoid schema design: You still need a data model. NoSQL just moves the schema into application code where it is harder to enforce.
Premature optimization for scale: PostgreSQL handles most workloads well into the millions of rows. Start relational unless you have a specific reason not to.
Ignoring operational complexity: Every database you add is another system to monitor, back up, and debug at 3 AM.
Using a document store when you need joins: If your queries constantly require data from multiple collections, you either duplicate data everywhere or make many round trips. Consider SQL.
Treating NoSQL as schema-less: Schema-less really means schema-on-read. The schema exists in your application code, and inconsistencies surface as runtime errors.

Key Takeaways

SQL databases are the right default choice for most applications. Start here unless you have clear reasons not to.
NoSQL databases solve specific problems: flexible schemas, horizontal scale, sub-millisecond latency, or specialized query patterns.
Polyglot persistence (using multiple databases) is the norm in large systems. Use each database for what it does best.
Your access patterns should drive your database choice, not your data structure.
Every database involves trade-offs. Understand what you are giving up, not just what you are gaining.
Operational complexity is a real cost. Each additional database multiplies your on-call burden.