SQL vs NoSQL

Overview
Choosing between SQL and NoSQL is not about which is "better." It is about which data model fits your access patterns, consistency requirements, and operational constraints. Most production systems use both.
Relational Databases (SQL)
Relational databases store data in tables with rows and columns, enforcing a predefined schema. They excel at structured data with complex relationships.
Core Strengths
- ACID transactions guarantee data integrity
- Joins allow flexible querying across related entities
- Mature tooling, widespread expertise
- Strong consistency by default
When to Use
- Financial systems where transactions must be atomic
- Applications with complex queries and reporting needs
- Data with well-defined relationships and structure
- Systems where schema enforcement prevents bugs
Popular Choices
- PostgreSQL: Feature-rich, extensible, strong JSON support. Used by Instagram, Stripe, Reddit.
- MySQL: Battle-tested, read-heavy workloads. Used by Facebook, Twitter, Shopify.
- Amazon Aurora: Cloud-native MySQL/PostgreSQL compatible. Used by Samsung, Dow Jones.
Limitations
- Vertical scaling hits hardware limits
- Horizontal sharding is complex and application-managed
- Schema migrations on large tables can cause downtime
- Rigid schema slows iteration on rapidly changing data models
Document Stores
Document databases store data as flexible JSON-like documents. Each document can have a different structure.
Core Strengths
- Schema flexibility allows rapid iteration
- Documents map naturally to application objects
- Horizontal scaling is built into the architecture
- Reads of a single entity are fast since related data is embedded
When to Use
- Content management systems with varied content types
- User profiles with different attributes per user
- Product catalogs where items have different fields
- Prototyping and MVP development
Popular Choices
- MongoDB: Most popular document store. Used by Forbes, Toyota, Uber.
- Amazon DynamoDB: Fully managed, single-digit millisecond latency. Used by Lyft, Netflix, Airbnb.
- Couchbase: Strong mobile sync capabilities. Used by Amadeus, Comcast.
Limitations
- No joins means data duplication or multiple queries
- Lack of schema enforcement can lead to data inconsistency
- Transactions across documents are limited or expensive
- Aggregation queries are slower than SQL equivalents
Key-Value Stores
The simplest NoSQL model. Every item is stored as a key paired with a value. The database does not interpret the value.
Core Strengths
- Extremely fast reads and writes (sub-millisecond)
- Simple API: GET, PUT, DELETE
- Easy to scale horizontally by partitioning keys
- Predictable performance at any scale
When to Use
- Session management and user state
- Shopping carts and temporary data
- Caching layers
- Real-time leaderboards and counters
- Feature flags and configuration
Popular Choices
- Redis: In-memory with optional persistence. Used by Twitter, GitHub, Stack Overflow.
- Amazon DynamoDB: Can function as a key-value store with single-digit ms latency.
- etcd: Distributed key-value store for configuration. Used by Kubernetes.
Limitations
- No query capability beyond key lookup
- No relationships between values
- Large values degrade performance
- Range queries require careful key design
Column-Family Stores
Data is organized by columns rather than rows. Related columns are grouped into families, optimized for reading large datasets with known query patterns.
Core Strengths
- Excellent write throughput for time-series and event data
- Efficient compression since columns contain similar data types
- Scales to petabytes across commodity hardware
- Tunable consistency per operation
When to Use
- Time-series data (IoT sensors, metrics, logs)
- Write-heavy workloads with known read patterns
- Analytics on large datasets
- Event logging at massive scale
Popular Choices
- Apache Cassandra: Peer-to-peer, no single point of failure. Used by Netflix, Apple, Instagram.
- Apache HBase: Built on HDFS, strong consistency. Used by Facebook Messenger.
- Google Bigtable: Managed column store. Used by Google Maps, Gmail.
Limitations
- Poor for ad-hoc queries; you must design around known access patterns
- No joins or subqueries
- Read-before-write patterns are expensive
- Data modeling requires significant upfront planning
Graph Databases
Graph databases store entities as nodes and relationships as edges. They are purpose-built for traversing connections.
Core Strengths
- Relationship traversal in constant time regardless of dataset size
- Natural representation of connected data
- Pattern matching queries are intuitive
- Discovering indirect relationships is efficient
When to Use
- Social networks (friends of friends, recommendations)
- Fraud detection (finding suspicious connection patterns)
- Knowledge graphs and ontologies
- Network topology and dependency mapping
- Recommendation engines
Popular Choices
- Neo4j: Most popular graph database. Used by eBay, NASA, Walmart.
- Amazon Neptune: Managed graph database. Used by Samsung, Pearson.
- TigerGraph: Designed for deep-link analytics. Used by Intuit, UnitedHealth.
Limitations
- Not suited for bulk data processing
- Sharding a graph is fundamentally hard (connected data resists partitioning)
- Smaller community and fewer tools than relational databases
- Aggregate queries across all nodes are slow
Decision Framework
Start with your access patterns, not your data structure.
Ask yourself:
1. Do I need ACID transactions across multiple entities? -> SQL
2. Is my data highly connected with relationship-heavy queries? -> Graph
3. Do I need flexible schemas with entity-level reads? -> Document
4. Is this time-series or write-heavy with known queries? -> Column-Family
5. Do I just need fast key-based lookups? -> Key-Value
Real-world pattern:
- PostgreSQL for core transactional data
- Redis for caching and session management
- Elasticsearch for full-text search
- Cassandra for event logging
- Neo4j for recommendation graph
Real-World Trade-Offs
Uber migrated from PostgreSQL to a custom solution built on MySQL and Schemaless (document-like) because PostgreSQL's write amplification and replication model did not scale for their write-heavy workload.
Discord moved from MongoDB to Cassandra for message storage because they needed better write performance and horizontal scaling. They later migrated from Cassandra to ScyllaDB (Cassandra-compatible) for better tail latency.
Airbnb uses a polyglot persistence strategy: MySQL for bookings and payments, Elasticsearch for search, Redis for caching, and HBase for analytics data.
Common Pitfalls
- Choosing NoSQL to avoid schema design: You still need a data model. NoSQL just moves the schema into application code where it is harder to enforce.
- Premature optimization for scale: PostgreSQL handles most workloads well into the millions of rows. Start relational unless you have a specific reason not to.
- Ignoring operational complexity: Every database you add is another system to monitor, back up, and debug at 3 AM.
- Using a document store when you need joins: If your queries constantly require data from multiple collections, you either duplicate data everywhere or make many round trips. Consider SQL.
- Treating NoSQL as schema-less: Schema-less really means schema-on-read. The schema exists in your application code, and inconsistencies surface as runtime errors.
Key Takeaways
- SQL databases are the right default choice for most applications. Start here unless you have clear reasons not to.
- NoSQL databases solve specific problems: flexible schemas, horizontal scale, sub-millisecond latency, or specialized query patterns.
- Polyglot persistence (using multiple databases) is the norm in large systems. Use each database for what it does best.
- Your access patterns should drive your database choice, not your data structure.
- Every database involves trade-offs. Understand what you are giving up, not just what you are gaining.
- Operational complexity is a real cost. Each additional database multiplies your on-call burden.