The Scaling Milestones

Scaling is not a continuous process. It happens in discrete jumps. You are fine, you are fine, you are fine, and then something breaks. You fix that thing. You are fine again for a while. Then something else breaks.

Each milestone has a predictable set of problems and a predictable set of solutions. You do not need to solve all of them at once. You need to solve them as you arrive at each milestone.

This is the roadmap.

0-100 Users: Anything Works

At this stage, your architecture does not matter. A $5 VPS running a monolith with SQLite can serve 100 users without breaking a sweat. An entry-level Heroku dyno can handle it. A Raspberry Pi could probably handle it.

0-100 users architecture:
- One server (VPS, PaaS, or serverless)
- One database (SQLite, PostgreSQL, or managed DB)
- One deploy process (git push)
- Zero caching
- Zero background jobs (unless email)
- Zero CDN
- Zero load balancing

The only thing that matters at this stage is shipping features and talking to users. If you spend even one day on infrastructure optimization with 100 users, you are making a mistake.

What breaks at 100 users: nothing (if it breaks, your code has bugs, not scale problems)
What to focus on: product-market fit
Infrastructure time budget: less than 1 hour per month

Craigslist famously ran on deliberately simple infrastructure for years. It served millions of users on an architecture that most engineers would consider inadequate. The product was valuable. That was enough.

100-1,000 Users: Basic Hygiene

At this stage, you start noticing things. Pages that were instant now take a second or two. The database log shows queries that scan entire tables. Requests that send email feel slow because delivery happens synchronously inside the request.

100-1,000 users problems:
- Slow queries (missing indexes, N+1 queries)
- No error tracking (you find out about bugs from users)
- Slow page loads from unoptimized assets
- Email and notifications blocking the request cycle
- No monitoring (you find out about downtime from users)

The fixes are straightforward and well-documented.

Add Database Indexes

This is usually the single highest-impact optimization you can make. Your database's slow query log and EXPLAIN, along with the query logging most ORMs provide, show you which queries are slow. Add indexes to the columns you filter, sort, or join on.

Common indexes to add:
- Foreign keys (user_id, team_id, etc.)
- Columns in WHERE clauses
- Columns in ORDER BY clauses
- Unique constraints (email, username)
- Composite indexes for common query patterns

Example impact (illustrative; exact numbers depend on table size):
Before index: SELECT * FROM orders WHERE user_id = 123 -> 800ms (full table scan)
After index:  SELECT * FROM orders WHERE user_id = 123 -> 2ms (index lookup)
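One way to see this difference yourself is SQLite's EXPLAIN QUERY PLAN, which reports whether a query scans the whole table or searches an index. Below is a minimal sketch using Python's built-in sqlite3 module; the table and index names (orders, idx_orders_user_id) are illustrative. PostgreSQL's EXPLAIN output looks different but tells the same story (Seq Scan versus Index Scan or Bitmap Index Scan).

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> list[str]:
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (user_id, total) VALUES (?, ?)",
    [(i % 1000, i * 1.5) for i in range(10_000)],
)

sql = "SELECT * FROM orders WHERE user_id = ?"
before = query_plan(conn, sql, (123,))  # detail contains SCAN: every row is read

# Index the column used in the WHERE clause (a foreign key in a real schema).
conn.execute("CREATE INDEX idx_orders_user_id ON orders (user_id)")
after = query_plan(conn, sql, (123,))  # detail contains SEARCH ... USING INDEX

print(before)
print(after)
```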

Fix N+1 Queries

N+1 queries are the most common performance problem in ORM-based applications. You query a list of items, then make one additional query per item for related data.

N+1 example:
- Query 1: SELECT * FROM posts LIMIT 20
- Query 2-21: SELECT * FROM users WHERE id = ? (one per post)
Total: 21 queries

Fixed with eager loading:
- Query 1: SELECT * FROM posts LIMIT 20
- Query 2: SELECT * FROM users WHERE id IN (1, 2, 3, ..., 20)
Total: 2 queries
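The difference can be sketched with Python's sqlite3 module and a small wrapper that counts queries. ORMs do this batching for you (for example, Django's prefetch_related or Rails' includes); this hand-written version only shows what eager loading does under the hood. CountingDB, the schema, and the sample rows are illustrative.

```python
import sqlite3


class CountingDB:
    """Wraps a sqlite3 connection and counts the queries the application issues."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.queries = 0

    def fetch_all(self, sql: str, params=()) -> list:
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def make_db() -> CountingDB:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
    """)
    conn.executemany("INSERT INTO users VALUES (?, ?)", [(i, f"user{i}") for i in range(1, 21)])
    conn.executemany("INSERT INTO posts (user_id, title) VALUES (?, ?)",
                     [(i, f"post {i}") for i in range(1, 21)])
    return CountingDB(conn)


def posts_n_plus_one(db: CountingDB) -> list[tuple[str, str]]:
    posts = db.fetch_all("SELECT id, user_id, title FROM posts ORDER BY id LIMIT 20")
    # One extra query per post: this loop is the "N" in N+1.
    return [(title, db.fetch_all("SELECT name FROM users WHERE id = ?", (user_id,))[0][0])
            for _, user_id, title in posts]


def posts_eager(db: CountingDB) -> list[tuple[str, str]]:
    posts = db.fetch_all("SELECT id, user_id, title FROM posts ORDER BY id LIMIT 20")
    user_ids = sorted({user_id for _, user_id, _ in posts})
    if not user_ids:
        return []
    placeholders = ", ".join("?" for _ in user_ids)
    # One batched IN (...) query fetches every author the page needs.
    names = dict(db.fetch_all(f"SELECT id, name FROM users WHERE id IN ({placeholders})", user_ids))
    return [(title, names[user_id]) for _, user_id, title in posts]


slow, fast = make_db(), make_db()
assert posts_n_plus_one(slow) == posts_eager(fast)  # same data either way
print(slow.queries, fast.queries)  # 21 2
```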

Move Slow Work to Background Jobs

Email sending, PDF generation, image processing, webhook delivery: anything that takes more than a few hundred milliseconds should move to a background job.

Background job setup:
- Simple: cron job that processes a queue table
- Better: Redis-backed job queue (Sidekiq, BullMQ, Celery)
- Simplest cloud: managed queue service (SQS, Cloud Tasks)
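The "simple" option above can be sketched as a jobs table that request handlers insert into and a cron-triggered worker drains. This is a minimal single-worker sketch using sqlite3; the table layout, the job kind welcome_email, the handler, and the three-attempt retry limit are assumptions. With several workers on PostgreSQL you would also need to claim rows atomically (for example with SELECT ... FOR UPDATE SKIP LOCKED) so two workers never run the same job.

```python
import json
import sqlite3
from typing import Callable

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE jobs (
        id INTEGER PRIMARY KEY,
        kind TEXT NOT NULL,
        payload TEXT NOT NULL,
        status TEXT NOT NULL DEFAULT 'pending',  -- pending | done | failed
        attempts INTEGER NOT NULL DEFAULT 0
    )
""")


def enqueue(kind: str, payload: dict) -> None:
    """Called inside the web request: a cheap INSERT instead of the slow work itself."""
    with conn:  # commits on success
        conn.execute("INSERT INTO jobs (kind, payload) VALUES (?, ?)",
                     (kind, json.dumps(payload)))


def run_pending(handlers: dict[str, Callable[[dict], None]],
                batch_size: int = 50, max_attempts: int = 3) -> int:
    """One worker pass, e.g. run by cron every minute. Returns the number of jobs attempted."""
    rows = conn.execute(
        "SELECT id, kind, payload, attempts FROM jobs "
        "WHERE status = 'pending' ORDER BY id LIMIT ?", (batch_size,)
    ).fetchall()
    for job_id, kind, payload, attempts in rows:
        try:
            handlers[kind](json.loads(payload))
            status = "done"
        except Exception:
            # Keep it pending for the next pass until the retry budget runs out.
            status = "failed" if attempts + 1 >= max_attempts else "pending"
        with conn:
            conn.execute("UPDATE jobs SET status = ?, attempts = attempts + 1 WHERE id = ?",
                         (status, job_id))
    return len(rows)


# Usage with a hypothetical email handler:
sent = []
enqueue("welcome_email", {"to": "new-user@example.com"})
processed = run_pending({"welcome_email": lambda p: sent.append(p["to"])})
print(processed, sent)  # 1 ['new-user@example.com']
```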

Add Error Tracking

You should hear about errors before your users tell you. Sentry has a free tier that covers most startups. Set it up before you hit 1,000 users.

Minimum monitoring stack at 1K users:
- Error tracking: Sentry (free tier)
- Uptime monitoring: UptimeRobot (free tier)
- Basic logging: whatever your platform provides
Total cost: $0
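An uptime monitor such as UptimeRobot essentially requests a URL on a schedule and alerts when it stops answering. Below is a minimal sketch of that core logic, assuming you would call tick() from a scheduler; the UptimeCheck class, the alert callback, and the two-consecutive-failures threshold (to avoid alerting on a single blip) are illustrative choices, not any service's actual behavior.

```python
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """True if the URL answers with a non-error status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except Exception:  # HTTPError (4xx/5xx), URLError, timeouts, refused connections
        return False


class UptimeCheck:
    """Alert once after N consecutive failed checks, and once more on recovery."""

    def __init__(self, url: str, alert: Callable[[str], None],
                 check: Callable[[str], bool] = http_ok, failures_before_alert: int = 2):
        self.url, self.alert, self.check = url, alert, check
        self.failures_before_alert = failures_before_alert
        self.consecutive_failures = 0
        self.down = False

    def tick(self) -> None:
        if self.check(self.url):
            if self.down:
                self.alert(f"RECOVERED: {self.url}")
            self.consecutive_failures, self.down = 0, False
            return
        self.consecutive_failures += 1
        if not self.down and self.consecutive_failures >= self.failures_before_alert:
            self.down = True
            self.alert(f"DOWN: {self.url}")


outcomes = iter([True, False, False, False, True])  # scripted results instead of real HTTP
alerts: list[str] = []
monitor = UptimeCheck("https://example.com/healthz", alerts.append, check=lambda url: next(outcomes))
for _ in range(5):
    monitor.tick()
print(alerts)  # ['DOWN: https://example.com/healthz', 'RECOVERED: https://example.com/healthz']
```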

1,000-10,000 Users: Real Infrastructure

At this milestone, you need actual infrastructure decisions. Single-server setups start showing strain. Database connections become a bottleneck. Static assets slow down for geographically distant users.

1,000-10,000 users problems:
- Database connection limits
- Read-heavy queries competing with writes
- Static asset delivery speed
- Background job queue growing faster than processing
- Deploy downtime affecting real users
- Memory pressure on a single server

Connection Pooling

Most databases have a hard connection limit. PostgreSQL defaults to 100 connections. When your app server opens a connection per request, you hit that limit fast.

Connection pooling solutions:
- PgBouncer: external connection pooler for PostgreSQL
- Built-in pooling: most ORMs support connection pooling
- Managed databases: usually include connection pooling

Impact:
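Without pooling: at most about 100 concurrent database connections (PostgreSQL default), so at most about 100 requests can talk to the database at once
With pooling: thousands of concurrent client requests share 20-50 actual database connections, briefly waiting for a free one when all are busy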
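The core idea of pooling is that requests borrow a connection from a fixed-size set and hand it back, instead of opening their own. Below is a minimal thread-safe sketch using queue.Queue, with sqlite3 as a stand-in database; in production you would use your driver's or ORM's pool (or PgBouncer) rather than this. The pool size of 5 and the 30-second wait are arbitrary.

```python
import queue
import sqlite3
import threading
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool: at most `size` real connections, shared by any number of callers."""

    def __init__(self, connect, size: int = 5, timeout: float = 30.0):
        self._idle: queue.Queue = queue.Queue(maxsize=size)
        self._timeout = timeout
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self):
        # Blocks while every connection is busy; raises queue.Empty after `timeout` seconds.
        conn = self._idle.get(timeout=self._timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always hand it back, even if the request failed


opened = []


def connect() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    opened.append(conn)
    return conn


pool = ConnectionPool(connect, size=5)


def handle_request(results: list, i: int) -> None:
    with pool.connection() as conn:
        results[i] = conn.execute("SELECT ?", (i,)).fetchone()[0]


# 200 concurrent "requests" share 5 real connections.
results = [None] * 200
threads = [threading.Thread(target=handle_request, args=(results, i)) for i in range(200)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(opened), results == list(range(200)))  # 5 True
```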

Read Replicas

If your application is read-heavy (most web apps are), a read replica offloads SELECT queries from your primary database.

Read replica setup:
- Primary database: handles all writes (INSERT, UPDATE, DELETE)
- Read replica: handles read queries (SELECT)
- Application routes reads to replica, writes to primary

Common split:
- 80-90% of queries are reads -> replica handles most traffic
- Primary only handles writes -> less contention, faster writes
- Replication lag: usually under 100ms, acceptable for most reads; reads that must reflect a write the user just made should still go to the primary
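Routing can live in a thin layer in front of your database handles. Below is a minimal sketch, assuming two connection-like objects for the primary and the replica (plain strings here); the class name, the first-keyword check, and the 2-second window that keeps a session's reads on the primary after it writes are illustrative. Many frameworks provide this natively (for example, Django's database routers or Rails' multiple-database support), which is usually preferable to hand-rolling it.

```python
import time
from typing import Any, Callable


class ReadWriteRouter:
    """Send plain SELECTs to the replica and everything else to the primary.

    After a session writes, its reads go to the primary for `stick_seconds`
    so users see their own changes despite replication lag.
    """

    def __init__(self, primary: Any, replica: Any, stick_seconds: float = 2.0,
                 clock: Callable[[], float] = time.monotonic):
        self.primary, self.replica = primary, replica
        self.stick_seconds = stick_seconds
        self.clock = clock
        self._last_write: dict[str, float] = {}

    def pick(self, session_id: str, sql: str) -> Any:
        words = sql.split(maxsplit=1)
        # Locking reads (SELECT ... FOR UPDATE) must run on the primary.
        is_read = bool(words) and words[0].upper() == "SELECT" and "FOR UPDATE" not in sql.upper()
        now = self.clock()
        if not is_read:
            self._last_write[session_id] = now
            return self.primary
        last_write = self._last_write.get(session_id)
        if last_write is not None and now - last_write < self.stick_seconds:
            return self.primary  # replica may not have this session's write yet
        return self.replica


t = [0.0]  # fake clock for the example
router = ReadWriteRouter("primary", "replica", clock=lambda: t[0])
print(router.pick("user-1", "SELECT * FROM posts"))                        # replica
print(router.pick("user-1", "UPDATE posts SET title = 'x' WHERE id = 1"))  # primary
print(router.pick("user-1", "SELECT * FROM posts WHERE id = 1"))           # primary
t[0] = 5.0
print(router.pick("user-1", "SELECT * FROM posts WHERE id = 1"))           # replica
```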

CDN for Static Assets

A Content Delivery Network serves static files (images, CSS, JavaScript) from servers close to your users. Cloudflare's free tier is usually enough at this stage.

CDN setup:
- Put Cloudflare in front of your domain (free)
- Serve static assets with cache headers
- Result: static files load from the nearest edge server
- Impact: 50-80% reduction in page load time for distant users
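"Cache headers" in practice usually means two policies: fingerprinted asset files (a content hash in the filename) can be cached for a year because any change produces a new URL, while HTML should be revalidated so it always points at the latest asset URLs. Below is a minimal sketch of both; the helper names, the 8-character hash, and the filename heuristic are illustrative, and most build tools (webpack, Vite, the Rails asset pipeline) do the fingerprinting for you.

```python
import hashlib
from pathlib import PurePosixPath

ONE_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds


def fingerprint(filename: str, content: bytes) -> str:
    """app.css -> app.<8 hex chars>.css; any content change yields a new name (and URL)."""
    path = PurePosixPath(filename)
    digest = hashlib.sha256(content).hexdigest()[:8]
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


def cache_control(path: str) -> str:
    """Long-lived caching for fingerprinted assets, revalidation for everything else."""
    parts = PurePosixPath(path).name.split(".")
    # Heuristic for this sketch: stem.<8 lowercase hex chars>.ext counts as fingerprinted.
    if len(parts) >= 3 and len(parts[-2]) == 8 and all(c in "0123456789abcdef" for c in parts[-2]):
        return f"public, max-age={ONE_YEAR}, immutable"
    return "no-cache"  # caches may store it but must revalidate before reuse


asset = fingerprint("static/app.css", b"body { margin: 0 }")
print(asset, cache_control(asset))
print("index.html", cache_control("index.html"))  # index.html no-cache
```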

Background Job Workers

Scale your background job processing independently from your web server. If jobs are backing up, add more workers.

Background job scaling:
- Separate worker process from web server
- Monitor queue depth (alert if growing consistently)
- Scale workers based on queue depth, not web traffic
- Use job priorities (email delivery > analytics processing)
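Job priorities and a "growing consistently" alert are both small pieces of logic. Below is a minimal in-process sketch using heapq; real queues express priority in their own ways (Sidekiq weights queues, BullMQ and Celery accept a per-job priority option), and the priority numbers, job kinds, and five-interval window here are assumptions.

```python
import heapq
import itertools

PRIORITY = {"email_delivery": 0, "webhook_delivery": 1, "analytics": 9}  # lower runs first


class PriorityJobQueue:
    def __init__(self):
        self._heap: list[tuple[int, int, str, dict]] = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order within one priority

    def push(self, kind: str, payload: dict) -> None:
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._seq), kind, payload))

    def pop(self) -> tuple[str, dict]:
        _, _, kind, payload = heapq.heappop(self._heap)
        return kind, payload

    def depth(self) -> int:
        return len(self._heap)


def queue_growing(depth_samples: list[int], window: int = 5) -> bool:
    """Alert only if depth rose at each of the last `window` intervals, not on one spike."""
    recent = depth_samples[-(window + 1):]
    return len(recent) == window + 1 and all(b > a for a, b in zip(recent, recent[1:]))


q = PriorityJobQueue()
q.push("analytics", {"event": "page_view"})
q.push("email_delivery", {"to": "a@example.com"})
q.push("analytics", {"event": "signup"})
print([q.pop()[0] for _ in range(q.depth())])  # ['email_delivery', 'analytics', 'analytics']
print(queue_growing([3, 5, 8, 13, 21, 34]))     # True
print(queue_growing([3, 5, 8, 4, 21, 34]))      # False
```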

Zero-Downtime Deploys

At 1,000+ users, someone is always using your app. Deploys that take your app offline for 30 seconds start to matter.

Zero-downtime deploy options:
- Rolling deploy: replace instances one at a time
- Blue-green deploy: run two versions, switch traffic
- PaaS default: most platforms handle this automatically
- Simple approach: run two instances behind a load balancer
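A rolling deploy is a loop: take one instance out of rotation, upgrade it, wait until its health check passes, put it back, then move on. Below is a minimal sketch of that loop; the Instance class and the healthy callback are hypothetical stand-ins for whatever your platform or load balancer API provides. The key property is that at most one instance is out of rotation at a time (so you need at least two), and the deploy stops instead of continuing if a new version never becomes healthy.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Instance:
    """Hypothetical stand-in for a server registered with a load balancer."""
    name: str
    version: str
    in_rotation: bool = True


def rolling_deploy(instances: list[Instance], new_version: str,
                   healthy: Callable[[Instance], bool],
                   timeout: float = 60.0, poll: float = 0.5) -> None:
    for inst in instances:
        # Out of rotation (in reality: deregister from the load balancer and drain requests).
        inst.in_rotation = False
        inst.version = new_version  # restart with the new build
        deadline = time.monotonic() + timeout
        while not healthy(inst):
            if time.monotonic() > deadline:
                # Stop here; the remaining instances keep serving the old version.
                raise RuntimeError(f"{inst.name} failed health checks on {new_version}")
            time.sleep(poll)
        inst.in_rotation = True  # healthy: resume sending it traffic


fleet = [Instance(f"web-{i}", "v1") for i in range(3)]
rolling_deploy(fleet, "v2", healthy=lambda inst: True)  # hypothetical always-healthy probe
print([(i.name, i.version, i.in_rotation) for i in fleet])
```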

10,000-100,000 Users: Extract & Scale

This is where architecture starts to matter. Single-server monoliths hit real limits. You need to think about horizontal scaling, service extraction, and operational maturity.

10,000-100,000 users problems:
- Single server cannot handle peak traffic
- Database is a bottleneck even with read replicas
- Certain features have different scaling needs
- Deploys take longer and are riskier
- Incidents are more frequent and more impactful
- Team is growing, needs clear service ownership

Horizontal Scaling

Run multiple instances of your application behind a load balancer. This is the first step toward handling traffic beyond what a single server can serve.

Horizontal scaling requirements:
- Stateless application (no in-memory sessions)
- External session store (Redis)
- Shared file storage (S3) instead of local disk
- Load balancer (nginx, ALB, Cloudflare)
- Health check endpoint for the load balancer
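The health check endpoint is what lets the load balancer skip an instance that cannot serve traffic. Below is a minimal sketch using only Python's standard library; in a real app this would be one more route in your web framework. The /healthz path, the JSON body, and the read-only probe of a hypothetical app.db are assumptions; use whatever path your load balancer is configured to probe, and keep the check cheap because it runs every few seconds on every instance.

```python
import json
import sqlite3
from contextlib import closing
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable


def database_ok() -> bool:
    """Cheap dependency probe: open the (hypothetical) app.db read-only and run SELECT 1."""
    try:
        with closing(sqlite3.connect("file:app.db?mode=ro", uri=True, timeout=1)) as conn:
            conn.execute("SELECT 1")
        return True
    except sqlite3.Error:
        return False


CHECKS: dict[str, Callable[[], bool]] = {"database": database_ok}


def health_report(checks: dict[str, Callable[[], bool]]) -> tuple[int, dict]:
    """200 if every check passes, 503 otherwise (load balancers treat 5xx as unhealthy)."""
    results = {name: check() for name, check in checks.items()}
    status = 200 if all(results.values()) else 503
    return status, {"status": "ok" if status == 200 else "unavailable", "checks": results}


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/healthz":
            self.send_error(404)
            return
        status, body = health_report(CHECKS)
        payload = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    """Blocking; run this inside each instance's process."""
    ThreadingHTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Whether to include dependency checks is a judgment call: if the shared database goes down, every instance reports unhealthy at once, so some teams keep the load balancer check to "this process can answer" and monitor dependencies separately.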

Service Extraction

Do not break your monolith into microservices all at once. Extract services one at a time, starting with the components that have the most distinct scaling needs.

Good candidates for extraction:
- Image or file processing (CPU-intensive, different scaling)
- Search (specialized infrastructure like Elasticsearch)
- Real-time features (WebSocket servers scale differently)
- Email and notification delivery (high throughput, async)
- Analytics processing (can lag behind, heavy writes)

Bad candidates for extraction:
- Core business logic (keep it in the monolith)
- User authentication (use an established auth provider rather than building your own service)
- Anything tightly coupled to other features

Application-Level Caching

Beyond CDN caching, you need caching inside your application. Redis is the most common choice.

What to cache:
- Expensive database queries that are read often
- API responses from external services
- Computed values (dashboards, reports, aggregations)
- User session data
- Rate limiting counters

Cache population and invalidation:
- Cache-aside pattern (check the cache, fall back to the DB on a miss, then populate the cache)
- Time-based expiry (simplest, good enough for most cases)
- Event-based invalidation (delete or update the cached entry when the underlying data changes)
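The cache-aside read path with time-based expiry fits in a few lines. Below is a minimal sketch with an in-process dict standing in for Redis so it runs anywhere; with Redis you would do the same thing using GET and SET with an expiry (SET key value EX seconds). The TTLCache class, the key format, the hypothetical dashboard loader, and the injected clock are illustrative.

```python
import time
from typing import Any, Callable


class TTLCache:
    """Dict-backed stand-in for Redis: each value expires `ttl` seconds after it is set."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._data: dict[str, tuple[float, Any]] = {}
        self._clock = clock

    def get(self, key: str) -> Any:
        entry = self._data.get(key)
        if entry is None or entry[0] <= self._clock():
            self._data.pop(key, None)  # missing or expired
            return None
        return entry[1]

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._data[key] = (self._clock() + ttl, value)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


def cache_aside(cache: TTLCache, key: str, ttl: float, load: Callable[[], Any]) -> Any:
    """Check the cache; on a miss, run the slow loader and populate the cache.

    Note: a loader that returns None is never cached in this sketch.
    """
    value = cache.get(key)
    if value is None:
        value = load()
        cache.set(key, value, ttl)
    return value


now = [0.0]  # fake clock for the example
cache = TTLCache(clock=lambda: now[0])
loads = []
load_dashboard = lambda: loads.append(1) or {"orders_today": 42}  # hypothetical slow query

cache_aside(cache, "dashboard:team:7", 60, load_dashboard)  # miss: loads
cache_aside(cache, "dashboard:team:7", 60, load_dashboard)  # hit: no load
now[0] = 61.0
cache_aside(cache, "dashboard:team:7", 60, load_dashboard)  # expired: loads again
cache.delete("dashboard:team:7")                            # event-based invalidation
cache_aside(cache, "dashboard:team:7", 60, load_dashboard)  # miss: loads again
print(len(loads))  # 3
```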

Operational Maturity

At this scale, you need real operational practices.

Operational requirements at 10K-100K:
- Centralized logging (Datadog, Papertrail, or ELK)
- Application performance monitoring (APM)
- On-call rotation (even if it is just two people)
- Incident response playbook
- Automated alerting on key metrics
- Database backup testing (actually restore from backup)
- Load testing before major launches

100,000+ Users: You Are Not a Startup Anymore

At this point, you need specialists. Your infrastructure challenges are unique to your traffic patterns, data model, and product. Generic advice is less useful.

100,000+ users reality:
- Hire infrastructure engineers
- Hire database specialists
- Invest in observability platforms
- Consider multi-region deployment
- Database sharding may be necessary
- Custom caching layers
- Dedicated SRE function
- Formal capacity planning

Companies like Shopify have entire teams dedicated to database infrastructure, caching layers, and CDN optimization. You cannot run infrastructure at that level with a generalist team and Stack Overflow answers.

The good news: if you have 100,000+ users, you very likely have revenue. Use that revenue to hire the people who know how to solve these problems.

Common Pitfalls

Skipping milestones. You cannot jump from 100-user architecture to 100,000-user architecture. Each step builds on the previous one. Add connection pooling before you add read replicas. Add read replicas before you add sharding.

Optimizing everything at once. When you hit a milestone, one or two things will break. Fix those things. Do not redesign your entire architecture because your database queries are slow.

Ignoring the database. At every milestone, the database is a factor. Indexes at 1K users. Connection pooling at 5K. Read replicas at 10K. Caching at 50K. The database is almost always the constraint.

Scaling before measuring. If you add a caching layer without knowing which queries are slow, you might cache the wrong things. Measure first, optimize second.
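Measuring can start as simply as timing every query and ranking statements by total time spent, which complements server-side tools such as PostgreSQL's log_min_duration_statement setting. Below is a minimal application-side sketch; the TimedConnection wrapper and the 100 ms slow threshold are assumptions.

```python
import sqlite3
import time
from collections import defaultdict


class TimedConnection:
    """Wraps a sqlite3 connection and aggregates time spent per SQL statement."""

    def __init__(self, conn: sqlite3.Connection, slow_ms: float = 100.0):
        self.conn = conn
        self.slow_ms = slow_ms
        self.total_ms: defaultdict[str, float] = defaultdict(float)
        self.calls: defaultdict[str, int] = defaultdict(int)
        self.slow: list[tuple[str, float]] = []  # individual executions over the threshold

    def execute(self, sql: str, params=()) -> list:
        start = time.perf_counter()
        rows = self.conn.execute(sql, params).fetchall()
        elapsed_ms = (time.perf_counter() - start) * 1000
        self.total_ms[sql] += elapsed_ms
        self.calls[sql] += 1
        if elapsed_ms >= self.slow_ms:
            self.slow.append((sql, elapsed_ms))
        return rows

    def top(self, n: int = 5) -> list[tuple[str, int, float]]:
        """Statements ranked by total time: the best candidates for indexes or caching."""
        ranked = sorted(self.total_ms.items(), key=lambda kv: kv[1], reverse=True)[:n]
        return [(sql, self.calls[sql], round(ms, 2)) for sql, ms in ranked]


db = TimedConnection(sqlite3.connect(":memory:"), slow_ms=100.0)
db.execute("CREATE TABLE t (x INTEGER)")
for i in range(1000):
    db.execute("INSERT INTO t VALUES (?)", (i,))
db.execute("SELECT COUNT(*) FROM t")
print(db.top(3))
```

A frequent cheap query can cost more in total than a rare slow one, which is why the ranking is by total time rather than by single-call duration.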

Over-extracting services. Two services communicating over HTTP are slower and harder to debug than two modules in the same process. Extract services when you have a clear scaling reason, not when you want a cleaner architecture.

Key Takeaways

Scaling happens in discrete jumps. Each user milestone has predictable problems and predictable solutions.
0-100 users: nothing matters except product. 100-1K: basic hygiene. 1K-10K: real infrastructure. 10K-100K: extract and scale. 100K+: hire specialists.
The database is the most common bottleneck at every stage. Indexes, connection pooling, read replicas, and caching solve most database scaling problems.
Horizontal scaling requires a stateless application. Plan for this from the start by using external session stores and shared file storage.
Extract services one at a time, starting with components that have distinct scaling needs. Do not break your monolith all at once.
At 100,000+ users, you very likely have revenue. Use it to hire people who have solved these problems before.