The Production Checklist

Your prototype works on localhost. You ran it yesterday. You showed it to three people. It looked great. Now you want to put it in front of real customers.

You are not ready.

The gap between "it works on my machine" and "it works reliably for paying customers" is larger than most engineers expect. It is not about adding features. It is about adding resilience. Error handling, logging, monitoring, backups, deployment processes, health checks, SSL, environment configuration — the unsexy infrastructure that separates a demo from a product.

This chapter is the checklist. Everything you need to cross the gap from prototype to production.

The Gap

A prototype proves that your idea can work. Production proves that it can work reliably, repeatedly, and for people who are not you.

Prototype assumptions:
- One user (you)
- One environment (your laptop)
- One browser (yours)
- Happy path only (you know how to use it)
- Manual recovery (you can restart it)
- No data loss concerns (it is test data)
- No uptime requirements (it can be down)

Production realities:
- Many users (with different browsers, devices, expectations)
- Multiple environments (staging, production)
- Unhappy paths (users do unexpected things)
- Automated recovery (you are asleep when it breaks)
- Data is precious (users trust you with their data)
- Uptime matters (downtime costs money and trust)

The checklist below covers the minimum requirements for going to production. Not every item is critical for every app, but skipping too many means your first week with real users will be firefighting instead of learning.

Error Handling

Your prototype probably crashes when something unexpected happens. In production, it needs to fail gracefully.

Error handling checklist:
- API endpoints return proper HTTP status codes (400 for bad input, 404 for missing resources, 500 for server faults)
- User-facing errors show helpful messages, not stack traces
- Unexpected errors are caught at the top level
- Failed external API calls are retried or handled gracefully
- Database connection failures do not crash the process
- File upload failures give clear feedback
- Form validation errors explain what went wrong

The principle is simple: no user should ever see a stack trace, a white screen, or a cryptic error. They should see a message that tells them what happened and what to do about it.
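
One way to apply this principle is a single top-level wrapper around every request handler. The sketch below is framework-free and illustrative only: AppError, handle_safely, and get_invoice are hypothetical names, and a real web framework would offer its own hook for the same idea.

```python
import logging
from typing import Callable

logger = logging.getLogger("app")


class AppError(Exception):
    """An expected failure that carries a status code and a user-safe message."""

    def __init__(self, status: int, user_message: str) -> None:
        super().__init__(user_message)
        self.status = status
        self.user_message = user_message


def handle_safely(handler: Callable[[dict], dict], request: dict) -> tuple[int, dict]:
    """Run a handler and convert any failure into a status code plus a safe body."""
    try:
        return 200, handler(request)
    except AppError as exc:
        # Expected failure: the message was written for the user.
        return exc.status, {"error": exc.user_message}
    except Exception:
        # Unexpected failure: keep the stack trace in the logs, not in the response.
        logger.exception("Unhandled error on %s", request.get("path"))
        return 500, {"error": "We hit an unexpected problem. Please try again in a minute."}


def get_invoice(request: dict) -> dict:
    """Hypothetical handler used to exercise the wrapper."""
    if "id" not in request:
        raise AppError(400, "Please include an invoice id.")
    if request["id"] != "inv_1":
        raise AppError(404, "We could not find that invoice.")
    return {"id": "inv_1", "total": 42}
```

Calling handle_safely(get_invoice, {}) yields a 400 with a readable message, while a handler that raises an unexpected exception yields a 500 whose body contains no stack trace.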

Stripe's early engineering culture had a rule: every error a user sees should be actionable. "Something went wrong" is not actionable. "Your card was declined — please try a different payment method" is.

Error handling priorities:
1. Payment and checkout flows (money is involved)
2. Authentication flows (users are locked out)
3. Data submission forms (users lose work)
4. Core feature paths (the reason users are here)
5. Everything else (handle it, but prioritize the above)
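
The checklist item about failed external API calls can be sketched as a small retry helper with exponential backoff. The function name, the default of three attempts, and the choice of which exceptions count as retryable are all assumptions for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, retrying on connection-style errors with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except (ConnectionError, TimeoutError):
            if attempt == attempts:
                raise  # Out of retries: let the caller show a graceful error.
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, then 1s, then 2s, ...
    raise ValueError("attempts must be at least 1")
```

Retry only operations that are safe to repeat. Retrying a call that charges a card can charge it twice unless the payment provider supports idempotency keys.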

Logging

You cannot debug production problems without logs. Your prototype probably uses console.log. Production needs structured, persistent logging.

Logging checklist:
- Application logs are persisted somewhere you can read them later (stdout is enough only if your platform captures and retains it)
- Log levels are used correctly (error, warn, info, debug)
- Requests are logged with method, path, status code, and duration
- Errors include stack traces and relevant context
- Sensitive data is NOT logged (passwords, tokens, PII)
- Logs are searchable (use a log aggregation service or structured format)

You do not need a fancy logging infrastructure on day one. Writing structured JSON logs to stdout and having your hosting platform capture them is enough. Railway, Render, Fly.io, and Heroku all capture stdout logs.
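
A minimal sketch of structured JSON logging to stdout, using only the Python standard library. The JsonFormatter class, the build_logger helper, and the "context" field convention are assumptions made for this example, not part of any hosting platform's API.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Format each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Fields passed as extra={"context": {...}} land on the record as attributes.
        entry.update(getattr(record, "context", {}))
        if record.exc_info:
            entry["stack"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def build_logger(name: str = "app") -> logging.Logger:
    """Create a logger that writes JSON lines to stdout."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # Avoid duplicate lines if called twice.
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.propagate = False
    return logger


log = build_logger()
log.info(
    "request finished",
    extra={"context": {"method": "GET", "path": "/health", "status": 200, "duration_ms": 12}},
)
```

Because every line is a self-contained JSON object, whichever log service you pick later can filter on fields such as status or duration_ms without custom parsing.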

Minimum logging for production:
- Every incoming request: method, path, status, duration
- Every error: message, stack trace, request context
- Every external API call: service, endpoint, status, duration
- Every background job: job type, status, duration
- Authentication events: login, logout, failed attempt

Do NOT log:
- Passwords or password hashes
- Full credit card numbers
- Session tokens or API keys
- Personal data beyond what is needed for debugging
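
One hedged way to enforce the list above is to scrub payloads before they reach the logger. The key names in SENSITIVE_KEYS are hypothetical examples; the right set depends on your own request and response shapes.

```python
from typing import Any

# Hypothetical field names; extend the set to match your own payloads.
SENSITIVE_KEYS = {"password", "password_hash", "token", "api_key", "card_number", "authorization"}


def redact(value: Any) -> Any:
    """Return a copy of value with sensitive fields replaced, recursing into dicts and lists."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if str(key).lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value
```

Key matching is exact and case-insensitive here, so a field called user_token would slip through. Treat this as a safety net, not a substitute for deciding what to log in the first place.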

Papertrail, Logtail, and Datadog all have free or cheap tiers suitable for early-stage startups. Pick one and set it up before you launch.

Monitoring

Monitoring tells you when something is wrong before your users tell you. This is non-negotiable for production.

Monitoring checklist:
- Uptime check (is the site reachable?)
- Error rate tracking (are errors increasing?)
- Response time tracking (are pages getting slower?)
- Database health (are connections available? are queries slow?)
- Disk space (are you running out?)
- SSL certificate expiry (will your cert expire without warning?)
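
The SSL certificate expiry item can be checked with a few lines of standard-library Python if your tooling does not cover it. This is a sketch: the function names and the 14-day warning threshold are assumptions.

```python
import socket
import ssl
import time


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect to host over TLS and return the certificate's notAfter string."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: float) -> float:
    """Convert a notAfter string such as 'Jan  5 09:34:43 2018 GMT' to days remaining."""
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def report(host: str, warn_days: int = 14) -> str:
    """Return a one-line status for the host's certificate (makes a network call)."""
    remaining = days_until_expiry(fetch_not_after(host), time.time())
    label = "WARN" if remaining < warn_days else "OK"
    return f"{label} {host}: {remaining:.0f} days left"
```

Running report against your own domain from a daily scheduled job, and alerting on any WARN line, is enough to avoid a surprise expiry.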

The minimum viable monitoring stack costs nothing.

Free monitoring stack:
- Uptime: UptimeRobot (free, 50 monitors, 5-minute checks)
- Errors: Sentry (free tier, 5K events per month)
- Metrics: your hosting platform's built-in dashboard
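- SSL expiry: handled for you when the platform manages certificates; otherwise add an expiry check (on some uptime tools, UptimeRobot included, this may require a paid plan)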

Total cost: $0
Setup time: 1-2 hours

BetterStack, Checkly, and Grafana Cloud also have free tiers if you want more sophisticated monitoring later. But start with the basics.

Backups

Your database will eventually have data that cannot be recreated. Customer data, transaction records, configuration. If you lose it, you might lose the business.

Backup checklist:
- Automated daily database backups
- Backups stored in a different location than the database
- Backup retention for at least 30 days
- Tested restore process (actually try restoring from a backup)
- File storage backups (if users upload files)

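Most managed database services include automated backups or point-in-time recovery. Supabase, PlanetScale, Neon, and AWS RDS all offer one or the other, but coverage and retention vary by plan, so confirm that backups are actually enabled for your tier. If you are running your own PostgreSQL, set up pg_dump on a cron job and store the output in S3.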
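
For the self-hosted case, the pg_dump-to-S3 job might look like the sketch below. It assumes pg_dump and the AWS CLI are installed on the server with credentials for the bucket; the /tmp path, the postgres/ key prefix, and the function names are hypothetical.

```python
import subprocess
from datetime import datetime, timezone


def backup_commands(database_url: str, bucket: str, now: datetime) -> list[list[str]]:
    """Build the pg_dump and upload commands for one timestamped backup."""
    stamp = now.strftime("%Y-%m-%dT%H-%M-%SZ")
    dump_path = f"/tmp/backup-{stamp}.dump"
    return [
        # Custom format is compressed and restorable with pg_restore.
        ["pg_dump", "--format=custom", f"--file={dump_path}", f"--dbname={database_url}"],
        # Assumes the AWS CLI is installed and has credentials for the bucket.
        ["aws", "s3", "cp", dump_path, f"s3://{bucket}/postgres/backup-{stamp}.dump"],
    ]


def run_backup(database_url: str, bucket: str) -> None:
    """Run each command in order; check=True stops at the first failure."""
    for command in backup_commands(database_url, bucket, datetime.now(timezone.utc)):
        subprocess.run(command, check=True)
```

A daily cron entry would call run_backup with the production database URL and bucket name. Passing a URL that contains a password on the command line exposes it to other users on the host, so a .pgpass file or the PGPASSWORD environment variable is the safer choice there.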

Backup reality check:
- A backup you have never restored is not a backup
- A backup on the same server as the database is not a backup
- A backup with no retention policy is a ticking time bomb
- Test your restore process at least once before launch
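
A retention policy can be as small as one function that decides which backups are old enough to delete. The sketch below assumes you can list backups as a mapping of name to creation time; the function name is hypothetical, and the 30-day default mirrors the checklist.

```python
from datetime import datetime, timedelta, timezone


def backups_to_delete(
    backups: dict[str, datetime], now: datetime, keep_days: int = 30
) -> list[str]:
    """Return names of backups older than keep_days, never deleting the newest one."""
    if not backups:
        return []
    newest = max(backups, key=backups.get)
    cutoff = now - timedelta(days=keep_days)
    return sorted(name for name, taken in backups.items() if taken < cutoff and name != newest)
```

Keeping the newest backup regardless of age is a deliberate guard: if the backup job silently stops, the pruning job will not remove your last good copy.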

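GitLab lost about six hours of production data in 2017 after an engineer accidentally deleted the primary database directory. By the company's own account, of the five backup and replication mechanisms in place, none was working reliably or set up correctly when it was needed; recovery came from a snapshot that happened to have been taken a few hours earlier. Test your backups.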

Deploy Process

Manual deployment is a source of human error. Your prototype might deploy by SSHing into a server and running git pull. Production needs something more reliable.

Deploy process checklist:
- Deployment is a single command or automatic on merge
- Rollback is possible (revert to previous version quickly)
- Environment variables are not hardcoded
- Database migrations run as part of the deploy
- Zero-downtime deploys (or at minimum, very brief downtime)
- Deploy logs exist (who deployed what and when)

For most startups, the simplest path is a PaaS that deploys on git push. Railway, Render, and Vercel all do this: push to main, and the platform builds and deploys. Fly.io gets you the same result with a single flyctl deploy command.

Deploy process by platform:
- Vercel: git push triggers deploy, automatic preview for PRs
- Railway: git push triggers deploy, instant rollback
- Render: git push triggers deploy, automatic SSL
- Fly.io: flyctl deploy command, multi-region
- Heroku: git push heroku main, simple rollback
- VPS: set up a GitHub Action to SSH and deploy

If you are on a VPS, a simple GitHub Actions workflow that SSHs into your server, pulls the latest code, and restarts the service is far less error-prone than doing it manually.
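
The script that such a workflow runs on the server could be sketched as follows. Everything specific here is an assumption: the migration script path, the myapp service name, and the idea that the caller records the previous commit (for example with git rev-parse HEAD) before deploying.

```python
import subprocess
from typing import Callable, Sequence


def run(command: Sequence[str]) -> bool:
    """Run one command and report whether it exited successfully."""
    return subprocess.run(command).returncode == 0


def deploy(
    previous_commit: str,
    healthy: Callable[[], bool],
    run_step: Callable[[Sequence[str]], bool] = run,
) -> str:
    """Run each step in order; roll back if a step or the health check fails."""
    steps = [
        ["git", "pull", "--ff-only"],
        ["./scripts/migrate.sh"],  # hypothetical migration script
        ["sudo", "systemctl", "restart", "myapp"],  # hypothetical service name
    ]
    # all() stops at the first failing step, so later steps are skipped.
    if all(run_step(step) for step in steps) and healthy():
        return "deployed"
    # Roll back code only; a migration that already ran needs its own down step.
    run_step(["git", "reset", "--hard", previous_commit])
    run_step(["sudo", "systemctl", "restart", "myapp"])
    return "rolled back"
```

The healthy callback is where a request to your health check endpoint would go. Taking the step runner as a parameter keeps the sequence testable without touching a real server.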

Health Checks

A health check endpoint tells your load balancer, monitoring service, and deployment platform whether your application is working.

Health check implementation:
- Create a /health endpoint that returns 200 when the app is healthy
- Check database connectivity in the health check
- Check critical external service connectivity (if applicable)
- Return 503 if any critical dependency is down
- Keep the health check fast (under 1 second)

Example health check response:
Status 200
{
  "status": "healthy",
  "database": "connected",
  "version": "1.2.3",
  "uptime": "3d 14h"
}

Your monitoring service pings this endpoint. Your load balancer uses it to route traffic. Your deploy process uses it to confirm the new version is working.
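
The logic behind such an endpoint can be sketched without committing to a web framework. The health function and its checks argument are hypothetical; each check is assumed to be a fast callable that returns True when its dependency is reachable. This sketch reports uptime in seconds rather than as a formatted string.

```python
import json
import time
from typing import Callable

STARTED_AT = time.monotonic()


def health(checks: dict[str, Callable[[], bool]], version: str) -> tuple[int, str]:
    """Run each dependency check and return an HTTP status code and JSON body."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "connected" if check() else "unavailable"
        except Exception:
            # A check that raises counts as a failed dependency, not a crash.
            results[name] = "unavailable"
    ok = all(state == "connected" for state in results.values())
    body = {
        "status": "healthy" if ok else "unhealthy",
        **results,
        "version": version,
        "uptime_seconds": int(time.monotonic() - STARTED_AT),
    }
    return (200 if ok else 503), json.dumps(body)
```

A route handler for /health would call health({"database": ping_database}, "1.2.3"), where ping_database is your own short query such as SELECT 1 with a tight timeout, and return the status and body unchanged.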

SSL & Environment Configuration

SSL (strictly speaking TLS, its successor) is covered in the security chapter, but it belongs on the production checklist too. No exceptions.

SSL checklist:
- All traffic served over HTTPS
- HTTP redirects to HTTPS
- SSL certificate auto-renews (Let's Encrypt or platform-managed)
- HSTS header set (tells browsers to always use HTTPS)
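
If your platform does not handle the redirect and HSTS header for you, the two behaviors reduce to a few lines. The function names and the one-year max-age are assumptions; behind a proxy or load balancer, the scheme usually has to be read from the X-Forwarded-Proto header.

```python
from typing import Optional

HSTS_VALUE = "max-age=31536000; includeSubDomains"  # one year, in seconds


def enforce_https(scheme: str, host: str, path: str) -> Optional[tuple[int, dict[str, str]]]:
    """Return a redirect response for plain HTTP requests, or None to continue."""
    if scheme != "https":
        return 301, {"Location": f"https://{host}{path}"}
    return None


def add_hsts(headers: dict[str, str]) -> dict[str, str]:
    """Return a copy of the response headers with HSTS set."""
    return {**headers, "Strict-Transport-Security": HSTS_VALUE}
```

Start with a short max-age while testing. Browsers remember the HSTS policy for the full duration, so a mistake with a one-year value is slow to undo.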

Environment configuration ensures your application behaves correctly in each environment without code changes.

Environment configuration checklist:
- All secrets in environment variables (not in code)
- Database URLs configured per environment
- API keys for third-party services per environment
- Feature flags or environment-specific behavior clearly separated
- .env file for local development, platform config for production
- No hardcoded URLs (use environment variables for API endpoints)
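
A minimal sketch of this checklist in code: read every setting from the environment once at startup and fail fast if anything required is missing. The variable names (DATABASE_URL, PAYMENT_API_KEY, API_BASE_URL, NEW_CHECKOUT_ENABLED) are hypothetical placeholders.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    database_url: str
    payment_api_key: str
    api_base_url: str
    new_checkout_enabled: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from the environment and fail fast if a required one is missing."""
    required = ["DATABASE_URL", "PAYMENT_API_KEY", "API_BASE_URL"]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return Settings(
        database_url=env["DATABASE_URL"],
        payment_api_key=env["PAYMENT_API_KEY"],
        api_base_url=env["API_BASE_URL"],
        # Feature flag: off unless explicitly set to "true".
        new_checkout_enabled=env.get("NEW_CHECKOUT_ENABLED", "false").lower() == "true",
    )
```

Failing at startup means a misconfigured deploy never starts serving traffic, which is far easier to diagnose than an error on the first request that needs the missing value.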

The Pre-Launch Checklist

Putting it all together. Run through this before your first real user touches the system.

Pre-launch production checklist:

Security:
[ ] HTTPS enabled and enforced
[ ] Passwords hashed with bcrypt or argon2
[ ] No secrets in source code
[ ] SQL injection prevented (parameterized queries or ORM)
[ ] .env in .gitignore

Reliability:
[ ] Error handling on all critical paths
[ ] Graceful error messages for users
[ ] Health check endpoint exists and works
[ ] Automated database backups configured
[ ] Backup restore tested at least once

Observability:
[ ] Structured logging in place
[ ] Error tracking set up (Sentry or equivalent)
[ ] Uptime monitoring configured
[ ] Basic response time tracking

Deployment:
[ ] Deploy is automated (git push or single command)
[ ] Rollback process documented and tested
[ ] Environment variables configured for production
[ ] Database migrations run automatically
[ ] Zero-downtime deploy or brief maintenance window

Operations:
[ ] Domain and DNS configured
[ ] SSL certificate auto-renews
[ ] On-call contact method (even if it is just your phone)
[ ] Customer communication channel for outages (even if it is email)
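
For the SQL injection item in the Security group, the difference between safe and unsafe queries is easiest to see in code. The sketch below uses Python's built-in sqlite3 module with a hypothetical users table; other database drivers follow the same pattern with their own placeholder syntax.

```python
import sqlite3


def find_user(conn: sqlite3.Connection, email: str):
    """Look up a user with a parameterized query instead of string formatting."""
    # The ? placeholder keeps user input as data, never as SQL text.
    return conn.execute("SELECT id, email FROM users WHERE email = ?", (email,)).fetchone()


# In-memory database with one hypothetical user, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users (email) VALUES (?)", ("ada@example.com",))
```

A classic injection string such as ' OR '1'='1 is treated as a literal email address here and matches nothing. Built into the query with string formatting, the same input would match every row.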

What Can Wait

Not everything needs to be perfect on day one. Some production concerns can be deferred until you have more users and more revenue.

Can wait until later:
- Automated testing pipeline (test critical paths manually for now)
- Staging environment (deploy to production, fix forward)
- Performance optimization (optimize when you measure problems)
- Multi-region deployment (one region is fine)
- Horizontal scaling (one server is fine for a while)
- Comprehensive documentation (comments on the confusing parts)
- Formal incident response plan (know who to call)

These are all important. They are just not launch-blocking.

Common Pitfalls

Launching without monitoring. You will find out about outages from customer emails instead of alerts. This is embarrassing and avoidable with a free UptimeRobot account.

Launching without backups. Your database will have a problem eventually. A corrupted migration, an accidental deletion, a disk failure. Without backups, this is catastrophic.

Over-engineering the deploy process. You do not need Kubernetes, blue-green deployments, or canary releases. You need git push to trigger a deploy that you can roll back. Start there.

Forgetting about rollback. Every deploy should have a rollback plan. "Revert the last commit and deploy again" is a fine plan. Having no plan is not.

Not testing on a device other than your own. Your app works on your MacBook in Chrome. Does it work on a phone? In Safari? On a slow connection? Test on at least one mobile device and one non-Chrome browser before launch.

Key Takeaways

- The gap between prototype and production is error handling, logging, monitoring, backups, deploys, health checks, SSL, and environment config.
- None of this is glamorous. All of it is necessary.
- The minimum viable production stack costs $0 using free tiers of monitoring and error tracking services.
- Test your backups. A backup you have never restored is not a backup.
- Automate your deploys. Manual deployment is a source of human error that compounds with every deploy.
- Launch with the checklist complete, not with every feature polished. Reliability beats features for your first real users.