Fair Usage & Quotas
Rate Limits vs Quotas
Rate limiting and quotas solve different problems. Rate limiting protects the server from being overwhelmed in the short term. Quotas protect the business model by controlling total consumption over longer periods.
Rate limit: 100 requests per minute (protects infrastructure)
Quota: 10,000 requests per day (protects the business)
A client can respect rate limits perfectly — never exceeding 100 requests per minute — and still exhaust their daily quota of 10,000 requests in under two hours. The two mechanisms work together.
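The arithmetic behind that claim is worth spelling out. The following is a minimal sketch, with a hypothetical helper name, of how long a daily quota lasts when a client sends at exactly the rate limit:

```python
def minutes_to_exhaust(daily_quota: int, rate_limit_per_minute: int) -> float:
    """Minutes needed to use up the daily quota when sending at exactly
    the per-minute rate limit (never exceeding it)."""
    if rate_limit_per_minute <= 0:
        raise ValueError("rate_limit_per_minute must be positive")
    return daily_quota / rate_limit_per_minute


minutes = minutes_to_exhaust(daily_quota=10_000, rate_limit_per_minute=100)
print(f"{minutes:.0f} minutes, about {minutes / 60:.2f} hours")
```

With the numbers above the quota lasts 100 minutes, which is 1 hour 40 minutes of fully compliant traffic.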
Quota Tiers
Most API-based products structure their pricing around usage quotas tied to subscription tiers.
A Typical Tier Structure
{
  "tiers": {
    "free": {
      "price": 0,
      "daily_requests": 100,
      "monthly_requests": 1000,
      "rate_limit_per_minute": 10,
      "features": ["basic_endpoints"]
    },
    "starter": {
      "price": 29,
      "daily_requests": 10000,
      "monthly_requests": 200000,
      "rate_limit_per_minute": 60,
      "features": ["basic_endpoints", "search", "webhooks"]
    },
    "professional": {
      "price": 99,
      "daily_requests": 100000,
      "monthly_requests": 2000000,
      "rate_limit_per_minute": 600,
      "features": ["basic_endpoints", "search", "webhooks", "bulk_operations", "analytics"]
    },
    "enterprise": {
      "price": "custom",
      "daily_requests": "custom",
      "monthly_requests": "custom",
      "rate_limit_per_minute": "custom",
      "features": ["all"]
    }
  }
}
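A tier table like this typically drives feature gating as well as limits. As a rough sketch, assuming the "all" entry is meant as a wildcard (the structure does not say so explicitly) and using a hypothetical `has_feature` helper:

```python
# Trimmed copy of the tier structure; only the fields this sketch needs.
TIERS = {
    "free": {"rate_limit_per_minute": 10, "features": ["basic_endpoints"]},
    "starter": {
        "rate_limit_per_minute": 60,
        "features": ["basic_endpoints", "search", "webhooks"],
    },
    "professional": {
        "rate_limit_per_minute": 600,
        "features": ["basic_endpoints", "search", "webhooks",
                     "bulk_operations", "analytics"],
    },
    "enterprise": {"rate_limit_per_minute": "custom", "features": ["all"]},
}


def has_feature(tier_name: str, feature: str) -> bool:
    """True if the tier includes the feature.

    Assumption: the special entry "all" grants every feature.
    """
    features = TIERS[tier_name]["features"]
    return "all" in features or feature in features


print(has_feature("starter", "webhooks"))
print(has_feature("free", "search"))
```

Keeping the check in one place means an endpoint handler only needs the client's tier name to decide whether to serve the request.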
Real-World Examples
OpenAI API: Tiers based on usage and payment history. For example, free tier: 3 requests/minute; Tier 1 (after first payment): 500 requests/minute. Exact figures vary by model and change over time. Higher tiers unlock as spending increases.
Google Maps Platform: $200 free monthly credit. After that, pay-per-request pricing with per-API quotas (e.g., Geocoding: 50 requests/second).
Twilio: Pay-per-message with account-level rate limits. No daily quota — usage is metered and billed directly.
Burst Limits vs Sustained Limits
Rate limits themselves operate at two time scales: burst limits cap short-term spikes, and sustained limits cap throughput over longer windows. The daily quota then sits on top of both.
How They Interact
Burst limit: 100 requests/second (max instantaneous rate)
Sustained limit: 10,000 requests/hour (max sustained rate)
Daily quota: 100,000 requests/day (max total consumption)
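A client sending 100 requests/second can sustain that for only 100 seconds before hitting the hourly sustained limit (100 × 100 = 10,000). Because no single hour can carry more than 10,000 requests, the 100,000-request daily quota takes at least 10 hours to exhaust, so a client cannot burn through it in a single burst.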
Time 0:00 100 req/s -> allowed (within burst limit)
Time 0:01 100 req/s -> allowed
...
Time 1:40 100 req/s -> blocked (10,000 hourly limit reached after 100 seconds)
Time 1:00:00 Hourly limit resets, requests resume
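The timeline above can be reproduced with a small simulation. This is a sketch only: it assumes fixed windows aligned to the clock (second, hour, day), which is one of several possible windowing schemes, and the `LayeredLimiter` class is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LayeredLimiter:
    # Window length in seconds -> maximum requests allowed per window.
    limits: dict
    # (window length, window index) -> requests counted so far.
    # Old windows are never pruned here; a real limiter would expire them.
    counts: dict = field(default_factory=dict)

    def allow(self, now: int) -> bool:
        """Allow the request only if every layer still has room."""
        keys = [(window, now // window) for window in self.limits]
        if any(self.counts.get(k, 0) >= self.limits[k[0]] for k in keys):
            return False
        for k in keys:
            self.counts[k] = self.counts.get(k, 0) + 1
        return True


limiter = LayeredLimiter({1: 100, 3600: 10_000, 86_400: 100_000})

# Send 100 requests in each of the first 102 seconds.
allowed_per_second = [
    sum(limiter.allow(second) for _ in range(100)) for second in range(102)
]
print(allowed_per_second[99], allowed_per_second[100])
print(limiter.allow(3600))  # first request of the next hourly window
```

Seconds 0 through 99 are fully served, second 100 is fully blocked by the hourly layer, and requests resume once the next hourly window starts.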
AWS API Gateway Example
AWS API Gateway throttling uses both (field names simplified here; the rate limit is expressed in requests per second):
{
  "throttle": {
    "burst_limit": 5000,
    "rate_limit": 10000
  }
}
The burst limit (token bucket capacity) allows short spikes. The rate limit (tokens per second) controls sustained throughput.
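To make the relationship between the two numbers concrete, here is a generic token bucket sketch. It illustrates the technique only and is not AWS's implementation; the class and parameter names are assumptions, and time is passed in explicitly so the behaviour is easy to follow.

```python
class TokenBucket:
    """Generic token bucket: capacity bounds bursts, refill rate bounds
    sustained throughput."""

    def __init__(self, capacity: float, refill_per_second: float, now: float = 0.0):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = capacity  # start full, so an initial burst is allowed
        self.last = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_second)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(capacity=5000, refill_per_second=10_000)

# 6,000 requests arrive at the same instant: only the bucket capacity passes.
burst = sum(bucket.allow(now=0.0) for _ in range(6000))

# A quarter of a second later, 0.25 s * 10,000 tokens/s have been refilled.
after_refill = sum(bucket.allow(now=0.25) for _ in range(6000))
print(burst, after_refill)
```

The first spike is capped by the bucket capacity, and everything after that is paced by the refill rate.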
Usage Tracking & Billing
Tracking Consumption
Every request increments a counter. For quota enforcement, these counters must be accurate, durable, and fast.
{
  "client_id": "app_456",
  "period": "2024-03-22",
  "usage": {
    "total_requests": 7432,
    "read_requests": 6891,
    "write_requests": 541,
    "search_requests": 204,
    "bytes_transferred": 15728640
  },
  "quota": {
    "daily_limit": 10000,
    "remaining": 2568
  }
}
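A stripped-down version of such a counter might look like the sketch below. It keeps counts in process memory purely for illustration, so it is fast but not durable; a production system would need a shared, persistent store. The `UsageTracker` class is hypothetical, and it assumes timezone-aware timestamps and UTC day boundaries.

```python
from collections import defaultdict
from datetime import datetime, timezone


class UsageTracker:
    """In-memory per-client, per-day request counter (illustrative only)."""

    def __init__(self, daily_limit: int):
        self.daily_limit = daily_limit
        self._counts = defaultdict(int)  # (client_id, "YYYY-MM-DD") -> requests

    def record(self, client_id: str, when: datetime) -> dict:
        """Count one request and return a usage snapshot.

        `when` must be timezone-aware; the period is the UTC calendar day.
        """
        period = when.astimezone(timezone.utc).strftime("%Y-%m-%d")
        self._counts[(client_id, period)] += 1
        used = self._counts[(client_id, period)]
        return {
            "client_id": client_id,
            "period": period,
            "usage": {"total_requests": used},
            "quota": {
                "daily_limit": self.daily_limit,
                "remaining": max(0, self.daily_limit - used),
            },
        }


tracker = UsageTracker(daily_limit=10_000)
ts = datetime(2024, 3, 22, 12, 0, tzinfo=timezone.utc)
for _ in range(3):
    snapshot = tracker.record("app_456", ts)
print(snapshot)
```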
Metered vs Prepaid Models
Prepaid (quota-based): Pay for a tier, get a fixed allocation. Simple, predictable billing. If you exceed the quota, requests are rejected or overage fees apply.
Metered (pay-per-use): Pay for what you consume. No hard caps, but costs can surprise you. AWS, Twilio, and most cloud providers use this model.
Hybrid: A base quota included in the subscription, with metered pricing for overages. (Stripe sits outside all three models: its API has no request quotas, and it charges per transaction rather than per API call.)
{
  "billing": {
    "plan": "starter",
    "base_requests_included": 200000,
    "overage_price_per_1000": 0.50,
    "current_month_usage": 187432,
    "overage_requests": 0,
    "estimated_overage_cost": 0
  }
}
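The overage fields in a record like this follow directly from usage and the included allocation. A minimal sketch, assuming overage is prorated per request rather than billed in whole blocks of 1,000 (the billing record does not specify which), with a hypothetical function name:

```python
def estimate_overage(current_usage: int, included: int, price_per_1000: float):
    """Return (overage_requests, estimated_overage_cost).

    Assumption: cost is prorated per request, not rounded up to full
    blocks of 1,000 requests.
    """
    overage = max(0, current_usage - included)
    cost = round(overage / 1000 * price_per_1000, 2)
    return overage, cost


# Still inside the included allocation: no overage yet.
print(estimate_overage(187_432, 200_000, 0.50))

# 67,000 requests over the allocation at $0.50 per 1,000.
print(estimate_overage(267_000, 200_000, 0.50))
```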
Quota Headers
Communicate quota status alongside rate limit headers:
HTTP/1.1 200 OK
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 44
X-RateLimit-Reset: 1711123200
X-Quota-Limit: 10000
X-Quota-Remaining: 2568
X-Quota-Reset: 1711152000
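On the client side, these headers are enough to decide when the next request is worth sending. The helper below is a sketch under a few assumptions: reset values are Unix timestamps in seconds, the headers arrive in a plain dict with exactly this casing (real HTTP header names are case-insensitive), and a missing header means "no information, do not wait".

```python
def seconds_to_wait(headers: dict, now: int) -> int:
    """Seconds a client should pause before its next request.

    The quota is checked first because its reset is usually much
    further away than the rate limit window.
    """

    def remaining(prefix: str) -> int:
        return int(headers.get(f"{prefix}-Remaining", 1))

    def reset(prefix: str) -> int:
        return int(headers.get(f"{prefix}-Reset", now))

    for prefix in ("X-Quota", "X-RateLimit"):
        if remaining(prefix) <= 0:
            return max(0, reset(prefix) - now)
    return 0


# Hypothetical values: rate limit window exhausted, quota still available.
headers = {
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset": "1030",
    "X-Quota-Remaining": "2568",
    "X-Quota-Reset": "5000",
}
print(seconds_to_wait(headers, now=1000))
```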
When the quota is exhausted, return 429 with a clear distinction from rate limit rejections:
{
  "error": {
    "type": "quota_exceeded",
    "status": 429,
    "message": "Daily API quota exceeded. Your plan allows 10,000 requests per day. Upgrade your plan or wait until midnight UTC for the quota to reset.",
    "quota_limit": 10000,
    "quota_used": 10000,
    "quota_resets_at": "2024-03-23T00:00:00Z",
    "upgrade_url": "https://dashboard.example.com/billing/upgrade"
  }
}
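A server could assemble this body with a small helper. The sketch below assumes daily quotas reset at the next midnight UTC, as the message text states; the function name is hypothetical and `now` must be timezone-aware.

```python
from datetime import datetime, timedelta, timezone


def quota_exceeded_error(limit: int, used: int, now: datetime, upgrade_url: str) -> dict:
    """Build a quota_exceeded error body with the next midnight-UTC reset."""
    tomorrow = now.astimezone(timezone.utc).date() + timedelta(days=1)
    resets_at = datetime(tomorrow.year, tomorrow.month, tomorrow.day,
                         tzinfo=timezone.utc)
    return {
        "error": {
            "type": "quota_exceeded",
            "status": 429,
            "message": (
                f"Daily API quota exceeded. Your plan allows {limit:,} requests "
                "per day. Upgrade your plan or wait until midnight UTC for the "
                "quota to reset."
            ),
            "quota_limit": limit,
            "quota_used": used,
            "quota_resets_at": resets_at.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "upgrade_url": upgrade_url,
        }
    }


error = quota_exceeded_error(
    limit=10_000,
    used=10_000,
    now=datetime(2024, 3, 22, 18, 30, tzinfo=timezone.utc),
    upgrade_url="https://dashboard.example.com/billing/upgrade",
)
print(error["error"]["quota_resets_at"])
```

The distinct `type` value is what lets clients tell a quota rejection apart from an ordinary rate limit rejection, since both use status 429.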
Communicating Limits to Developers
Documentation
Publish limits prominently. Do not bury them in fine print.
## Rate Limits & Quotas
| Plan | Requests/Minute | Requests/Day | Requests/Month |
|-------------|----------------|-------------|----------------|
| Free | 10 | 100 | 1,000 |
| Starter | 60 | 10,000 | 200,000 |
| Professional | 600 | 100,000 | 2,000,000 |
| Enterprise | Custom | Custom | Custom |
All plans are subject to a burst limit of 10x the per-minute rate.
Write endpoints have a separate limit at 1/10 of the read limit.
Response Headers
Every response should include both rate limit and quota headers. Clients should not have to guess their remaining budget.
Developer Dashboard
Provide a dashboard showing real-time and historical usage:
{
  "dashboard": {
    "current_period": "2024-03-01 to 2024-03-31",
    "usage_by_day": [
      {"date": "2024-03-20", "requests": 8742},
      {"date": "2024-03-21", "requests": 9156},
      {"date": "2024-03-22", "requests": 7432}
    ],
    "usage_by_endpoint": [
      {"endpoint": "GET /users", "requests": 15234},
      {"endpoint": "POST /orders", "requests": 3891},
      {"endpoint": "GET /search", "requests": 2107}
    ],
    "projected_monthly_usage": 267000,
    "quota_monthly_limit": 200000,
    "projected_overage": 67000
  }
}
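How a projection like `projected_monthly_usage` is computed is up to the provider. One simple option, shown here as an assumption rather than the method behind the figures above, is linear extrapolation from the month-to-date total; the input numbers are hypothetical.

```python
import calendar
from datetime import date


def project_monthly_usage(usage_so_far: int, today: date) -> int:
    """Linear projection: average daily usage so far times days in the month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return round(usage_so_far / today.day * days_in_month)


def projected_overage(projected: int, monthly_limit: int) -> int:
    """Requests expected beyond the monthly quota, never negative."""
    return max(0, projected - monthly_limit)


# Hypothetical month-to-date usage of 110,000 requests on 22 March.
projected = project_monthly_usage(110_000, date(2024, 3, 22))
print(projected, projected_overage(projected, 200_000))
```

A linear projection ignores weekly cycles and growth trends, so it is best presented to developers as an estimate.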
Stripe's dashboard shows API request volume over time, broken down by endpoint and status code. Twilio shows usage with cost projections.
Approaching-Limit Warnings
Notify developers before they hit the wall:
X-Quota-Warning: You have used 80% of your daily quota (8,000/10,000)
Send email notifications at 50%, 80%, and 100% of quota. Developers should not discover quota exhaustion from production errors.
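One practical detail is sending each notification once, when a threshold is first crossed, rather than on every request above it. A minimal sketch with a hypothetical helper, using integer percentages to avoid floating-point edge cases:

```python
def crossed_thresholds(previous_used: int, current_used: int, limit: int,
                       thresholds=(50, 80, 100)) -> list:
    """Percent thresholds crossed between two consecutive usage readings.

    A threshold counts as crossed when usage was below it before and is
    at or above it now, so each notification fires exactly once per period.
    """
    return [
        pct for pct in thresholds
        if previous_used * 100 < pct * limit <= current_used * 100
    ]


print(crossed_thresholds(7_999, 8_000, limit=10_000))
print(crossed_thresholds(8_000, 8_001, limit=10_000))
```

The threshold list is a parameter, so the same check works for whatever warning levels a provider chooses.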
Graceful Degradation
When a client approaches or exceeds limits, the API should degrade gracefully rather than crash or return unhelpful errors.
Degradation Strategies
Reduce response detail: Return summary data instead of full objects when the client is near their limit.
{
  "data": [
    {"id": "user_123", "name": "Jane Smith"},
    {"id": "user_456", "name": "John Doe"}
  ],
  "meta": {
    "degraded": true,
    "reason": "Approaching quota limit. Full user objects are not included. Upgrade your plan for complete responses."
  }
}
Disable expensive features: If the client exceeds a threshold, disable search, analytics, or other high-cost endpoints while keeping basic read operations available.
Queue instead of reject: For non-time-sensitive operations (reports, exports, bulk operations), accept the request and process it when capacity is available.
{
  "status": "queued",
  "message": "Your export request has been queued due to high demand. You will be notified when it is ready.",
  "estimated_completion": "2024-03-22T13:00:00Z",
  "status_url": "/api/v1/exports/export_789/status"
}
Return cached data: Serve stale cached data with a header indicating it is not fresh, rather than rejecting the request entirely.
HTTP/1.1 200 OK
X-Cache: HIT
X-Cache-Age: 300
Warning: 110 - "Response is stale"
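These strategies can be combined into a single decision point. The sketch below is one possible ordering, not a prescribed policy: the 90% "near limit" threshold, the preference for queueing over cached data, and all names are assumptions made for illustration.

```python
from enum import Enum


class Mode(Enum):
    FULL = "full"        # normal response
    SUMMARY = "summary"  # reduced response detail
    QUEUE = "queue"      # accept now, process later
    CACHED = "cached"    # serve stale cached data, flagged as stale
    REJECT = "reject"    # 429 with a clear quota_exceeded error


def choose_mode(used: int, limit: int, *, deferrable: bool = False,
                cache_available: bool = False, near_limit_pct: int = 90) -> Mode:
    """Pick a degradation strategy from current usage and request traits."""
    if used < limit:
        # Under the limit: degrade detail only when close to it.
        if used * 100 >= near_limit_pct * limit:
            return Mode.SUMMARY
        return Mode.FULL
    # At or over the limit: prefer any useful answer over a rejection.
    if deferrable:
        return Mode.QUEUE
    if cache_available:
        return Mode.CACHED
    return Mode.REJECT


print(choose_mode(9_500, 10_000))
print(choose_mode(10_000, 10_000, deferrable=True))
```

Whichever mode is chosen, the response should say so explicitly, as the degraded, queued, and stale examples above do.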
What Not to Do
Never silently drop data or return partial results without indicating it. Never return a 200 OK with an error in the body. Never crash the entire service because one client exceeded their quota.
Common Pitfalls
Treating rate limits and quotas as the same thing. Rate limits protect infrastructure (requests per second). Quotas protect the business (total requests per billing period). You need both.
No grace period for quota overages. Cutting off a production application at exactly 10,000 requests with no warning is hostile. Provide warnings at 50%, 80%, and 90%. Consider a small overage buffer (5-10%) before hard cutoff.
Opaque billing. If developers cannot see their usage in real time, they cannot manage costs. Provide dashboards, usage APIs, and projected cost estimates.
Quota resets at inconvenient times. Resetting daily quotas at midnight in the provider's local timezone confuses clients in other timezones. Use midnight UTC and document it clearly.
Not offering a way to increase limits. Every quota-exceeded error should include a path to resolution: upgrade URL, sales contact, or self-service limit increase. Developers stuck at a hard wall with no way forward will switch to a competitor.
Conflating burst and sustained limits in documentation. "100 requests per minute" is ambiguous. Can I send 100 in the first second? Clearly separate burst limits (instantaneous maximum) from sustained limits (average rate) in documentation.
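The overage buffer mentioned in the grace-period pitfall is simple to express in code. This is a sketch with hypothetical names, assuming a 5% buffer and that `used` counts the requests already served before the current one:

```python
def quota_decision(used: int, limit: int, buffer_pct: int = 5) -> str:
    """Decide how to treat the next request given usage so far.

    Requests inside the quota are allowed; requests inside the grace
    buffer are allowed but flagged; anything beyond is rejected.
    """
    hard_cap = limit + limit * buffer_pct // 100
    if used < limit:
        return "allow"
    if used < hard_cap:
        return "allow_with_warning"
    return "reject"


print(quota_decision(9_999, 10_000))
print(quota_decision(10_200, 10_000))
print(quota_decision(10_500, 10_000))
```

Requests served from the buffer are a natural place to attach a warning header and trigger the final notification, so the hard cutoff never arrives unannounced.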
Key Takeaways
- Rate limits protect the server from short-term overload; quotas protect the business model by controlling total consumption over billing periods.
- Structure quota tiers with clear, predictable limits; combine burst limits for short-term spikes with sustained limits for long-term consumption.
- Communicate limits through documentation, response headers, and developer dashboards; notify developers before they hit the wall, not after.
- Degrade gracefully when clients approach limits: reduce detail, queue non-urgent work, serve cached data, but never silently drop data or crash.
- Every quota-exceeded error should include a clear path to resolution, whether that is an upgrade URL, a reset time, or a sales contact.