Implementing Rate Limits
Communicating Limits to Clients
Rate limiting without clear communication is a disservice to API consumers. Clients need to know their limits, their current usage, and what to do when they hit the ceiling. Standard HTTP headers and response bodies make this possible.
Rate Limit Headers
The de facto standard headers communicate rate limit status on every response, not just on rejections.
X-RateLimit-Limit
The maximum number of requests allowed in the current window.
X-RateLimit-Limit: 1000
X-RateLimit-Remaining
The number of requests remaining in the current window.
X-RateLimit-Remaining: 742
X-RateLimit-Reset
The time when the current window resets, typically as a Unix timestamp.
X-RateLimit-Reset: 1711036800
Some APIs use seconds until reset instead of a timestamp. GitHub uses Unix timestamps. Stripe uses seconds remaining. Pick one and document it.
Retry-After
Included only on 429 responses. Tells the client how long to wait before retrying, either as a number of seconds:
Retry-After: 30
Or as an HTTP date:
Retry-After: Sat, 22 Mar 2025 12:00:00 GMT
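Because Retry-After can arrive in either form, a client has to handle both. The following is a minimal Python sketch using only the standard library. The function name retry_after_seconds and its now parameter are illustrative assumptions, not part of any particular HTTP client.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: str, now: Optional[datetime] = None) -> float:
    """Return how many seconds to wait for a Retry-After header value.

    Accepts delay-seconds ("30") or an HTTP date
    ("Sat, 22 Mar 2025 12:00:00 GMT"). Never returns a negative number.
    """
    now = now or datetime.now(timezone.utc)
    value = value.strip()
    if value.isdigit():
        return float(value)
    # Raises an exception if the value is not a parseable date.
    target = parsedate_to_datetime(value)
    if target.tzinfo is None:
        # Treat a date without zone information as UTC (assumption).
        target = target.replace(tzinfo=timezone.utc)
    return max(0.0, (target - now).total_seconds())
```

A date that is already in the past yields 0, meaning the client may retry right away.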
A Complete Response
A successful request with rate limit headers:
HTTP/1.1 200 OK
Content-Type: application/json
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 742
X-RateLimit-Reset: 1711036800
{
"data": { ... }
}
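On the server side, the three headers can be derived from the limiter's state. The sketch below assumes a fixed window aligned to the Unix epoch and a known count of requests already used. The function name rate_limit_headers and its parameters are hypothetical.

```python
import time
from typing import Dict, Optional


def rate_limit_headers(limit: int, used: int, window_seconds: int,
                       now: Optional[float] = None) -> Dict[str, str]:
    """Build X-RateLimit-* headers for a fixed window aligned to the epoch."""
    now = time.time() if now is None else now
    # Start of the window that contains `now`.
    window_start = int(now // window_seconds) * window_seconds
    return {
        "X-RateLimit-Limit": str(limit),
        # Clamp at zero so an over-limit client never sees a negative value.
        "X-RateLimit-Remaining": str(max(0, limit - used)),
        # Unix timestamp at which the current window ends.
        "X-RateLimit-Reset": str(window_start + window_seconds),
    }
```

Attach the returned dictionary to every response, successful or not.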
The IETF RateLimit Header Fields Draft
An IETF draft (draft-ietf-httpapi-ratelimit-headers) proposes standard header fields to replace the X- prefix convention. Earlier revisions of the draft define three fields:
RateLimit-Limit: 1000
RateLimit-Remaining: 742
RateLimit-Reset: 258
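Under this draft, RateLimit-Reset is always seconds remaining, not a timestamp. The draft is not yet a published RFC, and its later revisions restructure these fields, so check the current revision before implementing it. Adoption is still growing, but new APIs should consider this format.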
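If the limiter stores the reset time as a Unix timestamp, producing the seconds-remaining form is a small conversion. This is a sketch under that assumption, using the three field names shown above. The function name draft_rate_limit_headers is hypothetical.

```python
import math
import time
from typing import Dict, Optional


def draft_rate_limit_headers(limit: int, remaining: int, reset_epoch: float,
                             now: Optional[float] = None) -> Dict[str, str]:
    """Build RateLimit-* headers with the reset expressed as seconds remaining."""
    now = time.time() if now is None else now
    # Round up so the client never retries a fraction of a second early.
    seconds_left = max(0, math.ceil(reset_epoch - now))
    return {
        "RateLimit-Limit": str(limit),
        "RateLimit-Remaining": str(max(0, remaining)),
        "RateLimit-Reset": str(seconds_left),
    }
```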
Per-Key Rate Limiting Strategies
Different identifiers serve different purposes. Most APIs layer multiple strategies.
By API Key
The most common approach. Each API key has its own rate limit bucket.
{
"api_key": "sk_live_abc123",
"plan": "professional",
"limits": {
"requests_per_minute": 600,
"requests_per_day": 50000
}
}
This limits the application as a whole. Stripe rate limits by API key: 100 read requests/second and 100 write requests/second in live mode.
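A per-key bucket can be as simple as a counter per key and time window. The sketch below is an in-memory fixed-window limiter for illustration only. The class name FixedWindowLimiter is hypothetical, and a production system would more likely keep the counters in a shared store and expire old windows.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class FixedWindowLimiter:
    """In-memory fixed-window counter keyed by an arbitrary string."""
    limit: int
    window_seconds: int
    # Maps (key, window index) to the number of requests seen.
    # Old windows are never evicted in this sketch.
    _counts: Dict[Tuple[str, int], int] = field(default_factory=dict)

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        """Count one request for `key`; return False once the limit is reached."""
        now = time.time() if now is None else now
        bucket = (key, int(now // self.window_seconds))
        used = self._counts.get(bucket, 0)
        if used >= self.limit:
            return False
        self._counts[bucket] = used + 1
        return True
```

With limit=600 and window_seconds=60, the limiter models the requests_per_minute value from the configuration above. A second instance with a 86400-second window would cover requests_per_day.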
By IP Address
Rate limiting by IP protects unauthenticated endpoints (login, registration, public APIs). It is also a defense against credential stuffing.
IP: 203.0.113.42
Limit: 60 requests/minute for unauthenticated endpoints
Be careful with IP-based limiting behind proxies and NAT, where many users can share a single IP. Use the X-Forwarded-For header cautiously: any client can send a forged value, so trust it only when the request arrives from a proxy you know.
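One way to apply the "known proxies only" rule is to read X-Forwarded-For from right to left and stop at the first address that is not one of your own proxies. The sketch below assumes each trusted proxy appends the address it received the request from. The function name client_ip and the trusted network list are hypothetical.

```python
import ipaddress
from typing import Iterable


def client_ip(peer_ip: str, forwarded_for: str,
              trusted_proxies: Iterable[str]) -> str:
    """Pick the address to rate limit on.

    peer_ip is the address of the TCP connection. The header is consulted
    only when that peer is a trusted proxy.
    """
    networks = [ipaddress.ip_network(n) for n in trusted_proxies]

    def trusted(addr: str) -> bool:
        try:
            ip = ipaddress.ip_address(addr)
        except ValueError:
            return False  # not an IP address at all
        return any(ip in net for net in networks)

    if not trusted(peer_ip):
        return peer_ip  # header came from an unknown sender: ignore it
    hops = [h.strip() for h in forwarded_for.split(",") if h.strip()]
    # Entries to the left of the first untrusted hop are client-supplied
    # and may be forged, so they are never examined.
    for hop in reversed(hops):
        if not trusted(hop):
            return hop
    return peer_ip
```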
By User ID
After authentication, rate limit by the authenticated user. This prevents a single user from monopolizing resources even if they use multiple API keys or clients.
{
"user_id": "user_123",
"limits": {
"requests_per_minute": 100,
"requests_per_hour": 5000
}
}
Layered Limits
Production APIs combine these strategies:
1. Per-IP: 60 requests/minute (protects against unauthenticated abuse)
2. Per-key: 600 requests/minute (protects against application-level abuse)
3. Per-user: 100 requests/minute (protects against user-level abuse)
4. Global: 10000 requests/second (protects against total system overload)
A request must pass all applicable limits. If any layer rejects, the request returns 429.
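The "must pass all layers" rule can be sketched as a loop over the layers that returns as soon as one rejects. The layer values come from the list above. The names LAYERS and check_layers, the in-memory counters, and the choice to count a request only after every layer accepts it are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (layer name, max requests, window in seconds), taken from the list above.
LAYERS: List[Tuple[str, int, int]] = [
    ("ip", 60, 60),
    ("key", 600, 60),
    ("user", 100, 60),
    ("global", 10000, 1),
]

# Maps (layer, identifier, window index) to requests counted so far.
_counts: Dict[Tuple[str, str, int], int] = defaultdict(int)


def check_layers(ids: Dict[str, str], now: float) -> Optional[str]:
    """Return the name of the first layer that rejects, or None if all pass.

    `ids` maps a layer name to the identifier seen on this request. Layers
    with no identifier (for example "user" before login) are skipped.
    """
    accepted = []
    for name, limit, window in LAYERS:
        ident = "*" if name == "global" else ids.get(name)
        if ident is None:
            continue
        bucket = (name, ident, int(now // window))
        if _counts[bucket] >= limit:
            return name  # the caller answers with 429
        accepted.append(bucket)
    # Count the request only once every applicable layer has accepted it.
    for bucket in accepted:
        _counts[bucket] += 1
    return None
```

Returning the layer name lets the caller report which limit was hit in the 429 body.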
Different Limits for Different Endpoints
Not all endpoints cost the same. A database-intensive search endpoint consumes far more resources than reading a cached configuration value.
Read vs Write Limits
{
"limits": {
"read_endpoints": {
"per_minute": 600,
"description": "GET requests"
},
"write_endpoints": {
"per_minute": 60,
"description": "POST, PUT, PATCH, DELETE requests"
}
}
}
GitHub applies a version of this pattern: a primary limit of 5,000 requests/hour for authenticated requests, plus lower secondary limits on content-creating endpoints to prevent spam.
Per-Endpoint Limits
Some endpoints need individual limits:
{
"endpoint_limits": {
"POST /api/v1/search": {
"per_minute": 30,
"reason": "Expensive full-text search"
},
"POST /api/v1/exports": {
"per_hour": 10,
"reason": "Generates large files"
},
"GET /api/v1/users/:id": {
"per_minute": 600,
"reason": "Cached, lightweight"
}
}
}
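At request time, the limiter needs to resolve which limit applies: an endpoint-specific one if it exists, otherwise the read or write default. The sketch below reuses the values from the examples above. The names ENDPOINT_LIMITS and limit_for are hypothetical, and treating HEAD and OPTIONS as reads is an assumption beyond the "GET requests" description.

```python
from typing import Dict, Tuple

# (max requests, window in seconds); values mirror the examples above.
ENDPOINT_LIMITS: Dict[str, Tuple[int, int]] = {
    "POST /api/v1/search": (30, 60),
    "POST /api/v1/exports": (10, 3600),
    "GET /api/v1/users/:id": (600, 60),
}
READ_LIMIT = (600, 60)
WRITE_LIMIT = (60, 60)
READ_METHODS = {"GET", "HEAD", "OPTIONS"}  # assumption: safe methods count as reads


def limit_for(method: str, route: str) -> Tuple[int, int]:
    """Return (max_requests, window_seconds) for a route template.

    `route` is the template (for example "/api/v1/users/:id"), not the
    concrete URL, so all user IDs share one entry.
    """
    method = method.upper()
    specific = ENDPOINT_LIMITS.get(f"{method} {route}")
    if specific is not None:
        return specific
    return READ_LIMIT if method in READ_METHODS else WRITE_LIMIT
```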
Communicating Endpoint-Specific Limits
When an endpoint has a different limit than the default, include it in the headers:
HTTP/1.1 200 OK
X-RateLimit-Limit: 30
X-RateLimit-Remaining: 12
X-RateLimit-Reset: 1711036800
X-RateLimit-Resource: search
The X-RateLimit-Resource header (used by GitHub) tells the client which rate limit pool was consumed, so they can track multiple limits independently.
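On the client, tracking several pools independently amounts to keeping the latest quota per resource name. A minimal sketch, assuming the header names shown above are present with exactly that capitalization. Quota and record_quota are hypothetical names.

```python
from typing import Dict, Mapping, NamedTuple


class Quota(NamedTuple):
    limit: int
    remaining: int
    reset_epoch: int


def record_quota(pools: Dict[str, Quota], headers: Mapping[str, str]) -> None:
    """Store the latest quota for the pool named in X-RateLimit-Resource."""
    # Responses without the header are filed under "default" (assumption).
    resource = headers.get("X-RateLimit-Resource", "default")
    pools[resource] = Quota(
        limit=int(headers["X-RateLimit-Limit"]),
        remaining=int(headers["X-RateLimit-Remaining"]),
        reset_epoch=int(headers["X-RateLimit-Reset"]),
    )
```

Before sending a search request, the client can check pools["search"].remaining and wait until reset_epoch if it is zero.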
The 429 Response
When a client exceeds the rate limit, return 429 Too Many Requests with a clear, actionable response body.
Response Format
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 30
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1711036800
{
"error": {
"type": "rate_limit_exceeded",
"status": 429,
"message": "Rate limit exceeded. You have sent too many requests. Please retry after 30 seconds.",
"limit": 100,
"remaining": 0,
"reset_at": "2024-03-21T16:00:00Z",
"retry_after": 30,
"resource": "default"
}
}
What to Include in the Response
The response body should answer three questions:
- Which limit did I hit? Include the limit value and resource identifier.
- When can I retry? Include retry_after in seconds and reset_at as a timestamp.
- What should I do? Include a human-readable message.
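A helper that builds the whole rejection in one place keeps the headers and the body consistent with each other. The sketch below follows the response format shown earlier. The function name too_many_requests and its parameters are hypothetical, and returning a (status, headers, body) tuple is an assumption that avoids tying the example to a web framework.

```python
import json
import math
from datetime import datetime, timezone
from typing import Dict, Tuple


def too_many_requests(limit: int, reset_epoch: int, now: float,
                      resource: str = "default") -> Tuple[int, Dict[str, str], str]:
    """Return (status, headers, body) for a request rejected by the limiter."""
    # Round up and never tell the client to wait zero seconds.
    retry_after = max(1, math.ceil(reset_epoch - now))
    reset_at = datetime.fromtimestamp(reset_epoch, tz=timezone.utc)
    headers = {
        "Content-Type": "application/json",
        "Retry-After": str(retry_after),
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": "0",
        "X-RateLimit-Reset": str(reset_epoch),
    }
    body = {
        "error": {
            "type": "rate_limit_exceeded",
            "status": 429,
            "message": f"Rate limit exceeded. Please retry after {retry_after} seconds.",
            "limit": limit,
            "remaining": 0,
            "reset_at": reset_at.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "retry_after": retry_after,
            "resource": resource,
        }
    }
    return 429, headers, json.dumps(body)
```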
Stripe's Approach
Stripe returns a structured error:
{
"error": {
"type": "rate_limit_error",
"message": "Too many requests hit the API too quickly. We recommend an exponential backoff of your requests."
}
}
Stripe also recommends exponential backoff in their documentation: wait 1 second, then 2, then 4, up to a maximum. This is good guidance to include in your API documentation.
Client-Side Rate Limit Handling
Exponential Backoff with Jitter
When clients receive a 429, they should back off exponentially with random jitter to avoid the thundering herd problem.
Attempt 1: wait random(0, 1) seconds
Attempt 2: wait random(0, 2) seconds
Attempt 3: wait random(0, 4) seconds
Attempt 4: wait random(0, 8) seconds
Maximum: wait random(0, 32) seconds
Without jitter, all clients retry at the same time, creating another spike. The randomness spreads retries across time.
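The schedule above doubles the upper bound on each attempt and draws the actual wait uniformly between zero and that bound, a variant often called full jitter. A minimal sketch; the function name backoff_delay and its parameters are hypothetical.

```python
import random
from typing import Optional


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 32.0,
                  rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before retry number `attempt` (1-based)."""
    if attempt < 1:
        raise ValueError("attempt is 1-based")
    uniform = rng.uniform if rng is not None else random.uniform
    # Upper bound doubles per attempt (1, 2, 4, 8, ...) until it hits the cap.
    ceiling = min(cap, base * 2 ** (attempt - 1))
    return uniform(0.0, ceiling)
```

Passing a seeded random.Random instance as rng makes the delays reproducible in tests.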
Respecting Retry-After
If the server provides Retry-After, use it instead of calculating your own backoff:
1. Receive 429 with Retry-After: 30
2. Wait 30 seconds
3. Retry the request
4. If still 429, add exponential backoff on top
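The four steps above can be sketched as a retry loop. This is one possible reading, not a definitive implementation: the first retry waits exactly the Retry-After value, and later retries add a jittered exponential delay on top. The names call_with_retries and send are hypothetical, send is assumed to return a (status, headers) pair, and only the seconds form of Retry-After is handled.

```python
import random
import time
from typing import Callable, Dict, Tuple

Response = Tuple[int, Dict[str, str]]


def call_with_retries(send: Callable[[], Response], max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `send` until it stops returning 429 or attempts run out."""
    response = send()
    for retry in range(1, max_attempts):
        status, headers = response
        if status != 429:
            break
        # Jittered exponential backoff: up to 1s, 2s, 4s, ... capped at 32s.
        extra = random.uniform(0.0, min(32.0, 2.0 ** (retry - 1)))
        retry_after = headers.get("Retry-After", "")
        if retry_after.isdigit():
            # First retry honors the server's value as-is; repeated 429s
            # add the backoff on top of it.
            delay = float(retry_after) + (extra if retry > 1 else 0.0)
        else:
            delay = extra  # no usable Retry-After: fall back to backoff alone
        sleep(delay)
        response = send()
    return response
```

If every attempt returns 429, the last 429 response is returned so the caller can surface the error.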
Common Pitfalls
Not including rate limit headers on successful responses. Clients need to see their remaining quota on every response, not just when they are rejected. Without this, they cannot proactively manage their request rate.
Returning 429 without Retry-After. The client has no idea when to retry. Some will retry immediately, making the problem worse. Always include Retry-After.
Using 503 instead of 429. 503 Service Unavailable means the server is overloaded or down. 429 means the specific client has exceeded its limit. The distinction matters for monitoring and client behavior.
Rate limiting by IP in a mobile API. Mobile clients frequently change IPs (switching between WiFi and cellular). IP-based limits will be inconsistent. Use authenticated rate limiting (by user or API key) for mobile APIs.
Setting the same limits for all endpoints. A search endpoint that hits the database with a full-text query can cost orders of magnitude more than serving a cached configuration value. Differentiate limits by endpoint cost.
Not documenting rate limits. If developers have to discover limits by hitting them, they will be frustrated. Document all limits, tiers, and the expected headers in your API documentation. Stripe, GitHub, and Twilio all have dedicated rate limit documentation pages.
Key Takeaways
- Include X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset headers on every response so clients can manage their usage proactively.
- Layer rate limits by IP, API key, and user ID to protect against different types of abuse.
- Set different limits for different endpoints based on their resource cost; a search endpoint is not the same as a health check.
- Always include Retry-After on 429 responses and recommend exponential backoff with jitter in your documentation.
- Monitor rate limit hit rates to tune limits; too many 429s means limits are too low, zero 429s means limits are too high to be protective.