GraphQL
Overview
GraphQL is a query language for APIs developed by Facebook in 2012 and open-sourced in 2015. It lets clients specify exactly what data they need in a single request. This solves real problems with REST APIs but introduces its own set of challenges.
How GraphQL Works
REST approach (multiple requests):
GET /users/123 -> { id, name, email, avatar_url, ... }
GET /users/123/posts -> [{ id, title, body, created_at, ... }, ...]
GET /users/123/followers -> [{ id, name, ... }, ...]
3 HTTP requests. Each returns fields you may not need.
GraphQL approach (single request):
POST /graphql
query {
  user(id: "123") {
    name
    posts(first: 5) {
      edges {
        node {
          title
          createdAt
        }
      }
    }
    followerCount
  }
}
1 HTTP request. Returns exactly the fields requested.
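On the wire, the single request above is usually an HTTP POST whose JSON body carries the query text and any variables. The following is a minimal sketch using only the Python standard library; the endpoint URL, the operation name `UserSummary`, and the helper `build_graphql_request` are assumptions made for illustration, and the request is built but never sent.

```python
import json
import urllib.request

# Hypothetical endpoint; substitute your own server's URL.
GRAPHQL_URL = "https://api.example.com/graphql"

QUERY = """
query UserSummary($id: ID!) {
  user(id: $id) {
    name
    followerCount
  }
}
"""

def build_graphql_request(query: str, variables: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a GraphQL query."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        GRAPHQL_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = build_graphql_request(QUERY, {"id": "123"})
```

Passing values through `variables` rather than interpolating them into the query string keeps the query text constant, which also helps with the persisted-query techniques discussed later.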
Schema Definition
# DateTime is not a built-in GraphQL scalar, so it must be declared.
scalar DateTime
type User {
  id: ID!
  name: String!
  email: String!
  avatar_url: String
  posts(first: Int, after: String): PostConnection!
  followerCount: Int!
  createdAt: DateTime!
}
type Post {
  id: ID!
  title: String!
  body: String!
  author: User!
  comments(first: Int): [Comment!]!
  createdAt: DateTime!
}
type Comment {
  id: ID!
  body: String!
  author: User!
}
type PostConnection {
  edges: [PostEdge!]!
  pageInfo: PageInfo!
}
type PostEdge {
  node: Post!
  cursor: String!
}
type PageInfo {
  hasNextPage: Boolean!
  endCursor: String
}
type Query {
  user(id: ID!): User
  post(id: ID!): Post
  searchPosts(query: String!, first: Int): PostConnection!
}
# CreatePostInput and UpdatePostInput are input types not shown here.
type Mutation {
  createPost(input: CreatePostInput!): Post!
  updatePost(id: ID!, input: UpdatePostInput!): Post!
  deletePost(id: ID!): Boolean!
}
Resolvers
Resolvers are functions that fetch the data for each field in the schema. They are where the actual data loading logic lives.
Resolver structure (conceptual):
Query.user(parent, args, context):
  return database.users.findById(args.id)
User.posts(user, args, context):
  return database.posts.findByAuthor(user.id, limit=args.first)
User.followerCount(user, args, context):
  return database.followers.countByUser(user.id)
Post.author(post, args, context):
  return database.users.findById(post.author_id)
Each field can have its own resolver.
Parent resolvers pass data to child resolvers.
Context carries shared state (auth, database connections).
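To make the parent, args, and context roles concrete, here is a small runnable sketch of the same resolver chain. The in-memory `USERS` and `POSTS` tables and the function names are hypothetical stand-ins for a real database layer, and the three calls at the bottom imitate by hand what a GraphQL executor would do while walking a query.

```python
# In-memory stand-ins for database tables (hypothetical data).
USERS = {"5": {"id": "5", "name": "Ada"}}
POSTS = [
    {"id": "1", "title": "Hello", "author_id": "5"},
    {"id": "2", "title": "World", "author_id": "5"},
]

def resolve_query_user(parent, args, context):
    # Root resolver: there is no parent object, so only args and context matter.
    return context["users"].get(args["id"])

def resolve_user_posts(user, args, context):
    # Child resolver: receives the User returned by the parent resolver.
    posts = [p for p in context["posts"] if p["author_id"] == user["id"]]
    return posts[: args.get("first", len(posts))]

def resolve_post_author(post, args, context):
    # Child resolver: follows the foreign key stored on the parent Post.
    return context["users"][post["author_id"]]

# Context carries shared state; here it is just the two fake tables.
context = {"users": USERS, "posts": POSTS}

# Manually walk user -> posts -> author, as an executor would for a nested query.
user = resolve_query_user(None, {"id": "5"}, context)
posts = resolve_user_posts(user, {"first": 1}, context)
author = resolve_post_author(posts[0], {}, context)
```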
The N+1 Problem
The N+1 problem is the most common performance issue in GraphQL and must be addressed from the start.
The Problem
Query:
query {
  posts(first: 20) {
    title
    author {
      name
    }
  }
}
Naive execution:
Fetch the posts: SELECT * FROM posts LIMIT 20 (1 query)
For each post: SELECT * FROM users WHERE id = post.author_id (20 queries)
Total: 21 database queries for 1 GraphQL request
With duplicate authors:
If 10 posts share the same author, you still query that
author 10 times.
The Solution: DataLoader
DataLoader batches and deduplicates resolver calls:
Without DataLoader:
Resolve author for post 1: SELECT * FROM users WHERE id = 5
Resolve author for post 2: SELECT * FROM users WHERE id = 8
Resolve author for post 3: SELECT * FROM users WHERE id = 5 (duplicate!)
...20 separate queries
With DataLoader:
Collect all author IDs in a single tick: [5, 8, 5, 12, 8, ...]
Deduplicate: [5, 8, 12, ...]
Single batch query: SELECT * FROM users WHERE id IN (5, 8, 12, ...)
Distribute results back to each resolver
Result: 2 queries instead of 21.
Batching of this kind is essential for most production GraphQL servers.
DataLoader originated as a JavaScript library, and ports or equivalents exist for most major GraphQL server ecosystems.
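The batching and deduplication steps above can be sketched with `asyncio`. This is not the DataLoader library itself: `SimpleLoader` and `batch_load_users` are hypothetical names, the batch function fakes the `WHERE id IN (...)` query, and error handling is left out.

```python
import asyncio

class SimpleLoader:
    """Collects keys requested in the same event-loop tick and loads them in one batch."""

    def __init__(self, batch_fn):
        self.batch_fn = batch_fn  # async function: list of keys -> dict of key -> row
        self.cache = {}           # key -> Future, so repeated keys share one result
        self.queue = []           # unique keys waiting for the next batch

    def load(self, key):
        if key in self.cache:
            return self.cache[key]  # duplicate request: reuse the same Future
        loop = asyncio.get_running_loop()
        future = loop.create_future()
        self.cache[key] = future
        self.queue.append(key)
        if len(self.queue) == 1:
            # First key of a new batch: dispatch after the current callers finish.
            loop.call_soon(lambda: loop.create_task(self._dispatch()))
        return future

    async def _dispatch(self):
        keys, self.queue = self.queue, []
        rows = await self.batch_fn(keys)
        for key in keys:
            self.cache[key].set_result(rows.get(key))

batch_calls = []  # records each simulated batch query

async def batch_load_users(ids):
    # Stands in for: SELECT * FROM users WHERE id IN (...)
    batch_calls.append(sorted(ids))
    return {i: {"id": i, "name": f"user-{i}"} for i in ids}

async def main():
    loader = SimpleLoader(batch_load_users)  # one loader per request
    author_ids = [5, 8, 5, 12, 8]
    return await asyncio.gather(*(loader.load(i) for i in author_ids))

authors = asyncio.run(main())
```

Creating one loader per incoming request keeps the cache from leaking data between users, which is why the loader is built inside `main` rather than at module level.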
Subscriptions
Subscriptions enable real-time updates by maintaining a persistent connection (typically WebSocket) between client and server.
Subscription definition:
type Subscription {
  messageAdded(channelId: ID!): Message!
  orderStatusChanged(orderId: ID!): Order!
}
Client subscribes:
subscription {
  messageAdded(channelId: "general") {
    id
    body
    author {
      name
    }
  }
}
Server pushes updates whenever a new message is added to the channel.
Implementation options:
- WebSocket (most common)
- Server-Sent Events (SSE) for server-to-client only
- Long polling (fallback)
Scaling challenges:
- WebSocket connections are stateful and sticky
- Need pub/sub infrastructure (Redis, Kafka) for multi-server
- Connection management and reconnection logic
- Memory overhead per connected client
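The pub/sub layer mentioned above can be illustrated in-process. The sketch below is a single-server toy, assuming a hypothetical `PubSub` class and topic names of the form `messageAdded:<channelId>`; a multi-server deployment would replace it with Redis, Kafka, or similar, and each queue would feed one client connection.

```python
import asyncio
from collections import defaultdict

class PubSub:
    """In-memory topic -> subscriber queues; single process only."""

    def __init__(self):
        self.subscribers = defaultdict(set)

    def subscribe(self, topic):
        queue = asyncio.Queue()  # one queue per connected client
        self.subscribers[topic].add(queue)
        return queue

    def unsubscribe(self, topic, queue):
        self.subscribers[topic].discard(queue)

    def publish(self, topic, payload):
        # Fan the payload out to every client subscribed to this topic.
        for queue in self.subscribers[topic]:
            queue.put_nowait(payload)

async def demo():
    pubsub = PubSub()
    general = pubsub.subscribe("messageAdded:general")
    random_channel = pubsub.subscribe("messageAdded:random")
    # A mutation resolver would publish after saving the new message.
    pubsub.publish("messageAdded:general", {"id": "1", "body": "hi"})
    received = await general.get()
    return received, random_channel.empty()

received, other_channel_empty = asyncio.run(demo())
```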
Pagination in GraphQL
The Relay specification defines a standard pagination approach.
Connection-based pagination (Relay specification):
query {
  user(id: "123") {
    posts(first: 10, after: "cursor_abc") {
      edges {
        node {
          id
          title
        }
        cursor
      }
      pageInfo {
        hasNextPage
        endCursor
      }
    }
  }
}
Why this pattern:
- Cursor-based (stable under insertions/deletions)
- Standardized across all connections in the schema
- Metadata (pageInfo) separate from data (edges)
- Each edge has its own cursor for fine-grained control
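On the server side, the connection shape queried above could be produced along these lines. This is a sketch under stated assumptions: the posts live in an in-memory list, cursors are base64-encoded post IDs (real servers often encode a sort key instead), and `paginate`, `encode_cursor`, and `decode_cursor` are hypothetical helpers. An unknown cursor is not handled.

```python
import base64

# Hypothetical data: seven posts in a stable order.
POSTS = [{"id": str(i), "title": f"Post {i}"} for i in range(1, 8)]

def encode_cursor(post_id: str) -> str:
    # Opaque to clients; they should never parse a cursor.
    return base64.b64encode(f"post:{post_id}".encode()).decode()

def decode_cursor(cursor: str) -> str:
    return base64.b64decode(cursor).decode().split(":", 1)[1]

def paginate(posts, first, after=None):
    """Return a connection with edges and pageInfo for the requested window."""
    start = 0
    if after is not None:
        after_id = decode_cursor(after)
        # Resume just past the item the cursor points at.
        start = next(i for i, p in enumerate(posts) if p["id"] == after_id) + 1
    window = posts[start : start + first]
    edges = [{"node": p, "cursor": encode_cursor(p["id"])} for p in window]
    return {
        "edges": edges,
        "pageInfo": {
            "hasNextPage": start + first < len(posts),
            "endCursor": edges[-1]["cursor"] if edges else None,
        },
    }

page1 = paginate(POSTS, first=3)
page2 = paginate(POSTS, first=3, after=page1["pageInfo"]["endCursor"])
page3 = paginate(POSTS, first=3, after=page2["pageInfo"]["endCursor"])
```

Because the cursor identifies an item rather than an offset, inserting a new post at the front of the list between requests does not shift the next page.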
When GraphQL Makes Sense
Good Fit
- Multiple client types: Mobile app needs less data than web app. GraphQL lets each client request exactly what it needs without maintaining separate endpoints.
- Deeply nested data: Social graphs, content with comments and reactions, organizational hierarchies. One request fetches the full tree.
- Rapid frontend iteration: Frontend teams can add fields to queries without backend changes, as long as the schema supports it.
- API aggregation: A GraphQL gateway that federates multiple backend services into a unified API.
Real-World Success Stories
Facebook built GraphQL to solve the performance problems of their mobile app. Fetching a news feed required data from dozens of services; REST required dozens of round trips.
GitHub introduced its GraphQL API (v4) alongside its REST API (v3), which it continues to support. With the GraphQL API, users can query repositories, issues, pull requests, and users in a single request with exactly the fields they need.
Shopify uses GraphQL for their storefront API, allowing merchants and developers to query product data efficiently across diverse storefronts.
Airbnb uses GraphQL as their API layer, particularly for complex listing pages that aggregate data from many services.
When GraphQL Does NOT Make Sense
Poor Fit
- Simple CRUD APIs: If your API is straightforward create/read/update/delete with flat resources, REST is simpler and has better tooling.
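- File uploads: The GraphQL specification does not define file uploads. Community conventions exist, but many implementations fall back to REST for uploads.
- Caching: REST caching with HTTP infrastructure (CDNs, browser cache, ETags) works out of the box. GraphQL typically sends every operation as a POST to a single endpoint, which defeats URL-based caching unless you add techniques such as persisted queries.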
- Small teams with single clients: If one backend serves one frontend, GraphQL's flexibility adds overhead without benefit.
- Public APIs for third parties: REST is universally understood. GraphQL requires learning a new query language.
Signs GraphQL is the wrong choice:
- Your API has 5-10 endpoints with simple request/response
- All clients need the same data shape
- You rely heavily on HTTP caching (CDN, browser cache)
- Your team has no GraphQL experience and tight deadlines
- The API is primarily write-heavy (mutations are verbose in GraphQL)
Security Considerations
Query depth limiting:
Prevent deeply nested queries that could overload the server.
query { user { posts { comments { author { posts { ... } } } } } }
Set a maximum depth (e.g., 10 levels).
Query complexity analysis:
Assign a cost to each field and reject queries exceeding a budget.
Simple field: cost 1
List field: cost 10 * requested items
Reject if total cost > 1000
Rate limiting:
Cannot use simple request counting (one GraphQL request can vary
wildly in cost). Rate limit by query complexity instead.
Introspection:
GraphQL supports schema introspection by default.
Disable in production to prevent exposing your full schema.
Persisted queries:
Client sends a hash of a pre-registered query instead of the
full query text. Prevents arbitrary query execution.
Used by: Apollo, Relay
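The depth and complexity limits described in this section can be sketched over a simplified query tree. Everything here is an assumption for illustration: a real server would walk the parsed query AST from its GraphQL library, whereas this sketch uses nested dicts built by a hypothetical `field` helper. The cost rule follows the numbers above (simple field 1, list field 10 per requested item) and simply adds child costs; multiplying child costs by the list size is a common refinement not shown.

```python
MAX_DEPTH = 10
MAX_COST = 1000

def field(first=None, **fields):
    """Hypothetical mini-AST node: optional list size plus sub-selections."""
    return {"first": first, "fields": fields}

def depth(selection):
    # Depth of the deepest chain of nested fields.
    if not selection:
        return 0
    return 1 + max(depth(f["fields"]) for f in selection.values())

def cost(selection):
    total = 0
    for node in selection.values():
        if node["first"] is not None:
            total += 10 * node["first"]  # list field: 10 per requested item
        else:
            total += 1                   # simple field
        total += cost(node["fields"])
    return total

def validate(selection):
    # Run before execution so expensive queries never reach the resolvers.
    if depth(selection) > MAX_DEPTH:
        return "rejected: too deep"
    if cost(selection) > MAX_COST:
        return "rejected: too expensive"
    return "ok"

# query { user { name posts(first: 5) { title } } }
query = {"user": field(name=field(), posts=field(first=5, title=field()))}
# query { posts(first: 200) { title } }
expensive = {"posts": field(first=200, title=field())}
```

The same `cost` value can feed the rate limiter, so that clients are charged by query complexity rather than by request count.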
Performance Optimization
DataLoader:
Batch and deduplicate database queries per request.
Essential, not optional.
Query planning:
Analyze the query before execution to optimize data fetching.
Look ahead at requested fields to avoid unnecessary joins.
Response caching:
Cache full responses for identical queries.
Use persisted query IDs as cache keys.
Automatic persisted queries (APQ):
Client sends query hash first. If server recognizes it,
execute from cache. If not, client sends full query.
Reduces request size and enables server-side optimization.
Schema stitching vs federation:
Schema stitching: Merge multiple GraphQL schemas at the gateway.
Federation (Apollo): Each service owns part of the graph,
gateway composes them. Better for large organizations.
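The automatic persisted query handshake described above can be sketched as a hash-first request with a full-text retry. The class `ApqServer`, the `send` helper, and the error string `QUERY_NOT_REGISTERED` are hypothetical and do not reproduce any particular server's wire format; execution is faked by echoing the stored query.

```python
import hashlib

class ApqServer:
    """Toy server that remembers queries by their SHA-256 hash."""

    def __init__(self):
        self.known_queries = {}  # hex digest -> query text
        self.calls = 0           # number of requests received

    def handle(self, query_hash, query=None):
        self.calls += 1
        if query is not None:
            # Registration path: verify the hash before trusting it as a key.
            if hashlib.sha256(query.encode()).hexdigest() != query_hash:
                return {"error": "HASH_MISMATCH"}
            self.known_queries[query_hash] = query
        stored = self.known_queries.get(query_hash)
        if stored is None:
            return {"error": "QUERY_NOT_REGISTERED"}
        return {"executed": stored}

def send(server, query):
    query_hash = hashlib.sha256(query.encode()).hexdigest()
    response = server.handle(query_hash)             # first attempt: hash only
    if response.get("error") == "QUERY_NOT_REGISTERED":
        response = server.handle(query_hash, query)  # retry with the full text
    return response

server = ApqServer()
query_text = "query { user(id: 123) { name } }"
first = send(server, query_text)   # two round trips: miss, then register
second = send(server, query_text)  # one round trip: hash is now known
```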
Common Pitfalls
- Ignoring the N+1 problem: Without batching such as DataLoader, a GraphQL API can easily be slower than the REST API it replaces. This is the number one mistake.
- Over-optimizing database fetches: Just because the client requests specific fields does not mean your resolvers must select only those columns. Start by fetching full objects and optimize later, once profiling shows a need.
- No query complexity limits: Without limits, a single malicious query can bring down your server.
- Using GraphQL for everything: Not every service needs GraphQL. Internal service-to-service calls are often better served by gRPC.
- Schema design that mirrors the database: Design the schema around client needs, not your database tables. The schema is a product, not a reflection of internals.
- Neglecting error handling: Over HTTP, GraphQL servers typically return 200 even when an operation fails, and the errors are reported in the response body. Clients must check the errors field, and monitoring tools must parse response bodies.
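For the error-handling pitfall in particular, a client-side check might look like the sketch below. The function name `unwrap_graphql_response` is hypothetical, and the policy is deliberately strict: a response can carry both partial data and errors, and this sketch treats any entry in the errors field as a failure.

```python
import json

def unwrap_graphql_response(status_code: int, body: str):
    """Return the data payload, raising on transport or GraphQL-level errors."""
    if status_code != 200:
        raise RuntimeError(f"transport error: HTTP {status_code}")
    payload = json.loads(body)
    # A 200 status says nothing about success; the errors field does.
    if payload.get("errors"):
        messages = "; ".join(
            e.get("message", "unknown error") for e in payload["errors"]
        )
        raise RuntimeError(f"GraphQL errors: {messages}")
    return payload.get("data")

ok = unwrap_graphql_response(200, '{"data": {"user": {"name": "Ada"}}}')
```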
Key Takeaways
- GraphQL solves real problems: over-fetching, under-fetching, and multiple round trips. Use it when these problems exist.
- Batching with DataLoader (or an equivalent) is effectively mandatory for production GraphQL servers. Without it, the N+1 problem can make GraphQL slower than REST.
- GraphQL excels when multiple clients with different data needs share a single API, or when data is deeply nested and interconnected.
- GraphQL is a poor fit for simple CRUD, file uploads, heavily cached public APIs, or write-heavy systems.
- Security requires active effort: limit query depth, analyze complexity, disable introspection in production, and consider persisted queries.
- Design your GraphQL schema around client use cases, not database structure. The schema is a contract with your consumers.