GraphQL

What is GraphQL?

GraphQL is a query language for APIs and a runtime for fulfilling those queries. Created by Facebook in 2012 (open-sourced in 2015), it lets clients request exactly the data they need — no more, no less — through a single endpoint.

Unlike REST, where the server decides what data each endpoint returns, GraphQL puts the client in control of the response shape.

The Problem GraphQL Solves

REST APIs often suffer from two inefficiencies:

Over-fetching — An endpoint returns more data than the client needs. A mobile app displaying a user's name and avatar gets the entire user object with 30 fields.

Under-fetching — Building one view requires multiple requests to different endpoints.

# REST: 3 requests to build a user profile page
GET /users/123          -> { id, name, email, avatar_url, ... }
GET /users/123/posts    -> [{ id, title, created_at, ... }, ...]
GET /users/123/followers -> [{ id, name, ... }, ...]

# GraphQL: 1 request
query {
  user(id: 123) {
    name
    avatarUrl
    posts(first: 5) {
      title
      createdAt
    }
    followersCount
  }
}

The client describes the exact shape of the data it needs, and the server returns precisely that — nothing more.
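To make the idea of a client-chosen response shape concrete, here is a minimal Python sketch. It trims a full record down to a requested selection. The `project` function, the sample record, and the nested-dict selection format are all assumptions made for illustration. A real GraphQL server resolves only the selected fields rather than fetching everything and trimming afterwards.

```python
from typing import Any


def project(value: Any, selection: dict) -> Any:
    """Keep only the keys named in `selection`, recursing into nested selections."""
    if isinstance(value, list):
        return [project(item, selection) for item in value]
    result = {}
    for field_name, sub_selection in selection.items():
        field_value = value[field_name]
        # An empty sub-selection marks a leaf field; otherwise recurse.
        result[field_name] = project(field_value, sub_selection) if sub_selection else field_value
    return result


# Hypothetical full record, as a REST endpoint might return it
full_user = {
    "id": 123,
    "name": "Ada",
    "email": "ada@example.com",
    "avatarUrl": "https://example.com/ada.png",
    "posts": [
        {"id": 1, "title": "Hello", "createdAt": "2024-01-01", "content": "..."},
    ],
    "followersCount": 42,
}

# Mirrors the shape of the GraphQL query; {} marks a leaf field
selection = {
    "name": {},
    "avatarUrl": {},
    "posts": {"title": {}, "createdAt": {}},
    "followersCount": {},
}

trimmed = project(full_user, selection)
```

The result contains only the four selected top-level fields, and each post keeps only `title` and `createdAt`.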

Schema Design

The schema is the contract between client and server. It defines types, queries, mutations, and subscriptions.

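# DateTime is not a built-in GraphQL scalar, so it must be declared
scalar DateTime

type User {
  id: ID!
  name: String!
  email: String!
  avatarUrl: String
  posts(first: Int, after: String): PostConnection!
  followersCount: Int!
}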

type Post {
  id: ID!
  title: String!
  content: String!
  author: User!
  tags: [String!]!
  createdAt: DateTime!
}

# Relay-style connection for cursor-based pagination
type PostConnection {
  edges: [PostEdge!]!
  pageInfo: PageInfo!
  totalCount: Int!
}

type PostEdge {
  node: Post!
  cursor: String!
}

type PageInfo {
  hasNextPage: Boolean!
  hasPreviousPage: Boolean!
  startCursor: String
  endCursor: String
}

type Query {
  user(id: ID!): User
  posts(filter: PostFilter, first: Int, after: String): PostConnection!
  post(id: ID!): Post
}

input PostFilter {
  authorId: ID
  tag: String
  createdAfter: DateTime
}

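# UpdateUserInput, UpdateUserPayload, and DeletePostPayload follow the same
# input/payload pattern as createPost and are omitted here for brevity
type Mutation {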
  createPost(input: CreatePostInput!): CreatePostPayload!
  updateUser(id: ID!, input: UpdateUserInput!): UpdateUserPayload!
  deletePost(id: ID!): DeletePostPayload!
}

input CreatePostInput {
  title: String!
  content: String!
  tags: [String!]
}

type CreatePostPayload {
  post: Post
  errors: [UserError!]!
}

type UserError {
  field: String!
  message: String!
}

Schema design principles:

Use ! for non-nullable fields — be intentional about what can be null
Prefer Relay-style connections for paginated lists
Return payload types from mutations (not raw objects) so you can include errors
Use input types for mutation arguments
Design from the client's perspective, not the database schema
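As an illustration of the connection pattern recommended above, the following Python sketch builds a `PostConnection`-shaped result from an in-memory list. The `paginate` helper and the index-based cursor encoding are assumptions for this example. Production systems usually encode a stable sort key in the cursor instead of a list index.

```python
import base64
from typing import Optional


def encode_cursor(index: int) -> str:
    """Wrap a list index in an opaque, base64-encoded cursor."""
    return base64.b64encode(f"idx:{index}".encode()).decode()


def decode_cursor(cursor: str) -> int:
    """Recover the list index from a cursor produced by encode_cursor."""
    return int(base64.b64decode(cursor).decode().split(":", 1)[1])


def paginate(items: list, first: int = 10, after: Optional[str] = None) -> dict:
    """Return a connection-shaped dict with edges, pageInfo, and totalCount."""
    start = decode_cursor(after) + 1 if after else 0
    page = items[start:start + first]
    edges = [
        {"node": item, "cursor": encode_cursor(start + offset)}
        for offset, item in enumerate(page)
    ]
    return {
        "edges": edges,
        "pageInfo": {
            "hasNextPage": start + len(page) < len(items),
            "hasPreviousPage": start > 0,
            "startCursor": edges[0]["cursor"] if edges else None,
            "endCursor": edges[-1]["cursor"] if edges else None,
        },
        "totalCount": len(items),
    }


posts = [{"title": f"Post {n}"} for n in range(1, 6)]
page_one = paginate(posts, first=2)
page_two = paginate(posts, first=2, after=page_one["pageInfo"]["endCursor"])
```

The client passes the previous page's `endCursor` as `after` to fetch the next page, so it never needs to know how the cursor is built.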

Resolvers

Resolvers are functions that populate each field in the schema. The GraphQL runtime calls resolvers to build the response.

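# Conceptual resolver structure (Python for clarity).
# `db` stands in for the application's data-access layer and is not defined here.
class Query:
    def user(self, info, id):
        # Root field: look up a single user by primary key
        return db.users.find_by_id(id)

class User:
    def posts(self, info, first=10, after=None):
        # Child field: `self` is the parent User returned by Query.user
        return db.posts.find_by_author(self.id, first=first, after=after)

    def followers_count(self, info):
        # Runs only when the client selects followersCount
        return db.followers.count(user_id=self.id)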

Each field can have its own resolver. If no resolver is defined, the runtime uses a default that returns the field of the same name from the parent object.
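The fallback behaviour can be sketched in a few lines of Python. This is a simplified illustration, not how any particular GraphQL library is implemented. The `RESOLVERS` registry, `resolve_field`, and the sample data are hypothetical names introduced for the example.

```python
from typing import Any, Callable, Dict, Tuple

# Maps (type name, field name) to a resolver called as resolver(parent, **args)
RESOLVERS: Dict[Tuple[str, str], Callable[..., Any]] = {}


def default_resolver(parent: Any, field_name: str) -> Any:
    """Fallback: read a dict key or attribute named after the field."""
    if isinstance(parent, dict):
        return parent.get(field_name)
    return getattr(parent, field_name, None)


def resolve_field(type_name: str, parent: Any, field_name: str, **args: Any) -> Any:
    """Use the registered resolver if there is one, else the default."""
    resolver = RESOLVERS.get((type_name, field_name))
    if resolver is None:
        return default_resolver(parent, field_name)
    return resolver(parent, **args)


# In-memory stand-ins for database tables
USERS = {"123": {"id": "123", "name": "Ada"}}
FOLLOWERS = {"123": ["7", "8", "9"]}

RESOLVERS[("Query", "user")] = lambda parent, id: USERS.get(id)
RESOLVERS[("User", "followersCount")] = lambda parent: len(FOLLOWERS[parent["id"]])

user = resolve_field("Query", None, "user", id="123")
name = resolve_field("User", user, "name")                 # no resolver registered: default
followers = resolve_field("User", user, "followersCount")  # custom resolver
```

Here `name` comes straight from the parent dict, while `followersCount` is computed by its own resolver.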

The N+1 Problem

The N+1 problem is one of the most critical performance issues in GraphQL. Consider a query that fetches 20 posts along with each post's author:

query {
  posts(first: 20) {
    edges {
      node {
        title
        author {    # This triggers a separate DB query per post
          name
        }
      }
    }
  }
}

A naive implementation runs 1 query for the 20 posts plus 1 query per post for its author, for a total of 1 + 20 = 21 queries. If several posts share an author, some of those queries are redundant.
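The arithmetic can be checked with a small Python simulation that logs every query a naive resolver would issue. The data, the `fetch_*` helpers, and the SQL strings are made up for illustration, and no real database is involved.

```python
query_log = []

AUTHORS = {1: "Ada", 2: "Grace", 3: "Linus"}
# 20 posts spread across 3 authors
POSTS = [{"title": f"Post {n}", "author_id": (n % 3) + 1} for n in range(20)]


def fetch_posts(first):
    """Simulate the single query that loads the list of posts."""
    query_log.append("SELECT * FROM posts LIMIT %d" % first)
    return POSTS[:first]


def fetch_author(author_id):
    """Simulate one query per author lookup."""
    query_log.append("SELECT * FROM users WHERE id = %d" % author_id)
    return {"name": AUTHORS[author_id]}


def resolve_naively(first):
    """Resolve each post's author independently, as a naive server would."""
    return [
        {"title": post["title"], "author": fetch_author(post["author_id"])}
        for post in fetch_posts(first)
    ]


result = resolve_naively(20)
total_queries = len(query_log)                      # 1 for posts + 20 for authors
distinct_author_queries = len(set(query_log[1:]))   # only 3 authors exist
```

With 3 distinct authors, 17 of the 20 author lookups repeat a query that has already been run.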

DataLoader

DataLoader solves N+1 by batching and deduplicating requests within a single execution cycle.

-- Without DataLoader: 20 individual queries
SELECT * FROM users WHERE id = 1;
SELECT * FROM users WHERE id = 2;
SELECT * FROM users WHERE id = 1;  -- duplicate!
...

-- With DataLoader: 1 batched query
SELECT * FROM users WHERE id IN (1, 2, 3, 5, 8);

How DataLoader works:

During a single GraphQL execution tick, each author resolver asks the loader for a user by ID
DataLoader collects all requested IDs
At the end of the tick, it makes one batched query
Results are distributed back to the individual resolvers
Duplicate IDs are deduplicated automatically
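The steps above can be sketched with Python's asyncio. `TinyDataLoader` is a hypothetical, stripped-down class written for this example. It is not the API of aiodataloader or any other library, and it omits the per-request caching and error handling that real implementations provide.

```python
import asyncio
from typing import Dict, Hashable, Optional


class TinyDataLoader:
    """Collects keys requested in one event-loop tick and loads them in one batch."""

    def __init__(self, batch_fn):
        self._batch_fn = batch_fn
        self._pending: Dict[Hashable, asyncio.Future] = {}
        self._dispatch_task: Optional[asyncio.Task] = None

    def load(self, key):
        loop = asyncio.get_running_loop()
        if key not in self._pending:            # duplicate keys share one future
            self._pending[key] = loop.create_future()
        if self._dispatch_task is None:         # schedule a single dispatch per tick
            self._dispatch_task = loop.create_task(self._dispatch())
        return self._pending[key]

    async def _dispatch(self):
        pending, self._pending = self._pending, {}
        self._dispatch_task = None
        values = await self._batch_fn(list(pending))   # one batched fetch
        for future, value in zip(pending.values(), values):
            future.set_result(value)                   # hand results back


batch_calls = []


async def batch_load_users(ids):
    # Stands in for: SELECT * FROM users WHERE id IN (...)
    batch_calls.append(list(ids))
    return [{"id": i, "name": f"user-{i}"} for i in ids]


async def resolve_author(loader, author_id):
    return await loader.load(author_id)


async def main():
    loader = TinyDataLoader(batch_load_users)
    author_ids = [1, 2, 1, 3, 5, 8, 2]   # one per post, with duplicates
    return await asyncio.gather(*(resolve_author(loader, i) for i in author_ids))


authors = asyncio.run(main())
```

Seven resolver calls produce a single batch containing the five distinct IDs, and each resolver still receives the user it asked for.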

DataLoader implementations exist for most major languages, including dataloader (JavaScript), aiodataloader (Python), and the built-in DataLoader support in async-graphql (Rust).

Mutations

Mutations are how clients modify data. Design them carefully.

mutation {
  createPost(input: { title: "GraphQL Guide", content: "...", tags: ["api"] }) {
    post {
      id
      title
      createdAt
    }
    errors {
      field
      message
    }
  }
}

Mutation design guidelines:

Name mutations as verb + noun: createPost, updateUser, cancelOrder
Accept a single input argument — easier to evolve
Return a payload type with both the result and potential errors
Make mutations as specific as possible: archivePost instead of generic updatePost(input: { archived: true })

Subscriptions

Subscriptions enable real-time updates, most commonly delivered over WebSocket connections.

subscription {
  postCreated(authorId: "123") {
    id
    title
    author {
      name
    }
  }
}

The server pushes updates to the client whenever a matching event occurs. Under the hood, this typically uses WebSockets (for example, via the protocol implemented by the graphql-ws library) backed by a server-side pub/sub system such as Redis or Kafka.
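The server-side flow can be approximated with an in-process pub/sub and an async generator. This is a rough sketch under simplifying assumptions: `InMemoryPubSub` is a hypothetical stand-in for a broker such as Redis, and the WebSocket transport is left out entirely.

```python
import asyncio
from collections import defaultdict
from typing import AsyncIterator, DefaultDict, List


class InMemoryPubSub:
    """Delivers each published event to every queue subscribed to the topic."""

    def __init__(self) -> None:
        self._queues: DefaultDict[str, List[asyncio.Queue]] = defaultdict(list)

    def publish(self, topic: str, event: dict) -> None:
        for queue in self._queues[topic]:
            queue.put_nowait(event)

    async def subscribe(self, topic: str) -> AsyncIterator[dict]:
        queue: asyncio.Queue = asyncio.Queue()
        self._queues[topic].append(queue)
        try:
            while True:
                yield await queue.get()
        finally:
            self._queues[topic].remove(queue)   # unsubscribe on close


async def post_created(pubsub: InMemoryPubSub, author_id: str) -> AsyncIterator[dict]:
    """Subscription resolver: yield only events for the requested author."""
    async for post in pubsub.subscribe("postCreated"):
        if post["authorId"] == author_id:
            yield post


async def take_one(stream: AsyncIterator[dict]) -> dict:
    async for event in stream:
        return event


async def main() -> dict:
    pubsub = InMemoryPubSub()
    stream = post_created(pubsub, author_id="123")
    waiter = asyncio.ensure_future(take_one(stream))
    await asyncio.sleep(0)   # let the subscriber register its queue
    pubsub.publish("postCreated", {"id": "1", "title": "Other", "authorId": "999"})
    pubsub.publish("postCreated", {"id": "2", "title": "Hello", "authorId": "123"})
    event = await waiter
    await stream.aclose()
    return event


received = asyncio.run(main())
```

The subscriber skips the post by author 999 and receives only the post whose `authorId` matches the subscription argument.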

When subscriptions work well: Chat, notifications, live dashboards, collaborative editing.

When they don't: High-frequency data (stock tickers) — WebSockets add overhead vs. raw TCP/gRPC streaming.

When GraphQL Beats REST

Scenario	REST	GraphQL
Simple CRUD API	Simple, well-understood	Overkill
Mobile apps with varying data needs	Multiple endpoints or custom ones	Clients request exactly what they need
Public API for external developers	Familiar, easy to cache	Steeper learning curve
Complex, deeply nested data	Multiple round trips	Single query
File uploads	Native multipart support	Awkward (separate endpoint)
Real-time updates	Requires WebSockets separately	Built-in subscription support
Microservices aggregation	API gateway stitching	Federation composes schemas

GraphQL shines when:

Multiple client types (web, iOS, Android) need different data shapes
The data model is deeply relational
Teams want to iterate on the frontend without backend changes
You need to aggregate data from multiple services (via federation)

GraphQL is the wrong choice when:

The API is simple CRUD with few consumers
You need aggressive HTTP caching (GraphQL requests are usually sent as POST to a single endpoint, which standard HTTP caches and CDNs do not cache by default)
File upload/download is a primary use case
The team lacks GraphQL experience and the project is time-constrained

GitHub's Dual API Approach

GitHub is a prominent example of running REST and GraphQL side by side:

REST API (v3) — Stable, well-documented, used for simple operations. Every resource has a URL. Great for scripts, CLI tools, and simple integrations.
GraphQL API (v4) — Used for complex queries. Fetch a repository with its issues, PRs, contributors, and labels in one request. Powers GitHub's own web frontend.

GitHub didn't replace REST with GraphQL; it added GraphQL for use cases where REST was inefficient. The REST endpoints still exist, are maintained, and remain a sensible starting point for simple integrations.

Key lesson: REST and GraphQL are complementary, not competing. Choose based on the use case.

GraphQL Security Considerations

GraphQL's flexibility creates unique security challenges:

Query depth limiting — Prevent deeply nested queries that could cause exponential work: { user { friends { friends { friends { ... } } } } }
Query complexity analysis — Assign costs to fields and reject queries exceeding a budget
Rate limiting by query cost — Not all queries are equal. A simple { user { name } } is cheaper than { users(first: 100) { posts { comments { author } } } }
Introspection in production — Disable schema introspection in production to avoid leaking your full API surface
Persisted queries — In production, only allow pre-approved query hashes. Eliminates arbitrary query execution.
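Three of the safeguards above can be sketched in Python: depth limiting, cost budgeting, and a persisted-query allowlist. To keep the example self-contained, a query is represented as a nested dict of field names, which is an assumption made here. Real servers walk the parsed query AST. The field weights are also hypothetical, and production cost models usually multiply by list sizes such as `first: 100`.

```python
import hashlib

# Hypothetical per-field weights; unlisted fields cost 1
FIELD_COSTS = {"users": 10, "posts": 5, "comments": 5}


def depth(selection: dict) -> int:
    """Nesting depth of a selection; a leaf field ({}) contributes 1."""
    if not selection:
        return 0
    return 1 + max(depth(child) for child in selection.values())


def cost(selection: dict) -> int:
    """Additive cost: each field's weight plus the cost of its children."""
    return sum(FIELD_COSTS.get(name, 1) + cost(child) for name, child in selection.items())


def check_query(selection: dict, max_depth: int = 10, max_cost: int = 200) -> list:
    """Return a list of violations; an empty list means the query is accepted."""
    problems = []
    if depth(selection) > max_depth:
        problems.append("query too deep")
    if cost(selection) > max_cost:
        problems.append("query too expensive")
    return problems


def query_hash(query_text: str) -> str:
    return hashlib.sha256(query_text.encode("utf-8")).hexdigest()


# Persisted queries: only hashes registered ahead of time may run
APPROVED_HASHES = {query_hash("{ user { name } }")}


def is_persisted(query_text: str) -> bool:
    return query_hash(query_text) in APPROVED_HASHES


cheap = {"user": {"name": {}}}
expensive = {"users": {"posts": {"comments": {"author": {}}}}}

cheap_report = check_query(cheap)
expensive_report = check_query(expensive, max_depth=3, max_cost=20)
```

Running these checks before execution lets the server reject an abusive query without doing any of the work it asks for.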

Rust: async-graphql Example

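use async_graphql::{Context, Object, Result, Schema, SimpleObject, ID};

// Database, Post, MutationRoot, and SubscriptionRoot are assumed to be
// defined elsewhere in the application.

#[derive(SimpleObject)]
struct User {
    id: ID,
    name: String,
    email: String,
}

struct QueryRoot;

#[Object]
impl QueryRoot {
    async fn user(&self, ctx: &Context<'_>, id: ID) -> Result<Option<User>> {
        // Fetch the Database handle registered with .data() below
        let db = ctx.data::<Database>()?;
        // find_user is a hypothetical method assumed to return Result<Option<User>>
        db.find_user(id).await
    }

    async fn posts(&self, ctx: &Context<'_>, first: Option<i32>, after: Option<String>) -> Result<Vec<Post>> {
        // async-graphql ships Relay-style cursor pagination helpers
        // in its `connection` module
        todo!("not yet implemented")
    }
}

fn build_schema(database: Database) -> Schema<QueryRoot, MutationRoot, SubscriptionRoot> {
    Schema::build(QueryRoot, MutationRoot, SubscriptionRoot)
        .limit_depth(10)       // reject queries nested deeper than 10 levels
        .limit_complexity(200) // reject queries whose complexity exceeds 200
        .data(database)        // make the database available through Context
        .finish()
}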