Filtering, Sorting & Search

Letting Clients Get What They Need

Most API consumers do not need every record in a collection. Filtering, sorting, and search let clients narrow results to exactly what they need, reducing bandwidth, improving response times, and simplifying client-side logic.

Filtering

Filtering restricts the result set based on field values. The simplest and most common pattern uses query parameters.

Basic Field Filtering

GET /api/v1/orders?status=active
GET /api/v1/orders?status=active&currency=usd
GET /api/v1/users?role=admin

Each query parameter maps to a field on the resource. The server applies an equality check: return only records where the field matches the value.
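The equality check described above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: FILTERABLE_FIELDS, parse_equality_filters, and apply_filters are hypothetical names, and the in-memory list stands in for a database.

```python
from urllib.parse import parse_qs

# Hypothetical allowlist of fields that clients may filter on.
FILTERABLE_FIELDS = {"status", "currency"}


def parse_equality_filters(query_string: str) -> dict[str, str]:
    """Turn 'status=active&currency=usd' into a field -> value mapping."""
    filters = {}
    for name, values in parse_qs(query_string, keep_blank_values=True).items():
        if name not in FILTERABLE_FIELDS:
            raise ValueError(f"Unknown filter parameter: {name!r}")
        filters[name] = values[-1]  # if a parameter repeats, the last value wins
    return filters


def apply_filters(records: list[dict], filters: dict[str, str]) -> list[dict]:
    """Keep only records where every filtered field equals the requested value."""
    return [
        r for r in records
        if all(str(r.get(field)) == value for field, value in filters.items())
    ]


orders = [
    {"id": 1, "status": "active", "currency": "usd"},
    {"id": 2, "status": "active", "currency": "eur"},
    {"id": 3, "status": "cancelled", "currency": "usd"},
]
matching = apply_filters(orders, parse_equality_filters("status=active&currency=usd"))
print([o["id"] for o in matching])
```

Multiple parameters combine with AND, so only order 1 matches both conditions.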

Range Filters

For numeric and date fields, equality is rarely enough. Support range operators with suffixed parameter names:

GET /api/v1/orders?created_after=2024-01-01&created_before=2024-03-31
GET /api/v1/products?price_min=10&price_max=50
GET /api/v1/users?age_gte=18&age_lte=65

Common suffix conventions (_min and _max, as in the price example, are usually read as inclusive bounds like _gte and _lte):

_gt      greater than
_gte     greater than or equal
_lt      less than
_lte     less than or equal
_after   alias for _gt on date fields
_before  alias for _lt on date fields

Stripe expresses the same operators with bracket syntax instead of suffixes when filtering by date:

GET /v1/charges?created[gte]=1609459200&created[lte]=1640995199
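One way to handle the suffix conventions is a lookup table from suffix to comparison. The sketch below assumes the values have already been parsed to numbers; SUFFIX_OPERATORS, split_range_param, and matches are illustrative names rather than part of any framework.

```python
import operator

# Suffix -> comparison function, following the suffix conventions listed earlier.
SUFFIX_OPERATORS = {
    "_gte": operator.ge,
    "_lte": operator.le,
    "_gt": operator.gt,
    "_lt": operator.lt,
}


def split_range_param(name: str):
    """Split 'age_gte' into ('age', operator.ge); plain names mean equality."""
    for suffix, compare in SUFFIX_OPERATORS.items():
        if name.endswith(suffix):
            return name[: -len(suffix)], compare
    return name, operator.eq


def matches(record: dict, params: dict) -> bool:
    """True when the record satisfies every range condition."""
    for name, bound in params.items():
        field, compare = split_range_param(name)
        if not compare(record[field], bound):
            return False
    return True


users = [
    {"name": "Ana", "age": 17},
    {"name": "Ben", "age": 18},
    {"name": "Cy", "age": 70},
]
adults = [u["name"] for u in users if matches(u, {"age_gte": 18, "age_lte": 65})]
print(adults)
```

A bracket syntax like Stripe's could feed the same table by keying on the bracketed operator name instead of a suffix.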

Multiple Values (IN Filter)

Allow comma-separated values for "any of" queries:

GET /api/v1/orders?status=pending,processing,shipped

The server interprets this as: return orders where status is pending OR processing OR shipped.
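On the server, a comma-separated value usually becomes a SQL IN clause. A minimal sketch, assuming the column name comes from a server-side allowlist (never from raw client input) and that the database driver uses ? placeholders; build_in_clause is a hypothetical helper.

```python
def build_in_clause(column: str, raw_value: str) -> tuple[str, list[str]]:
    """Turn 'pending,processing' into ('status IN (?, ?)', ['pending', 'processing'])."""
    values = [v.strip() for v in raw_value.split(",") if v.strip()]
    if not values:
        raise ValueError(f"No values supplied for {column!r}")
    # One placeholder per value keeps the values out of the SQL text itself.
    placeholders = ", ".join("?" for _ in values)
    return f"{column} IN ({placeholders})", values


clause, params = build_in_clause("status", "pending,processing,shipped")
print(clause)
print(params)
```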

Null & Existence Checks

GET /api/v1/users?email=null          (users with no email)
GET /api/v1/users?email=!null         (users with an email set)
GET /api/v1/orders?coupon_code=exists  (orders that used a coupon)
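These sentinel values need special handling because SQL compares NULL with IS NULL rather than =. The sketch below is one possible mapping; build_null_clause is a hypothetical helper, and it assumes the column name has already been validated. Note that a sentinel like null makes it impossible to filter for the literal string "null", which is a trade-off of this convention.

```python
from typing import Optional


def build_null_clause(column: str, raw_value: str) -> Optional[str]:
    """Map sentinel values to SQL; return None for ordinary values."""
    if raw_value == "null":
        return f"{column} IS NULL"
    if raw_value in ("!null", "exists"):
        return f"{column} IS NOT NULL"
    return None  # caller falls back to a normal equality filter


print(build_null_clause("email", "null"))
print(build_null_clause("coupon_code", "exists"))
```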

Boolean Filters

GET /api/v1/users?verified=true
GET /api/v1/products?in_stock=false

Real-World Example: GitHub Issues API

GitHub provides rich filtering on its issues endpoint:

GET /repos/octocat/hello-world/issues?state=open&labels=bug,urgent&assignee=octocat&since=2024-01-01T00:00:00Z&sort=created&direction=desc

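This returns open issues that carry both the "bug" and "urgent" labels, are assigned to octocat, and have been updated since January 1, 2024, sorted by creation date descending. Note that GitHub treats multiple labels as AND, not OR, and that since filters on the last update time rather than the creation time.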

Sorting

Sorting controls the order of results. Most APIs support sorting by one or more fields.

Single-Field Sorting

GET /api/v1/users?sort=created_at&order=desc
GET /api/v1/products?sort=price&order=asc

Alternative Sort Syntax

Some APIs use a more compact syntax with a prefix for direction:

GET /api/v1/users?sort=-created_at        (descending)
GET /api/v1/users?sort=name               (ascending, default)

The - prefix for descending is used by JSON:API and several other API standards.

Multi-Field Sorting

Sort by multiple fields to break ties:

GET /api/v1/orders?sort=status,-created_at

This sorts by status ascending first, then by creation date descending within each status group.
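Parsing the compact syntax is straightforward. A minimal sketch, with parse_sort as a hypothetical helper that returns (field, direction) pairs in priority order:

```python
def parse_sort(sort_param: str) -> list[tuple[str, str]]:
    """Parse 'status,-created_at' into [('status', 'ASC'), ('created_at', 'DESC')]."""
    keys = []
    for token in sort_param.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate stray commas
        if token.startswith("-"):
            keys.append((token[1:], "DESC"))
        else:
            keys.append((token, "ASC"))
    return keys


print(parse_sort("status,-created_at"))
```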

Sort Stability

Always include a unique field (like id) as the final sort key, even if the client does not request it. This ensures consistent ordering when the primary sort key has duplicate values — critical for reliable pagination.

Client requests:  sort=-created_at
Server applies:   ORDER BY created_at DESC, id DESC
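A sketch of how a server might append the tiebreaker, assuming sort keys are already parsed into (field, direction) pairs. with_tiebreaker and to_order_by are hypothetical helpers, and reusing the last key's direction for id is one reasonable choice, not a requirement.

```python
def with_tiebreaker(sort_keys, unique_field="id"):
    """Append a unique field as the final sort key unless it is already present."""
    if any(field == unique_field for field, _ in sort_keys):
        return list(sort_keys)
    # Follow the direction of the last requested key; default to ascending.
    direction = sort_keys[-1][1] if sort_keys else "ASC"
    return list(sort_keys) + [(unique_field, direction)]


def to_order_by(sort_keys) -> str:
    """Render (field, direction) pairs as an ORDER BY clause."""
    return "ORDER BY " + ", ".join(f"{field} {direction}" for field, direction in sort_keys)


print(to_order_by(with_tiebreaker([("created_at", "DESC")])))
```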

Allowed Sort Fields

Not every field should be sortable. Sorting is only fast when an index supports it, and sorting by unindexed fields on large tables causes slow queries. Explicitly allowlist sortable fields and reject unknown ones with an error like this:

{
  "error": {
    "type": "invalid_parameter",
    "status": 400,
    "message": "Cannot sort by 'biography'. Allowed sort fields: created_at, name, email, updated_at."
  }
}
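The check that produces such an error can be sketched as follows. ALLOWED_SORT_FIELDS and validate_sort_field are hypothetical names; the error shape mirrors the JSON shown here.

```python
from typing import Optional

# Hypothetical allowlist; in practice this would match the indexed columns.
ALLOWED_SORT_FIELDS = ["created_at", "name", "email", "updated_at"]


def validate_sort_field(field: str) -> Optional[dict]:
    """Return None when the field is sortable, otherwise a 400-style error body."""
    name = field[1:] if field.startswith("-") else field  # drop the descending prefix
    if name in ALLOWED_SORT_FIELDS:
        return None
    allowed = ", ".join(ALLOWED_SORT_FIELDS)
    return {
        "error": {
            "type": "invalid_parameter",
            "status": 400,
            "message": f"Cannot sort by '{name}'. Allowed sort fields: {allowed}.",
        }
    }


print(validate_sort_field("-created_at"))
print(validate_sort_field("biography"))
```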

Search

Search is fundamentally different from filtering. Filtering matches exact or range values on specific fields. Search matches text across one or more fields, often with relevance ranking.

Simple Text Search

GET /api/v1/users?q=john

The q parameter is the conventional name for search queries. The server searches across relevant text fields (name, email, bio) and returns matches ranked by relevance.

Search vs Filter

Search and filtering answer different questions, but they can be combined in one request:

GET /api/v1/users?q=john&role=admin&sort=-created_at

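This searches for "john" among admin users, sorted by creation date descending. The search and filter conditions are combined with AND, so the order in which the server evaluates them does not change the result, and the explicit sort replaces relevance ranking.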
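A small runnable sketch of that combination, using an in-memory SQLite table. The schema, sample rows, and search_users function are assumptions for illustration, and the case-insensitive LIKE is only a stand-in for a real search backend.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, role TEXT, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO users (name, role, created_at) VALUES (?, ?, ?)",
    [
        ("John Adams", "admin", "2024-01-05"),
        ("Johnny Cash", "member", "2024-02-01"),
        ("Mary Johnson", "admin", "2024-03-10"),
        ("Alice Wong", "admin", "2024-04-01"),
    ],
)


def search_users(q: str, role: str) -> list[str]:
    """Apply the text search and the role filter in a single WHERE clause."""
    # Wildcard characters in q (% and _) are not escaped in this sketch.
    rows = conn.execute(
        "SELECT name FROM users "
        "WHERE LOWER(name) LIKE ? AND role = ? "
        "ORDER BY created_at DESC, id DESC",
        (f"%{q.lower()}%", role),
    )
    return [name for (name,) in rows]


print(search_users("john", "admin"))
```

Here "Johnny Cash" matches the search but is excluded by the role filter, and the remaining matches come back newest first.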

Full-Text Search Considerations

Simple LIKE '%john%' queries work for small datasets but do not scale. For production search:

Use a dedicated search engine (Elasticsearch, Typesense, Meilisearch)
Support quoted phrases: q="john smith"
Handle typos and fuzzy matching
Return relevance scores if useful to the client

GitHub Code Search

GitHub's code search API demonstrates a sophisticated search interface:

GET /search/code?q=addClass+in:file+language:js+repo:jquery/jquery

This searches for "addClass" in files, filtered to JavaScript, in the jQuery repository. The search syntax is powerful but specific to GitHub's use case.

Field Selection (Sparse Fieldsets)

Let clients request only the fields they need. This reduces payload size and database load.

Basic Field Selection

GET /api/v1/users?fields=id,name,email

{
  "data": [
    {"id": "user_123", "name": "Jane Smith", "email": "jane@example.com"},
    {"id": "user_456", "name": "John Doe", "email": "john@example.com"}
  ]
}

Without field selection, the response might include 20 fields per user. With it, the client gets only what it needs.
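At its simplest, field selection is a projection applied to each record before serialization. A minimal sketch, with select_fields as a hypothetical helper that silently skips fields a record does not have (a stricter API might reject unknown field names instead):

```python
from typing import Optional


def select_fields(records: list[dict], fields_param: Optional[str]) -> list[dict]:
    """Project each record down to the comma-separated fields requested."""
    if not fields_param:
        return records  # no selection requested: return records unchanged
    wanted = [f.strip() for f in fields_param.split(",") if f.strip()]
    return [{f: record[f] for f in wanted if f in record} for record in records]


users = [
    {"id": "user_123", "name": "Jane Smith", "email": "jane@example.com", "bio": "..."},
    {"id": "user_456", "name": "John Doe", "email": "john@example.com", "bio": "..."},
]
print(select_fields(users, "id,name,email"))
```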

Google APIs Approach

Google uses the fields parameter with parenthesized sub-selections for nested objects (a slash-separated path such as messages/id also works):

GET /gmail/v1/users/me/messages?fields=messages(id,snippet,labelIds)

JSON:API Sparse Fieldsets

JSON:API uses a per-type syntax:

GET /api/v1/articles?fields[articles]=title,body&fields[author]=name

Performance Benefits

Field selection is not just about bandwidth. If the server maps selected fields to the SQL query, it can avoid expensive joins or computed fields:

fields=id,name       -> SELECT id, name FROM users (fast)
fields=id,name,stats -> SELECT id, name, ... + join to stats table (slower)
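One way to realize this mapping is to keep a table of API fields to SQL columns and add joins only when a requested field needs them. The table names, the login_count column, and build_select below are all hypothetical.

```python
# Hypothetical mapping from API field to SQL column; "stats" lives in another table.
COLUMNS = {
    "id": "users.id",
    "name": "users.name",
    "stats": "user_stats.login_count",
}
JOINS = {"stats": "LEFT JOIN user_stats ON user_stats.user_id = users.id"}


def build_select(fields: list[str]) -> str:
    """Build a SELECT that touches only the columns and joins the client asked for."""
    unknown = [f for f in fields if f not in COLUMNS]
    if unknown:
        raise ValueError(f"Unknown fields: {', '.join(unknown)}")
    columns = ", ".join(COLUMNS[f] for f in fields)
    joins = " ".join(JOINS[f] for f in fields if f in JOINS)
    sql = f"SELECT {columns} FROM users"
    return f"{sql} {joins}" if joins else sql


print(build_select(["id", "name"]))
print(build_select(["id", "name", "stats"]))
```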

Nested Filtering

For APIs with related resources, allow filtering on nested object properties:

GET /api/v1/orders?customer.country=US
GET /api/v1/articles?author.name=John

Implementation Considerations

Nested filtering requires joins, which adds complexity:

GET /api/v1/orders?customer.country=US

SQL: SELECT orders.* FROM orders
     JOIN customers ON orders.customer_id = customers.id
     WHERE customers.country = 'US'

Limit nesting to one level deep. Deeply nested filters (?order.customer.address.city=London) create complex queries and make the API harder to understand.
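The one-level limit is easy to enforce while translating the parameter. The sketch below assumes a hypothetical RELATIONS table describing which relations and columns the orders resource exposes; build_nested_filter is likewise illustrative.

```python
# Hypothetical description of the relations that orders may be filtered through.
RELATIONS = {
    "customer": {
        "table": "customers",
        "on": "orders.customer_id = customers.id",
        "columns": {"country"},
    }
}


def build_nested_filter(param: str, value: str) -> tuple[str, list[str]]:
    """Translate 'customer.country' into a parameterized join query."""
    parts = param.split(".")
    if len(parts) != 2:
        raise ValueError("Nested filters must be exactly one level deep")
    relation, column = parts
    spec = RELATIONS.get(relation)
    if spec is None or column not in spec["columns"]:
        raise ValueError(f"Cannot filter on {param!r}")
    sql = (
        f"SELECT orders.* FROM orders "
        f"JOIN {spec['table']} ON {spec['on']} "
        f"WHERE {spec['table']}.{column} = ?"
    )
    return sql, [value]


sql, params = build_nested_filter("customer.country", "US")
print(sql)
print(params)
```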

Keep It Simple

The temptation is to build a query language. Resist it unless you are building a query API.

The Slippery Slope

Level 1: ?status=active                        (good)
Level 2: ?status=active&created_after=2024-01  (good)
Level 3: ?filter[status][eq]=active             (getting complex)
Level 4: ?filter=status eq 'active' and (price gt 10 or category in ('a','b'))  (OData)
Level 5: Custom query DSL with nested boolean logic  (you built a database)

Most APIs should stop at Level 2. If clients need Level 4+ query capabilities, consider GraphQL or a dedicated query endpoint.

When Complex Filtering Is Justified

Analytics APIs where ad-hoc querying is the core use case
Search-as-a-service APIs (Algolia, Elasticsearch)
Data warehouse APIs where users build custom reports

For these, a structured filter object in a POST request body is cleaner than overloading query parameters.
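As an illustration of what such a filter object might look like, the sketch below evaluates a small tree of and/or nodes against in-memory records. The node shape ("field", "op", "value") and the operator names are assumptions, not a standard; a real service would translate the tree to its query engine rather than filter in Python.

```python
import operator

# Hypothetical operator names; a real API would document its own set.
OPERATORS = {
    "eq": operator.eq,
    "gt": operator.gt,
    "lt": operator.lt,
    "in": lambda value, options: value in options,
}


def evaluate(node: dict, record: dict) -> bool:
    """Recursively evaluate a filter tree against one record."""
    if "and" in node:
        return all(evaluate(child, record) for child in node["and"])
    if "or" in node:
        return any(evaluate(child, record) for child in node["or"])
    compare = OPERATORS[node["op"]]
    return compare(record[node["field"]], node["value"])


# status eq 'active' and (price gt 10 or category in ('a', 'b'))
request_body = {
    "and": [
        {"field": "status", "op": "eq", "value": "active"},
        {"or": [
            {"field": "price", "op": "gt", "value": 10},
            {"field": "category", "op": "in", "value": ["a", "b"]},
        ]},
    ]
}

products = [
    {"id": 1, "status": "active", "price": 5, "category": "a"},
    {"id": 2, "status": "active", "price": 5, "category": "c"},
    {"id": 3, "status": "inactive", "price": 50, "category": "a"},
]
print([p["id"] for p in products if evaluate(request_body, p)])
```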

Common Pitfalls

Allowing filters on unindexed fields. Filtering on a column without an index causes a full table scan. Only expose filters for indexed fields, and document which fields are filterable.

Silently ignoring unknown filter parameters. If a client sends ?stauts=active (typo), silently ignoring it returns all records. Validate parameter names and return 400 for unknown ones.

Not validating filter values. ?price_min=abc should return a 400 error, not a database error or empty results. Validate types before querying.
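A sketch of type validation for a numeric filter, run before any query is built. parse_price is a hypothetical helper; the ValueError it raises would be translated into a 400 response by the surrounding framework.

```python
from decimal import Decimal, InvalidOperation


def parse_price(name: str, raw: str) -> Decimal:
    """Parse a price filter value, rejecting anything that is not a finite, non-negative number."""
    try:
        value = Decimal(raw)
    except InvalidOperation:
        raise ValueError(f"{name} must be a number, got {raw!r}") from None
    if not value.is_finite() or value < 0:
        raise ValueError(f"{name} must be a finite, non-negative number, got {raw!r}")
    return value


print(parse_price("price_min", "10"))
try:
    parse_price("price_min", "abc")
except ValueError as exc:
    print(exc)
```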

Building a query language when you do not need one. Simple equality and range filters cover the large majority of use cases. Adding boolean operators, nested conditions, and custom syntax increases complexity for both the server and the client.

Returning all fields by default. If your resource has 50 fields including expensive computed ones, returning everything by default wastes bandwidth and server resources. Consider a reasonable default field set, with opt-in for additional fields.

Making search case-sensitive. Users expect ?q=john to find "John." Use case-insensitive search by default.

Key Takeaways

Use query parameters for filtering with simple equality, range, and multi-value patterns; follow conventions like Stripe's bracket syntax or suffix-based operators.
Support sorting with explicit ascending/descending direction; always add a unique tiebreaker for stable pagination.
Keep search separate from filtering; use the q parameter for text search and dedicated search infrastructure for scale.
Offer field selection to reduce payload size and server load; map selected fields to the database query when possible.
Keep the query interface simple for most APIs; reserve complex filter DSLs for analytics or search-focused APIs where ad-hoc querying is the core use case.