Rate Limiting

The quota

The API enforces a per-user request quota over a sliding window. The current production setting is 240 requests per 60 seconds, with a short-term burst cap of 20 requests per 1 second enforced alongside it. Both must have headroom for a request to be allowed; either being empty produces a 429. The limiter is global across all endpoints: a request to Order Flow consumes the same bucket as a request to Net Drift.

Quota facts

240 requests per 60 seconds, sliding window.
20 requests per 1 second burst cap, enforced alongside the long-term quota. This stops a caller from spending the full minute's allowance inside a single second.
Counted per user, not per API key. Multiple keys on the same account share one bucket.
Every endpoint draws from the same bucket. There is no per-endpoint quota.
The values above are the current production setting and are subject to change. Read X-RateLimit-Limit at runtime rather than hard-coding the cap in the client.
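The way the two limits combine can be mirrored client-side to predict whether a request would be admitted. The sketch below is a minimal illustration that assumes a simple sliding-log model; the server's actual algorithm is not described here, and `DualLimiter` and `try_acquire` are hypothetical names.

```python
from collections import deque


class DualLimiter:
    """Client-side model: a request passes only if both limits have headroom."""

    def __init__(self, limit=240, window=60.0, burst_limit=20, burst_window=1.0):
        # Defaults mirror the values quoted above; they are subject to change.
        self.limit, self.window = limit, window
        self.burst_limit, self.burst_window = burst_limit, burst_window
        self.sent = deque()  # timestamps of admitted requests, oldest first

    def try_acquire(self, now):
        # Forget requests that have slid out of the long window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        # Count requests still inside the short burst window.
        in_burst = sum(1 for t in self.sent if now - t < self.burst_window)
        if len(self.sent) >= self.limit or in_burst >= self.burst_limit:
            return False
        self.sent.append(now)
        return True
```

Passing `now` explicitly (for example from `time.monotonic()`) keeps the model deterministic and easy to test.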

Response headers

Every response carries three X-RateLimit-* headers describing the user's live quota state. Watching them lets a client pace itself proactively rather than waiting to be rejected.

X-RateLimit headers

X-RateLimit-Limit: the request quota for the configured window.
X-RateLimit-Remaining: how many requests are left after this one, before the next rejection.
X-RateLimit-Reset: seconds until the bucket is fully refilled.
The headers are present on every response, whether the request was allowed or denied.
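A client will usually want these three values as integers in one place. The following is a minimal sketch, assuming the headers arrive as a plain string-to-string mapping; `QuotaState` and `parse_quota` are illustrative names, not part of the API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuotaState:
    limit: int          # X-RateLimit-Limit
    remaining: int      # X-RateLimit-Remaining
    reset_seconds: int  # X-RateLimit-Reset

    @property
    def used(self):
        """Requests already spent in the current window."""
        return self.limit - self.remaining


def parse_quota(headers):
    """Build a QuotaState from a mapping of response headers."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return QuotaState(
        limit=int(lowered["x-ratelimit-limit"]),
        remaining=int(lowered["x-ratelimit-remaining"]),
        reset_seconds=int(lowered["x-ratelimit-reset"]),
    )
```

With a limit of 240 and 217 remaining, `parse_quota(...).used` is 23.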

Example · allowed request

200 OK · 23 requests used

HTTP/1.1 200 OK
Content-Type: application/json
X-RateLimit-Limit: 240
X-RateLimit-Remaining: 217
X-RateLimit-Reset: 47

{ "data": { /* ... */ } }

The 429 response

When either limit is exhausted, the request is rejected with 429 Too Many Requests. The body is a standard RFC 9457 problem detail with five extension fields, and the response also carries the Retry-After header per RFC 7231, so clients have two ways to read the same number.

429 Too Many Requests

HTTP/1.1 429 Too Many Requests
Content-Type: application/problem+json
X-RateLimit-Limit: 240
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 55
Retry-After: 1

{
  "type": "https://quantdata.us/errors/rate-limit-exceeded",
  "title": "Rate Limit Exceeded",
  "status": 429,
  "detail": "Rate limit exceeded. Retry in 1 seconds.",
  "instance": "/v1/options/tool/gainers-losers",
  "limit": 240,
  "windowSeconds": 60,
  "burstLimit": 20,
  "burstWindowSeconds": 1,
  "retryAfterSeconds": 1
}

retryAfterSeconds in the body is the same value as the Retry-After header: two spellings of "wait this long". X-RateLimit-Remaining on a denied response is 0; the limit and reset headers are still populated so a client can keep its quota model current without parsing the body.

The detail message is intentionally neutral about which limit fired; instead, the body publishes both limit pairs (limit / windowSeconds for the long-term cap; burstLimit / burstWindowSeconds for the burst) so a client can infer the cause from Retry-After. A Retry-After of 1 almost always means the burst cap; larger values mean the long-term cap is exhausted.
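That inference can be written as a small heuristic. The sketch below assumes the problem-detail body shown above; `likely_cause` is a hypothetical helper, and its answer is a guess rather than something the API states, since a long-term rejection can also produce a short wait.

```python
import json


def likely_cause(problem_body):
    """Guess which limit rejected a request from its 429 problem-detail body."""
    problem = json.loads(problem_body)
    retry_after = problem["retryAfterSeconds"]
    # A wait no longer than the burst window points at the burst cap;
    # anything longer points at the long-term quota.
    if retry_after <= problem["burstWindowSeconds"]:
        return "burst"
    return "long-term"
```

A client might use the result only for logging or metrics; the correct reaction in both cases is still to wait Retry-After seconds.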

Backing off

A clean backoff is two rules: sleep at least Retry-After when you get a 429, and consider slowing down before the quota runs out.

React to 429

On a 429, sleep at least Retry-After seconds (or retryAfterSeconds from the body) before retrying. If multiple workers are paused on the same response, add a small random jitter (50–500 ms) to each; otherwise they all retry in lockstep and re-trip the limiter.
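As a minimal sketch of the delay calculation, assuming the jitter is drawn uniformly from the 50–500 ms range (the distribution is an assumption; `retry_delay` is an illustrative name):

```python
import random


def retry_delay(retry_after, rng=random):
    """Seconds to sleep after a 429: Retry-After plus 50-500 ms of jitter."""
    # retry_after may be the raw header string or the integer from the body.
    return float(retry_after) + rng.uniform(0.05, 0.5)
```

Because each worker draws its own jitter, workers that received the same Retry-After wake at slightly different times.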

Pace from the headers

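When X-RateLimit-Remaining drops near zero, sleep X-RateLimit-Reset / X-RateLimit-Remaining seconds between requests. For example, with 5 requests remaining and 30 seconds until reset, wait 6 seconds between requests. That stretches the remaining quota across the window and avoids the cliff entirely.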
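The pacing rule reduces to a single division. In this sketch the threshold of 10 for "near zero" is an assumption, not a value the API prescribes, and `pacing_interval` is a hypothetical name.

```python
def pacing_interval(remaining, reset_seconds, threshold=10):
    """Seconds to wait before the next request, or 0.0 if quota is comfortable."""
    if remaining >= threshold:
        return 0.0
    # Guard against division by zero when the bucket is already empty.
    return reset_seconds / max(remaining, 1)
```

With 217 requests remaining the function returns 0.0; with nothing remaining it returns the full reset time.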

Don't retry forever

Two or three consecutive 429s usually mean the workload exceeds the quota for this account. Cap retries and surface the error rather than spinning. Contact us if you need a higher cap.
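One way to cap retries is to bound the loop and raise once the bound is reached. This is a sketch under stated assumptions: `send` stands in for whatever function issues the request and returns a status code with headers, the default of three retries is illustrative, and jitter is omitted for brevity.

```python
import time


class RateLimitExhausted(Exception):
    """Raised when a request is still rejected after the allowed retries."""


def send_with_retries(send, max_retries=3, sleep=time.sleep):
    """Call send() until it is not a 429, giving up after max_retries retries.

    send is any zero-argument callable returning (status_code, headers).
    """
    for attempt in range(max_retries + 1):
        status, headers = send()
        if status != 429:
            return status, headers
        if attempt < max_retries:
            # Wait as instructed before the next attempt.
            sleep(float(headers["Retry-After"]))
    raise RateLimitExhausted(f"still rate limited after {max_retries} retries")
```

Raising a dedicated exception lets the caller distinguish "the account needs a higher cap" from transient failures.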

Pseudocode

Backoff sketch

import random
import time


def run_workload(workload, post):
    """Send each request via post(), backing off on 429 and pacing near the cap.

    post is any callable that takes a request and returns a response exposing
    .status_code and .headers (for example, a thin wrapper around an HTTP client).
    """
    responses = []
    for request in workload:
        response = post(request)

        if response.status_code == 429:
            # Sleep at least Retry-After seconds, plus 50-500 ms of jitter to
            # avoid a thundering herd when several workers paused together.
            time.sleep(int(response.headers["Retry-After"]) + random.uniform(0.05, 0.5))
            response = post(request)  # retry once after the sleep
        responses.append(response)

        # Optional proactive pacing: if Remaining gets low, slow down before
        # the limiter rejects us in the first place.
        remaining = int(response.headers["X-RateLimit-Remaining"])
        reset = int(response.headers["X-RateLimit-Reset"])
        if remaining < 10:
            time.sleep(reset / max(remaining, 1))
    return responses

Shared across endpoints

The bucket is a single counter per user. A request to /v1/options/tool/order-flow/consolidated consumes the same token that a request to /v1/equities/tool/equity-prints would have. There is no way to reserve quota for paginated endpoints at the expense of the others, or vice versa.

This means heavy workloads that mix endpoint families should plan their pacing against the total request volume, not per-endpoint volume. A walk of an Order Flow result set that needs 30 pages, plus an Equity Print walk that needs 20, plus three Heat Map calls is one 53-request workload against the 240-per-60s ceiling.
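The arithmetic for that example can be sketched as follows, taking the Heat Map calls as three, consistent with the 53-request total. `plan_workload` and the family labels are hypothetical, and the defaults are the current production values quoted earlier.

```python
import math


def plan_workload(requests_by_family, limit=240, burst_limit=20):
    """Summarise a mixed workload against the single shared bucket."""
    total = sum(requests_by_family.values())
    return {
        "total": total,
        "fits_in_one_window": total <= limit,
        "quota_left": max(limit - total, 0),
        # The burst cap forces the work to be spread over at least this
        # many one-second bursts.
        "min_bursts": math.ceil(total / burst_limit),
    }


plan = plan_workload({"order-flow": 30, "equity-prints": 20, "heat-map": 3})
```

Here `plan["total"]` is 53, which leaves 187 requests in the window and needs at least 3 one-second bursts under the 20-per-second cap.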

Where to go next

Errors

The full RFC 9457 problem shape and every error category.

Authentication

Bearer scheme and the six authentication-error responses.

Pagination

Cursor pagination; page through a result set without burning more quota than you need to.

Projection

Trim response payloads to just the fields you need; smaller payloads keep latency low.

Field Reference

Every filterable field on the options and equities surfaces.

Browse Endpoints

Browse every endpoint, grouped by surface.

Quickstart

Send a first authenticated request, then practice pacing from the headers.