Rate Limiting

The quota

The API enforces a per-user request quota over a sliding window. The current production setting is 240 requests per 60 seconds. The limiter is global across all endpoints: a request to Order Flow consumes the same bucket as a request to Net Drift.

Quota facts
  • 240 requests per 60 seconds, sliding window.
  • Counted per user, not per API key. Multiple keys on the same account share one bucket.
  • Every endpoint draws from the same bucket. There is no per-endpoint quota.
  • The values above are the current production setting and are subject to change. Read X-RateLimit-Limit at runtime rather than hard-coding the cap in the client.

Response headers

Every response carries three X-RateLimit-* headers describing the user's live quota state. Watching them lets a client pace itself proactively rather than waiting to be rejected.

X-RateLimit headers
  • X-RateLimit-Limit: the request quota for the configured window.
  • X-RateLimit-Remaining: how many more requests the user can make in the current window after this one before the limiter starts rejecting.
  • X-RateLimit-Reset: seconds until the bucket is fully refilled.
  • Headers ride on every response, both allowed and denied, so a client's quota model stays current even while it is being rejected.

Example · allowed request

200 OK · 23 requests used
HTTP/1.1 200 OK
Content-Type: application/json
X-RateLimit-Limit: 240
X-RateLimit-Remaining: 217
X-RateLimit-Reset: 47

{ "data": { /* ... */ } }
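A minimal sketch of reading these headers into a small quota model, so the cap is learned at runtime instead of being hard-coded. The names QuotaState and parse_quota are hypothetical helpers, not part of the API, and the header values are the ones from the example response above.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class QuotaState:
    """Snapshot of the three X-RateLimit-* headers (hypothetical helper type)."""
    limit: int          # X-RateLimit-Limit
    remaining: int      # X-RateLimit-Remaining
    reset_seconds: int  # X-RateLimit-Reset

    @property
    def used(self) -> int:
        # Requests already consumed in the current window.
        return self.limit - self.remaining


def parse_quota(headers: Mapping[str, str]) -> QuotaState:
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return QuotaState(
        limit=int(lowered["x-ratelimit-limit"]),
        remaining=int(lowered["x-ratelimit-remaining"]),
        reset_seconds=int(lowered["x-ratelimit-reset"]),
    )


# Headers copied from the 200 OK example above.
state = parse_quota({
    "Content-Type": "application/json",
    "X-RateLimit-Limit": "240",
    "X-RateLimit-Remaining": "217",
    "X-RateLimit-Reset": "47",
})
print(state.used)  # 23
```

Refreshing this snapshot on every response keeps the client aligned with the server if the production cap changes.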

The 429 response

When the bucket is empty, the request is rejected with 429 Too Many Requests. The body is a standard RFC 9457 problem detail with three extension fields (limit, windowSeconds, retryAfterSeconds). The response also carries the Retry-After header per RFC 7231, so clients have more than one way to read the same wait time.

429 Too Many Requests
HTTP/1.1 429 Too Many Requests
Content-Type: application/problem+json
X-RateLimit-Limit: 240
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 47
Retry-After: 3

{
  "type": "https://quantdata.us/errors/rate-limit-exceeded",
  "title": "Rate Limit Exceeded",
  "status": 429,
  "detail": "Rate limit of 240 requests per 60 seconds exceeded. Retry in 3 seconds.",
  "instance": "/v1/options/tool/gainers-losers",
  "limit": 240,
  "windowSeconds": 60,
  "retryAfterSeconds": 3
}

retryAfterSeconds in the body is the same value as the Retry-After header: two spellings of "wait this long". X-RateLimit-Remaining on a denied response is 0; the limit and reset headers are still populated so a client can keep its quota model current without parsing the body.
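As an illustration, the wait time can be read from either place. The sketch below prefers the Retry-After header and falls back to retryAfterSeconds in the body. The function name retry_delay_seconds is hypothetical, and the code assumes Retry-After is sent as whole seconds, as in the example above.

```python
import json
from typing import Mapping, Optional


def retry_delay_seconds(status: int, headers: Mapping[str, str], body: str) -> Optional[int]:
    """Return how long to wait after a 429, or None if the response is not a 429."""
    if status != 429:
        return None

    # Header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    header_value = lowered.get("retry-after", "").strip()
    if header_value.isdigit():
        return int(header_value)

    # Fall back to the problem-detail extension field in the body.
    problem = json.loads(body)
    return int(problem["retryAfterSeconds"])


# Values taken from the 429 example above (body trimmed to the relevant fields).
body = json.dumps({"status": 429, "limit": 240, "windowSeconds": 60, "retryAfterSeconds": 3})
print(retry_delay_seconds(429, {"Retry-After": "3"}, body))  # 3
print(retry_delay_seconds(429, {}, body))                    # 3
```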

Backing off

A clean backoff comes down to three habits: sleep at least Retry-After when you get a 429, consider slowing down before the quota runs out, and cap your retries.

  • React to 429: on a 429, sleep at least Retry-After seconds (or retryAfterSeconds from the body) before retrying. Add a small random jitter (50–500 ms) if you have multiple workers paused on the same response; otherwise they all retry in lockstep and re-trip the limiter.
  • Pace from the headers: when X-RateLimit-Remaining drops near zero, sleep X-RateLimit-Reset / X-RateLimit-Remaining seconds between requests. That stretches the remaining quota across the window and avoids the cliff entirely.
  • Don't retry forever: two or three consecutive 429s usually mean the workload exceeds the quota for this account. Cap retries and surface the error rather than spinning. Contact us if you need a higher cap.
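The two delay calculations described here can be sketched as small pure functions. The names pacing_interval and backoff_with_jitter are hypothetical, and the threshold of 10 remaining requests is an assumption borrowed from the backoff sketch in this section, not a value the API prescribes.

```python
import random
from typing import Optional


def pacing_interval(remaining: int, reset_seconds: float, threshold: int = 10) -> float:
    """Seconds to sleep between requests once the quota is nearly spent."""
    if remaining >= threshold:
        return 0.0  # plenty of quota left, no pacing needed
    # Spread what is left across the time until the bucket refills.
    # max(..., 1) avoids dividing by zero when nothing remains.
    return reset_seconds / max(remaining, 1)


def backoff_with_jitter(retry_after: float, rng: Optional[random.Random] = None) -> float:
    """Retry-After plus 50-500 ms of random jitter, in seconds."""
    if rng is None:
        rng = random.Random()
    return retry_after + rng.uniform(0.05, 0.5)


print(pacing_interval(remaining=217, reset_seconds=47))  # 0.0
print(pacing_interval(remaining=5, reset_seconds=40))    # 8.0
```

Because the jitter is drawn independently by each worker, workers that paused on the same 429 wake up at slightly different times.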

Pseudocode

Backoff sketch
import random
import time

# post() and workload are placeholders for your HTTP client and request queue.
for request in workload:
    response = post(request)

    if response.status_code == 429:
        # Sleep at least Retry-After seconds, plus 50-500 ms of jitter to avoid
        # a thundering herd if multiple workers all paused at the same time.
        sleep_seconds = int(response.headers["Retry-After"]) + random.uniform(0.05, 0.5)
        time.sleep(sleep_seconds)
        response = post(request)  # retry once after the sleep
        if response.status_code == 429:
            # Still limited: surface the error instead of spinning.
            raise RuntimeError("rate limit still exceeded after one retry")

    # Read the cap from the headers rather than hard-coding it.
    limit     = int(response.headers["X-RateLimit-Limit"])  # 240 at the time of writing
    remaining = int(response.headers["X-RateLimit-Remaining"])
    reset     = int(response.headers["X-RateLimit-Reset"])

    # Optional proactive pacing: if Remaining gets low, slow down before
    # the limiter rejects us in the first place.
    if remaining < 10:
        time.sleep(reset / max(remaining, 1))
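The sketch above retries once. A variant with a configurable retry cap might look like the following. Everything here is illustrative: send_with_retries, RateLimitExhausted, and the (status, headers) tuple shape are hypothetical, and send stands in for whatever HTTP call your client makes.

```python
import random
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str]]  # (status code, headers)


class RateLimitExhausted(RuntimeError):
    """Raised when every allowed attempt was answered with a 429."""


def send_with_retries(
    send: Callable[[], Response],
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    # One initial attempt plus up to max_retries retries.
    for attempt in range(max_retries + 1):
        status, headers = send()
        if status != 429:
            return status, headers
        if attempt == max_retries:
            break  # out of retries, do not sleep again
        # Wait at least Retry-After seconds, plus 50-500 ms of jitter.
        # Assumes a plain dict keyed exactly "Retry-After"; real client
        # libraries usually provide case-insensitive header lookup.
        sleep(int(headers["Retry-After"]) + random.uniform(0.05, 0.5))
    raise RateLimitExhausted(f"still rate limited after {max_retries} retries")
```

Passing sleep as a parameter keeps the function easy to exercise in tests without real waiting, and raising a dedicated exception makes the "cap retries and surface the error" rule explicit to callers.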

Shared across endpoints

The bucket is a single counter per user. A request to /v1/options/tool/order-flow/consolidated consumes the same token that a request to /v1/equities/tool/equity-prints would have. There is no way to set quota aside for paginated endpoints, or for any other endpoint family.

This means heavy workloads that mix endpoint families should plan their pacing against the total request volume, not per-endpoint volume. A walk of an Order Flow result set that needs 30 pages, plus an Equity Print walk that needs 20, plus three Heat Map calls, is a single 53-request workload (30 + 20 + 3) against the 240-per-60s ceiling.
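To make the arithmetic concrete, the sketch below totals a mixed workload and estimates a rough lower bound on how long it must take. It reads "a few" Heat Map calls as three, which is what the 53-request total implies. The function names are hypothetical, and the lower bound assumes the limiter admits at most the full cap in any one window and that the client sends as fast as it is allowed to.

```python
from typing import Mapping


def total_requests(requests_per_family: Mapping[str, int]) -> int:
    # Every endpoint draws from the same bucket, so only the sum matters.
    return sum(requests_per_family.values())


def min_duration_seconds(total: int, limit: int = 240, window_seconds: int = 60) -> int:
    """Rough lower bound on wall-clock time for `total` requests.

    The first `limit` requests can go out immediately; each further batch
    of `limit` has to wait for another window to pass.
    """
    if total <= 0:
        return 0
    return ((total - 1) // limit) * window_seconds


workload = {"order_flow": 30, "equity_prints": 20, "heat_map": 3}
total = total_requests(workload)
print(total)                        # 53
print(min_duration_seconds(total))  # 0
print(min_duration_seconds(600))    # 120
```

The 53-request workload fits inside a single window, while a 600-request workload cannot finish in under roughly two minutes no matter how it is split across endpoints.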

Where to go next