Rate Limiting
The quota
The API enforces a per-user request quota over a sliding window. The current production setting is 240 requests per 60 seconds. The limiter is global across all endpoints: a request to Order Flow consumes the same bucket as a request to Net Drift.
- 240 requests per 60 seconds, sliding window.
- Counted per user, not per API key. Multiple keys on the same account share one bucket.
- Every endpoint draws from the same bucket. There is no per-endpoint quota.
- The values above are the current production setting and are subject to change. Read X-RateLimit-Limit at runtime rather than hard-coding the cap in the client.
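To make the sliding-window idea concrete, here is a minimal client-side sketch that keeps a local log of request timestamps and reports whether one more request would fit in the window. It shows how a client might mirror the quota locally; it does not describe how the server's limiter is implemented. `SlidingWindowBudget` and its methods are hypothetical names, and the limit and window are passed in rather than hard-coded.

```python
import time
from collections import deque


class SlidingWindowBudget:
    """Client-side mirror of a sliding-window quota (hypothetical helper)."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self._clock = clock
        self._sent = deque()  # timestamps of requests still inside the window

    def _expire(self, now):
        # Drop timestamps that have slid out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()

    def try_acquire(self):
        """Record a request and return True if it fits in the window."""
        now = self._clock()
        self._expire(now)
        if len(self._sent) >= self.limit:
            return False
        self._sent.append(now)
        return True

    def seconds_until_free(self):
        """Time until the oldest recorded request leaves the window."""
        now = self._clock()
        self._expire(now)
        if len(self._sent) < self.limit:
            return 0.0
        return self.window_seconds - (now - self._sent[0])
```

A client would typically construct this with the value read from X-RateLimit-Limit and treat the server's headers as the source of truth whenever the two disagree.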
Response headers
Every response carries three X-RateLimit-* headers describing the user's live quota state. Watching them lets a client pace itself proactively rather than waiting to be rejected.
- X-RateLimit-Limit: the request quota for the configured window.
- X-RateLimit-Remaining: how many requests are left after this one, before the next rejection.
- X-RateLimit-Reset: seconds until the bucket is fully refilled.
- Headers ride on every response, both allowed and denied. Watching them lets a client pace itself without ever hitting a 429.
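One way a client might read the three headers listed above is to parse them into a small typed record. This is a sketch, not an official client: `QuotaState` and `parse_quota_headers` are hypothetical names, and the sample values are illustrative.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class QuotaState:
    limit: int      # X-RateLimit-Limit
    remaining: int  # X-RateLimit-Remaining
    reset: int      # X-RateLimit-Reset, in seconds


def parse_quota_headers(headers: Mapping[str, str]) -> QuotaState:
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return QuotaState(
        limit=int(lowered["x-ratelimit-limit"]),
        remaining=int(lowered["x-ratelimit-remaining"]),
        reset=int(lowered["x-ratelimit-reset"]),
    )


state = parse_quota_headers({
    "X-RateLimit-Limit": "240",
    "X-RateLimit-Remaining": "217",
    "X-RateLimit-Reset": "47",
})
print(state)  # QuotaState(limit=240, remaining=217, reset=47)
```

Most HTTP libraries already expose headers through a case-insensitive mapping; the explicit lowercasing only matters when a plain dict is passed in.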
Example · allowed request
HTTP/1.1 200 OK
Content-Type: application/json
X-RateLimit-Limit: 240
X-RateLimit-Remaining: 217
X-RateLimit-Reset: 47
{ "data": { /* ... */ } }
The 429 response
When the bucket is empty, the request is rejected with 429 Too Many Requests. The body is the standard RFC 9457 problem detail with three extension fields, plus the Retry-After header per RFC 7231 so clients have multiple ways to read the same number.
HTTP/1.1 429 Too Many Requests
Content-Type: application/problem+json
X-RateLimit-Limit: 240
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 47
Retry-After: 3
{
"type": "https://quantdata.us/errors/rate-limit-exceeded",
"title": "Rate Limit Exceeded",
"status": 429,
"detail": "Rate limit of 240 requests per 60 seconds exceeded. Retry in 3 seconds.",
"instance": "/v1/options/tool/gainers-losers",
"limit": 240,
"windowSeconds": 60,
"retryAfterSeconds": 3
}
retryAfterSeconds in the body is the same value as the Retry-After header: two spellings of "wait this long". X-RateLimit-Remaining on a denied response is 0; the limit and reset headers are still populated so a client can keep its quota model current without parsing the body.
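Because the wait time appears in both the header and the body, a client can prefer one and fall back to the other. The sketch below is one possible approach: `retry_delay_seconds` is a hypothetical helper, and it assumes Retry-After arrives as integer seconds, as in the example above. HTTP also permits a date in that header, in which case this sketch falls back to the body field.

```python
import json
from typing import Mapping


def retry_delay_seconds(headers: Mapping[str, str], body: str) -> int:
    """Return the server-requested wait for a 429, preferring the header."""
    lowered = {name.lower(): value for name, value in headers.items()}
    header_value = lowered.get("retry-after")
    if header_value is not None and header_value.strip().isdigit():
        return int(header_value)
    # Fall back to the retryAfterSeconds extension field in the problem detail.
    problem = json.loads(body)
    return int(problem["retryAfterSeconds"])


print(retry_delay_seconds({"Retry-After": "3"}, "{}"))          # 3
print(retry_delay_seconds({}, '{"retryAfterSeconds": 3}'))      # 3
```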
Backing off
A clean backoff is two rules: sleep at least Retry-After when you get a 429, and consider slowing down before the quota runs out.
- On a 429, sleep at least Retry-After seconds (or retryAfterSeconds from the body) before retrying. Add a small random jitter (50–500 ms) if you have multiple workers paused on the same response; otherwise they all retry in lockstep and re-trip the limiter.
- When X-RateLimit-Remaining drops near zero, sleep X-RateLimit-Reset / X-RateLimit-Remaining seconds between requests. That stretches the remaining quota across the window and avoids the cliff entirely.
- A sustained run of 429s usually means the workload exceeds the quota for this account. Cap retries and surface the error rather than spinning. Contact us if you need a higher cap.
Pseudocode
import random
import time

# workload and post() are placeholders for your own request list and HTTP client.
limit = None  # learned from the first response rather than hard-coded

for request in workload:
    response = post(request)
    if limit is None:
        # Read the cap once, from the first response.
        limit = int(response.headers["X-RateLimit-Limit"])  # 240 at the time of writing
    if response.status_code == 429:
        # Sleep at least Retry-After seconds, plus 50-500 ms of jitter to avoid
        # a thundering herd if multiple workers all paused at the same time.
        sleep_seconds = int(response.headers["Retry-After"]) + random.uniform(0.05, 0.5)
        time.sleep(sleep_seconds)
        response = post(request)  # retry once after the sleep
    # Optional proactive pacing: if Remaining gets low, slow down before
    # the limiter rejects us in the first place.
    remaining = int(response.headers["X-RateLimit-Remaining"])
    reset = int(response.headers["X-RateLimit-Reset"])
    if remaining < 10:
        time.sleep(reset / max(remaining, 1))
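The pseudocode retries once. To follow the advice about capping retries and surfacing the error, a bounded retry loop could look like the sketch below. It rests on a few assumptions: `send_with_backoff` and `RateLimitExhausted` are hypothetical names, a response is modelled as a plain `(status_code, headers)` tuple instead of a real client's response object, and `sleep` is injectable so the loop can be exercised without actually waiting.

```python
import random
import time
from typing import Callable, Mapping, Tuple

# A response is modelled as (status_code, headers); a real client would
# return its own response object instead.
Response = Tuple[int, Mapping[str, str]]


class RateLimitExhausted(Exception):
    """Raised when retries are used up and the request is still rejected."""


def send_with_backoff(
    send: Callable[[], Response],
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    for attempt in range(max_retries + 1):
        status, headers = send()
        if status != 429:
            return status, headers
        if attempt == max_retries:
            break  # out of retries; surface the error instead of spinning
        # Wait at least Retry-After, plus 50-500 ms of jitter.
        sleep(int(headers["Retry-After"]) + random.uniform(0.05, 0.5))
    raise RateLimitExhausted(f"still rate limited after {max_retries} retries")
```

Raising a dedicated exception keeps the "quota is too small for this workload" case visible to the caller rather than hiding it behind ever-longer sleeps.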
Shared across endpoints
The bucket is a single counter per user. A request to /v1/options/tool/order-flow/consolidated consumes a token from the same bucket as a request to /v1/equities/tool/equity-prints. There is no way to reserve quota for paginated endpoints, or vice versa.
This means heavy workloads that mix endpoint families should plan their pacing against the total request volume, not per-endpoint volume. A walk of an Order Flow result set that needs 30 pages, plus an Equity Print walk that needs 20, plus three Heat Map calls is one 53-request workload against the 240-per-60s ceiling.
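The arithmetic behind that example can be checked with a few lines. This is a sketch with hypothetical dictionary keys; it assumes three Heat Map calls, which is what the 53-request total implies, and it hard-codes the limit only for illustration (a real client would read it from X-RateLimit-Limit).

```python
# Planned calls per endpoint family, taken from the example above.
planned_requests = {
    "order_flow_pages": 30,
    "equity_print_pages": 20,
    "heat_map_calls": 3,  # assumed: 53 - 30 - 20
}

LIMIT = 240          # would normally be read from X-RateLimit-Limit
WINDOW_SECONDS = 60

total = sum(planned_requests.values())
headroom = LIMIT - total
# Spacing that keeps a steady stream of requests exactly at the cap.
min_interval = WINDOW_SECONDS / LIMIT

print(total, headroom, min_interval)  # 53 187 0.25
```

The whole mixed workload fits in one window with 187 requests to spare, and any client that spaces requests at least 0.25 seconds apart stays under the current cap regardless of which endpoints it calls.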