# Rate Limits & Quotas

Understanding rate limits, quotas, and usage tracking in the NovaKit API.

NovaKit implements rate limiting and quota systems to ensure fair usage and maintain service quality for all users.
## Rate Limits by Plan

Rate limits are applied per API key and vary by subscription plan:
| Plan | Requests/Min | Max Burst |
|---|---|---|
| Free | 30 | 30 |
| Pro Monthly | 60 | 120 |
| Business Monthly | 120 | 300 |
| CLI Pro | 120 | 300 |
| CLI Team | 300 | 600 |
## Endpoint-Specific Limits

Different API endpoints have their own rate limits:
| Endpoint | Limit | Window |
|---|---|---|
| Chat Completions | 30 req | per minute |
| Image Generation | 10 req | per minute |
| Image Editing | 10 req | per minute |
| Video Generation | 5 req | per hour |
| Text-to-Speech | 20 req | per minute |
| Speech-to-Text | 10 req | per minute |
| Music Generation | 10 req | per hour |
Rate limits are enforced using a sliding window algorithm. Limits reset continuously, not at fixed intervals.
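To make the sliding-window behavior concrete, here is a client-side sketch of the same idea (illustrative only; the limit values are examples, not NovaKit's server configuration):

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allows a request only if fewer than max_requests occurred
    in the trailing window_seconds."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Drop timestamps that have slid out of the window
        while self.timestamps and now - self.timestamps[0] >= self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) < self.max_requests:
            self.timestamps.append(now)
            return True
        return False
```

Because old timestamps expire continuously, capacity frees up one request at a time rather than all at once at a window boundary.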
## Rate Limit Headers

All API responses include headers to help you track your rate limit status:

```
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1703123456
```

| Header | Description |
|---|---|
| `X-RateLimit-Limit` | Maximum requests allowed in the current window |
| `X-RateLimit-Remaining` | Remaining requests in the current window |
| `X-RateLimit-Reset` | Unix timestamp when the window resets |
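These headers can drive simple client-side pacing. A minimal sketch (the `seconds_until_ready` helper is hypothetical, not part of any SDK):

```python
def seconds_until_ready(headers, now):
    """Given the X-RateLimit-* headers from the last response, return
    how many seconds to wait before the next request (0 = go now)."""
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        return 0
    # Budget exhausted: wait until the Unix timestamp in X-RateLimit-Reset
    reset_at = int(headers.get("X-RateLimit-Reset", 0))
    return max(0, reset_at - int(now))
```

Checking `X-RateLimit-Remaining` before sending avoids ever triggering a 429 in steady-state traffic.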
## Handling Rate Limits

When you exceed the rate limit, you'll receive a `429 Too Many Requests` response:

```json
{
  "error": "Rate limit exceeded",
  "retryAfter": 30
}
```

The `Retry-After` header indicates how many seconds to wait:

```
HTTP/1.1 429 Too Many Requests
Retry-After: 30
```

### Exponential Backoff Example

```python
import time

import requests


def make_request_with_retry(url, headers, data, max_retries=5):
    for attempt in range(max_retries):
        response = requests.post(url, headers=headers, json=data)
        if response.status_code == 429:
            # Honor Retry-After if present; otherwise back off exponentially
            retry_after = int(response.headers.get('Retry-After', 2 ** attempt))
            print(f"Rate limited. Retrying in {retry_after}s...")
            time.sleep(retry_after)
            continue
        return response
    raise Exception("Max retries exceeded")
```

## Usage Quotas

Your account has usage quotas based on your subscription plan. Each resource type has its own quota bucket.
1 credit = 1,000 tokens (input + output combined)
| Resource | Free | Pro | Business | CLI Pro | CLI Team |
|---|---|---|---|---|---|
| Credits | 100 | 200 | 800 | 2,000 | 10,000 |
| Images | - | 150 | 600 | - | - |
| Image edits | - | 75 | 300 | - | - |
| Video (min) | - | 15 | 60 | - | - |
| STT (min) | - | 60 | 300 | - | - |
| TTS (chars) | - | 200K | 800K | - | - |
| Music tracks | - | 20 | 80 | - | - |
See the Credits & Quotas guide for detailed information.
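The credit arithmetic above can be checked with a one-liner (illustrative; `credits_for_tokens` is a hypothetical helper, not part of any SDK):

```python
def credits_for_tokens(prompt_tokens, completion_tokens):
    """1 credit = 1,000 tokens, input and output combined."""
    return (prompt_tokens + completion_tokens) / 1000
```

For example, a request with 150 prompt tokens and 200 completion tokens costs 0.35 credits, so the Pro plan's 200 credits correspond to roughly 200,000 tokens per period.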
## Checking Your Quota

Use the `/quota` endpoint to check your current usage:

```bash
curl https://www.novakit.ai/api/v1/quota \
  -H "Authorization: Bearer sk_your_api_key"
```

Response:
```json
{
  "org_id": "org_abc123",
  "plan": {
    "code": "pro_monthly",
    "name": "Pro Monthly",
    "kind": "recurring",
    "entitled": true,
    "period_end": "2025-02-01T00:00:00Z"
  },
  "quotas": {
    "credits": {
      "remaining": 170,
      "limit": 200,
      "used": 30,
      "usage_percent": 15
    },
    "image_generations": {
      "remaining": 138,
      "limit": 150,
      "used": 12,
      "usage_percent": 8
    },
    "video_seconds": {
      "remaining": 810,
      "limit": 900,
      "used": 90,
      "usage_percent": 10
    }
  }
}
```

## Quota Exceeded Errors

When you exceed a quota, you'll receive a `402 Payment Required` response:

```json
{
  "error": "Quota exceeded for image_generations"
}
```

## Model Tier Multipliers

Different model tiers consume quota at different rates:
| Tier | Multiplier | Example Models |
|---|---|---|
| Basic | 1x | GPT-4o Mini, Claude Haiku |
| Standard | 1.5-2x | GPT-4o, Claude Sonnet |
| Powerful | 2-3x | GPT-4 Turbo, Claude Opus |
For example, using a "Powerful" tier model for image generation might consume 2-3x the quota of a "Basic" tier model.
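As a sketch, quota deduction under these multipliers could look like the following (the table values pick the low end of each published range; none of this is an official formula):

```python
# Hypothetical multiplier table mirroring the tiers above; "standard"
# and "powerful" use the low end of their published ranges.
TIER_MULTIPLIERS = {"basic": 1.0, "standard": 1.5, "powerful": 2.0}


def effective_quota_cost(base_units, model_tier):
    """Quota units actually deducted for a request on a given tier."""
    return base_units * TIER_MULTIPLIERS[model_tier]
```

So ten generations on a "powerful" model could deduct as much quota as twenty on a "basic" one.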
## Usage Tracking

Every API response includes usage information:

```json
{
  "choices": [...],
  "usage": {
    "prompt_tokens": 150,
    "completion_tokens": 200,
    "total_tokens": 350
  },
  "model": "openai/gpt-4o-mini",
  "model_tier": "basic"
}
```

For image and video generation:

```json
{
  "data": [...],
  "usage": {
    "generations_used": 1,
    "quota_multiplier": 1.5,
    "generations_remaining": 919
  }
}
```

## Best Practices
### 1. Monitor Your Usage

Check the `/quota` endpoint regularly and set up alerts before hitting limits.
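A small helper for such alerts, operating on the `quotas` object returned by `/quota` (the function name and the 80% default threshold are illustrative):

```python
def quota_alerts(quotas, threshold_percent=80):
    """Given the "quotas" object from the /quota response, return the
    names of buckets at or above the usage threshold."""
    return [
        name
        for name, bucket in quotas.items()
        if bucket["usage_percent"] >= threshold_percent
    ]
```

Run it on the parsed JSON after each periodic quota check and page yourself on any non-empty result.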
### 2. Implement Caching

Cache responses when appropriate to reduce API calls:

```python
from functools import lru_cache


@lru_cache(maxsize=100)
def get_cached_response(prompt_hash):
    # Repeated prompts hit the in-memory cache instead of the API
    return make_api_call(prompt_hash)
```

### 3. Use Streaming for Chat
Streaming doesn't reduce token usage, but provides better UX for long responses.
### 4. Batch Requests When Possible

Some endpoints support batch parameters (e.g., `n` for image generation).
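Batching reduces how many rate-limit slots a workload consumes, since each call counts once regardless of batch size. A quick way to see the effect (illustrative helper):

```python
import math


def requests_needed(total_images, n_per_request):
    """How many API calls a workload implies at a given batch size;
    each call consumes one slot against the per-minute limits above."""
    return math.ceil(total_images / n_per_request)
```

Twelve images as twelve single-image calls use twelve slots; the same twelve images at `n=4` use only three.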
### 5. Choose Appropriate Models

Use faster, cheaper models for simple tasks:
- Quick tasks: GPT-4o Mini, Claude Haiku
- Complex tasks: GPT-4o, Claude Sonnet
- Critical tasks: GPT-4 Turbo, Claude Opus
## Async Mode for Heavy Operations

For resource-intensive operations, use async mode to avoid timeouts:

```bash
# Start async job
curl -X POST "https://www.novakit.ai/api/v1/videos/generations?async=true" \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "A sunset over the ocean"}'
```

Response:

```json
{
  "id": "job_abc123",
  "status": "pending",
  "created": 1703123456
}
```

Poll for completion (with long-polling):

```bash
curl "https://www.novakit.ai/api/v1/jobs/job_abc123?poll=true" \
  -H "Authorization: Bearer sk_your_api_key"
```

The `poll=true` parameter enables long-polling, which waits up to 30 seconds for the job to complete before returning.
## Increasing Your Limits

If you need higher rate limits or quotas:

- **Upgrade your plan** - higher plans include more generous limits
- **Contact sales** - for enterprise needs, we offer custom limits
- **Optimize usage** - use caching and efficient patterns to maximize your allocation