NovaKit v1.0

Rate Limits & Quotas

Understanding rate limits, quotas, and usage tracking in the NovaKit API

NovaKit implements rate limiting and quota systems to ensure fair usage and maintain service quality for all users.

Rate Limits by Plan

Rate limits are applied per API key and vary by subscription plan:

| Plan | Requests/Min | Max Burst |
|---|---|---|
| Free | 30 | 30 |
| Pro Monthly | 60 | 120 |
| Business Monthly | 120 | 300 |
| CLI Pro | 120 | 300 |
| CLI Team | 300 | 600 |

Endpoint-Specific Limits

Different API endpoints have specific rate limits:

| Endpoint | Limit | Window |
|---|---|---|
| Chat Completions | 30 req | per minute |
| Image Generation | 10 req | per minute |
| Image Editing | 10 req | per minute |
| Video Generation | 5 req | per hour |
| Text-to-Speech | 20 req | per minute |
| Speech-to-Text | 10 req | per minute |
| Music Generation | 10 req | per hour |

Rate limits are enforced using a sliding window algorithm. Limits reset continuously, not at fixed intervals.
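To make the sliding-window behavior concrete, here is a client-side sketch of the same idea using an in-memory deque of request timestamps. This is illustrative only; the server's actual implementation may differ, and the class name is ours:

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Allows at most `max_requests` within any rolling window of `window_seconds`."""

    def __init__(self, max_requests: int, window_seconds: float = 60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def allow(self, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the rolling window
        while self.timestamps and now - self.timestamps[0] >= self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) < self.max_requests:
            self.timestamps.append(now)
            return True
        return False
```

Because old timestamps expire continuously, capacity frees up gradually rather than all at once at a fixed boundary.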

Rate Limit Headers

All API responses include headers to help you track your rate limit status:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1703123456

| Header | Description |
|---|---|
| X-RateLimit-Limit | Maximum requests allowed in the current window |
| X-RateLimit-Remaining | Remaining requests in the current window |
| X-RateLimit-Reset | Unix timestamp when the window resets |
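A client can read these headers and pause proactively instead of waiting for a 429. A minimal sketch, which works with any response object exposing a `headers` mapping (the function name is ours):

```python
import time

def check_rate_limit(response) -> None:
    """Sleep until the window resets if no requests remain."""
    remaining = int(response.headers.get("X-RateLimit-Remaining", 1))
    reset = int(response.headers.get("X-RateLimit-Reset", 0))
    if remaining == 0:
        # Wait out the remainder of the window rather than triggering a 429
        wait = max(0, reset - int(time.time()))
        time.sleep(wait)
```

Call it after each request before sending the next one.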

Handling Rate Limits

When you exceed the rate limit, you'll receive a 429 Too Many Requests response:

{
  "error": "Rate limit exceeded",
  "retryAfter": 30
}

The Retry-After header indicates how many seconds to wait:

HTTP/1.1 429 Too Many Requests
Retry-After: 30

Exponential Backoff Example

import time
import requests

def make_request_with_retry(url, headers, data, max_retries=5):
    for attempt in range(max_retries):
        response = requests.post(url, headers=headers, json=data)

        if response.status_code == 429:
            # Prefer the server's Retry-After; fall back to exponential backoff
            retry_after = int(response.headers.get('Retry-After', 2 ** attempt))
            print(f"Rate limited. Retrying in {retry_after}s...")
            time.sleep(retry_after)
            continue

        return response

    raise Exception("Max retries exceeded")

Usage Quotas

Your account has usage quotas based on your subscription plan. Each resource type has its own quota bucket.

1 credit = 1,000 tokens (input + output combined)

| Resource | Free | Pro | Business | CLI Pro | CLI Team |
|---|---|---|---|---|---|
| Credits | 100 | 200 | 800 | 2,000 | 10,000 |
| Images | - | 150 | 600 | - | - |
| Image edits | - | 75 | 300 | - | - |
| Video (min) | - | 15 | 60 | - | - |
| STT (min) | - | 60 | 300 | - | - |
| TTS (chars) | - | 200K | 800K | - | - |
| Music tracks | - | 20 | 80 | - | - |

See the Credits & Quotas guide for detailed information.

Checking Your Quota

Use the /quota endpoint to check your current usage:

curl https://www.novakit.ai/api/v1/quota \
  -H "Authorization: Bearer sk_your_api_key"

Response:

{
  "org_id": "org_abc123",
  "plan": {
    "code": "pro_monthly",
    "name": "Pro Monthly",
    "kind": "recurring",
    "entitled": true,
    "period_end": "2025-02-01T00:00:00Z"
  },
  "quotas": {
    "credits": {
      "remaining": 170,
      "limit": 200,
      "used": 30,
      "usage_percent": 15
    },
    "image_generations": {
      "remaining": 138,
      "limit": 150,
      "used": 12,
      "usage_percent": 8
    },
    "video_seconds": {
      "remaining": 810,
      "limit": 900,
      "used": 90,
      "usage_percent": 10
    }
  }
}
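A small helper can flag any quota bucket nearing its limit before a request fails. A sketch assuming the response shape shown above (the function names and the 80% threshold are our choices; `requests` is imported lazily so the pure helper has no dependencies):

```python
def quota_warnings(quotas: dict, threshold: int = 80) -> list:
    """Return the names of quota buckets at or above `threshold` percent used."""
    return [
        name for name, bucket in quotas.items()
        if bucket["usage_percent"] >= threshold
    ]

def fetch_quota(api_key: str) -> dict:
    import requests

    resp = requests.get(
        "https://www.novakit.ai/api/v1/quota",
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()
```

For example, `quota_warnings(fetch_quota(key)["quotas"])` returns an empty list while usage is healthy.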

Quota Exceeded Errors

When you exceed a quota, you'll receive a 402 Payment Required response:

{
  "error": "Quota exceeded for image_generations"
}

Model Tier Multipliers

Different model tiers consume quota at different rates:

| Tier | Multiplier | Example Models |
|---|---|---|
| Basic | 1x | GPT-4o Mini, Claude Haiku |
| Standard | 1.5-2x | GPT-4o, Claude Sonnet |
| Powerful | 2-3x | GPT-4 Turbo, Claude Opus |

For example, using a "Powerful" tier model for image generation might consume 2-3x the quota of a "Basic" tier model.
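Combining this with the 1 credit = 1,000 tokens rule, the credit cost of a call can be estimated as a sketch (the function name is ours, and any rounding the billing system applies is an assumption we don't model):

```python
def estimate_credits(total_tokens: int, tier_multiplier: float = 1.0) -> float:
    """Estimate credits consumed: tokens / 1,000, scaled by the tier multiplier."""
    return (total_tokens / 1000) * tier_multiplier
```

For example, a 350-token call on a Standard (1.5x) model works out to roughly 0.525 credits.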

Usage Tracking

Every API response includes usage information:

{
  "choices": [...],
  "usage": {
    "prompt_tokens": 150,
    "completion_tokens": 200,
    "total_tokens": 350
  },
  "model": "openai/gpt-4o-mini",
  "model_tier": "basic"
}

For image/video generation:

{
  "data": [...],
  "usage": {
    "generations_used": 1,
    "quota_multiplier": 1.5,
    "generations_remaining": 919
  }
}
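These usage fields can be accumulated client-side to track spend across a session. A minimal sketch assuming the chat response shape above (the class name is ours):

```python
class UsageTracker:
    """Accumulates token usage from chat completion responses."""

    def __init__(self):
        self.prompt_tokens = 0
        self.completion_tokens = 0

    def record(self, response_json: dict) -> None:
        # Missing usage fields count as zero
        usage = response_json.get("usage", {})
        self.prompt_tokens += usage.get("prompt_tokens", 0)
        self.completion_tokens += usage.get("completion_tokens", 0)

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens
```

Call `record()` on each parsed response and read `total_tokens` whenever you need a running total.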

Best Practices

1. Monitor Your Usage

Check the /quota endpoint regularly and set up alerts before hitting limits.

2. Implement Caching

Cache responses when appropriate to reduce API calls:

from functools import lru_cache

@lru_cache(maxsize=100)
def get_cached_response(prompt_hash):
    # Identical prompts hit the cache instead of spending another API call
    return make_api_call(prompt_hash)

3. Use Streaming for Chat

Streaming doesn't reduce token usage, but it provides a much better experience for long responses.

4. Batch Requests When Possible

Some endpoints support batch parameters (e.g., n for image generation).

5. Choose Appropriate Models

Use faster, cheaper models for simple tasks:

  • Quick tasks: GPT-4o Mini, Claude Haiku
  • Complex tasks: GPT-4o, Claude Sonnet
  • Critical tasks: GPT-4 Turbo, Claude Opus

Async Mode for Heavy Operations

For resource-intensive operations, use async mode to avoid timeouts:

# Start async job
curl -X POST "https://www.novakit.ai/api/v1/videos/generations?async=true" \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "A sunset over the ocean"}'

# Response
{
  "id": "job_abc123",
  "status": "pending",
  "created": 1703123456
}

# Poll for completion (with long-polling)
curl "https://www.novakit.ai/api/v1/jobs/job_abc123?poll=true" \
  -H "Authorization: Bearer sk_your_api_key"

The poll=true parameter enables long-polling, which waits up to 30 seconds for the job to complete before returning.
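The start-then-poll flow above can be scripted end to end. A sketch using the requests library; the terminal status values "completed" and "failed" are assumptions (the docs show only "pending"), as is the overall timeout:

```python
import time

def is_terminal(status: str) -> bool:
    # Assumed terminal job states; adjust if the API reports others
    return status in ("completed", "failed")

def generate_video_async(prompt: str, api_key: str, timeout: float = 600.0) -> dict:
    import requests  # imported here so is_terminal stays dependency-free

    base = "https://www.novakit.ai/api/v1"
    headers = {"Authorization": f"Bearer {api_key}"}

    # Kick off the job with ?async=true
    job = requests.post(
        f"{base}/videos/generations",
        params={"async": "true"},
        headers=headers,
        json={"prompt": prompt},
        timeout=30,
    ).json()

    deadline = time.time() + timeout
    while time.time() < deadline:
        # poll=true long-polls for up to 30 seconds per request
        job = requests.get(
            f"{base}/jobs/{job['id']}",
            params={"poll": "true"},
            headers=headers,
            timeout=60,
        ).json()
        if is_terminal(job["status"]):
            return job
    raise TimeoutError("Video generation did not finish in time")
```

Long-polling keeps the number of status requests low, which also helps you stay under the rate limits described earlier.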

Increasing Your Limits

If you need higher rate limits or quotas:

  1. Upgrade your plan - Higher plans include more generous limits
  2. Contact sales - For enterprise needs, we offer custom limits
  3. Optimize usage - Use caching and efficient patterns to maximize your allocation
