# Rate Limits & Quotas

Understanding rate limits, quotas, and usage tracking in the NovaKit API.

NovaKit implements rate limiting and quota systems to ensure fair usage and maintain service quality for all users.
## Rate Limits by Plan

Rate limits are applied per API key and vary by subscription plan:
| Plan | Requests/Min | Max Burst |
|---|---|---|
| Free | 30 | 30 |
| Pro Monthly | 60 | 120 |
| Business Monthly | 120 | 300 |
| CLI Pro | 120 | 300 |
| CLI Team | 300 | 600 |
## Endpoint-Specific Limits

Different API endpoints have their own rate limits:
| Endpoint | Limit | Window |
|---|---|---|
| Chat Completions | 30 req | per minute |
| Image Generation | 10 req | per minute |
| Image Editing | 10 req | per minute |
| Video Generation | 5 req | per hour |
| Text-to-Speech | 20 req | per minute |
| Speech-to-Text | 10 req | per minute |
| Music Generation | 10 req | per hour |
Rate limits are enforced using a sliding window algorithm. Limits reset continuously, not at fixed intervals.
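To make the sliding-window behavior concrete, here is a client-side sketch of the same idea (illustrative only; the limit values are examples, not NovaKit's server configuration):

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allows a request only if fewer than max_requests occurred
    in the trailing window_seconds."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Drop timestamps that have slid out of the window
        while self.timestamps and now - self.timestamps[0] >= self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) < self.max_requests:
            self.timestamps.append(now)
            return True
        return False
```

Because old timestamps expire continuously, capacity frees up one request at a time rather than all at once at a window boundary.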
## Rate Limit Headers

All API responses include headers to help you track your rate limit status:

```
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1703123456
```

| Header | Description |
|---|---|
| `X-RateLimit-Limit` | Maximum requests allowed in the current window |
| `X-RateLimit-Remaining` | Remaining requests in the current window |
| `X-RateLimit-Reset` | Unix timestamp when the window resets |
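These headers can drive simple client-side pacing. A minimal sketch (the `seconds_until_ready` helper is hypothetical, not part of any SDK):

```python
def seconds_until_ready(headers, now):
    """Given the X-RateLimit-* headers from the last response, return
    how many seconds to wait before the next request (0 = go now)."""
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        return 0
    # Budget exhausted: wait until the Unix timestamp in X-RateLimit-Reset
    reset_at = int(headers.get("X-RateLimit-Reset", 0))
    return max(0, reset_at - int(now))
```

Checking `X-RateLimit-Remaining` before sending avoids ever triggering a 429 in steady-state traffic.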
## Handling Rate Limits

When you exceed the rate limit, you'll receive a `429 Too Many Requests` response:

```json
{
  "error": "Rate limit exceeded",
  "retryAfter": 30
}
```

The `Retry-After` header indicates how many seconds to wait:

```
HTTP/1.1 429 Too Many Requests
Retry-After: 30
```

### Exponential Backoff Example

```python
import time

import requests


def make_request_with_retry(url, headers, data, max_retries=5):
    for attempt in range(max_retries):
        response = requests.post(url, headers=headers, json=data)
        if response.status_code == 429:
            # Honor Retry-After if present; otherwise back off exponentially
            retry_after = int(response.headers.get('Retry-After', 2 ** attempt))
            print(f"Rate limited. Retrying in {retry_after}s...")
            time.sleep(retry_after)
            continue
        return response
    raise Exception("Max retries exceeded")
```

## Usage Quotas

Your account has usage quotas based on your subscription plan. Each resource type has its own quota bucket.
1 credit = 1,000 tokens (input + output combined)
| Resource | Free | Pro | Business | CLI Pro | CLI Team |
|---|---|---|---|---|---|
| Credits | 100 | 200 | 800 | 2,000 | 10,000 |
| Images | - | 150 | 600 | - | - |
| Image edits | - | 75 | 300 | - | - |
| Video (min) | - | 15 | 60 | - | - |
| STT (min) | - | 60 | 300 | - | - |
| TTS (chars) | - | 200K | 800K | - | - |
| Music tracks | - | 20 | 80 | - | - |
See the Credits & Quotas guide for detailed information.
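The credit arithmetic above can be checked with a one-liner (illustrative; `credits_for_tokens` is a hypothetical helper, not part of any SDK):

```python
def credits_for_tokens(prompt_tokens, completion_tokens):
    """1 credit = 1,000 tokens, input and output combined."""
    return (prompt_tokens + completion_tokens) / 1000
```

For example, a request with 150 prompt tokens and 200 completion tokens costs 0.35 credits, so the Pro plan's 200 credits correspond to roughly 200,000 tokens per period.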
## Checking Your Quota

Use the `/quota` endpoint to check your current usage:

```bash
curl https://www.novakit.ai/api/v1/quota \
  -H "Authorization: Bearer sk_your_api_key"
```

Response:
```json
{
  "org_id": "org_abc123",
  "plan": {
    "code": "pro_monthly",
    "name": "Pro Monthly",
    "kind": "recurring",
    "entitled": true,
    "period_end": "2025-02-01T00:00:00Z"
  },
  "quotas": {
    "credits": {
      "remaining": 170,
      "limit": 200,
      "used": 30,
      "usage_percent": 15
    },
    "image_generations": {
      "remaining": 138,
      "limit": 150,
      "used": 12,
      "usage_percent": 8
    },
    "video_seconds": {
      "remaining": 810,
      "limit": 900,
      "used": 90,
      "usage_percent": 10
    }
  }
}
```

## Quota Exceeded Errors

When you exceed a quota, you'll receive a `402 Payment Required` response:

```json
{
  "error": "Quota exceeded for image_generations"
}
```

## Model Tier Multipliers

Different model tiers consume quota at different rates:
| Tier | Multiplier | Example Models |
|---|---|---|
| Basic | 1x | GPT-4o Mini, Claude Haiku |
| Standard | 1.5-2x | GPT-4o, Claude Sonnet |
| Powerful | 2-3x | GPT-4 Turbo, Claude Opus |
For example, using a "Powerful" tier model for image generation might consume 2-3x the quota of a "Basic" tier model.
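As a sketch, quota deduction under these multipliers could look like the following (the table values pick the low end of each published range; none of this is an official formula):

```python
# Hypothetical multiplier table mirroring the tiers above; "standard"
# and "powerful" use the low end of their published ranges.
TIER_MULTIPLIERS = {"basic": 1.0, "standard": 1.5, "powerful": 2.0}


def effective_quota_cost(base_units, model_tier):
    """Quota units actually deducted for a request on a given tier."""
    return base_units * TIER_MULTIPLIERS[model_tier]
```

So ten generations on a "powerful" model could deduct as much quota as twenty on a "basic" one.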
## Usage Tracking

Every API response includes usage information:

```json
{
  "choices": [...],
  "usage": {
    "prompt_tokens": 150,
    "completion_tokens": 200,
    "total_tokens": 350
  },
  "model": "openai/gpt-4o-mini",
  "model_tier": "basic"
}
```

For image and video generation:

```json
{
  "data": [...],
  "usage": {
    "generations_used": 1,
    "quota_multiplier": 1.5,
    "generations_remaining": 919
  }
}
```

## Best Practices
### 1. Monitor Your Usage

Check the `/quota` endpoint regularly and set up alerts before hitting limits.
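A small helper for such alerts, operating on the `quotas` object returned by `/quota` (the function name and the 80% default threshold are illustrative):

```python
def quota_alerts(quotas, threshold_percent=80):
    """Given the "quotas" object from the /quota response, return the
    names of buckets at or above the usage threshold."""
    return [
        name
        for name, bucket in quotas.items()
        if bucket["usage_percent"] >= threshold_percent
    ]
```

Run it on the parsed JSON after each periodic quota check and page yourself on any non-empty result.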
### 2. Implement Caching

Cache responses when appropriate to reduce API calls:

```python
from functools import lru_cache


@lru_cache(maxsize=100)
def get_cached_response(prompt_hash):
    # Repeated prompts hit the in-memory cache instead of the API
    return make_api_call(prompt_hash)
```

### 3. Use Streaming for Chat
Streaming doesn't reduce token usage, but provides better UX for long responses.
### 4. Batch Requests When Possible

Some endpoints support batch parameters (e.g., `n` for image generation).
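Batching reduces how many rate-limit slots a workload consumes, since each call counts once regardless of batch size. A quick way to see the effect (illustrative helper):

```python
import math


def requests_needed(total_images, n_per_request):
    """How many API calls a workload implies at a given batch size;
    each call consumes one slot against the per-minute limits above."""
    return math.ceil(total_images / n_per_request)
```

Twelve images as twelve single-image calls use twelve slots; the same twelve images at `n=4` use only three.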
### 5. Choose Appropriate Models

Use faster, cheaper models for simple tasks:
- Quick tasks: GPT-4o Mini, Claude Haiku
- Complex tasks: GPT-4o, Claude Sonnet
- Critical tasks: GPT-4 Turbo, Claude Opus
## Async Mode for Heavy Operations

For resource-intensive operations, use async mode to avoid timeouts:

```bash
# Start async job
curl -X POST "https://www.novakit.ai/api/v1/videos/generations?async=true" \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "A sunset over the ocean"}'
```

Response:

```json
{
  "id": "job_abc123",
  "status": "pending",
  "created": 1703123456
}
```

Poll for completion (with long-polling):

```bash
curl "https://www.novakit.ai/api/v1/jobs/job_abc123?poll=true" \
  -H "Authorization: Bearer sk_your_api_key"
```

The `poll=true` parameter enables long-polling, which waits up to 30 seconds for the job to complete before returning.
## Increasing Your Limits

If you need higher rate limits or quotas:

- **Upgrade your plan** - higher plans include more generous limits
- **Contact sales** - for enterprise needs, we offer custom limits
- **Optimize usage** - use caching and efficient patterns to maximize your allocation