NovaKit v1.0

Text-to-Speech

Convert text into natural-sounding speech with multiple voice options and voice cloning capabilities.

Endpoint

POST /audio/speech

Required scope: tts

Request Body

{
  "input": "Hello, welcome to our API!",
  "model": "fal-ai/f5-tts",
  "voice": "nova",
  "speed": 1.0,
  "response_format": "mp3",
  "reference_audio": null,
  "reference_text": null
}

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| input | string | Yes | - | Text to synthesize (max 10,000 chars) |
| model | string | No | fal-ai/f5-tts | TTS model to use |
| voice | string | No | nova | Voice preset or custom voice |
| speed | number | No | 1.0 | Speech speed (0.25-4.0) |
| response_format | string | No | mp3 | Output format: mp3, wav, opus, aac |
| reference_audio | string | No | - | Audio URL for voice cloning |
| reference_text | string | No | - | Transcript of reference audio |
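
The constraints in the parameter table can be checked client-side before a request is sent. The following is a minimal sketch, not part of any NovaKit SDK: the helper name `build_speech_payload` is hypothetical, the limits are copied from the table, and counting characters with Python's `len` is an assumption that may differ from how the server counts.

```python
from typing import Any, Optional

# Limits copied from the parameter table; the server remains the authority.
MAX_INPUT_CHARS = 10_000
MIN_SPEED, MAX_SPEED = 0.25, 4.0
RESPONSE_FORMATS = {"mp3", "wav", "opus", "aac"}


def build_speech_payload(
    text: str,
    model: str = "fal-ai/f5-tts",
    voice: str = "nova",
    speed: float = 1.0,
    response_format: str = "mp3",
    reference_audio: Optional[str] = None,
    reference_text: Optional[str] = None,
) -> dict[str, Any]:
    """Hypothetical helper: validate arguments and return a request body."""
    if not text or not text.strip():
        raise ValueError("input is required")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input exceeds {MAX_INPUT_CHARS} characters")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")

    payload: dict[str, Any] = {
        "input": text,
        "model": model,
        "voice": voice,
        "speed": speed,
        "response_format": response_format,
    }
    # Include the optional cloning fields only when they are set.
    if reference_audio is not None:
        payload["reference_audio"] = reference_audio
    if reference_text is not None:
        payload["reference_text"] = reference_text
    return payload
```

The returned dictionary can be passed as the `json` argument of `requests.post`, as in the examples further down.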

Available Models

| Model ID | Tier | Features |
|---|---|---|
| fal-ai/f5-tts | Standard | Voice cloning, natural speech (default) |
| fal-ai/maya | Standard | High-quality synthesis |
| fal-ai/chatterbox/text-to-speech/turbo | Fast | Fast generation |
| fal-ai/minimax/speech-2.6-hd | Premium | HD quality speech |
| fal-ai/minimax/speech-2.6-turbo | Fast | Fast MiniMax |
| fal-ai/index-tts-2/text-to-speech | Standard | Index TTS 2.0 |
| fal-ai/chatterbox/text-to-speech/multilingual | Standard | Multi-language support |

Voice Presets

| Voice | Description |
|---|---|
| alloy | Neutral, balanced |
| echo | Warm, conversational |
| fable | Expressive, storytelling |
| onyx | Deep, authoritative |
| nova | Friendly, clear |
| shimmer | Soft, gentle |

Voice presets work best with the default model. For custom voices, use the voice cloning feature with reference_audio.

Response

{
  "created": 1703123456,
  "audio_url": "https://fal.media/files/audio123.mp3",
  "duration_seconds": 3.5,
  "character_count": 35,
  "model": "fal-ai/f5-tts",
  "model_tier": "standard",
  "usage": {
    "chars_used": 35,
    "quota_multiplier": 1.25,
    "chars_remaining": 199965
  }
}
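
For typed access to these fields, the response can be flattened into a small data class. This is a sketch under assumptions: `SpeechResult` and `parse_speech_response` are hypothetical names, and the field list is taken only from the sample response above, so it assumes every field is always present.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class SpeechResult:
    """Hypothetical container mirroring the sample response."""
    audio_url: str
    duration_seconds: float
    character_count: int
    model: str
    model_tier: str
    chars_used: int
    chars_remaining: int


def parse_speech_response(body: dict[str, Any]) -> SpeechResult:
    """Flatten the response JSON; raises KeyError if a field is missing."""
    usage = body["usage"]
    return SpeechResult(
        audio_url=body["audio_url"],
        duration_seconds=float(body["duration_seconds"]),
        character_count=int(body["character_count"]),
        model=body["model"],
        model_tier=body["model_tier"],
        chars_used=int(usage["chars_used"]),
        chars_remaining=int(usage["chars_remaining"]),
    )
```

With `requests`, the input would be `response.json()`.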

Quota & Pricing

TTS requests draw from the tts_chars quota bucket. Each request's character count is multiplied by the multiplier for the model's tier:

| Tier | Multiplier | Example Models |
|---|---|---|
| Fast | 1x | F5-TTS Fast |
| Standard | 1.25x | F5-TTS, MetaVoice |
| Premium | 1.5x | Voice cloning models |

Plan limits:

| Plan | TTS Characters/Month |
|---|---|
| Free | - |
| Pro | 200,000 |
| Business | 800,000 |
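
Taken together, the multiplier and plan tables give a rough way to budget usage. The sketch below assumes quota is charged as the raw character count times the tier multiplier, rounded up; the rounding rule is not stated on this page, and both function names are hypothetical. The Free plan is left out because the table lists no TTS allowance for it.

```python
import math

# Multipliers and monthly limits copied from the tables above.
TIER_MULTIPLIER = {"fast": 1.0, "standard": 1.25, "premium": 1.5}
PLAN_LIMIT = {"pro": 200_000, "business": 800_000}


def billed_chars(char_count: int, tier: str) -> int:
    """Quota characters consumed by one request (rounding up is an assumption)."""
    return math.ceil(char_count * TIER_MULTIPLIER[tier])


def max_raw_chars(plan: str, tier: str) -> int:
    """Largest raw character count that fits in one month's quota."""
    return math.floor(PLAN_LIMIT[plan] / TIER_MULTIPLIER[tier])


if __name__ == "__main__":
    print(billed_chars(35, "standard"))      # 35 * 1.25 = 43.75, rounded up
    print(max_raw_chars("pro", "standard"))  # 200,000 / 1.25
```

Under these assumptions, a Pro plan covers 160,000 raw characters per month on a Standard-tier model.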

Examples

cURL

curl -X POST https://www.novakit.ai/api/v1/audio/speech \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Welcome to NovaKit! We are excited to have you here.",
    "voice": "nova",
    "speed": 1.1
  }'

Python

import requests

response = requests.post(
    "https://www.novakit.ai/api/v1/audio/speech",
    headers={
        "Authorization": "Bearer sk_your_api_key",
        "Content-Type": "application/json"
    },
    json={
        "input": "Welcome to NovaKit! We are excited to have you here.",
        "voice": "nova",
        "speed": 1.1
    }
)

response.raise_for_status()  # Stop on 4xx/5xx instead of parsing an error body
audio_url = response.json()["audio_url"]
print(f"Audio: {audio_url}")

# Download the audio file
audio_response = requests.get(audio_url)
with open("output.mp3", "wb") as f:
    f.write(audio_response.content)

JavaScript

const response = await fetch(
  "https://www.novakit.ai/api/v1/audio/speech",
  {
    method: "POST",
    headers: {
      "Authorization": "Bearer sk_your_api_key",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      input: "Welcome to NovaKit! We are excited to have you here.",
      voice: "nova",
      speed: 1.1,
    }),
  }
);

const data = await response.json();
console.log(`Audio URL: ${data.audio_url}`);

Voice Cloning

Clone any voice using a reference audio sample with the F5-TTS model:

import requests

response = requests.post(
    "https://www.novakit.ai/api/v1/audio/speech",
    headers={"Authorization": "Bearer sk_your_api_key"},
    json={
        "input": "This is my cloned voice speaking.",
        "model": "fal-ai/f5-tts",
        "reference_audio": "https://example.com/voice-sample.mp3",
        "reference_text": "Hello, this is a sample of my voice."
    }
)

For best voice cloning results:

  • Use a clear audio sample (5-15 seconds)
  • Provide an accurate transcript via reference_text
  • Avoid background noise in the reference
  • MP3 or WAV formats work best
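
Some of these tips can be checked before the request is made. The following sketch is illustrative only: `reference_warnings` is a hypothetical helper, the duration must be measured by the caller, and the checks encode the recommendations above rather than any rule the API is known to enforce.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse


def reference_warnings(
    audio_url: str,
    transcript: str,
    duration_seconds: Optional[float] = None,
) -> list[str]:
    """Hypothetical pre-flight check based on the voice cloning tips."""
    warnings = []
    # Read the file extension from the URL path, ignoring any query string.
    suffix = PurePosixPath(urlparse(audio_url).path).suffix.lower()
    if suffix not in {".mp3", ".wav"}:
        warnings.append("reference audio is not MP3 or WAV")
    if not transcript.strip():
        warnings.append("reference_text is empty")
    if duration_seconds is not None and not 5 <= duration_seconds <= 15:
        warnings.append("reference audio is outside the 5-15 second range")
    return warnings
```

An empty list means none of the checked recommendations were violated; background noise still has to be judged by ear.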

Model-Specific Inputs

Pass additional parameters using model_inputs:

{
  "input": "Hello world",
  "model": "fal-ai/metavoice",
  "model_inputs": {
    "guidance_scale": 3.0,
    "top_p": 0.95
  }
}

Error Handling

Common errors you may encounter:

| Status | Error | Solution |
|---|---|---|
| 400 | input is required | Provide non-empty text |
| 400 | input text exceeds maximum length | Keep under 10,000 characters |
| 400 | speed must be between 0.25 and 4.0 | Adjust speed value |
| 402 | TTS character limit exceeded | Upgrade plan or wait for reset |
| 403 | Model tier not allowed | Upgrade plan for premium models |
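
One way to act on these statuses in client code is a small lookup that turns a status code into the suggested fix. This is a sketch, not an official client: `ErrorAdvice` and `advise` are hypothetical names, and treating 402 as the only status worth retrying later is an interpretation of the "wait for reset" advice in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorAdvice:
    retry_later: bool  # True when waiting (for a quota reset) may resolve it
    action: str


# Suggested actions copied from the error table; other statuses are unknown.
_ADVICE = {
    400: ErrorAdvice(False, "Fix the request: check input text and speed."),
    402: ErrorAdvice(True, "Upgrade the plan or wait for the quota reset."),
    403: ErrorAdvice(False, "Upgrade the plan to use this model tier."),
}


def advise(status_code: int) -> ErrorAdvice:
    """Hypothetical helper: map an HTTP status to the table's suggested fix."""
    return _ADVICE.get(
        status_code,
        ErrorAdvice(False, "Unexpected status; inspect the response body."),
    )
```

With `requests`, this would be called as `advise(response.status_code)` whenever `response.ok` is false.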

Use Cases

| Use Case | Recommended Settings |
|---|---|
| Podcast intros | voice: "echo", speed: 0.95 |
| Audiobook narration | voice: "fable", speed: 0.9 |
| Voice assistants | voice: "nova", speed: 1.1 |
| Announcements | voice: "onyx", speed: 1.0 |
| Meditation guides | voice: "shimmer", speed: 0.85 |
| Custom brand voice | Use voice cloning with your audio |
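
These recommendations can be kept in one place and merged into a request body. The sketch below is only an illustration: the dictionary keys and the `payload_for` helper are hypothetical names, and the custom brand voice row is omitted because it depends on your own reference audio.

```python
from typing import Any

# Voice and speed values copied from the use-case table above.
USE_CASE_PRESETS: dict[str, dict[str, Any]] = {
    "podcast_intro": {"voice": "echo", "speed": 0.95},
    "audiobook": {"voice": "fable", "speed": 0.9},
    "voice_assistant": {"voice": "nova", "speed": 1.1},
    "announcement": {"voice": "onyx", "speed": 1.0},
    "meditation": {"voice": "shimmer", "speed": 0.85},
}


def payload_for(use_case: str, text: str) -> dict[str, Any]:
    """Hypothetical helper: build a request body from a named preset."""
    if use_case not in USE_CASE_PRESETS:
        raise ValueError(f"unknown use case: {use_case}")
    return {"input": text, **USE_CASE_PRESETS[use_case]}
```

For example, `payload_for("audiobook", "Chapter one.")` yields a body with the fable voice at speed 0.9.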
