Text-to-Speech

Convert text into natural-sounding speech with multiple voice options and voice cloning capabilities.

Endpoint

POST /audio/speech

Required scope: tts

Request Body

{
  "input": "Hello, welcome to our API!",
  "model": "fal-ai/f5-tts",
  "voice": "nova",
  "speed": 1.0,
  "response_format": "mp3",
  "reference_audio": null,
  "reference_text": null
}

Parameters

Parameter	Type	Required	Default	Description
`input`	string	Yes	-	Text to synthesize (max 10,000 chars)
`model`	string	No	`fal-ai/f5-tts`	TTS model to use
`voice`	string	No	`nova`	Voice preset or custom voice
`speed`	number	No	`1.0`	Speech speed (0.25-4.0)
`response_format`	string	No	`mp3`	Output format: `mp3`, `wav`, `opus`, `aac`
`reference_audio`	string	No	-	Audio URL for voice cloning
`reference_text`	string	No	-	Transcript of reference audio
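The limits in the table above can be checked client-side before a request is sent. The following is a minimal sketch, not part of the API: the function name `validate_tts_request` is hypothetical, the message for an unsupported `response_format` is invented for illustration, and it assumes that whitespace-only input counts as empty and that exactly 10,000 characters is allowed.

```python
ALLOWED_FORMATS = {"mp3", "wav", "opus", "aac"}
MAX_INPUT_CHARS = 10_000
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def validate_tts_request(payload: dict) -> list[str]:
    """Return a list of problems found in a TTS request payload (empty if none)."""
    problems = []

    # `input` is the only required field.
    text = payload.get("input")
    if not isinstance(text, str) or not text.strip():
        problems.append("input is required")
    elif len(text) > MAX_INPUT_CHARS:
        problems.append("input text exceeds maximum length")

    # `speed` defaults to 1.0 and must stay within 0.25-4.0.
    speed = payload.get("speed", 1.0)
    is_number = isinstance(speed, (int, float)) and not isinstance(speed, bool)
    if not is_number or not MIN_SPEED <= speed <= MAX_SPEED:
        problems.append("speed must be between 0.25 and 4.0")

    # `response_format` defaults to mp3; this message is hypothetical.
    fmt = payload.get("response_format", "mp3")
    if not isinstance(fmt, str) or fmt not in ALLOWED_FORMATS:
        problems.append(f"unsupported response_format: {fmt!r}")

    return problems
```

Running this check locally avoids spending a round trip on a request the server would likely reject with a 400.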

Available Models

Model ID	Tier	Features
`fal-ai/f5-tts`	Standard	Voice cloning, natural speech (default)
`fal-ai/maya`	Standard	High-quality synthesis
`fal-ai/chatterbox/text-to-speech/turbo`	Fast	Fast generation
`fal-ai/minimax/speech-2.6-hd`	Premium	HD quality speech
`fal-ai/minimax/speech-2.6-turbo`	Fast	Fast MiniMax
`fal-ai/index-tts-2/text-to-speech`	Standard	Index TTS 2.0
`fal-ai/chatterbox/text-to-speech/multilingual`	Standard	Multi-language support

Voice Presets

Voice	Description
`alloy`	Neutral, balanced
`echo`	Warm, conversational
`fable`	Expressive, storytelling
`onyx`	Deep, authoritative
`nova`	Friendly, clear
`shimmer`	Soft, gentle

Voice presets work best with the default model (fal-ai/f5-tts). For a custom voice, use voice cloning: pass reference_audio and, ideally, a matching reference_text.

Response

{
  "created": 1703123456,
  "audio_url": "https://fal.media/files/audio123.mp3",
  "duration_seconds": 3.5,
  "character_count": 35,
  "model": "fal-ai/f5-tts",
  "model_tier": "standard",
  "usage": {
    "chars_used": 35,
    "quota_multiplier": 1.25,
    "chars_remaining": 199965
  }
}
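To work with the response in typed form, the fields shown above can be copied into a small data class. This is only a sketch: `SpeechResult` and `parse_speech_response` are hypothetical names, and it assumes every response carries the fields shown in the sample.

```python
from dataclasses import dataclass


@dataclass
class SpeechResult:
    audio_url: str
    duration_seconds: float
    model: str
    model_tier: str
    chars_used: int
    chars_remaining: int


def parse_speech_response(body: dict) -> SpeechResult:
    """Pick the commonly needed fields out of a decoded JSON response."""
    usage = body["usage"]
    return SpeechResult(
        audio_url=body["audio_url"],
        duration_seconds=body["duration_seconds"],
        model=body["model"],
        model_tier=body["model_tier"],
        chars_used=usage["chars_used"],
        chars_remaining=usage["chars_remaining"],
    )
```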

Quota & Pricing

TTS requests draw from the tts_chars quota bucket. The character count of each request is multiplied by the multiplier for the model's tier:

Tier	Multiplier	Example Models
Fast	1x	Chatterbox Turbo, MiniMax Speech 2.6 Turbo
Standard	1.25x	F5-TTS, Maya, MetaVoice
Premium	1.5x	MiniMax Speech 2.6 HD

Plan limits:

Plan	TTS Characters/Month
Free	-
Pro	200,000
Business	800,000
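The arithmetic behind the two tables above can be sketched as follows. The multipliers and plan limits are taken from the tables. Rounding up to a whole character is an assumption, since this page does not say how the service rounds, and the function names are hypothetical. The Free plan is omitted because the table lists no allowance for it.

```python
import math

# Multipliers and monthly allowances copied from the tables above.
TIER_MULTIPLIERS = {"fast": 1.0, "standard": 1.25, "premium": 1.5}
PLAN_LIMITS = {"pro": 200_000, "business": 800_000}


def billed_chars(text: str, tier: str) -> int:
    """Estimate quota characters consumed (rounding up is an assumption)."""
    return math.ceil(len(text) * TIER_MULTIPLIERS[tier])


def requests_per_month(plan: str, text: str, tier: str) -> int:
    """Estimate how many identical requests fit in a plan's monthly allowance."""
    cost = billed_chars(text, tier)
    if cost == 0:
        raise ValueError("text must not be empty")
    return PLAN_LIMITS[plan] // cost
```

Under these assumptions, a 35-character input on a Standard model costs an estimated 44 quota characters. Treat the result as an estimate and rely on the usage object in the response for the authoritative figures.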

Examples

curl -X POST https://www.novakit.ai/api/v1/audio/speech \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Welcome to NovaKit! We are excited to have you here.",
    "voice": "nova",
    "speed": 1.1
  }'

import requests

response = requests.post(
    "https://www.novakit.ai/api/v1/audio/speech",
    headers={
        "Authorization": "Bearer sk_your_api_key",
        "Content-Type": "application/json"
    },
    json={
        "input": "Welcome to NovaKit! We are excited to have you here.",
        "voice": "nova",
        "speed": 1.1
    }
)

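# Stop on HTTP errors (see Error Handling) before reading the body
response.raise_for_status()
audio_url = response.json()["audio_url"]
print(f"Audio: {audio_url}")

# Download the audio file
audio_response = requests.get(audio_url)
audio_response.raise_for_status()
with open("output.mp3", "wb") as f:
    f.write(audio_response.content)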

const response = await fetch(
  "https://www.novakit.ai/api/v1/audio/speech",
  {
    method: "POST",
    headers: {
      "Authorization": "Bearer sk_your_api_key",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      input: "Welcome to NovaKit! We are excited to have you here.",
      voice: "nova",
      speed: 1.1,
    }),
  }
);

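// Stop on HTTP errors (see Error Handling) before reading the body
if (!response.ok) {
  throw new Error(`TTS request failed with status ${response.status}`);
}

const data = await response.json();
console.log(`Audio URL: ${data.audio_url}`);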

Voice Cloning

Clone a voice by passing a reference audio sample to the F5-TTS model:

import requests

response = requests.post(
    "https://www.novakit.ai/api/v1/audio/speech",
    headers={"Authorization": "Bearer sk_your_api_key"},
    json={
        "input": "This is my cloned voice speaking.",
        "model": "fal-ai/f5-tts",
        # URL of the voice sample and the exact words spoken in it
        "reference_audio": "https://example.com/voice-sample.mp3",
        "reference_text": "Hello, this is a sample of my voice."
    }
)
response.raise_for_status()
print(response.json()["audio_url"])

For best voice cloning results:

Use a clear audio sample (5-15 seconds)
Provide an accurate transcript via reference_text
Avoid background noise in the reference
MP3 or WAV formats work best
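The 5-15 second guideline above can be checked locally before a sample is uploaded. This sketch handles WAV files only, using Python's standard `wave` module. The function names are hypothetical, and the bounds are treated as inclusive, which is an assumption.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_good_reference_length(path: str, min_s: float = 5.0, max_s: float = 15.0) -> bool:
    """Check that a WAV sample falls inside the recommended 5-15 second range."""
    return min_s <= wav_duration_seconds(path) <= max_s
```

An MP3 sample would need a third-party decoder to measure, which this sketch leaves out.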

Model-Specific Inputs

Pass additional model-specific parameters in the model_inputs object:

{
  "input": "Hello world",
  "model": "fal-ai/metavoice",
  "model_inputs": {
    "guidance_scale": 3.0,
    "top_p": 0.95
  }
}

Error Handling

Common errors you may encounter:

Status	Error	Solution
400	`input is required`	Provide non-empty text
400	`input text exceeds maximum length`	Keep under 10,000 characters
400	`speed must be between 0.25 and 4.0`	Adjust speed value
402	`TTS character limit exceeded`	Upgrade plan or wait for reset
403	`Model tier not allowed`	Upgrade plan for premium models
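One way to act on the status codes in the table above is to map them to an exception that carries the suggested fix. This is a sketch only: `TTSRequestError` and `check_status` are hypothetical names, and the advice strings paraphrase the Solution column. The shape of the error body is not documented on this page, so the message is passed through as plain text.

```python
# Advice paraphrased from the Solution column of the table above.
ADVICE = {
    400: "Fix the request: check input, its length, and the speed value.",
    402: "TTS character limit exceeded: upgrade the plan or wait for the reset.",
    403: "Model tier not allowed: upgrade the plan or choose another model.",
}


class TTSRequestError(Exception):
    """Raised for a non-2xx response; carries the status, message, and advice."""

    def __init__(self, status_code: int, message: str = ""):
        self.status_code = status_code
        self.message = message
        self.advice = ADVICE.get(status_code, "Unexpected error; inspect the response.")
        super().__init__(f"{status_code}: {message} ({self.advice})")


def check_status(status_code: int, message: str = "") -> None:
    """Raise TTSRequestError unless the status code is in the 2xx range."""
    if 200 <= status_code < 300:
        return
    raise TTSRequestError(status_code, message)
```

With the requests library, this could be called as `check_status(response.status_code, response.text)` right after the POST.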

Use Cases

Use Case	Recommended Settings
Podcast intros	`voice: "echo"`, `speed: 0.95`
Audiobook narration	`voice: "fable"`, `speed: 0.9`
Voice assistants	`voice: "nova"`, `speed: 1.1`
Announcements	`voice: "onyx"`, `speed: 1.0`
Meditation guides	`voice: "shimmer"`, `speed: 0.85`
Custom brand voice	Use voice cloning with your audio
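The recommended settings above can be kept in a lookup table so that callers choose a use case rather than individual parameters. In this sketch the dictionary keys and the `build_payload` helper are hypothetical, and only the voice and speed values come from the table. The custom brand voice row is left out because it depends on your own reference audio.

```python
# Voice and speed values copied from the table above; key names are illustrative.
USE_CASE_PRESETS = {
    "podcast_intro": {"voice": "echo", "speed": 0.95},
    "audiobook": {"voice": "fable", "speed": 0.9},
    "voice_assistant": {"voice": "nova", "speed": 1.1},
    "announcement": {"voice": "onyx", "speed": 1.0},
    "meditation": {"voice": "shimmer", "speed": 0.85},
}


def build_payload(text: str, use_case: str) -> dict:
    """Build a request body for POST /audio/speech from a named preset."""
    return {"input": text, **USE_CASE_PRESETS[use_case]}
```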
