Text-to-Speech
Convert text to natural-sounding speech with voice cloning
Convert text into natural-sounding speech with multiple voice options and voice cloning capabilities.
Endpoint
POST /audio/speech
Required scope: tts
Request Body
```json
{
  "input": "Hello, welcome to our API!",
  "model": "fal-ai/f5-tts",
  "voice": "nova",
  "speed": 1.0,
  "response_format": "mp3",
  "reference_audio": null,
  "reference_text": null
}
```
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| input | string | Yes | - | Text to synthesize (max 10,000 characters) |
| model | string | No | fal-ai/f5-tts | TTS model to use |
| voice | string | No | nova | Voice preset or custom voice |
| speed | number | No | 1.0 | Speech speed (0.25-4.0) |
| response_format | string | No | mp3 | Output format: mp3, wav, opus, aac |
| reference_audio | string | No | - | URL of the audio sample used for voice cloning |
| reference_text | string | No | - | Transcript of the reference audio |
| model_inputs | object | No | - | Additional model-specific parameters (see Model-Specific Inputs) |
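The limits in this table can be checked on the client before a request is sent, so obvious mistakes fail locally instead of coming back as a 400. The sketch below uses a hypothetical helper, `build_speech_payload`, which is not part of the NovaKit API. It assumes the 10,000-character limit can be approximated with Python's `len()` (Unicode code points), which may not match how the server counts, and it treats whitespace-only input as empty.

```python
# Hypothetical client-side helper mirroring the documented request limits.
MAX_INPUT_CHARS = 10_000
MIN_SPEED, MAX_SPEED = 0.25, 4.0
RESPONSE_FORMATS = {"mp3", "wav", "opus", "aac"}


def build_speech_payload(
    text: str,
    model: str = "fal-ai/f5-tts",
    voice: str = "nova",
    speed: float = 1.0,
    response_format: str = "mp3",
) -> dict:
    """Return a JSON-ready request body, raising ValueError on invalid input."""
    if not text or not text.strip():
        raise ValueError("input is required")
    # len() counts code points; the server's counting rule is an assumption here.
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError("input text exceeds maximum length")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError("speed must be between 0.25 and 4.0")
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    return {
        "input": text,
        "model": model,
        "voice": voice,
        "speed": speed,
        "response_format": response_format,
    }


payload = build_speech_payload("Hello, welcome to our API!", speed=1.1)
print(payload["speed"])  # 1.1
```

The error messages reuse the wording from the Error Handling table, so client-side and server-side failures read the same in logs.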
Available Models
| Model ID | Tier | Features |
|---|---|---|
| fal-ai/f5-tts | Standard | Voice cloning, natural speech (default) |
| fal-ai/maya | Standard | High-quality synthesis |
| fal-ai/chatterbox/text-to-speech/turbo | Fast | Fast generation |
| fal-ai/minimax/speech-2.6-hd | Premium | HD quality speech |
| fal-ai/minimax/speech-2.6-turbo | Fast | Fast MiniMax |
| fal-ai/index-tts-2/text-to-speech | Standard | Index TTS 2.0 |
| fal-ai/chatterbox/text-to-speech/multilingual | Standard | Multi-language support |
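Because the tier determines both the quota multiplier and whether a plan may use a model (a 403 otherwise), it can help to keep the table above in code. This is a minimal sketch: `MODEL_TIERS`, `tier_for_model`, and `models_by_tier` are hypothetical names, and the mapping is a snapshot of this page, so models added or removed later will not be reflected.

```python
from collections import defaultdict

# Snapshot of the "Available Models" table: model ID -> tier.
MODEL_TIERS = {
    "fal-ai/f5-tts": "Standard",
    "fal-ai/maya": "Standard",
    "fal-ai/chatterbox/text-to-speech/turbo": "Fast",
    "fal-ai/minimax/speech-2.6-hd": "Premium",
    "fal-ai/minimax/speech-2.6-turbo": "Fast",
    "fal-ai/index-tts-2/text-to-speech": "Standard",
    "fal-ai/chatterbox/text-to-speech/multilingual": "Standard",
}


def tier_for_model(model_id: str) -> str:
    """Look up a model's tier; IDs missing from the table raise KeyError."""
    if model_id not in MODEL_TIERS:
        raise KeyError(f"model not in the documented list: {model_id}")
    return MODEL_TIERS[model_id]


def models_by_tier() -> dict[str, list[str]]:
    """Group the documented model IDs by tier, preserving table order."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for model_id, tier in MODEL_TIERS.items():
        grouped[tier].append(model_id)
    return dict(grouped)


print(tier_for_model("fal-ai/f5-tts"))  # Standard
print(models_by_tier()["Fast"])
```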
Voice Presets
| Voice | Description |
|---|---|
| alloy | Neutral, balanced |
| echo | Warm, conversational |
| fable | Expressive, storytelling |
| onyx | Deep, authoritative |
| nova | Friendly, clear |
| shimmer | Soft, gentle |
Voice presets work best with the default model. For custom voices, use the voice cloning feature with reference_audio.
Response
```json
{
  "created": 1703123456,
  "audio_url": "https://fal.media/files/audio123.mp3",
  "duration_seconds": 3.5,
  "character_count": 35,
  "model": "fal-ai/f5-tts",
  "model_tier": "standard",
  "usage": {
    "chars_used": 35,
    "quota_multiplier": 1.25,
    "chars_remaining": 199965
  }
}
```
Quota & Pricing
TTS requests draw from the tts_chars quota bucket. Each request's character count is multiplied by the multiplier of the model's tier:
| Tier | Multiplier | Example Models |
|---|---|---|
| Fast | 1x | Chatterbox Turbo, MiniMax Speech 2.6 Turbo |
| Standard | 1.25x | F5-TTS, Maya, Index TTS 2.0, Chatterbox Multilingual, MetaVoice |
| Premium | 1.5x | MiniMax Speech 2.6 HD |
Plan limits:
| Plan | TTS Characters/Month |
|---|---|
| Free | - |
| Pro | 200,000 |
| Business | 800,000 |
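The multiplier rule can be turned into a quick budget estimate. This sketch uses hypothetical helpers and two labeled assumptions: fractional results are rounded up, and `chars_remaining` in the response `usage` object already reflects weighted usage. Neither rule is stated on this page (the sample response shows `chars_used: 35` next to `quota_multiplier: 1.25`), so treat the numbers as estimates. The Free plan is omitted because its limit is listed as "-".

```python
import math

# Copied from the tier and plan tables above (keys lowercased to match
# the "model_tier" value in API responses).
TIER_MULTIPLIERS = {"fast": 1.0, "standard": 1.25, "premium": 1.5}
PLAN_LIMITS = {"pro": 200_000, "business": 800_000}


def weighted_chars(char_count: int, tier: str) -> int:
    """Estimate quota consumed: characters x tier multiplier.

    Rounding up is an assumption; the docs do not say how fractions are handled.
    """
    return math.ceil(char_count * TIER_MULTIPLIERS[tier.lower()])


def max_raw_chars(plan: str, tier: str) -> int:
    """Roughly how many input characters a full month's quota covers at one tier."""
    return math.floor(PLAN_LIMITS[plan.lower()] / TIER_MULTIPLIERS[tier.lower()])


def fits_remaining(usage: dict, char_count: int, tier: str) -> bool:
    """Check a planned request against the `usage` object from the last response."""
    return weighted_chars(char_count, tier) <= usage["chars_remaining"]


usage = {"chars_used": 35, "quota_multiplier": 1.25, "chars_remaining": 199965}
print(weighted_chars(1000, "standard"))          # 1250
print(max_raw_chars("pro", "premium"))           # 133333
print(fits_remaining(usage, 10_000, "premium"))  # True
```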
Examples
```bash
curl -X POST https://www.novakit.ai/api/v1/audio/speech \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Welcome to NovaKit! We are excited to have you here.",
    "voice": "nova",
    "speed": 1.1
  }'
```

```python
import requests

response = requests.post(
    "https://www.novakit.ai/api/v1/audio/speech",
    headers={
        "Authorization": "Bearer sk_your_api_key",
        "Content-Type": "application/json",
    },
    json={
        "input": "Welcome to NovaKit! We are excited to have you here.",
        "voice": "nova",
        "speed": 1.1,
    },
    timeout=60,
)
response.raise_for_status()  # raise on 4xx/5xx (see Error Handling)

audio_url = response.json()["audio_url"]
print(f"Audio: {audio_url}")

# Download the audio file (mp3 is the default response_format)
audio_response = requests.get(audio_url, timeout=60)
audio_response.raise_for_status()
with open("output.mp3", "wb") as f:
    f.write(audio_response.content)
```

```javascript
// Run as an ES module (top-level await) or inside an async function.
const response = await fetch(
  "https://www.novakit.ai/api/v1/audio/speech",
  {
    method: "POST",
    headers: {
      "Authorization": "Bearer sk_your_api_key",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      input: "Welcome to NovaKit! We are excited to have you here.",
      voice: "nova",
      speed: 1.1,
    }),
  }
);
if (!response.ok) {
  throw new Error(`TTS request failed: ${response.status} ${await response.text()}`);
}
const data = await response.json();
console.log(`Audio URL: ${data.audio_url}`);
```
Voice Cloning
Clone a voice from a reference audio sample with the F5-TTS model:
```python
import requests

response = requests.post(
    "https://www.novakit.ai/api/v1/audio/speech",
    headers={"Authorization": "Bearer sk_your_api_key"},
    json={
        "input": "This is my cloned voice speaking.",
        "model": "fal-ai/f5-tts",
        "reference_audio": "https://example.com/voice-sample.mp3",
        "reference_text": "Hello, this is a sample of my voice.",
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["audio_url"])
```
For best voice cloning results:
- Use a clear audio sample 5-15 seconds long
- Provide an accurate transcript of the sample in reference_text
- Avoid background noise in the reference recording
- Use MP3 or WAV format
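The tips above can be folded into a pre-flight check that warns rather than blocks, since they are guidance and not documented hard limits. In this sketch, `build_clone_payload` is a hypothetical helper; the caller is assumed to know the sample's length (`sample_seconds`), and the format check only inspects the URL's file extension, not the actual audio content.

```python
from urllib.parse import urlparse

PREFERRED_EXTENSIONS = (".mp3", ".wav")


def build_clone_payload(
    text: str,
    reference_audio: str,
    reference_text: str,
    sample_seconds: float,
) -> tuple[dict, list[str]]:
    """Return (request body, warnings) for an F5-TTS voice-cloning request."""
    warnings = []
    if not 5 <= sample_seconds <= 15:
        warnings.append("reference sample should be 5-15 seconds long")
    # Extension check only; the file's real encoding is not inspected.
    path = urlparse(reference_audio).path.lower()
    if not path.endswith(PREFERRED_EXTENSIONS):
        warnings.append("MP3 or WAV reference audio works best")
    if not reference_text.strip():
        warnings.append("reference_text should be an accurate transcript")
    payload = {
        "input": text,
        "model": "fal-ai/f5-tts",
        "reference_audio": reference_audio,
        "reference_text": reference_text,
    }
    return payload, warnings


payload, warnings = build_clone_payload(
    "This is my cloned voice speaking.",
    "https://example.com/voice-sample.mp3",
    "Hello, this is a sample of my voice.",
    sample_seconds=8.0,
)
print(warnings)  # []
```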
Model-Specific Inputs
Pass additional parameters using model_inputs:
```json
{
  "input": "Hello world",
  "model": "fal-ai/metavoice",
  "model_inputs": {
    "guidance_scale": 3.0,
    "top_p": 0.95
  }
}
```
Error Handling
Common errors you may encounter:
| Status | Error | Solution |
|---|---|---|
| 400 | input is required | Provide non-empty text |
| 400 | input text exceeds maximum length | Keep under 10,000 characters |
| 400 | speed must be between 0.25 and 4.0 | Adjust speed value |
| 402 | TTS character limit exceeded | Upgrade plan or wait for reset |
| 403 | Model tier not allowed | Upgrade plan for premium models |
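A client can map these status codes to the suggested fixes so failures carry an actionable message. This is a minimal sketch with hypothetical names (`TTSRequestError`, `check_status`); it keys only on the HTTP status because the exact shape of the error body is not documented here, and it appends the raw body text as-is.

```python
# Suggested fixes from the table above, keyed by HTTP status.
TTS_ERROR_ACTIONS = {
    400: "Fix the request: non-empty input, at most 10,000 characters, speed 0.25-4.0.",
    402: "TTS character limit exceeded: upgrade the plan or wait for the quota reset.",
    403: "Model tier not allowed: upgrade the plan to use this model's tier.",
}


class TTSRequestError(Exception):
    """Raised for a non-2xx response from POST /audio/speech."""

    def __init__(self, status: int, detail: str):
        super().__init__(f"HTTP {status}: {detail}")
        self.status = status


def check_status(status: int, body_text: str = "") -> None:
    """Return silently on 2xx; otherwise raise with the documented hint."""
    if 200 <= status < 300:
        return
    hint = TTS_ERROR_ACTIONS.get(status, "Unexpected error.")
    raise TTSRequestError(status, f"{hint} {body_text}".strip())
```

With `requests`, call `check_status(response.status_code, response.text)` right after `requests.post(...)` in place of `raise_for_status()`.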
Use Cases
| Use Case | Recommended Settings |
|---|---|
| Podcast intros | voice: "echo", speed: 0.95 |
| Audiobook narration | voice: "fable", speed: 0.9 |
| Voice assistants | voice: "nova", speed: 1.1 |
| Announcements | voice: "onyx", speed: 1.0 |
| Meditation guides | voice: "shimmer", speed: 0.85 |
| Custom brand voice | Use voice cloning with your audio |
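The preset rows above translate directly into request settings. This sketch uses a hypothetical helper, `payload_for`, with made-up use-case keys; the custom brand voice row is left out because it relies on voice cloning (reference_audio and reference_text) rather than a preset.

```python
# Recommended settings from the "Use Cases" table; the keys are illustrative.
USE_CASE_SETTINGS = {
    "podcast_intro": {"voice": "echo", "speed": 0.95},
    "audiobook": {"voice": "fable", "speed": 0.9},
    "voice_assistant": {"voice": "nova", "speed": 1.1},
    "announcement": {"voice": "onyx", "speed": 1.0},
    "meditation": {"voice": "shimmer", "speed": 0.85},
}


def payload_for(use_case: str, text: str) -> dict:
    """Build a request body using the recommended preset for a use case."""
    settings = USE_CASE_SETTINGS[use_case]
    return {"input": text, **settings}


print(payload_for("audiobook", "Chapter one."))
# {'input': 'Chapter one.', 'voice': 'fable', 'speed': 0.9}
```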