
AI Text-to-Speech Voiceover

Turn any script into a natural AI voiceover audio file (MP3/WAV/Opus/AAC). Pick a voice, speed, and format. Good for faceless videos, IVR, and narration.


Category: YouTube & Creator Tools

How it works

  1. Open it on Apify

     Hit Run on Apify. This opens the tool in the cloud, with no install.

  2. Set the inputs

     Adjust text, texts, and voice (sensible defaults are pre-filled).

  3. Click Run

     The tool runs on Apify's cloud and generates the audio files for you.

  4. Export the results

     Download the result records as JSON, CSV, or Excel, or pipe them straight into your app, Google Sheets, or an AI agent.

Inputs

| Field | What it does | Type |
| --- | --- | --- |
| text | The text to convert to speech. Long scripts are chunked and stitched automatically. | string |
| texts | Array of strings OR objects (uses script/scriptText/text/narration). One audio file per item. | array |
| voice | AI voice. | string |
| model | tts-1 (fast) or tts-1-hd (higher quality). | string |
| format | Output audio format. | string |
| speed | Playback speed 0.25–4.0 (1.0 = normal). | string |
| openaiApiKey | Your OpenAI key (TTS). Kept private. | string |
| baseUrl | OpenAI-compatible base URL. Default https://api.openai.com/v1. | string |

What you get

A structured dataset — each result includes fields like:

_demo_notice, audioKey, audioUrl, characters, chunks, durationSeconds, format, index, model, textPreview, voice

Export every run as JSON, CSV or Excel, or send it to your app, a database, Google Sheets, or an AI agent.

3 ready-to-run use cases

Faceless YouTube AI Voiceover from a Script

Paste a narration script and get a steady AI voiceover track for faceless YouTube videos, ready to drop under your B-roll. MP3 or WAV, adjustable pace.

IVR Phone Menu Voice Prompts - Text to Speech

Turn "press 1 for sales" menu text into clear IVR voice prompts for your phone system, exported as AAC. Telephony-ready greetings without studio time.

Batch Text to Speech: One Audio File per Line

Got a list of lines? Each one comes back as its own AI voiceover file, ideal for app prompts, UI sounds, and clip sets. Pick the voice, speed, and format.

AI Text-to-Speech Voiceover

Turns a block of text or a full script into a natural-sounding AI voiceover file. Pick a voice, set the speed, and choose MP3, WAV, Opus, or AAC. It's meant for the usual narration jobs: faceless videos, audiobooks, IVR prompts, explainer voiceovers.

How it works

The actor sends your text to an OpenAI-compatible TTS endpoint. Long scripts get split at sentence boundaries into chunks under ~3,500 characters, each chunk is synthesized separately, and the parts are stitched back into one file with ffmpeg (using stream copy, so there's no re-encode and no quality loss). Each finished audio file is saved to the run's key-value store and a row is pushed to the dataset.
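The chunk-and-stitch flow described above can be sketched roughly as follows. This is a minimal illustration, not the actor's actual code: the function names (split_script, concat_list, concat_command) are hypothetical, the sentence-boundary regex is a simplification, and the ffmpeg invocation assumes the concat demuxer with stream copy.

```python
import re


def split_script(text: str, max_chars: int = 3500) -> list[str]:
    """Split text into chunks of at most max_chars, breaking after sentences."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is hard-split.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def concat_list(part_paths: list[str]) -> str:
    """Body of the list file read by ffmpeg's concat demuxer."""
    return "\n".join(f"file '{path}'" for path in part_paths)


def concat_command(list_file: str, output: str) -> list[str]:
    """ffmpeg arguments that join the parts with stream copy (no re-encode)."""
    return ["ffmpeg", "-f", "concat", "-safe", "0",
            "-i", list_file, "-c", "copy", output]
```

Each chunk returned by split_script would be synthesized on its own, the resulting part files listed via concat_list, and the command from concat_command run (for example with subprocess.run) to produce the final file.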

Input

Nothing is strictly required by the schema, but in practice you need an openaiApiKey and at least one of text or texts. If neither is provided the run errors out.

| Field | Required | Notes |
| --- | --- | --- |
| text | one of text/texts | The script to voice, as a single string. |
| texts | one of text/texts | Batch mode. Array of strings, or objects keyed by script / scriptText / text / narration. One audio file per item. |
| voice | no | alloy, echo, fable, onyx, nova, shimmer. Default onyx (deep male). nova and shimmer are female. |
| model | no | tts-1 (fast, default) or tts-1-hd (higher quality, costs more on the OpenAI side). |
| format | no | mp3 (default), wav, opus, or aac. |
| speed | no | Playback speed from 0.25 to 4.0. Default 1.0. Values outside that range are clamped. |
| openaiApiKey | yes in practice | Your OpenAI key, used for the TTS call. Stored as a secret. Falls back to the OPENAI_API_KEY env var if set. |
| baseUrl | no | Advanced. Point at any OpenAI-compatible /audio/speech endpoint. Defaults to https://api.openai.com/v1. |
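To make the input rules concrete, here is a small sketch of how text, texts, and speed might be normalized. The helper names (item_text, collect_scripts, clamp_speed) are hypothetical. The key lookup order and the choice to process text before texts when both are given are assumptions; the actor may resolve these differently.

```python
from typing import Optional

# Assumed lookup order for object items, taken from the order in the table.
TEXT_KEYS = ("script", "scriptText", "text", "narration")


def item_text(item) -> Optional[str]:
    """Return the script text from a string or object item, or None."""
    if isinstance(item, str):
        return item.strip() or None
    if isinstance(item, dict):
        for key in TEXT_KEYS:
            value = item.get(key)
            if isinstance(value, str) and value.strip():
                return value.strip()
    return None


def collect_scripts(run_input: dict) -> list[str]:
    """Gather every script to voice; error out if there is none."""
    candidates = [run_input.get("text")] + list(run_input.get("texts") or [])
    scripts = [text for text in map(item_text, candidates) if text]
    if not scripts:
        raise ValueError("Provide at least one of text or texts")
    return scripts


def clamp_speed(value) -> float:
    """Clamp playback speed to the documented 0.25 to 4.0 range."""
    return max(0.25, min(4.0, float(value)))
```

For example, collect_scripts({"texts": ["Hi", {"narration": "Bye"}]}) yields two scripts, so the run would produce two audio files.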

Output

Each input item produces one audio file in the key-value store and one dataset record. The record includes audioKey and audioUrl (where to fetch the file), durationSeconds, characters, chunks (how many pieces the script was split into), plus the voice, model, and resolved format. Failed items get a record with ok: false and the error message instead of stopping the whole run.
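Once the dataset is exported, the records can be separated into finished voiceovers and failures. The sketch below works on plain dictionaries, such as a JSON export loaded with json.load. The function names are hypothetical, and it assumes that only an explicit ok: false marks a failure, since successful records may not carry an ok field at all.

```python
def partition_records(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split dataset records into (finished, failed)."""
    finished, failed = [], []
    for record in records:
        # Assumption: only an explicit ok == False marks a failed item.
        if record.get("ok") is False:
            failed.append(record)
        else:
            finished.append(record)
    return finished, failed


def summarize(finished: list[dict]) -> dict:
    """Total up the documented numeric fields across finished records."""
    return {
        "files": len(finished),
        "seconds": sum(r.get("durationSeconds", 0) for r in finished),
        "characters": sum(r.get("characters", 0) for r in finished),
    }
```

The audioUrl of each finished record is then the place to download the corresponding file.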

Example

{
  "text": "Welcome back to the channel. Today we're looking at one of the strangest mysteries of the deep ocean.",
  "voice": "onyx",
  "model": "tts-1",
  "format": "mp3",
  "speed": 1.0,
  "openaiApiKey": "sk-..."
}
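For orientation, this is roughly what a single synthesis request built from that input could look like against an OpenAI-compatible /audio/speech endpoint. The mapping from actor fields to request fields (format to response_format, the chunk to input) is an assumption based on OpenAI's public API, and build_speech_request is a hypothetical helper, not part of the actor.

```python
import json
import urllib.request


def build_speech_request(run_input: dict, chunk: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for one chunk of text."""
    base_url = run_input.get("baseUrl", "https://api.openai.com/v1").rstrip("/")
    payload = {
        "model": run_input.get("model", "tts-1"),
        "input": chunk,
        "voice": run_input.get("voice", "onyx"),
        "response_format": run_input.get("format", "mp3"),
        "speed": float(run_input.get("speed", 1.0)),
    }
    return urllib.request.Request(
        url=f"{base_url}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {run_input['openaiApiKey']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the request to urllib.request.urlopen would return the audio bytes for that chunk, which is the point where your own OpenAI key is billed.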

Pricing

$0.04 per voiceover, pay per result, no subscription. The OpenAI TTS usage is billed separately on your own key.
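A rough way to estimate the total cost of a batch is to add the per-result fee to the separate OpenAI usage. In this sketch the OpenAI rate is a parameter you supply from OpenAI's current price list, and it assumes the fee applies once per voiceover produced; estimate_cost is a hypothetical helper.

```python
ACTOR_FEE_USD = 0.04  # per voiceover, from the pricing above


def estimate_cost(voiceovers: int, total_characters: int,
                  usd_per_million_chars: float) -> dict:
    """Estimate actor fee plus OpenAI TTS usage, in USD."""
    actor_fee = round(voiceovers * ACTOR_FEE_USD, 4)
    # usd_per_million_chars is NOT given in this document; look it up yourself.
    tts_usage = round(total_characters / 1_000_000 * usd_per_million_chars, 4)
    return {
        "actor_fee": actor_fee,
        "tts_usage": tts_usage,
        "total": round(actor_fee + tts_usage, 4),
    }
```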

Notes

This actor calls OpenAI for synthesis, so it needs your own OpenAI API key. Scripts are split into chunks of under ~3,500 characters, and each chunk is capped at 4,000 characters before it is sent, which keeps every request within the model's per-call limit. There is no hard limit on total script length, since long inputs are chunked and concatenated.