Word-Level Timestamps for Karaoke & TikTok Captions
Need word-by-word captions that pop in sync? This tool returns per-word start and end times for animated, karaoke-style subtitles on TikTok and Reels.
How it works
1. Open it on Apify
   Hit Run on Apify. It opens the tool in the cloud, no install.
2. Set the inputs
   Adjust mediaUrl, mediaUrls, and language (sensible defaults are pre-filled).
3. Click Run
   The tool runs on Apify’s cloud and collects the data for you.
4. Export the results
   Download as JSON, CSV or Excel, or pipe straight into your app, Google Sheets, or an AI agent.
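The same four steps can also be scripted rather than clicked through. The sketch below assumes a client that behaves like `ApifyClient` from the `apify-client` Python package (`actor(...).call(run_input=...)` and `dataset(...).iterate_items()`). The client is passed in as an argument, and the actor ID is a placeholder because this page does not state it.

```python
from typing import Any, Dict, List


def run_transcriber(client: Any, actor_id: str, run_input: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Start one actor run, wait for it, and return its dataset rows.

    `client` is assumed to behave like apify_client.ApifyClient.
    `actor_id` is a placeholder; this page does not state the real ID.
    """
    # Steps 1-3: start the run with the given inputs and block until it finishes.
    run = client.actor(actor_id).call(run_input=run_input)
    if not run:
        raise RuntimeError("The actor run did not return a result")
    # Step 4: read the result rows from the run's default dataset.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

With a real client, a call would look roughly like `run_transcriber(ApifyClient("<APIFY_TOKEN>"), "<username>/<actor-name>", {"mediaUrl": "https://example.com/clip.mp4", "wordTimestamps": True})`, with both placeholders filled in from your own Apify account.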
Inputs
| Field | What it does | Type |
|---|---|---|
| mediaUrl | Public URL to a video or audio file (mp4, mov, mp3, wav, m4a, webm). Use this for a single file, or mediaUrls for a batch. | string |
| mediaUrls | Transcribe several files in one run, with one dataset row per URL. Each item is a public video/audio URL. | array |
| language | Spoken language ISO code, or 'auto' to detect. | string |
| wordTimestamps | Return per-word start/end times (great for karaoke captions). | boolean |
| outputFormats | Which subtitle/text files to also produce: srt, vtt, txt. | array |
| openaiApiKey | Your OpenAI (Whisper) key. Kept private. | string |
| model | Transcription model. Default whisper-1. | string |
| baseUrl | OpenAI-compatible base URL. Default https://api.openai.com/v1. | string |
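As an illustration of how these fields fit together, here is a minimal sketch of a helper that assembles an input object. The field names come from the table above. The helper name `build_input`, its default values, and the URL check are assumptions made for this example and are not part of the tool. The `openaiApiKey` field is left out on purpose so a secret never ends up in shared code.

```python
from typing import Any, Dict, Iterable, List, Optional

ALLOWED_FORMATS = {"srt", "vtt", "txt"}  # the three formats listed for outputFormats


def build_input(
    media_urls: Iterable[str],
    language: str = "auto",          # assumed default for this sketch
    word_timestamps: bool = True,    # assumed default for this sketch
    output_formats: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build an input object using the field names from the Inputs table."""
    urls = [u.strip() for u in media_urls if u and u.strip()]
    if not urls:
        raise ValueError("at least one media URL is required")
    for u in urls:
        if not u.startswith(("http://", "https://")):
            raise ValueError(f"not a public http(s) URL: {u}")

    formats = list(output_formats) if output_formats is not None else ["srt"]
    unknown = sorted(set(formats) - ALLOWED_FORMATS)
    if unknown:
        raise ValueError(f"unsupported output formats: {unknown}")

    payload: Dict[str, Any] = {
        "language": language,
        "wordTimestamps": word_timestamps,
        "outputFormats": formats,
    }
    # mediaUrl is for a single file, mediaUrls for a batch.
    if len(urls) == 1:
        payload["mediaUrl"] = urls[0]
    else:
        payload["mediaUrls"] = urls
    return payload
```

For example, `build_input(["https://example.com/clip.mp4"], output_formats=["srt", "vtt"])` yields a single-file input with word timestamps enabled, while passing two or more URLs switches to the `mediaUrls` batch field.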
What you get
A structured dataset — each result includes fields like:
- _demo_notice
- durationSeconds
- language
- segmentCount
- segments
- sourceUrl
- srtKey
- text
- vttKey
- wordCount

Export every run as JSON, CSV or Excel, or send it to your app, a database, Google Sheets, or an AI agent.
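To show how per-word times can drive a karaoke effect, the sketch below groups words into short caption lines and emits one WebVTT cue per word, with the active word wrapped in `<b>`. This page does not document the exact shape of the word entries, so the code assumes you have already pulled out a flat, time-ordered list of `{"word", "start", "end"}` items with times in seconds. The function names and the `max_words` and `max_gap` thresholds are illustrative choices.

```python
from typing import List, TypedDict


class Word(TypedDict):
    # Assumed shape of one word entry; times are in seconds.
    word: str
    start: float
    end: float


def vtt_time(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def group_words(words: List[Word], max_words: int = 4, max_gap: float = 0.6) -> List[List[Word]]:
    """Split words into caption lines by word count and by pauses in speech."""
    lines: List[List[Word]] = []
    current: List[Word] = []
    for w in words:
        if current and (len(current) >= max_words or w["start"] - current[-1]["end"] > max_gap):
            lines.append(current)
            current = []
        current.append(w)
    if current:
        lines.append(current)
    return lines


def karaoke_vtt(words: List[Word], max_words: int = 4, max_gap: float = 0.6) -> str:
    """Emit one cue per word; each cue shows the whole line with one word in bold."""
    out = ["WEBVTT", ""]
    for line in group_words(words, max_words, max_gap):
        for i, w in enumerate(line):
            # Hold the highlight until the next word in the line starts, so the
            # caption does not flicker off between words.
            end = line[i + 1]["start"] if i + 1 < len(line) else w["end"]
            text = " ".join(
                f"<b>{x['word'].strip()}</b>" if j == i else x["word"].strip()
                for j, x in enumerate(line)
            )
            out += [f"{vtt_time(w['start'])} --> {vtt_time(end)}", text, ""]
    return "\n".join(out)
```

The resulting text can be saved as a `.vtt` file to preview the timing in a player that renders bold cue text. Video editors and caption tools for TikTok or Reels generally use their own styling formats, so the grouping step is the reusable part of this sketch.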
More use cases for Video & Audio Transcriber — Word-Level + SRT/VTT
MP4 to SRT: Generate Timed Subtitles From a Video URL
Turn an MP4 video URL into a timed SRT subtitle file with auto-detected language, ready to upload as captions to YouTube or Vimeo.
Podcast MP3 to Text: Transcribe Episodes to Transcript
Podcasters get a clean text transcript from any MP3 episode URL, ready to paste into show notes, a blog post, or searchable archives.