hpr4596 :: Adding voice-over audio track created using text to speech on the movie subtitles
Ken uses piper to convert srt subtitle files into a new audio track to add to a foreign language movie

Hosted by Ken Fallon on Monday, 2026-03-16 is flagged as Clean and is released under a CC-BY-SA license.
voice over, Lectoring, mpv, srt, movie, ffmpeg, kdenlive, audacity, subrip, avidemux, LosslessCut, ffsubsync.
Duration: 00:28:09
We’ll explain why we’re doing it, what it is, and cover some useful tools along the way.
I’ve been watching movies recommended to me by my colleagues.
As I work for a global company, the recommendations are often “Foreign Language”, which by definition is every movie to someone.
It’s often difficult to read the subtitles, or they distract from the acting.
So I thought of converting the subtitles to speech for inclusion as an audio track, to produce a Voice Over or Lectoring audio track.
Lectoring aka Voice Over Translations
First used in Soviet countries, where lectors read out the news and propaganda. The first podcasts?
In Polish, lektor is also used to mean “off-screen reader” or “voice-over artist”. A lektor is a (usually male) reader who provides the Polish voice-over on foreign-language programmes and films where the voice-over translation technique is used. This is the standard localization technique on Polish television and (as an option) on many DVDs; full dubbing is generally reserved for children’s material.
Example: Night of the Living Dead
To give you an idea of what this sounds like, I’m going to play you an example from the out-of-copyright movie Night of the Living Dead.
In the United States, Night of the Living Dead was mistakenly released into the public domain because the original distributor failed to replace the copyright notice when changing the film’s name.
Original
First the original sound track, then the same clip with the voice over track.
Voice Over
Proof of Concept
As a native English speaker I find it difficult to follow those Voice Over tracks as I am trying to focus on the underlying audio. In discussions with Polish friends, it seems that this is not a problem when Polish is your native language. To put that to the test I wanted to try it out on a movie to see if that were indeed the case.
I asked on Mastodon for a non-English movie that was Creative Commons but had English subtitles, and HPR host Windigo had the answer.
Nasty Old People is a 2009 Swedish film directed by Hanna Sköld, Tangram Film. It premiered on 10 October 2009 at Kontrapunkt in Malmö, and on file sharing site The Pirate Bay. The film is available as an authorized and legal download under the Creative Commons license CC BY-NC-SA.
So my idea was to take each bit of subtitle text, convert it to audio, then have the generated audio play at the same time the subtitle appears on the screen.
We use piper to process shows here on HPR, and we also generate srt, or SubRip subtitle files for each show.
SRT or SubRip files are the easiest subtitle files to work with.
From https://en.wikipedia.org/wiki/SubRip
The SubRip file format is described on the Matroska multimedia container format website as “perhaps the most basic of all subtitle formats.”
SubRip (SubRip Text) files are named with the extension .srt, and contain formatted lines of plain text in groups separated by a blank line.
Subtitles are numbered sequentially, starting at 1. The timecode format used is hours:minutes:seconds,milliseconds with time units fixed to two zero-padded digits and fractions fixed to three zero-padded digits (00:00:00,000).
The comma (,) is used as the fractional separator.
Each subtitle has four parts:
- A numeric counter identifying each sequential subtitle
- The time that the subtitle should appear on the screen, followed by --> and the time it should disappear
- Subtitle text itself on one or more lines
- A blank line containing no text, indicating the end of this subtitle
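To make the format concrete, here is a minimal Python sketch of reading those four parts. It is not the script from this show: the names `timecode_to_ms` and `parse_srt` are made up for illustration, and it assumes a well-formed file with no positioning data after the end time.

```python
import re

# hours:minutes:seconds,milliseconds with a comma as the fractional separator
TIMECODE = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


def timecode_to_ms(timecode):
    """Convert an SRT timecode such as 00:15:06,240 to milliseconds."""
    match = TIMECODE.fullmatch(timecode.strip())
    if match is None:
        raise ValueError(f"not an SRT timecode: {timecode!r}")
    hours, minutes, seconds, millis = map(int, match.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def parse_srt(text):
    """Split SRT text into a list of cues (counter, start, end, text)."""
    # Drop a byte order mark and normalise Windows line endings.
    text = text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    cues = []
    # Groups are separated by a blank line.
    for block in re.split(r"\n\s*\n", text):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # skip incomplete groups rather than guess
        start, end = lines[1].split("-->")
        cues.append({
            "index": int(lines[0]),
            "start_ms": timecode_to_ms(start),
            "end_ms": timecode_to_ms(end),
            # Join multi-line subtitle text into one sentence for speech.
            "text": " ".join(line.strip() for line in lines[2:]),
        })
    return cues
```

Each cue then carries everything a text to speech step needs: what to say and when it should start.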
I downloaded the movie from the Internet Archive, and then used Piper voice to convert a minute’s worth of subtitles.
piper_voice: A fast and local neural text-to-speech engine that embeds espeak-ng for phonemization. GPL-3.0 license
Once I had the audio prepared for a sample of the subtitles, it was over to Audacity to create a new subtitle audio track.
Audacity: the world’s most popular audio editing and recording app. GPL v2 or later
Timing the segments would be a problem, if it were not for the fact that Audacity supports srt files as Labels.
File > Import > Labels. Then select the srt file.
The subtitle track with the text of the audio will be displayed. I could then import each audio segment and line it up with the subtitle track to get the correct timing.
Each subtitle segment created a new separate audio file which I then exported.
I then used Kdenlive to open the video and import the audio and subtitle tracks.
Kdenlive: an acronym for KDE Non-Linear Video Editor. It works on Linux, Windows, macOS, and BSD. GPL-3.0-or-later
There is a good article by Jean-Marc on adding subtitles, How to Add Subtitles Easily in Kdenlive.
Project > Subtitles > Add Subtitle Track
Select the Subtitle file
Align the subtitle and audio track.
After rendering the segment out I was satisfied that this was something worth doing.
The script
The script can be found on the episode page for this show on the HPR site, and I put it together as a proof of concept.
It creates a new audio track for the subtitles, and merges this with the original sound track to create a new selectable sound track.
It begins by creating a stretch of silent audio that lasts until the begin timestamp of the first subtitle segment.
The first subtitle segment is converted from text to speech using Piper voice.
That segment of audio is added to the initial silence track.
We check the total length so far, and then see if there is supposed to be silence between the end of the audio so far and the begin timestamp of the next subtitle segment.
If there is, then a filler piece of silence is added until the next subtitle should appear.
If not, then the audio for both subtitles plays immediately after one another.
I was worried that the subtitle audio would then lag behind the on screen dialogue but it works surprisingly well. Even long series of dialogue sort themselves out after a bit.
We do this over and over again for each subtitle, right up to the very end of the movie.
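The padding logic described above can be sketched in a few lines of Python. This is an illustration of the idea rather than the actual script from the episode page: `build_timeline` is a hypothetical name, and the clip lengths in the example are invented, since real lengths depend on what the text to speech engine produces.

```python
def build_timeline(cues):
    """Plan the voice-over track.

    cues is a list of (start_ms, clip_ms) pairs in display order, where
    start_ms is the subtitle begin timestamp and clip_ms is the length of
    the generated speech for that subtitle.
    """
    timeline = []
    position_ms = 0  # length of the audio assembled so far
    for start_ms, clip_ms in cues:
        gap_ms = start_ms - position_ms
        if gap_ms > 0:
            # The next subtitle is not due yet: pad with silence.
            timeline.append(("silence", gap_ms))
            position_ms += gap_ms
        # Otherwise the previous speech overran, so this clip simply
        # follows straight on and starts a little late.
        timeline.append(("speech", clip_ms))
        position_ms += clip_ms
    return timeline, position_ms


# Start times taken from subtitles 96 to 99 of the movie; clip lengths made up.
cues = [(906240, 2000), (912840, 8000), (919840, 1500), (922880, 700)]
timeline, total_ms = build_timeline(cues)
# The second clip overruns, so the third starts 1000 ms late, and the
# fourth is back on time after 540 ms of filler silence.
```

Because the silence is always measured against the real length so far, a late clip eats into the next gap instead of pushing everything after it back, which is why long runs of dialogue catch up again.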
This new subtitle to speech audio track is then merged back into the media file as a new audio track.
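One way to do that merge is with ffmpeg's `-map` option. The sketch below only builds the command line. The helper name `merge_command`, the file names, and the choice of a Matroska output file are assumptions for illustration, and the script on the episode page may do this differently.

```python
import shlex


def merge_command(movie_file, voice_file, output_file):
    """Build an ffmpeg command that adds voice_file as an extra audio track."""
    return [
        "ffmpeg", "-hide_banner", "-loglevel", "error",
        "-i", movie_file,   # input 0: the original movie
        "-i", voice_file,   # input 1: the generated voice-over audio
        "-map", "0",        # keep every stream from the movie
        "-map", "1:a",      # append the audio from the second input
        "-c", "copy",       # copy the streams without re-encoding
        output_file,
    ]


# Hypothetical file names; pass the list to subprocess.run() to execute it.
command = merge_command("movie.mkv", "voice_over.mp3", "movie_with_voice.mkv")
command_line = shlex.join(command)
```

The original sound track stays first, so players keep using it by default and the voice-over becomes a second, selectable audio track.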
96
00:15:06,240 --> 00:15:10,640
It will be two years before it's this big

97
00:15:12,840 --> 00:15:17,840
But don't you bother. By then I'll be long gone

98
00:15:19,840 --> 00:15:22,400
It was just a question

99
00:15:22,880 --> 00:15:25,480
Porridge?
Original
First the original sound track, then the same clip with the voice over track.
Voice Over
Lessons learned
Now that I have done this for a lot of movies, there are a few tips for getting the best output.
The creation of the audio track usually goes well, but you can run into issues with the merging of the new track back into the movie.
Preparation
The first thing you need is a subtitle file which will be the basis of the voice you will be listening to. It should be good quality so that it matches when the actors speak.
It’s important to clean this up before you use it, fixing spelling mistakes and removing HTML that would otherwise be read aloud. Listening to three hours of “I L Zero ve y Zero u”, or “less than forward slash I, greater than”, or “L am from Lndia” can get a bit tedious.
You should also try and get versions that translate the songs as well.
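Some of that clean-up can be automated. The following is a rough sketch, assuming the markup is the usual `<i>` and `<font>` style tags plus the occasional `{\an8}` positioning code. The function names are hypothetical. Character recognition mistakes are only flagged for manual review, because a word like “Lndia” contains no digit to search for and cannot be caught this way.

```python
import html
import re

TAG = re.compile(r"</?[A-Za-z][^>]*>")       # <i>, </i>, <font color="...">
OVERRIDE = re.compile(r"\{\\[^}]*\}")        # {\an8} style positioning codes
# Words that mix letters and digits, such as l0ve, are likely OCR errors.
SUSPECT = re.compile(r"\b(?=[A-Za-z]*\d)(?=\d*[A-Za-z])[A-Za-z\d]+\b")


def clean_subtitle_text(text):
    """Remove markup so it is not read aloud by the speech engine."""
    text = TAG.sub("", text)
    text = OVERRIDE.sub("", text)
    text = html.unescape(text)               # &amp; becomes &
    return " ".join(text.split())            # collapse leftover whitespace


def find_suspect_words(text):
    """List words worth checking by hand; expect some false positives."""
    return SUSPECT.findall(text)
```

Legitimate words such as “2nd” will also be flagged, so this is a reading list for a human rather than an automatic fix.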
Getting a SRT file from the media.
As many subtitles are taken from DVDs, they can often be poor optical character recognition (OCR) versions of the bitmap-based streams. The disc holds a picture of the string “Hello World” rather than the letters.
ffmpeg
By far the easiest and best way to get the subtitles is to extract them from the movie itself, provided they are a separate track.
ffmpeg is a complete, cross-platform solution to record, convert and stream audio and video. LGPL-2.1-or-later, GPL-2.0-or-later
ffmpeg -y -hide_banner -loglevel error -txt_format text -i "${this_movie_file}" "${this_srt_file}"
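Whether that extraction gives usable text depends on the subtitle track being stored as text rather than as bitmaps. A hedged sketch of checking this first with ffprobe, which ships with ffmpeg, might look like the following. The function names and the two codec lists are assumptions and are not exhaustive.

```python
import json

# Common ffmpeg codec names; anything else is reported as unknown.
TEXT_CODECS = {"subrip", "ass", "mov_text", "webvtt"}
BITMAP_CODECS = {"dvd_subtitle", "hdmv_pgs_subtitle", "dvb_subtitle"}


def ffprobe_command(movie_file):
    """Build an ffprobe command that lists subtitle streams as JSON."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "s",
        "-show_entries", "stream=index,codec_name:stream_tags=language",
        "-of", "json",
        movie_file,
    ]


def classify_subtitle_streams(ffprobe_json):
    """Return (index, language, codec, kind) for each subtitle stream."""
    streams = json.loads(ffprobe_json).get("streams", [])
    result = []
    for stream in streams:
        codec = stream.get("codec_name", "unknown")
        if codec in TEXT_CODECS:
            kind = "text"
        elif codec in BITMAP_CODECS:
            kind = "bitmap"   # would need OCR before text to speech
        else:
            kind = "unknown"
        language = stream.get("tags", {}).get("language", "und")
        result.append((stream.get("index"), language, codec, kind))
    return result
```

To use it, run the command with `subprocess.run(ffprobe_command(path), capture_output=True, text=True)` and pass the resulting `stdout` to `classify_subtitle_streams`.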
Getting a SRT file from the web.
If that fails, you can try to get the subtitle files from the Internet.
Select the subtitle file for your language with the highest rating.
You can check the media using the mpv media player.
mpv is a media player based on MPlayer and mplayer2. It supports a wide variety of video file formats, audio and video codecs, and subtitle types. GPLv2+, parts under LGPLv2.1+, some optional parts under GPLv3
Name the srt file with the same prefix as the movie and mpv will play it. You can also use the --sub-files= option.
mpv "${this_movie_file}" --sub-files="${this_srt_file}"
Scrub through the file to see if the timing is correct. The subtitles can be toggled using the j key.
Fixing Timing issues
It’s very important to get the subtitles to align, otherwise the voices will be out of sync.
When the subtitles don’t match up, it’s usually that they need to have the start offset corrected.
ffsubsync will automatically try and adjust the offset of the first subtitle to the first use of speech in a movie.
ffsubsync: Language-agnostic automatic synchronization of subtitles with video, so that subtitles are aligned to the correct starting point within the video. MIT license
pip install ffsubsync
ffs video.mp4 -i unsynchronized.srt -o synchronized.srt
LosslessCut lets you quickly trim away additional trailers or ads at the beginning, which gives ffsubsync a better chance of working.
LosslessCut: aims to be the ultimate cross platform FFmpeg GUI for extremely fast and lossless operations on video, audio, subtitle and other related media files. GPL-2.0 license
If that fails to match up the subtitles, you can use the mpv keyboard shortcuts: move to the first speech segment and then press Ctrl+Shift+Left and Ctrl+Shift+Right to adjust the subtitle delay so that the next or previous subtitle is displayed. It will also show the delay in milliseconds, e.g. -148416 milliseconds, which is -148.416 seconds.
You can use many tools to adjust the subtitles, and I tried out SRT Offset.
srt-offset: A simple command-line tool to offset SRT subtitle files. This tool allows you to adjust the timing of subtitles in SRT files, which can be useful when subtitles are out of sync with the video. MIT license
srt-offset -i input.srt -offset -148.416 -o output.srt
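What an offset tool does is simple enough to sketch in Python, which may help when checking a result by hand. This is not how srt-offset itself is implemented; `shift_srt` and `shift_timecode` are hypothetical names, and clamping times at zero is a design choice made for this example.

```python
import re

TIMECODE = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


def shift_timecode(match, offset_ms):
    """Return one timecode moved by offset_ms, never earlier than zero."""
    hours, minutes, seconds, millis = map(int, match.groups())
    total = ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis
    total = max(0, total + offset_ms)
    total_seconds, millis = divmod(total, 1000)
    total_minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def shift_srt(text, offset_seconds):
    """Shift every timing line in SRT text by offset_seconds (may be negative)."""
    offset_ms = round(offset_seconds * 1000)
    shifted = []
    for line in text.splitlines():
        if "-->" in line:  # only touch timing lines, never the dialogue
            line = TIMECODE.sub(lambda m: shift_timecode(m, offset_ms), line)
        shifted.append(line)
    return "\n".join(shifted) + "\n"
```

With an offset of -148.416 seconds, a subtitle that started at 00:15:22,880 moves to 00:12:54,464.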
Manually adding the new subtitle to speech audio track
If merging the new track back into the movie presents an issue, you can use Avidemux to just add the new audio track.
Avidemux: a free video editor designed for simple cutting, filtering and encoding tasks. GPL V2
Open Avidemux, and select “File > Open”, to select the movie.
Then go to “Audio > Select Track”
Select the next unselected track and tick “Enabled”, “Add Audio Track”
Then pick the new mixed track, in this example .~NastyOldPeople_mixed.mp3
Conclusion
I now find it much easier to watch a movie with the voice over track. It gets to a point where I don’t even notice it is there and just hear the actors speak in their own language, and I just know what they are saying.
Links
- 2009 Nasty Old People
- A Spanish voice-over translation
- avidemux
- by Jean-Marc on How to Add Subtitles Easily in Kdenlive
- container format
- Decimal separator
- extension
- ffmpeg
- ffmpeg on wikipedia
- ffsubsync
- GPL-3.0 license
- GPL v2 or later
- Kdenlive
- LGPL-2.1
- LosslessCut
- Matroska
- MIT license
- Movie on Archive.org
- mpv
- mpv keyboard shortcuts
- mpv wikipedia
- Nasty Old People from the Internet Archive
- Night of the Living Dead
- Noc żywych trupów | Film grozy | Polski lektor
- OpenSubtitles
- opensubtitles.org
- Optical character recognition
- Piper voice
- SRT Offset
- srt, or SubRip subtitle files
- SubRip
- Timecode
- Voice-over translation
- Whisper