Creating Lecture Transcripts

Steps
Process media files into a single mp3.
I used ffmpeg to convert and concatenate audio files. This builds on the command-line techniques from my GitHub API workflow:
ffmpeg -i video.mp4 -q:a 0 -map a audio.mp3
ffmpeg -i "concat:audio1.mp3|audio2.mp3" -acodec copy audiofull.mp3
TODO: add notes on the ffmpeg arguments.
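Until those notes are written, here is a minimal sketch of how the two commands above could be wrapped in shell functions, with the flags annotated as I understand them. The function names (concat_input, extract_audio, join_audio) are hypothetical, made up for this example, and not part of ffmpeg.

```shell
#!/usr/bin/env bash
# Hypothetical helpers wrapping the two ffmpeg commands above.

# Build the "concat:" protocol input from a list of files.
# Example: concat_input audio1.mp3 audio2.mp3
#   prints concat:audio1.mp3|audio2.mp3
concat_input() {
  local IFS='|'          # "$*" joins arguments with the first char of IFS
  printf 'concat:%s\n' "$*"
}

# Extract the audio of a video file as an mp3.
#   -i      input file
#   -map a  keep only the audio streams
#   -q:a 0  highest variable-bitrate quality for the mp3 encoder
extract_audio() {
  ffmpeg -i "$1" -q:a 0 -map a "$2"
}

# Join several mp3 files into one: join_audio OUTPUT INPUT...
#   -acodec copy  copy the audio stream as-is, without re-encoding
join_audio() {
  local out="$1"
  shift
  ffmpeg -i "$(concat_input "$@")" -acodec copy "$out"
}
```

For example, join_audio audiofull.mp3 audio1.mp3 audio2.mp3 runs the same concatenation as the second command above. Because -acodec copy skips re-encoding, the join is fast, but it assumes all inputs were encoded the same way.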
Once I had the full mp3 file for the class, I used whisper by OpenAI to transcribe it. Luckily, most of the class is in English, with occasional digressions in Tamizh, and whisper handles this beautifully.
whisper audio.mp3 --model turbo --language English -o ./date -f srt
Arguments and points:
- The turbo model has a good balance between speed and accuracy.
- Setting the language is important, as whisper might not detect it accurately (it only looks at the first 30s). whisper supports dozens of languages.
- The output directory (-o) holds all output files, and -f lets you choose which output formats are written. If -o isn't specified, the files are saved to the current directory.
- The tool also prints the transcript to the terminal as it runs, so you can read it without opening any file.
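To run the same transcription over several recordings, something like the sketch below could work. It assumes one mp3 per class, named by date (for example 2024-01-15.mp3). That naming scheme is my assumption for illustration, and transcribe is a hypothetical helper name. --output_dir and --output_format are the long forms of -o and -f.

```shell
#!/usr/bin/env bash
# Hypothetical helper: transcribe one recording into ./<file name without .mp3>.
# Assumes recordings are named by date, e.g. 2024-01-15.mp3.
transcribe() {
  local audio="$1"
  local day
  day="$(basename "$audio" .mp3)"
  # With DRY_RUN set to a non-empty value, the command is printed, not run.
  ${DRY_RUN:+echo} whisper "$audio" --model turbo --language English \
    --output_dir "./$day" --output_format srt
}

# Usage: transcribe every mp3 in the current directory.
#   for f in *.mp3; do transcribe "$f"; done
```

Setting DRY_RUN=1 before calling transcribe is a cheap way to check the output directories before committing to a long run.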
Use case
I joined a cohort of brahmanas learning the way of the sandyavandana by the shastras. This was fuelled by my wish to learn more about the shastras, and what better place to start than going deeper into the first ever initiation I received.
The cohort is organised as a 90-minute Zoom call, with over 300 people joining consistently. The guru is a vast ocean of knowledge, which means he often goes off on tangents that are hard to track.
I've been recording voice notes from this lecture series, especially of the open-ended Q&A session that follows the class. I've seen the Q&A last 2 hours before I dropped off the call for dinner. The point being, there is tons of value in the voice notes, and unlocking that is something I'm looking forward to. For more command-line text processing techniques, check out Recursively search and replace text in all files in MacOS and Extracting information out of logseq.