Creating Lecture Transcripts

How to Create Lecture Transcripts Using ffmpeg and Whisper AI

Steps

Process media files into a single mp3.

I used ffmpeg to convert and concatenate audio files - this builds on the command-line techniques from my GitHub API workflow:

ffmpeg -i video.mp4 -q:a 0 -map a audio.mp3
ffmpeg -i "concat:audio1.mp3|audio2.mp3" -acodec copy audiofull.mp3

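Notes on the ffmpeg arguments:

  • -i names the input file.
  • -map a keeps only the audio streams from the input, dropping the video.
  • -q:a 0 asks the mp3 encoder for its highest variable-bitrate quality.
  • "concat:audio1.mp3|audio2.mp3" uses ffmpeg's concat protocol to read the listed files back to back as one input.
  • -acodec copy copies the audio stream as-is instead of re-encoding it, so joining is fast and loses no quality.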
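When a class is split across more than two recordings, typing the concat: string by hand gets tedious. Here is a rough sketch of how the two ffmpeg commands could be wrapped in shell helpers. It is not what I actually ran: the function names (mp3_name, concat_input, extract_audio, join_mp3) are made up for illustration, and it assumes no file name contains a | character.

```shell
# Sketch only: helper names are illustrative, not part of ffmpeg.

# Swap a file's extension for .mp3 (video.mp4 -> video.mp3).
mp3_name() {
  printf '%s\n' "${1%.*}.mp3"
}

# Build the "concat:a.mp3|b.mp3|..." input string from the given file names.
# Runs in a subshell so the IFS change does not leak into the caller.
concat_input() (
  IFS='|'
  printf 'concat:%s\n' "$*"
)

# Extract the audio track of one video as a high-quality VBR mp3.
extract_audio() {
  ffmpeg -i "$1" -q:a 0 -map a "$(mp3_name "$1")"
}

# Join mp3 parts without re-encoding.
# Usage: join_mp3 output.mp3 part1.mp3 part2.mp3 ...
join_mp3() (
  out=$1
  shift
  ffmpeg -i "$(concat_input "$@")" -acodec copy "$out"
)
```

With these loaded, join_mp3 audiofull.mp3 audio1.mp3 audio2.mp3 runs the same concatenation as the command above, and adding a third part is just one more argument.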


Once I had the full mp3 file for the class, I ran it through whisper by OpenAI to get a transcript. Luckily, most of the class is in English, with digressions in Tamizh, and whisper handles this beautifully.

whisper audiofull.mp3 --model turbo --language English -o ./date -f srt

Arguments and points:

  • The turbo model strikes a good balance between speed and accuracy.
  • Setting the language is important, as whisper might not detect it accurately (it uses only the first 30 seconds of audio for detection). whisper supports close to a hundred languages.
  • The output directory (-o) holds all output files, and -f selects which output formats are written (txt, vtt, srt, tsv, json, or all). If -o isn't specified, whisper saves to the current directory.
  • The tool also prints the transcript to the terminal as it runs, so you can follow along without opening the saved files.
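Since a long recording takes a while to transcribe, it helps not to redo work on a re-run. The sketch below wraps the whisper command in a helper that skips a recording whose subtitles already exist. The names srt_path and transcribe and the example date are hypothetical. It also assumes whisper names the output after the input file's base name inside the -o directory, which is worth confirming on your own install.

```shell
# Sketch only: helper names and the example date are illustrative.

# Path of the .srt file whisper is assumed to write for a given
# audio file and output directory: <outdir>/<audio base name>.srt
srt_path() (
  base=$(basename "$1")
  printf '%s/%s.srt\n' "$2" "${base%.*}"
)

# Transcribe one recording, skipping it if the subtitles already exist.
# Usage: transcribe audiofull.mp3 ./2025-01-15
transcribe() (
  audio=$1
  outdir=$2
  srt=$(srt_path "$audio" "$outdir")
  if [ -f "$srt" ]; then
    echo "Skipping $audio: $srt already exists"
  else
    whisper "$audio" --model turbo --language English -o "$outdir" -f srt
  fi
)
```

Calling transcribe audiofull.mp3 ./2025-01-15 a second time prints the "Skipping" message instead of running the model again.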

Use case

I joined a cohort of brahmanas learning the way of the sandyavandana by the shastras. This was fuelled by my wish to learn more about the shastras, and what better place to start than going deeper into the first initiation I ever received.

The cohort is organised as a 90-minute Zoom call, with over 300 people joining consistently. The guru is a vast ocean of knowledge, which means he often goes off on tangents that are hard to track.

I've been recording voice notes from this lecture series, especially of the open-ended Q&A session that follows the class. I've seen the Q&A run for 2 hours before I dropped off the call for dinner. The point being, there is tons of value in the voice notes, and unlocking that is something I'm looking forward to. For more command-line text processing techniques, check out Recursively search and replace text in all files in MacOS and Extracting information out of logseq.