This blogpost was co-created with ChatGPT.
In this blogpost, I will describe the workflow I used to efficiently turn my talk videos into blogposts with machine learning. The heavy lifting is done by two relatively new machine learning tools: whisper.cpp and ChatGPT.
I use whisper.cpp, created by Georgi Gerganov, for audio transcription. Whisper.cpp is a lightning-fast open-source implementation of OpenAI’s automatic speech recognition (ASR) model Whisper. I love that it runs offline, on my 5-year-old computer, without any third-party dependencies. It transcribed my 15-30 minute talks in mere minutes. The audio transcripts are edited by hand and then fed to ChatGPT for further processing.
ChatGPT is a large language model by OpenAI – as it will repeatedly tell you if you feed it the wrong prompt. It can be used for a variety of natural language processing tasks, such as text completion, translation, and summarization. I use ChatGPT to help me rewrite my transcripts into more fluent and natural-sounding texts.
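I do the rewriting in the ChatGPT web interface, but the same step could be scripted against OpenAI’s chat completions API. A minimal sketch, assuming jq is installed and an API key is available in $OPENAI_API_KEY; the function name, model name, and prompt are illustrative, not part of my actual workflow:

```shell
# Hypothetical helper for scripting the rewrite step via OpenAI's chat
# completions API instead of the ChatGPT web UI. Assumes jq is installed
# and an API key in $OPENAI_API_KEY.
rewrite_transcript() {
  # $1 = a hand-edited transcript chunk (keep chunks short: context limits)
  payload=$(jq -n --arg t "$1" \
    '{model: "gpt-3.5-turbo",
      messages: [{role: "user",
                  content: ("Rewrite this talk transcript excerpt as fluent blog prose:\n\n" + $t)}]}')
  curl -s https://api.openai.com/v1/chat/completions \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$payload"
}
```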
Here’s the workflow:
The base model that is downloaded as part of the whisper.cpp demo worked fine for transcribing talks. The base model supports multiple languages. (Edit: for transcribing English talks I now use the medium-sized English-only model.)
yt-dlp -x https://www.youtube.com/watch?v=dQw4w9WgXcQ --audio-format mp3
ffmpeg -i filename.mp3 -ar 16000 -ac 1 -c:a pcm_s16le filename.wav
./main -m models/ggml-base.bin -otxt filename.wav
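The three commands above can be chained into a small wrapper. A minimal sketch, assuming yt-dlp and ffmpeg are installed and you are in a built whisper.cpp checkout; the function name is my own invention:

```shell
# Hypothetical wrapper around the three steps above:
# download audio, convert to 16 kHz mono WAV, transcribe with whisper.cpp.
talk2text() {
  url="$1"           # YouTube URL of the talk
  base="${2:-talk}"  # output basename (default: talk)
  yt-dlp -x "$url" --audio-format mp3 -o "$base.mp3" &&
  # whisper.cpp expects 16 kHz mono 16-bit PCM WAV input
  ffmpeg -i "$base.mp3" -ar 16000 -ac 1 -c:a pcm_s16le "$base.wav" &&
  ./main -m models/ggml-base.bin -otxt "$base.wav"
}
```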
Things like names, URLs, and hyperspecific jargon can be tricky for Whisper to transcribe.
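To cut down on repetitive hand-editing, recurring mistranscriptions can be fixed with a small substitution list before the manual pass. A sketch; the function name and the substitutions are made-up examples, not from my actual transcripts:

```shell
# Hypothetical pre-pass: fix recurring mistranscriptions (names, jargon)
# with a hand-maintained sed substitution list. Substitutions are examples.
fix_transcript() {
  sed -e 's/chat GPT/ChatGPT/g' \
      -e 's/gerganov/Gerganov/g' \
      "$1"
}
```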
Writing a blogpost this way can still be a lot of work, but it definitely beats transcribing 30 minutes of audio by hand. As an added bonus, I now have text-searchable (and search-engine-discoverable!) textual versions of my past talks.
Judith van Stegeren, PhD, is a Dutch computer scientist specializing in natural language processing and data science for two domains: investing and video games. Despite her expertise, no part of her PhD thesis was written by a computer program.