OpenAI Whisper is a state-of-the-art speech recognition model that we can run from the command line.
This post assumes macOS with Python >= 3.7 installed.
First we need to install FFmpeg for audio processing, and then Whisper itself:
$ brew install ffmpeg
$ pip install openai-whisper
The pip package also installs a command-line executable named whisper.
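Before going further, it can be worth confirming that both tools ended up on the PATH. The following is a minimal sketch: check_tools is a helper name made up for this post (it is not part of Whisper or FFmpeg), and it relies only on the standard `command -v` shell builtin.

```shell
# check_tools: hypothetical helper, not part of Whisper or FFmpeg.
# Prints whether each named command can be found on the PATH and
# returns a non-zero status if at least one is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_tools ffmpeg whisper
```

If whisper is reported missing right after installation, opening a new terminal window is a reasonable first thing to try, since the shell may not have picked up the new executable yet.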
Now, record a piece of audio using QuickTime or similar.
Save the file to file.m4a, for example.
Then, to run the speech recognition:
$ whisper file.m4a --model small
The output will look something like this:
Detecting language using up to the first 30 seconds. Use `--language` to specify the language
Detected language: English
[00:00.000 --> 00:01.420] Hello there.
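As the output itself hints, passing `--language` skips the detection step. The sketch below wraps that in a small function. The name transcribe_to and its argument order are made up for this post, and the `--output_format` and `--output_dir` flags are used as I understand them from recent releases, so it is worth checking `whisper --help` on the installed version.

```shell
# transcribe_to: hypothetical wrapper around the whisper CLI.
#   $1 = audio file
#   $2 = directory for the transcript (created if needed)
#   $3 = spoken language (assumed default: English)
transcribe_to() {
  audio="$1"
  outdir="$2"
  lang="${3:-English}"
  mkdir -p "$outdir"
  # --language skips auto-detection; --output_format and --output_dir
  # are assumed to be available in the installed Whisper version.
  whisper "$audio" --model small --language "$lang" \
    --output_format txt --output_dir "$outdir"
}

# Usage: transcribe_to file.m4a transcripts
```

Under these assumptions, the transcript should end up as a text file inside the chosen directory in addition to being printed to the terminal.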
If the pip package above does not install correctly on a system, we can also install Whisper directly from its GitHub repository:
$ pip install git+https://github.com/openai/whisper.git
We can use the medium or large models if the small model is not sufficiently accurate:
$ whisper file.m4a --model medium
$ whisper file.m4a --model large
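With several recordings, switching models by hand gets tedious. Below is a minimal sketch of a batch helper; transcribe_all is a name made up for this post, and the layout it assumes (a directory of .m4a files) is only an example. It uses nothing beyond the `--model` flag shown above.

```shell
# transcribe_all: hypothetical helper, not part of Whisper.
#   $1 = directory containing .m4a recordings
#   $2 = model name (defaults to small)
transcribe_all() {
  dir="$1"
  model="${2:-small}"
  for f in "$dir"/*.m4a; do
    # In sh/bash an unmatched glob stays as literal text, so skip it.
    [ -e "$f" ] || continue
    whisper "$f" --model "$model"
  done
}

# Usage: transcribe_all recordings medium
```

The function is written for sh or bash. In zsh, the default shell on recent macOS versions, an unmatched glob is reported as an error instead of being skipped silently. Larger models generally take longer and need more memory, so starting with small and moving up only when needed keeps runs quick.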