Perform Speech Recognition in the Terminal with Whisper

OpenAI Whisper is a state-of-the-art speech recognition model that we can run from the command line.

This post assumes macOS with Homebrew and Python >= 3.7 installed.

First, we need to install FFmpeg, which Whisper uses to decode audio files:

$ brew install ffmpeg

Install Whisper:

$ pip install openai-whisper

This also installs a command-line tool named whisper.
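Before recording anything, it can help to confirm that both executables are on the PATH. The sketch below uses only Python's standard library (shutil.which). The helper name check_tools is hypothetical, and it assumes the executables are named ffmpeg and whisper, as installed by the commands above.

```python
import shutil


def check_tools(names=("ffmpeg", "whisper")):
    """Map each executable name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in check_tools().items():
        # Print the resolved location, or flag the tool as missing.
        print(f"{name}: {path if path else 'NOT FOUND'}")
```

If either line reports NOT FOUND, rerun the corresponding install command or check that the pip scripts directory is on your PATH.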

Now, record a short piece of audio with QuickTime Player or a similar app, and save it as, for example, file.m4a.

Then run speech recognition on the recording:

$ whisper file.m4a --model small

The output will look something like this:

Detecting language using up to the first 30 seconds. Use `--language` to specify the language
Detected language: English
[00:00.000 --> 00:01.420] Hello there.
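Each transcript line printed by the CLI has the form [start --> end] text. If you want to post-process that console output, a minimal parser like the one below converts the timestamps to seconds. It assumes timestamps look like MM:SS.mmm, with an optional leading hours field for long recordings. parse_segment and to_seconds are hypothetical helpers, not part of Whisper.

```python
import re

# Matches lines like "[00:00.000 --> 00:01.420] Hello there."
# Assumption: timestamps are MM:SS.mmm, optionally prefixed with an "HH:" field.
SEGMENT_RE = re.compile(
    r"^\[((?:\d+:)?\d{2}:\d{2}\.\d{3}) --> ((?:\d+:)?\d{2}:\d{2}\.\d{3})\]\s*(.*)$"
)


def to_seconds(stamp):
    """Convert 'MM:SS.mmm' or 'HH:MM:SS.mmm' to a float number of seconds."""
    seconds = 0.0
    for part in stamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse_segment(line):
    """Return (start, end, text) for a transcript line, or None for any other output."""
    match = SEGMENT_RE.match(line.strip())
    if match is None:
        return None
    start, end, text = match.groups()
    return to_seconds(start), to_seconds(end), text


if __name__ == "__main__":
    print(parse_segment("[00:00.000 --> 00:01.420] Hello there."))
    print(parse_segment("Detected language: English"))
```

Status lines such as "Detected language: English" do not match the pattern and come back as None, so the parser can be applied to every line of captured output.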

Notes

If installing the openai-whisper package from PyPI does not work on a system, we can install Whisper directly from its GitHub repository instead:

$ pip install git+https://github.com/openai/whisper.git

If the small model is not accurate enough, we can use the medium or large model. Larger models are generally more accurate but slower and need more memory:

$ whisper file.m4a --model medium
$ whisper file.m4a --model large
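When comparing model sizes on the same recording, it can be convenient to drive the CLI from a script. The sketch below builds the argument list for the whisper command and runs it once per model with subprocess. It assumes whisper is on the PATH. build_whisper_args and compare_models are hypothetical helpers. Besides --model and --language, it uses the CLI's --output_dir option, which is not covered elsewhere in this post, so that each model's transcript files land in their own directory instead of overwriting each other in the current one.

```python
import subprocess


def build_whisper_args(audio_path, model="small", language=None, output_dir=None):
    """Build the argument list for one invocation of the whisper CLI."""
    args = ["whisper", audio_path, "--model", model]
    if language is not None:
        # Skip automatic language detection on the first 30 seconds.
        args += ["--language", language]
    if output_dir is not None:
        # Keep each run's transcript files separate instead of overwriting them.
        args += ["--output_dir", output_dir]
    return args


def compare_models(audio_path, models=("small", "medium", "large")):
    """Run the CLI once per model, writing results to a directory named after the model."""
    for model in models:
        args = build_whisper_args(audio_path, model=model, output_dir=f"out-{model}")
        # check=True raises CalledProcessError if whisper exits with an error.
        subprocess.run(args, check=True)
```

For example, compare_models("file.m4a") would write the small, medium, and large transcripts to out-small, out-medium, and out-large. Each model is downloaded the first time it is used, so the first large run can take a while.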
