
insanely-fast-whisper

This is a CLI tool for ridiculously fast on-device Whisper audio transcription. It leverages optimizations like Flash Attention 2 for rapid processing on GPUs and Apple Silicon Macs.


Questions & Answers

What is insanely-fast-whisper?
Insanely-fast-whisper is an opinionated command-line interface (CLI) tool that enables extremely fast on-device audio transcription using OpenAI's Whisper models. It is built on Hugging Face Transformers, Optimum, and Flash Attention 2.
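As a quick illustration, installation and a basic run look roughly like this (pipx is the installation route suggested by the project; exact flag names may differ between versions, so check the README of your installed release):

```shell
# Install the CLI into an isolated environment
pipx install insanely-fast-whisper

# Transcribe a local audio file (the file name here is a placeholder);
# the transcript is written out as JSON
insanely-fast-whisper --file-name audio.mp3
```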
Who should use insanely-fast-whisper?
It is intended for developers and users who need to transcribe large audio files quickly and locally, particularly those with NVIDIA GPUs or Apple Silicon Macs, due to its specialized performance optimizations.
How does insanely-fast-whisper achieve faster transcription than standard Whisper implementations?
It achieves its speed by combining optimizations such as Flash Attention 2, fp16 precision, and batched inference. The project's benchmarks show it transcribing 2.5 hours of audio in under 98 seconds on an NVIDIA A100 GPU, significantly outperforming the standard Transformers implementation and even Faster Whisper.
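A tuned run exposing those optimizations might look like the sketch below. The flag names (`--device-id`, `--batch-size`, `--flash`) follow the project's README at the time of writing and should be verified against your installed version:

```shell
# Hypothetical tuned run on an NVIDIA GPU:
#   --device-id 0   -> first CUDA device
#   --batch-size 24 -> larger batches raise throughput (and VRAM use)
#   --flash True    -> enable Flash Attention 2 (requires the flash-attn package)
insanely-fast-whisper \
  --file-name meeting.mp3 \
  --device-id 0 \
  --batch-size 24 \
  --flash True
```

Lowering `--batch-size` is the usual remedy if the run exhausts GPU memory.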
When is insanely-fast-whisper most suitable for use?
This tool is most suitable when high-speed, local audio transcription is critical, especially for large volumes of audio data. It is ideal for scenarios where cloud-based services are not preferred or where maximum transcription throughput on compatible hardware is desired.
What hardware is required to run insanely-fast-whisper effectively?
Insanely-fast-whisper is optimized for NVIDIA GPUs (using CUDA) and Macs with Apple Silicon (using MPS). Users need to specify "--device-id mps" for macOS or the device number for CUDA-enabled GPUs.
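The device selection described above can be sketched as follows (file names are placeholders; flag names assumed from the project's README):

```shell
# On an Apple Silicon Mac, target the Metal Performance Shaders (MPS) backend:
insanely-fast-whisper --file-name interview.wav --device-id mps

# On a machine with multiple NVIDIA GPUs, select a CUDA device by index:
insanely-fast-whisper --file-name interview.wav --device-id 1
```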