# insanely-fast-whisper

**Repository Path**: for2cyfeng/insanely-fast-whisper

## Basic Information

- **Project Name**: insanely-fast-whisper
- **Description**: A command-line tool for audio files with strong automatic transcription capabilities. It also includes speaker segmentation and diarisation (e.g. distinguishing Speaker 1 from Speaker 2).
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-02-14
- **Last Updated**: 2024-02-14

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Insanely Fast Whisper

An opinionated CLI to transcribe audio files w/ Whisper on-device! Powered by 🤗 *Transformers*, *Optimum* & *flash-attn*.

**TL;DR** - Transcribe **150** minutes (2.5 hours) of audio in less than **98** seconds - with [OpenAI's Whisper Large v3](https://huggingface.co/openai/whisper-large-v3). Blazingly fast transcription is now a reality! ⚡️

Not convinced? Here are some benchmarks we ran on an Nvidia A100 - 80GB 👇

| Optimisation type | Time to Transcribe (150 mins of Audio) |
|------------------|------------------|
| large-v3 (Transformers) (`fp32`) | ~31 (*31 min 1 sec*) |
| large-v3 (Transformers) (`fp16` + `batching [24]` + `bettertransformer`) | ~5 (*5 min 2 sec*) |
| **large-v3 (Transformers) (`fp16` + `batching [24]` + `Flash Attention 2`)** | **~2 (*1 min 38 sec*)** |
| distil-large-v2 (Transformers) (`fp16` + `batching [24]` + `bettertransformer`) | ~3 (*3 min 16 sec*) |
| **distil-large-v2 (Transformers) (`fp16` + `batching [24]` + `Flash Attention 2`)** | **~1 (*1 min 18 sec*)** |
| large-v2 (Faster Whisper) (`fp16` + `beam_size [1]`) | ~9.23 (*9 min 23 sec*) |
| large-v2 (Faster Whisper) (`8-bit` + `beam_size [1]`) | ~8 (*8 min 15 sec*) |

P.S. We also ran the benchmarks on a [Google Colab T4 GPU](/notebooks/) instance!

P.P.S. This project originally started as a way to showcase benchmarks for Transformers, but has since evolved into a lightweight CLI for people to use. It is purely community driven - we add whatever the community has a strong demand for!

## 🆕 Blazingly fast transcriptions via your terminal! ⚡️

We've added a CLI to enable fast transcriptions. Here's how you can use it:

Install `insanely-fast-whisper` with `pipx` (`pip install pipx` or `brew install pipx`):

```bash
pipx install insanely-fast-whisper
```

*Note: Due to a dependency on [`onnxruntime`, Python 3.12 is currently not supported](https://github.com/microsoft/onnxruntime/issues/17842). You can force a Python version (e.g. 3.11) by adding `--python python3.11` to the command.*

⚠️ If you have Python 3.11.XX installed, `pipx` may parse the version incorrectly and silently install a very old version of `insanely-fast-whisper` (version `0.0.8`, which no longer works with the current `BetterTransformers`). In that case, you can install the latest version by passing `--ignore-requires-python` to `pip`:

```bash
pipx install insanely-fast-whisper --force --pip-args="--ignore-requires-python"
```

If you're installing with `pip`, you can pass the argument directly: `pip install insanely-fast-whisper --ignore-requires-python`.

Run inference from any path on your computer:

```bash
insanely-fast-whisper --file-name <filename or URL>
```

*Note: if you are running on macOS, you also need to add the `--device-id mps` flag.*

🔥 You can run [Whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) w/ [Flash Attention 2](https://github.com/Dao-AILab/flash-attention) from this CLI too:

```bash
insanely-fast-whisper --file-name <filename or URL> --flash True
```

🌟 You can run [distil-whisper](https://huggingface.co/distil-whisper) directly from this CLI too:

```bash
insanely-fast-whisper --model-name distil-whisper/large-v2 --file-name <filename or URL>
```

Don't want to install `insanely-fast-whisper`? Just use `pipx run`:

```bash
pipx run insanely-fast-whisper --file-name <filename or URL>
```

> [!NOTE]
> The CLI is highly opinionated and only works on NVIDIA GPUs & Macs. Make sure to check out the defaults and the list of options you can play around with to maximise your transcription throughput. Run `insanely-fast-whisper --help` or `pipx run insanely-fast-whisper --help` to get all the CLI arguments along with their defaults.
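As a rough, illustrative sketch of how the options documented below can be combined (the audio path and Hugging Face token are placeholders, and this exact combination hasn't been benchmarked here), a run that requests Flash Attention 2, word-level timestamps, speaker diarisation and a custom output path could look like:

```bash
# Illustrative only - substitute your own audio path/URL and hf.co token.
insanely-fast-whisper \
  --file-name <filename or URL> \
  --flash True \
  --timestamp word \
  --hf_token <your-hf-token> \
  --transcript-path speakers.json
```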
## CLI Options

The `insanely-fast-whisper` repo provides all-round support for running Whisper in various settings. Note that, as of 26 Nov, `insanely-fast-whisper` works on both CUDA and mps (Mac) enabled devices.

```
  -h, --help            show this help message and exit
  --file-name FILE_NAME
                        Path or URL to the audio file to be transcribed.
  --device-id DEVICE_ID
                        Device ID for your GPU. Just pass the device number when using CUDA, or "mps" for Macs with Apple Silicon. (default: "0")
  --transcript-path TRANSCRIPT_PATH
                        Path to save the transcription output. (default: output.json)
  --model-name MODEL_NAME
                        Name of the pretrained model/checkpoint to perform ASR. (default: openai/whisper-large-v3)
  --task {transcribe,translate}
                        Task to perform: transcribe or translate to another language. (default: transcribe)
  --language LANGUAGE   Language of the input audio. (default: "None" (Whisper auto-detects the language))
  --batch-size BATCH_SIZE
                        Number of parallel batches you want to compute. Reduce if you face OOMs. (default: 24)
  --flash FLASH         Use Flash Attention 2. Read the FAQs to see how to install FA2 correctly. (default: False)
  --timestamp {chunk,word}
                        Whisper supports both chunked as well as word level timestamps. (default: chunk)
  --hf_token            Provide a hf.co/settings/token for Pyannote.audio to diarise the audio clips.
```

## Frequently Asked Questions

**How to correctly install flash-attn to make it work with `insanely-fast-whisper`?**

Make sure to install it via `pipx runpip insanely-fast-whisper install flash-attn --no-build-isolation`. Massive kudos to @li-yifei for helping with this.

**How to solve an `AssertionError: Torch not compiled with CUDA enabled` error on Windows?**

The root cause of this problem is still unknown; however, you can resolve it by manually installing torch in the virtualenv, e.g. `python -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121`. Thanks to @pto2k for all the debugging.

**How to avoid Out-Of-Memory (OOM) exceptions on Mac?**

The *mps* backend isn't as optimised as CUDA and is therefore considerably more memory hungry. Typically you can run with `--batch-size 4` without any issues (this should use roughly 12GB of GPU VRAM). Don't forget to set `--device-id mps`.
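Putting that Mac advice together, a typical Apple Silicon invocation (the audio path is a placeholder) looks something like this:

```bash
# Apple Silicon: route inference to the mps backend and keep the batch size small to stay within memory.
insanely-fast-whisper \
  --file-name <filename or URL> \
  --device-id mps \
  --batch-size 4
```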
## How to use Whisper without a CLI?

All you need to run is the below snippet:

```bash
pip install --upgrade transformers optimum accelerate
```

```python
import torch
from transformers import pipeline
from transformers.utils import is_flash_attn_2_available

pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",  # select checkpoint from https://huggingface.co/openai/whisper-large-v3#model-details
    torch_dtype=torch.float16,
    device="cuda:0",  # or "mps" for Mac devices
    model_kwargs={"attn_implementation": "flash_attention_2"} if is_flash_attn_2_available() else {"attn_implementation": "sdpa"},
)

outputs = pipe(
    "<file_name or URL>",  # path or URL to the audio file to transcribe
    chunk_length_s=30,
    batch_size=24,
    return_timestamps=True,
)

outputs
```
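Note that the snippet only takes the Flash Attention 2 path when `flash-attn` is importable. If you want that path on a CUDA machine, you would typically install it separately; the same `--no-build-isolation` hint from the FAQ applies (this assumes an NVIDIA GPU with a working CUDA toolchain):

```bash
# Optional: install Flash Attention 2 so that is_flash_attn_2_available() returns True.
# Assumes an NVIDIA GPU with a compatible CUDA toolkit; not applicable on Mac/mps.
pip install flash-attn --no-build-isolation
```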
## Acknowledgements

1. The [OpenAI Whisper](https://github.com/openai/whisper) team for open sourcing such a brilliant checkpoint.
2. The Hugging Face Transformers team, specifically [Arthur](https://github.com/ArthurZucker), [Patrick](https://github.com/patrickvonplaten), [Sanchit](https://github.com/sanchit-gandhi) & [Yoach](https://github.com/ylacombe) (alphabetical order), for continuing to maintain Whisper in Transformers.
3. The Hugging Face [Optimum](https://github.com/huggingface/optimum) team for making the BetterTransformer API so easily accessible.
4. [Patrick Arminio](https://github.com/patrick91) for helping me tremendously to put together this CLI.

## Community showcase

1. @ochen1 created a brilliant MVP for a CLI here: https://github.com/ochen1/insanely-fast-whisper-cli (Try it out now!)
2. @arihanv created an app (Shush) using NextJS (frontend) & Modal (backend): https://github.com/arihanv/Shush (Check it out!)
3. @kadirnar created a Python package on top of Transformers with optimisations: https://github.com/kadirnar/whisper-plus (Go go go!!!)