MLX Whisper

Type: mlx.automatic_speech_recognition.Whisper

Namespace: mlx.automatic_speech_recognition

Description

Transcribe an audio asset using MLX Whisper. whisper, mlx, asr, speech-to-text

- Uses MLX for efficient Apple Silicon acceleration
- Returns transcript and segments with optional word-level timestamps

Property	Type	Description	Default
model	`Enum['mlx-community/whisper-tiny-mlx', 'mlx-community/whisper-tiny.en-mlx', 'mlx-community/whisper-base-mlx', 'mlx-community/whisper-base.en-mlx', 'mlx-community/whisper-small-mlx', 'mlx-community/whisper-small.en-mlx', 'mlx-community/whisper-medium-mlx', 'mlx-community/whisper-medium.en-mlx', 'mlx-community/whisper-large-v3-mlx']`	Model to use for transcription	`mlx-community/whisper-tiny.en-mlx`
audio	`audio`	The input audio to transcribe.	`{'type': 'audio', 'uri': '', 'asset_id': None, 'data': None}`
compression_ratio_threshold	`Optional[float]`	Threshold for gzip compression ratio; above this, the result is treated as failed.	`2.4`
logprob_threshold	`Optional[float]`	Average log probability threshold; below this, the result is treated as failed.	`-1.0`
no_speech_threshold	`Optional[float]`	Threshold for no-speech probability; if exceeded and logprob is low, the segment is considered silent.	`0.6`
condition_on_previous_text	`bool`	If True, the previous output is used as a prompt for the next window, improving consistency.	`True`
word_timestamps	`bool`	If True, extracts word-level timestamps using cross-attention and dynamic time warping.	`False`

Output	Type	Description
text	`str`
segments	`List[Any]`

Browse other nodes in the mlx.automatic_speech_recognition namespace.