Type: mlx.automatic_speech_recognition.Whisper

Namespace: mlx.automatic_speech_recognition

Description

Transcribe an audio asset using MLX Whisper. whisper, mlx, asr, speech-to-text

- Uses MLX for efficient Apple Silicon acceleration
- Returns transcript and segments with optional word-level timestamps

Properties

Property Type Description Default
model Enum['mlx-community/whisper-tiny-mlx', 'mlx-community/whisper-tiny.en-mlx', 'mlx-community/whisper-base-mlx', 'mlx-community/whisper-base.en-mlx', 'mlx-community/whisper-small-mlx', 'mlx-community/whisper-small.en-mlx', 'mlx-community/whisper-medium-mlx', 'mlx-community/whisper-medium.en-mlx', 'mlx-community/whisper-large-v3-mlx'] Model to use for transcription mlx-community/whisper-tiny.en-mlx
audio audio The input audio to transcribe. {'type': 'audio', 'uri': '', 'asset_id': None, 'data': None}
compression_ratio_threshold Optional[float] Threshold for gzip compression ratio; above this, the result is treated as failed. 2.4
logprob_threshold Optional[float] Average log probability threshold; below this, the result is treated as failed. -1.0
no_speech_threshold Optional[float] Threshold for no-speech probability; if exceeded and logprob is low, the segment is considered silent. 0.6
condition_on_previous_text bool If True, the previous output is used as a prompt for the next window, improving consistency. True
word_timestamps bool If True, extracts word-level timestamps using cross-attention and dynamic time warping. False

Outputs

Output Type Description
text str  
segments List[Any]  

Metadata

Browse other nodes in the mlx.automatic_speech_recognition namespace.