Type: huggingface.automatic_speech_recognition.Whisper
Namespace: huggingface.automatic_speech_recognition
Description
Convert speech to text.

Tags: asr, automatic-speech-recognition, speech-to-text, translate, transcribe, audio, huggingface
**Use Cases:**
- Voice input for a chatbot
- Transcribe or translate audio files
- Create subtitles for videos
**Features:**
- Multilingual speech recognition
- Speech translation
- Language identification
**Notes:**
- Language selection is sorted by word error rate in the FLEURS benchmark
- There are many variants of Whisper that are optimized for different use cases.
**Links:**
- https://github.com/openai/whisper
- https://platform.openai.com/docs/guides/speech-to-text/supported-languages
Properties
| Property | Type | Description | Default |
|---|---|---|---|
| model | hf.automatic_speech_recognition | The model ID to use for speech recognition. | {'type': 'hf.automatic_speech_recognition', 'repo_id': '', 'path': None, 'variant': None, 'allow_patterns': None, 'ignore_patterns': None} |
| audio | audio | The input audio to transcribe. | {'type': 'audio', 'uri': '', 'asset_id': None, 'data': None} |
| task | Enum['transcribe', 'translate'] | The task to perform: 'transcribe' for speech-to-text or 'translate' for speech translation. | transcribe |
| language | Enum['auto_detect', 'spanish', 'italian', 'korean', 'portuguese', 'english', 'japanese', 'german', 'russian', 'dutch', 'polish', 'catalan', 'french', 'indonesian', 'ukrainian', 'turkish', 'malay', 'swedish', 'mandarin', 'finnish', 'norwegian', 'romanian', 'thai', 'vietnamese', 'slovak', 'arabic', 'czech', 'croatian', 'greek', 'serbian', 'danish', 'bulgarian', 'hungarian', 'filipino', 'bosnian', 'galician', 'macedonian', 'hindi', 'estonian', 'slovenian', 'tamil', 'latvian', 'azerbaijani', 'urdu', 'lithuanian', 'hebrew', 'welsh', 'persian', 'icelandic', 'kazakh', 'afrikaans', 'kannada', 'marathi', 'swahili', 'telugu', 'maori', 'nepali', 'armenian', 'belarusian', 'gujarati', 'punjabi', 'bengali'] | The language of the input audio. If not specified, the model attempts to detect it automatically. | auto_detect |
| timestamps | Enum['none', 'word', 'sentence'] | The type of timestamps to return for the generated text. | none |
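These properties correspond to options exposed by the standard Hugging Face `transformers` pipeline for Whisper. The following is a minimal sketch using that pipeline directly, not this node's implementation; the repo ID `openai/whisper-small` and the file name `speech.wav` are placeholders.

```python
from transformers import pipeline

# Load a Whisper variant by repo ID (corresponds to the `model` property).
asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-small",  # placeholder repo_id
)

result = asr(
    "speech.wav",                  # input audio (the `audio` property)
    return_timestamps="word",      # roughly the `timestamps` property
    generate_kwargs={
        "task": "transcribe",      # or "translate" (the `task` property)
        "language": "english",     # omit to auto-detect (the `language` property)
    },
)
```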
Outputs
| Output | Type | Description |
|---|---|---|
| text | str | The transcribed or translated text. |
| chunks | List[audio_chunk] | Timestamped text chunks, populated when timestamps are requested. |
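When timestamps are requested, the underlying `transformers` pipeline returns both the full text and a list of timestamped chunks, which is roughly what the `text` and `chunks` outputs correspond to. A short sketch of consuming the result from the example above:

```python
print(result["text"])             # full transcript or translation
for chunk in result["chunks"]:    # timestamped segments
    start, end = chunk["timestamp"]
    print(start, end, chunk["text"])
```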
Metadata
Related Nodes
Browse other nodes in the huggingface.automatic_speech_recognition namespace.