The NodeTool provider system offers a unified interface for interacting with various AI service providers. This abstraction allows you to seamlessly switch between different AI backends (OpenAI, Anthropic, Gemini, HuggingFace, etc.) without changing your workflow logic.
## Overview
Providers in NodeTool act as adapters that translate between NodeTool's internal formats and the specific API requirements of different AI services. The system supports multiple modalities:
- Language Models (LLMs) - Text generation and chat completions
- Image Generation - Text-to-image and image-to-image creation
- Video Generation - Text-to-video and image-to-video synthesis
- Text-to-Speech (TTS) - Convert text to natural speech audio
- Automatic Speech Recognition (ASR) - Transcribe audio to text
To select a provider, pick a model in the node property panel. Providers are grouped under model families: OpenAI, Anthropic, Gemini, Hugging Face, Ollama, vLLM.
## Architecture
### Provider Capabilities
The capability system uses introspection to automatically detect which features a provider supports:
| Capability | Description | Method |
|---|---|---|
| `GENERATE_MESSAGE` | Single message generation | `generate_message()` |
| `GENERATE_MESSAGES` | Streaming message generation | `generate_messages()` |
| `TEXT_TO_IMAGE` | Generate images from text | `text_to_image()` |
| `IMAGE_TO_IMAGE` | Transform images with text | `image_to_image()` |
| `TEXT_TO_VIDEO` | Generate videos from text | `text_to_video()` |
| `IMAGE_TO_VIDEO` | Animate images into videos | `image_to_video()` |
| `TEXT_TO_SPEECH` | Convert text to speech | `text_to_speech()` |
| `AUTOMATIC_SPEECH_RECOGNITION` | Transcribe audio to text | `automatic_speech_recognition()` |
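To make "introspection" concrete, here is a minimal, self-contained sketch of how override-based detection can work. This is illustrative only, not NodeTool's actual implementation: a provider is treated as supporting a capability when it overrides the corresponding base-class method.

```python
def detect_capabilities(provider_cls: type, base_cls: type) -> set[str]:
    """Illustrative only: infer capabilities from overridden methods."""
    method_to_capability = {
        "generate_message": "GENERATE_MESSAGE",
        "generate_messages": "GENERATE_MESSAGES",
        "text_to_image": "TEXT_TO_IMAGE",
        "image_to_image": "IMAGE_TO_IMAGE",
        "text_to_video": "TEXT_TO_VIDEO",
        "image_to_video": "IMAGE_TO_VIDEO",
        "text_to_speech": "TEXT_TO_SPEECH",
        "automatic_speech_recognition": "AUTOMATIC_SPEECH_RECOGNITION",
    }
    capabilities = set()
    for method, capability in method_to_capability.items():
        # A subclass "supports" a capability when its attribute differs
        # from the (default, unimplemented) base-class attribute
        if getattr(provider_cls, method, None) is not getattr(base_cls, method, None):
            capabilities.add(capability)
    return capabilities
```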
## Available Providers
### Language Model Providers
#### OpenAI (`openai_provider.py`)
Capabilities: Language models (GPT-4, GPT-3.5), Image generation (DALL-E), Speech services
Configuration: Set the `OPENAI_API_KEY` environment variable or configure in Settings → Providers.
Features:
- ✅ Streaming responses
- ✅ Native tool/function calling
- ✅ System prompts
- ✅ Multimodal inputs (vision)
- ✅ JSON mode
- ✅ Image generation (DALL-E 2 & 3)
- ✅ Text-to-speech (TTS)
- ✅ Speech-to-text (Whisper)
#### Anthropic (`anthropic_provider.py`)
Capabilities: Claude language models, Advanced reasoning
Configuration: Set the `ANTHROPIC_API_KEY` environment variable or configure in Settings → Providers.
Features:
- ✅ Streaming responses
- ✅ Native tool/function calling
- ✅ System prompts
- ✅ Multimodal inputs (vision)
- ✅ JSON mode (via tool use)
#### Google Gemini (`gemini_provider.py`)
Capabilities: Gemini language models, Multimodal AI, Video generation
Features:
- ✅ Streaming responses
- ✅ Native tool/function calling
- ✅ System prompts
- ✅ File input (via Blobs)
- ✅ JSON mode
- ✅ Video generation (Veo 2, Veo 3)
#### Ollama (`ollama_provider.py`)
Capabilities: Local/self-hosted models, Open-source models
Configuration: No API key required. Optionally set the `OLLAMA_API_URL` environment variable to point at a remote Ollama server.
Features:
- ✅ Streaming responses
- ✅ Tool calling (model dependent)
- ✅ System prompts
- ✅ Multimodal inputs (Base64 images)
- ✅ JSON mode (model dependent)
- ✅ No API key required
- ✅ Privacy-focused (runs locally)
Notes:
- Tool use and JSON mode support depends on the specific model
- A textual fallback mechanism is available for incompatible models
- Models must be pulled via `ollama pull` before use
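For example, to download a model before referencing it in a workflow (the model name here is just an illustration):

```bash
ollama pull llama3.2
```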
#### vLLM (`vllm_provider.py`)
Capabilities: Self-hosted inference, OpenAI-compatible API
Features:
- ✅ Streaming responses
- ✅ Tool calling (model dependent)
- ✅ System prompts
- ✅ Multimodal inputs (OpenAI format)
- ✅ JSON mode (model dependent)
- ✅ High throughput inference
### Image Generation Providers
#### HuggingFace (`huggingface_provider.py`)
Capabilities: Diverse model ecosystem, Multiple hosted services, 500,000+ models
Features:
- 27+ node categories for comprehensive AI workflows
- Supports multiple sub-providers (FAL.ai, Together, Replicate, etc.)
- Text generation with streaming support
- Text-to-image and image-to-image generation
- Speech recognition and text-to-speech
- Audio and video generation
- Image classification and object detection
- Zero-shot learning capabilities
- LoRA model support for Stable Diffusion
- Quantization support (FP16, FP4, INT4)
- CPU offload for memory-constrained environments
Configuration: Set the `HF_TOKEN` environment variable for authentication. Some models require accepting terms on the HuggingFace Hub.
For detailed information on all HuggingFace nodes, model recommendations, and usage examples, see the HuggingFace Integration Guide.
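For programmatic use outside the node editor, a provider can be resolved and invoked directly. A hedged sketch follows: `get_provider` and the `Provider` enum appear elsewhere in this document, but the enum member name, the `TextToImageParams` import path, and the `text_to_image()` signature are assumptions, not a verified API.

```python
from nodetool.providers import get_provider
from nodetool.metadata.types import Provider

# TextToImageParams appears in the parameter-mapping examples below;
# its import path here is an assumption
from nodetool.providers.types import TextToImageParams


async def make_image():
    provider = get_provider(Provider.HuggingFace)  # enum member name assumed
    params = TextToImageParams(prompt="a lighthouse at dusk", width=1024, height=1024)
    # text_to_image() is the capability method from the table above;
    # its exact signature and return type are assumptions
    return await provider.text_to_image(params)
```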
### Video Generation Providers
Multiple providers now support advanced video generation capabilities through the unified interface. NodeTool provides access to state-of-the-art models for text-to-video and image-to-video generation:
#### OpenAI Sora 2 Pro
Capabilities: Text-to-video, Image-to-video
Features:
- ✅ Realistic motion with refined physics simulation
- ✅ Synchronized native audio generation
- ✅ Up to 15 seconds of video generation
- ✅ 1080p output resolution
- ✅ Advanced scene understanding
Configuration: Set the `OPENAI_API_KEY` environment variable or configure in Settings → Providers
#### Google Veo 3.1 (via Gemini)
Capabilities: Text-to-video, Image-to-video, Multi-image reference
Features:
- ✅ Upgraded realistic motion synthesis
- ✅ Extended clip length support
- ✅ Multi-image reference inputs for consistent generation
- ✅ Native 1080p with synchronized audio
- ✅ Advanced camera control
Configuration: Set the `GEMINI_API_KEY` environment variable or configure in Settings → Providers
#### xAI Grok Imagine
Capabilities: Multimodal text/image-to-video, Text-to-image
Features:
- ✅ Coherent motion synthesis from text or image inputs
- ✅ Synchronized audio generation
- ✅ Short video generation with strong coherence
- ✅ Also supports high-quality text-to-image
Configuration: Access via kie.ai or other API aggregators (direct API key not currently registered in NodeTool)
#### Alibaba Wan 2.6
Capabilities: Multi-shot video generation, Reference-guided generation
Features:
- ✅ Affordable 1080p video generation
- ✅ Stable character consistency across shots
- ✅ Native audio synthesis
- ✅ T2V/I2V with reference-guided modes
- ✅ Cost-effective for high-volume workflows
Configuration: Access via kie.ai or other API aggregators (direct API key not currently registered in NodeTool)
#### MiniMax Hailuo 2.3
Capabilities: High-fidelity text-to-video and image-to-video
Features:
- ✅ Expressive character animation
- ✅ Complex motion and lighting effects
- ✅ High visual fidelity
- ✅ Natural movement patterns
Configuration: Set the `MINIMAX_API_KEY` environment variable or configure in Settings → Providers
#### Kling 2.6
Capabilities: Video generation with comprehensive audio
Features:
- ✅ Text/image to synchronized video
- ✅ Integrated speech synthesis
- ✅ Ambient sound generation
- ✅ Sound effects generation
- ✅ Strong audio-visual coherence
Configuration: Access via kie.ai or other API aggregators (direct API key not currently registered in NodeTool)
### Image Generation Providers
#### Black Forest Labs FLUX.2
Capabilities: Advanced text-to-image generation
Features:
- ✅ Photorealistic image generation
- ✅ Multi-reference consistency
- ✅ Accurate text rendering in images
- ✅ Flexible control parameters
- ✅ High-quality output across diverse styles
Configuration: Available through HuggingFace provider or direct API access
#### Google Nano Banana Pro
Capabilities: High-resolution text-to-image
Features:
- ✅ Sharper 2K native output
- ✅ Intelligent 4K upscaling
- ✅ Improved text rendering accuracy
- ✅ Enhanced character consistency
- ✅ Advanced detail preservation
Configuration: Access via Google's Gemini API using `GEMINI_API_KEY`, or through kie.ai
### Multi-Provider Aggregators
#### kie.ai
Capabilities: Unified access to multiple AI providers and models
Features:
- ✅ Access to all the SOTA models listed above through a single API
- ✅ Often offers competitive or lower pricing than upstream providers
- ✅ Simplified API management with one key for multiple models
- ✅ Cost optimization through provider selection
- ✅ Aggregated billing across multiple AI services
Configuration: Set the `KIE_API_KEY` environment variable or configure in Settings → Providers
Direct NodeTool Support:
- OpenAI Sora 2 Pro (via `OPENAI_API_KEY`)
- Google Veo 3.1 (via `GEMINI_API_KEY`)
- MiniMax Hailuo 2.3 (via `MINIMAX_API_KEY`)
Available via kie.ai (no direct NodeTool API key):
- xAI Grok Imagine
- Alibaba Wan 2.6
- Kling 2.6
- Black Forest Labs FLUX.2
- Google Nano Banana Pro
kie.ai provides a cost-effective alternative for accessing multiple state-of-the-art models through a unified interface. This is particularly useful for workflows that utilize models from different providers, as it reduces API key management complexity and can offer better pricing. For models without direct NodeTool API key support, kie.ai is the recommended access method.
## Generic Nodes: Provider-Agnostic Workflows
One of the most powerful features of the NodeTool provider system is generic nodes: special nodes that let you switch AI providers without modifying your workflow graph. They are the recommended way to design multi-provider workflows.
### What Are Generic Nodes?
Generic nodes are workflow nodes in the `nodetool.*` namespace that accept a `model` parameter containing provider information. Instead of being tied to a specific provider (like `openai.image.Dalle` or `gemini.video.Veo`), these nodes dynamically route requests to the appropriate provider based on the selected model.
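Conceptually, the routing inside a generic node looks something like the following sketch. It is hedged: `get_provider` appears elsewhere in this document, but the attribute names on the model object and the call signature are assumptions.

```python
from nodetool.providers import get_provider


async def run_text_to_image_node(model, params):
    # The model selected in the UI is assumed to carry both the provider
    # identity and the provider-specific model id (attribute names assumed)
    provider = get_provider(model.provider)
    # Route to the capability method; the exact signature is an assumption
    return await provider.text_to_image(params, model=model.id)
```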
### Available Generic Nodes
The following generic nodes are available in the NodeTool interface (visible in the left panel quick actions):
#### `nodetool.agents.Agent`
Purpose: Execute complex tasks using any language model provider
Quick Switch:
- OpenAI GPT-4
- Anthropic Claude
- Google Gemini
- Ollama (local models)
- Any other LLM provider
#### `nodetool.image.TextToImage`
Purpose: Generate images from text prompts
Quick Switch:
- Black Forest Labs FLUX.2
- Google Nano Banana Pro
- HuggingFace (via FAL, Replicate, etc.)
- OpenAI DALL-E
- Local models (ComfyUI, MLX)
#### `nodetool.image.ImageToImage`
Purpose: Transform images with text guidance
Quick Switch:
- HuggingFace providers
- Local inference servers
- Cloud-based services
#### `nodetool.video.TextToVideo`
Purpose: Create videos from text descriptions
Quick Switch:
- OpenAI Sora 2 Pro
- Google Veo 3.1 (Gemini)
- xAI Grok Imagine
- Alibaba Wan 2.6
- MiniMax Hailuo 2.3
- Kling 2.6
- HuggingFace models
- Future video providers
#### `nodetool.video.ImageToVideo`
Purpose: Animate static images into videos
Quick Switch:
- OpenAI Sora 2 Pro
- Google Veo 3.1 (Gemini)
- xAI Grok Imagine
- Alibaba Wan 2.6
- MiniMax Hailuo 2.3
- Kling 2.6
- Stability AI
- Other video generation services
#### `nodetool.audio.TextToSpeech`
Purpose: Convert text to natural speech
Quick Switch:
- OpenAI TTS
- ElevenLabs
- Local TTS models
#### `nodetool.text.AutomaticSpeechRecognition`
Purpose: Transcribe audio to text
Quick Switch:
- OpenAI Whisper
- HuggingFace models
- Local ASR engines
### How to Use Generic Nodes
#### In the Web UI
1. **Quick Actions Panel (Left Side)**
   - Click any of the colorful quick action buttons
   - Or Shift-click to auto-add a node at the canvas center
   - Or drag to a specific position on the canvas
2. **Switch Providers**
   - Select the node
   - In the properties panel, click the `model` dropdown
   - Choose from available models grouped by provider
   - The node automatically routes to the new provider
3. **Benefits**
   - Build workflows once, test with multiple providers
   - Compare quality/cost across providers
   - Fall back to different providers if one is unavailable
   - Optimize costs by mixing providers in one workflow
### Provider Parameter Mapping
Generic nodes intelligently map parameters to provider-specific formats:
TextToImage mapping:
```python
TextToImageParams(
    prompt="...",            # → All providers
    negative_prompt="...",   # → HuggingFace, Gemini (ignored by DALL-E)
    width=1024,              # → HuggingFace (mapped to size for DALL-E)
    height=1024,             # → HuggingFace (mapped to size for DALL-E)
    guidance_scale=7.5,      # → HuggingFace (not used by DALL-E)
    num_inference_steps=30,  # → HuggingFace (not used by DALL-E)
    seed=42,                 # → HuggingFace (not supported by DALL-E)
    scheduler="...",         # → HuggingFace-specific
)
```
If a provider does not support a parameter (e.g., negative prompt for DALL-E), NodeTool automatically ignores or remaps it.
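As a rough illustration of what that remapping might look like for DALL-E (the function and the exact mapping are hypothetical, not NodeTool internals):

```python
def to_dalle_args(params) -> dict:
    """Hypothetical sketch: map generic text-to-image params to DALL-E arguments."""
    return {
        "prompt": params.prompt,
        # width/height collapse into DALL-E's single size string
        "size": f"{params.width}x{params.height}",
        # negative_prompt, guidance_scale, num_inference_steps, seed,
        # and scheduler have no DALL-E equivalent and are dropped
    }
```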
TextToVideo mapping:
```python
TextToVideoParams(
    prompt="...",          # → All providers
    aspect_ratio="16:9",   # → Provider-specific interpretation
    resolution="720p",     # → Provider-specific interpretation
    num_frames=60,         # → Provider-specific (duration mapping)
    guidance_scale=7.5,    # → Provider-specific
)
```
## Best Practices
1. **Start with Generic Nodes**
   - Use generic nodes for production workflows
   - Easier to migrate between providers
   - Better cost optimization options
2. **Provider-Specific Nodes for Special Features**
   - Use provider-specific nodes when you need unique features
   - Example: Anthropic Claude's thinking mode
   - Example: OpenAI's vision with detail parameter
3. **Fallback Strategies**
   - Build workflows that try multiple providers (see the sketch after this list)
   - Handle provider-specific errors gracefully
   - Use cheaper providers for dev/testing
4. **Model Selection**
   - Balance quality, speed, and cost
   - Fast models for prototyping (e.g., FLUX Schnell)
   - High-quality models for production (e.g., DALL-E 3, Veo 3)
5. **Parameter Optimization**
   - Learn which parameters each provider respects
   - Test with different settings per provider
   - Document optimal settings for your use case
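As referenced above, a fallback loop across providers can be sketched as follows. `get_provider`, `generate_message`, `Provider`, and `Message` come from this document; the enum member names, model IDs, and error handling are illustrative assumptions.

```python
from nodetool.providers import get_provider
from nodetool.metadata.types import Provider, Message

# Preference-ordered candidates; enum member names and model IDs are
# illustrative placeholders
CANDIDATES = [
    (Provider.OpenAI, "gpt-4o"),
    (Provider.Anthropic, "claude-3-5-sonnet-latest"),
    (Provider.Ollama, "llama3.2"),
]


async def generate_with_fallback(messages: list[Message]):
    last_error = None
    for provider_enum, model in CANDIDATES:
        try:
            provider = get_provider(provider_enum)
            return await provider.generate_message(messages=messages, model=model)
        except Exception as exc:  # provider-specific error types vary
            last_error = exc
    raise RuntimeError("All providers failed") from last_error
```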
## Provider Configuration Reference
### Environment Variables by Provider
| Provider | Required Variables | Optional Variables |
|---|---|---|
| OpenAI | `OPENAI_API_KEY` | - |
| Anthropic | `ANTHROPIC_API_KEY` | - |
| Gemini | `GEMINI_API_KEY` | - |
| HuggingFace | `HF_TOKEN` | - |
| Ollama | - | `OLLAMA_API_URL` |
| vLLM | - | `VLLM_BASE_URL`, `VLLM_API_KEY` |
| Replicate | `REPLICATE_API_TOKEN` | - |
| FAL | `FAL_API_KEY` | - |
| ElevenLabs | `ELEVENLABS_API_KEY` | - |
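A quick way to sanity-check a deployment is to verify these variables outside NodeTool. The following standalone script simply mirrors the table above:

```python
import os

# Required credential variables, mirroring the table above
REQUIRED_VARS = {
    "OpenAI": "OPENAI_API_KEY",
    "Anthropic": "ANTHROPIC_API_KEY",
    "Gemini": "GEMINI_API_KEY",
    "HuggingFace": "HF_TOKEN",
    "Replicate": "REPLICATE_API_TOKEN",
    "FAL": "FAL_API_KEY",
    "ElevenLabs": "ELEVENLABS_API_KEY",
}

for provider, var in REQUIRED_VARS.items():
    status = "configured" if os.environ.get(var) else "missing"
    print(f"{provider:12s} {var:22s} {status}")
```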
### Getting API Keys
- OpenAI: https://platform.openai.com/api-keys
- Anthropic: https://console.anthropic.com/settings/keys
- Google Gemini: https://ai.google.dev/
- HuggingFace: https://huggingface.co/settings/tokens
- Replicate: https://replicate.com/account/api-tokens
- FAL: https://fal.ai/dashboard/keys
- ElevenLabs: https://elevenlabs.io/app/settings/api-keys
## Provider Development
To add a new provider:
1. Create a provider class in `src/nodetool/providers/`:

```python
from nodetool.providers.base import BaseProvider, register_provider
from nodetool.metadata.types import Provider as ProviderEnum


@register_provider(ProviderEnum.YourProvider)
class YourProvider(BaseProvider):
    def __init__(self, api_key: str = None):
        super().__init__()
        # get_env_variable is assumed to come from NodeTool's environment
        # helpers; adjust the import to match your project layout
        self.api_key = api_key or get_env_variable("YOUR_PROVIDER_API_KEY")

    async def generate_message(self, messages, model, **kwargs):
        # Implement message generation against your provider's API
        pass

    async def get_available_language_models(self):
        # Return the list of models this provider exposes
        return []
```
2. Register it in `__init__.py`:

```python
def import_providers():
    # ... existing imports
    from nodetool.providers import your_provider
```
3. Add the provider to the `Provider` enum in `nodetool/metadata/types.py`
4. Add its configuration to `.env.example`
5. Document it in this file
### Testing Providers
Test your provider implementation:

```bash
pytest tests/providers/test_your_provider.py -v
```
Example test structure:
```python
import pytest
from nodetool.providers import get_provider
from nodetool.metadata.types import Provider, Message


@pytest.mark.asyncio
async def test_generate_message():
    provider = get_provider(Provider.YourProvider)
    messages = [Message(role="user", content="Hello")]
    response = await provider.generate_message(
        messages=messages,
        model="your-model-id",
    )
    assert response.content
    assert response.role == "assistant"
```
## See Also
- Chat API - WebSocket API for chat interactions and provider routing
- Global Chat - UI reference for multi-turn chat threads
- Agents - Using providers with the agent system
- Workflow API - Building workflows with providers