Status: Draft
Date: 2026-04-26
Owner: mg
Scope: Add a local-models provider backed by @huggingface/transformers (and kokoro-js) covering chat, TTS, ASR, and embeddings.
Expose @nodetool/transformers-js-nodes capabilities to the rest of NodeTool through the standard BaseProvider interface, so that:
- existing hooks (useTTSProviders, useASRProviders, useEmbeddingProviders) automatically surface transformers.js options;
- agent loops get an explicit no-tools signal (hasToolSupport(_) = false).

New workspace package @nodetool/transformers-js-provider at packages/transformers-js-provider/:
```
packages/transformers-js-provider/
├── package.json
├── tsconfig.json
├── src/
│   ├── index.ts                    # exports + side-effect registration
│   ├── transformers-js-provider.ts # the BaseProvider subclass
│   ├── chat.ts                     # generateMessage / generateMessages impl
│   ├── tts.ts                      # textToSpeechEncoded impl
│   ├── asr.ts                      # automaticSpeechRecognition impl
│   ├── embeddings.ts               # generateEmbedding impl
│   └── model-discovery.ts          # union(recommendedFor + cache scan)
└── tests/
    ├── chat.test.ts
    ├── tts.test.ts
    ├── asr.test.ts
    ├── embeddings.test.ts
    └── model-discovery.test.ts
```
Why a new package: @nodetool/runtime is the dependency root for the websocket server and chat CLI. Embedding the wasm/onnx/kokoro stack there pulls those into every server boot. Keeping the provider in its own workspace package preserves the option to lazy-load it (mirrors @nodetool/transformers-js-nodes itself).
Dependencies:
- @nodetool/runtime — BaseProvider, types, registry
- @nodetool/transformers-js-nodes — getPipeline, loadTransformers, recommendedFor, scanTransformersJsCache, getTransformersJsCacheDir, KOKORO_VOICES (re-export)
- @nodetool/protocol — Chunk
- kokoro-js (transitive via transformers-js-nodes) — already a dep

No direct @huggingface/transformers import; everything routes through getPipeline/loadTransformers from transformers-js-nodes.

Provider surface:
- provider: ProviderId = "transformers_js" — matches tjsRefToUnified()'s emitted provider field, so getAvailableProviderIds() and the existing transformers_js filter in useModelManagerStore pick it up without changes.
- requiredSecrets() = [] — local inference, no API key.
- hasToolSupport(_) = false — explicit, so agent loops do not feed it tool schemas.
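A minimal sketch of that shell, assuming the BaseProvider and ProviderId export names from @nodetool/runtime described above; the modality methods live in the per-file implementations and are omitted here:

```ts
// transformers-js-provider.ts: the class surface, a sketch.
import { BaseProvider, type ProviderId } from "@nodetool/runtime";

export class TransformersJsProvider extends BaseProvider {
  // Matches tjsRefToUnified()'s emitted provider field.
  readonly provider: ProviderId = "transformers_js";

  // Local inference: nothing to gate availability on.
  requiredSecrets(): string[] {
    return [];
  }

  // Explicit opt-out so agent loops never send tool schemas.
  hasToolSupport(_model: string): boolean {
    return false;
  }

  // getAvailable*Models / generateMessage(s) / textToSpeechEncoded /
  // automaticSpeechRecognition / generateEmbedding are delegated to the
  // chat.ts, tts.ts, asr.ts, embeddings.ts, model-discovery.ts modules.
}
```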
Single helper discoverTjsModels(taskTypes: string[]) (in model-discovery.ts; sketched below):
- collect recommendedFor(t) for each requested tjs.<task>;
- scan the cache via scanTransformersJsCache(getTransformersJsCacheDir());
- union + dedup by repo_id, preferring downloaded: true for cached entries;
- map into LanguageModel | TTSModel | ASRModel | EmbeddingModel shapes (caller picks).
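A sketch of the helper under those rules. The repo_id/downloaded entry shapes on recommendedFor and scanTransformersJsCache results are assumptions inferred from the dedup rule above, and the DiscoveredModel shape is illustrative (the real code maps to the per-modality model types):

```ts
// model-discovery.ts sketch: union(recommendedFor + cache scan).
import {
  recommendedFor,
  scanTransformersJsCache,
  getTransformersJsCacheDir,
} from "@nodetool/transformers-js-nodes";

// Illustrative intermediate shape; callers map to LanguageModel/TTSModel/etc.
interface DiscoveredModel {
  repoId: string;
  downloaded: boolean;
}

export async function discoverTjsModels(taskTypes: string[]): Promise<DiscoveredModel[]> {
  const byRepo = new Map<string, DiscoveredModel>();

  // 1. Seed with the curated recommendations for each requested tjs.<task>.
  for (const task of taskTypes) {
    for (const rec of recommendedFor(task)) {
      byRepo.set(rec.repo_id, { repoId: rec.repo_id, downloaded: false });
    }
  }

  // 2. Union in the cache scan; cached entries win the downloaded flag.
  //    Cached repos outside the recommended lists are intentionally skipped:
  //    they cannot be classified into a modality without metadata.
  const cached = await scanTransformersJsCache(getTransformersJsCacheDir());
  for (const entry of cached) {
    const existing = byRepo.get(entry.repo_id);
    if (existing) existing.downloaded = true;
  }

  return [...byRepo.values()];
}
```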
Per-modality task type sets:

| Method | Tasks |
|---|---|
| getAvailableLanguageModels | tjs.text_generation |
| getAvailableTTSModels | tjs.text_to_speech |
| getAvailableASRModels | tjs.automatic_speech_recognition |
| getAvailableEmbeddingModels | tjs.feature_extraction |
Cached repos that are NOT in any recommended list (the tjs.cached bucket from the model manager work) are intentionally NOT exposed via the provider — they cannot be classified into a modality without metadata. The model manager surface still shows them.
TTSModel.voices is populated for Kokoro repos (uses the KOKORO_VOICES constant from transformers-js-nodes); other repos return voices: undefined.
generateMessage — non-streaming:
- Convert Message[] → transformers.js chat format [{role, content}]. Drop unsupported roles; coerce non-string content (text-only).
- pipeline = await getPipeline({ task: "text-generation", model, dtype: "auto", device: "auto" }).
- Call pipeline(messages, { max_new_tokens: maxTokens ?? 512, temperature, top_p: topP, do_sample: temperature !== 0 }).
- Read out[0].generated_text (transformers.js convention when input is a chat array).
- Return Message { role: "assistant", content: <string> }.
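A non-streaming sketch of those steps. The Message shape ({role, content}) and the untyped pipeline return are assumptions; with a chat-array input, transformers.js returns the continued conversation, so the last entry is the new assistant turn:

```ts
// chat.ts, non-streaming sketch.
import { getPipeline } from "@nodetool/transformers-js-nodes";

type Message = { role: string; content: unknown };

export async function generateMessage(
  messages: Message[],
  model: string,
  opts: { maxTokens?: number; temperature?: number; topP?: number } = {}
): Promise<{ role: "assistant"; content: string }> {
  // Drop roles chat templates cannot handle; coerce content to plain text.
  const chat = messages
    .filter((m) => ["system", "user", "assistant"].includes(m.role))
    .map((m) => ({ role: m.role, content: String(m.content ?? "") }));

  const pipeline = await getPipeline({
    task: "text-generation",
    model,
    dtype: "auto",
    device: "auto",
  });

  const out: any = await pipeline(chat, {
    max_new_tokens: opts.maxTokens ?? 512,
    temperature: opts.temperature,
    top_p: opts.topP,
    do_sample: opts.temperature !== 0,
  });

  // generated_text is the continued conversation for chat-array inputs.
  const generated = out[0].generated_text;
  const content = Array.isArray(generated)
    ? String(generated.at(-1)?.content ?? "")
    : String(generated);

  return { role: "assistant", content };
}
```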
generateMessages — streaming:
- Use TextStreamer and InterruptableStoppingCriteria from @huggingface/transformers (both confirmed exported in 3.8.x). Add both to the TransformersModule type surface in transformers-base.ts and re-export through loadTransformers().
- Each token becomes a ProviderStreamItem chunk: { type: "chunk", chunk: { content_type: "text", content: <token> } }.
- Finish with { type: "message", message: { role: "assistant", content: <full> } }.
- signal: AbortSignal wired to the streamer's stop hook.
- Tools: if tools or toolChoice provided, log a warning and ignore (don't throw).
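A streaming sketch. TextStreamer and InterruptableStoppingCriteria are real 3.x exports as noted, but whether the text-generation pipeline forwards stopping_criteria through to generate() should be verified; the emit callback stands in for the provider's real stream plumbing:

```ts
// chat.ts, streaming sketch.
import { getPipeline, loadTransformers } from "@nodetool/transformers-js-nodes";

type ProviderStreamItem =
  | { type: "chunk"; chunk: { content_type: "text"; content: string } }
  | { type: "message"; message: { role: "assistant"; content: string } };

export async function streamMessage(
  chat: { role: string; content: string }[],
  model: string,
  emit: (item: ProviderStreamItem) => void,
  signal?: AbortSignal
): Promise<void> {
  const { TextStreamer, InterruptableStoppingCriteria } = await loadTransformers();
  const pipeline: any = await getPipeline({
    task: "text-generation",
    model,
    dtype: "auto",
    device: "auto",
  });

  // Abort maps onto the stopping criteria: generation halts at the next step.
  const stopper = new InterruptableStoppingCriteria();
  signal?.addEventListener("abort", () => stopper.interrupt());

  let full = "";
  const streamer = new TextStreamer(pipeline.tokenizer, {
    skip_prompt: true, // emit only newly generated tokens, not the prompt
    callback_function: (token: string) => {
      full += token;
      emit({ type: "chunk", chunk: { content_type: "text", content: token } });
    },
  });

  await pipeline(chat, {
    max_new_tokens: 512,
    streamer,
    stopping_criteria: stopper,
  });

  if (signal?.aborted) throw new DOMException("Aborted", "AbortError");
  emit({ type: "message", message: { role: "assistant", content: full } });
}
```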
Error mapping:
Error("Failed to load <model>: <orig>").(hint: try a smaller dtype like q4).DOMException("Aborted", "AbortError") (matches OpenAI/Anthropic providers).Override textToSpeechEncoded (returns full WAV) rather than the streaming textToSpeech. Mirrors what the workflow node already does.
Override textToSpeechEncoded (returns full WAV) rather than the streaming textToSpeech. Mirrors what the workflow node already does (text-to-speech.ts); a sketch follows this list:
- Kokoro repos: KokoroTTS.from_pretrained(model, { dtype, device }) (cached), then tts.generate(text, { voice }).
- Other repos: getPipeline({ task: "text-to-speech", model })(text, opts). Pass speaker_embeddings only for SpeechT5 repos (mirror existing guard).
- WAV encoding: move the encodeWav helper from the node into a shared util in transformers-js-nodes/src/wav.ts; both call sites import it.
- Return EncodedAudioResult { audio: Uint8Array, mimeType: "audio/wav" }.
- audioFormat hint: only "wav" is supported in v1; if the caller asks for mp3/opus, log and fall through to wav.
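A sketch of the dispatch, assuming the shared encodeWav(samples, sampleRate) util lands as planned, Kokoro/SpeechT5 repos are detected by name, and the speaker-embeddings URL is the example file from the transformers.js docs rather than whatever the existing node uses:

```ts
// tts.ts sketch.
import { KokoroTTS } from "kokoro-js";
import { getPipeline, encodeWav } from "@nodetool/transformers-js-nodes";

export interface EncodedAudioResult {
  audio: Uint8Array;
  mimeType: "audio/wav";
}

// Example embeddings from the transformers.js docs (SpeechT5 needs these).
const SPEAKER_EMBEDDINGS =
  "https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/speaker_embeddings.bin";

export async function textToSpeechEncoded(
  model: string,
  text: string,
  voice?: string
): Promise<EncodedAudioResult> {
  if (/kokoro/i.test(model)) {
    // Kokoro path: dedicated runtime, voice picked from KOKORO_VOICES.
    const tts = await KokoroTTS.from_pretrained(model, { dtype: "q8" });
    const audio = await tts.generate(text, { voice });
    return {
      audio: encodeWav(audio.audio, audio.sampling_rate),
      mimeType: "audio/wav",
    };
  }

  // Generic transformers.js text-to-speech pipeline.
  const pipeline: any = await getPipeline({ task: "text-to-speech", model });
  const opts: Record<string, unknown> = {};
  if (/speecht5/i.test(model)) {
    // speaker_embeddings are only valid for SpeechT5 repos (existing guard).
    opts.speaker_embeddings = SPEAKER_EMBEDDINGS;
  }
  const out = await pipeline(text, opts);
  return {
    audio: encodeWav(out.audio, out.sampling_rate),
    mimeType: "audio/wav",
  };
}
```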
Override automaticSpeechRecognition:
pipeline = await getPipeline({ task: "automatic-speech-recognition", model }).Float32Array at the model’s expected sample rate. Decode the input Uint8Array (likely WAV) using a small WAV decoder util (write decodeWav.ts in transformers-js-nodes); resample to 16kHz with linear interpolation if input differs.pipeline(samples, { language, return_timestamps: word_timestamps ? "word" : false }).ASRResult { text, chunks?: [{timestamp, text}] }.Whisper-specific options (task: "transcribe"|"translate") — not exposing in v1; default transcribe.
Override generateEmbedding:
pipeline = await getPipeline({ task: "feature-extraction", model }).pipeline(text, { pooling: "mean", normalize: true }). Both single-string and array inputs are supported by transformers.js.number[][]. For a single input, wrap to [vec].dimensions arg: not honored — transformers.js does not support truncation at inference. Document this and ignore.
Provider takes no constructor args today. Reads:
- getTransformersJsCacheDir() for cache scanning (already configurable via setTransformersJsCacheDir in the nodes package).
- dtype/device defaults from getPipeline (auto everywhere).

Future: per-instance overrides via constructor options ({ defaultDtype, defaultDevice }) — not in v1.
packages/transformers-js-provider/src/index.ts calls registerProvider("transformers_js", () => new TransformersJsProvider()). Registration is invoked from the websocket server’s provider bootstrap (matches how @nodetool/runtime’s built-in providers register today). One-line edit to packages/websocket/src/server.ts (or wherever provider modules are imported for side effects).
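The registration side effect and the server hookup, as described (the bootstrap import location is illustrative):

```ts
// packages/transformers-js-provider/src/index.ts
import { registerProvider } from "@nodetool/runtime";
import { TransformersJsProvider } from "./transformers-js-provider";

export { TransformersJsProvider };

// Side-effect registration: the factory defers construction (and the
// wasm/onnx stack behind it) until the provider is first requested.
registerProvider("transformers_js", () => new TransformersJsProvider());
```

```ts
// packages/websocket/src/server.ts (or the provider bootstrap module)
import "@nodetool/transformers-js-provider";
```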
The availableProviderIds query (in getAvailableProviderIds(userId)) should include transformers_js unconditionally — it has no secrets to gate on. Verify the existing implementation does the right thing for secret-less providers; adjust if needed.
No code changes required. Existing hooks (useTTSProviders, useASRProviders, useEmbeddingProviders, useModelsByProvider) read providerCapabilities() and getAvailable*Models(). Once the provider is registered, it shows up automatically.
The model manager’s recent work (getAllModels scanning the tjs cache) continues to surface cached repos under tjs.<task> types — the provider integration is orthogonal and additive.
Vitest suites in packages/transformers-js-provider/tests/:
- chat.test.ts: mocks getPipeline to return a fake function that yields canned text. Asserts message conversion, streaming chunk shape, abort behavior, ignored-tools warning.
- tts.test.ts: mocks KokoroTTS.from_pretrained and getPipeline. Asserts the Kokoro path uses voice, the non-Kokoro path uses speaker_embeddings only for SpeechT5, and the output is a valid WAV header.
- embeddings.test.ts: asserts dimensions is ignored.
- model-discovery.test.ts: mocks recommendedFor and scanTransformersJsCache. Asserts union, dedup, downloaded flag propagation, voice population for Kokoro entries.

No live model downloads in CI; everything is mock-based. A separate tests/integration/ may run with RUN_INTEGRATION=1 against actual models for spot checks but is not required.
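A sketch of the chat suite's mock setup with vitest; the fake pipeline echoes the chat array plus a canned assistant turn, and the repo id is a placeholder:

```ts
// tests/chat.test.ts sketch: no model downloads, canned pipeline output.
import { describe, expect, it, vi } from "vitest";

vi.mock("@nodetool/transformers-js-nodes", () => ({
  // Fake pipeline: returns the conversation continued with a canned turn.
  getPipeline: vi.fn(async () => async (chat: any[]) => [
    { generated_text: [...chat, { role: "assistant", content: "canned" }] },
  ]),
}));

import { generateMessage } from "../src/chat";

describe("generateMessage", () => {
  it("returns the assistant turn from the canned pipeline", async () => {
    const msg = await generateMessage(
      [{ role: "user", content: "hi" }],
      "test/model" // placeholder repo id
    );
    expect(msg).toEqual({ role: "assistant", content: "canned" });
  });
});
```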
| Risk | Mitigation |
|---|---|
| First chat-token latency for models not yet downloaded/loaded (10s+ for a 2B fp32 model) | Surface in UI via the existing model-manager download flow; the pipeline cache makes the 2nd call fast |
| Memory: 2B fp32 ~8GB RAM | Document recommended dtypes (q4 for 4B+, q8 for 2B); future per-call dtype override |
| transformers.js streaming API surface changes | Centralize in chat.ts; version-pin @huggingface/transformers ^3.7 |
| WASM load order / env mutation race (the bug we just fixed for Kokoro) | All entry points await loadTransformers() first; provider does the same |
Net additive. No public API changes in @nodetool/runtime or @nodetool/transformers-js-nodes. The new package is opt-in until registered; once the websocket server imports it for side effects, the provider becomes available to all clients. No DB migrations.