HuggingFace Python Worker — Docker image, local run, and remote-worker wiring

Date: 2026-06-08
Status: Approved design, ready for implementation plan
Scope of this iteration: Author a HuggingFace worker Docker image (layered on the existing nodetool-core image), provide local build/run ergonomics, add a minimal authentication handshake to the worker, and validate the full UI → TS server → remote worker path locally (CPU-only). Cloud/GPU deployment is out of scope here and is captured as recorded constraints for a follow-up iteration.


1. Background and problem

NodeTool runs Python nodes through a worker process. Today the TypeScript server most commonly spawns a local worker over a stdio bridge (PythonStdioBridge, length-prefixed msgpack over stdin/stdout).
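
To make "length-prefixed" concrete, here is a minimal framing sketch. It is not the PythonStdioBridge implementation: the 4-byte big-endian header is an assumption (this document does not state the prefix width or byte order), and the helper names write_frame and read_frame are hypothetical. The payload is treated as opaque bytes; in the real bridge it would be a msgpack-encoded message.

```python
import io
import struct

# Assumption: 4-byte big-endian unsigned length prefix. The real bridge's
# header width and byte order are not specified in this document.
_HEADER = struct.Struct(">I")


def write_frame(stream, payload: bytes) -> None:
    """Write one frame: a length header followed by the payload bytes."""
    stream.write(_HEADER.pack(len(payload)))
    stream.write(payload)
    stream.flush()


def _read_exact(stream, n: int) -> bytes:
    """Read exactly n bytes, raising EOFError if the stream ends early."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            raise EOFError(f"stream closed with {remaining} bytes outstanding")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_frame(stream) -> bytes:
    """Read one frame and return its payload without the header."""
    (length,) = _HEADER.unpack(_read_exact(stream, _HEADER.size))
    return _read_exact(stream, length)


if __name__ == "__main__":
    # Round-trip one frame through an in-memory stream.
    buf = io.BytesIO()
    write_frame(buf, b"hello")
    buf.seek(0)
    print(read_frame(buf))
```

The explicit length is what lets several messages share one byte stream such as stdin/stdout, where there are no message boundaries of their own.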

There is a second, already-built topology: the worker can run as a long-lived WebSocket server, and the TS server attaches to it remotely. The pieces that already exist:

What is missing, and what this iteration delivers:

  1. A HuggingFace worker image. HF nodes need a large ML stack (torch 2.9, diffusers, transformers, accelerate, bitsandbytes, sam2, …) and, in production, a GPU. No HF Dockerfile exists yet.
  2. Local run ergonomics to build the two layered images, run the worker, point the server at it, and validate a graph end-to-end.
  3. Authentication on the worker. The WebSocket worker currently has no auth: anyone who can reach the port can execute arbitrary nodes (remote code execution on the GPU). A minimal token handshake is required before the image can ever be exposed beyond localhost.

This is a packaging + wiring + auth task. It is not a protocol change — the msgpack WebSocket protocol and the bridge already exist.


2. Goals / non-goals

Goals

Non-goals (this iteration)


3. Decisions (locked during brainstorming)

Decision | Choice
Primary first target | Local Docker (CPU-only validation on macOS)
What the iteration is about | The HF worker image is the goal; core is the stepping stone
HF image base / CUDA strategy | Layer on the core image + PyPI CUDA wheels (torch wheel bundles CUDA; host needs only the NVIDIA driver + container runtime)
“Done” boundary | Dockerfile + local run + validation (no packages/deploy changes)
Package source | Released PyPI packages (version via build-arg)
Worker auth | Build NODETOOL_WORKER_TOKEN handshake now (worker + bridge)
Build wrappers | Makefile in nodetool-huggingface
Validation | Committed smoke DSL (SentenceSimilarity / all-MiniLM-L6-v2)

4. Architecture

4.1 Image layering

mambaorg/micromamba:jammy
        │  (nodetool-core/Dockerfile — exists)
        ▼
nodetool-core:local            FROM base; uv pip install nodetool-core==<NODETOOL_VERSION>
   EXPOSE 7777 / HEALTHCHECK / CMD ["python","-m","nodetool.worker","--host","0.0.0.0","--port","7777"]
        │  (nodetool-huggingface/Dockerfile — new)
        ▼
nodetool-hf:local              FROM ${CORE_IMAGE}; uv pip install nodetool-huggingface==<HF_VERSION>
   (EXPOSE / HEALTHCHECK / CMD inherited from core; HF has no worker module of its own)

The HF image adds only Python packages on top of core. python -m nodetool.worker comes from the inherited nodetool-core dependency, so the HF image needs no new CMD.

4.2 Runtime topology (local)

web UI ──ws──► TS server (host :7777) ──► createPythonBridge()
                                              │  NODETOOL_WORKER_URL set?
                                              ▼  yes → WebsocketPythonBridge
                                          ws://localhost:8787  ──►  hf-worker container (:7777 internal)
                                          Authorization: Bearer <NODETOOL_WORKER_TOKEN>

The TS dev server already binds host 7777, so the worker container must publish on a different host port. This design uses 8787, so NODETOOL_WORKER_URL=ws://localhost:8787.
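
The selection rule in the diagram can be sketched as below. The real logic lives in the TypeScript createPythonBridge(); this Python version only illustrates the decision, and the names BridgeConfig and select_bridge are hypothetical. It assumes that an empty NODETOOL_WORKER_URL counts as unset and that the Authorization header is sent only when a token is configured.

```python
import os
from dataclasses import dataclass, field
from typing import Dict, Mapping, Optional


@dataclass(frozen=True)
class BridgeConfig:
    """Hypothetical description of which bridge to create."""
    kind: str                      # "websocket" or "stdio"
    url: Optional[str] = None      # set only for the websocket bridge
    headers: Dict[str, str] = field(default_factory=dict)


def select_bridge(env: Mapping[str, str]) -> BridgeConfig:
    """Pick the remote websocket bridge if a worker URL is set, else stdio."""
    url = env.get("NODETOOL_WORKER_URL", "").strip()
    if not url:
        # No remote worker configured: fall back to a local stdio worker.
        return BridgeConfig(kind="stdio")

    headers: Dict[str, str] = {}
    token = env.get("NODETOOL_WORKER_TOKEN", "")
    if token:
        # Assumption: the header is attached only when a token is present.
        headers["Authorization"] = f"Bearer {token}"
    return BridgeConfig(kind="websocket", url=url, headers=headers)


if __name__ == "__main__":
    print(select_bridge(os.environ))
```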


5. Artifacts

All new files live in the nodetool-huggingface sibling repo, except the bridge change (nodetool2) and the worker auth change (nodetool-core).

5.1 nodetool-huggingface/Dockerfile (new)

# Layer the HuggingFace node stack on top of the core worker image.
# CORE_IMAGE: locally-built `nodetool-core:local`, or a published
#   ghcr.io/nodetool-ai/nodetool:<tag>. HF_VERSION pins the PyPI release.
ARG CORE_IMAGE=nodetool-core:local
FROM ${CORE_IMAGE}

ARG HF_VERSION=0.7.1
USER root

# torch 2.9 + the rest pull CUDA-enabled wheels with a bundled CUDA runtime;
# no nvidia/cuda base needed. The host supplies the NVIDIA driver at run time.
RUN uv pip install --python $VIRTUAL_ENV --index-url https://pypi.org/simple \
        "nodetool-huggingface==${HF_VERSION}" \
    && rm -rf /root/.cache/uv /root/.cache/pip /tmp/* /var/tmp/*

# EXPOSE 7777, HEALTHCHECK, CMD all inherited from the core image.

Notes:

5.2 nodetool-huggingface/docker-compose.yaml (new)

5.3 nodetool-huggingface/Makefile (new)

Target | Action
build-core | docker build -t nodetool-core:local ../nodetool-core (or document pulling ghcr.io/nodetool-ai/nodetool:<tag>)
build-hf | docker build -t nodetool-hf:local --build-arg CORE_IMAGE=nodetool-core:local .
up | docker compose up (CPU); runs the worker on localhost:8787
down | docker compose down

5.4 nodetool-huggingface/docs/worker-deployment.md (new)

Build → run → wire → validate steps; the cloud-constraints section (§8); the CPU-only-on-macOS and host-port-8787 caveats; the NODETOOL_WORKER_TOKEN setup.
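
For the token setup step, one way to produce a suitable shared secret is sketched here. This is an illustration, not part of the planned tooling: the helper name generate_worker_token is hypothetical, and the 32-byte size is an assumption rather than a documented requirement.

```python
import secrets


def generate_worker_token(num_bytes: int = 32) -> str:
    """Return a URL-safe random string built from num_bytes of randomness."""
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    # Print a shell line that can be pasted into the environment of both
    # the worker container and the TS server.
    print(f"export NODETOOL_WORKER_TOKEN={generate_worker_token()}")
```

The same value has to be set on both ends, since the env name is identical for the worker and the bridge.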

5.5 nodetool-huggingface/examples/hf-worker-smoke.{ts,json} (new)

A minimal workflow using the HuggingFace SentenceSimilarity node with sentence-transformers/all-MiniLM-L6-v2 (~80 MB, CPU-fast) — a feature-extraction node that returns an np_array embedding per input string. Committed for repeatable validation: the .ts DSL form documents the graph; the exported .json form is the artifact loaded into the running TS server (UI import or the server’s run API). Fallback node: FillMask / distilbert-base-uncased.

Wiring note (updated): originally the remote WebsocketPythonBridge was created only by the websocket server, so the CLI/DSL local path couldn’t run Python nodes. That gap is now closed — connectPythonBridgeForGraph / resolvePythonNodeExecutor in @nodetool-ai/runtime wire the bridge into the in-process runners, so both nodetool workflows run <graph.json> and nodetool run <file.ts> execute Python (incl. HuggingFace worker) nodes: remote when NODETOOL_WORKER_URL is set, else a local stdio worker. The validation graph can therefore be driven by either a running server (web UI) or the CLI directly.


6. Authentication (NODETOOL_WORKER_TOKEN)

A shared-secret bearer token, opt-in, identical env name on both ends.
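
A minimal sketch of the check this implies on the worker side, assuming the token arrives as an Authorization: Bearer header on the WebSocket upgrade request. The function name is_authorized is hypothetical and the code is not taken from nodetool-core. It follows the opt-in rule: with no token configured, every connection is accepted.

```python
import hmac
from typing import Optional


def is_authorized(authorization_header: Optional[str],
                  expected_token: Optional[str]) -> bool:
    """Decide whether a connection may proceed.

    authorization_header: raw value of the Authorization header, or None.
    expected_token: value of NODETOOL_WORKER_TOKEN on the worker, or None.
    """
    # Opt-in: no token configured means the worker is open.
    if not expected_token:
        return True
    if not authorization_header:
        return False

    scheme, _, presented = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False

    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(presented.strip().encode("utf-8"),
                               expected_token.encode("utf-8"))


if __name__ == "__main__":
    print(is_authorized("Bearer secret", "secret"))
    print(is_authorized("Bearer wrong", "secret"))
```

Because an unset token leaves the worker open, a deployment reachable beyond localhost would need to treat a missing token as a configuration error rather than relying on this default.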

6.1 Worker side — nodetool-core

6.2 Client side — nodetool2 python-websocket-bridge.ts

6.3 Why a header (not first-message or subprotocol)


7. Validation (local, CPU-only)

Performed end-to-end on macOS Docker (CPU):

  1. Image build: make build-core then make build-hf succeed.
  2. Image health: make up; the container’s ws-handshake healthcheck reports healthy.
  3. Auth gate
  4. Bridge attach: start the TS server with NODETOOL_WORKER_URL=ws://localhost:8787 NODETOOL_WORKER_TOKEN=secret npm run dev:server; server logs a successful discover + worker.status; HF node metadata is present.
  5. Graph run: execute the smoke workflow either via the CLI (NODETOOL_WORKER_URL=ws://localhost:8787 NODETOOL_WORKER_TOKEN=secret nodetool workflows run examples/hf-worker-smoke.json) or by loading it into the running server (web UI). The HF node runs on the container and returns an embedding (np_array). SentenceSimilarity is a feature-extraction node (one embedding per input string), not a cross-string scorer, so the smoke graph previews the embedding(s). First run downloads the model into the hf-cache volume; subsequent runs reuse it.
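
Because the node returns one embedding per string rather than a score, any similarity value has to be computed downstream. The sketch below shows one common way to do that, cosine similarity, on plain Python sequences. It is an illustration only: the smoke graph itself does not include this step, and the function name cosine_similarity is hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy 3-dimensional vectors standing in for real model embeddings.
    print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```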

CUDA-only nodes (most diffusers/3D) are expected to fail locally and are deferred to the GPU iteration.


8. Cloud targets — recorded constraints (NEXT iteration, not built here)

Validated against current (2026) RunPod and Vast.ai docs. The image design holds on both (PyPI CUDA wheels + host driver; CMD runs on boot). The deltas below are exposure/ops concerns that the deploy iteration must satisfy.

8.1 Transport: direct TCP, never an HTTP proxy

8.2 Dynamic endpoint resolution

8.3 Security

8.4 GPU / host selection, persistence, cost


9. Risks

Risk | Mitigation
HF image is very large; slow first build | Expected; layer on core for cache reuse; document build time.
macOS Docker can’t exercise GPU paths | Validate CPU-capable nodes only; GPU validation deferred to the cloud iteration.
First-run model download latency | Healthcheck start-period; hf-cache volume persists models across restarts.
Token mistakenly left unset in a cloud deploy | Document that unset = open; the cloud iteration must require it (and a tunnel/TLS).
Bridge reconnects to a stale URL after instance change | Known gap; re-resolution is deploy-iteration work, recorded in §8.2.
nodetool run couldn’t route to the worker (resolved) | The CLI/DSL runners now wire the Python bridge (connectPythonBridgeForGraph), so nodetool workflows run / nodetool run execute Python nodes against a remote (or local stdio) worker (see §5.5).

10. Out of scope / follow-up (the “cloud/GPU iteration”)