Overview

NodeTool supports multiple execution strategies through the JobExecutionManager:

  • Threaded — lowest overhead; runs inside the API process for fast dev feedback.
  • Subprocess — isolates Python state per job while staying on the host.
  • Docker — strongest isolation with configurable CPU/GPU/memory limits.
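
The selection mechanics below are an illustrative sketch only; the environment variable JOB_EXECUTION_STRATEGY and the helper name are assumptions, not NodeTool's actual configuration surface:

    import os

    # Hypothetical: JOB_EXECUTION_STRATEGY and this helper are illustrative,
    # not NodeTool's real API.
    STRATEGIES = {"threaded", "subprocess", "docker"}

    def select_strategy() -> str:
        # Default to the low-overhead threaded runner for local development.
        strategy = os.environ.get("JOB_EXECUTION_STRATEGY", "threaded")
        if strategy not in STRATEGIES:
            raise ValueError(f"unknown execution strategy: {strategy}")
        return strategy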

Use ENV=test to automatically select in-memory storage and test-ready defaults.

Threaded Execution

  • Enabled by default for local development.
  • Shares process memory; best for lightweight notebook-style workflows.
  • Configure max concurrency via THREAD_POOL_SIZE; it falls back to the CPU count when unset (sketch below).
  • Avoid blocking calls in hot paths; prefer async nodes or background workers.
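
A minimal sketch of the sizing rule above using only the standard library (the helper name is ours; THREAD_POOL_SIZE is the variable from this section):

    import os
    from concurrent.futures import ThreadPoolExecutor

    def make_job_pool() -> ThreadPoolExecutor:
        # THREAD_POOL_SIZE wins when set; otherwise fall back to the CPU count.
        size = int(os.environ.get("THREAD_POOL_SIZE", os.cpu_count() or 1))
        return ThreadPoolExecutor(max_workers=size)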

Subprocess Execution

  • Starts a Python child process per job to isolate imports and state.
  • Good for heavier CPU tasks without Docker overhead.
  • Communicates progress via pipes; streaming is supported.
  • Honor graceful shutdown by handling SIGTERM in long-running nodes.
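
A sketch of the per-job child-process pattern, standard library only; the script path and the print-based progress protocol are placeholders, not NodeTool's wire format:

    import signal
    import subprocess
    import sys

    def run_job(script: str) -> int:
        # One Python child per job isolates imports and interpreter state.
        proc = subprocess.Popen(
            [sys.executable, script],
            stdout=subprocess.PIPE,
            text=True,
        )
        # Forward SIGTERM so long-running nodes can shut down gracefully.
        signal.signal(signal.SIGTERM, lambda *_: proc.terminate())
        for line in proc.stdout:  # progress streamed back over the pipe
            print("progress:", line.rstrip())
        return proc.wait()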

Docker Execution

  • Use for strongest isolation, GPU scheduling, and clean dependency boundaries.
  • Resource limits: set DOCKER_CPU_LIMIT, DOCKER_MEMORY_LIMIT, DOCKER_SHM_SIZE, and DOCKER_PIDS_LIMIT per job (wired together in the sketch below).
  • Security: run as a non-root user when possible (DOCKER_USER), drop capabilities the job does not need (e.g. NET_ADMIN), mount only required volumes, and avoid host networking for untrusted jobs.
  • Images: use minimal base images and pin tags; ComfyUI images should include compatible CUDA/cuDNN versions.
  • Pull policy: cache images locally for speed; tag per workflow to avoid drift.
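
A hedged sketch with the Docker SDK for Python, wiring the variables above into a single containers.run call; the image tag and volume paths are placeholders:

    import os
    import docker

    client = docker.from_env()
    container = client.containers.run(
        "nodetool-job:1.2.3",  # placeholder image; pin a real tag
        detach=True,
        user=os.environ.get("DOCKER_USER", "1000:1000"),  # prefer non-root
        nano_cpus=int(float(os.environ["DOCKER_CPU_LIMIT"]) * 1e9),
        mem_limit=os.environ["DOCKER_MEMORY_LIMIT"],  # e.g. "4g"
        shm_size=os.environ["DOCKER_SHM_SIZE"],       # e.g. "1g"
        pids_limit=int(os.environ["DOCKER_PIDS_LIMIT"]),
        cap_drop=["NET_ADMIN"],   # drop capabilities the job does not need
        network_mode="bridge",    # no host networking for untrusted jobs
        volumes={"/jobs/input": {"bind": "/work", "mode": "ro"}},  # only what is required
    )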

Resource Limits

  • Threaded/Subprocess: control concurrency with MAX_WORKFLOW_THREADS and enforce per-job timeouts (wrapper sketched below).
  • Docker: combine CPU/memory/pid/shm limits plus GPU device lists; prefer --gpus '"device=0"'-style constraints for predictable scheduling (SDK equivalent shown below).
  • Storage: keep asset buckets private or signed; clean temp dirs after each job.
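
Two sketches for the points above: the Docker SDK equivalent of a --gpus '"device=0"' constraint, and a per-job timeout wrapper for threaded runs (the wrapper name is ours):

    from concurrent.futures import Future, ThreadPoolExecutor, TimeoutError
    from docker.types import DeviceRequest

    # SDK form of --gpus '"device=0"': pin the job to one GPU.
    gpu_request = DeviceRequest(device_ids=["0"], capabilities=[["gpu"]])

    def run_with_timeout(pool: ThreadPoolExecutor, fn, timeout_s: float):
        # Enforce a per-job timeout; a timed-out future is cancelled best-effort.
        future: Future = pool.submit(fn)
        try:
            return future.result(timeout=timeout_s)
        except TimeoutError:
            future.cancel()
            raise

Pass device_requests=[gpu_request] to the containers.run call sketched in the Docker section.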

Security

  • Require auth (AUTH_PROVIDER=static or supabase) in shared environments.
  • Disable the terminal WebSocket in production (leave NODETOOL_ENABLE_TERMINAL_WS unset).
  • Ensure the proxy terminates TLS, including for internal (non-public) endpoints.
  • Rotate worker and proxy tokens via your secrets manager.
  • For Docker, never mount the host Docker socket into jobs; prefer a thin supervisor.
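
A minimal fail-fast startup check covering the first two bullets (the function name is ours; the variables are the ones listed above):

    import os

    def check_security_config() -> None:
        # Shared environments must have auth configured.
        if os.environ.get("AUTH_PROVIDER") not in {"static", "supabase"}:
            raise RuntimeError("set AUTH_PROVIDER=static or supabase")
        # The terminal WebSocket must stay off in production.
        if os.environ.get("NODETOOL_ENABLE_TERMINAL_WS"):
            raise RuntimeError("unset NODETOOL_ENABLE_TERMINAL_WS in production")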

Testing

  • Threaded/Subprocess: pytest -q tests/workflows/test_threaded_job_execution.py and tests/workflows/test_subprocess_job_execution.py.
  • Docker: pytest -q tests/workflows/test_docker_job_execution.py and tests/workflows/test_docker_runners_e2e.py (requires Docker and optional GPU).
  • Use pytest --cov=src for coverage; add workflow fixtures in tests/conftest.py.
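
A sketch of a workflow fixture for tests/conftest.py; the fixture and the JOB_TEMP_DIR variable are illustrative, while ENV=test is the switch described in the overview:

    import pytest

    @pytest.fixture
    def workflow_env(monkeypatch, tmp_path):
        # ENV=test selects in-memory storage and test-ready defaults.
        monkeypatch.setenv("ENV", "test")
        # Illustrative variable: keep job artifacts inside pytest's tmp dir.
        monkeypatch.setenv("JOB_TEMP_DIR", str(tmp_path))
        return tmp_path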

Troubleshooting

  • Streaming stalls: check WebSocket/SSE logs and confirm that stream=true is set or that the WebSocket connection is open.
  • Resource exhaustion: raise per-job timeouts and lower concurrency; confirm Docker limits match hardware.
  • GPU issues: verify drivers and container runtime, and match CUDA versions between host and image.
  • Cleanup: ensure temp paths and volume mounts are pruned between runs.
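
For the cleanup bullet, a standard-library pattern that guarantees temp paths are pruned between runs even when a job fails (the helper name is ours):

    import shutil
    import tempfile
    from contextlib import contextmanager

    @contextmanager
    def job_workspace():
        # Create an isolated temp dir per run and always remove it afterwards.
        path = tempfile.mkdtemp(prefix="nodetool-job-")
        try:
            yield path
        finally:
            shutil.rmtree(path, ignore_errors=True)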

Design Notes (previous refactoring memo)

  • The refactor consolidated job orchestration into JobExecutionManager with pluggable runners.
  • Execution runners emit typed events consumed by messaging layers for WebSocket and SSE streaming.
  • Future improvements: unified tracing, per-node resource hints, and sandbox profiles for untrusted workloads.
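
To make the typed-event point concrete, a hypothetical shape; the real runner event types may differ:

    from dataclasses import dataclass
    from typing import Union

    @dataclass
    class JobProgress:       # hypothetical event; real types may differ
        job_id: str
        node_id: str
        fraction: float      # 0.0 .. 1.0

    @dataclass
    class JobCompleted:
        job_id: str
        ok: bool

    # Messaging layers fan these out over WebSocket and SSE.
    RunnerEvent = Union[JobProgress, JobCompleted]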