Overview
NodeTool supports multiple execution strategies through the `JobExecutionManager`:
- Threaded — lowest overhead; runs inside the API process for fast dev feedback.
- Subprocess — isolates Python state per job while staying on the host.
- Docker — strongest isolation with configurable CPU/GPU/memory limits.
Use `ENV=test` to automatically select in-memory storage and test-ready defaults (see the strategy-selection sketch below).
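As a rough illustration of how a caller might pick between these strategies, the sketch below maps an environment setting to a strategy. The `ExecutionStrategy` enum and the `JOB_EXECUTION_STRATEGY` variable are illustrative assumptions, not NodeTool's actual API; only `ENV=test` comes from this document.

```python
# Illustrative sketch only: the enum and JOB_EXECUTION_STRATEGY variable are
# assumptions; NodeTool's JobExecutionManager may expose this differently.
import os
from enum import Enum


class ExecutionStrategy(Enum):
    THREADED = "threaded"      # in-process, lowest overhead
    SUBPROCESS = "subprocess"  # per-job Python child process
    DOCKER = "docker"          # container per job, strongest isolation


def choose_strategy() -> ExecutionStrategy:
    """Pick a strategy from the environment, defaulting to threaded for dev."""
    if os.environ.get("ENV") == "test":
        # Test runs favor the lightweight in-process path and in-memory storage.
        return ExecutionStrategy.THREADED
    return ExecutionStrategy(os.environ.get("JOB_EXECUTION_STRATEGY", "threaded"))
```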
Threaded Execution
- Enabled by default for local development.
- Shares process memory; best for lightweight notebook-style workflows.
- Configure max concurrency via `THREAD_POOL_SIZE` (falls back to the CPU count); see the sketch below.
- Avoid blocking calls in hot paths; prefer async nodes or background workers.
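A minimal sketch of that sizing rule, assuming `THREAD_POOL_SIZE` is read directly from the environment:

```python
# Sketch: size the pool from THREAD_POOL_SIZE, falling back to the CPU count.
import os
from concurrent.futures import ThreadPoolExecutor


def make_pool() -> ThreadPoolExecutor:
    size = int(os.environ.get("THREAD_POOL_SIZE", os.cpu_count() or 1))
    return ThreadPoolExecutor(max_workers=size)
```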
Subprocess Execution
- Starts a Python child process per job to isolate imports and state.
- Good for heavier CPU tasks without Docker overhead.
- Communicates progress via pipes; streaming is supported.
- Honor graceful shutdown by handling SIGTERM in long-running nodes.
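A minimal sketch of cooperative SIGTERM handling inside a long-running job process; the work-loop shape is an assumption, only the signal wiring matters here:

```python
# Sketch: request a graceful stop on SIGTERM and let the loop finish its step.
import signal

shutdown_requested = False


def _on_sigterm(signum, frame):
    global shutdown_requested
    shutdown_requested = True  # do not exit mid-step; the work loop checks this flag


signal.signal(signal.SIGTERM, _on_sigterm)


def run_steps(steps):
    for step in steps:
        if shutdown_requested:
            break  # flush progress/checkpoints here before the process exits
        step()
```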
Docker Execution
- Use for strongest isolation, GPU scheduling, and clean dependency boundaries.
- Resource limits: set `DOCKER_CPU_LIMIT`, `DOCKER_MEMORY_LIMIT`, `DOCKER_SHM_SIZE`, and `DOCKER_PIDS_LIMIT` for each job (see the sketch after this list).
- Security: run with non-root users when possible (`DOCKER_USER`), drop `NET_ADMIN`, mount only required volumes, and avoid host networking for untrusted jobs.
- Images: use minimal base images and pin tags; ComfyUI images should include compatible CUDA/cuDNN versions.
- Pull policy: cache images locally for speed; tag per workflow to avoid drift.
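One way to picture those settings is as a mapping from the env vars above onto standard `docker run` flags. The mapping below is a sketch, not NodeTool's actual implementation; only the env var names come from this document.

```python
# Sketch: translate the documented env vars into docker CLI flags.
import os


def docker_run_args(image: str, command: list[str]) -> list[str]:
    env = os.environ
    args = ["docker", "run", "--rm"]
    for var, flag in [
        ("DOCKER_CPU_LIMIT", "--cpus"),
        ("DOCKER_MEMORY_LIMIT", "--memory"),
        ("DOCKER_SHM_SIZE", "--shm-size"),
        ("DOCKER_PIDS_LIMIT", "--pids-limit"),
        ("DOCKER_USER", "--user"),
    ]:
        if env.get(var):
            args += [flag, env[var]]
    args += ["--cap-drop", "NET_ADMIN"]  # drop capabilities the job does not need
    args += ["--network", "bridge"]      # never host networking for untrusted jobs
    return args + [image] + command
```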
Resource Limits
- Threaded/Subprocess: control concurrency with `MAX_WORKFLOW_THREADS` and per-job `timeout` values.
- Docker: combine CPU/memory/pid/shm limits with GPU device lists; prefer `--gpus '"device=0"'`-style constraints for predictable scheduling (see the sketch below).
- Storage: keep asset buckets private or signed; clean temp dirs after each job.
Security
- Require auth (`AUTH_PROVIDER=static` or `supabase`) in shared environments.
- Disable the terminal WebSocket in production (leave `NODETOOL_ENABLE_TERMINAL_WS` unset); see the sanity-check sketch after this list.
- Ensure proxy TLS termination for all non-public endpoints.
- Rotate worker and proxy tokens via your secrets manager.
- For Docker, never mount the host Docker socket into jobs; prefer a thin supervisor.
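As an illustration, a startup check like the one below could assert the first two settings before serving shared traffic. The function itself is hypothetical; only the env var names come from this section.

```python
# Hypothetical startup check for shared/production deployments.
import os


def check_production_hardening() -> list[str]:
    problems = []
    if os.environ.get("AUTH_PROVIDER") not in {"static", "supabase"}:
        problems.append("set AUTH_PROVIDER=static or AUTH_PROVIDER=supabase")
    if os.environ.get("NODETOOL_ENABLE_TERMINAL_WS"):
        problems.append("unset NODETOOL_ENABLE_TERMINAL_WS in production")
    return problems  # an empty list means the basic hardening checks pass
```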
Testing
- Threaded/Subprocess: `pytest -q tests/workflows/test_threaded_job_execution.py` and `tests/workflows/test_subprocess_job_execution.py`.
- Docker: `pytest -q tests/workflows/test_docker_job_execution.py` and `tests/workflows/test_docker_runners_e2e.py` (requires Docker and, optionally, a GPU).
- Use `pytest --cov=src` for coverage; add workflow fixtures in `tests/conftest.py` (an example fixture is sketched below).
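For instance, a shared fixture in `tests/conftest.py` might look like the sketch below; the graph shape is purely illustrative and not NodeTool's actual workflow schema.

```python
# Hypothetical fixture for tests/conftest.py: a tiny workflow the runner tests
# can share. The node/edge structure is illustrative, not NodeTool's schema.
import pytest


@pytest.fixture
def simple_workflow() -> dict:
    return {
        "nodes": [
            {"id": "input", "type": "constant", "value": 42},
            {"id": "double", "type": "multiply", "inputs": {"a": "input", "b": 2}},
        ],
        "edges": [{"from": "input", "to": "double"}],
    }
```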
Troubleshooting
- Streaming stalls: check WebSocket/SSE logs and ensure `stream=true` is set or the WebSocket connection is open.
- Resource exhaustion: raise per-job timeouts and lower concurrency; confirm Docker limits match the hardware.
- GPU issues: verify drivers and container runtime, and match CUDA versions between host and image.
- Cleanup: ensure temp paths and volume mounts are pruned between runs.
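A small cleanup sketch for the last point; the `nodetool-job-` temp-dir prefix is an assumption for illustration.

```python
# Sketch: prune per-job temp directories between runs. The "nodetool-job-"
# prefix is hypothetical; match it to whatever your runners actually create.
import shutil
import tempfile
from pathlib import Path


def prune_job_tmp(prefix: str = "nodetool-job-") -> None:
    for path in Path(tempfile.gettempdir()).glob(f"{prefix}*"):
        shutil.rmtree(path, ignore_errors=True)
```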
Design Notes (previous refactoring memo)
- The refactor consolidated job orchestration into `JobExecutionManager` with pluggable runners.
- Execution runners emit typed events consumed by the messaging layers for WebSocket and SSE streaming (see the sketch below).
- Future improvements: unified tracing, per-node resource hints, and sandbox profiles for untrusted workloads.
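To make the event idea concrete, the sketch below shows the kind of typed events a runner could emit; these dataclasses are illustrative and may not match NodeTool's actual event types.

```python
# Illustrative event types a runner might emit for the WebSocket/SSE layers.
# Names and fields are assumptions, not NodeTool's actual definitions.
from dataclasses import dataclass
from typing import Union


@dataclass
class NodeProgress:
    job_id: str
    node_id: str
    progress: float  # 0.0 .. 1.0


@dataclass
class JobCompleted:
    job_id: str
    status: str  # "ok", "error", or "cancelled"


RunnerEvent = Union[NodeProgress, JobCompleted]
```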