Use these guidelines to keep multi-tenant or long-running NodeTool deployments stable. All examples reuse the `mem_limit` and `cpus` fields supported by the proxy and self-hosted deployment schemas.
## Baseline Settings
- Set memory and CPU caps for every service to avoid noisy-neighbor issues. Start with `mem_limit: 8g` and `cpus: 4` for GPU hosts, then tune from `docker stats` (see the snapshot after this list).
- Keep Hugging Face caches mounted read-only; mount workspaces read/write only where necessary.
- Prefer `idle_timeout` on the proxy to stop idle services instead of leaving them running indefinitely.
- Keep `/tmp` and workspace volumes on fast disks; avoid sharing `/tmp` across unrelated services.
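A quick way to compare actual usage against the `mem_limit: 8g` / `cpus: 4` baseline is a one-shot `docker stats` snapshot; the format string below just selects the relevant columns.

```bash
# One-shot usage snapshot: compare MEM USAGE against mem_limit
# and CPU % against the cpus cap before tuning either value.
docker stats --no-stream \
  --format "table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}\t{{.CPUPerc}}"
```

If a worker consistently peaks well below its cap, lower `mem_limit` and return the headroom to other services.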
## Proxy-Managed Services

Example excerpt in `proxy.yaml` (rendered by `nodetool deploy apply`):
```yaml
services:
  - name: nodetool-worker
    path: /
    image: nodetool:latest
    mem_limit: 8g
    cpus: 4
    environment:
      PORT: "8000"
      HF_HOME: /hf-cache
    volumes:
      /data/nodetool/workspace:
        bind: /workspace
        mode: rw
      /data/hf-cache:
        bind: /hf-cache
        mode: ro
```
- Increase `mem_limit` for heavy diffusion workloads; lower it for CPU-only chat nodes.
- Use read-only mounts (`mode: ro`) for shared caches to prevent accidental mutation.
- When using `connect_mode: host_port`, ensure published ports do not conflict with other services on the host (see the check below).
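Before switching a service to `connect_mode: host_port`, it is worth confirming that the port you plan to publish is actually free. The check below is generic Linux tooling rather than anything NodeTool-specific, and `8000` is just the port from the excerpt above.

```bash
# Ports already published by other containers on this host.
docker ps --format 'table {{.Names}}\t{{.Ports}}'

# Anything else (containerized or not) already listening on the candidate port.
ss -ltn | grep ':8000' || echo "port 8000 is free"
```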
## Local Docker Runs
For standalone containers or quick tests:
```bash
docker run --gpus all \
  --memory 8g --cpus 4 \
  -v /data/nodetool/workspace:/workspace \
  -v /data/hf-cache:/hf-cache:ro \
  -e HF_HOME=/hf-cache \
  -p 8000:8000 \
  nodetool:latest
```
- Use `--memory-reservation` to set soft limits when co-locating multiple workers.
- Keep per-run tmp data on the workspace volume (e.g., `TMPDIR=/workspace/tmp`) so it persists through restarts; both settings appear in the sketch below.
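A sketch that combines both points: a hard cap with a lower soft reservation, and `TMPDIR` pointed at the mounted workspace. The `6g` reservation is an illustrative value, not a prescribed default.

```bash
# TMPDIR must exist inside the mounted workspace before the run starts.
mkdir -p /data/nodetool/workspace/tmp

# Hard cap at 8g with a 6g soft reservation: under host memory pressure
# Docker reclaims toward the reservation before resorting to the OOM killer.
docker run --gpus all \
  --memory 8g --memory-reservation 6g \
  --cpus 4 \
  -v /data/nodetool/workspace:/workspace \
  -v /data/hf-cache:/hf-cache:ro \
  -e HF_HOME=/hf-cache \
  -e TMPDIR=/workspace/tmp \
  -p 8000:8000 \
  nodetool:latest
```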
## Monitoring and Tuning

- Watch `docker stats` or `nodetool proxy-status --follow` to catch containers hitting limits or restarting.
- If runs are OOM-killed, lower batch sizes in your workflows or increase `mem_limit` incrementally (see the OOM check below).
- For storage-heavy jobs, pair limits with the guidance in Storage to ensure caches and outputs have enough space.
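When a run dies unexpectedly, Docker records whether the kernel OOM killer was responsible, which tells you whether to lower batch sizes or raise `mem_limit`. The container name `nodetool-worker` below is assumed from the proxy example and should match your own service.

```bash
# "true" plus exit code 137 (128 + SIGKILL) indicates the container
# hit its memory limit and was OOM-killed.
docker inspect --format '{{.State.OOMKilled}} (exit {{.State.ExitCode}})' nodetool-worker
```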