Skill nodes (ImageSkill, MediaSkill, BrowserSkill, etc.) in the TS backend do a single chat completion call and return text. The Python implementation runs a full agent loop — the LLM can call tools (execute_bash, set_output_image, etc.), see results, and iterate until it produces output files. Without tools, the LLM can only describe what it would do; it cannot actually process images, run ffmpeg, or produce files.
All 19 skill nodes run an agent loop with tool calling, matching Python behavior. The agent loop is extracted from the existing AgentNode.genProcess() in agents.ts and reused by SkillNode.process().
SkillNode.process(inputs, context)
├─ resolve provider + model from inputs
├─ collect assets (image, audio, video, document)
├─ call this.getTools(workspaceDir) → ToolLike[]
├─ build user message with prompt + multimodal assets
├─ runAgentLoop({context, providerId, modelId, systemPrompt, prompt, tools, contentParts})
│  ├─ resolve provider via context.getProvider()
│  ├─ loop:
│  │  ├─ stream provider response (chunks + tool calls)
│  │  ├─ execute each tool call → append result to messages
│  │  └─ continue if tools were called, else break
│  └─ return { text, messages }
├─ read output sinks → load files as asset refs
└─ return { text, image?, audio?, video?, document? }
The AgentNode.genProcess() at agents.ts:1772-2051 already implements the complete agent loop, using helpers such as:

- streamProviderMessages()
- isChunkItem() / isToolCallItem()

Rather than duplicating this or adding a dependency on @nodetool/agents, we extract the loop logic from AgentNode.genProcess() into a standalone runAgentLoop() function in the same agents.ts file. SkillNode.process() calls it, while AgentNode.genProcess() keeps its inline loop for streaming support.
runAgentLoop() (extracted from AgentNode)

Location: packages/base-nodes/src/nodes/agents.ts
interface AgentLoopOptions {
context: ProcessingContext;
providerId: string;
modelId: string;
systemPrompt: string;
prompt: string;
tools: ToolLike[];
contentParts?: MessageContentPart[]; // pre-built multimodal parts (images, audio, video, documents)
maxTokens?: number; // default 4096
maxIterations?: number; // default 10
}
interface AgentLoopResult {
text: string; // accumulated assistant text
messages: Message[]; // full message history
}
async function runAgentLoop(options: AgentLoopOptions): Promise<AgentLoopResult>
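Behavior:

1. Resolve the provider via context.getProvider(providerId).
2. Build the initial messages [system, user(prompt + contentParts)]. The contentParts array can contain images, audio, video, and document parts built by the caller via buildAssetContentParts().
3. Convert the tools to provider format via toProviderTools().
4. Loop (up to maxIterations, default 10, which is sufficient for focused skill tasks; individual skills can override):
   - Stream the provider response, collecting text chunks and tool calls.
   - Execute each tool call and append its result to the messages.
   - Continue if any tools were called; otherwise break.
5. Return { text, messages }.

This is the same logic as AgentNode.genProcess(), minus streaming yields and thread persistence. AgentNode.genProcess() keeps its inline loop for streaming support.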
Exported: Yes, so skills.ts can import it.
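To make the loop's control flow concrete, here is a minimal, self-contained sketch of the behavior described above. It is not the code in agents.ts: `SketchProvider`, `SketchTool`, `StreamItem`, and the simplified `Message` shape are stand-ins invented for this example, and the real function resolves its provider from the context and passes multimodal content parts.

```typescript
// Simplified stand-ins for the real types in agents.ts (assumptions for this sketch).
type ToolCall = { id: string; name: string; args: Record<string, unknown> };
type StreamItem =
  | { type: "chunk"; content: string }
  | { type: "tool_call"; call: ToolCall };
type Message = {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
  toolCallId?: string;
};
interface SketchTool {
  name: string;
  process: (params: Record<string, unknown>) => Promise<unknown>;
}
interface SketchProvider {
  stream(messages: Message[], tools: SketchTool[]): AsyncIterable<StreamItem>;
}

async function runAgentLoopSketch(opts: {
  provider: SketchProvider;
  systemPrompt: string;
  prompt: string;
  tools: SketchTool[];
  maxIterations?: number;
}): Promise<{ text: string; messages: Message[] }> {
  const messages: Message[] = [
    { role: "system", content: opts.systemPrompt },
    { role: "user", content: opts.prompt },
  ];
  const maxIterations = opts.maxIterations ?? 10;
  let text = "";

  for (let i = 0; i < maxIterations; i++) {
    // Stream one provider response, separating text chunks from tool calls.
    const calls: ToolCall[] = [];
    let turnText = "";
    for await (const item of opts.provider.stream(messages, opts.tools)) {
      if (item.type === "chunk") turnText += item.content;
      else calls.push(item.call);
    }
    text += turnText;
    messages.push({ role: "assistant", content: turnText });

    // No tool calls means the model has finished.
    if (calls.length === 0) break;

    // Execute each tool call and append its serialized result.
    for (const call of calls) {
      const tool = opts.tools.find((t) => t.name === call.name);
      const result = tool
        ? await tool.process(call.args)
        : { error: `unknown tool ${call.name}` };
      messages.push({
        role: "tool",
        toolCallId: call.id,
        content: JSON.stringify(result),
      });
    }
  }
  return { text, messages };
}
```

The sketch accumulates text across iterations and stops silently once maxIterations is reached; whether the real loop reports that case differently is not stated here.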
SkillNode.process() (updated)

Location: packages/base-nodes/src/nodes/skills.ts
class SkillNode extends BaseNode {
/** Override in subclasses to provide skill-specific tools. */
getTools(workspaceDir: string): ToolLike[] {
return [makeExecuteBashTool(workspaceDir)];
}
async process(inputs, context): Promise<Record<string, unknown>> {
// 1. Resolve provider/model
// 2. Determine workspace dir
// 3. Build tools via this.getTools(workspaceDir)
// 4. Collect multimodal assets from inputs + this
// 5. Call runAgentLoop({...})
// 6. Read output sinks from tools
// 7. Load output files as asset refs
// 8. Return { text, image?, audio?, video?, document? }
}
}
Skill tools conform to the existing ToolLike type from agents.ts, which has optional process and toProviderTool fields. Skill tool factories always set process (required for execution) and name/description/inputSchema (used by toProviderTools() to convert to provider format). The toProviderTool method is not needed since the default conversion in toProviderTools() handles plain objects.
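As an illustration of why plain objects are enough, the sketch below shows one way a default conversion could work. It is a guess at the shape, not the real toProviderTools() from agents.ts: `ToolLikeSketch`, `ProviderToolSketch`, and the empty-schema fallback are assumptions made for this example.

```typescript
// Hypothetical shapes; the real ToolLike and provider tool types live in agents.ts.
interface ProviderToolSketch {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>;
}
interface ToolLikeSketch {
  name: string;
  description?: string;
  inputSchema?: Record<string, unknown>;
  process?: (context: unknown, params: Record<string, unknown>) => Promise<unknown>;
  toProviderTool?: () => ProviderToolSketch;
}

function toProviderToolsSketch(tools: ToolLikeSketch[]): ProviderToolSketch[] {
  return tools.map((tool) =>
    tool.toProviderTool
      ? tool.toProviderTool() // a tool may supply its own conversion
      : {
          // Default path: copy the plain-object fields.
          name: tool.name,
          description: tool.description ?? "",
          inputSchema: tool.inputSchema ?? { type: "object", properties: {} },
        }
  );
}
```

Under this assumption, a skill tool that sets only name, description, inputSchema, and process is converted without defining toProviderTool.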
makeExecuteBashTool(workspaceDir)

Runs a shell command in the workspace directory. Returns { success, stdout, stderr }.
{
name: "execute_bash",
description: "Execute a bash command in the workspace directory.",
inputSchema: {
type: "object",
properties: {
command: { type: "string", description: "Bash command to execute" }
},
required: ["command"]
},
process: async (context, params) => {
const { execFile } = await import("node:child_process");
// Execute in workspaceDir, capture stdout/stderr, timeout
return { success, stdout, stderr };
}
}
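The process body above is only outlined. A minimal runnable sketch of how it could be filled in follows, assuming bash is on the PATH; the `timeoutMs` parameter and its 30-second default are assumptions for this example, not values from the design.

```typescript
import { execFile } from "node:child_process";

type BashResult = { success: boolean; stdout: string; stderr: string };

// Sketch only: timeoutMs and its default are hypothetical.
function makeExecuteBashToolSketch(workspaceDir: string, timeoutMs = 30_000) {
  return {
    name: "execute_bash",
    description: "Execute a bash command in the workspace directory.",
    inputSchema: {
      type: "object",
      properties: {
        command: { type: "string", description: "Bash command to execute" },
      },
      required: ["command"],
    },
    process: (_context: unknown, params: { command: string }): Promise<BashResult> =>
      new Promise((resolve) => {
        execFile(
          "bash",
          ["-c", params.command],
          { cwd: workspaceDir, timeout: timeoutMs },
          (error, stdout, stderr) => {
            // A non-zero exit, a timeout, or a spawn failure all arrive as `error`.
            resolve({ success: error === null, stdout, stderr });
          }
        );
      }),
  };
}
```

Resolving instead of rejecting on failure keeps the error visible to the LLM as a tool result, so it can read stderr and retry.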
makeSetOutputTool(toolName, description, outputSink, workspaceDir)

Factory for set_output_image, set_output_audio, set_output_video, set_output_document.
function makeSetOutputTool(
toolName: string, // e.g. "set_output_image"
description: string,
outputSink: string[], // mutable array, tool pushes path here
workspaceDir: string
): ToolLike
The tool validates the path exists in workspace, then pushes it to outputSink. After the loop, SkillNode.process() reads outputSink[0], loads the file, and returns it as the corresponding asset ref.
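The validation step can be sketched as follows. This is one possible implementation under stated assumptions: the `path` parameter name is taken from the tests in this document, the error messages are invented, and symlinks that point outside the workspace are not handled.

```typescript
import { existsSync, mkdtempSync, statSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import * as path from "node:path";

function makeSetOutputToolSketch(
  toolName: string,
  description: string,
  outputSink: string[],
  workspaceDir: string
) {
  return {
    name: toolName,
    description,
    inputSchema: {
      type: "object",
      properties: { path: { type: "string", description: "File path relative to the workspace" } },
      required: ["path"],
    },
    process: async (_context: unknown, params: { path: string }) => {
      const root = path.resolve(workspaceDir);
      const abs = path.resolve(root, params.path);
      const rel = path.relative(root, abs);
      // Reject anything that resolves above or outside the workspace root.
      if (rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel)) {
        return { success: false, error: "Path is outside the workspace" };
      }
      if (!existsSync(abs) || !statSync(abs).isFile()) {
        return { success: false, error: "File not found in workspace" };
      }
      outputSink.push(rel);
      return { success: true, path: rel };
    },
  };
}

// Usage: register one file in a throwaway workspace, then try to escape it.
async function demoSetOutput() {
  const ws = mkdtempSync(path.join(tmpdir(), "skill-ws-"));
  writeFileSync(path.join(ws, "out.png"), "fake-png");
  const sink: string[] = [];
  const tool = makeSetOutputToolSketch("set_output_image", "Set output image", sink, ws);
  const ok = await tool.process({}, { path: "out.png" });
  const escaped = await tool.process({}, { path: "../../etc/passwd" });
  return { ok, escaped, sink };
}
```

Comparing the resolved path against the workspace root with path.relative avoids the prefix-matching pitfall where /tmp/ws-other would pass a naive startsWith("/tmp/ws") check.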
Each skill overrides getTools():
| Skill | Tools | Output Sinks |
|---|---|---|
| ShellAgentSkill | execute_bash | none |
| ImageSkill | execute_bash, set_output_image | image |
| MediaSkill | execute_bash, set_output_audio, set_output_video | audio, video |
| FfmpegSkill | execute_bash, set_output_audio, set_output_video | audio, video |
| FilesystemSkill | execute_bash | none |
| BrowserSkill | execute_bash | none |
| DocumentSkill | execute_bash, set_output_document | document |
| DocxSkill | execute_bash, set_output_document | document |
| PdfLibSkill | execute_bash, set_output_document | document |
| PptxSkill | execute_bash, set_output_document | document |
| SpreadsheetSkill | execute_bash, set_output_document | document |
| HtmlSkill | execute_bash, set_output_document | document |
| HttpApiSkill | execute_bash | none |
| GitSkill | execute_bash | none |
| EmailSkill | execute_bash | none |
| SQLiteSkill | execute_bash | none |
| SupabaseSkill | execute_bash | none |
| VectorStoreSkill | execute_bash | none |
| YtDlpDownloaderSkill | execute_bash, set_output_video | video |
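The table follows a regular pattern: every skill gets execute_bash plus one set_output_<kind> tool per output sink. The sketch below encodes that pattern as data to make the rule easy to check. The design itself has each skill override getTools(); `SKILL_OUTPUT_SINKS` and `toolNamesFor` are hypothetical names used only for illustration.

```typescript
type SinkKind = "image" | "audio" | "video" | "document";

// Output sinks per skill, transcribed from the table above.
const SKILL_OUTPUT_SINKS: Record<string, readonly SinkKind[]> = {
  ShellAgentSkill: [],
  ImageSkill: ["image"],
  MediaSkill: ["audio", "video"],
  FfmpegSkill: ["audio", "video"],
  FilesystemSkill: [],
  BrowserSkill: [],
  DocumentSkill: ["document"],
  DocxSkill: ["document"],
  PdfLibSkill: ["document"],
  PptxSkill: ["document"],
  SpreadsheetSkill: ["document"],
  HtmlSkill: ["document"],
  HttpApiSkill: [],
  GitSkill: [],
  EmailSkill: [],
  SQLiteSkill: [],
  SupabaseSkill: [],
  VectorStoreSkill: [],
  YtDlpDownloaderSkill: ["video"],
};

// Every skill gets execute_bash, plus one set_output_<kind> tool per sink.
function toolNamesFor(skill: string): string[] {
  const sinks = SKILL_OUTPUT_SINKS[skill] ?? [];
  return ["execute_bash", ...sinks.map((kind) => `set_output_${kind}`)];
}
```

For example, toolNamesFor("MediaSkill") yields execute_bash, set_output_audio, and set_output_video, matching the MediaSkill row.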
Skills need a workspace for file operations. The workspace path comes from:

1. context.workspaceDir if available
2. os.tmpdir() + '/nodetool-skill-' + jobId as a fallback

The workspace is created automatically if it doesn't exist. It is cleaned up by the workflow runner after job completion (not the skill's responsibility). For the temp directory fallback, cleanup happens when the OS clears temp files.
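A minimal sketch of the workspace resolution described above. The function name `resolveWorkspaceDirSketch` and the reduced context shape are assumptions; the real ProcessingContext has more fields, and how jobId is obtained is not specified here.

```typescript
import { mkdir } from "node:fs/promises";
import { tmpdir } from "node:os";
import * as path from "node:path";

// Sketch: prefer the context's workspace, else fall back to a per-job temp directory.
async function resolveWorkspaceDirSketch(
  context: { workspaceDir?: string },
  jobId: string
): Promise<string> {
  const dir = context.workspaceDir ?? path.join(tmpdir(), `nodetool-skill-${jobId}`);
  // recursive: true makes this a no-op when the directory already exists.
  await mkdir(dir, { recursive: true });
  return dir;
}
```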
After the agent loop completes, SkillNode.process() checks each output sink:
if (imageSink.length > 0) {
const absPath = path.resolve(workspaceDir, imageSink[0]);
const bytes = await readFile(absPath);
result.image = { type: "image", data: Buffer.from(bytes).toString("base64"), uri: pathToFileURL(absPath).toString() };
}
Same pattern for audio, video, document.
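Since the same steps repeat for each asset kind, they could be factored into one helper. The sketch below is an assumption about how that might look; `loadSinkAsset` and `AssetRefSketch` are hypothetical names, and the returned shape simply mirrors the image example above.

```typescript
import { mkdtemp, readFile, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import * as path from "node:path";
import { pathToFileURL } from "node:url";

type AssetKind = "image" | "audio" | "video" | "document";
interface AssetRefSketch {
  type: AssetKind;
  data: string; // base64-encoded file contents
  uri: string; // file:// URL of the source file
}

async function loadSinkAsset(
  kind: AssetKind,
  sink: string[],
  workspaceDir: string
): Promise<AssetRefSketch | undefined> {
  if (sink.length === 0) return undefined; // the set_output tool was never called
  const absPath = path.resolve(workspaceDir, sink[0]);
  const bytes = await readFile(absPath);
  return {
    type: kind,
    data: bytes.toString("base64"),
    uri: pathToFileURL(absPath).toString(),
  };
}

// Usage: write a tiny file into a throwaway workspace and load it back.
async function demoLoadSinkAsset(): Promise<AssetRefSketch | undefined> {
  const ws = await mkdtemp(path.join(tmpdir(), "skill-ws-"));
  await writeFile(path.join(ws, "out.png"), "hi");
  return loadSinkAsset("image", ["out.png"], ws);
}
```

Like the snippet above, this reads only the first entry of the sink; later entries are ignored.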
AgentNode.genProcess() is refactored to use shared helpers but keeps its streaming behavior. The agent loop helpers become module-level functions:
- buildUserMessage(): already exists
- toProviderTools(): already exists
- serializeToolResult(): already exists
- isChunkItem() / isToolCallItem(): already exist
- runAgentLoop(): new, extracted from the genProcess loop

AgentNode.genProcess() can either call runAgentLoop() (losing streaming) or continue using the inline loop for streaming support. The pragmatic choice: keep AgentNode.genProcess() as-is (it works and streams), and have runAgentLoop() be a non-streaming version of the same logic for skill nodes.
runAgentLoop() requires a registered provider via context.getProvider(providerId). The existing callChatCompletionDirect() HTTP fallback in skills.ts is removed — it was a workaround for the lack of tool support and is no longer needed. All providers must be registered in the context. If provider resolution fails, the error propagates to the user with a clear message.
Output sinks (mutable string[] arrays) are created by the caller (SkillNode.process()) and passed into tool factories. Tools mutate the sinks during the agent loop. After the loop returns, the caller reads sink[0] to load output files. Sinks are not part of AgentLoopResult — the loop is agnostic to output semantics.
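The ownership pattern can be shown in a few lines: the caller creates the sink, the tool closes over it, and the loop never sees it. Everything in this sketch is a stand-in; `fakeAgentLoop` simulates a model that calls the first tool once, and it is not runAgentLoop().

```typescript
type SinkTool = {
  name: string;
  process: (params: { path: string }) => Promise<{ success: boolean }>;
};

// The tool closes over the caller's array and mutates it when invoked.
function makeSinkToolSketch(name: string, sink: string[]): SinkTool {
  return {
    name,
    process: async ({ path }) => {
      sink.push(path);
      return { success: true };
    },
  };
}

// Stand-in for the agent loop: pretends the model called the first tool once.
// It returns only text and knows nothing about sinks.
async function fakeAgentLoop(tools: SinkTool[]): Promise<{ text: string }> {
  await tools[0].process({ path: "out.png" });
  return { text: "done" };
}

// The caller owns the sink: create it, hand it to the factory, read it afterwards.
async function runSkillSketch(): Promise<{ text: string; imagePath?: string }> {
  const imageSink: string[] = [];
  const tools = [makeSinkToolSketch("set_output_image", imageSink)];
  const { text } = await fakeAgentLoop(tools);
  return { text, imagePath: imageSink[0] };
}
```

Creating the sink inside the caller on every run, rather than storing it on the node instance, keeps state from leaking between runs.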
The following functions in skills.ts are removed after migration:
callChatCompletion() — replaced by runAgentLoop()callChatCompletionDirect() — HTTP fallback no longer neededtoOpenAIContent() / toAnthropicContent() — provider-specific formatting handled by provider layerThe asset collection helpers (collectAssets, buildAssetContentParts, getAssetBytes, etc.) are kept — they build the contentParts passed to runAgentLoop().
execute_bash

The execute_bash tool runs arbitrary shell commands provided by the LLM. This matches the Python implementation, which also runs unsandboxed bash. Known risks:
Mitigations for v1:
This is acceptable for the current use case (local desktop application with user-initiated workflows).
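The document also mentions a setTimeout wrapper around the loop, configurable via a timeout_seconds property. A minimal sketch of such a wrapper follows, assuming the loop is exposed as a promise; `withTimeoutSketch` is a hypothetical name. It only stops waiting; it does not cancel the underlying work or kill a running bash process.

```typescript
// Sketch: reject if `work` does not settle within timeoutSeconds.
async function withTimeoutSketch<T>(work: Promise<T>, timeoutSeconds: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`Agent loop timed out after ${timeoutSeconds}s`)),
      timeoutSeconds * 1000
    );
  });
  try {
    // Whichever promise settles first wins the race.
    return await Promise.race([work, timeout]);
  } finally {
    // Clear the timer so it cannot fire after the race is decided.
    clearTimeout(timer);
  }
}
```

A caller would wrap the loop invocation, for example withTimeoutSketch(runAgentLoop(options), timeoutSeconds), where timeoutSeconds comes from the node's timeout_seconds property.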
setTimeout wrapper around the loop, configurable via timeout_seconds property

Tests follow the established createMockProvider(responseSequence) pattern from agents.test.ts.

runAgentLoop()

describe("runAgentLoop", () => {
it("returns text from single LLM call (no tools)", async () => {
// Provider returns text chunks, no tool calls
// Verify: result.text contains accumulated text
});
it("executes tool and loops for second LLM call", async () => {
// Provider call 1: returns tool call for execute_bash
// Provider call 2: returns final text
// Verify: tool was called, result.text is from second call
});
it("handles multiple tool calls in one iteration", async () => {
// Provider returns two tool calls
// Both executed, results appended, loop continues
});
it("stops after maxIterations", async () => {
// Provider always returns tool calls
// Verify: loop exits after maxIterations
});
it("throws when provider resolution fails", async () => {
// context.getProvider throws
// Verify: error propagates
});
});
describe("makeExecuteBashTool", () => {
it("executes command and returns stdout", async () => {
const tool = makeExecuteBashTool("/tmp/test-workspace");
const result = await tool.process({}, { command: "echo hello" });
expect(result).toMatchObject({ success: true, stdout: "hello\n" });
});
it("returns error for failing command", async () => {
const tool = makeExecuteBashTool("/tmp/test-workspace");
const result = await tool.process({}, { command: "false" });
expect(result).toMatchObject({ success: false });
});
});
describe("makeSetOutputTool", () => {
it("records path in output sink", async () => {
const sink: string[] = [];
const tool = makeSetOutputTool("set_output_image", "Set output image", sink, "/tmp/ws");
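// Create the workspace and a test file (writeFile fails if the directory is missing)
await mkdir("/tmp/ws", { recursive: true });
await writeFile("/tmp/ws/out.png", Buffer.from("fake-png"));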
const result = await tool.process({}, { path: "out.png" });
expect(result).toMatchObject({ success: true });
expect(sink).toEqual(["out.png"]);
});
it("rejects path outside workspace", async () => {
const sink: string[] = [];
const tool = makeSetOutputTool("set_output_image", "Set output image", sink, "/tmp/ws");
const result = await tool.process({}, { path: "../../etc/passwd" });
expect(result).toMatchObject({ success: false });
});
});
describe("SkillNode agent loop integration", () => {
it("ImageSkill runs agent loop and produces image output", async () => {
// Mock provider: call 1 returns execute_bash tool call (creates image)
// call 2 returns set_output_image tool call
// call 3 returns final text
// Verify: result has { text, image } with image data
});
it("ShellAgentSkill runs bash and returns text", async () => {
// Mock provider: call 1 returns execute_bash tool call
// call 2 returns final text
// Verify: result.text contains response
});
it("skill with no tools still returns text from LLM", async () => {
// Provider returns text directly, no tool calls
// Verify: result.text is set
});
});
| File | Change |
|---|---|
| packages/base-nodes/src/nodes/agents.ts | Extract runAgentLoop(), export it + helper functions |
| packages/base-nodes/src/nodes/skills.ts | Replace callChatCompletion with runAgentLoop(), add getTools(), add skill tools, add output sink loading |
| packages/base-nodes/tests/skills.test.ts | Add agent loop + tool tests |
| packages/base-nodes/tests/agent-loop.test.ts | New: unit tests for runAgentLoop() |
- [...] @nodetool/agents and could be imported later. For now, BrowserSkill gets execute_bash only.
- [...] .nodetool/agent-runs/. TS skips this; assets are passed directly via multimodal messages and output sinks.
- [...] process(), not genProcess(). The agent loop runs to completion.