ID: P-2026-05-20-shared-shader-pool-phase-0-compositor
State: done (PR #3228, branch claude/unify-compositor)
Tags: shader-pool, webgpu, compositor, blend-modes, phase-0
The compositor-unification work landed a cross-runtime GPU package this plan now builds on. It was hand-rolled (not TypeGPU) and scoped to blend/compositing, but it already establishes the package home, the cross-runtime split, the canonical blend catalog, and a shared multi-layer compositing engine. Phase 1 grows it into the typed shader pool rather than starting from web/src/lib/gpu/.
Shipped in packages/compositor/ (renamed to packages/gpu/ in Phase 1):
blendModes.ts — canonical BlendMode union + BLEND_MODE_TUPLE, stable gpuIds, and Canvas2D / Sharp-libvips mappings. Single source of truth, consumed by the sketch editor, the timeline preview, the CompositorNode (server-side Sharp via blendModeToSharpBlend), and packages/protocol’s timeline Zod schema (BLEND_MODE_TUPLE).wgsl.ts — WGSL_BLEND_FUNCTIONS: the shared applyBlendMode switch keyed on gpuId, injected into both sketch and timeline fragment shaders.webgpu/compositor.ts — WebGPULayerCompositor: owns the blend + blit pipelines, ping-pong accumulation textures, sampler, and a per-frame uniform-buffer ring. Drives both sketch (nearest filtering, pixel-exact paint) and timeline (linear + rounded-rect mask). This is the “composite layer” primitive the original plan slotted for Phase 4 Batch 4 — it shipped early as shared host-runtime, which is why the thesis boundary below now puts layer compositing on the shared side.webgpu/shaders.ts — fullscreen-quad vertex + blend/blit fragment shaders.Cross-runtime split already in place. The package root export is pure (catalog + WGSL strings, no WebGPU runtime), Node-importable by base-nodes; the ./webgpu export is browser/WebGPU-only. This is exactly the boundary Phase 4’s Node.js/Dawn goal needs, so that becomes a context-adapter detail rather than a port.
Two deltas from the original plan, resolved in Phase 1:
web/src/lib/gpu/. → rename packages/compositor/ → packages/gpu/, fold the blend catalog under blend/ and the layer engine under compositor/.@webgpu/types + manual uniform packing). → still adopt TypeGPU per the original plan; WebGPULayerCompositor is retrofitted onto TypeGPU schemas/layouts in Phase 1.ID: P-2026-05-12-shared-shader-pool-phase-1
State: in progress
Tags: shader-pool, webgpu, typegpu, phase-1, sketch, timeline
Repo: default (packages/gpu/)
Depends on: Phase 0.
Progress (branch claude/typegpu-gpu-shaders-XvH3O): TypeGPU adopted as a hard dependency. Shipped: the blend/ + compositor/ restructure; the TypeGPU-backed pool primitives (module.ts defineModule/ShaderModule, texture.ts LabeledTexture, context.ts GPUContext + bucketed scratch + device adapters, registry.ts, executor.ts fragment arm, fullscreenQuad.ts); the _canary.passthrough@1 module + ALL_SHADERS barrel; the WebGPULayerCompositor retrofit onto a TypeGPU d.struct uniform schema + typed bind group layouts (byte-identity locked by a unit test). Tests: registry/module/canary unit suites + a GPU smoke test that runs the canary against a 4×4 texture when a device is available (skips otherwise). One deviation from the plan: the pool primitives live behind a dedicated ./pool export rather than folded into the package root. The root stays pure (blend catalog only, no TypeGPU import) so Node consumers (base-nodes) keep their zero-GPU import path, and so web’s Jest — which mocks ESM deps rather than transforming them — only needs a TypeGPU mock on the ./webgpu (compositor) path, not on every blend-catalog importer. Remaining: confirm the full repo typecheck/lint/test matrix in CI (web/electron deps weren’t fully buildable in the authoring sandbox).
Phase 1 of the Shared Shader Pool migration. Renames packages/compositor/ → packages/gpu/ and grows it into the shared home for WGSL source bundled with its typed bind group layout, typed uniform schema, sampler config, and I/O contract. The blend catalog, shared WGSL, and WebGPULayerCompositor from Phase 0 move under it (blend/, compositor/). No migration of timeline/sketch effect code yet — Phase 1 ships the primitives (TypeGPU-backed ShaderModule/Executor/registry), retrofits WebGPULayerCompositor onto the typed layout, and proves the pattern with one end-to-end canary.
Thesis: Share things that have one correct answer regardless of consumer. Don’t share things whose correctness depends on the consumer.
WebGPULayerCompositor, shared as of Phase 0).Goal: Establish the ShaderModule artifact, LabeledTexture wrapper, GPUContext adapter, and registry — all TypeGPU-backed — so later phases never re-derive uniform layouts or bind group layouts by hand.
Direction: Shared pool is a GPU operation catalog with a tiny runtime, not a universal renderer. The runtime covers single-pass texture-in/texture-out (and later multi-pass recipes); scheduling, lifetime, and presentation stay with each host.
ShaderModule, not a .wgsl file. A module bundles WGSL + a TypeGPU bind group layout + sampler descriptor(s) + a TypeGPU d.struct uniform schema + a declared I/O contract. There is no way to import “just the WGSL.”LabeledTexture carrying colorSpace, alpha, format, dimensions, bindingKind. The Executor validates inputs against the module’s declared I/O at bind time; mismatches fail loud or route through a registered conversion.GPUContext adapter holding a TgpuRoot obtained from tgpu.initFromDevice({ device }). The same module code runs against browser navigator.gpu, Electron, and Node.js Dawn (webgpu npm package) devices.surface: "internal" (free to churn) vs surface: "published" (frozen, additive-only, what workflow nodes are allowed to see in Phase 4). Promotion is a deliberate review, never a side effect of writing a shader.(id, version). Saved workflows pin a version; renames/splits ship as new versions, never in-place.source — required main texture input. The string "source" is reserved across IOContract, recipes (in: { foo: "source" }), and dimension propagation ("same-as:source").mask — optional mask texture input. Modules in these categories should expose a mask slot by default. Region-constrained effects (“blur only the background”, “desaturate everything except this layer”, “vignette inside a clipping shape”) are how filters and color ops are actually used in real workflows; a catalog of 50 modules with no mask slot is a catalog that needs retrofitting. Treat omitting mask as the deliberate exception (e.g. color.invert on a thumbnail), not the default.When a module exposes a mask slot:
mask.a by default; modules may declare mask.r for single-channel masks.output = mix(source, processed, coverage * effectAmount).mask returns 1.0 everywhere via a 1×1 white texture the host supplies. Modules write the WGSL as if the mask is always present.Source modules (io.inputs = {}) and recipes that internally generate textures are exempt. Multi-input modules (composite, channel-merge) declare their own input names; the mask slot, if used, keeps the same semantics.
*-srgb formats. Alpha is premultiplied between modules. LabeledTexture.meta records storage encoding; modules declare per-input colorSpace and alpha (what they expect); mismatches insert a registered conversion (under color/convert/*, auto-inserted by the Executor) or fail loud.[0,1], top-left origin, explicit output dimensions.Executor.encode/RecipeRunner.encode encode GPU work only — they never call copyTextureToBuffer, mapAsync, or any path that pulls pixels to CPU as part of a normal FX path. ImageRef materialization remains a deliberate host-side step at Phase 4 workflow boundaries (decode on the way in, encode/readback on the way out). That invariant is required for realtime: the display path stays GPU-resident at frame rate; CPU readback, when needed, happens only at explicit boundaries (e.g. sampling one frame into a workflow, or feeding an inference node at inference rate — both owned by the realtime host layer, see Phase 5).ALL_SHADERS barrel (not import.meta.glob). Node-bundling-safe, grep-able, tree-shakeable.A future cache key is (module.id, module.version, params_hash, inputs_content_hash, output_spec_hash). This forces three constraints on every module today, even though no cache exists yet:
LabeledTexture exposes a content version counter, incremented on every write. That is the inputs_content_hash source; full pixel hashing is not required.The cache itself lands when Phase 3’s timeline preview needs it. Specifying the key shape now is what makes the eventual cache cheap.
ctx.scratch allocates by bucket: (format, usage, ceilToBucket(width), ceilToBucket(height)) where bucket dims round up to the next multiple of 64 (or next power of 2 above 256). The caller sees only the requested viewport; the underlying texture may be larger. Phase 1 ships the canary using a single scratch texture; the bucketing rule is what Phase 3’s RecipeRunner will lean on so timelines with many clip resolutions (1920×1080, 1920×800 letterboxed, 1280×720, 3840×2160, plus thumbnails) reuse rather than fragment.
Adopted in Phase 1. Reasons:
d.struct) per module is the source of truth for both the WGSL *Uniforms struct and the host-side packed buffer. As modules grow (vec3 padding, struct arrays, mixed scalar/vec fields) hand-rolled Float32Array/DataView math gets fragile; today’s timeline code is straightforward typed-array writes, so this is preventative, not corrective.root.unwrap(layout) returns the GPUBindGroupLayout that the WGSL was authored against.tgpu.initFromDevice({ device }) accepts any GPUDevice — browser, Dawn, Electron — so the cross-runtime story is inherited rather than designed.tgpu.resolve(...) connects it to the typed layout. The TS-to-WGSL compiler ('use gpu') is not adopted.Starting point. Phase 0 shipped
packages/compositor/(canonical blend catalog, shared WGSL,WebGPULayerCompositor, cross-runtime entry split) hand-rolled against@webgpu/types. Phase 1 renames it topackages/gpu/and layers the TypeGPU-backed contracts on top. Existing consumers (web,packages/timeline,packages/base-nodes,packages/protocol) import the package; the rename is a coordinated@nodetool-ai/compositor→@nodetool-ai/gpuupdate plus the Jest/Vitest module-mapper entries.
packages/compositor/ → packages/gpu/ (@nodetool-ai/gpu); move existing files to src/blend/ (blendModes.ts, wgsl.ts) and src/compositor/ (WebGPULayerCompositor, shaders). Keep the dual entry points: pure root (./ — catalog + WGSL, Node-safe) and ./webgpu (engine). Update all consumers and test module mappers.typegpu dependency to packages/gpu/package.json.packages/gpu/src/:
module.ts — defineModule() builder + ShaderModule<P> typetexture.ts — LabeledTexture wrapper + creation helperscontext.ts — GPUContext interface + default browser adapter using navigator.gpuregistry.ts — register(), get({ id, version, surface }), list({ surface, category })index.ts — public barrelWebGPULayerCompositor onto TypeGPU. Replace its hand-packed 4×vec4f uniform Float32Array and hand-built bind group layout with a TypeGPU d.struct schema + typed layout (root.unwrap(layout) returns the GPUBindGroupLayout the WGSL is authored against). Behavior, the gpuId-keyed blend switch, and the per-frame uniform-buffer ring are preserved; this is the first real proof the typed contracts hold for a non-trivial existing shader, not just the canary.src/shaders/_canary/passthrough/v1/module.ts that copies an input texture to an output texture. Proves: TypeGPU schema → bind group → WGSL resolution → executor → labeled output, end-to-end, in one file.GPUDevice (or skips if unavailable), run the canary against a 4×4 texture and assert pixel equality. Keep the existing Phase 0 tests (blend catalog, inverse-affine) green through the rename.ID: P-2026-05-13-shared-shader-pool-phase-2-compute-effects
State: draft
Tags: gpu, shaders, timeline, typegpu, executor, phase-2
Repo: default (packages/gpu/)
Depends on: Phase 1
Scope: Convert the five timeline compute effects into ShaderModule form (TypeGPU-backed), introduce the shared Executor, and refactor WebGPUEffectsProcessor to dispatch through it. Hand-written uniform packing in timeline goes away.
Promote the timeline’s existing WebGPU compute effects into the shared pool so they become reusable operations for timeline, sketch, and future export paths. Preserve timeline behavior end-to-end. The payoff is that all five effects run through one Executor with one typed packer per module — no host-side ArrayBuffer arithmetic survives this phase.
color.grade)filters.blur.gaussian)filters.sharpen.unsharpMask)filters.vignette)keyer.chromaKey)All five remain surface: "internal" in Phase 2. Promotion to published happens deliberately in Phase 3 once parameter schemas are stable.
web/src/components/timeline/preview/gpu/shaders.ts (per-clip effect compute passes; distinct from the layer-composite pass, which already runs through WebGPULayerCompositor after Phase 0).web/src/components/timeline/preview/gpu/effectsProcessor.tspackages/gpu/src/interface Executor {
encode<P>(args: {
ctx: GPUContext;
module: ShaderModule<P>;
encoder: GPUCommandEncoder;
inputs: Record<string, LabeledTexture>;
output: LabeledTexture;
params: P;
dispatch:
| { kind: "fragment" }
| { kind: "compute"; x: number; y: number; z?: number };
}): void;
}
The dispatch discriminator is set now even though Phase 2 only uses the compute arm — making it a discriminated union from day one means fragment passes (the default for image-in/image-out from Phase 3 onwards) drop in without an Executor signature change.
Single shared implementation. Steps per call: resolve variant (Phase 3 expands this), validate input labels against module.io.inputs, get-or-compile pipeline via ctx.pipelineCache, pack params via the module’s TypeGPU schema, build the bind group from the module’s typed layout, encode the compute dispatch. The host owns when to submit and where the output texture lives.
interface ShaderModule<P> {
id: string; // "color.grade"
version: number; // 1
surface: "internal" | "published";
category: ShaderCategory;
kind: "fragment" | "compute" | "snippet"; // fragment is default for image-in/image-out
params: TgpuStructSchema<P>; // TypeGPU schema
paramDefaults: P;
paramUi?: Record<keyof P, UiHint>; // min/max/step/notes; no runtime semantics
bindGroupLayout: TgpuBindGroupLayout; // TypeGPU layout
samplers: Record<string, GPUSamplerDescriptor>; // bundled with module
wgsl: string; // hand-written
entryPoint: string;
workgroupSize: [number, number, number];
io: IOContract;
}
IOContract declares per-input colorSpace, alpha, bindingKinds, and an output spec with dimensions: "same-as:<inputName>" | "host-specified" | "derived". The Executor checks LabeledTexture.meta against this.
Default kind for new modules is fragment. Filters, color, keyer, sources, transform, and mixer ops use a full-screen triangle render pass — no workgroup boundary handling, free interpolation, easier blending. Phase 2’s five timeline effects stay on compute because that’s what the existing shaders use; this is a migration accident, not a design choice. New modules from Phase 3 onward default to fragment. compute is reserved for ops that genuinely need shared memory or reductions (histogram, summed-area tables, FFT-style passes).
RoD propagation. Every module’s IOContract declares an rod field:
rod: "same-as:source" | "expand:<paramName>" | "union-of-inputs" | "explicit"
Phase 1–2: every module declares "same-as:source". The Executor does nothing with it yet. Phase 3 onwards, blur declares "expand:radius", crop declares "explicit", composite declares "union-of-inputs". RoI propagation, tiling, and partial-region re-cooks are explicitly out of scope until a real consumer demands them — only the RoD declaration is required now, because retrofitting it across a 50-module catalog is the kind of thing that gets perpetually deferred.
Executor in packages/gpu/src/executor.ts. One implementation, shared.packages/gpu/src/shaders/<category>/<id>/v1/module.ts using defineModule. WGSL bodies move verbatim; bindings and the uniform struct are translated into TypeGPU schema/layout once.WebGPUEffectsProcessor to call Executor.encode(...) per effect. Effect ordering, host-owned scratch textures, and pass scheduling stay unchanged. Hand-written ArrayBuffer packing is removed.web/src/components/timeline/preview/gpu/shaders.ts.No variant routing in Phase 2. Variants land in Phase 3 when a concrete second binding kind appears (texture_external for the timeline video fast path is the first real case).
*Uniforms struct in each module’s WGSL and compares field names + WGSL types to the TypeGPU schema. Fails CI on mismatch. This is the same idea as the original plan’s struct-name check, made authoritative.WebGPUEffectsProcessor stops creating its own samplers and layouts for migrated ops; the Executor uses the module’s declarations exclusively.kind: "compute", surface: "internal", valid params schema, valid io, non-empty wgsl.*Uniforms struct.WebGPUEffectsProcessor path on the same input + params and assert byte equality of the output for each effect.npm run typecheck, npm run lint, npm run test clean from web/.ID: P-2026-05-13-shared-shader-pool-phase-3-shader-catalog-expansio
State: draft
Tags: gpu, shaders, catalog, typegpu, recipes, phase-3
Repo: default (packages/gpu/)
Depends on: Phase 1, Phase 2
Scope: Grow the catalog in reviewable batches, introduce RecipeRunner for multi-pass operations when Batch 4 needs it, formalize variant resolution against host capabilities, and start promoting stable modules to surface: "published".
Turn the shared pool from a five-effect internal catalog into a useful operation library for sketch, timeline preview/export, and future workflow nodes. Every new op is a ShaderModule from day one — no hand-rolled packers, no host-side bind group construction.
Prioritize operations that are:
ShaderModule shape.Defer operations that need text shaping, SVG parsing, brush/spline UI, ML, graph-level packing decisions, or CPU flood fill / selection bookkeeping. Those are Phase 4+ orchestration concerns, not shaders.
Filters (pixelate, blur variants, emboss, high pass, edge detect, normal map, seamless blend), color (invert, brightness/contrast, hsb, exposure, posterize), keyer (luma key), mask ops (fromImage, apply, boolean, refine erode/dilate, refine feather). Sketch starts consuming color.* and filters.* modules through the Executor during this batch.
Zero-input, one-output. Solid canvas, gradients (linear/radial/angular/diamond), noise (white/value/perlin/simplex/voronoi/clouds/cellular/curl/domainWarp/fbm/ridged/turbulence), patterns (checkerboard/stripes/dots/hexagons/brick), shapes (circle/rectangle/polygon/star/ring).
Source modules declare:
io.inputs = {}io.output.dimensions = "host-specified" (required host params width, height)seed, scale, offset, rotation, repeat, backgroundColorrgba8unorm, straight alpha, SDR sRGBNoise:
sources/noise/v1 module + user-friendly style enum param.utils/noise.*.wgsl.ts and are imported via tgpu.resolve() only when shared by 2+ modules.sources/noiseVector.curl) when the output contract differs.style, scale, layers, detail, roughness, seed, contrast, warp, output. lacunarity/gain hidden behind an advanced foldout.Crop, pad, mirror, offset, tile, resize (nearest/bilinear/bicubic), rotate90, affine, cornerPin, polarRemap, displace, spherize.
Module-level defaults:
io.output.dimensions: same-as:source for filters/color/keyer/most masks; host-specified for sources; derived for crop/pad/some resize.addressMode: clamp-to-edge for convolution; clamp-to-edge with shader-side alpha-out for transforms exposing empty space; repeat/mirror-repeat only on explicit tiling/repetition modules.linear general, nearest for pixel-art/rotate-90.Defer arrange, spriteSheet, trim — they need graph-level layout decisions outside the shader.
Composite layer, color overlay, drop shadow, outline, channel merge/split/shuffle.
Composite layer already exists as
WebGPULayerCompositor(Phase 0), retrofitted onto TypeGPU in Phase 1. Batch 4’s job here is not to build it but to decide whether the multi-layer blend/ping-pong engine is exposed as a singlemixer.compositeShaderModule(one pass per layer driven by the Executor) or stays a bespoke runtime the recipe layer calls into. Lean: keepWebGPULayerCompositoras the runtime and add a thinmixer.compositemodule that delegates, so the blend math (already shared viablend/wgsl.ts) isn’t forked again.
This batch introduces RecipeRunner:
interface Recipe<P> {
intermediates: Record<string, TextureSpec>;
passes: Array<{
op: string; // "internal.blur.gaussianH@1"
in: Record<string, "source" | string>; // "source" or intermediate name
out: "output" | string;
params: Record<string, JsonPath<P> | unknown>; // "$.radius" or literal
}>;
}
interface RecipeRunner {
encode<P>(args: {
ctx: GPUContext;
module: ShaderModule<P>; // must have `recipe`
encoder: GPUCommandEncoder;
inputs: Record<string, LabeledTexture>;
output: LabeledTexture;
params: P;
}): void;
}
Default RecipeRunner walks the passes, acquires intermediates from ctx.scratch, calls Executor.encode per pass, releases in reverse. Intermediate formats are declared in the module (not chosen by host) — that’s the only way filters.glow produces the same output everywhere.
Example: filters.glow recipe = extract → blur H → blur V → composite with rgba16float intermediates declared in the module.
Channel ops live under color/ if the UI treats them as color tools. Pick one namespace before implementation.
Levels, curves, gradient map, black-and-white, color balance, color remap, local contrast, tonemap, dither (ordered/floydSteinberg), quantize.
Before starting Batch 5, decide how LUTs / curves / palettes / gradients are supplied:
texture_2d with height 1).mode = none | reinhard | aces | filmic. Full HDR pipeline is a separate plan.Variants become first-class this phase. A module may declare multiple variants with different bindGroupLayout (e.g. texture_external vs texture_2d for source) and different requires: { textureExternal?: boolean; f16Storage?: boolean }.
The Executor resolves a variant by matching inputs[*].meta.bindingKind against each variant’s bindGroupLayout and filtering by ctx.capabilities. Timeline gets the texture_external fast path automatically; Node.js Dawn falls back to texture_2d; sketch picks whichever its uploads produce.
texture_external and camera/video (realtime alignment). A registered color/convert/srgbExternalToLinear module (exact id TBD) is the canonical consumer of texture_external for sRGB-encoded VideoFrame/camera surfaces: bind the external texture in the same command-buffer submission where it was imported, emit linear premultiplied rgba8unorm (texture_2d), everything downstream binds regular textures. Matches WebGPU’s rule that imported external texture handles expire at submit. Realtime capture spikes and minimum-viable paths should use this module as the first FX stage.
Variants are added only when a concrete second consumer or binding kind exists. interactive/export quality variants stay deferred until a real quality/perf split is demonstrated.
io.0 outside, 1 inside, fractional for feather.mask.apply writes mask coverage into output alpha, preserves RGB by default.mask.fromImage modes: alpha, luminance, R/G/B, max-channel.overlays.*; mask ops produce data, not UI indicators.Optional mask inputs follow the canonical-inputs contract from Phase 1. Filter/color/keyer/transform/mixer modules in this phase ship with the mask slot exposed unless there’s a clear reason not to. Timelines that lack mask UI simply pass none — the executor’s missing-mask semantics handle it.
By end of Batch 3, the five Phase 2 modules plus the high-confidence subset (HSB, brightness/contrast, invert, blur, vignette, lumaKey, mask.apply) promote to surface: "published". Promotion is a deliberate review per module:
published modules cannot be removed or have their params reordered/removed/retyped. Adding optional params is allowed. Breaking changes ship as a new version.
type ParamAnimation = "lerp" | "step" | "none";
interface TimeContract {
usesTime: boolean;
timeParam?: "seconds" | "frame";
}
Defaults: numeric/vec/color → lerp; bool/enum/seed/mode → step; runtime-only (texture handles) → none. Angle params lerp; hosts choose shortest-path wrapping. Host supplies time/frame as a uniform field declared in the module’s TypeGPU schema. Sketch passes a stable value for static rendering.
Batch 1 begins replacing sketch’s per-op WebGPU plumbing with Executor.encode calls. By end of Phase 3, sketch and timeline both consume the same Executor and the same ShaderModule objects. Sketch retains its own scheduling, texture pool, and presentation surface.
Most modules start with one default variant. Add additional variants only with a proven binding-kind, capability, or quality split. Likely future candidates: blur, resize, glow, denoise, quantize/dither, keying.
Not Phase 3 shaders: imageIn (decode/upload), svgImport, text (font shaping), inpaint/denoise (ML or complex multi-pass), edgeDetect.canny (multi-stage with feedback), blur.median/bilateral/edgePreserving (require non-trivial work), smartScale, meshWarp, arrange, spriteSheet, trim (needs readback/bounds detection), magicWand, colorRange (eyedropper UI), roto (spline UI), paint/maskPaint/freehand, palette workflows.
Each added module:
*Uniforms field-name checkAvoid adding large batches without tests. Catalog grows in reviewable groups.
Decision rule: only add another dep if it solves a concrete current-phase problem that TypeGPU + the in-pool helpers don’t already cover.
ID: P-2026-05-13-shared-shader-pool-phase-4-shader-nodes
State: draft
Tags: gpu, shaders, workflow, nodes, nodejs, dawn, sharp-replacement, phase-4
Repo: default (packages/gpu/)
Depends on: Phase 1, Phase 2, Phase 3
Scope: Expose published-surface modules as workflow graph nodes that chain on the GPU, ship the Node.js Dawn GPUContext adapter, and migrate Node-side pixel operations off sharp onto the shared catalog.
Two outcomes:
ImageRef only at workflow boundaries.GPUContext adapters, different I/O codecs.The first concrete consumer is texture FX chains: workflow-graph subgraphs of shader nodes that share GPU textures, materialize an ImageRef only at the chain’s downstream boundary, and run the same modules sketch and timeline already use. Frame-loop realtime hosts build on Phase 5 of this doc (pool contract); session lifecycle and workflow partitioning belong in the separate realtime plan (draft).
Ideal path:
ImageRef → decode/upload → LabeledTexture → shader node(s) → encode → ImageRef
Shader nodes accept one logical image input:
import type { ImageRef } from "@nodetool-ai/protocol";
type ShaderImageInput = ImageRef | GPUTextureRef;
Runtime behavior:
ImageRef input: codec decodes to raw pixels, then uploads into a LabeledTexture.GPUTextureRef input: reuse the existing labeled texture directly.GPUTextureRef for chaining.ImageRef only when a downstream non-GPU node or workflow output needs persisted image data.LabeledTexture, not a persisted asset.GPUTextureRef output to ImageRef input or workflow boundary.WebGPU is required; unsupported environments fail clearly with no silent CPU fallback.
webgpu (npm) — Dawn bindings for Node.js, ^0.4.0GPUAdapter, GPUDevice, GPUTexture, GPUCommandEncoder, etc.A NodeGPUContext is a GPUContext built from a Dawn-provided device:
import { create } from "webgpu";
import tgpu from "typegpu";
async function createNodeGPUContext(): Promise<GPUContext> {
const gpu = create([]);
const adapter = await gpu.requestAdapter();
if (!adapter) throw new Error("No WebGPU adapter (Node/Dawn)");
const device = await adapter.requestDevice();
const root = tgpu.initFromDevice({ device });
return {
root,
device,
capabilities: detectCapabilities(adapter, device),
createTexture: nodeCreateTexture,
scratch: makeRefcountedScratchPool(device),
pipelineCache: makePipelineCache(),
};
}
Every Phase 1–3 module runs against this context unchanged because TypeGPU accepts any GPUDevice via initFromDevice. The Executor, RecipeRunner, registry, variant resolution, validation — all shared code.
| Concern | Browser | Node.js (Dawn) |
|---|---|---|
| Adapter acquisition | navigator.gpu.requestAdapter() |
import { create } from "webgpu" |
| Canvas | HTMLCanvasElement, OffscreenCanvas |
None; headless texture targets |
| Image decode | createImageBitmap, ImageDecoder, VideoFrame |
codec library (see image pipeline below) |
| Image encode | canvas.convertToBlob, ImageEncoder |
codec library |
| Texture lifetime | Browser GC drives release | Explicit texture.destroy() via refcounted scratch |
| Worker | OffscreenCanvas in Worker |
Separate Node process; no Worker canvas |
WGSL, TypeGPU schemas, Executor, RecipeRunner are identical across both.
sharp currently does two distinct jobs in Node: codecs (decode/encode PNG/JPEG/WebP/AVIF/TIFF, EXIF, ICC) and pixel ops (resize, blur, sharpen, modulate, composite, gamma, …). Only the second is in scope for the shader pool.
Target end state:
codec.decode(bytes) ─► LabeledTexture
│
▼
Executor / RecipeRunner running shared ShaderModules
│
▼
LabeledTexture ─► codec.encode(...) ─► bytes
All pixel-in/pixel-out work that sharp currently handles:
These become (or already are) published modules. No host re-implements them.
A thin @nodetool-ai/image-codecs package handles:
Codec choice per format is implementation detail. Candidates: native bindings (libvips slim build, libjpeg-turbo, libpng, libwebp, libavif) or pure-JS where acceptable (@jsquash/*). The codec layer has no GPU dependency and can ship CPU-only.
For workflows that do “decode → one resize → encode” with no further GPU work, a pure-CPU codec+resize path may beat GPU upload/download cost. Keep a @nodetool-ai/image-codecs resize that bypasses the pool for that single-op case. Chains of two or more operations always go through the GPU pipeline.
@nodetool-ai/image-codecs with decode/encode for the formats currently used.sharp from nodetool-core and node packs once all consumers are off it. Keep sharp as a transitional dev dep in any pack still mid-migration.Generated from published-surface registry entries:
Shader node "color.hsb" (version 1):
input: image
params: hue, saturation, brightness
output: image
Param UI is driven by the module’s TypeGPU schema + paramUi hints (min/max/step). Execution maps params → uniform → Executor.encode → labeled texture out. Versioned IDs are saved with the workflow; loading a workflow uses registry.get({ id, version, surface: "published" }).
The workflow node’s nodeType (the NodeRegistry key in packages/node-sdk) encodes the module version — e.g. shader.color.hsb@v2, not bare shader.color.hsb. Without this, upgrading a published module silently changes node behavior in saved workflows. The @v<n> suffix is the bridge between the workflow NodeRegistry’s string-keyed identity and the shader registry’s (id, version) identity.
Workflow-load failure mode. Loading a workflow that references (id, version) not present in the running registry fails loud at load time, naming the missing module. No silent fallback to a different version. Migrating saved workflows to a newer version is a deliberate user action with a diff preview. Removing a published module requires a deprecation window with a documented replacement — this is the difference between “old workflows still work in two years” and “every release silently breaks somebody’s graph.”
Avoid readback between shader nodes. The runner inspects graph edges and only materializes ImageRef at GPU-to-non-GPU boundaries or final outputs.
Run as:
ImageRef → upload → texture → HSB → Blur → Vignette → readback → ImageRef
Not as:
ImageRef → upload/readback → ImageRef → upload/readback → ImageRef
Already shared, deterministic, published by end of Phase 3:
color.hsbcolor.brightnessContrastcolor.invertfilters.blur (gaussian variant)filters.vignettekeyer.lumaKeymask.applyAvoid text, SVG, smart scale, inpaint, paint, magic wand, palette extraction in Phase 4.
Shader nodes consume the same registry entries as sketch and timeline. They do not import sketch or timeline host code.
Shared (already in the pool by this point):
Host-specific to workflow nodes:
ImageRef materialization)published subset.Phase succeeds when:
ImageRef, in both browser and Node.js/Dawn.sharp for a multi-op chain runs on the shared pool with byte-equivalent (or documented-difference) output.ShaderModule objects are executed by sketch, timeline, and Node workflow runner — verified by import-graph analysis.Example chain to prove first:
Image In → HSB → Gaussian Blur → Vignette → Image Out
Prove in browser, then validate the same pipeline in Node.js/Dawn. Broad catalog coverage and full sharp parity come in follow-up work.
ID: P-2026-05-13-shared-shader-pool-phase-5-realtime
State: draft
Tags: gpu, shaders, realtime, frame-loop, phase-5
Repo: default (packages/gpu/)
Companion (orchestration + param store): __PLANS__/feat-realtime/PLAN-REALTIME-2.md — workflow ↔ renderer bridge, runner third mode, RealtimeLoop, inference subprocess. Kept separate so this repo’s shader plan stays in docs/ and evolves at a different cadence.
Depends on: Phase 1, Phase 2, Phase 3, Phase 4
Scope (this document): What the shared pool must provide so any realtime or frame-loop consumer — minimal webcam→FX→canvas, texture bus with async inference, feedback/trails, future NDI/Syphon — can plug in without forking WGSL or duplicating bind layouts. Out of scope here: session lifecycle, workflow vs realtime partitioning, param store, kernel runner “third mode”, Python inference subprocess, zero-copy verification checklists — see PLAN-REALTIME-2. Phase 5 below is the GPU catalog + executor contract side; PLAN-REALTIME-2 is the orchestration side.
Run the same ShaderModule catalog at frame rate inside a dedicated host (tick loop), not only in request/response workflow runs. The pool stays host-agnostic; the realtime host owns capture, scheduling, and when (if ever) CPU readback happens. Phase 5 ensures the foundation laid in Phases 1–4 is sufficient for that class of hosts — including optional feedback and sinks beyond ImageRef.
ShaderModule, Executor, RecipeRunner, registry, variant resolution, LabeledTexture, IOContract (including rod), TimeContract, bucketed ctx.scratch, pipeline cache on GPUContext.sample_frame-style operations, inference input at model rate).texture_external path — Phase 3 variant + color/convert/srgbExternalToLinear (or equivalent registered module) so camera/video can enter the catalog as linear texture_2d in one submit.ctx.persistent (or equivalent): refcounted textures that survive across ticks, distinct from frame-scoped scratch. Simple realtime paths may use only scratch + ping-pong between two bucketed textures; persistent is for when feedback cannot be expressed as a single ping-pong pair.previousFrame (or named prior-output) input — optional graph concept: a source slot bound to last tick’s output for a named handle. Keeps the per-tick graph a DAG while still allowing feedback; host advances the handle after submit. Catalog modules stay unchanged; wiring is host responsibility.| Aspect | Phase 4 (texture FX chain) | Phase 5 (frame-loop host) |
|---|---|---|
| Execution model | Run-once, request/response | Continuous tick (source-driven or capped) |
| Texture lifetime | Per-run scratch + boundary ImageRef |
Per-tick scratch plus optional ctx.persistent for feedback/state |
| Feedback | DAG only | previousFrame / named prior-output + optional persistent textures |
| Sources | ImageRef / GPUTextureRef |
Same modules; live sources (webcam, file, later NDI/Syphon) enter via texture_external + convert pass, then identical FX |
| Outputs | ImageRef at boundaries |
Primary: WebGPU canvas / surface the host owns. Optional later: NDI/Syphon/Spout as presentation sinks — not required for pool acceptance |
| Params | Workflow node inputs | Per-tick uniforms packed from snapshot (PLAN-REALTIME-2 param store → TypeGPU-backed ShaderModule.params) |
| Time | Static or keyframed | time / frame / deltaTime from host each tick |
These guide PLAN-REALTIME-2; the pool does not implement them, but should not contradict them:
set_param-style), not textures every frame over WebSocket. Realtime snapshots the param store once per tick → packs Executor uniforms against each node’s TypeGPU schema. ImageRef/bulk images occasional (set_image-style); never implicit every-frame edges on the workflow DAG (see PLAN-REALTIME-2 “Graph boundary”).texture_2d in the same submission, never store the external handle across frames (aligns with Phase 3 module above).RealtimeLoop, param store API, realtime_session_start / server routing, or kernel third execution mode.nodetool-realtime).texture_external (or uploaded texture_2d) → shared catalog module chain (e.g. HSB → blur) → canvas, at source frame rate, with no CPU read on the FX chain (Executor-only path).previousFrame and/or ctx.persistent runs for a sustained session without texture leaks.ShaderModule binaries (import graph) are used by sketch, timeline, Phase 4 workflow chains, and the Phase 5 frame-loop host.Full end-to-end realtime acceptance (multi-platform zero-copy capture, async inference, workflow integration) is gated on PLAN-REALTIME-2, not on this document alone.
These were considered during plan review and intentionally left out. Each lists the trigger that should reopen it. Do not design ahead of the trigger.
tileSafe: boolean field at that point.buildPasses(params, ctx) → Pass[]). Trigger: first op that needs runtime-determined pass count — large-radius blur via mip pyramid, bloom with N downsample levels, iterative dilate/erode, conditional passes (tonemap mode = none as recipe-level passthrough).params may need to become { value, expression?, binding? } per field.wantsMips and derived resources. Trigger: first module that needs a mip chain, gaussian pyramid, or summed-area table. vvvv’s WantsMips is the model.bindingKind. Trigger: variant count for any module exceeds ~3, or first module needing array textures, 3D textures, multi-sampled, or format-discriminated variants. Replace string enum with { dimension, sampleType, multisampled, viewDimension } matching GPUTextureBindingLayout.max-diff < ε (the honest version).