Status: Vision · Branch: claude/nodetool-video-creation-scope-Q7LQf · Owner: matti
You describe the video. Studio makes it. Type a topic, paste a script, drop a raw recording, or hand it a URL — Studio writes, voices, illustrates, captions, cuts, and reframes a finished video you can publish. Then you edit it like a document: change the words, and the video changes with them.
Studio is built for short-form video, three minutes max — Shorts, Reels, TikToks, ads, clips, micro-lessons, and explainers that fit the formats people actually watch and share. This is a deliberate constraint, not a limitation:
Long-form sources are still first-class inputs — a 90-minute podcast or webinar goes in, but what comes out is short clips. The output is always ≤3 minutes.
This is the thing creators currently assemble from five tools — a scriptwriter, a voice generator, a stock/b-roll library, a captioner, and a timeline editor. Studio collapses them into one surface where the script is the timeline and AI fills every beat. That bundle is what justifies a €50/month creator subscription: it replaces a stack that costs more than that and takes a day of work per video.
The solo creator, marketer, educator, and agency operator who ships short-form video on a schedule — weekly YouTube explainers, daily TikTok/Reels, course modules, product clips, ad variants. They don’t want a node graph. They want to go from idea to publish-ready MP4 in minutes, then tweak by editing text. They’ll pay because Studio is faster than hiring an editor and cheaper than the tool stack they’re juggling today.
The moat is that NodeTool already has the hard parts in main — a real sequence
engine, AI clip generation, version/staleness tracking, a WebGPU compositor, and
in-browser MP4 export. Studio is the opinionated creator surface on top of
that engine. Competitors are either text editors that bolt on AI (Descript) or AI
generators with no real editor (the clip-farm tools). Studio is both: generative
and editable, with the timeline as the substrate.
Every entry point lands on the same artifact — a transcript-bound sequence:
The assembly agent is the headline. It doesn’t just generate — it produces a fully bound sequence: beats → voiceover clips, b-roll clips, captions, music bed, transitions. Everything it emits is editable by hand afterward.
The transcript panel is the editor. The script is a list of beats; each beat owns the clips that realize it.
No scrubbing, no razor tool, no keyframe hunting for the 90% case. Power users can still drop to the full timeline tracks when they want frame control.
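The beat-owns-clips idea can be sketched as a small data model. This is a minimal illustration, not NodeTool's actual TimelineSequence schema: the `Beat`, `Clip`, and `PlacedBeat` shapes, the `layout` and `deleteBeat` helpers, and the assumption that a beat lasts as long as its longest clip are all hypothetical.

```typescript
// Hypothetical shapes for illustration; not NodeTool's actual TimelineSequence schema.
type ClipKind = "voiceover" | "broll" | "caption";

interface Clip {
  id: string;
  kind: ClipKind;
  durationSec: number;
}

interface Beat {
  id: string;
  text: string;  // the script line shown in the transcript panel
  clips: Clip[]; // the clips that realize this beat
}

interface PlacedBeat extends Beat {
  startSec: number;
  endSec: number;
}

// Assumption: clips inside a beat play in parallel, so the longest one sets the beat length.
function beatDuration(beat: Beat): number {
  return beat.clips.reduce((max, c) => Math.max(max, c.durationSec), 0);
}

// Lay beats end to end so script order is timeline order.
function layout(beats: Beat[]): PlacedBeat[] {
  let cursor = 0;
  return beats.map((beat) => {
    const startSec = cursor;
    cursor += beatDuration(beat);
    return { ...beat, startSec, endSec: cursor };
  });
}

// Deleting a sentence drops the beat and every clip it owns; later beats close the gap.
function deleteBeat(beats: Beat[], beatId: string): Beat[] {
  return beats.filter((b) => b.id !== beatId);
}

const draft: Beat[] = [
  {
    id: "b1",
    text: "Hook",
    clips: [
      { id: "v1", kind: "voiceover", durationSec: 3 },
      { id: "r1", kind: "broll", durationSec: 3 },
    ],
  },
  { id: "b2", text: "Weak sentence", clips: [{ id: "v2", kind: "voiceover", durationSec: 5 }] },
  { id: "b3", text: "Payoff", clips: [{ id: "v3", kind: "voiceover", durationSec: 4 }] },
];

const edited = layout(deleteBeat(draft, "b2"));
console.log(edited.map((b) => `${b.id}: ${b.startSec}-${b.endSec}s`));
```

Because timing is derived from the beat list rather than stored per clip, removing a sentence needs no manual ripple edit: the layout is simply recomputed.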
A video should sound and look like you:
One source, many outputs — this is where the subscription earns its keep for people publishing everywhere:
Studio is a creator surface over NodeTool’s existing infrastructure. The mapping:
| Studio capability | Engine it rides |
|---|---|
| Transcript-bound editing | TimelineSequence document (persisted, autosaved) |
| Voiceover & b-roll | AI clip generation (text-to-audio, text-to-video) |
| Reword → regenerate | dependencyHash staleness + ClipVersion history |
| Captions & overlays | sceneModel.computeActiveLayers (preview/export parity) |
| Auto-reframe & variants | sequence width/height (any ratio) + subject tracking |
| Export | in-browser WebCodecs MP4 |
| Assembly agent | NodeTool agent system (planner → steps → tools) |
The agent’s output target is the transcript-bound sequence the manual editor produces — generation and hand-editing share one data model, so anything the agent makes, you can refine, and anything you build, the agent can extend.
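The "Reword → regenerate" row of the mapping can be illustrated with a short sketch. The source names `dependencyHash` and `ClipVersion` but not their fields, so everything here is an assumption: the field names, the `GeneratedClip` wrapper, the `hashInputs` helper (a small non-cryptographic string hash standing in for whatever the engine uses), and the asset names.

```typescript
// Hypothetical shapes: the source names dependencyHash and ClipVersion but not their fields.
interface ClipInputs {
  text: string;
  voice: string;
}

interface ClipVersion {
  dependencyHash: string; // hash of the inputs this version was generated from
  assetUrl: string;
}

interface GeneratedClip {
  id: string;
  inputs: ClipInputs;
  versions: ClipVersion[]; // newest last; older versions stay available
}

// Small non-cryptographic string hash, a stand-in for the engine's real hashing.
function hashInputs(inputs: ClipInputs): string {
  const s = JSON.stringify([inputs.text, inputs.voice]);
  let h = 5381;
  for (let i = 0; i < s.length; i++) {
    h = (h * 33 + s.charCodeAt(i)) | 0; // keep within 32-bit integer range
  }
  return (h >>> 0).toString(16);
}

// A clip is stale when its current inputs no longer match the latest version's hash.
function isStale(clip: GeneratedClip): boolean {
  if (clip.versions.length === 0) return true;
  const latest = clip.versions[clip.versions.length - 1];
  return latest.dependencyHash !== hashInputs(clip.inputs);
}

// Regenerating appends a new version rather than overwriting history.
function regenerate(clip: GeneratedClip, assetUrl: string): GeneratedClip {
  const version: ClipVersion = { dependencyHash: hashInputs(clip.inputs), assetUrl };
  return { ...clip, versions: clip.versions.concat([version]) };
}

let voiceover: GeneratedClip = {
  id: "vo1",
  inputs: { text: "Our launch went well.", voice: "default" },
  versions: [],
};

voiceover = regenerate(voiceover, "take-1.wav");
const freshAfterGenerate = !isStale(voiceover);

// Reword the script line: the inputs change, the stored hash does not.
voiceover = { ...voiceover, inputs: { ...voiceover.inputs, text: "Our launch crushed it." } };
const staleAfterReword = isStale(voiceover);

voiceover = regenerate(voiceover, "take-2.wav");
console.log(freshAfterGenerate, staleAfterReword, isStale(voiceover), voiceover.versions.length);
```

Only the clip whose inputs changed is flagged, so rewording one line need not trigger regeneration of the rest of the sequence, and the earlier take remains in the version list.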
Studio’s surface is deliberately simple — a transcript and a preview. But it is not a black box. Underneath the surface is the full NodeTool engine, and experts can look — and reach — under the hood:
The biggest unsolved design problem — and the thing we need to figure out — is how the assembly agent is harnessed and which node workflows it drives to turn an idea into a bound sequence. This is the engineering heart of the product and is not yet specified. Open questions:
TimelineSequence?
text-to-audio), b-roll (text-to-video + stock search), transcription/highlight-finding for the clip machine, caption timing, auto-reframe with subject tracking? Which already exist in base-nodes and which must be built?
Resolving this, meaning designing the agent harness and the underlying node workflows, is the first real engineering milestone before any of the surface gets built.
A creator opens Studio, types “weekly update on our launch, punchy, 45 seconds, vertical.” Ninety seconds later there’s a captioned, voiced, b-rolled vertical video on screen. They delete one sentence, reword the hook, swap the voice to their clone, hit a brand preset, and export — then generate the 16:9 cut for YouTube from the same project. Start to publish: under three minutes. That’s the €50/month.
Six use cases that each, on their own, justify the subscription. Studio wins because one person does all of them in one tool.
Who. Solo creators running educational / niche short-form channels without showing their face — finance, history, science, “how X works,” top-10 lists on TikTok, Reels, and Shorts. Often a side hustle scaling toward full-time. No editing skills, no on-camera presence, publishing daily.
Pain today. A single 60–90 second explainer Short is still hours of work: write the script in one tool, generate voiceover in ElevenLabs, hunt stock b-roll, drag everything into a timeline, hand-time captions, render, then redo the framing for each platform. The tool stack alone runs past €50/month and the workflow doesn’t scale to a daily posting habit.
In Studio. Type the topic. The assembly agent researches, writes, voices, fills b-roll per beat, and captions a tight ≤90s Short — out comes a bound sequence. They read the transcript, cut two weak sentences, reword the hook, reroll one b-roll shot, and export. Hours become minutes — fast enough to post every day.
Why they pay. Studio is the difference between one video a week and one a day — it directly grows the channel that pays their rent.
Who. Podcasters, interviewers, webinar hosts, and the agencies/VAs who manage their socials. They already have hours of long-form audio/video; their growth channel is short clips on TikTok, Reels, Shorts, and LinkedIn.
Pain today. Finding the 20 clip-worthy moments in a 90-minute episode is manual and soul-crushing. Tools like Opus Clip find moments but give you no real editor; Descript edits but you’re still scrubbing. Captioning and reframing each clip vertically is per-clip busywork. Output: maybe 5 clips per episode when 20 are sitting in there.
In Studio. Drop the episode. Studio transcribes, surfaces the high-retention moments as candidate clips, and assembles each as an editable transcript-bound sequence — already captioned and reframed to vertical with the speaker tracked. The editor trims by deleting words, tightens the hook, and ships. Twenty clips from one episode in the time it used to take to make five.
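As a rough illustration of what "reframed to vertical with the speaker tracked" involves geometrically, the sketch below picks the largest crop of the target aspect ratio and centers it on a tracked subject position. How Studio's auto-reframe actually works is not specified; the `reframe` helper, its parameters, and the single-point subject position are assumptions.

```typescript
// Hypothetical helper; the source does not specify how auto-reframe is implemented.
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Largest crop of the target aspect ratio that fits in the source frame,
// centered on the tracked subject and clamped so it never leaves the frame.
function reframe(
  srcW: number,
  srcH: number,
  targetW: number,
  targetH: number,
  subjectX: number,
  subjectY: number
): Rect {
  const targetRatio = targetW / targetH;
  const srcRatio = srcW / srcH;
  // Narrower target: keep full height and crop the sides. Otherwise keep full width.
  const narrower = targetRatio < srcRatio;
  const width = narrower ? Math.round(srcH * targetRatio) : srcW;
  const height = narrower ? srcH : Math.round(srcW / targetRatio);
  const clamp = (v: number, lo: number, hi: number): number => Math.min(Math.max(v, lo), hi);
  const x = clamp(Math.round(subjectX - width / 2), 0, srcW - width);
  const y = clamp(Math.round(subjectY - height / 2), 0, srcH - height);
  return { x, y, width, height };
}

// 1920x1080 landscape source, 9:16 vertical target.
const centered = reframe(1920, 1080, 9, 16, 960, 540);   // speaker mid-frame
const nearEdge = reframe(1920, 1080, 9, 16, 1700, 540);  // speaker near the right edge
console.log(centered, nearEdge);
```

A real tracker would supply a subject position per frame and smooth it over time; this shows only the per-frame crop.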
Why they pay. Clip volume is reach, and reach is the whole reason they podcast. Studio multiplies the output of content they’ve already recorded.
Who. Early-stage founders, indie hackers, and one-person marketing teams who have to be the content department. Product launches, feature announcements, demo clips, “build in public” updates, founder POV videos for LinkedIn and X.
Pain today. They have product knowledge but zero editing time. Hiring an editor is slow and expensive for the cadence they need; doing it themselves means a video ships once a month instead of weekly. Consistency across videos (brand, voice, captions) is nonexistent because each one is improvised.
In Studio. Paste the changelog or a rough script. Studio turns it into a captioned, voiced, on-brand video using their saved brand kit — logo, colors, fonts, intro/outro baked in. Reword for tone, pick the cloned founder voice, export landscape for LinkedIn and vertical for Reels. Ship a launch video the same hour the feature ships.
Why they pay. It’s an editor + a brand designer for €50/month, available at 2am the night before launch. The alternative costs ten times that and can’t keep up.
Who. Online instructors, corporate L&D teams, bootcamps, and teachers building bite-sized lessons — a course delivered as a series of short, single-concept videos, plus the short “tip of the day” content they post to social to fill the top of their funnel.
Pain today. Short lesson video is repetitive production work: same intro, same lower-thirds, same caption style, every clip. Updating a course when content changes means re-recording and re-editing. Keeping 40 micro-lessons visually consistent by hand is error-prone and slow.
In Studio. Each lesson is a transcript — write or paste it, Studio voices and illustrates it against a saved course template so every module matches. When the curriculum changes, edit the script line and regenerate just that beat instead of re-shooting. Templates make lesson #40 as fast as lesson #1.
Why they pay. Course revenue depends on shipping the whole curriculum and keeping it current. Studio turns lesson production from a bottleneck into a text-editing task.
Who. Social media agencies and in-house teams running many client or product brands at once. Their unit of work is “20 posts this week across 6 brands, each on 3 platforms,” and their margin is throughput.
Pain today. Every brand needs its own look, voice, and caption style, and every platform needs its own aspect ratio. Multiplied across clients, that’s a combinatorial explosion of manual reformatting. Maintaining brand consistency across a team of junior editors is a constant QA fire.
In Studio. Brand kits enforce each client’s voice, colors, fonts, and caption preset automatically. Build once per concept, then fan out: every platform’s aspect ratio and a length variant from one source sequence. Templates standardize formats so a new team member produces on-brand output day one.
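The fan-out step can be sketched as a pure function from one source sequence to a list of variants. The shapes, preset labels, and dimensions below are illustrative assumptions, not Studio's variant API; the only figure taken from this document is the three-minute (180 s) output cap. The length variant here simply keeps beats in script order until the next one would exceed the cap, which is one possible policy among several.

```typescript
// Hypothetical shapes and presets for illustration; not Studio's actual variant API.
interface SourceSequence {
  id: string;
  beatDurationsSec: number[];
}

interface Preset {
  label: string;
  width: number;
  height: number;
  maxDurationSec: number;
}

interface Variant {
  sourceId: string;
  label: string;
  width: number;
  height: number;
  beatCount: number;
  durationSec: number;
}

// Keep beats in script order until the next one would exceed the cap.
function fitBeats(durations: number[], maxSec: number): { count: number; total: number } {
  let total = 0;
  let count = 0;
  for (const d of durations) {
    if (total + d > maxSec) break;
    total += d;
    count += 1;
  }
  return { count, total };
}

// One source sequence in, one variant per preset out.
function fanOut(source: SourceSequence, presets: Preset[]): Variant[] {
  return presets.map((p) => {
    const fit = fitBeats(source.beatDurationsSec, p.maxDurationSec);
    return {
      sourceId: source.id,
      label: p.label,
      width: p.width,
      height: p.height,
      beatCount: fit.count,
      durationSec: fit.total,
    };
  });
}

const platformPresets: Preset[] = [
  { label: "vertical-full", width: 1080, height: 1920, maxDurationSec: 180 },
  { label: "vertical-30s", width: 1080, height: 1920, maxDurationSec: 30 },
  { label: "landscape-full", width: 1920, height: 1080, maxDurationSec: 180 },
];

const variants = fanOut({ id: "launch", beatDurationsSec: [8, 12, 9, 14] }, platformPresets);
console.log(variants);
```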
Why they pay. This is pure margin — Studio cuts the per-deliverable cost across hundreds of deliverables a month. At agency volume, €50/month is a rounding error against the labor it removes.
Who. Creators and brands expanding into multiple language markets — a YouTuber going from English to Spanish/Portuguese/Hindi, or a company localizing product and training videos for regional teams.
Pain today. Localization means re-recording voiceover in each language, re-timing everything, and re-burning captions per language. It’s so expensive that most creators simply don’t do it and leave entire markets on the table.
In Studio. Translate the transcript; Studio regenerates voiceover and captions in the target language and re-flows the timing automatically — same b-roll, same brand, new language. One source video becomes five market-ready videos without touching a timeline.
Why they pay. Localization unlocks audiences worth far more than €50/month, and Studio makes it a button instead of a project.
Every hero user is publishing on a cadence and bottlenecked on production, not ideas. Studio attacks the bottleneck the same way for all of them: idea or source in, edit-by-text in the middle, many formats out. The transcript-bound sequence is the one model that serves the faceless creator, the repurposer, and the agency alike — which is why one subscription covers all six.