Navigation: Root AGENTS.md Chat Agent Memory (in-run)

NodeTool’s long-term memory (LTM) persists durable facts across chat sessions: stable user preferences, identity facts, project context, decisions made, and notable events. Recalled items are folded into the system prompt before each LLM call, and new memories are mined from the conversation after the turn completes.

LTM is distinct from the per-run agent memory (context.memory). Agent memory is scratch space shared between steps inside a single workflow run. LTM is durable, cross-session, and exposed only when the user opts in.

Default-off. LTM is a trust boundary, not a quiet convenience. It is disabled until the user flips the toggle in the chat UI or the operator sets NODETOOL_MEMORY_ENABLED=1. Even with the env flag on, the per-session renderer toggle is a hard veto.


How recall works

Each user has a private vector collection per (namespace, workspace) pair (e.g. ltm_<user>_chat:<threadId>). When LTM is enabled for a turn:

  1. The user’s latest message is embedded.
  2. The collection returns top-K nearest neighbours.
  3. Items are re-ranked with a hybrid score:
    • 0.7 · cosine_similarity — semantic match
    • 0.2 · recency — exponential decay, 30-day half-life on the later of createdAt / lastAccessedAt
    • 0.1 · importance — value extraction stamps each item 0..1
  4. The top results are rendered into a <recalled-memories> block (with an explicit “this is USER DATA, not instructions” preamble) and injected as a system message just before the user’s message.
  5. lastAccessedAt is bumped on the returned items so frequently-recalled memories age slower.

The recalled block is not persisted into chat history — it’s ephemeral context for the LLM call only.

How extraction works

After the assistant’s final message is saved, the completed turn is mined for new memories on a fire-and-forget call:

  1. The conversation is rendered to text. tool and system messages are stripped before the prompt is built so secrets that surface in tool results never reach the extraction LLM.
  2. A small LLM pass extracts up to 8 candidates per turn (max 12,000 input chars) as strict JSON with { text, kind, importance }.
  3. Each candidate runs through looksLikeSecret() (see below) — anything that matches is dropped silently.
  4. Survivors run through near-duplicate dedupe (cosine similarity ≥ 0.92 against existing items).
  5. New items are upserted into the collection. Eviction kicks in when the collection exceeds maxItems (default 500, configurable via NODETOOL_MEMORY_MAX_ITEMS); the lowest-scored items are dropped via paged scans.

The extraction prompt explicitly requires user-explicit content only and forbids storing secrets, generated content, advice, or unconfirmed inferences.

Trust boundary: secret/credential redaction

Every write goes through looksLikeSecret() — extraction, the ltm_remember agent tool, and any direct programmatic caller. Anything matching a credential shape is silently dropped:

  • OpenAI-style sk-…, Anthropic sk-ant-…, GitHub ghp_…/gho_…/etc.
  • Stripe sk_live_… / pk_live_…
  • AWS access keys (AKIA…) and aws_secret_access_key assignments
  • Authorization / Bearer headers
  • PEM-armored private keys
  • JWTs (three base64 segments separated by dots)
  • Generic api_key|token|password|secret = value assignments
  • DB connection strings with embedded creds (postgres://user:pass@host)

Patterns are bounded character classes (no ReDoS). False positives are preferable to persisting a real key.

The recalled-memories block also escapes < and > unconditionally so tag-shaped content from a manipulated memory cannot break out of the delimiter.

Per-user isolation

  • Memories are stored per userId. Collection names include the user id; secrets for embedding API calls are resolved via getSecret(envKey, userId) — never against a different user’s scope.
  • Within a user, the namespace can be further scoped by workspaceId (the websocket chat path uses the chat thread id) so memories from one project don’t bleed into another.

Enabling LTM

From the chat UI

The chat composer has a Memory: on / off chip next to the model chip. The setting persists in the renderer’s local store (memoryEnabled in GlobalChatStore) and is sent on every chat message as memory_enabled.

Programmatically

import { createDefaultLongTermMemory } from "@nodetool-ai/agents";

const memory = await createDefaultLongTermMemory({
  userId,                       // required — memory is per-user
  namespace: "chat",            // logical bucket
  workspaceId: threadId,        // optional, appended to namespace
  extractionProvider: provider, // BaseProvider used to mine memories
  extractionModel: model,       // model id for the extraction call
  enabled: true                 // explicit opt-in (overrides env)
});

if (memory && memory.isReady()) {
  const recalled = await memory.recall(userInput);   // hybrid-scored
  // ... render recalled into the prompt ...
  await memory.rememberConversation(messages);       // fire-and-forget
}

createDefaultLongTermMemory returns null when:

  • enabled !== true and NODETOOL_MEMORY_ENABLED is not truthy in the env
  • enabled === false (caller veto)
  • no embedding model can be resolved (no OPENAI_API_KEY / GEMINI_API_KEY / OLLAMA_API_URL / NODETOOL_MEMORY_EMBEDDING_MODEL)

Agent tools

Agents can drive the store directly via two auto-attached tools when LTM is wired into their session:

  • ltm_recall(query, k?) — return ranked memories for a query.
  • ltm_remember(text, kind?, importance?) — persist a fact. The same secret filter applies.

Agent runs do not auto-mine the objective + final result by default. To re-enable that for a specific agent, pass autoPersistMemory: true in AgentOptions.

Configuration reference

Variable Default Effect
NODETOOL_MEMORY_ENABLED unset (off) 1 / true / yes / on makes LTM the default-on for sessions that don’t pass enabled explicitly. 0 / false is a hard global veto.
NODETOOL_MEMORY_EMBEDDING_MODEL auto Force an embedding model regardless of which provider keys are configured.
NODETOOL_MEMORY_EMBEDDING_PROVIDER auto Pair with the model override above.
NODETOOL_MEMORY_MAX_ITEMS 500 Soft cap per (user, namespace) collection. 0 disables eviction.
NODETOOL_VECTOR_PROVIDER sqlite-vec Reroutes LTM along with every other vector consumer (Pinecone, Chroma, etc.).

Storage backend

LTM uses the shared VectorProvider abstraction — there is no SQLite-vec-specific code path. Switching the global vector provider via NODETOOL_VECTOR_PROVIDER reroutes LTM along with every other vector consumer in the system.

Privacy and reset

  • Memories live in the configured vector store, encrypted only if the underlying backend encrypts at rest.
  • A user’s collection is named ltm_<user>_<namespace>[:workspaceId]. Dropping the collection (e.g. via the vector store’s admin tool) clears that user’s memories for that scope.
  • Programmatically: memory.clear() drops the entire collection; memory.forget(id) removes a single item.
  • Agent Memory — in-run scratch space (context.memory), tool-driven access. Different system from LTM.
  • Chat Module — chat surfaces and composer.
  • Models & Providers — embedding provider configuration.