NodeTool ships a lightweight ingestion pipeline for semantic search and retrieval-augmented generation (RAG) tasks. The indexing logic is split across @nodetool-ai/vectorstore (store and embedding) and @nodetool-ai/deploy (collection routes).
Overview
- Collection metadata (CollectionResponse in @nodetool-ai/protocol, packages/protocol/src/api-types.ts) stores ingest configuration, including an optional workflow ID.
- Vector store – the default backend is SQLite-vec (@nodetool-ai/vectorstore, packages/vectorstore/src/sqlite-vec-store.ts). Embeddings flow through the VectorProvider abstraction; see Vector Storage for swapping backends (Pinecone, Supabase/pgvector).
- Indexing route – indexFileToCollection() (@nodetool-ai/deploy, packages/deploy/src/collection-routes.ts) orchestrates ingestion based on collection metadata.
Default Flow
- indexFileToCollection() resolves the target collection via getCollection() (@nodetool-ai/vectorstore, packages/vectorstore/src/index.ts).
- If the collection specifies a custom workflow ID, the service executes it by constructing a RunJobRequest (@nodetool-ai/protocol, packages/protocol/src/api-types.ts) with CollectionInput and FileInput nodes populated.
- Otherwise, it falls back to the default ingestion path, which splits the document with splitDocument() (@nodetool-ai/vectorstore), embeds it, and stores embeddings in SQLite-vec.
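The branch between the custom workflow and the default path can be sketched as below. This is a minimal illustration under stated assumptions, not the code in collection-routes.ts: the CollectionMeta shape and the chooseIngestPath and splitIntoChunks names are hypothetical, and the fixed-size splitter only stands in for whatever strategy splitDocument() actually uses.

```typescript
// Hypothetical, reduced collection shape; the real type is CollectionResponse.
interface CollectionMeta {
  name: string;
  workflowId?: string; // optional custom ingestion workflow
}

type IngestPath =
  | { kind: "workflow"; workflowId: string }
  | { kind: "default" };

// Decide which ingestion path applies, mirroring the flow described above.
function chooseIngestPath(meta: CollectionMeta | undefined): IngestPath {
  if (!meta) {
    throw new Error("collection not found");
  }
  if (meta.workflowId) {
    return { kind: "workflow", workflowId: meta.workflowId };
  }
  return { kind: "default" };
}

// Illustrative fixed-size splitter with overlap; NOT the real splitDocument().
function splitIntoChunks(text: string, size: number, overlap: number): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("require size > 0 and 0 <= overlap < size");
  }
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    // Stop once a chunk has reached the end of the text.
    if (start + size >= text.length) break;
  }
  return chunks;
}
```

In the default path, each chunk would then be embedded and written to the vector store; those steps are omitted here because their signatures are not shown in this document.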
Messages & Progress
While custom workflows run, the service streams JobUpdate, NodeUpdate, and progress messages (defined in @nodetool-ai/protocol, packages/protocol/src/messages.ts). Tests under packages/deploy/tests/collection-routes.test.ts cover the expected message sequences.
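A consumer of that stream typically switches on the message type. The sketch below assumes reduced message shapes and type tags ("job_update", "node_update", "progress") chosen for illustration; the real definitions are in packages/protocol/src/messages.ts and may differ.

```typescript
// Hypothetical, reduced message shapes for illustration only.
type IngestMessage =
  | { type: "job_update"; status: string }
  | { type: "node_update"; nodeId: string; status: string }
  | { type: "progress"; current: number; total: number };

interface IngestSummary {
  lastStatus: string | null; // most recent job-level status seen
  percent: number | null;    // most recent progress, as a whole percentage
}

// Fold a sequence of streamed messages into a small summary.
function summarizeMessages(messages: IngestMessage[]): IngestSummary {
  const summary: IngestSummary = { lastStatus: null, percent: null };
  for (const msg of messages) {
    switch (msg.type) {
      case "job_update":
        summary.lastStatus = msg.status;
        break;
      case "progress":
        summary.percent =
          msg.total > 0 ? Math.round((msg.current / msg.total) * 100) : null;
        break;
      case "node_update":
        // Per-node detail is ignored in this sketch.
        break;
    }
  }
  return summary;
}
```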
Configuring the vector store
The default backend is local SQLite-vec. Switch backends with NODETOOL_VECTOR_PROVIDER.
| Variable | Description | Default |
|---|---|---|
| NODETOOL_VECTOR_PROVIDER | sqlite-vec, pinecone, or supabase | sqlite-vec |
| VECTORSTORE_DB_PATH | Local SQLite-vec database file | ~/.local/share/nodetool/vectorstore.db |
| PINECONE_API_KEY | Required when provider is pinecone | (none) |
| SUPABASE_URL / SUPABASE_SERVICE_ROLE_KEY | Required when provider is supabase | (none) |
See Vector Storage for backend-specific setup.
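As a quick pre-flight check, the variables above can be validated before starting the server. The following is a sketch, not NodeTool's own configuration loader: checkVectorConfig and its return shape are hypothetical, and treating an empty string as unset is an assumption.

```typescript
type VectorProviderName = "sqlite-vec" | "pinecone" | "supabase";

// Variables each provider requires, taken from the table above.
const REQUIRED_VARS: Record<VectorProviderName, string[]> = {
  "sqlite-vec": [],
  pinecone: ["PINECONE_API_KEY"],
  supabase: ["SUPABASE_URL", "SUPABASE_SERVICE_ROLE_KEY"],
};

function isProviderName(value: string): value is VectorProviderName {
  return value === "sqlite-vec" || value === "pinecone" || value === "supabase";
}

// Hypothetical helper: report the selected provider and any unset variables.
function checkVectorConfig(
  env: Record<string, string | undefined>,
): { provider: VectorProviderName; missing: string[] } {
  const raw = env["NODETOOL_VECTOR_PROVIDER"] ?? "sqlite-vec"; // documented default
  if (!isProviderName(raw)) {
    throw new Error(`unknown vector provider: ${raw}`);
  }
  // Assumption: an empty string counts as unset.
  const missing = REQUIRED_VARS[raw].filter((name) => !env[name]);
  return { provider: raw, missing };
}
```

In a Node.js process this could be called as checkVectorConfig(process.env); a non-empty missing list indicates which credentials still need to be set.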
Custom Ingestion Workflows
Collections can reference bespoke workflows to process files before embedding. The workflow should expect:
- A CollectionInput node receiving Collection(name=…).
- A FileInput node receiving FilePath(path=…).
Return values can include summaries, metadata, or alternate embeddings. Review packages/deploy/tests/collection-routes.test.ts for a template.
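To make the two expected inputs concrete, the sketch below assembles them into a request-like object. Every shape here is a hypothetical stand-in: SketchRunJobRequest is not the real RunJobRequest, and the field names and type tags are assumptions chosen for illustration.

```typescript
// Hypothetical stand-ins for the Collection and FilePath values named above.
interface CollectionRef {
  type: "collection";
  name: string;
}
interface FilePathRef {
  type: "file_path";
  path: string;
}

// Reduced request shape for illustration; NOT the real RunJobRequest.
interface SketchRunJobRequest {
  workflowId: string;
  params: {
    CollectionInput: CollectionRef;
    FileInput: FilePathRef;
  };
}

// Populate the two input nodes a custom ingestion workflow is expected to have.
function buildIngestRequest(
  workflowId: string,
  collectionName: string,
  filePath: string,
): SketchRunJobRequest {
  if (!workflowId || !collectionName || !filePath) {
    throw new Error("workflowId, collectionName, and filePath are all required");
  }
  return {
    workflowId,
    params: {
      CollectionInput: { type: "collection", name: collectionName },
      FileInput: { type: "file_path", path: filePath },
    },
  };
}
```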
CLI & API Integration
- POST /collections/{name}/index (see @nodetool-ai/websocket, packages/websocket/src/collection-api.ts) triggers ingestion via HTTP.
- The MCP server (@nodetool-ai/websocket, packages/websocket/src/mcp-server.ts) exposes commands for IDE plug-ins to index assets.
- Admin routes under @nodetool-ai/deploy (packages/deploy/src/admin-routes.ts) provide remote ingestion endpoints for deployed servers.
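When calling the HTTP route from a script, the collection name should be URL-encoded. The helper below is a small sketch; indexEndpoint is a hypothetical name, and the base URL (including any path prefix your server adds) is an assumption you must supply.

```typescript
// Build the indexing endpoint URL for a collection.
// baseUrl is an assumption: pass the address of your own NodeTool server.
function indexEndpoint(baseUrl: string, collectionName: string): string {
  if (!collectionName) {
    throw new Error("collectionName must not be empty");
  }
  const trimmed = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return `${trimmed}/collections/${encodeURIComponent(collectionName)}/index`;
}
```

The request body and any authentication headers are not specified in this document; check packages/websocket/src/collection-api.ts before issuing the POST.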
Troubleshooting
- Missing collection metadata – ensure the collection exists and includes the required workflow entry when using custom workflows.
- Remote backend errors – for pinecone or supabase, verify credentials and network reachability; fall back to local SQLite-vec by setting NODETOOL_VECTOR_PROVIDER=sqlite-vec.
- Large files – ensure VECTORSTORE_DB_PATH has disk headroom, or move to a remote backend; the default ingestion workflow streams chunks to reduce memory usage.
Related Documentation
- Providers – selecting embedding models for ingestion nodes.
- Workflow API – details on RunJobRequest.
- Storage Guide – configuring persistent storage for uploaded documents.