Collections bundle related documents into a single searchable unit. Connect a collection to an index node and any file in it (PDFs, Markdown notes, HTML, transcripts) becomes queryable from your workflows.


Opening Collections

Navigate to Collections from the left sidebar, or go directly to /collections. The explorer lists every collection you’ve created, along with its document count and the time it was last indexed.


Creating a Collection

  1. Click New Collection in the top-right corner.
  2. Give it a name and an optional description.
  3. (Optional) Associate a default embedding model.
  4. Drag documents into the collection, or import a folder from the Asset Explorer.

Collections accept any file type, but only text-extractable formats are indexed:

Format           Extracted
PDF              Text + layout (per page)
DOCX / DOC       Body text
Markdown / TXT   Raw text
HTML             Stripped body text
CSV / TSV        Rows as records
EPUB             Chapter text

Unsupported formats are stored but not indexed, which is still handy for reference inside a collection.
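
As a rough illustration of the stored-versus-indexed distinction, the sketch below sorts filenames by extension using the format list above. The function name, the mapping, and the exact extension spellings (for example .md for Markdown) are assumptions made for this example, not part of the product's API.

```python
from pathlib import Path

# Illustrative mapping built from the format list above.
# The extension spellings are assumptions, not a documented list.
EXTRACTED_BY_EXTENSION = {
    ".pdf": "Text + layout (per page)",
    ".docx": "Body text",
    ".doc": "Body text",
    ".md": "Raw text",
    ".txt": "Raw text",
    ".html": "Stripped body text",
    ".csv": "Rows as records",
    ".tsv": "Rows as records",
    ".epub": "Chapter text",
}


def split_by_indexability(filenames):
    """Return (indexed, stored_only) lists based on file extension."""
    indexed, stored_only = [], []
    for name in filenames:
        suffix = Path(name).suffix.lower()  # case-insensitive match
        if suffix in EXTRACTED_BY_EXTENSION:
            indexed.append(name)
        else:
            stored_only.append(name)
    return indexed, stored_only


indexed, stored_only = split_by_indexability(
    ["report.PDF", "notes.md", "diagram.png"]
)
print(indexed)      # ['report.PDF', 'notes.md']
print(stored_only)  # ['diagram.png']
```

Here diagram.png would still live in the collection, but only the first two files would contribute text to the index.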


Managing Documents

Click a collection tile to open its details. You can:

  • Add documents by drag-and-drop or the Upload button.
  • Remove documents from the collection (doesn’t delete the underlying asset).
  • Re-index after adding new documents or changing the embedding model.
  • Preview any document inline with the built-in viewer.


Using Collections in Workflows

Collections are most useful in retrieval-augmented generation (RAG) pipelines. To use one in a workflow:

  1. Add an IndexDocuments or HybridSearch node to your workflow.
  2. Connect the collection to the node's collection input; the node menu will suggest the selector.
  3. Run the workflow. The first run indexes the collection (embeddings + keyword index); subsequent runs reuse the index.

See the full pattern in the Cookbook → RAG.


Indexing Options

By default, a collection uses the embedding model set in Settings → Default Models. Override per collection from its settings:

  • Embedding model: any embedding model from HuggingFace, OpenAI, Gemini, or Cohere.
  • Chunk size: tokens per passage (default: 512).
  • Overlap: tokens shared between consecutive chunks (default: 64).
  • Hybrid: enable BM25 alongside vector search for better recall on proper nouns.
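
To make the chunk size and overlap settings concrete, here is a minimal sketch of a sliding-window splitter using the defaults listed above. The function name is hypothetical, and integers stand in for tokens; real token boundaries depend on the tokenizer of the chosen embedding model, and the actual indexer may split differently.

```python
def chunk_tokens(tokens, chunk_size=512, overlap=64):
    """Split a token list into windows that share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks


tokens = list(range(1000))  # stand-in for 1000 real tokens
chunks = chunk_tokens(tokens)
print([len(c) for c in chunks])            # [512, 512, 104]
print(chunks[0][-64:] == chunks[1][:64])   # True
```

With the defaults, each window advances by 448 tokens, so a 1000-token document yields three passages, and the last 64 tokens of one passage reappear at the start of the next.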

See Indexing for deeper tuning notes.
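
To show why the Hybrid option can help with proper nouns, the sketch below merges a keyword ranking and a vector ranking with reciprocal rank fusion (RRF). This page does not say how BM25 and vector results are actually combined, so RRF is an assumption chosen for illustration, and the function name and document IDs are made up.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of document IDs into one fused ranking.

    Each document scores 1 / (k + rank) per list it appears in;
    k=60 is a commonly used constant for RRF.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results: the keyword search matches an exact name
# that the vector search misses.
keyword_hits = ["acme_contract", "pricing_faq"]
vector_hits = ["pricing_faq", "refund_policy", "onboarding"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
# ['pricing_faq', 'acme_contract', 'refund_policy', 'onboarding']
```

In this toy case acme_contract appears only in the keyword results, yet it still ranks second after fusion, which is the kind of recall gain on exact names that hybrid search is meant to provide.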


Storage

Index data is stored alongside your workflow database in SQLite (via sqlite-vec). Nothing leaves your machine unless you’ve opted into a cloud provider for the embedding model.

For multi-user deployments, Supabase-backed collections are an option; see Supabase Deployment.