Collections bundle related documents into a single searchable unit. Connect a collection to an index node and any file in it (PDFs, Markdown notes, HTML, transcripts) becomes queryable from your workflows.

Opening Collections
Navigate to Collections from the left sidebar, or go directly to /collections. The explorer shows every collection you've created, along with its document count and the time it was last indexed.
Creating a Collection
- Click New Collection in the top-right.
- Give it a name and an optional description.
- (Optional) Associate a default embedding model.
- Drag documents into the collection, or import a folder from the Asset Explorer.
Collections accept any file type, but only text-extractable formats are indexed:
| Format | Extracted |
|---|---|
| PDF | Text + layout (per page) |
| DOCX / DOC | Body text |
| Markdown / TXT | Raw text |
| HTML | Stripped body text |
| CSV / TSV | Rows as records |
| EPUB | Chapter text |
Unsupported formats are stored but not indexed; they are still handy for reference inside a collection.
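As a rough illustration of the indexed versus stored-only split described above, the following sketch sorts file names by extension. The extension set is derived from the formats listed on this page; the function name `split_by_indexability` and the exact extension mapping are assumptions made for this example, not the application's actual logic.

```python
from pathlib import Path

# Extensions inferred from the formats listed on this page.
# Assumption: the real application may recognise more (or different) extensions.
INDEXABLE_EXTENSIONS = {
    ".pdf", ".docx", ".doc", ".md", ".txt", ".html", ".csv", ".tsv", ".epub",
}


def split_by_indexability(filenames):
    """Return (indexed, stored_only) lists, decided by file extension only."""
    indexed, stored_only = [], []
    for name in filenames:
        suffix = Path(name).suffix.lower()  # case-insensitive match
        if suffix in INDEXABLE_EXTENSIONS:
            indexed.append(name)
        else:
            stored_only.append(name)
    return indexed, stored_only


if __name__ == "__main__":
    files = ["notes.md", "scan.PNG", "report.PDF", "data.tsv", "clip.mp4"]
    indexed, stored_only = split_by_indexability(files)
    print("indexed:", indexed)
    print("stored only:", stored_only)
```

A check like this can be useful before a bulk import, to see which files will actually contribute to search results.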
Managing Documents
Click a collection tile to open its details. You can:
- Add documents by drag-and-drop or the Upload button.
- Remove documents from the collection (this doesn't delete the underlying asset).
- Re-index after adding new documents or changing the embedding model.
- Preview any document inline with the built-in viewer.
Using Collections in Workflows
Collections shine in RAG pipelines:
- Add an IndexDocuments or HybridSearch node to your workflow.
- Connect the collection to the node's collection input; the node menu will suggest the selector.
- Run the workflow. The first run indexes the collection (embeddings + keyword index); subsequent runs reuse the index.
See the full pattern in the Cookbook → RAG.
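The first run builds both an embedding index and a keyword index, so a hybrid search has two ranked result lists to merge. One common way to merge them is reciprocal rank fusion (RRF). The sketch below shows that technique in isolation; it is an assumption for illustration and not a description of how the HybridSearch node combines results internally. The function name and the constant `k=60` are likewise illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one ranking.

    Each document scores 1 / (k + rank) per list it appears in (rank starts
    at 1); scores are summed across lists. Ties are broken by document id.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    keyword_hits = ["a", "b", "c"]  # hypothetical keyword (BM25-style) ranking
    vector_hits = ["b", "d", "a"]   # hypothetical embedding-similarity ranking
    print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
```

Documents that appear in both lists rise to the top, which is the behavior that makes hybrid retrieval helpful when a query contains exact names that embeddings alone may miss.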
Indexing Options
By default, a collection uses the embedding model set in Settings → Default Models. Override it per collection from the collection's settings:
- Embedding model: any embedding model from HuggingFace, OpenAI, Gemini, or Cohere.
- Chunk size: tokens per passage (default: 512).
- Overlap: tokens shared between consecutive chunks (default: 64).
- Hybrid: enable BM25 alongside vector search for better recall on proper nouns.
See Indexing for deeper tuning notes.
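To make the chunk size and overlap settings concrete, here is a minimal sliding-window sketch using the defaults listed above (512 and 64). It operates on an already tokenized list; the function name `chunk_tokens` is hypothetical, and the actual indexer may tokenize and split differently (for example, on sentence or page boundaries).

```python
def chunk_tokens(tokens, chunk_size=512, overlap=64):
    """Split a token list into windows of chunk_size that share overlap tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks


if __name__ == "__main__":
    tokens = list(range(1000))  # stand-in for real tokenizer output
    for chunk in chunk_tokens(tokens):
        print(chunk[0], chunk[-1], len(chunk))
```

With the defaults, each window advances by 448 tokens, so a larger overlap means more chunks (and more embeddings to compute) for the same document.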
Storage
Index data is stored alongside your workflow database in SQLite (via sqlite-vec). Nothing leaves your machine unless you've opted into a cloud provider for the embedding model.
For multi-user deployments, Supabase-backed collections are an option; see Supabase Deployment.
Related Docs
- Indexing: advanced chunking, hybrid search, maintenance
- Asset Management: add documents to the underlying asset library
- Cookbook → RAG: wire a collection into a workflow
- Chat with Docs example: end-to-end RAG workflow