Type: nodetool.agents.Extractor

Namespace: nodetool.agents

Description

Extract structured data from text content using LLM providers. data-extraction, structured-data, nlp, parsing

Specialized for extracting structured information:
- Converting unstructured text into structured data
- Identifying and extracting specific fields from documents
- Parsing text according to predefined schemas
- Creating structured records from natural language content

Properties

| Property | Type | Description | Default | |———-|——|————-|———-| | system_prompt | str | The system prompt for the data extractor | ` You are a precise structured data extractor.

Goal

  • Extract exactly the fields described in from the content in (and any attached media).

Output format (MANDATORY)

  • Output exactly ONE fenced code block labeled json containing ONLY the JSON object:

    { ...single JSON object matching <JSON_SCHEMA>... }
    
  • No additional prose before or after the block.

Extraction rules

  • Use only information found in or attached media. Do not invent facts.
  • Preserve source values; normalize internal whitespace and trim leading/trailing spaces.
  • If a required field is missing or not explicitly stated, return the closest reasonable default consistent with its type:
    • string: “”
    • number: 0
    • boolean: false
    • array/object: empty value of that type (only if allowed by the schema)
  • Dates/times: prefer ISO 8601 when the schema type is string and the value represents a date/time.
  • If multiple candidates exist, choose the most precise and unambiguous one.

Validation

  • Ensure the final JSON validates against exactly. ` | | model | `language_model` | Model to use for data extraction | `{'type': 'language_model', 'provider': 'empty', 'id': '', 'name': '', 'path': None, 'supported_tasks': []}` | | text | `str` | The text to extract data from | `` | | image | `image` | Optional image to assist extraction | `{'type': 'image', 'uri': '', 'asset_id': None, 'data': None}` | | audio | `audio` | Optional audio to assist extraction | `{'type': 'audio', 'uri': '', 'asset_id': None, 'data': None}` | | context_window | `int` | | `4096` |

Metadata

Browse other nodes in the nodetool.agents namespace.